Posted:July 17, 2005

This may be a topic I return to numerous times, but I’m coming to realize that I need some INTERNALLY referenceable stuff for guiding and keeping this site integral.  I suspect that the whole ACTIVITY of writing this blog will create its own lessons (doesn’t everything?), but I’d like to take what I’ve learned to date and codify it for moving forward.

I’m sure every CMS or blog software package has, or lacks, its own capabilities for some of this management.  What I’m working on here is the unique combination of WordPress-Xinha-utilities that are now driving the tasks necessary to care for and feed this site.

Many of the previous ‘Prepare to Blog’ categories deal with various public aspects of the site. What I am coming to realize, however, is that the site itself becomes a sort of filing cabinet, attention "fly paper", or other kind of pheromone. As a result, since WordPress has its separate page-creating capability, perhaps it makes sense to create "hidden" pages on the site, useful for site management purposes, but only available to the site administrators (me). Thus, I have created and populated a starting set of four internal management pages:

  • Common Inserts
  • Common Styles
  • Pending Topics
  • Pending Task Lists.

 

I am not showing links to these, because they are in fact works in progress, should be changing constantly, and may present thoughts and ideas that I am toying with but may never complete. However, the point is not that I have "private" aspects of this site, but that the framework of these sites can also extend into project management and task tracking, aspects not seen by the general public.

Like the guy in the bear suit talking about stream water and Dasani, perhaps there is too much salmon spawning in my efforts.

 

Author’s Note:  I actually decided to commit to a blog on April 27, 2005, and began recording soon thereafter my steps in doing so.  Because of work demands and other delays, the actual site was not released until July 18, 2005.  To give my ‘Prepare to Blog …’ postings a more contemporaneous feel, I arbitrarily changed posting dates on this series one month forward, which means some aspects of the actual blog were better developed than some of these earlier posts indicate.  However, the sequence and the content remain unchanged.  A re-factored complete guide will be posted at the conclusion of the ‘Prepare to Blog …’ series, targeted for release about August 18, 2005.  mkb

Posted by AI3's author, Mike Bergman Posted on July 17, 2005 at 7:45 pm in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/71/preparing-to-blog-site-project-management/
The URI to trackback this post is: http://www.mkbergman.com/71/preparing-to-blog-site-project-management/trackback/
Posted:July 16, 2005

Now that I’m getting closer to going live, I have begun testing moving MS Word docs directly into the site.  Most of my original research and analysis stuff is done in Word or Excel because I’m pretty much a power user in quick writing, assembly and formatting.  If I could convert docs in a relatively straightforward way to AI3, that would be a real boon. However, as I discovered, there are major, major problems and issues with moving Word documents.

The Big Transfer

My first test involved a beast of an analysis piece I had done on the $3 trillion value of U.S. enterprise document assets.  (It was eventually posted here.) In its long form, it is 42 pages long, with many tables, a few figures and more than 100 citations and links.  It is over 1 MB (1070 KB) in its original form.

Most of us have created Web pages directly from a Word document, and I tried this first. I did the conversion with the filter option, then pasted the result directly into the editor, and attempted to update the site. The transfer seemed to take forever and then the server hung. My suspicion was that the Word HTML code was too complex.

Cleaning the Word HTML

Before packing it in and splitting the original doc into multiple pieces so that my site could choke it down, I decided to do a bit of investigation on alternative cleanup utilities and approaches.  One good review I found was by Laurie Rowell on ‘Clean HTML from Word:  Can it be Done?’.  I recommend the four-parter.

Laurie’s review suggested that third-party tools could improve somewhat on the MS Web page with filtering option, but the gains did not appear sufficient to let me proceed without splitting my files.  Nonetheless, after following that advice, mostly using the MS Web page with filtering, I again attempted a transfer, with the same results as before.

After this initial cleaning, I used both Word and Composer (the Mozilla HTML editor) to do some search-and-replace removal of further HTML tags. Using pattern replacements, esp. with Word, it is also possible to replace line breaks and tab characters, so long as the base file does not have an expected Word extension such as .doc, .rtf or .html (by convention, I always use .txt).  This was going well in reducing the sizes of the files (the best I was able to achieve on the 1 MB file was about 450 KB), but the process was laborious even with global search-and-replace.  Furthermore, WordPress was still not choking down the smaller files. And, unbeknownst to me at this stage, other issues were being introduced into the files that would make later steps even more difficult.
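The kind of pattern-based tag removal described above can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually followed in Word and Composer: the function name, the patterns and the sample markup are all assumptions, and regular expressions are a blunt tool for HTML (the attribute pattern, for instance, would also strip matching text outside of tags).

```python
import re

def strip_word_markup(html: str) -> str:
    """Remove a few kinds of Word-generated markup that do not affect display."""
    # Drop Office namespace tags such as <o:p> and </o:p>.
    html = re.sub(r"</?o:p[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class and style attributes, whether quoted or unquoted.
    html = re.sub(
        r"""\s+(?:class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Unwrap <span> and <font> tags, keeping the text they enclose.
    html = re.sub(r"</?(?:span|font)\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html

# Hypothetical sample of filtered Word HTML.
sample = '<p class=MsoNormal style="margin:0in"><span lang=EN-US>Hello<o:p></o:p></span></p>'
cleaned = strip_word_markup(sample)
print(cleaned)
```

Each substitution is a global replace of the same sort an editor's search-and-replace performs, which is why the manual version of this was so laborious.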

Splitting Files

I then split the document into six parts, the largest being about 250 KB. As before, they were cut-and-pasted into the editor and then posted. I again got server errors and time-outs. Kevin Klawonn of BrightPlanet was able to determine that the Apache server was timing out after 30 sec. With a minor parameter change, we were able to get all files uploaded.

However, while the system was now choking down the files, they looked terrible! Line breaks were totally messed up and being able to edit them within the Xinha editor was close to hopeless. Clearly, and unfortunately, the code was still not clean enough.

Problems with Composer

A natural assumption was that these open source editors were "buggy" and unable to handle more commercial strength requirements. As a natural response, I turned to Composer, a standard HTML utility in my Mozilla browser.

I had not used Composer much before, but found much to like. It has nice toggles between HTML source and WYSIWYG. It offers menu options for most "standard" activities I would undertake with a Web page. In short, I liked working with it and thought it might become an offline (at least from my blog) standard for doing HTML WYSIWYG editing of large imports. I actually started becoming familiar with the app and its controls and features.

However, upon actual incorporation of the results, I found a nasty truth. Composer introduces forced line breaks at about margin 70. As a basis for incorporating into other apps, all of which need to work nicely together, this was fatal. So much for Composer. I was sad …
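For what it is worth, forced breaks of this kind can in principle be undone after the fact. The Python sketch below joins hard-wrapped lines back into one line per paragraph. It assumes that paragraphs are separated by blank lines, which may not hold for every file, and it would damage any preformatted (pre) blocks; the function name and sample text are mine, for illustration only.

```python
def unwrap_hard_breaks(text: str) -> str:
    """Join hard-wrapped lines into a single line per paragraph.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the paragraph being collected.
            current.append(line.strip())
        elif current:
            # Blank line: close off the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)

wrapped = (
    "<p>This paragraph was wrapped\n"
    "by the editor at a fixed margin.</p>\n"
    "\n"
    "<p>Second paragraph.</p>"
)
print(unwrap_hard_breaks(wrapped))
```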

Problems with Xinha

I have observed line breaks being introduced by Xinha, but have not been able to reproduce the actual steps. In general, the system seems to be OK about not introducing spurious breaks, including when editing moves from full to smaller screen.

Cleaning the Word HTML II – Textism

The entire process of using files in multiple applications with multiple behaviors had worked to create a total HTML nightmare in my test file baseline. Remembering one of the options in Laurie Rowell’s piece (above), I decided to break my normal rule against paid products and check out the Textism site using the Word Clean utility. Dean Allen actually has an interesting pricing model which mixes aspects of free, seduction and low cost.

This is a superior and professional offering:

  • It creates files less than one-third the size of already cleaned MS Word HTML
  • It has an innovative pricing model, beginning with free and moving on to short or annual uses. For 5 euros (about $6.20) the system can be used without limit for 24 hrs; payment is by Paypal. For a full-year use, perhaps likely after being hooked on the day version, the cost is 20 euros (about $25) though with a limit of 75 conversions per month (about three per work day; no constraint for a very aggressive personal use.)
  • The resulting cleaned code, while produced on the server side and needing to be copied and pasted locally in order to keep a saved copy, is extremely clean and formatted well for later management with other utilities. Paragraphs are separated by line breaks, items such as bullets each get their own line, and some characters are indented.
  • The system is capable of handling considerable complexity. My test from hell had about 20 tables, 100 citations and endnotes, a table of contents, embedded images, etc.  All code produced was correct and very clean and spare.

In short, a total recommendation. Any user needing to move a few files per month from Word to their blog should definitely consider this service.

Final Adopted Process

Depending on the length of the original Word document and its complexity, I recommend one of two approaches given current tools (at least the ones I have tested.)

For shorter Word documents with little complexity and few internal or external references:

  1. Save the document as a .txt file, and then
  2. Cut-and-paste into the online editor and re-establish formatting.

For Word documents that do not meet these conditions, the path is tortuous and onerous:

  1. Make sure the Word file is absolutely complete — you will not want to return to this step!
  2. Save the file with the Word Save As ‘save as type’ using the Web page, filtered option
  3. [If images are used, separately locate them, give them logical names, and later embed in the way you normally handle in-line images with your posts]
  4. Submit the HTML file created by MS Word to the Textism utility. Granted, this utility costs, but the effort saved in its clean HTML procedures is well worth it
  5. Now, with the re-saved clean version, invoke a standard editor that has two capabilities:  1) no enforced word wraps or line breaks; 2) the ability to display and search-and-replace on formatting characters such as line breaks (^p), tabs  (^t), etc.  (Word can perform these functions when files are named .txt prior to input, but use of a standard editor with these features may be preferable.) With this editor, you will do some additional code clean-up:
    • Removing unnecessary style definitions
    • Formatting the file so that paragraphs split
    • Removing any other recurring HTML code patterns that have nothing to do with the eventual display of your document on your site.
  6. Now, paste your fully cleaned code into your editor for posting on the blog, and
  7. Should you encounter major problems, select all of the code in your blog editor, re-paste it in your standard editor, and do any global replaces and clean-ups.
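The clean-up in step 5 can also be scripted rather than done by hand in an editor. The following Python sketch is one hedged way to do it, not part of the process I tested: it removes style definitions, collapses stray line breaks and tabs, and then puts each block-level element in its own paragraph. The function name, the list of block-level tags and the sample input are assumptions, and the whitespace collapse would damage preformatted (pre) content.

```python
import re

def tidy_clean_html(html: str) -> str:
    """Drop style definitions and lay the file out one block per paragraph."""
    # Remove <style>...</style> blocks, including their contents.
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)
    # Collapse every run of whitespace (line breaks and tabs included) to one space.
    html = re.sub(r"\s+", " ", html)
    # Put a blank line after each closing block-level tag.
    html = re.sub(r"(</(?:p|h[1-6]|ul|ol|table)>)\s*", r"\1\n\n", html,
                  flags=re.IGNORECASE)
    return html.strip()

# Hypothetical input resembling lightly cleaned Word HTML.
raw = ("<style>p.MsoNormal {margin:0in}</style>"
       "<p>First\n\tparagraph.</p><p>Second paragraph.</p>")
tidied = tidy_clean_html(raw)
print(tidied)
```

As with the manual route, keep a saved copy of the input before running anything like this, so that there is a fallback if a pattern removes too much.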

I know this sounds like a pain, and it is.  You should also keep saved versions of interim steps above to have fallbacks if necessary.

Note: There are instances when the size of the file and the degree of final HTML editing and clean-up may suggest offline editing because server-side editing is slow, updated posts may take forever or experience server time-outs, or they may simply crash the server. If offline editing is necessary, do make sure an HTML editor is used that does not insert those insidious line breaks. If it does, you will spend hours of frustration trying to get everything clean again.

 


Posted by AI3's author, Mike Bergman Posted on July 16, 2005 at 11:03 am in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/48/preparing-to-blog-word-docs-to-html/
The URI to trackback this post is: http://www.mkbergman.com/48/preparing-to-blog-word-docs-to-html/trackback/
Posted:July 15, 2005

I finally decided to bite the bullet and give some concerted attention to filling in some of the "standard" background material I have been contemplating for the site.  I’d done some earlier work pulling together bio and mission material; I spent most of today (and productively so!) completing those items and some of the other linkage glue.

I’m pretty pleased with the near-readiness of the site for at least an initial release.  I have to take the plunge and just do it, and then see how exposure and a "live" environment require new efforts.

I remain very disappointed with how difficult it is to compose in a free-flowing manner without having to worry about formatting and HTML.  The WYSIWYG editor I’m currently using is not available to both posts and pages, and copying and pasting between environments does really screwy things with word wraps and with picking up (or not) embedded HTML; when the HTML is not picked up, pretty painstaking editing is required.

But, all in all, good progress.  I’m quite close to releasing, but may delay somewhat with a week of business travel coming up next week.  [As noted elsewhere, this actual post is one month out of sync from the actual release date.]

 


Posted by AI3's author, Mike Bergman Posted on July 15, 2005 at 9:35 pm in Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/46/preparing-to-blog-standard-site-content/
The URI to trackback this post is: http://www.mkbergman.com/46/preparing-to-blog-standard-site-content/trackback/

Two things broke when I added the WordPress permalink feature. This post describes each problem and the fixes I attempted, followed by my current resolution and approach.

Why Permalinks?

Rather than use the ?page_id=num and ?p=num internal references for postings, WordPress provides a permalink feature that converts these ID references into URL strings that help in search engine indexing. Here is the permalink structure I settled upon for my site:

 /index/ai3v2/%year%/%postname%/

This produces a URL that contains a truncation of the post name title, plus other relevant information. That is well and good and the search engines would love me, but turning this feature on caused:  1) images were lost due to reference changes; and 2) Jonathan Foucher’s ‘Popular Posts’ plugin ceased displaying.
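To make the permalink structure concrete, here is a small Python sketch of how such a pattern expands into a URL path. It is only an approximation for illustration: the slug function is a rough stand-in of my own, not WordPress's actual routine, and the post title is used purely as an example.

```python
import re

def slugify(title: str) -> str:
    """Rough stand-in for a post-name slug: lowercase and hyphen-separated."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def expand_permalink(structure: str, year: int, title: str) -> str:
    """Fill the %year% and %postname% tags of a permalink structure."""
    return (structure
            .replace("%year%", str(year))
            .replace("%postname%", slugify(title)))

path = expand_permalink("/index/ai3v2/%year%/%postname%/",
                        2005, "Preparing to Blog: Permalink Problems")
print(path)
```

Note that nothing in the resulting path carries the numeric post ID, which turns out to matter for the second problem described further on.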

Lost Images

The first problem is that all of my site images no longer could be found. According to the last entries in the WordPress support file, I needed to add these lines of code to my main site index file:

  <?php
  // Base URL: the server name plus the path of the running script
  $basehref = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['SCRIPT_NAME']; ?>

    <base href="<?php echo $basehref; ?>">

As well, I needed to preface my internal image references by ‘/index.php/’. 

These changes again allowed my images to be properly displayed, but I guess I’m not sure why all of the pieces worked. With this first problem fixed, I could now address the second.
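One likely part of the explanation is ordinary relative-URL resolution. A browser resolves a relative image reference against the URL of the page it is on, unless a base element supplies a different starting point. The Python sketch below uses the standard library's urljoin to show the difference; the host name, post path and image path are hypothetical, and I have not confirmed that this accounts for everything I saw.

```python
from urllib.parse import urljoin

# Hypothetical URLs, for illustration only.
permalink_page = "http://example.com/index/ai3v2/2005/some-post/"
base_href = "http://example.com/index.php"
image_ref = "images/chart.png"

# Without a base element, the reference resolves under the permalink path.
without_base = urljoin(permalink_page, image_ref)

# With a base element, the reference resolves against the base href instead.
with_base = urljoin(base_href, image_ref)

print(without_base)
print(with_base)
```

Under the old ?p=num style of URL every page sat at the same path, so relative references kept pointing at the same place; a permalink adds directory-like segments, and the same reference then points somewhere that does not exist.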

Popular Links

I use Jonathan Foucher’s ‘most popular’ plugin, which broke when I introduced a permalink without an ID field. This plugin itself relies on Randy Peterson’s StatTraq statistical reporting plugin.

In researching this problem, I came across a posting by Jonathan noting that the issue does indeed arise when no post id is included in the permalink structure. Because the post id is not added to the StatTraq table in the ‘article_id’ field, the ‘page views’ StatTraq report shows only ‘Mixed’ page views, not the actual posts viewed by visitors. I tried his suggested resolution, which replaces these lines (25 and 26) in the stattraq.php file:

if (($p != '')){
$p = intval($p);

With this replacement:

if (($post->ID != '')){
$p = intval($post->ID);

Unfortunately, that did not fix my problem.

Resolution

Since I could not get images and popular links to work simultaneously, I decided to pass. I suspect that later updates to StatTraq and WordPress will better address these problems (Randy has announced a pending update for StatTraq). While I like the fact that these tools are extensible and that many people discuss successful hacks, it does concern me to hack code unnecessarily in ways that might make installing later upgrades and bug fixes even more problematic.

So, in thinking about the fact that AI3 is likely to be very content-filled anyway, I decided to put off resolving the permalink issue until another day. I’d already spent too many hours on a dead-end.

 


Posted by AI3's author, Mike Bergman Posted on July 15, 2005 at 9:10 am in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/44/preparing-to-blog-permalink-problems/
The URI to trackback this post is: http://www.mkbergman.com/44/preparing-to-blog-permalink-problems/trackback/
Posted:July 13, 2005

After some days of delays doing my normal day job, I was able to return to getting the site ready.

The key things for today were setting up the relationships, linkages and accounts with entities like Blogstreet, etc. I was very impressed with how easy it was to add these systems; their Web sites and on-site instructions with accompanying HTML and JavaScript (if used) were excellent.

I set up links for:

I will continue on getting this prep stuff done, but upcoming business travel may push out my desired site release a few days further.

 


Posted by AI3's author, Mike Bergman Posted on July 13, 2005 at 3:12 pm in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/40/preparing-to-blog-external-credits-and-thanks/
The URI to trackback this post is: http://www.mkbergman.com/40/preparing-to-blog-external-credits-and-thanks/trackback/