Posted: July 20, 2005

The twin issues of importing external HTML and the use of the Xinha WYSIWYG editor place a premium on creating "clean" HTML. These issues are compounded by the fact that, in the move to dynamic HTML and XML-compliant XHTML, there are also some coding changes relative to the earlier conventional HTML (4.01) standards. Creating clean HTML requires a good understanding of styles and the use of cascading style sheets (CSS).

For example, in older HTML, line breaks were standardly done with <br>. However, XHTML requires every element to be closed, so a bare <br> is not well formed and is replaced with <br />. The same change applies to other empty elements that have no separate closing tag: <hr> becomes <hr />, and <img src="URL"> becomes <img src="URL" />. Elements that enclose content, such as <a href="mysource.html">Link here</a>, keep their separate open and close tags.

With respect to the <br> issue, it is probably good design to use the paragraph <p> tag. (Here, too, standards have changed, wherein paragraphs now are best formatted with a close tag </p>, whereas past practice had the close tag optional.) The real advantage of the paragraph element is that spacing between paragraphs can be precisely controlled in the style definition to achieve the presentation look you prefer.
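
As a minimal sketch of that kind of control (the values are illustrative assumptions, not this site's actual settings), a paragraph rule might look like this:

```css
/* Illustrative values only, not this site's actual settings */
p {
  margin-top: 0;        /* no extra space above a paragraph */
  margin-bottom: 1em;   /* one font-size of space below it */
  line-height: 1.4;     /* spacing between lines inside the paragraph */
}
```

Changing the single margin-bottom value then adjusts the gap between every paragraph on the site.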

In any event, all of these considerations suggest a need to better understand how all of this CSS stuff works. Unfortunately, for me, I really didn’t understand CSS well. What little knowledge I had came from a bunch of posted style sheets that largely acted only to confuse me further.

Realizing I finally needed to bite this bullet, I set off to discover whatever useful guides and guidance I could find online. Plus I sought the help of my longstanding colleague, Jerry Tardif, who has become our internal resident guru on all things CSS. I present the summary of those findings in this post.

Some Basics

Styles can be added to a Web site in one of three ways:

  1. Invoke an external style sheet (*.css) file, which is the best method;
  2. Embed and then invoke the entire style definition within a given Web page; and/or
  3. Embed style declarations within in-line HTML syntax (e.g., style="declaration").

For many reasons (see below), the first method is by far preferable, especially for a dynamic application such as WordPress.

Therefore, assuming the method used is external, we can now look at the style sheet’s building blocks. A style sheet (*.css) file is a plain text file that contains a series of syntax statements, or rules, to instruct how the HTML elements to which they refer should be displayed. Each statement has two parts: a selector and a declaration. The declaration is enclosed within curly brackets:

selector   declaration
p          { color: blue }

Each declaration has two sub-parts: a property (color:) and a value (blue). A rule may hold multiple declarations, depending on how many properties you want to set for that selector; each is separated by a semi-colon, and the whole set is concluded with a closing curly bracket.
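
A rule with several declarations can be sketched as follows; the particular properties and values are only examples:

```css
/* One selector, three declarations, each ended with a semi-colon */
p {
  color: blue;
  font-size: 12px;
  text-align: left;
}
```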

The naming of the selector relates the rule to a particular element of the webpage (in this example, the paragraph tag), and the declaration states how that element should be displayed (in blue, in this instance). The selector-declaration pair has these syntax rules:

  • Every listed property must be followed by a colon;
  • Multiple declarations are allowed for a single selector; each has to be finished with a semi-colon;
  • You can lay out the style sheet to make it easier to read; as with HTML, extra spaces don’t matter, and comments can be introduced (see format below);
  • A space between each selector and its declaration is customary for readability, though not required. A class or id name itself can not have any spaces in it, because a space between names is read as a separate (descendant) selector;
  • A space between the property's colon and its value is likewise conventional rather than required;
  • Some selector names are "reserved" because they relate directly to HTML elements;
  • There is a "hierarchy" to some selectors within the CSS; earlier selectors may govern or be inherited by later selectors. Inheritance driven by the CSS occurs within the block invoked in the Web page, unless subsequently overridden within that block, based on the pecking order attributes in the CSS; and
  • There is a sequential replacement of equal selectors by those encountered later in the CSS file.
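
The last two rules can be sketched in a few lines. The colors and font here are arbitrary choices for illustration:

```css
/* Set once on body; elements inside body inherit these values */
body { color: black; font-family: Verdana, sans-serif; }

/* Overrides the inherited color for paragraphs only */
p { color: blue; }

/* Same selector encountered later in the file: this color replaces blue */
p { color: green; }
```

Paragraphs end up green, while still inheriting the font from body because no later rule changes it.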

There are three ways in which selectors can be named. The first is selectorname, with no prefix. The syntax accepts any name in this form, but a bare name is matched against HTML element names, so in practice it only does something for the "reserved" HTML elements. As a result, I advise limiting this naming convention to reserved elements only. The second way is #selectorname, where the pound sign defines what is called an "id." And, the third way is .selectorname, where the period defines what is called a "class."
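
The three forms can be sketched side by side. The names masthead and blocktext are hypothetical, chosen only for illustration:

```css
/* 1. No prefix: a reserved HTML element, here every <p> */
p { color: black; }

/* 2. Pound sign: an id, matching markup such as <div id="masthead"> */
#masthead { font-size: 24px; }

/* 3. Period: a class, matching markup such as <p class="blocktext"> */
.blocktext { margin-left: 30px; }
```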

Some experts say a Class should be used to select recurring elements, while an ID should be used to select a unique element. Truthfully, for me, I have found these distinctions less than helpful and I have found no examples of where the distinction is useful.

Note: (Jerry Tardif’s observation is that the ID was an earlier form, with no differentiating functionality from a class, less functionality than the current class, with a use profile that could lead to possible confusion with an internal anchor link in a Web page. Moreover, Jerry notes that classes can be pseudoclassed, whereas IDs can not.)
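
Jerry's point about pseudo-classing a class can be sketched with a hypothetical class named navlink applied to links (the colors are arbitrary):

```css
/* Link states for markup such as <a class="navlink" href="..."> */
a.navlink:link    { color: #003366; }
a.navlink:visited { color: #663366; }
a.navlink:hover   { color: #cc0000; text-decoration: underline; }
```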

As a result, when I need to class something, I use the class syntax and do not use ID selectors.

Getting Serious: More References

Everyone should begin with a basic introduction to CSS. I found an excellent one from the Lowtech Ltd organization in Sheffield, UK. I’ve made a copy of it available here.

[Click here to obtain a PDF copy of this short, 7 pp guide (55 KB).]

Another useful intro guide that is slightly longer, at 19 pp., is from Patrick Griffiths at htmldog. While that site contains the document in parts, you can also obtain a PDF version from Stanford University.

Finally, the complete bible reference on all things CSS (version 2.+) is available in PDF form from the W3C organization. This document is about 338 pages and a 1.6 MB download. While it is not useful as a starting learning guide or tutorial, if you are going to be serious about CSS this reference is indispensable.

In-line v. External Style Sheets

I find it generally cleaner to maintain all styles in an external style sheet (style.css).  Embedded styles (in-line) work fine but require each file to be edited if changes are necessary. It is also often difficult to find the style code. And, where there are browser differences, as there are, it is easier to handle exceptions for cross-browser compatibility in an external style sheet(s). By keeping everything in an external style sheet, everything is central and changes are easy.

To further make styles easier to find, I make extensive use of comments, separate my style sheet into sections, and provide a table of contents of sorts in the header. I also try to use very logical names for each style, such that they are easily remembered and logically apply to the styling task at hand.
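
A purely hypothetical layout of this kind is sketched below. The section names and rules are invented for illustration and are not taken from this site's actual style.css:

```css
/* ==========================================================
   style.css  (hypothetical layout for illustration)

   Contents
     1. Reserved elements (body, headings, links)
     2. Page layout (header, panels, footer)
     3. Posts and comments
   ========================================================== */

/* ---- 1. Reserved elements ---- */
body { margin: 0; }

/* ---- 2. Page layout ---- */
.Header { margin-bottom: 20px; }

/* ---- 3. Posts and comments ---- */
.PostBody { margin-bottom: 15px; }
```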

Here is the header for this site’s style.css, also showing comments and nested brackets:

DIV v. SPAN

When you want to assign styles to longer blocks of text within your Web pages or templates, use either the div or the span tags. There are considerations to the use of each:

  • The span open and close tags are useful when a block of text within a paragraph should be assigned its own style. The insertion of span does not introduce a carriage return.
  • The div open and close tags can embrace whole subsections or the entire page within the body tags. The div tag, when used, causes a carriage return after the close tag. The div tag is very useful for sites such as this one where PHP calls or execution loops can be nested within the div element. I use div tags aggressively.

I personally do not use span, preferring to have actual named styles with open and close brackets if I need to make style changes within a paragraph. I therefore use div almost exclusively, and because of the way it works, I most often set a class style definition within div as well.
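
A class set on a div can be sketched like this. The class name Sidebar and all values are assumptions made for the example:

```css
/* Applies to markup such as: <div class="Sidebar"> ... </div> */
div.Sidebar {
  float: right;
  width: 180px;
  padding-left: 10px;
}

/* Paragraphs nested inside that div get their own rule */
div.Sidebar p {
  font-size: 11px;
}
```

Everything placed between the div's open and close tags, including output from nested PHP calls, is styled by these rules.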

Reserved Selectors

When defining selectors, there are standard HTML elements that should not be confused with custom ones. Generally, you can define a reserved element to have any applicable attribute for that element in your style sheet. You do NOT preface a reserved selector with either the id (#) or class (.) conventions. A useful guide for reserved HTML elements is the HTML 4.01 Quick List from W3Schools. Here are some of the reserved elements that you should avoid in your custom selector naming and give close attention in your baseline CSS definitions:

This is the ‘Body’ reserved element.


<body>
Visible text goes here
</body>

There are up to six ‘Heading’ elements:


<h1>Largest Heading (by convention, but it is not absolutely required)</h1>

<h2> . . . </h2>
<h3> . . . </h3>
<h4> . . . </h4>
<h5> . . . </h5>

<h6>Smallest Heading (by convention, but it is not absolutely required)</h6>

Here are other important ‘Text’ and ‘Link’ elements :


<p>This is a paragraph</p>
<hr> (horizontal rule)
<pre>This text is preformatted</pre>
<code>This is some computer code</code>
<a href="http://www.w3schools.com/"><img src="URL"
alt="Alternate Text"></a>

There are many different ‘List’ elements, starting first with unordered (bullet) lists:

<ul>
<li>First bullet item</li>
<li>Next bullet</li>
</ul>

… and ordered (numeric or lettered) lists:


<ol>
<li>First item</li>
<li>Next item</li>
</ol>

… and less frequently used definition lists:


<dl>
<dt>First term</dt>
<dd>Definition</dd>
<dt>Next term</dt>
<dd>Definition</dd>
</dl>

There are many reserved elements related to ‘Tables’:

<table border="1">
<tr>
<th>someheader</th>
</tr>
<tr>
<td>sometext</td>
</tr>
</table>

Also, ‘Frames’ (though it is unlikely you would ever use this in a CSS):

<frameset cols="25%,75%">
  <frame src="page1.htm">
  <frame src="page2.htm">
</frameset>

Though not frequent, it is also possible to set standard appearance conditions for ‘Forms’ (mostly for the input types and textarea):


<form action="http://www.somewhere.com/somepage.asp" method="post">
<input type="text" name="lastname"
value="Nixon" size="30" maxlength="50">
<input type="password">
<input type="checkbox" checked>
<input type="radio" checked>
<input type="submit">
<input type="reset">
<input type="hidden">

<select>
<option>Apples
<option selected>Bananas
<option>Cherries
</select>
<textarea name="Comment" rows="60"
cols="20"></textarea>
</form>

Finally, there are a few ‘Other Elements’:

<blockquote>
Text quoted from some source.
</blockquote>

<address>
Address 1<br>
Address 2<br>
City<br>
</address>
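
Baseline definitions for a few of the reserved elements above can be sketched as follows. All values are illustrative assumptions rather than recommended settings:

```css
/* Reserved elements take no # or . prefix */
body       { margin: 0; font-family: Verdana, sans-serif; font-size: 12px; }
h1         { font-size: 20px; margin-bottom: 10px; }
h2         { font-size: 16px; margin-bottom: 8px; }
p          { margin-bottom: 12px; }
a          { color: #003366; }
ul, ol     { margin-left: 25px; }   /* a comma applies one rule to several selectors */
th, td     { padding-left: 4px; padding-right: 4px; }
blockquote { margin-left: 30px; margin-right: 30px; }
textarea   { font-family: Verdana, sans-serif; }
```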

Margins and Related

In his style sheets, Jerry Tardif explicitly enters the four margin types (or whatever subset of margins are applicable), rather than use the four sequence convention, a practice I’ve now adopted. For example, it is possible to label a margin such as:

margin: 10px 10px 10px 10px;

But what does the ordering of these things mean? Actually, by convention, this shorthand ordering of presentation is T,R,B,L (top, right, bottom, left). However, since I can never remember TRBL (isn’t that TeRriBLe!), I agree with Jerry’s recommendation to be explicit:

margin-top: 10px;
margin-right: 10px;
margin-bottom: 10px;
margin-left: 10px;

It takes more lines of code, but that is a minor price for clarity. This approach applies to other descriptors such as padding, etc. Generally, however, only the relevant TRBL aspect need be specified.
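
For instance, a rule that only needs side margins and some top padding can name just those sides. The selector and values are hypothetical:

```css
/* Only the sides that matter are declared; the others keep their defaults */
blockquote {
  margin-left: 30px;
  margin-right: 30px;
  padding-top: 5px;
}
```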

Custom Elements

If you avoid the reserved names above, you may create and name as many custom elements as you wish. As the discussion above noted, I tend to only use class definitions, but that is likely because I’m not sophisticated enough to know the difference and keeping things simpler in my style sheets is important.

Use logical names with generality and inheritance in your custom elements. By general, I mean named "classes" such as ‘Panel’ or ‘BlockText’ or ‘Inventory’. By inheritance, I mean nested names from the general such as ‘LeftPanel’ and ‘RightPanel’ related to ‘Panel’. While tempting, try to avoid names that have too much style specificity such as ‘RedText’. If you later decide red is not for you and you prefer green, you will have confusing style references in your HTML.
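
A sketch of that naming approach follows. The class Warning is a hypothetical stand-in for a purpose-based name that might otherwise have been called ‘RedText’, and all values are illustrative:

```css
/* General class shared by every panel */
.Panel      { border: 1px solid #999999; padding-left: 10px; padding-right: 10px; }

/* Nested names that refine the general one */
.LeftPanel  { float: left;  width: 200px; }
.RightPanel { float: right; width: 200px; }

/* Named for its purpose, so the color can change without renaming */
.Warning    { color: red; }
```

In the HTML, an element can carry both the general and the specific name, as in class="Panel LeftPanel".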

Validate Your CSS

Finally, when all CSS changes are updated and finalized, validate your CSS syntax. The W3 organization offers an easy online CSS validator. You may need to make final editing changes based on the validation findings.

 

Author’s Note:  I actually decided to commit to a blog on April 27, 2005, and began recording soon thereafter my steps in doing so.  Because of work demands and other delays, the actual site was not released until July 18, 2005.  To give my ‘Prepare to Blog …’ postings a more contemporaneous feel, I arbitrarily changed posting dates on this series one month forward, which means some aspects of the actual blog were better developed than some of these earlier posts indicate.  However, the sequence and the content remain unchanged.  A re-factored complete guide will be posted at the conclusion of the ‘Prepare to Blog …’ series, targeted for release about August 18, 2005.  mkb

Posted by AI3's author, Mike Bergman Posted on July 20, 2005 at 9:59 am in Blogs and Blogging, Site-related | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/86/preparing-to-blog-use-of-styles-and-style-sheets/
Posted: July 19, 2005

So, today Tiger almost makes a charge and wins his next major, but Michael Campbell from New Zealand shows more cool and poise and refuses to give down the stretch. [This was written June 19; see below.]

I mention this because — in between watching US Open coverage — I’ve had mucho frustration in working with my pending blog release. The truth is, I’ve lost perhaps three hours of work today (on Father’s Day, no less) because of these quirks and screw-ups. I’d like to scream. Here’s what I think the culprit reasons may be.

  • The delays with server-side editing begin the process; it is not natural to "suspend" normal editing actions while the latency of the network and the server conspire. This, in turn, is complicated by
  • This whole "small fry" CMS perspective that has all data being hosted by MySQL. I truly don’t know where the bottlenecks occur, but the delays in posting and updates are HORRIBLE
  • There quite conceivably are editor issues associated with these embedded frameworks. Despite everything I’ve said about tools and testing and the like, most of these pieces "feel" untested, less than "commercial," and incomplete
  • There is also an open source aspect. Granted, anyone can put an open source project out there, and many are impressive, including the ones adopted for AI3. But they often feel unready and incomplete.

It is perhaps not a surprise that productivity benefits from information technology appeared non-existent until just a few short years ago, and then apparently had a major effect on the accelerated growth that did occur. I suspect much is the same with open source projects and "bleeding edge" initiatives. I can definitely say that I have spent many wasteful hours in the past weeks since deciding to test this blog route. While I can see its attractions, and I believe I understand its popularity, it is also incredibly inefficient. Perhaps like IT in general, it may be years or perhaps decades before we see blog productivity benefits showing up on our income accounts.

I need to update my best practices post. Thus, while I can talk in generalities about broad productivity or not, the fact I am losing efforts, losing time, etc., is not acceptable. I can accept learning. I can accept system fragility. I can not accept failing to learn from those things so that next steps are more productive than previous ones.

To put it mildly, today was an interesting experience in getting ready to blog. I thought that most items had been worked out, and I was now working on "efficient" means for converting my normal style of working with information (MS Word and Excel, for example) into "easily" transferred blog postings. I discovered that many kinks still remain and productivity can be alarmingly low.

Maybe I’m being too aggressive in wanting to have systems and processes in place to make my work with this site be an "easy" part of my day.  Of course, I’m doing all of this for multiple reasons to:

  • Understand the blogging and self-publishing phenomenon
  • Get my hands dirty with respect to existing tools and infrastructure
  • Actually put in place a procedure that will allow me to continue to contribute in an efficient way
  • Be aggressive about capabilities and understand "gaps" for bloggers (esp. the "top 1%") in moving forward.

My major discovery for the day is that all of these pieces I have been assembling do not play "nicely" together.

I should continue to work on these pieces to get more productive, document lessons and best practices, and avoid devastating losses of effort.

 


Posted by AI3's author, Mike Bergman Posted on July 19, 2005 at 10:20 pm in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/84/preparing-to-blog-editor-frustrations/
Posted: July 18, 2005

After about two months of preparatory effort, I released my blog site today.  I have been keeping a diary chronicle of this site called "Preparing to Blog" that I will soon compile into one major tutorial for PDF download.

For reasons noted on the other ‘Preparing …’ posts, this is the one entry that is actually conterminous with the site release.  All other entries are about one-month delayed.

The actual site release was a bit more laborious than I first would have thought, though most of the effort arose from my recording and changing dates than due to actual release posting demands.  Nonetheless, these were the efforts I needed to complete for the final unveiling:

  • Update the entire site to WordPress release 1.5.1.3, reportedly necessary for security updates and because of some trackback implementation issues (see later)
  • Complete all open postings; there were about four of these carried over from earlier posts.  I need to continue to remind myself to post and "document as you go" rather than these difficult reachback efforts
  • I needed to change the date stamps on all ‘Preparing …’ posts and put the standard author’s note at the end of each. Though not strictly demanding, these efforts actually took about 3-4 hours because of the delays in post updates noted earlier (I commit to track down this problem; it is totally unacceptable)
  • The site domain needed to be changed from the temporary IP that had been used to mkbergman.com, which led to the next set of problems
  • Switchover of URLs and internal links needed to be checked and confirmed
  • Add the actual ping sites used in my options (see this posting for the reference)
  • All efforts charts and total time estimates also needed to be updated (not necessary except to support my own documentary demands).

Version 1.5.1.3 Update

Below is a list of the updated files, following the same steps as indicated in the David Russell blog. Because he was going from v1.5.1.2 to v1.5.1.3 he only had to upgrade 6 files. For us to go from v1.5.1 to 1.5.1.3 we had to upgrade 16 files:


/wp-admin/quicktags.js
/wp-content/themes/default/header.php
/wp-includes/functions-post.php
/wp-includes/functions.php
/wp-includes/pluggable-functions.php
/wp-includes/template-functions-category.php
/wp-includes/template-functions-general.php
/wp-includes/template-functions-links.php
/wp-includes/template-functions-post.php
/wp-includes/version.php
/wp-includes/wp-db.php
/readme.html
/wp-blog-header.php
/wp-login.php
/xmlrpc.php

 

Nonetheless, these final tweaks were achieved, and the site finally went live at about 10:30 am CDT. 

 


Posted by AI3's author, Mike Bergman Posted on July 18, 2005 at 10:22 am in Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/102/preparing-to-blog-actual-site-release/
Posted: July 17, 2005

This may be a topic I return to numerous times, but I’m coming to realize that I need some INTERNALLY referenceable stuff for guiding and keeping this site integral.  I suspect that the whole ACTIVITY of writing this blog will create its own lessons (doesn’t everything?), but I’d like to take what I’ve learned to date and codify it for moving forward.

I’m sure every CMS or blog software has, or lacks, its own capabilities to do some of this management. What I’m working on here is the unique combination of WordPress-Xinha-utilities that are now driving the tasks necessary to care for and feed this site.

Many of the previous ‘Prepare to Blog’ categories deal with various public aspects of the site. What I am coming to realize, however, is that the site itself becomes a sort of filing cabinet, attention "fly paper", or other kind of pheromone. As a result, since WordPress has its separate page creating capability, perhaps it makes sense to create "hidden" pages on the site, useful for site management purposes, but only available to the site administrators (me). Thus, I have created and populated a starting set of four internal management pages:

  • Common Inserts
  • Common Styles
  • Pending Topics
  • Pending Task Lists.

 

I am not showing links to these, because they are in fact works in progress, should be changing constantly, and may present thoughts and ideas that I am toying with but may never complete. However, the point is not that I have "private" aspects of this site, but that the framework of these sites can also extend into project management and task tracking, aspects not seen by the general public.

Like the guy in the bear suit talking about stream water and Dasani, perhaps there is too much salmon spawning in my efforts.

 


Posted by AI3's author, Mike Bergman Posted on July 17, 2005 at 7:45 pm in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/71/preparing-to-blog-site-project-management/
Posted: July 16, 2005

Now that I’m getting closer to going live, I have begun testing moving MS Word docs directly into the site.  Most of my original research and analysis stuff is done in Word or Excel because I’m pretty much a power user in quick writing, assembly and formatting.  If I could convert docs in a relatively straightforward way to AI3, that would be a real boon. However, as I discovered, there are major, major problems and issues with moving Word documents.

The Big Transfer

My first test involved a beast of an analysis piece I had done on the $3 trillion value of U.S. enterprise document assets.  (It was eventually posted here.) In its long form, it is 42 pages long, with many tables, a few figures and more than 100 citations and links.  It is over 1 MB (1070 KB) in its original form.

Most of us have created Web pages directly from a Word document, and I tried this first. I did the conversion with the filter option, then pasted the result directly into the editor, and attempted to update the site. The transfer seemed to take forever and then the server hung. My suspicion was that the Word HTML code was too complex.

Cleaning the Word HTML

Before packing it in and splitting the original doc into multiple pieces so that my site could choke it down, I decided to do a bit of investigation on alternative clean up utilities and approaches. One good review I found was by Laurie Rowell on ‘Clean HTML from Word:  Can it be Done?’.  I recommend the four-parter.

Laurie’s review suggested some improvements could be made from the MS Web page with filtering option using third-party tools, but they did not appear enough to enable me to proceed without splitting my files.  Nonetheless, after following that advice, mostly using the MS Web page with filtering, I again attempted a transfer with results as before.

After this initial cleaning, I used both Word and Composer (the Mozilla HTML editor) to do some search-and-replace removal of further HTML tags. Using pattern replacements, esp. with Word, it is possible to also replace line breaks and tab characters, so long as the base file does not have an expected Word extension such as .doc, .rtf or .html (for convention, I always use .txt). This was going well in reducing the sizes of the files (the best I was able to achieve on the 1 MB file was about 450 KB), but the process was laborious even with global search-and-replace. Furthermore, WordPress was still not choking down the smaller files. And, unbeknownst to me at this stage, other issues were being introduced into the files that would make later steps even more difficult.

Splitting Files

I then split the document into six parts, the largest being about 250 KB. As before, they were cut-and-pasted into the editor and then posted. I again got server errors and time-outs. Kevin Klawonn of BrightPlanet was able to determine that the Apache server was timing out after 30 sec. With a minor parameter change, we were able to get all files uploaded.

However, while the system was now choking down the files, they looked terrible! Line breaks were totally messed up and being able to edit them within the Xinha editor was close to hopeless. Clearly, and unfortunately, the code was still not clean enough.

Problems with Composer

A natural assumption was that these open source editors were "buggy" and unable to handle more commercial strength requirements. As a natural response, I turned to Composer, a standard HTML utility in my Mozilla browser.

I had not used Composer much before, but found much to like. It has nice toggles between HTML source and WYSIWYG. It offers menu options for most "standard" activities I would undertake with a Web page. In short, I liked working with it and thought it might become an offline (at least from my blog) standard for doing HTML WYSIWYG editing of large imports. I actually started becoming familiar with the app and its controls and features.

However, upon actual incorporation of the results, I found a nasty truth. Composer introduces forced line breaks at about margin 70. As a basis to incorporate into other apps, all of which need to work nicely together, this was fatal. So much for Composer. I was sad …..

Problems with Xinha

I have observed line breaks being introduced by Xinha, but have not been able to reproduce the actual steps. In general, the system seems to be OK about not introducing spurious breaks, including when editing moves from full to smaller screen.

Cleaning the Word HTML II – Textism

The entire process of using files in multiple applications with multiple behaviors had worked to create a total HTML nightmare in my test file baseline. Remembering one of the options in Laurie Rowell’s piece (above), I decided to break my normal rule against paid products and check out the Textism site using the Word Clean utility. Dean Allen actually has an interesting pricing model which mixes aspects of free, seduction and low cost.

This is a superior and professional offering:

  • It creates files less than one-third the size of already cleaned MS Word HTML
  • It has an innovative pricing model — beginning with free and moving on to short or annual uses. For 5 euros (about $6.20) the system can be used unlimitedly for 24 hrs; payment is by Paypal. For a full-year use, perhaps likely after being hooked on the day version, the cost is 20 euros (about $25) though with a limit of 75 conversions per month (about three per work day; no constraint for a very aggressive personal use.)
  • The resulting cleaned code, while produced on the server side and needing copying and then pasting locally in order to maintain a saved copy, is extremely clean and formatted well for later management with other utilities. Paragraphs are break separated, items such as bullets are line breaked, and some characters are indented
  • The system is capable of handling considerable complexity. My test from hell had about 20 tables, 100 citations and endnotes, a table of contents, embedded images, etc. All code produced was correct and very clean and spare.

In short, a total recommendation. Any user needing to move a few files per month from Word to their blog should definitely consider this service.

Final Adopted Process

Depending on the length of the original Word document and its complexity, I recommend one of two approaches given current tools (at least the ones I have tested.)

For shorter Word documents, or those with little complexity and few internal or external references:

  1. Save the document as a .txt file, and then
  2. Cut-and-paste into the online editor and re-establish formatting.

For Word documents that do not meet these conditions, the path is tortuous and onerous:

  1. Make sure the Word file is absolutely complete — you will not want to return to this step!
  2. Save the file with the Word Save As ‘save as type’ using the Web page, filtered option
  3. [If images are used, separately locate them, give them logical names, and later embed in the way you normally handle in-line images with your posts]
  4. Submit the HTML file created by MS Word to the Textism utility. Granted, this utility costs, but the effort saved in its clean HTML procedures is well worth it
  5. Now, with the re-saved clean version, invoke a standard editor that has two capabilities: 1) no enforced word wraps or line breaks; 2) the ability to display and search-and-replace on formatting characters such as line breaks (^p), tabs (^t), etc. (Word can perform these functions when files are named .txt prior to input, but use of a standard editor with these features may be preferable.) With this editor, you will do some additional code clean-up:
    • Removing unnecessary style definitions
    • Formatting the file so that paragraphs split
    • Removing any other recurring HTML code patterns that have nothing to do with the eventual display of your document on your site.
  6. Now, paste your fully cleaned code into your editor for posting on the blog, and
  7. Should you encounter major problems, select all of the code in your blog editor, re-paste it in your standard editor, and do any global replaces and clean-ups.

I know this sounds like a pain, and it is.  You should also keep saved versions of interim steps above to have fallbacks if necessary.

Note: There are instances when the size of the file and the degree of final HTML editing and clean-up may suggest offline editing because server-side editing is slow, updated posts may take forever or experience server time-outs, or they may simply crash the server. If offline editing is necessary, do make sure an HTML editor is used that does not insert those insidious line breaks. If it does, you will spend hours of frustration trying to get everything clean again.

 


Posted by AI3's author, Mike Bergman Posted on July 16, 2005 at 11:03 am in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/48/preparing-to-blog-word-docs-to-html/