Posted:September 2, 2013

My WordPress Blog Gets Long in the Tooth; the Upgrade Story

One of the first things I did when I began my blog back in 2005 was to document all of the steps taken to set up and run WordPress on one's own server. That how-to document, now retired, was a popular manual for quite a few years. At inception I began with WordPress version 1.5 (it is now at version 3.6), and I modified key files directly to achieve the look and feel of the site, rather than using a pre-packaged theme as is the general practice today.

Over the years I have added and replaced plugins, kept WordPress versions current, and made many other changes to the site. Site search, as one case in point, has itself gone through five different iterations. In the early years I also had a small virtual server from a hosting service, which came to perform poorly. It was at that point that I began to learn about server administration and tuning; WordPress is a notable resource hog that can perform poorly on small servers or when not properly tuned.

When Structured Dynamics shifted to cloud computing and software as a service (SaaS) for its own semantic technologies, I made the transition as well with this blog. That transition to AWS worked well, until we started to get Apache server freezes a couple of years back. We did various things to inspect plugins and apache2.conf configuration settings [1] to stabilize operation. Then, a couple of months back, site lockouts became more frequent. Since all obvious and standard candidates had been given attention, our working hypothesis was that my homegrown theme of eight years ago was the performance culprit, possibly through deprecated WordPress PHP functions or a poor, nearly decade-old design.
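For readers facing similar Apache freezes: one setting commonly checked in this kind of tuning is the cap on concurrent Apache children (MaxClients in Apache 2.2, renamed MaxRequestWorkers in 2.4). A widely used rule of thumb, not necessarily the change made on this site, divides the memory left after MySQL and the operating system by the average size of one Apache/PHP process. The sketch below assumes the prefork MPM, and every number in it is hypothetical.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_process_mb):
    """Rule-of-thumb cap on concurrent Apache prefork children.

    total_ram_mb   -- physical memory on the server
    reserved_mb    -- memory kept free for MySQL, the OS and other daemons
    avg_process_mb -- average resident size of one Apache/PHP child
    All inputs are assumptions supplied by the caller, not measured here.
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available = max(total_ram_mb - reserved_mb, 0)
    # Whole processes only: a partial child would push the box into swap.
    return int(available // avg_process_mb)


if __name__ == "__main__":
    # Hypothetical figures: 1700 MB of RAM, 700 MB reserved, 40 MB per child.
    print(estimate_max_clients(1700, 700, 40))  # 25
```

Setting the cap below this estimate trades some queueing of requests for not swapping, which is usually what turns a busy small server into a frozen one.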

Software problems like this are an abscess. You can ignore them for a while, but eventually you have to lance the boil and fix the underlying problem.

Updating the Theme

A decade of use and refinement leads to a site theme structure and set of style sheets that is nearly indecipherable. Each new feature, each new change, results in modifications to these things, which are also then sometimes abandoned, but their instructions and data often remain. How this stuff accumulates is the classic definition of cruft.

If one accepts that a theme re-design is at some point inevitable, but it is preferable to make such changes as infrequently as possible, then it is imperative that a forward-looking theme be chosen for the next-generation re-design. HTML5 and mobile-first are certainly two of the discernible directions in Web design.

With these drivers in mind, I set out to find a new baseline theme. There are many surveys of adaptive WordPress themes; I studied and installed many of the leading candidates, ultimately settling upon the underscores theme as my new design basis. I chose underscores (also known as “_s”) because its code is clean and modular, it is designed to be modified and tailored (and not simply “themed” via the interaction of colors and choices), it is open source, and there is a nice starting utility that prefixes important function calls within the theme.

Though there is a somewhat useful starting guide for underscores, this theme (like many other starting bases for responsive design) requires a basic understanding of how WordPress works (comments, the “loop”, etc.) and intermediate or better knowledge of style sheets. A WordPress newbie, or someone without at least a working familiarity with PHP and CSS, would be pretty challenged to start tearing into a theme like underscores. A good starting point, for example, is WordPress’ own theme tutorial.

Nonetheless, with a modicum of that knowledge, underscores is structured well and it is fairly easy to discern the patterns and design approach. Basic structural building blocks such as headers, footers, pages, comments, etc., can be extended via their own templates by adding the underscore convention (for example, header.php extended to header_myvariation.php). Most of these specific variations occur around different types of “pages” within WordPress.

For example, my AI3 blog has its own search form, and has special sections for things like the listing of 1000 semantic technology Sweet Tools or the timeline of information history or chronology of all articles or acronyms. These sections of the site are styled slightly differently, and I wanted to maintain those distinctions.

So, one of the first efforts of the update was to add these template extensions to the baseline underscores, including attention to building-block components and variants such as header and footer. (The actual first step was to decide on the basic layout, for which I chose a uniform two-column design rather than the mixed two- and three-column design of the prior theme. Underscores supports a variety of layouts, and may also be integrated with even more flexible grid systems, something I did not do.) Developing template extensions requires a good understanding of the WordPress template hierarchy.

WordPress Template Hierarchy
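As a rough illustration of how that hierarchy decides which file renders a static Page, the sketch below walks the lookup order as it stood around WordPress 3.6: a template chosen in the Page editor, then page-{slug}.php, page-{id}.php, page.php and finally index.php (later releases added singular.php before index.php). It simplifies WordPress's actual PHP logic, and the slug, ID and template names used here are hypothetical.

```python
import os


def page_template_candidates(slug, page_id, custom_template=None):
    """Files WordPress (circa 3.6) tries, in order, for a static Page; simplified."""
    candidates = []
    if custom_template:  # a template picked in the Page editor, e.g. "chronology.php"
        candidates.append(custom_template)
    candidates += [f"page-{slug}.php", f"page-{page_id}.php", "page.php", "index.php"]
    return candidates


def resolve_template(theme_dir, slug, page_id, custom_template=None):
    """Return the first candidate file present in theme_dir, or None."""
    for name in page_template_candidates(slug, page_id, custom_template):
        if os.path.isfile(os.path.join(theme_dir, name)):
            return name
    return None
```

For example, resolve_template("wp-content/themes/ai3v2", "sweet-tools", 42) would pick page-sweet-tools.php if the theme ships one and otherwise fall back to page.php, which is why a special section can be restyled simply by adding one file.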

Upon getting these structural modifications in place, the next step was to migrate my prior styles to the new framework. I did this basically by “overloading” the underscores style.css with my variants (style-ai3.css), loading that style sheet after it.
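The “overloading” works because, when two rules have the same selector and specificity, the declaration loaded later wins. The sketch below models only that last-one-wins rule; real browsers also weigh selector specificity and !important, and the property values are invented placeholders rather than the site's actual styles.

```python
def apply_stylesheets(*sheets):
    """Merge style sheets in load order so later declarations override earlier ones.

    Each sheet maps a selector to {property: value}. Only the "last one wins"
    rule for identical selectors is modeled; specificity and !important are ignored.
    """
    merged = {}
    for sheet in sheets:
        for selector, declarations in sheet.items():
            merged.setdefault(selector, {}).update(declarations)
    return merged


# Placeholder values standing in for style.css and style-ai3.css.
base = {"body": {"font-family": "sans-serif", "color": "#333"}}
override = {"body": {"color": "#222"}, ".masthead": {"height": "120px"}}
print(apply_stylesheets(base, override))
# {'body': {'font-family': 'sans-serif', 'color': '#222'}, '.masthead': {'height': '120px'}}
```

Properties the override sheet does not mention (font-family here) survive from the base sheet, which is what lets a small variant file sit on top of the generic one.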

There is much toing-and-froing that occurs when making such changes. A modification is made, the file is updated, and the site page at hand is then re-loaded and inspected. These design iterations can take much time and many tweaks.

Thus, in order not to disrupt my public site while these re-design efforts were underway, I did all development locally on my Windows desktop using the XAMPP package. XAMPP allows the Apache, MySQL and PHP components of the traditional LAMP stack to be installed locally on Windows (as well as other desktop systems). I had used XAMPP many years back; it has since evolved into a slick, well thought-out package that is easy to install and configure on Windows. In fact, one of the reasons I had dragged my feet about starting the re-design effort was my memory of the challenges of getting a local version running during development. The updated XAMPP package completely removed these concerns.

Also, I made an unfortunate choice during this local development phase. Not knowing if my migration was going to be satisfactory in the end, I created a parallel directory for my new theme, ai3v2, and kept the original directory of ai3 should I need to return to it. This understandable choice led to another issue later, as I explain below.

Upon getting the development site looking and (mostly) operating the way the original one did, I was then able to upload the new theme to the live site and complete the migration process.

Getting the Site Operational Again

Though my new site did come up at “first twitch” using the redesign, once it was transferred to the live server a number of issues clearly remained. As I began turning on prior plugins and inspecting the new site, I also saw problems sporadically appearing or disappearing. Moreover, once I began viewing my site in other browsers, still other issues appeared. There was, apparently, a cascade of issues facing the new site that needed to be tackled in an orderly manner to get at the underlying problems.

The difference between what I was seeing in my standard browser versus test browsers first indicated there were caching problems between my new site and the viewing (browser) device. I was not initially seeing some of these problems because key site objects such as images, style sheets and even JavaScript had been cached. (There are also differences in templates and caching when viewing as an administrator versus a standard public user.)

The best way to isolate such effects is to use a proxy server. There are some online services that provide this, mostly used for anonymous surfing, but which can also be used to test caching and browser effects. To see what new public users would see with my new site I used the Zend2.com anonymous online proxy.

The issues I was seeing included:

  • Missing images
  • Missing layouts and style sheets, and
  • Non-performing JavaScript.

Moreover, while I was seeing some improvement in load averages and demands on my site using the Linux top and sar monitoring tools, I was not seeing 1-minute “load averages” drop well below 1.00 as they should have. Though both load spikes and average loads had improved somewhat, I was not seeing the kind of load reductions I had hoped for from moving to a more modern theme and updated PHP calls. What the heck was still going on? What really was the root issue?
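For anyone wanting to watch the same numbers from a script rather than top, the sketch below reads the 1-, 5- and 15-minute load averages (the values top and sar -q report, exposed on Linux in /proc/loadavg). The 1.00 benchmark assumes a single CPU; on a multi-core server the usual reading divides the load by the core count, which is what the hypothetical is_overloaded helper does.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from a /proc/loadavg line."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def is_overloaded(one_minute, cpus=1, threshold=1.0):
    """Hypothetical helper: True when the 1-minute load per CPU exceeds the threshold."""
    return one_minute / cpus > threshold


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    one, five, fifteen = os.getloadavg()  # Unix only; the same figures top shows
    print(one, five, fifteen, is_overloaded(one, os.cpu_count() or 1))
```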

Tackling the Easy First: Images

The first problem, missing images, was pretty easy to diagnose. My image files had all been organized centrally under the images subdirectory of my theme, such as http://www.mkbergman.com/wp-content/themes/ai3/images/mkbergman1.jpg (which will now show a 404 error). Changing the directory name to ai3v2 was breaking these internal links. Thus, the first fix was pretty straightforward. Using phpMyAdmin, I changed the internal database references for files and images to their proper new directory, such as http://www.mkbergman.com/wp-content/themes/ai3v2/images/mkbergman1.jpg. The example SQL for this change is:

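-- Match the full theme path so other text containing 'ai3/' is left alone.
UPDATE wp_posts
SET post_content = REPLACE(post_content,
    'wp-content/themes/ai3/',
    'wp-content/themes/ai3v2/');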
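Before running such a statement with phpMyAdmin against the live MySQL database, it can be rehearsed on a throwaway copy. The sketch below uses Python's built-in sqlite3 module, since SQLite also provides a REPLACE() string function. The two-column wp_posts table and its rows are invented for the rehearsal, the helper name is hypothetical, and the rewrite matches on the full theme path so that unrelated text containing “ai3/” is left alone.

```python
import sqlite3


def rewrite_theme_paths(conn, old="wp-content/themes/ai3/", new="wp-content/themes/ai3v2/"):
    """Rewrite old theme paths in wp_posts.post_content; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE wp_posts SET post_content = REPLACE(post_content, ?, ?) "
        "WHERE post_content LIKE ?",  # touch only rows that contain the old path
        (old, new, "%" + old + "%"),
    )
    conn.commit()
    return cur.rowcount


# Throwaway in-memory stand-in for the real table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
conn.executemany(
    "INSERT INTO wp_posts (post_content) VALUES (?)",
    [
        ('<img src="http://www.mkbergman.com/wp-content/themes/ai3/images/mkbergman1.jpg">',),
        ("A post with no theme images.",),
    ],
)
print(rewrite_theme_paths(conn))  # 1
```

Running the function a second time reports 0 rows, a quick check that the rewrite is safe to repeat.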

With just a bit more minor cleanup, all of these erroneous references were updated. However, this approach also had a side effect: links saved by users now pointed to an abandoned subdirectory. That was fixed by adding a redirect to the site’s .htaccess file:

RedirectMatch 301 ^/wp-content/themes/ai3/(.*)$ http://www.mkbergman.com/wp-content/themes/ai3v2/$1
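The pattern can be sanity-checked outside Apache. The sketch below applies the same regular expression with Python's re module (Python writes the back-reference as \1 where Apache uses $1) and confirms that paths already under ai3v2 are not redirected again, which would otherwise cause a loop. The helper name is hypothetical.

```python
import re

# Same pattern as the RedirectMatch rule; Apache tests it against the URL path.
OLD_THEME = re.compile(r"^/wp-content/themes/ai3/(.*)$")
# Python writes the back-reference as \1 where Apache's directive uses $1.
TARGET = r"http://www.mkbergman.com/wp-content/themes/ai3v2/\1"


def redirect_for(path):
    """Return the redirect target for a request path, or None if the rule does not fire."""
    if OLD_THEME.match(path):
        return OLD_THEME.sub(TARGET, path)
    return None


print(redirect_for("/wp-content/themes/ai3/images/mkbergman1.jpg"))
# http://www.mkbergman.com/wp-content/themes/ai3v2/images/mkbergman1.jpg
print(redirect_for("/wp-content/themes/ai3v2/style.css"))
# None: new paths are left alone, so there is no redirect loop
```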

Yet, though these changes made the site more operational, they still did not address the fundamental performance issue.

A Cascade of Caching

Whenever one tests a Web site for performance using services such as Webtest or Google’s PageSpeed, an inevitable top recommendation is to install some form of caching software. Caching software keeps a copy of Web pages and objects on disk (or memory) such that they can be served statically in response to a Web request, as opposed to being generated dynamically each time a request is made. Caching strategies abound for Web sites, but caches also occur within browsers themselves and in the infrastructure of the Web as a whole (such as the Akamai service). These various caches can be intermixed with CDNs (content delivery networks) where common objects such as files or images are served faster with high availability.

As I tried turning off various caches and then viewing changes through the Zend2 anonymous proxy, it became pretty apparent there were many caches in my overall display pathway. Some of the caches, especially those on the server, are also a bit more involved to clear out. (Most server clears involved an SSH root login and sometimes use of a script.) As a measure of what I found, here is the caching cascade I was able to confirm for my site:

server → APC → WP Super Cache → PageSpeed → [network cache?] → browser cache

APC and the PageSpeed Apache module are particularly difficult to clear on the server. That can also pose difficulties in diagnosing and fixing JavaScript errors and conflicts (see below). In the end, in order to see style and other tweaks immediately, I turned off all caching mechanisms under my control.
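Why a stale copy anywhere in that chain hides a fix can be shown with a toy model. The sketch below treats each layer as a simple URL-to-copy store checked nearest-first, which is a simplification: APC, for instance, caches compiled PHP and data rather than whole pages, and real layers honor expiry headers. The class and function names are hypothetical. It shows that clearing only the browser cache still serves the old page from the next layer down.

```python
class Layer:
    """One cache in the delivery path, modeled as a URL -> stored copy map."""

    def __init__(self, name):
        self.name = name
        self.store = {}


def fetch(url, layers, origin):
    """Check layers nearest-first; the first one holding a copy answers.

    On a miss everywhere the origin (WordPress) renders the page and every
    layer keeps a copy. Returns (content, name of whatever served it).
    """
    for layer in layers:
        if url in layer.store:
            return layer.store[url], layer.name
    content = origin[url]
    for layer in layers:
        layer.store[url] = content
    return content, "origin"


# Nearest-first order, i.e. the cascade listed above read in reverse.
path = [Layer(n) for n in ("browser", "network", "PageSpeed", "WP Super Cache", "APC")]
origin = {"/": "old theme"}
fetch("/", path, origin)              # first visit fills every layer
origin["/"] = "new theme"             # deploy the redesign
path[0].store.clear()                 # clear only the browser cache
print(fetch("/", path, origin))       # ('old theme', 'network')
for layer in path:
    layer.store.clear()
print(fetch("/", path, origin))       # ('new theme', 'origin')
```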

What I saw immediately, even before fixing display and style issues, was that the load problems with my site completely disappeared. I also found that, beyond these immediate improvements, stray files and database remnants from these caches, and from tests of earlier ones, were scattered across my legacy site. For example, I had previously tried other WordPress caching modules such as W3 Total Cache and Quick Cache, whose old files and data tables were still strewn across my system. Clearly, it was time for a spring cleaning!

Cleaning the Augean Stables with a Vengeance

With the realization I had cruft everywhere, I decided to do a thorough scrubbing of the site from code to stylesheets and on to the MySQL data tables. To accompany my new, cleaner theme I wanted to have an entire system that was clean as well.

An Unemotional Look at Plugins

I spent time looking at each of my site’s plugins. While I do this on occasion from a functionality standpoint, this review explicitly considered functionality in relation to code size and performance. Many venerable WordPress plugins have, for example, expanded in scope and complexity over time. If I was only using a portion of the full slate of capabilities for a given plugin, I looked at and tested simpler alternatives. For example, earlier I had abandoned the Sociable plugin for ShareThis as the former got bloated; now ShareThis had become bloated as well. I was able to add four lines of code to my theme to achieve the social service references I wanted without a plugin, without JavaScript, and without reports back to tracking sites.
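The four lines mentioned above lived in the PHP theme and are not reproduced here. As a hedged illustration of the same idea, a share link can be a plain URL with the post address encoded into it, so no JavaScript or third-party script is loaded. The endpoints below are the public share URLs Twitter and Facebook have documented; they may change, and the share_links helper is hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def share_links(post_url, title):
    """Hypothetical helper: plain share URLs for one post, with no JavaScript involved."""
    return {
        "Twitter": "https://twitter.com/intent/tweet?" + urlencode({"url": post_url, "text": title}),
        "Facebook": "https://www.facebook.com/sharer/sharer.php?" + urlencode({"u": post_url}),
    }


links = share_links("http://www.mkbergman.com/1670/cleaning-up-the-cruft/", "Cleaning Up the Cruft")
for service, href in links.items():
    # escape() turns & into &amp; so the href is valid inside an HTML attribute
    print(f'<a href="{escape(href)}">{service}</a>')
```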

In all, I eliminated two-thirds of my existing plugins through this cold-blooded assessment. It is not worth listing the plugins before and after, since the purpose of plugins is to achieve site-specific objectives, and yours will vary. But it is probably a good idea to take periodic inventory of your WordPress plugins and remove or replace some, using performance and bloat as guiding criteria.

Excising Unneeded DB Tables

At the start of this process I had something like 40 or more tables in my MySQL database for the AI3 blog. I was now intent on cleaning out all unneeded data in my database. Some of these tables were prefixed with plugin-specific names; others needed to be looked up and inspected one-by-one.

It is always a nervous time to make changes directly to a database, so I first backed up my DB and then began to eliminate old or outdated tables. The net result was a reduction of about two-thirds, leaving the eleven standard WordPress data tables:

wp_commentmeta
wp_comments
wp_links
wp_options
wp_postmeta
wp_posts
wp_terms
wp_term_relationships
wp_term_taxonomy
wp_usermeta
wp_users

plus another five due to the Relevanssi search plugin (the use of which I described in an earlier post).
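The inventory step can be partly mechanized. Given the table names returned by SHOW TABLES, the sketch below flags everything outside the eleven core tables for manual review. It assumes the default wp_ prefix, the helper name is hypothetical, and the extra table names in the example are illustrative. It only lists candidates; deciding what to drop, after a backup, remains a manual step.

```python
# The eleven core tables of WordPress 3.x, without the table prefix.
CORE_TABLES = {
    "commentmeta", "comments", "links", "options", "postmeta", "posts",
    "terms", "term_relationships", "term_taxonomy", "usermeta", "users",
}


def nonstandard_tables(table_names, prefix="wp_"):
    """Return, sorted, the tables that are not WordPress core tables.

    These are candidates for manual inspection, not for automatic deletion.
    """
    core = {prefix + name for name in CORE_TABLES}
    return sorted(name for name in table_names if name not in core)


# Illustrative SHOW TABLES output: the core tables plus two extras.
tables = ["wp_" + name for name in CORE_TABLES] + ["wp_relevanssi", "wp_oldcache_data"]
print(nonstandard_tables(tables))  # ['wp_oldcache_data', 'wp_relevanssi']
```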

One table that deserves particular attention is wp_options, where most plugin and standard WordPress settings are recorded. This table needs to be inspected entry by entry to remove unused options. If there is doubt, do a search on the individual option name; options left over from retired plugins mostly share similar prefixes. As a result, many rows (individual option records) in that table got removed as well.
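To spot leftovers from retired plugins, it helps to group option names by prefix. The sketch below counts prefixes of the names returned by SELECT option_name FROM wp_options, skipping a caller-supplied set of known core options. The option names in the example are hypothetical, and a high count for a plugin you no longer run is a hint to investigate, not proof that the rows can be deleted.

```python
from collections import Counter


def option_prefixes(option_names, core_names=frozenset()):
    """Count option_name prefixes (text before the first underscore), skipping core options."""
    counts = Counter()
    for name in option_names:
        if name in core_names:
            continue
        # Leading underscores (e.g. "_transient_...") are dropped before splitting.
        prefix = name.lstrip("_").split("_", 1)[0]
        counts[prefix] += 1
    return counts


# Hypothetical option names; the real list would come from
#   SELECT option_name FROM wp_options;
names = ["siteurl", "home", "blogname", "sociable_version", "sociable_sites",
         "sharethis_code", "_transient_feed_abc"]
print(option_prefixes(names, core_names={"siteurl", "home", "blogname"}).most_common())
# [('sociable', 2), ('sharethis', 1), ('transient', 1)]
```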

Removing Unneeded CSS

An eight-year-old style sheet, plus the addition of a generic one for the starting underscores theme shell, suggested there were many unused selectors in my style sheets. I checked out a number of online options for detecting unused CSS, but did not like them, either because they were come-ons for paid services or because of their obtrusiveness.

As a result, I chose to use CSS Usage, a Firefox add-on for Firebug. With it set in “auto-scan” mode, I viewed all of the unique page types across my site. When done, I was able to report out a listing of my style sheets, with all unused selectors marked as UNUSED. I then re-established a clean, readable CSS file using one of the many online CSS format utilities, and from that point deleted all of the unused selectors. By keeping a backup, I was able to restore selectors that were unintentionally deleted.
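CSS Usage works inside the browser, and its internals are not reproduced here. A rough stand-in for the same idea is sketched below: collect the class names used in saved copies of each page type and report class selectors in the style sheet that never appear. It deliberately handles only class selectors and cannot see classes added by JavaScript, which is one reason keeping a backup of the style sheet matters. The function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name used in an HTML page."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


COMMENT = re.compile(r"/\*.*?\*/", re.S)
STRINGS_AND_URLS = re.compile(r"url\([^)]*\)|\"[^\"]*\"|'[^']*'")
INNER_BLOCK = re.compile(r"\{[^{}]*\}")  # a declaration block with no nested braces
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)")


def unused_classes(css_text, html_pages):
    """Class selectors declared in css_text that appear in none of html_pages."""
    used = set()
    for page in html_pages:
        collector = ClassCollector()
        collector.feed(page)
        used |= collector.classes
    # Drop comments, strings/url() values and declaration bodies so that
    # things like url(bg.png) are not mistaken for selectors.
    text = STRINGS_AND_URLS.sub("", COMMENT.sub("", css_text))
    selectors_only = INNER_BLOCK.sub("{}", text)
    declared = set(CLASS_SELECTOR.findall(selectors_only))
    return sorted(declared - used)
```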

In the process, I was also able to see reuse and generalizations in the CSS, which enabled me to also make direct modifications and improvements. I then proceeded to review and inspect the site and note any final CSS corrections needed.

A Critical Review of JavaScript

Finally, some of the JavaScript portions of my site still suffered conflicts, long load times or incorrect loading order. Some of these offenders had been removed during the earlier plugin review. I still needed to change the order and placement of scripts, though, for my site’s timeline and for the Structured Dynamics popup tab to get them to work.

Interface Tweaks

Across this entire process I was also inspecting interface components and widgets throughout the site. My prejudice here was to eliminate very occasional uses or complicated layouts or designs that added to site complexity or slower load times. I have also tweaked my masthead code to get better display of the four-column elements across browsers and devices. It is still not perfect, but better than it ever has been across this diversity of uses.

Optimizing Anew

I am still building up my site from these steps. For example, various Web page speed tests have indicated improvements, but also other areas for optimization. Currently AI3’s speed test scores range from 90 to 92, better than 85% or so of Web sites, despite the fact my blog is content- and image-“heavy” with multiple posts on the front page. I tried adding back in WP Super Cache, but then removed it after a few days because load resources remained too high. I most recently tried W3 Total Cache again, and so far am pleased that load averages have declined while page load times have also decreased.

W3 Total Cache is in-your-face about paid upgrades and self-promotion, and is a total bitch to configure, the same reasons I abandoned it in the first place. But it does seem to provide a better combination of response times and server demands, more appropriate to a scalable site.

I have continued to look at optimizing image loads and sprites, and am also looking at how external CSS and JS calls can be improved. Though somewhat time consuming, I now feel I have a pretty good grasp on the moving parts. I should be able to diagnose and manage my system with a much greater degree of confidence and effectiveness going forward.

Some High-level Lessons

Open source and modular code is fantastic, but eventually using it without discrimination or occasional monitoring and testing can lead to untoward effects. Lightweight uses may perhaps get by with minimal attention. However, in the case of this blog, with more than 7000 readers, more attention is required.

The abscess that caused this redesign has now gone away. Site performance is again pretty good, and nearly all components have been looked at with specific attention and diligence. Yet the assumed root cause of these issues may, in fact, not have been the most important one. Rather than outdated themes or PHP functions, the greatest performance hit on this AI3 site appears to have been unintended and adverse effects from combined caching approaches. So, while it is good that the underlying code and CSS have been updated, it took investigating these issues to turn up the more fundamental importance of caching.

As for next steps, I have learned that these monitoring and focus items are important:

  • Clean DB tables whenever plugins or options are added to or removed from the site
  • Be cognizant of caching residuals and caching conflicts or thrash
  • Still not sure all reconciliations are complete; will continue to turn over other rocks and clean them
  • Probably some mobile and cross-browser display options need further attention, and
  • Ongoing good performance requires constant testing and tweaking.

In many respects, these are simply the lessons that any sysadmin learns as he monitors and controls new applications. The real lesson is if one is to take on the responsibility of one’s own Web site, then all aspects — including system administration and knowledge discovery — need to also be a responsible part of the mix.


[*] To be hummed according to the tune of Bringing in the Sheaves.
[1] Depending on the flavor of Linux (my instance is Ubuntu), this configuration file may have a different name (for example, httpd.conf), or the directives may be placed in .htaccess.

Posted by AI3's author, Mike Bergman Posted on September 2, 2013 at 10:36 pm in Blogs and Blogging, Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/1670/cleaning-up-the-cruft/
The URI to trackback this post is: http://www.mkbergman.com/1670/cleaning-up-the-cruft/trackback/
Posted:June 4, 2013

Even Our Standard Stuff is Subject to Miscommunication

For some reason, I seem to have been caught in a number of transactions recently where ABSOLUTE precision in communication has been required. One instance involved an insurance policy and when a particular form of coverage becomes active. One instance related to a business communication that led to vendor conflict. One involved the tax authorities — whoops, should perhaps not say more about that one. Others included . . . fill in your own answer.

As someone who prides himself (a bit) on trying to be precise in communications, I find these circumstances all give me pause. Even casual stuff is liable to miscommunication; one never knows. Precision errors may arise from a lack of proper breadth, the absence of sufficient depth, or a lack of clarity in whatever it is you try to say. Precise communication will never be mastered.

Yet, that being said, we must communicate, so we also need some guidelines. I think, firstly, we must speak our minds when the thought and muse strikes us. Secondly, we should try to sit on that material a bit before we hit Send.

Honest communication from the heart is warranted in all circumstances, though we may change tone due to perceptions of the audience and the perceived potential for misinterpretation. Perception often misjudges audiences. Perhaps the only certainty is that communications with bureaucracies should be entirely factual with no adjectives.

In the end, the question we need to ask of our communications is simple: do we wish to achieve an action, or do we wish to go on record? The latter is sometimes more satisfying and occasionally is also the most effective for action. It can be cathartic, yes, and that is also sometimes justification to speak truth to power.

But, in most cases, the purpose of communications is to persuade. There needs to be a sensitivity to tone, language and empathy. Because most of our communications are attempts to persuade and not rants, it is clear why so often our communications fail: it is frightfully hard — in the end, near impossible — to walk in someone else’s shoes.

Posted by AI3's author, Mike Bergman Posted on June 4, 2013 at 11:17 pm in Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1648/every-word-gets-parsed/
The URI to trackback this post is: http://www.mkbergman.com/1648/every-word-gets-parsed/trackback/
Posted:June 4, 2012

Popular ‘Timeline of Information History’ Expands by 30%

Since its first release four years ago, the History of Information Timeline has been one of the most consistently popular aspects of this site. It is an interactive timeline of the most significant events and developments in the innovation and management of information and documents throughout human history.

Recent requests for use by others and my own references to it caused me to review its entries and add to it. Over the past few weeks I have expanded its coverage by some 30%. There are now about 115 entries in the period ranging from ca. 30,000 BC (cave paintings) to ca. 2003 AD (3D printing). Most additions have been to update the past twenty or thirty years:

A Timeline of Information History

The timeline works via a fast scroll at the bottom; every entry when clicked produces a short info-panel, as shown.

All entries are also coded by an icon scheme of about 20 different categories:

  • Book Forms and Bookmaking
  • Calendars
  • Copyrights and Legal
  • Infographics and Statistics
  • Libraries
  • Maps
  • Math and Symbology
  • Mechanization
  • Networks
  • New Formats or Document Forms
  • Organizing Information
  • Pre-writing
  • Paper and Papermaking
  • Printing
  • Science and Technology
  • Scripts and Alphabets
  • Standardization and Typography
  • Theory
  • Timelines
  • Writing

You may learn more about this timeline and the technology behind it by referring to the original announcement.

Posted by AI3's author, Mike Bergman Posted on June 4, 2012 at 9:28 am in Adaptive Information, Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/1012/information-timeline-gets-major-update/
The URI to trackback this post is: http://www.mkbergman.com/1012/information-timeline-gets-major-update/trackback/
Posted:February 13, 2012

Shun the Frumious Bandersnatch?

The Web and open source have opened up a whole new world of opportunities and services. We can search the global information storehouse, connect with our friends and make new ones, form new communities, map where stuff is, and organize and display aspects of our lives and interests as never before. These advantages compound into still newer benefits via emergent properties such as social discovery or bookmarking, adding richness to our lives that heretofore had not existed.

And all of these benefits have come for free.

Of course, as our use and sophistication of the Web and open source have grown, we have come to understand that the free provision of these services is rarely (ever?) unconditional. For search, our compact is to accept ads in return for results. For social networks, our compact is to give up some privacy and control of our own identities. For open source, our compact is the acceptance of (generally) little or no support and often poor documentation.

We have come to understand this quid pro quo nature of free. Where the providers of these services tend to run into problems is when they change the terms of the compact. Google, for example, might change how its search results are determined or presented or how it displays its ads. Facebook might change its privacy or data capture policies. Or, OpenOffice or MySQL might be acquired by a new provider, Oracle, that changes existing distribution, support or community involvement procedures.

Sometimes changes may fit within the acceptable parameters of the compact. But if such changes fundamentally alter the understood compact with the user community, users may howl or vote with their feet. Depending on the circumstances, the service provider may relent, the users may come to accept the new changes, or the users may indeed drop the service.

The Hidden Costs of Dependence

But there is another aspect of the use of free services, the implications of which have been largely unremarked. What happens if a service we have come to depend upon is no longer available?

Abandonment or changes in service may arise from bankruptcy or a firm being acquired by another. My favorite search service of a decade ago, AltaVista, and Delicious are two prominent examples here. Existing services may be dropped by a provider or APIs removed or deprecated. For Google alone, examples include Wave and Gears, Google Labs, and many, many APIs. (The howls around Google Translate actually caused it to be restored.) And existing services may be altered, such as moving from free to fee or having capabilities significantly modified. Ning and Babbel are two examples here. There are literally thousands of examples of Web-based free services that have gone through such changes. Most have not seen widespread use, but have affected their users nonetheless.

There is nothing unique about free services in these regards. Ford was able to cease production of its Edsel and change the form factor of the Thunderbird despite some loyal fans. Sugar Pops morphed into a variety of breakfast cereal brands. Sony Betamax was beaten out by VHS, which in turn lost out to DVDs. My beloved Saabs are heading for the dustbin, or Chinese ownership.

In all of these cases, as consumers we have no guarantees about the permanence of the service or the infrastructure surrounding it. The provider is solely able to make these determinations. It is no different when the service or offering is free. It is the reality of the marketplace that causes such changes.

But, somehow, with free Web services, it is easy to overlook these realities. I offer a couple of personal case studies.

Case Study #1: Site Search

I have earlier described the five different versions of site search that I have gone through for this blog. The thing is, my current option, Relevanssi, is also a free plug-in. What is notable about this example, though, is the multiple attempts and (unanticipated) significant effort to discover, evaluate and then implement alternatives. Unfortunately, I rather suspect my current option may itself — because of the nature of free on the Web — need to be replaced at some time down the road.

Case Study #2: FeedBurner

Part of what caused me to abandon Google Custom Search as one of the above search options was the requirement I serve ads on my blog to use it. So, when I decided to eliminate ads entirely in 2010 I not only gave up this search option, but I also lost some of the better tracking and analytics options also provided for free by Google. Fortunately, I had also adopted FeedBurner early in the life of this blog. It was also becoming increasingly clear that feed subscribers — in addition to direct site visitors — were becoming an essential metric for gauging traffic.

I thus had a replacement means for measuring traffic trends. Google (strange how it keeps showing up!) had purchased FeedBurner in 2007, and had made some nice site and feature improvements, including turning some paid services into free ones. The service was performing quite well, despite FeedBurner’s infamous knack for periodically losing certain feed counts. However, this performance broke down last summer when my site statistics indicated a massive drop in subscribers.

The figure below, courtesy of Feed Compare, shows the daily subscriber statistics for my AI3 blog for the past two years. The spikiness of the curve affirms the infamous statistics gaps of the service. The first part of the curve also shows nice, steady growth in readers, to more than 4000 by last summer. Then, on August 16, there was a massive drop of 85% in my subscriber counts. I monitored this for a couple of days, thinking it was another of the infamous temporary gaps, then realized something more serious was afoot:

Drop in Reported Feedburner Subscribers

It was at this point I became active on the Google group for FeedBurner. Many others had noted the same service drop. (The major surmise is that FeedBurner now is having difficulty including Feedfetcher feeds, which is interesting because it is the feed of Google’s own Reader service, and the largest feed aggregation source on the Web.)

Over the ensuing months until last week I posted periodic notices to the official group seeking clarification as to the source of these errors and a fix to the service. In that period, no Google representative ever answered me, nor any of the numerous requests by others. I don’t believe there has been a single entry on any matter by Google staff for nearly the past year.

I made requests and inquiries no fewer than eight times over these months. True, Google had announced it was deprecating the FeedBurner API in May 2011, but, in that announcement, there was no indication that bug fixes or support to their own official group would cease. While it is completely within Google’s purview to do as it pleases, this behavior hardly lends itself to warm feelings by those using the service.

Finally, last week I dropped the FeedBurner stats and installed a replacement WordPress plugin service [1]. It was clear no fixes were forthcoming and I needed to regain an understanding of my actual subscriber base. The counts you now see on this site use this new service; they show the continuation of this site’s historical growth trend.

Is Google Becoming More Frumious?

It is not surprising that in the prior discussions Google figures prominently. It is the largest provider of APIs and free services on the Web. But, even with its continuing services, I am seeing trends that disturb me in terms of what I thought the “compact” was with the company.

I’m not liking recent changes to Google’s bread and butter, search. While they are doing much to incorporate more structure in their results, which I applaud, they are also making ranking, formatting and presentation changes I do not. I am now spending at least as much of my search time on DuckDuckGo, and have been mightily impressed with its cleanliness, quality and lack of ads in results.

I also do not like how all of my current uses of Google services are now being funneled into Google Plus. I am seeing an arrogance, a sense that Google knows what is best and wants to steer me into its preferred workflows and uses, reminiscent of the arrogance Microsoft came to assume at the height of its market share. How does that variant of Lord Acton’s dictum go? “Market share tends to corrupt, and absolute market share corrupts absolutely.”

We are seeing Google’s shift to monetize extremely popular APIs such as Maps and Translate. My company, Structured Dynamics, has utilized these services heavily for client work in the past. We now must find alternatives or build the cost of these services into the ongoing economics of our customer installations. Of course, charging for these services is Google’s right, but it does change the equation and causes us to evaluate alternatives.

I fear that Google may be turning into a frumious Bandersnatch. I’m not sure we will shun it, but we certainly are rethinking whether and how we engage with the company and its services. Once a service is no longer free, our expectations as to its permanence and support change as well.

Big Boys Don’t Cry

This is not a diatribe against Google, nor a “woe is us” lament. Us big kids have come to know that there is no such thing as a free lunch. But that message is now being reaffirmed more strongly in the Web context.

There can be benefits from seeking, installing or adapting to new alternatives with different service profiles when services we depend on are abandoned or deprecated. Learning always takes place. Accepting responsibility for desired services ourselves also brings control and the ability to tailor them to specific needs. Early use of free services also teaches what is desired and what is not, which can lead to better implementation choices if and when direct responsibility is assumed.

But, in some areas, we are seeing services or uses of the Web that we should adopt only with care, or even shun. Business opportunities that depend on third-party services or APIs are very risky. Heavy reliance on a single provider’s service ecosystem adds fragility. Our own systems should be designed not to depend too strongly on specific API providers and their unique features or parameters.

Free is not forever, and it is conditional. Substitutability is a good design practice to embrace.
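To make the substitutability point concrete, here is a minimal Python sketch of one common way to achieve it: code against a small interface you own, and keep an ordered list of interchangeable backends so that a deprecated or newly paid API can be swapped out without touching the callers. The `Translator` interface, both backends and the glossary data are hypothetical illustrations, not any real provider’s API.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class ProviderUnavailable(Exception):
    """Raised when a backend is deprecated, down, or no longer free to use."""


class Translator(Protocol):
    """The only contract our own code depends on (hypothetical interface)."""

    name: str

    def translate(self, text: str, target_lang: str) -> str:
        ...


class RetiredService:
    """Stand-in for a third-party API that has gone dark (hypothetical)."""

    name = "retired-service"

    def translate(self, text: str, target_lang: str) -> str:
        raise ProviderUnavailable(f"{self.name} is no longer available")


class LocalGlossary:
    """In-house fallback: naive word-for-word lookup in our own glossary."""

    name = "local-glossary"

    def __init__(self, glossary: dict[str, dict[str, str]]) -> None:
        self.glossary = glossary

    def translate(self, text: str, target_lang: str) -> str:
        table = self.glossary.get(target_lang, {})
        # Unknown words pass through unchanged.
        return " ".join(table.get(word, word) for word in text.split())


def translate_with_fallback(
    providers: Sequence[Translator], text: str, target_lang: str
) -> tuple[str, str]:
    """Try providers in order; return (translation, name of provider used)."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.translate(text, target_lang), provider.name
        except ProviderUnavailable as exc:
            errors.append(str(exc))
    raise ProviderUnavailable("; ".join(errors) or "no providers configured")


glossary = {"es": {"free": "gratis", "lunch": "almuerzo"}}
providers = [RetiredService(), LocalGlossary(glossary)]
print(translate_with_fallback(providers, "free lunch", "es"))
# ('gratis almuerzo', 'local-glossary')
```

The design choice is that callers only ever see `translate_with_fallback`; replacing or reordering providers is a configuration change rather than a rewrite.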


[1] I may detail at a later time how this replacement service was set up.

Posted by AI3's author, Mike Bergman Posted on February 13, 2012 at 7:02 pm in Blogs and Blogging, Site-related | Comments (4)
The URI link reference to this post is: http://www.mkbergman.com/996/the-conditional-costs-of-free/
The URI to trackback this post is: http://www.mkbergman.com/996/the-conditional-costs-of-free/trackback/
Posted:July 25, 2011

Overcoming the Limitations of WordPress Search

Since the inception of this AI3 blog a bit over six years ago, I have gone through five different approaches to local site search, all geared to overcome the limitations of WordPress’ native search function. The current and latest iteration uses the Relevanssi plug-in, the best I have used so far. (Check it out yourself in the search box to the upper right.) I describe these five iterations in this post.

Iteration #1: Native WordPress Search

When first released, AI3 used the native search that comes with the WordPress installation (at first installation that was WP version 1.5; the current version is 3.2.1). That was OK when few knew of my site and the number of visitors was low.

But the WP search is known to suck, mostly because it orders results by date posted rather than by relevance, and because it performs slowly. Once I began to get more traffic, it was time for a change.
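To illustrate why date ordering frustrates searchers, here is a simplified Python model; it is not WordPress’s actual PHP or SQL (older versions, as generally understood, matched terms with SQL `LIKE` comparisons and returned results newest first). It contrasts substring matching sorted by date with the same matches ranked by weighted term counts. The `Post` type, the title weight of 5 and the sample posts are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    content: str
    posted: date


def _terms(query):
    return query.lower().split()


def date_ordered_search(posts, query):
    """Keep posts containing every term as a substring; newest first."""
    terms = _terms(query)
    hits = [
        p for p in posts
        if all(t in f"{p.title} {p.content}".lower() for t in terms)
    ]
    return sorted(hits, key=lambda p: p.posted, reverse=True)


def relevance_search(posts, query, title_weight=5):
    """Same matching rule, but rank by weighted term occurrences."""
    terms = _terms(query)

    def score(p):
        title, body = p.title.lower(), p.content.lower()
        return sum(title_weight * title.count(t) + body.count(t) for t in terms)

    # sorted() is stable, so equal scores keep their newest-first order.
    return sorted(date_ordered_search(posts, query), key=score, reverse=True)


posts = [
    Post("Structured data primer", "structured data, structured data and more", date(2009, 1, 5)),
    Post("Weekend notes", "a passing mention of structured data", date(2011, 7, 1)),
]
print([p.title for p in date_ordered_search(posts, "structured data")])
# ['Weekend notes', 'Structured data primer']
print([p.title for p in relevance_search(posts, "structured data")])
# ['Structured data primer', 'Weekend notes']
```

Under date ordering the passing mention wins simply because it is newer; any relevance score, even one this crude, puts the substantive post first.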

Iteration #2: Google Custom Search

The option I have kept longest on this site is Google’s Custom Search. When first announced at the end of 2006 it was a real godsend and very innovative. I installed my first version in January 2007 and continued to make modifications and use it up through April 2010. I used it on various sites with many different types of Custom Search implementations.

Unfortunately, to use the free version it is necessary to include ads that Google provides. For a while, this served my purposes, since I was actively trying to learn whether ad revenues were viable for a standard blog and what levels of traffic are necessary to produce meaningful revenues. However, by early 2010 I had concluded that, even with a quite popular blog for its niche, ad revenues would never be that meaningful and were not worth cluttering up my site. So I ended my experiment with Google ads and, being cheap, chose not to use the paid version of the search service and thus dropped the system.

What I liked:

  • Easy setup
  • Familiar search syntax and interface.

What I did not like:

  • Inclusion of Google ad panels
  • Lack of flexibility in styling search results presentation
  • Need for a Google key
  • Inability to tweak ordering of search results
  • Intrusive Google logos in multiple places.

Iteration #3: Bing Site Owner

Microsoft’s Bing was coming on strongly at that time, so I next decided to try the Bing Site Owner service. I began this new approach immediately upon retiring Google.

What I liked:

  • Very easy setup
  • Acceptable flexibility in styling results
  • Nice popup implementation
  • Not overly intrusive with the Bing (MS) brand.

However, without direct notice, Microsoft ended this service as of April of this year.

What I did not like:

  • Service went dark
  • Cancelled service without any notification (except on the Bing webmaster’s site, a location I never visited)
  • No alternative other than the Bing API 2.0, with its difficult setup.

I was pretty pleased with the Bing service and would likely have continued using it because it wasn’t broke. But the sudden plug-pulling was off-putting.

Thus, I decided, heck, if I was going to have to go through the effort of learning the new Bing API, I might as well learn to do it all myself.

Iteration #4: WPSearch 2

So, it was back to researching options and WP plug-ins on the Web. After assembling the options, I first chose to go with WPSearch 2. What most attracted me initially was its reliance on the Lucene open source search engine, the same engine that underlies the Solr text indexing my company Structured Dynamics uses for the Open Semantic Framework (OSF).

Because my AI3 blog theme is of my own design and has changed many times over the years, it had lost its original native search form and search results page. So, my first task after installing the WPSearch plugin and indexing my content was to add these pages back to my theme. The WP Codex has an OK set of instructions on creating a search page, along with related discussion.

There are some valuable tutorials out there that explain how this is done; I refer to them rather than repeat such information here.

I completed this work and kept WPSearch 2 up and active on my site for roughly the past week. But, as I kept trying to achieve the formatting and organization of search results I wanted, I became increasingly frustrated. I also experienced numerous freezes, white screens and fatal PHP errors while editing new pages or deleting comment spam, which told me I simply had to abandon this option.

In summary, what I liked:

  • Use of Lucene search engine
  • Very fast performance
  • Known search syntax.

What I did not like:

  • Duplicate results
  • Freezes and timeouts when managing comments or new edits
  • Inability to capture total search count (at least with my own PHP skills)
  • Inability to highlight search terms.

I’m sorry that I needed to abandon this option, since I think highly of the underlying Lucene text engine. But the integration with existing WP functionality and other modules was not fully baked. I think with more work, including exposing more of the Lucene search API functionality, this option could redeem itself. But, as of today, it is not reliable enough for my site.

Iteration #5: Relevanssi

In trying to find hacks and workarounds to some of the desires and issues noted above, I had come across a reference to the Relevanssi plug-in, which appeared to embrace much of what I was looking to achieve. The download is quite small (100 KB) and must therefore use the native WP MySQL database for its index, but it is feature rich, with strong and flexible relevance ranking. Its great flexibility and configurability in how search results get presented was also an attraction.
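The general idea of keeping a term index in an ordinary database and ranking by relevance can be sketched as below. This is a generic illustration, not Relevanssi’s actual schema or weighting formula: an inverted index of per-document term counts (the kind of table a plugin could store in MySQL) scored with a simple TF-IDF sum. The function names and the `log(1 + N/df)` weighting are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return WORD.findall(text.lower())


def build_index(docs):
    """Map {doc_id: text} to an inverted index {term: {doc_id: term_count}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def rank(index, n_docs, query):
    """Return doc ids ordered by summed tf * idf over the query terms."""
    scores = Counter()
    for term in dict.fromkeys(tokenize(query)):  # de-duplicate, keep order
        postings = index.get(term)
        if not postings:
            continue
        # Rarer terms (fewer documents containing them) get a larger weight.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


docs = {
    1: "Lucene is a search engine",
    2: "A WordPress search plugin: search posts with a plugin",
    3: "MySQL stores the index",
}
index = build_index(docs)
print(rank(index, len(docs), "search plugin"))  # [2, 1]
```

Because the index is just (term, document, count) rows, it fits naturally in relational tables, which is consistent with a small plug-in that needs no separate search engine.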

Installation of this system and then indexing was very clean and straightforward. Its syntax readily supports the Boolean AND operator, which I have set as the default behavior for the site (if the AND search finds no matches, it automatically falls back to an OR search), as well as phrase searching (also see the search form at upper right).
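The AND-with-OR-fallback behavior and phrase searching described above can be sketched in Python as follows. This is a generic illustration of the technique, not Relevanssi’s code; the quoting rule (double quotes mark a phrase), the tokenizer and the function names are assumptions.

```python
import re

QUERY_PART = re.compile(r'"([^"]+)"|(\S+)')
WORD = re.compile(r"[a-z0-9]+")


def words(text):
    return WORD.findall(text.lower())


def parse_query(query):
    """Split a query into quoted phrases and bare terms, each a tuple of words."""
    parts = []
    for phrase, term in QUERY_PART.findall(query):
        tokens = tuple(words(phrase or term))
        if tokens:
            parts.append(tokens)
    return parts


def contains(doc_words, part):
    """True if the word sequence `part` appears contiguously in `doc_words`."""
    n = len(part)
    return any(
        tuple(doc_words[i:i + n]) == part for i in range(len(doc_words) - n + 1)
    )


def search(docs, query):
    """AND across all query parts; if that finds nothing, fall back to OR.

    Returns (matching doc ids, mode used)."""
    parts = parse_query(query)
    if not parts:
        return [], "AND"
    tokenized = {doc_id: words(text) for doc_id, text in docs.items()}
    hits = [d for d, w in tokenized.items() if all(contains(w, p) for p in parts)]
    if hits:
        return hits, "AND"
    hits = [d for d, w in tokenized.items() if any(contains(w, p) for p in parts)]
    return hits, "OR"


docs = {
    1: "Relevanssi is a WordPress search plugin",
    2: "Lucene is a search engine",
    3: "Notes on WordPress theme design",
}
print(search(docs, "wordpress search"))       # ([1], 'AND')
print(search(docs, '"search engine" theme'))  # ([2, 3], 'OR')
```

Defaulting to AND keeps results tight, while the OR fallback means a reader never lands on an empty page just because one term was too specific.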

As implemented, then, here is the listing of major features in Relevanssi:

  • Total number of search results (implemented)
  • Search term highlighting (implemented)
  • Contextual excerpt snippets (implemented)
  • Sort by date (not implemented)
  • Category search (not implemented)
  • Filter by date (not implemented)
  • Filter by category or tag (not implemented).
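Search term highlighting and contextual excerpt snippets, two of the implemented features listed above, can be illustrated generically as below. This is not the plugin’s code; the function names, the 60-character default window and the `<strong>` tag are assumptions. The snippet is HTML-escaped before tags are added so that page text cannot inject markup.

```python
import html
import re


def _term_pattern(terms):
    """Case-insensitive whole-word alternation of the non-empty terms, or None."""
    terms = sorted((t for t in terms if t), key=len, reverse=True)
    if not terms:
        return None
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def excerpt(text, terms, width=60):
    """Return about `width` characters of context around the first matching term."""
    pattern = _term_pattern(terms)
    match = pattern.search(text) if pattern else None
    start = max(0, match.start() - width // 2) if match else 0
    end = min(len(text), start + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


def highlight(snippet, terms, tag="strong"):
    """HTML-escape the snippet, then wrap each matching term in <tag>...</tag>."""
    safe = html.escape(snippet)
    pattern = _term_pattern(html.escape(t) for t in terms)
    if pattern is None:
        return safe
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", safe)


print(highlight("Search & find: WordPress search", ["search"]))
# <strong>Search</strong> &amp; find: WordPress <strong>search</strong>
```

A results page would typically call `excerpt` on each post’s content and then pass the result through `highlight`.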

Here is a screen capture of the complete configuration menu in WordPress for Relevanssi:

Relevanssi Configuration Options

For further information, you may also want to see some more advanced search functions and the Relevanssi knowledge base.

Posted by AI3's author, Mike Bergman Posted on July 25, 2011 at 3:40 am in Blogs and Blogging, Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/966/five-iterations-of-site-search/
The URI to trackback this post is: http://www.mkbergman.com/966/five-iterations-of-site-search/trackback/