Posted: September 18, 2005
This AI3 blog maintains Sweet Tools, the largest available listing of semantic Web and related tools, with about 800 entries. Most are open source. Click here to see the current listing!

My current research efforts involve the semantic Web and ontologies. By the semantic Web I mean that topic plus the related technologies and standards of metadata, ontologies, taxonomies, thesauri, controlled vocabularies, XML, RDF and OWL.

A good starting point on tools is Michael Denny's update of his 2002 ontology editor survey. Other tools surveys include a 2003 HP review from the SIMILE research program on metadata and thesaurus tools; the Semantic Web has a listing of about 245 tools on its beta Web site; and the W3C, as might be expected given its role in RDF and related standards, has an excellent starting point for developer resources, including entries for related standards and technologies.

The ONTOLOG community also lists some tools resources but, more importantly, has an excellent recommended reading compendium. These links are essential starting points for anyone beginning an investigation into the semantic Web.

Finally, Kendall Clark, editor of XML.com, just posted a fascinating piece on SPARQL 2.0, a possible query language for the semantic Web, and a longer article on the possible convergence of Web 2.0 and the semantic Web. As he puts it, “I’m starting to catch the scent of one of those big convergence things just possibly starting to happen. It smells like money!”

Posted: September 15, 2005

Google announced its new beta blog search service this week, and I immediately went to check it out.  To my dismay, none of my AI3 blog posts were listed!  $#%&*#

My first hint of what to do came from the About Google Blog Search page, which indicated that while Google does not yet have a form for submitting pings, the new service does monitor updating services, specifically mentioning Weblogs.com.  I then tried to access that site, which was slower than molasses; I timed out many times (I suspect many others were following the same path I was).

That got me into a whole investigation of ping and ping success in general with my WordPress installation.  (See my earlier post on Pings and Trackbacks).  I was alarmed to discover that many of my ping locations had not been updating well, for reasons that still remain somewhat murky (though others have noted sporadic miscues by WordPress in ping updates, not to mention some of the ping update sites recommended for it such as Ping-o-matic).

The WordPress dashboard suggested that Google was using Ping-o-matic as one of its update services for new listings, so I manually submitted my site again to Ping-o-matic and waited to see the results.  Voila!  After a delay of an hour or so, I found my posts and site on the Google blog search service and in other locations.

Thus, in the interim before Google completes its submission expansions, I recommend that WordPress bloggers who are not yet listed in the Google blog search:

  1. Occasionally ping Ping-o-matic manually rather than relying on your updates being handled automatically (but DON’T do it too frequently, since that can be interpreted as spamming behavior)
  2. On a one-time basis, raise your syndication feeds limit on the Options-Reading-Syndication Feeds panel in the dashboard so that it is large enough to include all of your desired recent postings
  3. Manually submit an update at Ping-o-matic, and
  4. Return the syndicated feeds number to your original amount in your dashboard.
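A manual ping need not even go through a Web form. Ping-o-matic and most update services accept the standard weblogUpdates.ping XML-RPC call, which is what WordPress sends under the covers. Here is a minimal sketch in Python; the endpoint and method follow the common weblogUpdates convention, and the blog name and URL in the example are just this site's values, so substitute your own:

```python
# Minimal XML-RPC ping in the weblogUpdates.ping convention that
# Ping-o-matic and most blog update services accept.
import xmlrpc.client

PINGOMATIC_ENDPOINT = "http://rpc.pingomatic.com/"

def send_ping(blog_name, blog_url, endpoint=PINGOMATIC_ENDPOINT):
    """Send a weblogUpdates.ping; returns the service's response struct."""
    server = xmlrpc.client.ServerProxy(endpoint)
    # Standard signature: weblogUpdates.ping(siteName, siteUrl)
    return server.weblogUpdates.ping(blog_name, blog_url)

if __name__ == "__main__":
    # Substitute your own blog name and URL; do NOT run this in a loop,
    # for the spamming reasons noted in step 1 above.
    result = send_ping("AI3", "https://www.mkbergman.com/")
    print(result)
```

Like the manual form, this should be used sparingly; repeated pings without new content are exactly the spamming behavior warned about above.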

With this simple approach, I can now happily report that all of the AI3 listings are in the new Google blog search service, and yours can be too!

Posted by AI3's author, Mike Bergman Posted on September 15, 2005 at 5:52 pm in Blogs and Blogging, Searching | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/124/getting-listed-on-google-blog-search/
The URI to trackback this post is: https://www.mkbergman.com/124/getting-listed-on-google-blog-search/trackback/
Posted: September 14, 2005

According to iProspect, about 56 percent of users use search engines every day, based on a population of which more than 70 percent use the Internet more than 10 hours per week.[1] The average knowledge worker spends 2.3 hrs per day — or about 25% of work time — searching for critical job information.[2] IDC estimates that enterprises employing 1,000 knowledge workers may waste well over $6 million per year each in searching for information that does not exist, failing to find information that does, or recreating information that could have been found but was not.[3]

Vendors and customers often use time savings by knowledge workers as a key rationale for justifying a document or content initiative. This comes about because many studies over the years have noted that white collar employees spend a consistent 20% to 25% of their time seeking information. The premise is that more effective search will save time and drop these percentages. For example, EDS has suggested that improvements of 50 percent in the time spent searching for data can be achieved through improved consolidation and access to data.[4]

Using these premises, consultants often calculate that every 1% reduction in total work time devoted to search translates, on a fully burdened basis, into a substantial per-employee savings:

$50,000 (base salary) × 1.8 (burden rate) × 1.0% = $900 per employee
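The arithmetic behind that figure is trivial to reproduce; the sketch below simply restates it, using the illustrative salary, burden rate, and 1% reduction from the text:

```python
def search_savings(base_salary, burden_rate, time_reduction):
    """Illustrative per-employee 'savings' from reducing search time."""
    return base_salary * burden_rate * time_reduction

# The illustrative figures: $50,000 salary, 1.8 burden rate,
# and a 1% reduction in work time devoted to search.
savings = search_savings(50_000, 1.8, 0.01)
print(f"${savings:,.0f} per employee")  # $900 per employee
```

Scaled across 1,000 knowledge workers, that single illustrative percentage point becomes $900,000 per year, which is how the headline numbers in such business cases arise.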

Beware such facile analysis!

The fact that many studies over the years have found white collar employees spend a consistent 20% to 25% of their time on search suggests that this is the “satisficing” allocation of time to information search. (In other words, knowledge workers are willing to devote a quarter of their time to finding relevant information, reserving the remainder for analysis and documentation.)

Thus, while better discovery tools may lead to finding better information and making better decisions more productively (an important justification in itself), more efficient search may not produce strict time or labor savings.[5] Be careful of justifying project expenditures based on “time savings” related to search. Search is likely to remain the “25% solution.” The more relevant question is whether the time spent on search produces better information.


[1] iProspect Corporation, iProspect Search Engine User Attitudes, April/May 2004, 28 pp. See http://www.iprospect.com/premiumPDFs/iProspectSurveyComplete.pdf.

[2] Delphi Group, “Taxonomy & Content Classification Market Milestone Report,” Delphi Group White Paper, 2002. See http://delphigroup.com.

[3] C. Sherman and S. Feldman, “The High Cost of Not Finding Information,” International Data Corporation Report #29127, 11 pp., April 2003.

[4] M. Doyle, S. Garmon, and T. Hoglund, “Make Your Portal Deliver: Building the Business Case and Maximizing Returns,” EDS White Paper, 10 pp., 2003.

[5] M.E.D. Koenig, “Time Saved — a Misleading Justification for KM,” KMWorld Magazine, Vol 11, Issue 5, May 2002. See http://www.kmworld.com/publications/magazine/index.cfm.

Posted by AI3's author, Mike Bergman Posted on September 14, 2005 at 12:45 pm in Information Automation, Searching | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/121/search-and-the-25-solution/
The URI to trackback this post is: https://www.mkbergman.com/121/search-and-the-25-solution/trackback/
Posted: September 12, 2005

On Aug. 22 BEA announced it was acquiring Plumtree Software for $200 million.  By any measure this is a fire-sale price given Plumtree’s 10 years of operating history, 700 customers (including such large notables as the US Army, US Navy, Airbus, Ford, Procter & Gamble and Swiss Re), and 21 million reported users.  In addition, Plumtree has fairly significant intellectual property with both .NET and J2EE implementations.  It also has $70 million in cash, lowering the acquisition cost still further.

According to IDC, Plumtree was #5 as a general portal vendor, behind IBM and BEA itself, among others.  Nonetheless, this acquisition appears to mark the end of the independent general portal vendor.  Corechange (acquired by Open Text for $4.2 million in 2003) and Epicentric (acquired by Vignette for $32 million in 2002) were the previous two independent portal vendors to be acquired.

Is it open source?  Is it the failure of the general portal model?  Is it ongoing consolidation?  Is it another example of BEA stumbling along?  Is it all of these?

Only time will tell.  My own suspicion, however, is that the document challenge remains sufficiently broad and interconnected that the general portal is merely a gluing framework, and not the most important piece at that. 

Posted by AI3's author, Mike Bergman Posted on September 12, 2005 at 10:49 am in Information Automation, Software and Venture Capital | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/122/the-shrinking-portal-bea-acquires-plumtree/
The URI to trackback this post is: https://www.mkbergman.com/122/the-shrinking-portal-bea-acquires-plumtree/trackback/
Posted: September 10, 2005

For many years the conventional view (sometimes backed by data) has been that most searchers limit their queries to one or two keywords at most. In my updated search tutorial I cited references showing that this finding has held true since 1994 and that internal analysis at the NEC Princeton research center indicates the limit applies to professional researchers as well.[1]

A just-published study, "Analyzing User Behavior: A Case Study", by Chris Kutler and Ray Devaney, suggests these observations still hold. The authors analyzed more than 6 million searches conducted within The National Archives (TNA) in the UK (which includes public records for England and Wales) and indeed found that searches averaged one to two keywords.

However, since the dataset available to the authors spanned 2001 to 2005, they were also able to discover that query length tended to increase modestly, to between two and three keywords, as users became more familiar with the TNA’s four underlying databases.
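The statistic at issue, mean keywords per query, is a one-line computation over a raw query log; here is a minimal sketch on toy data (the sample queries below are invented for illustration and are not drawn from the TNA logs):

```python
# Mean keywords per query over a toy search log.  The sample
# queries are invented; they are not from the TNA dataset.
queries = [
    "census 1901",
    "wills",
    "birth records wales 1890",
]

avg_keywords = sum(len(q.split()) for q in queries) / len(queries)
print(f"average query length: {avg_keywords:.2f} keywords")  # 2.33
```

Run over millions of logged queries, as the authors did, the same computation yields the one-to-two (later two-to-three) keyword averages reported above.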

Though an improvement, even 2.5 keywords is hardly adequate query fodder.  Even with ranking adjustments such as popularity (as Google uses) or document-part weighting (as many search engines do), there is simply no way to get accurate results from so few keywords against large document repositories.

Everyone wants magic search bullets, but it is still the user who must pull the trigger.  Learning how to construct meaningful, content-rich queries is a skill that should be taught in every public school, much as typing once was.


[1] D. Butler, "Souped-up Search Engines," Nature 405:112-115, May 2000.

Posted by AI3's author, Mike Bergman Posted on September 10, 2005 at 12:03 pm in Searching | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/120/revisiting-limited-keywords-in-search/
The URI to trackback this post is: https://www.mkbergman.com/120/revisiting-limited-keywords-in-search/trackback/