Posted: January 9, 2007

Early Progress in the Use of Firefox as a Semantic Web Platform

This AI3 blog maintains Sweet Tools, the largest listing of about 800 semantic Web and -related tools available. Most are open source. Click here to see the current listing!

The other day I posted a general status and statistical report on the growth and implications of Firefox extensions. This post presents more than 30 of those nearly 3,000 extensions that may have usefulness in areas related to the semantic Web. I welcome any additions.

These same extensions have also been added to an update of the Sweet Tools listing, which has now grown to more than 350 tools.

Please note that because the spreadsheet is hosted by Google, you may need to copy the URL to your address bar rather than clicking it directly (direct clicking was anticipated in a future version of the Google spreadsheet, and now works):

http://spreadsheets.google.com/ccc?key=0AqUZpo78do-GcEdGU1NTWk1nUU56ZVBGbjc0LTJ4TlE&hl=en

I should mention that I have seen some commentary within the semantic Web community of the desirability of compiling “best of” or “Top X” tools listings for the semantic Web. While such lists have their place, they are no substitute for comprehensive listings. First, semantic tools are still in their infancy and it is premature to bestow “best of” in most categories. Second, many practitioners, such as me, are working to extend and improve existing tools. This requires more comprehensive listings, not narrower ones. And, last, what may ultimately contribute to semantic meaning on the Internet may well extend beyond semantic Web tools, strictly defined. An ivory tower focus on purity is not the means to encourage experimentation and innovation. Many Web 2.0 initiatives, including tagging and social collaboration, may very well point to more effective nucleation points for expanding semantic Web efforts than W3C-compliant efforts.

These are some of the reasons that I have been happy to include simple Firefox extensions or relatively narrow format converters for my listings. Who knows? You never know when and where you might find a gem! (And I’m not speaking solely of Ruby!)

Posted: January 8, 2007

A New Generation of Browser Wars is Brewing

Since its release in early 2004 (with version 0.8), Firefox has achieved phenomenal success, passing the 200 million download mark in late July 2006 and now estimated at about 282 million (you can see or get a copy of the counter from Spread Firefox). Allowing for seasonal swings and continued growth, Firefox downloads are now on the order of 3.5 million to 4 million per week.

It is widely acknowledged that, after creaming Netscape in the browser wars of the late 1990s, Microsoft left Internet Explorer to languish for about five years, giving Firefox (and, earlier, Opera, though at much lower market share) an opening to gain a toehold with fresh innovations. Some of the innovative hallmarks of early Firefox were tabbed browsing, broad operating system (OS) compatibility (Linux, Windows, Mac), constant improvements, and full adherence to open standards and code access.

Of course, downloads by no means translate into actual users (many people download and then abandon the browser, and many downloads are version upgrades). Nonetheless, various independent market research firms estimate steady market share gains for Firefox. According to a report last week in ComputerWorld:

Propelled by the release of its Version 2.0 in October, the free Firefox Web browser saw almost a 50% increase in use during 2006, according to one Web measurement firm. The open-source Firefox browser was used by 14% of computers online at the end of 2006, according to Aliso Viejo, Calif.-based Net Applications. That was 46% higher than its 9.6% share of the browser market at the beginning of the year.

General consensus views are that actual Firefox market share is on the order of 12% to 15% currently.
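As a quick check on the figures quoted from ComputerWorld, the 46% growth follows directly from the two share numbers. The short Python sketch below is only illustrative; the function name is my own and is not part of any cited report.

```python
def relative_growth(start_share: float, end_share: float) -> float:
    """Return the relative change from start_share to end_share, in percent."""
    return (end_share - start_share) / start_share * 100.0


# Share figures as quoted above: 9.6% at the start of 2006, 14% at the end.
growth = relative_growth(9.6, 14.0)
print(f"{growth:.1f}%")  # prints 45.8%, which rounds to the quoted 46%
```

Note that this is growth relative to the starting share; in absolute terms the gain is 4.4 percentage points.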

Microsoft is fighting back, with its recent release of IE 7 adding tabbed browsing and many of the innovative features first brought by Firefox. (This has also resulted in some hilarious send-ups, such as this mock MS site touting the purchase of Firefox.) If nothing else, Firefox has helped add new competitive juice to the browser market. But there is an even more important, stealthy factor underlying these trends that bodes very well indeed for Firefox’s future and poses an ongoing threat to Microsoft.

Stealth Extensions Growth

When initiating Firefox, the Mozilla team adopted a very prescient stance: to completely open up, modularize and simplify the architecture and to publish clear and easy guidelines for extending it. This stance enabled a lean-and-mean initial browser download but, more importantly, provided an inviting framework for extending the system through new themes and functionality. These add-ons started quite slowly and at first consisted of infrastructure extensions. Then, as Firefox code and documentation matured, a broader group of developers came to share this farsighted vision and began contributing their own extensions.

Today, Firefox has just passed the 2,200 mark for extensions as maintained by its own add-on directory service. This growth is accelerating, as my figure constructed from Mozilla’s online data shows:

My own research suggests only about 90% of available extensions are listed officially on the Mozilla add-on site. The remainder are company- or Web-site-specific extensions or experimental ones maintained (often largely) by universities. There are likely about 3,000 or so extensions (separate from another 1,500 or so themes) currently extant.

What is most notable in recent trends is the growth (the number of extensions has grown by what I estimate to be 123% in the past twelve months, based on Mozilla directory data) and the comprehensiveness and sophistication of the new offerings. Extensions are now being added at a rate of about 10 per day, and in every conceivable subject area.

As with other aspects of the Internet, extension popularity follows the typical power curve. The most popular of the extensions, such as ad blockers or video download assistants, can reach 150,000 downloads per week or more. Quite a few extensions have total downloads in the millions, and some with many version upgrades exceed 10 to 20 million downloads. The distribution by rank popularity and downloads for these 2,200 extensions is shown in the figure below:

Again, using Mozilla data, extension downloads are on the order of 3 million per week, or nearly one extension per standard Firefox download. These extensions are growing in popularity and ubiquity, and some users have documented adding 200 or more extensions to their basic Firefox package. (Of course, such numbers are absurd per user, and rational means for managing and organizing multiple extensions are also emerging.) Indeed, I will shortly publish another list of about 30 extensions of specific benefit to semantic Web browsing and tasks. Extension bundles of benefit to every need and interest can easily be found.

The Role of the Browser as Platform

What is most compelling about these trends is the emerging centrality of the browser as the dominant software application in most users’ computing lives. This is part of the ongoing trend to Web OSs, as my earlier post on Parakey noted (whose founders are Firefox developers with a strong background in XUL). Firefox is truly notable for its beauty and clean design as a platform for hosting Internet applications. Using XML, XUL and its chrome files, virtually every aspect of the Firefox platform is open to extension. The JavaScript examples, and the fact that many of the available extensions are also fully available in source code under non-limiting open source licenses, provide many exemplars for still further extension innovation.

So, while Microsoft may be able to match browser-wide feature innovations such as tabbed browsing, unless it chooses as well to open the IE platform to a similar extent (granted a difficult task given the inherent proprietary architecture), I believe Microsoft will be hard pressed to maintain its dominant browser market share under assault from the global developer community. Not only are we seeing the democratization of software development through open source, we are also going to increasingly see the democratization of programming as non-programmers in the conventional sense embrace the tools and techniques being innovated by the likes of the Firefox community.

Posted: January 5, 2007

AI3’s Comprehensive Listing of Semantic Web and Related Tools

Since my first posting of 175 semantic Web tools and then an update to 250, the listing has become quite popular and an apparent asset to the semantic Web community. While this AI3 tools listing is not as precise and restricted as the “official” ESW one on the W3C’s Web site, it does contain useful adjunct tools in such areas as parsers, natural language processing, wrappers and the like that are also of potential usefulness to semantic Web practitioners.

Because of the popularity of this listing, I decided to make it easier to access and update by others in the community. Thus, I converted the listing to a permanent feature of this blog (see the Sweet Tools link to the upper left in the Main Links area) as well as posted a publicly accessible Google spreadsheet link (requires Google account!) for direct updates.

Current Listing

As of the date of this posting, I have added 42 new tools since version 5. The listings are posted as an Exhibit-based lightweight structured data publication (as explained here), which allows filtering, sorting and current statistics.

I continue to characterize the listings by: 1) FOSS (free and open source software) status, with about 90% of the listings being so; and 2) a categorization of the tool type. Currently, there are 27 categories listed, and some of the tools are surely mis-characterized. If you add a tool (see below), please try to use these categories or suggest a new one to me directly.

I should also note that I track about 250 companies that provide semantic Web software (generally) under license fees. Most of those companies are NOT included in this listing; I may add these at a later point, but such tools are generally quite expensive. (To learn more about these companies, you may want to try SweetSearch, and then restrict by the ‘Company’ facet.)

Finally, you might be interested in the open source popularity of these listings. Raphael Volz published a popularity analysis of the earlier 250 tools listing based on SourceForge statistics; very interesting reading! Thanks, Raphael.

Selective v. Comprehensive Listings

I should mention that I have seen some commentary within the semantic Web community of the desirability of compiling “best of” or “Top X” tools listings for the semantic Web. While such lists have their place, they are no substitute for comprehensive listings. First, semantic tools are still in their infancy and it is premature to bestow “best of” in most categories. Second, many practitioners, such as me, are working to extend and improve existing tools. This requires more comprehensive listings, not narrower ones. And, last, what may ultimately contribute to semantic meaning on the Internet may well extend beyond semantic Web tools, strictly defined. An ivory tower focus on purity is not the means to encourage experimentation and innovation. Many Web 2.0 initiatives, including tagging and social collaboration, may very well point to more effective nucleation points for expanding semantic Web efforts than W3C-compliant efforts.

These are some of the reasons that I have been happy to include simple Firefox extensions or relatively narrow format converters for my listings. Who knows? You never know when and where you might find a gem! (And I’m not speaking solely of Ruby!)

Two Ways to Contribute

If you have new tools to add, corrections to current listings, or any other suggestions, you have two ways to contribute. The easiest way is to post a comment to this entry and I will update the listing based on your input. The second way is to access the Google spreadsheet link itself and make changes directly. I will continue to keep this spreadsheet public unless spam proves to be a problem.

Thanks for your interest and Enjoy!

Posted: January 4, 2007

After feeling like I was drowning in too much Campbell’s alphabet vegetable soup, I decided to bite the bullet (how can one bite a bullet while drinking soup?!) and provide an acronym look-up service to this AI3 blog site. So, welcome to the new permanent link to the Acronyms & Glossary shown in my standard main links to the upper left.

This permanent (and sometimes updated) page lists about 350 acronyms related to computer science, IT, semantic Web and information retrieval and processing. Most entries point to Wikipedia, with those that do not instead referencing their definitive standards site.

I welcome any suggestions for new acronyms to be added to this master list. Yabba dabba doo . . . . (oops, I meant YAML YAML ADO).

Posted by AI3's author, Mike Bergman Posted on January 4, 2007 at 10:48 am in Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/313/new-acronym-listing-from-ado-to-yaml/
The URI to trackback this post is: http://www.mkbergman.com/313/new-acronym-listing-from-ado-to-yaml/trackback/
Posted: January 3, 2007

Google Co-op Custom Search Engines (CSEs) Moving Forward at Internet Speed

Since its release a mere two months ago in late October, Google’s custom search engine (CSE) service, built on its Co-op platform, has gone through some impressive refinements and expansions. Clearly, the development team behind this effort is dedicated and capable.

I recently announced the release of my own CSE — SweetSearch — that is a comprehensive and authoritative search engine for all topics related to the semantic Web and Web 2.0. Like Ethan Zuckerman who published his experience in creating a CSE for Ghana in late October, I too have had some issues. Ethan’s first post was entitled, “What Google Coop Search Doesn't Do Well,” posted on October 27. Yet, by November 6, the Google Co-op team had responded sufficiently that Ethan was able to post a thankful update, “Google Fixes My Custom Search Problems.” I’m hoping some of my own issues get a similarly quick response.

Fast, Impressive Progress

It is impressive to note the progress and the removal of some early issues in the last two months. For example, the early limit of 1,000 URLs per CSE has been upped to 5,000 URLs, with wildcard pattern matches extending this limit still further. The initial limit of two languages has now been expanded to most common left-to-right languages (right-to-left languages such as Arabic and Hebrew are still excluded). Many bugs have been fixed. File upload capabilities are quite stable (though not all eventual features are yet supported). The Google Co-op team actively solicits support and improvement comments (http://www.google.com/support/coop/), and the development team’s blog (http://googlecustomsearch.blogspot.com/) has been a welcome and useful addition.

In just a few short weeks, at least 2,100 new CSEs have been created (found by issuing the advanced search query ‘site:http://google.com/coop/cse?cx=’ to Google itself, with cx representing the unique ID key for each CSE). This number is likely low, since newly created or unreleased CSEs do not appear in the results. This growth clearly shows the pent-up demand for vertical search engines and the desire of users to improve authoritativeness and quality. Over time, Google will certainly reap user-driven benefits from these CSEs in its own general search services.
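For those who want to repeat the count, the query can be wrapped into an ordinary Google search URL. What follows is a minimal sketch in Python, not anything supplied by Google: the function name and the optional topic argument are hypothetical, the only thing assumed about Google is its standard http://www.google.com/search?q= endpoint, and the code only builds the URL rather than fetching or counting results.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The site: query from the text; cx= is where each CSE's unique ID key follows.
CSE_SITE_QUERY = "site:http://google.com/coop/cse?cx="


def cse_discovery_url(topic: str = "") -> str:
    """Build a Google search URL for the CSE site: query.

    `topic` is a hypothetical convenience argument that prepends free-text
    terms to narrow the results to a subject area.
    """
    query = f"{topic} {CSE_SITE_QUERY}" if topic else CSE_SITE_QUERY
    # urlencode percent-escapes ':', '/', '?' and '=' so the query survives intact.
    return "http://www.google.com/search?" + urlencode({"q": query})


url = cse_discovery_url("semantic web")
print(url)
# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["q"][0])
```

The escaping matters here because the query itself contains a question mark and an equals sign, which would otherwise be read as part of the search URL's own structure.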

My Pet Issues

So, in the spirit of continued improvement, I offer below my own observations and pet peeves with how the Google CSE service presently works. I know these points will not fall on deaf ears and perhaps other CSE authors may see some issues of their own importance in this listing.

  1. There is a bug in handling “dirty” URLs for results pages. Many standard CMSs or blog software packages, such as WordPress or Joomla!, provide options for both “pretty” URLs (SEO ones that contain title names in the URL string, such as http://www.mydomain.com/2007/jan/most-recent-blog-post.html) and “dirty” ones that label URLs with IDs or sequences after a question mark (such as http://www.mydomain.com/?p=123). Historical “dirty” URLs are often difficult to convert to “pretty” ones. Unfortunately, when results are to be embedded in a local site that uses a “dirty” URL, the Google CSE code truncates the URL at the question mark, which then causes the JavaScript for results presentation to fail (see also this Joomla! link). As Ahmed, one of the Google CSE users, points out, there is a relatively easy workaround for this bug, but you would pull your hair out if you did not know the trick.
  2. Results page font-size control is lacking. Though it is claimed that control is provided for this, it is apparently not possible to control results font sizes without resorting to the Google Ajax search API (see more below).
  3. There is a bug in applying filetype “refinements” to results, such as the advanced Google search operator filetype:pdf. Google Co-op staff acknowledge this as a bug and hopefully this will be corrected soon.
  4. Styling is limited to colors and borders and ad placement locations short of resorting to the Google Ajax search API, and the API itself still lacks documentation or tutorials on how to style results or interactions with the native Google CSS. Admittedly, this is likely a difficult issue for Google since too much control given to the user can undercut its own branding and image concerns. However, Google’s Terms of Service seem to be fairly comprehensive in such protections and it would be helpful to see this documentation soon. There is often reference to the Ajax search API by Google Co-op team members, but unfortunately too little useful online documentation to make this approach workable for mere mortals.
  5. It is vaguely stated that items called “attributes” can be included in CSE results and refinements (such as ‘A=Date’), but the direction is unclear and other forum comments seem to suggest this feature is not yet active. My own attempts show no issues in uploading CSE specifications that include attributes, but they are not yet retained in the actual specification currently used by Google. (Related to this topic is the fact that older forum postings may no longer be accurate as other improvements and bug fixes have been released.)
  6. Yes, there still remains a 5,000 “annotation” limit per CSE, which is the subject of complaint by some CSE authors. I personally have less concern with this limit now that the URL pattern matching has been added. Also, there is considerable confusion about what this “annotation” limit really means. In my own investigations, an “annotation” in fact is equivalent to a single harvest point URL (with or without wildcards) and up to four labels or facets (with or without weighting or comments) for each.
  7. While outside parties are attempting to provide general directory services, Google itself has a relatively poor way of announcing or listing new CSEs. The closest it comes is a posting page (http://groups-beta.google.com/group/google-co-op/web/your-custom-search) or the featured CSE engines (http://google.com/coop/cse/examples/GooglePicks), which are an impressive lot and filled with useful examples. Though there are a number of third parties trying to provide comprehensive directory listings, most have limited coverage.
  8. The best way to get a listing of current CSEs still appears to be using the Google site: query above matched with a topic description, though that approach is not browsable and does not link to CSEs hosted on external sites.

  9. I would like to see expanded support for additional input and export formats, including potentially OPML, microformats or GData itself. The current TSV and XML approaches are nice.
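To make the first item above concrete, here is a small Python sketch of why cutting a “dirty” URL at the question mark is so damaging. It only imitates the symptom described; it is not Google’s code, nor the workaround Ahmed describes, and both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def truncate_at_question_mark(url: str) -> str:
    """Imitate the reported behavior: drop everything from the first '?' on."""
    return url.split("?", 1)[0]


def post_id(url: str):
    """Return the value of the 'p' query parameter, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("p")
    return values[0] if values else None


dirty = "http://www.mydomain.com/?p=123"
truncated = truncate_at_question_mark(dirty)

print(truncated)           # http://www.mydomain.com/
print(post_id(dirty))      # 123
print(post_id(truncated))  # None
```

With a “dirty” URL the page identity lives entirely in the query string, so once it is stripped the results page can no longer be told apart from the site’s home page. A “pretty” URL carries the same information in its path and so passes through unchanged.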

Yet, despite these quibbles, this CSE service is pointing the way to entirely new technology and business models. It is interesting that the Amazon S3 service and Yahoo!’s Developer Network are experimenting with similar Internet and Web service approaches. Let the fun begin!

Posted by AI3's author, Mike Bergman Posted on January 3, 2007 at 2:46 pm in Searching, Site-related | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/311/googles-custom-search-engine-cse-impressive-start-but-some-quibbles-remain/
The URI to trackback this post is: http://www.mkbergman.com/311/googles-custom-search-engine-cse-impressive-start-but-some-quibbles-remain/trackback/