Posted:April 20, 2006

A pre-print from Tim Finin and Li Ding entitled “Search Engines for Semantic Web Knowledge”[1] presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs.  The authors base many of their observations on their experience with the Swoogle semantic Web search engine over the past two years.  They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate the paper's statistics on the size and growth of the semantic Web.

Among other points, the authors note these key differences and challenges from conventional search engines:

  • Harvesting – the need to discriminatingly discover semantic Web documents and to accurately index their semi-structured components
  • Search – the need for search to cover a broader range than documents in a repository, going from the universal to the atomic granularity of a triple.  Path tracing and provenance of the information may also be important
  • Rank – results ranking needs to account for the contribution of the semi-structured data, and
  • Archive – more versioning and tracking are needed since underlying ontologies will surely grow and evolve.
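
To make the "atomic granularity of a triple" and provenance points concrete, here is a minimal sketch in Python. It is not how Swoogle works; the triples, the `ex:` names, the example.org document URLs and the helper functions are all hypothetical, chosen only to show search that returns individual triples along with the document each one came from.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), each tagged with the
# document it was harvested from so that provenance can be traced.
TRIPLES = [
    ("ex:Swoogle", "rdf:type", "ex:SearchEngine", "http://example.org/doc1.rdf"),
    ("ex:Swoogle", "ex:indexes", "ex:RDFDocument", "http://example.org/doc1.rdf"),
    ("ex:Finin", "ex:authorOf", "ex:Paper", "http://example.org/doc2.rdf"),
]


def build_index(triples):
    """Index every triple by each of its three terms."""
    index = defaultdict(set)
    for i, (s, p, o, _source) in enumerate(triples):
        for term in (s, p, o):
            index[term].add(i)
    return index


def find(triples, index, term):
    """Return (triple, source document) pairs that mention the term."""
    return [(triples[i][:3], triples[i][3]) for i in sorted(index.get(term, ()))]


INDEX = build_index(TRIPLES)
for triple, source in find(TRIPLES, INDEX, "ex:Swoogle"):
    print(triple, "from", source)
```

A document-level engine would return only doc1.rdf here; the sketch returns the two matching statements themselves, each with its source, which is the kind of result the Search point above calls for.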

The authors particularly note the challenge of indexing as repositories grow to actual Internet scales.

Though not noted, I would add to this list the challenge of user interfaces. Only a small percentage of users, for example, use Google’s more complicated advanced search form.  In its full-blown implementation, semantic Web search variations could make the advanced Google form look like child’s play.


[1] Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge," a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp.  A PDF of the paper is available for download.

Posted by AI3's author, Mike Bergman Posted on April 20, 2006 at 2:42 pm in Searching, Semantic Web | Comments (0)
The URI link reference to this post is:
The URI to trackback this post is:
Posted:April 15, 2006

The W3C’s Internationalization Tag Set Working Group has published an updated Working Draft of the Internationalization Tag Set (ITS). Organized by data categories, this set of elements and attributes supports the internationalization and localization of schemas and documents. Implementations are provided for DTDs, XML Schema and RELAX NG, and for existing vocabularies like XHTML, DocBook and OpenDocument.

Posted by AI3's author, Mike Bergman Posted on April 15, 2006 at 9:32 am in Semantic Web | Comments (0)
Posted:April 10, 2006

On March 14, Tim Berners-Lee returned to Oxford University for a keynote address sponsored by the e-Horizons Institute in affiliation with the Oxford Internet Institute, the Oxford e-Research Centre and the School of Electronics and Computer Science of the University of Southampton. Sponsorship for the presentation was provided by the British Computer Society.

The 100-min talk entitled, “The Future of the Web,” is available for online viewing or download via a number of different formats. After a slow start, TBL hits his stride and some of his slides (see this W3C listing) are especially good, particularly in the latter part of the presentation.

The major thrust of the talk is on the semantic Web, with attention to why adoption may be perceived as slow and to the social and policy factors affecting it. Berners-Lee cogently recalls that the original Web took about five years to move from geeks to commercial use, and he predicts the same for the semantic Web. While it is true we now have the phenomenon of the Web coloring (or “colouring” depending on your semantics) expectations about the pace of adoption of the semantic Web, I thought this quote from the talk was the best by TBL in looking back to his original Web efforts in 1990:

It was really difficult to explain to people what the Web would be like before the Web. The fact it was so difficult to explain to people what the Web was like before the Web [existed] is now extremely difficult to explain to anybody after the Web.

In other words, as with all broadly accepted breakthroughs, once they are accepted it is hard to understand what life was like before them, or why it was so remarkable that they were invented and adopted in the first place.

Check out this talk. It will re-instill perspective and give you a glimpse as to how constant efforts eventually produce results if the vision is compelling.

An AI3 Jewels & Doubloons Winner

Posted by AI3's author, Mike Bergman Posted on April 10, 2006 at 8:47 pm in Jewels & Doubloons, Semantic Web | Comments (1)
Posted:April 3, 2006


There’s a lot going on with Web 2.0 in "social" computing, some of it with implications for my own primary interest in the semantic Web.  Indeed, though all of us can link to Wikipedia for definitions, I doubt that most of us, beyond first checking that source, would agree with how Wikipedia defines Web 2.0.  That’s OK.

Nonetheless, we can see there IS something going on in the nexus of new interoperable Web standards with collaboration and application frameworks specifically geared to shared experiences and information.  I think we can all agree that Web 2.0 is meant to achieve that, and that "social bookmarking" is one of the foundational facets of the phenomenon.

As with other things that take you off on tangents while pursuing research over a weekend, I’m actually not sure what got me trying to track down and understand "social bookmarking."  But track it down I did, and this post is the result of my cruising through the byways of Web 2.0 driving a "social bookmarks" roadster.

Quick Intro to Social Bookmarks

According to Wikipedia, a "social bookmark" is a:

. . . web based services where shared lists of user-created Internet bookmarks are displayed. Social bookmarking sites generally organize their content using tags and are an increasingly popular way to locate, classify, rank, and share Internet resources . . . The concepts of social bookmarking and tagging took root with the launch of a web site called in approximately 2003.

Often, [social bookmark] lists are publicly accessible, although some social bookmarking systems allow for privacy on a bookmark by bookmark basis. They [may] also categorize their resources by the use of informally assigned, user-defined keywords or tags (from a folksonomy). Most social bookmarking services allow users to search for bookmarks which are associated with given "tags", and rank the resources by the number of users which have bookmarked them . . . as people bookmark resources that they find useful, resources that are of more use are bookmarked by more users. Thus, such a system [can] "rank" a resource based on its perceived utility.

Since the classification and ranking of resources is a continuously evolving process, many social bookmarking services [may also] allow users to subscribe to syndication feeds or collections of tag terms. This allows subscribers to become aware of new resources for a given topic, as they are noted, tagged, and classified by other users. There are drawbacks to such tag-based systems as well: no standard set of keywords, no standard for the structure of such tags, mistagging, etc. . . . . The separate (but related) tagging and social bookmarking services are, however, evolving rapidly, and these shortcomings will likely either be addressed in the near future or shown not to be relevant to these services.
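
The ranking idea in that description (search by tag, then order resources by how many users bookmarked them) can be sketched in a few lines of Python. This is only an illustration under assumed data: the users, URLs, tags and the `rank_by_tag` helper are hypothetical and do not reflect how any particular service implements it.

```python
from collections import defaultdict

# Hypothetical bookmark records: (user, url, tags).
BOOKMARKS = [
    ("alice", "http://example.org/rdf-primer", {"semanticweb", "rdf"}),
    ("bob", "http://example.org/rdf-primer", {"rdf"}),
    ("carol", "http://example.org/owl-guide", {"semanticweb", "owl"}),
    ("alice", "http://example.org/owl-guide", {"owl"}),
    ("dave", "http://example.org/rdf-primer", {"semanticweb"}),
]


def rank_by_tag(bookmarks, tag):
    """Rank URLs carrying the tag by how many distinct users bookmarked them."""
    users_by_url = defaultdict(set)
    tagged = set()
    for user, url, tags in bookmarks:
        users_by_url[url].add(user)  # every bookmark counts toward popularity
        if tag in tags:
            tagged.add(url)  # at least one user applied this tag to the URL
    counts = [(url, len(users_by_url[url])) for url in tagged]
    # Most-bookmarked first; ties are broken alphabetically by URL.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


for url, n_users in rank_by_tag(BOOKMARKS, "semanticweb"):
    print(n_users, url)
```

Note the design choice: popularity counts every user who saved the URL, not just those who used the searched tag. A service could just as reasonably count only the taggers, which would rank niche resources differently.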

The idea of experts and interested individuals sharing their discoveries and passions is clearly compelling.  What has been most interesting in the development of "social bookmarking" software and services on the Web has been the assumptions underlying how those objectives can best be achieved.

Of course, the most powerful concept underlying all of this stuff has been the ideal of "community."  We now face (or rather, have the opportunity for) electronic tribes and all that means in breaking former bounds of space and circumstance.  Truly, the prospect of finding effective means for identification, assembly, consensus-building, and sharing within meaningful communities is breathtaking.

Listing of Social Bookmarking Services

To get a handle on the state of the art, I began assembling a list of social bookmark and closely related services from various sources.  I’ve found about 40 of them, which may mean there are on the order of 50 or so extant.  The icons and links below show these 40 or so sites, with a bit of explanation on each:

43Things – this site is geared for individuals to share activity lists, ambitions or "things to do" with one another.

Backflip – this is a bookmark recollection and personal search space and directory. It has been named a top 100 site by PC Magazine.

blinkbits – this is a social bookmarking site that has about 16,000 "blinks" or topic folders.

BlinkList – this site also allows bookmarks to be filtered by friends and collaborators.

Bloglines – beyond a simple social bookmark service, this site more importantly provides an RSS feeder and aggregator; owned by Ask Jeeves.

Blogmarks – there is not much background info on this site; it is somewhat better designed but offers typical social bookmark services.

CiteULike – this site is geared toward academics and the sharing of paper references and links. Many references are to subscription papers. Generally, all submissions have an edited abstract and pretty accurate tags provided.

Connotea – while the functionality of this site is fairly standard for social bookmarking and activity is lower than some other sites, Connotea has a specific emphasis on technical, research, and academic topics that may make it more attractive to that audience.

– this site is the granddaddy of social bookmark services, adds tagging support, and is the first to use a very innovative URL. Among all the sites herein, this one probably has the greatest activity and number of listings.

– this site is now being combined with

Digg – the Digg service is similar to others on this listing by providing social bookmarking, voting and popularity, and user control of listings, etc. It has received some buzz in the blog community.

Fark – while this site has aspects of social bookmarking, it is definitely more inclined to be edgy and current.

Findory – geared toward news and blogs aggregation.

Flickr – the largest and best known of the photo sharing and bookmarking sites; owned by Yahoo.

Furl – this site, part of LookSmart, has what you would expect from a bucks-backed site, but seems pretty vanilla with respect to social bookmarking capabilities.

Hyperlinkomatic – a beta service from the UK that has ceased accepting new users.

Jots – a small, and not notably distinguished, social bookmark site.

Kinja – this is a blog bookmarking and aggregation service.

Linkroll – this is a relatively low-key service, modeled to a great extent on

Lookmarks – this is a social bookmarking service with tags, sharing, search and popular lists, with images and music/video sharing as well.

Ma.gnolia – this service is a fairly standard social bookmarking site.

Maple – this is a fairly standard social bookmarking service, small with about 5,500 users, that uses Ruby on Rails.

Netvouz – this service is a fairly standard social bookmarking service that also provides tags.

Oyax – this is another fairly standard online bookmarks manager.

RawSugar – this site has most of the standard social bookmarking features, but differentiates by adding various user-defined directory structures.

Reddit – the site has recently gotten some buzz due to a voting feature that moves topic rankings up or down based on user feedback; other aspects of the site are fairly vanilla.

Rojo – this is a very broad RSS feed reader with hundreds of sources, to which you may add your own. It allows you to organize feeds by tags, share your feeds via an address book, and it tracks and ranks what you view most often.  This site has been getting quite a bit of buzz.

Scuttle – this is a fairly standard social bookmarking site with low traffic.

Shadows – this social bookmark site is attractively designed and adds a different wrinkle by letting any given topic or document have its own community discussion page.

Shoutwire – this site adds community feedback and collaboration to a "standard" RSS news feeder and aggregator.

Smarking – this site is a fairly standard social bookmarking site.

Spurl – this site is a fairly standard social bookmarking site.

Squidoo – this site is different from other social bookmarking services in that it lets you create a page on your topic of choice (called a lens) where you add links, text, pictures and other pieces of content. Each lens is tagged.

Start – an experimental Microsoft personalized home page service, powered by Ajax; capabilities and direction are still unclear.

TailRank – this site allows about 50,000 blogs to be monitored in a fairly standard social bookmarking manner.

Unalog – this is a fairly standard social bookmarking site.

Wink – this service is both a social bookmarker and a search engine to other online resources such as and digg.

Wists – this is a social bookmarking site geared to sharing shopping links and sites.

Yahoo’s MyWeb – this is the personalized entry portal for Yahoo!, including bookmarking and many specialty feeds and customization.

Zurpy – this social bookmark service is in a pre-launch phase.

General Observations

I personally participate in a couple of these services, notably Bloglines and Rojo.  Some of what I have discovered will compel me to try some others.

In testing out and assembling this list, however, I do have some general observations:

  • Most sites are repeats or knock-offs of the original.  While some offer prettier presentation and images, functionality is nearly identical.  These are what I refer to as the "fairly vanilla" or standard sites above
  • Systems that combine bookmarking with tagging and directory presentations seem most useful (at least to me) for the long haul.  Also of interest are those sites that focus on narrower and more technical communities (e.g., Connotea, CiteULike).  
  • Virtually all sites had poor search capabilities, particularly in advanced search or operator support, and were not taking full advantage of the tagging structure in their listings
  • Development of directory and hierarchical structures is generally poor, with little useful depth or specificity.  This may improve as use grows, as it has in Wikipedia, but limits real expert use at present, and
  • Thus, paradoxically, while the sites and services themselves in their current implementation are very helpful for initial discovery, they are of little or no use for expert discovery or knowledge discovery

I suspect most of these limitations will be overcome over time, and perhaps very shortly at that.  Technology certainly does not appear to be the limiting factor; rather, it is the need for scale of use and the network effect.

Can We Get to Web 2.0 by Adding Multiple 0.05s?

Another paradox is that while these sites help promote the concept of community, they seem to work to actually fragment communities.  There’s much competition at present for many of the same people trying to do the same social things and collaboration.  One way to look at 40 sites trying to achieve Web 2.0 is that each site only contributes Web 0.05.

Specific innovative communities on the Web such as biologists, physicists, librarians and the like will be some of the most successful in leveraging these technologies for community growth and sharing.  In other communities, competition will certainly winnow the field down to only a few survivors.

The older, centrally imposed means for communities to determine "authoritativeness" — be it peer review, library purchasing decisions, societal recognition or reputation, publisher selection decisions, citation indexes, etc. — do not easily apply to the distributed, chaotic Internet.  What others in your community find of value, and thus choose to bookmark and share, is one promising mechanism to bring some semblance of authoritativeness to the medium.  Of course, for this truly to work, there must be trust and respect within the communities themselves.

I think we should see within the foreseeable future a standard set of functionalities — submitting, ranking, organizing, searching, commentating, collaborating, annotating, exporting, importing, and self-policing — that will allow these community sites to become go-to "infohubs" for their users.  These early social bookmarking services look to be the nucleates that will condense stronger and diverse communities of interest on the Web. Let the maturation begin.

Posted:April 1, 2006

I just came across a pretty neat site and service for creating vertical search engines of your choosing.  Called a ‘swicki,’ the service and capability is provided by Eurekster, a company founded about two years ago around the idea of personalized and social search.  The ‘swicki’ implementation was first released in November 2005.


[Embedded swicki search form by Michael K Bergman. This is a swicki – a search engine that learns from the search behavior of your community.]

NOTE: As you conduct searches using the form above, you will be taken from my blog to the swicki site. To return, simply use your browser back button.

What in Bloody Hell is a Swicki?

According to the company:

Swickis are a new kind of search engine or search results aggregator. Swickis allow you to build specific searches tailored to your interests and that of your community and get constantly updated results from your web or blog page. Swickis scan all the data indexed in Yahoo Search, plus all additional sources you specify, and present the results in a dynamically updated, easy to use format that you can publish on your site – or use at We also collect and organize information about all public swickis in our Directory. Whether you have built a swicki or not, you can come to the swicki directory and find swicki search engines that interest you.

Swickis are like wikis in that they are collaborative. Not only does your swicki use Eurekster technology to weight searches based on the behavior of those who come to your site, in the future, your community – if you allow them – can actively collaborate to modify and focus the results of the search engine. . . . Every click refines the swicki’s search strings, creating a responsive, dynamic result that’s both customized and highly relevant.
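
Eurekster does not say here how clicks are actually turned into weights, so the following Python sketch is only one plausible reading of "every click refines" the results. The `ClickRanker` class, its `boost` parameter, the example.org URLs and the base scores are all hypothetical; the point is simply that community clicks for a query can be added to a base relevance score before sorting.

```python
from collections import Counter


class ClickRanker:
    """Re-rank base search results using community click counts."""

    def __init__(self, boost=1.0):
        self.clicks = Counter()  # (query, url) -> number of clicks seen
        self.boost = boost  # hypothetical weight given to each click

    def record_click(self, query, url):
        """Remember that a visitor clicked this result for this query."""
        self.clicks[(query, url)] += 1

    def rerank(self, query, results):
        """results: list of (url, base_score); higher scores rank first."""
        scored = [
            (url, score + self.boost * self.clicks[(query, url)])
            for url, score in results
        ]
        return [url for url, _ in sorted(scored, key=lambda pair: -pair[1])]


ranker = ClickRanker()
base = [("http://example.org/a", 2.0), ("http://example.org/b", 1.5)]
ranker.record_click("rdf", "http://example.org/b")
print(ranker.rerank("rdf", base))  # b now outranks a for the query "rdf"
print(ranker.rerank("owl", base))  # no clicks yet, so the base order holds
```

Because clicks are stored per query, feedback on one search does not disturb the ordering of unrelated searches, which seems in keeping with the community-specific focus the company describes.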

A 10 Minute Set-up 

I first studied the set-up procedure and then gathered some information before I began my own swicki.  Overall the process was pretty straightforward and took me about 10 minutes.  You begin the process on the Eurekster swicki home page.

  • Step 1:  You begin by customizing how you want the swicki to look — wide or narrow, long or short, and font sizes and a choice of about twenty background and font color combinations. I thought these customization options were generally the most useful ones and the implementation pretty slick
  • Step 2:  You "train" your search (actually, you just specify useful domains and URLs and excluded ones).  Importantly, you give the site some keywords or phrases to qualify the final results accepted for the site.  One nice feature is the option to include or exclude blog content or the content of your existing Web site
  • Step 3:  You then provide a short description for the site and assign it to existing subject categories.  Code is generated at this last step that is simple to insert into your Web site or blog, with some further explanations for different blog environments.

You are then ready to post the site and make it available to collaborative feedback and refinement.  You can also choose to include ads on the site or look to other means to monetize it should it become popular.

If it is a public site, your swicki is then listed on the Eurekster directory; as of this posting, there were about 2,100 listed swickis (more on that in a later post).

For business or larger site complexes, there are also paid versions building from this core functionality.

SWISHER:  Giving it My Own Test Drive

I have been working in the background for some time on an organized subject portal and directory for this blog called SWISHer — for Semantic Web, Interoperability, Standards and HTML.  (Much more is to be provided on this project at a later time.)  Since it is intended to be an expert’s repository of all relevant Web documents, the SWISHer acronym is apparent.

One of the things that you can do with the Eurekster swicki is run a direct head-to-head comparison of results with Google.  That caused me to think that it would also be interesting when I release my own SWISHer site to compare it with the swicki and with Google.  Thus, the subject of my test swicki was clear.

Since I know the semantic Web reference space pretty well, I chose about 75 key starting URLs to use as the starting "training" set for the swicki.

This first version of SWISHer as a swicki site, with its now-embedded generated code, is thus what appears above.  In use it indicates links to about 400,000 results, though the search function is pretty weak and it is difficult to use some of my standard tricks to ascertain the actual number of documents in the available index.

To see the swicki site in action, either go to, click on the SWISHer title, or enter your search in the form above and click search.

Now that it is installed, I’m taking these capabilities for a longer road trip.  The test drive was fun; let’s see how it handles over rough terrain and covering real distances.  I’ll post impressions in a day or so.