Posted: April 30, 2006

Despite page ranking and other techniques, the scale of the Internet is straining the ability of commercial search engines to deliver truly relevant content.  This observation is not new, but its relevance is growing.  Similarly, the integration and interoperability challenges facing enterprises have never been greater.  One approach to addressing these needs, among others, is to adopt semantic Web standards and technologies.

The image is compelling:  targeted and unambiguous information from all relevant sources, served in usable bite-sized chunks.  It sounds great; why isn’t it happening?

There are clues — actually, reasons — why semantic Web technology is not being embraced on a broad scale.  I have argued elsewhere that enterprises or specific organizations will be the initial adopters and promoters of these technologies.  I still believe that to be the case.  The complexity and lack of a network effect ensure that semantic Web approaches will not initially arise from the public Internet.

Parallels with Knowledge Management

Paul Warren, in “Knowledge Management and the Semantic Web: From Scenario to Technology,” IEEE Intelligent Systems, vol. 21, no. 1, 2006, pp. 53-59, has provided a structured framework for why these assertions make sense.  This February online article is essential reading for anyone interested in semantic Web issues (and it includes a listing of fairly classic references).

If you can get past the first silly paragraphs regarding Sally the political scientist and her research example (perhaps in a separate post I will provide better real-world examples from open source intelligence, or OSINT), Warren actually begins to dissect the real issues and challenges in effecting the semantic Web.  It is this latter two-thirds or so of Warren’s piece that is essential reading.

He does not organize his piece in the manner listed below, but real clues emerge in his repeated pointing to the need for “semi-automatic” methods to make the semantic Web a reality.  Fully a dozen such references are provided.  In second place, relatedly, are multiple references to the need for, or value of, “reasoning algorithms.”  In any case, here are some of the areas Warren notes as needing “semi-automatic” methods:

  • Assign authoritativeness
  • Learn ontologies
  • Infer better search requests
  • Mediate ontologies (semantic resolution)
  • Support visualization
  • Assign collaborations
  • Infer relationships
  • Extract entities
  • Create ontologies
  • Maintain and evolve ontologies
  • Create taxonomies
  • Infer trust
  • Analyze links
  • etc.

These challenges are not listed in order of relevance, but in the order encountered in reading the Warren piece.  Tagging, extracting, classifying and organizing are all pretty intense tasks that certainly cannot be done solely manually while still scaling.
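To make the “semi-automatic” idea concrete, here is a minimal, purely illustrative sketch of one item from the list above, entity extraction. It is not any tool discussed by Warren; the function name and the naive capitalized-phrase heuristic are my own assumptions. The point is the division of labor: the machine proposes candidates cheaply, and a human curator confirms or rejects them.

```python
import re

def extract_entity_candidates(text):
    """Propose capitalized multi-word phrases as candidate entities.

    This is deliberately naive: the regex over-generates, and a human
    reviewer is expected to accept or reject each candidate. That
    'machine proposes, human disposes' loop is what makes the method
    semi-automatic rather than fully automatic."""
    pattern = r"\b(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)+\b"
    return sorted(set(re.findall(pattern, text)))

candidates = extract_entity_candidates(
    "Paul Warren wrote about the Semantic Web for Intelligent Systems."
)
print(candidates)  # ['Intelligent Systems', 'Paul Warren', 'Semantic Web']
```

Real systems would substitute statistical or dictionary-based extractors, but the workflow (automated candidate generation feeding a lighter-weight human review) is the same shape.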

Keep It Simple, Stupid

The lack of “simple” approaches is posited as another reason for the slow adoption of the semantic Web.  In the article “Spread the word, and join it up,” in the April 6 Guardian, SA Matheson reports Tim O’Reilly as saying:

“I completely believe in the long-term vision of the semantic web – that we’re moving towards a web of data, and sophisticated applications that manipulate and navigate that data web.  However, I don’t believe that the W3C semantic web activity is what’s going to take us there…. It always seemed a bit ironic to me that Berners-Lee, who overthrew many of the most cherished tenets of both hypertext theory and SGML with his ‘less is more and worse is better’ implementation of ideas from both in the world wide web, has been deeply enmeshed in a theoretical exercise rather than just celebrating the bottom-up activity that will ultimately result in the semantic web…. It’s still too early to formalise the mechanisms for the semantic web. We’re going to learn by doing, and make small, incremental steps, rather than a great leap forward.”

There is certainly much need for simplicity to encourage voluntary compliance with semantic Web approaches, at least until the broad benefits and network effects of the semantic Web are actually realized. However, simplicity and broad use are but two of the factors limiting adoption; others include incentives, self-interest and rewards.

As Warren points out in his piece:

Although knowledge workers no doubt believe in the value of annotating their documents, the pressure to create metadata isn’t present. In fact, the pressure of time will work in a counter direction. Annotation’s benefits accrue to other workers; the knowledge creator only benefits if a community of knowledge workers abides by the same rules. In addition, the volume of information in this scenario is much greater than in the services scenario. So, it’s unlikely that manual annotation of information will occur to the extent required to make this scenario work. We need techniques for reducing the load on the knowledge creator.

Somehow we keep coming back to tools and automated ways to ease the effort and workflow necessary to put in place all of this semantic Web infrastructure. These aids are no doubt important — perhaps critical — but to my mind this focus still short-changes the most determinant dynamic of semantic Web technology adoption: the imperatives of the loosely-federated, peer-to-peer broader Web v. enterprise adoption.

Oligarchical (Enterprise) Control Precedes the Network Effect

There are some analogies between service-oriented architectures and their associated standards, and the standards contemplated for the semantic Web.  Both are rigorous, prescribed, and meant to be intellectually and functionally complete.  (In fact, the WS-* standards play much the same prescriptive role for SOA that the W3C standards do for the semantic Web.)  The past week has seen some very interesting posts on the tensions between “SOA Versus Web 2.0,” triggered by John Hagel’s post:

. . . a cultural chasm separates these two technology communities, despite the fact that they both rely heavily on the same foundational standard – XML. The evangelists for SOA tend to dismiss Web 2.0 technologies as light-weight “toys” not suitable for the “real” work of enterprises.  The champions of Web 2.0 technologies, on the other hand, make fun of the “bloated” standards and architectural drawings generated by enterprise architects, skeptically asking whether SOAs will ever do real work. This cultural gap is highly dysfunctional and IMHO precludes extraordinary opportunities to harness the potential of these two complementary technology sets.

This theme was picked up by Dion Hinchcliffe, among others.  Dion consistently posts on this topic in his ZDNet Enterprise Web 2.0 and Web 2.0 blogs, and is always a thoughtful read.  In his response to Hagel’s post, Hinchcliffe notes that “… these two cultures are generally failing to cross-pollinate like they should, despite potentially ‘extraordinary opportunities.’”

Kitchen and garage coders playing around with cool mashups while surfing and blogging and posting pictures to Flickr are seen as a different “culture” than the supposedly buttoned-down IT geeks (even if they wear T-shirts or knit shirts).  But, in my experience, these differences have more to do with the claim on time than with different tribes of people.  From a development standpoint, we’re talking about the same people, with the real distinction being whether they are on payroll time or personal time.

I like the graphic that Hinchcliffe offers in discussing the SaaS model in the enterprise and the possibility that it may be the emerging form.  You can take this graphic and say the left-hand side of the diagram is corporate time, the right-hand side personal time.

Web 2.0 Enterprise Directions

I make this distinction because where systems may go is perhaps more usefully looked at in terms of imperatives and opportunities v. some form of “culture” clash.  In the broad Web, there is no control other than broadly-accepted standards; there is no hegemony; there is only what draws attention and can be implemented in a decentralized way.  This impels simpler standards and simpler “loosely-coupled” integrations.  We thus see mashups and simpler Web 2.0 sites like social bookmarking.  The drivers are not “complete” solutions to knowledge creation and sharing, but what is fun, cool and gets buzz.

The corporate, or enterprise, side, on the other hand, has a different set of imperatives and, as importantly, a different set of control mechanisms to set higher and more constraining standards to meet those imperatives.  SOA and true semantic Web standards like RDF-S or OWL can be imposed, because the sponsor can either require them or pay for them.  Of course, this oligarchic control still does not ensure adherence (IT departments were not able to prevent PC adoption 20 years ago), so it is important that productivity tools, workflows and employee incentives also be aligned with the desired outcomes.

So, what we are likely to see, indeed are seeing now, is that more innovation and experimentation in “looser” ways will take place in Web 2.0 by lots of folks, many of them in their personal time away from the office.  Enterprises, on the other hand, will take the near-term lead on more rigorous and semantically-demanding integration and interoperability using semantic Web standards.

Working Both Ends to the Middle

I guess, then, this puts me squarely in the optimists’ camp, where I normally reside.  (I also come squarely from an enterprise perspective, since that is where my company resides.)  I see innovation at an unprecedented level with Web 2.0, mashups and participatory media, matched with effort and focus by leading enterprises to climb the data federation pyramid while dealing with very real and intellectually challenging semantic mediation.  Both ends of this spectrum are right, both will instruct, and therefore both should be monitored closely.

Warren gets it right when he points to prior knowledge management challenges as also informing the adoption challenges for the semantic Web in enterprises:

Currently, the main obstacle for introducing ontology-based knowledge management applications into commercial environments is the effort needed for ontology modeling and metadata creation. Developing semiautomatic tools for learning ontologies and extracting metadata is a key research area…. Having to move out of a user’s typical working environment to ‘do knowledge management’ will act as a disincentive, whether the user is creating or retrieving knowledge…. I believe there will be deep semantic interoperability within organizational intranets. This is already the focus of practical implementations, such as the SEKT (Semantically Enabled Knowledge Technologies) project, and across interworking organizations, such as supply chain consortia. In the global Web, semantic interoperability will be more limited.

My suspicion is that Web 2.0 is the sandbox where the tools, interfaces and approaches will emerge that help overcome these enterprise obstacles.  But we will still look strongly to enterprises for much of the money and the W3C for the standards necessary to make it all happen within semantic Web imperatives.

Posted by AI3's author, Mike Bergman Posted on April 30, 2006 at 7:14 pm in Information Automation, Semantic Web | Comments (1)
Posted: April 21, 2006

I recently posted up a listing and description of 40 social bookmarking sites. Little did I realize what a small tip of the iceberg I was describing!

In commentary to that post, I was directed to Baris Karadogan’s posting of Web 2.0 Companies, which contains a fantastic compilation of 980 specific sites with links. In reviewing user comments, I discovered that the original compiler of this list may be Bob Stumpel of the Everything 2.0 blog. It is hard to establish the provenance of the list, but Bob is active in assembling long lists of updates. Baris’s version is more attractive, with embedded live links.

Bob has updated the master list a number of times, and now shows 1,601 sites as of April 16.

Both of these reference lists have organized the sites into about 70 different categories, a sampling of which shows the diversity and innovation taking place:

  • Audio and video
  • Social bookmarking
  • Venture capital
  • Wish lists
  • Search (biggest category)
  • Images
  • Collaboration
  • Fun and games
  • Etc.

The next useful step for some enterprising soul is to provide more commentary on each site and better describe what is meant by each category. Perhaps someone will step forward with a wiki somewhere.

Likely few of us have the time to look at all of the sites listed. But I am slowly sampling my way through the list, checking out the variety of approaches being taken by clever innovators out there. While some of the links are dead — not unexpected in such a nascent area or with such a long list — I’m also seeing a lot of clever ideas.

This listing is a useful service. If you know of a missing site, please suggest it to one of the two compilation sites. And, I do hope someone takes authoritative ownership of the list and proper attribution is given where appropriate.

Posted by AI3's author, Mike Bergman Posted on April 21, 2006 at 10:40 am in Semantic Web, Semantic Web Tools | Comments (0)
Posted: April 20, 2006

A pre-print from Tim Finin and Li Ding entitled, Search Engines for Semantic Web Knowledge,1 presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs.  The authors base much of their observations on their experience with the Swoogle semantic Web search engine over the past two years.  They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate the statistics on semantic Web size and growth in the paper.

Among other points, the authors note these key differences and challenges from conventional search engines:

  • Harvesting — the need to discriminately discover semantic Web documents and to accurately index their semi-structured components
  • Search — the need for search to cover a broader range than documents in a repository, going from the universal down to the atomic granularity of a triple.  Path tracing and provenance of the information may also be important
  • Rank — results ranking needs to account for the contribution of the semi-structured data, and
  • Archive — more versioning and tracking is needed, since underlying ontologies will surely grow and evolve.

The authors particularly note the challenge of indexing as repositories grow to actual Internet scales.
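The shift from document granularity down to the atomic granularity of a triple can be sketched in a few lines. This is a toy in-memory illustration of triple-pattern matching, not how Swoogle is implemented; the function name, the wildcard convention and the sample facts are all my own assumptions.

```python
def match_triples(triples, s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern.

    None acts as a wildcard. A conventional engine retrieves whole
    documents; a semantic Web engine must answer queries at this
    atomic, structured level."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# A tiny hypothetical knowledge base of (subject, predicate, object) facts.
kb = [
    ("swoogle", "indexes", "rdf_documents"),
    ("rdf_documents", "contain", "triples"),
    ("swoogle", "ranks", "ontologies"),
]

# "Everything known about swoogle" — a query no document index answers directly.
print(match_triples(kb, s="swoogle"))
```

At Internet scale the linear scan above is exactly what breaks down, which is the indexing challenge the authors flag: the store must be indexed by subject, predicate and object simultaneously, across billions of triples.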

Though not noted, I would add to this list the challenge of user interfaces. Only a small percentage of users, for example, use Google’s more complicated advanced search form.  In its full-blown implementation, semantic Web search variations could make the advanced Google form look like child’s play.


1Tim Finin and Li Ding, “Search Engines for Semantic Web Knowledge,” a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp.  A PDF of the paper is available for download.

Posted by AI3's author, Mike Bergman Posted on April 20, 2006 at 2:42 pm in Searching, Semantic Web | Comments (0)
Posted: April 15, 2006

The W3C’s Internationalization Tag Set Working Group has published an updated Working Draft of the Internationalization Tag Set (ITS). Organized by data categories, this set of elements and attributes supports the internationalization and localization of schemas and documents. Implementations are provided for DTDs, XML Schema and Relax NG, and for existing vocabularies like XHTML, DocBook and OpenDocument.
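As a small illustration of what ITS markup looks like in practice, here is a sketch of its Translate data category: content flagged with its:translate="no" is excluded from localization. The namespace URI is the one used in the ITS drafts; the document fragment and the Python consumer below are my own illustrative assumptions, not examples from the Working Draft.

```python
import xml.etree.ElementTree as ET

# Namespace used by the ITS drafts for its attributes.
ITS_NS = "http://www.w3.org/2005/11/its"

# A hypothetical document: the package name must not be translated.
doc = """<text xmlns:its="{ns}">
  <p>Please install the <code its:translate="no">httpd</code> package.</p>
</text>""".format(ns=ITS_NS)

root = ET.fromstring(doc)

# A localization tool would walk the tree and skip protected content.
protected = [el.text for el in root.iter()
             if el.get("{%s}translate" % ITS_NS) == "no"]
print(protected)  # ['httpd']
```

The same pattern extends to ITS's other data categories (terminology, directionality, and so on): generic attributes in one namespace that any XML vocabulary, from XHTML to DocBook, can carry without schema changes.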

Posted by AI3's author, Mike Bergman Posted on April 15, 2006 at 9:32 am in Semantic Web | Comments (0)
Posted: April 10, 2006

On March 14, Tim Berners-Lee returned to Oxford University for a keynote address sponsored by the e-Horizons Institute in affiliation with the Oxford Internet Institute, the Oxford e-Research Centre and the School of Electronics and Computer Science of the University of Southampton. Sponsorship for the presentation was provided by the British Computer Society.

The 100-minute talk, entitled “The Future of the Web,” is available for online viewing or download in a number of different formats. After a slow start, TBL hits his stride, and some of his slides (see this W3C listing) are especially good, particularly in the latter part of the presentation.

The major thrust of the talk is the semantic Web, with attention to why adoption may be perceived as slow and to the social and policy factors affecting it. Berners-Lee cogently recalls that the original Web took about five years to transition from geeks to commercial use, and he predicts the same for the semantic Web. While it is true that the Web itself now colors (or “colours,” depending on your semantics) expectations about the pace of adoption of the semantic Web, I thought this quote from the talk was TBL’s best in looking back to his original Web efforts in 1990:

It was really difficult to explain to people what the Web would be like before the Web. The fact it was so difficult to explain to people what the Web was like before the Web [existed] is now extremely difficult to explain to anybody after the Web.

In other words, like all broadly accepted breakthroughs, after acceptance it is hard to understand what life was like before them or why it was so amazing they were innovated and got adopted in the first place.

Check out this talk. It will re-instill perspective and give you a glimpse as to how constant efforts eventually produce results if the vision is compelling.

An AI3 Jewels & Doubloons Winner

Posted by AI3's author, Mike Bergman Posted on April 10, 2006 at 8:47 pm in Jewels & Doubloons, Semantic Web | Comments (1)