Posted: April 30, 2006

Despite page ranking and other techniques, the scale of the Internet is straining the ability of commercial search engines to deliver truly relevant content.  This observation is not new, but its relevance is growing.  Similarly, the integration and interoperability challenges facing enterprises have never been greater.  One approach to address these needs, among others, is to adopt semantic Web standards and technologies.

The image is compelling:  targeted and unambiguous information from all relevant sources, served in usable bit-sized chunks.  It sounds great; why isn’t it happening?

There are clues (actually, reasons) why semantic Web technology is not being embraced in a broad-scale way.  I have spoken elsewhere as to why enterprises or specific organizations will be the initial adopters and promoters of these technologies.  I still believe that to be the case.  The complexity and lack of a network effect ensure that semantic Web stuff will not initially arise from the public Internet.

Parallels with Knowledge Management

Paul Warren, in  “Knowledge Management and the Semantic Web: From Scenario to Technology,” IEEE Intelligent Systems, vol. 21, no. 1, 2006, pp. 53-59, has provided a structured framework for why these assertions make sense.  This February online article is essential reading for anyone interested in semantic Web issues (and has a listing of fairly classic references).

If you can get past the first silly paragraphs regarding Sally the political scientist and her research example (perhaps in a separate post I will provide better real-world examples from open source intelligence, or OSINT), Warren actually begins to dissect the real issues and challenges in effecting the semantic Web.  It is this latter two-thirds or so of Warren’s piece that is essential reading.

He does not organize his piece in the manner listed below, but real clues emerge in the repeated pointing to the need for “semi-automatic” methods to make the semantic Web a reality.  Fully a dozen such references are provided.  Relatedly, in second place, are multiple references to the need or value of “reasoning algorithms.”  In any case, here are some of the areas noted by Warren needing “semi-automatic” methods:

  • Assign authoritativeness
  • Learn ontologies
  • Infer better search requests
  • Mediate ontologies (semantic resolution)
  • Support visualization
  • Assign collaborations
  • Infer relationships
  • Extract entities
  • Create ontologies
  • Maintain and evolve ontologies
  • Create taxonomies
  • Infer trust
  • Analyze links
  • etc.

These challenges are not listed in order of relevance, but as encountered in reading the Warren piece.  Tagging, extracting, classifying and organizing are all pretty intense tasks that certainly cannot be done solely manually while still scaling.

Keep It Simple, Stupid

The lack of “simple” approaches is posited as another reason for slow adoption of the semantic Web.  In the article “Spread the word, and join it up,” in the April 6 Guardian, SA Matheson reports Tim O’Reilly as saying:

“I completely believe in the long-term vision of the semantic web – that we’re moving towards a web of data, and sophisticated applications that manipulate and navigate that data web.  However, I don’t believe that the W3C semantic web activity is what’s going to take us there….It always seemed a bit ironic to me that Berners-Lee, who overthrew many of the most cherished tenets of both hypertext theory and SGML with his ‘less is more and worse is better’ implementation of ideas from both in the world wide web, has been deeply enmeshed in a theoretical exercise rather than just celebrating the bottom-up activity that will ultimately result in the semantic web…..It’s still too early to formalise the mechanisms for the semantic web. We’re going to learn by doing, and make small, incremental steps, rather than a great leap forward.”

Simplicity is certainly needed to encourage voluntary compliance with semantic Web approaches, at least until the broad benefits of the semantic Web and its network effects provide their own rewards. However, simplicity and broad use are only two of the factors limiting adoption; others include incentives, self-interest and rewards.

As Warren points out in his piece:

Although knowledge workers no doubt believe in the value of annotating their documents, the pressure to create metadata isn’t present. In fact, the pressure of time will work in a counter direction. Annotation’s benefits accrue to other workers; the knowledge creator only benefits if a community of knowledge workers abides by the same rules. In addition, the volume of information in this scenario is much greater than in the services scenario. So, it’s unlikely that manual annotation of information will occur to the extent required to make this scenario work. We need techniques for reducing the load on the knowledge creator.

Somehow we keep coming back to the tools and automated ways to ease the effort and workflow necessary to put in place all of this semantic Web infrastructure. These aids are no doubt important, perhaps critical, but to my mind this focus still shortchanges the most determinant dynamic of semantic Web technology adoption: the imperatives of the loosely-federated, peer-to-peer broader Web v. enterprise adoption.

Oligarchical (Enterprise) Control Precedes the Network Effect

There are some analogies between service-oriented architectures and their associated standards, and the standards contemplated for the semantic Web.  Both are rigorous, prescribed, and meant to be intellectually and functionally complete.  (In fact, most of the WS** standards are specific SOA ones for the semantic Web.)  The past week has seen some very interesting posts on the tensions between “SOA Versus Web 2.0,” triggered by John Hagel’s post:

. . . a cultural chasm separates these two technology communities, despite the fact that they both rely heavily on the same foundational standard – XML. The evangelists for SOA tend to dismiss Web 2.0 technologies as light-weight “toys” not suitable for the “real” work of enterprises.  The champions of Web 2.0 technologies, on the other hand, make fun of the “bloated” standards and architectural drawings generated by enterprise architects, skeptically asking whether SOAs will ever do real work. This cultural gap is highly dysfunctional and IMHO precludes extraordinary opportunities to harness the potential of these two complementary technology sets.

This theme was picked up by Dion Hinchcliffe, among others.  Dion consistently posts on this topic in his ZDNet Enterprise Web 2.0 and Web 2.0 blogs, and is always a thoughtful read.  In his response to Hagel’s post, Hinchcliffe notes “… these two cultures are generally failing to cross-pollinate like they should, despite potentially ‘extraordinary opportunities.’”

Supposedly, kitchen and garage coders playing around with cool mashups while surfing and blogging and posting pictures to Flickr are seen as a different “culture” than supposedly buttoned-down IT geeks (even if they wear T-shirts or knit shirts).  But, in my experience, these differences have more to do with the claim on time than the fact we are talking about different tribes of people.  From a development standpoint, we’re talking about the same people, with the real distinction being whether they are on payroll time or personal time.

I like the graphic that Hinchcliffe offers where he is talking about the SaaS model in the enterprise and the fact it may be the emerging form.  You can take this graphic and say the left-hand side of the diagram is corporate time, the right-hand side personal time.

Web 2.0 Enterprise Directions

I make this distinction because where systems may go is perhaps more useful to look at in terms of imperatives and opportunities v. some form of “culture” clash.  In the broad Web, there is no control other than broadly-accepted standards, there is no hegemony, there is only what draws attention and can be implemented in a decentralized way.  This impels simpler standards, and simpler “loosely-coupled” integrations.  We thus see mashups and simpler Web 2.0 sites like social bookmarking.   The  drivers are not “complete” solutions to knowledge creation and sharing, but what is fun, cool and gets buzz.

The corporate, or enterprise, side, on the other hand, has a different set of imperatives and, as importantly, a different set of control mechanisms to set higher and more constraining standards to meet those imperatives.  SOA and true semantic Web standards like RDF-S or OWL can be imposed, because the sponsor can either require it or pay for it.  Of course, this oligarchic control still does not ensure adherence, just as IT departments were not able to prevent PC adoption 20 years ago, so it is important that productivity tools, workflows and employee incentives also be aligned with the desired outcomes.

So, what we are likely to see, indeed are seeing now, is that more innovation and experimentation in “looser” ways will take place in Web 2.0 by lots of folks, many of them in their personal time away from the office.  Enterprises, on the other hand, will take the near-term lead on more rigorous and semantically-demanding integration and interoperability using semantic Web standards.

Working Both Ends to the Middle

I guess, then, this puts me squarely in the optimists’ camp, where I normally reside.  (I also come squarely from an enterprise perspective since that is where my company resides.)  I see innovation at an unprecedented level with Web 2.0, mashups and participatory media, matched with effort and focus by leading enterprises to climb the data federation pyramid while dealing with very real and intellectually challenging semantic mediation.  Both ends of this spectrum are right, both will instruct, and therefore both should be monitored closely.

Warren gets it right when he points to prior knowledge management challenges as also informing the adoption challenges for the semantic Web in enterprises:

Currently, the main obstacle for introducing ontology-based knowledge management applications into commercial environments is the effort needed for ontology modeling and metadata creation. Developing semiautomatic tools for learning ontologies and extracting metadata is a key research area….Having to move out of a user’s typical working environment to ‘do knowledge management’ will act as a disincentive, whether the user is creating or retrieving knowledge…. I believe there will be deep semantic interoperability within organizational intranets. This is already the focus of practical implementations, such as the SEKT (Semantically Enabled Knowledge Technologies) project, and across interworking organizations, such as supply chain consortia. In the global Web, semantic interoperability will be more limited.

My suspicion is that Web 2.0 is the sandbox where the tools, interfaces and approaches will emerge that help overcome these enterprise obstacles.  But we will still look strongly to enterprises for much of the money and the W3C for the standards necessary to make it all happen within semantic Web imperatives.

Posted by AI3's author, Mike Bergman Posted on April 30, 2006 at 7:14 pm in Information Automation, Semantic Web | Comments (1)
The URI link reference to this post is:
The URI to trackback this post is:
Posted: April 29, 2006

In a recent posting from the Signal online magazine, Robert Ackerman provides a fascinating overview of the mission and challenges of the new Open Source Center at the Office of the Director of National Intelligence.  Signal is published by AFCEA, the Armed Forces Communications and Electronics Association.

Posted by AI3's author, Mike Bergman Posted on April 29, 2006 at 10:31 am in OSINT (open source intel) | Comments (0)
Posted: April 25, 2006

When I worked at the American Public Power Association in the 1980s, one of the most coveted technical awards to a member was the "Seven Hats Award."  All of our APPA members were municipal electric systems, most of which were in small towns or counties.  The Seven Hats Award was granted to the municipal electric director who was deemed to have done the best job juggling other municipal responsibilities.  Other hats this person might wear were manager of the sewer, water, phone, cable, local swimming pool, parks & recreation, cemetery, you name it.  A Seven Hats Award winner was truly the jack of all trades in his local community (I’m not being sexist; all winners in my era were men).

Well, now I’m in a venture-backed company, and there are also many hats.  Here are my own hats, confusing as they are:

  • Board member and chairman
  • Management
  • Founder
  • Investor (multiple series)
  • Licensor (of starting technology)

Some of these roles have legal and fiduciary responsibilities, others represent self interest.  In fact, of course, it is not unusual for starting companies that are not publicly held to have many individuals with such multiple roles.

Management and communications in such environments are tricky matters.  I can say, and often do, "I’m now wearing this hat," and I expect the audience to understand this role and the differences in viewpoint that "hat" has brought.  But has it really?  And did the listener really understand the different "hat" nuance?

Well, no, and, of course not.  And that raises the rub.

Small companies cannot automatically "gin" up new perspectives and new people from thin air to represent the multitude of viewpoints.  There is NOT a single small start-up company in existence that does not have key individuals playing multiple roles and wearing multiple hats.  Frankly, on the face of it, there is nothing inherently wrong with this situation and there is likely nothing that can be done to change it for small companies anyway.

Yet, the diversity of interests still remains and is ALSO unavoidable. So, in my experience, saying we are now wearing a different hat is not sufficient alone to set expectations and trust.

‘Multiple hats’ does not combine well with sentiments such as "my word is my bond."   As start-ups grow up, what is written in agreements becomes the touchpoints.  Handshakes are great, a signal of initial trust, a hopeful way that points toward success, but everything in the end must be codified and enforceable.

So, start-up denizens, let me give you a piece of advice:  Companies are not people.  They are legal entities that represent a codified way of representing the rights and ownership of various interests.  It’s great to have friends, family, or whatever that you want involved with your venture.  But for your sake and theirs, make sure the understood relationship going in is not dancing ’round the Maypole, but the legal contract.  That is the only way to work out expectations and relations in advance, and the clearest way to not worry about what hat you’re wearing as success and complexity come to the fore.

Posted by AI3's author, Mike Bergman Posted on April 25, 2006 at 10:14 pm in Software and Venture Capital | Comments (0)
Posted: April 21, 2006

I recently posted up a listing and description of 40 social bookmarking sites. Little did I realize what a small tip of the iceberg I was describing!

In commentary to that post, I was directed to Baris Karadogan’s posting of Web 2.0 Companies, which contains a fantastic compilation of 980 specific sites with links. I began reviewing user comments and discovered that the original compiler of this list is possibly Bob Stumpel of the Everything 2.0 blog. It is hard to establish the provenance of the list, but Bob is active in assembling long lists of updates. Baris’ version is more attractive, with embedded live links.

Bob has updated the master list a number of times, and now shows 1,601 sites as of April 16.

Both of these reference lists have organized the sites into about 70 different categories, a sampling of which shows the diversity and innovation taking place:

  • Audio and video
  • Social bookmarking
  • Venture capital
  • Wish lists
  • Search (biggest category)
  • Images
  • Collaboration
  • Fun and games
  • Etc.

The next useful step for some enterprising soul is to provide more commentary on each site and better describe what is meant by each category. Perhaps someone will step forward with a wiki somewhere.

Likely few of us have the time to look at all of the sites listed. But I am slowly sampling my way through the list, checking out the variety of approaches being taken by clever innovators out there. While some of the links are dead (not unexpected in such a nascent area or with such a long list), I’m also seeing a lot of clever ideas.

This listing is a useful service. If you know of a missing site, please suggest it to one of the two compilation sites. And, I do hope someone takes authoritative ownership of the list and proper attribution is given where appropriate.

Posted by AI3's author, Mike Bergman Posted on April 21, 2006 at 10:40 am in Semantic Web, Semantic Web Tools | Comments (0)
Posted: April 20, 2006

A pre-print from Tim Finin and Li Ding entitled, Search Engines for Semantic Web Knowledge,1 presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs.  The authors base many of their observations on their experience with the Swoogle semantic Web search engine over the past two years.  They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate the paper's statistics on the size and growth of the semantic Web.

Among other points, the authors note these key differences and challenges from conventional search engines:

  • Harvesting: the need to selectively discover semantic Web documents and to accurately index their semi-structured components
  • Search: the need for search to cover a broader range than documents in a repository, going from the universal to the atomic granularity of a triple.  Path tracing and provenance of the information may also be important
  • Rank: results ranking needs to account for the contribution of the semi-structured data, and
  • Archive: more versioning and tracking is needed since underlying ontologies will surely grow and evolve.

The authors particularly note the challenge of indexing as repositories grow to actual Internet scales.
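To make the "atomic granularity of a triple" and the provenance point a bit more concrete, here is a minimal Python sketch of an in-memory index that can answer both triple-level and document-level queries. It is purely illustrative and is not how Swoogle works; the class name TinyTripleIndex, its methods, and the example URLs and terms in the usage note are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


class TinyTripleIndex:
    """Toy index (hypothetical): remembers which documents assert each triple."""

    def __init__(self) -> None:
        # Provenance map: triple -> set of source document URLs.
        self._sources: dict[Triple, set[str]] = defaultdict(set)

    def harvest(self, doc_url: str, triples: list[Triple]) -> None:
        """Record every triple found in one harvested document."""
        for triple in triples:
            self._sources[triple].add(doc_url)

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Triple-level search; a None field acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._sources
            if all(want is None or want == have
                   for want, have in zip(pattern, triple))
        )

    def provenance(self, triple: Triple) -> list[str]:
        """Documents that assert the given triple."""
        return sorted(self._sources.get(triple, set()))

    def documents(self, **pattern: Optional[str]) -> list[str]:
        """Document-level search: documents holding any matching triple."""
        found: set[str] = set()
        for triple in self.match(**pattern):
            found |= self._sources[triple]
        return sorted(found)
```

As a usage note, after calling harvest("http://example.org/a.rdf", [...]) for a couple of made-up documents, match(predicate="rdf:type") would return individual statements, while documents(predicate="rdf:type") would return the files containing them. Even this toy version scans every triple on each query, which hints at why indexing becomes hard at Internet scale.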

Though not noted, I would add to this list the challenge of user interfaces. Only a small percentage of users, for example, use Google’s more complicated advanced search form.  In its full-blown implementation, semantic Web search variations could make the advanced Google form look like child’s play.


1 Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge," a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp.  A PDF of the paper is available for download.

Posted by AI3's author, Mike Bergman Posted on April 20, 2006 at 2:42 pm in Searching, Semantic Web | Comments (0)