Posted: April 20, 2006

A pre-print from Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge,"[1] presents a thoughtful, experienced overview of the challenges that semantic Web constructs pose to conventional search. The authors base many of their observations on their experience with the Swoogle semantic Web search engine over the past two years. They also used Swoogle, whose index contains information on more than 1.3 million RDF documents, to generate the paper's statistics on the size and growth of the semantic Web.

Among other points, the authors note these key ways in which semantic Web search differs from conventional search engines, and the challenges that result:

  • Harvesting: the need to selectively discover semantic Web documents and to accurately index their semi-structured components.
  • Search: the need for search to cover a broader range than documents in a repository, from the universal down to the atomic granularity of a single triple. Path tracing and the provenance of the information may also be important.
  • Rank: results ranking needs to account for the contribution of the semi-structured data.
  • Archive: more versioning and tracking is needed, since underlying ontologies will surely grow and evolve.
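To make the "atomic granularity of a triple" and provenance points concrete, here is a minimal Python sketch of a toy in-memory index that answers triple-pattern queries and records, for each triple, the set of documents it was harvested from (a crude form of provenance). It is not Swoogle's design: the class names, the `ex:` prefixes and the example.org document URLs are hypothetical, and a real engine at Internet scale would need persistent positional indexes rather than the linear scan used here.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement; terms are plain strings in this toy sketch."""
    subject: str
    predicate: str
    obj: str


class TripleIndex:
    """Hypothetical toy index: triple-pattern lookup plus per-triple provenance."""

    def __init__(self) -> None:
        # Maps each distinct triple to the set of documents that asserted it.
        self._sources: dict[Triple, set[str]] = defaultdict(set)

    def add(self, triple: Triple, source_doc: str) -> None:
        """Record that source_doc contains triple."""
        self._sources[triple].add(source_doc)

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[tuple[Triple, list[str]]]:
        """Return matching triples with their sorted sources; None is a wildcard."""
        pattern = (subject, predicate, obj)
        results = []
        # Linear scan keeps the sketch short; real engines index each position.
        for triple, docs in self._sources.items():
            if all(p is None or p == term for p, term in zip(pattern, triple)):
                results.append((triple, sorted(docs)))
        return results


idx = TripleIndex()
idx.add(Triple("ex:swoogle", "rdf:type", "ex:SearchEngine"), "http://example.org/a.rdf")
idx.add(Triple("ex:swoogle", "rdf:type", "ex:SearchEngine"), "http://example.org/b.rdf")
idx.add(Triple("ex:swoogle", "ex:indexes", "ex:RDFDocument"), "http://example.org/a.rdf")

# Which documents assert that ex:swoogle is a search engine?
for triple, docs in idx.match(subject="ex:swoogle", predicate="rdf:type"):
    print(triple.obj, docs)
# ex:SearchEngine ['http://example.org/a.rdf', 'http://example.org/b.rdf']
```

Keying provenance on the whole triple means the same statement asserted by two documents is stored once but remembers both sources, which is the kind of per-statement tracking the Search and Archive points call for.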

The authors particularly note the challenge of indexing as repositories grow to true Internet scale.

Though the authors do not raise it, I would add the challenge of user interfaces to this list. Only a small percentage of users, for example, use Google’s more complicated advanced search form. Fully realized, semantic Web search variations could make the advanced Google form look like child’s play.


[1] Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge," a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp. A PDF of the paper is available for download.

Posted by AI3's author, Mike Bergman, on April 20, 2006 at 2:42 pm in Searching, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/216/search-engine-challenges-posed-by-the-semantic-web/
The URI to trackback this post is: https://www.mkbergman.com/216/search-engine-challenges-posed-by-the-semantic-web/trackback/
Posted: April 15, 2006

The W3C’s Internationalization Tag Set Working Group has published an updated Working Draft of the Internationalization Tag Set (ITS). Organized by data categories, this set of elements and attributes supports the internationalization and localization of schemas and documents. Implementations are provided for DTDs, XML Schema and RELAX NG, and for existing vocabularies like XHTML, DocBook and OpenDocument.
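As a rough illustration of how one ITS data category might be consumed by a tool, the Python sketch below walks an XML document and collects the text a translator should see, honouring a local `its:translate="no"` marker and its inheritance by child elements. It assumes the namespace URI and the `yes`/`no` values of the Translate data category as published in the later ITS 1.0 Recommendation, which may differ in detail from this April 2006 draft. The function name and sample document are hypothetical, and ITS global rules are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by ITS markup (as in the ITS 1.0 Recommendation).
ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = "{" + ITS_NS + "}translate"


def translatable_texts(elem, inherited="yes"):
    """Yield (tag, text) pairs for text that should be translated.

    Simplified: only the local its:translate attribute is honoured, and its
    value is inherited by descendants; ITS global rules are ignored.
    """
    flag = elem.get(TRANSLATE, inherited)
    if flag == "yes" and elem.text and elem.text.strip():
        yield elem.tag, elem.text.strip()
    for child in elem:
        yield from translatable_texts(child, flag)
        # Text after a child's closing tag belongs to this element, so it
        # follows this element's flag, not the child's.
        if flag == "yes" and child.tail and child.tail.strip():
            yield elem.tag, child.tail.strip()


SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <title>User Guide</title>
  <p>Run <code its:translate="no">make install</code> first.</p>
  <section its:translate="no">
    <p>Legal notice kept in English.</p>
  </section>
</doc>"""

for tag, text in translatable_texts(ET.fromstring(SAMPLE)):
    print(tag, "->", text)
# title -> User Guide
# p -> Run
# p -> first.
```

Passing the parent's flag down the recursion is what models inheritance: the whole `section` is skipped because its `its:translate="no"` applies to every descendant unless a child overrides it.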

Posted by AI3's author, Mike Bergman, on April 15, 2006 at 9:32 am in Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/213/w3c-internationalization-tag-set/
The URI to trackback this post is: https://www.mkbergman.com/213/w3c-internationalization-tag-set/trackback/
Posted: April 10, 2006

On March 14, Tim Berners-Lee returned to Oxford University for a keynote address sponsored by the e-Horizons Institute in affiliation with the Oxford Internet Institute, the Oxford e-Research Centre and the School of Electronics and Computer Science of the University of Southampton. Sponsorship for the presentation was provided by the British Computer Society.

The 100-minute talk, entitled “The Future of the Web,” is available for online viewing or download in a number of formats. After a slow start, TBL hits his stride, and some of his slides (see this W3C listing) are especially good, particularly in the latter part of the presentation.

The talk focuses on the semantic Web, with attention to why its adoption may be perceived as slow and to the social and policy factors behind that perception. Berners-Lee cogently recalls that the original Web took about five years to move from geeks to commercial use, and he predicts the same for the semantic Web. While it is true that the Web itself now colors (or “colours,” depending on your semantics) expectations about the pace of semantic Web adoption, I thought the best quote of the talk was TBL looking back on his original Web efforts in 1990:

It was really difficult to explain to people what the Web would be like before the Web. The fact it was so difficult to explain to people what the Web was like before the Web [existed] is now extremely difficult to explain to anybody after the Web.

In other words, once any breakthrough is broadly accepted, it becomes hard to remember what life was like before it, or why it was so remarkable that it was invented and adopted in the first place.

Check out this talk. It will restore some perspective and give you a glimpse of how persistent effort eventually produces results when the vision is compelling.

An AI3 Jewels & Doubloons Winner

Posted by AI3's author, Mike Bergman, on April 10, 2006 at 8:47 pm in Semantic Web | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/212/sir-tims-semantic-web-video-another-great-one/
The URI to trackback this post is: https://www.mkbergman.com/212/sir-tims-semantic-web-video-another-great-one/trackback/

I hate to admit it, but today, surely months if not years after others made the move, I switched out my Mozilla 1.7x browser and email client for Firefox and Thunderbird! It’s a step I’ve been contemplating for a while, but it was prompted by my interest in finally testing the Piggy Bank semantic Web browser from MIT’s Simile project, which is available only as a Firefox extension.

I still wear tweed jackets and was probably among the last few percent of users to switch from WordPerfect to MS Word. I hate changing a working desktop environment and, like the proverbial old dog, resist learning new tricks. Oh well. Now, on to Piggy Bank...

Posted by AI3's author, Mike Bergman, on April 10, 2006 at 6:31 pm in Site-related | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/211/nobodys-business-but-my-own/
The URI to trackback this post is: https://www.mkbergman.com/211/nobodys-business-but-my-own/trackback/
Posted: April 4, 2006

Author's Note: An earlier blog series of mine has now been turned into a PDF white paper under the auspices of BrightPlanet Corp. The citation for this effort is:

M.K. Bergman, "Why Are $800 Billion in Document Assets Wasted Annually?" BrightPlanet Corporation White Paper, April 2006, 27 pp.

A PDF copy of this full report (27 pp, 203 KB) is available for download.

It is a tragedy of no small import when $800 billion in readily available savings from creating, using and sharing documents is wasted in the United States each year. How can waste of such magnitude occur right before our noses? And how can this waste occur so silently, so insidiously, and so ubiquitously that none of us can see it?

This free white paper attempts to answer these questions. It grew out of a series of posts responding to an earlier white paper I authored under BrightPlanet sponsorship, Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents. [1]

This full report integrates information from those earlier blog postings.

Public and enterprise expenditures to address the problem of wasted document assets remain comparatively small, and their growth is flat relative to the rate of document production. This report attempts to focus attention on the various ways that technology, people, and process can bring real document savings to our collective pocketbooks.


[1] Michael K. Bergman, "Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents," BrightPlanet Corporation White Paper, July 2005, 42 pp. The paper contains 80 references, 150 citations, and many data tables.

Posted by AI3's author, Mike Bergman, on April 4, 2006 at 10:29 am in Adaptive Information, Document Assets, Information Automation | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/197/full-report-why-are-800-billion-in-document-assets-wasted-annually/
The URI to trackback this post is: https://www.mkbergman.com/197/full-report-why-are-800-billion-in-document-assets-wasted-annually/trackback/