Posted: October 3, 2005

A recent column (Sept. 22) by David Wessel in the Wall Street Journal argues that “Better Information Isn’t Always Beneficial.” His major arguments can be summarized as follows:

  1. Having more information available is generally good;
  2. Having some information available is clearly bad (e.g., information useful to terrorists, or whose availability violates privacy);
  3. However, other information is also bad because it may advance the private (profit) interest but not that of society; and
  4. Computers are worsening the problem in Argument #3 by reducing the cost of processing information.

Wessel claims that computers are removing limits to information processing, which will force society to wrestle with practical inequities that seemed only theoretical a generation ago. Though this article is certainly thought-provoking, and therefore of value, it is wrong on epistemological, logical, and real-world grounds.

Epistemology

All of us at times confuse data or content with the concept of information when we describe current circumstances with terms such as “information overload” or “infoglut.” This confusion often extends to the economics literature in how it deals with the value of “information.” Most researchers or analysts in knowledge management acknowledge this hierarchy of value in the knowledge chain:

data (or content) » information » knowledge (actionable)

This progression also represents a narrowing flow or ‘staging’ of volume. The total amount of data always exceeds the amount of information; in turn, only a portion of available information is useful for knowledge or action.

Rather than provide “definitions” of these terms, which are not universally agreed upon, let’s use the example of searching on Google to illustrate these concepts:

  • Data — the literally billions of documents contained within Google’s search index
  • Information — subsets of this data appropriate to the need or topic at hand. While this sounds straightforward, depending on the user’s query and its precision, the “information” returned from a search may contain a much lower or higher percentage of useful content, as well as a great range in the total number of results
  • Knowledge — Google obviously does not provide knowledge per se; but, depending on how thoroughly the user reviews the information returned from more-or-less precise search queries (and winnows out duplication), knowledge may come about through inspection of and learning from this information.

The concept of staging and processing is highly useful here. For example, in the context of a purposeful document repository, initial searches to Google and other content aggregation sites — even with a query or topic basis — could act to populate that repository with data, which would then need to be mined further for useful information and then evaluated for supplying knowledge. Computers always act upon data, whether global (as with Google) or local (as with a dedicated repository), and whether useful information is produced or not.
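To make this staging concrete, here is a minimal, schematic sketch in Python. Everything in it is an illustrative assumption (the sample documents, function names, and filter criteria); it simply shows how a repository pipeline might progressively narrow raw data into topical information and then stage candidates for human knowledge-building:

# A schematic data -> information -> knowledge staging pipeline.
# All names, sample documents, and filter criteria are illustrative assumptions.

raw_sources = {
    "search-index sample": [
        "Enterprise documents are poorly utilized assets.",
        "Celebrity gossip roundup for the week.",
        "Document repositories need further mining for useful information.",
        "Enterprise documents are poorly utilized assets.",  # a duplicate
    ],
}

def gather_data(sources):
    """Stage 1: populate the repository with raw documents (data)."""
    return [doc for docs in sources.values() for doc in docs]

def extract_information(data, topic_terms):
    """Stage 2: keep only the subset of data relevant to the topic at hand."""
    return [d for d in data if any(t in d.lower() for t in topic_terms)]

def stage_for_knowledge(information):
    """Stage 3: de-duplicate and order for review; knowledge itself still
    arises only through human inspection and learning."""
    return list(dict.fromkeys(information))  # drops exact duplicates, keeps order

data = gather_data(raw_sources)
info = extract_information(data, ["document", "enterprise"])
print(stage_for_knowledge(info))  # the narrowed set a person would actually study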

Wessel, and indeed most economists, commingle all three terms in their arguments and logic. Missing these key distinctions can result in fuzzy thinking.

A Philosophical or Political Polemic?

First, I will not take issue with Wessel’s first two arguments above. Rather, I’d like to look at Argument #3: that some information is “bad” because it delivers private rather than societal value. His two economist references in the piece are to Arrow and Hirshleifer. As Wessel cites Hirshleifer:

“The contrast between the private profitability and the social uselessness of foreknowledge may seem surprising,” the late economist Jack Hirshleifer wrote in 1971. But there are instances, he argued, where “the community as a whole obtains no benefit … from either the acquisition or the dissemination (by resale or otherwise) of private foreknowledge.”

Yet Hirshleifer had a very specific meaning for “private foreknowledge,” likely not in keeping with Wessel’s arguments. The Hirshleifer[1] reference deals entirely with speculative investments and the “awareness” or not (knowledge; perfect information) of differing economic players. According to the academic reviewer Morrison[2]:

In Hirshleifer’s terms, ‘private foreknowledge’ is information used to identify pricing errors after resource allocation is fixed. Because it results in a pure wealth transfer but is costly to produce, it reduces social surplus. . . . As opposed to private foreknowledge, ‘discovery information’ is produced prior to the time resource allocation is fixed, and because it positively affects resource allocation it generally increases social surplus. But even discovery information can be overproduced because optimal expenditures on discovery information will inevitably be subject to pricing errors that can be exploited by those who gather superior information. In cases of both fixed and variable resource allocation, then, excess search has the potential to occur, and private parties will adopt institutional arrangements to avoid the associated losses.

Hmmm. What? Is this actually in keeping with Wessel’s arguments?

Wessel poses a number of examples where he maintains the disconnect between private gain and societal benefit occurs. The examples he cites are:

  • Assessing judges as to how they might rule on patent infringement cases
  • Screening software for use in jury selection
  • Demographic and voting information for gerrymandering U.S. congressional districts
  • Weather insurance for crop production.

These examples are what Wessel calls “the sort of information that Nobel laureate Kenneth Arrow labeled ‘socially useless but privately valuable.’ It doesn’t help the economy produce more goods or services. It creates nothing of beauty or pleasure. It simply helps someone get a bigger slice of the pie.”

According to Oldrich Kyn, an economics professor emeritus at Boston University, Joseph Stiglitz, another Nobel laureate, took exception to Arrow’s thesis regarding information in the areas of market socialism and neoclassical economics, as these excerpts from Stiglitz show:

The idea of market socialism has had a strong influence over economists: it seemed to hold open the possibility that one could attain the virtues of the market system–economic efficiency (Pareto optimality)–without the seeming vices that were seen to arise from private property.

The fundamental problem with [the Arrow–Debreu model] is that it fails to take into account . . .  the absence of perfect information–and the costs of information–as well as the absence of certain key risk markets . . .

The view of economics encapsulated in the Arrow–Debreu framework . . . is what I call ‘engineering economics’ . . .  economics consisted of solving maximization problems . . . The central point is that in that model there is not a flow of new information into the economy, so that the question of the efficiency with which the new information is processed–or the incentives that individuals have for acquiring information–is never assessed. . .  the fundamental theorems of welfare economics have absolutely nothing to say about . . .  whether the expenditures on information acquisition and dissemination– is, in any sense, efficient.

Stiglitz in his own online autobiography states: “The standard competitive market equilibrium model had failed to recognize the complexity of the information problem facing the economy – just as the socialists had. Their view of decentralization was similarly oversimplified.” Grossman and Stiglitz[3] more broadly observe “that perfectly informative financial markets are impossible and . . .  the informativeness of prices is inversely related to the cost of information.”

I am no economist, but reading the original papers suggests to me a narrower and more theoretical focus than what is claimed in Wessel’s arguments. Indeed, the role of “information” is both central to and nuanced within current economic theory, the understanding of which has progressed tremendously in the thirty years since Wessel’s original citations. By framing the question as one of private (profit) versus societal good, Wessel invokes an argument based on political philosophy, one seemingly “endorsed” by Arrow as a Nobel laureate. Yet as Eli Rabett commented on the Knowledge Crumbs Web site, “[the Wessel thesis] is a communitarian argument which has sent Ayn Rand, Alan Greenspan, Newt Gingrich and Grover Norquist to spinning in their graves.”

Logical Fallacies

Even if these philosophical differences could be reconciled, there are other logical fallacies in the Wessel piece.

In the case of assessing the performance of patent judges by crunching information that can now be sold cost-effectively to all participants, Wessel asks, “But does it increase the chances that the judge will come to a just decision?” The logical fallacies here are manifest:

  • Is the only societal benefit having the judge come to a just decision? Or might society also benefit by learning about judicial prejudices, singly or collectively, or by setting new standards for evaluating or confirming judicial candidates?
  • No new information has been created by the computer. Rich litigants could previously have commissioned expensive evaluations. Doesn’t cost-effective processing democratize this information?
  • Is not broad information availability an example of the desired transparency cited by Knowledge Crumbs?

Wessel raises another case: farmers now possibly being able to buy accurate weather forecasts. He posits a resulting situation in which the total amount of food available is unchanged and insurance would no longer be necessary. Yet, as Mark Bahner points out, this reasoning contains these logical fallacies:

  • The amount of food available would NOT be “unchanged” if farmers knew for certain what the weather was going to be. Social and private benefits would also accrue from, for example, applying fertilizers only when needed, without wasteful runoff
  • Weather knowledge would never be certain in the first place, and other uncertainties (pests, global factors, etc.) would remain. Farmers understand uncertainty and would continue to hedge through futures or other forms of insurance or risk management.

The real logical fallacies relate to the assumptions of perfect information and complete reduction of uncertainty. No matter how much the data or how fast the computers, these factors will never be fully resolved.

Practical Role of the Computer

Wessel concludes that by reducing the cost of information so much, computers intensify the information problem of private gain vs. societal benefit. He uses Arrow again to pose the strawman that, “Thirty years ago, Mr. Arrow said the fundamental problem for companies trying to get and use information for profit was ‘the limitation on the ability of any individual to process information.’”

But as Knowledge Crumbs notes, computers may be able to process more data than an individual, but they are still limited and always will be. Moreover, the Knowledge Problem and the SNAFU principle will remain to ensure that humans are never augmented perfectly by their computers. Knowledge Crumbs concludes:

The issue with knowledge isn’t that there is too much, it is that we lack methods to process it in a timely fashion, and processing introduces defects that sometimes are harmful. When data is reduced or summarized something is lost as well as gained.

The speed of crunching data or raw computer processing power is not the issue. Use and misuse of information will continue to exist, as they have since mythologies were passed as verbal allegory by firelight.

Importance to Document Assets

So, why does such a flawed polemic get published in a reputable source like the Wall Street Journal? There are real concerns and anxieties underlying the Wessel piece, and it is always useful to stimulate thought and dialog. But, like all “information” that the piece itself worries over, it must be subjected to scrutiny, testing and acceptance before it can become the basis for action. That the Wessel piece fails to pass these thresholds itself negates its central arguments.

Better that our pundits should focus on things that can be improved, such as why so much available information is duplicated, misused, or overlooked. These failings cost the economy plenty, totally swamping Wessel’s putative private benefits even were his arguments correct.

Let’s focus on the real benefits available today through computers and information to improve society’s welfare. Setting up false specters of computer processing serving private greed only takes our eye off the ball.

NOTE: This posting is part of a series looking at why document assets are so poorly utilized within enterprises. The magnitude of this problem was first documented in a BrightPlanet white paper by the author titled Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents. An open question in that paper was why nearly $800 billion per year in the U.S. alone is wasted and available for improvement, while enterprise expenditures to address the problem remain comparatively small, with flat growth relative to the rate of document production. This series investigates the various technology, people, and process reasons for the lack of attention to this problem.

[1] J. Hirshleifer, “The Private and Social Value of Information and the Reward to Inventive Activity,” American Economic Review, Vol. 61, pp. 561-574, 1971.

[2] A. D. Morrison, “Competition and Information Production in Market Maker Models,” forthcoming in the Journal of Business Finance and Accounting, Blackwell Publishing Ltd., Malden, MA. See the 20 pp. online version at http://users.ox.ac.uk/~bras0541/12_jbfa5709.pdf

[3] S.J. Grossman and J.E. Stiglitz, “On the Impossibility of Informationally Efficient Markets,” American Economic Review, Vol. 70, No. 3, pp. 393-403, June 1980.

Posted: October 2, 2005

Shortly after its release, I posted information on how to get your blog postings listed on Google’s blog search (GBS) (prior to Google itself providing a submission path, which it claims is forthcoming).

While the information on that earlier post is correct, it needs some updating.

Despite having some of the reference sites and Ping-o-matic listed in my ping Update Services, and despite my WordPress software being current with version 1.5.2, I was NOT seeing new listings appear in GBS.

However, if you go to Ping-o-matic itself and manually force a ping update, within minutes your new posts will appear in GBS. Go figure ….
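For those who would rather script this nudge than click through the site, the manual ping is just a small XML-RPC call; Ping-o-matic exposes the standard weblogUpdates.ping method. A minimal Python sketch follows (the blog name and URL are placeholders for your own):

import xmlrpc.client

# Ping-o-matic's XML-RPC endpoint accepts the standard weblogUpdates.ping call.
server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")

# Replace these with your own blog's name and home URL.
result = server.weblogUpdates.ping("AI3", "https://www.mkbergman.com/")

print(result)  # a dict with 'flerror' (False on success) and a status 'message'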

Posted: October 1, 2005

In my daytime life at BrightPlanet we do a lot of work for the intelligence community that we really can’t say anything about. However, I recently came across a blog called Intelligence and Technology and National Security that I have been monitoring (and am still the only subscriber to on Bloglines) and finding quite useful. Recommended.

Thus, my interest was piqued when it referred to a Web site called Intelligence Search. That Web site claims:

“Intelligence Search is the only totally free spy and intelligence web site that searches through creditable web sites to deliver quality information to its visitors. Intelligence Search is free of adware, spyware and pop-ups and does not ask its visitors for donations. Intelligence Search also allows freedom of speech in the written word, so visitors get the purest form of intelligence information possible. “

Hmmm, sounds useful and interesting. So, I tried ‘ODNI’ as a search term (for Office of the Director of National Intelligence — the new intel oversight position created by the Intelligence Reform and Terrorism Prevention Act of 2004, with Ambassador John D. Negroponte its first director) and got only one result (and that one not even the ODNI’s home page!). A similar Yahoo! search turns up 84,200 hits, of which 200 are excellent results after applying further query refinements. (Of course, Yahoo! does not include any deep Web content, so a truly useful compendium would likely have 1,000 documents or more.) Numerous other searches I tried produced similarly meager results from Intelligence Search in comparison to what is available.

I think the intent of Intelligence Search is laudable, and I like its clean interface. However, I cannot recommend it at this time, not until its content coverage actually becomes useful. Perhaps the site’s developers need to consider better tools for harvesting and building content on their site. I just may have some recommendations ….

Posted: September 28, 2005

An earlier posting described a step-by-step process for converting a Word doc to clean HTML for posting on your site. Today’s posting updates that information, with specific reference to creating multi-part HTML postings.

A multi-part posting may make sense when the original document is too long for a single posting on your site, or if you wish to serialize its presentation over postings on multiple days.

Multi-part HTML postings pose a number of unique differences from a single-page posting, namely:

  • Needing to deal with multiple internal document cross-references, not only for a table of contents but also for any Word doc cross-references (Insert –> Reference –> Cross-reference) such as internal headers, figures, tables, etc.
  • Organizing and splitting the table of contents (TOC) itself, and
  • Image naming and referencing.

So, how does one proceed with a multi-part HTML conversion in preparation for posting?

Specific Conversion Steps

  1. The first requirement is that you must create your baseline Word document with a table of contents (TOC) (Insert –> Reference –> Index and Tables –> Table of Contents). Give great care to the construction and organization of the TOC because it will dictate your eventual multi-part HTML pages and splits.
  2. When the Word doc is absolutely complete (and only then!), follow the steps in the earlier posting on Word docs to HTML to get as clean an HTML code base as possible. Include all global search and replaces (S & R) as the earlier post instructed. UNTIL THE LAST CONVERSION STEP #6 BELOW, YOU WILL CONTINUE TO WORK WITH THIS SINGLE HTML DOCUMENT! For example, you may end up with clean HTML code for your TOC such as the following:

    <p><a href="#_Toc106767203">EXECUTIVE SUMMARY. 1</a></p>

    <p><a href="#_Toc106767204">I. INTRODUCTION. 3</a></p>

    <p><a href="#_Toc106767205">Knowledge Economy. 3</a></p>

    <p><a href="#_Toc106767206">Corporate Intellectual Assets. 4</a></p>

    <p><a href="#_Toc106767207">Huge Implications. 4</a></p>

    <p><a href="#_Toc106767208">Data Warehousing?. 6</a></p>

    <p><a href="#_Toc106767209">Connecting the Dots. 6</a></p>

    <p><a href="#_Toc106767210">II. INTERNAL DOCUMENTS. 7</a></p>

    <p><a href="#_Toc106767211">&#8216;Valuable&#8217; Documents. 7</a></p>

    <p><a href="#_Toc106767212">&#8216;Costs&#8217; to Create. 8</a></p>

    <p><a href="#_Toc106767213">&#8216;Cost&#8217; to Modify. 9</a></p>

    <p><a href="#_Toc106767214">&#8216;Cost&#8217; of a Missed. 9</a></p>

    <p><a href="#_Toc106767215">Other Document &#8216;Cost&#8217;. 9</a></p>

    <p><a href="#_Toc106767216">Archival Lifetime. 10</a></p>

    <p><a href="#_Toc106767217">III. WEB DOCUMENTS AND SEARCH. 10</a></p>

    <p><a href="#_Toc106767218">Time and Effort for Search. 11</a></p>

    <p><a href="#_Toc106767219">Lost Searches. 11</a></p>

    <p><a href="#_Toc106767220">&#8216;Cost&#8217; of a Portal. 14</a></p>

    <p><a href="#_Toc106767221">&#8216;Cost&#8217; of Intranets. 16</a></p>

    <p><a href="#_Toc106767222">IV. OPPORTUNITIES AND THREATS. 18</a></p>

    <p><a href="#_Toc106767223">&#8216;Costs&#8217; of Proposals. 18</a></p>

    <p><a href="#_Toc106767224">&#8216;Costs&#8217; of Regulation. 21</a></p>

    <p><a href="#_Toc106767225">&#8216;Cost&#8217; of Misuse. 24</a></p>

    <p><a href="#_Toc106767226">V. CONCLUSIONS. 25</a></p>

  3. Do global S & R on the TOC references, replacing them with internal page links (e.g., "./intro.html"), as this example for the Intro shows (a scripted approach is sketched at the end of this post):

    [Screen shot: Find and Replace dialog]

    There will need to be as many S & R replacements throughout the document as there are entries in the TOC. Be careful to name your internal pages according to your anticipated final published structure for the multi-part HTML pages. Upon completion of the global S & R, you should then remove the earlier Word doc page numbers and clean up spaces or other display issues. Thus, using the example above, you could end up with revised code for the TOC as follows:

    <p><a href="./summary.html">EXECUTIVE SUMMARY</a></p>

    <p><a href="./intro.html">I. INTRODUCTION</a></p>

    <p><a href="./intro.html#knowledge">Knowledge Economy</a></p>

    <p><a href="./intro.html#assets">Corporate Intellectual Assets</a></p>

    <p><a href="./intro.html#huge">Huge Implications</a></p>

    <p><a href="./intro.html#data">Data Warehousing?</a></p>

    <p><a href="./intro.html#dots">Connecting the Dots</a></p>

    <p><a href="./internal.html">II. INTERNAL DOCUMENTS</a></p>

    <p><a href="./internal.html#docs">&#8216;Valuable&#8217; Documents</a></p>

    <p><a href="./internal.html#create">&#8216;Costs&#8217; to Create</a></p>

    <p><a href="./internal.html#modify">&#8216;Cost&#8217; to Modify</a></p>

    <p><a href="./internal.html#missed">&#8216;Cost&#8217; of a Missed</a></p>

    <p><a href="./internal.html#etc">Other Document &#8216;Cost&#8217;</a></p>

    <p><a href="./internal.html#archive">Archival Lifetime</a></p>

    <p><a href="./web.html">III. WEB DOCUMENTS AND SEARCH</a></p>

    <p><a href="./web.html#time">Time and Effort for Search</a></p>

    <p><a href="./web.html#lost">Lost Searches</a></p>

    <p><a href="./web.html#portal">&#8216;Cost&#8217; of a Portal</a></p>

    <p><a href="./web.html#intranets">&#8216;Cost&#8217; of Intranets</a></p>

    <p><a href="./opps.html">IV. OPPORTUNITIES AND THREATS</a></p>

    <p><a href="./opps.html#proposals">&#8216;Costs&#8217; of Proposals</a></p>

    <p><a href="./opps.html#regs">&#8216;Costs&#8217; of Regulation</a></p>

    <p><a href="./opps.html#misuse">&#8216;Cost&#8217; of Misuse</a></p>

    <p><a href="./conclusion.html">V. CONCLUSIONS</a></p>

  4. You may also need to do additional code cleanup. For example, in the snippet below, the first href refers to the TOC entry that will be replaced via steps #3 and #6. However, the second href is an internal cross-reference from another location (not the TOC) in the Word doc. For these additional cross-references, you will need either to keep them and rename them logically with S & R, or to remove them. (Generally, since you are already splitting a long Word doc into multiple HTML pages, such additional cross-references are excessive and unnecessary; you can likely remove them):

    <h1><a name="_Toc106767204"></a><a name="_Toc90884898"> I. INTRODUCTION</a></h1>

    <p>How many documents does your organization create each year? What effort does this represent in terms of total staffing costs? Etc., etc.</p>

  5. You will then need to rename your images using global S & R; the images were given sequential numbers (not logical names) in the Word doc to HTML conversion. For example, you may have an image named:

    <img width="664" height="402" src="Document_files/image001.jpg">

    You will need to give that image a better logical name, and perhaps put it into its own image subdirectory, like the following:

    <img width="664" height="402" src="./images/CostChart1.jpg">

  6. Finally, your HTML is now fully prepped for splitting into multiple pages. You need to do three more things in this last step.

First, via cut-and-paste, take your TOC and any intro text from the main HTML document and place them into an index.html document. Its location should also be the parent directory for all of your subsequent split pages. Thus, in our example, you would have a directory structure that looks like:

MAIN (where index.html is located)
    Summary
    Intro
    Internal
    Web
    Opps
    Conclusion

Second, cut-and-paste the HTML sections from the main HTML document that correspond to the six specific split pages (summary.html through conclusion.html) and place each of them into its own named, empty HTML shell with header information, etc. The pasted portions generally correspond to the <body> . . .  </body> portion of the HTML. This is how the various subpart .html pages get created.

Third, and last, delete each of the main-page cross-references changed during the global S & R (these are all of the references without internal anchor # tags); these references are now handled directly via the multiple split HTML page documents. For clarity, the references deleted in our example are:

<p><a href="./summary.html">EXECUTIVE SUMMARY</a></p>

<p><a href="./intro.html">I. INTRODUCTION</a></p>

<p><a href="./intro.html#knowledge">Knowledge Economy</a></p>

<p><a href="./intro.html#assets">Corporate Intellectual Assets</a></p>

<p><a href="./intro.html#huge">Huge Implications</a></p>

<p><a href="./intro.html#data">Data Warehousing?</a></p>

<p><a href="./intro.html#dots">Connecting the Dots</a></p>

<p><a href="./internal.html">II. INTERNAL DOCUMENTS</a></p>

<p><a href="./internal.html#docs">&#8216;Valuable&#8217; Documents</a></p>

<p><a href="./internal.html#create">&#8216;Costs&#8217; to Create</a></p>

<p><a href="./internal.html#modify">&#8216;Cost&#8217; to Modify</a></p>

<p><a href="./internal.html#missed">&#8216;Cost&#8217; of a Missed</a></p>

<p><a href="./internal.html#etc">Other Document &#8216;Cost&#8217;</a></p>

<p><a href="./internal.html#archive">Archival Lifetime</a></p>

<p><a href="./web.html">III. WEB DOCUMENTS AND SEARCH</a></p>

<p><a href="./web.html#time">Time and Effort for Search</a></p>

<p><a href="./web.html#lost">Lost Searches</a></p>

<p><a href="./web.html#portal">&#8216;Cost&#8217; of a Portal</a></p>

<p><a href="./web.html#intranets">&#8216;Cost&#8217; of Intranets</a></p>

<p><a href="./opps.html">IV. OPPORTUNITIES AND THREATS</a></p>

<p><a href="./opps.html#proposals">&#8216;Costs&#8217; of Proposals</a></p>

<p><a href="./opps.html#regs">&#8216;Costs&#8217; of Regulation</a></p>

<p><a href="./opps.html#misuse">&#8216;Cost&#8217; of Misuse</a></p>

<p><a href="./conclusion.html">V. CONCLUSIONS</a></p>

Voilà. You now have multiple HTML pages from a Word document!
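A postscript for those who do this conversion regularly: the repetitive S & R in steps #3 and #5 lends itself to scripting. The sketch below, in Python, is an illustration rather than a tested tool; the anchor-to-page mapping, image renames, and file names are all placeholders you would adapt from your own TOC:

import re

# Hypothetical mapping from Word's _Toc anchors to final page links;
# fill this in from your own document's TOC.
ANCHOR_MAP = {
    "#_Toc106767203": "./summary.html",
    "#_Toc106767204": "./intro.html",
    "#_Toc106767205": "./intro.html#knowledge",
    # ... one entry per TOC entry ...
}

# Hypothetical logical renames for Word's sequentially numbered images.
IMAGE_MAP = {
    "Document_files/image001.jpg": "./images/CostChart1.jpg",
}

def rewrite_toc_links(html):
    """Step #3: swap Word's _Toc anchors for the final page links, then
    strip the '. <page number>' residue left by the Word TOC."""
    for anchor, target in ANCHOR_MAP.items():
        html = html.replace('href="%s"' % anchor, 'href="%s"' % target)
    return re.sub(r"\.\s*\d+(</a>)", r"\1", html)

def rename_images(html):
    """Step #5: point image tags at logically named files in images/."""
    for old, new in IMAGE_MAP.items():
        html = html.replace('src="%s"' % old, 'src="%s"' % new)
    return html

with open("Document.html", encoding="utf-8") as f:   # placeholder file name
    prepped = rename_images(rewrite_toc_links(f.read()))
with open("Document-prepped.html", "w", encoding="utf-8") as f:
    f.write(prepped)

One caution: the page-number regex would also touch any body link ending in a period and digits, so it is safest run against the TOC block alone.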

Posted: September 27, 2005

Though it has been out since June, I just today came across an interview with Tim Berners-Lee on the Semantic Web, conducted by Andrew Updegrove for the Consortium Standards Bulletin. I highly recommend this piece for anyone interested in an insider’s view of the creation and use of the semantic Web. Here are some highlights; all are direct quotes from Berners-Lee.

Here are some excerpts relating to the vision of the semantic Web:

The goal of the Semantic Web initiative is to create a universal medium for the exchange of data where data can be shared and processed by automated tools as well as by people. The Semantic Web is designed to smoothly interconnect personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data.

Many large-scale benefits are, not surprisingly, evident for enterprise level applications. The benefits of being able to reuse and repurpose information inside the enterprise include both for savings and new discoveries. And of course, more usable data brings about a new wave of software development for data analysis, visualization, smart catalogues… not to mention new applications development. The point of the Semantic Web is in the potential for new uses of data on the Web, much of which we haven’t discovered yet.

As for the status of the initiative, Berners-Lee directly addresses some critics by emphasizing the importance of automated tools rather than author tagging:

It’s not about people encoding web pages; it’s about applications generating machine-readable data on an entirely different scale. Were the Semantic Web to be enacted on a page-by-page basis in this era of fully functional databases and content management systems on the Web, we would never get there. What is happening is that more applications — authoring tools, database technologies, and enterprise-level applications — are using the initial W3C Semantic Web standards for description (RDF) and ontologies (OWL).

Berners-Lee goes on to say:

One of the criticisms I hear most often is, “The Semantic Web doesn’t do anything for me I can’t do with XML”. This is a typical response of someone who is very used to programming things in XML, and never has tried to integrate things across large expanses of an organization, at short notice, with no further programming. One IT professional who made that comment around four years ago, said a year ago words to the effect, “After spending three years organizing my XML until I had a heap of home-made programs to keep track of the relationships between different schemas, I suddenly realized why RDF had been designed. Now I used RDF and its all so simple — but if I hadn’t have had three years of XML hell, I wouldn’t ever have understood.”

Many of the criticisms of the Semantic Web seems (to me at least!) the result of not having understood the philosophy of how it works. A critical part, perhaps not obvious from the specs, is the way different communities of practice develop independently, bottom up, and then can connect link by link, like patches sewn together at the edges. So some criticize the Semantic Web for being a (clearly impossible) attempt to make a complete top-down ontology of everything.

Others criticize the Semantic Web because they think that everything in the whole Semantic Web will have to be consistent, which is of course impossible. In fact, the only things I need to be consistent are the bits of the Semantic Web I am using to solve my current problem.

The web-like nature of the Semantic Web sometimes comes under criticism. People want to treat it as a big XML document tree so that they can use XML tools on it, when in fact it is a web, not a tree. A semantic tree just doesn’t scale, because each person would have their own view of where the root would have to be, and which way the sap should flow in each branch. Only webs can be merged together in arbitrary ways. I think I agree with criticisms of the RDF/XML syntax that it isn’t very easy to read. This raises the entry threshold. That’s why we wrote N3 and the N3 tutorial, to get newcomers on board with the simplicity of the concepts, without the complexity of that serialization.
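To see the readability gap Berners-Lee describes, compare the two serializations of a single statement. This sketch uses the Python rdflib library (my choice for illustration; it is not something mentioned in the interview), which can emit both RDF/XML and N3 from the same graph:

from rdflib import Graph, Literal, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.bind("foaf", FOAF)
# A single triple: the resource at example.org/tim has the name "Tim".
g.add((URIRef("http://example.org/tim"), FOAF.name, Literal("Tim")))

# The same statement in the two syntaxes discussed above
# (recent rdflib versions return these as strings).
print(g.serialize(format="xml"))  # RDF/XML: verbose and attribute-laden
print(g.serialize(format="n3"))   # N3: reads nearly as subject-predicate-object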

Among the other insights in the interview are that early adoption is likely to be internal to enterprises on their intranets, that there will definitely be first-mover advantages for software applications that embrace RDF and OWL, and that a more widely embraced rules-based language (think of a successor to Prolog) may well emerge.

Highly recommended reading!
