Posted: November 14, 2008

Topics Range from the Deep Web to Semantic Web in this Search Luminaries Series

I’m pleased to wrap up a multi-part interview with the Federated Search Blog as part of its ongoing ‘Search Luminaries’ series. Sol Lederman, editor of the blog, does a thorough and comprehensive job! Each Friday over the past month, I have answered some 25 or so of his detailed questions.

Federated Search Blog was particularly interested in the deep Web, its discovery and size. Many of the early questions deal with those themes. However, by Part 4 things get a bit more current, with the topics shifting to the semantic Web, linked data and Zitgist.

Here are the links to the series:

To give you a flavor of the interview, here is an example of one of the questions (and probably my favorite):

20. Tim Berners-Lee, credited with inventing the World Wide Web, has been talking about the importance and value of the Semantic Web for years yet common folks don't see much evidence of the Semantic Web gaining traction. Is there substance to the Semantic Web? What's happening with it now and what does its future look like?

Wow, in 10,000 words or less?

No, actually, this is a very good question. As things go, I am a relative newbie to the semantic Web, having studied and followed it closely only since about 2005. I’m sure my perspective, coming later to the party, may not be shared by those who were there at the beginning, which dates to the mid-1990s when Berners-Lee’s vision naturally progressed from a Web of documents, as most of us currently know the Web, to a Web of data.

I think there is indeed incredibly important substance to the semantic Web. But, as I have written elsewhere, the semantic Web is more of a vision than a discernible point in time or a milestone.

The basic idea of the semantic Web is to shift the focus from documents to data. Give data a unique Web address. Characterize that data with rich metadata. Describe how things are related to one another so that relationships and connections can be traced. Provide defined structures for what these things and relationships "mean"; this is what provides the semantics, with the structures and their defined vocabularies known as "ontologies" (which in one analog can be seen as akin to a relational database schema).

As these structures and definitions get put in place, the Web itself then becomes the infrastructure for relating information from everywhere and anywhere on any given topic or subject. While this vision may sound grandiose, just think back to what the Web itself has done for us and documents over the past decade or so. This same architecture and infrastructure can and should be extended to the actual information in those documents, the data. And, oh, by the way, conventional databases can now join this party as well. The vision is very powerful and very cool.

Progress has indeed been slow. Many advocates fairly point to how long it takes to get standards in place, and for a while people spoke of the “chicken-and-egg” problem: reaching the threshold of enough structured data to make it worthwhile to create the tools, applications and showcases that consume that data.

From my perspective, the early visions of the semantic Web were too abstract, perhaps a bit off. First, there was the whole idea of artificial intelligence and machines using the data, as opposed to better ways for humans to draw use from the data at hand. The fundamental and exciting engine underneath the semantic Web — the RDF (Resource Description Framework) data model — was not initially treated on its own. It got admixed with XML, which made understanding difficult and distinctions vague. There is and remains too much academia and not enough pragmatics driving the bus.

But that is changing and fast.

There is now an immediate and practical "flavor" of the semantic Web called linked data. It has three simple bases:

(1) RDF as the simple but adaptable data model that can represent any information — structured or unstructured — as the basic “triple” statement of subject-predicate-object. That sounds fancy, but just substitute verb for predicate and noun for subject and object. In other words: Dick sees Jane; or, the ball is round. It sounds like a kindergarten reader, but that is how data can be easily represented and built up into more complex structures and stories;

(2) Give all objects a unique Web identifier. Unique identifiers are common to any database; in linked data, we just make sure those identifiers conform to the same URIs we see constantly in the address bar of our Web browsers, and:

(3) Post and expose this stuff so that it is accessible on the Web (namely, via HTTP).

My company adds some essential "spice" to these flavors with respect to reference structures and concepts to give the information context, but these simple bases remain the foundation.

These are really not complex steps. They are really no different than the early phases of posting documents on the Web. Only now, we are exposing data.
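To make this concrete, here is a minimal sketch of those three bases using Python’s rdflib library; the URIs and the tiny vocabulary are hypothetical, chosen only to mirror the “Dick sees Jane” example:

```python
from rdflib import Graph, Namespace

# Base (2): give things unique Web identifiers (URIs); this
# hypothetical namespace stands in for a real, dereferenceable one
EX = Namespace("http://example.org/")

g = Graph()

# Base (1): simple subject-predicate-object "triple" statements
g.add((EX.Dick, EX.sees, EX.Jane))    # Dick sees Jane
g.add((EX.ball, EX.shape, EX.round))  # the ball is round

# Base (3): expose the data on the Web; serializing to Turtle and
# posting the result at an HTTP-accessible URL completes the step
print(g.serialize(format="turtle"))
```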

More importantly, we can forget the chicken-and-egg problem. Each new data link we make brings value, similar to the way adding a node to a network brings value according to Metcalfe’s Law. Only with linked data, we already have the nodes — the data — we are just establishing the link connections (the verbs, predicates or relations) to flesh out the network graph. Same principle, only our focus is now to connect what is there rather than to add more nodes. (Of course, adding more linked nodes helps as well!)
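To put a rough number on that intuition: Metcalfe’s Law values a network in proportion to the possible pairwise connections among its n nodes,

```latex
V \propto \binom{n}{2} = \frac{n(n-1)}{2}
```

With linked data, the n nodes (the data) already exist; each new predicate link realizes one of those n(n−1)/2 potential connections, so every link added increases the actual value of the graph.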

The absolutely amazing thing about our current circumstance as Web users is that we truly now have simple and readily deployable mechanisms available to finally overcome decades of enterprise stovepipes. The whole answer is so simple it can be mistaken for snake oil when first presented and not inspected a bit.

In an industry accustomed to hype and cynical about so much of this, I only ask that your readers check out these assertions for themselves and suspend their normal and expected disbelief. For me, after a career of more than 30 years focused on information and access, I feel like we finally have the tools, data model and architecture at hand to actually achieve data interoperability.

Thanks again to Sol and Federated Search Blog for this opportunity.

Posted: November 10, 2008

Inside the Box: Linked Data Need Not Rediscover the Past; A Surprise in Every Box

A standard cliché of management consultants is the exhortation to think “outside the box.” Of course, what is meant by this is to question assumptions, to think differently, to look at problems from new perspectives.

With our recent release of the (linked open data) ‘LOD constellation‘ of linked data classes based around UMBEL, I have been fielding a lot of inquiries about UMBEL’s relationship to DBpedia. (See, for example, this current interview by the Semantic Web Company with me and Sören Auer of the DBpedia project.) This also fits into the ongoing distinction we have made in the UMBEL project between our subject concepts (classes) and named entities (instances).

What has actually most been helping my thinking is to get fully inside the box (or, rather, boxes, hehe). Let me explain.

The problem with urging outside-the-box thinking is that many of us do a less-than-stellar job of thinking inside the box. We often fail to realize the options and opportunities that are blatantly visible inside the box that could dramatically improve our chances of success.

Naomi Karten [1]

The Description Logics Underpinnings of the Semantic Web

Description logics are one of the key underpinnings of the semantic Web. They grew out of earlier frame-based logic systems from Marvin Minsky and also semantic networks; the term and discipline were first given definition in the 1980s by Ron Brachman, among many others [2].

Description logics (DL, most often expressed in the plural) are a logic semantics for knowledge representation (KR) systems based on first-order predicate logic (FOL). They are a kind of logical metalanguage that can help describe and determine (with various logic tests) the consistency, decidability and inferencing power of a given KR language. The semantic Web ontology languages, OWL Lite and OWL DL (which stands for description logics), are based on DL and were themselves outgrowths of earlier DL languages.

Description logics and their semantics traditionally split concepts and their relationships from the different treatment of individuals and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships.

Thus, the model is an abstraction of a concrete world where the concepts are interpreted as subsets of the domain as required by the TBox and where the membership of the individuals to concepts and their relationships with one another in terms of roles respect the assertions in the ABox.

Franz Baader and Werner Nutt [3]

The second split, covering individuals, is known as the ABox (for assertions, the basis for the A in ABox) and describes the attributes of individuals, the roles between individuals, and other assertions about individuals regarding their class membership with the TBox concepts. Both the TBox and ABox are consistent with set-theoretic principles.
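To ground the distinction, here is a minimal sketch in Python using rdflib (plus the owlrl package for a simple RDFS closure); the president example and its URIs are hypothetical, echoing the Abraham Lincoln discussion later in this post:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")
g = Graph()

# TBox: terminological (class-level) knowledge -- the schema
g.add((EX.President, RDFS.subClassOf, EX.Leader))

# ABox: assertional (instance-level) knowledge -- the facts
g.add((EX.AbrahamLincoln, RDF.type, EX.President))
g.add((EX.AbrahamLincoln, EX.birthDate, Literal("1809-02-12")))

# A TBox-style inference: the RDFS closure classifies Lincoln
# as a Leader via the subclass axiom
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
print((EX.AbrahamLincoln, RDF.type, EX.Leader) in g)  # True
```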

TBox and ABox logic operations differ and their purposes differ. TBox operations are based more on inferencing and tracing or verifying class memberships in the hierarchy (that is, the structural placement or relation of objects in the structure). ABox operations are more rule-based and govern fact checking, instance checking, consistency checking, and the like [3]. ABox reasoning is generally more complex and at a larger scale than that for the TBox.

Early semantic Web systems tended to be very diligent about maintaining these “box” distinctions of purpose, logic and treatment. One might argue, as I do herein, that the usefulness and basis for these splits has been lost somewhat in our first implementations and publishing of linked data systems.

ABox and TBox Analogs in the Linked Data Web

Most of the semantic Web work at the beginning of this decade was pretty explicit about references to description logics and related inferencing engines and computational efficiency. Some of the early commercial semantic Web vendors are still very much focused on this space.

However, with the first release and emphasis on linked data about two years ago, the emphasis seemed to shift to the more pragmatic questions of actually posting and getting data out there. Best practices for cool URIs and publishing and linkage modes assumed prominence. The linking open data (LOD) movement began in earnest and gained mindshare. Of course, many in the DL and OWL development communities continued to discuss logic and inferencing, but now seemingly more as a separate camp to which the linked data tribe paid little heed.

The central hub of this linked data effort has been DBpedia and its pivotal place within the ‘LOD cloud.’ What is remarkable about the LOD cloud, however, is that it is almost entirely an ABox representation of the world and its instances. Starting from the core set of individual instances within Wikipedia, this cloud has now grown to many other sources and has become the central place for finding linked instance data. If one looks carefully at the LOD cloud and its linkages, we can see the prevalence of instance-level relationships and attributes.

Linking Open Data’s “ABox”

Linking Open Data’s “TBox”

In fact, the LOD cloud diagram to the upper right from the Wikipedia article on linked data has become the key visual metaphor for the movement. But, as noted, this view is almost exclusively one at the ABox instance level.

The UMBEL project began at roughly the same time and as a response to the release of DBpedia. My question in looking at the first data linked to DBpedia was, What is this content about? Sure, I might be able to find multiple records discussing Abraham Lincoln as a US president regarding attributes like birth date and a list of children, but where could I retrieve records about other presidents or, more broadly, other types of leaders such as prime ministers, kings or dictators?

The intuition was that the linked data and the various FOAF and other distributed instance records it was combining lacked a coherent reference structure of subject topics or concepts with which to describe content. The further intuition was that — while tagging systems and folksonomies would allow any and all users to describe this content with their own metadata — a framework for relating these various assignments to one another was still lacking.

In the nearly two years of development leading to the first beta release of UMBEL, we have tried many analogies and metaphors to describe the basis of the 20,000 subject concept classes within UMBEL and its role relative to other linked data initiatives. While many of those metaphors help visualize use and role, the more formal basis offered by description logics casts UMBEL’s role most precisely. For example, in today’s interview with the Semantic Web Company, I note:

“. . . we have described UMBEL as a roadmap, or middleware, or a backbone, or a concept ontology, or an infocline, or a meta layer for metadata, and others. Today, what I tend to use, particularly in reference to DBpedia, is the TBox-ABox distinction in computer science and description logics. UMBEL is more of a class or structural and concept relationships schema — a TBox — while DBpedia is more of an instance and entity layer with attributes — an ABox. I think they are pretty complementary. . . ”

The resulting class level structure produced by UMBEL and its mappings to other classes within existing linked data enabled us to create and then publish the ‘LOD constellation‘, a complementary TBox structure to the linked data’s existing ABox one. This diagram to the lower right from the Wikipedia article on linked data now shows this complement.

Completeness and Sufficiency

Description logics arose to aid our creation and understanding of knowledge representation systems. From this basis, we can see that the first efforts of the linked data initiative have lacked context, the TBox. At a specific level, the question is not DBpedia v. UMBEL or cloud v. constellation. Both types of structure are required in order to complete the logical framework. By thinking inside the box — by paying attention to our logical underpinnings — we can see that both TBoxes and ABoxes are essential and complementary to creating a useful knowledge representation system.

By more explicitly adopting a description logics framework we can also better address many prior questions of context, coherence and sufficiency. These have been constant themes in my recent writings that I will be revisiting again through the helpful prism of formal description logics.

My interview today with Sören Auer also brought up some important points regarding context. As we have said in other venues, it is important that any TBox be available for context purposes. Whether that should be UMBEL or some other framework depends on the use case. As I noted in the interview, “UMBEL’s specific purpose is to provide a coherent framework for serious knowledge engineers looking to federate data.” Other uses may warrant other frameworks, and certainly not always UMBEL.

But, in any event, I have two cautions for the linked data community: 1) do not take the suggestion to have a reference framework of concepts as being equivalent to adopting a single ontology for the Web; think of any reference structure as an essential missing TBox, and not some call to adopt “one ontology to rule them all”; and 2) in adopting alternative frameworks, take care that whatever is designed or adopted can itself meet basic DL logic tests of consistency and coherence.

A Serendipitous Surprise


The many advantages of separate TBox and ABox frameworks are one serendipitous surprise coming from the early development of linked data. To my knowledge, no one has yet elaborated the significant design, performance, architectural and flexibility advantages of a distinct and explicit separation of TBox from ABox. We believe these advantages to be substantial.

Realize, as distributed, UMBEL already has both TBox and ABox components. The TBox component is the lightweight UMBEL ontology, with its 20,000 subject concept classes and their hierarchical and other relationships. This component has a vocabulary (or terminology) for aiding the linking to external ontologies. The vocabulary is quite suitable for extension into new domains as well.

The ABox component is the set of named entity instances drawn from Wikipedia and the BBC’s John Peel sessions. Besides being of common, broad interest, these 1.5 million instances (per the current version) are included in the distribution to instantiate the ontology for demonstration and sandbox purposes.

So, UMBEL’s world is quite simple: subject concepts (SCs) and named entities (NEs). Subject concepts are the TBox and classes that define the structure and concept relationships. Named entities are the individual “things” in the world (some lower case such as animals or foods) and are the ABox of instances that populate this structure.
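Here is a minimal sketch of that separation, again in Python with rdflib; the subject-concept and named-entity namespaces are hypothetical stand-ins, not UMBEL’s actual vocabulary:

```python
from rdflib import Graph, Namespace, RDF, RDFS

SC = Namespace("http://example.org/sc/")  # hypothetical subject-concept (TBox) namespace
NE = Namespace("http://example.org/ne/")  # hypothetical named-entity (ABox) namespace

# Keep the two components as distinct graphs
tbox = Graph()
tbox.add((SC.President, RDFS.subClassOf, SC.Leader))

abox = Graph()
abox.add((NE.AbrahamLincoln, RDF.type, SC.President))

# Swapping in a different named entity dictionary means loading a
# different ABox graph; the TBox schema stays untouched
merged = tbox + abox
for s in merged.subjects(RDF.type, SC.President):
    print(s)
```

Keeping the graphs distinct is what makes the swapping and extension scenarios in the list below so straightforward.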

In our early efforts, we concentrated on the SC portion of UMBEL. Most recently, we have been concentrating on the NE component and its NE dictionaries. It was these investigations that drew us into an ABox perspective when looking at design options. The logic and rationale had been sitting there for some years, but it took cracking open the older textbooks to become reacquainted with it.

Once we again began looking inside the box, we began to see and enumerate some significant advantages to an explicit TBox-ABox design, as well as advantages for keeping these components distinct:

  • More easily understood ontologies with a very limited number of predicates
  • Lightweight schema design that is easy to extend
  • Ability to “triangulate” between separate SC (concept) and NE (instance) disambiguation approaches to improve overall precision and recall
  • Attribute information is kept separate from structural and conceptual relationships
  • Easy to swap in varied, multiple and private or public named entity dictionaries
  • Relatively easy extension of the schema ontology into specific domains
  • A design suited to computational efficiency (rules for the ABox; inference and standard reasoning for the TBox), and
  • Assignment of NEs to distinct and disjoint “super types” [4] that can bring significant tableaux benefits to ABox reasoning.

We are still learning about these advantages and will document them further in pending work on coherence and named entity dictionary (NED) creation.

Thinking Inside the TBox and ABox

The two main points of this article have been to: 1) recognize the important intellectual legacy of description logics and how they can inform the linked data enterprise moving forward; and 2) be explicit about the functional and architectural splits of the TBox from the ABox. Making this split brings many advantages.

There will continue to be many design challenges as linked data proliferates and actually begins to play its role of aiding meaningful knowledge work. The grounding in description logics and the use of DL for testing alternative designs and approaches is a powerful addition to our toolkit.

Sometimes there are indeed many benefits to thinking inside the box.


[2] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003. See Chapter 1. Sample chapters may be viewed from Enrico Franconi’s Description Logics course notes and tutorial at http://www.inf.unibz.it/~franconi/dl/course/, which is an excellent starting reference point on the subject.
[3] Ibid.; see Chapter 2.
[4] These are akin to the lexicographer supersenses that have been applied in WordNet for nouns and verbs (though only nouns are used here). See Massimiliano Ciaramita and Mark Johnson, 2003. Supersense Tagging of Unknown Nouns in WordNet, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 168–173, 2003. See http://www.aclweb.org/anthology-new/W/W03/W03-1022.pdf.
Posted: November 6, 2008

UMBEL (Upper Mapping and Binding Exchange Layer) / OpenLink Software

Version 5.0.9 includes UMBEL Class Lookups and Named Entity Extraction

I first wrote about OpenLink Software‘s stellar suite of structured Web-related software back in April 2007, with a spotlight on Virtuoso, the company’s flagship ‘universal server’ product. As it has for years, OpenLink continues a steady drumbeat of new releases and extensions. The most recent version upgrade, 5.0.9, was announced today.

In the intervening period I have now personally had the chance to experience Virtuoso first hand, both as the standard hosting platform for Zitgist’s linked data products and services, and as the hosting environment for UMBEL’s various and growing Web services. I can state quite categorically that our ability to get things done fast with few resources depends critically on the unbelievably high-productivity platform that Virtuoso provides. (And, hehe, given our close relationship to OpenLink, we also get great responsiveness and technical support! 🙂 Though, truthfully, OpenLink continues to amaze with its outreach and embrace of all of the important initiatives within the semantic Web community.)

I normally let these standard Virtuoso release announcements pass without comment. But today’s release v. 5.0.9 has an especially important feature from my parochial perspective: the first support for UMBEL.

Virtuoso Reprised

Just to refresh memories, OpenLink’s Virtuoso is a cross-platform universal server for SQL, XML, and RDF data, including data management. It includes a powerful virtual database engine, full-text indexing, native hosting of existing applications, a Web Services (WS*) deployment platform, a Web application server, and bridges to numerous existing programming languages. Now in version 5.0, Virtuoso is also offered in an open source edition. The basic technical architecture of Virtuoso, with its robust capabilities, is shown in the diagram below:

Virtuoso Architecture

From an RDF and linked data perspective, Virtuoso is the most scalable and fastest platform on the market. Critical from Zitgist’s perspective are Virtuoso’s more than 100 built-in RDF-izers (or “Sponger cartridges”), which address all major data formats, serializations, relational data and Web 2.0 APIs. But don’t take my word for it: check out OpenLink’s impressive list of these cartridges and their various linkages throughout the linked data space.
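For a quick feel of Virtuoso from the linked data side, here is a small sketch using Python’s SPARQLWrapper against the public DBpedia endpoint (which is hosted on Virtuoso); the query itself is illustrative only:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# DBpedia's public SPARQL endpoint runs on Virtuoso
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?president WHERE {
        ?president a <http://dbpedia.org/ontology/President> .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["president"]["value"])
```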

UMBEL Support

The key aspect of the new UMBEL support in Virtuoso is the incorporation of UMBEL class lookups and named entity extraction into the RDF-izer cartridges. This is but the first of the growing support anticipated for UMBEL.

Other New Features

In addition to UMBEL support, version 5.0.9 includes significant performance optimizations to the SQL Engine, SPARQL+RDF Engine, and the ODBC and JDBC drivers.

Other new features include:

  • An Excel mime-type output option in the SPARQL endpoint
  • Enhanced triple options for bif:contains plus new options for transitivity
  • New RDF-izer Cartridges for the Sponger RDF Middleware Layer
  • Support for very large HTTP client requests
  • A sparql-auth endpoint with digest authentication for using SPARUL via the SPARQL Protocol (see the sketch following this list)
  • New commands for the Ubiquity Firefox plugin.
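As a small sketch of the new sparql-auth endpoint, here is how a SPARUL update might be submitted with digest authentication using Python’s requests library; the host, credentials, graph URI and triple are all placeholders:

```python
import requests
from requests.auth import HTTPDigestAuth

# Placeholder endpoint and default-style credentials for a local Virtuoso
endpoint = "http://localhost:8890/sparql-auth"

update = """
INSERT INTO GRAPH <http://example.org/graph> {
    <http://example.org/Dick> <http://example.org/sees> <http://example.org/Jane> .
}
"""

# Digest authentication is handled by requests; the update goes
# through the same SPARQL Protocol interface as a query
r = requests.post(endpoint, data={"query": update},
                  auth=HTTPDigestAuth("dba", "dba"))
print(r.status_code)
```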

Finally, per usual, there are also minor bug-fixes:

  • Memory leaks
  • SQL query syntax handling
  • SPARQL ‘select distinct’
  • XHTML and Javascript validation and other UI issues in the ODS application suite.

For More Details

For more details, you can see these Virtuoso release notes: https://sourceforge.net/project/shownotes.php?release_id=626647&group_id=161622

You can also get information on the Virtuoso open source edition or download it.

Posted: October 28, 2008

It's UMBELievable!

UMBEL’s New Web Services Embrace a Full Web-Oriented Architecture

I recently wrote about WOA (Web-oriented architecture), a term coined by Nick Gall, and how it represented a natural marriage between RESTful Web services and RESTful linked data. There was, of course, a method behind that posting: to foreshadow some pending announcements from UMBEL and Zitgist.

Well, those announcements are now at hand, and it is time to disclose some of the method behind our madness.

As Fred Giasson notes in his announcement posting, UMBEL has just released some new Web services with fully RESTful endpoints. We have been working on the design and architecture behind this for some time and, all I can say is, it’s UMBELievable!

As Fred notes, there is further background information on the UMBEL project — which is a lightweight reference structure based on about 20,000 subject concepts and their relationships for placing Web content and data in context with other data — and the API philosophy underlying these new Web services. For that background, please check out those references; that is not my main point here.

A RESTful Marriage

We discussed much in coming up with the new design for these UMBEL Web services. Most prominent was taking seriously a RESTful design and grounding all of our decisions in the HTTP 1.1 protocol. Given the shared approaches between RESTful services and linked data, this correspondence felt natural.

What was perhaps most surprising, though, was how complete and well suited HTTP was as a design and architectural basis for these services. Sure, we understood the distinctions of GET and POST and persistent URIs and the need to maintain stateless sessions with idempotent design, but what we did not fully appreciate was how content and serialization negotiation and error and status messages also were natural results of paying close attention to HTTP. For example, here is what the UMBEL Web services design now embraces:

  • An idempotent design that maintains statelessness and independence of operation
  • Language, character set, encoding, serialization and mime type enforced by header information and conformant with content negotiation
  • Error messages and status codes inherited from HTTP
  • Common and consistent terminology to aid understanding of the universal interface
  • A resulting componentization and design philosophy that is inherently scalable and interoperable
  • A seamless consistency between data and services.
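As a flavor of what this buys in practice, here is a minimal sketch of content negotiation against such an endpoint using Python’s requests library; the URL and parameter are placeholders, not an actual UMBEL service address:

```python
import requests

# Placeholder URL standing in for a RESTful, HTTP-grounded Web service
url = "http://example.org/ws/concept/lookup"

# The serialization is negotiated through the Accept header...
r = requests.get(url, params={"query": "president"},
                 headers={"Accept": "application/rdf+xml"})

# ...and errors and statuses come straight from HTTP
if r.status_code == 200:
    print(r.headers.get("Content-Type"))  # e.g., application/rdf+xml
elif r.status_code == 406:
    print("Requested serialization not available")
else:
    print("HTTP status:", r.status_code)
```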

There are likely other services out there that embrace this full extent of RESTful design (though we are not aware of them). What we are finding most exciting, though, is the ease with which we can extend our design into new services and to mesh up data with other existing ones. This idea of scalability and distributed interoperability is truly, truly powerful.

It is almost like, sure, we knew the words and the principles behind REST and a Web-oriented architecture, but had really not fully taken them to heart. As our mindset now embraces these ideas, we feel like we have now looked clearly into the crystal ball of data and applications. We very much like what we see. WOA is most cool.

First Layer to the Zitgist ‘Grand Vision’

For lack of a better phrase, Zitgist has an internal plan it calls its ‘Grand Vision’ for moving forward. Though something of a living document, this reference describes how Zitgist is going about its business and development. It does not describe our markets or products (of course, other internal documents do that), but our internal development approaches and architectural principles.

Just as we have seen a natural marriage between RESTful Web services and RESTful linked data, there are other natural fits and synergies. Some involve component design and architecting for pipeline models. Some involve the natural fit of domain-specific languages (DSLs) to common terminology and design. Still others involve use of such constructs in both GUIs and command-line interfaces (CLIs), again all built from common language and terminology that non-programmers and subject matter experts alike can readily embrace. Finally, some involve a preference for Python to wrap legacy apps and to provide a productive scripting environment for DSLs.

If one can step back a bit and realize there are some common threads to the principles behind RESTful Web services and linked data, that very same mindset can be applied to many other architectural and design issues. For us, at Zitgist, these realizations have been like turning on a very bright light. We can see clearly now, and it is pretty UMBELievable. These are indeed exciting times.

BTW, I would like to thank Eric Hoffer for the very clever play on words with the UMBELievable tag line. Thanks, Eric, you rock!

Posted: October 19, 2008

The Structured Web and Linked Data are Leading to the Semantic Web

I have been a consistently vocal critic of “Web 3.0” as a moniker for current Web trends, specifically as some sort of branding replacement for the semantic Web. My specific postings on the term Web 3.0, likely with lame attempts at derision, are on record as:

Then, in relation to an article on ReadWriteWeb regarding a keynote on semantics and advertising at last week’s Web 3.0 Conference & Expo in Santa Clara, CA, I added a comment joining others criticizing the use of version numbers to describe the semantic Web. My comment was simply: “Please, Squash that Web 3.0 Cockroach (see https://www.mkbergman.com/?p=406).”

Dan Grigorovici, one of the conference organizers, reacted strongly and negatively, both in a series of comments to the RWW posting and on his own blog.

I count Dan as a colleague and a friend and do not want to engage in a flame war across multiple sites. So, herein, I address his rhetorical question, “How shall we call it [Web 3.0] instead, Mike? Please indulge us.”

It just so happens I have been writing on this very topic for quite some time.

(BTW, while others commented on the proper role or not of semantics and advertising, that is not my issue. I have never criticized advertising or marketing and will not do so with respect to semantic Web techniques, either. I fully expect semantic technologies to be applied everywhere appropriate and where they can work, which most certainly includes better targeting of ads and messages.)

Getting Serious: Meaning is Important

So, what is my problem with the use of Web 3.0 for semantic Web-related trends and activities?

Simply answered, it is because Web 3.0 means nothing.

It can mean anything or everything depending on who is pushing the idea. I dare anyone pushing this term to find a consensus understanding, an authoritative or “official” understanding, or any grounding in language or usage, or any other informing basis for what this term means.

I find it ironic that the semantic Web, which is at heart about meaning and data interoperability, could potentially accept a naming or branding that is itself meaningless. Simply answered, that is my issue and has been since the term Web 3.0 was first floated.

Moreover, rejection of the term Web 3.0 does not carry with it the logical fallacy of therefore not wanting simpler ways to explain semantic Web-related concepts and technologies to the broader public. Nor does rejection of the term Web 3.0 carry with it the logical fallacy of not wanting effective branding, effective terminology or effective business models.

What rejection of the term Web 3.0 does mean is that we can and should do better in conveying what our collective endeavor is really all about. And with meaning.

So, Please Indulge Us

I have two answers to Dan’s rhetorical question: structured Web and linked data. I further place both of these concepts into context.

Many have given their own bona fide attempts at describing semantic Web aliases and where they stand within a development continuum. Some of this is the natural desire to “name” a space and time. Some of it is undoubtedly a reaction to the glazed look many in the public get when “semantic Web” is first put before them. Some of it is likely a reaction to the fact that the semantic Web has been slow to develop or has not yet met its initial promise and (perhaps) hype.

Over the past two years, I have pointed to two concepts as part of an ongoing continuum eventually leading to the semantic Web. These are the broader, bridging structured Web and its current expression, linked data. I first tried to capture this continuum in a diagram from July 2007:

Transition in Web Structure
| | Document Web | Structured Web | Linked Data | Semantic Web |
|---|---|---|---|---|
| Orientation | Document-centric | Data-centric | Data-centric | Data-centric |
| Resources | Document resources | Structured data | Linked data | Linked data |
| Data types | Unstructured and semi-structured data | Semi-structured and structured data | Semi-structured and structured data | Semi-structured and structured data |
| Formats | HTML | XML, JSON, RDF, etc. | RDF, RDF-S | RDF, RDF-S, OWL |
| Identifiers | URL-centric | URI-centric | URI-centric | URI-centric |
| Era | circa 1993 | circa 2003 | circa 2006 | circa ??? |

To further a common language, I have also put forward my own working definitions of these concepts:

  • Structured Web is object-level data within Internet documents and databases that can be extracted, converted from available forms, represented in standard ways, shared, re-purposed, combined, viewed, analyzed and qualified without respect to originating form or provenance
  • Linked Data is a set of best practices for publishing and deploying instance and class data using the RDF data model, naming the data objects using uniform resource identifiers (URIs), and exposing the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.
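The linked data definition is compact enough to demonstrate in a few lines of Python with rdflib: dereference a URI over HTTP and data in the RDF model comes back through content negotiation. The DBpedia URI here is simply a familiar example:

```python
from rdflib import Graph

g = Graph()
# rdflib negotiates an RDF serialization over HTTP when parsing a URI
g.parse("http://dbpedia.org/resource/Abraham_Lincoln")

# Every statement arrives as a subject-predicate-object triple
for s, p, o in list(g)[:5]:
    print(s, p, o)
```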

I have repeatedly discussed these themes and ideas since I first criticized attempts at the Web 3.0 branding:

Those entries marked with an asterisk [*] are the most central ones dealing with either structured Web or linked data as terminology.

Over two years, about one in six of my blog posts has been devoted strictly to meaning and terminology. I agree proper branding for our collective endeavor is very important. In the end, it is not important whether my views hold sway, but that we are able to effectively explain and sell our products and services. There is still considerable outreach and communication required with the marketplace.

The Markets will Decide

Between the two concepts, I prefer the terminology of structured Web because it seems to be readily understood and appreciated by enterprise clients. But we have also embraced linked data because it has been gaining mindshare and conveys (imo) a concrete and correct image.

Terminology adoption is a function of both providers and consumers. Consumers vote with their attention and their wallets; providers through their branding and positioning, which naturally should be attentive to what is resonant in the marketplace.

Right now, in the emerging markets around the semantic Web and its related technologies expressing the current evolution of this continuum, the naming issues are still largely in the purview of the providers. Consumers are still few. I think most would agree that terminology is still unsettled.

It is legitimate to question whether the provider community is doing itself a disservice using ‘semantic Web’ as a hook. It is healthy to seek better terminology if it can be found. Any attempt to find clear and compelling language is to be applauded.

But, insofar as we are selling improved meaning and data interoperability, let’s find language that conveys those advantages. Web 3.0 directly contradicts this fundamental message by offering no meaning in its vacuity.

Don’t Crush that Dwarf; Hand Me the Pliers

Well, come on now, let’s do smile a bit. There are worse things than calling the term Web 3.0 a cockroach. After all, cockroaches have existed for some 340 million years, well in advance of humans, and can be found everywhere. More than 5,000 species have been identified. Some claim cockroaches will be here long after humans are gone.

Still, as for me, I’m hoping the term Web 3.0 is squashed well in advance of that time. Meanwhile, I appreciate the invitation to indulge in meaningful branding and terminology.
