Thanks to Kingsley Idehen and OpenLink Software, DBpedia has been much enriched by its mapping to UMBEL's 20,000 class-based subject concepts. DBpedia is the structured data version of Wikipedia that I (among many) wrote about in depth in April of last year, shortly after its release.
We have also recently gotten an updated estimate of the size of the semantic Web and a new release of the linking open data (LOD) cloud diagram.
Since DBpedia’s release, it has become the central hub of linked open data as shown by this now-famous (and recently updated!) LOD diagram:
Each version of the diagram adds new bubbles (datasets) and new connections. The use of linked data, which is based on the RDF data model and uses Web protocols to name and access data, is proving to be a powerful framework for interconnecting disparate and heterogeneous information. As the diagram above shows, all types of information from a variety of public sources now make up the LOD cloud.
The most recent analysis of this LOD cloud is by Michael Hausenblas and colleagues, as presented at I-Semantics08 in September. About 50 major datasets comprising roughly two billion triples and three million interlinks were contained in the cloud at the time of their analysis. They partitioned their analysis into two distinct types: 1) single-point-of-access datasets (akin to conventional databases), such as DBpedia or Geonames, and 2) distributed records characterized by RDF ontologies such as FOAF or SIOC. Their paper should be reviewed for its own conclusions. In general, most links appear to be of low value, though a minority are quite useful.
Simple measures such as triples or links have little meaning in themselves. Moreover, and this is most telling, all of the LOD relationships in the diagram above, and linked data in general to date, base their connections on instance-level data. Often this takes the form that a specific person, place or thing in one dataset is related to that very same thing in another dataset using the owl:sameAs property; sometimes it is that one person knows another person; or, in other examples, it may be that one entry has an associated photo. Entities are related to other entities and their attributes, but little is provided about the conceptual or structural relationships amongst those entities.
Instance-level mappings are highly useful for aggregating various attributes or facts about given entities or things. But they only scratch the surface of the structure that can be made available through linked data and the conceptual relationships between and amongst all of those things. For those relationships to be drawn or inferred, a different level of linkage needs to be made: the class, collection or schema view of the data.
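To make the instance-level case concrete, here is a minimal Python sketch of how owl:sameAs links let facts from several datasets be pooled under one entity. The dataset prefixes, identifiers and attribute values are all hypothetical, and the grouping uses a simple union-find; it is an illustration of the idea, not how any particular LOD tool does it.

```python
from collections import defaultdict

# Hypothetical sameAs links between records in three made-up datasets.
SAME_AS = [
    ("a:Berlin", "b:berlin"),
    ("b:berlin", "c:berlin-city"),
]

# Hypothetical facts held separately by each dataset.
FACTS = {
    "a:Berlin": {"label": "Berlin"},
    "b:berlin": {"country": "DE"},
    "c:berlin-city": {"photo": "berlin.jpg"},
    "a:Paris": {"label": "Paris"},
}


def find(parent, x):
    """Return the representative of x's sameAs group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_same_as(same_as, facts):
    """Pool the attributes of all records connected by sameAs links."""
    parent = {node: node for node in facts}
    for a, b in same_as:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
    for a, b in same_as:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    merged = defaultdict(dict)
    for node, attrs in facts.items():
        merged[find(parent, node)].update(attrs)
    return dict(merged)


merged = merge_same_as(SAME_AS, FACTS)
```

Note what the result does and does not contain: the three Berlin records collapse into one richer record, but nothing says that Berlin and Paris are both cities. That missing piece is the class-level view.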
UMBEL, or similar conceptual frameworks, can provide this structural backbone.
UMBEL (Upper Mapping and Binding Exchange Layer; see http://www.umbel.org) is a lightweight reference ontology of about 20,000 subject concepts and their logical and semantic relationships. The UMBEL ontology is a direct derivation of the proven Cyc knowledge base from Cycorp, Inc. (see http://www.cyc.com).
UMBEL’s subject concepts provide mapping points for the many (indeed, millions of) named entities that are their notable instances. Examples might include the names of specific physicists, cities in a country, or a listing of financial stock exchanges. UMBEL mappings enable us to link a given named entity to the various subject classes of which it is a member.
And, because of relationships amongst subject concepts in the backbone, we can also relate that entity to other related entities and concepts. The UMBEL backbone traces the major pathways through the content graph of the Web.
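The two steps just described, linking an entity to its subject classes and then following relationships amongst those classes, can be sketched in a few lines of Python. Every identifier below (the `ex:` entities and `sc:` classes) is a hypothetical stand-in rather than an actual UMBEL or DBpedia name, and only a simple subclass hierarchy is modeled.

```python
from collections import deque

# Hypothetical entity-to-class mappings (the "named entity" layer).
TYPE_OF = {
    "ex:Saab_900": ["sc:SaabCar"],
    "ex:Volvo_240": ["sc:VolvoCar"],
    "ex:Marie_Curie": ["sc:Physicist"],
}

# Hypothetical subclass links (a tiny slice of a "backbone").
SUBCLASS_OF = {
    "sc:SaabCar": ["sc:Automobile"],
    "sc:VolvoCar": ["sc:Automobile"],
    "sc:Automobile": ["sc:RoadVehicle"],
    "sc:Physicist": ["sc:Scientist"],
}


def all_classes(entity, type_of, subclass_of):
    """Return every class the entity belongs to, directly or by inheritance."""
    seen = set()
    queue = deque(type_of.get(entity, []))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue
        seen.add(cls)
        queue.extend(subclass_of.get(cls, []))
    return seen


def entities_in(cls, type_of, subclass_of):
    """Return all entities that are members of cls or of any of its subclasses."""
    return {e for e in type_of if cls in all_classes(e, type_of, subclass_of)}


saab_classes = all_classes("ex:Saab_900", TYPE_OF, SUBCLASS_OF)
automobiles = entities_in("sc:Automobile", TYPE_OF, SUBCLASS_OF)
```

Here the Saab and the Volvo are never linked to each other directly; they become related only because both resolve to the same class higher up the hierarchy.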
The UMBEL backbone provides structure and relationships at large or small scale. For example, in its full extent, UMBEL’s complete structure resembles:
But, we can dive into that structure with respect to automobiles or related concepts . . .
. . . all the way down to seeing the relationships to Saab cars:
It is this ability to provide context through structure and relations that can help organize and navigate large datasets of instances such as DBpedia. Until the application of UMBEL — or any subject or class structure like it — most of the true value within DBpedia has remained hidden.
But no longer.
UMBEL had already mapped most DBpedia instances to its own internal classes. By a simple mapping of files and then inferencing against the UMBEL classes, this structure has now been brought to DBpedia itself. Any SPARQL queries applied against DBpedia can now take advantage of these relationships.
Below are some sample queries Kingsley used to announce these UMBEL capabilities to the LOD mailing list. You can test these queries yourself or try alternative ones by using a standard SPARQL query.
For example, go to one of DBpedia’s query endpoints such as http://dbpedia.org/sparql and cut-and-paste one of these highlighted code snippets into the ‘Query text’ box:
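For readers who prefer to script the request instead of pasting into the form, the following Python sketch builds a GET URL for the endpoint named above. The query is only an illustration of a class-based lookup and is not one of Kingsley's queries; in particular, the class URI `http://example.org/umbel/Automobile` is a placeholder that would need to be replaced with a real subject concept URI. The `query` parameter is the standard one defined by the SPARQL Protocol.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: entities typed by any direct subclass of a
# placeholder class. The class URI is hypothetical, not a real UMBEL URI.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity WHERE {
  ?entity rdf:type ?cls .
  ?cls rdfs:subClassOf <http://example.org/umbel/Automobile> .
} LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return a SPARQL Protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
```

The resulting `url` can be opened in a browser or fetched with any HTTP client; no network call is made by the sketch itself.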
By going to UMBEL’s technical documentation page at http://umbel.org/documentation.html, you can download the files to create your own mappings (assuming you have a local instance of DBpedia).
The example below also assumes you are using the OpenLink Virtuoso server as your triple store. If you are using a different system, you will need to adjust your commands accordingly.
A new era of interacting with DBpedia is at hand. In little more than a year, the infrastructure and data have become available to show the advantages of the semantic Web based on a linked Web of data. DBpedia has been a major vehicle for showing these benefits; it is now positioned to continue to do so.