Posted: October 28, 2008

It's UMBELievable!

UMBEL’s New Web Services Embrace a Full Web-Oriented Architecture

I recently wrote about WOA (Web-oriented architecture), a term coined by Nick Gall, and how it represented a natural marriage between RESTful Web services and RESTful linked data. There was, of course, a method behind that posting to foreshadow some pending announcements from UMBEL and Zitgist.

Well, those announcements are now at hand, and it is time to disclose some of the method behind our madness.

As Fred Giasson notes in his announcement posting, UMBEL has just released some new Web services with fully RESTful endpoints. We have been working on the design and architecture behind this for some time and, all I can say is, it’s UMBELievable!

As Fred notes, there is further background information on the UMBEL project — which is a lightweight reference structure based on about 20,000 subject concepts and their relationships for placing Web content and data in context with other data — and the API philosophy underlying these new Web services. For that background, please check out those references; that is not my main point here.

A RESTful Marriage

We discussed much in coming up with the new design for these UMBEL Web services. Most prominent was taking seriously a RESTful design and grounding all of our decisions in the HTTP 1.1 protocol. Given the shared approaches between RESTful services and linked data, this correspondence felt natural.

What was perhaps most surprising, though, was how complete and well suited HTTP was as a design and architectural basis for these services. Sure, we understood the distinctions of GET and POST and persistent URIs and the need to maintain stateless sessions with idempotent design, but what we did not fully appreciate was how content and serialization negotiation and error and status messages also were natural results of paying close attention to HTTP. For example, here is what the UMBEL Web services design now embraces:

  • An idempotent design that maintains statelessness and independence of operation
  • Language, character set, encoding, serialization and mime type enforced by header information and conformant with content negotiation
  • Error messages and status codes inherited from HTTP
  • Common and consistent terminology to aid understanding of the universal interface
  • A resulting componentization and design philosophy that is inherently scalable and interoperable
  • A seamless consistency between data and services.

There are likely other services out there that embrace this full extent of RESTful design (though we are not aware of them). What we are finding most exciting, though, is the ease with which we can extend our design into new services and to mesh up data with other existing ones. This idea of scalability and distributed interoperability is truly, truly powerful.

It is almost like, sure, we knew the words and the principles behind REST and a Web-oriented architecture, but had really not fully taken them to heart. As our mindset now embraces these ideas, we feel like we have now looked clearly into the crystal ball of data and applications. We very much like what we see. WOA is most cool.

First Layer to the Zitgist ‘Grand Vision’

For lack of a better phrase, Zitgist has an internal plan that it calls its ‘Grand Vision’ for moving forward. Though something of a living document, this reference describes how Zitgist is going about its business and development. It does not describe our markets or products (of course, other internal documents do that), but our internal development approaches and architectural principles.

Just as we have seen a natural marriage between RESTful Web services and RESTful linked data, there are other natural fits and synergies. Some involve component design and architecting for pipeline models. Some involve the natural fit of domain-specific languages (DSLs) to common terminology and design. Still others involve the use of such constructs in both GUIs and command-line interfaces (CLIs), again all built from common language and terminology that non-programmers and subject matter experts alike can readily embrace. Finally, there is a preference for Python to wrap legacy apps and to provide a productive scripting environment for DSLs.

If one can step back a bit and realize there are some common threads to the principles behind RESTful Web services and linked data, that very same mindset can be applied to many other architectural and design issues. For us, at Zitgist, these realizations have been like turning on a very bright light. We can see clearly now, and it is pretty UMBELievable. These are indeed exciting times.

BTW, I would like to thank Eric Hoffer for the very clever play on words with the UMBELievable tag line. Thanks, Eric, you rock!

Posted: October 5, 2008

LOD Cloud Diagram

Class-level Mappings Now Generalize Semantic Web Connectivity

We are pleased to present a complementary view to the now-famous linking open data (LOD) cloud diagram (shown to the left; click on it for a full-sized view) [1]. This new diagram (shown below) — what we call the LOD constellation to distinguish it from its notable LOD cloud sibling — presents the current class-level structure within LOD datasets.

This new LOD constellation complements the instance-level view in the LOD cloud that has been the dominant perspective to date. The LOD cloud has as its central hub DBpedia, the linked data (structured) representation of Wikipedia. The connections shown in the cloud diagram mostly reflect owl:sameAs relations, which means that the same individual things or instances are referenced and then linked between the datasets. Across all datasets, linking open data (LOD) now comprises some two billion RDF triples, interlinked by around 3 million RDF links [2]. This instance-level view of the LOD cloud, shown to the left, was updated a bit over a week ago [1].

The objective of the Linking Open Data community is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. All of the sources on these LOD diagrams are open data [3].

So, Tell me Again Why Classes Are Important?

In prior postings, Fred Giasson and I have explained the phenomenon of ‘exploding the domain‘. Exploding the domain means to make class-to-class mappings between a reference ontology and external ontologies, which allows properties, domains and ranges of applicability to be inherited under appropriate circumstances [4]. Exploding the domain expands inferencing power to this newly mapped information. Importantly, too, exploding the domain also means that instances or individuals that are members of these mapped classes also inherit or assume the structural relations (schema, if you will) of their mapped sources as well.

Trying to think through the statements above, however, is guaranteed to make your head hurt. When just reading the words, these sound like fairly esoteric or abstract ideas.

So, to draw out the distinctions, let’s discuss linked data that is based on instance (individual) mappings versus those mapped on the basis of classes. Like all things, there are exceptions and edge cases, but let us simply premise our example using basic set theory. Our individual instances are truly discrete individuals, in this case some famous dogs, and our classes are the conceptual sets by which these individuals might be characterized.

To make our example simple, we will use two datasets (A and B) about dogs and their possible relations, each characterized by their structure (classes) or their individuals (instances):

Dataset A (organisms)
  • Classes (structure): mammal, canid, wolf, dog
  • Instances (individuals) and class assignments: Rin Tin Tin (dog)

Dataset B (pets)
  • Classes (structure): pet, dog, breed (list)
  • Instances (individuals) and class assignments: Rin Tin Tin (German shepherd), Lassie (collie), Clifford (Vizsla), Old Yeller (mutt)

When datasets are linked based on instance mappings alone, as is generally the case with current practice using sameAs, and there are no class mappings, we can say that Rin Tin Tin is a dog, a pet and a mammal. However, we cannot say that Lassie, for example, is a mammal, because there is no record for Lassie in Dataset A.

So we see our first lesson: to draw an inference about instances using sameAs in the absence of class mappings requires each record (instance) to exist in an external dataset in order to make that assertion. Instances can inherit the properties and structure of the datasets in which they specifically occur, but only there. Thus, what can be said about a given individual (linked via owl:sameAs) is at most the intersection of what is contained in only the datasets in which that individual appears and is mapped. Assertions are thus totally specific and cannot be made without the presence of a matching instance record. We can call this scenario the intersection model: only where there is an intersection of matching instance records can the structure of their source datasets be inferred.

However, when mappings can be made at the class level, then inferences can be drawn about all of the members of those sets. By asserting equivalentClass for dog between Datasets A and B, we can now infer that Lassie, Clifford and Old Yeller are canids and mammals, just as Rin Tin Tin is, even though their instance records are not part of Dataset A. To complete the closure we can also now infer that Rin Tin Tin (Dataset A) is a pet and a German shepherd from Dataset B. We can call this scenario the union model. The mappings have become generalized and our inferencing power now extends to all instances of mapped classes, whether or not there are records for them in other datasets.
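To make the union model concrete, here is a minimal SPARQL sketch using hypothetical namespaces (orgA: for Dataset A, petB: for Dataset B), and assuming Dataset A declares its dog class a subclass of canid and mammal and that the query endpoint applies inferencing over the declared class mappings:

prefix orgA: <http://example.org/organisms#>
select ?individual
where
{
# everything the reasoner can classify as a mammal in Dataset A's terms
?individual a orgA:Mammal .
}

With instance-level sameAs links alone, this query can only return Rin Tin Tin, the sole record actually present in Dataset A. Once petB:Dog owl:equivalentClass orgA:Dog is asserted and inferencing is applied, Lassie, Clifford and Old Yeller appear in the result set as well.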

This power of generalizability, plus the inheritance of structure, properties and domain and range attributes, is why class mappings are truly essential for the semantic Web. Exploding the domain is real and powerful. Thus, to truly understand the power of linked data, it is necessary to view its entirety from a class perspective [5].

Thus, to summarize our answer to the rhetorical question, class mappings are important because they can:

  • Generalize the understanding of individual instances
  • Expand the description of things in the world by inheriting and reusing other class properties, domains and ranges, and
  • Place and contextualize things by inheriting class structure and hierarchical relationships.

The LOD Constellation

So, here is the new LOD constellation of class-level linkages. The definition of class-level linkages is based on one of four possible predicates (rdfs:subClassOf, owl:equivalentClass, umbel:superClassOf or umbel:isAligned). Because of the newness of UMBEL as a vocabulary, only a few of the sources linked to UMBEL have the umbel:superClassOf relationship and one (bibo) has isAligned.

Note that some of the sources are combined vocabularies (ontologies) and instance representations (e.g., UMBEL, GeoNames), others are strict ontologies (e.g., event, bibo), and still others are ontologies used to characterize distributed instances (e.g., foaf, sioc, doap). Other distinctions might be applied as well:

[LOD constellation diagram: click for full size]

The current 21 LOD datasets and ontologies that contribute to these class-level mappings are (with each introduced by its namespace designation):

  • bibo — Bibliographic ontology
  • cc — Creative Commons ontology
  • damltime — Time Zone ontology
  • doap — Description of a Project ontology
  • event — Event ontology
  • foaf — Friend-of-a-Friend ontology
  • frbr — Functional Requirements for Bibliographic Records
  • geo — Geo wgs84 ontology
  • geonames — GeoNames ontology
  • mo — Music Ontology
  • opencyc — OpenCyc knowledge base
  • owl — Web Ontology Language
  • pim_contact — PIM (personal information management) Contacts ontology
  • po — Programmes Ontology (BBC)
  • rss — Really Simple Syndication (1.0) ontology
  • sioc — Semantically-Interlinked Online Communities ontology
  • sioc_types — SIOC extension
  • skos — Simple Knowledge Organization System
  • umbel — Upper Mapping and Binding Exchange Layer ontology
  • wordnet — WordNet lexical ontology
  • yandex_foaf — FOAF (Friend-of-a-Friend) Yandex extension ontology

The diagram was programmatically generated using Cytoscape (see below) [6], with some minor adjustments in bubble position to improve layout separation. The bubble sizes are related to the number of linked structures (ontologies) to which the node has class linkages. The arrow thicknesses are related to the number of linking predicates between the nodes. Two-way arrows are shown darker and indicate equivalentClass or matching superClassOf and subClassOf; single arrows represent subClassOf relationships only.

Note we are not presenting any rdf:type relations because those are not structural, but rather deal with the assignment of instances to classes [7]. More background is provided in the discussion of the construction methodology [6].

At this time, we have not calculated how many individuals or instances might be directly included in these class-level mappings. The data and files used in constructing this diagram are available for download without restriction [8].

Finally, we have expended significant effort to discover class-level mappings of which we may not have been directly aware (see next). Please bring any missing, erroneous or additional linkages to our attention. We will be pleased to incorporate those updates into future releases of the diagram.

How the LOD Constellation Was Constructed

Our diligence has not been exhaustive since not all LOD datasets are indexed locally and others do not have SPARQL endpoints. The general method was to query the datasets to check which ontologies used external classes to instantiate their individuals using the rdf:type predicate. The externally referenced ontology was then checked to determine its own external class mappings.

Here is the basic SPARQL query to discover the rdf:type predicate for non-Virtuoso resources:

select ?o where
{
?s a ?o.
}

And here is the SPARQL query for Virtuoso-hosted datasets (note the use of the DISTINCT modifier, which is a more efficient way to get a listing of unique values):

select distinct ?o where
{
?s a ?o.
}

We then created a simple script to go to all of the ontology namespaces so listed in these external mappings. If there was an external class mapping in the source with one of the four possible predicates of rdfs:subClassOf, owl:equivalentClass, umbel:superClassOf or umbel:isAligned, we noted the source and predicate and wrote it to a simple CSV (comma delimited) file. This formed the input file to the Cytoscape program that created the network graph [6].
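For reference, once an ontology file is loaded into a local store, a query of the following general form would surface those class-level links (a sketch only; the umbel: vocabulary namespace shown here is an assumption and may differ from the actual one):

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix umbel: <http://umbel.org/umbel#>
select distinct ?s ?p ?o
where
{
?s ?p ?o .
# keep only the four class-level linking predicates
filter (?p = rdfs:subClassOf || ?p = owl:equivalentClass ||
        ?p = umbel:superClassOf || ?p = umbel:isAligned)
}

The subject-predicate-object results correspond directly to the first three columns of the CSV file described in [6].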

There are possibly oversights and omissions in this first-release diagram since not all bubbles in the LOD cloud were exhaustively inspected. Please notify us with updates or new class linkages. Alternatively, you can also download and modify the diagram yourself [8].

Conspicuous by Their Absence

We gave particular diligence to a few of the more dominant sources in the LOD instance cloud that showed no class mappings. These include DBpedia and YAGO. While these have numerous and useful rdf:type and owl:sameAs relationships, and all have rich internal class structures, none apparently map at the class level to external sources.

However, UMBEL does have extensive external class mappings, and its named entities uniquely overlap with and have been mapped to DBpedia instances. It is therefore possible to infer some of these external class linkages.

For example, go to the DBpedia SPARQL endpoint:

http://dbpedia.org/sparql/

And try out some sample queries by pasting the following into the Query text box and running the query:

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where
{
?s a umbel:Person
}

This example query applies the external class structure of UMBEL to individual person instances in DBpedia because of the prior specification of some mapping rules used for inferencing [9]. The result set is limited to 1000 results.
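If you wish to make the result size explicit or see labels rather than bare URIs, a small variant along these lines should also work (a sketch; it assumes the standard rdfs:label property carried by DBpedia resources):

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s ?label
where
{
# persons inferred via the UMBEL class structure, with their English labels
?s a umbel:Person ;
   rdfs:label ?label .
filter (lang(?label) = "en")
}
limit 100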

Alternatively, since UMBEL has also been mapped to the external FOAF ontology, we can also now invoke the FOAF class structure directly to produce the exact same result set (since umbel:Person owl:equivalentClass foaf:Person). We do this by applying the same inferencing rules in a different way:

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
select ?s
where
{
?s a <http://xmlns.com/foaf/0.1/Person>.
}

UMBEL can thus effectively act as a class structure bridge to the DBpedia instances.

Since DBpedia is an instance hub, this bridging effect is quite effective between UMBEL and DBpedia instances in the LOD cloud. However, because there is not the same degree of instance overlap with, say, GeoNames, this technique would be less effective there.

Explicit class-level mappings between datasets will always be more powerful than instance-level ones with class mediators. And, in all cases, both of those techniques that explicitly invoke classes are more powerful than instance-level links alone.

The Linked Data Infrastructure is Now Complete

Though all of the available linkages have not yet been made in the LOD datasets, we can now see that all essential pieces of the linkage infrastructure are in place and ready to be exploited. Of course, new datasets can take advantage of this infrastructure as well.

UMBEL is one of the essential pieces that provides the bridging “glue” to these two perspectives or “worlds” of the instances in the LOD cloud and the classes in the LOD constellation. This “glue” becomes possible because of UMBEL’s unique combination of three components or roles:

  • UMBEL provides a rich set of 20,000 subject concept classes and their relations (a reference structure “backbone”) that facilitates class-level mappings with virtually any external ontology, with the benefits described above
  • UMBEL contains a named entity dictionary drawn from Wikipedia that is also mapped to these classes, which therefore strongly intersects with DBpedia and YAGO and helps provide the individual instances <–> classes bridging “glue”, and
  • UMBEL is also a vocabulary that enhances the lightweight SKOS vocabulary to explicitly facilitate linkages to external ontologies at the subject concept layer.

In fact, it is the latter vocabulary sense, in combination with the reference subject concepts, that enables us to draw the LOD class constellation.

So, we can now see a merger of the LOD cloud and the LOD constellation to produce all of the needed parts to the LOD infrastructure for going forward:

  • A hub of instances (DBpedia)
  • A hub of subject-oriented (“is about”) reference classes (Cyc-UMBEL), and
  • A vocabulary for gluing it all together (SKOS-UMBEL).

This infrastructure is ready today and available to be exploited for those who can grasp its game-changing significance.

And, from UMBEL’s standpoint, we can also point to the direct tie-ins to the Cyc knowledge base and structure for conceptual and relationship coherence testing. This infrastructure is an important enabler to extend these powerful frameworks to new domains and for new purposes. But that is a story for another day. ;)


[1] The current version of the LOD cloud may be found at the World Wide Web Consortium’s (W3C) SWEO wiki page. There is also a clickable version of the diagram that will take you to the home references for the constituent data sources in this diagram; see http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18.html.
[2] According to [1], this estimate was last updated one year ago in October 2007. The numbers today are surely much larger, since the number of datasets has also nearly doubled in the interim.
[3] Open data has many definitions, but a common one with a badge is often seen. However, the best practices of linked data can also be applied to proprietary or intranet information as well; see this FAQ.
[4] For further information about exploding the domain, see these postings: F. Giasson, Exploding the Domain: UMBEL Web Services by Zitgist (April 20, 2008), UMBEL as a Coherent Framework to Support Ontology Development (August 27, 2008), Exploding DBpedia’s Domain using UMBEL (September 4, 2008); M. Bergman, ‘ Exploding the Domain’ in Context (September 24, 2008).
[5] Of course, instance-level mappings with sameAs have value as well, as the usefulness of linked data to date demonstrates. The point is not that class-level mappings are the “right” way to construct linked data. Both instance mappings and class mappings complement one another, in combination bringing both specificity and generalizability. The polemic here is merely to redress today’s common oversight of the class-level perspective.
[6] See the text for how the listing of class relationships was first assembled. After removal of duplicates, a simple comma delimited file was produced, class_level_lod_constellation.csv, with four columns. The first three columns show the subject-predicate-object linking the datasets by the class-level predicate. The fourth column presents the count of the number of types of class-level predicates used between the dataset pairs; the maximum value is 4.

This CSV file is the import basis to Cytoscape. After import, the spring algorithm was applied to set the initial distribution of nodes (bubbles) in the diagram. Styles were developed to approximate the style in the LOD cloud diagram, and each of the class-linkage predicates was assigned different views and arrows. (The scaling of arrow width allowed the chart to be cleaned up, with repeat linkages removed and simplified to a single arrow, with the strongest link type used as the remaining assignment. For example, equivalentClass is favored over subClassOf, which is favored over superClassOf.)
In addition, each node was scaled according to the number of external dataset connections, with the assignments as shown in the file. Prior to finalizing, the node bubbles were adjusted for clear spacing and presentation. The resulting output is provided as the Cytoscape file, lod_class_constellation.cys. For the published diagram, the diagram was also exported as an SVG file.
This SVG file was lastly imported into Inkscape for final clean up. The method for constructing the final diagram, including discussion about how the shading effect was added, is available upon request.
[7] rdf:type is a predicate for assigning an instance to a class, which is often an external class. It is an important source of linkages between datasets at the instance level and provides a structural understanding of the instance within its dataset. While this adds structural richness for instances, rdf:type is by definition not class-level and provides no generalizability.
[8] The CSV file and the Cytoscape files may be downloaded from http://www.umbel.org/lod_constellation.html.
[9] See the explanation of the external linkage file in M. Bergman, ‘ Exploding the Domain’ in Context (September 24, 2008).

Posted: September 29, 2008


Part of Kick-off Series on Emerging Ontologies

I was greatly pleased to present a talk on UMBEL before the Ontolog Forum as part of their kick-off series on emerging ontologies. The September 25 podcast and slides are now available online.

ONTOLOG (a.k.a. ‘Ontolog Forum’) is an open, international, virtual community of practice devoted to advancing the field of ontology, ontological engineering and semantic technology, and advocating their adoption into mainstream applications and international standards. It has a great reputation and about 520 active members from 30 countries.

Our panel session kicked off the Forum’s new “Emerging Ontology Showcase” mini-series. This series is being co-championed by Ken Baclawski (Northeastern University) and Mike Bennett (Hypercube Ltd., UK). The criteria for invitation to the showcase include being new or a new release within the past 6 months or so; an emphasis on an ontology itself, not data or tools; and a focus on schema versus instances or facts or assertions. Efforts intended to produce or create standards are of particular interest.

Please Listen In

After Ken’s introduction, the podcast begins with Mike Bennett speaking on, “The EDM Council Semantics Repository: Building Global Consensus for the Financial Services Industry.” This is an important initiative and in keeping with other financial reporting and XBRL-related topics of late. His slides are also online.

My talk, “UMBEL: A Lightweight Subject Reference Structure for the Web,” begins about 35% of the way into the podcast, accompanied by about 30 slides. The audio is a bit spotty for the first two slides until I switched from a speaker to a microphone. My presentation is about 30 min followed by joint Q & A with Mike for another 30 min or so.

Full proceedings — including agenda, abstracts, slides, audio recording and the transcript of the live chat session — may be found on the session page of the Forum wiki; see http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2008_09_25.

The Forum has been doing this for some time and has a nice system worked out for coordinating later viewing of presentations synchronized with the audio.

This presentation was part of the regular Thursday speaker sessions sponsored by the Forum.

Posted: September 24, 2008

Fitting the Pieces

Coherence is Needed for Continued Sustainability

Since early in 2008 my colleague, Fred Giasson, has been authoring a series of important blog posts on ‘exploding the domain.’ Exploding the domain means to make class-to-class mappings between a reference ontology and external ontologies, which allows properties, domains and ranges of applicability to be inherited under appropriate circumstances. Exploding the domain expands inferencing power to this newly mapped information.

Fred first used the phrase in an April post that introduced the concept:

“. . . once the linkages between UMBEL subject concepts and external ontologies classes are made, the following becomes possible: 1) the UMBEL subject concept structure can be used to describe (instantiate) things using the UMBEL data structure; 2) external ontology properties can be re-used to describe these new instances since external ontologies classes are linked to UMBEL subject concept classes; and 3) in some cases, the properties defined in these ontologies can be used in relation with UMBEL subject concept classes.”

Since that time, Fred has continued to explore these implications. In an August posting looking at UMBEL as a possible reference framework for mapping and exploding domains, Fred stated,

“. . . once we have the context in place, we are on our way to achieve coherence. UMBEL is 100% based on OpenCyc and Cyc, which are internally consistent and coherent within themselves. We thus use these coherent frameworks to make the mappings to external ontologies coherent, too.

The equation is simple:

a coherent framework + ontologies contextualized by this framework = more coherent ontologies.”

The thrust of this analysis was to show how UMBEL subject concepts can act to create context (his emphasis) for linked classes defined in external ontologies. Where the individuals in a dataset are instances of classes, and some of these classes are linked to UMBEL (or a similar contextual structure), these subject concept classes also give context to those individuals.

Finally, and most recently, Fred demonstrated how the use of UMBEL could explode DBpedia’s domain by linking classes using only three properties: rdfs:subClassOf, owl:equivalentClass and umbel:isAligned. (And, as I noted in an earlier posting this week, those mappings have now been made bi-directional from DBpedia to UMBEL as well.)

As we discuss and apply these concepts we are starting to see some further guidelines emerge. Presenting these is the purpose of this post in this ongoing exploding the domain series.

The Mere Existence of Classes is Not Enough

Since its inception, DBpedia has had a class structure of sorts, first from the native Wikipedia categories from which it was derived and then with the incorporation of the YAGO structure (based on WordNet concept relationships). Yet we have claimed that class structure has truly only recently been brought to DBpedia with the mappings to UMBEL. Why? Does not DBpedia’s initial class structure meet the test?

(BTW, these same questions may be applied to some of the other large data structures beginning to emerge on the semantic Web such as Freebase. But, those are stories for another day.)

There are really two answers to these questions. First, the mere existence of classes is not enough; they must actually be used. And, second, the nature of those classes and their structure and coherence are absolutely fundamental. This subsection addresses the first point and the following one the second, with both aided by the table below.

There has been a class structure within DBpedia from inception, which was then supplemented a few months after release with the YAGO structure. The starting Wikipedia structure showed early issues which began to be addressed through a cleaned Wikipedia category class (CWCC) hierarchy. These relationships were established with the rdf:type predicate that relates an instance to a class. The classes themselves were related to one another through the rdfs:subClassOf predicate. These class relationships allowed the linked classes to be shown in a hierarchical (taxonomy-like) structure.
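As a concrete illustration of these internal class relationships, a query of this general form against the DBpedia SPARQL endpoint lists the declared superclasses of a single class (a sketch only; the YAGO class is just an example, and it assumes the YAGO subclass relations are loaded as described above):

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix yago: <http://dbpedia.org/class/yago/>
select ?super
where
{
# one step up the internal class hierarchy
yago:FemaleBoxers rdfs:subClassOf ?super .
}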

Initially, in the case of the beginning Wikipedia categories, the internal class relationships were weak. This was somewhat improved with the addition of YAGO and its WordNet-based concept relationships (with better semantics).

However, these class relationships were (to my knowledge) never mapped to any external structures or ontologies. If used at all, they were only implied for ad hoc navigation within the internal instance data.

Really, anyone could have approached DBpedia at this point and begun an effort to map its existing class structures to external data. Indeed, we (as editors of UMBEL) considered doing so, but chose a different structure (see next section) for reasons of context and coherence.

The net result is that DBpedia and the other instances of the linking open data (LOD) cloud have remained focused and useful at the instance level, and not yet at the class level.

As we brought in UMBEL to provide a class structure to DBpedia and linked data, this circumstance began to change, as this table indicates:

Predicate Differences
  • YAGO: subClassOf
  • UMBEL: subClassOf, equivalentClass, superClassOf, isAligned; and, entity-to-class predicates

Mapping/Application Differences
  • YAGO: no external mappings made
  • UMBEL: aggressive use of external mappings (‘exploding the domain‘); consistent internal structure

Structural Differences
  • YAGO: based on WordNet concept relationships
  • UMBEL: based on Cyc common sense structural relationships; Cyc inferencing and reasoning tools for testing coherence; microtheories framework for domain differences; extendable structure

Unique Class Count
  • YAGO: ~ 55,000
  • UMBEL: ~ 20,000

Though shown for comparison reasons, the number of classes probably has no real importance.

The key argument in this subsection is that classes matter. Indeed, one of the current challenges before the linked data community is to understand and treat differently the issues of instances from classes. But, the question of whether one class structure is better than another is moot if class mappings are neglected altogether.

Sustainability Requires Coherence

UMBEL’s reasons for not taking up the Wikipedia structure or the WordNet structure — that is, the initial structures within DBpedia — for its lightweight subject ontology were based on a lack of coherence. I have spoken earlier about When is Content Coherent? regarding these arguments. Other analysis supports the conclusions in various ways [1].

A central (or “upper”) reference framework should be one that is solid and venerable. Over time, many subsidiary ontologies and structures will relate to it. Like a steel superstructure to a skyscraper or a structural framework to a large ocean-going tanker, this beginning structure needs to withstand many stresses and maintain its integrity as various subsidiary structures hang from it.

So long as we are still in “toy” mode with relatively few external mappings and relatively few ontologies, simple class-to-class mappings without respect to the coherence of the underlying ontologies may be OK. But, we will soon (if not already) see that structural flaws, like slight perturbations at the Big Bang, may propagate to create huge discontinuities.

At the pace of development we are now seeing, there will be tens to hundreds of thousands of ontologies within the foreseeable future. Granted, for any given circumstance, only a minor few of those may be applicable. But the potential combinations still can defy imagination in terms of complexity and potential scope at widely varying scales.

At the scale of the Web, of course, there will never be a central authority (nor should there be) for “official” vocabularies or structures. Yet, just as certainly, those ontologies and structures that do share some conceptual and structural coherence and are therefore more likely to easily integrate and interoperate will (I believe) win the Darwinian race. Without some degree of coherence, these disparate structures become like ill-fitting jigsaw pieces from different puzzles.

As we watch structures and relationships accrete like layers in a pearl, we should begin with a solid grain of common sense and coherence. That is why we chose the Cyc structure as the basis of UMBEL — it provides one such solid, coherent framework for moving forward.

I am not sanguine that ad hoc, free-form ontological structures, created in the same manner as topic-specific articles in Wikipedia or as informal tags in bookmarking systems, will bring such coherence. But who knows? Perhaps on the Web where novelty and the joy of creation and exploration can trump usefulness, such could transpire.

But, when we look to linked data and semantic Web constructs finally achieving their potential in the enterprise to overcome decades of data silos, I suspect purposeful coherence will win the day.

Sustainable ontologies, which themselves can host and interoperate with still further ontologies and structures, will require coherent underpinnings to not collapse from the weight of keeping consistency. Just as our current highways and interstates follow the earlier roads before them, as early trailblazers we have a responsibility to follow the natural contours of our applicable information spaces. And that requires coherence and consistency; in other words, logic and design.

Vocabularies are Not Free Form

In the past few months there has been a remarkable emergence of interest in vocabularies and semantics (as traditionally understood by linguists). Today, for example, marks the kick-off of the first VoCamp get-together in Oxford, England, with interest and discussion active about potentially many others to follow. Peter Mika, Matthias Samwald and Tom Heath have each outlined their desires for this meeting.

I hope the participants in this meeting and others to follow look seriously at the issues of coherence and interoperability and sustainability. My caution is as follows: like tagging and Wikipedia, we have seen amazing contributions from user-generated content that have totally re-shaped our information world. However, we have not yet seen such processes work for structure and coherent conceptual relationships.

I believe participation and UGC have real roles to play in the emergence of coherent structures and vocabularies to enable interoperability. But I also believe they have not done so to date, and useful approaches to those will not emerge in a free-form fashion and without consideration for sustainability and coherence testing.

Putting this Context into Context

In these observations there is absolutely no criticism intended or implied with DBpedia or prior linked data practice. A natural and understandable progression has been followed here: first make connections between things, then begin to surface knowledge through the exploration of relationships. We are just now beginning that exploration through the use of classes, vocabularies and ontologies to explicate relationships. The fact that linked data and DBpedia first emphasized linking things and publishing things is a major milestone. It is now time to move on to the new challenges of structure and relationships.

There is much to be learned from other pathbreaking efforts such as the Open Biomedical Ontologies efforts and their attempts at coordination and standardization. As the demands and interests in interoperability increase, interfaces, consistency and coordination will continue, I believe, to come to the fore.

In another 18 months we will likely look back at today’s issues and thoughts as also naïve in light of new understandings. The pace of discovery and learning is such that I believe best practices will remain fluid for quite some time.


[1] An exact analysis related to our arguments of coherence has not been made, but related studies point to these observations in part or in slightly different contexts.

Wordnets tend to be star-like in structure, with sparse relations, and dominated by a few hub clusters. c.f., Holger Wunsch, 2008. Exploiting Graph Structure for Accelerating the Calculation of Shortest Paths in Wordnets, in Proceedings of Coling 2008, Manchester, UK, August 2008. See http://www.sfs.uni-tuebingen.de/~wunsch/wn-shortest-paths.pdf.

The strict uncorrected structure of Wikipedia categories can also be inconsistent, inaccurate, populated with administrative categories, demonstrate cycles, and lack uniform coverage. c.f., Jonathan Yu, James A. Thom and Audrey Tam, 2007. Ontology Evaluation Using Wikipedia Categories for Browsing, in Proceedings of 16th Conference on Information and Knowledge Management (CIKM), 2007; see http://goanna.cs.rmit.edu.au/~jyu/publications/YuEtal07.pdf. This paper also presents a comprehensive framework for ontology evaluation.

This reference describes a new way to calculate semantic relatedness (not the same as coherence) in relation to Wikipedia, ConceptNet and WordNet: Sander Wubben, 2008. Using Free Link Structure to Calculate Semantic Relatedness. ILK Research Group Technical Report Series no. 08-01, July 2008, 61 pp. See http://ilk.uvt.nl/downloads/pub/papers/wubben2008-techrep.pdf.

Table 3 in this citation presents an interesting contrast between what the authors call collaborative knowledge bases (CKBs, like Wikipedia) and linguistic knowledge basis (LKBs, like WordNet), again however not really addressing the coherence issue: Torsten Zesch and Christof Müller and Iryna Gurevych, 2008. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary, in Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), May 28-30, Marrakech, Morocco. See http://elara.tk.informatik.tu-darmstadt.de/publications/2008/lrec08_camera_ready.pdf. Also see Torsten Zesch and Iryna Gurevych, 2007. Analysis of the Wikipedia Category Graph for NLP Applications, in Proceedings of the Workshop TextGraphs-2: Graph-Based Algorithms for Natural Language Processing at HLT-NAACL 2007, 26 April, 2007, Rochester, NY, pp. 1-8. See http://www.aclweb.org/anthology-new/W/W07/W07-02.pdf.

Posted: September 22, 2008

The Linkage of UMBEL’s 20,000 Subject Concepts and Inferencing Brings New Capabilities

Thanks to Kingsley Idehen and OpenLink Software, DBpedia has been much enriched with its mapping to UMBEL‘s 20,000 class-based subject concepts. DBpedia is the structured data version of Wikipedia that I (among many) wrote about in depth in April of last year shortly after its release.

We have also recently gotten an updated estimate of the size of the semantic Web and a new release of the linking open data (LOD) cloud diagram.

A New Instance of the LOD Cloud Diagram

Since DBpedia’s release, it has become the central hub of linked open data as shown by this now-famous (and recently updated!) LOD diagram [1]:

[LOD cloud diagram: click for full size]

Each version of the diagram adds new bubbles (datasets) and new connections. The use of linked data, which is based on the RDF data model and uses Web protocols to name and access data, is proving to be a powerful framework for interconnecting disparate and heterogeneous information. As the diagram above shows, all types of information from a variety of public sources now make up the LOD cloud [2].

A Beginning Basis for Estimating the Size of the Semantic Web

The most recent analysis of this LOD cloud is by Michael Hausenblas and colleagues as presented at I-Semantics08 in September [3]. About 50 major datasets comprising roughly two billion triples and three million interlinks were contained in the cloud at the time of their analysis. They partitioned their analysis into two distinct types: 1) single-point-of-access datasets (akin to conventional databases), such as DBpedia or Geonames, and 2) distributed records characterized by RDF ontologies such as FOAF or SIOC. Their paper [3] should be reviewed for its own conclusions. In general, though, most links appear to be of low value (though a minority are quite useful).

Simple measures such as triples or links have little meaning in themselves. Moreover, and this is most telling, all of the LOD relationships in the diagram above and the general nature of linked data to date have based their connections on instance-level data. Often this takes the form that a specific person, place or thing in one dataset is related to that very same thing in another dataset using the owl:sameAs property; sometimes it is that one person knows another person; or, it may be in other examples that one entry has an associated photo. Entities are related to other entities and their attributes, but little is provided about the conceptual or structural relationships amongst those entities.

Instance-level mappings are highly useful for aggregating various attributes or facts about given entities or things. But they only scratch the surface of the structure that can be made available through linked data and the conceptual relationships between and amongst all of those things. For those relationships to be drawn or inferred, a different level of linkage needs to be made: the class or collection or schema view of the data.

The UMBEL Subject Concept ‘Backbone’

UMBEL, or similar conceptual frameworks, can provide this structural backbone.

UMBEL (Upper Mapping and Binding Exchange Layer; see http://www.umbel.org) is a lightweight reference ontology of about 20,000 subject concepts and their logical and semantic relationships. The UMBEL ontology is a direct derivation of the proven Cyc knowledge base from Cycorp, Inc. (see http://www.cyc.com).

UMBEL’s subject concepts provide mapping points for the many (indeed, millions of) named entities that are their notable instances. Examples might include the names of specific physicists, cities in a country, or a listing of financial stock exchanges. UMBEL mappings enable us to link a given named entity to the various subject classes of which it is a member.

And, because of relationships amongst subject concepts in the backbone, we can also relate that entity to other related entities and concepts. The UMBEL backbone traces the major pathways through the content graph of the Web.

The UMBEL backbone provides structure and relationships at large or small scale. For example, in its full extent, UMBEL’s complete structure resembles:

[UMBEL Big Graph]

But, we can dive into that structure with respect to automobiles or related concepts . . .

[UMBEL Big Graph: Saab]

. . . all the way down to seeing the relationships to Saab cars:

[UMBEL Saab Neighborhood]

It is this ability to provide context through structure and relations that can help organize and navigate large datasets of instances such as DBpedia. Until the application of UMBEL — or any subject or class structure like it — most of the true value within DBpedia has remained hidden.

But no longer.

Some Example Queries

UMBEL already had mapped most DBpedia instances to its own internal classes. By a simple mapping of files and then inferencing against the UMBEL classes, this structure has now been brought to DBpedia itself. Any SPARQL queries applied against DBpedia can now take advantage of these relationships.

Below are some sample queries Kingsley used to announce these UMBEL capabilities to the LOD mailing list [4]. You can test these queries yourself or try alternative ones by using a standard SPARQL query.

For example, go to one of DBpedia’s query endpoints such as http://dbpedia.org/sparql and cut-and-paste one of these highlighted code snippets into the ‘Query text’ box:

Example Query 1

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where

{
?s a umbel:RoadVehicle
}

Example Query 2

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where

{
?s a umbel:Automobile_GasolineEngine
}

Example Query 3

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where

{
?s a umbel:Project
}

Example Query 4

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where

{
?s a umbel:Person
}

Example Query 5

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where

{
?s
a umbel:Graduate;
a umbel:Boxer.
}

Example Query 6

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
prefix yago: <http://dbpedia.org/class/yago/>
select ?s
where

{
?s
a yago:FemaleBoxers;
a umbel:Graduate;
a umbel:Boxer.
}

Creating Your Own Mapping

By going to UMBEL’s technical documentation page at http://umbel.org/documentation.html, you can download the files to create your own mappings (assuming you have a local instance of DBpedia).

The example below also assumes you are using the OpenLink Virtuoso server as your triple store. If you are using a different system, you will need to adjust your commands accordingly.

1. Load linkages (owl:sameAs) between UMBEL named entities and DBpedia resources

File: umbel_dbpedia_linkage.n3

select ttlp_mt (file_to_string_output ('umbel_dbpedia_linkage.n3'), '', 'http://dbpedia.org');

2. Load inferred DBpedia types (rdf:types) based on UMBEL named entities

File: umbel_dbpedia_types.n3

select ttlp_mt (file_to_string_output ('umbel_dbpedia_types.n3'), '', 'http://dbpedia.org');

3. Load Virtuoso-specific file containing the rules for inferencing

File: umbel_virtuoso_inference_rules.n3

select ttlp_mt (file_to_string_output ('umbel_virtuoso_inference_rules.n3'), '', 'http://dbpedia.org/resource/classes/umbel#');

4. Load UMBEL External Ontology Mapping into a Named Graph (owl:equivalentClass)

File: umbel_external_ontologies_linkage.n3

select ttlp_mt (file_to_string_output ('umbel_external_ontologies_linkage.n3'), '', 'http://dbpedia.org/resource/classes/umbel#');

5. Create UMBEL Inference Rules

rdfs_rule_set ('http://dbpedia.org/resource/inference/rules/umbel#', 'http://dbpedia.org/resource/classes/umbel#');
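
Once the rule set is created, a quick sanity check is to re-run one of the earlier example queries against your local endpoint with a small limit (a sketch mirroring Example Query 1 above):

define input:inference 'http://dbpedia.org/resource/inference/rules/umbel#'
prefix umbel: <http://umbel.org/umbel/sc/>
select ?s
where
{
?s a umbel:RoadVehicle
}
limit 10

If the linkage files and rules loaded correctly, this query should return DBpedia resources inferred to be road vehicles via the UMBEL class structure.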

Conclusion

A new era of interacting with DBpedia is at hand. Within a period of just more than a year, the infrastructure and data are now available to show the advantages of the semantic Web based on a linked Web of data. DBpedia has been a major reason for showing these benefits; it is now positioned to continue to do so.


[1] This new LOD diagram is still being somewhat updated based on review. The version shown above is based on the one posted at the W3C’s SWEO wiki with my own updates of the two-way UMBEL links and the blue highlighting of DBpedia and UMBEL. There is also a clickable version of the diagram that will take you to the home references for the constituent data sources in this diagram; see http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18.html.
[2] The objective of the Linking Open Data community is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. All of the sources on the LOD diagram are such open data. However, the best practices of linked data can also be applied to proprietary or intranet information as well; see this FAQ.
[3] See, Michael Hausenblas, Wolfgang Halb, Yves Raimond and Tom Heath, 2008. What is the Size of the Semantic Web?, paper presented at the International Conference on Semantic Systems (I-Semantics08) at TRIPLE-I, Sept. 2008. See http://sw-app.org/pub/isemantics08-sotsw.pdf.
