Part 1 of 4 on Foundations to UMBEL
UMBEL (Upper-level Mapping and Binding Exchange Layer) is a lightweight ontology for relating Web content and data to a standard set of subject concepts. It is being designed to apply to all types of data on the Web from RSS and Atom feeds to tagging to microformats and topic maps to RDF and OWL (among others). The project Web site is at http://www.umbel.org.
UMBEL was first announced in July 2007 and has been a direct subject of these prior posts:
- Announcing UMBEL: A Lightweight Subject Structure for the Web (July 12, 2007)
- Erecting Road Signs on the Structured Web (August 8, 2007)
- A Data Model of Web Data Models: Part I (October 10, 2007)
- So, What Might The Web’s Subject Backbone Look Like? (February 20, 2008)
However, much internal development and refinement has occurred since then, especially in the past few months [1]. Over the next few days, this posting, a re-introduction to UMBEL, will be followed by these additional parts:
- Part 2: UMBEL: Making Linked Data Classy
- Part 3: Subject Concepts and Named Entities, and
- Part 4: Basing UMBEL’s Backbone on OpenCyc.
These articles are lead-ins to the discussion of the actual UMBEL ontology that will soon follow.
Purpose
UMBEL has two purposes: 1) to provide a lightweight structure of subject concepts as a reference to what Web content or data “is about”, what SKOS calls a concept scheme [4]; and 2) to define a variety of binding protocols for different Web data formats to map to this “backbone.” The project’s immediate priority is to create this reference backbone [2]. That is the focus of these current postings.
Think of the backbone as a set of roadsigns to help find related content. UMBEL is like a map of an interstate highway system, a way of getting from one big place to another. Once in the right vicinity, other maps (or ontologies), more akin to detailed street maps, are then necessary to get to specific locations or street addresses.
By definition, these more fine-grained maps are beyond UMBEL’s scope. But UMBEL can help provide the context for placing such detailed maps in relation to one another and in relation to the Big Picture of what related content is about.
These subject concepts also provide the mapping points for the many, many thousands (indeed, millions) of specific named entities that are the notable instances of these subject concepts. Examples might include the names of specific physicists, cities in a country, or a listing of financial stock exchanges. UMBEL mappings enable us to link a given named entity to the various subject classes of which it is a member.
And, because of relationships amongst subject concepts in the backbone, we can also relate that entity to other related entities and concepts. The UMBEL backbone traces the major pathways through the content graph of the Web. For some visualizations of this subject graph, see the earlier post “So, What Might The Web’s Subject Backbone Look Like?”
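To make the mapping idea concrete, here is a minimal Python sketch. It is only an illustration: the concept names, the entities, and the `BROADER` and `ENTITY_CLASSES` tables are hypothetical and are not drawn from the UMBEL ontology itself. It shows how a named entity, once mapped to one subject concept, picks up the broader concepts above it, and how shared concepts relate otherwise unconnected entities.

```python
# Toy illustration only: concept names, entities, and links are hypothetical,
# not taken from the UMBEL ontology.

# Each subject concept points to its broader (parent) subject concepts.
BROADER = {
    "Physicist": {"Scientist"},
    "Scientist": {"Person"},
    "City": {"PopulatedPlace"},
}

# Each named entity is mapped to the subject concept(s) it is an instance of.
ENTITY_CLASSES = {
    "Albert Einstein": {"Physicist"},
    "Marie Curie": {"Physicist"},
    "Charles Darwin": {"Scientist"},
    "Quebec City": {"City"},
}


def subject_classes(entity):
    """Return every subject concept the entity belongs to, direct or inherited."""
    found = set()
    pending = list(ENTITY_CLASSES.get(entity, ()))
    while pending:
        concept = pending.pop()
        if concept not in found:
            found.add(concept)
            # Walk upward through the backbone to the broader concepts.
            pending.extend(BROADER.get(concept, ()))
    return found


def related_entities(entity):
    """Return other entities that share at least one subject concept."""
    mine = subject_classes(entity)
    return {
        other
        for other in ENTITY_CLASSES
        if other != entity and mine & subject_classes(other)
    }


print(sorted(subject_classes("Albert Einstein")))
print(sorted(related_entities("Albert Einstein")))
```

In this toy graph the physicist is related to the other scientists through the shared concepts, while the city is not, which is the kind of context the backbone is meant to supply.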
Relation to Linked Data
Today, the actual linkages in Linked Data, the first meaningful expression of the semantic Web, occur almost exclusively via direct sameAs relationships between instances. An easy way to think of one of these notable instances is as the topic of a specific article in Wikipedia. People, places, important historical events, and so forth are all examples of such named entities.
Current Linked Data is therefore useful for linking data for given instances from different data sources (say, for combining political, demographic and mapping information for a geographic place like Quebec City). But these instance-level links lack context and a conceptual framework for inferencing or determining relatedness between concepts or in relation to other instances. For these purposes, Linked Data needs a class structure (see Part 2).
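The limitation can be seen in a small Python sketch. Everything in it is assumed for illustration: the `politics:`, `census:` and `maps:` identifiers, the `SAME_AS` pairs, and the placeholder record values are hypothetical, not real Linked Data. It follows sameAs links to gather everything known about one instance.

```python
# Hypothetical identifiers and placeholder values, for illustration only.
SAME_AS = [
    ("politics:QuebecCity", "census:QuebecCity"),
    ("census:QuebecCity", "maps:QuebecCity"),
]

RECORDS = {
    "politics:QuebecCity": {"political_info": "<placeholder>"},
    "census:QuebecCity": {"demographic_info": "<placeholder>"},
    "maps:QuebecCity": {"mapping_info": "<placeholder>"},
    "maps:Montreal": {"mapping_info": "<placeholder>"},
}


def same_as_group(start):
    """Collect every identifier reachable from start via sameAs links."""
    group, pending = set(), [start]
    while pending:
        node = pending.pop()
        if node in group:
            continue
        group.add(node)
        # sameAs is symmetric, so follow each pair in both directions.
        for left, right in SAME_AS:
            if left == node:
                pending.append(right)
            elif right == node:
                pending.append(left)
    return group


def merged_record(start):
    """Combine the records of all identifiers that denote the same instance."""
    merged = {}
    for identifier in sorted(same_as_group(start)):
        merged.update(RECORDS.get(identifier, {}))
    return merged


print(sorted(merged_record("politics:QuebecCity")))
print(same_as_group("maps:Montreal"))
```

The merge works well for the one place, but no chain of sameAs links ever leads from the Quebec City identifiers to `maps:Montreal`. Saying that the two are both cities requires a class, which instance-level links alone do not provide.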
As noted, UMBEL’s class structure is based on subject concepts, which are a distinct subset of the more broadly understood concept [3], such as that used in the SKOS RDFS controlled vocabulary [4], conceptual graphs, formal concept analysis or the very general concepts common to many upper ontologies [5]. We define subject concepts as a special kind of concept: namely, ones that are concrete, subject-related and non-abstract [6].
UMBEL contrasts its subject concepts with abstract concepts and with named entities. Abstract concepts represent abstract or ephemeral notions such as truth, beauty, evil or justice, or are thought constructs that are useful for organizing or categorizing things but are not readily seen in the experiential world. Named entities are the real things or instances in the world that are themselves natural and notable instances (members) of subject concepts (classes). More detailed discussion of this terminology is provided in Part 3.
Basic Approach
UMBEL thus sets for itself objectives that include an identification of subject concepts and their relationships; a premise of emphasizing representational concepts over unattainable precision or exactitude; and a means for relating any notable thing of the world to that structure. Moreover, meeting these objectives should be based on best systems and practices, informed where possible by social acceptance and consensus.
W-O-W-Y is the shorthand we apply to the semantic framework for meeting these UMBEL objectives. W-O-W-Y is derived from the constituent UMBEL building blocks of WordNet (W) [7], OpenCyc (O), Wikipedia (W) [8] and YAGO (Y) [9]. Each resource contributes in a different way.
Via the WOWY framework, OpenCyc provides the basis for the reference subject backbone (Part 4), WordNet (supplemented by others) provides the “synsets” for relating natural language nouns and phrases to these concepts, and Wikipedia as processed by YAGO (among a growing list of other resources) provides the starting dictionary of relevant named entities important to the Web public.
The initial UMBEL ontology contains roughly 21,000 subject concepts distilled from OpenCyc and vetted for their relational structure. A further 1.5 million named entities are also currently mapped to that structure.
Coincident with the pending release of the draft UMBEL ontology of these subject concepts will be a multi-volume technical report that details the exact mapping and vetting procedures used.
While I’m far from convinced by the upper ontology approach – it’s not very webby – something similar (but less philosophical!) could be very useful. Anything that stands a chance of increasing data linkage is worth a try. Right now DBpedia/Wikipedia seems to be the hub, so if I were you I’d make connecting up with that a priority. Like DBpedia/Wikipedia I reckon you’ve jumped a big hurdle with “a premise of emphasizing representational concepts over unattainable precision or exactitude”.
Couple of random refs you probably already have: SUO WG, Sowa’s Top-Level Categories – his book’s very good but hard work.
Hi Danny,
The question is: the hub of what? DBpedia/Wikipedia are mainly describing named entities, that is, individuals. In fact, in LOD, the only property that links “documents” and “papers” is owl:sameAs. As you know, this property only says that two “Individuals” are the same. But what does this really say about the state of LOD? It says that currently, LOD only “links” the individuals of classes; and this is clearly the case with DBpedia.
UMBEL goes one step further: it links classes of individuals. As will be explained in the second part that Mike will publish tomorrow, UMBEL is also about how to CLASSify the linked data spaces.
And our third article tries to answer the question: why did we use Cyc as the reference framework?
Some will like it, others won’t. In fact, we welcome alternatives, arguing that ANY form of context is better than none.
I can already think of comments such as: why this and this? What is the utility? DBpedia is already linked to OpenCyc and has classes with categories from Wikipedia… and such. But believe me, we didn’t put all that work into this without knowing these things, and there are reasons why we continued in this direction and why we will release this work in the next weeks and months.
Take care,
Fred