A Reference for Data Set Interoperability, Look Up and Retrieval
UMBEL is a lightweight way to describe the subject(s) of Web content, akin to the relationship “isAbout”. Its subject reference structure is meant to be simple, universally applicable, and agnostic to the form or schema of source data. UMBEL does not replace formal domain or upper ontologies and has little or no inferential power. It is merely a pool of consensus ‘proxies’ to initially describe what subjects data sets are about.
UMBEL’s design includes binding mechanisms that work with HTML, tagging or other standard practices, including various RDF schema and more formal ontologies. Its reference subject ‘backbone’ is derived from the intersection of common subjects found on popularly used Web sites and other accepted subject references. Access and easy adoption is given preference over inferential or logical elegance.
In addition to its core reference subjects, the UMBEL project will provide look up, query, registration, pinging, and related services. The project is completely open and supported by a community process. All project products are made available without charge under Creative Commons licenses. UMBEL’s development is being backed by a number of leading open data efforts and entities; see the last section for how to get involved.
The UMBEL project stands for the Upper-level Mapping and Binding Exchange Layer. UMBEL is pronounced like “humble” — in keeping with its nature — except without the “h”. The name has the same Latin root as umbrella (umbra for shade, or umbella for parasol), meant to convey the umbrella-like nature of UMBEL’s subject bindings.
What’s the Problem?
With dozens of protocols and hundreds of thousands of potentially useful data sets, there are many challenges to getting Web data to interoperate. Two of these problems are foundational.
First, there are dozens of formalisms, schema, models and serializations for characterizing and communicating data and data content on the Web, ranging from the simplest Web page to the most formal OWL ontologies. A universal mechanism is lacking for how these variations can describe or publish to one other what they are about. This mechanism must be simple, neutral, broadly applicable and widely accepted.
Second, even if this publication mechanism existed, there is no accepted set of subjects for referencing what this diverse content is about. No attempt to date to provide a reference subject structure has been widely accepted.
Combined, these twin problems mean there are few road signs and poor road maps for how to find relevant data sets on the Web. UMBEL provides simple — but necessary — first steps to address these basic problems.
Simple, with Low Expectations
Advocates and users of various models and formalisms on the Web have their real-world reasons for embracing each form. Domain experts and various communities have their own world views, represented by their own vocabularies and structure. Only by understanding and respecting those differences can means to bridge them become widely accepted.
There is, of course, no such thing as complete objectivity or neutrality. But, from the standpoint of UMBEL and its purpose, keeping its approach simple with a minimum of structure poses the least challenge to the world views of existing publishers and data sets on the Web — and therefore the best likelihood of wide acceptance. Where choices are necessary, such as the selection of the reference subjects themselves, building from accepted Web practices and norms helps minimize bias and arbitrariness.
Thus, by necessity, UMBEL must be simple with limited ambitions. Its reference structure is merely a ‘bag of subjects’, with each subject reference only acting as a ‘proxy’ to a set of concepts that specific users may describe and refer to in their own ways. UMBEL’s core structure is completely flat, with no implied hierarchy or structure amongst its reference subjects. UMBEL’s reference subjects are simply that, proxy references and no more.
UMBEL thus has no or minimal inference power (though some disambiguation is possible). Inferencing, usefulness and authoritativeness are the responsibility of others. UMBEL is meant only to be a map to possible subjects, not whether those destinations are worthwhile or, indeed, even correct.
Consensus and Use Determine the Subject Pool
The selection of the actual subject proxies within the UMBEL core are to be based on consensus use. The subjects of existing and popular Web subject portals such as Wikipedia and the Open Directory Project (among others) will be intersected with other widely accepted subject reference systems such as WordNet and library classification systems (among others) in order to derive the candidate pool of UMBEL subject proxies. The actual methodology and sources of this process are still being determined (see further the project specification).
The objective, in any case, is to provide a simple and transparent method for subject selection that reflects current use and consensus to the maximum extent possible. The anticipation is that the first subject candidate pool will number in the many hundreds to the low thousands of proxies.
UMBEL as a general subject ‘backbone’ is meant to be useful as a reference by more specific domains or ontologies, but not fully descriptive for any of them. The core, internal UMBEL ontology is to be based on RDF and written in the RDF Schema vocabulary of SKOS (Simple Knowledge Organization System).
Very simple binding mechanisms will be developed and extended to the most widely employed approaches on the Web. UMBEL will, at minimum, support Atom, microformats, OPML, OWL, RDF, RDFa, RDF Schema, RSS, tags (via Tag Commons), and topic maps in its first release. The simplicity of the ontology and approach will enable other formats to be easily added.
Ping, update and registration protocols will also be provided for these formats. Existing project sponsors already possess a variety of ping, update, conversion and translation utilities for such purposes.
Additional UMBEL Initiatives
Besides the core structure, the UMBEL project will also develop a second ‘unofficial’ structure of hierarchical and interlinked subject relationships. This ‘unofficial’ structure will be used solely for look up and browsing functions, and will reside external to the core UMBEL subject and binding structure. Indeed, we anticipate that many such look-up structures from other parties may evolve over time for specific purposes and viewpoints.
Finally, besides development of the UMBEL ontology, the project will also be providing a data set registration service, information and collaboration Web site, tools clearinghouse, and support for language translations and some tools development.
How to Help and Get Involved
The initial project site is at http://www.umbel.org, including this project introduction, the draft project specification (http://www.umbel.org/proposal.xhtml), and other helpful background information. A more interactive Web site is currently under development and will be announced shortly.
A mailing list you can monitor or join to become part of the project is at http://groups.google.com/group/umbel-ontology.