Bringing Context through a Meta-Subject Framework for the Web
Today marks the first public release of UMBEL, a lightweight subject concept reference structure for the Web. This version 0.70 release required a full 12 months and many person-years of development effort.
UMBEL (Upper Mapping and Binding Exchange Layer) is a lightweight ontology structure for relating Web content and data to a standard set of 20,000 subject concepts. Its purpose is to provide a fixed set of common reference points in the global knowledge space. These subject concepts have defined relationships between them, and can act as semantic binding nodes for any Web content or data. The UMBEL reference structure is a large, inclusive, linked concept graph.
Connecting to the UMBEL structure gives context and coherence to Web data. In this manner, Web data can be linked, made interoperable, and more easily navigated and discovered. UMBEL is a great vehicle for interconnecting content metadata.
The UMBEL vocabulary defines some important new predicates and leverages existing semantic Web standards. The ontology is provided as Linked Data with Web services access (and pending SPARQL endpoints). Besides its 20,000 subject concepts and relationships distilled from OpenCyc, a further 1.5 million named entities are mapped to that structure. The system is easily extendable.
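To make the idea of subject concepts as binding nodes a little more concrete, here is a minimal sketch in Python. It is not UMBEL's actual data model: the concept names, the `BROADER` and `ENTITY_TO_CONCEPT` tables, and both helper functions are hypothetical and chosen only for illustration. It shows how mapping a named entity to one concept gives it context through that concept's relationships.

```python
from collections import deque

# Hypothetical miniature of a subject concept graph: each concept points to
# the broader concepts it is related to. These names are illustrative only
# and are not taken from the UMBEL or OpenCyc files.
BROADER = {
    "Dog": ["Mammal"],
    "Cat": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": [],
}

# Hypothetical mapping of named entities onto subject concepts.
ENTITY_TO_CONCEPT = {
    "Lassie": "Dog",
    "Felix": "Cat",
}


def context_of(entity):
    """Return the entity's concept plus all broader concepts, nearest first."""
    start = ENTITY_TO_CONCEPT.get(entity)
    if start is None:
        return []
    seen = [start]
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        for parent in BROADER.get(concept, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


def shared_context(entity_a, entity_b):
    """Return the concepts two entities have in common, nearest to entity_a first."""
    context_b = set(context_of(entity_b))
    return [c for c in context_of(entity_a) if c in context_b]


if __name__ == "__main__":
    print(context_of("Lassie"))
    print(shared_context("Lassie", "Felix"))
```

In this toy graph, two otherwise unrelated entities turn out to share context because both are bound to concepts in the same structure, which is the kind of interoperability the reference structure is meant to support.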
Fred Giasson, UMBEL’s co-editor, posts separately on how the UMBEL vocabulary can enrich existing semantic Web ontologies and techniques. Also, see the project’s Web site for additional background and explanatory information on the project.
UMBEL is provided as open source under the Creative Commons 3.0 Attribution-Share Alike license; the complete ontology with all subject concepts, definitions, terms and relationships can be freely downloaded. All subject concepts are Web-accessible as Linked Data URIs.
Access and Documentation
Five volumes of technical documentation are available. The two key volumes explaining the UMBEL project and process are UMBEL Ontology, Vol. A1: Technical Documentation (also online) and Distilling Subject Concepts from OpenCyc, Vol. B1: Overview and Methodology.
A new overview slideshow is also available.
Ontology Access and Download
- The UMBEL ontology namespace is http://umbel.org/ns/ (see Vol. A2 of the Ontology Documentation for access instructions)
- umbel.n3 — describes all of the classes and properties created for the UMBEL Ontology; that is, the basic UMBEL vocabulary and formal specification
- umbel_subject_concepts.n3 — instantiates all of the subject concepts that belong to UMBEL using the RDF/N3 serialization
- umbel_abstract_concepts.n3 — instantiates all of the abstract concepts that belong to UMBEL using RDF/N3 serialization.
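As a rough sketch of how a client might ask for the ontology as RDF, the Python below prepares (but does not send) an HTTP request against the namespace above. The `Accept` value and the helper names `build_rdf_request` and `ns_term` are assumptions for illustration; the media types the server actually honors are described in the project documentation, not here.

```python
from urllib.request import Request

# Namespace as given in the list above.
UMBEL_NS = "http://umbel.org/ns/"


def ns_term(local_name):
    """Build a full URI for a term in the UMBEL namespace (hypothetical helper)."""
    return UMBEL_NS + local_name


def build_rdf_request(uri, accept="text/n3"):
    """Prepare a GET request that asks for an RDF serialization.

    The default Accept value is an assumption; adjust it to whatever
    the server is documented to return.
    """
    return Request(uri, headers={"Accept": accept})


if __name__ == "__main__":
    request = build_rdf_request(UMBEL_NS)
    print(request.get_method(), request.full_url)
    print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the fetch; that step is left out so the sketch runs without network access.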
UMBEL Web Services
There are two input files for Cytoscape, the open source program used for certain large-scale UMBEL visualization and analysis:
- umbel_cytoscape.csv — lists all the nodes and arcs to import into Cytoscape to visualize the UMBEL graph
- umbel_cytoscape.cys — a pre-prepared input file to Cytoscape that includes a force-directed layout of the UMBEL subject concept graph; this is the file that most should use unless you want to re-build from scratch within Cytoscape.
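Before importing a node-and-arc list into Cytoscape, a quick sanity check of the file can be useful. The Python sketch below assumes, purely for illustration, a headerless CSV whose first two columns are the source and target of each arc; the real column layout of umbel_cytoscape.csv is not described here, so check the files documentation before relying on this. The function name `load_arcs` and the sample rows are hypothetical.

```python
import csv
import io


def load_arcs(handle):
    """Read (source, target) rows into a dict mapping each node to its targets.

    Assumes a headerless CSV with source in column 1 and target in column 2
    (an assumption, not the documented layout). Blank or short rows are skipped.
    """
    graph = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        graph.setdefault(source, set()).add(target)
        # Make sure nodes with no outgoing arcs still appear as keys.
        graph.setdefault(target, set())
    return graph


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real file.
    sample = io.StringIO("Dog,Mammal\nCat,Mammal\n\nMammal,Animal\n")
    graph = load_arcs(sample)
    arc_count = sum(len(targets) for targets in graph.values())
    print(len(graph), "nodes,", arc_count, "arcs")
```

To use it on a real file, open the file with `open(path, newline="")` and pass the handle to `load_arcs`.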
The two complete references to all current and archived files and access procedures in the UMBEL project are UMBEL Ontology, Vol. A2: Subject Concepts and Named Entities Instantiation and Distilling Subject Concepts from OpenCyc, Vol. B2: Files Documentation. Finally, the fifth documentation volume accompanying the release is Distilling Subject Concepts from OpenCyc, Vol. B3: Appendices, which provides supporting materials and detailed backup.
Current Editorial Positions
As discussed on the Web site regarding UMBEL’s role, the project has currently adopted two pivotal positions with respect to OpenCyc and its use:
- All UMBEL subject concepts are based on existing concepts in OpenCyc. This means UMBEL inherits the proven structure and relationships extant in OpenCyc.
- No new subject concepts will be added to UMBEL that are not included in OpenCyc. This means that UMBEL’s structure will not diverge from the structural relations already in OpenCyc. This decision preserves the use of UMBEL as a sort of contextual middleware between unstructured Web content and the inferential and tools infrastructure within OpenCyc (and beyond into ResearchCyc and Cyc for commercial purposes) and back again to the Web.
For these positions to be effective, we are putting in place mechanisms for UMBEL to collect and forward community comments regarding the suitability of the subject concept structure, and for Cycorp to deliberate on that input and respond as appropriate to maintain the coherence of the knowledge base.
Fortunately, Cycorp has been supremely responsive to date and has made changes to the OpenCyc concept structure and its conversion to OWL in support of needs and observations brought forth by the UMBEL project. We expect this excellent working relationship to continue.
Setting Realistic Expectations
This version 0.70 release follows the versioning and numbering scheme presented in the supporting documentation. Releasing with a version number below 1.0 also signals the newness and relative immaturity of the system.
This release is the first one in which the UMBEL subject concepts and ontology will be applied as a real vocabulary in public settings. Some areas are known to be weaker and less complete than others. Some, such as the coverage of Internet and Web topics particular to domain experts, are relatively sparse. Others, such as the organization of science and academic disciplines, have seen much improvement, but more is necessary. Still other areas will certainly surface as warranting better subject concept coverage.
Mechanisms for user feedback are being put in place, and discussion is always welcome at the project’s discussion forum and mailing list. We anticipate rapid changes and versioning over the next six months or so, which is also roughly the forecast horizon for the first production-grade version 1.0.
Contributions and Thanks
A number of individuals and organizations have contributed significantly to this release, for which the project offers hearty thanks.
|Zitgist LLC has been the major source of staff time and hosting services to the project. Two of Zitgist’s principals, Mike Bergman and Fred Giasson, have acted as editors on the UMBEL project. Zitgist also has contributed nearly two person-years of effort to the project. Zitgist intends to continue leading and managing the project with a substantial future commitment of time and effort.|
|OpenLink Software has been the major source of infrastructure, financing and software for the project. OpenLink’s Virtuoso virtual data management system is the hosting software environment for UMBEL and its Web services. Kingsley Idehen, CEO and President of OpenLink, has been a key source of inspiration for the project.|
|Cycorp is the developer of the Cyc knowledge base, with more than 1,000 person-years of effort behind it, from which the OpenCyc open source version is derived. Since the initial selection of OpenCyc for UMBEL, Cycorp staff have devoted many person-months of effort to help explain the underlying system and, then, most recently, to make improvements and revisions to OpenCyc and its OWL version in response to project input. Larry Lefkowitz, VP of business development, has been a very effective interface with the project.|
|YAGO is a project from Fabian Suchanek, Gjergji Kasneci and Gerhard Weikum of the Max-Planck-Institute for Computer Science, Saarbruecken, Germany. It is based on extracting and organizing entities from Wikipedia according to the WordNet concept structure. YAGO demonstrated the methodology for how to replace the native Wikipedia structure with alternate external structures and provided the starting set of named entities used within UMBEL. Fabian has been especially helpful in data, software and methodology support to the project.|
|The Cyc Foundation and its members have been devoted to Web exposure of OpenCyc and have provided great guidance to the project in learning and navigating the knowledge base. Their concepts browser and other Web services have also been extremely helpful to the project’s initial ideas and testing. Mark Baltzegar and John De Oliveira, the two lead directors of the Cyc Foundation, have been particularly helpful.|
|Moritz Stefaner is one of the innovators and rising stars in large-scale data visualization. Moritz has kindly contributed his cool Flash explorer implementation used in UMBEL’s Subject Concept Explorer and continues to make ongoing improvements to UMBEL’s visualization. Moritz’s Web site and separate blog are each worth perusing for neat graphics and ideas.|
Thanks, all of you! This is a day we have worked long and hard to see come to reality. As Fred puts it, let the fun begin!