Objective is to Tackle the ‘Semantics’ Gap in the Semantic Web
Ontotext is the developer of OWLIM, a highly scalable semantic database engine, and KIM, a popular semantic annotation and search platform. Its FactForge and LinkedLifeData services provide the largest curated and interoperable linked data platforms over which inferencing and reasoning may be applied. Ontotext’s major clients include AstraZeneca, BBC and Korea Telecom. Its professional services cover its own technologies, as well as text mining and semantic annotation. Ontotext has notable and longstanding technical partnerships, including with the GATE team and with many of the other leading technology providers and companies in the semantic Web space. We are very pleased to join forces with them.
Semantic ‘Gap’ is Basis of Partnership
Our partnership will focus on development of the next generation of the UMBEL and PROTON ontologies, as well as tools and applications based on them. It was formed to address some of the key semantic ‘gaps’ in the semantic Web, which include:
- inadequate semantics for linking disparate information in ways that recognize inherently different contexts and viewpoints and (often) approximate mappings
- misapplication of many linking predicates, such as owl:sameAs, and
- a lack of coherent reference concepts by which to aggregate and organize this linkable content.
Thanks to the efforts of the W3C (World Wide Web Consortium), we now have the techniques, languages and standards to deliver the “web” portion of the semantic Web. But the practical “semantics” for actually effecting the semantic Web have heretofore been lacking. Early experience with linked data has exposed many poor practices. The lack of approximate linking predicates and reference concepts undercuts our ability to achieve meaningful semantic interoperability.
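To make the owl:sameAs concern concrete, here is a minimal sketch in plain Python, with no RDF library. It treats owl:sameAs as the symmetric, transitive identity relation that OWL defines, and compares it with an approximate link. The sketch rests on several assumptions: skos:closeMatch stands in for an "approximate linking predicate" (it is not necessarily the predicate the partnership will adopt), and the resources a:Paris and b:Paris, the property ex:scope, and the helper functions are all hypothetical.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"            # strict identity: symmetric and transitive
SKOS_CLOSE_MATCH = "skos:closeMatch"  # approximate link, used here as a stand-in


def identity_clusters(triples):
    """Group resources joined by owl:sameAs, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


def facts_about(resource, triples):
    """Return the (predicate, object) pairs that hold for a resource once
    every owl:sameAs link has been treated as full identity."""
    members = {resource}
    for cluster in identity_clusters(triples):
        if resource in cluster:
            members = cluster
    return {(p, o) for s, p, o in triples
            if s in members and p != OWL_SAME_AS}


# Hypothetical data: two datasets describe "Paris" from different viewpoints.
base = [
    ("a:Paris", "ex:scope", "city"),
    ("b:Paris", "ex:scope", "metropolitan area"),
]
strict_links = base + [("a:Paris", OWL_SAME_AS, "b:Paris")]
approximate_links = base + [("a:Paris", SKOS_CLOSE_MATCH, "b:Paris")]

# With owl:sameAs, both viewpoints collapse onto one resource.
print(facts_about("a:Paris", strict_links))
# With an approximate predicate, the link is just another statement.
print(facts_about("a:Paris", approximate_links))
```

In the strict case, a:Paris ends up described as both a city and a metropolitan area, because identity merges the two viewpoints. In the approximate case, each dataset keeps its own description and the relationship is recorded only as a link.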
In forming our partnership, Ontotext and SD will focus attention on this semantics “gap”. We will also be aggressively seeking additional partners and players to join with us on this challenge. My recent outreach to DCMI (the Dublin Core Metadata Initiative) is one example of this commitment; we will be talking with others in the coming weeks.
Linked data and the prospects of the semantic Web are at a critical juncture. While we have seen much growth in the release of linked data, we are still not seeing much uptake (other than some curated pockets). Linkages between datasets are still disappointingly low, and quality of linkages is an issue. The time has come to stop simply shoveling more triples over the fence.
The combination of UMBEL and PROTON offers a powerful blend to address these weaknesses. Our partnership will first provide a logical mapping and consolidated framework based on the two core ontologies. These will be made available as standard ontologies and via open source semantic annotation tools.
UMBEL (Upper Mapping and Binding Exchange Layer) is both a vocabulary for building domain ontologies and a framework of more than 20,000 reference concepts. The UMBEL reference ontology is used to tag information and map existing schema in order to help link content and promote interoperability. UMBEL’s reference concepts and structure are a direct subset extraction of the Cyc knowledge base.
The PROTON ontology (PROTo ONtology) is a basic upper-level ontology that contains about 300 classes and 100 properties, providing coverage of the general concepts necessary for a wide range of tasks, including semantic annotation, indexing, and retrieval of documents. It is domain independent with coverage suitable to encompass any domain or named entity.
This consolidated framework will then be applied to organize and provide a coherent categorization of the Wikipedia online encyclopedia. One expression of this result will be a new version of Ontotext’s FactForge, already the largest and best performing reasoning engine leveraging linked data. This new version will allow easy access to the most central Linking Open Data (LOD) datasets such as DBpedia, Freebase, and Geonames, through the vocabularies of UMBEL and PROTON. Additional applications in linked data mining and general tagging of standard Web content are also contemplated by the partnership.
Ontotext’s proven reasoning technologies and ability to host extremely large knowledge bases with great performance are tremendous boons to the next iteration of UMBEL. We have been seeking large-scale coherency testing of UMBEL for some time and Ontotext is the perfect answer.
Ontotext’s CEO, Atanas Kiryakov, indicated that the company’s interest in UMBEL stemmed from what it saw as some stumbling blocks with linked data while developing FactForge. “The growth and maturation of linked data will require credible ways to orient and annotate the data,” said Kiryakov. “UMBEL is the right scope of comprehensiveness and size to use as one foundation for this,” he said. Ontotext is also the original developer and current maintainer of PROTON, which will also contribute in this role.
What is to Come?
The efforts of the partnership will first be seen with the release of UMBEL v. 0.80 in the next couple of weeks. This update revises many aspects of the ontology based on two years of applied experience and updates it to OWL 2. This basis will then be used for broader mappings and linkages to Wikipedia. Those next mappings are earmarked for UMBEL version 1.00, slated for release by the end of the year. All of these planned efforts will be released as open source.
Among other intended uses, PROTON, UMBEL and FactForge form a layered reference data structure that will be used for data integration within the European Union research project RENDER. The large-scale RENDER project aims to integrate diverse methods in the ways Web information is selected, ranked, aggregated, presented and used.
Beyond that, further relationships and partnerships are being actively sought with players serious about interoperable, high-quality data on the semantic Web. We welcome inquiries or outreach.