Posted: January 31, 2011

Making Connections Real

Refining UMBEL’s Linking and Mapping Predicates with Wikipedia

We are only days away from releasing the first commercial version, 1.00, of UMBEL (Upper Mapping and Binding Exchange Layer) [1]. To recap, UMBEL has two purposes, both aimed at promoting the interoperability of Web-accessible content. First, it provides a general vocabulary of classes and predicates for describing domain ontologies and external datasets. Second, UMBEL is a coherent framework of 28,000 broad subjects and topics (the “reference concepts”), which can act as binding nodes for mapping relevant content.

This latest iteration of development has focused on a real-world test: mapping UMBEL to Wikipedia [2]. The result, to be described more fully upon release, has led to two major changes. First, it has expanded the core UMBEL reference concepts to about 28,000. Second, it has led us to add and refine the mapping predicates UMBEL needs to fulfill its purpose as a reference structure for external resources. This latter change is the focus of this post.

There is a huge diversity of organizational structures and world views on the Web; the linking and mapping predicates that serve this purpose must also capture that diversity. Relations between things on the Web can range from exact identity to the approximate, descriptive and casual [3]. The more than 16,000 direct mappings now made between UMBEL and Wikipedia (linking more than 2 million Wikipedia pages) provide a real-world test of how to capture this diversity. The need is to find the range of predicates that can capture quality, accurate mappings. Further, because mappings can be made with a variety of techniques, from manual to automatic, it is important to characterize the specific mapping method used whenever a linking predicate is assigned. Such qualifications help distinguish mapping trustworthiness and enable later segregation of mappings when improved methods arise.

As a result, the UMBEL Vocabulary now has a well-vetted and diverse set of linking and mapping predicates. Guidelines for how these predicates differ, how they are used, and how they are qualified are described next.

A Comparison of Mapping Predicates

Properties for linking and mapping need to differ in more than name or intended use. They must represent differences that affect inferences and reasoners, and that specific utilities, user interfaces and other applications can act upon. Furthermore, the set of mapping predicates should capture the diverse types of mappings and linkages possible between disparate sources.

Sometimes things are individuals or instances; other times they are classes or groupings of similar things. Sometimes things are of the same kind, but not exactly aligned. Sometimes things are unlike, but related in a common way. (Everything in Britain, for example, is a British “thing,” even though such things may be as different as trees, dead kings or cathedrals.) Sometimes we want to say something about a thing, such as an animal’s fur color or age, as a way to further characterize it, and so on.

The OWL 2 language and other semantic Web languages give us some tools and vocabulary to capture this diversity. How these options compare with the new predicates defined for UMBEL’s purposes is shown in this table:

| Property | Relative Strength | Usage | Standard Reasoner? | Inverse Property? | It Is a Kind of Thing | It Relates to a Kind of Thing | Symmetrical? | Transitive? | Reflexive? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| owl:equivalentClass | 10 | equivalence | X | N/A | class | class | yes | yes | yes |
| owl:sameAs | 9 | identity | X | N/A | individual | individual | yes | yes | yes |
| rdfs:subClassOf | 8 | subset | X | | class | class | no | yes | yes |
| umbel:correspondsTo | 7 | ~equivalence | + / – | | anything | RefConcept | yes | yes | yes |
| skos:narrowerTransitive | 6 | hierarchical | X | | skos:Concept | skos:Concept | no | yes | no |
| skos:broaderTransitive | 6 | hierarchical | X | | skos:Concept | skos:Concept | no | yes | no |
| rdf:type | 5 | membership | X | | anything | class | no | no | no |
| umbel:isAbout | 4 | topical | | X | anything | RefConcept | perhaps | not likely | not likely |
| umbel:isLike | 3 | similarity | | | anything | anything | yes | no | not likely |
| umbel:relatesToXXX | 2 | relationship | | | anything | SuperType | no | no | not likely |
| umbel:isCharacteristicOf | 1 | attribute | | X | anything | RefConcept | no | no | no |

I discuss each of these predicates below. But, first, let’s discuss what is in this table and how to interpret it [4].

  • Relative strength – an arbitrary value meant to capture the inferencing power (entailments) embodied in the predicate. Identity (equivalence), class implications, and specific predicate characteristics that reasoners can act upon are given higher relative power.
  • Standard reasoner? – indicates whether standard reasoners [5] draw inferences and entailments from the specific property. A “+ / –” entry means that reasoners do not recognize the property per se, but can act upon the characteristics (such as symmetric, transitive or reflexive) used to define it.
  • Inverse property? – indicates whether UMBEL uses an inverse property that is not listed in the table. In such cases, the predicate shown is the one that treats the external entity as the subject.
  • It is a kind of thing – the same as domain; the kind of thing to which the subject belongs.
  • It relates to a kind of thing – the same as range; the kind of thing to which the object belongs.
  • Symmetrical? – whether a predicate that holds in an s – p – o (subject – predicate – object) relationship also holds in the o – p – s direction.
  • Transitive? – whether the predicate links two resources A and C whenever it links A with B and B with C for some resource B.
  • Reflexive? – whether every resource is related to itself by the predicate (x – p – x). Equivalence, subset, greater than or equal to, and less than or equal to relationships are reflexive; not equal, less than and greater than relationships are not.
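To make the symmetric, transitive and reflexive characteristics concrete, the following is a minimal Python sketch, not part of UMBEL or of any reasoner, that checks each of them on a finite relation stored as a set of (subject, object) pairs. The function names are illustrative assumptions; real reasoners derive these behaviors from declared property axioms rather than by testing asserted data.

```python
from itertools import product

def is_symmetric(pairs):
    """True if every (s, o) in the relation also appears as (o, s)."""
    return all((o, s) in pairs for s, o in pairs)

def is_transitive(pairs):
    """True if (a, b) and (b, c) in the relation always imply (a, c)."""
    return all(
        (a, c) in pairs
        for (a, b), (b2, c) in product(pairs, repeat=2)
        if b == b2
    )

def is_reflexive(pairs, domain):
    """True if every resource in `domain` is related to itself."""
    return all((x, x) in pairs for x in domain)

# Illustrative data: a subclass-style relation (including the self-pairs that
# reflexivity requires) and a strict "less than" relation.
classes = {"A", "B", "C"}
sub = {("A", "B"), ("B", "C"), ("A", "C")} | {(x, x) for x in classes}
less = {(1, 2), (2, 3), (1, 3)}

print(is_symmetric(sub), is_transitive(sub), is_reflexive(sub, classes))  # False True True
print(is_symmetric(less), is_transitive(less), is_reflexive(less, {1, 2, 3}))  # False True False
```

The subclass-style relation passes the transitive and reflexive checks but not the symmetric one, matching the rdfs:subClassOf row in the table; the strict “less than” relation is transitive but neither symmetric nor reflexive.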

The Usage metric is described for each property below.

Individual Predicates Discussion

To further aid the understanding of these properties, we can also group them into equivalence, membership, approximate or descriptive categories.

Equivalent Properties

Equivalent properties are the most powerful available, since whatever is asserted of one resource also holds for the other.

owl:equivalentClass

Equivalent class means that two classes have the same members; each is a sub-class of the other. The classes may differ in terms of annotations defined for each of them, but otherwise they are axiomatically equivalent.

An owl:equivalentClass assertion is the most powerful available because of its ability to ‘Explode the Domain’ [6]. Because of its entailments, owl:equivalentClass should be used with great care.
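The “explode the domain” effect can be seen in a small, hypothetical Python sketch (all class and individual names are invented for illustration): rdf:type memberships are propagated across owl:equivalentClass links, treating those links as symmetric and transitive as the table indicates. A single equivalence assertion is enough to pull every member of one class into the other, and chained equivalences carry them further still.

```python
def entail_types(types, equivalences):
    """Propagate rdf:type memberships across owl:equivalentClass links.

    types: dict of individual -> set of asserted classes.
    equivalences: iterable of (class_a, class_b) equivalentClass pairs.
    Returns a new dict that also contains the inferred memberships.
    """
    # Equivalence is symmetric, so record each link in both directions.
    eq = {}
    for a, b in equivalences:
        eq.setdefault(a, set()).add(b)
        eq.setdefault(b, set()).add(a)

    inferred = {}
    for individual, classes in types.items():
        seen, stack = set(classes), list(classes)
        # Follow chains of equivalences (transitivity) until nothing new appears.
        while stack:
            for other in eq.get(stack.pop(), ()):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        inferred[individual] = seen
    return inferred

# Hypothetical data: one RefConcept aligned to two external classes.
types = {"ex:Rex": {"rc:Dog"}, "ex:Tom": {"ex:Cat"}}
equivalences = [("rc:Dog", "ext1:Canine"), ("ext1:Canine", "ext2:DomesticDog")]
print(sorted(entail_types(types, equivalences)["ex:Rex"]))
# ['ext1:Canine', 'ext2:DomesticDog', 'rc:Dog']
```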

owl:sameAs

The owl:sameAs assertion claims that two instances are the same individual. This assertion also carries strong entailments of symmetry, transitivity and reflexivity.

owl:sameAs is often misapplied [7]. Because of its entailments, it too should be used with great care. When there are doubts about claiming this strong relationship, UMBEL has the umbel:isLike alternative (see below).
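The risk of misapplying owl:sameAs can be illustrated with a union-find structure, a common way to merge co-referent individuals into one identity. In this hypothetical Python sketch, two loose sameAs links, each made only because the names match, are enough by transitivity to merge a football coach and a race car driver who share the name Jimmy Johnson into a single individual. The resource names are invented.

```python
class SameAsIndex:
    """Union-find over individuals linked by owl:sameAs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's identity set."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        """Record a owl:sameAs b by merging their identity sets."""
        self.parent[self.find(a)] = self.find(b)

    def identical(self, a, b):
        return self.find(a) == self.find(b)

# Hypothetical resources: each link was made only because the names match.
idx = SameAsIndex()
idx.same_as("ex:JimmyJohnson_Coach", "ex:JimmyJohnson_Name")
idx.same_as("ex:JimmyJohnson_Name", "ex:JimmyJohnson_Driver")
print(idx.identical("ex:JimmyJohnson_Coach", "ex:JimmyJohnson_Driver"))  # True
```

Expressing those same links with umbel:isLike, which is not transitive, would keep the coach and the driver distinct.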

Membership and Hierarchical Properties

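Membership properties assert that an instance is a member of a class; hierarchical properties assert that one class or concept is more specific or more general than another.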

rdfs:subClassOf

The rdfs:subClassOf assertion states that one class is a subset of another class. This assertion is transitive and reflexive. It is a key means for asserting hierarchical or taxonomic structures in an ontology. It also carries strong entailments: every member of the subclass is also a member of the superclass, and of each of that class’s own superclasses.

When asserting this relationship, care must be taken that every member of the subclass truly falls within the scope of the superclass. When correctly asserted, however, this is one of the most powerful means to establish a reasoning structure in an ontology because of its transitivity.
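A minimal Python sketch of the subClassOf entailment, assuming the class hierarchy is held as a simple dict of direct superclasses (all class names hypothetical): the reflexive, transitive closure of rdfs:subClassOf is computed, and an instance typed with a subclass is entailed to belong to every superclass. This is also why a mistaken subClassOf link is costly, since every member inherits the error.

```python
def superclasses(cls, sub_class_of):
    """Classes reachable from `cls` via rdfs:subClassOf, including `cls`
    itself, since the relation is reflexive as well as transitive."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def inferred_types(asserted, sub_class_of):
    """Every member of a subclass is entailed to be a member of its superclasses."""
    out = set()
    for cls in asserted:
        out |= superclasses(cls, sub_class_of)
    return out

# Hypothetical hierarchy: direct rdfs:subClassOf links only.
sub_class_of = {
    "ex:Cathedral": ["ex:ChurchBuilding"],
    "ex:ChurchBuilding": ["ex:Building"],
    "ex:Building": ["ex:Facility"],
}
print(sorted(inferred_types({"ex:Cathedral"}, sub_class_of)))
# ['ex:Building', 'ex:Cathedral', 'ex:ChurchBuilding', 'ex:Facility']
```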

skos:narrowerTransitive/skos:broaderTransitive

Both of these predicates work on skos:Concept (recall that umbel:RefConcept is itself a subclass of skos:Concept). The predicates state a hierarchical link between two concepts, indicating that one is in some way more general (“broader”) than the other (“narrower”). skos:broaderTransitive (or its inverse, skos:narrowerTransitive) is used to infer the transitive closure of the hierarchical links, which can then be used to access direct or indirect hierarchical links between concepts.

The transitive relationship means that there may be intervening concepts between the two stated resources, making the relationship an ancestral one, and not necessarily (though it is possible to be so) a direct parent-child one.
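As a hypothetical sketch of how the transitive closure exposes ancestral links, the Python function below walks asserted skos:broader links breadth-first and reports each ancestor with its distance in hops. A distance of 1 is a direct parent-child link; larger distances are the indirect links that only skos:broaderTransitive makes visible. The concept names are invented, and the starting concept is excluded because the table marks these predicates as not reflexive.

```python
from collections import deque

def broader_transitive(concept, broader):
    """Return {ancestor: hops} for every concept reachable via skos:broader.

    broader: dict of concept -> list of directly broader concepts.
    hops == 1 marks a direct parent; larger values are ancestral links.
    """
    hops, queue = {}, deque([(concept, 0)])
    while queue:
        node, dist = queue.popleft()
        for parent in broader.get(node, ()):
            # Breadth-first order records the shortest distance to each ancestor.
            if parent not in hops and parent != concept:
                hops[parent] = dist + 1
                queue.append((parent, dist + 1))
    return hops

# Hypothetical direct skos:broader links.
broader = {"rc:Oak": ["rc:Tree"], "rc:Tree": ["rc:Plant"], "rc:Plant": ["rc:Organism"]}
print(broader_transitive("rc:Oak", broader))
# {'rc:Tree': 1, 'rc:Plant': 2, 'rc:Organism': 3}
```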

rdf:type

The rdf:type assertion assigns instances (individuals) to a class. While the idea is straightforward, it is important to understand the intensional nature of the target class to ensure that the assignment conforms to the intended class scope. When this determination cannot be made, one of the more approximate UMBEL predicates (see below) should be used.

Approximation Properties

For one reason or another, the precise assertions of the equivalence or membership properties above may not be appropriate. For example, we might not know an intended class scope well enough, or there might be ambiguity about the identity of a specific entity (is it Jimmy Johnson the football coach, race car driver, fighter, local plumber or someone else?). Along a spectrum of relatedness, one option is to assign a predicate meant to represent the same kind of thing, without knowing whether the relationship is an equivalence (identity, or sameAs), a subset, or merely membership. Alternatively, we may recognize that we are dealing with different things, but want to assert a relationship of an uncertain nature.

This section presents the UMBEL alternatives for these different kinds of approximate predicates [4].

umbel:correspondsTo

The most powerful of these approximate predicates in terms of alignment and entailments is the umbel:correspondsTo property. This predicate is the recommended option if, after looking at the source and target knowledge bases [8], we believe we have found the best equivalent relationship, but do not have the information or assurance to assign one of the relationships above. So, while we are sure we are dealing with the same kind of thing, we may not have full confidence to be able to assign one of these alternatives:

   rdfs:subClassOf
   owl:equivalentClass
   owl:sameAs
   superClassOf (the inverse of rdfs:subClassOf)

Thus, relative to existing and commonly used predicates, we want an umbrella property that is roughly equivalent in nature and that, if the relationship were known precisely, might resolve to one of the above relations; but we lack the certainty to choose one of them or to assert full “sameness”. This is not too dissimilar from the rationale being tested for the x:coref predicate in relation to owl:sameAs by the UMBC Ebiquity group [9,10].

The property umbel:correspondsTo is thus used to assert a close correspondence between an external class, named entity, individual or instance and a Reference Concept class. It asserts this correspondence on the basis of both subject matter and intended scope.

This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.
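The exact triples UMBEL uses for this reification are not given here, so the following Python sketch shows only one plausible pattern. It assumes standard RDF reification (rdf:Statement with rdf:subject, rdf:predicate and rdf:object) to attach a umbel:hasMapping qualifier to a umbel:correspondsTo assertion. The namespace IRIs, the ex: resources and the umbel:ManualNearlyEquivalent qualifier name are assumptions for illustration, not published UMBEL identifiers.

```python
from dataclasses import dataclass

# Prefix declarations; the umbel: and rc: IRIs are assumed, ex: is a placeholder.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix umbel: <http://umbel.org/umbel#> .\n"
    "@prefix rc: <http://umbel.org/umbel/rc/> .\n"
    "@prefix ex: <http://example.org/> .\n"
)

@dataclass(frozen=True)
class Mapping:
    subject: str    # external resource being mapped
    predicate: str  # linking predicate, e.g. "umbel:correspondsTo"
    obj: str        # target Reference Concept
    qualifier: str  # Qualifier instance describing the mapping method

    def to_turtle(self, stmt_id):
        """Return the mapping triple plus a reified statement carrying
        umbel:hasMapping, as Turtle text."""
        return (
            f"{self.subject} {self.predicate} {self.obj} .\n"
            f"ex:{stmt_id} a rdf:Statement ;\n"
            f"    rdf:subject {self.subject} ;\n"
            f"    rdf:predicate {self.predicate} ;\n"
            f"    rdf:object {self.obj} ;\n"
            f"    umbel:hasMapping {self.qualifier} .\n"
        )

m = Mapping("ex:Category_Dogs", "umbel:correspondsTo", "rc:Dog",
            "umbel:ManualNearlyEquivalent")  # hypothetical qualifier IRI
print(PREFIXES + m.to_turtle("stmt1"))
```

Placing the qualifier on the reified statement, rather than on either resource, lets one external resource carry several mappings made by different methods.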

umbel:isAbout

In most uses, umbel:isAbout is the most prevalent linking property. This predicate is useful when tagging external content with metadata for alignment with an UMBEL-based reference ontology. The reciprocal assertion, umbel:isRelatedTo, is used when an assertion from within an UMBEL vocabulary to an external ontology is desired, that is, where the reference vocabulary itself needs to refer to an external topic or concept.

The umbel:isAbout predicate does not have the same level of confidence or “sameness” as the umbel:correspondsTo property. It may also reflect an assertion that is more like rdf:type, but without the confidence of class membership.

The property umbel:isAbout is thus used to assert the relation between an external named entity, individual or instance with a Reference Concept class. It can be interpreted as providing a topical assertion between an individual and a Reference Concept.

This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.

umbel:isLike

The property umbel:isLike is used to assert an associative link between similar individuals who may or may not be identical, but are believed to be so. This property is not intended as a general expression of similarity, but rather the likely but uncertain same identity of the two resources being related.

This property may be considered an alternative to owl:sameAs where sameness is not certain, and/or when it is desirable to assert a degree of overlap via the umbel:hasMapping reification predicate. The assertion can and should be replaced with a more precise one (such as owl:sameAs) once sameness of identity is subsequently determined.

It is appropriate to use this property when there is strong belief that the two resources refer to the same individual with the same identity, but that association cannot be asserted at the present time with full certitude.

This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.
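The practical difference between umbel:isLike and owl:sameAs lies in the table’s symmetric and transitive columns. In this hypothetical Python sketch, isLike links are closed under symmetry only, so a chain of isLike links does not merge identities; links that a reviewer later confirms are then promoted to owl:sameAs. The `confirmed` set stands in for whatever vetting process is used and is an assumption, as are the resource names.

```python
def is_like_closure(pairs):
    """Symmetric closure only: umbel:isLike is symmetric but not transitive,
    so a isLike b and b isLike c do not yield a isLike c."""
    return set(pairs) | {(o, s) for s, o in pairs}

def promote(pairs, confirmed):
    """Split isLike links into (still uncertain, promoted to owl:sameAs).

    A link is promoted when either of its directions has been confirmed.
    """
    same_as = {p for p in pairs if p in confirmed or (p[1], p[0]) in confirmed}
    return set(pairs) - same_as, same_as

links = {("ex:a", "ex:b"), ("ex:b", "ex:c")}
closure = is_like_closure(links)
print(("ex:b", "ex:a") in closure, ("ex:a", "ex:c") in closure)  # True False
print(promote(links, confirmed={("ex:b", "ex:a")}))
```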

umbel:relatesToXXX

At a different point along this relatedness spectrum we have unlike things that we would like to relate to one another. It might be an attribute, a characteristic or a functional property of something that we care to describe. Further, by the nature of the thing we are relating, we may also be able to describe the kind of thing we are relating. The UMBEL SuperTypes (among many other options) give us one such means to characterize the thing being related.

UMBEL presently has 31 predicates for these assertions relating to a SuperType [11]. The various properties designated by umbel:relatesToXXX are used to assert a relationship between an external instance and a particular (XXX) SuperType. The assertion of this property does not entail class membership in the asserted SuperType. Rather, the assertion may be based on particular attributes or characteristics of the instance at hand. For example, a British person might have an umbel:relatesToGeoEntity relation to the Geopolitical SuperType (because of Britain), even though the person is actually a member of the PersonTypes SuperType.

This predicate is used for filtering or clustering, often within user interfaces. Multiple umbel:relatesToXXX assertions may be made for the same instance.
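As an illustration of the filtering and clustering use, the hypothetical Python sketch below groups instances into facets keyed by the SuperType suffix of each umbel:relatesToXXX predicate, as a faceted user interface might. Because several such assertions are allowed per instance, one instance can appear under several facets. The instance and Reference Concept names are invented; the predicate names follow the table below.

```python
from collections import defaultdict

RELATES_PREFIX = "umbel:relatesTo"

def facet_by_supertype(triples):
    """Group subjects by the XXX part of each umbel:relatesToXXX predicate.

    triples: iterable of (subject, predicate, object) tuples; other
    predicates (e.g. rdf:type) are ignored.
    """
    facets = defaultdict(set)
    for subject, predicate, _obj in triples:
        if predicate.startswith(RELATES_PREFIX):
            facets[predicate[len(RELATES_PREFIX):]].add(subject)
    return dict(facets)

# Hypothetical instance data.
triples = [
    ("ex:WinstonChurchill", "rdf:type", "rc:Person"),
    ("ex:WinstonChurchill", "umbel:relatesToGeoEntity", "rc:UnitedKingdom"),
    ("ex:WinstonChurchill", "umbel:relatesToPersonType", "rc:PrimeMinister"),
    ("ex:WestminsterAbbey", "umbel:relatesToGeoEntity", "rc:UnitedKingdom"),
    ("ex:WestminsterAbbey", "umbel:relatesToFacility", "rc:Church"),
]
for facet, members in sorted(facet_by_supertype(triples).items()):
    print(facet, sorted(members))
```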

Each of the 32 UMBEL SuperTypes has a matching predicate for external topic assignments (relatesToOtherOrganism shares two SuperTypes, leading to 31 different predicates):

| SuperType | Mapping Predicate | Comments |
| --- | --- | --- |
| NaturalPhenomena | relatesToPhenomenon | This predicate relates an external entity to the SuperType (ST) shown. It indicates there is a relationship to the ST of a verifiable nature, but which is undetermined as to strength or a full rdf:type relationship |
| NaturalSubstances | relatesToSubstance | same as above |
| Earthscape | relatesToEarth | same as above |
| Extraterrestrial | relatesToHeavens | same as above |
| Prokaryotes, ProtistsFungus | relatesToOtherOrganism | same as above |
| Plants | relatesToPlant | same as above |
| Animals | relatesToAnimal | same as above |
| Diseases | relatesToDisease | same as above |
| PersonTypes | relatesToPersonType | same as above |
| Organizations | relatesToOrganizationType | same as above |
| FinanceEconomy | relatesToFinanceEconomy | same as above |
| Society | relatesToSociety | same as above |
| Activities | relatesToActivity | same as above |
| Events | relatesToEvent | same as above |
| Time | relatesToTime | same as above |
| Products | relatesToProductType | same as above |
| FoodorDrink | relatesToFoodDrink | same as above |
| Drugs | relatesToDrug | same as above |
| Facilities | relatesToFacility | same as above |
| Geopolitical | relatesToGeoEntity | same as above |
| Chemistry | relatesToChemistry | same as above |
| AudioInfo | relatesToAudioMusic | same as above |
| VisualInfo | relatesToVisualInfo | same as above |
| WrittenInfo | relatesToWrittenInfo | same as above |
| StructuredInfo | relatesToStructuredInfo | same as above |
| NotationsReferences | relatesToNotation | same as above |
| Numbers | relatesToNumbers | same as above |
| Attributes | relatesToAttribute | same as above |
| Abstract | relatesToAbstraction | same as above |
| TopicsCategories | relatesToTopic | same as above |
| MarketsIndustries | relatesToMarketIndustry | same as above |

This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.

Descriptive Properties

Descriptive properties are annotation properties.

umbel:isCharacteristicOf

Two annotation properties are used to describe the attribute characteristics of a RefConcept: umbel:hasCharacteristic and its reciprocal, umbel:isCharacteristicOf. These properties are the means by which external properties that describe things can be brought in and used as lookup references (that is, metadata) to external data attributes. As annotation properties, they have weak semantics and are used for accounting rather than reasoning purposes.

These properties are designed to be used in external ontologies to characterize, describe, or provide attributes for data records associated with a given RefConcept. It is via this property or its inverse, umbel:hasCharacteristic, that external data characterizations may be incorporated and modeled within a domain ontology based on the UMBEL vocabulary.

Qualifying the Mappings

The choice of these mapping predicates may be aided by a variety of techniques, from manual to automatic. It is thus important to characterize the specific mapping method used whenever a linking predicate is assigned. Following this best practice allows us to distinguish mapping trustworthiness, and also to segregate mappings later for the application of improved methods as they arise.

UMBEL, for its current mappings and purposes, has adopted the following controlled vocabulary for characterizing the umbel:hasMapping predicate; such listings may be readily modified for other domains and purposes when using the UMBEL vocabulary. This controlled vocabulary is based on instances of the Qualifier class. This class represents a set of descriptions to indicate the method used when applying an approximate mapping predicate (see above):

| Qualifier | Description |
| --- | --- |
| Manual – Nearly Equivalent | The two mapped concepts are deemed to be nearly an equivalentClass or sameAs relationship, but not 100% so |
| Manual – Similar Sense | The two mapped concepts share much overlap, but are not the exact same sense, such as an action as related to the thing it acts upon |
| Heuristic – ListOf Basis | Type assignment based on Wikipedia ListOf category; not currently used |
| Heuristic – Not Specified | Heuristic mapping method applied; script or technique not otherwise specified |
| External – OpenCyc Mapping | Mapping based on existing OpenCyc assertion |
| External – DBOntology Mapping | Mapping based on existing DBOntology assertion |
| External – GeoNames Mapping | Mapping based on existing GeoNames assertion |
| Automatic – Inspected SV | Mapping based on automatic scoring of concepts using Semantic Vectors, with specific alignment choice based on hand selection |
| Automatic – Inspected S-Match | Mapping based on automatic scoring of concepts using S-Match, with specific alignment choice based on hand selection; not currently used |
| Automatic – Not Specified | Mapping based on automatic scoring of concepts using a script or technique not otherwise specified; not currently used |

Again, as noted, for other domains and other purposes this listing can be modified at will.
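To show how recording the Qualifier supports later segregation of mappings, here is a hypothetical Python sketch that keeps each mapping as a (subject, predicate, object, qualifier) tuple, counts mappings per method family, and sets aside those whose family is not on a trusted list for later re-mapping. Treating Manual and External mappings as trusted is an illustrative policy choice, not an UMBEL rule, and the resource names are invented.

```python
from collections import Counter

def method_family(qualifier):
    """The part of a Qualifier label before ' – ', e.g. 'Manual'."""
    return qualifier.split(" – ")[0]

def count_by_family(mappings):
    """Tally mappings per method family."""
    return Counter(method_family(q) for _s, _p, _o, q in mappings)

def needs_review(mappings, trusted=("Manual", "External")):
    """Mappings whose method family is not trusted (assumed policy)."""
    return [m for m in mappings if method_family(m[3]) not in trusted]

mappings = [
    ("ex:A", "umbel:correspondsTo", "rc:X", "Manual – Nearly Equivalent"),
    ("ex:B", "umbel:correspondsTo", "rc:Y", "Automatic – Inspected SV"),
    ("ex:C", "umbel:isAbout", "rc:Z", "External – OpenCyc Mapping"),
]
print(count_by_family(mappings))
print(needs_review(mappings))
```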

Status of Mappings

Final aspects of these mappings are now undergoing a last round of review. A variety of sources and methods have been applied, to be more fully documented at time of release.

Some of the final specifics and counts may be modified slightly by the time of actual release of UMBEL v 1.00, which should occur in the next week or so. Nonetheless, here are some tentative counts for a select portion of these predicates in the internal draft version:

| Item or Predicate | Count |
| --- | --- |
| Total UMBEL Reference Concepts | 27,917 |
| owl:equivalentClass (external OpenCyc, PROTON, DBpedia) | 28,618 |
| umbel:correspondsTo (direct mappings to Wikipedia) | 16,884 |
| rdf:type | 876,125 |
| umbel:relatesToXXX (31 variations) | 3,059,023 |
| Unique Wikipedia Pages Mapped | 2,130,021 |

All of these assignments have also been hand-inspected and vetted.

Major Progress Towards a Gold Standard

To date, in various steps and phases, the inspection of Wikipedia, its categories, and its match with UMBEL has perhaps consumed more than 5,000 hours (nearly three person-years) of expert domain and semantic technology review [12]. As noted, about 60% (16,884 of 27,917) of UMBEL concepts have now been directly mapped to Wikipedia and inspected for accuracy.

Wikipedia provides the most demanding and complete mapping target available for testing the coverage of UMBEL’s reference concepts and the adequacy of its vocabulary. As a result, we have added to and refined the mapping and linking predicates used in the UMBEL vocabulary, and added a Qualifier class to record the mapping process, as this post overviews. We have added the SuperType class to better organize and disambiguate large knowledge bases [13]. And, in this mapping process, we have expanded UMBEL’s reference concepts by about 33% to improve coverage, while remaining consistent with its origins as a faithful subset of the venerable Cyc knowledge structure [14].

A side benefit that has emerged from these efforts — with a huge potential upside — is the valuable combination of UMBEL and Wikipedia as a “gold standard” for aligning and mapping knowledge bases. Such a standard is critically needed. For example, in reviewing many of the existing Wikipedia mappings claimed as accurate, we found misplacement errors that averaged 15.8% [15]. Having a baseline of vetted mappings will aid future mappings. Moreover, having a complete conceptual infrastructure over Wikipedia will enable new and valuable reasoning and inference services.

The results from the UMBEL v 1.00 mapping are promising and very much useful today, but by no means complete. Future versions will extend the current mappings and continue to refine their accuracy and completeness [16]. What we can say, however, is that a coherent organization and conceptual schema (namely, UMBEL) overlaid on the richness of the instance data and content of Wikipedia can produce immediate and useful benefits. These benefits apply to semantic search, semantic annotation and tagging, reasoning, discovery, inferencing, organization and comparisons.


[1] UMBEL has been under development since March 2007, with its first release in July 2008 and its last release (v 0.80) in November 2010. Throughout its releases we have reserved incrementing the vocabulary and its ontology to version 1.00 until it was deemed “commercial”. This is the first version to meet this test. We’d like to thank our partner, Ontotext, and the RENDER project for providing assistance and resources to bring the system to this point.
[2] The basic approach is to use the DBpedia representation of Wikipedia, since its extractors have already done a great job in preparing structured data.
[3] M.K. Bergman, 2010. “The Nature of Connectedness on the Web,” AI3:::Adaptive Information blog, November 22, 2010; see https://www.mkbergman.com/935/the-nature-of-connectedness-on-the-web/.
[4] A good starting reference for some of these concepts is Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see http://www.w3.org/TR/owl2-primer/.
[5] Such as the semantic reasoners FaCT++, Racer, Pellet, HermiT, etc.
[6] Fred Giasson first coined this phrase; see F. Giasson, 2008. “Exploding the Domain: UMBEL Web Services by Zitgist,” blog posting on April 20, 2008; see http://fgiasson.com/blog/index.php/2008/04/20/exploding-the-domain-umbel-web-services-by-zitgist/.
[7] Among many, many references, see a fairly comprehensive discussion of this issue at http://ontologydesignpatterns.org/wiki/Community:Overloading_OWL_sameAs.
[8] This predicate is designed for the circumstance of aligning two different ontologies or knowledge bases based on node-level correspondences, but without entailing the actual ontological relationships and structure of the object source. For example, the umbel:correspondsTo predicate is used to assert close correspondence between UMBEL Reference Concepts and Wikipedia categories or pages, yet without entailing the actual Wikipedia category structure.
[9] Jennifer Sleeman and Tim Finin, 2010. “Learning Co-reference Relations for FOAF Instances,” Proceedings of the Poster and Demonstration Session at the 9th International Semantic Web Conference, November 2010; see http://ebiquity.umbc.edu/_file_directory_/papers/522.pdf.
[10] For example, in the words of Tim Finin of the Ebiquity group:
“The solution we are currently exploring is to define a new property to assert that two RDF instances are co-referential when they are believed to describe the same object in the world. The two RDF descriptions might be incompatible because they are true at different times, or the sources disagree about some of the facts, or any number of reasons, so merging them with owl:sameAs may lead to contradictions. However, virtually merging the descriptions in a co-reference engine is fine — both provide information that is useful in disambiguating future references as well as for many other purposes.”
[11] The same vocabulary construct can be applied to other domain ontologies based on the UMBEL Vocabulary.
[12] The efforts with Wikipedia have been ongoing to a certain degree since the inception of UMBEL. As one example, we have been maintaining a comprehensive tracking of the use of Wikipedia for mapping and semantic technology purposes, called SWEETpedia, for many years. Applying these techniques to both UMBEL and Wikipedia has been most active over the past 18 months.
[13] See the UMBEL Annex G: UMBEL SuperTypes Documentation, which will also be slightly updated upon the new UMBEL v 1.00 release.
[14] See the ‘Use of OpenCyc’ section in the UMBEL Specifications.
[15] These sources and error rates will be detailed in a paper after the pending new release of UMBEL.
[16] Fortunately, returns on the time investment will accelerate since basic lessons and techniques have now been learned.

