Part 3 of 4 on Foundations to UMBEL
UMBEL (Upper-level Mapping and Binding Exchange Layer) is a lightweight ontology for relating Web content and data to a standard set of subject concepts. It is being designed to apply to all types of data on the Web, from RSS and Atom feeds to tagging, microformats, topic maps, RDF and OWL (among others). The project Web site is at http://www.umbel.org.
The first portion and priority for UMBEL is to prepare the lightweight subject concept ontology, which is the focus of this four-part foundations series. Once the UMBEL ontology is released in first draft, the project will turn to the binding protocols for non-RDF formats.
The previous part in this series discussed RDF classes and instances (or individuals) at length. We now narrow these terms to reflect their specific intent and usage within UMBEL: UMBEL's main classes are subject concepts, and its notable instances are specifically termed named entities.
UMBEL defines subject concepts as a distinct subset of the more broadly understood concept [1], such as that used in the SKOS RDFS controlled vocabulary [2], conceptual graphs, formal concept analysis or the very general concepts common to many upper ontologies [3]. We define subject concepts as a special kind of concept: namely, ones that are concrete, subject-related and non-abstract [4].
UMBEL contrasts subject concepts with abstract concepts and with named entities. Abstract concepts represent abstract or ephemeral notions such as truth, beauty, evil or justice, or are thought constructs that are useful for organizing or categorizing things but are not readily seen in the experiential world. Named entities are the real things or instances in the world that are themselves natural and notable class members of subject concepts.
Subject Concepts
Subject concepts are a special kind of concept: namely, ones that are concrete, subject-related and non-abstract. Note that in other systems or ontologies, similar constructs may alternatively be called topics, subjects, concepts or perhaps interests. UMBEL has adopted the term subject concept to distinguish it from these uses, which have different nuances of meaning and use, as well as to highlight the subject or topic nature of UMBEL's concrete concepts.
Each subject concept is a class. While subject concepts have a preferred label (using SKOS terminology), they are a representative of, or proxy for, that concept and are not to be confused with the thing itself. Every UMBEL subject concept can be expressed and referred to by a different preferred label in other languages. Indeed, within a given language, one preferred label may be swapped for another without affecting the identity or use of the subject concept itself. The name for a subject concept is therefore merely a handle.
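To make the "label is merely a handle" point concrete, here is a minimal Python sketch in which a subject concept's identity lives in an identifier while its preferred labels are keyed by language and can be swapped freely. The class name `SubjectConcept`, its fields and the `ex:` identifiers are hypothetical illustrations, not UMBEL's actual vocabulary or URIs.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class SubjectConcept:
    """Hypothetical model: identity is the `uri`; labels are swappable handles."""
    uri: str
    # One SKOS-style preferred label per language tag, e.g. {"en": "...", "de": "..."}.
    pref_labels: dict[str, str] = field(default_factory=dict)

    def label(self, lang: str = "en") -> str:
        # Fall back to the identifier when no label exists for that language.
        return self.pref_labels.get(lang, self.uri)

    def __eq__(self, other: object) -> bool:
        # Two subject concepts are the same if and only if their identifiers
        # match; labels play no part in identity.
        return isinstance(other, SubjectConcept) and self.uri == other.uri

    def __hash__(self) -> int:
        return hash(self.uri)


car = SubjectConcept("ex:Automobile", {"en": "Automobile", "de": "Kraftfahrzeug"})
same = SubjectConcept("ex:Automobile", {"en": "Car"})
print(car == same)               # → True: different handles, same concept
car.pref_labels["en"] = "Motor car"  # swap the English handle
print(car.label("en"))           # → Motor car
```

Keying equality and hashing on the identifier alone means relabeling never changes which concept a record denotes, which mirrors the distinction drawn above.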
Subject concepts are the core constituents of the UMBEL framework. All subject concepts are based on existing concepts in OpenCyc, the open source version of the Cyc knowledge base (see Part 4). About 21,000 of them have been distilled and are part of the UMBEL backbone.
Semsets
Semsets are semantically close terms or phrases that are synonymous, or nearly so, with the meaning of a subject concept or a named entity. Semsets are akin to WordNet synsets or Cyc aliases, but can also include more contemporary jargon or slang, as may be drawn from Web tagging or folksonomies. The term semset has been chosen to distinguish this consolidated meaning.
Semsets may apply to either subject concepts or named entities. In the latter case, their use is closer to the sense of an alias (such as nicknames like "Uncle Sam" or "Great Satan" for the "United States").
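Functionally, a semset acts as a mapping from surface terms to the subject concept or named entity they stand for. The minimal Python sketch below inverts hypothetical semset lists into a term index for look-ups. The function names, the `ex:` identifiers and the choice of case-folding as the only normalization step are assumptions for illustration, not part of the UMBEL specification.

```python
from collections import defaultdict


def build_semset_index(semsets: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {concept_or_entity_id: [terms]} into {normalized term: {ids}}.

    A term may map to several ids, because the same phrase can be close in
    meaning to more than one subject concept or named entity.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for target_id, terms in semsets.items():
        for term in terms:
            index[term.casefold()].add(target_id)
    return dict(index)


def resolve(index: dict[str, set[str]], term: str) -> set[str]:
    # Case-insensitive look-up; unknown terms resolve to an empty set.
    return set(index.get(term.casefold(), set()))


index = build_semset_index({
    "ex:UnitedStates": ["United States", "USA", "Uncle Sam"],
    "ex:Automobile": ["automobile", "car", "motor car"],
})
print(resolve(index, "uncle sam"))  # → {'ex:UnitedStates'}
```

Returning a set rather than a single id leaves room for ambiguous terms; deciding among several candidates would be a separate disambiguation step.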
Abstract Concepts
Abstract concepts represent abstract or ephemeral notions such as truth, beauty, evil or justice, or are thought constructs that are useful for organizing or categorizing things but are not readily seen in the experiential world. They are included in the UMBEL specification because they help maintain the integrity of the UMBEL subject concept graph.
Like subject concepts, abstract concepts are based strictly on those already in OpenCyc. Abstract concepts may be viewed in the UMBEL graph, and may be used for ontology mapping, but are not generally displayed when doing standard content mapping or concept look-ups via Web services. For various domain extraction or relatedness determinations, abstract concepts may be excluded from UMBEL's internal processing.
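The policy of keeping abstract concepts in the graph while hiding them from standard look-ups can be sketched as a simple filter. In this minimal Python example the `Kind` enumeration, the `Concept` record and the `include_abstract` flag are hypothetical names; they illustrate the behavior described above rather than any published UMBEL Web service interface.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SUBJECT = "subject concept"
    ABSTRACT = "abstract concept"


@dataclass(frozen=True)
class Concept:
    uri: str
    kind: Kind


def lookup_candidates(concepts: list[Concept], include_abstract: bool = False) -> list[Concept]:
    """Return the concepts eligible for a content-mapping look-up.

    Abstract concepts remain in the graph (e.g. for ontology mapping) but are
    skipped here unless the caller explicitly asks for them.
    """
    return [c for c in concepts if include_abstract or c.kind is Kind.SUBJECT]


graph = [Concept("ex:Automobile", Kind.SUBJECT), Concept("ex:Justice", Kind.ABSTRACT)]
print([c.uri for c in lookup_candidates(graph)])  # → ['ex:Automobile']
```

Making exclusion the default, with an explicit opt-in, matches the statement that abstract concepts are not generally displayed but remain available for ontology mapping.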
Named Entities
Named entities are the real things or instances in the world that are themselves natural and notable class members of subject concepts. The initial named entities are drawn from Wikipedia (as processed via YAGO) and from other online fact-based repositories. Named entities are the instances of the subject concepts in the standard definition of the term [5].
Named entities and the sources for them are also a major avenue for growth and expansion of UMBEL moving forward. Named entities are more contemporary and changing, while the reference subject concept backbone is more fixed and stable.
Each named entity is mapped to a governing subject concept for ontology purposes. There are no relations between named entities except as mediated through one or more subject concepts. As noted, named entities may also have semset aliases.
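One way to picture "no relations between named entities except as mediated through subject concepts" is to store only an entity-to-concept mapping and let any relatedness test walk the subject concept graph. The minimal Python sketch below assumes a hop-limited breadth-first search as the relatedness measure; that measure, the function names and all `ex:` identifiers are hypothetical stand-ins, not UMBEL's actual relatedness method.

```python
from collections import deque


def concepts_within(graph: dict[str, set[str]], start: str, max_hops: int) -> set[str]:
    """Breadth-first search over subject-concept links, up to max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def entities_related(governing: dict[str, str], graph: dict[str, set[str]],
                     e1: str, e2: str, max_hops: int = 1) -> bool:
    # Named entities carry no links of their own: relatedness is decided
    # solely by the distance between their governing subject concepts.
    return governing[e2] in concepts_within(graph, governing[e1], max_hops)


governing = {  # each named entity -> its single governing subject concept
    "ex:Albert_Einstein": "ex:Physicist",
    "ex:Linus_Pauling": "ex:Chemist",
    "ex:Mount_Everest": "ex:Mountain",
}
graph = {  # undirected links among subject concepts
    "ex:Physicist": {"ex:Scientist"},
    "ex:Chemist": {"ex:Scientist"},
    "ex:Scientist": {"ex:Physicist", "ex:Chemist"},
}
print(entities_related(governing, graph, "ex:Albert_Einstein", "ex:Linus_Pauling", max_hops=2))  # → True
```

Because entities never link to each other directly, adding a new named entity only requires one new entry in the mapping, which is consistent with the extensibility argument made later in this piece.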
Subject Concepts v. Abstract Concepts
The following table helps draw the distinction between subject concepts and abstract concepts. Technical documentation at the time of the UMBEL ontology release will list the 520 or so abstract concepts presently within UMBEL. Looking at those can help draw the distinction.
Subject Concepts | Abstract Concepts
Subject Concepts v. Named Entities
The following table helps draw the distinction between subject concepts and named entities. Technical documentation at the time of the UMBEL ontology release will describe certain "gray" categories and the determination as to whether they should be treated as one or the other.
For example, most geographical places clearly belong to the named entity category. However, on somewhat arbitrary grounds, all nations, countries, states and provinces were assigned as subject concepts so that they would act as classes to which other entities can be mapped. It should also be noted that entities or concepts in the gray zone may be treated as both a named entity and a subject concept.
Subject Concepts | Named Entities
Though there are shades of gray between subject concepts and named entities, we have found this distinction to be a powerful means for gaining clarity in UMBEL's design. It provides a clean path for keeping an ontology lightweight while in essence providing infinite extensibility for all manner of named entities and the datasources that contain them. Moreover, the ability to classify named entities into types orthogonal to subject concepts also provides useful guidance for presentation templates that may be automatically invoked in data meshups. But that is a topic for another day. 🙂