Posted:December 6, 2010

Reference Concepts Provide DirectionAnd, Seven Guidelines for this Second of Two Semantic ‘Gaps’

I have been writing and speaking of late about next priorities to promote the interoperability of linked data and the semantic Web. In a talk a few weeks back to the Dublin Core (DCMI) annual conference, I summarized these priorities as the need to address two aspects of the semantic “gap”:

  1. One aspect is the need for vetted reference sources that provide the entities and concepts for aligning disparate content sources on the Web, and
  2. A second aspect is the need for accurate mapping predicates that can represent the often approximate matches and overlaps of this heterogeneous content.

I discussed the second aspect in an earlier post [1]. In today’s installment, we now focus on the “gap” relating to reference concepts.

The Web Increases the Need for Organization

Interoperability comes down to the nature of things and how we describe those things or quite similar things from different sources. Given the robust nature of semantic heterogeneities in diverse sources and datasets on the Web (or anywhere else, for that matter!) [2], how do we bring similar or related things into alignment? And, then, how can we describe the nature or basis of that alignment?

Of course, classifiers since Aristotle and librarians for time immemorial have been putting forward various classification schemes, controlled vocabularies and subject headings. When one wants to find related books, it is convenient to go to a central location where books about the same or related topics are clustered. And, if the book can be categorized in more than one way — as all are — then something like a card catalog is helpful to find additional cross-references. Every domain of human endeavor makes similar attempts to categorize things.

On the Web we have none of the limitations of physical books and physical libraries; locations are virtual and copies can be replicated or split apart endlessly because of the essentially zero cost of another electron. But, we still need to find things and we still want to gather related things together. According to Svenonius, “Organizing information if it means nothing else means bringing all the same information together” [3]. This sentiment and need remains unchanged whether we are talking about books, Web documents, chemical elements or linked data on the Web.

Like words or terms in human language that help us communicate about things, how we organize things on the Web needs to have an understood and definable meaning, hopefully bounded with some degree of precision, that enables us to have some confidence we are really communicating about the same something with one another. However, when applied to the Web and machine communications, the demands for how these definitions and precisions apply need to change. This makes the notion of a Web basis for organization both easier and harder than traditional approaches to classification.

It is easier because everything is virtual: we can apply multiple classification schema and can change those schema at will. We are not locked into historical anomalies like huge subject areas reserved for arcane or now historically less important topics, such as the Boer Wars or phrenology. We need not move physical books around on shelves in order to accommodate new or expanded classification schemes. We can add new branches to our classification of, say, nanotechnology as rapidly as the science advances.

Yet it is harder because we can no longer rely on the understanding of human language as a basis for naming and classifying things. Actually, of course, language has always been ambiguous, but it is manifestly more so when put through the grinder of machine processing and understanding. Machine processing of related information adds the new hurdles of no longer being able to rely on text labels (“names”) alone as the identifier of things and requires we be more explicit about our concept relationships and connections. Fortunately, here, too, much has been done in helping to organize human language through such lexical frameworks as WordNet and similar.

The Idea and Role of Reference Concepts

Many groups and individuals have been grappling with these questions of how to organize and describe information to aid interoperability in an Internet context. Among many, let me simply mention two because of the diversity their approaches show.

Bernard Vatant, for one, has with his colleagues been an advocate for some time for the need for what he calls “hubjects.” With an intellectual legacy from the Topic Maps community, the idea of “hubjects” is to have a flat space of reference subjects to which related information can link and refer. Each hubject is the hub of a spoked wheel of representations by which the same subject matter from different contexts may be linked. The idea of the flat space or neutrality in the system is to place the “hubject” identifier (referent) outside of other systems that attempt to organize and provide “meta-frameworks” of knowledge organization. In other words, there are no inherent suggested relationships in the reference “hubject” structure: just a large bin of defined subjects to which external systems may link.

A different and more formalized approach has been put forward by the FRSAD working group [4], dealing with subject authority data. Subject authority data is the type of classificatory information that deals with the subjects of various works, such as their concepts, objects, events, or places. As the group stated, the scope of this effort pertains to the “aboutness” of various conceptual works. The framework for this effort, as with the broader FRBR effort, are new standards and approaches appropriate to classifying electronic bibliographic records.

Besides one of the better summaries and introductions to the general problems of subject classification in general, the FRSAD approach makes its main contribution in clearly distinguishing the idea of something (which it calls a thema, or entity used as the subject of a work) from the name or label of something (which it calls nomen). For many in the logic community, steeped in the Peirce triad of sign-object-interpretant [5], this distinction seems rather obvious and straightforward. But, in library science, labels have been used interchangeably as identifiers, and making this distinction clean is a real contribution. The FRSAD effort does not itself really address how the thema are actually found or organized.

The notion of a reference concept used herein combines elements from both of these approaches. A reference concept is the idea of something, or a thema in the FRSAD sense. It is also a reference hub of sorts, similar to the idea of a “hubject”. But it is also much more and more fully defined.

So, let’s first begin by representing a reference concept in relation to its referers and possible linking predicates as follows:

A referer needs to link appropriately to its reference concept, with some illustrative examples shown on the arrows in the diagram. These links are the predicates, ranging from the exact to the approximate, discussed in the first semantic “gap” posting. (Note: see that earlier post for a longer listing of existing, candidate linking predicates. No further comment is made in this present article as to whether those in that earlier posting or the example ones above are “correct” or not; see the first post for that discussion.)

If properly constructed and used, a reference concept thus becomes a fixed point in an information space. As one or more external sources link to these fixed points, it is then possible to gather similar content together and to begin to organize the information space, in the sense of Svenonius. Further, and this is a key difference from the “hubject” approach, if the reference concept is itself part of a coherent structure, then additional value can be derived from these assignments, such as inference, consistence testing, and alignments. (More on this latter point is discussed below.)

Seven Guidelines for a Reference Concept

If the right factors are present, it should be possible to relate and interoperate multiple datasets and knowledge representations. If present, these factors can result in a series of fixed reference points to which external information can be linked. In turn, these reference nodes can form constellations to guide the traversal to desired information destinations on the Web.

Let’s look at the seven factors as to what constitutes guidelines for best practices.

Guideline #1: Persistent URI

By definition, a Web-based reference concept should adhere to linked data principles and should have a URI as its address and identifier. Also, by definition as a “reference”, the vocabulary or ontology in which the concept is a member should be given a permanent and persistent address. Steps should be taken to ensure 24×7 access to the reference concept’s URI, since external sources will be depending on it.

As a general rule, the concepts should also be stated as single nouns and use CamelCase notation (that is, class names should start with a capital letter and not contain any spaces, such as MyNewConcept).

Guideline #2: Preferred Label

Provide a preferred label annotation property that is used for human readable purposes and in user interfaces. For this purpose, a construct such as the SKOS property of skos:prefLabel works well. Note, this label is not the basis for deciding and making linkages, but it is essential for mouseovers, tooltips, interface labels, and other human use factors.

Guideline #3: Definition

Give all concepts and properties a definition. The matching and alignment of things is done on the basis of concepts (not simply labels), which means each concept must be defined [6]. Providing clear definitions (along with the coherency of its structure) gives an ontology its semantics. Remember not to confuse the label for a concept with its meaning. For this purpose, a property such as skos:definition works well, though others such as rdfs:comment or dc:description are also commonly used.

The definition is the most critical guideline for setting the concept’s meaning. Adequate text and content also aid semantic alignment or matching tasks.

Guideline #4: Tagset

Include explicit consideration for the idea of a “semset” or “tagset”, which means a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept. The semset construct is similar to the “synsets” in Wordnet, but with a broader use understanding. Included in the semset construct is the single (per language) preferred (human-readable) label for the concept, the prefLabel, an embracing listing of alternative phrase and terms for the concept (including acronyms, synonyms, and matching jargon), the altLabels, and a listing of prominent or common misspellings for the concept or its alternatives, the hiddenLabels.

This tagset is an essential basis for tagging unstructured text documents with reference concepts, and for search not limited to keywords. The tagset, in combination with the definition, is also the basis for feeding many NLP-driven methods for concept or ontology alignment.

Guideline #5: Language Independent

The practice of using an identifier separate from label, and language qualified entries for definition, preferred label and tagset (alternative labels) means that multi-lingual versions can be prepared for each concept. Though this is a somewhat complicated best practice in its own right (for example, being attentive to the xml:lang=”en” tag for English), adhering to this practice provides language independence for reference concepts.

Sources such as Wikipedia, with its richness of concepts and multiple language versions, can then be a basis for creation of alternative language versions.

Guideline #6: Range and Domain

Use of domains and ranges assists testing, helps in disambiguation, and helps in external concept alignments. Domains apply to the subject (the left hand side of a triple); ranges to the object (the right hand side of the triple). Domains and ranges should not be understood as actual constraints, but as axioms to be used by reasoners. In general, domain for a property is the range for its inverse and the range for a property is the domain of its inverse.

Example of a Coherent Structure

Guideline #7: Part of Coherent Structure

When reference concepts, properly constructed as above, are also themselves part of a coherent structure, further benefits may be gained. These benefits include inferencing, consistency testing, discovery and navigation. For example, the sample at right shows that a retrieval for Saab cars can also inform that these are automobiles, a brand of automobile, and a Swedish kind of car.

To gain these advantages, the coherent structure need not be complicated. RDFS and SKOS-based lightweight vocabularies can meet this test. Properly constructed OWL ontologies can also provide these benefits.

When best practices are combined with being part of a coherent structure, we can refer to these structures as reference ontologies or domain ontologies.

The State of Reference Concepts

In part, these best practices are met to a greater or lesser extent by many current vocabularies. But few provide complete coverage, and across a broad swath of domain needs, major gaps remain. This unfortunate observation applies to upper-level ontologies, reference vocabularies, and domain ontologies alike.

Upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc and BFO (Basic Formal Ontology). While these have a coherency of construction, they are most often incomplete with respect to reference concept construction. With the exception of SUMO and Cyc, domain coverage is also very general.

Our own UMBEL reference ontology [7] is closest to meeting all criteria. The reference concepts are constructed to standard. But coverage is fairly general, and not directly applicable to most domains (though it can help to orient specific vocabularies).

Wikipedia, as accessed via the DBpedia expression, has good persistent URIs, labels, altLabels and proxy definitions (via the first sentences abstract). As a repository of reference concepts, it is extremely rich. But the organizational structure is weak and provides very few of the benefits for coherent structures noted above.

Going back to 1997, DCMI has been involved in putting forward possible vocabularies that may act as “qualifiers” to dc:subject [8]. Such reference vocabularies can extend from the global or comprehensive, such as the Universal Decimal Classification or Library of Congress Subject Headings, to the domain specific such as MeSH in medicine or Agrovoc in agriculture [9]. One or more concepts in such reference vocabularies can be the object of a dc:subject assertion, for example. While these vocabularies are also a rich source of reference concepts, they are not constructed to standards and at most provide hierarchical structures.

In the area of domain vocabularies, we are seeing some good pockets of practice, especially in the biomedical and life sciences arena [10].  Promising initiatives are also underway in library applications [11] and perhaps other areas unknown to the author.

In summary, I take the state of the art to be quite promising. We know what to do, and it is being done in some pockets. What is needed now is to more broadly operationalize these practices and to extend them across more domains. If we can bring attention to and publicize exemplar vocabularies, we can start to realize the benefits of actual data interoperability on the Web.

[1] See M. K. Bergman, 2010. “The Nature of Connectedness on the Web,” AI3:::Adaptive Information blog, November 22, 2010. See
[2] See M. K. Bergman, 2006. “Sources and Classification of Semantic Heterogeneities,” AI3:::Adaptive Information blog, June 6, 2006. See
[3] From a quote on page 10 by Elaine Svenonius, 2000. The Intellectual Foundation of Information Organization, MIT Press, 2000, 255pp. I’d like to thank Karen Coyle for recently posting this quote on the Linked Library Data (LLD) mailing list.
[4] Marcia Lei Zeng, Maja Žumer, Athena Salaba, eds., 2010. Functional Requirements for Subject Authority Data (FRSAD): A Conceptual Model, prepared by the IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR), June 2010, 75 pp. See This effort is part of the broader and well-known FRBR (Functional Requirements of Bibliographic Records) initiative.
[5] C.S. Peirce’s sign relations are covered under the discussion about Semiotic Elements under the Sign section on Peirce in Wikipedia. In the the context of this discussion, the sign corresponds to any of the labels or identifiers associated with the (reference concept) object, the meaning of which is provided by its interpretant definition and useful language labels. See also John Sowa, 2000. “Ontology, Metadata, and Semiotics,” presented at ICCS’2000 in Darmstadt, Germany, on August 14, 2000; see
[6] As another commentary on the importance of definitions, see
[7] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.
[8] Rebecca Guenther, 1997. Dublin Core Qualifiers/Substructure, October 15, 1997. See
[10] For example, see the Open Biological and Biomedical Ontologies (OBO) initiative and the W3C‘s  Semantic Web Health Care and Life Sciences Interest Group.
[11] See the W3C’s Linked Library Data initiative, with particular attention to topics and use cases.

Posted by AI3's author, Mike Bergman Posted on December 6, 2010 at 2:32 am in Linked Data, Ontologies, Semantic Web, UMBEL | Comments (0)
The URI link reference to this post is:
The URI to trackback this post is:
Posted:November 26, 2010

There’s an Endless Variety of World Views, and Almost as Many Ways to Organize and Describe ThemFriday     Brown Bag Lunch

Ontology is one of the more daunting terms for those exposed for the first time to the semantic Web. Not only is the word long and without many common antecedents, but it is also a term that has widely divergent use and understanding within the community. It can be argued that this not-so-little word is one of the barriers to mainstream understanding of the semantic Web.

The root of the term is the Greek ontos, or being or the nature of things. Literally — and in classical philosophy — ontology was used in relation to the study of the nature of being or the world, the nature of existence. Tom Gruber, among others, made the term popular in relation to computer science and artificial intelligence about 15 years ago when he defined ontology as a “formal specification of a conceptualization.”

While there have been attempts to strap on more or less formal understandings or machinery around ontology, it still has very much the sense of a world view, a means of viewing and organizing and conceptualizing and defining a domain of interest. As is made clear below, I personally prefer a loose and embracing understanding of the term (consistent with Deborah McGuinness’s 2003 paper, Ontologies Come of Age [1]).

There has been a resurgence of interest in ontologies of late. Two reasons have been the emergence of Web 2.0 and tagging and folksonomies, as well as the nascent emergence of the structured Web. In fact, on April 23-24 one of the noted communities of practice around ontologies, Ontolog, sponsored the Ontology Summit 2007 ,”Ontology, Taxonomy, Folksonomy: Understanding the Distinctions.”

These events have sparked my preparing this guide to ontologies. I have to admit this is a somewhat intrepid endeavor given the wealth of material and diversity of opinions.

Friday      Brown Bag Lunch This Friday brown bag leftover was first placed into the AI3 refrigerator more than three years ago on May 16, 2007. This reprise is unchanged since its original posting, though there is a more recent executive-level intro to ontologies on the OpenStructsTechWiki.

Overview and Role of Ontologies

Of course, a fancy name is not sufficient alone to warrant an interest in ontologies. There are reasons why understanding, using and manipulating ontologies can bring practical benefit:

  • Depending on their degree of formalism (an important dimension), ontologies help make explicit the scope, definition, and language and meaning (semantics) of a given domain or world view
  • Ontologies may provide the power to generalize about their domains
  • Ontologies, if hierarchically structured in part (and not all are), can provide the power of inheritance
  • Ontologies provide guidance for how to correctly “place” information in relation to other information in that domain
  • Ontologies may provide the basis to reason or infer over its domain (again as a function of its formalism)
  • Ontologies can provide a more effective basis for information extraction or content clustering
  • Ontologies, again depending on their formalism, may be a source of structure and controlled vocabularies helpful for disambiguating context; they can inform and provide structure to the “lexicons” in particular domains
  • Ontologies can provide guiding structure for browsing or discovery within a domain, and
  • Ontologies can help relate and “place” other ontologies or world views in relation to one another; in other words, ontologies can organize ontologies from the most specific to the most abstract.

Both structure and formalism are dimensions for classifying ontologies, which combined are often referred to as an ontology’s “expressiveness.” How one describes this structure and formality differs. One recent attempt is this figure from the Ontology Summit 2007‘s wrap-up communique:

Ontology Summit 2007 Communique Diagram

Note the bridging role that an ontology plays between a domain and its content. (By its nature, every ontology attempts to “define” and bound a domain.) Also note that the Summit’s 50 or so participants were focused on the trade-off between semantics v. pragmatic considerations. This was a result of the ongoing attempts within the community to understand, embrace and (possibly) legitimize “less formal” Web 2.0 efforts such as tagging and the folksonomies that can result from them.

There is an M.C. Escher-like recursion of the lizard eating its tail when one observes ontologists creating an ontology to describe the ontological domain. The above diagram, which itself would be different with a slight change in Summit participation or editorship, is, of course, but one representative view of the world. Indeed, a tremendous variety of scientific and research disciplines concern themselves with classifying and organizing the “nature of things.” Those disciplines go by such names as logicians, taxonomists, philosophers, information architects, computer scientists, librarians, operations researchers, systematicists, statisticians, historians, and so forth. (In short, given our ontos, every area of human endeavor has the urge to classify, to organize.) In each of these areas not only do their domains differ, but so do the adopted structures and classification schemes often used.

There are at least 40 terms or concepts across these various disciplines, most related to Web and general knowledge content, that have organizational or classificatory aspects that — loosely defined — could be called an “ontology” framework or approach:

Actual domains or subject coverage are then mostly orthogonal to these approaches.

Loosely defined, the number of possible ontologies is therefore close to infinite: domain X perspective X schema. (Just kidding — sort of! In fact, UMBC’s Swoogle ontology search service claims 10,000 ontologies presently on the Web; the actual data from August 2006 ranges from about 16,000 to 92,000 ontologies, depending on how “formal” the definition. These counts are also limited to OWL-based ontologies.)

Many have misunderstood the semantic Web because of this diversity and the slipperiness of the concept of an ontology. This misunderstanding becomes flat wrong when people claim the semantic Web implies one single grand ontology or organizational schema, One Ring to Rule Them All. Human and domain diversities makes this viewpoint patently false.

Diversity, ‘Naturalness’ and Change

The choice of an ontological approach to organize Web and structured content can be contentious. Publishers and authors perhaps have too many choices: from straight Atom or RSS feeds and feeds with tags to informal folksonomies and then Outline Processor Markup Language or microformats. From there, the formalism increases further to include the standard RDF ontologies such as SIOC (Semantically-Interlinked Online Communities), SKOS (Simple Knowledge Organizing System), DOAP (Description of a Project), and FOAF (Friend of a Friend) and the still greater formalism of OWL’s various dialects.

Arguing which of these is the theoretical best method is doomed to failure, except possibly in a bounded enterprise environment. We live in the real world, where multiple options will always have their advocates and their applications. All of us should welcome whatever structure we can add to our information base, no matter where it comes from or how it’s done. The sooner we can embrace content in any of these formats and convert it to a canonical form, we can then move on to needed developments in semantic mediation, the threshold condition for the semantic Web.

There are at least 40 concepts — loosely defined — that could be called an “ontology” framework or approach.

So, diversity is inevitable and should be accepted. But that observation need not also embrace chaos.

In my early training in biological systematics, Ernst Haeckel’s recapitulation theory that “ontogeny recapitulates phylogeny” (note the same ontos root, the difference from ontology being growth v. study) was losing favor fast. The theory was that the development of an organism through its embryological phases mirrors its evolutionary history. Today, modern biologists recognize numerous connections between ontogeny and phylogeny, explain them using evolutionary theory, or view them as supporting evidence for that theory.

Yet, like the construction of phylogenetic trees, systematicists strive for their classifications of the relatedness of organisms to be “natural”, to reflect the true nature of the relationship. Thus, over time, that understanding of a “natural” system has progressed from appearance → embryology → embryology + detailed morphology → species and interbreeding → DNA. While details continue to be worked out, the degree of genetic relatedness is now widely accepted by biologists as a “natural” basis for organizing the Tree of Life.

It is not unrealistic to also seek “naturalness” in the organization of other knowledge domains, to seek “naturalness” in the organization of their underlying ontologies. Like natural systems in biology, this naturalness should emerge from the shared understandings and perceptions of the domain’s participants. While subject matter expertise and general and domain knowledge are essential to this development, they are not the only factors. As tagging systems on the Web are showing, common usage and broad acceptance by the community at hand is important as well.

While it may appear that a domain such as the biological relatedness of organisms is more empirical than the concepts and ambiguous words in most domains of human endeavor, these attempts at naturalness are still not foolish. The phylogeny example shows that understanding changes over time as knowledge is gained. We now accept DNA over the recapitulation theory.

As the formal SKOS organizational schema for knowledge systems recognizes (see below), the ideas of narrower and broader concepts can be readily embraced, as well as concepts of relatedness and aliases (synonyms). These simple constructs, I would argue, plus the application of knowledge being gained in related domains, will enable tomorrow’s understandings to be more “natural” than today’s, no matter the particular domain at hand.

So, in seeking a “naturalness” within our organizational schema, we can also see that change is a constant. We also see that the tools and ideas underlying the seemingly abstract cause of merging and relating existing ontologies to one another will further a greater “naturalness” within our organizations of the world.

A Spectrum of Formalisms

According to the Summit, expressiveness is the extent and ease by which an ontology can describe domain semantics. Structure they define as the degree of organization or hierarchical extent of the ontology. They further define granularity as the level of detail in the ontology. And, as the diagram above alludes, they define other dimensions of use, logical basis, purpose and so forth of an ontology.

The over fifty respondents from 42 communities submitted some 70 different ontologies under about 40 terms to a survey that was used by the Summit to construct their diagram. These submissions included:

. . . formal ontologies (e.g., BFO, DOLCE, SUMO), biomedical ontologies (e.g., Gene Ontology, SNOMED CT, UMLS, ICD), thesauri (e.g., MeSH, National Agricultural Library Thesaurus), folksonomies (e.g., Social bookmarking tags), general ontologies (WordNet, OpenCyc) and specific ontologies (e.g., Process Specification Language). The list also includes markup languages (e.g., NeuroML), representation formalisms (e.g., Entity-Relation model, OWL, WSDL-S) and various ISO standards (e.g., ISO 11179). This [Ontolog] sample clearly illustrates the diversity of artifacts collected under “ontology”.

I think the simplest spectrum for such distinctions is the formalism of the ontology and its approach (and language or syntax, not further discussed here). More formal ontologies have greater expressiveness and structure and inferential power, less formal ones the opposite. Constructing more formal ontologies is more demanding, and takes more effort and rigor, resulting in an approach that is more powerful but also more rigid and less flexible. Like anything else, there are always trade-offs.

Based on work by Leo Obrst of Mitre as interpreted by Dan McCreary, we can view this as a trade-off as one of semantic clarity v. the time and money required to construct the formalism [12, 13]:

Structure and Formalism Increases Semantic Expressiveness
[Click on image for full-size pop-up]

Note this diagram reflects the more conventional, practitioner’s view of the “formal” ontology, which does not include taxonomies or controlled vocabularies (for example) in the definition. This represents the more “closely defined” end of the ontology (semantic) spectrum.

However, since we are speaking here of ontologies and the structured Web or the semantic Web, I believe we need to embrace a concept of ontology aligned to actual practice. Not all content providers can or want to employ ontology engineers to enable formal inferencing of their content. Yet, on the other hand, their content in its various forms does have some meaningful structure, some organization. The trick is to extract this structure for more meaningful use such as data exchange or data merging.

Ontology Approaches on the Web

Under such “loosely defined” bases we can thus see a spectrum of ontology approaches on the Web, proceeding from less structure and formalism to more so:

Type or Schema Examples Comments on Structure and Formalism
Standard Web Page entire Web General metadata fields in the and internal HTML codes and tags provide minimal, but useful sources of structure; other HTTP and retrieval data can also contribute
Blog / Wiki Page examples from Technorati, Bloglines, Wikipedia Provides still greater formalism for the organization and characterization of content (subjects, categories, posts, comments, date/time stamps, etc.). Importantly, with the addition of plug-ins, some of the basic software may also provide other structured characterizations or output (SIOC, FOAF, etc.; highly varied and site-specific given the diversity of publishing systems and plug-ins)
RSS / Atom feeds most blogs and most news feeds RSS extends basic XML schema for more robust syndication of content with a tightly controlled vocabulary for feed concepts and their relationships. Because of its ubiquity, this is becoming a useful baseline of structure and formalism; also, the nature of adoption shows much about how ontological structure is an artifact, not driver, for use
RSS / Atom feeds with tags or OPML Grazr, most newsfeed aggregators can import and export OPML lists of RSS feeds The OPML specification defines an outline as a hierarchical, ordered list of arbitrary elements. The specification is fairly open which makes it suitable for many types of list data. See also OML and XOXO
Hierarchical Faceted Metadata XFML, Flamenco These and related efforts from the information architecture (IA) community are geared more to library science. However, they directly contribute to faceted browsing, which is one of the first practical instantiations of the semantic Web
Folksonomies Flickr, Based on user-generated tags and informal organizations of the same; not linked to any “standard” Web protocols. Both tags and hierarchical structure are arbitrary, but some researchers now believe over large enough participant sets that structural consensus and value does emerge
Microformats Example formats include hAtom, hCalendar, hCard, hReview, hResume, rel-directory, xFolk, XFN and XOXO A microformat is HTML mark up to express semantics with strictly controlled vocabularies. This markup is embedded using specific HTML attributes such as class, rel, and rev. This method is easy to implement and understand, but is not free-form
Embedded RDF RDFa, eRDF An embedded format, like microformats, but free-form, and not subject to the approval strictures associated with microformats
Topic Maps Infoloom, Topic Maps Search Engine A topic map can represent information using topics (representing any concept, from people, countries, and organizations to software modules, individual files, and events), associations (which represent the relationships between them), and occurrences (which represent relationships between topics and information resources relevant to them)
RDF Many; DBpedia, etc. RDF has become the canonical data model since it represents a “universal” conversion format
RDF Schema SKOS, SIOC, DOAP, FOAF RDFS or RDF Schema is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources. This becomes the canonical ontology common meeting ground
OWL Lite Here are some existing OWL ontologies; also see Swoogle for OWL search facilities The Web Ontology Language (OWL) is a language for defining and instantiating Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. The three language versions are in order of increasing expressiveness
OWL Full
Higher-order “formal” and “upper-level” ontologies SUMO, DOLCE, PROTON, BFO, Cyc, OpenCyc These provide comprehensive ontologies and often related knowledge bases, with the goal of enabling AI applications to perform human-like reasoning. Their reasoning languages often use higher-order logics

As a rule of thumb, items that are less “formal” can be converted to a more formal expression, but the most formal forms can generally not be expressed in less formal forms.

As latter sections elaborate, I see RDF as the universal data model for representing this structure into a common, canonical format, with RDF Schema (specifically SKOS, but also supplemented by FOAF, DOAP and SIOC) as the organizing ontology knowledge representation language (KRL).

This is not to say that the various dialects of OWL should be neglected. In bounded environments, they can provide superior reasoning power and are warranted if they can be sufficiently mandated or enforced. But the RDF and RDF-S systems represent the most tractable “meeting place” or “middle ground,” IMHO.

Still-Another “Level” of Ontologies

As if the formalism dimension were not complicated enough, there is also the practice within the ontology community to characterize ontologies by “levels”, specifically upper, middle and lower levels. For example, chances are that you have heard particularly of “upper-level” ontologies.

The following figure helps illustrate this “level” dimension. This diagram is also from Leo Obrst of Mitre [12], and was also used in another 2006 talk by Jack Park and Patrick Durusau (discussed further below for other reasons):

Ontology Levels

Examples of upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc and BFO (Basic Formal Ontology). Most of the content in their upper-levels is akin to broad, abstract relations or concepts (similar to the primary classes, for example, in a Roget’s Thesaurus — that is, real ontos stuff) than to “generic common knowledge.” Most all of them have both a hierarchical and networked structure, though their actual subject structure relating to concrete things is generally pretty weak [2].

The above diagram conveys a sense of how multiple ontologies can relate to one another both in terms of narrower and broader topic matter and at the same “levels” of generalization. Such “meta-structure” (if you will) can provide a reference structure for relating multiple ontologies to one another.

The relationships and mappings amongst ontologies is a critical infrastructure component of the semantic Web.

It resides exactly in such bindings or relationships that we can foresee the promise of querying and relating multiple endpoints on the Web with accurate semantics in order to connect dots and combine knowledge bases. Thus, the understanding of the relationships and mappings amongst ontologies becomes a critical infrastructural component of the semantic Web.

The SUMO Example

We can better understand these mapping and inter-relationship concepts by using a concrete example with a formal ontology. We’ll choose to use the Suggested Upper Merged Ontology simply because it is one of the best known. We could have also selected another upper-level system such as PROTON [3] or Cyc [4] or one of the many with narrower concept or subject coverage.

SUMO is one of the formal ontologies that has been mapped to the WordNet lexicon, which adds to its semantic richness. SUMO is written in the SUO-KIF language. SUMO is free and owned by the IEEE. The ontologies that extend SUMO are available under GNU General Public License.

The abstract, conceptual organization of SUMO is shown by this diagram, which also points to its related MILO (MId-Level Ontology), which is being developed as a bridge between the abstract content of the SUMO and the richer detail of various domain ontologies:

At this level, the structure is quite abstract. But one can easily browse the SUMO structure. A nifty tool to do so is the KSMSA (Knowledge Support for Modeling and Simulation) ontology browser. Using a hierarchical tree representation, you can navigate through SUMO, MILO, WordNet, and (with the locally installed version) Wikipedia.

The figure below shows the upper-level entity concept on the left; the right-hand panel shows a drill-down into the example atom entity:

Example SUMO Categories
[Click on image for full-size pop-up]

These views may be a bit misleading because the actual underlying structure, while it has hierarchical aspects as shown here, really is in the form of a directed acyclic graph (showing other relatedness options, not just hierarchical ones). So, alternate visualizations include traditional network graphs.

The other thing to note is that the “things” covered in the ontology, the entities, are also fairly abstract. That is because the intention of a standard “upper-level” ontology is to cover all relevant knowledge aspects of each entity’s domain. This approach results in a subject and topic coverage that feels less “concrete” than the coverage in, say, an encyclopedia, directory or card catalog.

Ontology Binding and Integration Mechanisms

According to Park and Durusau, upper ontologies are diverse, middle ontologies are even more diverse, and lower ontologies are more diverse still. A key observation is that ontological diversity is a given and increases as we approach real user interaction levels. Moreover, because of the “loose” nature of ontologies on the Web (now and into the future), diversity of approach is a further key factor.

Recall the initial discussion on the role and objectives of ontologies. About half of those roles involve effectively accessing or querying more than one ontology. The objective of “upper-level” ontologies, many with their own binding layers, is also expressly geared to ontology integration or federation. So, what are the possible mechanisms for such binding or integration?

A fundamental distinction within mechanisms to combine ontologies is whether it is a unified or centralized approach (often imposed or required by some party) or whether it is a schema mapping or binding approach. We can term this distinction centralized v. federated.

Centralized Approaches

Centralized approaches can take a number of forms. At the most extreme, adherence to a centralized approach can be contractual. At the other end are reference models or standards. For example, illustrative reference models include:

  • the Data Reference Model (DRM), one of the five reference models of the Federal Enterprise Architecture (FEA)
  • UDEF (Unified Data Element Framework), an approach toward a unified description framework, or
  • the eXtended MetaData Registry (XMDR) project.

Though I have argued that One Ring to Rule them All is not appropriate to the general Web, there may be cases within certain enterprises or where through funding clout (such as government contracts), some form of centralized approach could be imposed [5]. And, frankly, even where compliance can not be assured, there are advantages in economy, efficiency and interoperability to attempt central ontologies. Certain industries — notably pharmaceuticals and petrochemicals — and certain disciplines — such as some areas of biology among others — have through trade associations or community consensus done admirable jobs in adopting centralized approaches.

Federated Approaches

However, combining ontologies in the context of the broader Internet is more likely through federated approaches. (Though federated approaches can also be improved when there are consensual standards within specific communities.) The key aspect of a federated approach is to acknowledge that multiple schema need to be brought together, and that each contributing data set and its schema will not be altered directly and will likely remain in place.

Thus, the key distinctions within this category are the mechanisms by which those linkages may take place An important goal in any federated approach is to achieve interoperability at the data or instance level without unacceptable loss of information or corruption of the semantics. Numerous specific approaches are possible, but three example areas in RDF-topic map interoperability, the use of “subject maps”, and binding layers can illustrate some of the issues at hand.

In 2006 the W3C set up a working group to look at the issue of RDF and topic maps interoperability. Topic maps have been embraced by the library and information architecture community for some time, and have standards that have been adopted under ISO. Somewhat later but also in parallel was the development of the RDF standard by W3C. The interesting thing was that the conceptual underpinnings and objectives between these two efforts were quite similar. Also, because of the substantive thrust of topic maps and the substantive needs of its community, quite a few topic maps had been developed and implemented.

One of the first efforts of the W3C work group was to evaluate and compare five or six extant proposals for how to relate RDF and topic maps [6]. That report is very interesting reading for any one desirous of learning more about specific issues in combining ontologies and their interoperability. The result of that evaluation then led to some guidelines for best practices in how to complete this mapping [7]. Evaluations such as these provide confidence that interoperability can be achieved between relatively formal schema definitions without unacceptable loss in meaning.

A different, “looser” approach, but one which also grew out of the topic map community, is the idea of “subject maps.” This effort, backed by Park and Durusau noted above, but also with the support of other topic map experts such as Steve Newcomb and Robert Barta via their proposed Topic Maps Reference Model (ISO 13250-5), seems to be one of the best attempts I’ve seen that both respects the reality of the actual Web while proposing a workable, effective scheme for federation.

The basic idea of a subject map is built around a set of subject “proxies.” A subject proxy is a computer representation of a subject that can be implemented as an object, must have an identity, and must be addressable (this point provides the URI connector to RDF). Each contributing schema thus defines its own subjects, with the mappings becoming meta-objects. These, in turn, would benefit from having some accepted subject reference schema (not specifically addressed by the proponents) to reduce the breadth of the ultimate mapped proxy “space.”

I don’t have the expertise to judge further the specifics, but I find the presentation and papers by Park and Durusau, Avoiding Hobson’s Choice In Choosing An Ontology and Towards Subject-centric Merging of Ontologies to be worthwhile reading in any case. I highly recommend these papers for further background and clarity.

As the third example, “binding layers” are a comparatively newer concept. Leading upper-level ontologies such as SUMO or PROTON propose their own binding protocols to their “lower” domains, but that approach takes place within the construct of the parent upper ontology and language. Such designs are not yet generalized solutions. By far the most promising generalized binding solution is the SKOS (Simple Knowledge Organization System). Because of its importance, the next section is devoted to it.

Finally, with respect to federated approaches, there are quite a few software tools that have been developed to aid or promote some of these specific methods. For, example, about twenty of the software applications in my Sweet Tools listing of 500+ semantic Web and -related tools could be interpreted as aiding ontology mapping or creation. You may want to check out some of these specific tools depending on your preferred approach [8].

The Role of SKOS – the Simple Knowledge Organization System

SKOS, or the Simple Knowledge Organization System, is a formal language and schema designed to represent such structured information domains as thesauri, classification schemes, taxonomies, subject-heading systems, controlled vocabularies, or others; in short, most all of the “loosely defined” ontology approaches discussed herein. It is a W3C initiative more fully defined in its SKOS Core Guide [9].

SKOS is built upon the RDF data model of the subject-predicate-object “triple.” The subjects and objects are akin to nouns, the predicate a verb, in a simple Dick-sees-Jane sentence. Subjects and predicates by convention are related to a URI that provides the definitive reference to the item. Objects may be either a URI resource or a literal (in which case it might be some indexed text, an actual image, number to be used in a calculation, etc.).

Being an RDF Schema simply means that SKOS adds some language and defined relationships to this RDF baseline. This is a bit of recursive understanding, since RDFS is itself defined in RDF by virtue of adding some controlled vocabulary and relations. The power, though, is that these schema additions are also easily expressed and referenced.

This RDFS combination can thus be shown as a standard RDF triple graph, but with the addition of the extended vocabulary and relations:

Standard RDF Graph Model

The power of the approach arises from the ability of the triple to express virtually any concept, further extended via the RDFS language defined for SKOS. SKOS includes concepts such as “broader” and “narrower”, which enable hierarchical relations to be modeled, as well as “related” and “member” to support networks and arrays, respectively [9].

We can visualize this transforming power by looking at how an “ontology” in a totally foreign scheme can be related to the canonical SKOS scheme. In the figure below the left-hand portion shows the native hierarchical taxonomy structure of the UK Archival Thesaurus (UKAT), next as converted to SKOS on the right (with the overlap of categories shown in dark purple). Note the hierarchical relationships visualize better via a taxonomy, but that the RDF graph model used by SKOS allows a richer set of additional relationships including related and alternative names:

Example Structural Comparison of Hierarchical Taxonomy with Network Graph
[Click on image for full-size pop-up]

SKOS also has a rich set of annotation and labeling properties to enhance human readability of schema developed in it [9]. There is also a useful draft schema that the W3C’s SWEO (Semantic Web Education and Outreach) group is developing to organize semantic Web-related information [10].

Combined, these constructs provide powerful mechanisms for giving contributory ontologies a common conceptualization. When added to other sibling RDF schema such as FOAF or SIOC or DOAP, still additional concepts can be collated.


While not addressed directly in this piece, it is obviously of first importance to have content with structure before the questions of connecting that information can even arise. Then, that structure must also be available in a form suitable for merging or connection.

At that point, the subjects of this posting come into play.

We are stubbing our toes on the rocks while we gaze at the heavens.

We see that the daily Web has a diversity of schema or ontologies “loosely defined” for representing the structure of the content. These representations can be transferred to more complex schema, but not in the opposite direction. Moreover, the semantic basis for how to make these mappings also needs some common referents.

RDF provides the canonical data model for the data transfers and representations. RDFS, especially in the form of SKOS, appears to form one basis for the syntax and language for these transformations. And SKOS, with other schema, also appears to offer much of the appropriate “middle ground” for data relationships mapping.

However, lacking in this story is a referential structure for subject relationships [11]. (Also lacking are the ultimately critical domain specifics required for actual implementation.)

Abstract concepts of interest to philosophers and deep thinkers have been given much attention. Sadly, to date, concrete subject structures in which tangible things and tangible actions can be shared, is still very, very weak. We are stubbing our toes on the rocks while we gaze at the heavens.

Yet, despite this, simple and powerful infrastructures are well in-hand to address all foreseeable syntactic and semantic issues. There appear to be no substantive limits to needed next steps.

Lastly, many valuable resources for further reading and learning may be found within the Ontolog Community, W3C, TagCommons and Topics Maps groups. Enjoy! And be wary of ontology no longer.

[1] Deborah L. McGuinness. “Ontologies Come of Age”. In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003. See
[2] I think it would be much clearer to refer to “upper level” ontologies as abstract or conceptual, “mid levels” as mapping or binding, and “lower levels” as domain (without any hierarchical distinctions such as lower or lowest or sub-domain), but current practice is probably too entrenched to change now.
[3] There are many aspects that make PROTON one of the more attractive reference ontologies. The PROTON ontology (PROTo ONtology), developed within the scope of the SEKT project, is attractive because of its understandability, relatively small size, modular architecture and a simple subsumption hierarchy. It is available in an OWL Lite form and is easy to adopt and extend. On the face of it, the Topic class within PROTON, which is meant to serve as a bridge between different ontologies, may also provide a binding layer to specific subject topics as sub-classes or class instances.
[5] Even with such clout, it is questionable to get rather complete adherence, as Ada showed within the Federal government. However, where circumstances allow it, central schema and ontologies may be worth pursuing because of improved interoperability and lower costs, even where some portions do not adhere and are more chaotic like the standard Web.
[6] See, A Survey of RDF/Topic Maps Interoperability Proposals, W3C Working Group Note 10 February 2006, Pepper, Vitali, Garshol, Gessa, Presutti (eds.)
[7] See, Guidelines for RDF/Topic Maps Interoperability, W3C Editor’s Draft 30 June 2006, Pepper, Presutti, Garshol, Vitali (eds.)
[8] Here are some Sweet Tools that may have a usefulness to ontology federation and creation:
  • Adaptiva — is a user-centered ontology building environment, based on using multiple strategies to construct an ontology, minimising user input by using adaptive information extraction
  • Altova SemanticWorks — is a visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design
  • CMS — the CROSI Mapping System is a structure matching system that capitalizes on the rich semantics of the OWL constructs found in source ontologies and on its modular architecture that allows the system to consult external linguistic resources
  • ConcepTool — is a system to model, analyze, verify, validate, share, combine, and reuse domain knowledge bases and ontologies, reasoning about their implication
  • ConRef — is a service discovery system which uses ontology mapping techniques to support different user vocabularies
  • FOAM — is the Framework for Ontology Alignment and Mapping. It is based on heuristics (similarity) of the individual entities (concepts, relations, and instances)
  • hMAFRA (Harmonize Mapping Framework) — is a set of tools supporting semantic mapping definition and data reconciliation between ontologies. The targeted formats are XSD, RDFS and KAON
  • IF-Map — is an Information Flow based ontology mapping method. It is based on the theoretical grounds of logic of distributed systems and provides an automated streamlined process for generating mappings between ontologies of the same domain
  • IODT — is IBM’s toolkit for ontology-driven development. The toolkit includes EMF Ontology Definition Metamodel (EODM), EODM workbench, and an OWL Ontology Repository (named Minerva)
  • KAON — is an open-source ontology management infrastructure targeted for business applications. It includes a comprehensive tool suite allowing easy ontology creation and management and provides a framework for building ontology-based applications. An important focus of KAON is scalable and efficient reasoning with ontologies
  • LinKFactory — is Language & Computing’s ontology management tool. It provides an effective and user-friendly way to create, maintain and extend extensive multilingual terminology systems and ontologies (English, Spanish, French, etc.). It is designed to build, manage and maintain large, complex, language independent ontologies
  • M3t4.Studio Semantic Toolkit — is Metatomix’s free set of Eclipse plug-ins to allow developers to create and manage OWL ontologies and RDF documents
  • MAFRA Toolkit — the Ontology MApping FRAmework Toolkit allows to create semantic relations between two (source and target) ontologies, and apply such relations in translating source ontology instances into target ontology instances
  • OntoEngine — is a step toward allowing agents to communicate even though they use different formal languages (i.e., different ontologies). It translates data from a “source” ontology to a “target.”
  • OntoPortal — enables the authoring and navigation of large semantically-powered portals
  • OWLS-MX — the hybrid semantic Web service matchmaker OWLS-MX 1.0 utilizes both description logic reasoning, and token based IR similarity measures. It applies different filters to retrieve OWL-S services that are most relevant to a given query
  • pOWL — is a semantic Web development platform for ontologies in PHP. pOWL consists of a number of components, including RAP
  • Protege — is an open source visual ontology editor written in Java with many plug-in tools
  • Semantic Net Generator — is a utility for generating topic maps automatically from different data sources by using rules definitions specified with Jelly XML syntax. This Java library provides Jelly tags to access and modify data sources (also RDF) to create a semantic network
  • SOFA — is a Java API for modeling ontologies and Knowledge Bases in ontology and Semantic Web applications. It provides a simple, abstract and language neutral ontology object model, inferencing mechanism and representation of the model with OWL, DAML+OIL and RDFS languages
  • Terminator — is a tool for creating term to ontology resource mappings (documentation in Finnish)
  • WebOnto — supports the browsing, creation and editing of ontologies through coarse grained and fine grained visualizations and direct manipulation.
[9] The SKOS language has the following classes:
  • CollectableProperty — A property which can be used with a skos:Collection
  • Collection — A meaningful collection of concepts
  • Concept — An abstract idea or notion; a unit of thought
  • ConceptScheme — A set of concepts, optionally including statements about semantic relationships between those concepts. Thesauri, classification schemes, subject heading lists, taxonomies, ‘folksonomies’, and other types of controlled vocabulary are all examples of concept schemes. Concept schemes are also embedded in glossaries and terminologies.
  • OrderedCollection — An ordered collection of concepts, where both the grouping and the ordering are meaningful
. . . and the following properties:
  • altLabel — An alternative lexical label for a resource. Acronyms, abbreviations, spelling variants, and irregular plural/singular forms may be included among the alternative labels for a concept
  • altSymbol — An alternative symbolic label for a resource
  • broader — A concept that is more general in meaning. Broader concepts are typically rendered as parents in a concept hierarchy (tree)
  • changeNote — A note about a modification to a concept
  • definition — A statement or formal explanation of the meaning of a concept
  • editorialNote — A note for an editor, translator or maintainer of the vocabulary
  • example — An example of the use of a concept
  • hasTopConcept — A top level concept in the concept scheme
  • hiddenLabel — A lexical label for a resource that should be hidden when generating visual displays of the resource, but should still be accessible to free text search operations
  • historyNote — A note about the past state/use/meaning of a concept
  • inScheme — A concept scheme in which the concept is included. A concept may be a member of more than one concept scheme
  • isPrimarySubjectOf — A resource for which the concept is the primary subject
  • isSubjectOf –A resource for which the concept is a subject
  • member — A member of a collection
  • memberList — An RDF list containing the members of an ordered collection
  • narrower — A concept that is more specific in meaning. Narrower concepts are typically rendered as children in a concept hierarchy (tree)
  • note — A general note, for any purpose. The other human-readable properties of definition, scopeNote, example, historyNote, editorialNote and changeNote are all sub-properties of note
  • prefLabel — The preferred lexical label for a resource, in a given language. No two concepts in the same concept scheme may have the same value for skos:prefLabel in a given language
  • prefSymbol — The preferred symbolic label for a resource
  • primarySubject — A concept that is the primary subject of the resource. A resource may have only one primary subject per concept scheme
  • related — A concept with which there is an associative semantic relationship
  • scopeNote — A note that helps to clarify the meaning of a concept
  • semanticRelation — A concept related by meaning. This property should not be used directly, but as a super-property for all properties denoting a relationship of meaning between concepts
  • subject — A concept that is a subject of the resource
  • subjectIndicator — A subject indicator for a concept. [The notion of 'subject indicator' is defined here with reference to the latest definition endorsed by the OASIS Published Subjects Technical Committee]
  • symbol — An image that is a symbolic label for the resource. This property is roughly analagous to rdfs:label, but for labelling resources with images that have retrievable representations, rather than RDF literals. Symbolic labelling means labelling a concept with an image.
[10] The SWEO classification ontology is still under active development and has these draft classes. Note, however, the relative lack of actual subject or topic matter:
Classes are currently defined as:
  • article – magazine article
  • blog – blog discussing SW topics
  • book – indicates a textbook, applies to the book’s home page, review or listing in Amazon or such
  • casestudy – Article on a business case
  • conference/event – conferences or events where you can learn about the Semantic Web
  • demo/demonstration – interactive SW demo
  • forum – a forum on semantic web or related topics
  • presentation – Powerpoint or similar slide show
  • person – If this is a person’s home page or blog, see below
  • publication – a scientific publication
  • ontology – a formalisation of a shared conceptualization using OWL, RDFS, SKOS or something else based on RDF
  • organization – If the page is the home page of an organization, research, vendor etc, see below
  • portal – a portal website Semantic Web or related topics, usually hosting information items, mailinglists, community tools
  • project – a research (for example EU-IST) or other project that addresses Semantic Web issues
  • mailinglist – a mailinglist on semantic Web or related topics
  • person – ideally a person that is well known regarding the Semantic Web (people who can do keynote speakers), may also be any related person
  • press – a press release by a company or an article about Semantic Web
  • recommended – If the resource is seen to be in the top 10 of its kind
  • specification – a Semantic Web specification (RDF, RDF/S, OWL, etc)
  • categories – (perhaps using tags or other free form annotation
  • successstory – Article that can contain advertisment and clearly shows the benefit of semantic web
  • tutorial – a tutorial teaching some aspect of semantic web, an example
  • vocabulary – a RDF vocabulary
  • software project/tool – For product/project home pages
If the page describes an organization, it can be tagged as:
  • vendor
  • research
  • enduser
If the page is a person’s home page or blog or similar, it could be:
  • opinionleader
  • researcher
  • journalist
  • executive
  • geek
The type of audience can also be tagged, for example:
  • general public
  • beginners
  • technicians
  • researchers.
[11] The OASIS Topic Maps Published Subjects Technical Committee was formed a number of years back to promote Topic Maps interoperability through the use of Published Subjects Indicators (PSIs). Their resulting report was a very interesting effort that unfortunately did not lead to wide adoption, perhaps because the effort was a bit ahead of its time or it was in advance of the broader acceptance of RDF. This general topic is the subject of a later posting by me.
[12] See further, Leo Obrst, “The Semantic Spectrum & Semantic Models,” a Powerpoint presentation (–LeoObrst_20060112.ppt)
made as part of an Ontolog Forum ( presentation in two parts, “What is an Ontology? – A Briefing on the Range of Semantic Models” (see, in January 2006. Leo Obrst is a principal artificial intelligence scientist at MITRE’s ( Center for Innovative Computing and Informatics and a co-convener of the Ontolog Forum. His presentation is a rich source of practical overview information on ontologies.
[13] The actual diagram is an unattributed modification by Dan McCreary (see based on Obrst’s material in [12].
Posted:November 15, 2010

UMBEL Vocabulary and Reference Concept OntologySignificant Upgrades, Changes Based on Two Years of Use

Structured Dynamics and Ontotext are pleased to announce the latest release of UMBEL, version 0.80. It has been more than a year since the last update of UMBEL, and well past earlier announced targets for this upgrade. UMBEL was first publicly released as version 0.70 on July 16, 2008.

UMBEL (Upper Mapping and Binding Exchange Layer) has two roles. It is firstly a vocabulary for building reference ontologies to guide the interoperation of domain information. It is secondly a reference ontology in its own right that contains about 21,000 general reference concepts. With more than two years of practical experience with UMBEL, much has been learned.

This learning has now been reflected into five major changes for the system, embodying numerous minor changes. I summarize these major changes below. The formal release of UMBEL v. 0.80 is also being accompanied by a complete revamping and updating of the project’s Web site. I hope you will find these changes as compelling and exciting as we do.

In the broader context, it is probably best to view this release as but the interim first step of a two-step release sequence leading to UMBEL version 1.00. We are on track to release version 1.00 by the end of this year. This second step will include a complete mapping to the PROTON upper-level ontology and the re-organization and categorization of Wikipedia content into the UMBEL structure. We anticipate the pragmatic challenges in this massive effort will also inform some further refinements to UMBEL itself, which will also lead to further changes in its specification.

Nonetheless, UMBEL v. 0.80 does embody most of the language and structural changes anticipated over this evolution. It is fully ready for use and evaluation; it will, for example, be incorporated into a next version of FactForge. But, do be aware that the major revisions discussed herein are subject to further refinements as the efforts leading to version 1.00 are culminated over the next few weeks.

Let’s now overview these major changes in UMBEL v. 0.80.

Major Change #1: Clarification of Dual Role

The genesis of UMBEL more than three years ago was the recognition that data interoperability on the semantic Web depended on shared reference concepts to link related content. We spent much effort to construct such a reference structure with about 21,000 concepts. That purpose remains.

But, the way in which we created this structure — its vocabulary — has also proven to have value in its own right. The same basic way that we constructed the original UMBEL we have now applied to multiple, specific domain ontologies. With use, it has become clear that the vocabulary for creating reference ontologies is on an equal footing to the reference concepts themselves.

With this understanding has come clarity of role and description of UMBEL. With version 0.80, we now have explicitly split and defined these roles and files.

The UMBEL Vocabulary

Thus, UMBEL’s first purpose is to provide a general vocabulary (the UMBEL Vocabulary) of classes and predicates for describing domain ontologies, with the specific aim of promoting interoperability with external datasets and domains. It is usable exclusive of the UMBEL Reference Concept Ontology.

The UMBEL Vocabulary recognizes that different sources of information have different contexts and different structures. A meaningful vocabulary is necessary that can express potential relationships between two information sources with respect to their differences in structure and scope. By nature, these connections are not always exact. Means for expressing the “approximateness” of relationships are essential.

The vocabulary has been greatly simplified from earlier versions (see Major Change #2 below); it now defines two classes:

  • RefConcept
  • SuperType

These are explained further below. And, the vocabulary has 10 properties:

  • isAbout
  • isRelatedTo
  • isLike
  • hasMapping
  • hasCharacteristic
  • isCharacteristicOf
  • preflabel
  • altLabel
  • hiddenLabel
  • definition.

(Note, the latter four are also in SKOS; see [1].)

In addition, UMBEL re-uses certain properties from external vocabularies. These classes and properties are used to instantiate the UMBEL Reference Concept ontology (see next), and to link Reference Concepts to external ontology classes. For more detail on the vocabulary see Part I: Vocabulary Specification in the specifications.

The UMBEL Reference Concept Ontology

The second purpose of UMBEL is to provide a coherent framework of broad subjects and topics, the “reference concepts” or RefConcepts, expressed as the UMBEL Reference Concept Ontology. The RefConcepts act as binding nodes for mapping relevant Web-accessible content, also with the specific aim of promoting interoperability and to reason over a coherent reference structure and its linked resources. UMBEL presently has about 21,000 of these reference concepts drawn from the Cyc knowledge base, which are organized into more than 30 mostly disjoint SuperTypes (see Major Change #3).

The UMBEL Reference Concept Ontology is, in essence, a content graph of subject nodes related to one another via broader-than and narrower-than relations. In turn, these internal UMBEL RefConcepts may be related to external classes and individuals (instances and named entities) via a set of relational, equivalent, or alignment predicates (the UMBEL Vocabulary, see above).

The actual RefConcepts used are the least changed part in UMBEL from previous versions, and still have the same identifiers as prior versions. The Reference Concept Ontology now uses a recently updated release of the OpenCyc KB v3. Cycorp also added some additional mapping predicates in this release that allows items such as fields of study to be added to the structure. (Thanks, Cycorp!)

Here is a large-graph view of the 21,000 reference concepts in the ontology (click to expand; large file):

UMBEL Reference Concept Ontology

More detail on the RefConcepts is provided in Part II: Reference Concepts Specification of the full specifications.

Major Change #2: Reference Concepts and Predicate Simplification

Another set of major changes was the simplification and streamlining of the predicates and construction of the UMBEL Vocabulary [2]. Again, the specifications detail these changes, but the significant ones include:

Natural World Natural Phenomena
Natural Substances
Living Things Prokaryotes
Protists & Fungus
Person Types
Human Activities Organizations
Finance & Economy
Time-related Events
Human Works Products
Food or Drink
Human Places Geopolitical
Workplaces, etc.
Information Chemistry (n.o.c)
Audio Info
Visual Info
Written Info
Structured Info
Notations & References
Descriptive Attributes
Classificatory Abstract-level
Markets & Industries
Dimensions and SuperTypes
  • Changed the name of ‘Subject Concepts’ (SubjectConcept, or SC) to ‘Reference Concepts’ (RefConcept, or RC). The umbel:SubjectConcept class got deprecated, and the umbel:RefConcept class got added. As noted by many practitioners, the rather tortured use of the earlier “subject concepts” was questioned. The change in this new version reflects the actual reference use of the concepts and ontologies that employ them
  • Dropped the “SemSet” class, and replaced the same idea of providing multiple tagging options via the best practice of the use of umbel:preLabel and multiple umbel:altLabels and umbel:hiddenLabels. This simplifies the language and brings usage into conformance with standard practice and reasoners
  • With the addition of SuperTypes (see next Major Change), dropped the distinction for “abstract concepts” and rolled their earlier use into the standard RefConcepts
  • The simplification due to OWL 2 metamodeling (see Major Change #4) enabled the removal of many earlier predicates and their inverse properties,
  • With experience gained through linking datasets and their attributes to ontologies [3], added predicates (hasCharacteristic and isCharacteristicOf) for relating external properties, and
  • Many other streamlining changes and improvements to property specifications.

See further the Part II in the full specifications.

Major Change #3: SuperTypes

Shortly after the first public release of UMBEL, it was apparent that the 21,000 reference concepts tended to “cluster” into some natural groupings. Further, upon closer investigation, it was also apparent that most of these concepts were disjoint with one another. As subsequent analysis showed, more fully detailed in the Annex G document, fully 75% of the reference concepts in the UMBEL ontology are disjoint with one another.

Natural clusters provide a tractable way to access and manage some 21,000 items. And, large degrees of disjointedness between concepts also can lead to reasoning benefits and faster processing and selection of those items.

For these reasons a dedicated analysis to analyze and assign all UMBEL reference concepts to a new class of SuperTypes was undertaken. SuperTypes are now a major enhancement to UMBEL v. 0.80. The assignment results and the SuperType specification are discussed in Part II, with full analysis results in Annex G.

In addition, all of these SuperTypes are clustered into nine “dimensions”, which are useful for aggregation and organizational purposes, but which have no direct bearing on logic assertions or disjointedness testing. These nine dimensions, with their associated SuperTypes, are shown in the table to the right. Note the last two dimensions (and four SuperTypes), shown in italics, are by definition non-disjoint.

The construct of the SuperType may be applied to any domain ontology constructed with the UMBEL Vocabulary. The UMBEL Reference Concept Ontology includes all disjoint assertions for all of its RefConcepts.

Major Change #4: OWL 2 Compliance

One of the most challenging improvements in the new UMBEL version 0.80 was to make its vocabulary and ontology compliant with the new OWL 2 Web Ontology Language. We wanted to convert to OWL 2 in order to:

  • Use OWL reasoners
  • Load the full UMBEL into the Protégé 4 ontology editor
  • Use the OWL API, consistent with many other ontology tools we are pursuing, and
  • Take advantage of a neat trick in OWL 2 called “punning“.

The latter reason is the most important given the reference role of UMBEL and ontologies based on the UMBEL Vocabulary. It is not unusual to want to treat things either as a class or an instance in an ontology. Among other aspects, this is known as metamodeling and it can be accomplished in a number of ways. “Punning” is one metamodeling technique that importantly allows us to use concepts in ontologies as either classes or instances, depending on context.

To better understand why we should metamodel, let’s look at a couple of examples, both of which combine organizing categories of things and then describing or characterizing those things. This dual need is common to most domains [4].

As one example, let’s take a categorization of apes as a kind of mammal, which is then a kind of animal. In these cases, ape is a class, which relates to other classes, and apes may also have members, be they particular kinds of apes or individual apes. Yet, at the same time, we want to assert some characteristics of apes, such as being hairy, two legs and two arms, no tails, capable of walking bipedally, with grasping hands, and with some being endangered species. These characteristics apply to the notion of apes as an instance.

As another example we may have the category of trucks, which may further be split into truck types, brands of trucks, type of engine, and so forth. Yet, again, we may want to characterize that a truck is designed primarily for the transport of cargo (as opposed to automobiles for people transport), or that trucks may have different drivers license requirements or different license fees than autos. These descriptive properties refer to trucks as an instance.

These mixed cases combine both the organization of concepts in relation to one another and with respect to their set members, with the description and characterization of these concepts as things unto themselves. This is a natural and common way to express most any domain of interest. It is also a general requirement for a reference ontology, as we use in the sense of UMBEL.

When we combine this “punning” aspect of OWL 2 with our standard way of relating concepts in a hierarchical manner, this general view of the predicates within UMBEL emerges (click to expand):

UMBEL Predicates - click to expand

On the left-hand side (quadrants A and C) is the “class” view of the structure; the right-hand side is the “individual” (or instance) view of the structure (quadrants B and D). These two views represent alternative perspectives for looking at the UMBEL reference concepts based on metamodeling.

The top side of the diagram (quadrants A and B) is an internal view of UMBEL reference concepts (RefConcept) and their predicates (properties). This internal view applies to the UMBEL Reference Concept Ontology or to domain ontologies based on the UMBEL Vocabulary. These relationships show how RefConcepts are clustered into SuperTypes or how hierarchical relationships are established between Reference Concepts (via the skos:narrowerTransitive and skos:broaderTransitive relations). The concept relationships and their structure is a “class” view (quadrant A); treating these concepts as instances in their own right and relating them to SKOS is provided by the right-hand “individual” (instance) view (quadrant B).

The bottom of the diagram (quadrants C and D) shows either classes or individuals in external ontologies. The key mapping predicates cross this boundary (the broad dotted line) between UMBEL-based ontologies and external ontologies. See further Part I in the full specification for more detailed discussed of this figure and its relation to metamodelling.

Major Change #5: Documentation and Packaging

These changes also warranted better documentation and a better project Web site. From a documentation standpoint, the organization was simplified between the actual specifications and related annexes. Also, because of a more collaborative basis resulting from the new partnership with Ontotext, we also established an internal wiki following TechWiki designs. Initial authoring occurs there, with final results re-purposed and published on the project Web site.

The UMBEL Web site also underwent a major upgrade. It is now based on Drupal, and therefore will be able to embrace our conStruct advances in visualization and access over time. We also posted the full Reference Concept Ontology as an OWLDoc portal.

We feel these changes have now resulted in a clean and easy-to-maintain framework for the next phase in UMBEL’s growth and maturation.

Next Steps and Version

As noted in the intro, this version is but an interim step to the pending next release of UMBEL v. 1.00. This next version will provide mappings to leading ontologies and knowledge bases, as well as the upgrade of existing Web services and other language support features. Intended production or commercial uses would best await this next version.

However, the current version 0.80 is fully consistent and OWL 2-compliant. It loads and can be reasoned over with OWL 2 reasoners (see those available with Protégé 4.1, for example). We encourage you to download, test and comment upon this version. Specifics are:

As co-editors, Frédérick Giasson and I are extremely enthused about the changes and cleanliness of version 0.80. It is already helping our client work. We think these improvements are a good harbinger for UMBEL version 1.00 to come by the end of the year. We hope you agree.

[1] Some relevant SKOS properties are now shown in the UMBEL namespace. This is a technical issue with regard to SKOS needing to have a separate namespace for its DL version, which has been brought up with the relevant Work Group individuals at the W3C. As soon as this oversight is rectified, the SKOS predicates now in UMBEL will be deprecated in favor of the appropriate ones in SKOS.
[2] We’d especially like to thank Jack Park for a series of critical email exchanges in November 2008 regarding terminology and purpose. We are, of course, solely responsible for the changes we did invoke.
[3] See, for example, the site, with its richness of indicator and demographic data. UMBEL co-editors Bergman and Giasson have each blogged about this site.
[4] Much of this material is drawn from M.K. Bergman, “Metamodeling in Domain Ontologies,” AI3:::Adaptive Information blog, Sept 20, 2010; see In the reference ontologies that are the focus here, we often want to treat our concepts as both classes and instances of a class. This is known as “metamodeling” or “metaclassing” and is enabled by “punning” in OWL 2. For example, here a case cited on the OWL 2 wiki entry on “punning“:
People sometimes want to have metaclasses. Imagine you want to model information about the animal kingdom. Hence, you introduce a class a:Eagle, and then you introduce instances of a:Eagle such as a:Harry.
(1) a:Eagle rdf:type owl:Class
(2) a:Harry rdf:type a:Eagle
Assume now that you want to say that “eagles are an endangered species”. You could do this by treating a:Eagle as an instance of a metaconcept a:Species, and then stating additionally that a:Eagle is an instance of a:EndangeredSpecies. Hence, you would like to say this:
(3) a:Eagle rdf:type a:Species
(4) a:Eagle rdf:type a:EndangeredSpecies.
This example comes from Boris Motik, 2005. “On the Properties of Metamodeling in OWL,” paper presented at ISWC 2005, Galway, Ireland, 2005. For some other examples, see Bernd Neumayr and Michael Schrefl, 2009. “Multi-Level Conceptual Modeling and OWL (Draft, 2 May – Including Full Example)”; see

Posted by AI3's author, Mike Bergman Posted on November 15, 2010 at 1:54 am in Ontologies, Semantic Web, Structured Dynamics, UMBEL | Comments (3)
The URI link reference to this post is:
The URI to trackback this post is:
Posted:September 27, 2010

Resources Useful to the Understanding of Ontologies and the Semantic Web

Over the past few weeks we have been publishing a series of general background documents and tutorials useful to the understanding of ontologies. These entries have been prepared specifically with the non-expert and end user in mind.

The Ontology Tutorial Series is now complete as initially scoped. These various articles, in both originally posted form and as kept current on the OpenStructs‘ TechWiki [1], are:


[1] The tutorials were first published on this blog over the period of Aug. 9 to Sept. 20, 2010. They are now permanently maintained and updated on the TechWiki.

Posted by AI3's author, Mike Bergman Posted on September 27, 2010 at 1:19 am in Ontologies, Ontology Best Practices | Comments (2)
The URI link reference to this post is:
The URI to trackback this post is:
Posted:September 20, 2010

OWL 2 Has New Options; Useful to SKOS, Too

It is not unusual to want to treat things either as a class or an instance in an ontology, depending on context. Among other aspects, this is known as metamodeling and it can be accomplished in a number of ways. However, the newest version of the Web Ontology Language, OWL 2, provides a neat trick for doing this called “punning“. Why one would want to metamodel, how to specify it in an ontology, and why the OWL 2 approach is helpful are described in this post [1].

Why Metamodel?

Lightweight, domain ontologies have been the focus of this ontology series. Domain ontologies are the “world views” by which organizations, communities or enterprises describe the concepts in their domain, the relationships between those concepts, and the instances or individuals that are the actual things that populate that structure. Thus, domain ontologies are the basic bread-and-butter descriptive structures for real-world applications of ontologies.

These lightweight, domain ontologies often have a hierarchical structure for which SKOS (Simple Knowledge Organization System) is a recommended starting ontology [2] (see best practices recommendations). A subject concept reference ontology such as UMBEL (Upper Mapping and Binding Exchange Layer) [3], which we also recommend, also has a similar structure and a heavy reliance on SKOS in its vocabulary. Because of these structural similarities, ontologies that use SKOS or UMBEL are therefore good candidates for using metamodeling techniques.

To better understand why we should metamodel, let’s look at a couple of examples, both of which combine organizing categories of things and then describing or characterizing those things. This dual need is common to most domains [4].

For the first example, let’s take a categorization of apes as a kind of mammal, which is then a kind of animal. In these cases, ape is a class, which relates to other classes, and apes may also have members, be they particular kinds of apes or individual apes. Yet, at the same time, we want to assert some characteristics of apes, such as being hairy, two legs and two arms, no tails, capable of walking bipedally, with grasping hands, and with some endangered species. These characteristics apply to the notion of apes as an instance.

As another example we may have the category of trucks, which may further be split into truck types, brands of trucks, type of engine, and so forth. Yet, again, we may want to characterize that a truck is designed primarily for the transport of cargo (as opposed to automobiles for people transport), or that trucks may have different drivers license requirements or different license fees than autos. These descriptive properties refer to trucks as an instance.

These mixed cases combine both the organization of concepts in relation to one another and with respect to their set members, with the description and characterization of these concepts as things unto themselves. This is a natural and common way to express most any domain of interest.

The practice has been to express these mixed uses in RDFS or OWL Full, which makes them easy to write and create since most “anything goes” (a loose way of saying that the structures are not decidable) [5]. Use of sub-class relationships also enables tree-like hierarchies to be constructed and some minor inferencing (such as one concept is broader than another concept, one of the contributions of SKOS).

But such mixed uses do not allow more capable OWL reasoners to be applied, nor for the full power of query or search abstraction to be applied, nor for the ontology to be checked for consistency. These limits may be fine in many circumstances, but their lack does allow structures to evolve that may become incoherent or illogical. If data interoperability is a goal, as it is in our enterprise use cases, incoherent ontologies can not contribute or participate as structures to linking datasets. At most — and this is the case for much linked data practice — all that can be done is to make explicit pairwise connections between different dataset objects. This is not efficient and defeats the whole purpose of leveraging schema. OWL 2 has been designed to fix that (in addition to other benefits [12]).

The approach taken by OWL 2 to overcome some of these metamodeling limitations is through “punning” [6]. Recall that objects are named in RDF with URIs (IRIs in OWL 2). The trick with “punning” is to evaluate the object based on how it is used contextually [7]; the IRI is shared but its referent may be viewed as either a class or instance based on context. Thus, objects used both as concepts (classes) and individuals (instances) are allowed and standard OWL 2 reasoners may be used against them.

It should be noted, however, that this “punning” technique does not support the full range of possible metamodeling aspects [8]. Like any language, there is a trade-off in OWL 2 between expressivity and reasoning efficiency [9]. But, for lightweight, domain ontologies where the objective is interoperability across heterogeneous sources — that is, namely the main objectives of the semantic Web or semantic enterprise — this trade-off in OWL 2 now appears to be well balanced. Moreover, its automatic detection by tools such as Protégé 4 that use the OWL API also means it is comparatively easy to use and implement.

Relationship to Recommended Best Practices

An earlier chapter in this series presented some best practices for ontology building and maintenance. A fundamental aspect of those recommendations was the desirability of keeping instance data (ABox) separate from the conceptual structure (TBox) that provides the schema of relationships for those concepts [10]. Fortunately, this approach also integrates well with the metamodeling capabilities in OWL 2.

How metamodeling and the ABox-TBox split is accommodated is shown by this diagram, using trucks as an example:

Metamodeling in Domain Ontologies
Figure 1. Metamodeling in Domain Ontologies (click to expand)

The right-hand side of the diagram shows the two views possible via OWL 2 metamodeling in the TBox. In some cases, we may speak of trucks as a class of vehicle, to which individual members may belong; this is the class view. In other contexts, we may want to characterize or make assertions about trucks in our ontology, such as asserting cargo transport or engine type, in which case truck is now represented as an instance (individual) under the individual view. These two views in the TBox represent our structural and conceptual description (the “world view”) regarding this domain of which vehicles and trucks are a part.

Then, when we begin to populate our knowledge base with specific data, we do so via the ABox. In this example, as we add data about the specific brand of Ford trucks and their attributes, we link the Ford instance to the TBox via the Truck class. (Best practice also requires that we model this new attribute structure into the TBox as well, but that is a different topic. ;) .)

How Punning is Triggered in OWL 2

Punning is not triggered by annotation properties. Annotation properties applied to a class merely act as additional description or metadata about that class; the annotation property by definition does not participate in any inferencing or reasoning. You should also know that in OWL 2, certain predicates (properties) such as label, comment or description (among others) are reserved as annotation properties [11].

You can invoke the OWL 2 punning process directly or via context when your ontologies are processed with the OWL API. The basic rule to follow is:

Any entity declared as a class and with an asserted object or data property [15] is punned (metamodeled).

This test is done directly by the OWL API [7]. You can go ahead and test this out with an OWL 2-compliant editor, such as Protégé 4. Here is an example test (in N3 notation):

First, begin with some initial declarations:

foo:Car a owl:Class .

foo:Animal a owl:Class ;
owl:disjointWith foo:Car .

Then, let’s describe an object property:

foo:isEndangered a owl:ObjectProperty ;
rdf:domain foo:Animal ;
rdf:range bar:SomeSpecies .

And define and make an assertion about Apes:

foo:Ape a owl:Class ;
foo:isEndangered bar:SomeSpecies .

Now, the system begins by testing for punning and other checks, such as:

  1. isEndangered an annotation property? no
  2. what is its domain? foo:Animal
  3. this will detect and infer:
foo:Ape a owl:Class ;
foo:Ape a foo:Animal ;
foo:isEndangered bar:SomeSpecies .
  1. punning is triggered because non-annotation property has been applied to a class
  2. non-annotation properties are now assigned to named individual (which captures individual view part of the TBox above)
  3. then, can check for inconsistencies depending on the restriction(s) applied to the foo:Animal class.

In this case, no inconsistencies were found.

But, let’s now add another object (non-annotation) property:

foo:hasBrand a owl:ObjectProperty ;
rdf:domain foo:Car ;
rdf:range bar:SomeBrand .

And use it to expand our assertions about Apes:

foo:Ape a owl:Class ;
foo:isEndangered bar:SomeSpecies ;
foo:hasBrand bar:Ford .

And repeat #3:

foo:Ape a owl:Class .
foo:Ape a foo:Animal .
foo:Ape a foo:Car ;
foo:isEndangered bar:SomeSpecies ;
foo:hasBrand bar:Ford .

Now, inconsistencies are raised in the second #3:

So, the consistency check fails, because Ape can not be both an Animal and a Car.

While this is clearly a silly example, such checks are quite important as the number of objects and assertions grows in an ontology.

What Does Punning Look Like?

The punning technique works because the IRI for the object ends up being treated as both a concept (class) and an instance (individual). Thus, while the object shares the same IRI, depending on its context, it is evaluated by an OWL reasoner as a different thing (class or individual). The OWL API achieves this by actually writing out the object in both its class view and individual view. Here is an example (in RDF/XML serialization):

Input OWL:

<owl:Class rdf:about=“>

Output from Protégé with punning:


<owl:Class rdf:about=""/>


<owl:NamedIndividual rdf:about="">

Notice the duplicate definition (in RDF/XML) to the NamedIndividual. When writing out the ontology, all punned objects are duplicated in a similar manner.

The Beginning of the Transition

OWL 2 and its other general changes [12] have arrived in the nick of time. Not only were we seeing some of the weaknesses in OWL 1 that warranted updating, but we are also now being challenged with regard to how to make linked data and the many datasets in RDF effectively interoperate. Perhaps undecidability and throwing triples to the wind worked OK in the early days of our semantic Web Wild West. But now it is time for the new sheriff to bring order to the emerging chaos.

Of course only time will tell, but we believe the design decisions made by the OWL 2 working group were judicious and balanced ones to find that sweet spot between expressiveness and reasoning efficiency [9]. We also believe that, while useful in its less expressive form [2], that many new domain vocabularies based on SKOS would especially benefit from embracing the OWL 2 metamodeling techniques.

But two criticisms still remain. First, tooling support for OWL 2 and the OWL API is weak, as discussed in an earlier chapter. And, as the last chapter discussed, there are not enough practitioners that have yet taken up OWL 2, which means that best practice guidance and exemplars are still limited.

Lightweight domain ontologies can greatly benefit from these OWL 2 metamodeling techniques and the OWL RL alternative that also emerged as one of the OWL 2 profile enhancements [13]. Structured Dynamics thinks the growing scale and learning taking place around linked data and RDF datasets is now pointing the way to a necessary transition. And OWL 2 metamodeling should be one of the key components to making our semantic technologies more responsive and effective [14].

[1] This posting is part of a current series on ontology development and tools, jointly developed with Structured Dynamics with co-authorship by Frédérick Giasson, now permanently archived and updated on the OpenStructs TechWiki. The series began with An Executive Intro to Ontologies, then continued with an update of the prior Ontology Tools listing, which now contains 185 tools. It progressed to a survey of ontology development methodologies. That led to a presentation of a new, Lightweight, Domain Ontologies Development Methodology. That piece was then expanded to address A New Landscape in Ontology Development Tools, which was followed up by a listing of best practices in domain ontology building and maintenance. This portion completes the series.
[2] Alistair Miles and Sean Bechhofer, eds., 2009. SKOS Simple Knowledge Organization System Reference, W3C Recommendation, 18 August 2009. See Some common SKOS domain predicates include skos:definition, skos:prefLabel, skos:altLabel, skos:broaderTransitive, skos:narrowerTransitive.
According to the cited W3C recommendation:
. . . the “concepts” of a thesaurus or classification scheme are modeled [in the base SKOS form] as individuals in the SKOS data model, and the informal descriptions about and links between those “concepts” as given by the thesaurus or classification scheme are modeled as facts about those individuals, never as class or property axioms. Note that these are facts about the thesaurus or classification scheme itself, such as “concept X has preferred label ‘Y’ and is part of thesaurus Z”; these are not facts about the way the world is arranged within a particular subject domain, as might be expressed in a formal ontology.
Metamodeling and the use of OWL allows the base SKOS form to be expressed as a formal ontology, over which reasoning and inference may occur. Not all SKOS structures may be amenable to this (thesauri and lexical resources such as Wordnet perhaps fall into this category), but some other structures are logical and can be formalized. UMBEL, for example, fits into this category, as do many carefully crafted controlled vocabularies. When used as such, many of the SKOS predicates become OWL annotation properties.
[3] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.
[4] In the domain ontologies that are the focus here, we often want to treat our concepts as both classes and instances of a class. This is known as “metamodeling” or “metaclassing” and is enabled by “punning” in OWL 2. For example, here a case cited on the OWL 2 wiki entry on “punning“:
People sometimes want to have metaclasses. Imagine you want to model information about the animal kingdom. Hence, you introduce a class a:Eagle, and then you introduce instances of a:Eagle such as a:Harry.
(1) a:Eagle rdf:type owl:Class
(2) a:Harry rdf:type a:Eagle
Assume now that you want to say that “eagles are an endangered species”. You could do this by treating a:Eagle as an instance of a metaconcept a:Species, and then stating additionally that a:Eagle is an instance of a:EndangeredSpecies. Hence, you would like to say this:
(3) a:Eagle rdf:type a:Species
(4) a:Eagle rdf:type a:EndangeredSpecies.
This example comes from Boris Motik, 2005. “On the Properties of Metamodeling in OWL,” paper presented at ISWC 2005, Galway, Ireland, 2005. For some other examples, see Bernd Neumayr and Michael Schrefl, 2009. “Multi-Level Conceptual Modeling and OWL (Draft, 2 May – Including Full Example)”; see
[5] A good explanation of this can be found in Rinke J. Hoekstra, 2009. Ontology Representation: Design Patterns and Ontologies that Make Sense, thesis for Faculty of Law, University of Amsterdam, SIKS Dissertation Series No. 2009-15, 9/18/2009. 241 pp. See In that, Hoekstra states (pp. 49-50):
RDFS has a non-fixed meta modelling architecture; it can have an infinite number of class layers because rdfs:Resource is both an instance and a super class of rdfs:Class, which makes rdfs:Resource a member of its own subset (Nejdl et al., 2000). All classes (including rdfs:Class itself) are instances of rdfs:Class, and every class is the set of its instances. There is no restriction on defining sub classes of rdfs:Class itself, nor on defining sub classes of instances of instances of rdfs:Class and so on. This is problematic as it leaves the door open to class definitions that lead to Russell’s paradox (Pan and Horrocks, 2002). The Russell paradox follows from a comprehension principle built in early versions of set theory (Horrocks et al., 2003). This principle stated that a set can be constructed of the things that satisfy a formula with one free variable. In fact, it introduces the possibility of a set of all things that do not belong to itself . . . .
In RDFS, the reserved properties rdfs:subClassOf, rdf:type, rdfs:domain and rdfs:range are used to define both the other RDFS modelling primitives themselves and the models expressed using these primitives. In other words, there is no distinction between the meta-level and the domain.
[6] “Punning” was introduced in OWL 2 and enables the same IRI to be used as a name for both a class and an individual. However, the direct model-theoretic semantics of OWL 2 DL accommodates this by understanding the class Father and the individual Father as two different views on the same IRI, i.e., they are interpreted semantically as if they were distinct. The technique listed in the main body triggers this treatment in an OWL 2-compliant editor. See further Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see
[7] The OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL2 profiles of RL, QL, EL. Two recent papers describing the updated API are: Matthew Horridge and Sean Bechhofer, 2009. “The OWL API: A Java API for Working with OWL 2 Ontologies,” presented at OWLED 2009, 6th OWL Experienced and Directions Workshop, Chantilly, Virginia, October 2009. See; and, Matthew Horridge and Sean Bechhofer, 2010. “The OWL API: A Java API for OWL Ontologies,” paper submitted to the Semantic Web Journal; see Also see its code documentation at
The main text describes how via “punning” the OWL API supports two parallel views sharing the same IRI, which can enable a concept to operate as either a class or instance depending on context.
[8] Some other metamodeling aspects not supported by “punning” include full multi-level modeling (such as in UML or OMG‘s model-driven architecture) or linkage with closed-world reasoning.
[9] OWL has historically been described as trying to find the proper tradeoff between expressive power and efficient reasoning support. See, for example, Grigoris Antoniou and Frank van Harmelen, 2003. “Web Ontology Language: OWL,” in S. Staab and R. Studer, eds., Handbook on Ontologies in Information Systems, Springer-Verlag, pp. 76-92. See
[10] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) to which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure. These distinctions have their grounding in description logics.
[11] For a listing, see Even if your local ontology defines a sub-property of one of these items, such as foo:myLabel as a sub-property of rdfs:label, you are advised to still specifically declare it as an annotation property.
[12] See Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter Patel-Schneider and Ulrike Sattler, 2008. “OWL2: The Next Step for OWL,” see; and also see the OWL 2 Quick Reference Guide by the W3C, which provides a brief guide to the constructs of OWL 2, noting the changes from OWL 1.
[13] OWL RL is the “rules” profile of OWL 2 and is both decidable and offers additional axiomatic support for metamodeling. As this figure drawn from Hoekstra [Fig. 3-4 in 5] shows comparing OWL 2 to OWL 1, OWL RL provides a subset of decidable description logics:
OWL 1 v OWL 2
[14] Metamodeling might be a new concept to you and some of the aspects can certainly be academic. If the references above do not sufficient satisfy your curiosity, you may want to check out some of these other useful references: Birte Glimm, Sebastian Rudolph and Johanna Völker, 2009. “Integrated Metamodeling and Diagnosis in OWL 2,” see; and Nophadol Jekjantuk, Gerd Groener and Jeff. Z. Pan, 2009. “Reasoning in Metamodeling Enabled Ontologies,” in Rinke Hoekstra and Peter F. Patel-Schneider, eds., Proceedings of OWL: Experiences and Directions (OWLED 2009); see
[15] In OWL 2, an object property is a predicate that defines a binary relationship between two objects (in specific respect to a triple, between a subject and an object). A data property is a predicate that defines a binary relationship between an object an a literal (string or data value). In contrast to object and data properties, annotation properties and reserved OWL and RDF vocabularies are explicitly excluded from this rule. Only declared object or data properties trigger the punning.

Posted by AI3's author, Mike Bergman Posted on September 20, 2010 at 12:23 am in Ontologies, Ontology Best Practices | Comments (3)
The URI link reference to this post is:
The URI to trackback this post is: