Posted: December 15, 2019

'Dazzle' image by Shigeki Matsuyama, as found on https://www.vice.com/en_uk/article/wnp3mn/dazzle-camouflage-room-sized-optical-illusion

The Choice Between Class and Instance Depends on Your Point of View

Readers of this blog know that I use the open-source Protégé ontology editor to build and maintain our knowledge graphs. Besides the usefulness of the tool, there is also an informative user mailing list that discusses the Protégé application and the modeling choices that may arise when using it [1]. A recent thread, ‘How to Relate Different Classes,’ is but one example of the issues one might encounter on this list [2]. As one of the frequent commenters on the list, Michael DeBellis, noted about this thread [3], “I think this is a common issue with modeling, what to make a class and what to make an instance.”

Michael is indeed correct that the distinction between classes and instances is a frequent topic, one that I have touched upon in various ways through the years. The liveliness of this recent thread convinced me it would be helpful to pull together some guidance on when to choose a class or an instance in a knowledge graph. The topic is also critical to the questions of knowledge representation and interoperability, two key uses for knowledge graphs. So, let’s look at this question of class vs. instance from the aspects of the nature of knowledge, modeling, and practical considerations.

Epistemological Issues

Epistemology is simply the study of the nature of knowledge. It gets at such questions as: What is knowledge? What is belief? What is justification for action? How can we acquire and validate knowledge? Is knowledge infallible? Are there different kinds of knowledge?

Charles Sanders Peirce and his theory of signs were intimately related to these questions, as well as to how we express and convey knowledge to others. Since, as humans, we communicate through our language as symbols, what we mean and intend to convey when expressing these symbols is also of utmost importance to how we understand and refine knowledge as a community process. My recent book devotes a number of chapters largely, if not exclusively, to these topics [4,5]. Many of the points in this section are drawn from those chapters.

We can illustrate some of the tricky epistemology issues associated with the nature of language using the example of the ‘toucan’ bird often used in discussions of semantic technologies. When we see something, or point to something, or describe something in words, or think of something, we are, of course, using proxies in some manner for the actual thing. If the something is a ‘toucan’ bird, that bird does not reside in our head when we think of it. The ‘it’ of the toucan is a ‘re-presentation’ of the real, dynamic toucan. The representation of something is never the actual something but is itself another thing — that is, a sign — that conveys to us the idea of the real something. In our daily thinking we rarely make this distinction. (For which we should be thankful; otherwise, our flow of thoughts would be wholly jangled.) Nonetheless, the difference is real, and we should be conscious of it when we are trying to be precise in representing knowledge.

How we ‘re-present’ something is also not uniform or consistent. For the toucan bird, perhaps we make caw-caw bird noises or flap our arms to indicate we are referring to a bird. Perhaps we point at the bird. Alternatively, perhaps we show a picture of a toucan or read or say aloud the word “toucan” or see the word embedded in a sentence or paragraph, as in this one, that also provides additional context. How quickly or accurately we grasp the idea of ‘toucan’ is partly a function of how closely associated one of these accompanying signs may be to the idea of toucan bird. Probably all of us would agree that arm flapping is not nearly as useful as a movie of a toucan in flight or seeing one scolding from a tree branch to convey the ‘toucan’ concept.

The question of what we know and how we know it fascinated Peirce over the course of his intellectual life. He probed this relationship between the real or actual thing, the object, and how that thing is represented and understood. (Also understand that Peirce’s concept of the object may range from individual or particular things to classifications or generalities.) This triadic relationship between immediate object, representation, and interpretation forms a sign and is the basis for the process of sign-making and understanding, what Peirce called semiosis [6].

Even the idea of the object, in this case, the toucan bird, is not necessarily so simple. The real thing itself, an actual toucan bird, has characters and attributes. How do we ‘know’ this real thing? Bees, like many insects, may perceive different coloration for the toucan because they can see in the ultraviolet spectrum, while we do not. On the other hand, most mammals in the rainforest would also not perceive the reds and oranges of the toucan’s feathers, which we readily see. The ‘toucan’ object is thus perceived differently by bees, humans, and other animals. Beyond physical attributes, this actual toucan may be healthy, happy, or sad, nuances beyond our perception that only some fellow toucans may perceive. Though humans, through our ingenuity, may create devices or technologies that expand our standard sensory capabilities to make up for some of these perceptual gaps, our technology will never make our knowledge fully complete. Given limits to perceptions and the information we have on hand, we can never completely capture the nature of the dynamic object, the real toucan bird.

Things get murkier still when we try to convey to others what we mean by the ‘toucan’ bird. For example, when we inspect what might be a description of a toucan on Wikipedia, we see that the term more broadly represents the family of Ramphastidae, which contains five genera and forty different species. The picture we use to refer to ‘toucan’ may be, say, that of the keel-billed toucan (Ramphastos sulfuratus). However, if we view the images of a list of toucan species, we see just how physically divergent various toucans are from one another. Across all species, average sizes vary by more than a factor of three, with great variation in bill sizes, coloration, and range. Further, if I assert that the picture of the toucan is that of my pet keel-billed toucan, Pretty Bird, then we can also understand that this representation is of a specific individual bird, and not the keel-billed toucan species as a whole. The point is not a lesson on toucans, but an affirmation that distinctions between what we think we may be describing occur over multiple levels. The meaning of what we call a ‘toucan’ bird is not embodied in its label or even its name, but in the accompanying referential information that places the referent into context.

If, in our knowledge graph, we intend to convey all of these broader considerations, then we are best served by defining ‘toucan’ as a class. On the other hand, if we are discussing the individual Pretty Bird toucan, or are describing ‘toucan’ and its average attributes in relation to a wider context of many other types of birds, including eagles and wrens, then perhaps treating ‘toucan’ as an instance is the better approach. Context and what we intend to convey are essential components of how we need to represent our knowledge. Whether something is an ‘instance’ or a ‘class’ is but the first of the distinctions we need to convey, and those may often vary by context.
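To make the two views concrete, here is a minimal sketch in Python using the rdflib library (the ex: namespace and IRIs are hypothetical, for illustration only). The class view places ‘toucan’ in a subsumption hierarchy; the instance view asserts facts about one particular bird:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
g.bind("ex", EX)

# Class view: 'toucan' as the broader grouping, with a species as a subclass
g.add((EX.Toucan, RDF.type, OWL.Class))
g.add((EX.KeelBilledToucan, RDF.type, OWL.Class))
g.add((EX.KeelBilledToucan, RDFS.subClassOf, EX.Toucan))

# Instance view: Pretty Bird is one particular keel-billed toucan
g.add((EX.PrettyBird, RDF.type, EX.KeelBilledToucan))
g.add((EX.PrettyBird, RDFS.label, Literal("Pretty Bird")))

print(g.serialize(format="turtle"))
```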

Modeling Issues

Because these principles are universal, let’s shift our example to ‘truck’ [7]. In the English language, one of the ways we distinguish between an instance and a class is guided by the singular and plural (though English is notorious for its many different plural forms and exceptions). The attributes we assign to a term differ depending on whether we are discussing ‘trucks’, which we think about more in terms of transport purpose, brands, model, and model year; or a ‘truck’, which has a particular driver, engine, transmission, and mileage. Here is one way to look at such ‘truck’ distinctions (for this discussion, we’ll skip the ABox and TBox, another important modeling topic grounded in description logics [8]):

Different Views of 'Truck'

To accommodate the twin views of class and individual, we could double the number of entities in our knowledge graphs by separately modeling single instances and plural classes, but that rapidly balloons the size of our graphs. A more efficient approach is one that lets us combine the organization of concepts — their relations and set members — with the description and characterization of those concepts as things unto themselves. As our examples of ‘toucans’ and ‘trucks’ show, this dual treatment is a natural and common way to refer to things in most any domain of interest. Further, class and sub-class relationships enable us to construct tree-like hierarchies over which we can infer or inherit attributes and characteristics between parents and children.

For modeling purposes, we also want our graphs to be decidable, which importantly means we can reason over our knowledge graphs with an expectation that we can get definitive answers (even if the answer is “don’t know”) in a reasonable computation time. It is for these reasons that we have chosen the standard OWL 2 as the representation language for our knowledge graphs (in addition to other benefits [9]). A proper OWL 2 knowledge graph is decidable, and it handles both class and instance views using the metamodeling technique of “punning” [10]. Objects in OWL 2 are named with IRIs (Internationalized Resource Identifiers). The trick with “punning” is to evaluate the object based on how it is used contextually; the IRI is shared, but its referent may be viewed as either a class or an instance depending on context. Any entity declared as a class that also carries an asserted object or data property is punned. Thus, objects may be used both as concepts (classes) and individuals (instances), and standard OWL 2 reasoners may still be applied to them.
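As a rough sketch of punning in practice (again rdflib, with hypothetical IRIs; not a definitive implementation), the single IRI ex:Truck below is declared as a class and also used as an individual carrying its own data property:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
g.bind("ex", EX)

# Class view: ex:Truck is a class with individual trucks as its members
g.add((EX.Truck, RDF.type, OWL.Class))

# Instance view: the same IRI is punned as an individual (here, a member of
# a hypothetical ex:VehicleModel class) with an asserted data property
g.add((EX.Truck, RDF.type, OWL.NamedIndividual))
g.add((EX.Truck, RDF.type, EX.VehicleModel))
g.add((EX.Truck, EX.averagePayloadTons, Literal(12.5)))
```

An OWL 2 DL reasoner interprets the class Truck and the individual Truck as two distinct views that happen to share one name, which is what preserves decidability (see [10]).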

Other Practical Issues

We’ve already discussed context, inference, and decidability, but Igor Toujilov highlighted another important benefit in the mail thread of using class over instance declarations in a knowledge graph. The example he provided was based on drug development [11]:

However from my point of view (software engineering), many modern drugs are developed as a specialisation of existing drugs, i.e. by bringing new features to existing drugs. So, some new drug can be considered as a subclass of an existing drug. This is similar to object-orientated design in software: to bring new features, establish a subclass and implement it.

For example, methylphenidate can be considered as a superclass of Ritalin. If an earlier version of your ontology represents methylphenidate as an individual, then it would be difficult to represent Ritalin in later versions without breaking backward compatibility with existing interoperable applications.

This example shows that the preferable approach in ontology development is: use classes instead of individuals, if there is any chance you would need subclasses in the future.

Since knowledge is dynamic and constantly growing, it seems prudent to allow for expansion of the things in your knowledge graph. Classes are the better choice in this instance (pun intended).
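A minimal sketch of Toujilov’s drug example (rdflib again, with hypothetical IRIs) shows how the class-based choice leaves room for later specialization without disturbing earlier assertions:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
g.bind("ex", EX)

# Version 1 of the ontology: the drug is modeled as a class, not an individual
g.add((EX.Methylphenidate, RDF.type, OWL.Class))

# A later version can then add a branded formulation as a subclass,
# preserving backward compatibility with applications built on version 1
g.add((EX.Ritalin, RDF.type, OWL.Class))
g.add((EX.Ritalin, RDFS.subClassOf, EX.Methylphenidate))
```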

Like any language, there is a trade-off in OWL 2 between expressivity and reasoning efficiency [12]. Some prefer less-constrained RDF and RDFS constructs for their knowledge graphs. This approach allows virtually any statement to be asserted and is a least-common denominator for dealing with data encountered in the wild. However, one loses the punning and decidability advantages of OWL 2, and has a less-powerful framework for staging training sets and corpora for machine learning, another key motivation for our own knowledge graphs.

One could also choose a more powerful modeling language such as Datalog or Common Logic to gain the advantages of OWL 2, plus more. We have nothing critical to say about making such a choice. For our use cases, though, we do like the broader use and tools afforded by the use of OWL 2 and other W3C standards. Finding your own ‘sweet spot’ means understanding some of these knowledge representation trade-offs in context with your anticipated applications.


[2] Protégé user email list, ‘How to Relate Different Classes’, https://mailman.stanford.edu/pipermail/protege-user/2019-November/010890.html, Nov 9, 2019.
[4] Bergman, M. K. Information, Knowledge, Representation. in A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (ed. Bergman, M. K.) 15–42 (Springer International Publishing, 2018). doi:10.1007/978-3-319-98092-8_2.
[5] Bergman, M. K. A KR Terminology. in A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (ed. Bergman, M. K.) 129–149 (Springer International Publishing, 2018). doi:10.1007/978-3-319-98092-8_7.
[6] Peirce actually spelled it “semeiosis”; other philosophers, such as Ferdinand de Saussure, employed the shorter term “semiosis.” I use the more common term due to its greater familiarity.
[7] Bergman, M. K. Metamodeling in Domain Ontologies. AI3:::Adaptive Information https://www.mkbergman.com/913/metamodeling-in-domain-ontologies/ (2010).
[8] See, for example, my four-part series on description logics, beginning with Bergman, M. K. Making Linked Data Reasonable using Description Logics, Part 1, AI3:::Adaptive Information https://www.mkbergman.com/474/making-linked-data-reasonable-using-description-logics-part-1/ (2009).
[9] See Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter Patel-Schneider and Ulrike Sattler, 2008. “OWL2: The Next Step for OWL,” see http://www.comlab.ox.ac.uk/people/ian.horrocks/Publications/download/2008/CHMP+08.pdf; and also see the OWL 2 Quick Reference Guide by the W3C, which provides a brief guide to the constructs of OWL 2, noting the changes from OWL 1.
[10] “Punning” was introduced in OWL 2 and enables the same IRI to be used as a name for both a class and an individual. However, the direct model-theoretic semantics of OWL 2 DL accommodates this by understanding the class Truck and the individual Truck as two different views on the same IRI, i.e., they are interpreted semantically as if they were distinct. See further Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see http://www.w3.org/TR/owl2-primer/.
[12] OWL has historically been described as trying to find the proper tradeoff between expressive power and efficient reasoning support. See, for example, Grigoris Antoniou and Frank van Harmelen, 2003. “Web Ontology Language: OWL,” in S. Staab and R. Studer, eds., Handbook on Ontologies in Information Systems, Springer-Verlag, pp. 76-92. See http://www.few.vu.nl/~frankh/postscript/OntoHandbook03OWL.pdf.

Posted: February 21, 2018

The Compleat Knowledge Graph: Nine Features Wanted for Ontologies

I think the market has spoken in preferring the term ‘knowledge graph’ over ‘ontology.’ I suppose we could argue nuances in what the two terms mean. We will continue to use both, more-or-less interchangeably. But, personally, I do find the concept of a ‘knowledge graph’ easier to convey to clients.

As we see knowledge graphs proliferate in many settings — from virtual agents (Siri, Alexa, Cortana and Google Assistant, among others) to search and AI platforms (Watson) — I’d like to take stock of the state-of-the-art and make some recommendations for what I would like to see in the next generation of knowledge graphs. We are just at the beginning of tapping the potential of knowledge graphs, as my recommendations show.

Going back twenty years to Nicola Guarino in 1998 [1], and ten years to Michael Uschold in 2008 [2], there has been a sense that ontologies could be relied upon for even more central aspects of overall applications. Both Guarino and Uschold termed this potential ‘ontology-driven information systems.’ It is informative of the role that ontologies may play to list some of these incipient potentials, some of which have only been contemplated or met in one or two actual installations. Let me list nine main areas of (largely) untapped potential:

  1. Context and meaning — by this, I mean the ability to model contexts and situations, which requires specific concepts for them and an ability to express gradations of adjacency (spatial and otherwise). Determining or setting contexts is essential to disambiguating meaning. Context and situations have been particularly difficult ideas for ontologies to model, especially those that have a binary or dichotomous design;
  2. A relations component — true, OWL offers the distinction of annotation, object and datatype properties, and we can express property characteristics such as transitivity, domain, range, cardinality, inversion, reflexivity, disjunction and the like, but it is a rare ontology that uses many of these constructs. The subProperty expression is used, but only in limited instances and rarely in a systematic schema. For example, it is readily obvious that a broader predicate such as animalAction could be split into involuntaryAction and voluntaryAction, and then into specific actions such as breathing or walking, and so on, but schemas with these kinds of logical property subsumptions are not evident. Structurally, we can use OWL to reason over actions and relations in much the same way as we reason over entities and types, but our common ontologies have yet to do so. Creating such schemas is within grasp, since we have language structures such as VerbNet and other resources we could put to the task (a minimal sketch of such a property schema follows this list);
  3. An attributes component — the lack of a schema and organized presentation of attributes means it is a challenge to do ABox-level integration and interoperability. As with a relations component, this gap is largely due to the primary focus on concepts and entities in the early stages of semantic technologies. Optimally, what we would like to see is a well-organized attributes schema that enables instance data characteristics from different sources to be mapped to a canonical attributes schema. Once in place, not only would mapping be aided, but we should also be able to reason over attributes and use them as intensional cues for classifying instances. At one time Google touted its Biperpedia initiative [3] to organize attributes, but that effort went totally silent a couple of years ago;
  4. A quantity units ontology — the next step beyond attributes, as we attempt to bring data values for quantities (as well as the units and labeling used) into alignment. Fortunately, of late, the QUDT ontologies (quantities, units and data types) have become an active project again with many external supporters. Something like this needs to accompany the other recommendations listed;
  5. A statistics and probabilities ontology — the world is not black-and-white, but vibrantly colored with all kinds of shades. We need to be able to handle gradations as well as binary choices. Being able to add probabilistic reasoners is appropriate given the idea of continua (Thirdness) from Charles Sanders Peirce and capturing the idea of fallibility. Probabilistic reasoning is still a young field in ontology. Some early possibilities include Costa [4] and the PR-OWL ontology using Multi-Entity Bayesian Networks (MEBN) [5], which are a probabilistic first-order logic that goes beyond Peirce’s classic deterministic logic, as well as fuzzy logic applied to ontologies [6];
  6. Abductive reasoning and hypothesis generation — Peirce explicated a third kind of logical reasoning, abduction, that combines hypothesis generation with an evaluation of likelihood of success and effort required. This logic method has yet to be implemented in any standard Web ontologies to my knowledge. The method could be very useful to pose desired outcome cases and then to work through what may be required to get there. Adding this to existing knowledge graphs would likely require developing a bespoke abductive reasoner;
  7. Rich feature set for KBAI — we want a rich feature set useful for providing labeled instances to supervised machine learners. I addressed this need earlier with a rather comprehensive listing of possible features for knowledge graphs useful to learners [7]. We now need to start evaluating this features pool to provide pragmatic guidance for which features and learners match best for various knowledge-based artificial intelligence (KBAI) tasks;
  8. Consistent, clean, correct and coherent — we want knowledge graphs that are as free from error as possible to make sure we are not feeding garbage to our machine learners and as a coherent basis for evaluating new additions and mappings; and
  9. ODapps — ‘ontology-driven applications’ go beyond the mere templating or completion of user interface components to devise generic software packages driven by ontology specifications for specific applications. We have developed and deployed ODapps to import or export datasets; create, update, delete (CRUD) or otherwise manage data records; search records with full-text and faceted search; manage access control at the interacting levels of users, datasets, tools, and CRUD rights; browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria; or process results sets through workflows of various natures, involving specialized analysis, information extraction or other functions. ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality is quite similar. The major change in ODapps is to use a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. We may embed these ODapps in a layout canvas for a Web page, where, as the user interacts with the system, the service generates new queries (most often SPARQL) to the various Web services endpoints, which produce new structured results sets, which can drive new displays and visualizations. As new user interactions occur, the iteration cycle is generated anew, again starting a new cycle of queries and results sets.
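Here is the minimal property-schema sketch promised under item #2 (Python rdflib; the property IRIs are hypothetical, patterned on the animalAction example):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
g.bind("ex", EX)

# Declare the predicates, then arrange them in a subsumption hierarchy,
# parallel to the more familiar class hierarchy
for prop in (EX.animalAction, EX.involuntaryAction, EX.voluntaryAction,
             EX.breathing, EX.walking):
    g.add((prop, RDF.type, OWL.ObjectProperty))

g.add((EX.involuntaryAction, RDFS.subPropertyOf, EX.animalAction))
g.add((EX.voluntaryAction, RDFS.subPropertyOf, EX.animalAction))
g.add((EX.breathing, RDFS.subPropertyOf, EX.involuntaryAction))
g.add((EX.walking, RDFS.subPropertyOf, EX.voluntaryAction))

# Under RDFS entailment, any ex:breathing assertion now also counts as an
# ex:involuntaryAction and an ex:animalAction assertion
```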

Fortunately, we are actively addressing several of these recommendations (#1 – #3, #6 – #9) with our KBpedia initiative. We are also planning to add mapping to QUDT (#4) in a near-future release. We are presently evaluating probabilistic reasoners and hypothesis generators (#5 and #6).

Realizing these potentials will enable our knowledge management (KM) efforts to shift to the description, nature, and relationships of the information environment. In other words, ontologies themselves need to become the focus of development. KM no longer needs to be abstracted to the IT department or third-party software. The actual concepts, terminology and relations that comprise coherent ontologies now become the explicit focus of KM activities, and subject to the direct control and refinement by their users, the knowledge workers, and subject matter experts.

We are still some months from satisfying our desiderata for knowledge graphs. Fortunately, we have already made good progress, and we are close at hand to check off all of the boxes. Stay tuned!


[1] N. Guarino, “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, 1998, pp. 3–15.
[2] M. Uschold, “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, 2008, pp. 3–20.
[3] R. Gupta, A. Halevy, X. Wang, S.E. Whang, and F. Wu. “Biperpedia: An Ontology for Search Applications,” Proceedings of the VLDB Endowment 7, no. 7, 2014, pp. 505-516.
[4] P. C. Costa, “Bayesian Semantics for the Semantic Web,” Ph.D., George Mason University, 2005.
[5] K. B. Laskey, “MEBN: A Language for First-Order Bayesian Knowledge Bases,” Artificial Intelligence, vol. 172, no. 2–3, pp. 140–178, Feb. 2008.
[6] F. Bobillo and U. Straccia, “Fuzzy Ontology Representation Using OWL 2,” International Journal of Approximate Reasoning, vol. 52, no. 7, pp. 1073–1094, Oct. 2011.
[7] M.K. Bergman, “A (Partial) Taxonomy of Machine Learning Features,” AI3:::Adaptive Information blog, November 23, 2015.


Posted: March 7, 2011

Image from Wikimedia Commons

The Time and Technology is Here to Stand Software Engineering on its Head

As an information society we have become a software society. Software is everywhere, from our phones and our desktops, to our cars, homes and every location in between. The amount of software used worldwide is unknowable; we do not even have agreed measures to quantify its extent or value [1]. We suspect there are at least 1 trillion lines of code that have accumulated over time [1,2]. On the order of $875 billion was spent worldwide on software in 2010, of which about half was for packaged software and licenses and the rest for programmer services, consulting and outsourcing [3]. In the U.S. alone, about 2 million people work as programmers or in related occupations [4].

It goes without saying that software is a very big deal.

No matter what the metrics, it is expensive to develop and maintain software. This is also true for open source, which has its own costs of ownership [5]. Designing software faster, with fewer mistakes, more re-use, and greater robustness has clearly been an emphasis in computer science and the discipline of programming since its inception.

This attention has caused a myriad of schools and practices to develop over time. Some of the earlier efforts included computer-aided software engineering (CASE) and Grady Booch’s object-oriented design (OOD) (Booch is already cited in [1]). Fourth-generation languages (4GLs) and rapid application development (RAD) were popular in the 1980s and 1990s. Most recently, agile software development and extreme programming have grabbed mindshare.

Altogether, there are dozens of software development philosophies, each with its passionate advocates. These express themselves through a variety of software development methodologies that might be characterized or clustered into the prototyping or waterfall or spiral camps.

In all instances, of course, the drivers and motivations are the same: faster development, more re-use, greater robustness, easier maintainability, and lower development costs and total costs of ownership.

The Ontology Perspective in this Mix

For at least the past decade, ontologies and semantic Web-related approaches have also been part of this mix. A good summary of these efforts comes from Michael Uschold in an invited address at FOIS 2008 [6]. In this review, he points to these advantages for ontology-based approaches to software engineering:

  • Re-use — abstract/general notions can be used to instantiate more concrete/specific notions, allowing more reuse
  • Reduced development times — producing software artifacts that are closer to how we think, combined with reuse and automation that enables applications to be developed more quickly
  • Increased reliability — formal constructs with automation reduces human error
  • Decreased maintenance costs — increased reliability and the use of automation to convert models to executable code reduces errors. A formal link between the models and the code makes software easier to comprehend and thus maintain.

These first four items are similar to the benefits argued for other software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:

  • Reduced conceptual gap — application developers can interact with the tools in a way that is closer to their thinking
  • Facilitate automation — formal structures are amenable to automated reasoning, reducing the load on the human, and
  • Agility/flexibility — ontology-driven information systems are more flexible, because you can much more easily and reliably make changes in the model than in code.

In making these arguments, Uschold picks up on the “ontology-driven information systems” moniker first put forward by Nicola Guarino in 1998 [7]. The ideas around ODIS have had substantial impact on the semantic Web community, especially in the use of formal ontologies and modeling approaches. The FOIS series of conferences, and most recently the ODiSE series, have been spawned from these ideas. There is also, for example, a fairly rich and developed community working on the integration of UML via ontologies as the drivers or specifiers of software [8].

Yet, as Uschold is careful to point out, the idea of ODIS extends beyond software engineering to encompass all of information systems. My own categorization of how ontologies may contribute to information systems is:

  1. Domain modeling — this category includes the domain knowledge representations and reasoning and inference bases that are the traditional understanding of ontologies in the semantic space. The structural aspects are akin to a database schema definition; the unique aspects of ontologies reside in their logic foundations and graph structures, which offer more power in inferencing, reasoning and graph analysis than conventional approaches
  2. Model-driven architectures (MDA) — like UML, these are platform-independent specifications that provide the functional and dataflow definitions of “models” executed by the system. These are the natural progeny of earlier CASE approaches, for example. Such systems also potentially allow graphical or visual means for building or hooking together components as a substitute to direct coding
  3. Program specifications and executables — though fairly experimental at present, these approaches use the languages of RDF, OWL or direct use of logic languages to create the equivalent of executable software programs. A couple of experimental systems, Fhat and Neno for example, point to possible future directions in this area [9]
  4. Runtime or utility components — proper construction of ontologies can be a source for labels and prompts within user interfaces and other runtime uses. Because of the ontology basis, these contributions may also be contextual [10]
  5. Automated agents — based on context, user choices and the governing ontologies, new instruction sets can be generated via what some term automated agents or “robots” to instruct subsequent steps in the software, including potentially analysis or validation. Mission Critical IT [11] is apparently the most advanced in this area; we discuss their ODASE approach more below
  6. Bespoke drivers of generic applications — through using and combining a number of the aspects above, in its totality this approach is a very different paradigm, as we describe below.

When we look at this list from the standpoint of conventional software or software engineering, we see that #1 overlaps with conventional database roles, and #2, #3 and #4 with conventional programmer or software engineering responsibilities. The other portions, however, are quite unique to ontology-based approaches.

But Is Software Engineering Even the Right Focus?

For decades, issues related to how to develop apps better and faster have been proposed and argued about. We still have the same litany of challenges and issues from expense to re-use and brittleness. And, unfortunately, despite many methodologies du jour, we still see bottlenecks in the enterprise relating to such matters as:

  • data access
  • queries
  • data transformations
  • data integration or federation
  • reports
  • other data presentations
  • business analysis, and
  • targeted, specialty functionality.

Promises such as self-service reporting, touted at the inception of data warehousing two decades ago, are still to be realized [12]. Enterprises still require the overhead and layers of IT to write SQL and to prepare and fix reports. If we stand back a bit, perhaps we can come to see that the real opportunity resides in turning the whole paradigm of software engineering upside down.

Our objective should not be software per se. Software is merely an intermediary artifact to accomplish some given task. Rather than engineering software, the focus should be on how to fulfill those tasks in an optimal manner. How can we keep the idea of producing software from becoming this generation’s buggy whip example [13]?

For reasons we delve into a bit more below, it perhaps has required a confluence of some new semantic technologies and ontologies to create the opening for a shift in perspective. That shift is one from software as an objective in itself to one of software as merely a generic intermediary in an information task pipeline.

Though this shift may not apply (at least with current technologies) to transactional and process-based software, I submit it may be fundamental to the broad category of knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. These are the real areas where integration and reports and queries and analysis remain frustrating bottlenecks for knowledge workers. And, interestingly, these are also the same areas most amenable to embracing an open world (OWA) mindset [14].

If we stand back and take a systems perspective to the question of fulfilling functional KM tasks, we see that the questions are both broader and narrower than software engineering alone. They are broader because this systems perspective embraces architecture, data, structures and generic designs. The questions are narrower because software — within this broader context — can now be generalized as artifacts providing the fulfillment of classes of functions.

ODapps: The Ontology-Driven Application Approach

Ontology-driven applications — or ODapps for short — based on adaptive ontologies are a topic we have been nibbling around and discussing for some time. In our oft-cited seven pillars of the semantic enterprise, we devote two pillars specifically (#4 and #3, respectively) to these two components [15]. However, in keeping with the systems perspective relevant to a transition from software engineering to generic apps, we should also note that canonical data models (via RDF) and a Web-oriented architecture are two additional pillars in the vision.

ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies as noted under #1 above), as supplemented by the UI and instruction sets and validations and rules (as noted under #4 and #5 above). The combination of these specifications as provided by both properly constructed domain ontologies and supplementary utility ontologies is what we collectively term adaptive ontologies [16].

ODapps fulfill specific generic tasks, consistent with their bespoke design (#6 above) to respond to adaptive ontologies. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.

In fact, the widget idea from Web 2.0 is a key precursor to the ODapps design. What we see in Web 2.0 are dedicated single-purpose widgets that perform a display operation (such as Google Maps) based on the properly structured data fed to them (structured geolocational information in the case of GMaps).

In the early work with RDF-based applications by Structured Dynamics‘ predecessor company, Zitgist, we demonstrated how the basic Web 2.0 widget idea could be extended by “triggering” which kind of mashup widget got invoked by virtue of the data type(s) fed to it. The Query Builder presented contextual choices in the UI for how to build a SPARQL query, based on what prior dropdown list choices were made. The DataViewer displayed results with different widgets (maps, profiles, etc.) depending on which part of a query’s results set was inspected (by responding to differences in data types). These two apps, in our opinion, remain some of the best developed in the semantic Web space, even though development on both ceased nearly four years ago.

This basic extension of data-driven applications — as informed by a bit more structure — naturally evolved into a full ontology-driven design. We discovered that — with some minor best practice additions to conventional ontologies — we could turn ontologies into powerhouses that informed applications through:

  • An understanding of the kind of things under consideration, including their inference chains
  • The types of data in results sets, and how that informs the nature of the widget(s) (maps, calendars, timelines, charts, tabular reports, images, stories, media, etc.) appropriate to display and manipulate that information, and
  • UI and utility functions such as interface labels, mouseovers, auto-suggests, spelling suggestions, synonym matches, etc.
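As a rough sketch of the third bullet (rdflib, with hypothetical instance data), interface labels and synonym matches can be read directly from the ontology rather than hard-coded in the application:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS, SKOS

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
g.add((EX.Truck, RDFS.label, Literal("Truck", lang="en")))
g.add((EX.Truck, SKOS.altLabel, Literal("Lorry", lang="en")))

def ui_strings(graph, concept):
    """Collect labels and synonyms to feed interface prompts and auto-suggest."""
    labels = [str(o) for o in graph.objects(concept, RDFS.label)]
    synonyms = [str(o) for o in graph.objects(concept, SKOS.altLabel)]
    return labels + synonyms

print(ui_strings(g, EX.Truck))  # -> ['Truck', 'Lorry']
```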

Like the earlier Zitgist discoveries, basing the applications on only one or two canonical data models and serializations (RDF and a simple data exchange XML, which Fred Giasson calls structXML) provides the input uniformity to make a library of generic applications tractable. And, embedding the entire framework in a Web-oriented architecture means it can be distributed and deployed anywhere accessible by HTTP.

Booch has maintained for years that in software design abstraction is good, but not if too abstract [1]. ODapps are a balanced abstraction within the framework of canonical architectures, data models and data structures. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures. The KM emphasis can shift from programming and software to logic and terminology [16].

In the sub-sections below, we peel back some portions of this layered design to unveil how some of these major pieces interact.

Built Upon an Ontology- and Web-based Architecture

Again, to cite Booch, the most fundamental software design decision is architecture [1]. In the case of Structured Dynamics and its support for ODapps, its open semantic framework (OSF) is embedded in a Web-oriented architecture (WOA). The OSF itself is a layered design that starts from a kernel of existing assets (data and structures) and proceeds through conversion to Web service access, and then to ontology organization and management via ODapps [17]. The major layers in the OSF stack are:

  • Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
  • scones / irON — the conversion layer, in part consisting of information extraction of subject concepts or named entities (scones) or the instance record Object Notation for conveying XML, JSON or spreadsheets (CSV) in RDF-ready form (via irON or RDFizers)
  • structWSF — a platform-independent suite of more than 20 RESTful Web services, organized for managing structured datasets; it provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
  • Ontologies — the layer containing the structured assets “driving” the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave
  • conStruct — connecting modules that enable structWSF and sComponents to be hosted/embedded in Drupal, and
  • sComponents — (mostly) Flex semantic components (widgets) for visualizing and manipulating structured data.

Not all of these layers, or even their specifics, are necessary for an ontology-driven app design [18]. However, the general foundations of generic apps, properly constructed adaptive ontologies, and canonical data models and structures should be preserved in order to operationalize ODapps in other settings.

OSF is the Basis for Domain-specific Instantiations

The power of this design is that by swapping out adaptive ontologies and relevant data, the entire OSF stack as is can be used to deploy multiple instantiations. Potential uses can be as varied as the domain coverage of the domain ontologies that drive this framework.

The OSF semantic framework is a completely open and generic one. The same set of tools and capabilities can be applied to any domain that needs to manage and understand its own information. With the existing ODapps in hand, this spans everything from unstructured text or documents to conventional structured databases.

What changes from domain to domain are the data structures (the ontologies, schema and entity references) and their instance data (which can also be converted from existing to canonical forms). This generic framework can be leveraged for many different deployments; Citizen Dan, for example, is a local government instantiation of the OSF framework with relatively complete online demos.


Structured Dynamics continues to add wrinkles to this basic design for different clients and different industries. As we round out the starting set of ODapps (see below), the major effort in adapting this generic design to different uses is to tailor the ontologies and “RDFize” existing data assets.

Lower Layers

Conversion of existing assets to RDF and canonical forms is not discussed further here. See the irON and scones documentation or the TechWiki for more information on these topics.

The structWSF Web Services Layer

The first suite of ODapps occurs at the structWSF Web services layer. structWSF provides a set of generic functions and endpoints to:

  • Import or export datasets
  • Create, update, delete (CRUD) or otherwise manage data records
  • Search records with full-text and faceted search
  • Browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria, or
  • Process results sets through workflows of various natures, involving specialized analysis, information extraction or other functions.

Here is a listing of current ODapp functions within structWSF (with links to details for each):

  • WSF management Web services
  • User-oriented Web services

At this level the information access and processing is done largely on the basis of structured results sets. Other visualization and display ODapps are listed in the next subsection.

The Semantics Components Layer

The visualization and data display and manipulation ODapps are provided via the semantic components layer. Structured Dynamics’ sComponents are Flex-based widgets that conform to a standard, generic design. Other developers using the OSF framework are developing JavaScript versions [19]. Here is the current library (with links to details for each):

  • New Components
  • Components Extending Flex

These components can be used in combination with any of the structWSF ODapps, meaning the filtering, searching, browsing, import/export, etc., may be combined as an input or output option with the above.

The basic interaction flow with these components works as follows:


Using the ODapp structure, it is possible to “drive” queries and results-set selections either via direct HTTP requests to endpoints or via simple dropdown selections on HTML forms or Flex widgets. This design enables the entire system to be driven via simple selections or interactions without the need for any programming or technical expertise.

These various sComponents get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is a structured results set, which includes various types and attributes.

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the sComponents. When combined with the results-set data, attribute information in the irON ontology, and the domain understanding in the domain ontology, a synthetic schema is constructed that instructs what the interface may do next.


These instructions are then presented to the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas.

As new user interactions occur with the resulting displays and components, the iteration cycle is generated anew, again starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
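To suggest how one turn of this query-and-results cycle looks, here is a toy sketch (Python rdflib, with hypothetical data; the actual OSF stack issues its SPARQL against structWSF endpoints rather than an in-memory graph). The types and attributes returned are what inform widget selection — latitude/longitude values, for instance, would suggest a map component:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
g.add((EX.cityHall, RDF.type, EX.Building))
g.add((EX.cityHall, RDFS.label, Literal("City Hall")))
g.add((EX.cityHall, EX.lat, Literal(41.66)))
g.add((EX.cityHall, EX.long, Literal(-91.53)))

# A user selection is translated into a SPARQL query against the data
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?thing ?type ?lat ?long
    WHERE {
        ?thing a ?type ;
               ex:lat ?lat ;
               ex:long ?long .
    }
""")

# The results set carries the types and attributes that drive display choices
for row in results:
    print(row.thing, row.type, row.lat, row.long)
```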

Self-service Reporting

Since self-service reporting has been such a disappointment [12], it is worth noting another aspect of this ODapp design. Every “thing” that can be presented in the interface can have a specific display template associated with it. Absent another definition, any given “thing” will default to its parental type (which, ultimately, is “Thing,” the generic template display for anything without a definition; this generally defaults to a presentation of all attributes for the object).

However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:

Thing
Product
Camera
Digital Camera
SLR Digital Camera
Olympus Evolt E520

At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.

This design is meant to provide placeholders for any “thing” in any domain, while also providing the latitude to tailor and customize to every “thing” in the domain.
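A toy sketch of such template resolution (Python rdflib; the template registry and IRIs are hypothetical, not the actual OSF mechanism) walks up the subclass path and returns the most specific template found:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()
for child, parent in [
    (EX.Product, EX.Thing),
    (EX.Camera, EX.Product),
    (EX.DigitalCamera, EX.Camera),
    (EX.SLRDigitalCamera, EX.DigitalCamera),
    (EX.OlympusEvoltE520, EX.SLRDigitalCamera),
]:
    g.add((child, RDFS.subClassOf, parent))

# Hypothetical template registry; most types fall back to an ancestor's
# template, with "Thing" as the generic attribute-listing default
templates = {EX.Thing: "generic-attributes", EX.Camera: "camera-spec-sheet"}

def resolve_template(graph, thing_type):
    """Return the template of the nearest type in the inference path."""
    for ancestor in graph.transitive_objects(thing_type, RDFS.subClassOf):
        if ancestor in templates:
            return templates[ancestor]
    return "generic-attributes"

print(resolve_template(g, EX.OlympusEvoltE520))  # -> camera-spec-sheet
```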

It is critical that generic apps through an ODapp approach also provide the underpinnings for self-service reporting. The ultimate metric is whether consumers of information can create the reports they need without any support or intervention by IT.

Adaptive Analysis

The Mission Critical IT reference provided earlier [11] helps point to the potentials of this paradigm in a different way. Mission Critical also shows user interfaces contextually chosen based on prior selections. But they extend that advantage with context-specific analysis and validation through the SWRL rules-based semantic language. This is an exciting extension of the base paradigm that confirms the applicability of this approach to business intelligence and general enterprise analytics.

Standing Software Engineering on its Head

All of this points to a very exciting era for enterprise and consumer apps moving into the future. We perhaps should no longer talk about “killer apps”; we can shift our focus to the information we have at hand and how we want to structure and analyze it.

Using ontologies to write or specify code, or to compete as an alternative to conventional software engineering approaches, seems too much like more of the same. The systems basis in which methodologies such as MDA reside has not fixed the enterprise software challenges of decades-long standing. Rather, a shift to generic applications driven by adaptive ontologies — ODapps — looks to shift the locus from software and programming to data and knowledge structures.

This democratization of IT means that everything in the knowledge management realm can become “self service.” We can create our own analyses; develop our own reports; and package and disseminate what we and our colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we can turn prior decades of software engineering practices on their head.

What Structured Dynamics and a handful of other vendors are showing is by no means yet complete. Our roster of ODapp widgets and templates still needs much filling out. The toolsets available for creating, maintaining, mapping and extending the ontologies underlying these systems are still woefully inadequate [20]. These are important development needs for the near term.

And, of course, none of this means the end of software development either. Process and transactions systems still likely reside outside of this new, emerging paradigm. Creating great and solid generic ODapps still requires software. Further, ODapps and their potential are completely silent on how we create that software and with what languages or methodologies. The era of software engineering is hardly at an end.

What is exceptionally powerful about the prospects in ontology-driven apps is to speed time to understanding and place information manipulation directly in the hands of the knowledge worker. This is a vision of information access and control that has been frustrated for decades. Perhaps, with ontologies and these semantic technologies, that vision is now near at hand.


[1] This estimate is from Grady Booch, 2005. “The Complexity of Programming Models,” see http://www.cs.nott.ac.uk/~nem/complexity.pdf. He comments on the weakness of software lines of code as a meaningful measure. At the time in 2005, he estimated perhaps 800 billion lines of code had accumulated, which, given growth and the vagaries of such guesstimates, I have updated to the 1 trillion number noted.
[2] For a wildly different estimate, that has been criticized somewhat, see Blackduck Software, 2009. “Estimating the Development Cost of Open Source Software,” at http://www.blackducksoftware.com/development-cost-of-open-source. According to Blackduck’s research there are over 200,000 OSS projects on the Internet representing more than 4.9 billion lines of available code from 4,000 sites that the company monitors. Blackduck estimates that reproducing this OSS would cost $387 billion for “typical” SLOC estimating bases. While Blackduck is likely in the best place of any organization to track open source given their business model, others have criticized the estimates because only a portion (fewer than 10%, consistent with my own research) of open source projects are active, and many active projects also share significant code bases. Nonetheless, there is still a huge disparity between the 1 trillion SLOC estimate in [1] and this estimate of 5 billion for open source alone. This disparity is an indicator of the measurement challenges.
[3] See IMAP, 2010. Computing & Internet Software Global Report — 2010, 40 pp, see http://imap.com/imap/media/resources/HighTechReport_WEB_89B4E29C01817.pdf. The relative splits they show for software packages and licenses, IT consulting or outsourcing are 48%, 29% and 23%, respectively, of the total shown. Note however, that Gartner estimates are as high as 2x these amounts, again showing the uncertainty of measuring software; see, for example, http://www.gartner.com/it/page.jsp?id=1209913.
[4] For this and related measures, see Business Software Alliance, 2009. Software Industry Facts and Figures, see http://www.bsa.org/country/Public%20Policy/~/media/Files/Policy/Security/General/sw_factsfigures.ashx.
[5] Simply conduct a Web search on ‘”open source” “cost of ownership”‘ to see the many studies in this area. Depending on advocacy, estimates range from costs as high as those of proprietary software down to a lower, but still substantial, percentage. In no case is open source understood to be fully “free” once maintenance, upgrades, modifications, and site adaptations are considered.
[6] Michael Uschold, 2008. “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, pp 3-20; see http://mba.eci.ufmg.br/downloads/recol/FormalOntologyinInformationSystems2008.pdf.
[7] Nicola Guarino, 1998. “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. Amsterdam, IOS Press, pp. 3-15; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.1776&rep=rep1&type=pdf.
[8] See Phil Tetlow et al., eds., 2006. Ontology Driven Architectures and Potential Uses of the Semantic Web in Software Engineering, a W3C Editor’s Draft on Best Practices, February 11, 2006; see http://www.w3.org/2001/sw/BestPractices/SE/ODA/. UML class diagrams have close resemblance to certain ontology structures. This effort was part of a formal collaboration between W3C and the Object Management Group (OMG), which resulted among other things in the production of the Ontology Definition Metamodel (ODM). In the OMG’s model-driven architecture (MDA) initiative, models are used not only for design and maintenance purposes, but as a basis for generating executable artifacts for downstream use. The MDA approach grew out of much of the standards work conducted in the 1990s in the Unified Modeling Language (UML).
[9] Neno is a semantic network programming language and Fhat is a virtual machine that works off of it. These two projects have been largely abandoned. A related project is Ripple, a relational, stack-based dataflow language by Joshua Shinavier, which is episodically updated.
[10] Holger Knublauch of TopQuadrant has made the point that ontologies can also have runtime uses as well: “In contrast to conventional Model-Driven Architecture known from object-oriented systems, semantic applications use their data models not only at design time, but also as runtime components. The rich declarative semantics of ontological data models can be exploited to drive user interfaces and to control an application’s behavior.” See H. Knublauch, 2007. “From Ontology Design to Deployment: Semantic Application Development with TopBraid,” presented at the 2007 Semantic Technology Conference, San Jose, CA; see http://www.semantic-conference.com/2007/sessions/l5.html.
[11] Mission Critical IT describes its ODASE platform (Ontology Driven Architecture for Software Engineering) as a set of tools to facilitate the creation of working applications from a semantic business model (an ontology), using the open standards OWL, SWRL and RDF. The ODASE code generators (a.k.a “robots”) generate an API based on the business terminology defined by the OWL+SWRL+RDF business model, which the ODASE platform then uses to execute the rules and reasoning as contextual choices are made by the user. Among other links, the company has an impressive online demo that shows a consumer telecommunications purchase example; there is also a video explaining the rules basis of the ODASE framework.
[12] See Wayne W. Eckerson, 2007. “The Myth of Self-Service Business Intelligence,” in TDWI Online, October 18, 2007; see http://tdwi.org/articles/2007/10/18/the-myth-of-selfservice-bi.aspx.
[13] The buggy whip industry as a major economic entity ceased to exist with the introduction of the automobile, and is cited in economics and marketing as an example of an industry ceasing to exist because its market niche, and the need for its product, disappears. Not recognizing what industry or business purpose is being served is an oft-cited cause for obsolescence. Thus, software engineering is a practice that serves the creation of software, which itself is only a means to a functional end.
[14] See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[15] See M.K. Bergman, 2010. Seven Pillars of the Open Semantic Enterprise, AI3:::Adaptive Information blog, January 12, 2010.
[16] See M.K. Bergman, 2009. Ontologies as the ‘Engine’ for Data-Driven Applications, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, but the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.
[17] Some 250 pp of complete technical documentation for these projects is provided on the Structured Dynamics’ open source OpenStructs TechWiki.
[18] For more discussion of semantic components, see F. Giasson, 2010. “Semantic Components,” in his blog, July 5, 2010. For more discussion of the layered OSF design, see M.K. Bergman, 2010. Domain-specific Instantiations based on the Open Semantic Framework, AI3:::Adaptive Information blog, June 17, 2010.
[19] To find these groups and follow the open source OSF developments, see xxx. So long as the basic design comports with the foundations herein, sComponents may be developed in any rich Internet application (RIA) environment.
[20] Ontology development, management and mapping is the emerging imperative in the semantic technology space. For some thoughts on how Structured Dynamics is approaching this question, see a Normative Landscape of Ontology Tools on the TechWiki.
Posted:September 27, 2010

Resources Useful to the Understanding of Ontologies and the Semantic Web

Over the past few weeks we have been publishing a series of general background documents and tutorials useful to the understanding of ontologies. These entries have been prepared specifically with the non-expert and end user in mind.

The Ontology Tutorial Series is now complete as initially scoped. These various articles, in both originally posted form and as kept current on the OpenStructs’ TechWiki [1], are: An Executive Intro to Ontologies; an updated Ontology Tools listing; a survey of ontology development methodologies; a new methodology for developing lightweight, domain ontologies; A New Landscape in Ontology Development Tools; best practices for domain ontology building and maintenance; and a concluding piece on metamodeling in domain ontologies.
[1] The tutorials were first published on this blog over the period of Aug. 9 to Sept. 20, 2010. They are now permanently maintained and updated on the TechWiki.

The URI link reference to this post is: https://www.mkbergman.com/916/ontology-tutorial-series/
Posted:September 20, 2010

OWL 2 Has New Options; Useful to SKOS, Too

It is not unusual to want to treat things either as a class or an instance in an ontology, depending on context. Among other aspects, this is known as metamodeling, and it can be accomplished in a number of ways. However, the newest version of the Web Ontology Language, OWL 2, provides a neat trick for doing this called “punning”. Why one would want to metamodel, how to specify it in an ontology, and why the OWL 2 approach is helpful are described in this post [1].

Why Metamodel?

Lightweight, domain ontologies have been the focus of this ontology series. Domain ontologies are the “world views” by which organizations, communities or enterprises describe the concepts in their domain, the relationships between those concepts, and the instances or individuals that are the actual things that populate that structure. Thus, domain ontologies are the basic bread-and-butter descriptive structures for real-world applications of ontologies.

These lightweight, domain ontologies often have a hierarchical structure, for which SKOS (Simple Knowledge Organization System) is a recommended starting ontology [2] (see best practices recommendations). A subject concept reference ontology such as UMBEL (Upper Mapping and Binding Exchange Layer) [3], which we also recommend, has a similar structure and relies heavily on SKOS in its vocabulary. Because of these structural similarities, ontologies that use SKOS or UMBEL are good candidates for metamodeling techniques.
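To make the similarity concrete, here is a minimal SKOS-style fragment in N3 (the ex: names are hypothetical; skos: and owl: prefixes are assumed) in which a concept that participates in a broader/narrower hierarchy is also a natural candidate to act as a class:

ex:Truck a skos:Concept ;
skos:prefLabel "Truck"@en ;
skos:broader ex:Vehicle .

ex:Truck a owl:Class .    # the same IRI, now also viewed as a class: a metamodeling candidate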

To better understand why we should metamodel, let’s look at a couple of examples, both of which combine organizing categories of things with describing or characterizing those things. This dual need is common to most domains [4]. For the first example, let’s take a categorization of apes as a kind of mammal, which is in turn a kind of animal. In these cases, ape is a class, which relates to other classes, and it may also have members, be they particular kinds of apes or individual apes. Yet, at the same time, we want to assert some characteristics of apes: that they are hairy, have two arms and two legs, lack tails, can walk bipedally, have grasping hands, and that some are endangered species. These characteristics apply to the notion of ape as an instance.

As another example, we may have the category of trucks, which may be further split by truck type, brand, engine type, and so forth. Yet, again, we may want to characterize that a truck is designed primarily for the transport of cargo (as opposed to automobiles for transporting people), or that trucks may have different driver’s license requirements or different license fees than autos. These descriptive properties refer to truck as an instance.

These mixed cases combine the organization of concepts in relation to one another and to their set members with the description and characterization of those concepts as things unto themselves. This is a natural and common way to express most any domain of interest. The practice has been to express these mixed uses in RDFS or OWL Full, which makes them easy to write and create since almost “anything goes” (a loose way of saying that the structures are not decidable) [5]. Sub-class relationships also enable tree-like hierarchies to be constructed, along with some minor inferencing (such as one concept being broader than another, one of the contributions of SKOS).

But such mixed uses do not allow more capable OWL reasoners to be applied, nor the full power of query or search abstraction, nor checking of the ontology for consistency. These limits may be fine in many circumstances, but the absence of such checks allows structures to evolve that may become incoherent or illogical. If data interoperability is a goal, as it is in our enterprise use cases, incoherent ontologies cannot contribute or participate as structures for linking datasets. At most (and this is the case for much linked data practice) all that can be done is to make explicit pairwise connections between different dataset objects. This is not efficient and defeats the whole purpose of leveraging schema.
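A minimal sketch of such a mixed use in N3 (hypothetical ex: names) might look like the following; it is perfectly legal RDFS, but it lands the ontology in OWL Full territory, where DL reasoners and consistency checking no longer apply:

ex:Ape rdfs:subClassOf ex:Mammal .    # Ape organized as a class within a hierarchy
ex:Ape ex:conservationStatus "endangered" .    # Ape simultaneously described as an individual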

OWL 2 has been designed to fix that (in addition to delivering other benefits [12]). The approach OWL 2 takes to overcome some of these metamodeling limitations is “punning” [6]. Recall that objects are named in RDF with URIs (IRIs in OWL 2). The trick with “punning” is to evaluate the object based on how it is used [7]: the IRI is shared, but its referent may be viewed as either a class or an instance depending on context. Thus, objects used both as concepts (classes) and individuals (instances) are allowed, and standard OWL 2 reasoners may be used against them. It should be noted, however, that this “punning” technique does not support the full range of possible metamodeling aspects [8]. Like any language, OWL 2 trades off expressivity against reasoning efficiency [9].

But, for lightweight, domain ontologies where the objective is interoperability across heterogeneous sources (namely, the main objective of the semantic Web or semantic enterprise), this trade-off in OWL 2 now appears to be well balanced. Moreover, the automatic detection of punning by tools such as Protégé 4 that use the OWL API means it is comparatively easy to use and implement.

Relationship to Recommended Best Practices

An earlier chapter in this series presented some best practices for ontology building and maintenance. A fundamental aspect of those recommendations was the desirability of keeping instance data (ABox) separate from the conceptual structure (TBox) that provides the schema of relationships for those concepts [10]. Fortunately, this approach also integrates well with the metamodeling capabilities in OWL 2. How metamodeling and the ABox-TBox split are accommodated is shown by this diagram, using trucks as an example:

  Figure 1. Metamodeling in Domain Ontologies

The right-hand side of the diagram shows the two views possible via OWL 2 metamodeling in the TBox. In some cases, we may speak of trucks as a class of vehicle, to which individual members may belong; this is the class view. In other contexts, we may want to characterize or make assertions about trucks in our ontology, such as asserting cargo transport or engine type, in which case truck is now represented as an instance (individual) under the individual view. These two views in the TBox represent our structural and conceptual description (the “world view”) regarding this domain of which vehicles and trucks are a part. Then, when we begin to populate our knowledge base with specific data, we do so via the ABox. In this example, as we add data about the specific brand of Ford trucks and their attributes, we link the Ford instance to the TBox via the Truck class. (Best practice also requires that we model this new attribute structure into the TBox as well, but that is a different topic. 😉 )
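A rough N3 rendering of the two TBox views and the ABox link in Figure 1 (again with hypothetical ex: names) might look like:

# TBox, class view: Truck within the concept hierarchy
ex:Truck a owl:Class ;
rdfs:subClassOf ex:Vehicle .

# TBox, individual view: an assertion about Truck itself (the punned view)
ex:Truck ex:designedFor ex:CargoTransport .

# ABox: specific instance data linked in via the Truck class
ex:Ford a ex:Truck .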

How Punning is Triggered in OWL 2

Punning is not triggered by annotation properties. Annotation properties applied to a class merely act as additional description or metadata about that class; the annotation property by definition does not participate in any inferencing or reasoning. You should also know that in OWL 2, certain predicates (properties) such as label, comment or description (among others) are reserved as annotation properties [11]. You can invoke the OWL 2 punning process directly or via context when your ontologies are processed with the OWL API. The basic rule to follow is:

Any entity declared as a class and with an asserted object or data property [15] is punned (metamodeled).

This test is done directly by the OWL API [7]. You can test this out with an OWL 2-compliant editor, such as Protégé 4. Here is an example test, in N3 notation. First, begin with some initial declarations:
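(For these snippets to parse as written, prefix declarations along the following lines are assumed; the foo: and bar: namespace IRIs are hypothetical placeholders.)

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo: <http://example.org/foo#> .
@prefix bar: <http://example.org/bar#> .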

foo:Car a owl:Class .

foo:Animal a owl:Class ;
owl:disjointWith foo:Car .

Then, let’s describe an object property:

foo:isEndangered a owl:ObjectProperty ;
rdfs:domain foo:Animal ;
rdfs:range bar:SomeSpecies .

And define and make an assertion about Apes:

foo:Ape a owl:Class ;
foo:isEndangered bar:SomeSpecies .

Now, the system begins by testing for punning and other checks:

  1. Is foo:isEndangered an annotation property? No.
  2. What is its domain? foo:Animal.
  3. It will therefore detect and infer:

foo:Ape a owl:Class .
foo:Ape a foo:Animal .
foo:Ape foo:isEndangered bar:SomeSpecies .

  4. Punning is triggered, because a non-annotation property has been applied to a class.
  5. The non-annotation properties are assigned to a named individual (which captures the individual view part of the TBox above).
  6. The reasoner can then check for inconsistencies, depending on the restriction(s) applied to the foo:Animal class.

In this case, no inconsistencies were found. But, let’s now add another object (non-annotation) property:

foo:hasBrand a owl:ObjectProperty ;
rdfs:domain foo:Car ;
rdfs:range bar:SomeBrand .

And use it to expand our assertions about Apes:

foo:Ape a owl:Class ;
foo:isEndangered bar:SomeSpecies ;
foo:hasBrand bar:Ford .

And repeat the detection and inference at step 3:

foo:Ape a owl:Class .
foo:Ape a foo:Animal .
foo:Ape a foo:Car ;
foo:isEndangered bar:SomeSpecies ;
foo:hasBrand bar:Ford .

Now, an inconsistency is raised at that step: the consistency check fails, because Ape cannot be both an Animal and a Car (the two classes were declared disjoint at the outset). While this is clearly a silly example, such checks become quite important as the number of objects and assertions in an ontology grows.
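Conversely, and consistent with the rule above, a pure annotation property never triggers punning. A minimal contrast, reusing the same hypothetical foo: and bar: names:

foo:Ape a owl:Class ;
rdfs:label "Ape"@en .    # annotation property only: Ape remains a pure class

foo:Ape foo:isEndangered bar:SomeSpecies .    # declared object property: punning is triggered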

What Does Punning Look Like?

The punning technique works because the IRI for the object ends up being treated as both a concept (class) and an instance (individual). Thus, while the object shares the same IRI, depending on its context it is evaluated by an OWL reasoner as a different thing (class or individual). The OWL API achieves this by actually writing out the object in both its class view and its individual view. Here is an example, in RDF/XML serialization. The input OWL:

<owl:Class rdf:about="http://purl.org/ontology/Ape">
   <isEndangered>Ape</isEndangered>
</owl:Class>

Output from Protégé with punning:

<!-- http://purl.org/ontology/Ape -->

<owl:Class rdf:about="http://purl.org/ontology/Ape"/>

<!-- http://purl.org/ontology/Ape -->

<owl:NamedIndividual rdf:about="http://purl.org/ontology/Ape">
   <isEndangered>Ape</isEndangered>
</owl:NamedIndividual>

Notice the duplicate declaration in RDF/XML: the same IRI now appears both as an owl:Class and as an owl:NamedIndividual. When writing out the ontology, all punned objects are duplicated in this manner.
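The same punned output is perhaps easier to read in N3, assuming an ont: prefix bound to http://purl.org/ontology/ (a hypothetical binding):

@prefix ont: <http://purl.org/ontology/> .

ont:Ape a owl:Class .    # the class view

ont:Ape a owl:NamedIndividual ;    # the individual view, sharing the same IRI
ont:isEndangered "Ape" .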

The Beginning of the Transition

OWL 2 and its other general changes [12] have arrived in the nick of time. Not only were we seeing weaknesses in OWL 1 that warranted updating, but we are also now being challenged with how to make linked data and the many datasets in RDF effectively interoperate. Perhaps undecidability and throwing triples to the wind worked OK in the early days of our semantic Web Wild West; now it is time for the new sheriff to bring order to the emerging chaos. Of course only time will tell, but we believe the design decisions made by the OWL 2 working group were judicious and balanced ones that find the sweet spot between expressiveness and reasoning efficiency [9]. We also believe that, while SKOS is useful in its less expressive form [2], many new domain vocabularies based on SKOS would especially benefit from embracing the OWL 2 metamodeling techniques.

Two criticisms still remain. First, tooling support for OWL 2 and the OWL API is weak, as discussed in an earlier chapter. Second, as the last chapter discussed, not enough practitioners have yet taken up OWL 2, which means that best practice guidance and exemplars are still limited.

Lightweight domain ontologies can greatly benefit from these OWL 2 metamodeling techniques, as well as from the OWL RL profile that also emerged as one of the OWL 2 enhancements [13]. Structured Dynamics thinks the growing scale and learning taking place around linked data and RDF datasets now point the way to a necessary transition. And OWL 2 metamodeling should be one of the key components in making our semantic technologies more responsive and effective [14].


[1] This posting is part of a current series on ontology development and tools, jointly developed with Structured Dynamics with co-authorship by Frédérick Giasson. The series began with An Executive Intro to Ontologies, then continued with an update of the prior Ontology Tools listing, which now contains 185 tools. It progressed to a survey of ontology development methodologies. That led to a presentation of a new, Lightweight, Domain Ontologies Development Methodology. That piece was then expanded to address A New Landscape in Ontology Development Tools, which was followed up by a listing of best practices in domain ontology building and maintenance. This portion completes the series.
[2] Alistair Miles and Sean Bechhofer, eds., 2009. SKOS Simple Knowledge Organization System Reference, W3C Recommendation, 18 August 2009. See http://www.w3.org/TR/skos-reference/. Some common SKOS domain predicates include skos:definition, skos:prefLabel, skos:altLabel, skos:broaderTransitive, skos:narrowerTransitive.
According to the cited W3C recommendation:
. . . the “concepts” of a thesaurus or classification scheme are modeled [in the base SKOS form] as individuals in the SKOS data model, and the informal descriptions about and links between those “concepts” as given by the thesaurus or classification scheme are modeled as facts about those individuals, never as class or property axioms. Note that these are facts about the thesaurus or classification scheme itself, such as “concept X has preferred label ‘Y’ and is part of thesaurus Z”; these are not facts about the way the world is arranged within a particular subject domain, as might be expressed in a formal ontology.
Metamodeling and the use of OWL allow the base SKOS form to be expressed as a formal ontology, over which reasoning and inference may occur. Not all SKOS structures may be amenable to this (thesauri and lexical resources such as WordNet perhaps fall into this category), but some other structures are logical and can be formalized. UMBEL, for example, fits into this category, as do many carefully crafted controlled vocabularies. When used as such, many of the SKOS predicates become OWL annotation properties.
[3] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.
[4] In the domain ontologies that are the focus here, we often want to treat our concepts as both classes and instances of a class. This is known as “metamodeling” or “metaclassing” and is enabled by “punning” in OWL 2. For example, here is a case cited in the OWL 2 wiki entry on “punning”:
People sometimes want to have metaclasses. Imagine you want to model information about the animal kingdom. Hence, you introduce a class a:Eagle, and then you introduce instances of a:Eagle such as a:Harry.
(1) a:Eagle rdf:type owl:Class
(2) a:Harry rdf:type a:Eagle
Assume now that you want to say that “eagles are an endangered species”. You could do this by treating a:Eagle as an instance of a metaconcept a:Species, and then stating additionally that a:Eagle is an instance of a:EndangeredSpecies. Hence, you would like to say this:
(3) a:Eagle rdf:type a:Species
(4) a:Eagle rdf:type a:EndangeredSpecies
This example comes from Boris Motik, 2005. “On the Properties of Metamodeling in OWL,” paper presented at ISWC 2005, Galway, Ireland, 2005. For some other examples, see Bernd Neumayr and Michael Schrefl, 2009. “Multi-Level Conceptual Modeling and OWL (Draft, 2 May – Including Full Example)”; see http://www.dke.jku.at/m-owl/most09_22_full.pdf.
[5] A good explanation of this can be found in Rinke J. Hoekstra, 2009. Ontology Representation: Design Patterns and Ontologies that Make Sense, thesis for Faculty of Law, University of Amsterdam, SIKS Dissertation Series No. 2009-15, 9/18/2009. 241 pp. See http://dare.uva.nl/document/144859. In that, Hoekstra states (pp. 49-50):
RDFS has a non-fixed meta modelling architecture; it can have an infinite number of class layers because rdfs:Resource is both an instance and a super class of rdfs:Class, which makes rdfs:Resource a member of its own subset (Nejdl et al., 2000). All classes (including rdfs:Class itself) are instances of rdfs:Class, and every class is the set of its instances. There is no restriction on defining sub classes of rdfs:Class itself, nor on defining sub classes of instances of instances of rdfs:Class and so on. This is problematic as it leaves the door open to class definitions that lead to Russell’s paradox (Pan and Horrocks, 2002). The Russell paradox follows from a comprehension principle built in early versions of set theory (Horrocks et al., 2003). This principle stated that a set can be constructed of the things that satisfy a formula with one free variable. In fact, it introduces the possibility of a set of all things that do not belong to itself . . . .
In RDFS, the reserved properties rdfs:subClassOf, rdf:type, rdfs:domain and rdfs:range are used to define both the other RDFS modelling primitives themselves and the models expressed using these primitives. In other words, there is no distinction between the meta-level and the domain.
[6] “Punning” was introduced in OWL 2 and enables the same IRI to be used as a name for both a class and an individual. However, the direct model-theoretic semantics of OWL 2 DL accommodates this by understanding the class Father and the individual Father as two different views on the same IRI, i.e., they are interpreted semantically as if they were distinct. The technique listed in the main body triggers this treatment in an OWL 2-compliant editor. See further Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see http://www.w3.org/TR/owl2-primer/.
[7] The OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL 2 profiles RL, QL and EL. Two recent papers describing the updated API are: Matthew Horridge and Sean Bechhofer, 2009. “The OWL API: A Java API for Working with OWL 2 Ontologies,” presented at OWLED 2009, the 6th OWL: Experiences and Directions Workshop, Chantilly, Virginia, October 2009, see http://www.webont.org/owled/2009/papers/owled2009_submission_29.pdf; and Matthew Horridge and Sean Bechhofer, 2010. “The OWL API: A Java API for OWL Ontologies,” paper submitted to the Semantic Web Journal, see http://www.semantic-web-journal.net/sites/default/files/swj107.pdf. Also see its code documentation at http://owlapi.sourceforge.net/2.x.x/documentation.html.
The main text describes how via “punning” the OWL API supports two parallel views sharing the same IRI, which can enable a concept to operate as either a class or instance depending on context.
[8] Some other metamodeling aspects not supported by “punning” include full multi-level modeling (such as in UML or OMG’s model-driven architecture) or linkage with closed-world reasoning.
[9] OWL has historically been described as trying to find the proper tradeoff between expressive power and efficient reasoning support. See, for example, Grigoris Antoniou and Frank van Harmelen, 2003. “Web Ontology Language: OWL,” in S. Staab and R. Studer, eds., Handbook on Ontologies in Information Systems, Springer-Verlag, pp. 76-92. See http://www.few.vu.nl/~frankh/postscript/OntoHandbook03OWL.pdf.
[10] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) to which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure. These distinctions have their grounding in description logics.
[11] For a listing, see http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#Annotation_Properties. Even if your local ontology defines a sub-property of one of these items, such as foo:myLabel as a sub-property of rdfs:label, you are advised to still specifically declare it as an annotation property.
[12] See Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter Patel-Schneider and Ulrike Sattler, 2008. “OWL2: The Next Step for OWL,” see http://www.comlab.ox.ac.uk/people/ian.horrocks/Publications/download/2008/CHMP+08.pdf; and also see the OWL 2 Quick Reference Guide by the W3C, which provides a brief guide to the constructs of OWL 2, noting the changes from OWL 1.
[13] OWL RL is the “rules” profile of OWL 2; it is both decidable and offers additional axiomatic support for metamodeling. A figure in Hoekstra [Fig. 3-4 in 5], comparing OWL 2 to OWL 1, shows OWL RL occupying a subset of the decidable description logics.
[14] Metamodeling might be a new concept to you, and some of its aspects can certainly be academic. If the references above do not sufficiently satisfy your curiosity, you may want to check out some of these other useful references: Birte Glimm, Sebastian Rudolph and Johanna Völker, 2009. “Integrated Metamodeling and Diagnosis in OWL 2,” see http://www.comlab.ox.ac.uk/files/3129/paper.pdf; and Nophadol Jekjantuk, Gerd Groener and Jeff Z. Pan, 2009. “Reasoning in Metamodeling Enabled Ontologies,” in Rinke Hoekstra and Peter F. Patel-Schneider, eds., Proceedings of OWL: Experiences and Directions (OWLED 2009); see http://www.webont.org/owled/2009.
[15] In OWL 2, an object property is a predicate that defines a binary relationship between two objects (with respect to a triple, between a subject and an object). A data property is a predicate that defines a binary relationship between an object and a literal (a string or data value). In contrast, annotation properties and the reserved OWL and RDF vocabularies are explicitly excluded from this rule. Only declared object or data properties trigger the punning.

The URI link reference to this post is: https://www.mkbergman.com/913/metamodeling-in-domain-ontologies/