As an information society we have become a software society. Software is everywhere, from our phones and our desktops, to our cars, homes and every location in between. The amount of software used worldwide is unknowable; we do not even have agreed measures to quantify its extent or value . We suspect there are at least 1 billion lines of code that have accumulated over time [1,2]. On the order of $875 billion was spent worldwide on software in 2010, of which about half was for packaged software and licenses and the rest for programmer services, consulting and outsourcing . In the U.S. alone, about 2 million people work as programmers or related .
It goes without saying that software is a very big deal.
No matter what the metrics, it is expensive to develop and maintain software. This is also true for open source, which has its own costs of ownership . Designing software faster with fewer mistakes and more re-use and robustness have clearly been emphases in computer science and the discipline of programming from its inception.
This attention has caused a myriad of schools and practices to develop over time. Some of the earlier efforts included computer-aided software engineering (CASE) or Grady Booch’s (already cited in ) object-oriented design (OOD). Fourth-generation languages (4GLs) and rapid application development (RAD) were popular in the 1980s and 1990s. Most recently, agile software development or extreme programming have grabbed mindshare.
Altogether, there are dozens of software development philosophies, each with its passionate advocates. These express themselves through a variety of software development methodologies that might be characterized or clustered into the prototyping or waterfall or spiral camps.
In all instances, of course, the drivers and motivations are the same: faster development, more re-use, greater robustness, easier maintainability, and lower development costs and total costs of ownership.
For at least the past decade, ontologies and semantic Web-related approaches have also been part of this mix. A good summary of these efforts comes from Michael Uschold in an invited address at FOIS 2008 . In this review, he points to these advantages for ontology-based approaches to software engineering:
These first four items are similar to the benefits argued for other software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:
In making these arguments, Uschold picks up on the “ontology-driven information systems” moniker first put forward by Nicola Guarino in 1998 . The ideas around ODIS have had substantial impact on the semantic Web community, especially in the use of formal ontologies and modeling approaches. The FOIS series of conferences, and most recently the ODiSE series, have been spawned from these ideas. There is also, for example, a fairly rich and developed community working on the integration of UML via ontologies as the drivers or specifiers of software .
Yet, as Uschold is careful to point out, the idea of ODIS extends beyond software engineering to encompass all of information systems. My own categorization of how ontologies may contribute to information systems is:
When we look at this list from the standpoint of conventional software or software engineering, we see that #1 shares overlaps with conventional database roles and #2, #3 and #4 with conventional programmer or software engineering responsibilities. The other portions, however, are quite unique to ontology-based approaches.
For decades, issues related to how to develop apps better and faster have been proposed and argued about. We still have the same litany of challenges and issues from expense to re-use and brittleness. And, unfortunately, despite many methodologies du jour, we still see bottlenecks in the enterprise relating to such matters as:
Promises such as self-service reporting touted at the inception of data warehousing two decades ago are still to be realized . Enterprises still require the overhead and layers of IT to write SQL for us and prepare and fix reports. If we stand back a bit, perhaps we can come to see that the real opportunity resides in turning the whole paradigm of software engineering upside down.
Our objective should not be software per se. Software is merely an intermediary artifact to accomplish some given task. Rather than engineering software, the focus should be on how to fulfill those tasks in an optimal manner. How can we keep the idea of producing software from becoming this generation’s new buggy whip example ?
For reasons we delve into a bit more below, it perhaps has required a confluence of some new semantic technologies and ontologies to create the opening for a shift in perspective. That shift is one from software as an objective in itself to one of software as merely a generic intermediary in an information task pipeline.
Though this shift may not apply (at least with current technologies) to transactional and process-based software, I submit it may be fundamental to the broad category of knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. These are the real areas where integration and reports and queries and analysis remain frustrating bottlenecks for knowledge workers. And, interestingly, these are also the same areas most amenable to embracing an open world (OWA) mindset .
If we stand back and take a systems perspective to the question of fulfilling functional KM tasks, we see that the questions are both broader and narrower than software engineering alone. They are broader because this systems perspective embraces architecture, data, structures and generic designs. The questions are narrower because software — within this broader context — can be now be generalized as artifacts providing the fulfillment of classes of functions.
Ontology-driven applications — or ODapps for short — based on adaptive ontologies are a topic we have been nibbling around and discussing for some time. In our oft-cited seven pillars of the semantic enterprise we devote two pillars specifically (#4 and #3, respectively) to these two components . However, in keeping with the systems perspective relevant to a transition from software engineering to generic apps, we should also note that canonical data models (via RDF) and a Web-oriented architecture are two additional pillars in the vision.
ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies as noted under #1 above), as supplemented by the UI and instruction sets and validations and rules (as noted under #4 and #5 above). The combination of these specifications as provided by both properly constructed domain ontologies and supplementary utility ontologies is what we collectively term adaptive ontologies .
ODapps fulfill specific generic tasks, consistent with their bespoke design (#6 above) to respond to adaptive ontologies. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.
ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.
In fact, the widget idea from Web 2.0 is a key precursor to the ODapps design. What we see in Web 2.0 are dedicated single-purpose widgets that perform a display operation (such as Google Maps) based on the properly structured data fed to them (structured geolocational information in the case of GMaps).
In Structured Dynamics‘ early work with RDF-based applications by our predecessor company, Zitgist, we demonstrated how the basic Web 2.0 widget idea could be extended by “triggering” which kind of mashup widget got invoked by virtue of the data type(s) fed to it. The Query Builder presented contextual choices for how to build a SPARQL query via UI based on what prior dropdown list choices were made. The DataViewer displayed results with different widgets (maps, profiles, etc.) depending on which part of a query’s results set was inspected (by responding to differences in data types). These two apps, in our opinion, remain some of the best developed in the semantic Web space, even though development on both ceased nearly four years ago.
This basic extension of data-driven applications — as informed by a bit more structure — naturally evolved into a full ontology-driven design. We discovered that — with some minor best practice additions to conventional ontologies — we could turn ontologies into powerhouses that informed applications through:
Like the earlier Zitgist discoveries, basing the applications on only one or two canonical data models and serializations (RDF and a simple data exchange XML, which Fred Giasson calls structXML) provides the input uniformity to make a library of generic applications tractable. And, embedding the entire framework in a Web-oriented architecture means it can be distributed and deployed anywhere accessible by HTTP.
Booch has maintained for years that in software design abstraction is good, but not if too abstract . ODapps are a balanced abstraction within the framework of canonical architectures, data models and data structures. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures. The KM emphasis can shift from programming and software to logic and terminology .
In the sub-sections below, we peel back some portions of this layered design to unveil how some of these major pieces interact.
Again, to cite Booch, the most fundamental software design decision is architecture . In the case of Structured Dynamics and its support for ODapps, its open semantic framework (OSF) is embedded in a Web-oriented architecture (WOA). The OSF itself is a layered design that proceeds from a kernel of existing assets (data and structures) and proceeds through conversion to Web service access, and then ontology organization and management via ODapps . The major layers in the OSF stack are:
Not all of these layers or even their specifics is necessary for an ontology-driven app design . However, the general foundations of generic apps, properly constructed adaptive ontologies, and canonical data models and structures should be preserved in order to operationalize ODapps in other settings.
The power of this design is that by swapping out adaptive ontologies and relevant data, the entire OSF stack as is can be used to deploy multiple instantiations. Potential uses can be as varied as the domain coverage of the domain ontologies that drive this framework.
The OSF semantic framework is a completely open and generic one. The same set of tools and capabilities can be applied to any domain that needs to manage and understand information in its own domain. With the existing ODApps in hand, this includes from unstructured text or documents to conventional structured databases.
What changes from domain to domain are the data structures (the ontologies, schema and entity references) and their instance data (which can also be converted from existing to canonical forms). Here is an illustration of how this generic framework can be leveraged for different deployments. Note that Citizen Dan is a local government example of the OSF framework with relatively complete online demos:
(click for full size)
Structured Dynamics continues to wrinkle this basic design for different clients and different industries. As we round out the starting set of ODapps (see below), the major effort in adapting this generic design to different uses is to tailor the ontologies and “RDFize” existing data assets.
The first suite of ODapps occurs at the structWSF Web services layer. structWSF provides a set of generic functions and endpoints to:
Here is a listing of current ODapp functions within structWSF (with links to details for each):
|WSF management Web services||
User-oriented Web services
At this level the information access and processing is done largely on the basis of structured results sets. Other visualization and display ODapps are listed in the next subsection.
Components Extending Flex
These components can be used in combination with any of the structWSF ODapps, meaning the filtering, searching, browsing, import/export, etc., may be combined as an input or output option with the above.
The next animated figure shows how the basic interaction flow works with these components:
(click for full size)
Using the ODapp structure it is possible to either “drive” queries and results sets selections via direct HTTP request via endpoints (not shown) or via simple dropdown selections on HTML forms or Flex widgets (shown). This design enables the entire system to be driven via simple selections or interactions without the need for any programming or technical expertise.
As the diagram shows, these various sComponents get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.
An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the sComponents. When combined with the results set data, and attribute information in the irON ontology, plus the domain understanding in the domain ontology, a synthetic schema is constructed that instructs what the interface may do next. Here is an example schema:
(click for full size)
These instructions are then presented to the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas.
As new user interactions occur with the resulting displays and components, the iteration cycle is generated anew, again starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
Since self-service reporting has been such a disappointment , it is worth noting another aspect from this ODapp design. Every “thing” that can be presented in the interface can have a specific display template associated with it. Absent another definition, for example, any given “thing” will default to its parental type (which, ultimate, is “Thing”, the generic template display for anything without a definition; this generally defaults to a presentation of all attributes for the object).
However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:
|SLR Digital Camera|
|Olympus Evolt E520|
At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.
This design is meant to provide placeholders for any “thing” in any domain, while also providing the latitude to tailor and customize to every “thing” in the domain.
It is critical that generic apps through an ODapp approach also provide the underpinnings for self-service reporting. The ultimate metric is whether consumers of information can create the reports they need without any support or intervention by IT.
The Mission Critical IT reference provided earlier  helps point to the potentials of this paradigm in a different way. Mission Critical also shows user interfaces contextually chosen based on prior selections. But they extend that advantage with context-specific analysis and validation through the SWRL rules-base semantic language. This is an exciting extension of the base paradigm that confirms the applicability of this approach to business intelligence and general enterprise analytics.
All of this points to a very exciting era for enterprise and consumer apps moving into the future. We perhaps should no longer talk about “killer apps”; we can shift our focus to the information we have at hand and how we want to structure and analyze it.
Using ontologies to write or specify code or to compete as an alternative to conventional software engineering approaches seems too much like more of the same. The systems basis in which such methodologies such as MDA reside have not fixed the enterprise software challenges of decades-long standing. Rather, a shift to generic applications driven by adaptive ontologies — ODapps — looks to shift the locus from software and programming to data and knowledge structures.
This democratization of IT means that everything in the knowledge management realm can become “self service.” We can create our own analyses; develop our own reports; and package and disseminate what we and our colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we can turn prior decades of software engineering practices on their head.
What Structured Dynamics and a handful of other vendors are showing is by no means yet complete. Our roster of ODapp widgets and templates still needs much filling out. The toolsets available for creating, maintaining, mapping and extending the ontologies underlying these systems are still woefully inadequate . These are important development needs for the near term.
And, of course, none of this means the end of software development either. Process and transactions systems still likely reside outside of this new, emerging paradigm. Creating great and solid generic ODapps still requires software. Further, ODapps and their potential are completely silent on how we create that software and with what languages or methodologies. The era of software engineering is hardly at an end.
What is exceptionally powerful about the prospects in ontology-driven apps is to speed time to understanding and place information manipulation directly in the hands of the knowledge worker. This is a vision of information access and control that has been frustrated for decades. Perhaps, with ontologies and these semantic technologies, that vision is now near at hand.
Over the past few weeks we have been publishing a series of general background documents and tutorials useful to the understanding of ontologies. These entries have been prepared specifically with the non-expert and end user in mind.
It is not unusual to want to treat things either as a class or an instance in an ontology, depending on context. Among other aspects, this is known as metamodeling and it can be accomplished in a number of ways. However, the newest version of the Web Ontology Language, OWL 2, provides a neat trick for doing this called “punning“. Why one would want to metamodel, how to specify it in an ontology, and why the OWL 2 approach is helpful are described in this post .
Lightweight, domain ontologies have been the focus of this ontology series. Domain ontologies are the “world views” by which organizations, communities or enterprises describe the concepts in their domain, the relationships between those concepts, and the instances or individuals that are the actual things that populate that structure. Thus, domain ontologies are the basic bread-and-butter descriptive structures for real-world applications of ontologies.
These lightweight, domain ontologies often have a hierarchical structure for which SKOS (Simple Knowledge Organization System) is a recommended starting ontology  (see best practices recommendations). A subject concept reference ontology such as UMBEL (Upper Mapping and Binding Exchange Layer) , which we also recommend, also has a similar structure and a heavy reliance on SKOS in its vocabulary. Because of these structural similarities, ontologies that use SKOS or UMBEL are therefore good candidates for using metamodeling techniques.
To better understand why we should metamodel, let’s look at a couple of examples, both of which combine organizing categories of things and then describing or characterizing those things. This dual need is common to most domains .
For the first example, let’s take a categorization of apes as a kind of mammal, which is then a kind of animal. In these cases, ape is a class, which relates to other classes, and apes may also have members, be they particular kinds of apes or individual apes. Yet, at the same time, we want to assert some characteristics of apes, such as being hairy, two legs and two arms, no tails, capable of walking bipedally, with grasping hands, and with some endangered species. These characteristics apply to the notion of apes as an instance.
As another example we may have the category of trucks, which may further be split into truck types, brands of trucks, type of engine, and so forth. Yet, again, we may want to characterize that a truck is designed primarily for the transport of cargo (as opposed to automobiles for people transport), or that trucks may have different drivers license requirements or different license fees than autos. These descriptive properties refer to trucks as an instance.
These mixed cases combine both the organization of concepts in relation to one another and with respect to their set members, with the description and characterization of these concepts as things unto themselves. This is a natural and common way to express most any domain of interest.
The practice has been to express these mixed uses in RDFS or OWL Full, which makes them easy to write and create since most “anything goes” (a loose way of saying that the structures are not decidable) . Use of sub-class relationships also enables tree-like hierarchies to be constructed and some minor inferencing (such as one concept is broader than another concept, one of the contributions of SKOS).
But such mixed uses do not allow more capable OWL reasoners to be applied, nor for the full power of query or search abstraction to be applied, nor for the ontology to be checked for consistency. These limits may be fine in many circumstances, but their lack does allow structures to evolve that may become incoherent or illogical. If data interoperability is a goal, as it is in our enterprise use cases, incoherent ontologies can not contribute or participate as structures to linking datasets. At most — and this is the case for much linked data practice — all that can be done is to make explicit pairwise connections between different dataset objects. This is not efficient and defeats the whole purpose of leveraging schema. OWL 2 has been designed to fix that (in addition to other benefits ).
The approach taken by OWL 2 to overcome some of these metamodeling limitations is through “punning” . Recall that objects are named in RDF with URIs (IRIs in OWL 2). The trick with “punning” is to evaluate the object based on how it is used contextually ; the IRI is shared but its referent may be viewed as either a class or instance based on context. Thus, objects used both as concepts (classes) and individuals (instances) are allowed and standard OWL 2 reasoners may be used against them.
It should be noted, however, that this “punning” technique does not support the full range of possible metamodeling aspects . Like any language, there is a trade-off in OWL 2 between expressivity and reasoning efficiency . But, for lightweight, domain ontologies where the objective is interoperability across heterogeneous sources — that is, namely the main objectives of the semantic Web or semantic enterprise — this trade-off in OWL 2 now appears to be well balanced. Moreover, its automatic detection by tools such as Protégé 4 that use the OWL API also means it is comparatively easy to use and implement.
An earlier chapter in this series presented some best practices for ontology building and maintenance. A fundamental aspect of those recommendations was the desirability of keeping instance data (ABox) separate from the conceptual structure (TBox) that provides the schema of relationships for those concepts . Fortunately, this approach also integrates well with the metamodeling capabilities in OWL 2.
How metamodeling and the ABox-TBox split is accommodated is shown by this diagram, using trucks as an example:
The right-hand side of the diagram shows the two views possible via OWL 2 metamodeling in the TBox. In some cases, we may speak of trucks as a class of vehicle, to which individual members may belong; this is the class view. In other contexts, we may want to characterize or make assertions about trucks in our ontology, such as asserting cargo transport or engine type, in which case truck is now represented as an instance (individual) under the individual view. These two views in the TBox represent our structural and conceptual description (the “world view”) regarding this domain of which vehicles and trucks are a part.
Then, when we begin to populate our knowledge base with specific data, we do so via the ABox. In this example, as we add data about the specific brand of Ford trucks and their attributes, we link the Ford instance to the TBox via the Truck class. (Best practice also requires that we model this new attribute structure into the TBox as well, but that is a different topic. .)
Punning is not triggered by annotation properties. Annotation properties applied to a class merely act as additional description or metadata about that class; the annotation property by definition does not participate in any inferencing or reasoning. You should also know that in OWL 2, certain predicates (properties) such as label, comment or description (among others) are reserved as annotation properties .
You can invoke the OWL 2 punning process directly or via context when your ontologies are processed with the OWL API. The basic rule to follow is:
First, begin with some initial declarations:
foo:Car a owl:Class . foo:Animal a owl:Class ; owl:disjointWith foo:Car .
Then, let’s describe an object property:
foo:isEndangered a owl:ObjectProperty ; rdf:domain foo:Animal ; rdf:range bar:SomeSpecies .
And define and make an assertion about Apes:
foo:Ape a owl:Class ; foo:isEndangered bar:SomeSpecies .
Now, the system begins by testing for punning and other checks, such as:
foo:Ape a owl:Class ; foo:Ape a foo:Animal ; foo:isEndangered bar:SomeSpecies .
In this case, no inconsistencies were found.
But, let’s now add another object (non-annotation) property:
foo:hasBrand a owl:ObjectProperty ; rdf:domain foo:Car ; rdf:range bar:SomeBrand .
And use it to expand our assertions about Apes:
foo:Ape a owl:Class ; foo:isEndangered bar:SomeSpecies ; foo:hasBrand bar:Ford .
And repeat #3:
foo:Ape a owl:Class . foo:Ape a foo:Animal . foo:Ape a foo:Car ; foo:isEndangered bar:SomeSpecies ; foo:hasBrand bar:Ford .
Now, inconsistencies are raised in the second #3:
So, the consistency check fails, because Ape can not be both an Animal and a Car.
While this is clearly a silly example, such checks are quite important as the number of objects and assertions grows in an ontology.
The punning technique works because the IRI for the object ends up being treated as both a concept (class) and an instance (individual). Thus, while the object shares the same IRI, depending on its context, it is evaluated by an OWL reasoner as a different thing (class or individual). The OWL API achieves this by actually writing out the object in both its class view and individual view. Here is an example (in RDF/XML serialization):
Output from Protégé with punning:
<!-- http://purl.org/ontology/Ape--> <owl:Class rdf:about="http://purl.org/ontology/Ape"/> <!-- http://purl.org/ontology/Ape--> <owl:NamedIndividual rdf:about="http://purl.org/ontology/Ape"> <isEndangered>Ape</isEndangered> </owl:NamedIndividual>
Notice the duplicate definition (in RDF/XML) to the NamedIndividual. When writing out the ontology, all punned objects are duplicated in a similar manner.
OWL 2 and its other general changes  have arrived in the nick of time. Not only were we seeing some of the weaknesses in OWL 1 that warranted updating, but we are also now being challenged with regard to how to make linked data and the many datasets in RDF effectively interoperate. Perhaps undecidability and throwing triples to the wind worked OK in the early days of our semantic Web Wild West. But now it is time for the new sheriff to bring order to the emerging chaos.
Of course only time will tell, but we believe the design decisions made by the OWL 2 working group were judicious and balanced ones to find that sweet spot between expressiveness and reasoning efficiency . We also believe that, while useful in its less expressive form , that many new domain vocabularies based on SKOS would especially benefit from embracing the OWL 2 metamodeling techniques.
But two criticisms still remain. First, tooling support for OWL 2 and the OWL API is weak, as discussed in an earlier chapter. And, as the last chapter discussed, there are not enough practitioners that have yet taken up OWL 2, which means that best practice guidance and exemplars are still limited.
Lightweight domain ontologies can greatly benefit from these OWL 2 metamodeling techniques and the OWL RL alternative that also emerged as one of the OWL 2 profile enhancements . Structured Dynamics thinks the growing scale and learning taking place around linked data and RDF datasets is now pointing the way to a necessary transition. And OWL 2 metamodeling should be one of the key components to making our semantic technologies more responsive and effective .
As we conclude this recent series on ontology tools and building , one item stands clear: the relative lack of guidance on how one actually builds and maintains these beasties. While there is much of a theoretic basis in the literature and on the Web, and much of methodologies and algorithms, there is surprisingly little on how one actually goes about creating an ontology.
An earlier posting pointed to the now classic Ontology Development 101 article as a good starting point . Another really excellent starting point is the Protégé 4 user manual . Though it is obviously geared to the Protégé tool and its interface, it also is an instructive tutorial on general ontology (OWL) topics and constructs. I highly recommend printing it out and reading it in full.
Another way to learn more about ontology construction is to inspect some existing ontologies. Though one may use a variety of specialty search engines and Google to find ontologies , there are actually three curated services that are more useful and which I recommend.
The best, by far, is the repository created by the University of Manchester for the now-completed TONES project . TONES has access to some 200+ vetted ontologies, plus a search and filtering facility that helps much in finding specific OWL constructs. It is a bit difficult to filter by OWL 2-compliant only ontologies (except for OWL 2 EL), but other than that, the access and use of the repository is very helpful. Another useful aspect is that the system is driven by the OWL API, a central feature that we recommended in the prior tools landscape posting. From a learning standpoint this site is helpful because you can filter by vocabulary.
An older, but similar, repository is OntoSelect. It is difficult to gauge how current this site is, but it nonetheless provides useful and filtered access to OWL ontologies as well.
These sources provide access to complete ontologies. Another way to learn about ontology construction is from a bottom-up perspective. In this regard, the Ontology Design Patterns (ODP) wiki is the definitive source . This is certainly a more advanced resource, since its premise begins from the standpoint of modeling issues and patterns to address them, but the site is also backed by an active community and curated by leading academics. Besides ontology building patterns, ODP also has a listing of exemplary ontologies (though without the structural search and selection features of the sources above). ODP is not likely the first place to turn to and does not give “big picture” guidance, but it also should be a bookmarked reference once you begin real ontology development.
It is useful to start with fully constructed ontologies to begin to appreciate the scope involved with them. But, of course, how one gets to a full ontology is the real purpose of this post. For that purpose, let’s now turn our attention to general and then more specific best practices.
As noted above, there is a relative paucity of guidance or best practices regarding ontologies, their construction and their maintenance. However, that being said, there are some sources whereby guidance can be obtained.
To my knowledge, the most empirical listing of best practices comes from Simperl and Tempich . In that 2006 paper they examined 34 ontology building efforts and commented on cost, effectiveness and methodology needs. It provides an organized listing of observed best practices, though much is also oriented to methodology. I think the items are still relevant, though they are now four to five years old. The paper also contains a good reference list.
Various collective ontology efforts also provide listings of principles or such, which also can be a source for general guidance. The OBO (The Open Biological and Biomedical Ontologies) effort, for example, provides a listing of principles to which its constituent ontologies should adhere . As guidance to what it considers an exemplary ontology, the ODP effort also has a useful organized listing of criteria or guidance.
One common guidance is to re-use existing ontologies and vocabularies as much as possible. This is a major emphasis of the OBO effort . The NeOn methodology also suggests guidelines for building individual ontologies by re-use and re-engineering of other domain ontologies or knowledge resources . Brian Sletten (among a slate of emerging projects) has also pointed to the use of the Simple Knowledge Organization System (SKOS) as a staging vocabulary to represent concept schema like thesauri, taxonomies, controlled vocabularies, and subject headers .
The Protégé manual  is also a source of good tips, especially with regard to naming conventions and the use of the editor. Lastly, the major source for the best practices below comes from Structured Dynamics‘ own internal documentation, now permanently archived. We are pleased to now consolidate this information in one place and to make it public.
General best practices refer to how the ontology is scoped, designed and constructed. Note the governing perspective in this series has been on lightweight, domain ontologies.
In the case of ontology-driven applications using adaptive ontologies , there are also additional instructions contained in the system (via administrative ontologies) that tell the system which types of widgets need to be invoked for different data types and attributes. This is different from the standard conceptual schema, but is nonetheless essential to how such applications are designed.
The administrative ontologies supporting these applications are managed differently than the standard domain ontologies that are the focus of most of the best practices above. Nonetheless, some of the domain ontology best practices work in tandem with them, the combination of which are called adaptive ontologies.
(1) a:Eagle rdf:type owl:Class
(2) a:Harry rdf:type a:Eagle
Assume now that you want to say that “eagles are an endangered species”. You could do this by treating a:Eagle as an instance of a metaconcept a:Species, and then stating additionally that a:Eagle is an instance of a:EndangeredSpecies. Hence, you would like to say this:
(3) a:Eagle rdf:type a:Species
(4) a:Eagle rdf:type a:EndangeredSpecies.
This example comes from Boris Motik, 2005. “On the Properties of Metamodeling in OWL,” paper presented at ISWC 2005, Galway, Ireland, 2005.
“Punning” was introduced in OWL 2 and enables the same IRI to be used as a name for both a class and an individual. However, the direct model-theoretic semantics of OWL 2 DL accommodates this by understanding the class Father and the individual Father as two different views on the same IRI, i.e., they are interpreted semantically as if they were distinct. The technique listed in the main body triggers this treatment in an OWL 2-compliant editor. See further Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see http://www.w3.org/TR/owl2-primer/.
CWA is the traditional perspective of relational database systems within enterprises. The premise of CWA is that which is not known to be true is presumed to be false; or, any statement not known to be true is false. Another way of saying this is that everything is prohibited until it is permitted. CWA works well in bounded systems such as known product listings or known customer rosters, and is one reason why it is favored for transaction-oriented systems where completeness and performance are essential. In an ontology sense, CWA works best for bounded engineering environments such as aeronautics or petroleum engineering. Closed world ontologies also tend to be much more complicated with many nuanced predicates, and can be quite expensive to build.
The open world assumption (OWA), on the other hand, is premised that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity, and everything is permitted until it is prohibited. As a result, open world works better in knowledge environments with the incorporation of external information such as business intelligence, data warehousing, data integration and federation, and knowledge management.
See further, M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, Dec. 21, 2009.
Sometimes this perception of shared views is too strictly interpreted as needing to have one and only one understanding of concepts and language. Far from it. One of the strengths of ontologies and language modeling within them is that multiple terms for the same concept or slight differences in understandings about nearly similar concepts can be accommodated. It is perfectly OK to have differences in terminology and concept understandings so long as those differences are also captured and explicated within the ontology. The recommendations herein that all concepts and terminology be defined, that SemSets be used to capture alternative ways to name concepts, and that concepts often be treated as both classes and instances are some of the best practices that reflect this approach.
So, while consensus building and collaboration methods are at the heart of effective ontology building, those methods need not also strive for a imposition of language and concepts by fiat. In fact, trying to do so undercuts the ability of the collaborative process to lead to greater shared understandings.
These ontology-driven apps, then, are informed structured results sets that are output in a form suitable to various intended applications. This output form can include a variety of serializations, formats or metadata. This flexibility of output is tailored to and responsive to particular generic applications; it is what makes our ontologies “adaptive”. Using this structure, it is possible to either “drive” queries and results sets selections via direct HTTP request or via simple dropdown selections on HTML forms. Similarly, it is possible with a single parameter change to drive either a visualization app or a structured table template from the equivalent query request. Ontology-driven apps through this ontology and architecture design thus provide two profound benefits. First, the entire system can be driven via simple selections or interactions without the need for any programming or technical expertise. And, second, simple additions of new and minor output converters can work to power entirely new applications available to the system.
Previous installments in this series have listed existing ontology tools, overviewed development methodologies, and proposed a new approach to building lightweight, domain ontologies . For the latter to be successful, a new generation in ontology development tools is needed. This post provides an explication of the landscape under which this new generation of tools is occurring.
Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data.
We are now concluding the first decade of ontology development tools, especially those geared to the semantic Web and its associated languages of RDFS and OWL. Last year we also saw the release of the major update to the OWL 2 language, with its shift to more expressiveness and a variety of profiles. The upcoming next generation of ontology tools now must also shift.
The current imperative is to shift away from ontology engineering by a priesthood to pragmatic daily use and maintenance by domain practitioners. Market growth demands simpler, task-focused tools with intuitive interfaces. For this change to occur, the general tools architecture needs to shift its center of gravity from IDEs and comprehensive toolkits to APIs and Web services. Not surprisingly, this same shift is what has been occurring across all areas of software.
In the previous installment of this series, we presented a new methodological approach to ontology development, geared to lightweight, domain ontologies. One aspect of that design was to separate the operational workflow into two pathways:
The ontology build methodology concentrated on the upper half of this diagram (blue, with yellow lead-ins and outcomes) with the various steps overviewed in that installment :
The methodology captured in this diagram embraces many different emphases from current practice: re-use of existing structure and information assets; conscious split between instance data (ABox) and the conceptual structure (TBox) ; incremental design; coherency and other integrity testing; and explicit feedback for scope extension and growth. The methodology also embraces some complementary utility ontologies that also reflect the design of ontology-driven apps .
These are notable changes in emphasis. But they are not the most important one. The most important change is the tools landscape to implement this methodology. This landscape needs to shift to pragmatic daily use and maintenance by domain practitioners. That requires simpler and more task-oriented tools. And that change in tooling needs a still more fundamental shift in tools architecture and design.
In many places throughout this series I use the term “inadequate” to describe the current state of ontology development tools. This characterization is not a criticism of first-generation tools per se. Rather, it is a reflection of their inadequacy to fulfill the realities of the new tooling landscape argued in this series. The fact remains, as initial generation tools, that many of the existing tools are quite remarkable and will play central roles (mostly for the professional ontologist or developer) moving forward.
At the risk of overlooking some important players, let’s trace the (partial) legacy of some of the more pivotal tools in today’s environment.
As early as a decade ago the ontology standards languages were still in flux and the tools basis was similarly immature. Frame logic, description logics, common logic and many others were competing at that time for primacy and visibility. Most ontology tools at that time such as Protégé , OntoEdit , or OilEd  were based on F-logic or the predecessor to OWL, DAML+Oil. But the OWL language was under development by the W3C and in anticipation of its formal release the tools environment was also evolving to meet it. Swoop , for example, was one of the first dedicated OWL browsers. A Protégé plug-in for OWL was also developed by Holger Knublauch . In parallel, the OWL group at the University of Manchester also introduced the OWL API .
With the formal release of OWL 1.0 in 2004, ontology tools continued to migrate to the language. Protégé, up through the version 3x series, became a popular open source system with many visualization and OWL-related plug-ins. Knublauch joined TopQuadrant and brought his OWL experience to TopBraid Composer, which shifted to the Eclipse IDE platform and leveraged the Jena API [9,11]. In Europe, the NeON (Networked Ontologies) project started in 2006 and by 2008 had an Eclipse-based OWL platform using the OWL API with key language processing capabilities through GATE .
Most recently, Protégé and NeON in open source, and TopBraid Composer on the commercial side, have likely had the largest market share of the comprehensive ontology toolkits. So far, with the release of OWL 2 in late 2009, only Protégé in version 4 and the TwoUse Toolkit have yet fully embraced all aspects of the new specification, doing so by intimately linking with the new OWL API (version 3x has full OWL 2 support) . However, most leading reasoners now support OWL 2 and products such as TopBraid Composer and Ontotext’s OWLIM support OWL 2 RL as well .
The evolution of Protégé to version 4 (OWL 2) was led by the University of Manchester via its CO-ODE project , now ended, which has also been a source for most existing Protégé 4 plug-ins. (Because of the switch to OWL 2 and the OWL API most earlier plug-ins are incompatible with Protégé 4.) Manchester has also been a leading force in the development of OWL 2 and the alternative Manchester syntax.
Though only recently stable because of the formalization of OWL 2, Protégé 4 and its linkage to the new OWL API provides for a very powerful combination. With Protégé, the system has a familiar ontology editing framework and a mechanism for plug-in migration and growth. With the OWL API, there is now a common API for leading reasoners (Pellet, HermiT, FaCT++, RacerPro, etc.), a solid ontology management and annotation framework, and validators for various OWL 2 profiles (RL, EL and QL). The system is widely embraced by the biology community, probably the most active scientific field in ontologies. However, plug-in support lags the diversity of prior versions of Protégé and there does not appear to be the energy and community standing behind it as in prior years.
These leading frameworks and toolkits have opted to be “ontology engineering” environments. Via plug-ins and complicated interfaces (tabs or Eclipse-style panes) the intent has apparently been to provide “all capabilities in one box.” The tools have been IDE-centric.
Unfortunately, one must be a combination of ontologist, developer, programmer and IDE expert in order use the tools effectively. And, as incremental capabilities get added to the systems, these also inherit the same complexity and style of the host environment. It is simply not possible to make complex environments and conventions simple.
Curiously, the existence or use of APIs have also not been adequately leveraged. The usefulness of an API means that subsets of information can be extracted and worked on in very clear and simple ways. This information can then be roundtripped without loss. An API allows a tailored subset abstraction of the underlying data model. In contrast, IDEs, such as Protégé or Eclipse, when they play a similar role, force all interfaces to share their built-in complexity.
With these thoughts in mind, then, we set out to architect a tools suite and work flow that could truly take advantage of a central API. We further wanted to isolate the pieces into distributable Web services in keeping with our standard structWSF Web services framework design.
This approach also allows us to split out simpler, focused tools that domain users and practitioners can use. And, we can do all of this while also enabling the existing professional toolsets and IDEs to also interoperate in the environment.
The resulting tools landscape is shown in the diagram below. This diagram takes the same methodology flow from Figure 1 (blue and yellow boxes) and stretches them out in a more linear fashion. Then, we embed the various tools (brown) and APIs (orange) in relation to that methodology:
This diagram is worth expanding to full size and studying in some detail. Aspects of this diagram that deserve more discussion are presented in the sections below.
As noted in the preceding methodology installment, the working ontology is the central object being managed and extended for a given deployment. Because that ontology will evolve and grow over time, it is important the complete ontology specification itself be managed by some form of version control system (green) . This is the one independent tool in the landscape.
Access to and from the working ontology is mediated by the OWL API . The API allows all or portions of the ontology specification to be manipulated separately, with a variety of serializations. Changes made to the ontology can also be tested for validity. Most leading reasoners can interact directly with the API. Protégé 4 also interacts directly with the API, as can various rules engines . Additionally, other existing APIs, notably the Alignment API with its own mapping tools and links to other tools such as S-Match can interact with the OWL API. It is reasonable to expect more APIs to emerge over time that also interoperate .
The OWL API is the best current choice because of its native capabilities and because Jena does not yet support OWL 2 . However, because of the basic design with structWSF (see next), it is also possible to swap out with different APIs at a later time should developments warrant.
In short, having the API play the central management role in the system means that any and all tools can be designed to interact effectively with the working ontology(ies) without any loss in information due to roundtripping.
Use of the structWSF layer also means that tools and functionality can be distributed anywhere on the Web. Specialized server-side functions can be supported as well as dedicated specialty hardware. Text indexing or disambiguation services can fit within this design.
The ultimate value of piggybacking on the structWSF framework is that all other extant services also become available. Thus, a wealth of converters, data managers, and semantic components (or display widgets) can be invoked depending on the needs of the specific tool.
The objective, of course, of this design is to promote more and simpler tools useful to domain users. Some of these are shown under the Use & Maintain box in the diagram above; others are listed by category in the table below.
The RESTful interface and parameter calls of the structWSF layer further simplify the ontology management and annotation abstractions arising from the OWL API. The number of simple tools available to users under this design is virtually limitless. These tools are also fast to develop and test.
This landscape is not yet a full reality. It is a vision of adaptive and simpler tools, working with a common API, and accessible via platform-independent Web services. It also preserves many of the existing tools and IDEs familiar to present ontology engineers.
However, pieces of this landscape do presently exist and more are on the way. The next section briefly overviews some of the major application areas where these tools might contribute.
If one inspects the earlier listing of 185 ontology tools it is clear that there is a diversity of tools both in terms of scope and function across the entire ontology development stack. It is also clear that nearly all of those 185 tools listed do not communicate with one another. That is a tremendous waste.
Via shared APIs and some degree of consistent design it should be possible to migrate these capabilities into a more-or-less interoperating whole. We have thus tried to categorize some important tool types and exemplar tools from that listing to show the potential that exists. (Please note that the Example Tools are links to the tools and categories from the earlier 185 tools listing.)
This correlation of types and example tools is not meant to be exhaustive nor a recommendation of specific tools. But, this tabulation is illustrative of the potential that exists to both simplify and extend tool support across the entire ontology development workflow:
|Tool Type||Comments||Example Tools|
|OWL API||OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL2 profiles of RL, QL, EL||OWL API|
|Web Services Layer||This layer provides a common access layer and set of protocols for almost all tools. It depends critically on linkage and communication with the OWL API||structWSF|
|Ontology Editor (IDE)||There are a variety of options in this area. Generally, more complete environments (that is, IDEs) based on OWL and with links to the OWL API are preferred. Less complete editor options are listed under other categories. Note that only Protégé 4 incorporates the OWL API||NeOn toolkit,
|Scripts||In all pragmatic cases the migration of existing structure and vocabulary assets to an ontology framework requires some form of scripting. These may be off the shelf resources, but more often are specific to the use case at hand. Typical scripting languages include the standard ones (Perl, Python, PHP, Ruby, XSLT, etc.) and often involve some form of parsing or regex||variety; specific to use case|
|Converters||Converters are more-or-less pre-packaged scripts for migrating one serialization or data format to another one. As the scripts above continue to be developed, this roster of off-she-shelf starting points can increase. Today, there are perhaps close to 200 converters useful to ontology purposes||irON, ReDeFer, SKOS2GenTax; also see RDFizers|
|Vocabulary Prompter||Domain ontologies are ultimately about meaning, and for that purpose there is much need for definitions, synonyms, hyponyms, and related language assets. Vocabulary prompters take input documents or structures and help identify additional vocabulary useful for characterizing semantic meaning||see the TechWiki’s vocab prompting tools; ROC|
|Spreadsheet||Spreadsheets can be important initial development environments for users without explicit ontology engineering backgrounds. The biggest issue with spreadsheets is that what is specified in them is more general or simplistic compared to what is contained in an actual ontology. Attempts to have spreadsheets capture all of this sophistication are often less than satisfactory. One way to effective “round trip” with spreadsheets (and many related simple tools) is to adhere to an OWL API||Anzo, RDF123, irON (commON), Excel, Open Office|
|Editor (general)||Ontology editing spans from simple structures useful to non-ontologists to those (like the IDEs or toolkits) that capture all aspects of the ontology. Further, some of these editors are strictly textual or (literally) editors; others span or attempt to enable visual editing. Visual editing (see below) can ultimately extend to the ontology graph itself||see the TechWiki’s ontology editing tools|
|Alignment API||The Alignment API is an API and implementation for expressing and sharing ontology alignments. The correspondences between entities (e.g., classes, objects, properties) in ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way. The goal of this format is to be able to share on the web the available alignments. The format is expressed in RDF||Alignment API|
|Mapper||A variety of tools, algorithms and techniques are available for matching or mapping concepts between two different ontologies. In general, no single method has shown itself individually superior. The better approaches use voting methods based on multiple comparisons||see the TechWiki’s ontology mapping tools|
|Ontology Browser||Ontology browsers enable the navigation or exploration of the ontology — generally in visual form — but without allowing explicit editing of the structure||Relation Browser, Ontology Browser, OwlSight, FlexViz|
|Vocabulary Manager||Vocabulary managers provide a central facility for viewing, selecting, accessing and managing all aspects of the vocabulary in an ontology (that is, to the level of all classes and properties). This tool category is poorly represented at present. Ultimately, vocabulary managers should also be one (if not the main) access point to vocabulary editing||PoolParty, TermWiki, UMBEL Web service|
|Vocabulary Editor||Vocabulary editors provide (generally simple) interfaces for the editing and updating of vocabulary terms, classes and properties in an ontology||Neologism, TemaTres, ThManager, Vocab Editor|
|Structure Editor||A structure editor is a specific form of an ontology editor, geared to the subsumption (taxonomic) organization of a largely hierarchical structure. Editors of this form tend to use tree controls or spreadsheets with indented organization to show parent and child relationships||PoolParty, irON (commON)|
|Graph Analysis||Ontologies form graph structures, which are amenable to many specific network and graph analysis algorithms, included relatedness, shortest path, grouped structures, communities and the like||SNAP, igraph, Network Workbench, NetworkX, Ontology Metrics|
|Graph API||Graph visualization with associated tools is best enabled by working from a common API. This allows for expansion and re-use of other capabilities. Preferably, this graph API would also have direct interaction with the OWL API, but none exist at the moment||under investigation|
|Graph Visualizer||Graph visualizers enable the ontology to be rendered in graph form and presentation, often with multiple layout options. The systems also enable export to PDF or graphics formats for display or printing. The better tools in this category can handle large graphs, can have their displays easily configured, and are performant||see the TechWiki’s ontology visualization tools|
|Visual Editor||An ontology visual editor enables the direct manipulation of the graph in a visual mode. This capability includes adding and moving nodes, changing linkages between nodes, and other ontology specification. Very few tools exist in this category at present||COE, TwoUse Toolkit|
|Coherence Tester||Testing for coherence involves whether the ontology structure is properly constructed and has logical interconnections. The testing either involves inference and logic testing (including entailments) based on the structure as provided; comparisons with already vetted logical structures and knowledge bases (e.g., Cyc, Wikipedia); or both||Cyc, OWLim, FactForge|
|Gap Tester||Related to coherence testing, gap testing is the identification of key missing pieces or intermediary nodes in the ontology graph. This tends to happen when external specification of the ontology is made without reference to connecting information||requires use of a reference external ontology; see above|
|Documenter||Ontology documentation is not limited to the technical specifications of the structure, but also includes best practices, how-to and use guides, and the like. Automated generation of structure documentation is also highly desirable||TechWiki, SpecGen, OWLDoc|
|Tagger||Once constructed, ontologies (and their accompanying named entity dictionaries) can be very powerful resources for aiding tagging and information extraction utilities. Like vocabulary prompting, there is a broad spectrum of potential tools and uses in the tagging category||GATE (OBIE); many other options|
|Exporter||Exports need to range from full-blown OWL representations to the simpler export of data and constructs. Multiple serialization options and the ability to support the input requirements of third-party tools is also important||OWL Syntax Converter, OWL Verbalizer; many various options|
The beauty of this approach is that most of the tools listed are open source and potentially amenable to the minor modifications necessary to conform with this proposed landscape.
Contrasting the normative tools landscape above with the existing listing of ontology tools points out some key gaps or areas deserving more development attention. Some of these are:
Finally, it does appear that the effort and focus behind Protégé seems to be slowing somewhat. The future has clearly shifted to OWL 2 with Protégé 4. Yet, besides the admirable CO-ODE project (now ended), tools and plug-in support seems to have slowed. Many of the admirable plug-ins for Protégé 3x do not appear to be under active development as upgrades to Protégé 4. While Protégé’s future (and similar IDEs) seems assured, its prominence possibly will (and should) be replaced by a simpler kit of tools useful to users and practitioners.
For the past few months we at Structured Dynamics have seen ontology design and management as the pending technical priorities within the semantic technology space. Now that the market no longer looks at “ontology” as a four-letter word, it is imperative to simplify the development and use of ontologies. The first generation of tools leading up to this point have been helpful to understand the semantic space; changes are now necessary to expand it.
In our first generation we have begun to understand the types and nature of needed tools. But our focus on IDEs and comprehensive toolsets belies a developer’s or technologist’s perspective. We need to now shift focus and look at tool needs from the standpoint of users and actual use of ontologies. Many players and many toolmakers and innovators will need to contribute to build this market for semantic technologies and approaches.
Fortunately, replacing an IDE focus with one based around APIs and Web services should be a fairly smooth and natural transition. If we truly desire to be market makers, we need to stand back and place ourselves into the shoes of the domain practitioners, the subject matter experts. We need to shield actual users from all of the silly technical details and complexity. And, then, let’s focus — task-by-task — on discrete items of management and use of ontologies. Growth of the semantic technology space depends on expanding our practitioner base.
For its part, Structured Dynamics is presently seeking new projects and sponsors with a commitment to these aims. Like our prior development of structWSF and semantic components, we will be looking to make simpler ontology tools a priority in the coming months. Please let me know if you want to partner with us toward this commitment.