Posted: September 7, 2010

Shifting the Center of Gravity to the OWL API, Web Services

Previous installments in this series have listed existing ontology tools, overviewed development methodologies, and proposed a new approach to building lightweight, domain ontologies [1]. For the latter to be successful, a new generation of ontology development tools is needed. This post describes the landscape in which this new generation of tools is emerging.

Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data.

We are now concluding the first decade of ontology development tools, especially those geared to the semantic Web and its associated languages of RDFS and OWL. Last year also saw the release of a major update to the language, OWL 2, with its shift to more expressiveness and a variety of profiles.

The upcoming generation of ontology tools must also shift. The current imperative is to move away from ontology engineering by a priesthood to pragmatic daily use and maintenance by domain practitioners. Market growth demands simpler, task-focused tools with intuitive interfaces. For this change to occur, the general tools architecture needs to shift its center of gravity from IDEs and comprehensive toolkits to APIs and Web services. Not surprisingly, this same shift has been occurring across all areas of software.

Methodology Reprise: The Nature of the Landscape

In the previous installment of this series, we presented a new methodological approach to ontology development, geared to lightweight, domain ontologies. One aspect of that design was to separate the operational workflow into two pathways:

  • Instances, and their descriptive characteristics, and
  • Conceptual relationships, or ontologies.

The ontology build methodology concentrates on the upper half of the diagram below (blue, with yellow lead-ins and outcomes), with the various steps overviewed in that installment [2]:

Figure 1. Flowchart of Ontology Development Methodology (click to expand)

The methodology captured in this diagram embraces many emphases from current practice: re-use of existing structure and information assets; a conscious split between instance data (ABox) and the conceptual structure (TBox) [3]; incremental design; coherency and other integrity testing; and explicit feedback for scope extension and growth. The methodology also embraces some complementary utility ontologies that reflect the design of ontology-driven apps [4].

These are notable changes in emphasis. But they are not the most important one. The most important change is the tools landscape needed to implement this methodology. This landscape needs to shift to pragmatic daily use and maintenance by domain practitioners. That requires simpler and more task-oriented tools. And that change in tooling requires a still more fundamental shift in tools architecture and design.

A Legacy of Excellent First Generation Tools

In many places throughout this series I use the term “inadequate” to describe the current state of ontology development tools. This characterization is not a criticism of first-generation tools per se. Rather, it reflects their inability to fulfill the realities of the new tooling landscape argued in this series. The fact remains that many of these initial-generation tools are quite remarkable and will continue to play central roles (mostly for the professional ontologist or developer) moving forward. At the risk of overlooking some important players, let’s trace the (partial) legacy of some of the more pivotal tools in today’s environment.

A decade ago the ontology standards languages were still in flux and the tools base was similarly immature. Frame logic, description logics, common logic and many others were competing at that time for primacy and visibility. Most ontology tools of that era, such as Protégé [5], OntoEdit [6] or OilEd [7], were based on F-logic or on OWL’s predecessor, DAML+OIL. But the OWL language was under development by the W3C, and in anticipation of its formal release the tools environment was also evolving to meet it. Swoop [8], for example, was one of the first dedicated OWL browsers. A Protégé plug-in for OWL was also developed by Holger Knublauch [9]. In parallel, the OWL group at the University of Manchester introduced the OWL API [10].

With the formal release of OWL 1.0 in 2004, ontology tools continued to migrate to the language. Protégé, up through the version 3.x series, became a popular open source system with many visualization and OWL-related plug-ins. Knublauch joined TopQuadrant and brought his OWL experience to TopBraid Composer, which shifted to the Eclipse IDE platform and leveraged the Jena API [9,11].
In Europe, the NeOn (Networked Ontologies) project started in 2006 and by 2008 had an Eclipse-based OWL platform using the OWL API, with key language processing capabilities through GATE [12]. Most recently, Protégé and NeOn in open source, and TopBraid Composer on the commercial side, have likely had the largest market share among the comprehensive ontology toolkits.

So far, with the release of OWL 2 in late 2009, only Protégé in version 4 and the TwoUse Toolkit have fully embraced all aspects of the new specification, doing so by intimately linking with the new OWL API (version 3.x of which has full OWL 2 support) [13]. However, most leading reasoners now support OWL 2, and products such as TopBraid Composer and Ontotext’s OWLIM support OWL 2 RL as well [14].

The evolution of Protégé to version 4 (OWL 2) was led by the University of Manchester via its CO-ODE project [15], now ended, which has also been the source of most existing Protégé 4 plug-ins. (Because of the switch to OWL 2 and the OWL API, most earlier plug-ins are incompatible with Protégé 4.) Manchester has also been a leading force in the development of OWL 2 and the alternative Manchester syntax.

Though only recently stable because of the formalization of OWL 2, Protégé 4 and its linkage to the new OWL API provide a very powerful combination. With Protégé, the system has a familiar ontology editing framework and a mechanism for plug-in migration and growth. With the OWL API, there is now a common API for leading reasoners (Pellet, HermiT, FaCT++, RacerPro, etc.), a solid ontology management and annotation framework, and validators for the various OWL 2 profiles (RL, EL and QL). The system is widely embraced by the biology community, probably the most active scientific field in ontologies. However, plug-in support lags the diversity of prior versions of Protégé, and there does not appear to be the same energy and community standing behind it as in prior years.

A Normative Tools Landscape

These leading frameworks and toolkits have opted to be “ontology engineering” environments. Via plug-ins and complicated interfaces (tabs or Eclipse-style panes), the intent has apparently been to provide “all capabilities in one box.” The tools have been IDE-centric. Unfortunately, one must be a combination of ontologist, developer, programmer and IDE expert in order to use the tools effectively. And, as incremental capabilities get added to the systems, these inherit the same complexity and style of the host environment. It is simply not possible to make complex environments and conventions simple.

Curiously, APIs have also not been adequately leveraged. The usefulness of an API is that subsets of information can be extracted and worked on in very clear and simple ways. This information can then be round-tripped without loss. An API allows a tailored subset abstraction of the underlying data model. In contrast, IDEs such as Protégé or Eclipse, when they play a similar role, force all interfaces to share their built-in complexity.

With these thoughts in mind, then, we set out to architect a tools suite and workflow that could truly take advantage of a central API. We further wanted to isolate the pieces into distributable Web services, in keeping with our standard structWSF Web services framework design. This approach also allows us to split out simpler, focused tools that domain users and practitioners can use. And we can do all of this while still enabling the existing professional toolsets and IDEs to interoperate in the environment.

The resulting tools landscape is shown in the diagram below. This diagram takes the same methodology flow from Figure 1 (blue and yellow boxes) and stretches it out in a more linear fashion. The various tools (brown) and APIs (orange) are then embedded in relation to that methodology:
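The lossless round-tripping idea can be illustrated with a toy sketch in Python. The dict-based “ontology” and its field names are invented stand-ins for a real ontology model behind an API: a simple tool requests only the subset it needs, edits it, and merges the changes back without disturbing the rest of the model.

```python
# Toy illustration of API-mediated round-tripping: a tool works on a
# tailored subset of the model and merges changes back without loss.
# The "ontology" here is just a dict of term -> properties; a real
# system would sit behind the OWL API or similar.

ontology = {
    "Product": {"label": "Product", "subClassOf": None,      "comment": "Anything offered for sale"},
    "Book":    {"label": "Book",    "subClassOf": "Product", "comment": ""},
    "Author":  {"label": "Author",  "subClassOf": None,      "comment": "A person who writes"},
}

def extract_subset(model, keys, fields):
    """Return only the named terms and fields -- what a simple tool sees."""
    return {k: {f: model[k][f] for f in fields} for k in keys}

def merge_back(model, subset):
    """Apply edited fields back to the full model; untouched data is preserved."""
    for key, fields in subset.items():
        model[key].update(fields)

# A vocabulary editor asks only for labels and comments of two terms:
view = extract_subset(ontology, ["Book", "Author"], ["label", "comment"])
view["Book"]["comment"] = "A written or printed work"   # the edit

merge_back(ontology, view)
```

The point of the sketch is the abstraction: the editing tool never sees (and so can never corrupt) the subsumption structure it did not ask for.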

Figure 2. The Normative Ontology Tools Landscape (click to expand)

This diagram is worth expanding to full size and studying in some detail. Aspects of this diagram that deserve more discussion are presented in the sections below.

OWL API as Center of Gravity

As noted in the preceding methodology installment, the working ontology is the central object being managed and extended for a given deployment. Because that ontology will evolve and grow over time, it is important that the complete ontology specification itself be managed by some form of version control system (green) [16]. This is the one independent tool in the landscape.

Access to and from the working ontology is mediated by the OWL API [13]. The API allows all or portions of the ontology specification to be manipulated separately, with a variety of serializations. Changes made to the ontology can also be tested for validity. Most leading reasoners can interact directly with the API. Protégé 4 also interacts directly with the API, as can various rules engines [17]. Additionally, other existing APIs, notably the Alignment API with its own mapping tools and links to other tools such as S-Match, can interact with the OWL API. It is reasonable to expect more interoperating APIs to emerge over time [18].

The OWL API is the best current choice because of its native capabilities and because Jena does not yet support OWL 2 [11]. However, because of the basic design with structWSF (see next), it is also possible to swap in a different API at a later time should developments warrant. In short, having the API play the central management role in the system means that any and all tools can be designed to interact effectively with the working ontology(ies) without any loss of information due to round-tripping.
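The validate-before-accept pattern implied here can be sketched in Python. The coherence check below is an invented stand-in for what a real reasoner or OWL profile validator would do; the point is that an edit is applied to a candidate copy, tested, and rolled back if the result is invalid.

```python
import copy

def is_coherent(model):
    """Stand-in for a real validity check (a reasoner or profile
    validator): verify only that every subClassOf target exists."""
    return all(p["subClassOf"] is None or p["subClassOf"] in model
               for p in model.values())

def apply_change(model, change):
    """Apply an edit only if the result stays valid; otherwise roll back."""
    candidate = copy.deepcopy(model)
    change(candidate)
    if is_coherent(candidate):
        model.clear()
        model.update(candidate)
        return True
    return False          # rejected; original model untouched

ontology = {"Product": {"subClassOf": None}, "Book": {"subClassOf": "Product"}}

ok = apply_change(ontology, lambda m: m.update({"Ebook": {"subClassOf": "Book"}}))
bad = apply_change(ontology, lambda m: m.update({"Orphan": {"subClassOf": "Missing"}}))
```

In the real landscape, the accepted candidate would then be serialized through the API and committed to the version control system.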

Web Services (structWSF) as Canonical Access Layer

The same rationale that governed our development of structWSF [19] applies here: to abstract basic services and functionality through a platform-independent Web services layer. This Web services layer has canonical (standard) ways to interact with other services and is generally RESTful in design to support distributed deployments. The design conforms to proper separation of view from logic and structure. Moreover, because of the design, changes can be made on either side of the layer in terms of user interface or functionality. Use of the structWSF layer also means that tools and functionality can be distributed anywhere on the Web. Specialized server-side functions can be supported as well as dedicated specialty hardware. Text indexing or disambiguation services can fit within this design. The ultimate value of piggybacking on the structWSF framework is that all other extant services also become available. Thus, a wealth of converters, data managers, and semantic components (or display widgets) can be invoked depending on the needs of the specific tool.
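As a sketch of what such canonical access might look like, the short Python helper below builds a parameterized RESTful query URL. The endpoint path and parameter names are hypothetical illustrations, not structWSF’s actual interface; the point is that every tool constructs its requests the same canonical way.

```python
from urllib.parse import urlencode

def ws_request(base, service, **params):
    """Build a canonical RESTful query URL for a (hypothetical) Web
    services endpoint; every tool uses this same construction."""
    return "%s/ws/%s/?%s" % (base.rstrip("/"), service,
                             urlencode(sorted(params.items())))

# A simple vocabulary tool asking a (made-up) ontology-read service
# for the classes of a working ontology:
url = ws_request("http://example.org", "ontology/read",
                 ontology="http://example.org/onto/domain.owl",
                 mode="getClasses")
```

Because the layer is plain HTTP, the same call works for a browser widget, a command-line script, or a full IDE plug-in, wherever on the Web they are deployed.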

Simpler, Task-specific Tools

The objective of this design, of course, is to promote more and simpler tools useful to domain users. Some of these are shown under the Use & Maintain box in the diagram above; others are listed by category in the table below. The RESTful interface and parameter calls of the structWSF layer further simplify the ontology management and annotation abstractions arising from the OWL API. The number of simple tools available to users under this design is virtually limitless. These tools are also fast to develop and test.

Combining These New Thrusts and Moving Forward

This landscape is not yet a full reality. It is a vision of adaptive and simpler tools, working with a common API, and accessible via platform-independent Web services. It also preserves many of the existing tools and IDEs familiar to present ontology engineers. However, pieces of this landscape do presently exist and more are on the way. The next section briefly overviews some of the major application areas where these tools might contribute.

Individual Tools within the Landscape

If one inspects the earlier listing of 185 ontology tools, it is clear that there is a diversity of tools, in terms of both scope and function, across the entire ontology development stack. It is also clear that nearly all of the 185 tools listed do not communicate with one another. That is a tremendous waste. Via shared APIs and some degree of consistent design, it should be possible to migrate these capabilities into a more-or-less interoperating whole. We have thus tried to categorize some important tool types and exemplar tools from that listing to show the potential that exists. (Please note that the Example Tools entries are links to the tools and categories from the earlier 185 tools listing.) This correlation of types and example tools is not meant to be exhaustive, nor a recommendation of specific tools. But the tabulation is illustrative of the potential that exists to both simplify and extend tool support across the entire ontology development workflow:

  • OWL API: A Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL 2 profiles of RL, QL and EL. Example tools: OWL API
  • Web Services Layer: This layer provides a common access layer and set of protocols for almost all tools. It depends critically on linkage and communication with the OWL API. Example tools: structWSF
  • Ontology Editor (IDE): There are a variety of options in this area. Generally, more complete environments (that is, IDEs) based on OWL and with links to the OWL API are preferred. Less complete editor options are listed under other categories. Note that only Protégé 4 incorporates the OWL API. Example tools: NeOn Toolkit, Protégé, TopBraid Composer
  • Scripts: In all pragmatic cases the migration of existing structure and vocabulary assets to an ontology framework requires some form of scripting. These may be off-the-shelf resources, but more often are specific to the use case at hand. Typical scripting languages include the standard ones (Perl, Python, PHP, Ruby, XSLT, etc.) and often involve some form of parsing or regex. Example tools: variety; specific to use case
  • Converters: Converters are more-or-less pre-packaged scripts for migrating one serialization or data format to another. As the scripts above continue to be developed, this roster of off-the-shelf starting points can increase. Today, there are perhaps close to 200 converters useful for ontology purposes. Example tools: irON, ReDeFer, SKOS2GenTax; also see RDFizers
  • Vocabulary Prompter: Domain ontologies are ultimately about meaning, and for that purpose there is much need for definitions, synonyms, hyponyms, and related language assets. Vocabulary prompters take input documents or structures and help identify additional vocabulary useful for characterizing semantic meaning. Example tools: see the TechWiki’s vocab prompting tools; ROC
  • Spreadsheet: Spreadsheets can be important initial development environments for users without explicit ontology engineering backgrounds. The biggest issue with spreadsheets is that what is specified in them is more general or simplistic compared to what is contained in an actual ontology. Attempts to have spreadsheets capture all of this sophistication are often less than satisfactory. One way to effectively “round trip” with spreadsheets (and many related simple tools) is to adhere to an OWL API. Example tools: Anzo, RDF123, irON (commON), Excel, Open Office
  • Editor (general): Ontology editing spans from simple structures useful to non-ontologists to those (like the IDEs or toolkits) that capture all aspects of the ontology. Further, some of these editors are strictly textual; others span or attempt to enable visual editing. Visual editing (see below) can ultimately extend to the ontology graph itself. Example tools: see the TechWiki’s ontology editing tools
  • Alignment API: The Alignment API is an API and implementation for expressing and sharing ontology alignments. A set of correspondences between entities (e.g., classes, objects, properties) in two ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way, with the goal of enabling available alignments to be shared on the Web. The format is expressed in RDF. Example tools: Alignment API
  • Mapper: A variety of tools, algorithms and techniques are available for matching or mapping concepts between two different ontologies. In general, no single method has shown itself individually superior. The better approaches use voting methods based on multiple comparisons. Example tools: see the TechWiki’s ontology mapping tools
  • Ontology Browser: Ontology browsers enable the navigation or exploration of the ontology, generally in visual form, but without allowing explicit editing of the structure. Example tools: Relation Browser, Ontology Browser, OwlSight, FlexViz
  • Vocabulary Manager: Vocabulary managers provide a central facility for viewing, selecting, accessing and managing all aspects of the vocabulary in an ontology (that is, down to the level of all classes and properties). This tool category is poorly represented at present. Ultimately, vocabulary managers should also be one (if not the main) access point for vocabulary editing. Example tools: PoolParty, TermWiki, UMBEL Web service
  • Vocabulary Editor: Vocabulary editors provide (generally simple) interfaces for the editing and updating of vocabulary terms, classes and properties in an ontology. Example tools: Neologism, TemaTres, ThManager, Vocab Editor
  • Structure Editor: A structure editor is a specific form of ontology editor, geared to the subsumption (taxonomic) organization of a largely hierarchical structure. Editors of this form tend to use tree controls or spreadsheets with indented organization to show parent and child relationships. Example tools: PoolParty, irON (commON)
  • Graph Analysis: Ontologies form graph structures, which are amenable to many specific network and graph analysis algorithms, including relatedness, shortest path, grouped structures, communities and the like. Example tools: SNAP, igraph, Network Workbench, NetworkX, Ontology Metrics
  • Graph API: Graph visualization with associated tools is best enabled by working from a common API. This allows for expansion and re-use of other capabilities. Preferably, this graph API would also have direct interaction with the OWL API, but none exist at the moment. Example tools: under investigation
  • Graph Visualizer: Graph visualizers enable the ontology to be rendered in graph form and presentation, often with multiple layout options. These systems also enable export to PDF or graphics formats for display or printing. The better tools in this category can handle large graphs, can have their displays easily configured, and are performant. Example tools: see the TechWiki’s ontology visualization tools
  • Visual Editor: An ontology visual editor enables the direct manipulation of the graph in a visual mode. This capability includes adding and moving nodes, changing linkages between nodes, and other ontology specification. Very few tools exist in this category at present. Example tools: COE, TwoUse Toolkit
  • Coherence Tester: Coherence testing checks whether the ontology structure is properly constructed and has logical interconnections. The testing involves inference and logic testing (including entailments) based on the structure as provided; comparisons with already vetted logical structures and knowledge bases (e.g., Cyc, Wikipedia); or both. Example tools: Cyc, OWLIM, FactForge
  • Gap Tester: Related to coherence testing, gap testing is the identification of key missing pieces or intermediary nodes in the ontology graph. Gaps tend to arise when external specification of the ontology is made without reference to connecting information. Example tools: requires use of an external reference ontology; see above
  • Documenter: Ontology documentation is not limited to the technical specifications of the structure, but also includes best practices, how-to and use guides, and the like. Automated generation of structure documentation is also highly desirable. Example tools: TechWiki, SpecGen, OWLDoc
  • Tagger: Once constructed, ontologies (and their accompanying named entity dictionaries) can be very powerful resources for aiding tagging and information extraction utilities. Like vocabulary prompting, there is a broad spectrum of potential tools and uses in the tagging category. Example tools: GATE (OBIE); many other options
  • Exporter: Exports need to range from full-blown OWL representations to the simpler export of data and constructs. Multiple serialization options and the ability to support the input requirements of third-party tools are also important. Example tools: OWL Syntax Converter, OWL Verbalizer; many other options
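To illustrate the graph analysis category above, here is a minimal sketch in plain Python of shortest-path computation over an ontology viewed as a graph. The concepts and links are invented for the example; real tools would operate on the full subsumption and relation structure.

```python
from collections import deque

# Toy undirected concept graph: each edge is a subClassOf or related link.
edges = [("Thing", "Product"), ("Thing", "Agent"), ("Product", "Book"),
         ("Agent", "Person"), ("Person", "Author"), ("Book", "Author")]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def shortest_path(start, goal):
    """Breadth-first search: returns a shortest node path, or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path("Product", "Person")
```

The same breadth-first traversal underlies relatedness measures and community detection in the dedicated packages listed above.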

The beauty of this approach is that most of the tools listed are open source and potentially amenable to the minor modifications necessary to conform with this proposed landscape.

Key Gaps in the Landscape

Contrasting the normative tools landscape above with the existing listing of ontology tools points out some key gaps or areas deserving more development attention. Some of these are:

  • Vocabulary managers: easy inspection and editing environments for concepts and predicates are lacking. Though standard editors allow direct ontology language edits (OWL or RDFS), these are not presently navigable or editable by non-ontologists. Intuitive browsing structures with more “infobox”-like editing environments could be helpful here.
  • Graph API: it would be wonderful to have a graph API (including analysis options) that could communicate with the OWL API. Failing that, it would be helpful to have a graph API that communicated well with RDF and ontology structures; extant options are few.
  • Large-graph visualizer: while we have earlier reviewed large-scale graph visualization software, the alternatives are neither easy to set up nor easy to use. Being able to readily select layout options, with quick zoom and scaling, is important.
  • Graphical editor: some browsers or editors (e.g., FlexViz) provide nice graph-based displays of ontologies and their properties and annotations. However, there appear to be few environments where the ontology graph can be directly edited or visually used for design or expansion.

Finally, the effort and focus behind Protégé appear to be slowing somewhat. The future has clearly shifted to OWL 2 with Protégé 4. Yet, beyond the admirable CO-ODE project (now ended), tools and plug-in support seem to have slowed. Many of the plug-ins for Protégé 3.x do not appear to be under active development as upgrades to Protégé 4. While the future of Protégé (and similar IDEs) seems assured, its prominence possibly will (and should) be replaced by a simpler kit of tools useful to users and practitioners.

Funding and Pending Project Priorities

For the past few months we at Structured Dynamics have seen ontology design and management as the pending technical priorities within the semantic technology space. Now that the market no longer looks at “ontology” as a four-letter word, it is imperative to simplify the development and use of ontologies. The first generation of tools leading up to this point has been helpful in understanding the semantic space; changes are now necessary to expand it.

In this first generation we have begun to understand the types and nature of the needed tools. But our focus on IDEs and comprehensive toolsets betrays a developer’s or technologist’s perspective. We now need to shift focus and look at tool needs from the standpoint of users and the actual use of ontologies. Many players, toolmakers and innovators will need to contribute to build this market for semantic technologies and approaches. Fortunately, replacing an IDE focus with one based around APIs and Web services should be a fairly smooth and natural transition.

If we truly desire to be market makers, we need to stand back and place ourselves in the shoes of the domain practitioners, the subject matter experts. We need to shield actual users from all of the silly technical details and complexity. And then let’s focus, task by task, on discrete items in the management and use of ontologies. Growth of the semantic technology space depends on expanding our practitioner base.

For its part, Structured Dynamics is presently seeking new projects and sponsors with a commitment to these aims. Like our prior development of structWSF and semantic components, we will be making simpler ontology tools a priority in the coming months. Please let me know if you want to partner with us toward this commitment.


[1] This posting is part of a current series on ontology development and tools, now permanently archived and updated on the OpenStructs TechWiki. The series began with an update of the prior Ontology Tools listing, which now contains 185 tools. It continued with a survey of ontology development methodologies. The last part presented a Lightweight, Domain Ontologies Development Methodology. This part is archived on the TechWiki as the Normative Landscape of Ontology Tools. The last installment in the series is planned to cover ontology best practices.
[2] The original version, now slightly modified, was first published in M. K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, Nov. 23, 2009.
[3] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) to which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure. These distinctions have their grounding in description logics.
[4] See M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, November 23, 2009.
[5] Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray W. Fergerson and Mark A. Musen, 2001. “Creating Semantic Web Contents with Protégé-2000,” IEEE Intelligent Systems, vol. 16, no. 2, pp. 60-71, Mar/Apr. 2001. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.7177&rep=rep1&type=pdf.
[6] York Sure, Michael Erdmann, Juergen Angele, Steffen Staab, Rudi Studer and Dirk Wenke, 2002. “OntoEdit: Collaborative Ontology Development for the Semantic Web,” in Proceedings of the International Semantic Web Conference (ISWC) (2002). See http://www.aifb.uni-karlsruhe.de/WBS/Publ/2002/2002_iswc_ontoedit.pdf.
[7] Sean Bechhofer, Ian Horrocks, Carole Goble and Robert Stevens, 2001. “OilEd: a Reasonable Ontology Editor for the Semantic Web,” in Proceedings of KI2001, Joint German/Austrian conference on Artificial Intelligence.
[8] Aditya Kalyanpur and James Hendler, 2004. “Swoop: Design and Architecture of a Web Ontology Browser/Editor,” Scholarly Paper for Master’s Degree in Computer Science, University of Maryland, Fall 2004. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.1779&rep=rep1&type=pdf.
[9] Holger Knublauch was formerly the designer and developer of Protégé-OWL, the leading open-source ontology editor. TopBraid Composer leverages the experiences gained with Protégé and other tools into a professional ontology editor and knowledge-base framework. Composer is based on the Eclipse platform and uses Jena as its underlying API. See further http://www.topquadrant.com/composer/tbc-protege.html.
[10] Sean Bechhofer, Phillip Lord and Raphael Volz, 2003. “Cooking the Semantic Web with the OWL API,” in Proceedings of the 2nd International Semantic Web Conference, ISWC, Sanibel Island, Florida, October 2003. See http://homepages.cs.ncl.ac.uk/phillip.lord/download/publications/cooking03.pdf.
[11] Jena is fundamentally an RDF API. Jena’s ontology support is limited to ontology formalisms built on top of RDF. Specifically this means RDFS, the varieties of OWL, and the now-obsolete DAML+OIL. At the time of writing, no decision has yet been made about when Jena will support the new OWL 2 features. See http://jena.sourceforge.net/ontology/.
[12] The NeOn Toolkit is built on OntoStudio. It is based on Eclipse with support for the OWL API. A series of its key plug-ins utilize various aspects of GATE (General Architecture for Text Engineering). The four-year project began in 2006 and its first open source toolkit was released by the end of 2007. OWL features were added in 2008-09. NeOn has since completed, though its toolkit and plug-ins can still be downloaded as open source.
[13] OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL 2 profiles of RL, QL and EL. Two recent papers describing the updated API are: Matthew Horridge and Sean Bechhofer, 2009. “The OWL API: A Java API for Working with OWL 2 Ontologies,” presented at OWLED 2009, 6th OWL: Experiences and Directions Workshop, Chantilly, Virginia, October 2009. See http://www.webont.org/owled/2009/papers/owled2009_submission_29.pdf; and, Matthew Horridge and Sean Bechhofer, 2010. “The OWL API: A Java API for OWL Ontologies,” paper submitted to the Semantic Web Journal; see http://www.semantic-web-journal.net/sites/default/files/swj107.pdf.
[14] These links show the status of TopBraid Composer and Ontotext’s OWLIM with regard to OWL 2 RL. A newer effort, based on Eclipse, with broader OWL API support is the MOST Project’s TwoUse Toolkit. In all likelihood, the number of other tools with OWL 2 support is larger than our informal survey has found. Importantly, Jena still has not upgraded to OWL 2, but its open source site suggests it may.
[15] The CO-ODE project aimed to build authoring tools and infrastructure to make ontology engineering easier. It specifically supported the development and use of OWL-DL ontologies, by being heavily involved in the creation of infrastructure and plugins for the Protégé platform and OWL 2 support for the OWL API.
[16] For a great discussion of the unique aspects of version control in ontologies, see T. Redmond, M. Smith, N. Drummond and T. Tudorache, 2008. “Managing Change: An Ontology Version Control System,” paper presented at OWLED 2008, Karlsruhe, Germany. See http://bmir.stanford.edu/file_asset/index.php/1435/BMIR-2008-1366.pdf.
[17] Birte Glimm, Matthew Horridge, Bijan Parsia and Peter F. Patel-Schneider, 2009. “A Syntax for Rules in OWL 2,” presented at the Sixth OWLED Workshop, 23-24 October 2009. See http://www.webont.org/owled/2009/papers/owled2009_submission_16.pdf.
[18] The Alignment API is one of the more venerable ones in this environment. A couple of other examples include a SKOS API (Simon Jupp, Sean Bechhofer and Robert Stevens, 2009. “A Flexible API and Editor for SKOS,” presented at the 6th Annual European Semantic Web Conference (ESWC2009); see http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-401/iswc2008pd_submission_88.pdf) and the Ontology Common API Tasks (Tomasz Adamusiak, K Joeri van der Velde, Niran Abeygunawardena, Despoina Antonakaki, Helen Parkinson and Morris A. Swertz, 2010. “OntoCAT — A Simpler Way to Access Ontology Resources,” presented at ISMB2010, 10 July 2010. See http://www.iscb.org/uploaded/css/58/17254.pdf. OntoCAT is an open source package developed to simplify the task of querying heterogeneous ontology resources. It supports NCBO BioPortal and EBI Ontology Lookup Service (OLS), as well as local OWL and OBO files).
[19] The structWSF Web services framework is generally RESTful middleware that provides a bridge between existing content and structure and content management systems and available indexing engines and RDF data stores. structWSF is a platform-independent means for distributed collaboration via an innovative dataset access paradigm. It has about twenty embedded Web services. See http://openstructs.org/structwsf.
Posted: September 1, 2010

Bringing Ontology Development and Maintenance to the Mainstream

Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. They play a role for organizing data similar to that played by relational data schema. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data [1].

There are many ways to categorize ontologies. One dimension distinguishes upper-level from mid- and lower- (or domain-) level ontologies. Another distinguishes reference from subject (domain) ontologies. Upper-level ontologies [2] tend to be encompassing, abstract and inclusive ways to split or organize all “things”. Reference ontologies tend to be cross-cutting, such as ones that describe people and their interests (e.g., FOAF), reference subject concepts (e.g., UMBEL), bibliographies and citations (e.g., BIBO), projects (e.g., DOAP), simple knowledge structures (e.g., SKOS), social networks and activities (e.g., SIOC), and so forth.

The focus here is on domain ontologies, which are descriptions of particular subject or domain areas. Domain ontologies are the “world views” by which organizations, communities or enterprises describe the concepts in their domain, the relationships between those concepts, and the instances or individuals that are the actual things populating that structure. Domain ontologies are thus the basic bread-and-butter descriptive structures for real-world applications of ontologies. According to Corcho et al. [3], “a domain ontology can be extracted from special purpose encyclopedias, dictionaries, nomenclatures, taxonomies, handbooks, scientific special languages (say, chemical formulas), specialized KBs, and from experts.” Another way of stating this is that a domain ontology — properly constructed — should also be a faithful representation of the language and relationships for those who interact with that domain. The form of that interaction can range from work to play to intellectual understanding or knowledge.

“… ontology engineering research should strive for a unified, lightweight and component-based methodological framework, principally targeted at domain experts ….”

Simperl et al. [4]

Another focus here is on lightweight ontologies. These are typically defined as more hierarchical or classificatory in nature. Like their better-known cousins, taxonomies, but with greater connectedness, lightweight ontologies are often designed to represent subsumption or other relationships between concepts. They have relatively few, and relatively simple, predicates (relationships). As relationships are added and more of the world’s complexities get captured, ontologies migrate from the lightweight to the “heavyweight” end of the spectrum. The development of ontologies goes by the names of ontology engineering or ontology building, and can also be investigated under the rubric of ontology learning. For reasons stated below, we prefer not to use the term ontology engineering, since it tends to convey that a priesthood or specialized expertise is required to define or use ontologies. As indicated, we see ontologies as being (largely) developed and maintained by the users or practitioners within a given domain. The tools and methodologies to be employed need to be geared to these same democratic (small “d”) objectives.

A Review of Prior Methodologies

For the last twenty years many methods have been put forward for how to develop ontologies, though such methodological activity has diminished somewhat in recent years. The research, as separately discussed in Ontology Development Methodologies [1], seems to indicate this state of methodology development in the field:

  • Very few uniquely different methods exist, and those that do are relatively old
  • The methods tend to cluster either into incremental, iterative ones or those more oriented to comprehensive approaches
  • There is a general logical sharing of steps across most methodologies, from assessment to deployment and testing and refinement
  • Actual specifics and flowcharts are quite limited; with the exception of the UML-based systems, most appear not to meet enterprise standards
  • The supporting toolsets are not discussed much; where examples are given at all, they are based solely on a single or governing tool, and tool integration and interoperability are almost non-existent in the narratives, and
  • Development methodologies do not appear to be an active area of recent research.

While there is by no means unanimity in this community, some general consensus can be seen from these prior reviews, especially those that concentrate on practical or enterprise ontologies. In terms of design objectives, this consensus suggests that ontologies should be [4]:

  • Collaborative
  • Lightweight
  • Domain-oriented (subject matter and expertise)
  • Integrated, and
  • Incremental.

Laudable as these design objectives are (and we adhere to them), current ontology development methods do not meet these criteria. Furthermore, as will be discussed in our next installment, there is also an inadequate slate of tools ready to support these objectives.

A Call for a New Methodology

If you ask most knowledgeable enterprise IT executives what they understand ontologies to mean and how they are to be built, you would likely hear that ontologies are expensive, complicated and difficult to build. Such reactions (and these are not strawmen) reflect both the lack of methods to achieve the consensual objectives above and the lack of tools to do so. The use of ontology design patterns is one helpful approach [5]. Such patterns help indicate best design practice for particular use cases and relationship patterns. However, while such patterns should be part of a general methodology, they do not themselves constitute one. Also, as Structured Dynamics has argued for some time, the future of the semantic enterprise resides in ontology-driven apps [6]. Yet, for that vision to be realized, clearly both methods and tools to build ontologies must improve. In part this series is a reflection of our commitment to plug these gaps.

What we see at present for ontology development is a highly technical, over-engineered environment. Methodologies are only sparsely or generally documented. They are not lightweight, nor collaborative, nor really incremental. While many tools exist, they do not interoperate and are pitched mostly at the professional ontologist, not the domain user. To achieve the vision of ontology-driven apps, the methods to develop the fulcrum of that vision — namely, the ontologies themselves — need much additional attention. An adaptive methodology for ontology development is well past due.

Design Criteria for an Adaptive Methodology

We can thus combine the results of prior surveys and recommendations with our own unique approach to adaptive ontologies in order to derive design criteria. We believe this adaptive approach should be:

  • Lightweight and domain-oriented
  • Contextual
  • Coherent
  • Incremental
  • Re-use of structure
  • Separation of the ABox and TBox (separation of work), and
  • Simple, interoperable tools support.

We discuss each of these design criteria below. While we agree with the advisability of collaboration as a design condition — and therefore also believe that tools to support this methodology must also accommodate group involvement — collaboration per se is not a design requirement. It is an implementation best practice. Effective ontology development is as much as anything a matter of mindset. This mindset is grounded in leveraging what already exists, “paying as one benefits” through an incremental approach, and starting simple and adding complexity as understanding and experience are gained. Inherently this approach requires domain users to be the driving force in ongoing development with appropriate tools to support that emphasis. Ontologists and ontology engineering are important backstops, but not in the lead design or development roles. The net result of this mindset is to develop pragmatic ontologies that are understood — and used by — actual domain practitioners.

Lightweight and Domain-oriented

By definition the methodology should be lightweight and oriented to particular domains. Ontologies built for the pragmatic purposes of setting context and aiding interoperability tend to be lightweight, with only a few predicates such as isAbout, narrowerThan or broaderThan. But, if done properly, these lighter-weight ontologies can be surprisingly powerful in discovering connections and relationships. Moreover, they are a logical and doable intermediate step on the path to more demanding semantic analysis.
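As an illustration of how even a couple of simple predicates support discovery, here is a minimal sketch in plain Python. The concept names, triple spellings and document identifier are invented for illustration, not drawn from any actual ontology:

```python
from collections import defaultdict

# Hypothetical lightweight domain ontology expressed as simple triples,
# using only the kinds of predicates mentioned above.
triples = [
    ("Beverage", "narrowerThan", "Product"),
    ("Coffee",   "narrowerThan", "Beverage"),
    ("Espresso", "narrowerThan", "Coffee"),
    ("espresso-article", "isAbout", "Espresso"),
]

def broader_concepts(concept, triples):
    """All concepts reachable by following narrowerThan links upward."""
    up = defaultdict(set)
    for s, p, o in triples:
        if p == "narrowerThan":
            up[s].add(o)
    seen, stack = set(), [concept]
    while stack:
        for parent in up[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# A document tagged isAbout "Espresso" is thereby also connected,
# transitively, to Coffee, Beverage and Product.
```

Even this toy structure shows the point in the text: the subsumption hierarchy alone lets a query for “Beverage” discover content tagged only at the “Espresso” level.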

Contextual

Context simply means there is a reference structure for guiding the assignment of what content ‘is about’ [7]. An ontology with proper context has a balanced and complete scope of the domain at hand. It generally uses fairly simple predicates; Structured Dynamics tends to use the UMBEL vocabulary for its predicates and class definitions, and to link to existing UMBEL concepts to help ensure interoperability [8]. A good gauge for whether the context is adequate is whether there are sufficient concept definitions to disambiguate common concepts in the domain.

Coherent

The essence of coherence is that it is a state of consistent connections, a logical framework for integrating diverse elements in an intelligent way. So while context supplies a reference structure, coherence means that the structure makes sense. With relation to a content graph, this means that the right connections (edges or predicates) have been drawn between the object nodes (or content) in the graph [9]. Relating content coherently itself demands a coherent framework. At the upper reference layer this begins with UMBEL, which itself is an extraction from the vetted and coherent Cyc common sense knowledge base. However, as domain specifics get added, these details, too, must be testable against a unified framework. Logic and coherence testing are thus an essential part of the ontology development methodology.
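One inexpensive form of the logic and coherence testing mentioned above can be sketched in plain Python: detecting cycles in asserted subclass relationships, which would render a class structure incoherent. The class names are invented for illustration:

```python
def has_subsumption_cycle(edges):
    """Detect whether (child, parent) subClassOf assertions contain a
    cycle -- one basic coherence test for a lightweight class structure."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / in progress / done
    color = {}

    def visit(node):
        color[node] = GRAY
        for parent in graph.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:      # back edge into our own ancestry: cycle
                return True
            if state == WHITE and visit(parent):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)

coherent   = [("Espresso", "Coffee"), ("Coffee", "Beverage")]
incoherent = coherent + [("Beverage", "Espresso")]   # Beverage under Espresso!
```

A production reasoner does far more than this, of course, but a cheap structural check of this kind can run on every edit, long before a full OWL consistency check is invoked.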

Incremental

Much value can be realized by starting small, being simple, and emphasizing the pragmatic. It is OK to make those connections that are doable and defensible today, while delaying until later the full scope of semantic complexities associated with complete data alignment. An open world approach [10] provides the logical basis for incremental growth and adoption of ontologies. This is also in keeping with the continuous and incremental deployment model that Structured Dynamics has adopted from MIKE2.0 [11]. When this model is applied to the process of ontology development, the basic implementation increments appear as follows:

Figure 1. A Phased, Incremental Approach to Ontology Development

The first two phases are devoted to scoping and prototyping. Then, the remaining phases of creating a working ontology, testing it, maintaining it, and then revising and extending it are repeated over multiple increments. In this manner the deployment proceeds incrementally and only as learning occurs. Importantly, too, this approach means that complexity, sophistication and scope grow only in a manner consistent with demonstrable benefits.

Re-use of Structure

Fundamental to the whole concept of coherence is the fact that domain experts and practitioners have been looking at questions of relationships, structure, language and meaning for decades. Though today we may finally have a broadly useful data and logic model in RDF, massive time and effort has already been expended to codify some of these understandings in various ways and at various levels of completeness and scope. These are prior investments in structure that would be silly to ignore. Yet, today, most methodologies do ignore these resources, which is perplexing. Though unquestioned adoption of legacy structure is inappropriate to modern interoperable systems, that is no excuse for re-inventing prior effort and discoveries, many of which are the result of laborious consensus building or negotiations. The most productive methodologies for modern ontology building are therefore those that re-use and reconcile prior investments in structural knowledge, not ignore them. These existing assets take the form of already proven external ontologies and of internal and industry structures and vocabularies.

Separation of the ABox and TBox

Nearly a year ago we undertook a major series on description logics [12], a key underpinning to Structured Dynamics’ conceptual and logical foundation for its ontology development. While we cannot always adhere to strict and conforming description logics designs, our four-part series helped provide guidance for the separation of concerns and work that can also lead to more effective ontology designs [13]. Conscious separation of the so-called ABox (assertions or instance records) and TBox (conceptual structure) in ontology design provides some compelling benefits:

  • Easier ingest and incorporation of external instance data, including conversion from multiple formats and serializations
  • Faster and more efficient inferencing and analysis and use of the conceptual structure (TBox)
  • Easier federation and incorporation of distributed data stores (instance records), and
  • Better segregation of specialized work to the ABox, TBox and specialty work modules, as this figure shows [14]:
Figure 2. Separation of the TBox and ABox [14]

Maintaining identity relations and disambiguation as separate components also has the advantage of enabling different methodologies or algorithms to be determined or swapped out as better methods become available. A low-fidelity service, for example, could be applied for quick or free uses, with more rigorous methods reserved for paid or batch mode analysis. Similarly, maintaining full-text search as a separate component means that work can be done by optimized search engines with built-in faceting.
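The separation can be made concrete with a small sketch: the TBox holds only the class hierarchy, the ABox holds only instance assertions, and a membership query combines the two. Class names, instances and attributes here are hypothetical:

```python
tbox = {
    # TBox: class -> set of direct superclasses (conceptual structure only)
    "City":    {"Place"},
    "Capital": {"City"},
    "Place":   set(),
}

abox = [
    # ABox: (instance, asserted class, attributes) -- instance records only
    ("Paris", "Capital", {"population": 2_200_000}),
    ("Lyon",  "City",    {"population": 513_000}),
]

def instance_of(instance, cls):
    """Does the ABox assertion, expanded against the TBox hierarchy,
    place this instance in the given class?"""
    asserted = {c for i, c, _ in abox if i == instance}
    frontier = set(asserted)
    while frontier:
        if cls in asserted:
            return True
        # climb one level of superclasses, ignoring anything already seen
        frontier = set().union(*(tbox.get(c, set()) for c in frontier)) - asserted
        asserted |= frontier
    return cls in asserted
```

Because the two boxes are distinct data structures, either can be swapped out independently: the ABox can live in a distributed record store while the TBox is loaded into a reasoner, exactly the kind of segregation of work the bullets above describe.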

Simple, Interoperable Tools Support

An essential design criterion is to have a methodology and work flow that explicitly account for simple and interoperable tools. By “simple” we mean targeted, task-specific tools and functionality geared to domain users and practitioners. Of all the design areas, this one is perhaps the weakest in terms of current offerings. The next installment in this series [1] will address this topic directly.

The New Methodology

Armed with these criteria, we are now ready to present the new methodology. In summary terms, we can describe the steps in the methodology as:

  1. Scope, analyze, then leverage existing assets
  2. Prototype structure
  3. Pivot on the working ontology
  4. Test
  5. Use and maintain
  6. Extend working ontology and repeat.

Two Parallel Tracks

After the scoping and analysis phase, the effort is split into two tracks:

  • Instances, and their descriptive characteristics, and
  • Conceptual relationships, or ontologies.

This split conforms to the separation of ABox and TBox noted above [15]. There are conceptual and workflow parallels between entities and data vs. ontologies. However, the specific methodologies differ, and we focus only on the conceptual ontology side in the discussion below, shown as the upper part (blue) of Figure 3:

Figure 3. Flowchart of Ontology Development Methodology [16]

Two key aspects of the initial effort are to properly scope the size and purpose of the starting prototype and to inventory the existing assets (structure and data; internal and external) available to the project.

Re-Use Structure

Most current ontology methodologies do not emphasize re-use of existing structure. Yet these resources are rich in content and meaning, and often represent years to decades of effort and expenditure in creation, assembly and consensus. Just a short list of these potential sources demonstrates the treasure trove of structure and vocabularies available for re-use: Web portals; databases; legacy schema; metadata; taxonomies; controlled vocabularies; ontologies; master data catalogs; industry standards; exchange formats; etc. Metadata and available structure may have value no matter where or how they exist, and a fundamental aspect of the build methodology is to bring such candidate structure into a common tools environment for inspection and testing. Besides assembling and reviewing existing sources, those selected for re-use must be migrated and converted to proper ontological form (OWL in the case of those developed by Structured Dynamics). Some of these techniques have been demonstrated for prior patterns and schema [17]; in other instances various converters, RDFizers or scripts may need to be employed to effect the migration. Many tools and options exist at this stage, even though as a formal step this conversion is often neglected.

Prototype Structure

The prototype structure is the first operating instance of the ontology. The creation of this initial structure follows quite closely the approach recommended in Ontology Development 101 [18], with some modifications to reflect current terminology:

  1. Determine the domain and scope of the ontology
  2. Consider reusing existing ontologies
  3. Enumerate important terms in the ontology
  4. Define the classes and the class hierarchy
  5. Define the properties of classes
  6. Create instances
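Steps 3 through 6 above can be sketched as simple data structures, here for a hypothetical conference domain (all terms, classes, properties and instances are invented for illustration):

```python
terms = ["Event", "Conference", "Workshop", "Talk", "Speaker"]   # step 3

classes = {                  # step 4: class -> direct superclass (None = root)
    "Event": None,
    "Conference": "Event",
    "Workshop": "Event",
}

properties = {               # step 5: properties defined on classes
    "Conference": ["startDate", "venue", "hasTalk"],
}

instances = [                # step 6: instances populating the structure
    {"class": "Conference", "label": "Example Conference 2010", "venue": "Anytown"},
]

def undeclared_classes(instances, classes):
    """Cheap sanity check: list instances that reference an undeclared class."""
    return [i["label"] for i in instances if i["class"] not in classes]
```

Even at the prototype stage, a check like `undeclared_classes` catches the common drift between the enumerated term list (step 3) and the classes actually defined (step 4).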

The prototype structure is important since it communicates to the project sponsors the scope and basic operation of the starting structure. This stage often represents a decision point for proceeding; it may also trigger the next budgeting phase.

Link Reference Ontologies

An essential aspect of a build methodology is to re-use “standard” ontologies as much as possible. Core ontologies are Dublin Core, DC Terms, Event, FOAF, GeoNames, SKOS, Timeline, and UMBEL. These core ontologies have been chosen because of universality, quality, community support and other factors [19]. Though less universal, there are also a number of secondary ontologies, namely BIBO, DOAP, and SIOC, that may fit within the current scope. These are then supplemented with quality domain-specific ontologies, if such exist. Only then are new namespaces assigned for any newly generated ontology(ies).
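In practice this ordering amounts to maintaining a prefix registry where the standard vocabularies come first and a new namespace is minted last, only for the domain ontology itself. A minimal sketch (the `mydom` namespace is invented; the others are the commonly published namespace URIs for these vocabularies, worth double-checking against each spec):

```python
PREFIXES = {
    "dc":      "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf":    "http://xmlns.com/foaf/0.1/",
    "skos":    "http://www.w3.org/2004/02/skos/core#",
    "umbel":   "http://umbel.org/umbel#",
    "mydom":   "http://example.org/ontology/mydomain#",  # newly minted, last
}

def expand(curie):
    """Expand a prefix:local CURIE against the registry."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local
```

Keeping such a registry explicit makes it obvious when a new term is being coined in the domain namespace rather than re-used from a core vocabulary.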

Working Ontology

The working ontology is the first production-grade (deployable) version of the ontology. It conforms to all of the ontology building best practices and needs to be complete enough that it can be loaded and managed in a fully conforming ontology editor or IDE [20]. By also using the OWL API, this working structure can also be the source for specialty tools and user maintenance functions, short of requiring a full-blown OWL editor. Many of these aspects are among the most poorly represented in the current tools inventory; we return to this topic in the next installment. The working ontology is the complete, canonical form of the domain ontology(ies) [21]. These are the central structures that are the focus for ongoing maintenance and extension efforts over the ensuing phases. As such, the ontologies need to be managed by a version control system with comprehensive ontology and vocabulary management support and tools.

Testing and Mapping

As new ontologies are generated, they should be tested for coherence against various reasoning, inference and other natural language processing tools. Gap testing is also used to discover key holes or missing links within the resulting ontology graph structure. Coherence testing may result in discovering missing or incorrect axioms. Gap testing helps identify internal graph nodes needed to establish the integrity or connectivity of the concept graph. Though used for different purposes, mapping and alignment tools may also work to identify logical and other inconsistencies in definitions or labels within the graph structure. Mapping and alignment is also important in its own right in order to establish the links that help promote ontology and information interoperability. External knowledge bases can also play essential roles in testing and mapping. Two prominent knowledge base examples are Cyc and Wikipedia, but many others exist for any specific domain.
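Gap testing of the kind described can be approximated with a connected-components check over the concept graph: more than one component means some concepts have no path linking them into the main structure. A sketch, with invented concept names:

```python
from collections import defaultdict

def gap_test(concepts, edges):
    """Partition a concept graph into connected components (edges treated
    as undirected). More than one component signals a gap: concepts with
    no links tying them into the rest of the structure."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    unvisited, components = set(concepts), []
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:
            for nbr in adj[stack.pop()] & unvisited:
                unvisited.discard(nbr)
                comp.add(nbr)
                stack.append(nbr)
        components.append(comp)
    return components
```

The smaller components returned are exactly the candidates for the “internal graph nodes needed to establish connectivity” that the text describes: either they need new linking concepts, or they do not belong in the ontology at all.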

Use and Maintenance

Of course, the whole purpose of the development methodology is to create practical, working ontologies. Such uses include search, discovery, information federation, data interoperability, and analysis and reasoning. The general purposes to which ontologies may be put are described in the Executive Intro to Ontologies [22]. However, it is also in day-to-day use of the ontology that many enhancements and improvements may be discovered. Examples include improved definitions of concepts; expansions of synonyms, aliases and jargon for concepts; better, more intuitive preferred labels; better means to disambiguate between competing meanings; missing connections or excessive connections; and splitting or consolidating of the underlying structure. Today, such maintenance enhancements are most often not pursued because existing tools do not support such actions. Reliance on IDEs and tools geared to ontology engineering is not well suited to users and practitioners noting or effecting such changes. Yet ongoing ontology use and adaptation clearly suggest that users should be encouraged to do so: they are the ones on the front lines of identifying and potentially recording such improvements.

Extend

Ontology development is a process, not a static destination or event. This observation makes intuitive sense since we understand ontologies to be a means to capture our understanding of our domains, which is itself constantly changing due to new observations and insights. This factor alone suggests that ontology development methodologies must therefore give explicit attention to extension. But there is another reason for this attention. Incremental, adaptive ontologies are also explicitly designed to expand their scope and coverage, bite by bite as benefits prove themselves and justify that expansion. A start small and expand strategy is of course lower risk and more affordable. But, for it to be effective, it also must be designed explicitly for extension and expansion. Ontology growth thus occurs both from learning and discovery and from expanding scope. Versioning, version control and documentation (see below) thus assume more central importance than a more static view would suggest. The use of feedbacks and the continuous improvement design based on MIKE2.0 are therefore also central tenets of our ontology development methodology.

Documentation

This perspective of the ontology as a way to capture the structure and relationships of a domain — which is also constantly changing and growing — carries over to the need to document the institutional memory and use of it. Both better tools — such as vocabulary management and versioning — and better work processes need to be instituted to properly capture and record use and applications of ontologies. Some of these aspects are now handled with utilities such as OWLdoc or the TechWiki that Structured Dynamics has innovated to capture ontology knowledge bases on an ongoing basis. But these are still rudimentary steps that need to be enforced with management commitment and oversight. One need merely probe the ontology development literature to observe how sparse the pickings are. Very little information on methodologies, best practices, use cases, recipes, how-to manuals, conversion and use steps and other documentation really exists at present. It is unfortunately the case that documentation even lags the inadequate state of tools development in the ontology space.

Content Processing

Once formalized, these constructs — the structured ontologies or the named entity dictionaries as shown in Figure 3 — are then used for processing input content. That processing can range from conversion to direct information extraction. Once extracted, the structure may be injected (via RDFa or other means) back into raw Web pages. The concepts and entities that occur within these structures help inform various tagging systems [23]. The information can also be converted and exported in various forms for direct use or for incorporation in third-party systems. Visualization systems and specialized widgets (see next) can be driven by the structure and results sets obtained from querying the ontology structure and retrieving its related instance data. While these purposes are somewhat beyond the direct needs of the ontology development methodology, the ontology structures themselves must be designed to support these functions.
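A dictionary-based tagger of the kind these structures inform can be sketched in a few lines. The labels and concept identifiers below are hypothetical placeholders, not an actual entity dictionary:

```python
import re

entity_dictionary = {
    # surface label -> concept identifier (both invented for illustration)
    "semantic web": "umbel:SemanticWeb",
    "owl":          "umbel:OWL",
    "rdf":          "umbel:RDF",
}

def tag_content(text, dictionary):
    """Return (surface form, concept) pairs for every dictionary label
    found in the text, matching case-insensitively on word boundaries."""
    hits = []
    for label, concept in dictionary.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((m.group(0), concept))
    return hits
```

Real tagging systems add disambiguation, stemming and scoring on top, but the core loop — matching a structured dictionary of concept and entity labels against raw content — is exactly what the paragraph above describes feeding RDFa injection and export.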

Semantic Component Ontology

In our methodology we also provide for administrative ontologies whose purpose is to relate structural understandings of the underlying data and data types with applicable end-use and visualization tools (“widgets”). Thus the structural knowledge of the domain gets combined with an understanding of data types and what kinds of visualization or presentation widgets might be invoked. The phrase ontology-driven apps results from this design. Amongst other utility ontologies, Structured Dynamics names its major tool-driver ontology the SCO (Semantic Component Ontology). The SCO works in intimate tandem with the domain ontologies, but is constructed and designed with quite different purposes. A description of the build methodology for the SCO (or its other complementary utility ontologies) is beyond the scope of this current document.

Tooling and Best Practices

As sprinkled throughout the above commentary, this methodology is also intimately related to tools and best practices. The next installment in this series is devoted to tools; this methodology itself is archived on the TechWiki as the lightweight domain ontology methodology. Best practices will be handled in a similar way in the installment after that and in its ontology best practices document on the TechWiki.

Time for a Leap Forward in Methodology

Earlier reviews and the information in this document suggest a real need for ontology building methodologies that are integrated, easier to use, interoperable with a richer tools set, and geared to practitioners versus priests. The good news is that there are architectures and building blocks to achieve this vision. The bad news is that the first steps on this path are only now beginning. The next two installments in this series add further detail for why it is time — and how — we can make a leap forward in methodology. Those critical remaining pieces are in tools and best practices.


[1] This posting is part of a current series on ontology development and tools. The series began with an update of my prior Ontology Tools listing, which now contains 185 tools. It continued with a survey of ontology development methodologies. The next part in this series will address a new architecture for tooling development. The last installment in the series is planned to cover ontology best practices. This same posting is permanently archived and updated on the OpenStructs TechWiki as Lightweight, Domain Ontologies Development Methodology.
[2] Examples of upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc and BFO (Basic Formal Ontology). Most of the content in their upper levels is more akin to broad, abstract relations or concepts (similar to the primary classes, for example, in a Roget’s Thesaurus — that is, real ontos stuff) than to “generic common knowledge.” Almost all of them have both a hierarchical and networked structure, though their actual subject structure relating to concrete things is generally pretty weak. For a more detailed treatment of ontology classifications, see M. K. Bergman, 2007. “An Intrepid Guide to Ontologies,” AI3:::Adaptive Information blog, May 16, 2007.
[3] O. Corcho, M. Fernandez and A. Gomez-Perez, 2003. “Methodologies, Tools and Languages for Building Ontologies: Where is the Meeting Point?,” in Data & Knowledge Engineering 46, 2003. See http://www.dia.fi.upm.es/~ocorcho/documents/DKE2003_CorchoEtAl.pdf.
[4] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. “Ontology Engineering: A Reality Check,” in Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics ODBASE 2006, 2006. See http://ontocom.ag-nbi.de/docs/odbase2006.pdf.
[5] OntologyDesignPatterns.org is a semantic Web portal dedicated to ontology design patterns (ODPs). The portal was started under the NeOn project, which still partly supports its development.
[6] See M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, November 23, 2009.
[7] See M.K. Bergman, 2008. “The Semantics of Context,” AI3:::Adaptive Information blog, May 6, 2008.
[8] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.
[9] See M.K. Bergman, 2008. “When is Content Coherent?,” AI3:::Adaptive Information blog, July 25, 2008.
[10] See M.K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, December 21, 2009.
[11] MIKE2.0 (Method for Integrated Knowledge Environments) is an open source information development methodology championed by Bearing Point and Deloitte. Structured Dynamics has adopted the approach and has helped formulate MIKE2.0’s semantic enterprise offering. For a general intro to the approach, see further M.K. Bergman, 2010. “MIKE2.0: Open Source Information Development in the Enterprise,” AI3:::Adaptive Information blog, February 23, 2010.
[12] This is our working definition for description logics:

“Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
[13] See the four-part description logics series from M. K. Bergman, 2009. “Making Linked Data Reasonable using Description Logics, Part 1,” AI3:::Adaptive Information blog, Feb. 11, 2009; “Making Linked Data Reasonable using Description Logics, Part 2,” AI3:::Adaptive Information blog, Feb. 15, 2009; “Making Linked Data Reasonable using Description Logics, Part 3,” AI3:::Adaptive Information blog, Feb. 18, 2009; and “Making Linked Data Reasonable using Description Logics, Part 4,” AI3:::Adaptive Information blog, Feb. 23, 2009.
[14] See Part 2 in [13].
[15] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) to which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure.
[16] The original version, now slightly modified, was first published in M. K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, Nov. 23, 2009.
[17] As some examples, see for instance: SKOS: Mark van Assem, Veronique Malais, Alistair Miles and Guus Schreiber, 2006. “A Method to Convert Thesauri to SKOS,” in The Semantic Web: Research and Applications (2006), pp. 95-109. See http://www.cs.vu.nl/~mark/papers/Assem06b.pdf for paper, also http://thesauri.cs.vu.nl/eswc06/ and http://thesauri.cs.vu.nl/; taxonomies: Fausto Giunchiglia, Maurizio Marchese and Ilya Zaihrayeu, 2006. “Encoding Classifications into Lightweight Ontologies,” presented at Proceedings of the 3rd European Semantic Web Conference (ESWC 2006), Budva. See http://www.science.unitn.it/~marchese/pdf/encoding%20classifications%20into%20lightweight%20ontologies_JoDS8.pdf; metadata: Mikael Nilsson, 2007. See http://mikaelnilsson.blogspot.com/2007/11/semanticizing-metadata-specifications.html; relational schema: see the W3C workgroup on RDB2RDF; and, of course, there are many others.
[18] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[19] The various criteria considered in nominating an existing ontology to “core” status are that it should be general; highly used; universal; have broad committee or community support; be well done and documented; and be easily understood.
[20] Example and comprehensive ontology editing toolkits or IDEs (integrated development environments) include NeOn toolkit, Protégé, and TopBraid Composer. A complement to these larger toolkits is the OWL API, which when used can also provide a canonical management framework for specific ontology tools and tasks. This topic is covered more in the next installment regarding the tools landscape.
[21] Good ontology design, especially for larger projects, does require a degree of modularity. An architecture of multiple ontologies working together often helps isolate different work tasks so as to aid better ontology management. Ontology architecture and modularization is a separate topic in its own right.
[22] Originally published as M.K. Bergman, 2010. “An Executive Intro to Ontologies,” AI3:::Adaptive Information blog, August 9, 2010. This popular document has now been permanently archived on the OpenStructs TechWiki as Intro to Ontologies.
[23] Another reason for the clear distinction between ABox and TBox is their use to aid one another in disambiguation. Structured Dynamics’ scones approach (subject concepts or named entities) is designed expressly for this purpose. It is also possible to integrate these approaches with third-party tools (e.g., Calais, Expert System (Cogito), etc.) to improve unstructured content characterization. Via this approach we now can assess concept matches in addition to entity matches. This means we can triangulate between the two assessments to aid disambiguation. Because of logical segmentation, we have increased the informational power of our concept graph.
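The triangulation idea in [23] can be sketched in a few lines of Python. This is a hypothetical illustration only, not Structured Dynamics' actual scones implementation; the candidate senses, scores, and equal weighting are all made up for the example:

```python
# Hypothetical sketch of concept/entity triangulation for disambiguation.
# An ambiguous mention gets candidate senses; each candidate is scored once
# against TBox concept matches and once against ABox entity matches, and the
# combined score picks the winner.

def triangulate(candidates, concept_scores, entity_scores, alpha=0.5):
    """Combine TBox (concept) and ABox (entity) evidence for each candidate."""
    combined = {}
    for sense in candidates:
        c = concept_scores.get(sense, 0.0)   # evidence from the concept graph
        e = entity_scores.get(sense, 0.0)    # evidence from entity matches
        combined[sense] = alpha * c + (1 - alpha) * e
    return max(combined, key=combined.get)

# "Jaguar" in a document about habitats and prey (illustrative scores):
senses = ["jaguar_animal", "jaguar_car"]
concept_evidence = {"jaguar_animal": 0.9, "jaguar_car": 0.2}  # TBox side
entity_evidence = {"jaguar_animal": 0.6, "jaguar_car": 0.5}   # ABox side

print(triangulate(senses, concept_evidence, entity_evidence))
```

Because the two assessments come from logically segmented structures, either one can break a tie or override weak evidence from the other.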

Posted by AI3's author, Mike Bergman Posted on September 1, 2010 at 12:10 am in Ontologies, Ontology Best Practices | Comments (2)
The URI link reference to this post is: https://www.mkbergman.com/908/a-new-methodology-for-building-lightweight-domain-ontologies/
The URI to trackback this post is: https://www.mkbergman.com/908/a-new-methodology-for-building-lightweight-domain-ontologies/trackback/
Posted:August 30, 2010

The Recent Pace of Ontology Development Appears to Have Waned

The development of ontologies goes by the names of ontology engineering or ontology building, and can also be investigated under the rubric of ontology learning. This post summarizes key papers on the topic and provides links to them [18].

Over the last twenty years, many methods have been put forward for how to develop ontologies, though such methodological activity has actually diminished somewhat in recent years.

The main thrust of the papers listed herein is on domain ontologies, which model particular domains or topic areas (as opposed to reference, upper or theoretical ontologies, which are more general or encompassing). Also, little commentary is offered on any of the individual methodologies; please see the referenced papers for more details.

General Surveys

One of the first comprehensive surveys was done by Jones et al. in 1998 [1]. This study began to elucidate common stages and noted there are typically separate stages to produce first an informal description of the ontology and then its formal embodiment in an ontology language. The existence of these two descriptions is an important characteristic of many ontologies, with the informal description often carrying through to the formal description.

The next major survey was done by Corcho et al. in 2003 [2]. This built on the earlier Jones survey and added more recent methods. The survey also characterized the methods by tools and tool readiness.

More recently the work of Simperl and her colleagues has focused on empirical results of ontology costing and related topics. This series has been the richest source of methodology insight in recent years [3, 4, 5, 6]. More on this work is described below.

Though not a survey of methods, one of the more attainable descriptions of ontology building is Noy and McGuinness’ well-known Ontology Development 101 [7]. Also really helpful are Alan Rector’s various lecture slides on ontology building [8].

However, one general observation is that the pace of new methodology development seems to have waned in the past five years or so. This does not appear to be the result of an accepted methodology having emerged.

Some Specific Methodologies

Some of the leading methodologies, presented in rough order from the oldest to newest, are as follows:

  • Cyc – this oldest of knowledge bases and ontologies has been mapped to many separate ontologies. See the separate document on the Cyc mapping methodology for an overview of this approach [9]
  • TOVE (Toronto Virtual Enterprise) – a first-order logic approach to representing activities, states, time, resources, and cost in an enterprise integration architecture [10]
  • IDEF5 (Integrated Definition for Ontology Description Capture Method) – part of a broader set of methodologies developed by Knowledge Based Systems, Inc. [11]
  • ONIONS (ONtologic Integration Of Naive Sources) – a set of methods especially geared to integrating multiple information sources [12], with a particular emphasis on domain ontologies
  • COINS (COntext INterchange System) – a long-running series of efforts from MIT’s Sloan School of Management [13]
  • METHONTOLOGY – one of the better known ontology building methodologies, though few actual uses are documented [14]
  • OTK (On-To-Knowledge) – a methodology from the major EU effort at the beginning of the last decade; its common-sense approach is reflected in many other methodologies [15]
  • UPON (United Process for ONtologies) – a UML-based approach built on use cases; incremental and iterative [16].

Please note that many individual projects also describe their specific methodologies; these are purposefully not included. In addition, Ensan and Du look at some specific ontology frameworks (e.g., PROMPT, OntoLearn, etc.) from a domain-specific perspective [17].

Some Flowcharts

Here is the general methodology as presented in the various Simperl et al. papers [cf. Fig. 1 in 3]:

Ontology Engineering from Simperl et al.

The Corcho et al. survey also presented a general view of the tools plus framework necessary for a complete ontology engineering environment [Fig. 4 from 2]:

Ontology Tools and Framework from Corcho et al.

There are more examples that show ontology development workflows. Here is one again from the Simperl et al. efforts [Fig. 2 in 5]:

Ontology Learning Flowchart from Simperl et al.

However, what is most striking about the review of the literature is the paucity of methodology figures and the generality of those that do exist. On this basis, it is unclear to what degree real, actionable methods are in use.

Best Practices Observations

The Simperl and Tempich paper [3], besides being a rich source of references, also provides some recommended best practices based on their comparative survey. These are:

General Recommendations

  • Enforce dissemination, e.g., publish more best practices
  • Define selection criteria for methodologies
  • Define a unified methodology following a method engineering approach
  • Support decision for the appropriate formality level given a specific use case

Process Recommendations

  • Define selection criteria for different knowledge acquisition (KA) techniques
  • Introduce process description for the application of different KA techniques
  • Improve documentation of existing ontologies
  • Improve ontology location facilities
  • Build robust translators between formalisms
  • Build modular ontologies
  • Define metrics for ontology evaluation
  • Offer user-oriented process descriptions for ontology evaluation

Organizational Recommendations

  • Provide ontology engineering activity descriptions using domain-specific terminology
  • Improve consensus making process support

Technological Recommendations

  • Provide tools to extract ontologies from structured data sources
  • Build lightweight ontology engineering environments
  • Improve the quality of tools for domain analysis, ontology evaluation, and documentation
  • Include methodological support in ontology editors
  • Build tools supporting collaborative ontology engineering.

Summary of Observations

This review has not set out to characterize specific methodologies, nor their strengths and weaknesses. Yet the research seems to indicate this state of methodology development in the field:

  • Very few discrete methods exist, and those that do are relatively old
  • The methods tend to cluster into either incremental, iterative approaches or more comprehensive, staged ones
  • There is a general logical sharing of steps across most methodologies from assessment to deployment and testing and refinement
  • Actual specifics and flowcharts are quite limited; with the exception of the UML-based systems, most appear not to meet enterprise standards
  • The supporting toolsets are not discussed much, and most of the examples are based solely on a governing tool; tool integration and interoperability are almost never addressed in the narratives
  • This does not appear to be a very active area of current research.

[1] D.M. Jones, T.J.M. Bench-Capon and P.R.S. Visser, 1998. “Methodologies for Ontology Development,” in Proceedings of the IT and KNOWS Conference of the 15th IFIP World Computer Congress, 1998. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.2437&rep=rep1&type=pdf.
[2] O. Corcho, M. Fernandez and A. Gomez-Perez, 2003. “Methodologies, Tools and Languages for Building Ontologies: Where is the Meeting Point?,” in Data & Knowledge Engineering 46, 2003. See http://www.dia.fi.upm.es/~ocorcho/documents/DKE2003_CorchoEtAl.pdf.
[3] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. “Ontology Engineering: A Reality Check,” in Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2006), 2006.
[4] Elena Paslaru Bontas Simperl, Christoph Tempich, and York Sure, 2006. “ONTOCOM: A Cost Estimation Model for Ontology Engineering,” presented at ISWC 2006; see http://ontocom.ag-nbi.de/docs/iswc2006.pdf.
[5] Elena Simperl, Christoph Tempich and Denny Vrandečić, 2008. “A Methodology for Ontology Learning,” in Frontiers in Artificial Intelligence and Applications 167 from the Proceedings of the 2008 Conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge, pp. 225-249, 2008. See http://wtlab.um.ac.ir/parameters/wtlab/filemanager/resources/Ontology%20Learning/ONTOLOGY%20LEARNING%20AND%20POPULATION%20BRIDGING%20THE%20GAP%20BETWEEN%20TEXT%20AND%20KNOWLEDGE.pdf#page=241.
[6] Elena Simperl, Malgorzata Mochol and Tobias Burger, 2010. “Achieving Maturity: the State of Practice in Ontology Engineering in 2009,” in International Journal of Computer Science and Applications, 7(1), pp. 45 – 65, 2010. See http://www.tmrfindia.org/ijcsa/v7i13.pdf.
[7] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[9] Stephen L. Reed and Douglas B. Lenat, 2002. “Mapping Ontologies into Cyc,” paper presented at the AAAI 2002 Conference Workshop on Ontologies For The Semantic Web, Edmonton, Canada, July 2002. See http://www.cyc.com/doc/white_papers/mapping-ontologies-into-cyc_v31.pdf. Also, as presented by Doug Foxvog, Ontology Mapping with Cyc, at WSMO, June 14, 2004; see www.wsmo.org/wsml/papers/presentations/Ontology%20Mapping%20at%20Cycorp.ppt. Also, see Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock, 2007. “Autonomous Classification of Knowledge into an Ontology,” in The 20th International FLAIRS Conference (FLAIRS), Key West, Florida, May 2007. See http://www.cyc.com/doc/white_papers/FLAIRS07-AutoClassificationIntoAnOntology.pdf.
[10] M. Gruninger and M.S. Fox, 1994. “The Design and Evaluation of Ontologies for Enterprise Engineering”, Workshop on Implemented Ontologies, European Conference on Artificial Intelligence 1994, Amsterdam, NL. See http://stl.mie.utoronto.ca/publications/gruninger-onto-ecai94.pdf.
[11] KBSI, 1994. “The IDEF5 Ontology Description Capture Method Overview”, Knowledge Based Systems, Inc. (KBSI) Report, Texas. The report describes the stages of: 1) organizing and scoping; 2) data collection; 3) data analysis; 4) initial ontology development; and 5) ontology refinement and validation. See http://en.wikipedia.org/wiki/IDEF5.
[12] A. Gangemi, G. Steve and F. Giacomelli, 1996. “ONIONS: An Ontological Methodology for Taxonomic Knowledge Integration”, ECAI-96 Workshop on Ontological Engineering, Budapest, August 13th. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3972&rep=rep1&type=pdf.
[13] The COINS approach was developed by Madnick et al. over the past two decades or so at the MIT Sloan School of Management. See further http://web.mit.edu/smadnick/www/wp/CISL-Sloan%20WP%20spreadsheet.htm for a listing of papers from this program; some are use cases, and some are architecture-related. For the most detailed treatment, see Aykut Firat, 2003. Information Integration Using Contextual Knowledge and Ontology Merging, Ph.D. Thesis for the Sloan School of Management, MIT, 151 pp. See http://www.mit.edu/~bgrosof/paps/phd-thesis-aykut-firat.pdf.
[14] M. Fernandez, A. Gomez-Perez and N. Juristo, 1997. “METHONTOLOGY: From Ontological Art Towards Ontological Engineering”, AAAI-97 Spring Symposium on Ontological Engineering, Stanford University, March 24-26th, 1997.
[15] York Sure, Christoph Tempich and Denny Vrandecic, 2006. “Ontology Engineering Methodologies,” in Semantic Web Technologies: Trends and Research in Ontology-based Systems, pp. 171-187, Wiley. The general phases of the approach are: 1) feasibility study; 2) kickoff; 3) refinement; 4) evaluation; and 5) application and evolution.
[16] A. De Nicola, M. Missikoff, R. Navigli, 2009. “A Software Engineering Approach to Ontology Building”. Information Systems, 34(2), Elsevier, 2009, pp. 258-275.
[17] Faezeh Ensan and Weichang Du, 2007. Towards Domain-Centric Ontology Development and Maintenance Frameworks; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.8915&rep=rep1&type=pdf.
[18] This document is permanently archived on the OpenStructs TechWiki. This document is part of a current series on ontology development and tools to be completed over the coming weeks.

Posted by AI3's author, Mike Bergman Posted on August 30, 2010 at 12:53 am in Ontologies, Ontology Best Practices | Comments (2)
The URI link reference to this post is: https://www.mkbergman.com/906/a-brief-survey-of-ontology-development-methodologies/
The URI to trackback this post is: https://www.mkbergman.com/906/a-brief-survey-of-ontology-development-methodologies/trackback/
Posted:August 23, 2010

Earlier Listing is Expanded by More than 30%

At the beginning of this year Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, again because of some client and internal work, we have researched the space again and updated the listing [1].

All new tools are marked with <New> (new means only newly discovered; some existed but had not yet been found for the prior listing). There are now a total of 185 tools in the listing, 31 of which are new to this update, and 45 added at various times since the first release. <Newest> reflects updates — most from the developers themselves — since the original publication of this post.

Comprehensive Ontology Tools

  • Altova SemanticWorks is a visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design. No open source version available
  • Amine is a rather comprehensive, open source platform for the development of intelligent and multi-agent systems written in Java. As one of its components, it has an ontology GUI with text- and tree-based editing modes, with some graph visualization
  • The Apelon DTS (Distributed Terminology System) is an integrated set of open source components that provides comprehensive terminology services in distributed application environments. DTS supports national and international data standards, which are a necessary foundation for comparable and interoperable health information, as well as local vocabularies. Typical applications for DTS include clinical data entry, administrative review, problem-list and code-set management, guideline creation, decision support and information retrieval. Though not strictly an ontology management system, Apelon DTS has plug-ins that provide visualization of concept graphs and related functionality that make it close to a complete solution
  • DOME is a programmable XML editor which is being used in a knowledge extraction role to transform Web pages into RDF, and available as Eclipse plug-ins. DOME stands for DERI Ontology Management Environment
  • FlexViz is a Flex-based, Protégé-like client-side ontology creation, management and viewing tool; very impressive. The code is distributed from Sourceforge; there is a nice online demo available; there is a nice explanatory paper on the system, and the developer, Chris Callendar, has a useful blog with Flex development tips
  • <Newest> ITM supports the management of complex knowledge structures (metadata repositories, terminologies, thesauri, taxonomies, ontologies, and knowledge bases) throughout their lifecycle, from authoring to delivery. ITM can also manage alignments between multiple knowledge structures, such as thesauri or ontologies, via the integration of INRIA’s Alignment API. Commercial; from Mondeca
  • Knoodl facilitates community-oriented development of OWL based ontologies and RDF knowledge bases. It also serves as a semantic technology platform, offering a Java service-based interface or a SPARQL-based interface so that communities can build their own semantic applications using their ontologies and knowledgebases. It is hosted in the Amazon EC2 cloud and is available for free; private versions may also be obtained. See especially the screencast for a quick introduction
  • The NeOn toolkit is a state-of-the-art, open source multi-platform ontology engineering environment, which provides comprehensive support for the ontology engineering life-cycle. The v2.3.0 toolkit is based on the Eclipse platform, a leading development environment, and provides an extensive set of plug-ins covering a variety of ontology engineering activities. You can add these plug-ins or get a current listing from the built-in updating mechanism
  • ontopia is a relatively complete suite of tools for building, maintaining, and deploying Topic Maps-based applications; open source, and written in Java. Could not find online demos, but there are screenshots and there is visualization of topic relationships
  • Protégé is a free, open source visual ontology editor and knowledge-base framework. The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema. There are a large number of third-party plugins that extend the platform’s functionality
    • Protégé Plugin Library – frequently consult this page to review new additions to the Protégé editor; presently there are dozens of specific plugins, most related to the semantic Web and most open source
    • Collaborative Protégé is a plug-in extension of the existing Protégé system that supports collaborative ontology editing. In addition to the common ontology editing operations, it enables annotation of both ontology components and ontology changes, and supports the searching and filtering of user annotations, also known as notes, based on different criteria. There is also an online demo
    • <New>Web Protégé is an online version of Protégé attempting to capture all of the native functionality; still under development
  • <New>Sigma is an open source knowledge engineering environment that includes ontology mapping, theorem proving, language generation in multiple languages, browsing, OWL read/write, and analysis. It includes the Suggested Upper Merged Ontology (SUMO), a comprehensive formal ontology. It’s under active development and use
  • TopBraid Composer is an enterprise-class modeling environment for developing Semantic Web ontologies and building semantic applications. Fully compliant with W3C standards, Composer offers comprehensive support for developing, managing and testing configurations of knowledge models and their instance knowledge bases. It is based on the Eclipse IDE. There is a free version (after registration) for small ontologies
  • <New>TwoUse Toolkit is an implementation of current OMG and W3C standards for developing ontology-based software models and model-based OWL2 ontologies, largely based around UML. There are a variety of tools, including graphics editors, with more to come
  • <New>Wandora is a topic maps engine written in Java with support for both in-memory topic maps and persisting topic maps in MySQL and SQL Server. It also contains an editor and a publishing system, and has support for automatic classification. It can read OBO, RDF(S), and many other formats, and can export topic maps to various graph formats. There is also a web-based topic maps browser, and graphical visualization.

Not Apparently in Active Use

  • Adaptiva is a user-centred ontology building environment, based on using multiple strategies to construct an ontology, minimising user input by using adaptive information extraction
  • Exteca is an ontology-based technology written in Java for high-quality knowledge management and document categorisation, including entity extraction. Though code is still available, no updates have been provided since 2006. It can be used in conjunction with search engines
  • IODT is IBM’s toolkit for ontology-driven development. The toolkit includes the EMF Ontology Definition Metamodel (EODM), the EODM workbench, and an OWL Ontology Repository (named Minerva)
  • KAON is an open-source ontology management infrastructure targeted for business applications. It includes a comprehensive tool suite allowing easy ontology creation and management and provides a framework for building ontology-based applications. An important focus of KAON is scalable and efficient reasoning with ontologies
  • Ontolingua provides a distributed collaborative environment to browse, create, edit, modify, and use ontologies. The server supports over 150 active users, some of whom have provided us with descriptions of their projects. Provided as an online service; software availability not known.

Vocabulary Prompting Tools

  • AlchemyAPI from Orchestr8 provides an API based application that uses statistical and natural language processing methods. Applicable to webpages, text files and any input text in several languages
  • BooWa is a set expander for any language (formerly known as SEALS); developed by RC Wang of Carnegie Mellon
  • Google Keywords allows you to enter a few descriptive words or phrases or a site URL to generate keyword ideas
  • Google Sets for automatically creating sets of items from a few examples
  • Open Calais is free limited API web service to automatically attach semantic metadata to content, based on either entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), or events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’). The metadata results are stored centrally and returned to you as industry-standard RDF constructs accompanied by a Globally Unique Identifier (GUID)
  • Query-by-document from BlogScope has a nice phrase extraction service, with a choice of ranking methods. Can also be used in a Firefox plug-in (not tested with 3.5+)
  • SemanticHacker (from Textwise) is an API that does a number of different things, including categorization, search, etc. By using ‘concept tags’, the API can be leveraged to generate metadata or tags for content
  • TagFinder is a Web service that automatically extracts tags from a piece of text. The tags are chosen based on both statistical and linguistic analysis of the original text
  • Tagthe.net has a demo and an API for automatic tagging of web documents and texts. Tags can be single words only. The tool also recognizes named entities such as people names and locations
  • TermExtractor extracts terminology consensually referred to in a specific application domain. The software takes as input a corpus of domain documents, parses the documents, and extracts a list of “syntactically plausible” terms (e.g. compounds, adjective-nouns, etc.)
  • TermFinder uses Poisson statistics, the Maximum Likelihood Estimation and Inverse Document Frequency between the frequency of words in a given document and a generic corpus of 100 million words per language; available for English, French and Italian
  • TerMine is an online and batch term extractor that emphasizes part-of-speech (POS) analysis and n-gram phrase extraction. It is a terminological management system that integrates C-Value term extraction and AcroMine acronym recognition
  • Topia term extractor is a part-of-speech and frequency based term extraction tool implemented in python. Here is a term extraction demo based on this tool
  • Topicalizer is a service which automatically analyses a document specified by a URL or a plain text regarding its word, phrase and text structure. It provides a variety of useful information on a given text including the following: Word, sentence and paragraph count, collocations, syllable structure, lexical density, keywords, readability and a short abstract on what the given text is about
  • TrMExtractor does glossary extraction on pure text files for either English or Hungarian
  • Wikify! is a system to automatically “wikify” a text by adding Wikipedia-like tags throughout the document. The system extracts keywords and then disambiguates and matches them to their corresponding Wikipedia definition
  • Yahoo! Placemaker is a freely available geoparsing Web service. It helps developers make their applications location-aware by identifying places in unstructured and atomic content – feeds, web pages, news, status updates – and returning geographic metadata for geographic indexing and markup
  • Yahoo! Term Extraction Service is an API to Yahoo’s term extraction service, as well as many other APIs and services in a variety of languages and for a variety of tasks; good general resource. The service has been reported to be shut down numerous times, but apparently is kept alive due to popular demand.
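Most of the simpler statistical extractors above share a common core: tokenize, drop stopwords, and rank the remainder by frequency. Here is a toy Python sketch of that core (an illustration only, not any listed tool's actual algorithm; the stopword list and thresholds are made up):

```python
# Minimal frequency-based term extraction: tokenize, filter, rank.
import re
from collections import Counter

# Illustrative stopword list; real tools use much larger, language-specific lists.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def extract_terms(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens as candidate terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]

text = ("Ontology tools support ontology editing. "
        "Ontology editing tools often include visualization.")
print(extract_terms(text, 3))
```

The better tools listed above layer linguistic filters (POS patterns, n-gram phrases, domain contrast corpora) on top of this frequency core.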

Initial Ontology Development

  • COE (CmapTools Ontology Editor) is a specialized version of the CmapTools from IHMC. COE — and its CmapTools parent — is based on the idea of concept maps. A concept map is a graph diagram that shows the relationships among concepts. Concepts are connected with labeled arrows, with the relations manifesting in a downward-branching hierarchical structure. COE is an integrated suite of software tools for constructing, sharing and viewing OWL encoded ontologies based on these constructs
  • Conzilla2 is a second generation concept browser and knowledge management tool with many purposes. It can be used as a visual designer and manager of RDF classes and ontologies, since its native storage is in RDF. It also has an online collaboration server [apparently last updated in 2008]
  • Diagramic (http://diagramic.com/) has an online Flex network graph demo, which also has a neat facility for quick entry and visualization of relationships; mostly small scale; pretty cool. Code does not appear to be available anywhere
  • <New>DL-Learner is a tool for learning OWL class expressions from examples and background knowledge. It extends Inductive Logic Programming (ILP) to Description Logics and the Semantic Web. DL-Learner now has a flexible component-based design, which allows it to be extended easily with new learning algorithms, learning problems, reasoners, and supported background knowledge sources. A newly supported type of knowledge source is SPARQL endpoints, from which DL-Learner can extract knowledge fragments, enabling class learning even on large knowledge sources like DBpedia; it also includes an OWL API reasoner interface and a Web service interface.
  • DogmaModeler is a free and open source ontology modeling tool based on ORM. The philosophy of DogmaModeler is to enable non-IT experts to model ontologies with little or no involvement of an ontology engineer; the project is quite old, but the software is still available and may provide some insight into naive ontology development
  • Erca is a framework that eases the use of Formal and Relational Concept Analysis, a neat clustering technique. Though not strictly an ontology tool, Erca could be used in a workflow that allows easy import of formal contexts from CSV files, then applies algorithms that compute the concept lattice of the formal contexts, which can be exported as dot graphs (or in JPG, PNG, EPS and SVG formats). Erca is provided as an Eclipse plug-in
  • GraphMind is a mindmap editor for Drupal. It has the basic mindmap features and some Drupal-specific enhancements. There is a quick screencast about how GraphMind looks and what it does. The Flex source is also available from Github
  • <New>H-Maps is a commercial suite of tools for building topic maps applications, consisting of a topic maps engine and server, a mapping framework for converting from legacy data, and a navigator for visualizing data. It is typically used in bioinformatics (drug discovery and research, toxicological studies, etc.), engineering (support and expert systems), and for integration of heterogeneous data. It supports the XTM 1.0 and TMAPI 1.0 specifications
  • irON supports authoring via spreadsheets, through its notation and specification. Spreadsheets can be used for initial authoring, especially if the irON guidelines are followed. See further this case study of Sweet Tools in a spreadsheet using irON (commON)
  • <New>JXML2OWL API is a library for mapping XML schemas to OWL Ontologies on the JAVA platform. It creates an XSLT which transforms instances of the XML schema into instances of the OWL ontology. JXML2OWL Mapper is GUI application using the JXML2OWL API
  • MindRaider is Semantic Web outliner. It aims to connect the tradition of outline editors with emerging technologies. MindRaider mission is to organize not only the content of your hard drive but also your cognitive base and social relationships in a way that enables quick navigation, concise representation and inferencing
  • <New>Neologism is a simple web-based RDF Schema vocabulary editor and publishing system. Use it to create RDF classes and properties, which are needed to publish data on the Semantic Web. Its main goal is to dramatically reduce the time required to create, publish and modify vocabularies for the Semantic Web. It is written in PHP and built on the Drupal platform. Neologism is currently in alpha
  • <New>OCS – Ontology Creation System is software for developing ontologies in a cooperative way with a graphical interface
  • RDF123 is an application and web service for converting data in simple spreadsheets to an RDF graph. Users control how the spreadsheet’s data is converted to RDF by constructing a graphical RDF123 template that specifies how each row in the spreadsheet is converted as well as metadata for the spreadsheet and its RDF translation
  • <New>ROC (Rapid Ontology Construction) is a tool that allows domain experts to quickly build a basic vocabulary for their domain, re-using existing terminology whenever possible. The ROC tool asks the domain expert for a set of keywords that are ‘core’ terms of the domain, and then queries remote sources for concepts matching those terms. These are presented to the user, who can select terms from the list, find relations to other terms, and expand the set of terms and relations iteratively. The resulting vocabulary (or ‘proto-ontology’, basically a SKOS-like thesaurus) can be used as is, or as input for a knowledge engineer to build a more comprehensive domain ontology upon. The interface is “triples-oriented,” not graphical.
  • Topincs is Topic Map authoring software that allows groups to share their knowledge over the web. It makes use of a variety of modern technologies, the most important being Topic Maps, REST and Ajax. It consists of three components: the Wiki, the Editor, and the Server. The server requires AMP; the Editor and Wiki are based on browser plug-ins.
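Several of the tools above (RDF123, irON) share the same underlying pattern: each spreadsheet row becomes a set of RDF statements about one instance. A minimal Python sketch of that row-to-triple pattern (with an invented namespace and column layout, not taken from any of the listed tools) might look like:

```python
import csv
import io

# A tiny spreadsheet: one row per instance, one column per attribute.
# The column names and the namespace below are illustrative only.
sheet = """id,label,type
tool1,DL-Learner,Software
tool2,Erca,Software
"""

NS = "http://example.org/ns#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def rows_to_ntriples(text):
    """Convert each spreadsheet row into simple N-Triples statements."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subj = f"<{NS}{row['id']}>"
        triples.append(f'{subj} <{RDFS_LABEL}> "{row["label"]}" .')
        triples.append(f"{subj} <{NS}type> <{NS}{row['type']}> .")
    return triples

for t in rows_to_ntriples(sheet):
    print(t)
```

Real converters such as RDF123 add a template layer on top of this so that users control the mapping graphically rather than in code.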

Ontology Editing

  • First, see all of the Comprehensive Tools and Ontology Development listings above
  • Anzo for Excel includes an (RDFS and OWL-based) ontology editor that can be used directly within Excel. In addition to that, Anzo for Excel includes the capability to automatically generate an ontology from existing spreadsheet data, which is very useful for quick bootstrapping of an ontology
  • <New>ATop is a topic map browser and editor written in Java that supports the XTM 1.0 specification; the project has not been updated since 2008
  • Hozo is an ontology visualization and development tool that brings version control constructs to group ontology development; limited to a prototype, with no online demo
  • Lexaurus Editor is for off-line creation and editing of vocabularies, taxonomies and thesauri. It supports import and export in Zthes and SKOS XML formats, and allows hierarchical / poly-hierarchical structures to be loaded for editing, or even multiple vocabularies to be loaded simultaneously, so that terms from one taxonomy can be re-used in another, using drag and drop. Not available in open source
  • Model Futures OWL Editor combines simple OWL tools, featuring UML (XMI), ERwin, and thesaurus imports. The editor is tree-based and has a “navigator” tool for traversing property and class-instance relationships. It can import XMI (the interchange format for UML), Thesaurus Descriptor (BT-NT XML), and EXPRESS XML files. It can export to MS Word.
  • <New>OBO-Edit is an open source ontology editor written in Java. OBO-Edit is optimized for the OBO biological ontology file format. It features an easy to use editing interface, a simple but fast reasoner, and powerful search capabilities
  • <New>Onotoa is an Eclipse-based ontology editor for topic maps. It has a graphical UML-like interface, an export function for the current TMCL draft, and an XTM export
  • OntoTrack is a browsing and editing ontology authoring tool for OWL Lite. It combines a sophisticated graphical layout with mouse enabled editing features optimized for efficient navigation and manipulation of large ontologies
  • OWLViz is an attractive visual editor for OWL and is available as a Protégé plug-in
  • PoolParty is a triple store-based thesaurus management environment which uses SKOS and text extraction for tag recommendations. See further this manual, which describes more fully the system’s functionality. Also, there is a PoolParty Web service that enables a Zthes thesaurus in XML format to be uploaded and converted to SKOS (via skos:Concepts)
  • SKOSEd is a plugin for Protege 4 that allows you to create and edit thesauri (or similar artefacts) represented in the Simple Knowledge Organisation System (SKOS).
  • TemaTres is a Web application to manage controlled vocabularies, taxonomies and thesauri. The vocabularies may be exported in Zthes, SKOS, TopicMap, etc.
  • ThManager is a tool for creating and visualizing SKOS RDF vocabularies. ThManager facilitates the management of thesauri and other types of controlled vocabularies, such as taxonomies or classification schemes
  • Vitro is a general-purpose web-based ontology and instance editor with customizable public browsing. Vitro is a Java web application that runs in a Tomcat servlet container. With Vitro, you can: 1) create or load ontologies in OWL format; 2) edit instances and relationships; 3) build a public web site to display your data; and 4) search your data with Lucene. Still in somewhat early phases, with no online demos and with minimal interfaces.
  • <New>Vocab Editor is an RDF/OWL/SKOS vocabulary-diagram editor. It has both client-side (JavaScript) and server-side (Python) implementations. It is open source with a demo. There is a blog (Spanish) and an online sample vocabulary app editor.
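Many of the editors above (SKOSEd, PoolParty, TemaTres, ThManager) ultimately produce SKOS vocabularies. As a rough illustration of what such output looks like, here is a Python sketch that renders a term list with broader-term relations as minimal SKOS Turtle; the vocabulary content and base URI are made up for the example:

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Invented sample vocabulary: {term: broader term or None}.
terms = {
    "Animals": None,          # top concept
    "Mammals": "Animals",
    "Birds": "Animals",
}

def to_skos_turtle(terms, base="http://example.org/vocab/"):
    """Render a {term: broader} dict as minimal SKOS Turtle."""
    lines = ["@prefix skos: <%s> ." % SKOS, ""]
    for term, broader in terms.items():
        lines.append(f"<{base}{term}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{term}"@en' +
                     (" ;" if broader else " ."))
        if broader:
            lines.append(f"    skos:broader <{base}{broader}> .")
    return "\n".join(lines)

print(to_skos_turtle(terms))
```

The point is only to show how little machinery a SKOS-like thesaurus needs: concepts, preferred labels, and broader/narrower links.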

Not Apparently in Active Use

  • Omnigator is a form-based manipulation tool centered on Topic Maps, though it enables the loading and navigation of any conforming topic map in XTM, HyTM, LTM or RDF formats. There is a free evaluation version.
  • OntoGen is a semi-automatic and data-driven ontology editor focusing on editing of topic ontologies (a set of topics connected with different types of relations). The system combines text-mining techniques with an efficient user interface. It requires .Net.
  • OntoLight is a set of software modules for: transforming raw ontology data for several ontologies from their specific formats into a unifying light-weight ontology format, grounding the ontology and storing it into grounded ontology format, populating grounded ontologies with new instance data, and creating mappings between grounded ontologies; includes Cyc. Download no longer available. See http://analytics.ijs.si/~blazf/papers/Context_SiKDD07.pdf and http://www.neon-project.org/web-content/index.php?option=com_weblinks&task=view&catid=17&id=52 or http://www.neon-project.org/web-content/index.php?option=com_weblinks&catid=21&Itemid=73
  • OWL-S-editor is an editor for the development of services in OWL-S, with graphical, WSDL and import/export support
  • ReTAX+ is an aide to help a taxonomist create a consistent taxonomy and in particular provides suggestions as to where a new entity could be placed in the taxonomy whilst retaining the integrity of the revised taxonomy (c.f., problems in ontology modelling)
  • SWOOP is a lightweight ontology editor. (Swoop is no longer under active development at mindswap. Continuing development can be found on SWOOP’s Google Code homepage at http://code.google.com/p/swoop/)
  • WebOnto supports the browsing, creation and editing of ontologies through coarse grained and fine grained visualizations and direct manipulation.

Ontology Mapping

  • <New>The Alignment API is an API and implementation for expressing and sharing ontology alignments. A set of correspondences between entities (e.g., classes, objects, properties) in ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way; the goal of this format is to enable available alignments to be shared on the web. The format is expressed in RDF, so it is freely extensible. The Alignment API itself is a Java description of tools for accessing the common format. It defines four main interfaces (Alignment, Cell, Relation and Evaluator).
  • COMA++ is a schema and ontology matching tool with a comprehensive infrastructure. Its graphical interface supports a variety of interactions
  • ConcepTool is a system to model, analyse, verify, validate, share, combine, and reuse domain knowledge bases and ontologies, reasoning about their implications
  • <New>MapOnto is a research project aiming at discovering semantic mappings between different data models, e.g., database schemas, conceptual schemas, and ontologies. So far, it has developed tools for discovering semantic mappings between database schemas and ontologies as well as between different database schemas. The Protégé plug-in is still available, but appears to be for older versions
  • MatchIT automates and facilitates schema matching and semantic mapping between different Web vocabularies. MatchIT runs as a stand-alone or plug-in Eclipse application and can be integrated with popular third-party applications. MatchIT uses Adaptive Lexicon™ as an ontology-driven dictionary and thesaurus of English-language terminology to quantify and rank the semantic similarity of concepts. It apparently is not available in open source
  • myOntology is used to produce the theoretical foundations, and deployable technology, for the Wiki-based, collaborative and community-driven development and maintenance of ontologies, instance data and mappings
  • OLA/OLA2 (OWL-Lite Alignment) matches ontologies written in OWL. It relies on a similarity measure combining all the knowledge used in entity descriptions. It also deals with one-to-many relationships and circularity in entity descriptions through a fixpoint algorithm
  • Potluck is a Web-based user interface that lets casual users—those without programming skills and data modeling expertise—mash up data themselves. Potluck is novel in its use of drag and drop for merging fields, its integration and extension of the faceted browsing paradigm for focusing on subsets of data to align, and its application of simultaneous editing for cleaning up data syntactically. Potluck also lets the user construct rich visualizations of data in-place as the user aligns and cleans up the data.
  • PRIOR+ is a generic and automatic ontology mapping tool, based on propagation theory, information retrieval technique and artificial intelligence model. The approach utilizes both linguistic and structural information of ontologies, and measures the profile similarity and structure similarity of different elements of ontologies in a vector space model (VSM).
  • <New>S-Match takes any two tree-like structures (such as database schemas, classifications, or lightweight ontologies) and returns a set of correspondences between those tree nodes that semantically correspond to one another.
  • Vine is a tool that allows users to perform fast mappings of terms across ontologies. It performs smart searches, can search using regular expressions, requires a minimum number of clicks to perform mappings, can be plugged into an arbitrary mapping framework, is non-intrusive with mappings stored in an external file, has export to text files, and adds metadata to any mapping. See also http://sourceforge.net/projects/vine/.
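Most of the matchers listed above combine linguistic (label) similarity with structural measures. The following Python sketch isolates just the linguistic half, using the standard library's difflib to score label pairs; the ontologies, labels, and threshold are invented for illustration:

```python
from difflib import SequenceMatcher

# Hypothetical class labels from two ontologies to be aligned.
onto_a = ["Person", "Organisation", "Publication"]
onto_b = ["Agent", "Organization", "Article", "Person"]

def label_alignment(a_labels, b_labels, threshold=0.8):
    """Return (a, b, score) pairs whose normalized label similarity
    exceeds the threshold: the 'linguistic' half of a matcher."""
    pairs = []
    for a in a_labels:
        for b in b_labels:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs

print(label_alignment(onto_a, onto_b))
```

Production matchers add synonym lookup, structural propagation, and validation on top of such scores, but the basic candidate-generation step looks much like this.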

Not Apparently in Active Use

  • ASMOV (Automated Semantic Mapping of Ontologies with Validation) is an automatic ontology matching tool which has been designed in order to facilitate the integration of heterogeneous systems, using their data source ontologies
  • Chimaera is a software system that supports users in creating and maintaining distributed ontologies on the web. Two major functions it supports are merging multiple ontologies together and diagnosing individual or multiple ontologies
  • CMS (CROSI Mapping System) is a structure matching system that capitalizes on the rich semantics of the OWL constructs found in source ontologies and on its modular architecture that allows the system to consult external linguistic resources
  • ConRef is a service discovery system which uses ontology mapping techniques to support different user vocabularies
  • DRAGO reasons across multiple distributed ontologies interrelated by pairwise semantic mappings, with a vision of peer-to-peer mapping of many distributed ontologies on the Web. It is implemented as an extension to an open source Pellet OWL Reasoner
  • Falcon-AO (Finding, aligning and learning ontologies) is an automatic ontology matching tool that includes the three elementary matchers of String, V-Doc and GMO. In addition, it integrates a partitioner PBM to cope with large-scale ontologies
  • FOAM is the Framework for ontology alignment and mapping. It is based on heuristics (similarity) of the individual entities (concepts, relations, and instances)
  • hMAFRA (Harmonize Mapping Framework) is a set of tools supporting semantic mapping definition and data reconciliation between ontologies. The targeted formats are XSD, RDFS and KAON
  • IF-Map is an Information Flow based ontology mapping method. It is based on the theoretical grounds of logic of distributed systems and provides an automated streamlined process for generating mappings between ontologies of the same domain
  • LILY is a system matching heterogeneous ontologies. LILY extracts a semantic subgraph for each entity, then it uses both linguistic and structural information in semantic subgraphs to generate initial alignments. The system is presently in a demo version only
  • MAFRA Toolkit – the Ontology MApping FRAmework Toolkit allows users to create semantic relations between two (source and target) ontologies, and apply such relations in translating source ontology instances into target ontology instances
  • OntoEngine is a step toward allowing agents to communicate even though they use different formal languages (i.e., different ontologies). It translates data from a “source” ontology to a “target”
  • OWLS-MX is a hybrid semantic Web service matchmaker. OWLS-MX 1.0 utilizes both description logic reasoning, and token based IR similarity measures. It applies different filters to retrieve OWL-S services that are most relevant to a given query
  • RiMOM (Risk Minimization based Ontology Mapping) integrates different alignment strategies: edit-distance based strategy, vector-similarity based strategy, path-similarity based strategy, background-knowledge based strategy, and three similarity-propagation based strategies
  • semMF is a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties
  • Snoggle is a graphical, SWRL-based ontology mapper. Snoggle attempts to solve the ontology mapping problem by providing a graphical user interface (similar to that of Microsoft Visio) to guide the process of ontology vocabulary alignment. In Snoggle, user-defined mappings can be serialized into rules, which are expressed using SWRL
  • Terminator is a tool for creating term to ontology resource mappings (documentation in Finnish).

Ontology Visualization/Analysis

Though all are not relevant, see my post from a couple of years back on large-scale RDF graph software.

  • Social network graphing tools (many covered elsewhere)
  • Cytoscape is a bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data; I have also written specifically about Cytoscape’s use in UMBEL
    • RDFScape is a project that brings Semantic Web “features” to the popular Systems Biology software Cytoscape
    • NetworkAnalyzer performs analysis of biological networks and calculates network topology parameters including the diameter of a network, the average number of neighbors, and the number of connected pairs of nodes. It also computes the distributions of more complex network parameters such as node degrees, average clustering coefficients, topological coefficients, and shortest path lengths. It displays the results in diagrams, which can be saved as images or text files; used by SD
  • Graphl is a tool for collaborative editing and visualisation of graphs, representing relationships between resources or concepts of the real world. Graphl may be thought of as a visual wiki, a place where everybody can contribute to a shared repository of knowledge
  • <New>Graphviz is open source graph visualization software. It has several main graph layout programs. It also has web and interactive graphical interfaces, and auxiliary tools, libraries, and language bindings.
  • <New>GrOWL is an ontology visualizer and editor. The layout of the GrOWL graph can be defined automatically or loaded from a separate style sheet. GrOWL implements configurable filters that can transform the display by simplifying it, hiding concepts and relationships that have no associated descriptions, or performing more complex translations. Concepts can be stored in ontologies with extensive annotations to provide documentation. GrOWL shows these annotations as tooltips, and supports complex HTML and links within them. The GrOWL browser can be used inside a web browser or as a stand-alone application. When used inside a browser, it supports JavaScript interaction so that it can be used as a concept chooser with implementation-defined operations.
  • igraph is a free software package for creating and manipulating undirected and directed graphs
  • Network Workbench is a very complex, comprehensive tool; a Swiss Army knife
  • NetworkX – Python; very clean
  • <New>OntoGraf, a Protege 4 plug-in, gives support for interactively navigating the relationships of your OWL ontologies. Various layouts are supported for automatically organizing the structure of your ontology. Different relationships are supported: subclass, individual, domain/range object properties, and equivalence. Relationships and node types can be filtered.
  • <New>OWL2Prefuse is a Java package which creates Prefuse graphs and trees from OWL files (and Jena OntModels). It takes care of converting the OWL data structure to the Prefuse data structure, making it easy for developers to use Prefuse graphs and trees in their Semantic Web applications.
  • <New>RDF Gravity is a tool for visualising RDF/OWL graphs and ontologies. It is implemented using the JUNG Graph API and the Jena semantic web toolkit. Its main features are:
    • Graph Visualization
    • Global and Local Filters (enabling specific views on a graph)
    • Full text Search
    • Generating views from RDQL Queries
    • Visualising multiple RDF files
  • <Newest> SKOS Reader is a SKOS browser and an HTML renderer of SKOS thesauri and terminologies that can display a SKOS file hierarchically, alphabetically, or permuted. Commercial; from Mondeca
  • Stanford Network Analysis Package (SNAP) is a general purpose network analysis and graph mining library. It is written in C++ and easily scales to massive networks with hundreds of millions of nodes
  • Social Networks Visualizer (SocNetV) is a flexible and user-friendly tool for the analysis and visualization of Social Networks. It lets you construct networks (mathematical graphs) with a few clicks on a virtual canvas or load networks of various formats (GraphViz, GraphML, Adjacency, Pajek, UCINET, etc) and modify them to suit your needs. SocNetV also offers a built-in web crawler, allowing you to automatically create networks from all links found in a given initial URL
  • Tulip may be incredibly strong
  • Springgraph component for Flex
  • VizierFX is a Flex library for drawing network graphs. The graphs are laid out using GraphViz on the server side, then passed to VizierFX to perform the rendering. The library also provides the ability to run ActionScript code in response to events on the graph, such as mousing over a node or clicking on it.
  • <New>VUE (Visual Understanding Environment) is an open source project focused on creating flexible tools for managing and integrating digital resources in support of teaching, learning and research. VUE provides a flexible visual environment for structuring, presenting, and sharing digital information.
  • <New>yEd is a diagram editor that can be used to quickly and effectively generate high-quality drawings of diagrams. It can support OWL imports.
  • <New>ZGRViewer is a graph visualizer implemented in Java and based upon the Zoomable Visual Transformation Machine. It is specifically aimed at displaying graphs expressed using the DOT language from AT&T GraphViz and processed by programs dot, neato or others such as twopi. ZGRViewer is designed to handle large graphs, and offers a zoomable user interface (ZUI), which enables smooth zooming and easy navigation in the visualized structure.
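Several of the visualizers above (Graphviz, ZGRViewer, VizierFX) consume graphs expressed in the DOT language. As a sketch of how an ontology fragment might be handed to them, the following Python snippet serializes an invented subclass hierarchy as a DOT digraph:

```python
# An invented subclass hierarchy: {subclass: superclass}.
hierarchy = {
    "Mammal": "Animal",
    "Bird": "Animal",
    "Dog": "Mammal",
}

def to_dot(hierarchy, name="ontology"):
    """Serialize subclass edges as a DOT digraph for Graphviz et al."""
    lines = [f"digraph {name} {{", "    rankdir=BT;"]  # subclasses point up
    for sub, sup in hierarchy.items():
        lines.append(f'    "{sub}" -> "{sup}" [label="subClassOf"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(hierarchy))
```

Feeding the resulting text to dot, neato or twopi yields the rendered graph; this is essentially the hand-off that DOT-based ontology viewers automate.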

Miscellaneous Ontology Tools

  • Apolda (Automated Processing of Ontologies with Lexical Denotations for Annotation) is a plugin (processing resource) for GATE (http://gate.ac.uk/). The Apolda processing resource (PR) annotates a document like a gazetteer, but takes the terms from an (OWL) ontology rather than from a list
  • <Newest>CA Manager supports customized workflows for semantic annotation of content. Commercial; from Mondeca
  • <New>Gloze is a XML to RDF, RDF to XML, and XSD to OWL mapping tool based on Jena; see also http://jena.hpl.hp.com/juc2006/proceedings/battle/paper.pdf . See also http://jena.sourceforge.net/contrib/contributions.html
  • <New>Hoolet is an implementation of an OWL-DL reasoner that uses a first-order prover. The ontology is translated to a collection of axioms (in an obvious way based on the OWL semantics) and this collection of axioms is then given to a first-order prover for consistency checking.
  • LexiLink is a tool for building, curating and managing multiple lexicons and ontologies in one enterprise-wide Web-based application. The core of the technology is based on RDF and OWL
  • mopy is the Music Ontology Python library, designed to provide easy to use python bindings for ontology terms for the creation and manipulation of music ontology data. mopy can handle information from several ontologies, including the Music Ontology, full FOAF vocab, and the timeline and chord ontologies
  • OBDA (Ontology Based Data Access) is a plugin for Protégé aimed to be a full-fledged OBDA ontology and component editor. It provides data source and mapping editors, as well as querying facilities that, in sum, allow you to design and test every aspect of an OBDA system. It supports relational data sources (RDBMS) and GLAV-like mappings. In its current beta form, it requires Protege 3.3.1, a reasoner implementing the OBDA extensions to DIG 1.1 (e.g., the DIG server for QuOnto) and Jena 2.5.5
  • <New>oBrowse is a web-based ontology browser developed in Java. oBrowse parses the OWL files of an ontology and displays the ontology in a tree view. The Protégé API and JSF are used in its development
  • OntoComP is a Protégé 4 plugin for completing OWL ontologies. It enables the user to check whether an OWL ontology contains “all relevant information” about the application domain, and extend the ontology appropriately if this is not the case
  • Ontology Browser is a browser created as part of the CO-ODE (http://www.co-ode.org/) project; rather simple interface and use
  • Ontology Metrics is a web-based tool that displays statistics about a given ontology, including the expressivity of the language it is written in
  • <New>OntoLT aims at a more direct connection between ontology engineering and linguistic analysis. OntoLT is a Protégé plug-in, with which concepts (Protégé classes) and relations (Protégé slots) can be extracted automatically from linguistically annotated text collections. It provides mapping rules, defined by use of a precondition language that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé. Only available for older Protégé versions
  • OntoSpec is a SWI-Prolog module, aiming at automatically generating XHTML specification from RDF-Schema or OWL ontologies
  • OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API is focused towards OWL Lite and OWL DL and offers an interface to inference engines and validation functionality
  • OWL Module Extractor is a Web service that extracts a module for a given set of terms from an ontology. It is based on an implementation of locality-based modules that is part of the OWL API.
  • OWL Syntax Converter is an online tool for converting ontologies between different formats, including several OWL syntaxes, RDF/XML, KRSS
  • OWL Verbalizer is an on-line tool that verbalizes OWL ontologies in (controlled) English
  • OwlSight is an OWL ontology browser that runs in any modern web browser; it’s developed with Google Web Toolkit and uses Gwt-Ext, as well as OWL-API. OwlSight is the client component and uses Pellet as its OWL reasoner
  • Pellint is an open source lint tool for Pellet which flags and (optionally) repairs modeling constructs that are known to cause performance problems. Pellint recognizes several patterns at both the axiom and ontology level.
  • PROMPT is a tab plug-in for Protégé for managing multiple ontologies by comparing versions of the same ontology, moving frames between included and including projects, merging two ontologies into one, or extracting a part of an ontology
  • <New>ReDeFer is a compendium of RDF-aware utilities organised in a set of packages: RDF2HTML+RDFa: render a piece of RDF/XML as HTML+RDFa; XSD2OWL: transform an XML Schema into an OWL Ontology; CS2OWL: transform a MPEG-7 Classification Scheme into an OWL Ontology; XML2RDF: transform a piece of XML into RDF; and RDF2SVG: render a piece of RDF/XML as a SVG showing the corresponding graph
  • SegmentationApp is a Java application that segments a given ontology according to the approach described in “Web Ontology Segmentation: Analysis, Classification and Use” (http://www.co-ode.org/resources/papers/seidenberg-www2006.pdf)
  • SETH is a software effort to deeply integrate Python with Web Ontology Language (OWL-DL dialect). The idea is to import ontologies directly into the programming context so that its classes are usable alongside standard Python classes
  • SKOS2GenTax is an online tool that converts hierarchical classifications available in the W3C SKOS (Simple Knowledge Organization Systems) format into RDF-S or OWL ontologies
  • SpecGen (v5) is an ontology specification generator tool. It’s written in Python using Redland RDF library and licensed under the MIT license
  • Text2Onto is a framework for ontology learning from textual resources that extends and re-engineers an earlier framework developed by the same group (TextToOnto). Among its main features, Text2Onto represents the learned knowledge at a metalevel by instantiating the modelling primitives of a Probabilistic Ontology Model (POM), thus remaining independent of a specific target language while allowing the translation of the instantiated primitives
  • Thea is a Prolog library for generating and manipulating OWL (Web Ontology Language) content. Thea OWL parser uses SWI-Prolog’s Semantic Web library for parsing RDF/XML serialisations of OWL documents into RDF triples and then it builds a representation of the OWL ontology
  • TONES Ontology Repository is primarily designed to be a central location for ontologies that might be of use to tools developers for testing purposes; it is part of the TONES project
  • Visual Ontology Manager (VOM) is a family of tools that enables UML-based visual construction of component-based ontologies for use in collaborative applications and interoperability solutions.
  • Web Ontology Manager is a lightweight, Web-based tool using J2EE for managing ontologies expressed in Web Ontology Language (OWL). It enables developers to browse or search the ontologies registered with the system by class or property names. In addition, they can submit a new ontology file
  • RDF evoc (external vocabulary importer) is an RDF external vocabulary importer module (evoc) for Drupal that caches any external RDF vocabulary and provides properties to be mapped to CCK fields, node title and body. This module requires the RDF and SPARQL modules.
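Tools such as the OWL Verbalizer above render ontology axioms as controlled English. A toy Python sketch of the basic idea, using invented triples and verbalization templates (real verbalizers work from full OWL axioms, not bare triples), could be:

```python
# Toy triples; the verbalization templates are invented for illustration.
triples = [
    ("Dog", "subClassOf", "Mammal"),
    ("Fido", "type", "Dog"),
]

TEMPLATES = {
    "subClassOf": "Every {s} is a {o}.",
    "type": "{s} is a {o}.",
}

def verbalize(triples):
    """Render triples as controlled-English sentences via templates."""
    return [TEMPLATES[p].format(s=s, o=o) for s, p, o in triples]

print(verbalize(triples))
# → ['Every Dog is a Mammal.', 'Fido is a Dog.']
```

The template-per-construct approach is what makes the output "controlled" English: each axiom type maps to one fixed sentence pattern.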

Not Apparently in Active Use

  • ActiveOntology is a library, written in Ruby, for easy manipulation of RDF and RDF-Schema models, through a dynamic DSL based on Ruby idiom
  • Almo is an ontology-based workflow engine in Java supporting the ARTEMIS project; part of the OntoWare initiative
  • ClassAKT is a text classification web service for classifying documents according to the ACM Computing Classification System
  • Elmo provides a simple API to access ontology oriented data inside a Sesame RDF repository. The domain model is simplified into independent concerns that are composed together for multi-dimensional, inter-operating, or integrated applications
  • ExtrAKT is a tool for extracting ontologies from Prolog knowledge bases.
  • F-Life is a tool for analysing and maintaining life-cycle patterns in ontology development.
  • Foxtrot is a recommender system which represents user profiles in ontological terms, allowing inference, bootstrapping and profile visualization.
  • HyperDAML creates an HTML representation of OWL content to enable hyperlinking to specific objects, properties, etc.
  • LinKFactory is an ontology management tool; it provides an effective and user-friendly way to create, maintain and extend extensive multilingual terminology systems and ontologies (English, Spanish, French, etc.). It is designed to build, manage and maintain large, complex, language-independent ontologies.
  • LSW – the Lisp semantic Web toolkit enables OWL ontologies to be visualized. It was written by Alan Ruttenberg
  • OntoClassify is a system for scalable classification of text into large topic ontologies, currently including DMoz and Inspec. The system is available as a Web service. The software runs on the Windows platform.
  • Ontodella is a Prolog HTTP server for category projection and semantic linking
  • OntoWeaver is an ontology-based approach to Web sites, which provides high level support for web site design and development
  • OWLLib is a PHP library for accessing OWL files. OWL is w3.org standard for storing semantic information
  • pOWL is a Semantic Web development platform for ontologies in PHP. pOWL consists of a number of components, including RAP
  • ROWL is the Rule Extension of OWL; it is from the Mobile Commerce Lab in the School of Computer Science at Carnegie Mellon University
  • Semantic Net Generator is a utility for generating Topic Maps automatically from different data sources by using rule definitions specified with Jelly XML syntax. This Java library provides Jelly tags to access and modify data sources (also RDF) to create a semantic network
  • SMORE is OWL markup for HTML pages. SMORE integrates the SWOOP ontology browser, providing a clear and consistent way to find and view Classes and Properties, complete with search functionality
  • SOBOLEO is a system for Web-based collaboration to create SKOS taxonomies and ontologies and to annotate various Web resources using them
  • SOFA is a Java API for modeling ontologies and knowledge bases in ontology and Semantic Web applications. It provides a simple, abstract and language-neutral ontology object model, an inferencing mechanism, and representation of the model in the OWL, DAML+OIL and RDFS languages; from java.dev
  • WebScripter is a tool that enables ordinary users to easily and quickly assemble reports extracting and fusing information from multiple, heterogeneous DAMLized Web sources.

Posted by AI3's author, Mike Bergman Posted on August 23, 2010 at 12:28 am in Ontologies, Open Source, Semantic Web Tools | Comments (7)
The URI link reference to this post is: https://www.mkbergman.com/904/listing-of-185-ontology-building-tools/
The URI to trackback this post is: https://www.mkbergman.com/904/listing-of-185-ontology-building-tools/trackback/
Posted:August 16, 2010

Some Observations on Linked Data

At the SemTech conference earlier this summer there was a kind of vuvuzela-like buzzing in the background. And, like the World Cup games on television, in play at the same time as the conference, I found the droning to be just as irritating.

That droning was a combination of the sense of righteousness in the superiority of linked data matched with a reprise of the “chicken-and-egg” argument that plagued the early years of semantic Web advocacy [1]. I think both of these premises are misplaced. So, while I have been a fan and explicator of linked data for some time, I do not worship at its altar [2]. And, for those that do, this post argues for a greater sense of ecumenism.

My main points are not against linked data. I think it a very useful technique and good (if not best) practice in many circumstances. But my main points get at whether linked data is an objective in itself. By making it such, I argue we take our eye off the ball. And, in so doing, we miss making the connection with meaningful, interoperable information, which should be our true objective. We need to look elsewhere than linked data for root causes.

Observation #1: What Problem Are We Solving?

When I began this blog more than five years ago — and when I left my career in population genetics nearly three decades before that — I did so because of my belief in the value of information to confer adaptive advantage. My perspective then, and my perspective now, was that adaptive information through genetics and evolution was being uniquely supplanted within the human species. This change has occurred because humanity is able to record and carry forward all information gained in its experiences.

Adaptive innovations from writing to bulk printing to now electronic form uniquely position the human species to both record its past and anticipate its future. We no longer are limited to evolution and genetic information encoded in surviving offspring to determine what information is retained and moves forward. Now, all information can be retained. Further, we can combine and connect that information in ways that break to smithereens the biological limits of other species.

Yet, despite the electronic volumes and the potentials, chaos and isolated content silos have characterized humanity’s first half century of experience with digital information. I have spoken before about how we have been steadily climbing the data federation pyramid, with Internet technologies and the Web being prime factors for doing so. Now, with a compelling data model in RDF and standards for how we can relate any type of information meaningfully, we also have the means for making sense of it. And connecting it. And learning and adapting from it.

And, so, there is the answer to the rhetorical question: The problem we are solving is to meaningfully connect information. For, without those meaningful connections and recombinations, none of that information confers adaptive advantage.

Observation #2: The Problem is Not A Lack of Consumable Data

One of the “chicken-and-egg” premises in the linked data community is that more linked data must be exposed before the threshold that triggers the network effect is reached. This attitude, I suspect, is one of the reasons why hosannas are always forthcoming each time some outfit announces it has posted another chunk of triples to the Web.

Fred Giasson and I earlier tackled that issue with When Linked Data Rules Fail regarding some information published for data.gov and the New York Times. Our observations on the lack of standards for linked data quality proved to be quite controversial. Rehashing that piece is not my objective here.

What is my objective is to hammer home that we do not need linked data in order to have data available to consume. Far from it. Though linked data volumes have been growing, I actually suspect that its growth has been slower than data availability in toto. On the Web alone we have searchable deep Web databases, JSON, XML, microformats, RSS feeds, Google snippets, yada, yada, all in a veritable deluge of formats, contents and contexts. We are having a hard time inventing the next 1000-fold description beyond zettabyte and yottabyte to even describe this deluge [3].

There is absolutely no voice or observer anywhere that is saying, “We need linked data in order to have data to consume.” Quite the opposite. The reality is we are drowning in the stuff.

Furthermore, when one dissects what most of this data is about, it is about ways to describe things. Or, put another way, nearly all data is not schema or descriptions of conceptual relationships, but records made available, with attributes and their values used to describe those records. Where is a business located? What political party does a politician belong to? How tall are you? What is the population of Hungary?

These are simple constructs with simple key-value pair ways to describe and convey them. This very simplicity is one reason why naïve data structs or simple data models like JSON or XML have proven so popular [4]. It is one of the reasons why the so-called NoSQL databases have also been growing in popularity. What we have are lots of atomic facts, located everywhere, and representable with very simple key-value structures.
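As a minimal sketch of the point, here is one such record expressed as a simple key-value struct. The field names and values are illustrative, not drawn from any actual dataset:

```python
import json

# Atomic facts about one entity, expressed as simple key-value pairs.
# The field names and values are illustrative placeholders.
record = {
    "id": "hungary",
    "type": "Country",
    "label": "Hungary",
    "capital": "Budapest",
    "population": 10014324,
}

# Such a struct round-trips through JSON with no loss of information --
# no RDF or linked data machinery is needed to publish or consume it.
serialized = json.dumps(record, sort_keys=True)
restored = json.loads(serialized)
assert restored == record
```

This is essentially all that most published records amount to: a bag of attribute-value pairs hung off an identifier.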

While having such information available in linked data form makes it easier for agents to consume it, that extra publishing burden is by no means necessary. There are plenty of ways to consume that data — without loss of information — in non-linked data form. In fact, that is how the overwhelming percentage of such data is expressed today. This non-linked data is also often easy to understand.

What is important is that the data be available electronically with a description of what the records contain. But that hurdle is met in many, many different ways and from many, many sources without any reference whatsoever to linked data. I submit that any form of desirable data available on the Web can be readily consumed without recourse to linked data principles.
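To make the hurdle concrete: consuming, say, a plain CSV snippet takes nothing but a standard library, with the header row serving as the "description of what the records contain." The values below are illustrative:

```python
import csv
import io

# A plain CSV snippet -- one of the many non-linked-data forms in which
# records arrive on the Web. The header row describes the record fields;
# the data values are illustrative.
raw = "name,capital,population\nHungary,Budapest,10014324\n"

# The standard library consumes it directly; no linked data principles
# are involved.
rows = list(csv.DictReader(io.StringIO(raw)))
```

The same pattern holds for JSON, XML, RSS and the rest of the deluge: a self-describing or documented record layout is sufficient for consumption.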

Observation #3: An Interoperable Data Model Does Not Require a Single Transmittal Format

The real advantage of RDF is the simplicity of its data model, which can be extended and augmented to express vocabularies and relationships of any nature. As I have stated before, that makes RDF like a universal solvent for any extant data structure, form or schema.

What I find perplexing, however, is how this strength somehow gets translated into a parallel belief that such a flexible data model is also the best means for transmitting data. As noted, most transmitted data can be represented through simple key-value pairs. Sure, at some point one needs to model the structural assumptions of the data model from the supplying publisher, but that complexity need not burden the actual transmitted form. So long as schema can be captured and modeled at the receiving end, data record transmittal can be made quite a bit simpler.

Under this mindset RDF provides the internal (canonical) data model. Before that point, format and other converters can be used to consume the source data in its native form. A generalized representation of how this can work is shown in this diagram, using Structured Dynamics’ structWSF Web services framework middleware as the mediating layer:
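The conversion step can be sketched as a toy format converter in the spirit of that middleware: it lifts a flat key-value record into subject-predicate-object triples against the internal canonical model. The base and vocabulary URIs below are hypothetical placeholders, not part of structWSF:

```python
# Hypothetical namespace URIs for the canonical model.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab/"

def record_to_triples(record):
    """Map a flat key-value record to (subject, predicate, object) triples."""
    subject = BASE + record["id"]
    return [
        (subject, VOCAB + key, value)
        for key, value in record.items()
        if key != "id"
    ]

# A simple record in its native key-value form...
triples = record_to_triples(
    {"id": "hungary", "label": "Hungary", "capital": "Budapest"}
)
# ...becomes triples in the internal model, while the publisher's
# transmitted form stays as simple as it was.
```

The point of the sketch is that the schema-level mapping lives at the receiving end; the publisher's transmittal format never needed to carry it.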

Of course, if the source data is already in linked data form with understood concepts, relationships and semantics, much of this conversion overhead can be bypassed. If available, that is a good thing.

But it is not a required or necessary thing. Insistence on publishing data in certain forms suffers from the same narrowness as cultural or religious zealotry. Why certain publishers or authors prefer different data formats has a diversity of answers: reasons range from what is tried and familiar, to available toolsets, to even what is trendy, as one might argue linked data is in some circles today. There are literally scores of off-the-shelf “RDFizers” for converting native and simple data structs into RDF form. New converters are readily written.

Adaptive systems, by definition, do not require wholesale changes to existing practices and do not require effort where none is warranted. By posing the challenge as a “chicken-and-egg” one where publishers themselves must undertake a change in their existing practices to conform, or else they fail the “linked data threshold”, advocates are ensuring failure. There is plenty of useful structured data to consume already.

Accessible structured data, properly characterized (see below), should be our root interest; not whether that data has been published as linked data per se.

Observation #4: A Technique Cannot Carry the Burden of Usefulness or Interoperability

Linked data is nothing more than some techniques for publishing Web-accessible data using the RDF data model. Some have tried to use the concept of linked data as a replacement for the idea of the semantic Web, and some have recently tried to re-define linked data as not requiring RDF [5]. Yet the real issue with all of these attempts — correct or not, and a fact of linked data since first formulated by Tim Berners-Lee — is that a technique alone cannot carry the burden of usefulness or interoperability.

Despite billions of triples now available, we in fact see little actual use or consumption of linked data, except in the life science domain. Indeed, a new workshop by the research community called COLD (Consuming Linked Data) has been set up for the upcoming ISWC conference to look into the very reasons why this lack of usage may be occurring [6].

It will be interesting to monitor what comes out of that workshop, but I have my own views as to what might be going on here. A number of factors, applicable frankly to any data, must be layered on top of linked data techniques in order for it to be useful:

  • Context and coherence (see below),
  • Curation and quality control (where provenance is used as the proxy), and
  • Timeliness and currency.

These requirements apply to any data ranging from Census CSV files to Google search results. But because relationships can also be more readily asserted with linked data, these requirements are even greater for it.

It is not surprising that the life sciences have seen more uptake of linked data. That community has keen experience with curation, and the quality and linkages asserted there are much superior to other areas of linked data [7].

In other linked data areas, it is really in limited pockets such as FactForge from Ontotext or curated forms of Wikipedia by the likes of Freebase that we see the most use and uptake. There is no substitute for consistency and quality control.

It is really in this area of “publish it and they will come” that we see one of the threads of parochialism in the linked data community. You can publish it and they still will not come. And, like any data, they will not come because the quality is poor or the linkages are wrong.

As a technique for making data available, linked data is thus nothing more than a foot soldier in the campaign to make information meaningful. Elevating it above its pay grade sets the wrong target and causes us to lose focus for what is really important.

Observation #5: 50% of Linked Data is Missing (that is, the Linking part)

There is another strange phenomenon in the linked data movement: the almost total disregard for the linking part. Sure, data is getting published as triples with dereferenceable URIs, but where are the links?

At most, what we are seeing is owl:sameAs assertions and a few others [8]. Not only does this miss the whole point of linked data, but one can question whether equivalence assertions are correct in many instances [9].

For a couple of years now I have been arguing that the central gap in linked data has been the absence of context and coherence. By context I mean the use of reference structures to help place and frame what content is about. By coherence I mean that those contextual references make internal and logical sense, that they represent a consistent world view. Both require a richer use of links to concepts and subjects describing the semantics of the content.
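The contrast can be sketched in triple form. The example below is illustrative only: all URIs except the DBpedia one are hypothetical placeholders, and the predicates are written as CURIEs for brevity:

```python
entity = "http://example.org/id/hungary"

# The prevailing pattern in published linked data: an identity
# assertion and little else.
identity_only = [
    (entity, "owl:sameAs", "http://dbpedia.org/resource/Hungary"),
]

# Links that supply context and coherence: typing against a reference
# concept, and subject links that frame what the content is about.
# (Placeholder URIs; any shared, coherent reference structure would do.)
contextual = identity_only + [
    (entity, "rdf:type", "http://example.org/vocab/Country"),
    (entity, "dcterms:subject", "http://example.org/concepts/CentralEurope"),
]
```

It is the second group of links, sparse or absent in most published datasets, that lets disparate sources be related to one another.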

It is precisely through these kinds of links that data from disparate sources and with different frames of reference can be meaningfully related to other data. This is the essence of the semantic Web and the purported purpose of linked data. And it is exactly these areas in which linked data is presently found most lacking.

Of course, these challenges are not unique to linked data. They are the essential challenge in any attempt to connect or interoperate structured data within information systems. So, while linked data is ostensibly designed from the get-go to fulfill these aims, any data that can find meaning outside of its native silo must also be placed into context in a coherent manner. The unique disappointment for much linked data is its failure to provide these contexts despite its design.

Observation #6: Pluralism is a Reality; Embrace It

Yet, having said all of this, Structured Dynamics is still committed to linked data. We present our information as such, and provide great tools for producing and consuming it. We have made it one of the seven foundations to our technology stack and methodology.

But we live in a pluralistic data world. There are reasons and roles for the multitude of popular structured data formats that presently exist. This inherent diversity is a fact in any real-world data context. Thus, we have not met a form of structured data that we didn’t like, especially if it is accompanied with metadata that puts the data into coherent context. It is a major reason why we developed the irON (instance record and object notation) non-RDF vocabulary to provide a bridge from such forms to RDF. irON clearly shows that entities can be usefully described and consumed in either RDF or non-RDF serialized forms.

Attitudes that dismiss non-linked data forms or arrogantly insist that publishers adhere to linked data practices are anything but pluralistic. They are parochial and short-sighted and are contributing, in part, to keeping the semantic Web from going mainstream.

Adoption requires simplicity. The simplest way to encourage the greater interoperability of data is to leverage existing assets in their native form, with encouragement for minor enhancements to add descriptive metadata for what the content is about. Embracing such an ecumenical attitude makes all publishers potentially valuable contributors to a better information future. It will also nearly instantaneously widen the tools base available for the common objective of interoperability.

Parochialism and Root Cause Analysis

Linked data is a good thing, but not an ultimate thing. By making linked data an objective in itself we unduly raise publishing thresholds; we set our sights below the real problem to be solved; and we risk diluting the understanding of RDF from its natural role as a flexible and adaptive data model. Paradoxically, too much parochial insistence on linked data may undercut its adoption and the realization of the overall semantic objective.

Root cause analysis for what it takes to achieve meaningful, interoperable information suggests that describing source content in terms of what it is about is the pivotal factor. Moreover, those contexts should be shared to aid interoperability. Whichever organizations do an excellent job of providing context and coherent linkages will be the go-to ones for data consumers. As we have seen to date, merely publishing linked data triples does not meet this test.

I have heard some state that first you celebrate linked data and its growing quantity, and then hope that the quality improves. This sentiment holds if indeed the community moves on to the questions of quality and relevance. The time for that transition is now. And, oh, by the way, as long as we are broadening our horizons, let’s also celebrate properly characterized structured data no matter what its form. Pluralism is part of the tao to the meaning of information.


[1] See, for example, J.A. Hendler, 2008. “Web 3.0: Chicken Farms on the Semantic Web,” Computer, January 2008, pp. 106-108. See http://www.comp.leeds.ac.uk/webscience/talks/hendler_web_3.pdf. While I can buy Hendler’s arguments about commercial tool vendors holding off major investments until the market is sizable, I think we can also see via listings like Sweet Tools that a lack of tools is not in itself limiting.
[2] An earlier treatment of this subject from a different perspective is M.K. Bergman, 2010. “The Bipolar Disorder of Linked Data,” AI3:::Adaptive Information blog, April 28, 2010.
[3] So far only prefixes for units up to 10^24 (“yotta”) have names; for 10^27, a student campaign on Facebook is proposing “hellabyte” (Northern California slang for “a whole lot of”) for adoption by science bodies. See http://scitech.blogs.cnn.com/2010/03/04/hella-proposal-facebook/.
[4] One of the more popular posts on this blog has been M.K. Bergman, 2009. “‘Structs’: Naïve Data Formats and the ABox,” AI3:::Adaptive Information blog, January 22, 2009.
[5] See, for example, the recent history on the linked data entry on Wikipedia or the assertions by Kingsley Idehen regarding entity attribute values (EAV) (see, for example, this blog post.)
[6] See further the 1st International Workshop on Consuming Linked Data (COLD 2010), at the 9th International Semantic Web Conference (ISWC 2010), November 8, 2010, Shanghai, China.
[7] For example, in the early years of GenBank, some claimed that annotations of gene sequences due to things like BLAST analyses may have had as high as 30% to 70% error rates due to propagation of initially mislabeled sequences. In part, the whole field of bioinformatics was formed to deal with issues of data quality and curation (in addition to analytics).
[8] See, for example: Harry Halpin, 2009. “A Query-Driven Characterization of Linked Data,” paper presented at the Linked Data on the Web (LDOW) 2009 Workshop, April 20, 2009, Madrid, Spain, see http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf; Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth, 2010. “Linked Data is Merely More Data,” in Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness, Linked Data Meets Artificial Intelligence, Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82-86; see http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf; among others.
[9] Harry Halpin and Patrick J. Hayes, 2010. “When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web,” presented at LDOW 2010, April 27th, 2010, Raleigh, North Carolina. See http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf.

Posted by AI3's author, Mike Bergman Posted on August 16, 2010 at 12:58 am in Adaptive Innovation, irON, Linked Data, Semantic Web | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/
The URI to trackback this post is: https://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/trackback/