Confronting Misconceptions with Adaptive Ontologies

Ontology Best Practices for Data-driven Applications: Part 4

The earlier portions of this occasional series have set the groundwork for the role of ontologies in data-driven applications. In this part, I address many of the current misconceptions of what ontologies do or do not do. For, as practiced by Structured Dynamics, our adaptive TBox-level ontologies [1] are definitely not your grandfather’s Oldsmobile.

To share the punch line early, these modern ontologies are fast to develop, easy to change, adaptive to new knowledge and perceptions, robust and flexible. Indeed, it is the structure and nature of these adaptive ontologies that is the heart and secret of data-driven applications.

Any knowledge worker can understand and refine the organization and relationship of information via these structures. And, most importantly, the resulting ontologies are sufficient to drive the generic applications that are based on them. Focusing on data and structure now becomes the emphasis. We can now remove prior bottlenecks arising from the need to customize applications, configure report writers, or wait for IT to generate SQL queries.

But not all ontologies are created equal, and not all practitioners explain or see them in the same way. The purpose of this Part 4 in our series is to present many of the misconceptions, offering a score of takeaway messages for how properly considered and constructed ontologies can achieve these benefits.

Misconception: No ‘Big Bang’ Needed

To be sure, there are many very large and comprehensive ontologies. Some are focused on specific applications or domains; some are general; and some are the result of large and well-funded projects [2]. I am not arguing that such efforts do not have their role and place. But when viewed as exemplars or notable cases, these complex and comprehensive ontologies can create a misconception that such a scope is an imperative of proper ontology design.

I believe quite the opposite to be true.

An incredible strength of RDF and OWL ontologies is that they can be built incrementally. So long as additions are coherent with some degree of self-consistency in terms of the world view in which they are represented, any of an ontology’s constituent concepts, predicates or entities and datasets can be added and enhanced as needed. This makes ontologies a very different cat from relational schema, which are notoriously brittle with expensive re-architecting required anytime that scope or schema change.
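
As a rough illustration of incremental growth, here is a minimal sketch in plain Python. It assumes an ontology can be treated as nothing more than a growing set of subject-predicate-object statements; the `ex:` names and the `subclasses_of` helper are hypothetical and are not part of any Structured Dynamics tooling.

```python
# A minimal sketch: an ontology held as a growing set of
# (subject, predicate, object) statements. All ex: names are hypothetical.

def subclasses_of(triples, cls):
    """Return every class that is a direct or indirect subclass of cls."""
    found = set()
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found

# Start small: two concepts and one relationship.
ontology = {
    ("ex:Product", "rdf:type", "owl:Class"),
    ("ex:Widget", "rdf:type", "owl:Class"),
    ("ex:Widget", "rdfs:subClassOf", "ex:Product"),
}
before = subclasses_of(ontology, "ex:Product")

# Grow incrementally: new statements are simply added to the set.
# Nothing stated earlier has to be restructured or migrated.
ontology |= {
    ("ex:Gadget", "rdf:type", "owl:Class"),
    ("ex:Gadget", "rdfs:subClassOf", "ex:Widget"),
}
after = subclasses_of(ontology, "ex:Product")

print(sorted(before))  # ['ex:Widget']
print(sorted(after))   # ['ex:Gadget', 'ex:Widget']
```

The same query works before and after the addition. By contrast, adding a new level to a hierarchy that has been hard-wired into table columns typically means a schema change.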

Enterprise consultants that advocate “big” upfront ontology development efforts are doing their clients a massive disservice. They are also cynically playing on the experience with relational schema. As soon as the marketplace begins to realize that ontologies are incredibly plastic and malleable, this huge advantage of ontologies over the relational model for data federation will ring clear.

Takeaway Message #1: Ontologies can (and should!) start small.

Takeaway Message #2: Ontologies can (and should!) grow incrementally.

Misconception: No ‘One Ring to Rule Them All’

As a practitioner, two of the most boring arguments I hear are: (1) Ontology X is better than other ontologies, and here is why; and (2) use of some reference or upper ontology reduces choice and freedom. Both arguments are somewhat grounded in the ‘one ring to rule them all’ mindset, though coming from opposing perspectives, which I think fundamentally misreads the role and purpose of ontologies.

Ontologies provide an organizing context for relating disparate information together and for making meaningful inferences. Without such a framework these purposes cannot be achieved. But the framework itself is a function of the world view, context and domain scope at hand. As a result, there is only context, and not some single, universal “truth.” As they say, it all depends.

The trick, then, to properly designed ontologies is to maintain internal coherence and self-consistency [3]. When done, it is then possible to relate disparate information and data to other data and to make intelligent business inferences.

So, the use of an ontology does not limit freedom. It sets the context for making connections and setting relations. And, as long as it is coherent, the “correct” ontology is the one that best captures the scope and domain at hand. Arguing for one ontology versus another is wasted energy. Just get on with it.

Takeaway Message #3: There is no single “truth”, only coherence and relevant context.

Misconception: No Such Thing as an ‘Ontological Commitment’

One of the more pernicious ideas promoted by some practitioners or advocates is the idea of ‘ontological commitment.’ Though some definitions are relatively benign, such as the one offered by the Stanford Knowledge Systems Laboratory (KSL) [4], the unfortunate use of the term “commitment” implies permanence and immutability. (In fact, most definitions of this phrase affirm this interpretation.)

This is really unfortunate, as it again tends to reinforce the inaccurate analogies with brittle and inflexible relational schema.

A much better way to view ontologies is not as a “commitment,” but as a vehicle for developing a common world view within the enterprise. Under this viewpoint, ontology development is somewhat analogous to master data management (MDM) or corporate taxonomies [5]. In this broader sense, then, ontology development can become a means for developing and refining a common language within the enterprise through consensual or community processes.

For the reasons noted above, as language, conceptual relationships or understandings change, so can the vocabulary or structural character of the ontology. There is no “lock in”; there is no “commitment”. As long as it is coherent, the ontology can morph to reflect the scope and understandings of the current snapshot in time.

This flexibility results from the fact that the ontologies, properly constructed, can drive a generic set of tools and applications that express themselves based on the underlying structure and vocabulary within those ontologies. The ontologies can thus change at will without any adverse effects whatsoever on the applications based on them.

This data-driven aspect, as noted throughout this series, is quite different from any prior paradigm. So, under this view ontologies have considerably more focus and importance than even some of the strongest ontology advocates claim, yet paradoxically without the theoretical bloat or heaviness many purport. Like human languages, our language and concepts within ontologies change as our world and perceptions change.

Takeaway Message #4: There is no “lock-in” with ontologies; they may be modified and changed at will.

Takeaway Message #5: Like corporate taxonomies or MDM, ontologies provide a framework for enterprises to develop internally consistent common languages or vocabularies.

Takeaway Message #6: Unlike corporate taxonomies or MDM, ontologies can directly drive generic tools and applications.

Misconception: No Need for Completeness or Comprehensiveness

Ontology development is not some imperative for conceptual “truth”; rather, it is a very adaptable means for stating, testing and refining our understanding of a domain. Like agile development for software, this refining approach can and should proceed incrementally. Too often ontology efforts get caught like deer in the headlights awaiting some “completeness” threshold before release.

One means to promote this approach is to tackle single datasets or data stores individually before moving on. Having a sense of the eventual scope is useful, of course. But it is also quite acceptable to only fill out those portions of the structure with data available at hand.

These observations reflect a prejudice to action and release, rather than theory. If mistakes are made, fine: simply correct them.

Takeaway Message #7: Understand the full scope, but only build out for the data in hand.

Misconception: No Need for Predicate Bloat

It is advisable to keep relationships (predicates) simple at first. As with human languages, keeping the verbs simple until fluency is gained is another best practice.

While all of us can see nuances and subtleties heading into a project, trying to accommodate those predicates (relationships) at the outset can introduce unnecessary complexity. This is not an advocacy in any way for inaccurate predicates, but perhaps to err on the side of the general and broader at first.

For organizations familiar with taxonomies, the SKOS vocabulary is a good focus, and there are some other standard starting ontologies that provide a good starting base of predicates [6]. Then, as you work with your data and its requirements, you can later expand to more sophisticated relationships.

In taking this approach you will still see immediate benefits due to the value of connected data through the Linked Data Law [7]. But, at the same time, you will be embracing a simpler language to start and then gain fluency.
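
One way to picture "start general, refine later" is the sketch below. It is an assumption-laden illustration in plain Python, loosely mirroring the idea behind RDF Schema's `rdfs:subPropertyOf`: a broad predicate is used first, and a finer-grained one is later declared as a refinement of it. The `ex:` predicates, the company names and both helper functions are hypothetical.

```python
# Hypothetical predicates and data, for illustration only.
statements = [
    ("ex:Acme", "ex:relatedTo", "ex:Globex"),  # broad predicate to start
]
sub_property_of = {}  # specific predicate -> the broader predicate it refines

def matches(predicate, wanted):
    """True if predicate equals wanted or is declared a refinement of it."""
    while predicate is not None:
        if predicate == wanted:
            return True
        predicate = sub_property_of.get(predicate)
    return False

def related(subject):
    """All objects linked to subject by ex:relatedTo or any refinement of it."""
    return sorted(o for s, p, o in statements
                  if s == subject and matches(p, "ex:relatedTo"))

first = related("ex:Acme")

# Later, with more fluency: add a more specific predicate as a refinement.
sub_property_of["ex:suppliesTo"] = "ex:relatedTo"
statements.append(("ex:Acme", "ex:suppliesTo", "ex:Initech"))

later = related("ex:Acme")

print(first)  # ['ex:Globex']
print(later)  # ['ex:Globex', 'ex:Initech']
```

Queries written against the broad predicate keep working after the more specific one is introduced, which is why erring on the side of the general at first costs little.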

Takeaway Message #8: Use simple, well-defined and documented predicates (properties or attributes).

Takeaway Message #9: You are building a common language for the enterprise; do so purposefully.

Misconception: No Need for Expensive Up-front Engineering

All of these observations lead to the conclusion that upfront ontology development need not be expensive. Any consultant selling six-figure ontology development to businesses ought to be seriously challenged. Start small and focused. Frankly, a simple spreadsheet taxonomy or quick conversion of existing XML or metadata or vocabulary standards is A-OK to get started.

Takeaway Message #10: Start small with stakeholders to build acceptance and best practices.

Takeaway Message #11: Start immediately to organize and federate existing information.

Misconception: No Need to Reinvent the Wheel

While it is true that the usefulness of ontologies as advocated by Structured Dynamics is greater than other constructs, these ontologies still just represent a more capable representation of knowledge structures that have been around in various other forms for years. For decades enterprises have created schema, taxonomies, controlled vocabularies, standards, and other knowledge structures that represent untold time, dollars and effort. It would be a waste to not fully leverage these sunk investments.

Further, many ontologies and interoperable structures also exist external to the enterprise, many open source and freely available. And, even if not all are already in proper ontological form, like internal structures these other constructs can be relatively easily leveraged and turned into ontology-ready form.

So, what we are doing with adaptive ontologies is not creating new structures or new representations from scratch, but leveraging the expressions of our current world views. These have been hard-earned, codified over years of effort, and are legacy expressions of the enterprise’s knowledge base.

In this vein, then, there is already much richness available to any organization upon which to embark on their ontology efforts. Use them, and gain great leverage.

Takeaway Message #12: Aggressively mine and re-use existing knowledge and structure.

Takeaway Message #13: Leverage and re-use appropriate portions of the “best” existing, external ontologies.

Misconception: No Requirement to Displace Existing Assets

Continuing in this same spirit, it is a mistake to see adaptive ontologies and the associated systems advocated by Structured Dynamics as a replacement for existing data assets. Rather, the idea and advantage is to keep data records in situ as much as possible. These are already performing investments that can be left largely as is. The role of the adaptive ontologies is to act as a federation layer that bridges across these existing assets.

This leverage of existing data assets can occur via the architecture of the system (generally Web-oriented architecture [8]) and a design of the data system and structures providing proper allocation between the ABox and TBox [1].

All of this maintaining of existing assets is aided by the ability to convert in-place data to ontology-ready RDF form. This is a separate topic in its own right and one I discuss elsewhere [9]. There is also a need to make sure that the attributes of the underlying instance records (generally, the columns within a relational table) are also properly modeled within the adaptive ontology. This is part of the best practices guidelines.

Of course, how much of the existing assets can be leveraged “as is” and what degree of modification or conversion might be necessary needs to be evaluated on a case-by-case basis. Generally, however, these mappings can be pretty straightforward and leave in place all existing hardware, software and administration procedures.
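
To make the idea of leaving records in place more concrete, here is a minimal sketch using Python's standard `sqlite3` module. It assumes a simple one-table source and a hand-written mapping from columns to predicates; the table, the `COLUMN_TO_PREDICATE` mapping, the `ex:` names and the base URI are all hypothetical, and a real conversion would need to handle keys, datatypes and joins with more care.

```python
import sqlite3

# An existing record store, left as is (here an in-memory stand-in).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme Corp", "Iowa City"), (2, "Globex", "Springfield")],
)

# Hypothetical mapping from table columns to predicates modeled in the ontology.
COLUMN_TO_PREDICATE = {"name": "skos:prefLabel", "city": "ex:locatedIn"}

def rows_as_triples(conn, table, type_uri, base_uri):
    """Yield instance-level (ABox) triples per row; the source is only read."""
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY id")
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base_uri}{record['id']}"
        yield (subject, "rdf:type", type_uri)
        for column, predicate in COLUMN_TO_PREDICATE.items():
            if record.get(column) is not None:
                yield (subject, predicate, record[column])

triples = list(
    rows_as_triples(db, "customer", "ex:Customer", "http://example.org/customer/")
)
for triple in triples:
    print(triple)
```

The table itself is never altered. The mapping is the only place where column names meet ontology predicates, so that is where the attributes of the instance records get modeled.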

Takeaway Message #14: Leverage your existing databases as rich sources of instance records (“ABox”).

Takeaway Message #15: Explicitly design your TBox ontologies to be an interoperability layer over these existing record stores.

Takeaway Message #16: Reconcile the semantics across the enterprise’s data stores at this interoperable TBox layer.

Misconception: No Closed World Assumptions

A closed world assumption holds that any statement that is not known to be true is false. Most enterprise database and transaction systems are based on this premise. It works well where there is complete coverage of the entities within a knowledge base, such as the enumeration of all customers or all products of an enterprise.

Yet, in the real (“open”) world there is no guarantee or likelihood of complete coverage. Thus, under an open world assumption the lack of a given assertion or fact implies neither that the assertion is true nor that it is false: it simply is not known.

An open world assumption is one of the key factors for enabling adaptive ontologies to grow incrementally. It is also the basis for enabling linkage to external (and surely incomplete) datasets.

In fact, systems designed around the open world assumption can still achieve closed world reasoning where the circumstances and completeness of the knowledge base permit. But, rather than being a logical outcome of the framework, such completeness axioms need to be explicitly stated. Thus, open world systems can achieve the same ends as closed ones where applicable, but with greater flexibility and extensibility.
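
The difference between the two assumptions can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a description logic reasoner works: the facts, the `ex:` names and the `closed_predicates` set (standing in for an explicitly stated completeness axiom) are all hypothetical.

```python
from enum import Enum

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

# Hypothetical knowledge base of asserted facts.
facts = {("ex:Acme", "ex:isCustomerOf", "ex:Us")}

# Predicates for which we have explicitly declared our knowledge complete.
closed_predicates = set()

def ask(statement):
    """Answer a query; a missing fact is FALSE only where completeness is declared."""
    if statement in facts:
        return Answer.TRUE
    _, predicate, _ = statement
    if predicate in closed_predicates:
        return Answer.FALSE
    return Answer.UNKNOWN

query = ("ex:Globex", "ex:isCustomerOf", "ex:Us")

open_answer = ask(query)
print(open_answer)  # Answer.UNKNOWN

# State the completeness assumption explicitly for the customer list.
closed_predicates.add("ex:isCustomerOf")
closed_answer = ask(query)
print(closed_answer)  # Answer.FALSE
```

Closed world behavior is opted into per predicate, rather than being imposed on everything by the framework.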

Takeaway Message #17: No enterprise is an island; design according to the open world assumption.

Misconception: No Restriction to a Dedicated Priesthood

Consultants often make their money and academics their reputation by making things more obscure and jargon-laden than they need be. Ontologies (heck, even the name itself) are no exception.

But what we have laid out as general guidelines herein and their reduction to practice does not require a priesthood. Sure, there are some things to learn and some practices to follow, but these are certainly easier to understand and master than, say, a programming or scripting language. Adaptive ontologies done right can be a participatory activity within most any organization.

Some guidance and mentoring would certainly be helpful. Make sure to pick the right individuals that truly embrace these perspectives.

Also helpful would be the assistance of groups skilled in team building and group participation [10].

Takeaway Message #18: Engage all knowledge stakeholders in ontology creation, review and refinement.

Takeaway Message #19: Use selected ontology engineers to help ensure consistency, but not necessarily structure.

Design for Data-driven Apps

The above addresses misconceptions related to how the market perceives current ontologies or how some advocates push the concept. But there are some unique perspectives that Structured Dynamics brings to ontology development specific to the purpose of data-driven applications. From a best practices standpoint, these considerations should also be included.

In order to properly “drive” applications and user interfaces and reports, specific design attention needs to be given to:

Linked data, and the use and accessibility of URIs as resource identifiers
Context- and instance-sensitive data display, including templates, and
Driving user interfaces via the inclusion of preferred and alternate labels in the ontology.

Of course, there are other considerations that come to bear. But these lend themselves to some rather simple checklist guidelines during ontology development and maintenance.
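
As one small example of the label point above, the sketch below shows how preferred and alternate labels held with the ontology might drive a generic display and a simple lookup. It is only an illustration under stated assumptions: the `LABELS` dictionary, the `ex:` predicates and the three helper functions are hypothetical stand-ins for labels that would normally live in the ontology itself (for example as SKOS preferred and alternate labels).

```python
# Hypothetical labels, as they might be read out of an ontology.
LABELS = {
    "ex:locatedIn": {"pref": "Location", "alt": ["City", "Based in"]},
    "ex:employeeCount": {"pref": "Employees", "alt": ["Headcount"]},
}

def display_label(predicate):
    """Preferred label for display; fall back to the raw identifier."""
    entry = LABELS.get(predicate)
    return entry["pref"] if entry else predicate

def find_predicate(term):
    """Resolve a user-typed term via preferred or alternate labels."""
    term = term.strip().lower()
    for predicate, entry in LABELS.items():
        names = [entry["pref"], *entry["alt"]]
        if term in (name.lower() for name in names):
            return predicate
    return None

def render(record):
    """Generic display: one line per attribute, labeled from the ontology."""
    return [f"{display_label(p)}: {v}" for p, v in record.items()]

record = {"ex:locatedIn": "Iowa City", "ex:employeeCount": 12, "ex:founded": 2008}
lines = render(record)
print(lines)
# ['Location: Iowa City', 'Employees: 12', 'ex:founded: 2008']
print(find_predicate("headcount"))  # ex:employeeCount
```

Renaming a field in the interface then becomes a change to a label in the ontology, not a change to application code.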

Takeaway Message #20: Follow some relatively straightforward best practices to gain all of the advantages of adaptive ontologies.

This post is part of an occasional AI3 series on ontology best practices.

[1] We use the reference to “TBox” in accordance with our working definition for description logics:

"Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts."

[2] Chemicals, petroleum and pharmaceuticals are renowned for large-scale, vertical ontologies. Examples of general or upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc, BFO (Basic Formal Ontology) and UMBEL (Upper Mapping and Binding Exchange Layer). Many of the large exemplar ontology projects are funded under EU auspices; see write-ups for the 7th ICT (Information and Communications Technologies) program for the EU and prior ICT projects for more information.

[3] See, for example, my posting on When is Content Coherent? from about one year ago.

[4] See, for example, the Stanford KSL discussion on What is an Ontology? One part of that document explains ontological commitments as “agreements to use the shared vocabulary in a coherent and consistent manner,” which is benign enough. But other discussions and venues imply much more by the term “commitment.” This same Stanford source is also useful for general philosophical discussions of ontologies.

[5] With respect to corporate taxonomies, see for example, Trish O’Kane, “United by a Common Language: Developing a Corporate Taxonomy,” Information Management Journal. FindArticles.com. 15 Aug, 2009. http://findarticles.com/p/articles/mi_qa3937/is_200607/ai_n17176092/.

[6] Some of the standard starting vocabularies that Structured Dynamics recommends include many of the ones listed on this useful ontology table from Freebase, and specifically include Dublin Core, Friend-Of-A-Friend (FOAF), GeoNames, SIOC, SKOS, RDF Schema, XML Schema, OWL, UMBEL, and BIBO. These are typically supplemented with domain-specific ontologies appropriate to the scope at hand.

[7] The Linked Data Law states that the value of a linked data network is proportional to the square of the number of links between data objects. It is a derivative of Metcalfe's law, which states that the value of a telecommunications network is proportional to the square of the number of users of the system (n²), where the linkages between users (nodes) exist by definition. For information bases, the data objects are the nodes. Linked data works to add the connections between the nodes. This concept was first presented in What is Linked Data? and then formalized in [9].

[8] In WOA, discrete functions are packaged into modular and shareable elements (services), then made available in a distributed and loosely coupled manner using Representational State Transfer. REST provides principles for how resources are defined and used with simple interfaces without additional messaging layers. REST is a foundation to the HTTP protocol and a key reason for the success and scalability of the Web.

[9] See further my posting, Structure the World.

[10] As a matter of full disclosure, Structured Dynamics has neither expertise nor strengths in these areas.

Author: Mike Bergman

Posted: August 17, 2009
