Ontology Best Practices for Data-driven Applications: Part 4
The earlier portions of this occasional series have set the groundwork for the role of ontologies in data-driven applications. In this part, I address many of the current misconceptions of what ontologies do or do not do. For, as practiced by Structured Dynamics, our adaptive TBox-level ontologies [1] are definitely not your grandfather’s Oldsmobile.
To share the punch line early, these modern ontologies are fast to develop, easy to change, adaptive to new knowledge and perceptions, robust and flexible. Indeed, it is the structure and nature of these adaptive ontologies that is the heart and secret of data-driven applications.
Any knowledge worker can understand and refine the organization and relationship of information via these structures. And, most importantly, the resulting ontologies are sufficient to drive the generic applications that are based on them. Focusing on data and structure now becomes the emphasis. We can now remove prior bottlenecks arising from the need to customize applications, configure report writers, or wait for IT to generate SQL queries.
But, not all ontologies are created equally and not all practitioners explain or see them in the same way. The purpose of this Part 4 in our series is to present many of the misconceptions, offering a score of takeaway messages for how properly considered and constructed ontologies can achieve these benefits.
Misconception: No ‘Big Bang’ Needed
To be sure, there are many very large and comprehensive ontologies. Some are focused on specific applications or domains; some are general; and some are the result of large and well-funded projects [2]. I am not arguing that such efforts do not have their role and place. But when viewed as exemplars or notable cases, these complex and comprehensive ontologies can create a misconception that such a scope is an imperative of proper ontology design.
I believe quite the opposite to be true.
An incredible strength of RDF and OWL ontologies is that they can be built incrementally. So long as additions are coherent with some degree of self-consistency in terms of the world view in which they are represented, any of an ontology’s constituent concepts, predicates or entities and datasets can be added and enhanced as needed. This makes ontologies a very different cat from relational schema, which are notoriously brittle with expensive re-architecting required anytime that scope or schema change.
Enterprise consultants that advocate “big” upfront ontology development efforts are doing their clients a massive disservice. They are also cynically playing on the experience with relational schema. As soon as the marketplace begins to realize that ontologies are incredibly plastic and malleable, this huge advantage of ontologies over the relational model for data federation will ring clear.
Takeaway Message #1: Ontologies can (and should!) start small.
Takeaway Message #2: Ontologies can (and should!) grow incrementally.
Misconception: No ‘One Ring to Rule Them All’
As a practitioner, two of the most boring arguments I hear are: Ontology X is better than other ontologies and here is why; and, Use of some reference or upper ontology reduces choice and freedom. Both arguments are somewhat grounded in the ‘one ring to rule them all’ mindset — though coming from opposing perspectives — that I think fundamentally misreads the role and purpose of ontologies.
Ontologies provide an organizing context for relating disparate information together and for making meaningful inferences. Without such a framework these purposes can not be achieved. But the framework itself is a function of the world view, context and domain scope at hand. As a result, there is only context, and not some single, universal “truth.” As they say, it all depends.
The trick, then, to properly designed ontologies is to maintain internal coherence and self-consistency [3]. When done, it is then possible to relate disparate information and data to other data and to make intelligent business inferences.
So, the use of an ontology does not limit freedom. It sets the context for making connections and setting relations. And, as long as it is coherent, the “correct” ontology is the one that best captures the scope and domain at hand. Arguing for one ontology v another is wasted energy. Just get on with it.
Takeaway Message #3: There is no single “truth”, only coherence and relevant context.
Misconception: No Such Thing as an ‘Ontological Commitment’
One of the more pernicious ideas promoted by some practitioners or advocates is the idea of ‘ontological commitment.’ Though some definitions are relatively benign, such as the one offered by the Stanford Knowledge Systems Laboratory (KSL) [4], the unfortunate use of the term “commitment” implies permanence and immutability. (In fact, most definitions of this phrase affirm this interpretation.)
This is really unfortunate, as it again tends to reinforce the inaccurate analogies with brittle and inflexible relational schema.
A much better way to view ontologies is not as a “commitment,” but as a vehicle for developing a common world view within the enterprise. Under this viewpoint, ontology development is somewhat analogous to master data management (MDM) or corporate taxonomies [5]. In this broader sense, then, ontology development can become a means for developing and refining a common language within the enterprise through consensual or community processes.
For the reasons as noted above, as language or conceptual relationships or understandings change, so can the vocabulary or structural character of the ontology change. There is no “lock in”; there is no “commitment”. As long as it is coherent, the ontology can morph to reflect the scope and understandings of the current snapshot in time.
This flexibility results from the fact that the ontologies, properly constructed, can drive a generic set of tools and applications that express themselves based on the underlying structure and vocabulary within those ontologies. The ontologies can thus change at will without any adverse effects whatsoever on the applications based on them.
This data-driven aspect, as noted throughout this series, is quite different from any prior paradigm. So, under this view ontologies have considerably more focus and importance than even some of the strongest ontology advocates claim, yet paradoxically without the theoretical bloat or heaviness many purport. Like human languages, our language and concepts within ontologies change as our world and perceptions change.
Takeaway Message #4: There is no “lock-in” with ontologies; they may be modified and changed at will.
Takeaway Message #5: Like corporate taxonomies or MDM, ontologies provide a framework for enterprises to develop internally consistent common languages or vocabularies.
Takeaway Message #6: Unlike corporate taxonomies or MDM, ontologies can drive directly generic tools and applications.
Misconception: No Need for Completeness or Comprehensiveness
Ontology development is not some imperative for conceptual “truth”; rather, it is a very adaptable means for stating, testing and refining stuff. Like agile development for software, this refining approach can and should proceed incrementally. Too often ontology efforts get caught like deer in the headlights awaiting some “completeness” threshold before release.
One means to promote this approach is to tackle single datasets or data stores individually before moving on. Having a sense of the eventual scope is useful, of course. But it is also quite acceptable to only fill out those portions of the structure with data available at hand.
These observations reflect a prejudice to action and release, rather than theory. If mistakes are made, fine: simply correct them.
Takeaway Message #7: Understand the full scope, but only build out for the data in hand.
Misconception: No Need for Predicate Bloat
It is advisable to keep relationships (predicates) simple at first. Because, again, like human languages, keeping the verbs simple until fluency is gained is another best practice.
While all of us can see nuances and subtleties heading into a project, trying to accommodate those predicates (relationships) at the outset can introduce unnecessary complexity. This is not an advocacy in any way for inaccurate predicates, but perhaps to err on the side of the general and broader at first.
For organizations familiar with taxonomies, the SKOS vocabulary is a good focus, and there are some other standard starting ontologies that provide a good starting base of predicates [6]. Then, as you work with your data and its requirements, you can later expand to more sophisticated relationships.
In taking this approach you will still see immediate benefits due to the value of connected data through the Linked Data Law [7]. But, at the same time, you will be embracing a simpler language to start and then gain fluency.
Takeaway Message #8: Use simple, well-defined and documented predicates (properties or attributes).
Takeaway Message #9: You are building a common language for the enterprise; do so purposefully.
Misconception: No Need for Expensive Up-front Engineering
All of these observations lead to the conclusion that upfront ontology development need not be expensive. Any consultant selling six-figure ontology development to businesses ought to be seriously challenged. Start small and focused. Frankly, a simple spreadsheet taxonomy or quick conversion of existing XML or metadata or vocabulary standards is A-OK to get started.
Takeaway Message #10: Start small with stakeholders to build acceptance and best practices.
Takeaway Message #11: Start immediately to organize and federate existing information.
Misconception: No Need to Reinvent the Wheel
While it is true that the usefulness of ontologies as advocated by Structured Dynamics is greater than other constructs, these ontologies still just represent a more capable representation of knowledge structures that have been around in various other forms for years. For decades enterprises have created schema, taxonomies, controlled vocabularies, standards, and other knowledge structures that represent untold time, dollars and effort. It would be a waste to not fully leverage these sunk investments.
Further, many ontologies and interoperable structures also exist external to the enterprise, many open source and freely available. And, even if not all are already in proper ontological form, like internal structures these other constructs can be relatively easily leveraged and turned into ontology-ready form.
So, what we are doing with adaptive ontologies is not creating new structures or new representatiions from scratch, but leveraging the expressions of our current world views. These have been hard-earned, codified over years of effort, and are legacy expressions of the enterprise’s knowledge base.
In this vein, then, there is already much richness available to any organization upon which to embark on their ontology efforts. Use them, and gain great leverage.
Takeaway Message #12: Aggressively mine and re-use existing knowledge and structure.
Takeaway Message #13: Leverage and re-use appropriate portions of the “best” existing, external ontologies.
Misconception: No Requirement to Displace Existing Assets
Continuing in this same spirit, it is a mistake to see adaptive ontologies and the associated systems advocated by Structured Dynamics as a replacement for existing data assets. Rather, the idea and advantage is to keep data records in situ as much as possible. These are already performing investments that can be left largely as is. The role of the adaptive ontologies is to act as a federation layer that bridges across these existing assets.
This leverage of existing data assets can occur via the architecture of the system (generally Web-oriented architecture [8]) and a design of the data system and structures providing proper allocation between the ABox and TBox [1].
All of this maintaining of existing assets is aided by the ability to convert in-place data to ontology-ready RDF form. This is a separate topic in its own right and one I discuss elsewhere [9]. There is also a need to make sure that the attributes of the underlying instance records (generally, the columns within a relational table) are also properly modeled within the adaptive ontology. This is part of the best practices guidelines.
Of course, how much of the existing assets can be leveraged “as is” and what degree of modification or conversion might be necessary needs to be evaluated on a case-by-case basis. Generally, however, these mappings can be pretty straightforward and leave in place all existing hardware, software and administration procedures.
Takeaway Message #14: Leverage your existing databases as rich sources of instance records (“ABox”).
Takeaway Message #15: Explicitly design your TBox ontologies to be an interoperability layer over these existing record stores.
Takeaway Message #16: Reconcile the semantics across the enterprise’s data stores at this interoperable TBox layer.
Misconception: No Closed World Assumptions
A closed world assumption holds that any statement that is not known to be true is false. Most enterprise database and transaction systems are based on this premise. It works well where there is complete coverage of the entities within a knowledge base, such as the enumeration of all customers or all products of an enterprise.
Yet, in the real (“open”) world there is no guarantee or likelihood of complete coverage. Thus, under an open world assumption the lack of a given assertion or fact being available neither implies whether that possible assertion is true or false: it simply is not known.
An open world assumption is one of the key factors for enabing adaptive ontologies to grow incrementally. It is also the basis for enabling linkage to external (and surely incomplete) datasets.
In fact, systems designed around the open world assumption can still achieve closed world reasoning where the circumstances and completeness of the knowledge base permit. But, rather than being a logical outcome of the framework, such completeness axioms need to be explicitly stated. Thus, open world systems can achieve the same ends as closed ones where applicable, but with greater flexibility and extensibility.
Takeaway Message #17: No enterprise is an island; design according to the open world assumption.
Misconception: No Restriction to a Dedicated Priesthood
Consultants make their money and academics their reputation by often making things more obscure and jargon-laden than they need be. Ontologies — heck, even the name itself — is no exception.
But what we have laid out as general guidelines herein and their reduction to practice does not require a priesthood. Sure, there are some things to learn and some practices to follow, but these are certainly easier to understand and master than, say, a programming or scripting language. Adaptive ontologies done right can be a participatory activity within most any organization.
Some guidance and mentoring would certainly be helpful. Make sure to pick the right individuals that truly embrace these perspectives.
Also helpful would the assistance of groups skilled in team building and group participation [10].
Takeaway Message #18: Engage all knowledge stakeholders in ontology creation, review and refinement.
Takeaway Message #19: Use selected ontology engineers to help ensure consistency, but not necessarily structure.
Design for Data-driven Apps
The above addresses misconceptions related to how the market perceives current ontologies or how some advocates push the concept. But there are some unique perspectives that Structured Dynamics brings to ontology development specific to the purpose of data-driven applications. From a best practices standpoint, these considerations should also be included.
In order to properly “drive” applications and user interfaces and reports, specific design attention needs to be give to:
- Linked data, and the use and accessibility of URIs as resource identifiers
- Context- and instance-sensitive data display, including templates, and
- Driving user interfaces via the inclusion of preferred and alternate labels in the ontology.
Of course, there are other considerations that come to bear. But these lend themselves to some rather simple checklist guidelines during ontology development and maintenance.
Takeaway Message #20: Follow some relatively straightforward best practices to gain all of the advantanges of adaptive ontologies.
Hi Mike – just a superb post with ‘spot on’ Takeaway Messages from 1 – 20. Well done !