Posted: February 5, 2020

First in an Occasional Series of KBpedia Best Practices

One of my favorite sayings regarding the semantic Web is from James Hendler, now a professor and program director at RPI, but a longstanding contributor to the semantic space, including, among other notable contributions, as a co-author of the seminal paper, “The Semantic Web,” in Scientific American in 2001. His statement was “A little semantics goes a long way,” and I wholeheartedly support that view. I previously gave a shoutout to this saying in my book [1]. In this ‘best practice’ note regarding KBpedia and creating and maintaining knowledge graphs, I want to point out two simple techniques that can immediately benefit your own knowledge representation efforts.

The two items I want to highlight are the use of ‘semsets’ (similar to the synsets used by WordNet) and emphasizing subsumption hierarchies in your knowledge graph design. The actual practice of these items involves, as much as anything, embracing a mindset that is attentive to the twin ideas of semantics and inference.

With this article, I’m also pleased to introduce an occasional series on best practices when creating, applying or maintaining knowledge graphs, using KBpedia as the reference knowledge system. I will be presenting this series throughout 2020, coincident with some exciting expansions and applications of the system. These ‘best practice’ articles are not intended to be detailed pieces, as is my normal practice. Rather, I present a brief overview of each item, and then describe the process and benefits of applying it.


Semsets

The fundamental premise of semantic technologies is “things, not strings.” Labels are only the pointers to a thing, and things may be referred to in many different ways, including, of course, many different languages. Is your ‘happy’ the same as my ‘glad’? Examples abound, as language is an ambiguous affair with meaning often dependent on context.

A single term can refer to different things and a single thing can be (and is!) referred to by many different labels. The lexical database of WordNet helped attack this problem decades ago by creating what it called ‘synsets’ to aggregate the multiple ways (terms) by which a given thing may be referred to. The name is a portmanteau: a ‘synset’ is an aggregation of synonyms. In keeping with Charles Peirce’s framing of indexes to a given thing as anything which points to or draws attention to it, we have broadened the idea to include any term or phrase that points to a given thing. This is a broadened semantic sense, so we have given this aggregation of terms the name ‘semset’, a portmanteau using semantics. Elsewhere [2], I have very broadly defined a semset as including: synonyms, abbreviations, acronyms, aliases, argot, buzzwords, cognomens, derogatives, diminutives, epithets, hypocorisms, idioms, jargon, lingo, metonyms, misspellings, nicknames, non-standard terms (e.g., Twitter), pejoratives, pen names, pseudonyms, redirects, slang, sobriquets, stage names, or synsets. Note this listing is itself a semset for semset.

So, the best practice is this: whenever adding a new relation, entity, or concept to a knowledge graph, give it as broad an enumeration of semset terms as you can assemble with reasonable effort [3]. Redirects in Wikipedia and altLabels from Wikidata are two useful starting sources. (You may need to discover other sources for specific domains.) You can see these as the altLabels within the KBpedia knowledge base; see, as examples, abominable snowman, bird, or cake. altLabels are one of the many useful constructs in the SKOS (Simple Knowledge Organization System) RDF vocabulary, another best practice to apply to your knowledge graphs.

Then, when querying or retrieving data, one can specify standard prefLabels alone (the single, canonical identifier for the entity) for narrow retrievals, or greatly broaden the query by including the altLabels. In our own deployments, we also often include a standard text search engine such as Lucene or Elasticsearch for such retrievals, which opens up even more control and flexibility. Semsets are an easily deployed way to bridge your semantics from ‘strings’ to ‘things’.
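The narrow-versus-broad retrieval idea can be sketched in a few lines of Python. This is a toy illustration with made-up label sets, not the actual KBpedia data or query machinery:

```python
# Sketch: broadening retrieval with semsets (hypothetical label data,
# not the actual KBpedia label sets).
SEMSETS = {
    "AbominableSnowman": {
        "prefLabel": "abominable snowman",
        "altLabels": ["yeti", "bigfoot", "sasquatch"],
    },
    "Cake": {
        "prefLabel": "cake",
        "altLabels": ["gateau", "torte", "layer cake"],
    },
}

def retrieve(query, broaden=False):
    """Return concept IDs whose labels match the query string."""
    query = query.lower()
    hits = []
    for concept_id, labels in SEMSETS.items():
        candidates = [labels["prefLabel"]]
        if broaden:                      # include the full semset
            candidates += labels["altLabels"]
        if any(query == c.lower() for c in candidates):
            hits.append(concept_id)
    return hits

print(retrieve("yeti"))                  # narrow: [] (no prefLabel match)
print(retrieve("yeti", broaden=True))    # broad: ['AbominableSnowman']
```

The same narrow/broad switch can be expressed in SPARQL by matching on prefLabel alone versus the union of prefLabel and altLabel.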

Subsumption Hierarchies

Subsumption hierarchies simply mean that a parent concept embraces (or ‘subsumes’) child concepts [4]. The subsumption relationship can be one of intensionality, extensionality, inheritance, or mereology. In intensionality, the child has attributes embraced by the parent, such as a bear having hair like other mammals. In extensionality, class members belong to an enumerable group, as in lions and tigers and bears all being mammals. In inheritance, an actual child is subsumed under a parent. In mereology, a composite thing like a car engine has parts such as pistons, rods, or a timing device. In the W3C standards of RDF and OWL, which we use in KBpedia to capture our semantic knowledge representations, the ‘class’ construct and its related properties are used to express subsumption hierarchies.

The ‘hierarchy’ idea arises from establishing a tree scaffolding of linked items. In this way, subsets of your knowledge graph resemble taxonomies (or tree-like structures) that proceed from the most general at the top (the ‘root’) to the most specific at the bottom (the ‘leaf’). Different types of subsumption relationships are best represented by their own trees. Using such subsumption relations does not preclude other connections or relations in your knowledge graph.

When consistently and logically constructed, a practice that can be learned and tested, subsumption hierarchies enable one to infer class memberships. For instance, using the ‘mammal’ example means we can infer a bear is a mammal without so specifying, or, alternatively, we can discover that lions and tigers are also mammals if we know that a bear is a mammal. Subsumption hierarchies are an efficient way to specify group memberships, and a powerful way to overcome imprecise query specifications or to discover implicit relationships.
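The transitive inference described above can be sketched simply. This is a toy taxonomy for illustration; KBpedia's actual typologies are far larger and are reasoned over with an OWL reasoner, not code like this:

```python
# Sketch: inferring class membership by walking a subsumption hierarchy.
# (Toy taxonomy; each child maps to its direct parent.)
SUBCLASS_OF = {
    "bear": "mammal",
    "lion": "mammal",
    "tiger": "mammal",
    "mammal": "animal",
    "animal": "thing",
}

def ancestors(concept):
    """Collect all parents by walking the subsumption chain upward."""
    result = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        result.append(concept)
    return result

def is_a(child, parent):
    return parent in ancestors(child)

print(is_a("bear", "mammal"))   # True, inferred rather than asserted
print(is_a("bear", "animal"))   # True, via transitivity
```

Note that nothing in the data directly asserts that a bear is an animal; the membership falls out of the chain of subsumption links.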

Semsets and subsumption hierarchies are easy techniques for incorporating semantics into your knowledge graphs. These two simple techniques (among a few others) readily demonstrate the truth of Hendler’s “a little semantics goes a long way” in improving your knowledge representations.

NOTE: This is the first article in an occasional series about KBpedia best practices to coincide with new advances, uses, and applications of KBpedia throughout 2020.

[1] Bergman, M. K. Building Out the System. in A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (ed. Bergman, M. K.) 273–294 (Springer International Publishing, 2018). doi:10.1007/978-3-319-98092-8_13
[2] See the Glossary in [1].
[3] SKOS also provides a property for capturing misspellings (hiddenLabel), which is a best practice to include, and the W3C standards allow for internationalization of all labels by use of the language tag for labels.
[4] In actual language use, one can say a parent ‘subsumes’ a child. Alternatively, one can say a child ‘is subsumed by’ or ‘is subsumed under’ the parent.

Posted by AI3's author, Mike Bergman Posted on February 5, 2020 at 12:22 pm in KBpedia Best Practices | Comments (0)
Posted: December 15, 2019

The Choice Between Class and Instance Depends on Your Point of View
(‘Dazzle’ image by Shigeki Matsuyama)

Readers of this blog know that I use the open-source Protégé ontology editor to build and maintain our knowledge graphs. Besides the usefulness of the tool, there is also an informative user mail list that discusses the Protégé application and modeling choices that may arise when using it [1]. A recent thread, ‘How to Relate Different Classes,’ is but one example of an issue one might encounter on this list [2]. As one of the frequent commenters on the list, Michael DeBellis, noted about this thread [3], “I think this is a common issue with modeling, what to make a class and what to make an instance.”

Michael is indeed correct that the distinction between classes and instances is a frequent topic, one that I have touched upon in various ways through the years. The liveliness of this recent thread convinced me it would be helpful to pull together how one chooses to use a class or instance in their knowledge graphs. The topic is also critical to the questions of knowledge representation and interoperability, two key uses for knowledge graphs. So, let’s look at this question of class v instance from the aspects of the nature of knowledge, modeling, and practical considerations.

Epistemological Issues

Epistemology is simply the study of the nature of knowledge. It gets at the questions: What is knowledge? What is belief? What is justification for action? How can we acquire and validate knowledge? Is knowledge infallible? Are there different kinds of knowledge?

Charles Sanders Peirce’s theory of signs is intimately related to these questions, as well as to how we express and convey knowledge to others. Since, as humans, we communicate through our language as symbols, what we mean and intend to convey when expressing these symbols is also of utmost importance to how we understand and refine knowledge as a community process. My recent book has a number of chapters mostly if not exclusively related to these topics [4,5]. Many of the points in this section are drawn from these chapters.

We can illustrate some of the tricky epistemology issues associated with the nature of language using the example of the ‘toucan’ bird often used in discussions of semantic technologies. When we see something, or point to something, or describe something in words, or think of something, we are, of course, using proxies in some manner for the actual thing. If the something is a ‘toucan’ bird, that bird does not reside in our head when we think of it. The ‘it’ of the toucan is a ‘re-presentation’ of the real, dynamic toucan. The representation of something is never the actual something but is itself another thing — that is, a sign — that conveys to us the idea of the real something. In our daily thinking we rarely make this distinction. (For which we should be thankful, otherwise, our flow of thoughts would be wholly jangled.) Nonetheless, the difference is real, and we should be conscious of it when we are trying to be precise in representing knowledge.

How we ‘re-present’ something is also not uniform or consistent. For the toucan bird, perhaps we make caw-caw bird noises or flap our arms to indicate we are referring to a bird. Perhaps we point at the bird. Alternatively, perhaps we show a picture of a toucan or read or say aloud the word “toucan” or see the word embedded in a sentence or paragraph, as in this one, that also provides additional context. How quickly or accurately we grasp the idea of ‘toucan’ is partly a function of how closely associated one of these accompanying signs may be to the idea of toucan bird. Probably all of us would agree that arm flapping is not nearly as useful as a movie of a toucan in flight or seeing one scolding from a tree branch to convey the ‘toucan’ concept.

The question of what we know and how we know it fascinated Peirce over the course of his intellectual life. He probed this relationship between the real or actual thing, the object, with how that thing is represented and understood. (Also understand that Peirce’s concept of the object may embrace individual or particular things to classifications or generalities.) This triadic relationship between immediate object, representation, and interpretation forms a sign and is the basis for the process of sign-making and understanding, what Peirce called semiosis [6].

Even the idea of the object, in this case, the toucan bird, is not necessarily so simple. The real thing itself, an actual toucan bird, has characters and attributes. How do we ‘know’ this real thing? Bees, like many insects, may perceive different coloration for the toucan because they can see in the ultraviolet spectrum, while we do not. On the other hand, most mammals in the rainforest would also not perceive the reds and oranges of the toucan’s feathers, which we readily see. The ‘toucan’ object is thus perceived differently by bees, humans, and other animals. Beyond physical attributes, this actual toucan may be healthy, happy, or sad, nuances beyond our perception that only some fellow toucans may perceive. Though humans, through our ingenuity, may create devices or technologies that expand our standard sensory capabilities to make up for some of these perceptual gaps, our technology will never make our knowledge fully complete. Given limits to perceptions and the information we have on hand, we can never completely capture the nature of the dynamic object, the real toucan bird.

Things get murkier still when we try to convey to others what we mean by the ‘toucan’ bird. For example, when we inspect what might be a description of a toucan on Wikipedia, we see that the term more broadly represents the family of Ramphastidae, which contains five genera and forty different species. The picture we use to refer to ‘toucan’ may be, say, that of the keel-billed toucan (Ramphastos sulfuratus). However, if we view the images of a list of toucan species, we see just how physically divergent various toucans are from one another. Across all species, average sizes vary by more than a factor of three, with great variation in bill sizes, coloration, and range. Further, if I assert that the picture of the toucan is that of my pet keel-billed toucan, Pretty Bird, then we can also understand that this representation is for a specific individual bird, and not the keel-billed toucan species as a whole. The point is not a lesson on toucans, but an affirmation that distinctions between what we think we may be describing occur over multiple levels. The meaning of what we call a ‘toucan’ bird is not embodied in its label or even its name, but in the accompanying referential information that places the referent into context.

If, in our knowledge graph, we intend to convey all of these broader considerations, then we are best off defining ‘toucan’ as a class. On the other hand, if we are discussing the individual Pretty Bird toucan, or are describing ‘toucan’ and its average attributes in relation to a wider context of many types of other birds, including eagles and wrens, then perhaps treating ‘toucan’ as an instance is the better approach. Context and what we intend to convey are essential components of how we need to represent our knowledge. Whether something is an ‘instance’ or a ‘class’ is but the first of the distinctions we need to convey, and the choice may often vary by context.

Modeling Issues

Because these principles are universal, let’s shift our example to ‘truck’ [7]. In the English language, one of the ways we distinguish between an instance and a class is guided by the singular and plural (though English is notorious for its many different plural forms and exceptions). The attributes we assign to a term differ depending on whether we are discussing ‘trucks’, which we think about more in terms of transport purpose, brand, model, and model year; or a particular ‘truck’, which has a particular driver, engine, transmission, and mileage. Here is one way to look at such ‘truck’ distinctions (for this discussion, we’ll skip the ABox and TBox, another modeling topic that importantly uses description logics [8]):

Different Views of 'Truck'

To accommodate the twin views of class and individual, we could double the number of entities in our knowledge graphs by separately modeling single instances or plural classes, but that rapidly balloons the size of our graphs. What is more efficient is an approach that would enable us to combine both the organization of concepts and their relations and set members with the description and characterization of these concepts as things unto themselves. As our examples of ‘toucans’ and ‘trucks’ show, this dual treatment is a natural and common way to refer to things for most any domain of interest. Further, class and sub-class relationships enable us to construct tree-like hierarchies over which we can infer or inherit attributes and characteristics between parents and children.

For modeling purposes, we also want our graphs to be decidable, which importantly means we can reason over our knowledge graphs with an expectation that we can get definitive answers (even if the answer is “don’t know”) in a reasonable computation time. It is for these reasons that we have chosen the standard OWL 2 as the representation language for our knowledge graphs (in addition to other benefits [9]). A proper OWL 2 knowledge graph is decidable, and it handles both class and instance views using the metamodeling technique of “punning” [10]. Objects in OWL 2 are named with IRIs (Internationalized Resource Identifiers). The trick with “punning” is to evaluate the object based on how it is used contextually; the IRI is shared, but its referent may be viewed as either a class or an instance based on context. Any entity declared as a class and given an asserted object or data property is punned. Thus, objects used both as concepts (classes) and individuals (instances) are allowed, and standard OWL 2 reasoners may be used against them.
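The punning intuition can be sketched as follows. This is purely illustrative Python; the IRI and property names are hypothetical, and a real OWL 2 reasoner applies the direct model-theoretic semantics rather than anything like this:

```python
# Illustrative only: one shared IRI carries both a class view and an
# instance view. (Hypothetical IRI and values, not from KBpedia.)
TRUCK = "http://example.com/kb/Truck"

# Class view: Truck participates in the subsumption hierarchy
class_axioms = {
    TRUCK: {"rdfs:subClassOf": "http://example.com/kb/Vehicle"},
}

# Instance view: the same IRI carries an asserted data property,
# which is what triggers punning in OWL 2
instance_assertions = {
    TRUCK: {"ex:modelYear": 2019},
}

# The IRI is shared; which view applies depends on how it is used.
print(TRUCK in class_axioms and TRUCK in instance_assertions)  # True
```

A reasoner keeps the two views semantically distinct even though they share one name, which is what preserves decidability.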

Other Practical Issues

We’ve already discussed context, inference, and decidability, but I thought Igor Toujilov highlighted another important benefit in the mail thread of using class over instance declarations in a knowledge graph. The example he provided was based on drug development [11]:

However from my point of view (software engineering), many modern drugs are developed as a specialisation of existing drugs, i.e. by bringing new features to existing drugs. So, some new drug can be considered as a subclass of an existing drug. This is similar to object-orientated design in software: to bring new features, establish a subclass and implement it.

For example, methylphenidate can be considered as a superclass of Ritalin. If an earlier version of your ontology represents methylphenidate as an individual, then it would be difficult to represent Ritalin in later versions without breaking backward compatibility with existing interoperable applications.

This example shows that the preferable approach in ontology development is: use classes instead of individuals, if there is any chance you would need subclasses in the future.

Since knowledge is constantly dynamic and growing, it would seem prudent advice to allow for expansion of the things in your knowledge graph. Classes are the better choice in this instance (pun intended).

Like any language, there is a trade-off in OWL 2 between expressivity and reasoning efficiency [12]. Some prefer a less-constrained RDF and RDFS construct for their knowledge graphs. This approach allows virtually any statement to be asserted and is a least-common denominator for dealing with data encountered in the wild. However, one loses the punning and decidability advantages of OWL 2, and has a less-powerful framework for staging training sets and corpora for machine learning, another key motivation for our own knowledge graphs.

One could also choose a more powerful modeling language such as Datalog or Common Logic to gain the advantages of OWL 2, plus more. We have nothing critical to say about making such a choice. For our use cases, though, we do like the broader use and tools afforded by the use of OWL 2 and other W3C standards. Finding your own ‘sweet spot’ means understanding some of these knowledge representation trade-offs in context with your anticipated applications.

[2] Protégé user email list, ‘How to Relate Different Classes’, Nov 9, 2019.
[4] Bergman, M. K. Information, Knowledge, Representation. in A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (ed. Bergman, M. K.) 15–42 (Springer International Publishing, 2018). doi:10.1007/978-3-319-98092-8_2.
[5] Bergman, M. K. A KR Terminology. in A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce (ed. Bergman, M. K.) 129–149 (Springer International Publishing, 2018). doi:10.1007/978-3-319-98092-8_7.
[6] Peirce actually spelled it “semeiosis.” Other philosophers, such as Ferdinand de Saussure, employed the shorter term “semiosis”; I use this more common term due to its greater familiarity.
[7] Bergman, M. K. Metamodeling in Domain Ontologies. AI3:::Adaptive Information (2010).
[8] See, for example, my four-part series on description logics, beginning with Bergman, M. K. Making Linked Data Reasonable using Description Logics, Part 1, AI3:::Adaptive Information (2009).
[9] See Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter Patel-Schneider and Ulrike Sattler, 2008. “OWL2: The Next Step for OWL,” see; and also see the OWL 2 Quick Reference Guide by the W3C, which provides a brief guide to the constructs of OWL 2, noting the changes from OWL 1.
[10] “Punning” was introduced in OWL 2 and enables the same IRI to be used as a name for both a class and an individual. However, the direct model-theoretic semantics of OWL 2 DL accommodates this by understanding the class Truck and the individual Truck as two different views on the same IRI, i.e., they are interpreted semantically as if they were distinct. See further Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see
[12] OWL has historically been described as trying to find the proper tradeoff between expressive power and efficient reasoning support. See, for example, Grigoris Antoniou and Frank van Harmelen, 2003. “Web Ontology Language: OWL,” in S. Staab and R. Studer, eds., Handbook on Ontologies in Information Systems, Springer-Verlag, pp. 76-92. See

Posted by AI3's author, Mike Bergman Posted on December 15, 2019 at 11:59 pm in Ontology Best Practices, Peircean Principles | Comments (0)
Posted: December 4, 2019

Version 2.20 of the Knowledge Graph Now Prepped for Release on Public Repositories

Fred Giasson and I, as co-editors, are pleased to announce today the release of version 2.20 of the open-source KBpedia system. KBpedia is a knowledge graph that provides an overlay for interoperating and conducting machine learning across its constituent public knowledge bases of Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc. KBpedia contains more than 53,000 reference concepts and their mappings to these knowledge bases, structured into a logically consistent knowledge graph that may be reasoned over and manipulated. KBpedia acts as a computable scaffolding over these broad knowledge bases.

We are preparing to register KBpedia on many public repository sites, and we wanted to make sure quality was as high as possible as we begin this process. Since KBpedia is a system built from many constituent knowledge bases, duplicates and inconsistencies can arise when combining them. The rationale for this release was to conduct a comprehensive manual review to identify and remove most of these issues.

We made about 10,000 changes in this newest release. The major changes we made to KBpedia resulting from this inspection include:

  • Removal of about 2,000 reference concepts (RCs), and their mappings and definitions, pertaining to individual plant and animal species, which represented an imbalance in relation to the other generic RCs in the system;
  • Manual inspection and fixes to the 70 or so typologies (for instance, Animals or Facilities) that are used to cluster the RCs into logical groupings;
  • Removal of references to UMBEL, one of KBpedia’s earlier constituent knowledge bases, due to retirement of the UMBEL system;
  • Fixes due to user comments and suggestions since the prior release of version 2.10 in April 2019; and
  • Adding some select new RCs in order to improve the connectivity and fill gaps with the earlier version.

Without a doubt this is now the cleanest and highest quality release for the knowledge graph. We are now in position to extend the system to new mappings, which will be the focus of future releases. (Expect the next after the first of the year.) The number and structure of KBpedia’s typologies remain unchanged from prior versions. The number of RCs now stands at 53,465, smaller than the 55,301 reference concepts in the prior version.

Besides combining the six major public knowledge bases of Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc, KBpedia includes mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks. KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is sponsored by Cognonto Corporation.

The KBpedia Web site provides a working KBpedia explorer and a demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and splits its predicates (or relations) into attributes, external relations, and pointers or indexes, all informed by Charles Peirce’s prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files; KBpedia distributions provide the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record. Such reachthroughs are straightforward to construct.) See the GitHub site for further downloads. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted: October 16, 2019

AI3 Pulse

” . . . in the next two-three years all mid-range and high-end chipsets [on smartphones] will get enough power to run the vast majority of standard deep learning models developed by the research community and industry. This, in turn, will result in even more AI projects targeting mobile devices as the main platform for machine learning model deployment.”

The authors, most from leading smartphone providers, note AI is already used in selected smartphones:

“Among the most popular tasks are different computer vision problems like image classification, image enhancement, image super-resolution, bokeh simulation, object tracking, optical character recognition, face detection and recognition, augmented reality, etc. Another important group of tasks running on mobile devices is related to various NLP (Natural Language Processing) problems, such as natural language translation, sentence completion, sentence sentiment analysis, voice assistants and interactive chatbots. Additionally, many tasks deal with time series processing, e.g., human activity recognition, gesture recognition, sleep monitoring, adaptive power management, music tracking and classification.” (inline reference numbers removed)

Expect to see greater ubiquity and deeper applications.

Ignatov, A. et al. AI Benchmark: All About Deep Learning on Smartphones in 2019. arXiv:1910.06663 [cs] 1–19 (2019).

Posted by AI3's author, Mike Bergman Posted on October 16, 2019 at 10:59 am in Artificial Intelligence, Software Development | Comments (0)
Posted: September 11, 2019

Dynamic Apps with KGs and Ontologies: A Refinement of What We Call ODapps (Ontology-driven Applications)

In a recent article about knowledge graphs, I noted that I tend to use the KG term interchangeably with the term ‘ontology’. While this interchangeability is generally true when ontologies are used to model instance and class knowledge (in other words, for knowledge representation, or KR), it does overlook important cases when ontologies are themselves a specification for aspects such as access control, applications, or user interfaces. In these cases, the ontology is less related to knowledge and more related to specifications or control, and it is probably best to retain the distinction of an ontology from a knowledge graph (which I tend to think of as more oriented to content). I elaborate further on this distinction in this article.

What brought this distinction to mind was a recent post by Bob DuCharme on custom HTML forms to drive back-end SPARQL queries. The example Bob uses is getting a listing of cocktails from Wikidata given a specified ingredient. The example he provides uses Perl for a CGI (Common Gateway Interface) script. Bob has discussed generic SPARQL queries before; he features many useful Python examples in his excellent SPARQL book [1].

The basic idea is to provide values for variables entered via a Web page form to complete a patterned SPARQL query (SPARQL is the query language for RDF). The example Bob uses is to have the user enter a cocktail ingredient, which then returns all of the cocktails listed on Wikidata that contain that ingredient. The advantage of the idea is that users need know nothing about SPARQL or how to form a proper SPARQL query. By simply entering missing information on a Web form or making other Web form choices (such as picking from a list or a radio button), all of the heavy lifting is done by the patterned SPARQL script in the background. Letting the Web forms provide the values for SPARQL variables is the key to the method.
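A minimal sketch of this pattern, assuming Python on the backend (Bob's actual script uses Perl, and the Wikidata identifiers P31, P186, and Q134768 shown here are illustrative assumptions, not taken from his post):

```python
# Sketch: filling a patterned SPARQL query from a Web-form value.
# string.Template keeps the pattern fixed; only the variable changes.
from string import Template

QUERY_PATTERN = Template("""
SELECT ?cocktail ?cocktailLabel WHERE {
  ?cocktail wdt:P31 wd:Q134768 ;        # instance of: cocktail (assumed IDs)
            wdt:P186 ?ingredient .      # made-from property (assumed ID)
  ?ingredient rdfs:label "$ingredient"@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")

def build_query(form_value):
    """Minimally escape the user's input, then fill the pattern."""
    safe = form_value.replace('"', '\\"')   # naive escaping for the sketch
    return QUERY_PATTERN.substitute(ingredient=safe)

print(build_query("gin"))
```

A real deployment would validate and escape the form input more carefully before substitution, since the value originates from an untrusted user.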

We use this idea aggressively on, for example, our KBpedia Web site. By picking a search term from an auto-completed search listing [2] or picking a live link from that same page [3], we are able to re-use a fixed set of SPARQL query patterns to drive simple Web page templates. In our case, we use JavaScript to control the display and canvas and to invoke Clojure scripts that generate the SPARQL queries. (Over the years we have also used PHP and JavaScript directly to generate these queries. The point, as is made by DuCharme, is most any scripting language may be used for the backend.) You may inspect any of the sub-pages under the ‘Knowledge Graph’ section on the site by using ‘View Page Source’. Sample Clojure code is also available for inspection to see how we have implemented the approach [4].

Ontology-driven Apps

This basic idea of patterned SPARQL queries forms the baseline for what we have been calling data-driven applications [5] for more than 10 years, when we first began experimenting with the approach and using it in early customer engagements. And our embrace of the idea is not the first. For example, in 1998, more than a decade before our own efforts, Guarino [6] was already talking of “ontology-driven” information systems and the use of ontologies for user interfaces. A decade later, Uschold was still noting the same prospects, but his survey of advances to that point showed little actual development of ontologies “driving” applications [7].

It was roughly at that time that our own efforts began. One of our first realizations was that dynamic retrieval and presentation of data on a Web page only began the process. With the Web page as the medium of interaction, the idea of using interfaces to manage data became concrete. By organizing information into datasets and setting profiles for access and CRUD (create – read – update – delete) rights, an effective environment for data sharing and federation is established. We saw that we could abstract the complexity of the languages and specifications (SPARQL, RDF, and OWL) into the background, letting developers write the backend scripts, while letting the users and subject matter experts deal with updating, selecting and managing the content via Web front-ends.

Today, most approaches to semantic technologies and ontologies are still, unfortunately, rather static and fixed. Separate applications or IDEs are used to write and manage the ontologies. The ontologies are not generally subject to continuous upgrades and refinements, and end-users are the recipients, not the ‘drivers’ of the systems. But our early efforts showed how we could democratize this process, making updates and interactions dynamic.

With the embrace of CRUD, we also needed dynamic ways for changes made to the system — now codified and maintained in ontologies — to be reflected back to the user interfaces. We saw that a layer of specific Web services could both submit and query information to the ontology, and present those changes dynamically to scripts within the HTML user interfaces. (We also saw that access control to both data and applications needed to be imposed for enterprise uses, functions that can also be mediated by ontologies. Those topics are not discussed further here, but we have documented them elsewhere [8].) Because the user interface was becoming the medium of interaction, it was also apparent that we needed to expand our use of labels in the ontologies. Thus, besides standard SKOS properties like altLabel for node synonyms or prefLabel for preferred node labels, we also needed to accommodate labels for tooltips and labels that appear as titles or instructions on forms in user interfaces.

Once this Rubicon of dynamic interfaces driven by ontologies is crossed, many new opportunities come to the fore. One opportunity, based on the idea of patterned information, is that different information in the ontology may lend itself to different displays or visualizations. For example, all location information may be displayed on a map as points, regions, or paths. People and many objects may warrant displaying a picture, if available. Numeric values over similar dimensions may lend themselves to charting. Ordered or unordered lists may warrant a listing display or, when characterized by numeric values, pie charts or other chart types.

These realizations led us to create a series of display and visualization components, the invocation of which may be triggered by the datatypes coming back in the results set of a SPARQL query. The widget code for these display and visualization options may be served up by Web services in response to the characteristics in the results streams, in the same way that we serve up filtering, searching, browsing, import/export, or other functional widgets. In other words, the nature of the information in the ontology can inform what functions — including visualization — we can perform with a given results stream. (See, for example, any of the displays such as charts or maps for the Peg community indicator system built with our design for the United Way of Winnipeg.)
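The datatype-to-widget triggering can be sketched as a simple dispatch table. The widget names and datatype keys below are illustrative assumptions, not the actual component registry.

```python
# Map result-column datatypes to candidate display widgets.
WIDGET_FOR_DATATYPE = {
    "xsd:decimal": "chart",
    "xsd:date": "timeline",
    "geo:wktLiteral": "map",
    "xsd:anyURI": "link-list",
    "xsd:string": "text-list",
}

def widgets_for(results: dict) -> set[str]:
    """Inspect the datatypes in a results set and pick display widgets,
    falling back to a plain table for unrecognized datatypes."""
    return {WIDGET_FOR_DATATYPE.get(col["datatype"], "table")
            for col in results["columns"]}

results = {"columns": [
    {"name": "population", "datatype": "xsd:decimal"},
    {"name": "location", "datatype": "geo:wktLiteral"},
]}
print(sorted(widgets_for(results)))  # ['chart', 'map']
```

The dispatch table itself is just data, so it too could be maintained in the components ontology rather than in code.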

Another opportunity arises from the idea of a data record coming back in a results set. We see, for example, how the so-called ‘infoboxes’ in Wikipedia or on a Google search results page show us a suite of data attributes for a given entity. We see ‘people’ entities characterized by birth, death, parents, children, country of origin, occupation, and such. We see ‘automobile’ entities characterized by body type, brand, horsepower, year built, etc. These kinds of characterizations are patterned, too, and can begin to be organized into hierarchies and types.

Because of this patterned, structured nature of entity types, we can generalize our data display templates further. What if we detect our instance represents a camera but do not have a display template specific to cameras? Well, the ontology and simple inferencing can tell us that cameras are a form of digital or optical products, which more generally are part of a product concept, which more generally is a form of a human-made artifact, or similar. However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:

Digital Camera
SLR Digital Camera
Olympus Evolt E520

At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.

This design is meant to provide placeholders for any ‘thing’ in any domain, while also providing the latitude to tailor and customize every ‘thing’ in the domain. By tracing this inferencing chain from the specific to the more general, we can ‘fall back’ until a serviceable display template is discovered, even in the absence of a better, more specific one. Then, if we find we are trying to display information on cameras frequently, we need only take one of the more general, parent templates and modify it for the desired camera attributes. We also keep presentation separate from data, so that the styling and presentation mode of these templates is freely modifiable.
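This fall-back lookup can be sketched as a walk up the class hierarchy. The type chain and template registry below are illustrative assumptions, not the KBpedia class structure or an actual template store.

```python
# Illustrative subsumption chain, from specific to general.
SUPERCLASS = {
    "OlympusEvoltE520": "SLRDigitalCamera",
    "SLRDigitalCamera": "DigitalCamera",
    "DigitalCamera": "OpticalProduct",
    "OpticalProduct": "Product",
    "Product": "HumanMadeArtifact",
}

# Templates exist only at the more general levels in this sketch.
TEMPLATES = {
    "Product": "generic-product-template",
    "HumanMadeArtifact": "generic-artifact-template",
}

def find_template(cls: str) -> str:
    """Walk from the specific class toward the general until a template exists."""
    while cls is not None:
        if cls in TEMPLATES:
            return TEMPLATES[cls]
        cls = SUPERCLASS.get(cls)
    return "default-template"

print(find_template("OlympusEvoltE520"))  # generic-product-template
```

Adding a camera-specific entry to the registry would then be preferentially used, since the walk stops at the first (most specific) match.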

Coming to a Generalized Understanding

Within a couple of years of first working with this approach, we came to a more generalized understanding of what we call ‘ODapps’ [9]. We modularized the ontologies to separate the information (what is now called the ‘knowledge graph’) from the specifications of the semantic components. We also enhanced the label constructs in the knowledge graph to handle user interface labels and related constructs. I have slightly updated the workflow we showed for this process back in 2011:

Dynamic Apps Animation


The basic process begins when the user interacts with the various semantic components embedded in the layout of the Web page. Each such interaction generates new queries (most often SPARQL queries) in the background to the various Web services endpoints, which are specific to either management or display functions. The first consequence of a query is to generate a results set of data from the knowledge graph. At the same time, the datatypes of the results inform a components ontology that produces a schema useful to the display widgets. This schema constitutes the formal instructions to the semantic components on the Web page; when it is combined with the results set data, the new instructions for the semantic components are complete.
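A minimal sketch of what such a schema might contain follows; the field names and widget bindings are illustrative assumptions, not the actual specification emitted by the components ontology.

```python
import json

# Hypothetical instructions from the components ontology to the page's
# semantic components: which widgets to invoke and what data they bind to.
schema = {
    "resultset": "sparql-query-42",        # ties instructions to a results set
    "components": [
        {"widget": "map", "bind": "location", "display": "points"},
        {"widget": "bar-chart", "bind": "population", "x": "year"},
    ],
}
print(json.dumps(schema, indent=2))
```

Paired with the results set data, a schema like this is all a generic widget layer needs to render the page.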

These instructions are then presented to the various semantic components and determine which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. As new user interactions occur with the resulting displays and components, a new cycle of queries and results sets begins. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or for invocation within dashboards. In this way, the user’s interaction choices may be recorded for later automatic playback.

A New Dynamic Paradigm for User Apps

ODapps are thus a balanced abstraction within the framework of canonical architectures, data models and data structures. Under this design, software developer time is focused on creating the patterned scripts that underlie the Web page layouts, developing the semantic component widgets, and writing the functional Web services. Users and subject matter experts can concentrate on doing analysis and keeping the ontologies and knowledge graph accurate and up-to-date. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures.

This new paradigm began with the simple premise, demonstrated by Bob DuCharme, that we can use SPARQL queries driven by users in a Web page form to return relevant information to the user. We have taken that premise and — over the past nearly ten years — expanded it into a more generalized approach to ontology-driven apps, or ODapps. We have also continued to discuss how we may modularize our ontology architectures for a breadth of enterprise purposes [10].

Yet, while we have prototyped these capabilities and have demonstrated them within our own customer engagements, this general approach is by no means common.

Perhaps now, with the resurgent interest in knowledge graphs, we can finally see our way clear to a suite of semantic approaches that promise a revolution in software design practices and the democratization of information technologies. Through the ODapp approach, we believe that customers can see:

  • Reduced development times — producing software artifacts that are closer to how we think, combined with reuse and automation that enables applications to be developed more quickly
  • Re-use — abstract/general notions can be used to instantiate more concrete/specific notions, allowing more reuse
  • Increased reliability — formal constructs with automation reduces human error
  • Decreased maintenance costs — increased reliability and the use of automation to convert models to executable code reduces errors. A formal link between the models and the code makes software easier to comprehend and thus maintain.

As I have noted before, these first four items are similar to the benefits that may accrue from other advanced software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold [7] also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:

  • Reduced conceptual gap — application developers can interact with the tools in a way that is closer to their thinking
  • Facilitate automation — formal structures are amenable to automated reasoning, reducing the load on the human, and
  • Agility/flexibility — ontology-driven information systems are more flexible, because you can more easily and reliably make changes in the model than in code.

So, as practiced today, most uses of ontologies are for knowledge representation, and in that sense we may use the terms ‘knowledge graph’ and ‘ontology’ more-or-less interchangeably. However, taken to its logical extent and embraced for driving software specifications, we see the term ‘ontology’ as much more general and powerful. As I have said before, the meaning of these terms is intimately related to their context of use.

[1] Bob DuCharme, Learning SPARQL: Querying and Updating with SPARQL 1.1, Second Edition, 2013, O’Reilly Media, 386 pp.
[2] From this URI, for example, begin typing into the upper right search box and then pick one of the suggested auto-completion terms.
[3] For example, pick the ‘amniote’ link from the lower left Broader Concepts text box.
[4] For an example of JS code calling the Clojure routines, look for the Clojure call noted ‘nb-entities’. You can see the actual Clojure routines under this same name in the sample file. (This sample file contains other functions as well, for example to clean up input strings. Also note that most Clojure code used by the system is not available for inspection.)
[5] Our series on this topic began with the article, M.K. Bergman, “Concepts and an Introduction to the Occasional Series on ‘Ontology Best Practices for Data-driven Applications’,” AI3:::Adaptive Information blog, May 12, 2009, and continued with a more detailed discussion in M.K. Bergman, “Ontologies as the ‘Engine’ for Data-Driven Applications,” AI3:::Adaptive Information blog, June 10, 2009. The latter article introduced the ideas of data-driven displays and user interfaces based on ontologies specifically enhanced to include those specifications.
[6] Nicola Guarino, “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998, Amsterdam: IOS Press, pp. 3-15.
[7] Michael Uschold, “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, 2008, pp. 3-20.
[8] M.K. Bergman, “structWSF: A Framework for Collaboration Networks,” AI3:::Adaptive Information blog, July 7, 2009.
[9] M.K. Bergman, “Ontology-Driven Apps Using Generic Applications,” AI3:::Adaptive Information blog, March 7, 2011.
[10] M.K. Bergman, “An Ontologies Architecture for Ontology-driven Apps,” AI3:::Adaptive Information blog, December 5, 2011.

Posted by AI3's author, Mike Bergman, on September 11, 2019 at 6:58 am in Adaptive Innovation, Ontologies, Software Development