Posted: July 13, 2020

KBpedia: New KBpedia ID Property and Mappings Added to Wikidata

Wikidata editors last week approved adding a new KBpedia ID property (P8408) to their system, and we have therefore followed up by adding nearly 40,000 mappings to the Wikidata knowledge base. Another 5,000 to 6,000 mappings are forthcoming and will be added in the coming weeks. Thereafter, we will continue to increase the cross-links, as we partially document below.

This milestone is one I have had in my sights for at least the past few years. We want both to: 1) provide a computable overlay to Wikidata; and 2) increase our use of, and reliance on, Wikidata’s unparalleled structured data resources as we move KBpedia forward. Below I give a brief overview of the status of Wikidata, share some high-level views of our experience in proposing and then mapping a new Wikidata property, and conclude with some thoughts on where we might go next.

The Status of Wikidata

Wikidata is the structured data initiative of the Wikimedia Foundation, the nonprofit organization that oversees Wikipedia and many other notable Web-wide information resources. Since its founding in 2012, Wikidata’s use and prominence have exceeded expectations. Today, Wikidata is a multi-lingual resource with structured data for more than 95 million items, characterized by nearly 10,000 properties. Items are being added to Wikidata at a rate of nearly 5 million per month. A rich ecosystem of human editors and bots patrols the knowledge base and its entries to enforce data quality and consistency. The ecosystem includes tools for bulk loading of data with error checks, search including structured SPARQL queries, and navigation and visualization. Errors and mistakes in the data occur, but the system ensures such problems are removed or corrected as they are discovered. Thus, as the data have grown, quality and usefulness have improved as well.

From KBpedia’s standpoint, Wikidata represents the most complete complementary instance data and characterization resource available. As such, it is the driving wheel and stalking horse (to mix eras and technologies) to guide where and how we need to incorporate data and its types. These have been the overall motivators for us to embrace a closer relationship with Wikidata.

As an open governance system, Wikidata has adopted its own data models, policies, and approval and ingest procedures for adopting new data or characterizations (properties). You might find it interesting to review the process and ongoing dialog that accompanied our proposal for a KBpedia ID as a property in Wikidata. As of one week ago, KBpedia ID was assigned Wikidata property P8408. To date, more than 60% of Wikidata properties are such external identifiers, and IDs are the fastest-growing category of properties. Since most properties that relate to internal entity characteristics have already been identified and adopted, we anticipate mappings to external systems will continue to be a dominant feature of the growth in Wikidata properties to come.

Our Mapping Experience

There are many individuals who spend considerable time monitoring and overseeing Wikidata. I am not one of them. I had never before proposed a new property to Wikidata, and had proposed only one actual Q item (Q is the standard prefix for an entity or concept in Wikidata) for KBpedia prior to proposing our new property.

Like much else in the Wikimedia ecosystem, specific templates are in place for proposing a new Q item or a new property (see the examples of external identifiers here). Since there are about 10,000 times more Q items than properties, the path for getting a new property approved is more stringent.

Then, once a new property is granted, specific tools like QuickStatements need to be used to submit new data items (Q ids) or characteristics (properties by Q id). I made some newbie mistakes in my first bulk submissions, and fortunately a knowledgeable administrator (@Epidosis) helped guide me through making the fixes. For example, we had to back off about 10,000 updates because I had used the wrong form for referencing a claim. Once corrected, we were able to upload the mappings again.

As one might imagine, updates and changes are being submitted by multiple human agents and (some) bots at all times into the system. The facilities like QuickStatements are designed to enable batch uploads, and allow re-submissions due to errors. You might want to see what is currently active on the system by checking out this current status.

With multiple inputs and submitters, it takes time for large claim sets to be uploaded. In the case of our 40,000 mappings, we also accompanied each with a source and update-date characterization, leading to a total upload of more than 120,000 claims. We split our submissions into multiple batches, and then re-submitted if initial claims errored out (for example, if a base claim had not yet been fully registered, its subsidiary claims might fail for lack of a registered subject; on a second pass, the subject would be present and the claims would load without error). We ran our batches at off-peak times for both Europe and North America, but the runs still took about 12 hours in total.
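For those unfamiliar with QuickStatements, each batch line is a tab-separated claim: item, property, value, optionally followed by S-prefixed source pairs. Here is a hedged sketch of the kind of line we submitted (the Q item, ID value, URL, and date are illustrative, not our actual data); P854 is the reference URL property and P813 is the retrieved date:

```
Q5	P8408	"Person"	S854	"https://kbpedia.org/"	S813	+2020-07-09T00:00:00Z/11
```

Getting the source form right in these trailing S-pairs is exactly where my earlier claim-referencing mistake arose.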

Once loaded, the internal quality controls of Wikidata kick in. Both bots and human editors monitor concepts, and both can flag (and revert) the mapping assignments made. After three days of being active on Wikidata, we had a dozen reverts of initially uploaded mappings, representing about 0.03% of our suggested mappings, which is gratifyingly low. Still, we expect to hear of more such errors, and we are committed to fixing all that are identified. But, at this juncture, it appears our initial mappings were of quite high quality.

We had a rewarding learning experience in uploading mappings to Wikidata and found much good will and assistance from knowledgeable members. Undoubtedly, everything should be checked in advance to ensure quality assertions when preparing uploads to Wikidata. But, that done, the system and its editors also appear quite capable of identifying and enforcing quality controls and constraints as issues are encountered. Overall, I found the entire data upload process impressive and rewarding. I am quite optimistic that this ecosystem will continue to improve.

The result of our external ID uploads and mappings can be seen in these SPARQL queries regarding the KBpedia ID property on Wikidata:
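A representative query of this kind, runnable at query.wikidata.org, simply counts the Wikidata items that now carry a KBpedia ID:

```sparql
# Count all Wikidata items carrying a KBpedia ID (P8408)
SELECT (COUNT(DISTINCT ?item) AS ?mappings)
WHERE {
  ?item wdt:P8408 ?kbpediaId .
}
```

Variants of the same pattern can list the mapped items themselves or join the KBpedia ID with other external identifiers.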

As of this writing, the KBpedia ID is now about the 500th most prevalent property on Wikidata.

What is Next?

Wikidata is clearly a dynamic data environment. Not only are new items being added by the millions, but existing items are being better characterized and related to external sources. To deal with the immense scales involved requires automated quality checking bots with human editors committed to the data integrity of their domains and items. To engage in a large-scale mapping such as KBpedia also requires a commitment to the Wikidata ecosystem and model.

Initiatives that appear immediately relevant to what we have put in place relating to Wikidata include the following:

  • Extend the current direct KBpedia mappings to fix initial mis-assignments and to extend coverage to remaining sections of KBpedia
  • Add additional cross-mappings that exist in KBpedia but have not yet been asserted in Wikidata (for example, there are nearly 6,000 such UNSPSC IDs)
  • Add equivalent class (P1709) and possible superproperties (P2235) and subproperties (P2236) already defined in KBpedia
  • Where useful mappings are desirable, add missing Q items used in KBpedia to Wikidata
  • And, most generally, also extend mappings to the 5,000 or so shared properties between Wikidata and KBpedia.

I have been impressed as a user of Wikidata for some years now. This most recent experience also makes me enthusiastic about contributing data and data characterizations directly.

To Learn More

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and, on the other, predicates grouped into attributes, external relations, and pointers or indexes, all informed by Charles Peirce’s writings related to knowledge representation. KBpedia was first released in October 2016 with some open-source aspects, and was made fully open in 2018. KBpedia is partially sponsored by Cognonto Corporation. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted: June 15, 2020

KBpedia: New Version Finally Meets the Hurdle of Initial Vision

I am pleased to announce that we released a powerful new version of KBpedia today with e-commerce and logistics capabilities, as well as significant other refinements. The enhancement comes from adding the United Nations Standard Products and Services Code as KBpedia’s seventh core knowledge base. UNSPSC is a comprehensive and logically organized taxonomy for products and services, organized into four levels, with third-party crosswalks to economic and demographic data sources. It is a leading standard for many industrial, e-commerce, and logistics applications.

This was a heavy lift for us. Given the time and effort involved, Fred Giasson, KBpedia’s co-editor, and I decided to also tackle a host of other refinements we had on our plate. All told, we devoted many thousands of person-hours and more than 200 complete builds from scratch to bring this new version to fruition. I can proudly say that this version finally meets the starting vision we had when we first began KBpedia’s development. It is a solid baseline to build from for all sorts of applications and to make broad outreach for adoption in 2020. Because of the extent of changes in this new version, we have leapfrogged KBpedia’s version numbering from 2.21 to 2.50.

KBpedia is a knowledge graph that provides a computable overlay for interoperating and conducting machine learning across its constituent public knowledge bases of Wikipedia, Wikidata, GeoNames, DBpedia, schema.org, OpenCyc, and, now, UNSPSC. KBpedia now contains more than 58,000 reference concepts and their mappings to these knowledge bases, structured into a logically consistent knowledge graph that may be reasoned over and manipulated. KBpedia acts as a computable scaffolding over these broad knowledge bases with the twin goals of data interoperability and knowledge-based artificial intelligence (KBAI).

KBpedia is built from an expandable set of simple text ‘triples’ files, specified as tuples of subject-predicate-object (EAVs to some, such as Kingsley Idehen) that enable the entire knowledge graph to be constructed from scratch. This process enables many syntax and logical tests, especially consistency, coherency, and satisfiability, to be invoked at build time. A build may take from one to a few hours on a commodity workstation, depending on the tests. The build process outputs validated ontology (knowledge graph) files in the standard W3C OWL 2 semantic language and mappings to individual instances in the contributing knowledge bases.
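As a simplified sketch of one such build-time test (not the actual KBpedia build code, and using hypothetical class names), redundant subsumption assertions, where a concept is asserted under both a parent and that parent's ancestor, can be caught directly from the triples:

```python
def redundant_subclass_assertions(triples):
    """Flag rdfs:subClassOf assertions already implied by transitivity,
    e.g. A -> C when A -> B and B -> C are also asserted."""
    direct = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            direct.setdefault(s, set()).add(o)

    def ancestors(cls, seen=None):
        # Collect everything reachable upward from cls via asserted parents
        seen = set() if seen is None else seen
        for parent in direct.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    redundant = []
    for s, parents in direct.items():
        for o in sorted(parents):
            # o is redundant if reachable via some *other* asserted parent
            if any(o in ancestors(p) for p in parents - {o}):
                redundant.append((s, o))
    return redundant
```

For example, asserting both A under B and A under C, when B is already under C, would flag the direct A-to-C assertion as removable.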

As Fred notes, we continue to streamline and improve our build procedures. Major changes like what we have just gone through, be it adding a main source like UNSPSC or swapping out or adding a new SuperType (or typology), often require multiple build iterations to pass the system’s consistency and satisfiability checks. We need these build processes to be as easy and efficient as possible, which also was a focus of our latest efforts. One of our next major objectives is to release KBpedia’s build and maintenance code, perhaps including a Python option.

Incorporation of UNSPSC

Though UNSPSC is consistent with KBpedia’s existing three-sector economic model (raw products, manufactured products, services), adding it did require structural changes throughout the system. With more than 150,000 listed products and services in UNSPSC, incorporating it needed to be balanced against KBpedia’s existing generality and scope. Our approach was to include 100% of the top three levels of UNSPSC — segments, families, and classes — plus the more common and expected product and service ‘commodities’ in its fourth level. This design maintains balance while providing a framework to tie in any remaining UNSPSC commodities of interest to specific domains or industries. The approach led to integrating 56 segments, 412 families, 3700+ classes, and 2400+ commodities into KBpedia. Since some 1300 of these additions overlapped with existing KBpedia reference concepts, we checked, consolidated, and reconciled all duplicates.
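UNSPSC encodes these four levels positionally in its eight-digit codes: two digits each for segment, family, class, and commodity, with trailing zeros filling the unused positions. A small sketch of splitting a code into its levels (the example code in the usage note is illustrative):

```python
def unspsc_levels(code: str) -> dict:
    """Split an 8-digit UNSPSC code into its four positional levels.
    Each broader level zero-pads the remaining digits, per UNSPSC convention."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("UNSPSC codes are exactly eight digits")
    return {
        "segment":   code[:2] + "000000",
        "family":    code[:4] + "0000",
        "class":     code[:6] + "00",
        "commodity": code,
    }
```

For instance, `unspsc_levels("43211508")` places that commodity under class 43211500, family 43210000, and segment 43000000, which is exactly the containment we exploited when deciding which levels to carry into KBpedia.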

We fully specified and integrated all added reference concepts (RCs) into the existing KBpedia structure, and then mapped these new RCs to all seven of KBpedia’s core knowledge bases. Through this process, for example, we are able to greatly expand the coverage of UNSPSC items on Wikidata from 1000 or so Q (entity) identifiers to more than 6500. Contributing such mappings back to the community is another effort our KBpedia project will undertake next.

Lastly with respect to UNSPSC, I will be providing a separate article on why we selected it as KBpedia’s products and services template, how we did the integration, and what we found along the way. For now, the quick point is that UNSPSC is well-structured and organized according to the three-sector model of the economy, which matches well with Peirce’s three universal categories underlying our design of KBpedia.

Other Major Refinements

These changes were broad in scope. Effecting them took time and broke open core structures. Opportunities to rebuild the structure in cleaner ways arise when the Tinkertoys get scattered and then re-assembled. Some of the other major refinements the project undertook during the builds necessary to create this version were to:

  • Further analyze and refine the disjointedness between KBpedia’s 70 or so typologies. Disjoint assertions are a key mechanism for subset selections, various machine learning tasks, querying, and reasoning
  • Increase the number of disjointedness assertions 62% over the prior version, resulting in better modularity. (However, note that the number of RCs affected by these improvements is lower than this percentage suggests, since many were already specified in prior disjoint pools)
  • Add 37% more external mappings to the system (DBpedia and UNSPSC, principally)
  • Complete 100% of the definitions for RCs across KBpedia
  • Greatly expand the altLabel entries for thousands of RCs
  • Improve the naming consistency across RC identifiers
  • Further clean the structure to ensure that a given RC is specified only once to its proper parent in an inheritance (subsumption) chain, which removes redundant assertions and improves maintainability, readability, and inference efficiency
  • Expand and update the explanations within the demo of the upper KBpedia Knowledge Ontology (KKO) (see kko-demo.n3). This non-working ontology included in the distro makes it easier to relate the KKO upper structure to the universal categories of Charles Sanders Peirce, which provides the basic organizational framework for KKO and KBpedia, and
  • Integrate the mapping properties for core knowledge bases within KBpedia’s formal ontology (as opposed to only offering as separate mapping files); see kbpedia-reference-concepts-mappings.n3 in the distro.

Current Status of the Knowledge Graph

The result of these structural and scope changes was to add about 6,000 new reference concepts to KBpedia and then remove the duplicates, resulting in a total of more than 58,200 RCs in the system. This has increased KBpedia’s size by about 9% over the prior release. KBpedia is now structured into about 73 mostly disjoint typologies under the scaffolding of the KKO upper ontology. KBpedia has fully vetted, unique mappings (nearly all one-to-one) to these key sources:

  • Wikipedia – 53,323 (including some categories)
  • DBpedia – 44,476
  • Wikidata – 43,766
  • OpenCyc – 31,154
  • UNSPSC – 6,553
  • schema.org – 842
  • DBpedia ontology – 764
  • GeoNames – 680
  • Extended vocabularies – 249.

The mappings to Wikidata alone link to more than 40 million unique Q instance identifiers. These mappings may be found in the KBpedia distro. Most of the class mappings are owl:equivalentClass, but a minority may be subClass, superClass, or isAbout predicates as well.
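In Turtle form, such mappings look roughly like the following (the namespace, RC names, and Q identifiers here are illustrative; the authoritative mappings are in the distro files):

```turtle
@prefix kbpedia: <http://kbpedia.org/kko/rc/> .
@prefix wd:      <http://www.wikidata.org/entity/> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .

# Most class mappings are exact equivalences ...
kbpedia:Camera        owl:equivalentClass  wd:Q15328 .

# ... but a minority are looser subclass (or aboutness) relations
kbpedia:DigitalCamera rdfs:subClassOf      wd:Q15328 .
```

The choice of predicate records how tight the correspondence is, which matters when reasoning across the combined sources.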

KBpedia also includes about 5,000 properties, organized into a multi-level hierarchy of attributes, external relations, and representations, most derived from Wikidata and schema.org. Exploiting these properties and sub-properties is also one of the next priorities for KBpedia.

To Learn More

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and, on the other, predicates grouped into attributes, external relations, and pointers or indexes, all informed by Charles Peirce’s prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files; KBpedia distributions provide the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record. Such reach-throughs are straightforward to construct.) See the GitHub site for further downloads.

KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is partially sponsored by Cognonto Corporation. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted: December 4, 2019

KBpedia: Version 2.20 of the Knowledge Graph Now Prepped for Release on Public Repositories

Fred Giasson and I, as co-editors, are pleased to announce today the release of version 2.20 of the open-source KBpedia system. KBpedia is a knowledge graph that provides an overlay for interoperating and conducting machine learning across its constituent public knowledge bases of Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc. KBpedia contains more than 53,000 reference concepts and their mappings to these knowledge bases, structured into a logically consistent knowledge graph that may be reasoned over and manipulated. KBpedia acts as a computable scaffolding over these broad knowledge bases.

We are preparing to register KBpedia on many public repository sites, and we wanted to make sure quality was as high as possible as we begin this process. Since KBpedia is a system built from many constituent knowledge bases, duplicates and inconsistencies can arise when combining them. The rationale for this release was to conduct a comprehensive manual review to identify and remove most of these issues.

We made about 10,000 changes in this newest release. The major changes we made to KBpedia resulting from this inspection include:

  • Removal of about 2,000 reference concepts (RCs) and their mappings and definitions pertaining to individual plant and animal species, which represented an imbalance relative to the other generic RCs in the system;
  • Manual inspection and fixes to the 70 or so typologies (for instance, Animals or Facilities) that are used to cluster the RCs into logical groupings;
  • Removal of references to UMBEL, one of KBpedia’s earlier constituent knowledge bases, due to retirement of the UMBEL system;
  • Fixes due to user comments and suggestions since the prior release of version 2.10 in April 2019; and
  • Adding some select new RCs in order to improve connectivity and fill gaps in the earlier version.

Without a doubt this is now the cleanest and highest quality release for the knowledge graph. We are now in position to extend the system to new mappings, which will be the focus of future releases. (Expect the next after the first of the year.) The number and structure of KBpedia’s typologies remain unchanged from prior versions. The number of RCs now stands at 53,465, smaller than the 55,301 reference concepts in the prior version.

Besides combining the six major public knowledge bases of Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc, KBpedia includes mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks. KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is sponsored by Cognonto Corporation.

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and, on the other, predicates (or relations) grouped into attributes, external relations, and pointers or indexes, all informed by Charles Peirce’s prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files; KBpedia distributions provide the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record. Such reach-throughs are straightforward to construct.) See the GitHub site for further downloads. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted: September 11, 2019

Dynamic Apps with KGs and Ontologies: A Refinement of What We Call ODapps (Ontology-driven Applications)

In a recent article about knowledge graphs I noted that I tend to use the KG term interchangeably with the term ‘ontology‘. While this interchangeability is generally true when ontologies are used to model instance and class knowledge, in other words for knowledge representation (KR), it does overlook important cases when ontologies are themselves a specification for aspects such as access control, applications, or user interfaces. In these cases the ontology is less related to knowledge and more related to specifications or control. In such cases it is probably best to retain the distinction of an ontology from a knowledge graph (which I tend to think of as more oriented to content). I elaborate further on this distinction in this article.

What brought this distinction to mind was a recent post by Bob DuCharme on custom HTML forms to drive back-end SPARQL queries. The example Bob uses is getting a listing of cocktails from Wikidata given a specified ingredient. The example he provides uses Perl for a CGI (Common Gateway Interface) script. Bob has discussed generic SPARQL queries before; he features many useful Python examples in his excellent SPARQL book [1].

The basic idea is to provide values for variables entered via a Web page form to complete a patterned SPARQL query (SPARQL is the query language for RDF). The example Bob uses is to have the user enter a cocktail ingredient, which then returns all of the cocktails listed on Wikidata that contain that ingredient. The advantage of the idea is that users need know nothing about SPARQL or how to form a proper SPARQL query. By simply entering missing information on a Web form or making other Web form choices (such as picking from a list or a radio button), all of the heavy lifting is done by the patterned SPARQL script in the background. Letting the Web forms provide the values for SPARQL variables is the key to the method.
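A minimal sketch of the pattern in Python (the identifiers Q134768 for ‘cocktail’ and P527 for ‘has part’ are my assumptions for illustration, the escaping is deliberately crude, and Bob’s actual script differs):

```python
# Patterned SPARQL query; {label} is filled in from the Web form value.
QUERY_TEMPLATE = """SELECT ?cocktail ?cocktailLabel WHERE {{
  ?cocktail wdt:P31 wd:Q134768 ;      # instance of: cocktail (assumed Q id)
            wdt:P527 ?ingredient .    # has part: the chosen ingredient
  ?ingredient rdfs:label "{label}"@en .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""

def build_cocktail_query(ingredient: str) -> str:
    """Fill the patterned SPARQL query with a form-supplied ingredient label."""
    # Crude guard against breaking out of the string literal (SPARQL injection)
    if '"' in ingredient or "\\" in ingredient:
        raise ValueError("quotes and backslashes are not allowed in labels")
    return QUERY_TEMPLATE.format(label=ingredient)
```

The CGI script then posts the resulting string to the Wikidata query endpoint and renders the bindings as an HTML table; the user only ever sees the form.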

We use this idea aggressively on, for example, our KBpedia Web site. By picking a search term from an auto-completed search listing [2] or picking a live link from that same page [3], we are able to re-use a fixed set of SPARQL query patterns to drive simple Web page templates. In our case, we use JavaScript to control the display and canvas and to invoke Clojure scripts that generate the SPARQL queries. (Over the years we have also used PHP and JavaScript directly to generate these queries. The point, as is made by DuCharme, is most any scripting language may be used for the backend.) You may inspect any of the sub-pages under the ‘Knowledge Graph‘ section on the site by using ‘View Page Source’. Sample Clojure code is also available for inspection to see how we have implemented the approach [4].

Ontology-driven Apps

This basic idea of patterned SPARQL queries forms the baseline for what we have been calling data-driven applications [5] for more than 10 years, dating to when we first began experimenting with the approach and using it in early customer engagements. And we were not the first to embrace the idea. For example, in 1998, more than a decade before our own efforts, Guarino [6] was already talking of “ontology-driven” information systems and the use of ontologies for user interfaces. Still, ten years later, though Uschold noted the same prospects, his survey of advances to that point showed little actual development of ontologies “driving” applications [7].

It was roughly at that time that our own efforts began. One of our first realizations was that dynamic retrieval and presentation of data on a Web page only began the process. With the Web page as the medium of interaction, the idea of using interfaces to manage data became concrete. By organizing information into datasets and setting profiles for access and CRUD (create – read – update – delete) rights, an effective environment for data sharing and federation is established. We saw that we could abstract the complexity of the languages and specifications (SPARQL, RDF, and OWL) into the background, letting developers write the backend scripts, while letting the users and subject matter experts deal with updating, selecting and managing the content via Web front-ends.

Today, most approaches to semantic technologies and ontologies are still, unfortunately, rather static and fixed. Separate applications or IDEs are used to write and manage the ontologies. The ontologies are not generally subject to continuous upgrades and refinements, and end-users are the recipients, not the ‘drivers’ of the systems. But our early efforts showed how we could democratize this process, making updates and interactions dynamic.

With the embrace of CRUD, we also needed dynamic ways for changes made to the system — now codified and maintained in ontologies — to be reflected back to the user interfaces. We saw that a layer of specific Web services could both submit and query information to the ontology, and present those changes dynamically to scripts within the HTML user interfaces. (We also saw that access control to both data and applications needed to be imposed for enterprise uses, functions that can also be mediated by ontologies. Those topics are not discussed further here, but we have documented them elsewhere [8].) Because the user interface was becoming the medium of interaction, it was also apparent that we needed to expand our use of labels in the ontologies. Thus, besides standard SKOS concepts like altLabels for node synonyms or prefLabels for preferred node labels, we also needed to accommodate labels for tooltips and labels that appear as titles or instructions on forms in user interfaces.
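A sketch of such an extended label set in Turtle (the ui: annotation properties are hypothetical stand-ins for the kinds of properties we added, not a standard vocabulary):

```turtle
@prefix :     <http://example.org/kg#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ui:   <http://example.org/ui#> .   # hypothetical UI annotation vocabulary

:Camera
    skos:prefLabel "Camera"@en ;                                     # preferred node label
    skos:altLabel  "Photographic camera"@en , "Image recorder"@en ;  # synonyms
    ui:tooltip     "Devices that capture still or moving images"@en ;
    ui:formTitle   "Select a camera type"@en .
```

Keeping these UI strings in the ontology means a label change propagates to every form and tooltip without touching the application code.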

Once this Rubicon of dynamic interfaces driven by ontologies is crossed, many new opportunities come to the fore. One opportunity, based on the idea of patterned information, is that different information in the ontology may lend itself to different display or visualization. For example, all location information may be displayed on a map as points, regions, or paths. Or, people and many objects may warrant displaying a picture if available. Or, numeric values over similar dimensions may lend themselves to charting. Or, ordered or unordered lists may warrant a listing display, or, when characterized by numeric values, pie charts or other chart types.

These realizations led us to create a series of display and visualization components, the invoking of which may be triggered by the datatypes coming back in a results set initiated by a SPARQL query. The widget code for these display and visualization options may be served up by Web services in response to the characteristics in the results streams, much as we serve up filtering, searching, browsing, import/export, or other functional widgets. In other words, the nature of the information in the ontology can inform what functions — including visualization — we can perform with a given results stream. (See, for example, any of the displays such as charts or maps for the Peg community indicator system built with our design for the United Way of Winnipeg.)

Another opportunity arises from the idea of a data record coming back in a results set. We see, for example, how the so-called ‘infoboxes’ in Wikipedia or on a Google search results page show us a suite of data attributes for a given entity. We see ‘people’ entities characterized by birth, death, parents, children, country of origin, occupation, and such. We see ‘automobile’ entities characterized by body type, brand, horsepower, year built, etc. These kinds of characterizations are patterned, too, and can begin to be organized into hierarchies and types.

Because of this patterned, structured nature of entity types, we can generalize our data display templates further. What if we detect our instance represents a camera but do not have a display template specific to cameras? Well, the ontology and simple inferencing can tell us that cameras are a form of digital or optical products, which more generally are part of a product concept, which more generally is a form of a human-made artifact, or similar. However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:

Thing
Product
Camera
Digital Camera
SLR Digital Camera
Olympus Evolt E520

At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.

This design is meant to provide placeholders for any ‘thing’ in any domain, while also providing the latitude to tailor and customize to every ‘thing’ in the domain. By tracing this inferencing chain from the specific to the more general we can ‘fall back’ until a somewhat OK display template is discovered, even in the absence of a better and more specific one. Then, if we find we are frequently trying to display information on cameras, we need only take one of the more general, parent templates and specifically modify it for the desired camera attributes. We also keep presentation separate from data so that the styling and presentation mode of these templates is also freely modifiable.
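A sketch of this fallback lookup (the hierarchy and template names are illustrative):

```python
# Illustrative subsumption chain (child -> direct parent)
PARENTS = {
    "OlympusEvoltE520": "SLRDigitalCamera",
    "SLRDigitalCamera": "DigitalCamera",
    "DigitalCamera":    "Camera",
    "Camera":           "Product",
    "Product":          "Thing",
}

# Only some classes have registered display templates; 'Thing' is the catch-all
TEMPLATES = {"Product": "product.html", "Thing": "thing.html"}

def find_template(cls: str) -> str:
    """Walk up the inference chain until the most specific template is found."""
    while cls not in TEMPLATES:
        cls = PARENTS[cls]  # raises KeyError if we run past the root
    return TEMPLATES[cls]
```

With only the generic templates registered, an Olympus camera falls back to the product template; registering a more specific camera template later is automatically preferred, since the walk stops at the first match.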

Coming to a Generalized Understanding

Within a couple of years of first working with this approach we came to a more generalized understanding of what we call ‘ODapps’ [9]. We modularized the ontologies to separate the information (what is now called the ‘knowledge graph’) from the specifications of the semantic components. We also enhanced the label constructs in the KG to handle user interface labels and related needs. I have slightly updated the workflow we showed for this process back in 2011:

Dynamic Apps Animation


The basic process begins when the user interacts with the various semantic components embedded in the layout of the Web page. Each such interaction generates new queries (most often SPARQL queries) in the background to the various Web service endpoints, which are specific to either management or display functions. The first consequence of the query is to generate a results set of data from the knowledge graph. At the same time, the datatypes of the results inform a components ontology that produces a schema useful to the display widgets. This schema constitutes the formal instructions to the semantic components on the Web page. When this schema is combined with the results set data, the new instructions for the semantic components on the Web page are complete. Here is an example schema:

Example Schema

These instructions are then presented to the various semantic components, and determine which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. As new user interactions occur with the resulting displays and components, the iteration cycle is generated anew, starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations. In this way, the user interactions may act as a form of recorder for later automatic playback of the interaction choices.
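The query-to-widget cycle just described can be caricatured in a short sketch. Everything named here is an illustrative assumption, not the actual structWSF or components-ontology implementation: the datatype-to-widget mapping, the widget names, and the results-set shape are all hypothetical.

```python
# Caricature of one ODapp iteration: a user interaction yields a results
# set from the knowledge graph; the datatypes of those results drive a
# (hypothetical) components ontology that selects the display widgets;
# the widget schema plus the data form the instructions sent to the page.

# hypothetical components 'ontology': result datatype -> display widget
COMPONENTS = {
    "xsd:decimal": "bar-chart-widget",
    "geo:point": "map-widget",
    "xsd:string": "text-widget",
}

def run_cycle(results):
    """Given a results set of (attribute, datatype, value) rows, return
    the schema of widgets to invoke together with the data to display."""
    widgets = sorted({COMPONENTS.get(dt, "text-widget") for _, dt, _ in results})
    return {"widgets": widgets, "data": results}

results = [
    ("population", "xsd:decimal", 632000),
    ("location", "geo:point", (43.07, -89.40)),
]
instructions = run_cycle(results)
print(instructions["widgets"])  # ['bar-chart-widget', 'map-widget']
```

A subsequent user interaction with one of these widgets would simply feed a new query back through `run_cycle`, which is the iteration loop the text describes; persisting the `instructions` object is what allows later playback of the interaction choices.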

A New Dynamic Paradigm for User Apps

ODapps are thus a balanced abstraction within the framework of canonical architectures, data models and data structures. Under this design, software developer time is focused on creating the patterned scripts that underlie the Web page layouts, developing the semantic component widgets, and writing the functional Web services. Users and subject matter experts can concentrate on doing analysis and keeping the ontologies and knowledge graph accurate and up-to-date. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures.

This new paradigm began with the simple observation, demonstrated by Bob DuCharme, that SPARQL queries driven by users through a Web page form can return relevant information to the user. We have taken this simple premise and, over the past nearly ten years, expanded it into a more generalized approach to ontology-driven apps, or ODapps. We have also continued to discuss how we may modularize our ontology architectures for a breadth of enterprise purposes [10].

Yet, while we have prototyped these capabilities and have demonstrated them within our own customer engagements, this general approach is by no means common.

Perhaps now, with the resurgent interest in knowledge graphs, we can finally see our way clear to a suite of semantic approaches that promise a revolution in software design practices and the democratization of information technologies. Through the ODapp approach, we believe that customers can see:

  • Reduced development times — producing software artifacts that are closer to how we think, combined with reuse and automation that enables applications to be developed more quickly
  • Re-use — abstract/general notions can be used to instantiate more concrete/specific notions, allowing more reuse
  • Increased reliability — formal constructs with automation reduces human error
  • Decreased maintenance costs — increased reliability and the use of automation to convert models to executable code reduces errors. A formal link between the models and the code makes software easier to comprehend and thus maintain.

As I have noted before, these first four items are similar to the benefits that may accrue from other advanced software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold [7] also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:

  • Reduced conceptual gap — application developers can interact with the tools in a way that is closer to their thinking
  • Facilitate automation — formal structures are amenable to automated reasoning, reducing the load on the human, and
  • Agility/flexibility — ontology-driven information systems are more flexible, because you can more easily and reliably make changes in the model than in code.

So, as practiced today, most uses of ontologies are for knowledge representation, and in that sense we may use the terms ‘knowledge graph’ and ‘ontologies’ more-or-less interchangeably. However, taken to its logical extent and embraced for driving software specifications, we see the term of ‘ontology’ as much more general and powerful. Like I have said before, the meaning of these terms is intimately related to their context of use.


[1] Bob DuCharme, Learning SPARQL: Querying and Updating with SPARQL 1.1, Second Edition, 2013, O’Reilly Media, 386 pp.
[2] From this URI, for example, http://kbpedia.org/knowledge-graph/reference-concept/?uri=Mammal, begin typing into the upper right search box and then picking one of the suggested auto-completion terms.
[3] For example, picking the ‘amniote’ link (http://kbpedia.org/knowledge-graph/reference-concept/?uri=Amniote) from the lower left Broader Concepts text box.
[4] To see an example of JS code calling the Clojure routines see http://kbpedia.org/entity/browse/js/browse-entities.js. Then, look for the Clojure call noted ‘nb-entities’. You can see the actual Clojure routines under this same name in the sample https://www.mkbergman.com/wp-content/themes/ai3v2/files/2019Posts/named_entities.clj file. (This sample file contains other functions to clean up input strings, for example. Also note that most Clojure code used by the system is not available for inspection.)
[5] Our series on this topic began with the article, M.K. Bergman, “Concepts and an Introduction to the Occasional Series on ‘Ontology Best Practices for Data-driven Applications,’” AI3:::Adaptive Information blog, May 12, 2009, and continued with a more detailed discussion in M.K. Bergman, “Ontologies as the ‘Engine’ for Data-Driven Applications,” AI3:::Adaptive Information blog, June 10, 2009. The latter article introduced the ideas of data-driven displays and user interfaces based on ontologies specifically enhanced to include those specifications.
[6] Nicola Guarino, “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. Amsterdam, IOS Press, pp. 3-15; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.1776&rep=rep1&type=pdf.
[7] Michael Uschold, “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), 2008, Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, pp 3-20; see http://mba.eci.ufmg.br/downloads/recol/FormalOntologyinInformationSystems2008.pdf.
[8] M.K. Bergman, “structWSF: A Framework for Collaboration Networks,” AI3:::Adaptive Information blog, July 7, 2009.
[9] M.K. Bergman, “Ontology-Driven Apps Using Generic Applications,” AI3:::Adaptive Information blog, March 7, 2011.
[10] M.K. Bergman, “An Ontologies Architecture for Ontology-driven Apps,” AI3:::Adaptive Information blog, December 5, 2011.

Posted by AI3's author, Mike Bergman Posted on September 11, 2019 at 6:58 am in Adaptive Innovation, Ontologies, Software Development | Comments (4)
The URI link reference to this post is: https://www.mkbergman.com/2267/combining-knowledge-graphs-and-ontologies-for-dynamic-apps/
The URI to trackback this post is: https://www.mkbergman.com/2267/combining-knowledge-graphs-and-ontologies-for-dynamic-apps/trackback/
Posted: July 1, 2019

KGs Have a Long History and a Diversity of Definitions

A Growing Knowledge Graph

Of late, it seems everywhere I turn some discussion or debate is taking place as to what a ‘knowledge graph’ is and how to define one. The semantic Web community not infrequently engages in such bursts of definition and vocabulary harangues. For a community ostensibly focused on ‘semantics’, definitions and terminology disputes sometimes seem all too common. Still, the interest in knowledge graphs (KGs) is quite real and represents an important visibility point for the semantic Web.

For example, in the last year there have been major sessions or workshops on KGs at the major semantic technology conferences of WWW, ESWC, ISWC, and Semantics 2019, and special conferences in Dagstuhl, New York City (Columbia), China and Cuba, among others I have surely missed. Ongoing for the past month has been a lengthy discussion on what is a knowledge graph on the W3C’s semantic Web mailing list. Google Scholar, as another example, now lists over 16,000 academic papers on knowledge graphs. These listings do not include the many individual blog and company posts on the subject. This blossoming of attention is great news.

In tracing the lineage of the ‘knowledge graph’ term, many erroneously date it to 2012 when Google (some claim) “coined” it for its structured entity-attribute information, now a common view on its search results page (with Marie Curie being one of the first-cited examples [1]). Though Google’s embrace has been important from a marketing and visibility viewpoint, the term ‘knowledge graph’ goes back to the 1970s, and the ideas behind a KG go back even further. According to Pat Hayes [2], “The idea of representing, or at least displaying, knowledge as a graphical diagram (rather then as, say, a set of sentences) has a very old history. In its modern sense it goes back at least to 1885 (C S Peirce ‘existential graph’) and can probably be traced into medieval writings and earlier. (The Torah version of Genesis refers to a ‘tree of knowledge’.) It has been re-invented or rediscovered many times since, and seems to blossom in public (or at least academic) discussions with a periodicity of roughly 40 years.”

As someone who has written extensively on ‘ontologies’, the kissing cousins of knowledge graphs, and who has adopted the KG term as my preferred label, I thought it would be useful to survey the discussion and frame it in terms of some of these historical perspectives. Curious about how I was using the term myself, I searched my recent book and found that I use the term ‘ontology’ some 568 times, while using ‘knowledge graph’ 226 times [3]. Though I can acknowledge some technical bases for distinguishing these two terms, in practice I tend to use them interchangeably. I think many practitioners, including many ‘experts’ and academics with direct ties to the semantic Web, tend to do the same, for reasons I outline below. Though I am sure the world is not clamoring for still another definition of the term, I think we can benefit from a broad understanding of this increasingly important term on common-sense and pragmatic grounds. That is what I try to provide in this article.

What is a Knowledge Graph?

If one searches on the phrase ‘what is a knowledge graph’, one finds there are some 99 references on Google as a whole (my article will make it a cool 100!) and some 22 academic papers on Google Scholar. Those are impressive numbers for such a specific phrase, and reflect the general questioning around the concept. Still, for those very interested, these numbers are also quite tractable, and I encourage you to repeat the searches yourself if you want to see the diversity of opinions, though most of the perspectives are captured herein.

I present and compare various definitions of ‘knowledge graph’ three sections below. However, to start, from a common-sense standpoint, we can easily decompose an understanding of the phrase from its composite terms. ‘Knowledge’ seems straightforward, but an inspection of the literature and philosophers over time indicates the concept is neither uniformly understood nor agreed. My own preference, which I discuss in Chapter 2 in my book [3], is from Charles S. Peirce. His view of knowledge had a similar basis to what is known as justified, true belief (JTB). Peirce extended this basis with the notion that knowledge is information sufficiently believed to be acted upon. Nonetheless, we should assume that not all who label things as knowledge graphs embrace this precise understanding.

The concept of ‘graph’, the second composite term, has a precise and mathematical understanding as nodes (or vertices) connected by edges. Further, since knowledge is represented by language in the form of statements or assertions, the graph by definition is a directed and labeled one. In precise terminology, a knowledge graph has the form of a directed (mostly acyclic) graph (or DAG). (There can be cyclic or transitive relationships in a KG, but most are subsumptive or hierarchical in nature.)
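This common-sense reading can be made concrete with a toy example. All labels and assertions below are illustrative only: a knowledge graph as a set of directed, labeled edges, here expressed as triples, over which even a simple transitive ‘is-a’ traversal along the subsumptive backbone is a basic form of reasoning.

```python
# A knowledge graph as a set of directed, labeled edges (triples).
# The subsumption ('subClassOf') backbone here is acyclic, as the text
# notes is mostly the case, even though other relations need not be.
TRIPLES = {
    ("Camera", "subClassOf", "Product"),
    ("DigitalCamera", "subClassOf", "Camera"),
    ("EvoltE520", "instanceOf", "DigitalCamera"),
}

def is_a(node, target):
    """Transitively follow instanceOf/subClassOf edges from node."""
    frontier = {node}
    while frontier:
        frontier = {o for s, p, o in TRIPLES
                    if s in frontier and p in ("instanceOf", "subClassOf")}
        if target in frontier:
            return True
    return False

print(is_a("EvoltE520", "Product"))  # True: instance -> class -> superclass
```

Note the edges are directed (the traversal only follows subject-to-object) and labeled (only the hierarchical predicates participate in the inference), which is exactly what distinguishes a knowledge graph from a bare node-and-edge graph.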

So, while there is some dispute around what constitutes ‘knowledge’, there should be a pretty common-sense basis for what a knowledge graph is, at least at a general level.

Some attempt to further categorize KGs into general and domain ones (sometimes using the terms ‘vertical’ or ‘enterprise’). In the real world we also see KGs that are heavily biased toward instances and their attributes (the Google Knowledge Graph, for one) or reflect more complete knowledge structures with instances and a conceptual framework for organizing and reasoning over the knowledge space (such as our own KBpedia or the KGs used by virtual intelligent agents). I should note we see this same diversity in how one characterizes ‘ontologies’.

Knowledge Graphs: Citations Over Time

As noted, the use of the phrase ‘knowledge graph’ did not begin with Google. In fact, we can trace back the earliest references to the phrase to 1973. Here is a graph of paper citations for the phrase in the period 1970 – 2019 [4]:

'Knowledge Graph' Citations, 1970-2019
Figure 1: ‘Knowledge Graph’ Citations, 1970 – 2019

Coincident with Google’s adoption, we see a pretty significant spike in citations beginning in 2012. Today, we are seeing about 5,000 citations yearly in the academic literature.

Because later citations swamp out the numbers of earlier ones, we can plot these citations on a log scale to see earlier references:

'Knowledge Graph' Citations (log basis), 1970-2019
Figure 2: ‘Knowledge Graph’ Citations (log basis), 1970 – 2019

The shape of this log curve is consistent with a standard power-law distribution. The earliest citations we see are from 1973, when there were two. One was in the context of education and learning [5]; the other was based on linguistic analysis [6]. The first definition of ‘knowledge graph’ was presented by Marchi and Miquel in 1974 [7]. There were few further references for the next decade. However, that began to change in the mid-1980s, which I elaborate upon below.

Knowledge Graphs: A Marketing Perspective

As we will see in a few moments, not all knowledge graphs are created the same. Yet, as the charts above show, one KG was the first gorilla to enter the room: Google. Though the term had seen many mentions over the 40 years prior, and quite a few in the preceding decade, the announcement of Google’s Knowledge Graph (title case) in 2012 totally changed perceptions and visibility. It is an interesting case study to dissect why this large impact may have occurred.

The first obvious reason is Google’s sheer size and market clout. Further, Google’s announcement was accompanied by some nifty graphics, which can be viewed on YouTube, that showed connections between entities that conveyed a sense of dynamism and navigation. Still, not all announcements or new initiatives by Google gain the traction that knowledge graphs did. Clearly, more was at work.

The first additional reason, I think, was the groundwork for the ideas of graphs and connections laid by the linked data and semantic Web communities in the 5-10 years prior. People had been prepped for the idea of connections, and for the notion that, when visualized, they would take on the form of a graph.

Another additional reason came from within the community itself. The first term for a knowledge graph was ‘ontology.’ (Though as we will see in the next section, not everyone agrees that KGs and ontologies can be used synonymously.) Yet all of us who were advocates and practitioners well knew that the idea of an ontology was difficult to explain to others. The term seemed to have both technical and abstract connotations and, when first introduced, always required some discussion and definition. The notion of a knowledge graph, on the other hand, is both descriptive and intuitive.

Lastly, I think there was another factor at work in the embrace of knowledge graphs. The semantic Web community had been split for some time between two camps. One camp saw the need for formal ontologies and were often likely to use the OWL language, at least for formal conceptual schemas. Another camp leaned more to diversity and decentralized approaches to vocabularies, and tended to prefer RDF (and, later, linked data). At the same time, this latter camp recognized the need for connectivity and data integration. The idea of a ‘knowledge graph’ was able to sidestep these differences and provide a common terminology for both camps. I think each camp embraced the term, in part, for these different reasons. Both camps, however, seemed to share the view that the term ‘knowledge graph’ offered some strong marketing advantages, not least of which was the implicit endorsement of Google.

Knowledge Graphs: Definitions

Once Google popularized the ‘knowledge graph’ term in 2012, a number of authors began to assemble various definitions of the concept. One could argue that this scramble to define the term was in part an attempt to bound the term consistent with prior prejudices. Another possible motivation was the desire to give precision to a heretofore imprecise term. In any case, the proliferation of definitions led some researchers to compile recent and historical examples. Notable among these have been McCusker et al. [8], Ehrlinger and Wöß [9], Gutierrez [10], and Hogan et al. [11]. I have built upon their compilations, plus added new entries based on the research noted above, to produce the following table of ‘knowledge graph’ definitions, reaching back to the earliest ones:

Source Year Definition
Marchi and Miquel [7] 1974 A mathematical structure with vertices as knowledge units connected by edges that represent the prerequisite relation
Hoede [12] 1982 See [12] and the accompanying text, as well as the Bakker and de Vries entries below
Bakker [13] 1987 A knowledge graph is a labeled directed graph D(P,A) with (a) P is a set of points that represent concepts, relations or frameworks; (b) A ⊆ P × P is a set of arcs that form the connections between the entities. An arc exists from p1 ∈ P to p2 ∈ P if: (b1) p1 represents a concept that is the tail of a relation represented by p2; (b2) p2 represents a concept that is the head of a relation represented by p1; (b3) p1 represents an element in the contents of a framework and p2 represents the FPAR-relation; (b4) p2 is a framework and p1 represents the accompanying FPAR-relation
de Vries [14] 1989 A directed graph that distinguishes three types of asserted relationships: (1) an object has a certain property, (2) an object is an instance of another object, (3) a change in a property of an object leads to a change in another property of that object
James [15] 1992 A knowledge graph is a kind of semantic network. . . . One of the essential differences between knowledge graphs and semantic networks is the explicit choice of only a few types of relations
Zhang [16] 2002 A new method of knowledge representation, [which] belongs to the category of semantic networks. In principle, the composition of a knowledge graph is including concept (tokens and types) and relationship (binary and multivariate relation)
Popping [17] 2003 A particular kind of semantic network
Singhal (Google) [1] 2012 A graph that understands real-world entities and their relationships to one another: things, not strings
Ehrlinger and Wöß [9] 2016 A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge
Krötzsch and Weikum [18] 2016 Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities
Krötzsch [19] 2017 Are characterized by several properties that together distinguish them from more traditional knowledge management paradigms: (1) Normalization: Information is decomposed into small units of information, interpreted as edges of some form of graph. (2) Connectivity: Knowledge is represented by the relationships between these units. (3) Context: Data is enriched with contextual information to record aspects such as temporal validity, provenance, trustworthiness, or other side conditions and details
Paulheim [20] 2017 A knowledge graph (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains
Shao et al. [21] 2017 A knowledge graph is a graph constructed by representing each item, entity and user as nodes, and linking those nodes that interact with each other via edges.
Villazon-Terrazas et al. [22] 2017 A knowledge graph is a set of typed entities (with attributes) which relate to one another by typed relationships. The types of entities and relationships are defined in schemas that are called ontologies. Such defined types are called vocabulary. A knowledge graph is a structured dataset that is compatible with the RDF data model and has an (OWL) ontology as its schema.
Wilcke et al. [23] 2017 A data model used in the Semantic Web . . . based on three basic principles: 1. Encode knowledge using statements. 2. Express background knowledge in ontologies. 3. Reuse knowledge between datasets
Bianchi et al. [24] 2018
  • Large knowledge bases
  • Entities classified using types
  • Types organized in sub-types graphs
  • Binary relationships between entities
  • Semantics and inference via rules/axioms
  • Semantic similarity with lexical, topological and other feature-based approaches
McCusker et al. [8] 2018 An Unambiguous Graph [a graph where the relations and entities are unambiguously identified] with a limited set of relations used to label the edges that encodes the provenance, especially justification and attribution, of the assertions
Xiong [25] 2018 A data resource that contains entries (‘entities’) that have their own meanings and also information (‘knowledge graph semantics’) about those entries
Bellomarini et al. [26] 2019 A semi-structured datamodel characterized by three components: (i) a ground extensional component, that is, a set of relational constructs for schema and data (which can be effectively modeled as graphs or generalizations thereof); (ii) an intensional component, that is, a set of inference rules over the constructs of the ground extensional component; (iii) a derived extensional component that can be produced as the result of the application of the inference rules over the ground extensional component (with the so-called “reasoning” process)
d’Amato et al. [27] 2019 A graph-based structured data organization, endowed with formal semantics (i.e., a graph-based data organization where a schema and multiple labelled relations with formal meaning are available)
Columbia University [28] 2019 An organized and curated set of facts that provide support for models to understand the world
Hogan et al. [11] 2019 A graph of data with the intent to compose knowledge
Kejriwal [29] 2019 A graph-theoretic representation of human knowledge such that it can be ingested with semantics by a machine; a set of triples, with each triple intuitively representing an ‘assertion’
Morrison [30] 2019 A knowledge base that’s made machine readable with the help of logically consistent, linked graphs that together constitute an interrelated group of facts
PoolParty [31] 2019 A graph-theoretic representation of human knowledge such that it can be ingested with semantics by a machine; a set of triples, with each triple intuitively representing an ‘assertion’
Tran and Takasu [32] 2019 A collection of triples, with each triple (h,t,r) denoting the fact that relation r exists between head entity h and tail entity t
Vidal et al. [33] 2019 A knowledge graph is presented as the intersection of the formal models able to represent facts of various types and levels of abstraction using a graph-based formalism

Table 1: Selected Definitions of ‘Knowledge Graph’ Over Time

As the usage charts showed, more definitions have been forthcoming since the Google KG announcement than came before. There has also been a concerted effort to provide more “precise” understandings of what the KG concept means since its broad embrace by the semantic Web community. That effort really began in earnest about 2016. A few conferences have been devoted to the topic.

Knowledge Graphs: Concepts, Connections, Contradictions

The first definition of ‘knowledge graph’ found so far in the literature, presented by Marchi and Miquel in 1974 [7], is really not too far from our common-sense understanding of knowledge assertions in the form of a directed graph. What constituted knowledge was left undefined in their paper.

A more focused treatment of knowledge graphs began in 1982 with efforts by the Universities of Twente and Groningen [12]. Cornelis Hoede was the lead researcher on an effort to capture science or medical knowledge in the form of graphs. A key component of this research, which stretched over decades, was the parsing and analysis of human language. Some of the key students and their theses that derived from these efforts were Bakker [13], de Vries [14], van de Berg [34], and Zhang [16], among others. (The preface to Zhang’s thesis provides a nice overview of the history of these students.) Popping was another of the co-researchers [17]. Many of these provided definitions of KGs, as captured in Table 1 above.

These researchers readily acknowledged that their efforts were a derivation of semantic networks, a graph-based AI approach dating from the mid-1950s, and had close relationships to the conceptual graphs developed by John Sowa [35], though the two lines were developed independently. Interestingly, both Hoede’s group and Sowa traced the lineage of their respective efforts back to the existential graphs of Charles Peirce, from 1885 [36, for Hoede and his students] to 1909 [37 and MS 514, for Sowa]. Eventually Hoede and Sowa published together, and Hoede came to describe knowledge graphs as a subset of conceptual graphs.

Though the knowledge graphs of Hoede and his students initially employed a relatively small number of relationships (such as part of, a kind of, a cause of), they came to understand that the graphs should be embedded in “frameworks” (what today we call ‘ontologies’) and that they could be constructed of multiple sub-graphs [13]. According to one of Hoede’s earliest students, René Ronald Bakker, “The choice of a graph for the representation of empirical scientific knowledge is appealing because, besides having the advantage of being an ‘obvious representation’, the well developed mathematical theory of graphs can be used for the analysis and structuring of the represented knowledge.” [13, p. 22]. The group also acknowledged that the graphs could be used as the knowledge base of an expert system or for knowledge extraction.

The work of Hoede and students was not casual and extended over at least 25 years [12]. Though terminology was different then, and today’s standards were lacking, it is striking the degree to which those efforts both reflect Peirce and many current efforts. Bakker, in the first thesis for the group in 1987, which mentions KGs more than 150 times, presented the following diagram:

Knowledge Graph Schema per Hoede, approx. 1982
Figure 3: Knowledge Graph Schema per Hoede, approx. 1982 [13]

We can see the reflection of a ‘triple’ at the bottom of the diagram, including how objects may roll up into concepts. The figure also acknowledges the relationship to language and symbol representations, striking in its relation to current information extraction and consistent with Peirce’s semiosis. The group further recognized the importance of objects being either subjects or objects of assertions, as well as how to deal with language ambiguity [38].

Much of this perspective appears to have been overlooked by the time Google made its KG announcement in 2012, and thus by many subsequent researchers.

Subsequent to 2012, definitions of KGs have added other considerations. Knowledge bases, large scales, use of particular languages (RDF or OWL, for example), provenance, an entity focus, type classification, context, machine readability, dataset linkages, data consistency, formal models, formal semantics, use of reasoners, use of triples, use of rules or axioms, etc., are all aspects mentioned by one or more of the definers listed in Table 1. Granted, any of these may be useful additions to make a knowledge graph more usable or defensible, but it is unclear if any of them are a strict requirement to be a member of the genus.

Partially in reaction to this tightening of requirements, Hogan et al. [11] have sought to broaden the definition of KGs and make it as flexible as possible: “a graph of data with the intent to compose knowledge.” We seem to be resurrecting the old tensions between formality and flexibility that characterized the early evolution of the semantic Web.

Since Peirce has clearly influenced many of the ideas leading to knowledge graphs, their use, and their representation, we could perhaps look to his ethics of terminology to provide guidance for how to adjudicate these different definitions [39]. Peirce grants primacy to the first formulator of a concept and defers to her naming conventions. By this basis, we would perhaps look to Hoede and his group for our guidance. However, given much of this work was neglected, and the Google announcement set a new basis for a large-scale entity repository characterized by types and attributes, our guidance may not be so clear-cut.

Knowledge Graphs: What They Are Not

One observation arising from this survey is that knowledge graphs are not precise. They are not precise in scope, intent, use, or construction. That means, to me, that certain prescriptions offered in KG-wide definitions are not appropriate:

  • Provenance is not required, though what makes information believable so as to engender action is a possible condition for ‘knowledge’
  • Context is useful, and can help in disambiguation and understanding knowledge, but is not strictly required
  • A specific language or data model, such as RDF, concept graphs, or OWL, is not required
  • Types or type classification are not required
  • Neither instances, nor attributes, nor concepts, nor specific relations are required on their own, though at least one or two of them must be present
  • A knowledge graph need not be both machine- and human-readable
  • No specific schema or formal logic is required
  • Curation is not required, though again believability of the information is important
  • A specific scope, broad or narrow, is not required, and
  • Statements in the knowledge graph need not be ‘triples’, but they do need to be some form of knowledge assertion.

Knowledge Graphs: Best Practices and a Spectrum of Uses

This survey has pointed out a few things regarding knowledge graphs. We see that the term has been around for many years, and has been used differently and with imprecision by many authors. The diversity of knowledge graph use extends from mostly attribute characterizations of instances (such as the Google Knowledge Graph) to mostly conceptual frameworks akin to upper-level ontologies. Attempts to provide a “precise” definition of KGs are likely doomed to failure, and appear to be more a reflection of prior prejudices than a derivation of some discovered inherent meaning.

About the most we can say about knowledge graphs in general, in keeping with their constituent terms, is that they are a representation of knowledge (however defined) in the structural form of a directed (mostly acyclic) graph. Recent attempts to add more rigor and precision to the definition of a KG beyond this do not appear warranted. Because our understanding of KGs remains somewhat vague, we should ask and expect authors of KGs to define the scope and basis of the knowledge used in their graphs.

In this diversity, KGs are not much different than ontologies, with a similarly broad and contextual use. Indeed, in the words of Ehrlinger and Wöß [9], “Graph-based knowledge representation has been researched for decades and the term knowledge graph does not constitute a new technology. Rather, it is a buzzword reinvented by Google and adopted by other companies and academia to describe different knowledge representation applications.”

Still, there is a common-sense intuitiveness to the term ‘knowledge graph’ and it appears far superior to the label ‘ontology’ as a means of describing these knowledge structures to the general public. For these reasons, I think the term has ‘legs’ and is likely to be the term of use for many years to come. Yet, given the diversity of use, I also agree with the advice of Hogan et al. [11] who “strongly encourage researchers who wish to refer to ‘Knowledge Graphs’ as their object of study to rigorously define how they instantiate the term as appropriate to their investigation.” While we may not find a precise definition across all uses of knowledge graphs, we should strive to define how they are used in specific contexts.

In our own contexts based on Cognonto’s work and our open-source KBpedia, we tend to define knowledge graphs as complete knowledge bases of concepts and instances and their attributes, coherently organized into types and a logical and computable graph, and written in RDF, SKOS, and OWL 2. (Amongst other researchers, our usage perhaps comes closest to that of Pan et al. [40].) The actionability of the ‘knowledge’ in these graphs comes not so much from the explicit tagging of items with provenance metadata as from the use of trustworthy sources and manual vetting and logical testing during graph construction. This definition is appropriate given our use of knowledge graphs, more or less in a centralized form, for data interoperability, knowledge management, and knowledge-based artificial intelligence. Knowledge graphs designed for other purposes, such as for less discriminating Web-wide navigation or retrieval or simple vocabularies, may employ different constructions.
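As a toy illustration of this working definition (typed instances organized into a logical and computable graph of assertions), here is a minimal, stdlib-only Python sketch. All names in it (Animal, Mammal, Leo, and so on) are hypothetical; a real knowledge graph such as KBpedia would of course be expressed in RDF, SKOS, and OWL 2 with proper tooling, not a hand-rolled triple store:

```python
# A toy knowledge graph: assertions as subject-predicate-object triples,
# with instances classified into types and a subsumption (subClassOf)
# hierarchy. All names are hypothetical illustrations.
triples = {
    ("Mammal", "subClassOf", "Animal"),
    ("Lion",   "subClassOf", "Mammal"),
    ("Leo",    "type",       "Lion"),
    ("Leo",    "label",      "Leo the lion"),
}

def superclasses(cls):
    """All classes reachable from cls via subClassOf (transitive closure)."""
    found, frontier = set(), {cls}
    while frontier:
        step = {o for (s, p, o) in triples
                if p == "subClassOf" and s in frontier}
        frontier = step - found
        found |= step
    return found

def types_of(instance):
    """Direct types of an instance plus all types inherited up the hierarchy."""
    direct = {o for (s, p, o) in triples
              if s == instance and p == "type"}
    inherited = set().union(*(superclasses(c) for c in direct))
    return direct | inherited

print(sorted(types_of("Leo")))  # ['Animal', 'Lion', 'Mammal']
```

The transitive closure over subClassOf is the ‘computable’ part of the definition: a new fact (Leo is an Animal) follows logically from the asserted statements without being stated itself.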

Thus, knowledge graphs, like ontologies, have a broad range of applications and constructions. When one hears the term, it is less important to reflect on some precise understanding as to realize that human language and knowledge is being presented in a connected, graph form. That, I believe, is the better practical and common-sense understanding of KGs.


[1] Amit Singhal, 2012. “Introducing the Knowledge Graph: Things, Not Strings,” originally published May 16, 2012, on the Google blog. Now moved to https://www.blog.google/products/search/introducing-knowledge-graph-things-not/. Retrieved June 28, 2019.
[3] Michael Bergman, 2018. A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce, Springer, 462 pp.
[4] The data is from citations on Google Scholar using a date-range query for ‘knowledge graph’ for citations up to the current year for each year (ex., https://scholar.google.com/scholar?q=%22knowledge+graph%22&hl=en&as_sdt=0%2C5&as_ylo=&as_yhi=1973), and then subtracting prior years’ citations to get the current year count. Note that Google provides about 15 citations prior to 1973, which are all erroneous. These were subtracted from subsequent counts.
[5] E.W. Schneider, 1973. “Course Modularization Applied: The Interface System and Its Implications For Sequence Control and Data Analysis,” 21 pp. paper presented at the Association for the Development of Instructional Systems, Chicago, Illinois, April 1972.
[6] Peter Kümmel, 1973. “An Algorithm of Limited Syntax Based on Language Universals,” in Proceedings of the 5th Conference on Computational Linguistics, Volume 2, pp. 225-247. Association for Computational Linguistics, 1973.
[7] E. Marchi and O. Miguel, 1974. “On the Structure of the Teaching-learning Interactive Process,” International Journal of Game Theory, 3(2):83–99.
[8] J.P. McCusker, J. Erickson, K. Chastain, S. Rashid, R. Weerawarana, and D. McGuinness, 2018. “What is a Knowledge Graph?,” Semantic Web J. (submitted, but not accepted); see http://www.semantic-web-journal.net/content/what-knowledge-graph.
[9] Lisa Ehrlinger and Wolfram Wöß, 2016. “Towards a Definition of Knowledge Graphs,” SEMANTiCS (Posters, Demos, SuCCESS) 48.
[10] Claudio Gutierrez, 2019, “Concise Account of the Notion of Knowledge Graph,” in Piero Andrea Bonatti, Stefan Decker, Axel Polleres, and Valentina Presutti, eds., Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, pp. 43-45, Dagstuhl Seminar 18371, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[11] Aidan Hogan, Dan Brickley, Claudio Gutierrez, Axel Polleres, and Antoine Zimmermann, 2019, “(Re)Defining Knowledge Graphs,” in Piero Andrea Bonatti, Stefan Decker, Axel Polleres, and Valentina Presutti, eds., Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, pp. 74-79, Dagstuhl Seminar 18371, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[12] Sri Nurdiati and Cornelis Hoede, 2008. “25 Years Development of Knowledge Graph Theory: The Results and the Challenge,” Memorandum 1876, September 2008, Department of Applied Mathematics, The University of Twente, Enschede, The Netherlands, 10 pp; available from https://core.ac.uk/download/pdf/11468596.pdf.
[13] R. R. Bakker, 1987. Knowledge Graphs: Representation and Structuring of Scientific Knowledge, Ph.D. thesis, University of Twente, Enschede, ISBN 9001963-4.
[14] P.H. de Vries, 1989. Representation of Science Texts in Knowledge Graphs, Ph.D. thesis, University of Groningen, Groningen, The Netherlands ISBN 90-367-0179-1.
[15] P. James, 1992. “Knowledge Graphs,” in Reind P. Van de Riet and Robert A. Meersman, eds., Linguistic Instruments in Knowledge Engineering, Proceedings of the 1991 Workshop on Linguistic Instruments in Knowledge Engineering, Tilburg, The Netherlands, 17-18 January 1991, Elsevier Science Inc., 1992. [P. James is likely a pseudonym for other authors]
[16] Liecal Zhang, 2002. Knowledge Graph Theory and Structural Parsing, Ph.D. thesis, Twente University Press, 232 pp.
[17] Roel Popping, 2003. “Knowledge Graphs and Network Text Analysis,” Social Science Information 42 (1): 91-106.
[18] Markus Krötzsch and Gerhard Weikum, 2016. “Web Semantics: Science, Services and Agents on the World Wide Web,” Journal of Web Semantics: Special Issue on Knowledge Graphs, Vol 37-38, pp 53-54.
[19] Markus Krötzsch, 2017. “Ontologies for Knowledge Graphs?,” in Alessandro Artale, Birte Glimm and Roman Kontchakov, eds., Proceedings of the 30th International Workshop on Description Logics, Montpellier, France, July 18-21, 2017, Vol. CEUR-1879.
[20] Heiko Paulheim, 2017. “Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods,” Semantic Web 8 (3): 489-508.
[21] Lixu Shao, Yucong Duan, Xiaobing Sun, Honghao Gao, Donghai Zhu, and Weikai Miao, 2017. “Answering Who/When, What, How, Why through Constructing Data Graph, Information Graph, Knowledge Graph and Wisdom Graph,” in SEKE, pp. 1-6.
[22] Boris Villazon-Terrazas, Nuria Garcia-Santa, Yuan Ren, Alessandro Faraotti, Honghan Wu, Yuting Zhao, Guido Vetere, and Jeff Z. Pan, 2017. “Knowledge Graph Foundations,” in Jeff Z. Pan, Guido Vetere, Jose Manuel Gomez-Perez, and Honghan Wu, eds., Exploiting Linked Data and Knowledge Graphs in Large Organisations, Springer, Heidelberg.
[23] Xander Wilcke, Peter Bloem, and Victor De Boer, 2017. “The Knowledge Graph as the Default Data Model for Learning on Heterogeneous Knowledge,” Data Science 1 (1-2): 39-57.
[24] Frederico Bianchi, Matteo Palmonari, and Debora Nozza, 2018. “Towards Encoding Time in Text-Based Entity Embeddings,” in Denny Vrandečić, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee and Elena Simperl, eds., Proceedings of the 17th International Semantic Web Conference, Monterey, CA, USA, October 8–12, 2018, pp. 56-71, Springer.
[25] Chenyan Xiong, 2018. “Text Representation, Retrieval, and Understanding with Knowledge Graphs,” Ph.D. thesis, University of Massachusetts, Amherst, MA.
[26] Luigi Bellomarini, Daniele Fakhoury, Georg Gottlob, and Emanuel Sallinger, 2019. “Knowledge Graphs and Enterprise AI: the Promise of an Enabling Technology,” in 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 26-37.
[27] Claudia d’Amato, Sabrina Kirrane, Piero Bonatti, Sebastian Rudolph, Markus Krötzsch, Marieke van Erp, and Antoine Zimmermann, 2019. “Foundations,” in Piero Andrea Bonatti, Stefan Decker, Axel Polleres, and Valentina Presutti, eds., Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[28] Columbia University, 2019. Knowledge Graph Conference, May 7-8, 2019, New York City. See http://sps.columbia.edu/executive-education/knowledge-graph-conference.
[29] Mayank Kejriwal, 2019. “What Is a Knowledge Graph?,” in Domain-Specific Knowledge Graph Construction, pp. 1-7. Springer, Cham.
[31] PoolParty, Ltd., 2019. Knowledge Graphs: Transforming Data into Knowledge, white paper from Semantic Web Company, Jan 2019, 22 pp. Downloaded June 28, 2019.
[32] Hung Nghiep Tran and Atsuhiro Takasu, 2019. “Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective.” arXiv preprint arXiv:1903.11406.
[33] Maria-Esther Vidal, Kemele M. Endris, Samaneh Jazashoori, Ahmad Sakor, and Ariam Rivas, 2019. “Transforming Heterogeneous Data into Knowledge for Personalized Treatments—A Use Case,” Datenbank-Spektrum: 1-12.
[34] H. van den Berg, 1993. Knowledge Graphs and Logic: One of Two Kinds, Ph.D. thesis, University of Twente, Enschede, The Netherlands, ISBN 90-9006360-9.
[35] John F. Sowa, 1976. “Conceptual Graphs for a Data Base Interface,” IBM Journal of Research and Development 20, (4): 336-357.
[36] C. S. Peirce, 1885. “On the Algebra of Logic,” American Journal of Mathematics 7: 180-202.
[37] C. S. Peirce (with J. Sowa), 2017. “Existential Graphs (MS 514 of 1909), with commentary by John Sowa,” see http://www.jfsowa.com/peirce/ms514.htm, last modified October 29, 2017; retrieved June 30, 2019.
[38] From Bakker [13, p. 16]: “According to this conception objects, properties and values are base elements of empirically oriented scientific theories and, by consequence, of systems that represent scientific knowledge. Within the empirical system, a property is the result of the identification of an aspect of an object, and a value is produced by the measurement of a property. Symbols and concepts (including subsets of natural language, logic, and mathematics) are the linguistic elements to denote these base elements. Objects, properties and values are related to concepts by a realization or projection of the linguistic concept or symbol into an empirical, ‘real world’ system. Objects, properties and values must have been defined on a linguistic level to be able to speak about empirical elements. Note that a single concept can play different roles. For example, a ‘sphere’ may be an object in relation to its property ‘radius’, or it may be a value in relation to a property ‘shape’.”
[39] C. S. Peirce, 1903. “The Ethics of Terminology,” in Syllabus of Certain Topics of Logic, Alfred Mudge & Son, Boston; see MS 434 and parts of MS 433 and CP 2.219-226.
[40] Jeff Z. Pan, Guido Vetere, Jose Manuel Gomez-Perez, and Honghan Wu, eds. Exploiting Linked Data and Knowledge Graphs in Large Organisations, Heidelberg: Springer, 2017.

Posted by AI3's author, Mike Bergman Posted on July 1, 2019 at 4:09 pm in Charles Sanders Peirce, Ontologies | Comments (6)
The URI link reference to this post is: https://www.mkbergman.com/2244/a-common-sense-view-of-knowledge-graphs/