Posted: March 27, 2017

When we first released Cognonto toward the end of 2016, we provided a starting Web site that had all of the basics, but no frills. In looking at the competition in the artificial intelligence and semantic technology space, we decided a snazzier entry page was warranted. So, we are pleased to announce our new entry page:

Cognonto Entry Page

We also had fun playing around with recent AI programs that generate images based on various input visual styles. We used AI imagery and our own Cognonto logo as inputs to generate some of these.

As I said to a colleague, maybe it was time for us to try to “run with the cool kids.” We hope you like it. We made some other site tweaks as well along the way to releasing this new entry page.

Let me know if you have any comments (good or bad) on this site re-design. Meanwhile, it’s time to get back to the substance . . . .

Posted by AI3's author, Mike Bergman Posted on March 27, 2017 at 4:27 am in Cognonto, KBpedia, Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/2032/new-cognonto-entry-page/
Posted: March 15, 2017


Dialing In Queries from the General to the Specific

Inferencing is a common term heard in association with semantic technologies, but one that is rarely defined and still less frequently explained in terms of its value and rationale. This article tries, in part, to redress that gap.

Inferencing is the drawing of new facts, probabilities or conclusions based on reasoning over existing evidence. Charles Sanders Peirce classed inferencing into three modes: deductive reasoning, inductive reasoning and abductive reasoning. Deductive reasoning extends from premises known to be true to infer new facts. Inductive reasoning looks at the preponderance of evidence to infer what is probably true. And abductive reasoning poses possible explanations or hypotheses for the available evidence, often winnowing through the possibilities based on the total weight of evidence at hand or on what is the simplest explanation. Though all three reasoning modes may be applied to knowledge graphs, the standard and most used form is deductive reasoning.

An inference engine may be applied to a knowledge graph and its knowledge bases in order to deduce new knowledge. Inference engines apply either backward- or forward-chaining deductive reasoning. In backward chaining, the reasoning tests are conducted “backwards” from a current consequent or “fact” to determine what antecedents can support that conclusion, based on the rules used to construct the graph. (“What reasons bring us to this fact?”) In forward chaining, the opposite occurs: the engine starts from the existing facts and applies the rules to derive new consequents, iterating until no further facts can be derived or a stated goal is reached. (“Given these facts, what new facts follow?”) If the goal is reached, new knowledge in terms of heretofore unstated connections may be added to the knowledge base.

Inference engines can be applied at the time of graph building or extension to test the consistency and logic of the new additions. Or, semantic reasoners may be applied to a current graph in order to expand queries for semantic search or for these other reasoning purposes. In the case of Cognonto‘s KBpedia knowledge structure, which is written in OWL 2, the groundings (though the terminology differs slightly) are in first-order logic (FOL) and description logics. These logical foundations provide the standard rules by which reasoners can be applied to the knowledge graph [1]. In this article, we will not be looking at how inferencing is applied during graph construction, a deserving topic in its own right. Rather, we will be looking at how inferencing may be applied to the existing graph.

Use of Reasoning at Run Time

Once a completed graph passes its logic tests during construction, perhaps importantly after being expanded for the given domain coverage, its principal use is as a read-only knowledge structure for making subset selections or querying. The standard SPARQL query language, occasionally supplemented by rule-based queries using SWRL or by bulk actions using the OWL API, is the means by which we access the knowledge graph in real time. In many instances, such as for the KBpedia knowledge graph, these are patterned queries. In such instances, we substitute variables into query templates, passing the values from the HTML interface.
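
As an illustration (the [CONCEPT] placeholder is ours, not the actual template syntax of our system), such a patterned query might look like:

==============
# A sketch of a patterned query: the application substitutes a concept
# URI for [CONCEPT] before submitting the query to the SPARQL endpoint.
select ?p ?o
from <http://kbpedia.org/1.40/>
where
{
  [CONCEPT] ?p ?o .
}
==============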

When doing machine learning, we generally retrieve slices of the graph via query and then stage them for the learner. A similar approach generates entity lists for tasks such as training recognizers and taggers. Some of these actions may also traverse the graph in order to retrieve the applicable subset.
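
For example, a minimal sketch of such a slice retrieval (the node URI is illustrative only) might select all concepts beneath a given node, along with their preferred labels, to stage as a list for a learner:

==============
# Retrieve a slice of the graph — here, subclasses of a chosen node
# and their preferred labels — for staging as a training or entity list.
select ?entity ?label
from <http://kbpedia.org/1.40/>
where
{
  ?entity <http://www.w3.org/2000/01/rdf-schema#subClassOf>
          <http://kbpedia.org/kko/rc/Ontology> ;
          <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .
}
==============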

However, the main real-time use of the knowledge structure is search, which relies entirely on SPARQL. We discuss some of the options for controlling such queries below.

Hyponymy, Subsumption and Natural Classes

Select Your Spray

The principal reasoning in the knowledge graph is based on hierarchical, hyponymous relations and instance types. These establish the parent-child lineages, and enable individuals (or instances, which might be entities or events) to be related to their natural kinds, or types. Entities belong to types that share certain defining essences and shared descriptive attributes.

For inferencing to be effective, it is important to try to classify entities into the most natural kinds possible. I have spoken elsewhere about this topic [2]; clean classing into appropriate types is one way to ensure the benefits from related search and related querying are realized. Types may also have parental types in a hyponymous relation. This ‘accordion-like’ design is an important aspect that enables external schema to be tied into multiple points in KBpedia [3].

Disjointness assertions, where two classes are logically distinct, and other relatedness options provide further powerful bases for winnowing potential candidates and testing placements and assignments. Each of these factors may also be used in SPARQL queries.
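
To make the disjointness point concrete, here is a hedged sketch of a consistency check (ours, not a query from our production system): any result signals something placed into two classes declared to be disjoint.

==============
# Sketch of a disjointness test: find anything typed into two
# classes that are asserted to be disjoint with one another.
select ?individual ?class1 ?class2
from <http://kbpedia.org/1.40/>
where
{
  ?class1 <http://www.w3.org/2002/07/owl#disjointWith> ?class2 .
  ?individual a ?class1 , ?class2 .
}
==============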

These constructs of semantic Web standards, combined with a properly constructed knowledge graph and the use of synonymous and related vocabularies in semsets as described in a previous use case, provide powerful mechanisms for querying a knowledge base. By using these techniques, we may dial in or broaden our queries, much in the same way that we choose different types of sprays for our garden watering hose. We can focus our queries to the particular need at hand. We explain some of these techniques in the next sections.

Adjusting Query Focus

We can see a crude application of this control when browsing the KBpedia knowledge graph. When we enter a particular query, in this case ‘knowledge graph’, one result entry is for the concept of ontology in information science. We see that a direct query gives us a single answer:

Direct Query

However, by picking the inferred option, we now see a listing of some 83 super classes for our ontology concept:

Inferred Query

By invoking deductive inference, we are actually broadening our query to include all of the parental links in the subsumption chain within the graph. Ultimately, this inference chain traces upward to the highest-order concept in the graph, namely owl:Thing. (By convention, owl:Thing itself is excluded from these inferred results.)

By invoking inference in this case, while we have indeed broadened the query, the result also is quite indiscriminate. We retrieve all of the ancestors of our subject concept, all the way to the root of the graph. This broadening is perhaps more than what we actually seek.
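
In SPARQL 1.1 terms, this indiscriminate broadening corresponds to taking the transitive closure of the subsumption relation. A sketch (ours, not the query behind the online tool):

==============
# All ancestors of the concept via the transitive closure (+) of
# subClassOf — the unbounded form of the inferred query.
select distinct ?ancestor
from <http://kbpedia.org/1.40/>
where
{
  <http://kbpedia.org/kko/rc/OntologyInformationScience>
    <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?ancestor .
}
==============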

Scoping Queries via Property Paths

Among many other options, SPARQL also gives us the ability to query specific property paths [4]. We can invoke these options either in our query templates or programmatically in order to control the breadth and depth of our desired query results.

Let’s begin with the SPARQL query that finds ‘knowledge graph’ used as an altLabel:

==============
select ?s ?p ?o
from <http://kbpedia.org/1.40/>
where
{
  ?s <http://www.w3.org/2004/02/skos/core#altLabel> "Knowledge graph"@en ;
     ?p ?o .
}
==============

You can see from the results below that only the concept of ontology (information science) is returned as a prefLabel result, with the concept’s other altLabels also shown:

      ==============
      s     p     o

      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/1999/02/22-rdf-syntax-ns#type
            http://www.w3.org/2002/07/owl#Class
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2000/01/rdf-schema#isDefinedBy
            http://kbpedia.org/kko/rc/
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2000/01/rdf-schema#subClassOf
            http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2000/01/rdf-schema#subClassOf
            http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2000/01/rdf-schema#subClassOf
            http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#prefLabel
            "Ontology (information science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontological distinction (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontological distinction(computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontology Language"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontology media"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontologies"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "New media relations"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Strong ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontologies (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontology library (information science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontology Libraries (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontologing"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Computational ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontology (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Ontology library (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Populated ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Knowledge graph"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#altLabel
            "Domain ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://www.w3.org/2004/02/skos/core#definition
            "In computer science and information science, an ontology is a formal
            naming and definition of the types, properties, and interrelationships of
            the entities that really or fundamentally exist for a particular domain
            of discourse."@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
            http://kbpedia.org/ontologies/kko#superClassOf
            http://wikipedia.org/wiki/Ontology_(information_science)
      ==============

This result gives us the basis for now asking for the direct parents of our ontology concept, using this query:

      ==============
      select ?directParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        ?directParent .
      }
      ==============

We see that the general concepts of knowledge representation-CW and ontology are parents to our concept, as well as the external Wikipedia result on ontology (information science):

      ==============
      directParent

      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/Ontology
      http://wikipedia.org/wiki/Ontology_(information_science)

      ==============

If we turn on the inferred option, we will get the full listing of the 83 concepts noted earlier. This is way too general for our current needs.

While SPARQL provides no direct way to specify an inference depth, it is possible to use property paths to control the extent of the query results from the source. In this case, we specify a path length of 1. (Note that the bounded-length {,n} quantifier appeared in working drafts of SPARQL 1.1 but was dropped from the final recommendation; some triple stores continue to support it as an extension.)

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,1}
        ?inferredParent .
      }
      ==============

This produces results equivalent to the “direct” search (namely, direct parents only):

      ==============
      inferredParent

      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/Ontology
      http://wikipedia.org/wiki/Ontology_(information_science)
      ==============

However, by expanding our path length to two, we now can request the parents and grandparents for the ontology (information science) concept:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
          <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,2}
        ?inferredParent .
      }
      ==============

This now gives us 15 results from the parental chain:

      ==============
      inferredParent

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://umbel.org/umbel/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/PropositionalConceptualWork
      http://wikipedia.org/wiki/Knowledge_representation
      http://sw.opencyc.org/concept/Mx4r4e_7xpGBQdmREI4QPyn0Gw
      http://umbel.org/umbel/rc/Ontology
      http://kbpedia.org/kko/rc/StructuredInformationSource
      http://kbpedia.org/kko/rc/ClassificationSystem
      http://wikipedia.org/wiki/Ontology
      http://sw.opencyc.org/concept/Mx4rv7D_EBSHQdiLMuoH7dC2KQ
      http://kbpedia.org/kko/rc/Technology-Artifact
      http://www.wikidata.org/entity/Q324254

      ==============

Similarly, we can expand our query request to a path length of 3, which gives us the parental chain of parents + grandparents + great-grandparents:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
          <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,3}
        ?inferredParent .
      }
      ==============

In this particular case, we do not add any further results for great-grandparents:

      ==============

      inferredParent

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://umbel.org/umbel/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/PropositionalConceptualWork
      http://wikipedia.org/wiki/Knowledge_representation
      http://sw.opencyc.org/concept/Mx4r4e_7xpGBQdmREI4QPyn0Gw
      http://umbel.org/umbel/rc/Ontology
      http://kbpedia.org/kko/rc/StructuredInformationSource
      http://kbpedia.org/kko/rc/ClassificationSystem
      http://wikipedia.org/wiki/Ontology
      http://sw.opencyc.org/concept/Mx4rv7D_EBSHQdiLMuoH7dC2KQ
      http://kbpedia.org/kko/rc/Technology-Artifact
      http://www.wikidata.org/entity/Q324254
      ==============

Without a property path specification, our inferred request would produce the listing of 83 results shown by the Inferred tab on the KBpedia knowledge graph, as shown in the screen capture provided earlier.

The online knowledge graph does not use these property path restrictions in its standard query templates. But these examples show how, programmatically, it is possible to broaden or narrow our searches of the graph, depending on the relation chosen (subClassOf in this example) and the length of the specified property path.

Many More Options and Potential for Control

This use case is but a small example of the ways in which SPARQL may be used to dial in or control the scope of queries posed to the knowledge graph. Besides all of the query options provided by the SPARQL standard, we may also remove duplicates, identify negated items, and search inverses, selected named graphs, or selected graph patterns.
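
To give a flavor of these controls, here is a hedged sketch (ours, with illustrative URIs) that combines several of them: DISTINCT removes duplicate solutions, the inverse operator (^) walks subClassOf downward to retrieve children rather than parents, and FILTER NOT EXISTS negates a pattern:

==============
# Children of Ontology via the inverse (^) of subClassOf, excluding
# any child that carries a particular prefLabel.
select distinct ?child
from <http://kbpedia.org/1.40/>
where
{
  <http://kbpedia.org/kko/rc/Ontology>
    ^<http://www.w3.org/2000/01/rdf-schema#subClassOf> ?child .
  filter not exists
  {
    ?child <http://www.w3.org/2004/02/skos/core#prefLabel>
           "Ontology (information science)"@en .
  }
}
==============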

Beyond SPARQL, and now using SWRL, we may also apply abductive reasoning and hypothesis generation to our graphs, as well as mimic the action of expert systems in AI through if-then rule constructs based on any structure within the knowledge graph. A nice tutorial with examples that helps highlight some of the possibilities in combining OWL 2 with SWRL is provided by [5].
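
While SWRL has its own syntax, the flavor of such if-then rules can also be sketched in plain SPARQL using a CONSTRUCT form. In this hedged example (the ex: predicates are hypothetical, for illustration only), the antecedent patterns in the where clause produce a new consequent fact:

==============
# If ?x has a parent ?y, and ?y has a brother ?z, then construct
# the new fact that ?z is an uncle of ?x — a classic if-then rule.
prefix ex: <http://example.org/>
construct { ?x ex:hasUncle ?z . }
where
{
  ?x ex:hasParent ?y .
  ?y ex:hasBrother ?z .
}
==============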

A key use of inference is its application to natural language understanding and the extension of our data systems to include unstructured text, as well as structured data. For this potential to be fully realized, it is important that we chunk (“parse”) our natural language using primitives that themselves are built upon logical foundations. Charles S. Peirce made many contributions in this area as well. Semantic grammars that tie directly into logic tests and reasoning would be a powerful addition to our standard semantic technologies. Revisions to the approach taken to Montague grammars may be one way to achieve this elusive aim. This is a topic we will likely return to in the months to come.

Finally, of course, inference is a critical method for testing the logic and consistency of our knowledge graphs as we add new concepts, make new relations or connections, or add attribute data to our instances. All of these changes need to be tested for consistency moving forward. Nurturing graphs by testing added concepts, entities and connections is an essential prerequisite to leveraging inferencing at run time as well.

This article is part of an occasional series describing non-machine learning use cases and applications for Cognonto’s KBpedia knowledge graph. Most center around the general use and benefits of knowledge graphs, but best practices and other applications are also discussed. Prior machine learning use cases, and the ones from this series, may be found on the Cognonto Web site under the Use Cases main menu item.

[1] See, for example, Markus Krötzsch, Frantisek Simancik, and Ian Horrocks, 2012. “A Description Logic Primer,” arXiv preprint, arXiv:1201.4089; and Franz Baader, 2009. “Description Logics,” in Sergio Tessaris, Enrico Franconi, Thomas Eiter, Claudio Gutierrez, Siegfried Handschuh, Marie-Christine Rousset, and Renate A. Schmidt, editors, Reasoning Web: Semantic Technologies for Information Systems – 5th International Summer School 2009, volume 5689 of LNCS, pages 1–39. Springer.
[2] M.K. Bergman, 2015. “‘Natural Classes’ in the Knowledge Web,” in AI3:::Adaptive Information blog, July 13, 2015.
[3] M.K. Bergman, 2016. “Rationales for Typology Designs in Knowledge Bases,” in AI3:::Adaptive Information blog, June 6, 2016.
[4] Steve Harris and Andy Seaborne, eds., 2013. SPARQL 1.1 Query Language, World Wide Web Consortium (W3C) Recommendation, 21 March 2013; see especially Section 9 on property paths.
[5] Martin Kuba, 2012. “OWL 2 and SWRL Tutorial,” from Kuba’s Web site.

Posted by AI3's author, Mike Bergman Posted on March 15, 2017 at 12:10 pm in Cognonto, KBpedia, Searching, Semantic Web | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/2030/uses-and-control-of-inferencing-in-knowledge-graphs/
Posted: March 2, 2017

altLabels Help Reinforce ‘Things not Strings’

One of the strongest contributions that semantic technologies make to knowledge-based artificial intelligence (KBAI) is to focus on what things mean, as opposed to how they are labeled. This focus on underlying meaning is captured by the phrase, “things not strings.”

The idea of something — that is, its meaning — is conveyed by how we define that something, by the context in which the various tokens (terms) for that something are used, and by the variety of terms or labels we apply to that thing. The label alone is not enough. The idea of a parrot is conveyed by our understanding of what the name parrot means. Yet, in languages other than English, the same idea of parrot may be conveyed by the terms Papagei, perroquet, loro, попугай, or オウム, depending on the native language.

The idea of the ‘United States’, even just in English, may be conveyed with labels ranging from America to US, USA, U.S.A., Amerika, Uncle Sam, or even the Great Satan. As another example, the simple token ‘bank’ can mean a financial institution, a side of a river, turning an airplane, tending a fire, or a pool shot, depending on context. What these examples illustrate is that a single term is often not the only way to refer to something, and that a given token may mean vastly different things depending on the context in which it is used.

Knowledge graphs also are not composed of labels, but of concepts, entities and the relationships between those things. Knowledge graphs constructed from single labels for individual nodes and single labels for individual relations are, therefore, unable to capture these nuances of context and varieties of reference. For a knowledge graph to be useful to a range of actors, it must reflect the languages and labels meaningful to those actors. And for us to distinguish the accurate references of individual terms, we need each of a term’s multiple senses to be associated with its related concepts, and then to use the graph relationships for those concepts to help disambiguate the intended meaning of the term based on its context of use.

In the lexical database of WordNet, the variety of terms by which a given term might be known is collectively called a synset. According to WordNet, a synset (short for synonym set) is “defined as a set of one or more synonyms that are interchangeable in some context without changing the truth value of the proposition in which they are embedded.” In Cognonto‘s view, the concept of a synset is helpful, but still does not go far enough. Any name or label that draws attention to a given thing can provide the same referential power as a synonym. We can include in this category abbreviations, acronyms, argot, diminutives, epithets, idioms, jargon, lingo, misspellings, nicknames, pseudonyms, redirects, and slang, as well as, of course, synonyms. Collectively, we call all of the terms that may refer to a given concept or entity a semset. In all cases, these terms are mere pointers to the actual something at hand.

In the KBpedia knowledge graph, these terms are defined either as skos:prefLabel (the preferred term), skos:altLabel (all other semset variants) or skos:hiddenLabel (misspellings). In this article, we show an example of semsets in use, discuss what we have done specifically in KBpedia to accommodate semsets, and summarize with some best practices for semsets in knowledge graph construction.
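
As a hedged sketch of how these label properties work together (the query is ours, for illustration), a search across the full semset simply treats all three properties as an equivalent pool of pointers:

==============
# Find any concept whose semset — preferred, alternative, or hidden
# label — matches a search term, regardless of capitalization.
select ?concept ?label
from <http://kbpedia.org/1.40/>
where
{
  values ?labelProperty {
    <http://www.w3.org/2004/02/skos/core#prefLabel>
    <http://www.w3.org/2004/02/skos/core#altLabel>
    <http://www.w3.org/2004/02/skos/core#hiddenLabel>
  }
  ?concept ?labelProperty ?label .
  filter(lcase(str(?label)) = "knowledge graph")
}
==============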

A KBpedia Example

Let’s see how this concept of semset works in KBpedia. Though in practice much of what is done with KBpedia is done via SPARQL queries or programmatically, here we will simply use the online KBpedia demo. Our example Web page references our recent announcement of KBpedia v. 1.40. We begin by entering the URL for this announcement into the Analyze Web Page demo box on the Cognonto main page:

Running the Cognonto Demo

After some quick messages on the screen telling us how the Web page is being processed, we receive the first results page for the analysis of our KBpedia version announcement. The tab we are looking at here highlights the matching concepts we have found, with the most prevalent shown in red-orange. Note that one of those main concepts is a ‘knowledge graph’:

Concept Tagging in the Web Page

If we mouseover this ‘knowledge graph’ tag we see a little popup window that shows us what KBpedia concepts the string token matches. In this case, there is only one concept match, that of OntologyInformationScience; the term ‘knowledge graph’ itself is not a listed match (other highlighted terms may present multiple possible matches):

Tag Links to KBpedia

When we click on the live link to OntologyInformationScience we are then taken to that concept’s individual entry within the KBpedia knowledge structure:

'Knowledge Graph' in the Semset

Other use cases describe more fully how to browse and navigate KBpedia or how to search it. To summarize here, what we are looking at above is the standard concept header that presents the preferred label for the concept, followed by its URI and then its alternative labels (semset). Note that ‘knowledge graph’ is one of the alternative terms for OntologyInformationScience.

We can also confirm that only one concept is associated with the ‘knowledge graph’ term by searching for it. Note, as well, that we have the chance to also search individual fields such as the title (preferred label), alternative labels (semset), URI or definitions of the concept:

Searching altLabels Alone

What this example shows is that a term, ‘knowledge graph’, in our original Web page, while not having a concept dedicated to that specific label, has a corresponding concept of OntologyInformationScience as provided through its semset of altLabels. Semsets provide us an additional term pool by which we can refer to a given concept or entity.

Implementing Semsets in KBpedia

For years now, given that our knowledge graphs are grounded in semantic technologies, we have emphasized the generous use of altLabels to provide more contextual bases for how we can identify and get to appropriate concepts. In our prior public release of KBpedia (namely, v 1.20), we had more than 100,000 altLabels for the approximately 39,000 reference concepts in the system (an average of 2.6 altLabels per concept).

As part of the reciprocal mapping effort we undertook in moving from version 1.20 to version 1.40 of KBpedia, as we describe in its own use case, we also made a concerted effort to mine the alternative labels within Wikipedia. (The reciprocal mapping effort, you may recall, involved adding nodes and structure from Wikipedia that did not have corresponding tie-in points in the existing KBpedia.) Through these efforts, we were able to add more than a quarter million new alternative terms to KBpedia. Now, the average number of altLabels per RC exceeds 6.6, representing a 2.5X increase over the prior version, even while we were also increasing concept coverage by nearly 40%. This is the kind of effort that enables us to match a term like ‘knowledge graph’.
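
As a quick consistency check on these figures, using the approximate counts just cited:

\[
\frac{100{,}000 + 250{,}000}{54{,}000} \approx 6.5 \ \text{altLabels per RC}, \qquad \frac{6.6}{2.6} \approx 2.5\times
\]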

Best Practices for Semsets in Knowledge Graphs

At least three best-practice implications arise from these semset efforts:

  • First, it is extremely important when mapping new knowledge sources into an existing target knowledge graph to also harvest and map semset candidates. Sources like Wikipedia are rich repositories of semsets;
  • Second, like all textual additions to a knowledge graph, it is important to include the language tag for the specific language being used. In the baseline case of KBpedia, that language is English (en). By tagging text fields with their appropriate language tag, it is possible to readily swap out labels in one language for another, as the sketch following this list shows. This design approach leads to a better ability to represent the conceptual and entity nature of the knowledge graph in multiple natural languages; and,
  • Third, in actual enterprise implementations, it is important to design and include workflow steps that enable subject matter experts to add new altLabel entries as encountered. Keeping semsets robust and up-to-date is an essential means for knowledge graphs to fulfill their purpose as knowledge structures that represent “things not strings.”
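
As a sketch of the second practice above (the French labels are an assumption for illustration; KBpedia’s baseline labels are English), swapping display languages reduces to filtering on the language tag:

==============
# Retrieve only the labels tagged in the desired language — here
# French — so the graph can be presented in another natural language.
select ?concept ?label
where
{
  ?concept <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .
  filter(langMatches(lang(?label), "fr"))
}
==============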

Semsets, along with other semantic technology techniques, are essential tools in constructing meaningful knowledge graphs.

This article is part of an occasional series describing non-machine learning use cases and applications for Cognonto’s KBpedia knowledge graph. Most center around the general use and benefits of knowledge graphs, but best practices and other applications are also discussed. Prior machine learning use cases, and the ones from this series, may be found on the Cognonto Web site under the Use Cases main menu item.

Posted by AI3's author, Mike Bergman Posted on March 2, 2017 at 9:23 am in Cognonto, KBpedia, Semantic Web Tools | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/2027/the-importance-of-semsets-in-knowledge-graph-design/
Posted: February 28, 2017

New Technique of Reciprocal Mapping Adds 40% to Scope

We are pleased to announce that KBpedia, the large-scale dedicated knowledge graph for knowledge-based artificial intelligence (or KBAI), was released in a greatly expanded version today. The new KBpedia v. 1.40 was expanded by 40%, to some 54,000 concepts, via a new method called reciprocal mapping that I covered in an article last week.

Knowledge graphs, technically known as ontologies, are normally expanded by mapping external knowledge systems to concepts that already reside in the target graph. This poses problems when the new source has different structure or much greater detail than the target graph has on its own. I likened this in my article last week to the “Swiss cheese problem” of gaps in coverage when aligning or combining knowledge graphs.

Reciprocal mapping is a new artificial intelligence method for identifying detailed structure in source knowledge bases, and then in identifying the proper placement points for that new structure within the target knowledge graph. These candidate placements are then tested against a series of logic and consistency tests to ensure the placements and the scope of the added structure in the now-expanded knowledge graph remain coherent. Candidates that pass these tests are then manually vetted for final acceptance before committing to a new build.

This reciprocal mapping method was applied using “clean” Wikipedia categories as the source against the KBpedia target. After all logic and consistency tests, KBpedia was expanded by nearly 15,000 new categories. The same process was also used to add missing definitions and new synonyms to KBpedia.

Frédérick Giasson, Cognonto’s CTO, developed new graph embedding techniques coupled with machine learning to automate the generation of candidates for this reciprocal mapping process. This two-step process of standard mappings followed by reciprocal mappings can be applied to any external knowledge base. The technique means we can achieve a ‘highest common denominator’ that captures the full structure of both source and target knowledge bases when mapping. Reciprocal mapping overcomes prior gaps when integrating enterprise knowledge into computable knowledge bases.

The new version 1.40 of the online KBpedia may be browsed, searched and inspected on the Cognonto Web site. The site also provides further documentation on how to browse the graph and how to search it.

Knowledge graphs are under constant change and need to be extended with specific domain information for particular enterprise purposes. The combinatorial aspects of adding new external schema or concepts to an existing store of concepts can be extensive. KBpedia, with its already tested and logical knowledge structure, is a computable foundation for guiding and testing new mappings. Such expanded versions may be tailored for any domain and enterprise need.

The KBpedia knowledge structure combines six (6) public knowledge bases — Wikipedia, Wikidata, OpenCyc, GeoNames, DBpedia and UMBEL — into an integrated whole. These core KBs are supplemented with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and artificial intelligence. KBpedia greatly reduces the time and effort traditionally required for data preparation and tuning common to AI tasks. KBpedia was first released in October 2016, though it has been under active development for more than six years.

Posted by AI3's author, Mike Bergman Posted on February 28, 2017 at 4:09 am in Cognonto, KBpedia, Knowledge-based Artificial Intelligence | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/2026/new-kbpedia-release-greatly-expands-knowledge-structure/
The URI to trackback this post is: http://www.mkbergman.com/2026/new-kbpedia-release-greatly-expands-knowledge-structure/trackback/
Posted: February 22, 2017


Mobius Band I - M.C. Escher

An Advance Over Simple Mapping or Ontology Merging

The technical term for a knowledge graph is ontology. As artifacts of information science, artificial intelligence, and the semantic Web, knowledge graphs may be constructed to represent the general nature of knowledge, in which case they are known as upper ontologies, or for domain or specialized purposes. Ontologies in these senses were first defined more than twenty years ago, though as artifacts they have been used in computer science since the 1970s. The last known census of ontologies, in 2007, indicated there were more than 10,000 then in existence, though today’s count is likely in excess of 40,000 [1]. Because of the scope and coverage of these general and domain representations, and the value of combining them for specific purposes, key topics of practical need and academic research have been ontology mappings or ontology mergers, known collectively as ontology alignment. Mapping or merging makes sense when we want to combine existing representations across domains of interest.

At Cognonto, ontology alignment is a central topic. Our KBpedia knowledge structure is itself the result of mapping nearly 30 different information sources, six of which are major knowledge bases such as Wikipedia and Wikidata. When applied to domain problems, mapping of enterprise schema and data is inevitably an initial task. Mapping techniques and options are thus of primary importance to our work with knowledge-based artificial intelligence (KBAI). When mapping to new sources we want to extract the maximum value from each contributing source without devolving to the tyranny of the lowest common denominator. We further need to retain the computability of the starting KBpedia knowledge structure, which means maintaining the logic, consistency, and coherence when integrating new knowledge.

We are just now completing an update to KBpedia that represents, we think, an important new option in ontology alignment. We call this option reciprocal mapping; it adds an important new tool to our ontology mapping toolkit. I provide a high-level view of, and rationale for, reciprocal mapping in this article. This article accompanies a more detailed use case by Fred Giasson that discusses implementation details and provides sample code.

The Mapping Imperative and Current Mappings

Cognonto, like other providers of services in the semantic Web and in artificial intelligence applied to natural languages, views mapping as a central capability. This importance is because all real-world knowledge problems amenable to artificial intelligence best express their terminology, concepts, and relations between concepts in a knowledge graph. Typically, that mapping relies upon a general knowledge graph, or upper ontology, or multiples of them, as the mapping targets for translating the representation of the domain of interest to a canonical form. Knowledge graphs, as a form, can represent differences in semantic meaning well (you say car, I say auto) while supporting inference and reasoning. For these problems, mapping must be seen as a core requirement.

There is a spectrum of approaches for how to actually conduct these mappings. At the simplest and least accurate end of the spectrum are string matching methods, sometimes supplemented by regular expression processing and heuristic rules. An intermediate set of methods uses concepts already defined in a knowledge base as a way to “learn” representations of those concepts; while there are many techniques, two that Cognonto commonly applies are explicit semantic analysis and word embedding. Most of these intermediate methods require some form of supervised machine learning or other ML techniques. At the more state-of-the-art end of the spectrum are graph embeddings or deep learning, which also capture context and conceptual relationships as codified in the graph.

Aside from the simple string match approaches, all of the intermediate and state-of-the-art methods use machine learning. Depending on the method, these machine learners require developing either training sets or corpuses as a reference basis for tuning the learners. These references need to be manually scoped, as in the case of training corpuses for unsupervised learning, or manually scored into true and false positives and negatives for training sets for supervised learning. Cognonto uses all of these techniques, but importantly supplements them with logic tests and scripts applied to the now-modified knowledge graph to test coherence and consistency issues that may arise. The coherency of the target knowledge graph is tested as a result of the new mappings.

Items failing those tests are fixed or dropped. Though not a uniform practice by others, Cognonto also adheres to a best practice that requires candidate mappings to be scored and manually inspected before final commitment to the knowledge graph. The knowledge graph is a living artifact, and must also be supported by proper governance and understood workflows. Properly constructed and maintained knowledge graphs can power much work from tagging to classification to question answering. Knowledge graphs are one of the most valuable information assets an enterprise can create.

We have applied all of these techniques in various ways to map the six major knowledge bases that make up the core of KBpedia, plus the other 20 common knowledge graphs mapped to KBpedia. These mappings are what is contained in our current version 1.20 of KBpedia, the one active on the Cognonto Web site. All of these nearly 30 mappings use KBpedia (A) as the mapping target for the contributing KBs (B). All version 1.20 mappings are of this B → A form.

Reciprocal Mappings to Extend Scope

This is well and good, and is the basis for how we have populated what is already in the KBpedia knowledge graph, but what of the opposite? In other words, there are concepts and local graph structure within the contributing KBs that do not have tie-in points (targets) within the existing KBpedia. This is particularly true for Wikipedia with its (mostly) comprehensive general content. From the perspective of Wikipedia (B), KBpedia v 1.20 (A) looks a bit like Swiss cheese with holes and gaps in coverage. Any source knowledge graph is likely to have rich structure and detail in specific areas beyond what the target graph may contain. 

As part of our ongoing KBpedia improvement efforts, the next step in a broader plan in support of KBAI, we have been working for some time on a form of reverse mapping. This process analyzes coverage in B to augment the scope of A (KBpedia). This kind of mapping, which we call reciprocal mapping, poses new challenges because candidate new structure must be accurately placed and logically tested against the existing structure. Fred recently wrote up a use case that covers this topic in part.

We are now nearly complete with this reciprocal mapping from Wikipedia to KBpedia (B → (A+B)). Reciprocal mapping results in new concepts and graph structure being added to KBpedia (now A+B). It looks like this effort, which we will finalize and release soon, will add nearly 40% to the scope of KBpedia.

This expansion effort is across-the-board, and is not focused or limited to any particular domain, topic or vocabulary. KBpedia’s coverage will likely still be inadequate for many domain purposes. Nonetheless, the effort does provide a test bed for showing how external vocabularies may be mapped and what benefits may arise from an expanded KBpedia scope, irrespective of domain. We will report here and on the Cognonto Web site upon these before and after results when we release.

Knowledge graphs are under constant change and need to be extended with specific domain information for particular domain purposes. KBpedia is perhaps not the typical ontology mapping case, since its purpose is to be a coherent, consistent, feature-rich structure to support machine learning and artificial intelligence. Yet, insofar as domain knowledge graphs aspire to support similar functionality, our new methods for reciprocal mapping may be of interest. In any case, effective means at acceptable time and cost must be found for enhancing or updating knowledge graphs, and reciprocal mapping is a new tool in the arsenal to do so.

The Reciprocal Mapping Process

Wikipedia is a particularly rich knowledge base for the reciprocal mapping process, though other major KBs are also suitable, including smaller domain-specific ones. In order to accomplish the reciprocal mapping process, we observed several differences between this reverse mapping case and our standard B → A mapping. First, categories within the source KB are the appropriate basis for mapping to the equivalent concept (class) structure in KBpedia. The category structure in the source KB is also the one that establishes a similar graph structure. Second, whatever the source knowledge base, we need to clean its categories to make sure they correspond to the actual subsumption bases reflected in the target KBpedia. Cleaning involves removing administrative and so-called “compound” categories, ones of convenience but not reflective of natural classes. In the case of Wikipedia, this cleaning eliminates about 80% of the starting categories. Third, we also need to capture structural differences in the source knowledge graph (B). Possible category matches fall into three kinds: 1) leaf categories, which represent child extensions to existing KBpedia terminal nodes; 2) near-leaf categories, which also extend existing KBpedia terminal nodes, but which are themselves parents to additional child structure in the source; and 3) core categories, which tie into existing intermediate nodes in KBpedia that are not terminal nodes. By segregating these structural differences, we are able to train more precise placement learners.

From our initial simple mapping (B → A) we already have thousands of KBpedia reference concepts mapped to related Wikipedia categories. What we want to do is use this linkage to propose a series of new sub-classes that we could add to KBpedia based on the sub-categories that exist in Wikipedia for each of these mappings. The challenge we face by proceeding in this way is that our procedure potentially creates tens of thousands of new candidates. Because the Wikipedia category structure has a completely different purpose than the KBpedia knowledge graph, and because Wikipedia’s creation rules are completely different than KBpedia’s, many candidates are inconsistent or incoherent to include in KBpedia. A cursory inspection shows that most of the candidate categories need to be dropped. Reviewing tens of thousands of new candidates manually is not tenable; we need an automatic way to rank potential candidates.

The way we automate this process is to use an SVM classifier trained over graph-based embedding vectors generated using the DeepWalk method [2]. DeepWalk learns the sub-category patterns that exist in the Wikipedia category structure in an unsupervised manner. The result is a graph embedding vector for each candidate node. Our initial B → A maps enable us to quickly create training sets with thousands of pre-classified sub-categories. We use 75% of the training set for training, and 25% for cross-validation. We also employ some hyperparameter optimization techniques to converge to the best learner configuration. Once these three steps are completed, we classify all of the proposed sub-categories and create a list of potential sub-class-of candidates to add into KBpedia, which is then validated by a human. These steps are more fully described in the detailed use case.

We use visualization and reference “gold” standards as means to further speed the time to a solution. We visualize interim passes and tests using the TensorFlow Projector web application. Here, for example, is a 3D representation of some of the clusters among the concepts:

SuperTypes View

Another way we might visualize things is to investigate the mappings by topic. Here, for example, we highlight the Wikipedia categories that have the word “wine” in them.

Wine Concepts View

While the visualizations are nice and help us understand the mapping space, we are most interested in finding the learner configurations that produce the “best” results. Various statistics such as precision, recall, accuracy and others can help us determine that. The optimization target is really driven by the client. If you want the greatest coverage and can accept some false positives, then you might favor recall. If you only want a smaller set of correct results, then you would likely favor precision. Other objectives might emphasize other measures, such as accuracy or AUC.

The reference “gold” standards in the scored training sets provide the basis for computing all of these statistics. We score the training sets as to whether a given mapping is true or false (correct or not). (False mappings need to be purposefully introduced.) Then, when we parse the test candidates against the training set, we note whether the learner result is either positive or negative (indicated as correct or indicated as not correct). When we match the test to the training set, we thus get one of four possible scores: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Those four simple scoring categories are sufficient for calculating any of the statistical measures.
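
For reference, the standard formulations of these measures from the four counts (a textbook summary, not anything specific to our pipeline) are:

\[
\begin{aligned}
\text{precision} &= \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \\
\text{accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\end{aligned}
\]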

We capture the reciprocal mapping process using a repeatable pipeline with the reporting of these various statistical measures, enabling rapid refinements in parameters and methods to achieve the best-performing model, according to how we have defined “best” per the discussion above. Once appropriate candidate categories are generated using this optimized model, the results are then inspected by a human to make selection decisions. We then run these selections against the logic and coherency tests for the now-modified graph, and keep or modify or drop the final candidate mappings depending on how they meet the tests. The semi-automatic methods in this use case can be applied to extending KBpedia with any external schema, ontology or vocabulary.

This semi-automated process takes 5% of the time it would normally take to conduct this entire process by comparable manual means. We know, since we have been doing the manual approach for nearly a decade.

The Snake Eating Its Tail

As we add each new KB to KBpedia or expand its scope through these mapping methods, verified to pass all of our logic and satisfiability tests, we continue to have a richer structure for testing coherence and consistency for the next iteration. The accretive process that enables these growing knowledge bases means there is more structure, more assertions, and more relations to test new mappings. The image that comes to mind is that of ouroboros, one of the oldest images of fertility. The snake or serpent eating its tail signifies renewal.

KBpedia has already progressed through this growth and renewal process hundreds of times. Our automated build scripts mean we can re-generate KBpedia on a commodity machine from scratch in about 45 minutes. If we add all of the logic, consistency and satisfiability checks, a new build can be created in about two hours. This most recent reciprocal mapping effort adds about 40% more nodes to KBpedia’s current structure. Frankly, this efficient and clean build structure is remarkable for one of the largest knowledge structures around, with about 55,000 nodes and 20 million mapped instances. Using the prior KBpedia as the starting structure, we have been able to achieve this expansion, with even better logical coherence of the graph, in just a few hundred hours of effort.

The mapping methods discussed herein can extend KBpedia using most any external source of knowledge, one which has a completely different structure than KBpedia and one which has been built completely differently with a different purpose in mind than KBpedia. A variety of machine learning methods can reduce the effort required to add new concepts or structure by 95% or more. Machine learning techniques can filter potential candidates automatically to reduce greatly the time a human reviewer has to spend to make final decisions about additions to the knowledge graph. A workable and reusable pipeline leads to fast methods for testing and optimizing parameters used in the machine learning methods. The systematic approach to the pipeline and the use of positive and negative training sets means that tuning the approach can be fully automated and rapidly vetted.

This is where the real value of KBpedia resides. It is already an effective foundation for guiding and testing new domain extensions. KBpedia now shows itself to be the snake capable of consuming (mapping to), and thereby growing from, nearly any new knowledge base.


[1] A simple Google search of https://www.google.com/search?q=filetype:owl (OWL is the Web Ontology Language, one of the major ontology formats) shows nearly 39,000 results, but there are multiple ontology languages available, such as RDF, RDFS, and others (though use of any of these languages does not necessarily imply the artifact is a vocabulary or ontology).
[2] Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.