Posted: May 8, 2017

It’s Time for Ontologies to Put on Their Big Boy Pants

Many of us have been in the semantic Web game for more than a decade. My own first exposure, in the early 2000s, involved trying to figure out the difference between XML and RDF. (Fortunately, that confusion has long since passed.) We also grappled with the then-new concept of ontologies, now more easily understood as knowledge graphs. In this process many lessons have been learned, but also much promise has yet to be realized.

One of the most important lessons is that the semantic Web is best seen not as an end unto itself. Rather, it and the semantic technologies that underlie it are really just means to get at important, longstanding challenges in data interoperability and artificial intelligence. Our work with knowledge graphs needs to be viewed through this lens of what we can do with these technologies to address real problems, not solely for technology’s sake.

It is with this spirit in mind that we are working on our next release of KBpedia, the knowledge structure that knits together six major public knowledge bases for the purpose of speeding machine learning and providing a scaffolding for data interoperability. This pending new release will expand KBpedia in important ways. It will provide a next step on the path to realizing the promise of knowledge graphs.

I will be sharing a series of articles to lay the groundwork for this release, as well as, after release, to explain what some of it means. This first article begins by discussing the state of the art in semantic knowledge graphs: what they currently do, and what they (often) currently do not. I grade each of three major areas related to knowledge graphs, in declining order of achievement. My basic report is that we have gotten many things right — witness the growth and credibility of knowledge graphs across all current search services and intelligent agents — but it is time for knowledge graphs to grow out of their knickers and don big boy pants.

Important note: Some ontologies in industrial, engineering and biomedical realms do take a more sophisticated view of relations and data. However, these are not the commonly known knowledge graphs used in artificial intelligence, natural language understanding, or intelligent, virtual agents. These latter areas are my primary focus due to our emphasis on knowledge-based artificial intelligence.

The Current Knowledge Graph Reader: Concepts and Entities

We watch our children first learn the names of things as they begin mastering language. The learning focus is on nouns, and on building a vocabulary about the things that populate the tangible world. By the time we begin putting together our first sentences, lampooned in early books such as Dick and Jane and the dog Spot, our nouns are getting increasingly numerous and rich, though our verbs remain simple. Early language acquisition, as with the world itself, is much more populated by different kinds of objects than different kinds of actions. Our initial verbs tend to be fewer in number and much less varied than the differences of form and circumstance we can see among objects.

Most knowledge graphs have an orientation to things and concepts, the nouns of the knowledge space, much like a Dick and Jane reader. Entities and concepts have occupied my own work on the UMBEL and KBpedia ontologies over the past decade. It is clear similar emphasis has occurred in public knowledge bases as well. Nouns and categorizing things have been the major focus of efforts to date.

For example, major knowledge base constituents of KBpedia, such as Wikidata, Wikipedia or GeoNames, have millions of concepts or entities within them, but only a few thousand predicates (approximately 2,500 useful ones in Wikidata and 750 or so in DBpedia and schema.org). Further, the reasoners that we apply over these graphs have not been expanded to deal with rich predicates. Reasoners mostly rely on inference over subsumption hierarchies, disjointness, and property conditions like cardinality and range. Mapping predicates are mostly related to subsumption and equivalence, with the latter commonly misused [1].
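
To get a feel for this imbalance in any given triple store, a simple tally of distinct predicates versus distinct subjects can be run with SPARQL. This is only an illustrative sketch; the graph IRI below is a placeholder, and counts (and time-outs) will vary by endpoint:

==============
# Tally distinct predicates vs. distinct subjects in a triple store.
# The graph IRI is a placeholder; substitute the endpoint's own default
# or named graph. Large public endpoints may require a LIMIT or may
# time out on a full scan.
select (count(distinct ?p) as ?numPredicates)
       (count(distinct ?s) as ?numSubjects)
from <http://example.org/placeholder-graph>
where
{
  ?s ?p ?o .
}
==============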

Yet, even within the bounds of nouns, we unfortunately have not done well in identifying context. Disambiguation is made difficult without context. Though context may be partially described by nouns related to perception, situations, states and roles, we ultimately require an understanding of events, actions and relations. Until these latter factors are better captured and understood, our ability to establish context remains limited.

The semantic technology languages of RDF and OWL give us the tools to handle these constructs, at least within the limits of first-order logic, but we have mostly spent the past 15 years mastering kindergarten-level basics. To illustrate how basic this is, try to understand how different knowledge graphs treat entities (are they individuals, instances, or particulars, and do they include events or concepts?) versus concepts (are they classes, types, or generals, and do they include abstractions?). There is certainly no uniform treatment of these basic noun grammars. Poor mappings and the inability to capture context further drag down this grade.

Grade: B-

Only the Simplest of Relations

I’ve already noted the paucity of relations in (most) current knowledge graphs. But a limited vocabulary is not the only challenge.

There is no general or coherent theory for how relations are handled within the semantic Web. We have expressions that characterize individual things, what we, in our own work, term attributes. We have expressions that name or describe things, including annotations or metadata, what we term denotatives. We have expressions that point to or indicate things, what we term indexicals. And we have expressions that characterize relations between external objects or actions an agent might take, what we term external relations. These are some of our terms for these relations — which we will describe in detail in the second and third parts of this series — but it is unlikely you will find most or all of these distinctions in any knowledge graph. This gap reflects the general inattention to relations.

Modeling relations and predicates needs to capture a worldview of how things are connected, preferably based on some coherent, underlying rationale. Similar to how we categorize the things and entities in our world, we also need to make ontological choices (in the classic sense of the Greek ontos, or the nature of being) as to what a predicate is and how predicates may be classified and organized. Not much is discussed about this topic in the knowledge graph literature, let alone put into practice.

The semantic Web has no well-known or accepted ontology of relations or properties. True, OWL offers the distinction of annotation, object and datatype properties, and also allows property characteristics such as transitivity, domain, range, cardinality, inversion, reflexivity, disjointness and the like to be expressed, but it is a rare ontology that uses many of these constructs. The subProperty expression is used, but only in limited instances and rarely, if ever (none, to my knowledge), in a systematic schema. For example, it is readily obvious that some broader predicate such as animalAction could be split into involuntaryAction and voluntaryAction, and then into specific actions such as breathing or walking, and so on, but schemas with these kinds of logical property subsumptions are not evident. Structurally, OWL can be used to reason over actions and relations in much the same manner as we reason over entities and types, but our common ontologies have yet to do so. Yet creating such a schema is within grasp, since we have language structures such as VerbNet and other resources we could put to the task.
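
To make the example concrete, here is a minimal sketch of what such a property subsumption schema might look like, expressed as a SPARQL update. The ex: namespace and property names are purely illustrative (taken from the example above), not actual KBpedia or KKO vocabulary:

==============
# A hypothetical action-property hierarchy declared via rdfs:subPropertyOf.
# All names under the ex: namespace are illustrative only.
prefix ex:   <http://example.org/schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

insert data
{
  ex:animalAction      a owl:ObjectProperty .
  ex:involuntaryAction a owl:ObjectProperty ; rdfs:subPropertyOf ex:animalAction .
  ex:voluntaryAction   a owl:ObjectProperty ; rdfs:subPropertyOf ex:animalAction .
  ex:breathing         a owl:ObjectProperty ; rdfs:subPropertyOf ex:involuntaryAction .
  ex:walking           a owl:ObjectProperty ; rdfs:subPropertyOf ex:voluntaryAction .
}
==============

Once such a hierarchy is declared, a query pattern along the lines of ?s ?p ?o . ?p rdfs:subPropertyOf* ex:animalAction retrieves assertions made with any of the more specific action properties, which is exactly the kind of reasoning over relations described above.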

We want to establish such a schema so as to be able to reason over and organize (categorize) actions and relations. We further want such a schema to segregate intrinsic relations (attributes) from relations between things, and from descriptions about or indexes to things. This greater understanding is exactly what is needed to reason over relations. It is also what is required to relate parsed tokens to a semantic grammar. Relation and fact extraction from text further requires this form of schema. Without these broader understandings, we cannot adequately capture situations and context, which are necessary for disambiguating the things in our world.

Though the splits and names may not be exactly as I would have preferred, we nonetheless have sufficient syntax and primitives in OWL by which we can develop such a schema of relations. However, since virtually nothing has been done in this regard over the 15 years of the semantic Web, I have to decrement its grade accordingly.

Grade: C

Oh, and Then There’s the Problem with Data

Besides machine learning, my personal motivation for and strongly held belief in semantic technologies have been driven by the role they can play in data interoperability. By this term I mean the ability to bring information together from two or more sources so as to effectively analyze and make decisions over the combined information. The first challenge in data interoperability is to ensure that when we talk about things in two or more settings, we understand whether we are talking about the same or different things. To date, this has been a primary use of semantic technologies, though equivalence distinctions remain problematic [1]. We can now relate information in unstructured, semi-structured and structured formats to a common basis. Ontologies are maturing in their capture of nouns. That portion of data interoperability, as noted above, gets a grade of B-.

But there are two additional factors in play with data interoperability. The first is to ensure we are understanding situations and contexts, what received a grade of C above. The remaining factor is actually relating the values associated with the entities or things at hand. In this regard, our track record to date has been abysmal.

As Kingsley Idehen is wont to explain, the linked data model of the semantic Web can be seen to conform to the EAV (entity-attribute-value) data model. We can do pretty well with entities (E), so long as we agree on what constitutes an entity and can accept some mis-assignments. No one really agrees as to what constitutes an attribute (A) (a true attribute, a property, or something else). And while we all intuitively know what constitutes a value, there is no agreement as to datatypes, units, or ways to relate values in different formats to one another. Though the semantic Web knows how to pump out data using the EAV model, there is actually very little guidance on how we ingest and conform values across sources. Without this factor, there is no data interoperability. The semantic Web may know how to port relational data to a semantic model, but it still does not know how to reconcile values. The ABox, in description logics terms, is barely being tapped [2].
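
A tiny, contrived example shows the problem. The ex: namespace and values below are hypothetical; the point is only that the EAV model happily stores what it cannot itself reconcile:

==============
# Two sources assert a height for the same entity: one as a decimal in
# centimeters, one as a plain string in feet and inches. The names and
# values are made up for illustration.
prefix ex:  <http://example.org/data#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

insert data
{
  ex:JaneDoe ex:heightCm "178.0"^^xsd:decimal .   # from source A
  ex:JaneDoe ex:height   "5 ft 10 in" .           # from source B
}
==============

A naive EAV query (select ?attribute ?value where { ex:JaneDoe ?attribute ?value }) dutifully returns both rows, but recognizing that they express the same quantity in different units, datatypes and formats is left entirely to the application.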

We fortunately have a rich reservoir of past logical, semantic and philosophical writings to draw upon in relation to all of these factors. We also have many formalized measuring systems and crosswalks between many of them. We are also seeing renewed effort surrounding more uniform ways to characterize the essential characteristics of data, namely quantities, units, dimensions and datatypes (QUDT) [3]. Better models for data interoperability and for resolving these areas exist. However, insufficient time and effort has so far been expended to bring these resources together into a logical, computable schema. Until all of these factors are brought together with focus, actual data interoperability based on semantic technologies will remain limited.

Grade: C-

Why Is This Important to AI?

Relation identification and contextual understanding are at the heart of current challenges in artificial intelligence applications related to knowledge and text. Without these perspectives, it is harder to do sentiment analysis, “fact” (or assertion) extraction, reasoning over relations, reasoning over attributes, context analysis, or disambiguation. We need to learn how to speak the King’s English in these matters, and graduate beyond kindergarten readers.

Deep learning, along with both supervised and unsupervised machine learning, is best served when the feature (variable) pool is rich and logically coherent, and when the output targets are accurately defined. “Garbage in, garbage out” applies to artificial intelligence learning in the very same way it applies to any kind of computational activity. We want coherence, clarity and accuracy in our training sets and corpora no less than we want it in our analysis and characterizations of the world.

“Dirty” training bases with embedded errors can yield models that do no better than their inputs. If we want to train our knowledge applications with Dick and Jane reader inputs, too often in error to begin with, we will not get beyond the most basic of knowledge levels. We cannot make the transition to more sophisticated levels without a more sophisticated understanding of the symbolic means for communicating knowledge: that is, human language. Predicate understanding expressed through predicate representations is necessary for predicate logic.

To be sure, progress has been made in the first decade and one-half of the semantic Web. We have learned many best practices and have started to get pretty good in capturing nouns and their types. But what results is a stilted, halting conversation. To begin to become fluent, our knowledge bases must be able to capture and represent verbs, actions and events.

The Anticipated Series

Part II of this series will discuss the ontological role of events, and how that relates to a broader model of actions, activities and situations. This foundation will enable a discussion in Parts III and IV of the actual relations model in KBpedia, and how it is expressed in the KBpedia Knowledge Ontology (KKO). A summary of the KBpedia grammar will be provided in Part V. These next parts will set the context for the coinciding release of KBpedia v 1.50, which incorporates these new representations.

After this release of KBpedia, the series will continue to discuss such topics as what is real and reality, and some speculations as to practical applications arising from the new relations capabilities in KBpedia. Some of the topics to be discussed in concluding parts will be semantic parsers and natural language understanding, robotics as a driving force in expanded knowledge graphs, and best practices for constructing capable ontologies.

Throughout this series I will repeatedly harken to the teachings of Charles Sanders Peirce, and how his insights in logic and sign-making help inform the ontological choices that we are making. We have been formulating our thoughts in this area for years, and Peirce provides important guidance for how to crack some very hard nuts. I hope we can help the evolving state of knowledge graphs grow up a bit, in the process establishing a more complete, coherent and logical basis for constructing knowledge structures useful for advances in artificial intelligence.

This series on KBpedia relations covers topics from background, to grammar, to design, and then to implications of explicitly representing relations in accordance with the principles put forth through the universal categories of Charles Sanders Peirce. Relations are an essential complement to entities and concepts in order to extract the maximum information from knowledge bases. This series accompanies the next release of KBpedia (v 1.50), which includes the relations enhancements discussed.

[1] M. K. Bergman, 2009. “When Linked Data Rules Fail,” AI3:::Adaptive Information blog, November 16, 2009.
[2] As I have previously written: “Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
Posted: April 4, 2017

AI Brings an End to an Era

OpenCyc is no longer available to the public. Without notice and with only some minor statements on Web pages, Cycorp has pulled OpenCyc from the marketplace. It appears this change occurred in March 2017. After 15 years, the abandonment of OpenCyc represents the end of one of the more important open source knowledge graphs of the early semantic Web.

OpenCyc was the first large-scale, open-source knowledge base provided in OWL format. It preceded Wikipedia’s availability in a form usable by the semantic Web, though it never assumed the prominent position that DBpedia did in terms of helping to organize semantic Web content.

OpenCyc was first announced in July 2001, with the first release occurring in 2002. By release 0.9, OpenCyc had grown to include some 47,000 concepts in an OWL distribution. By the time of OpenCyc’s last version 4.0 release in mid-2012, the size of the system had grown to some 239,000 concepts. This last version also included significant links to DBpedia, WordNet and UMBEL, among other external sources, as well as references to about 19,000 places, 26,000 organizations, 13,000 persons, and 28,000 business-related things. Over the course of its lifetime, OpenCyc was downloaded at least 60,000 times, perhaps more than 100,000, and was a common reference in many research papers and other semantic Web projects [1].

At the height of its use, the distribution of OpenCyc included not only the knowledge graph, but also a Java-based inference engine, a browser for the knowledge base, and specifications of the CycL language and the Cyc API for application development.

The company has indicated it may offer a cloud option in the future for research or educational purposes, but the date and plans are unspecified. Cycorp will continue to support its ResearchCyc and EnterpriseCyc versions.

Reasons for the Retirement

Cycorp’s Web site states OpenCyc was discontinued because it had become “fragmented” and was confused by the technical community with the other versions of Cyc. Current verbiage also indicates that OpenCyc was an “experiment” that “proved to be more confusing than it was helpful.” We reached out to senior Cycorp officials for additional clarification as to the reasons for its retirement but have not received a response.

I suspect the reasons for the retirement go deeper than this. As recently as last summer, senior Cycorp officials were claiming a new major release of OpenCyc was “imminent”. There always appeared to be a tension within the company about the use and role of an open source version. Key early advocates for OpenCyc, including John De Oliveira, Stephen Reed and Larry Lefkowitz, are no longer with the company. The Cyc Foundation, established to support the open source initiative, was quietly shut down in 2015. The failure last year of the major AI initiative known as Lucid.ai, a major commercialization push behind Cyc that was reportedly to be backed by “hundreds of millions of dollars” of venture capital that never materialized, also apparently took its toll on company attention and resources.

Whatever the reasons, and there are likely others, it is hard to see how a 15-year effort could be characterized as experimental. While versions of OpenCyc v 4.0 can still be downloaded from third parties, including a fork, it is clear this venerable contributor to the early semantic Web will soon no longer be available, third parties or not.

Impact on Cognonto

OpenCyc is one of the six major core knowledge bases that form the nucleus of Cognonto‘s KBpedia knowledge structure. This linkage to OpenCyc extends back to UMBEL, another of the six core knowledge bases. UMBEL is itself a subset extraction of OpenCyc [2].

As we began the KBpedia effort, it was clear to us that major design decisions within Cyc (all versions) were problematic for our modeling needs [3]. Because of its common-sense nature, Cyc places a major emphasis on the “tangibility” of objects, including “partial tangibility”. We also found (in our view) major modeling issues in how Cyc handles events vs. actions vs. situations. KBpedia’s grounding in the logic and semiosis of Charles Sanders Peirce was at odds with these basic ontological commitments.

I have considered at various times writing one or more articles on the differences we came to see with OpenCyc, but felt it was perhaps snarky to get into these differences, given the different purposes of our systems. We continue to use portions of OpenCyc with important and useful subsumption hierarchies, but have also replaced the entire upper structure with one better reflective of our approach to knowledge-based artificial intelligence (KBAI). We will continue to retain these existing relations.

Thus, fortunately, given our own design decisions from some years back, the retirement of OpenCyc will have no adverse impact on KBpedia. However, UMBEL, as a faithful subset of OpenCyc designed for possible advanced reasoning, may be impacted. We will wait to see what form possible new Cycorp initiatives take before making any decisions regarding UMBEL. Again, however, KBpedia remains unaffected.

Fare Thee Well!

So, it is with sadness and regret that I bid adieu to OpenCyc. It was a noble effort to help jump-start the early semantic Web, and one that perhaps could have had more of an impact had there been greater top-level commitment. But, like many things on the Internet, generations come and go at ever-increasing speed.

OpenCyc certainly helped guide our understanding and basis for our own semantic technology efforts, and for that we will be eternally grateful to the system and its developers and sponsors. Thanks for a good ride!


[1] You can see some of these statistics yourself from the Wayback Machine of the Internet Archive using the URLs of http://www.opencyc.org/, http://www.cyc.com/, https://sourceforge.net/projects/opencyc/ and http://cycfoundation.org.
[2] The intent of UMBEL is to provide a lightweight scaffolding for relating concepts on the Web to one another. About 99% of UMBEL is a direct subset extraction of OpenCyc. This design approach was purposeful to allow systems linked to UMBEL to further reach through to Cyc (OpenCyc, but other versions as well) for advanced reasoning.
[3] I discuss some of these design decisions in M.K. Bergman, 2016. “Threes All the Way Down to Typologies,” blog post on AI3:::Adaptive Information, October 3, 2016.

Posted: March 27, 2017

When we first released Cognonto toward the end of 2016, we provided a starting Web site that had all of the basics, but no frills. In looking at the competition in the artificial intelligence and semantic technology space, we decided a snazzier entry page was warranted. So, we are pleased to announce our new entry page:

Cognonto Entry Page

We also had fun playing around with recent AI programs that generate images based on various input visual styles. We used AI imagery and our own Cognonto logo as inputs to generate some of these.

As I said to a colleague, maybe it was time for us to try to “run with the cool kids.” We hope you like it. We made some other site tweaks as well along the way to releasing this new entry page.

Let me know if you have any comments (good or bad) on this site re-design. Meanwhile, it’s time to get back to the substance . . . .

Posted: March 15, 2017

Dialing In Queries from the General to the Specific

Inferencing is a common term heard in association with semantic technologies, but one that is rarely defined and still less frequently described as to value and rationale. I try to redress this gap in part with this article.

Inferencing is the drawing of new facts, probabilities or conclusions based on reasoning over existing evidence. Charles Sanders Peirce classed inferencing into three modes: deductive reasoning, inductive reasoning and abductive reasoning. Deductive reasoning extends from premises known to be true and clear to infer new facts. Inductive reasoning looks at the preponderance of evidence to infer what is probably true. And abductive reasoning poses possible explanations or hypotheses based on available evidence, often winnowing through the possibilities based on the total weight of evidence at hand or what is the simplest explanation. Though all three reasoning modes may be applied to knowledge graphs, the standard and most used form is deductive reasoning.

An inference engine may be applied to a knowledge graph and its knowledge bases in order to deduce new knowledge. Inference engines apply either backward- or forward-chaining deductive reasoning. In backward chaining, the reasoning tests are conducted “backwards” from a current consequent or “fact” to determine what antecedents can support that conclusion, based on the rules used to construct the graph. (“What reasons bring us to this fact?”) In forward chaining the opposite occurs; namely, reasoning begins with the existing facts and the rules are applied repeatedly to derive new consequents. (“Given these facts, what else follows?”) The process iterates until no new facts result or a stated goal is reached; along the way, new knowledge in terms of heretofore unstated connections may be added to the knowledge base.

Inference engines can be applied at the time of graph building or extension to test the consistency and logic of the new additions. Or, semantic reasoners may be applied to a current graph in order to expand queries for semantic search or for these other reasoning purposes. In the case of Cognonto‘s KBpedia knowledge structure, which is written in OWL 2, the groundings are in first-order logic (FOL) and description logics, though the terminology differs slightly. These logical foundations provide the standard rules by which reasoners can be applied to the knowledge graph [1]. In this article, we will not be looking at how inferencing is applied during graph construction, a deserving topic in its own right. Rather, we will be looking at how inferencing may be applied to the existing graph.

Use of Reasoning at Run Time

Once a completed graph passes its logic tests during construction, perhaps importantly after being expanded for the given domain coverage, its principal use is as a read-only knowledge structure for making subset selections or querying. The standard SPARQL query language, occasionally supplemented by rule-based queries using SWRL or by bulk actions using the OWL API, is the means by which we access the knowledge graph in real time. In many instances, such as for the KBpedia knowledge graph, these are patterned queries. In such instances, we substitute variables passed from the HTML interface into query templates.
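
As a rough sketch of what such a patterned query might look like (this is not the actual Cognonto template), the application substitutes the concept IRI passed in from the HTML page for a placeholder token before sending the query to the endpoint:

==============
# A patterned query sketch; __CONCEPT_URI__ is a placeholder the
# application replaces with the requested concept IRI at run time.
select ?property ?value
from <http://kbpedia.org/1.40/>
where
{
  <__CONCEPT_URI__> ?property ?value .
}
==============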

When doing machine learning, we generally retrieve slices via query and then stage them for the learner. A similar approach is taken to generate entity lists for things like training recognizers and taggers. Some of these actions may also involve graph traversals in order to retrieve the applicable subset.
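
A sketch of the kind of slice query that might feed a tagger or recognizer appears below. The choice of the rc/Ontology reference concept is purely illustrative; any type in the graph could anchor the slice:

==============
# Pull preferred and alternative labels for everything at or under a
# given reference concept, e.g., to seed an entity list for a tagger.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select distinct ?concept ?label
from <http://kbpedia.org/1.40/>
where
{
  ?concept rdfs:subClassOf* <http://kbpedia.org/kko/rc/Ontology> ;
           skos:prefLabel|skos:altLabel ?label .
}
==============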

However, the main real-time use of the knowledge structure is search, which relies totally on SPARQL. We discuss some options for how this is controlled below.

Hyponymy, Subsumption and Natural Classes

Select Your Spray

The principal basis for reasoning in the knowledge graph is hierarchical, hyponymous relations and instance types. These establish the parent-child lineages, and enable individuals (or instances, which might be entities or events) to be related to their natural kinds, or types. Entities belong to types that share certain defining essences and shared descriptive attributes.

For inferencing to be effective, it is important to try to classify entities into the most natural kinds possible. I have spoken elsewhere about this topic [2]; clean classing into appropriate types is one way to ensure the benefits from related search and related querying are realized. Types may also have parental types in a hyponymous relation. This ‘accordion-like’ design is an important aspect that enables external schema to be tied into multiple points in KBpedia [3].

Disjointness assertions, where two classes are asserted to be logically distinct, and other relatedness options provide other powerful bases for winnowing potential candidates and testing placements and assignments. Each of these factors may also be used in SPARQL queries.
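
A sketch of how such a disjointness test might be posed follows. The two class IRIs are placeholders for whatever candidate and reference classes are being compared; whether the ASK returns true depends, of course, on the disjointness assertions actually present in the graph:

==============
# Test whether any ancestor of ClassA is asserted disjoint with any
# ancestor of ClassB; a true result is a common reason to reject a
# candidate placement. The two placeholder IRIs must be substituted.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>

ask
from <http://kbpedia.org/1.40/>
where
{
  <http://example.org/placeholder/ClassA> rdfs:subClassOf* ?a .
  <http://example.org/placeholder/ClassB> rdfs:subClassOf* ?b .
  { ?a owl:disjointWith ?b . } union { ?b owl:disjointWith ?a . }
}
==============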

These constructs of semantic Web standards, combined with a properly constructed knowledge graph and the use of synonymous and related vocabularies in semsets, as described in a previous use case, provide powerful mechanisms for querying a knowledge base. By using these techniques, we may dial in or broaden our queries, much in the same way that we choose different types of sprays for our garden watering hose. We can focus our queries to the particular need at hand. We explain some of these techniques in the next sections.

Adjusting Query Focus

We can see a crude application of this control when browsing the KBpedia knowledge graph. When we enter a particular query, in this case ‘knowledge graph’, one result entry is for the concept of ontology in information science. We see that a direct query gives us a single answer:

Direct Query

However, by picking the inferred option, we now see a listing of some 83 super classes for our ontology concept:

Inferring Querying

By invoking deductive inference, we are actually broadening our query to include all of the parental links in the subsumption chain within the graph. Ultimately, this inference chain traces upward to the highest-order concept in the graph, namely owl:Thing. (By convention, owl:Thing itself is excluded from these inferred results.)

By invoking inference in this case, while we have indeed broadened the query, it also is quite indiscriminate. We are reaching all of the ancestors to our subject concept, reaching all of the way to the root of the graph. This broadening is perhaps more than what we actually seek.
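
For comparison, this fully inferred broadening corresponds to an unbounded property path over subClassOf. The query below is a sketch for reference (the online demo generates its own queries, and a reasoner may add further results via equivalences):

==============
# Retrieve every ancestor of the concept by walking subClassOf without
# a bound, which approximates what the Inferred tab displays.
select distinct ?ancestor
from <http://kbpedia.org/1.40/>
where
{
  <http://kbpedia.org/kko/rc/OntologyInformationScience>
    <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?ancestor .
}
==============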

Scoping Queries via Property Paths

Among many other options, SPARQL also gives us the ability to query specific property paths [4]. We can invoke these options either in our query templates or programmatically in order to control the breadth and depth of our desired query results.

Let’s first begin with a SPARQL query that uses ‘knowledge graph’ as an altLabel:

==============
select ?s ?p ?o
from <http://kbpedia.org/1.40/>
where
{
  ?s <http://www.w3.org/2004/02/skos/core#altLabel> "Knowledge graph"@en ;
     ?p ?o .
}
==============

You can see from the results below that only the concept of ontology (information science) is returned as a prefLabel result, with the concept’s other altLabels also shown:

      ==============
      s     p     o

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      http://www.w3.org/2002/07/owl#Class
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#isDefinedBy
      http://kbpedia.org/kko/rc/
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#subClassOf
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#subClassOf
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#subClassOf
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#prefLabel "Ontology (information science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontological distinction (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontological distinction(computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology Language"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology media"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontologies"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "New media relations"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Strong ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontologies (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology library (information science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology Libraries (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontologing"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Computational ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology library (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Populated ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Knowledge graph"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Domain ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#definition    
      "In computer science and information science, an ontology is a formal
      naming and definition of the types, properties, and interrelationships of
      the entities that really or fundamentally exist for a particular domain
      of discourse."@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://kbpedia.org/ontologies/kko#superClassOf
      http://wikipedia.org/wiki/Ontology_(information_science)
      ==============

This result gives us the basis for now asking for the direct parents of our ontology concept, using this query:

      ==============
      select ?directParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        ?directParent .
      }
      ==============

We see that the general concepts of knowledge representation-CW and ontology are parents to our concept, as well as the external Wikipedia result on ontology (information science):

      ==============
      directParent

      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/Ontology
      http://wikipedia.org/wiki/Ontology_(information_science)

      ==============

If we turn on the inferred option, we will get the full listing of the 83 concepts noted earlier. This is way too general for our current needs.

While it is not possible to directly specify an inference depth using SPARQL, it is possible to use property paths to control the extent of the query results from the source. (The bounded {,n} quantifier forms used below appeared in SPARQL 1.1 working drafts and are supported by some SPARQL engines, though they did not make it into the final Recommendation.) In this case, we specify a path length of 1:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,1}
        ?inferredParent .
      }
      ==============

This produces results equivalent to the “direct” search (namely, direct parents only):

      ==============
      inferredParent

      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/Ontology
      http://wikipedia.org/wiki/Ontology_(information_science)
      ==============

However, by expanding our path length to two, we now can request the parents and grandparents for the ontology (information science) concept:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
          <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,2}
        ?inferredParent .
      }
      =============

This now gives us 15 results from the parental chain:

      ==============
      inferredParent

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://umbel.org/umbel/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/PropositionalConceptualWork
      http://wikipedia.org/wiki/Knowledge_representation
      http://sw.opencyc.org/concept/Mx4r4e_7xpGBQdmREI4QPyn0Gw
      http://umbel.org/umbel/rc/Ontology
      http://kbpedia.org/kko/rc/StructuredInformationSource
      http://kbpedia.org/kko/rc/ClassificationSystem
      http://wikipedia.org/wiki/Ontology
      http://sw.opencyc.org/concept/Mx4rv7D_EBSHQdiLMuoH7dC2KQ
      http://kbpedia.org/kko/rc/Technology-Artifact
      http://www.wikidata.org/entity/Q324254

      ==============

Similarly, we can expand our query request to a path length of 3, which gives us the parental chain of parents + grandparents + great-grandparents:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
          <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,3}
        ?inferredParent .
      }
      =============

In this particular case, we do not add any further results for great-grandparents:

      ==============

      inferredParent

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://umbel.org/umbel/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/PropositionalConceptualWork
      http://wikipedia.org/wiki/Knowledge_representation
      http://sw.opencyc.org/concept/Mx4r4e_7xpGBQdmREI4QPyn0Gw
      http://umbel.org/umbel/rc/Ontology
      http://kbpedia.org/kko/rc/StructuredInformationSource
      http://kbpedia.org/kko/rc/ClassificationSystem
      http://wikipedia.org/wiki/Ontology
      http://sw.opencyc.org/concept/Mx4rv7D_EBSHQdiLMuoH7dC2KQ
      http://kbpedia.org/kko/rc/Technology-Artifact
      http://www.wikidata.org/entity/Q324254
      ==============

Without a property path specification, our inferred request would produce the listing of 83 results shown by the Inferred tab on the KBpedia knowledge graph, as shown in the screen capture provided earlier.

The online knowledge graph does not use these property path restrictions in its standard query templates. But these examples show how it is possible, programmatically, to broaden or narrow our searches of the graph, depending on the relation chosen (subClassOf in this example) and the length of the specified property path.

Many More Options and Potential for Control

This use case is but a small example of the ways in which SPARQL may be used to dial-in or control the scope of queries posed to the knowledge graph. Besides all of the standard query options provided by the SPARQL standard, we may also remove duplicates, identify negated items, and search inverses, selected named graphs or selected graph patterns.
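
A contrived sketch that combines several of these devices in a single query is shown below; the filter on external Wikipedia mappings is invented purely to illustrate negation and is not a recommended modeling test:

==============
# DISTINCT removes duplicates; the inverse path (^subClassOf) walks
# downward to children rather than upward to parents; and FILTER NOT
# EXISTS drops any child that also maps (via subClassOf) to a Wikipedia
# page. The last condition is contrived, for illustration only.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?child
from <http://kbpedia.org/1.40/>
where
{
  <http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW> ^rdfs:subClassOf ?child .
  filter not exists
  {
    ?child rdfs:subClassOf ?mapped .
    filter( contains( str(?mapped), "wikipedia.org" ) )
  }
}
==============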

Beyond SPARQL, and now using SWRL, we may also apply abductive reasoning and hypothesis generation to our graphs, as well as mimic the action of expert systems in AI through if-then rule constructs based on any structure within the knowledge graph. A nice tutorial with examples that helps highlight some of the possibilities in combining OWL 2 with SWRL is provided by [5].

A key use of inference is its ability to be applied to natural language understanding and the extension of our data systems to include unstructured text, as well as structured data. For this potential to be fully realized, it is important that we chunk (“parse”) our natural language using primitives that themselves are built upon logical foundations. Charles S. Peirce made many contributions in this area as well. Semantic grammars that tie directly into logic tests and reasoning would be a powerful addition to our standard semantic technologies. Revisions to the approach taken to Montague grammars may be one way to achieve this elusive aim. This is a topic we will likely return to in the months to come.

Finally, of course, inference is a critical method for testing the logic and consistency of our knowledge graphs as we add new concepts, make new relations or connections, or add attribute data to our instances. All of these changes need to be tested for consistency moving forward. Nurturing graphs by testing added concepts, entities and connections is an essential prerequisite to leveraging inferencing at run time as well.

This article is part of an occasional series describing non-machine learning use cases and applications for Cognonto’s KBpedia knowledge graph. Most center around the general use and benefits of knowledge graphs, but best practices and other applications are also discussed. Prior machine learning use cases, and the ones from this series, may be found on the Cognonto Web site under the Use Cases main menu item.

[1] See, for example, Markus Krötzsch, Frantisek Simancik, and Ian Horrocks, 2012. “A Description Logic Primer,” arXiv preprint, arXiv:1201.4089; and Franz Baader, 2009. “Description Logics,” in Sergio Tessaris, Enrico Franconi, Thomas Eiter, Claudio Gutierrez, Siegfried Handschuh, Marie-Christine Rousset, and Renate A. Schmidt, editors, Reasoning Web: Semantic Technologies for Information Systems – 5th International Summer School, 2009, volume 5689 of LNCS, pages 1–39. Springer, 2009.
[2] M.K. Bergman, 2015. “‘Natural Classes’ in the Knowledge Web,” in AI3:::Adaptive Information blog, July 13, 2015.
[3] M.K. Bergman, 2016. “Rationales for Typology Designs in Knowledge Bases,” in AI3:::Adaptive Information blog, June 6, 2016.
[4] Steve Harris and Andy Seaborne, eds., 2013. SPARQL 1.1 Query Language, World Wide Web Consortium (W3C) Recommendation, 21 March 2013; see especially Section 9 on property paths.
[5] Martin Kuba, 2012. “Owl 2 and SWRL Tutorial,” from Kuba’s Web site. 

Posted: March 2, 2017

altLabels Help Reinforce ‘Things not Strings’

One of the strongest contributions that semantic technologies make to knowledge-based artificial intelligence (KBAI) is to focus on what things mean, as opposed to how they are labeled. This focus on underlying meaning is captured by the phrase, “things not strings.”

The idea of something — that is, its meaning — is conveyed by how we define that something, the context in which the various tokens (terms) for that something are used, and the variety of terms or labels we apply to that thing. The label alone is not enough. The idea of a parrot is conveyed by our understanding of what the name parrot means. Yet, in languages other than English, the same idea of parrot may be conveyed by the terms Papagei, perroquet, loro, попугай, or オウム, depending on the native language.

The idea of the ‘United States’, even just in English, may be conveyed with labels ranging from America, to US, USA, U.S.A., Amerika, Uncle Sam, or even the Great Satan. As another example, the simple token ‘bank’ can mean a financial institution, a side of a river, turning an airplane, tending a fire, or a pool shot, depending on context. What these examples illustrate is that a single term is often not the only way to refer to something, and that a given token may mean vastly different things depending on the context in which it is used.

Knowledge graphs, likewise, are composed not of labels, but of concepts, entities and the relationships between those things. Knowledge graphs constructed from single labels for individual nodes and single labels for individual relations are therefore unable to capture these nuances of context and varieties of reference. In order for a knowledge graph to be useful to a range of actors, it must reflect the languages and labels meaningful to those actors. And in order to distinguish the accurate references of individual terms, we need each of the multiple senses of a term to be associated with its related concepts, and then to use the graph relationships for those concepts to help disambiguate the intended meaning of the term based on its context of use.

In the lexical database WordNet, the variety of terms by which a given concept might be known is collectively called a synset. According to WordNet, a synset (short for synonym set) is “defined as a set of one or more synonyms that are interchangeable in some context without changing the truth value of the proposition in which they are embedded.” In Cognonto‘s view, the concept of a synset is helpful, but still does not go far enough. Any name or label that draws attention to a given thing can provide the same referential power as a synonym. We can include in this category abbreviations, acronyms, argot, diminutives, epithets, idioms, jargon, lingo, misspellings, nicknames, pseudonyms, redirects, and slang, as well as, of course, synonyms. Collectively, we call all of the terms that may refer to a given concept or entity a semset. In all cases, these terms are mere pointers to the actual something at hand.

In the KBpedia knowledge graph, these terms are defined either as skos:prefLabel (the preferred term), skos:altLabel (all other semset variants) or skos:hiddenLabel (misspellings). In this article, we show an example of semsets in use, discuss what we have done specifically in KBpedia to accommodate semsets, and summarize with some best practices for semsets in knowledge graph construction.
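
In simplified form, a semset as carried by a single concept looks something like the following sketch. The concept and labels are drawn from the KBpedia example discussed below; the hiddenLabel is a made-up misspelling for illustration:

==============
# A concept carrying its semset via the three SKOS labeling properties.
# Labels shown are a small subset; "Ontolgy" is an invented misspelling.
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix rc:   <http://kbpedia.org/kko/rc/>

insert data
{
  rc:OntologyInformationScience
      skos:prefLabel   "Ontology (information science)"@en ;
      skos:altLabel    "Knowledge graph"@en ,
                       "Computational ontology"@en ,
                       "Domain ontology"@en ;
      skos:hiddenLabel "Ontolgy"@en .
}
==============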

A KBpedia Example

Let’s see how this concept of semset works in KBpedia. Though in practice much of what is done with KBpedia is done via SPARQL queries or programmatically, here we will simply use the online KBpedia demo. Our example Web page references our recent announcement of KBpedia v. 1.40. We begin by entering the URL for this announcement into the Analyze Web Page demo box on the Cognonto main page:

Running the Cognonto Demo

After some quick messages on the screen telling us how the Web page is being processed, we receive the first results page for the analysis of our KBpedia version announcement. The tab we are looking at here highlights the matching concepts we have found, with the most prevalent shown in red-orange. Note that one of those main concepts is a ‘knowledge graph’:

Concept Tagging in the Web Page

If we mouseover this ‘knowledge graph’ tag we see a little popup window that shows us what KBpedia concepts the string token matches. In this case, there is only one concept match, that of OntologyInformationScience; the term ‘knowledge graph’ itself is not a listed match (other highlighted terms may present multiple possible matches):

Tag Links to KBpedia

When we click on the live link to OntologyInformationScience we are then taken to that concept’s individual entry within the KBpedia knowledge structure:

'Knowledge Graph' in the Semset

Other use cases describe more fully how to browse and navigate KBpedia or how to search it. To summarize here, what we are looking at above is the standard concept header that presents the preferred label for the concept, followed by its URI and then its alternative labels (semset). Note that ‘knowledge graph’ is one of the alternative terms for OntologyInformationScience.

We can also confirm that only one concept is associated with the ‘knowledge graph’ term by searching for it. Note, as well, that we have the chance to also search individual fields such as the title (preferred label), alternative labels (semset), URI or definitions of the concept:

Searching altLabels Alone

What this example shows is that a term, ‘knowledge graph’, in our original Web page, while not having a concept dedicated to that specific label, has a corresponding concept of OntologyInformationScience as provided through its semset of altLabels. Semsets provide us an additional term pool by which we can refer to a given concept or entity.
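
In query terms, the semset means a search need not care which label field happens to carry a term. The sketch below uses a property alternative to match against the preferred, alternative and hidden labels at once (the graph IRI follows the convention of our earlier inferencing examples):

==============
# Find any concept whose semset contains the term, regardless of which
# SKOS label property carries it.
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select distinct ?concept
from <http://kbpedia.org/1.40/>
where
{
  ?concept skos:prefLabel|skos:altLabel|skos:hiddenLabel "Knowledge graph"@en .
}
==============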

Implementing Semsets in KBpedia

For years now, given that our knowledge graphs are grounded in semantic technologies, we have emphasized the generous use of altLabels to provide more contextual bases for how we can identify and get to appropriate concepts. In our prior public release of KBpedia (namely, v 1.20), we had more than 100,000 altLabels for the approximately 39,000 reference concepts in the system (an average of 2.6 altLabels per concept).

As part of the reciprocal mapping effort we undertook in moving from version 1.20 to version 1.40 of KBpedia, as we describe in its own use case, we also made a concerted effort to mine the alternative labels within Wikipedia. (The reciprocal mapping effort, you may recall, involved adding missing nodes and structure from Wikipedia that did not have corresponding tie-in points in the existing KBpedia.) Through these efforts, we were able to add more than a quarter million new alternative terms to KBpedia. Now, the average number of altLabels per reference concept exceeds 6.6, representing a 2.5X increase over the prior version, even while we were also increasing concept coverage by nearly 40%. This is the kind of effort that enables us to match a term such as ‘knowledge graph’.

Best Practices for Semsets in Knowledge Graphs

At least three best-practice implications arise from these semset efforts:

  • First, it is extremely important when mapping new knowledge sources into an existing target knowledge graph to also harvest and map semset candidates. Sources like Wikipedia are rich repositories of semsets;
  • Second, like all textual additions to a knowledge graph, it is important to include the language tag for the specific language being used. In the case of KBpedia, the baseline language is English (en). By tagging text fields with their appropriate language tag, it is possible to readily swap out labels in one language for another (see the sketch after this list). This design approach leads to a better ability to represent the conceptual and entity nature of the knowledge graph in multiple natural languages; and,
  • Third, in actual enterprise implementations, it is important to design and include workflow steps that enable subject matter experts to add new altLabel entries as encountered. Keeping semsets robust and up-to-date is an essential means for knowledge graphs to fulfill their purpose as knowledge structures that represent “things not strings.”
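
The following sketch illustrates the language-tag point from the second item above. Because every label carries a tag, the application simply asks for the language it needs; swapping "en" for another tag changes the display language, and concepts lacking a label in that language return no row:

==============
# Retrieve preferred labels in a chosen language; change "en" to swap
# the display language of the knowledge graph.
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select ?concept ?label
from <http://kbpedia.org/1.40/>
where
{
  ?concept skos:prefLabel ?label .
  filter( langMatches( lang(?label), "en" ) )
}
==============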

Semsets, along with other semantic technology techniques, are essential tools in constructing meaningful knowledge graphs.

This article is part of an occasional series describing non-machine learning use cases and applications for Cognonto’s KBpedia knowledge graph. Most center around the general use and benefits of knowledge graphs, but best practices and other applications are also discussed. Prior machine learning use cases, and the ones from this series, may be found on the Cognonto Web site under the Use Cases main menu item.
