This article is the first installment in a new, occasional series describing non-machine learning use cases and applications for Cognonto’s KBpedia knowledge graph. Most of these articles will center around the general use and benefits of knowledge graphs, but best practices and other applications will also be discussed. Prior to this, our use cases have centered on machine learning and knowledge-based artificial intelligence (KBAI). These prior use cases, plus the ones from this series, may be found on the Cognonto Web site under the Use Cases main menu item.
This kick-off article deals with browsing the KBpedia knowledge structure, found under the Knowledge Graph main menu link on the Cognonto Web site. KBpedia combines six public knowledge bases — Wikipedia, Wikidata, GeoNames, OpenCyc, DBpedia and UMBEL — and their concepts, entity types, attributes and relations. (Another 20 general vocabularies are also mapped into the KBpedia structure.) KBpedia is organized as a knowledge graph. This article describes the various components of the graph and how to browse and inspect them. Since client knowledge graphs also bridge off of the initial KBpedia structure, these same capabilities apply to client versions as well.
Uses of the Knowledge Graph
The uses for browsing a knowledge graph include:
- Learning about individual concepts and entities
- Discovering related concepts and entities
- Understanding the structure and typologies of the knowledge graph
- Tracing conceptual lineages
- Exploring inferences based on the logical assertions in the graph
- General grazing and discovery, among many other uses
- Parallel access to structured and semantic search.
These uses, of course, do not include the work-related tasks in natural language processing or knowledge-based artificial intelligence.
KBpedia and the Graph
This combined KBpedia knowledge structure contains more than 39,000 reference concepts (RCs), organized into a knowledge graph as defined by the KBpedia Knowledge Ontology (KKO). KKO is a logically organized and computable structure that supports inference and reasoning.
About 85% of the RCs are themselves entity types — that is, 33,000 natural classes of similar entities such as astronauts or zoo animals — that are organized into about 30 “core” typologies that are mostly disjoint (non-overlapping) with one another. By definition an entity type is also a ‘reference concept’, or RC.
KBpedia’s typologies provide a powerful means for slicing-and-dicing the knowledge structure. The individual entity types provide the tie-in points to about 20 million individual entities. The remaining RCs are devoted to other logical divisions of the knowledge graph, specifically attributes, relations and topics.
It is this structure, often supplemented by connections to another 20 leading external vocabularies, that forms the basis of the KBpedia Knowledge Graph.
For the standard Cognonto browser, each RC has a record with potentially eight (8) main panels or sections, each of which is described below:
- Header
- Core Structure
- Extended Linkages
- Entities
- Aspect-related Entities
- Typologies
- Broader Concepts
- Narrower Concepts.
Panels are only displayed when there are results for them.
Each entry begins with a header:
Above the header to the left is the listing for the current KBpedia version and its date of release. Next to it is a link for sending an email to a graph administrator should there be a problem with the current entry. Above the header to the right is the search box, itself the topic of another application case.
The Header consists of these possible entries:
- prefLabel, URI, image: the prefLabel is the name or “title” for the RC. While the name has no significant meaning in and of itself (the meaning of the RC results from all of its specifications and definitions, including relations to other objects), the prefLabel does provide a useful shorthand or handle for referring to the concept. The URI is the full Web reference to the concept, such as http://kbpedia.org/kko/rc/Currency. If there is an image for the RC, it is also displayed
- semset: the entries here, also known as altLabels, are meant to inclusively capture all roughly equivalent references to the concept, including synonyms, slang, jargon and acronyms
- definition: the readable text definition or description of the RC; some live links may be found in the definition.
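To make the header fields concrete, here is a minimal Python sketch of how such a record might be modeled. The class name `RCHeader`, its field names, the `matches` helper, and the sample semset and definition values are all assumptions made for illustration. Only the prefLabel, semset, and definition concepts and the Currency URI come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RCHeader:
    """Hypothetical container for the header fields of a reference concept."""
    pref_label: str                                   # name or "title" of the RC
    uri: str                                          # full Web reference to the RC
    semset: List[str] = field(default_factory=list)   # altLabels: synonyms, jargon, acronyms
    definition: str = ""                              # readable text description
    image_url: Optional[str] = None                   # shown only when an image exists

    def matches(self, term: str) -> bool:
        """Return True if term equals the prefLabel or any semset entry (case-insensitive)."""
        needle = term.strip().lower()
        labels = {self.pref_label.lower()} | {s.lower() for s in self.semset}
        return needle in labels


# Sample values below are illustrative, not taken from KBpedia itself.
currency = RCHeader(
    pref_label="Currency",
    uri="http://kbpedia.org/kko/rc/Currency",
    semset=["money", "legal tender"],
    definition="Placeholder definition text for illustration.",
)
```

Treating the semset as a set of alternative handles is one reason a lookup such as `currency.matches("Money")` can succeed even though the prefLabel is "Currency".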
The Core Structure for KBpedia is the next panel. Two characteristics define what is a core contributor to the KBpedia structure: 1) the scale and completeness of the source; and 2) its contribution of a large number of RCs to the overall KKO knowledge graph. The KBs in the core structure play a central role in the scope and definition of KBpedia. This core structure of KBpedia is supplemented by mappings to about 20 additional external linkages, which are highly useful for interoperability purposes, but do not themselves contribute as much to the RC scope of the KKO graph. The Core Structure is derived from the six (6) main knowledge bases — OpenCyc, UMBEL, GeoNames, DBpedia, Wikipedia and Wikidata.
The conceptual relationships in the KBpedia Knowledge Ontology (KKO) are largely drawn from OpenCyc, UMBEL, or Wikipedia, though any of the other sources may contribute local knowledge graph structure. Additional reference concepts are contributed primarily from GeoNames. Wikidata contributes the bulk of the instance data, though instance records are actually drawn from all sources. DBpedia and Wikidata are also the primary sources for attribute characterizations of the instances. Instance data, by definition, are not part of the core structure.
Here is the Core Structure panel:
The Core Structure panel, like the other panels, has a panel title followed by a brief description. The Core Structure panel lists the equivalent class (owl:equivalentClass), parent super classes (kko:superClassOf), child sub classes (rdfs:subClassOf), or a closely related concept (kko:isCloselyRelated) (not shown). These relationships define the edges between the nodes in the graph structure, and are also the basis for logical inferencing.
Sub-classes and super-classes may be shown either as direct assertions or as those inferred from parent-child relationships in the Knowledge Graph. A direct relationship is the immediate parent or child; an inferred relationship includes any ancestor or descendant. Picking one of these links restricts the display to the concepts related to that category. As with familial relationships, the closer a concept is to its lineage relation, the more likely the two share attributes or characteristics. Such lineage inferences arise from the relations in the KBpedia Knowledge Ontology (KKO).
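The difference between direct and inferred relationships can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept names and the `DIRECT_PARENTS` table are hypothetical, and the traversal is a generic transitive walk, not necessarily how the Cognonto browser computes its listings.

```python
# Hypothetical direct assertions: each concept maps to its immediate parents.
DIRECT_PARENTS = {
    "Euro": {"Currency"},
    "Currency": {"MediumOfExchange"},
    "MediumOfExchange": {"EconomicObject"},
}


def direct_parents(concept):
    """Immediate parents only (the 'direct' option)."""
    return set(DIRECT_PARENTS.get(concept, ()))


def inferred_parents(concept):
    """All ancestors reachable by following parent links (the 'inferred' option)."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in DIRECT_PARENTS.get(node, ()):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                stack.append(parent)
    return seen
```

In this toy graph, `direct_parents("Euro")` returns only `Currency`, while `inferred_parents("Euro")` also includes the grandparent and great-grandparent concepts.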
Each of the related concepts is presented as a live link, which if clicked, will take you to a new entry for that concept. Some of the icons and information for equivalent classes are discussed under other panels below.
In addition to the Core Structure, KBpedia RCs are linked to thousands of classes defined in nearly 20 external ontologies used to describe all kinds of public and private datasets. Some of the prominent external vocabularies include schema.org, the major structured data system for search engines, and Dublin Core, a key vocabulary from the library community. Other external vocabularies cover music, organizations, projects, social media, and the like.
Here is how the Extended Linkages panel looks, which has many parallels to the Core Structure panel:
The external links, like the core ones, are shown as live links with an icon associated to each source. For RCs that are entity types, the entry might also display the count of entities (orange background with count) or related-aspect entities (blue background with count) linked to that RC (either directly or inferred, depending on the option chosen). Clicking on the specific RC link will take you to that reference concept. Clicking on the highlighted background will take you to a listing of the entities for that RC (based on either its direct or inferred option).
Also, like the short descriptions on each of these panels, clicking the more link expands the description available:
Entities are distinct, nameable, individual things. There are more than 20 million of them in the baseline KBpedia.
Entities may be physical objects or conceptual works or discrete ideas, so long as they may be characterized by attributes shared by other instances within similar kinds or types. Entities may be parts of other things, so long as they have a distinct identity and character. Entities with shared attributes that are the essences of the things may be grouped into natural types, called entity types. These entity types may be further related to other entity types in natural groupings or hierarchies depending on the attributes and their essences that are shared among them.
Here is how the general Entities panel appears:
In this case for currency, there are 2,003 instances (individual entities) in the current KBpedia knowledge base. The first few of these are shown in the panel, with the live links then taking you to an entity report for that instance. Similarly, you can click the Browse all entities button, which then allows you to scroll through the entire listing of entities. Here is how that subsidiary page, in part, appears:
Nearly 85%, or 33,000, of the reference concepts within the KBpedia Knowledge Ontology (KKO) are entity types, the natural classes of entities. They are key leverage points for interoperability and mapping. Instances (or entities) are related to the KKO graph via the rdf:type predicate, which assigns an entity to one or more parental classes. It is through this link that you view the individual entities.
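As a rough illustration of how typing links connect entities to the graph, the sketch below stores a handful of triples as plain Python tuples and lists the entities for a class, either directly or with sub-classes included. It uses the standard RDF typing predicate `rdf:type`. The `ex:` and `kko:` prefixed names, the sample triples, and both function names are assumptions for this example, not actual KBpedia data or API calls.

```python
# Hypothetical triples (subject, predicate, object); not real KBpedia content.
TRIPLES = [
    ("ex:Euro", "rdf:type", "kko:Currency"),
    ("ex:Yen", "rdf:type", "kko:Currency"),
    ("ex:Bitcoin", "rdf:type", "kko:DigitalCurrency"),
    ("kko:DigitalCurrency", "rdfs:subClassOf", "kko:Currency"),
]


def subclasses_of(cls, triples):
    """Return cls plus every class that is transitively a sub-class of it."""
    result = {cls}
    changed = True
    while changed:  # repeat until no new sub-class is found
        changed = False
        for subj, pred, obj in triples:
            if pred == "rdfs:subClassOf" and obj in result and subj not in result:
                result.add(subj)
                changed = True
    return result


def entities_of(cls, triples, inferred=False):
    """List entities typed to cls; with inferred=True, include its sub-classes too."""
    classes = subclasses_of(cls, triples) if inferred else {cls}
    return sorted(s for s, p, o in triples if p == "rdf:type" and o in classes)
```

With these sample triples, the direct listing for `kko:Currency` holds two entities, and the inferred listing adds the entity typed to the sub-class. This mirrors the direct and inferred entity counts shown in the panels.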
Entities may also be characterized according to one or more of about 80 aspects. Aspects help to group related entities by situation, rather than by identity or definition. Aspects thus provide a secondary means for organizing entities independent of their nature, but helpful for placing the entity in real-world contexts. Not all aspects relate to a given entity.
The Aspects panel has a similar presentation to the other panels:
If an entity with a related aspect occurs in the knowledge system, its aspect label will be shown, followed by a listing of the top entities for that aspect. Each of these entities is clickable, which will take you to the standard entity record. A Browse all entities button means there are more entities for that aspect than the short listing will allow; click on it to paginate through the full listing of related entities.
Note, as well, on this panel that we are also highlighting the down arrow at the upper right of the panel. Clicking that causes the entire panel to collapse, leaving only the title. Clicking on the arrow again causes the panel to expand. This convention applies to all of the panels discussed here.
About 85% of all of the reference concepts (RCs) in KBpedia represent classes of entities, which themselves are organized into about 30 core typologies. Most of these typologies are disjoint (lack overlap) from one another, which provides an efficient mechanism for testing subsets and filtering entities into smaller groups for computational purposes. (Another 30 or so SuperTypes provide extended organization of these entities.)
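To see why disjointness helps with filtering, consider the following minimal Python sketch. Everything in it is hypothetical: the typology names, their members, and the choice of which pairs are declared disjoint are invented for illustration and do not reproduce the actual KBpedia typologies.

```python
from itertools import combinations

# Hypothetical typologies and their member entity types (illustration only).
TYPOLOGIES = {
    "Animals": {"Zebra", "Lion"},
    "Plants": {"Oak"},
    "Currencies": {"Euro", "Yen"},
    "LivingThings": {"Zebra", "Lion", "Oak"},  # upper typology that overlaps its children
}

# Assumed disjointness assertions: the three core typologies are pairwise disjoint,
# and Currencies is also disjoint with the upper LivingThings typology.
CORE = ["Animals", "Plants", "Currencies"]
DISJOINT = {frozenset(pair) for pair in combinations(CORE, 2)}
DISJOINT.add(frozenset({"Currencies", "LivingThings"}))


def are_disjoint(a, b):
    """True if the two typologies are asserted to share no members."""
    return frozenset((a, b)) in DISJOINT


def typologies_to_search(known):
    """Typologies that may still hold related types once `known` is established."""
    return sorted(t for t in TYPOLOGIES if t == known or not are_disjoint(known, t))


def candidate_types(known):
    """Pool of entity types left after skipping every typology disjoint with `known`."""
    pool = set()
    for name in typologies_to_search(known):
        pool |= TYPOLOGIES[name]
    return pool
```

Once something is known to belong to `Currencies`, every typology disjoint with it can be skipped without inspecting its members. That shrinks the candidate pool before any more expensive computation runs.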
The Typologies panel follows some of the standard design of the other panels. Only the typologies to which the current entry belongs, in this case currency, are shown:
As noted, the major groupings of types reside in core typologies, which is where the largest degree of disjointness occurs. There are some upper typologies (such as Living Things over Animals, etc.) that are used mostly for organizational purposes; these are the extended ones. The core typologies are the key ones to focus upon for distinguishing large groupings of entities.
The last panel section for a concept presents both the parental (Broader) and child (Narrower) concepts for the current entry (again, in this case, currency). Broader concepts represent the parents (or grandparental lineage in the case of inference) for the current reference concept. The broader concept relationship is expressed using the transitive kko:superClassOf property. This property is the inverse of the rdfs:subClassOf property. Narrower concepts represent the children (or grandchild lineages in the case of inference) for the current RC. The narrower concept relationship is expressed using the transitive rdfs:subClassOf property. This property is the inverse of the kko:superClassOf property.
Here is the side-by-side panel presentation for these relationships:
Like some of the prior panels, it is possible to toggle between direct and inferred listings of these related concepts. If the RC is an entity type, it may also show counts for all entities subsumed under that type (orange color) or that have aspects of that type (blue color). Clicking on these count icons will take you to a listing of these entities.
This browsing and discovery use case is based on the standard configuration and the baseline KBpedia. Client variants may change the design and functionality of the application. More importantly, however, client applications are invariably extensions to the base KBpedia knowledge structure. These sometimes have some typologies removed because they are not relevant, but more likely have been expanded with the mapping of domain schema, vocabularies, and instances. In these cases, the actual content to be browsed may differ significantly from what is shown.