Posted:February 21, 2018

The Compleat Knowledge Graph: Nine Features Wanted for Ontologies

I think the market has spoken in preferring the term ‘knowledge graph’ over ‘ontology.’ I suppose we could argue nuances in what the two terms mean, and we will continue to use both, more-or-less interchangeably. But, personally, I do find the concept of ‘knowledge graph’ easier to convey to clients.

As we see knowledge graphs proliferate in many settings — from virtual agents (Siri, Alexa, Cortana and Google Assistant, among others) to search and AI platforms (Watson) — I’d like to take stock of the state-of-the-art and make some recommendations for what I would like to see in the next generation of knowledge graphs. We are just at the beginning of tapping the potential of knowledge graphs, as my recommendations show.

Going back twenty years to Nicola Guarino in 1998 [1], and continuing with Michael Uschold in 2008 [2], there has been a sense that ontologies could be relied upon for even more central aspects of overall applications. Both Guarino and Uschold termed this potential ‘ontology-driven information systems.’ Listing some of these incipient potentials is informative of the role that ontologies may play; some have only been contemplated, or have been met in just one or two actual installations. Let me list nine main areas of (largely) untapped potential:

  1. Context and meaning — by this, I mean the ability to model contexts and situations, which requires specific concepts for such and an ability to express gradations of adjacency (spatial and otherwise). Determining or setting contexts is essential to disambiguate meaning. Context and situations have been particularly difficult ideas for ontologies to model, especially those that have a binary or dichotomous design;

  2. A relations component — true, OWL offers the distinction of annotation, object and datatype properties, and we can express property characteristics such as transitivity, domain, range, cardinality, inversion, reflexivity, disjunction and the like, but it is a rare ontology that uses many of these constructs. The subProperty expression is used, but only in limited instances and rarely in a systematic schema. For example, it is readily obvious that a broader predicate such as animalAction could be split into involuntaryAction and voluntaryAction, and then into specific actions such as breathing or walking, and so on, but schemas with these kinds of logical property subsumptions are not evident. Structurally, we can use OWL to reason over actions and relations in a similar manner as we reason over entities and types, but our common ontologies have yet to do so. Creating such schemas is within grasp, since we have language structures such as VerbNet and other resources we could put to the task;
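To make the idea concrete, here is a minimal sketch, in plain Python rather than OWL, of how a property subsumption chain like the hypothetical animalAction example above could support reasoning over relations; in a real ontology these links would be rdfs:subPropertyOf assertions:

```python
# A toy property-subsumption schema in plain Python; in OWL these
# links would be rdfs:subPropertyOf assertions.  The predicate names
# are the hypothetical ones from the text.
SUB_PROPERTY = {
    "breathing": "involuntaryAction",
    "walking": "voluntaryAction",
    "involuntaryAction": "animalAction",
    "voluntaryAction": "animalAction",
}

def super_properties(prop):
    """All broader predicates entailed by the subsumption chain."""
    supers = []
    while prop in SUB_PROPERTY:
        prop = SUB_PROPERTY[prop]
        supers.append(prop)
    return supers

# An assertion stated with the most specific predicate...
subject, predicate, obj = ("rex", "walking", "the_park")

# ...is also an answer to queries phrased at any broader level
print([predicate] + super_properties(predicate))
# ['walking', 'voluntaryAction', 'animalAction']
```

A reasoner given the walking assertion could then answer queries phrased at the broader animalAction level, just as type subsumption answers queries over entity classes.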

  3. An attributes component — the lack of a schema and organized presentation of attributes means it is a challenge to do ABox-level integration and interoperability. As with a relations component, this gap is largely due to the primary focus on concepts and entities in the early stages of semantic technologies. Optimally, what we would like to see is a well-organized attributes schema that enables instance data characteristics from different sources to be mapped to a canonical attributes schema. Once in place, not only would mapping be aided, but we should also be able to reason over attributes and use them as intensional cues for classifying instances. At one time Google touted its Biperpedia initiative [3] to organize attributes, but that effort went totally silent a couple of years ago;
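As a sketch of the mapping idea, assuming two hypothetical sources with their own attribute names, a canonical attributes schema reduces ABox integration to a lookup:

```python
# Hypothetical attribute names from two sources, mapped to a single
# canonical attributes schema so the instance (ABox) data interoperate.
CANONICAL_MAP = {
    "source_a": {"dob": "birthDate", "zip": "postalCode"},
    "source_b": {"date_of_birth": "birthDate", "postcode": "postalCode"},
}

def to_canonical(source, record):
    """Rename a record's attributes to their canonical names."""
    mapping = CANONICAL_MAP[source]
    return {mapping.get(key, key): value for key, value in record.items()}

a = to_canonical("source_a", {"dob": "1970-01-01", "zip": "53703"})
b = to_canonical("source_b", {"date_of_birth": "1970-01-01", "postcode": "53703"})
print(a == b)  # True: the two records now line up attribute-for-attribute
```

Once attributes are canonical, reasoning over them (say, classifying instances by characteristic values) becomes possible, which is the intensional payoff described above.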

  4. A quantity units ontology — this is the next step beyond attributes, as we attempt to bring data values for quantities (as well as the units and labeling used) into alignment. Fortunately, of late, the QUDT ontologies (quantities, units and data types) have again become an active project with many external supporters. Something like this needs to accompany the other recommendations listed;
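A toy illustration of what unit alignment buys us, with a hand-rolled conversion table standing in for what a quantity-units ontology such as QUDT provides at scale (the units and factors below are just the familiar metric and imperial ones):

```python
# Each unit carries a conversion factor to a canonical (SI) unit,
# here the metre, so data values from differently labeled sources
# can be compared directly.
UNIT_TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048}

def to_canonical_value(value, unit):
    """Express a length in the canonical unit so values can be compared."""
    return value * UNIT_TO_METRES[unit]

# '100 cm' from one source and '1 m' from another are the same quantity
print(to_canonical_value(100, "cm") == to_canonical_value(1, "m"))  # True
```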

  5. A statistics and probabilities ontology — the world is not black-and-white, but vibrantly colored with all kinds of shades. We need to be able to handle gradations as well as binary choices. Being able to add probabilistic reasoners is appropriate given the idea of continua (Thirdness) from Charles Sanders Peirce and his notion of fallibility. Probabilistic reasoning is still a young field in ontology. Some early possibilities include Costa [4] and the PR-OWL ontology using Multi-Entity Bayesian Networks (MEBN) [5], a probabilistic first-order logic that goes beyond Peirce’s classic deterministic logic, as well as fuzzy logic applied to ontologies [6];
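As a flavor of the fuzzy-logic direction [6], here is a toy sketch in which class membership is a degree in [0, 1] and class intersection takes the minimum degree (the Gödel t-norm); the instance and class names are, of course, made up:

```python
# Toy fuzzy class membership: an instance belongs to a class to a
# degree in [0, 1], and intersection takes the minimum degree
# (the Gödel t-norm).  Instance and class names are invented.
MEMBERSHIP = {
    ("toco_toucan", "LargeBird"): 0.7,
    ("toco_toucan", "ColorfulBird"): 0.95,
}

def degree(instance, cls):
    return MEMBERSHIP.get((instance, cls), 0.0)

def intersection_degree(instance, *classes):
    return min(degree(instance, c) for c in classes)

# Membership in 'large and colorful bird' is limited by the weaker claim
print(intersection_degree("toco_toucan", "LargeBird", "ColorfulBird"))  # 0.7
```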

  6. Abductive reasoning and hypothesis generation — Peirce explicated a third kind of logical reasoning, abduction, which combines hypothesis generation with an evaluation of the likelihood of success and the effort required. To my knowledge, this logic method has yet to be implemented in any standard Web ontologies. The method could be very useful for posing desired outcome cases and then working through what may be required to get there. Adding this to existing knowledge graphs would likely require developing a bespoke abductive reasoner;
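One way such a bespoke abductive reasoner might rank candidate hypotheses, sketched here with invented names and figures (Peirce's 'economy of research' suggests balancing likelihood of success against the effort required to test):

```python
# Sketch of ranking candidate hypotheses: prefer those likely to
# succeed and cheap to test.  Names, likelihoods and effort figures
# are illustrative assumptions, not a standard scoring scheme.
hypotheses = [
    {"name": "H1", "likelihood": 0.8, "effort": 5.0},
    {"name": "H2", "likelihood": 0.6, "effort": 1.0},
    {"name": "H3", "likelihood": 0.2, "effort": 0.5},
]

def economy_score(h):
    # higher likelihood and lower testing effort both raise the score
    return h["likelihood"] / h["effort"]

ranked = sorted(hypotheses, key=economy_score, reverse=True)
best = ranked[0]
print(best["name"])  # H2: less likely than H1, but far cheaper to test
```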

  7. Rich feature set for KBAI — we want a rich feature set useful for providing labeled instances to supervised machine learners. I addressed this need earlier with a rather comprehensive listing of possible features for knowledge graphs useful to learners [7]. We now need to start evaluating this features pool to provide pragmatic guidance for which features and learners match best for various knowledge-based artificial intelligence (KBAI) tasks;

  8. Consistent, clean, correct and coherent — we want knowledge graphs that are as free from error as possible to make sure we are not feeding garbage to our machine learners and as a coherent basis for evaluating new additions and mappings; and

  9. ODapps — ‘ontology-driven applications’ go beyond the mere templating or completion of user interface components to devise generic software packages driven by ontology specifications for specific applications. We have developed and deployed ODapps to import or export datasets; create, update, delete (CRUD) or otherwise manage data records; search records with full-text and faceted search; manage access control at the interacting levels of users, datasets, tools, and CRUD rights; browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria; or process results sets through workflows of various natures, involving specialized analysis, information extraction or other functions. ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality is quite similar. The major change in ODapps is to use a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. We may embed these ODapps in a layout canvas for a Web page, where, as the user interacts with the system, the service generates new queries (most often SPARQL) to the various Web services endpoints, which produce new structured results sets, which can drive new displays and visualizations. As new user interactions occur, the iteration cycle begins anew, again starting a new cycle of queries and results sets.
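As a minimal sketch of the query-generation step in that cycle, here is how user-selected facets might be turned into a SPARQL query string before being posted to an endpoint; the URIs and the query shape are illustrative assumptions:

```python
# Toy ODapp query generation: turn a user's facet selections into a
# SPARQL SELECT query string.  The class and property URIs are
# hypothetical; a deployed ODapp would derive them from the ontology.
def build_faceted_query(type_uri, filters, limit=25):
    clauses = [f"?s a <{type_uri}> ."]
    for prop, value in sorted(filters.items()):
        clauses.append(f'?s <{prop}> "{value}" .')
    body = "\n  ".join(clauses)
    return f"SELECT ?s WHERE {{\n  {body}\n}} LIMIT {limit}"

q = build_faceted_query(
    "http://example.org/Record",
    {"http://example.org/status": "active"},
)
print(q)
```

Each new user interaction would rebuild the filters dictionary and regenerate the query, which is the iteration cycle described above.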

Fortunately, we are actively addressing several of these recommendations (#1 – #3, #6 – #9) with our KBpedia initiative. We are also planning to add mapping to QUDT (#4) in a near-future release, and we are presently evaluating probabilistic reasoners and hypothesis generators (#5 and #6).

Realizing these potentials will enable our knowledge management (KM) efforts to shift to the description, nature, and relationships of the information environment. In other words, ontologies themselves need to become the focus of development. KM no longer needs to be abstracted to the IT department or third-party software. The actual concepts, terminology and relations that comprise coherent ontologies now become the explicit focus of KM activities, and subject to direct control and refinement by their users: the knowledge workers and subject matter experts.

We are still some months from satisfying our desiderata for knowledge graphs. Fortunately, we have already made good progress, and checking off all of the boxes is close at hand. Stay tuned!

[1] N. Guarino, “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, 1998, pp. 3–15.
[2] M. Uschold, “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, 2008, pp. 3–20.
[3] R. Gupta, A. Halevy, X. Wang, S.E. Whang, and F. Wu. “Biperpedia: An Ontology for Search Applications,” Proceedings of the VLDB Endowment 7, no. 7, 2014, pp. 505-516.
[4] P. C. Costa, “Bayesian Semantics for the Semantic Web,” Ph.D., George Mason University, 2005.
[5] K. B. Laskey, “MEBN: A Language for First-Order Bayesian Knowledge Bases,” Artificial Intelligence, vol. 172, no. 2–3, pp. 140–178, Feb. 2008.
[6] F. Bobillo and U. Straccia, “Fuzzy Ontology Representation Using OWL 2,” International Journal of Approximate Reasoning, vol. 52, no. 7, pp. 1073–1094, Oct. 2011.
[7] M.K. Bergman, “A (Partial) Taxonomy of Machine Learning Features,” AI3:::Adaptive Information blog, November 23, 2015.


Posted:January 22, 2018

More Active Tools than Last Census

For the last couple of years, one of the more popular articles on this blog has been my 2014 listing of 50 ontology alignment tools. When published, only 20 of those fifty were active; the rest had been abandoned. Ontology alignment, also sometimes called ontology mapping or ontology matching, is the making of formal correspondences between concepts in two or more knowledge graphs, or ontologies. Entity matching may also be included in the mix.

I had occasion to update this listing for some recent work. Three active tools from that last listing have now been retired, but I was also able to identify nine new ones and to update quite a few others. Here is the updated listing:

  • AgreementMakerLight is an automated and efficient ontology matching system derived from AgreementMaker

  • ALCOMO is short for Applying Logical Constraints on Matching Ontologies. ALCOMO is a debugging system that allows incoherent alignments to be transformed into coherent ones by removing some correspondences from the alignment, called a diagnosis. It is complete in the sense that it detects any kind of incoherence in SHIN(D) ontologies

  • Alignment is a collaborative, system-aided, user-driven ontology/vocabulary matching application

  • The Alignment API is an API and implementation for expressing and sharing ontology alignments. The correspondences between entities (e.g., classes, objects, properties) in ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way. The goal of this format is to be able to share on the web the available alignments. The format is expressed in RDF, so it is freely extensible. The Alignment API itself is a Java description of tools for accessing the common format. It defines four main interfaces (Alignment, Cell, Relation and Evaluator)

  • ALIN is an ontology alignment system specializing in the interactive alignment of ontologies. Its main characteristic is the selection of correspondences to be shown to the expert, depending on the previous feedback given by the expert. This selection is based on semantic and structural characteristics

  • Blooms is a tool for ontology matching. It utilizes information from the Wikipedia category hierarchy and from the web to identify subclass relationships between entities. See also its Wiki page

  • CODI (Combinatorial Optimization for Data Integration) leverages terminological structure for ontology matching. The current implementation produces mappings between concepts, properties, and individuals. CODI is based on the syntax and semantics of Markov logic and transforms the alignment problem to a maximum-a-posteriori optimization problem

  • COMA++ is a schema and ontology matching tool with a comprehensive infrastructure. Its graphical interface supports a variety of interactions

  • Falcon-AO (Finding, aligning and learning ontologies) is an automatic ontology matching tool that includes the three elementary matchers of String, V-Doc and GMO. In addition, it integrates a partitioner PBM to cope with large-scale ontologies

  • hMAFRA (Harmonize Mapping Framework) is a set of tools supporting semantic mapping definition and data reconciliation between ontologies. The targeted formats are XSD, RDFS and KAON

  • GOMMA is a generic infrastructure for managing and analyzing life science ontologies and their evolution. The component-based infrastructure utilizes a generic repository to uniformly and efficiently manage many versions of ontologies and different kinds of mappings. Different functional components focus on matching life science ontologies, detecting and analyzing evolutionary changes and patterns in these ontologies

  • HerTUDA is a simple, fast ontology matching tool, based on syntactic string comparison and filtering of irrelevant mappings. Despite its simplicity, it outperforms many state-of-the-art ontology matching tools

  • Karma is an information integration tool to integrate data from databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs. Users integrate information according to an ontology of their choice using a graphical user interface that automates much of the process. Karma learns to recognize the mapping of data to ontology classes and then uses the ontology to propose a model that ties together these classes

  • KitAMO is a tool for evaluating ontology alignment strategies and their combinations. It supports the study, evaluation and comparison of alignment strategies and their combinations based on their performance and the quality of their alignments on test cases. Based on the SAMBO project

  • The linked open data enhancer (LODE) framework is a set of integrated tools that allow digital humanists, librarians, and information scientists to connect their data collections to the linked open data cloud. It can be applied to any domain with RDF datasets

  • LogMap is a highly scalable ontology matching system with ‘built-in’ reasoning and diagnosis capabilities. LogMap can deal with semantically rich ontologies containing tens (and even hundreds) of thousands of classes

  • Map-On is a collaborative ontology mapping environment which supports different users (domain experts, data owners, and ontology engineers) in integrating data in a collaborative way using standard semantic technologies

  • MapOnto is a research project aiming at discovering semantic mappings between different data models, e.g., database schemas, conceptual schemas, and ontologies. So far, it has developed tools for discovering semantic mappings between database schemas and ontologies, as well as between different database schemas. The Protege plug-in is still available, but appears to be for older versions

  • OntoM is one component of the OntoBuilder, which is a comprehensive ontology building and managing framework. OntoM provides a choice of mapping and scoring methods for matching schema

  • OntoSim is a Java API for computing similarities between ontologies. It relies on the Alignment API for ontology loading, so it is quite independent of the ontology API used (JENA or OWL API)

  • OpenII Harmony is a schema matching tool that combines multiple language-based matching algorithms and a graphical user interface

  • OxO is a service for finding mappings (or cross-references) between terms from ontologies, vocabularies and coding standards. OxO imports mappings from a variety of sources including the Ontology Lookup Service and a subset of mappings provided by the UMLS

  • PARIS is a system for the automatic alignment of RDF ontologies. PARIS aligns not only instances, but also relations and classes. Alignments at the instance level cross-fertilize with alignments at the schema level

  • S-Match takes any two tree-like structures (such as database schemas, classifications, or lightweight ontologies) and returns a set of correspondences between those tree nodes which semantically correspond to one another

  • ServOMap is an ontology matching tool based on Information Retrieval techniques, relying on the ServO system

  • The Silk framework is a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. While designed for mapping instance data, it can also be used for schemas

  • is a web-based tool to: 1) import category systems (tree-based taxonomies/ontologies) in the form of JSON files; 2) map them using a visual user interface; and 3) export a single unified ontology

  • WikiV3 is an ontology matching system that uses Wikipedia as an external knowledge base, useful for concept, entity, property, and multilingual alignments

  • YAM++ ((not) Yet Another Matcher) is a flexible and self-configuring ontology matching system for discovering semantic correspondences between entities (i.e., classes, object properties and data properties) of ontologies. The YAM++ 2013 version is a significant improvement over previous versions. See also the 2013 results. Code does not appear to be available.
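To give a feel for what the simpler syntactic matchers in this list do (HerTUDA, for example, is based on syntactic string comparison), here is a toy aligner that matches labels from two ontologies on trigram (Jaccard) similarity; real tools layer lexical, structural and semantic evidence on top of this:

```python
# Toy syntactic ontology matcher: align labels from two ontologies
# when their trigram (Jaccard) similarity clears a threshold.  The
# labels and the 0.5 threshold are illustrative.
def ngrams(s, n=3):
    s = s.lower().replace("_", " ")
    return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}

def similarity(a, b):
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb)  # Jaccard over trigram sets

def align(labels_a, labels_b, threshold=0.5):
    return [
        (a, b, round(similarity(a, b), 2))
        for a in labels_a for b in labels_b
        if similarity(a, b) >= threshold
    ]

print(align(["PostalAddress", "Person"], ["Postal_Address", "Employee"]))
# [('PostalAddress', 'Postal_Address', 0.64)]
```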

Posted by AI3's author, Mike Bergman, on January 22, 2018 in Ontologies, Semantic Web Tools
Posted:December 10, 2017

A Multi-disciplinary Team is Unlocking the Connected Secrets of Neurons

An amazing multi-disciplinary team of about 40 neurobiological and computer scientists and technicians is systematically cataloging, tracing and visualizing neurons in the mouse brain. The effort, called MouseLight, is funded by the Howard Hughes Medical Institute and is conducting the work at HHMI’s spectacular Janelia Research Campus in Virginia. MouseLight is generating maps of individual neurons across the entire mouse brain using high-speed, high-resolution light microscopy. So far, 300 neurons have been painstakingly completed, with a target of 1000 by the end of 2018. While impressive, there is still a considerable way to go to capture all of the estimated 70 million (!) neurons in the mouse brain.

The visualizations of the neurons, singly and in combination, are just amazing. For example, according to a recent project news release, some “axons of individual neurons in the thalamus often branch profusely in unexpected combinations of cortical areas, such as regions involved in taste, touch, and movement. Similarly, in the subiculum, a region involved in learning and memory, neurons almost always reach out to a few different places. In the neocortex, the six-layered structure associated with the highest cognitive functions, many single-neuron projections spread expansively. One neuron the researchers traced ran scattershot across the cerebral cortex, sending long, branching axons arcing across both hemispheres like a fireworks explosion.”

At the project’s Web site you can see an introductory video and use an interactive Neuron Browser, which allows you to select, view and manipulate specific neurons, and includes a nice tutorial and pre-canned examples:

Mouse Light Video

Only one to a few neurons may be processed at a time to generate the primary data underlying all of this analysis. The basic process works as follows: First, the researchers inject the mouse brain with a virus that highlights those few neurons. The brain is then excised and ‘cleared’ to enable light to penetrate the tissue. The brain is hit with a pulse of light to illuminate the target neurons using ‘two-photon microscopy’ to get the sharp images needed to identify the specific neurons, and then sliced (‘microtomed’) into 200-micrometer thick layers with a vibrating razor. After digitizing the image on the slice, the process is repeated until the entire brain is characterized. The digitized, illuminated areas on each slice are then re-constructed by computer with trained annotators tracing and confirming the actual neuron trace. Each processing of a few neurons yields approximately 20 terabytes of data. Neurons on the combined datasets are then color-coded, and are processed by 3-D visualization software for animations and rotations. The software, as demonstrated by the online browser, enables specific neurons to be turned on or off for inspection and other 3-D manipulations. As the team has continued to learn, it has continued to speed up the process.

These kinds of advances are now giving us an unprecedented look at the physical structure and connectedness of neurons. For those of us doing knowledge representation and digital cognition work, if we believe our work ultimately needs to reflect physical reality in some form, we are apparently still a long, long way from being able to model animate brain functions. The degrees of connectivity suggested are staggering, and well beyond current practice in any digital artifact. One of the next challenges for neuroscience is to try to discover physical patterns of connections that shed light on generalizations, for it looks, based on these early physical findings, like the brain is an unbelievably jumbled mess of pathways and connections. Finding general keys will be needed to unlock the mysteries.

Posted:November 22, 2017

Knowledge Representation Guidelines from Charles S. Peirce

‘Representation’ is the second half of knowledge representation (KR), the field of artificial intelligence dedicated to representing information about the world in a form that a computer system can utilize to solve complex tasks. One dictionary sense is that ‘representation’ is the act of speaking or acting on behalf of someone else; this is the sense, say, of a legislative representative. Another sense is a statement made to some formal authority communicating an assertion, opinion or protest, such as a notarized document. The sense applicable to KR, however, according to the Oxford Dictionary of English, is one of ‘re-presenting’. That is, “the description or portrayal of someone or something in a particular way or as being of a certain nature” [1]. In this article I investigate this sense of ‘re-presenting’ following the sign-making guidelines of Charles Sanders Peirce [2] [3] (which we rely upon in our KBpedia knowledge structure).

When we see something, or point to something, or describe something in words, or think of something, we are, of course, using proxies in some manner for the actual thing. If the something is a ‘toucan’ bird, that bird does not actually reside in our head when we think of it. The ‘it’ of the toucan is a ‘re-presentation’ of the real, dynamic toucan. The representation of something is never the actual something, but is itself another thing that conveys to us the idea of the real something. In our daily thinking we rarely make this distinction, thankfully, otherwise our flow of thoughts would be completely jangled. Nonetheless the distinction is real, and when inspecting the nature of knowledge representation, needs to be consciously considered.

How we ‘re-present’ something is also not uniform or consistent. For the toucan bird, perhaps we make caw-caw bird noises or flap our arms to indicate we are referring to a bird. Perhaps we simply point at the bird. Or, perhaps we show a picture of a toucan or read or say aloud the word “toucan” or see the word embedded in a sentence or paragraph, as in this one, that also provides additional context. How quickly or accurately we grasp the idea of toucan is partly a function of how closely associated one of these signs may be to the idea of toucan bird. Probably all of us would agree that arm flapping is not nearly as useful as a movie of a toucan in flight or seeing one scolding from a tree branch.

The question of what we know and how we know it fascinated Peirce over the course of his intellectual life. He probed this relationship between the real or actual thing, the object, with how that thing is represented and understood. This triadic relationship between object, representation and interpretation forms a sign, and is the basis for the process of sign-making and understanding, which Peirce called semiosis [4]. Peirce’s basic sign relationship is central to his own epistemology and resides at the core of how we use knowledge representation in KBpedia.

The Shadowy Object

Yet even the idea of the object, in this case the toucan bird, is not necessarily so simple. There is the real thing itself, the toucan bird, with all of its characters and attributes. But how do we ‘know’ this real thing? Bees, like many insects, may perceive different coloration for the toucan and adjacent flowers because they can see in the ultraviolet spectrum, while we do not. On the other hand, most mammals in the rain forest would also not perceive the reds and oranges of the toucan’s feathers, which we readily see. Perhaps only fellow toucans could perceive by gestures and actions whether the object toucan is healthy, happy or sad (in the toucan way). Humans, through our ingenuity, may create devices or technologies that expand our standard sensory capabilities to make up for some of these perceptual gaps, but technology will never make our knowledge fully complete. Given limits to perceptions and the information we have on hand, we can never completely capture the nature of the dynamic object, the real toucan bird.

Then, of course, whatever representation we have for the toucan is also incomplete, be it a mental image, a written description, or a visual image (again, subject to the capabilities of our perceptions). We can point at the bird and say “toucan”, but the immediate object that it represents still is different than the real object. Or, let’s take another example more in keeping with the symbolic nature of KR, in this case the word for ‘bank’. We can see this word, and if we speak English, even recognize it, but what does this symbol mean? A financial institution? The shore of a river? Turning an airplane? A kind of pool shot? Tending a fire for the evening? In all of these examples, there is an actual object that is the focus of attention. But what we ‘know’ about this object depends on what we perceive or understand and who or what is doing the perceiving and the understanding. We can never fully ‘know’ the object because we can never encompass all perspectives and interpretations.

Peirce well recognized these distinctions. He termed the object of our representations the immediate object, while also acknowledging this representation is not fully capturing of the underlying, real dynamical object:

“Every cognition involves something represented, or that of which we are conscious, and some action or passion of the self whereby it becomes represented. The former shall be termed the objective, the latter the subjective, element of the cognition. The cognition itself is an intuition of its objective element, which may therefore be called, also, the immediate object.” (CP 5.238)

“Namely, we have to distinguish the Immediate Object, which is the Object as the Sign itself represents it, and whose Being is thus dependent upon the Representation of it in the Sign, from the Dynamical Object, which is the Reality which by some means contrives to determine the Sign to its Representation.” (CP 4.536)

“As to the Object, that may mean the Object as cognized in the Sign and therefore an Idea, or it may be the Object as it is regardless of any particular aspect of it, the Object in such relations as unlimited and final study would show it to be. The former I call the Immediate Object, the latter the Dynamical Object.” (CP 8.183)

Still, we cannot know anything without the sign process. One imperative of knowledge representation — within reasonable limits of time, resources and understanding — is to try to ensure that our immediate representations of the objects of our discourse are in as close a correspondence to the dynamic objects as possible. This imperative, of course, does not mean assembling every minute bit of information possible in order to characterize our knowledge spaces. Rather, we need to seek a balance between what and how we characterize the instances in our domains and the questions we are trying to address, all within limited time and budgets. Peirce’s pragmatism, as expressed through his pragmatic maxim, helps provide guidance for reaching this balance.

Three Modes of Representation

Representations are signs (CP 8.191), and the means by which we point to, draw or direct attention to, or designate, denote or describe a particular object, entity, event, type or general. A representational relationship has the form of re:A. Representations can be designative of the subject, that is, be icons or symbols (including labels, definitions, and descriptions). Representations may be indexes that more-or-less help situate or provide traceable reference to the subject. Or, representations may be associations, resemblances and likelihoods in relation to the subject, more often of indeterminate character.

In Peirce’s mature theory of signs, he characterizes signs according to different typologies, which I discuss further in the next section. One of his better known typologies is how we may denote the object, which, unlike some of his other typologies, he kept fairly constant throughout his life. Peirce formally splits these denotative representations into three kinds: icons, indexes, or symbols (CP 2.228, CP 2.229 and CP 5.473).

“. . . there are three kinds of signs which are all indispensable in all reasoning; the first is the diagrammatic sign or icon, which exhibits a similarity or analogy to the subject of discourse; the second is the index, which like a pronoun demonstrative or relative, forces the attention to the particular object intended without describing it; the third [or symbol] is the general name or description which signifies its object by means of an association of ideas or habitual connection between the name and the character signified.” (CP 1.369)

The icon, which may also be known as a likeness or semblance, has a quality shared with the object such that it resembles or imitates it. Portraits, logos, diagrams, and metaphors all have an iconic denotation. Algebraic expressions are also viewed by Peirce as icons, since he believed (and did much to prove) that mathematical operations can be expressed through diagrammatic means (as is the case with his later existential graphs).

An index denotes the object by some form of linkage or connection. An index draws or compels attention to the object by virtue of this factual connection, and does not require any interpretation or assertion about the nature of the object. A pointed finger to an object or a weathervane indicating which direction the wind is blowing are indexes, as are keys in database tables or Web addresses (URIs or URLs [5]) on the Internet. Pronouns, proper names, and figure legends are also indexes.

Symbols, the third kind of denotation, represent the object by virtue of accepted conventions or ‘laws’ or ‘habits’ (Peirce’s preferred terms). There is an understood interpretation, gained through communication and social consensus. All words are symbols, as are their combinations into sentences and paragraphs. All symbols are generals, which need to be expressed as individual instances, or tokens. For example, ‘the’ is a single symbol (type), but it is expressed many times (as tokens) on this page. Knowledge representation, by definition, is based on symbols, which need to be interpreted by either humans or machines based on the conventions and shared understandings we have given them.
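The type/token distinction is easy to make concrete in code; here is a small illustration counting tokens of the symbol type ‘the’ in a sentence:

```python
# One symbol type, many tokens: count tokens of the type 'the'.
text = "The toucan sat in the tree by the river"
tokens = [word.lower() for word in text.split()]

token_count = tokens.count("the")  # tokens of the single type 'the'
type_count = len(set(tokens))      # distinct symbol types in the text
print(token_count, type_count)     # 3 7
```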

Peirce confined the word representation to the operation of a sign or its relation to the interpreter for an object. The three possible modes of denotation — that is, icon, index or symbol — Peirce collectively termed the representamen:

“A very broad and important class of triadic characters [consists of] representations. A representation is that character of a thing by virtue of which, for the production of a certain mental effect, it may stand in place of another thing. The thing having this character I term a representamen, the mental effect, or thought, its interpretant, the thing for which it stands, its object.” (CP 1.564)

Peirce’s Semiosis and Triadomany

A core of Peirce’s world view is thus based in semiotics, the study and logic of signs. In a seminal writing, “What is in a Sign?” [6], Peirce wrote that “every intellectual operation involves a triad of symbols” and “all reasoning is an interpretation of signs of some kind.” This basic triad representation has been used in many contexts, with various replacements or terms at the nodes. One basic form is known as the Meaning Triangle, popularized by Ogden and Richards in 1923 [7], surely reflective of Peirce’s ideas.

For Peirce, the appearance of a sign starts with the representamen, which is the trigger for a mental image (by the interpretant) of the object. The object is the referent of the representamen sign. None of the possible bilateral (or dyadic) relations of these three elements, even combined, can produce this unique triadic perspective. A sign can not be decomposed into something more primitive while retaining its meaning.

Figure 1: The Object-Representamen-Interpretant Sign Process (Semiosis)

Let’s summarize the interaction of these three sign components [8]. The object is the actual thing. It is what it is. Then, we have the way that thing is conveyed or represented, the representamen, which is an icon, index or symbol. Then we have how an agent or the perceiver of the sign understands and interprets the sign, the interpretant, which in its barest form is a sign’s meaning, implication, or ramification. For a sign to be effective, it must represent an object in such a way that it is understood and used again. Basic signs can be building blocks for still more complex signs, such as words combined into sentences. This makes the assignment and use of signs a community process of understanding and acceptance [9], as well as a truth-verifying exercise of testing and confirming accepted associations (such as the meanings of words or symbols).

Complete truth is the limit where the understanding of the object by the interpretant via the sign is precise and accurate. Since this limit is never achieved, sign-making and understanding is a continuous endeavor. The overall process of testing and refining signs so as to bring our understanding to greater accuracy is what Peirce meant by semiosis. Peirce’s logic of signs in fact is a taxonomy of sign relations, in which signs get reified and expanded via still further signs, ultimately leading to communication, understanding and an approximation of canonical truth. Peirce saw the scientific method as an exemplar of this process.

The understanding of the sign is subject to the contexts for the object and agent and the capabilities of the interpreting agent; that makes the interpretant an integral component of the sign. Two different interpretants can derive different meanings from the same representation, and a given object may be represented by different tokens. When the interpretant is a human and the signs are language, shared understandings arise from the meanings given to language by the community, which can then test and add to the truth statements regarding the object and its signs, including the usefulness of those signs. Again, these are drivers to Peirce’s semiotic process.

In the same early 1867 paper in which Peirce laid out the three modes of denotation of icon, index, and symbol [10] [11], he also presented his three phenomenological categories for the first time, what I (and others) have come to call his universal categories of Firstness, Secondness and Thirdness. This seminal paper also provides the contextual embedding of these categories, which is worth repeating in full:

“The five conceptions thus obtained, for reasons which will be sufficiently obvious, may be termed categories. That is,

Being,

Quality (reference to a ground),

Relation (reference to a correlate),

Representation (reference to an interpretant),

Substance.

The three intermediate conceptions may be termed accidents.” (EP 1:6, CP 1.55)

Note the commas, suggesting the order, and the period, in the listing. In his later writings, Peirce ceases to discuss Being and Substance directly, instead focusing on the ‘accidental’ categories that became the first expression of his universal categories. Being, however, represents all that there is and is the absolute, most abstract starting point for Peirce’s epistemology. The three ‘accidental’ categories of Quality, Relation and Representation are one of the first expressions of Peirce’s universal categories of Firstness, Secondness and Thirdness as applied to Substance. “Thus substance and being are the beginning and end of all conception. Substance is inapplicable to a predicate, and being is equally so to a subject.” (CP 1.548)

These two, early triadic relations — one, the denotations in signs, and, two, the universal categories — are examples of Peirce’s lifelong fascination with trichotomies [12]. He used triadic thinking in dozens of areas in his various investigations, often in a recursive manner (threes of threes). It is not surprising, then, that Peirce also applied this mindset to the general characterization of signs themselves.

Peirce returned to the idea of sign typologies and notations at the time of his Lowell Institute lectures of 1903 [13]. Besides the denotations of icon, index and symbol, which he retained as the three different ways to denote an object, Peirce also proffered three ways to describe the signs themselves (representamens) to fulfill different purposes, and three ways to interpret signs (interpretants) based on possibility, fact, or reason. This more refined view of three trichotomies should theoretically result in 27 different sign possibilities (3 x 3 x 3), except that the nature of the monadic, dyadic and triadic relationships embedded in these trichotomies logically leads to only 10 variants (1 + 3 + 6) [14].
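Footnote [14] spells out the counting logic, which can also be checked mechanically. A minimal Python sketch (the dictionary names and the non-ascending rank rule are my reading of the constraint, not Peirce’s own notation):

```python
from itertools import product

# The three trichotomies, each ranked by universal category
# (1 = Firstness, 2 = Secondness, 3 = Thirdness).
SIGN_ITSELF = {1: "Qualisign", 2: "Sinsign",  3: "Legisign"}
TO_OBJECT   = {1: "Icon",      2: "Index",    3: "Symbol"}
TO_INTERP   = {1: "Rheme",     2: "Dicisign", 3: "Argument"}

def sign_classes():
    """Of the 27 raw combinations, keep only those whose category rank
    never rises from the sign itself to its object relation to its
    interpretant relation (s >= o >= i)."""
    return [(SIGN_ITSELF[s], TO_OBJECT[o], TO_INTERP[i])
            for s, o, i in product((1, 2, 3), repeat=3)
            if s >= o >= i]

for n, (s, o, i) in enumerate(sign_classes(), 1):
    print(f"{n:>2}. {s} / {o} / {i}")
```

Run in this order, the ten surviving combinations reproduce classes I through X of Table 1 below, from the (Rhematic Iconic) Qualisign to the Argument.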

Peirce split the purposes (uses) of signs into qualisigns (also called tones, potisigns, or marks), which are signs that consist in a quality of feeling or possibility, and are in Firstness; sinsigns (also called tokens or actisigns), which consist in action/reaction or actual single occurrences or facts, and are in Secondness; and legisigns (also called types or famisigns), which are signs that consist of generals or representational relations, and are in Thirdness. Instances (tokens) of legisigns are replicas, and thus are sinsigns. All symbols are legisigns. Synonyms, for example, are replicas of the same legisign, since they mean the same thing, but are different sinsigns.

Peirce split the interpretation of signs into three categories. A rheme (also called sumisign or seme) is a sign that stands for its object for some purpose, expressed as a character or a mark. Terms are rhemes, but they also may be icons or indexes. Rhemes may be diagrams, proper nouns or common nouns. A proposition expressed with its subject as a blank (unspecified) is also a rheme. A dicisign (also called dicent sign or pheme) is the second type of sign, that of actual existence. Icons can not be dicisigns. Dicisigns may be either indexes or symbols, and provide indicators or pointers to the object. Standard propositions or assertions are dicisigns. And an argument (also called suadisign or delome) is the third type of sign, one that stands for the object as a generality, as a law or habit. A syllogism is an argument, including its major and minor premises and conclusion. Combinations of assertions or statements, such as novels or works of art, are arguments.

Table 1 summarizes these 10 sign types and provides some examples of how to understand them:

#    | Use       | Relative to Object | Relative to Interpretant | Sign name (redundancies)     | Some examples
I    | Qualisign | Icon   | Rheme    | (Rhematic Iconic) Qualisign  | A feeling of “red”
II   | Sinsign   | Icon   | Rheme    | (Rhematic) Iconic Sinsign    | An individual diagram
III  | Sinsign   | Index  | Rheme    | Rhematic Indexical Sinsign   | A spontaneous cry
IV   | Sinsign   | Index  | Dicisign | Dicent (Indexical) Sinsign   | A weathercock or photograph
V    | Legisign  | Icon   | Rheme    | (Rhematic) Iconic Legisign   | A diagram, apart from its factual individuality
VI   | Legisign  | Index  | Rheme    | Rhematic Indexical Legisign  | A demonstrative pronoun
VII  | Legisign  | Index  | Dicisign | Dicent Indexical Legisign    | A street cry (identifying the individual by tone, theme)
VIII | Legisign  | Symbol | Rheme    | Rhematic Symbol (Legisign)   | A common noun
IX   | Legisign  | Symbol | Dicisign | Dicent Symbol (Legisign)     | A proposition (in the conventional sense)
X    | Legisign  | Symbol | Argument | Argument (Symbolic Legisign) | A syllogism

Table 1: Ten Classifications of Signs [15]

This schema is the last one fully developed by Peirce. However, in his last years, he also developed 28-class and 66-class sign typologies, though incomplete in important ways and details. These expansions reflected sign elaborations for various sub-classes of Peirce’s more mature trichotomies, such as for the immediate and dynamic objects previously discussed (see CP 8.342-379). There is a symmetry and recursive beauty to these incomplete efforts, with sufficient methodology suggested to enable informed speculations as to where Peirce may have been heading [16] [17] [18] [19].

We have taken a different path with KBpedia. Rather than engage in archeology, we have chosen to try to fathom and plumb Peirce’s mindset, and then apply that mindset to the modern challenge of knowledge representation. Peirce’s explication of the centrality and power of signs, his fierce belief in logic and reality, and his commitment to discover the fundamental roots of episteme, have convinced us there is a way to think about Peirce’s insights into knowledge representation attuned to today. Peirce’s triadomany [12], especially as expressed through the universal categories, provides this insight.

[2] Charles S. Peirce (1839 – 1914), pronounced “purse,” was an American logician, scientist, mathematician, and philosopher of the first rank. Peirce is a major guiding influence for our KBpedia knowledge system. Quotes in the article are mostly from the electronic edition of The Collected Papers of Charles Sanders Peirce, reproducing Vols. I-VI, Charles Hartshorne and Paul Weiss, eds., 1931-1935, Harvard University Press, Cambridge, Mass., and Arthur W. Burks, ed., 1958, Vols. VII-VIII, Harvard University Press, Cambridge, Mass. The citation scheme is volume number using Arabic numerals followed by section number from the collected papers, shown as, for example, CP 1.208.
[3] Some material in this article was drawn from my prior articles at the AI3:::Adaptive Information blog: “Give Me a Sign: What Do Things Mean on the Semantic Web?” (Jan 2012); “A Foundational Mindset: Firstness, Secondness, Thirdness” (March 2016); “The Irreducible Truth of Threes” (Sep 2016); “Being Informed by Peirce” (Feb 2017). For all of my articles about Peirce, see
[4] Peirce actually spelled it “semeiosis”. While it is true that other philosophers such as Ferdinand de Saussure also employed the shorter term “semiosis”, I also use this more common term due to greater familiarity.
[5] The URI “sign” is best seen as an index: the URI is a pointer to a representation of some form, be it electronic or otherwise. This representation bears a relation to the actual thing that this referent represents, as is true for all triadic sign relationships. However, in some contexts, again in keeping with additional signs interpreting signs in other roles, the URI “sign” may also play the role of a symbolic “name” or even as a signal that the resource can be downloaded or accessed in electronic form. In other words, by virtue of the conventions that we choose to assign to our signs, we can supply additional information that augments our understanding of what the URI is, what it means, and how it is accessed.
[6] Charles Sanders Peirce. 1894. “What is in a Sign?”. Retrieved from
[7] C.K. Ogden and I. A. Richards. 1923. The Meaning of Meaning. Harcourt, Brace, and World, New York.
[8] Peirce himself sometimes used a Y-shaped figure. The triangle is simpler to draw and in keeping with the familiar Ogden and Richards figure of 1923.
[9] Catherine Legg. 2010. “Pragmaticism on the Semantic Web”. In Ideas in Action: Proceedings of the Applying Peirce Conference, 173–188. Retrieved from
[10] Charles S. Peirce. 1867. “On a New List of Categories”. In Proceedings of the American Academy of Arts and Sciences.
[11] Among all of his writings, Peirce said “The truth is that my paper of 1867 was perhaps the least unsatisfactory, from a logical point of view, that I ever succeeded in producing; and for a long time most of the modifications I attempted of it only led me further wrong.” (CP 2.340).
[12] See CP 1.568, wherein Peirce provides “The author’s response to the anticipated suspicion that he attaches a superstitious or fanciful importance to the number three, and forces divisions to a Procrustean bed of trichotomy.”
[13] Charles S. Peirce and The Peirce Edition Project. 1998. “Nomenclature and Divisions of Triadic Relations, as Far as They Are Determined”. In The Essential Peirce: Selected Philosophical Writings, Volume 2 (1893-1913). Indiana University Press, Bloomington, Indiana, 289–299.
Understand that each trichotomy comprises three elements, A, B and C. The monadic relations are a singleton, A, which can only match with itself and A variants. The dyadic relations can only be between A and B and derivatives. And the triadic relations are between all variants and derivatives. Thus, the ten logical combinations for the three trichotomies are: A-A’-A’’; B-A’-A’’; B-B’-A’’; B-B’-B’’; C-A’-A’’; C-B’-A’’; C-B’-B’’; C-C’-A’’; C-C’-B’’; and C-C’-C’’.
[15] From CP 2.254-263, EP 2:294-296, and MS 540 of 1903.
[16] Priscila Borges. 2010. “A Visual Model of Peirce’s 66 Classes of Signs Unravels His Late Proposal of Enlarging Semiotic Theory”. 221–237.
[17] Robert W. Burch. 2011. “Peirce’s 10, 28, and 66 Sign-Types: The Simplest Mathematics”. Semiotica 2011, 184.
[18] P. Farias and J. Queiroz. 2003. “On Diagrams for Peirce’s 10, 28, and 66 Classes of Signs”. Semiotica 147, 1/4: 165–184.
[19] Tony Jappy. 2017. Peirce’s Twenty-Eight Classes of Signs and the Philosophy of Representation: Rhetoric, Interpretation and Hexadic Semiosis. Bloomsbury Academic. Retrieved September 29, 2017 from
Posted:November 14, 2017

Some Basic Use Cases from KBpedia

The human propensity to categorize is based on trying to make sense of the world. The act of categorization is based on how to group things together and how to relate those things and groups to one another. Categorization demands that we characterize or describe the things of the world using what we have termed attributes in order to find similarities [1]. Categorization may also be based on the relationships of things to external things [2]. No matter the method, the results of these categorizations tend to be hierarchical, reflective of what we see in the natural world. We see hierarchies in Nature based on bigger and more complex things being composed of simpler things, based on fractals or cellular automata, or based on the evolutionary relationships of lifeforms. According to Annila and Kuismanen, “various evolutionary processes naturally emerge with hierarchical organization” [3]. Hierarchy, and its intimate relationship with categorization and categories, is thus fundamental to why and how we can represent knowledge for computable means.

Depending on context, we can establish hierarchical relationships between types, classes or sets, with instances or individuals, with characteristics of those individuals, and between all of these concepts. There is potentially different terminology depending on context, and the terminology or syntax may also carry formal understanding of how we can process and compute these relationships. Nilsson provides a general overview of these kinds of considerations with a useful set of references [4].

Types of Hierarchical Relationships

As early as 1977, Doyle noted in the first comprehensive study of KR languages, “Hierarchy is an important concept. It allows economy of description, economy of storage and manipulation of descriptions, economy of recognition, efficient planning strategies, and modularity in design.” He also noted that “hierarchy forms the backbone in many existing representation languages” [5].

The basic idea of a hierarchy is that some item (‘thing’) is subsidiary to another item. Categorization, expressed both through the categories themselves and the process of how one splits and grows categories, is a constant theme in knowledge representation. The idea of hierarchy is central to what is treated as a category or other such groupings and how those categories or groupings are tied together. A hierarchical relationship is shown diagrammatically in Figure 1 with A or B, the ‘things’, shown as nodes.


Figure 1: Direct Hierarchy

All this diagram is really saying is that A has some form of superior or superordinate relationship to B (or vice versa, that B is subordinate to A). This is a direct hierarchical relationship, but one of unknown character. Hierarchies can also relate more than two items:


Figure 2: Simple Hierarchy

In this case, the labels of the items may seem to indicate the hierarchical relationship, but relying on labels alone is misleading. For example, let’s take this relationship, where our intent is to show the mixed nature of primary and secondary colors [6]:


Figure 3: Multiple Hierarchy

Yet perhaps our intent was rather to provide a category for all colors to be lumped together, as instances of the concept ‘color’ shows here:


Figure 4: Extensional Hierarchy

The point is not to focus on colors (which are, apparently, more complicated to model than they appear at first blush) but to understand that hierarchical relations come in many types, and that what one asserts about a relation carries logical implications, the logic determined by the semantics of the chosen representation language and how we represent it. For this clarity we need to define the nature of the hierarchical relationship explicitly. Here are some (vernacular) examples one might encounter:





  • is more basic than
  • is a superClassOf
  • is more fundamental than
  • is broader than
  • is more general
  • is parent of
  • has member
  • has an instance of
  • has attribute
  • has part


Table 1: Example Hierarchical Relationships

Again, though we have now labeled the relationships, which in a graph representation are the edges between the nodes, it is still unclear to which populations these relations may apply and what their exact semantic relationships may be.

Table 2 shows the basic hierarchical relations that one might want to model, and whether the item resides in the universal categories of Charles Sanders Peirce of Firstness, Secondness or Thirdness, introduced in one of my previous articles [7]:





(Most of this table’s rows were lost in the source; the pairings included type―token (instance), sub―super, child―parent, and part―whole, arrayed against Firstness, Secondness and Thirdness.)
Table 2: Possible Pairwise (―) Hierarchical Relationships

Note that, depending on context, some of the items may reside in either Secondness or Thirdness (depending on whether the referent is a particular instance or a general). Also note the familial relationships shown: child-parent-grandparent and child-child relationships occur in actual families and as a way of talking about inheritance or relatedness relations. The idea of type or is-a is another prominent one in ontologies and knowledge graphs. Natural classes or kinds, for example, fall into the type-token relationship. Also note that mereological relationships, such as part-whole, may also leave open ambiguities. We also see that certain pairs, such as sub-super, child-parent, or part-whole, need context to resolve the universal category relation.

Reliance on item labels alone for the edges and nodes, even for something as seemingly straightforward as color or pairwise relationships, does not give us sufficient information to determine how to evaluate the relationship nor how to organize it properly. We thus see in knowledge representation that we need to express our relationships explicitly. Labels are merely assigned names that, alone, do not specify the logic to be applied, what populations are affected, or even the exact nature of the relationship. Without these basics, our knowledge graphs can not be computable. Yet well over 95% of the assignments in contemporary knowledge bases have this item-item character. We need interpretable relationships to describe the things that populate our domains of inquiry so as to categorize that world into bite-sized chunks.
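To see why labels alone are not computable, consider a minimal Python sketch (the class and instance names are invented for illustration). The same drawn edge can carry two different formal semantics, and only the semantics, not the label, determines what may be inferred:

```python
# Two edge types with identical shape but different formal semantics.
# 'subclass_of' is transitive: instances of a subclass are also
# instances of its superclasses. 'instance_of' relates a token to a
# class and does not itself chain transitively.

subclass_of = {          # Navy is a kind of Blue; Blue is a kind of Color
    "Navy": "Blue",
    "Blue": "Color",
}
instance_of = {          # my_shirt is an individual token, not a class
    "my_shirt": "Navy",
}

def superclasses(cls):
    """All classes entailed above cls via the transitive subclass relation."""
    out = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        out.append(cls)
    return out

def classes_of(token):
    """A token belongs to its asserted class plus that class's superclasses."""
    direct = instance_of[token]
    return [direct] + superclasses(direct)

print(superclasses("Navy"))     # ['Blue', 'Color']
print(classes_of("my_shirt"))   # ['Navy', 'Blue', 'Color']
```

Swap the semantics (say, treat every edge as instance-of) and the entailments change, even though the node and edge labels stay exactly the same: that is the sense in which explicit, grounded relations, not labels, make a knowledge graph computable.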

Salthe categorizes hierarchies into two types: compositional hierarchies and subsumption hierarchies [8]. Mereological and part-whole hierarchies are compositional, as are entity-attribute ones. Subsumption hierarchies cover broader-than, familial, or evolutionary relations. Cottam et al. believe hierarchies to be so fundamentally important as to propose a model abstraction over all hierarchical types, including levels of abstraction [9]. These discussions of structure and organization are helpful for understanding the epistemological bases underlying various kinds of hierarchy. We should also not neglect recursive hierarchies, such as fractals or cellular automata, which are simple, repeated structures commonly found in Nature. Fortunately, Peirce’s universal categories provide a powerful and consistent basis for us to characterize these variations. When paired with logic and KR languages and “cutting Nature at its joints” [10], we end up with an expressive grammar for capturing all kinds of internal and external relations to other things.

So far we have learned that most relationships in contemporary knowledge bases are of a noun-noun or noun-adjective nature, which I have loosely lumped together as hierarchical relationships. These relationships span from attributes to instances (individuals) and classes [11] or types, with and between one another. We have further seen that labels either for the subjects (nodes) or for their relationships (edges) are an insufficient basis for computers (or us!) to reason over. We need to ground our relationships in specific semantics and logics in order for them to be unambiguous to reasoning machines.

Structures Arising from Hierarchies

Structure needs to be a tangible part of thinking about a new KR installation, since many analytic choices need to be supported by the knowledge artifact. Different kinds of structure are best for different tools or kinds of analysis. The types of relations chosen for the artifact affects its structural aspects. These structures can be as simple and small as a few members in a list, to the entire knowledge graph fully linked to its internal and external knowledge sources. Here are some of the prominent types of structures that may arise from connectedness and characterization hierarchies:

  • Lists — unordered members or instances, with or without gaps or duplicates, useful for bulk assignment purposes. Lists generally occur through a direct relation assignment (e.g., rdf:Bag)
  • Neural networks (graphs) — graph designs based on connections modeled on biological neurons, still in the earliest stages with respect to relations and KR formalisms [12]
  • Ontologies (graphs) — sometimes ontologies are treated as synonymous with knowledge graphs, but more often as a superset that may allow more control and semantic representation [13]. Ontologies are a central design feature of KBpedia [14]
  • Parts-of-speech — a properly designed ontology has the potential to organize the vocabulary of the KR language itself into corresponding parts-of-speech, which greatly aids natural language processing
  • Sequences — ordered members or instances, with or without gaps or duplicates, useful for bulk assignment purposes. Sequences generally occur through a direct relation assignment (e.g., rdf:Seq)
  • Taxonomies (trees) — trees are subsumption hierarchies with single (instances may be assigned to only one class) or multiple (instances may be assigned to multiple classes or types) inheritance. The latter is the common scaffolding for most knowledge graphs
  • Typologies — are essentially multi-inheritance taxonomies, with the hierarchical organization of types as natural as possible. Natural types (classes or kinds) enable the greatest number of disjoint assertions to be made, leading to efficient processing and modular design. Typologies are a central design feature of KBpedia; see further [15].
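The efficiency claim for disjoint typologies can be illustrated with a toy Python sketch (the typology and type names here are illustrative, not KBpedia's actual ones). A single disjointness assertion between typologies lets a reasoner exclude whole branches of candidate types at once:

```python
# Three small typologies, each a set of types, plus pairwise
# disjointness assertions between whole typologies.
typologies = {
    "Animals":   {"Mammal", "Bird", "Fish"},
    "Artifacts": {"Tool", "Vehicle", "Building"},
    "Places":    {"City", "Mountain", "River"},
}
disjoint_pairs = {("Animals", "Artifacts"),
                  ("Animals", "Places"),
                  ("Artifacts", "Places")}

def candidate_types(entity_typology):
    """Return (possible, excluded): once an entity is placed in one
    typology, every type in a disjoint typology is pruned wholesale,
    with no per-type checks needed."""
    excluded = set()
    for a, b in disjoint_pairs:
        if a == entity_typology:
            excluded |= typologies[b]
        elif b == entity_typology:
            excluded |= typologies[a]
    return typologies[entity_typology], excluded

possible, pruned = candidate_types("Animals")
print(sorted(possible))  # ['Bird', 'Fish', 'Mammal']
print(len(pruned))       # 6 types excluded by disjointness alone
```

The more natural the type boundaries, the more disjointness assertions hold, and the larger the share of the graph a reasoner can prune per assertion, which is the processing and modularity benefit claimed above.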

Typically KR formalisms and their internal ontologies (taxonomy or graph structures) have a starting node or root, often called ‘thing’, ‘entity’ or the like. Close inspection of the choice of root may offer important insights. ‘Entity’, for example, is not compatible with a Peircean interpretation, since all entities are within Secondness.

KBpedia’s foundational structure is the subsumption hierarchy shown in the KBpedia Knowledge Ontology (KKO) — that is, KBpedia’s upper ontology — and its nodes derived from the universal categories. The terminal, or leaf, nodes in KKO each tie into typologies. All of the typologies are themselves composed of types, which are the hierarchical classification of natural kinds of instances as determined by shared attributes (though not necessarily the same values for those attributes). Most of the types in KBpedia are composed of entities, but attributes and relations also have aggregations of types.

Of course, choice of a KR formalism and what structures it allows must serve many purposes. Knowledge extension and maintenance, record design, querying, reasoning, graph analysis, logic and consistency tests, planning, hypothesis generation, question and answering, and subset selections for external analysis are properly the purview of the KR formalism and its knowledge graph. Yet other tasks such as machine learning, natural language processing, data wrangling, statistical and probabilistic analysis, search indexes, and other data- and algorithm-intensive applications are often best supported by dedicated external applications. The structures to support these kinds of applications, or the ability to export them, must be built into the KR installation, with explicit consideration for the data forms and streams useful to possible third-party applications.

[1] The most common analogous terms to attributes are properties or characteristics; in the OWL language used by KBpedia, attributes are assigned to instances (called individuals) via property (relation) declarations.
[2] The act of categorization may thus involve intrinsic factors or external relationships, with the corresponding logics being either intensional or extensional.
[3] Arto Annila and Esa Kuismanen. 2009. “Natural Hierarchy Emerges from Energy Dispersal”. Biosystems 95, 3: 227–233.
[4] Jørgen Fischer Nilsson. 2006. “Ontological Constitutions for Classes and Properties”. In Conceptual Structures: Inspiration and Application (Lecture Notes in Computer Science), 35–53.
[5] Jon Doyle. 1977. Hierarchy in Knowledge Representations. MIT Artificial Intelligence Laboratory. Retrieved October 24, 2017 from
[6] The first and more standard 3-color scheme was first explicated by J W von Goethe (1749-1832). What is actually more commonly used in design is a 4-color scheme from Ewald Hering (1834-1918).
[7] Michael K. Bergman. 2016. “A Foundational Mindset: Firstness, Secondness, Thirdness”. AI3:::Adaptive Information. Retrieved September 18, 2017 from
[8] Stanley Salthe. 2012. Hierarchical Structures.
[9] Ron Cottam, Willy Ranson, and Roger Vounckx. 2016. “Hierarchy and the Nature of Information”. Information 7, 1: 1.
[10] Plato. “Phaedrus Dialog (page 265e)”. Perseus Digital Library. Retrieved November 11, 2017 from
[11] In the OWL 2 language used by KBpedia, a class is any arbitrary collection of objects. A class may contain any number of instances (called individuals) or a class may be a subclass of another. Instances and subclasses may belong to none, one or more classes. Both extension and intension may be used to assign instances to classes.
[12] Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. 2017. “A Simple Neural Network Module for Relational Reasoning”. arXiv:1706.01427 [cs]. Retrieved November 1, 2017 from
[13] RDF graphs are more akin to the first sense; OWL 2 graphs more to the latter.
[14] In the semantic Web space, “ontology” was the original term because of the interest to capture the nature or being (Greek ὄντως, or ontós) of the knowledge domain at hand. Because the word ‘ontology’ is a bit intimidating, a better variant has proven to be the knowledge graph (because all semantic ontologies take the structural form of a graph).
[15] Michael K. Bergman. 2016. “Rationales for Typology Designs in Knowledge Bases”. AI3:::Adaptive Information. Retrieved September 18, 2017 from

Posted by AI3's author, Mike Bergman Posted on November 14, 2017 at 3:20 pm in Adaptive Information, Big Structure, KBpedia | Comments (0)