Posted: April 5, 2016

New Method Appears Promising for Machine Learning, Feature Generation

An exciting new network analysis framework was published today. The paper, Deep Graphs – A General Framework to Represent and Analyze Heterogeneous Complex Systems Across Scales, presents the background and derivation of the methods behind this new approach for analyzing networks [1]. The authors of the paper, Dominik Traxl, Niklas Boers and Jürgen Kurths, also released the open source DeepGraph network analysis package, written in Python, for conducting the analysis. Detailed online documentation accompanies the package.

The basic idea behind Deep Graphs is to segregate graph nodes and edges into types, which form supernodes and superedges, respectively. These grouped types then allow the graph to be partitioned into lattices, which can be intersected (as combinations of nodes and edges) to represent deeper graph structures embedded in the initial graph. The method can be applied to a graph representation of anything, since the approach is grounded in the graph primitives of nodes and edges using a multi-layer network (MLN) representation.
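As a rough illustration of the grouping step only, here is a minimal sketch in plain Python. It does not use the DeepGraph package or its API; the toy graph, the type labels and the helper names `build_supernodes` and `build_superedges` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph: each node carries a type label (illustrative only).
node_types = {
    "a": "station",
    "b": "station",
    "c": "sensor",
    "d": "sensor",
    "e": "sensor",
}

# Directed edges between individual nodes.
edges = [("a", "c"), ("a", "d"), ("b", "e"), ("c", "d"), ("a", "b")]


def build_supernodes(types):
    """Group nodes that share a type into one supernode per type."""
    groups = defaultdict(set)
    for node, label in types.items():
        groups[label].add(node)
    return dict(groups)


def build_superedges(edge_list, types):
    """Count the edges running between each ordered pair of supernodes."""
    return Counter((types[src], types[dst]) for src, dst in edge_list)


supernodes = build_supernodes(node_types)
superedges = build_superedges(edges, node_types)

# Every node and every edge lands in exactly one group, so the counts
# of the original graph are preserved by the partition.
assert sum(len(members) for members in supernodes.values()) == len(node_types)
assert sum(superedges.values()) == len(edges)
```

Grouping by a second attribute (say, a time window) and intersecting the two groupings would give finer partitions, which is the direction the lattice idea points in.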

These deeper graph structures can themselves be used as new features for machine learning or other applications. A deep graph, which the authors formally define as a geometric partition lattice of the source graph, conserves the original information in the graph and allows it to be redistributed to the supernodes and superedges. Intersections of these may surface potentially interesting partitions of the graph that deserve their own analysis.

The examples the authors present show the suitability of the method for time-series data, using precipitation patterns in South America. However, as noted, the method applies to virtually any data that can be represented as a graph.

Though weighted graphs and other techniques have been used, in part, for portions of this kind of analysis in the past, this appears to be the first generalized method that covers the broadest range of ways to aggregate and represent graph information. The properties associated with a given node may similarly be represented and aggregated. The aggregation of attributes may provide an additional means for mapping and relating external datasets to one another.

There are many aspects of this approach that intrigue us here at Structured Dynamics. First, we are always interested in network and graph analytical techniques, since all of our source schema are represented as knowledge graphs. Second, our specific approach to knowledge-based artificial intelligence places a strong emphasis on types and typologies for organizing entities (nodes and event relations) and we also separately segregate attribute property information [2]. And, last, finding embedded superstructures within the source graphs should also work to enhance the feature sets available for supervised machine learning.

We will later post our experiences in working with this promising framework.


[1] Dominik Traxl, Niklas Boers and Jürgen Kurths, “Deep Graphs – A General Framework to Represent and Analyze Heterogeneous Complex Systems Across Scales,” arXiv:1604.00971, April 5, 2016. To be published in Chaos: An Interdisciplinary Journal of Nonlinear Science.
[2] See M. K. Bergman, 2014. “Knowledge-based Artificial Intelligence,” from AI3:::Adaptive Information blog, November 17, 2014.
Posted: March 28, 2016

Long-lost Global Warming Paper is Still Pretty Good

My first professional job was being assistant director and then project director for a fifty-year look at the future of coal use by the US Environmental Protection Agency. The effort, called the Coal Technology Assessment (CTA), was started under the Carter Administration in the late 1970s, and then completed after Reagan took office in 1981. That era also spawned the Congressional Office of Technology Assessment. Trying to understand and forecast technological change was a big deal at that time.

We produced many, many reports from the CTA program, some of which were never published for political reasons, depending on whether they were at odds with the official policies of one or the other administration. Nonetheless, we did publish quite a few reports. Perhaps it is the sweetness of memory, but I also recollect we did a pretty good job. Now that more than 35 years have passed, it is possible to see whether we did a good job or not in our half-century forecasts.

The CTA program was the first to publish an official position of EPA on global warming [1], which we also backed up with a more formal academic paper [2]. I have thought about that paper on occasion over the years, but I had only my memory of it, not a hard copy.

Last week, however, I was contacted by a post-doctoral researcher in Europe trying to track down early findings and recollections of some of the earliest efforts on global climate change. She had a copy of our early paper and was kind enough to send me a copy. I have since been able to find other copies online [2].

In reading over the paper again, I am struck by two things. First, the paper is pretty good, and still captures (IMO) the uncertainty of the science and how to conduct meaningful policy in the face of that uncertainty. And, second, but less positive, is the sense of how little truly has gotten done in the intervening decades. This same sense of déjà vu all over again applies to many of the advanced energy technologies — such as fuel cells, photovoltaics, and passive solar construction — we were touting at that time.

Of course, my own career has moved substantially from energy technologies and policy to a different one of knowledge representation and artificial intelligence. But, it is kind of cool to look back on the passions of youth, and to see that my efforts were not totally silly. It is also kind of depressing to see how little has really changed in nearly four decades.


[1] M.K. Bergman, 1980. “Atmospheric Pollution: Carbon Dioxide,” Environmental Outlook — 1980, Strategic Analysis Group, U.S. Environmental Protection Agency, EPA 600/8 80 003, July 1980, pp. 225-261.
[2] Kan Chen, Richard C. Winter, and Michael K. Bergman, 1980. “Carbon dioxide from fossil fuels: Adapting to uncertainty.” Energy Policy 8, no. 4 (1980): 318-330.

Posted by AI3's author, Mike Bergman Posted on March 28, 2016 at 11:50 am in Adaptive Information, Pulse | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1934/withstanding-the-test-of-time/
The URI to trackback this post is: http://www.mkbergman.com/1934/withstanding-the-test-of-time/trackback/
Posted: March 21, 2016

Unlocking Some Insights into Charles Sanders Peirce’s Writings

I first encountered Charles Sanders Peirce from the writings of John Sowa about a decade ago. I was transitioning my research interests from search and the deep Web to the semantic Web. Sowa’s writings are an excellent starting point for learning about logic and ontologies [1]. I was particularly taken by Sowa’s presentation on the role of signs in our understanding of language and concepts [2]. Early on it was clear to me that knowledge modeling needed to focus on the inherent meaning of things and concepts, not their surface forms and labels. Sowa helped pique my interest that Peirce’s theory of semiotics was perhaps a foundational basis for getting at these ideas.

In the decade since that first encounter, I have based my own writings on Peirce’s insights on a number of occasions [3]. I have also developed a fascination with his life, teachings and thoughts across many topics. I have become convinced that Peirce was the greatest American combination of philosopher, logician, scientist and mathematician, and quite possibly one of the greatest thinkers ever. While the current renaissance in artificial intelligence can certainly point to the seminal contributions of George Boole, Claude Shannon, and John von Neumann in computing and information theory (of course among many others), my own view, in which I am not alone, is that C.S. Peirce belongs in those ranks from the perspective of knowledge representation and the meaning of information.

“The primary task of ontology, as it was practiced by its founder Aristotle, is to bridge the gap between what exists and the languages, both natural and artificial, for talking and reasoning about what exists.”
John Sowa [4]

Peirce is hard to decipher, for some of the reasons outlined below. Yet I have continued to try to crack the nut of Peirce’s insights because his focus is so clearly on the organization and categorization of information, essential to the knowledge foundations and ontologies at the center of Structured Dynamics’ client activities and my own intellectual passions. Most recently, I had one of those epiphanies from my study of Peirce that scientists live for, causing me to change perspective from specifics and terminology to one of mindset and a way to think. I found a key to unlock the meaning basis of information, or at least one that works for me. I try to capture a sense of those realizations in this article.

A Starting Point: Peirce’s Triadic Semiosis

Since it was the idea of sign-forming and the nature of signs in Peirce’s theory of semiosis that first caught my attention, it makes sense to start there. The figure to the right shows Peirce’s understanding of the basic, triadic nature of the sign. Triangles and threes pervade virtually all aspects of Peirce’s theories and metaphysics.

[Figure: Sign (Semiosis) Triad]

For Peirce, the appearance of a sign starts with the representamen, which is the trigger for a mental image (by the interpretant) of the object [20]. The object is the referent of the representamen sign. None of the possible bilateral (or dyadic) relations of these three elements, even combined, can produce this unique triadic perspective. A sign cannot be decomposed into something more primitive while retaining its meaning.

A sign is an understanding of an “object” as represented through some form of icon, index or symbol, from environmental to visual to aural or written. Complete truth is the limit where the understanding of the object by the interpretant via the sign is precise and accurate. Since this limit is rarely (ever?!) achieved, sign-making and understanding is a continuous endeavor. The overall process of testing and refining signs so as to make that understanding more accurate is what Peirce called semiosis [5].

In Peirce’s world view — at least as I now understand it — signs are the basis for information and life (yes, you read that right) [6]. Basic signs can be building blocks for still more complex signs. This insight points to the importance of the ways these components of signs relate to one another, now adding the perspective of connections and relations and continuity to the mix.

Because the interpretant is an integral component of the sign, the understanding of the sign is subject to context and capabilities. Two different interpretants can derive different meanings from the same representation, and a given object may be represented by different tokens. When the interpretant is a human and the signs are language, shared understandings arise from the meanings given to language by the community, which can then test and add to the truth statements regarding the object and its signs, including the usefulness of those signs. Again, these are drivers to Peirce’s semiotic process.

Thinking in Threes: Context for Peirce’s Firstness, Secondness, Thirdness

As Peirce’s writings and research evolved over the years, he came to understand more fundamental aspects of this sign triad. Trichotomies and triads permeate his theories and writings in logic, realism, categories, cosmology and metaphysics. He termed this tendency and its application in the general as Firstness, Secondness and Thirdness. In Peirce’s own words [7]:

“The first is that whose being is simply in itself, not referring to anything nor lying behind anything. The second is that which is what it is by force of something to which it is second. The third is that which is what it is owing to things between which it mediates and which it brings into relation to each other.” (CP 2.356)

Peirce’s fascination with threes is not unique. In my early career designing search engines, we often used threes as quick heuristics for setting weights and tuning parameters. We note that threes are at the heart of the Resource Description Framework data model, with its subject-predicate-object ‘triples’ that are its basic statements and assertions. The logic gates of transistors are based on threes. From an historical perspective prior to Peirce, scholastic philosophers, ranging from Duns Scotus and the Modists from medieval times to John Locke and Immanuel Kant with his three formulations, expressed much of their thinking in threes [8]. As Locke wrote in 1690 [9]:

“The ideas that make up our complex ones of corporeal substances are of three sorts. First, the ideas of the primary qualities of things, which are discovered by our senses, and are in them even when we perceive them not; such are the bulk, figure, number, situation, and motion of the parts of bodies which are really in them, whether we take notice of them or no. Secondly, the sensible secondary qualities which, depending on these, are nothing but the powers these substances have to produce several ideas in us by our senses; which ideas are not in the things themselves otherwise than as anything is in its cause. Thirdly, the aptness we consider in any substance to give or receive such alteration of primary qualities, as that the substance, so altered should produce in us different ideas from what it did before.”
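To make the earlier mention of RDF ‘triples’ concrete, here is a small sketch of the subject-predicate-object shape using plain Python tuples rather than any RDF library. The `ex:` identifiers, the three statements and the helper `objects_of` are invented for illustration and come from no real vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    obj: str


# Illustrative statements only; the "ex:" names are hypothetical.
triples = [
    Triple("ex:Peirce", "ex:developed", "ex:Semiotics"),
    Triple("ex:Peirce", "ex:influenced", "ex:Sowa"),
    Triple("ex:Sowa", "ex:wroteAbout", "ex:Ontology"),
]


def objects_of(subject, predicate, statements):
    """Return every object asserted for a given subject and predicate."""
    return [
        t.obj
        for t in statements
        if t.subject == subject and t.predicate == predicate
    ]


influenced = objects_of("ex:Peirce", "ex:influenced", triples)
```

Each statement has exactly three parts, and the predicate is what relates the other two.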

More recently, one of the pioneers of artificial intelligence, Marvin Minsky, who passed away in late January, noted his penchant for threes [10]:

But in knowledge representation, as practiced today in foundational or upper ontologies, the organizational view of the world is mostly binary. Upper ontologies often reflect one or more of these kinds of di-chotomies [11,12] (to pick up on Minsky’s joke):

  • abstract-physical — a split between what is fictional or conceptual and what is tangibly real
  • occurrent-continuant — a split between a “snapshot” view of the world and its entities versus a “spanning” view that is explicit about changes in things over time
  • perdurant-endurant — a split for how to regard the identity of individuals, either as a sequence of individuals distinguished by temporal parts (for example, childhood or adulthood) or as the individual enduring over time
  • dependent-independent — a split between accidents (which depend on some other entity) and substances (which are independent)
  • particulars-universals — a split between individuals in space and time that cannot be attributed to other entities versus abstract universals such as properties that may be assigned to anything, or
  • determinate-indeterminate.

While it is true that most of these distinctions are important ones in a foundational ontology, that does not mean that the entire ontology space should be dichotomized between them. Further, with the exception of Sowa’s ontology [4], none of the more common upper ontologies embraces any semblance of Peirce’s triadic perspective. Even Sowa’s ontology only partially applies Peircean principles, and it has been criticized on other grounds as well [11].

The triadic model of signs was built and argued by Peirce as the most primitive basis for applying logic suitable for the real world, with conditionals, continua and context. Truthfulness and verifiability of assertions is by nature variable. The ability of the primitive logic to further categorize the knowledge space led Peirce to elaborate a 10-sign system, followed by a 28-sign and then a 66-sign one [13]. Neither of the two larger systems was sufficiently described by Peirce before his death. Though Peirce notes in multiple places the broad applicability of the logic of semiosis to things like crystal formation, the emergence of life, animal communications, and automation, his primary focus appears to have been human language and signs used to convey concepts and thoughts. But we are still mining Peirce’s insights, with only about 25% of his writings yet published [14].

The foundation needed to be the sign because that is how information is conveyed, and the parts of the trichotomy were the fewest “indecomposable” elements needed to model the real world; we would call these “primitives” in modern terminology. Here are some of Peirce’s thoughts as to what makes something “indecomposable” (in keeping with his jawbreaking terminology) [7]:

“It is a priori impossible that there should be an indecomposable element which is what it is relatively to a second, a third, and a fourth. The obvious reason is that that which combines two will by repetition combine any number. Nothing could be simpler; nothing in philosophy is more important.” (CP 1.298)

“We find then a priori that there are three categories of undecomposable elements to be expected in the phaneron: those which are simply positive totals, those which involve dependence but not combination, those which involve combination.” (CP 1.299)

“I will sketch a proof that the idea of meaning is irreducible to those of quality and reaction. It depends on two main premisses. The first is that every genuine triadic relation involves meaning, as meaning is obviously a triadic relation. The second is that a triadic relation is inexpressible by means of dyadic relations alone. . . . every triadic relation involves meaning.” (CP 1.345)

“And analysis will show that every relation which is tetradic, pentadic, or of any greater number of correlates is nothing but a compound of triadic relations. It is therefore not surprising to find that beyond the three elements of Firstness, Secondness, and Thirdness, there is nothing else to be found in the phenomenon.” (CP 1.347)

Robert Burch has called Peirce’s ideas of “indecomposability” the ‘Reduction Thesis’ [15]. Peirce was able to prove these points with his form of predicate calculus (first-order logic) and via the logics of his existential graphs.
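One loose way to picture the claim that a tetradic relation is “nothing but a compound of triadic relations” is a relational join. The sketch below is not Peirce’s proof via existential graphs, nor Burch’s formal treatment; the sample relation and the helper names `decompose` and `recompose` are assumptions for illustration.

```python
from itertools import count

# Hypothetical tetradic relation: (seller, buyer, item, price).
sales = {
    ("ann", "bob", "book", 12),
    ("ann", "cat", "lamp", 30),
}


def decompose(tetrads):
    """Split each 4-place tuple into two 3-place tuples joined by a fresh link."""
    link_ids = count(1)
    left, right = set(), set()
    for a, b, c, d in sorted(tetrads):
        link = f"link{next(link_ids)}"
        left.add((a, b, link))    # triad 1: first two correlates plus the link
        right.add((link, c, d))   # triad 2: the link plus the last two correlates
    return left, right


def recompose(left, right):
    """Rejoin the triads on their shared link to recover the tetrads."""
    by_link = {link: (c, d) for link, c, d in right}
    return {(a, b, *by_link[link]) for a, b, link in left}


left, right = decompose(sales)

# The two sets of triads carry the same information as the original tetrads.
assert recompose(left, right) == sales
```

The link plays a mediating role between the two triads. Splitting one of those triads into pairs in the same way would lose which elements belong together, which is the intuition behind treating the triad as the smallest unit that still combines.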

Once the basic structure of the trichotomy and the nature of its primitives were in place, it was logical for Peirce to generalize the design across many other areas of investigation and research. Because of the signs’ groundings in logic, Peirce’s three main forms of deductive, inductive and abductive logic also flow from the same approach and mindset. Using his broader terminology of the general triad, Peirce writes that when the First and Second [7]:

“. . . are found inadequate, the third is the conception which is then called for. The third is that which bridges over the chasm between the absolute first and last, and brings them into relationship. We are told that every science has its qualitative and its quantitative stage; now its qualitative stage is when dual distinctions — whether a given subject has a given predicate or not — suffice; the quantitative stage comes when, no longer content with such rough distinctions, we require to insert a possible halfway between every two possible conditions of the subject in regard to its possession of the quality indicated by the predicate. Ancient mechanics recognized forces as causes which produced motions as their immediate effects, looking no further than the essentially dual relation of cause and effect. That was why it could make no progress with dynamics. The work of Galileo and his successors lay in showing that forces are accelerations by which [a] state of velocity is gradually brought about. The words “cause” and “effect” still linger, but the old conceptions have been dropped from mechanical philosophy; for the fact now known is that in certain relative positions bodies undergo certain accelerations. Now an acceleration, instead of being like a velocity a relation between two successive positions, is a relation between three. . . . we may go so far as to say that all the great steps in the method of science in every department have consisted in bringing into relation cases previously discrete.” (CP 1.359)

My intuition of the importance of the third part of the triad comes from such terms as perspective, gradation and probability, concepts impossible to capture in a binary world.
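Peirce’s remark that velocity relates two successive positions while acceleration relates three can be checked with simple finite differences. This is a minimal sketch assuming equally spaced time steps; the function names and sample positions are chosen here for illustration.

```python
def velocity(x0, x1, dt):
    """Average velocity: a relation between two successive positions."""
    return (x1 - x0) / dt


def acceleration(x0, x1, x2, dt):
    """Second difference: a relation among three successive positions."""
    return (x2 - 2 * x1 + x0) / dt ** 2


# Positions of a body following x = t**2, sampled at t = 0, 1, 2.
positions = [0.0, 1.0, 4.0]
dt = 1.0

v_first = velocity(positions[0], positions[1], dt)
v_second = velocity(positions[1], positions[2], dt)
a = acceleration(positions[0], positions[1], positions[2], dt)

# The acceleration equals the change between the two velocities per time step.
assert a == (v_second - v_first) / dt
```

No pair of positions alone yields the acceleration; a third position is needed to compare the two velocities.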

Some Observations on the Knowledge Of and Use of the Peircean Triad

C.S. Peirce embraced a realistic philosophy, but also embedded it in a belief that our understanding of the world is fallible and that we needed to test our perceptions via logic. Better approximations of truth arise from questioning using the scientific method (via a triad of logics) and from refining consensus within the community about how (via language signs) we communicate that truth. Peirce termed this overall approach pragmatism; it is firmly grounded in Peirce’s views of logic and his theory of signs. While there is absolute truth, in Peirce’s semiotic process it acts more as a limit, to which our seeking of additional knowledge and clarity of communication with language continuously approximates. Through the scientific method and questioning we get closer and closer to the truth and to an ability to communicate it to one another. But new knowledge may change those understandings, which in any case will always remain proximate [16].

Peirce greatly admired the natural classification systems of Louis Agassiz and used animal lineages in many of his examples. He was a strong proponent of natural classification. Though the morphological basis for classifying organisms in Peirce’s day has been replaced with genetic means, Peirce would surely support this new knowledge, since his philosophy is grounded on a triad of primitive unary, binary and ternary relations, bound together in a logical sign process seeking truth. Again, Peirce called these Firstness, Secondness, and Thirdness.

Like many of Peirce’s concepts, his ideas of Firstness, Secondness and Thirdness (which I shall hereafter just give the shorthand of ‘Thirdness’) have proven difficult to grasp, let alone articulate. After a decade of reading and studying Peirce, I think I can point to these factors as making Peirce a difficult nut to crack:

  • First, though most papers that Peirce published during his lifetime are available, perhaps as many as three-quarters of his writings still wait to be transcribed [14];
  • Second, Peirce is a terminology junky, coining and revising terms with infuriating frequency. I don’t think he did this just to be obtuse. Rather, in his focus on language and communications (as signs) he wanted to avoid imprecise or easily confused terms. He often tried to ground his terminology in Greek language roots, and tried to be painfully precise in his use of suffixes and combinations. Witness his use of semeiosis over semiosis, or the replacement of pragmatism with pragmaticism to avoid the misuse he perceived from its appropriation by William James. That Peirce settled on his terminology of Thirdness for his triadic relations signifies its generality and universal applicability;
  • Third, Peirce wrote and refined his thinking over a written historical record of nearly fifty years, which was also a period of the most significant technological changes in human history. Terms and ideas evolved much over this time. His views of categories and signs evolved in a similar manner. In general, revisions in terminology or concepts in his later writings should hold precedence over earlier ones;
  • Fourth, he was active in elaborating his theory of signs to be more inclusive and refined, a work of some 66 putative signs that remained very much incomplete at the time of his death. There has been a bit of a cottage industry in trying to rationalize and elucidate what this more complex sign schema might have meant [17], though frankly much of this learned inspection feels terminology-bound and more like speculation than practical guidance; and
  • Fifth, and possibly most importantly, most Peircean scholarship appears to me to be more literal with an attempt to discern original intent. Many arguments seem fixated on nuance or terminology interpretation as opposed to its underlying meaning or mindset. To put it in Peircean terms, most scholarship of Peirce’s triadic signs seems to be focused on Firstness and Secondness, rather than Thirdness.

The connections of Peirce’s sign theory, his three-fold logic of deduction-induction-abduction, the role he saw for the scientific method as the proper way to understand and adjudicate “truth”, and his really neat ideas about a community of inquiry have all fed my intuition that Peirce was on to some very basic insights. My Aha! moment, if I can elevate it as such, was when I realized that trying to cram these insights into Peirce’s elaborate sign terminology and other literal aspects of his writing was self-defeating. The Aha! arose when I chose rather to try to understand the mindset underlying Peirce’s thinking and the triadic nature of his semiosis. The very generalizations Peirce made himself around the rather amorphous designations of Firstness, Secondness, Thirdness seemed to affirm that what he was truly getting at was a way of thinking, a way of “decomposing” the world, that had universal applicability irrespective of domain or problem.

Thus, in order to make this insight operational, it first was necessary to understand the essence of what lies behind Peirce’s notions of Firstness, Secondness and Thirdness.

An Expanded View of Firstness, Secondness and Thirdness

Peirce’s notions of Thirdness are expressed in many different ways in many different contexts. These notions have been further interpreted by the students of Peirce. In order to get at the purpose of the triadic Thirdness concepts, I thought it useful to research the question in the same way that Peirce recommends. After all, Firstness, Secondness and Thirdness should themselves be prototypes for what Peirce called the “natural classes” [7]:

“The descriptive definition of a natural class, according to what I have been saying, is not the essence of it. It is only an enumeration of tests by which the class may be recognized in any one of its members. A description of a natural class must be founded upon samples of it or typical examples.” (CP 1.223)

The other interesting aspect of Peirce’s Thirdness is how relations between Firstness, Secondness and Thirdness are treated. Because of the sort of building block nature inherent in a sign, not all potential dyadic relations between the three elements are treated equally. According to the ‘qualification rule’, “a First can be qualified only by a first; a Second can be qualified by a First and a Second; and a Third can be qualified by a First, Second, and a Third” [18]. Note that a Third cannot be involved in either a First or Second.
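The qualification rule quoted above reduces to a simple ordering check. The following sketch encodes it under the assumption that the three categories can be ranked 1, 2 and 3; the names `Category` and `can_qualify` are hypothetical and not drawn from Peirce or from reference [18].

```python
from enum import IntEnum


class Category(IntEnum):
    """Peirce's three categories, ranked so they can be compared."""
    FIRST = 1
    SECOND = 2
    THIRD = 3


def can_qualify(qualifier, target):
    """A category may qualify only a target of the same or a higher rank."""
    return qualifier <= target


# For each target category, list the categories allowed to qualify it.
allowed = {
    target.name: [q.name for q in Category if can_qualify(q, target)]
    for target in Category
}

# A Third never qualifies a First or a Second.
assert not can_qualify(Category.THIRD, Category.FIRST)
assert not can_qualify(Category.THIRD, Category.SECOND)
```

Listing `allowed` reproduces the three clauses of the rule: one qualifier for a First, two for a Second, three for a Third.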

Keeping these dynamics in mind, here is my personal library of Thirdness relationships as expressed by Peirce in his own writings, or in the writings of his students. Generally, references to Thirdness are scattered, and to my knowledge nowhere can one see more than two or three examples side-by-side. The table below is thus “an enumeration of tests by which the class may be recognized in any one of its members” [19]:

Firstness | Secondness | Thirdness
first | second | third
monad | dyad | triad
point | line | triangle
being | existence | external
qualia | particularity | generality
chaos | order | structure
“past” | “present” | “future”
sign | object | interpretant
inheres | adheres | coheres
attribute | individual | type
icon | index | symbol
quality | “fact” | thought
sensation | reaction | convergence
independent | relative | mediating
intension | extension | information
internal | external | conceptual
spontaneity | dependence | meaning
possibility | fact | law
feeling | effort | habit
chance | law | habit-taking
qualities of phenomena | actual facts | laws (and thoughts)
feeling | consciousness | thought
thought-sign | connected | interpreted
possible modality | actual modality | necessary modality
possibles | occurrences | collections
abstractives | concretetives | collectives
descriptives | denominatives | distributives
conscious (feeling) | self-conscious | mind
words | propositions | arguments
terms | propositions | inferences/syllogisms
singular characters | dual characters | plural characters
absolute chance | mechanical necessity | law of love
symbols | generality | interpreter
simples | recurrences | comprehensions
idea (of) | kind of existence | continuity
ideas | determination of ideas by previous ideas | determination of ideas by previous process
what is possible | what is actual | what is necessary
hypothetical | categorical | relative
deductions | inductions | abductions
clearness of conceptions | clearness of distinctions | clearness of practical implications
speculative grammar | logic and classified arguments | methods of truth-seeking
phenomenology | normative science | metaphysics
tychasticism | anancasticism | agapasticism
primitives and essences | characterizing the objects | transformations and reflections
what may be | what characterizes it | what it means
complete in itself, freedom, measureless variety, freshness, multiplicity, manifold of sense, peculiar, idiosyncratic, suchness | idea of otherness, comparison, dichotomies, reaction, mutual action, will, volition, involuntary attention, shock, sense of change | idea of composition, continuity, moderation, comparative, reason, sympathy, intelligence, structure, regularities, representation

Table: Examples from Research and the Literature of Firstness, Secondness, Thirdness

The best way to glean meaning from this table is through some study and contemplation.

Because these examples are taken from many contexts, it is important to review this table on a row-by-row basis when investigating the nature of ‘Thirdness’. Review of the columns helps elucidate the “natural classes” of Firstness, Secondness and Thirdness. Some items appear in more than one column, reflecting the natural process of semiosis wherein more basic concepts cascade to the next focus of semiotic attention. The last row is a kind of catch-all trying to capture other mentions of Thirdness in Peirce’s phenomenology.
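The row-by-row and column-by-column readings suggested above can be tried on a small data structure. This sketch copies only a handful of rows from the table; the layout as tuples and the helper `terms_in_multiple_columns` are assumptions for illustration.

```python
from collections import defaultdict

COLUMNS = ("Firstness", "Secondness", "Thirdness")

# A few rows copied from the table; each tuple is one trichotomy.
rows = [
    ("being", "existence", "external"),
    ("qualia", "particularity", "generality"),
    ("internal", "external", "conceptual"),
    ("possibility", "fact", "law"),
    ("chance", "law", "habit-taking"),
    ("symbols", "generality", "interpreter"),
]


def terms_in_multiple_columns(table_rows):
    """Find terms that show up under more than one of the three columns."""
    seen = defaultdict(set)
    for row in table_rows:
        for column, term in zip(COLUMNS, row):
            seen[term].add(column)
    return {
        term: sorted(cols) for term, cols in seen.items() if len(cols) > 1
    }


repeated = terms_in_multiple_columns(rows)

# Reading down a column instead gathers one candidate "natural class".
thirdness_examples = [row[2] for row in rows]
```

In this sample, “external”, “generality” and “law” each sit under Thirdness in one row and under Secondness in another, matching the observation that some items appear in more than one column.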

The table spans from the fully potential or abstract, such as “first” or “third”, to entire realms of science or logic. This spanning of scope reflects the genius of Peirce’s insight wherein semiosis can begin literally at the cusp of Nothingness [20] and then proceed to capture the process of signmaking, language, logic, the scientific method and thought abstraction to embrace the broadest and most complex of topics. This process is itself mediated by truth-testing and community use and consensus, with constant refinement as new insights and knowledge arise.

Reviewing these trichotomies affirms the fulsomeness of Peirce’s semiotic model. Further, as Peirce repeatedly noted, there are no hard and fast boundaries between these categories [21]. Forces of history or culture or science are complex and interconnected in the extreme; trying to decompose complicated concepts into their Thirdness is a matter of judgment and perspective. Peirce, however, was serene about this, since the premises and assignments resulting from such categorizations are (ultimately) subject to logical testing and conformance with the observable, real world.

The ‘Thirdness’ Mindset Applied to Categorization

Our excursion into Peirce’s foundational, triadic view was driven by pragmatic needs. Structured Dynamics’ expertise in knowledge-based artificial intelligence (KBAI) benefits from efficient and coherent means to represent knowledge. The data models and organizational schema underlying knowledge representation (KR) should be as close as possible to the logical ways the world is structured and perceived. A key aspect of that challenge is how to define a grammar and establish a logical structure for representing knowledge. Peirce’s triadic approach and mindset have come to be, in my view, essential foundations to that challenge.

As before, we will again let Peirce’s own words guide us in how to approach the categorization of our knowledge domains. Let’s first address the question of where we should direct attention. How do we set priorities for where our categorization attention should focus? [7]:

“Taking any class in whose essential idea the predominant element is Thirdness, or Representation, the self development of that essential idea — which development, let me say, is not to be compassed by any amount of mere “hard thinking,” but only by an elaborate process founded upon experience and reason combined — results in a trichotomy giving rise to three sub-classes, or genera, involving respectively a relatively genuine thirdness, a relatively reactional thirdness or thirdness of the lesser degree of degeneracy, and a relatively qualitative thirdness or thirdness of the last degeneracy. This last may subdivide, and its species may even be governed by the three categories, but it will not subdivide, in the manner which we are considering, by the essential determinations of its conception. The genus corresponding to the lesser degree of degeneracy, the reactionally degenerate genus, will subdivide after the manner of the Second category, forming a catena; while the genus of relatively genuine Thirdness will subdivide by Trichotomy just like that from which it resulted. Only as the division proceeds, the subdivisions become harder and harder to discern.” (CP 5.72)

The way I interpret this (in part) is that categories in which new ideas or insights have arisen — themselves elements of Thirdness for that category — are targets for new categorization. That new category should focus on the idea or insight gained, such that each new category has a character and scope different from the one that spawned it. Of course, based on the purpose of the KBAI effort, some ideas or insights have larger potential effect on the domain, and those should get priority attention. As a practical matter this means that categories of more potential importance to the sponsor of the KBAI effort receive the most focus.

Once a categorization target has been chosen, Peirce also put forward some general execution steps [7]:

Concept Triad

“. . . introduce the monadic idea of »first« at the very outset. To get at the idea of a monad, and especially to make it an accurate and clear conception, it is necessary to begin with the idea of a triad and find the monad-idea involved in it. But this is only a scaffolding necessary during the process of constructing the conception. When the conception has been constructed, the scaffolding may be removed, and the monad-idea will be there in all its abstract perfection. According to the path here pursued from monad to triad, from monadic triads to triadic triads, etc., we do not progress by logical involution — we do not say the monad involves a dyad — but we pursue a path of evolution. That is to say, we say that to carry out and perfect the monad, we need next a dyad. This seems to be a vague method when stated in general terms; but in each case, it turns out that deep study of each conception in all its features brings a clear perception that precisely a given next conception is called for.” (CP 1.490)

We are basing this process of categorization upon the same triadic design noted above. However, now that our context is categorization, the nature of the triad is different than that for the basic sign, as the similar figure to the right attests.

The area of the Secondness is where we surface and describe the particular objects or elements that define this category. Peirce described it thus [7]:

“So far Hegel is quite right. But he formulates the general procedure in too narrow a way, making it use no higher method than dilemma, instead of giving it an observational essence. The real formula is this: a conception is framed according to a certain precept, [then] having so obtained it, we proceed to notice features of it which, though necessarily involved in the precept, did not need to be taken into account in order to construct the conception. These features we perceive take radically different shapes; and these shapes, we find, must be particularized, or decided between, before we can gain a more perfect grasp of the original conception. It is thus that thought is urged on in a predestined path. This is the true evolution of thought, of which Hegel’s dilemmatic method is only a special character which the evolution is sometimes found to assume.” (CP 1.491)

In Thirdness we are contemplating the category, thinking about it, analyzing it, using and gaining experience with it, such that we can begin to see patterns or laws or “habits” (as Peirce so famously put it) or new connections and relationships with it. The ideas and insights (and laws or standardizations) that derive from this process are themselves elements of the category’s Thirdness. This is where new knowledge arises or purposes are fulfilled, and then subsequently split and codified as new signs useful to the knowledge space.

As domains are investigated to deeper levels or new insights expand the branches of the knowledge graph, each new layer is best tackled via this three-fold investigation. Of course, context requires its own perspectives and slices; the listing of Thirdness options provided above can help stimulate these thoughts.

Using Peirce’s labels, but my own diagram, we can show the categorization process as having some sequential development:

A "Peircean" Approach to Categorization

But, of course, interrelationships adhere to the Peircean Thirdness and there continues to be growth and additions. Categories thus tend to fill themselves up with more insights and ideas until such time as the scope and diversity compel another categorization. In these ways categorization is not linear, but accretive and dynamic.

Like our investigations of the broad idea of Thirdness above, there are some Firstness, Secondness, and Thirdness aspects of how to think about the idea of categorization. I use this kind of mental checklist when it comes time to split a concept or category into a new categorization:

|  | Firstness | Secondness | Thirdness |
| Symbols | idea of; nature of; milieu; “category potentials” | reference concepts | standards |
| Generality |  | cross-products of Firstness | language (incl. domain); computational analysis; representation; continua |
| Interpreters (human or machine) | What are the ingredients, ideas, essences of the category? | What are the new things or relationships of the category? | What are the laws, practices, outputs arising from the category? |

General Thoughts on Using ‘Thirdness’ for Categorization

The essential point is to break free from Peirce’s often stultifying terminology and embrace the mindset behind Thirdness. Categorization, or any other knowledge representation task for that matter, can be approached logically and, yes, systematically.

The Perspective of Thirdness

Just as perspective does not occur without Thirdness, I think we will see Peirce’s contributions make a notable difference in how knowledge representation efforts move forward. A driver of this change is knowledge-based artificial intelligence. I feel like problems and questions that have stymied me for decades are lifting like so much fog as I embrace the Peircean Thirdness mindset. I think that it is possible to codify and train others to use this mindset, which is really but a specialized application of Peirce’s overall conception of semiosis [22].

Twenty-five years ago Nathan Houser opined that “. . . a sound and detailed extension of Peirce’s analysis of signs to his full set of ten divisions and sixty-six classes is perhaps the most pressing problem for Peircean semioticians” [23]. I agree with the sense of this opinion, but the ten divisions and sixty-six classes are a sign classification; the greater primitive for Peirce’s thinking is the triad and his application of it across all domains of discourse. This is the better grounding for understanding Peirce.

John Sowa, mentioned in the intro, also put forward a knowledge representation, which he partially attributed to Peirce [2,4], that included the three basic elements of the sign triad. But Sowa did not infuse his design with the Peircean triad, and the resulting amalgam has been criticized for its lack of coherency [11]. Peircean ideas have also informed computational approaches [24] and language parsing [25]. Nonetheless, despite important Peircean ideas and contributions across the knowledge representation spectrum, I have been unable to find any upper ontology or vocabulary based on Thirdness. Terminology can get in the way.

In the intro, I mentioned my epiphany from specifics to mindset in Peirce’s teachings. This insight has not caused me to suddenly understand everything Peirce was trying to say, nor to come to some new level of consciousness. However, what it has done is to open a door to a new way of thinking and looking at the world. I am now finding that prior, knotty problems of categorization and knowledge representation are becoming (more) tractable. I am excited and eager to look at some problems that have stymied me for years. Many of these problems, such as how to model events, situations, identity, representation, and continuity or characterization through time, may sound like philosophers’ millstones, but they often lie at the heart of the most difficult problems in knowledge modeling and representation. Even the tiniest break in the mental and conceptual logjams around such issues feels like major progress. For that, I thank Peirce’s triads.


[1] See Sowa’s Web site, especially the sections on ontology, knowledge representation, and publications.
[2] See, for example, John F. Sowa, 2000. “Ontology, Metadata, and Semiotics,” presented at ICCS 2000 in Darmstadt, Germany, on August 14, 2000; published in B. Ganter & G. W. Mineau, eds., Conceptual Structures: Logical, Linguistic, and Computational Issues, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81. May be found at http://www.jfsowa.com/ontology/ontometa.htm. Also see John F. Sowa, 2006. “Peirce’s Contributions to the 21st Century,” presented at International Conference on Conceptual Structures, Aalborg, Denmark, July 17, 2006; and [4] below.
[3] I have written a number of pieces based primarily around Peirce’s insights; see, for example, http://www.mkbergman.com/category/peircean-principles/.
[4] John F. Sowa, 2001. “Signs, Processes, and Language Games: Foundations for Ontology,” in Proceedings of the 9th International Conference On Conceptual Structures, ICCS’01. 2001.
[5] Peirce actually spelled his approach as semeiosis, but I use the simpler version here. See also separate discussion of pragmaticism.
[6] For example, Peirce said [7]: “Thought is not necessarily connected with a brain. It appears in the work of bees, of crystals, and throughout the purely physical world; and one can no more deny that it is really there, than that the colors, the shapes, etc., of objects are really there.” (CP 4.551). At first this seems rather strange. However, “thought” for Peirce in this context is the notion of the process by which the sign is recognized and interpreted. See also [20].
[7] See the electronic edition of The Collected Papers of Charles Sanders Peirce, reproducing Vols. I-VI, Charles Hartshorne and Paul Weiss, eds., 1931-1935, Harvard University Press, Cambridge, Mass., and Arthur W. Burks, ed., 1958, Vols. VII-VIII, Harvard University Press, Cambridge, Mass. The citation scheme is volume number using Arabic numerals followed by section number from the collected papers, shown as, for example, CP 1.208.
[8] Also see, for example, the use of trichotomies in philosophy or some of the nature of three in mathematics or religion.
[9] J. Locke, 1690. “An Essay Concerning Human Understanding”, Book II, Chapter XXXIII. Reprinted, 1964: 249. John Y. Yolton, Ed. Dutton. New York, NY.
[11] Ludger Jansen, 2008. “Categories: The Top-level Ontology,” Applied ontology: An introduction (2008): 173-196.
[12] Nicola Guarino, 1997. “Some Organizing Principles For A Unified Top-Level Ontology,” National Research Council, LADSEB-CNR Int. Report, V3.0, August 1997 
[13] P. Farias and J. Queiroz, 2003. “On Diagrams for Peirce’s 10, 28, and 66 Classes of Signs,” Semiotica 147(1/4), pp. 165-184.
[14] Spencer Case, 2014. “The Man with a Kink in His Brain,” from online National Review, July 21, 2014. “Over the course of Peirce’s life, that kinky brain produced a total of about 12,000 printed pages and 80,000 handwritten pages. The Peirce Edition Project, founded in 1976, is still organizing and editing the massive Peirce corpus. So far, Indiana University Press has published seven volumes of his writings — of an expected thirty.”
[15] Robert Burch has called Peirce’s ideas of “indecomposability” the ‘Reduction Thesis’; see Robert Burch, 1991. A Peircean Reduction Thesis: The Foundations of Topological Logic, Texas Tech University Press, Lubbock, TX. Peirce’s reduction thesis is never stated explicitly by Peirce, but is alluded to in numerous snippets. The basic thesis is that ternary relations suffice to construct arbitrary relations, but that not all relations can be constructed from unary and binary relations alone.
[16] M.K. Bergman, 2016. “Re-thinking Knowledge Representation,” AI3:::Adaptive Information blog, March 14, 2016.
[17] Amongst many, see, for example, Janos J. Sarbo and József I. Farkas, 2013. “Towards Meaningful Information Processing: A Unifying Representation for Peirce’s Sign Types,” Signs-International Journal of Semiotics 7 (2013): 1-44. In that article, the authors state: “. . . our model has the potential of representing three types of relation, consisting of 10, 28, and 66 elements, that are analogous to Peirce’s three classifications of signs. This implies the possibility of a common representation for Peirce’s different classifications. . . . By virtue of the above relation with Peircean semiotics, and because of the fundamental nature of signs, our approach has the potential for a uniform modeling of information processing in any domain, theoretically.” Two other researchers of Peircean signs are, for example, P. Farias and J. Queiroz, 2003. “On Diagrams for Peirce’s 10, 28, and 66 Classes of Signs,” Semiotica 147(1/4), pp. 165-184. Also, the Web site Minute Semiotic is dedicated to one interpretation of Peirce’s signs, including interactive descriptions (from the author’s perspective) of the 66 Peircean signs.
[18] David Savan, 1987-1988. “An Introduction to C.S.Peirce’s Full System of Semeiotic,” Monograph Series of the Toronto Semiotic Circle. Vol. 1. 
[19] Table sources and the order of presentation very roughly move from the primitive to the more complex and elaborative.
[20] The idea of Firstness may range from something like an energetic input that causes chemicals to combine into a new structured form or ordered state to something like a new recognition in the mind occasioned by a flick of the eye or a shifting thought. The representamen is merely a potential sign until it is energized or intrudes on consciousness, wherein the object is now made apparent as interpreted. The process of reifying the sign itself produces a new reality, its Thirdness, which can then become a subject of the sign-recognizing process in its own right. In this regard, Peirce was formulating a theory of signs that could describe how more order may occur in the world, including the formation and evolution of the cosmos and the initial origins of life.
[21] As one example, Peirce states [7]: “. . . it may be quite impossible to draw a sharp line of demarcation between two classes, although they are real and natural classes in strictest truth. Namely, this will happen when the form about which the individuals of one class cluster is not so unlike the form about which individuals of another class cluster but that variations from each middling form may precisely agree.” (Peirce CP 1.208)
[22] Semiosis has been viewed by many as applicable to a wide variety of domains such as animal calls and language, the chemical and energetic origin of life, evolution, and language analysis and parsing. The linkage of these ideas to Peirce results from statements of his such as [7]: “Thought is not necessarily connected with a brain. It appears in the work of bees, of crystals, and throughout the purely physical world; and one can no more deny that it is really there, than that the colors, the shapes, etc., of objects are really there. . . . Not only is thought in the organic world, but it develops there.” (Peirce CP 4.551)
[23] Nathan Houser, 1992. “On Peirce’s Theory of Propositions: A Response to Hilpinen.” Transactions of the Charles S. Peirce Society 28, no. 3 (1992): 489-504.
[24] See, for example, Gary Richmond’s trikonic approach: Gary Richmond, 2005. “Outline of trikonic Diagrammatic Trichotomic,” in: F. Dau, M.L. Mugnier, and G. Tumme, ed., Conceptual Structures: Common Semantics for Sharing Knowledge: 13th International Conference on Conceptual Structures, ICCS 2005, Kassel, Germany, 17–22 July 2005. Springer-Verlag GmbH, pp. 453 – 466.
[25] See, for example, one of the earlier examples, John F. Sowa, 1991. “Toward the Expressive Power of Natural Language.” Principles of Semantic Networks (1991): 157-189.
Posted:March 14, 2016

A New Era in Artificial Intelligence Will Open Pandora’s Box

Here’s a prediction: the new emphasis on artificial intelligence and robotics will occasion some new looks at knowledge representation. Prior to the past few years many knowledge representation (KR) projects have been more in the way of prototypes or games. But, now that we are seeing real robotics and knowledge-based AI activities take off, some of the prior warts and problems of leading KR approaches are starting to become evident.

For example, for years major upper-level ontologies have tended to emphasize dichotomous splits in how to “model” the world, including:

  • abstract-physical — a split between what is fictional or conceptual and what is tangibly real
  • occurrent-continuant — a split between a “snapshot” view of the world and its entities versus a “spanning” view that is explicit about changes in things over time
  • perduant-endurant — a split for how to regard the identity of individuals, either as a sequence of individuals distinguished by temporal parts (for example, childhood or adulthood) or as the individual enduring over time
  • dependent-independent — a split between accidents (which depend on some other entity) and substances (which are independent)
  • particulars-universals — a split between individuals in space and time that cannot be attributed to other entities versus abstract universals such as properties that may be assigned to anything
  • determinate-indeterminate.

Since the mid-1980s, description logics have also tended to govern most KR languages, and are the basis of the semantic Web data model and languages of RDF and OWL. (However, common logic and its dialects are also used as a more complete representation of first-order logic.) The trade-off in KR language design is one of expressiveness versus complexity.

Cyc was developed as one means to address a gap in standard KR approaches: how to capture and model common sense. Conceptual graphs, formally a part of common logic, were developed to handle n-ary relationships and the questions of sign processes (semiosis), fallibility and processes of pragmatic learning.

Zhou offers a new take on an old strategy for KR, which is to use set theory as the underlying formalism [1]. This first paper deals with the representation itself; a later paper is planned on reasoning.

We do not live in a dichotomous world. And, I personally find Charles Peirce’s semeiosis to be a more compelling grounding for what a KR design should look like. But as Zhou points out, and is evident in current AI advances, robotics and the need for efficient, effective reasoning are testing today’s standards in knowledge representation as never before. I suspect we are in for a period of ferment and innovation as we work to get our KR languages up to task.


[1] Yi Zhou, 2016. “A Set Theoretic Approach for Knowledge Representation: the Representation Part,” arXiv:1603.03511, 14 Mar 2016.
Posted:March 8, 2016

A Typology Design Aids Continuous, Logical Typing

Entity recognition or extraction is a key task in natural language processing and one of the most common uses for knowledge bases. Entities are the unique, individual things in the world, and are also sometimes used to characterize some concepts [1]. Context plays an essential role in entity recognition. In general terms we may refer to a thing such as a camera; but a photographer may want more fine-grained distinctions such as SLR cameras or further sub-types like digital SLR cameras or even specific models like the Canon EOS 7D Mark II or even the name of the photographer’s favorite camera, such as ‘Shutter Sue’. Capitalized names (the reference source for named entity recognition) often signal that we are dealing with a true individual entity, but again, depending on context, a named automobile such as Chevy Malibu may refer to a specific car or to the entire class of Malibu cars.

The “official” practice of named entity recognition began with the Message Understanding Conferences, especially MUC-6 and MUC-7, in 1995 and 1997. These conferences began competitions for finding “named entities” as well as the practice of in-line tagging [2]. Some of these accepted ‘named entities’ are also written in lower case, with examples such as rocks (‘gneiss’) or common animals or plants (‘daisy’) or chemicals (‘ozone’) or minerals (‘mica’) or drugs (‘aspirin’) or foods (‘sushi’) or whatever. Some deference was given to the idea of Kripke’s “rigid designators” as providing guidance for how to identify entities; rigid designators include proper names as well as certain natural kinds of terms like biological species and substances. Because of these blurrings, the nomenclature of “named entities” began to fade away. Some practitioners still use the term named entities, though for some of the reasons outlined in this paper, Structured Dynamics prefers simply to use entity.
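To make the idea of in-line tagging concrete, here is a minimal Python sketch. It assumes the ENAMEX element style associated with the MUC named entity task; the sample sentence was tagged by hand for illustration, and the helper name extract_entities is hypothetical.

```python
import re

# Assumption: MUC-style in-line markup wraps each entity in an ENAMEX
# element whose TYPE attribute names the entity type.
ENAMEX = re.compile(r'<ENAMEX TYPE="([A-Z]+)">(.*?)</ENAMEX>')


def extract_entities(tagged_text):
    """Return (type, surface text) pairs found in the tagged text."""
    return [(m.group(1), m.group(2)) for m in ENAMEX.finditer(tagged_text)]


# Hypothetical sentence, hand-tagged for illustration only.
sample = (
    '<ENAMEX TYPE="PERSON">Charles Peirce</ENAMEX> lectured at '
    '<ENAMEX TYPE="ORGANIZATION">Johns Hopkins University</ENAMEX> in '
    '<ENAMEX TYPE="LOCATION">Baltimore</ENAMEX>.'
)

entities = extract_entities(sample)
```

The three types in the sample are the person, organization, and location names first tested at MUC; finer-grained schemes would simply enlarge the set of TYPE values.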

Much has changed in the twenty years since the seminal MUC conferences regarding entity recognition and characterization. We are learning to adopt a very fine-grained approach to entity types and a typology design suited to interoperating (“bridging”) over a broad range of viewpoints and contexts. Most broadly, the idea of fine-grained entity types has led us to a logically grounded typology design.

The Growing Trend to Fine-Grained Entity Types

Beginning with the original MUC conferences, the initial entity types tested and recognized were for person, organization, and location names [3]. However, it did not take long for various groups and researchers to want more entity types, more distinctions. BBN categories, proposed in 2002, were used for question answering and consisted of 29 types and 64 subtypes [4]. Sekine put forward and refined over many years his Extended Entity Types, which grew to about 200 types [5], as shown in this figure:

Sekine Extended Entity Types

These ideas of extended entity types helped inform a variety of tagging services over the past decade, notably including OpenCalais, Zemanta, AlchemyAPI, and OpenAmplify, among others. Moreover the research community also expanded its efforts into more and more entity types, or what came to be known as fine-grained entities [6].

Some of these produced more formal organizations of entity type classifications. This one, from Ling and Weld, proposed 112 entity types in 2012 [7]:

Ling 112 Entity Types

Another one, from Gillick et al. in 2014, proposed 86 entity types [8], organized, in part, according to the same person, organization, and location types from the earliest MUC conferences:

Gillick 86 Entity Types

These efforts are also notable because machine learners have been trained to recognize the types shown. Which entity types are covered, the conceptions of the world behind them, and how the entity types are organized all vary broadly across these references.

The complement to entity extraction for unstructured text is to label the text in the first place. For this, a number of schemas presently exist that provide vocabularies of entity types and standard means for tagging text. These include:

  • DBpedia Ontology: 738 types [9]
  • schema.org: 636 types [10]
  • YAGO: 505 types; see also HYENA [11]
  • GeoNames: 654 “feature codes” [12]

In Structured Dynamics’ own work, we have mapped the UMBEL knowledge graph against Wikipedia content and found that 25,000 nodes, or more than 70 percent of its 35,000 reference concepts, correspond to entity types [13]. These mappings provide typing connections for millions of Wikipedia articles. The typing and organization of entity types thus appears to be of enormous importance in modeling and leveraging the use of knowledge bases.

When we track the coverage of entity types over the past two decades we see logarithmic growth [13]:

Growth in Recognition of Entity Types

This growth in entity types comes from wanting to describe and organize things with more precision. Tagging and extracting structured information from text are obviously a key driver. Yet, for a given enterprise, what is of interest — and at what depth — for a particular task varies widely.

The fact that knowledge bases such as Wikipedia (though the lesson applies to domain-specific ones as well) can be supported by entity-level information for literally thousands of entity types means that rich information is available for driving the finest of fine-grained entity extractors. To leverage this raw, informational horsepower it is essential to have a grounded understanding of what an entity is, how to organize entities into logical types, and an intensional understanding of the attributes and characteristics that allow inferencing to be conducted over these types. These understandings, in turn, point to the features that are useful to machine learners for artificial intelligence. These understandings also can inform a flexible design for accommodating entity types from coarse- to fine-grained, with variable depth depending on the domain of interest.

Natural Classes and Typologies

We take a realistic view of the world. That is, we believe that what we perceive in the world is real — it is not just a consequence of what we perceive and can be aware of in our minds [14] — and that there are forces and relationships in the world independent of us as selves. Realism is a longstanding tradition in philosophy that extends back to Aristotle and embraces, for example, the natural classification systems of living things as espoused by taxonomists such as Agassiz and Linnaeus.

Charles Sanders Peirce, an American logician and scientist of the late 19th and early 20th centuries, embraced this realistic philosophy but also embedded it in a belief that our understanding of the world is fallible and that we needed to test our perceptions via logic (the scientific method) and shared consensus within the community. His overall approach is known as pragmatism and is firmly grounded in his views of logic and his theory of signs (called semiotics or semeiotics). While there is absolute truth, it actually acts more as a limit, to which our seeking of additional knowledge and clarity of communication with language continuously approximates. Through the scientific method and questioning we get closer and closer to the truth and to an ability to communicate it to one another. But new knowledge may change those understandings, which in any case will always remain proximate.

Peirce’s own words can better illustrate his perspective [15], some of which I have discussed elsewhere under his idea of “natural classes” [16]:

“Thought is not necessarily connected with a brain. It appears in the work of bees, of crystals, and throughout the purely physical world; and one can no more deny that it is really there, than that the colors, the shapes, etc., of objects are really there.” (Peirce CP 4.551)

“What if we try taking the term “natural,” or “real, class” to mean a class of which all the members owe their existence as members of the class to a common final cause? This is somewhat vague; but it is better to allow a term like this to remain vague, until we see our way to rational precision.” (Peirce CP 1.204)

“. . . it may be quite impossible to draw a sharp line of demarcation between two classes, although they are real and natural classes in strictest truth. Namely, this will happen when the form about which the individuals of one class cluster is not so unlike the form about which individuals of another class cluster but that variations from each middling form may precisely agree.” (Peirce CP 1.208)

“When one can lay one’s finger upon the purpose to which a class of things owes its origin, then indeed abstract definition may formulate that purpose. But when one cannot do that, but one can trace the genesis of a class and ascertain how several have been derived by different lines of descent from one less specialized form, this is the best route toward an understanding of what the natural classes are.” (Peirce CP 1.208)

“The descriptive definition of a natural class, according to what I have been saying, is not the essence of it. It is only an enumeration of tests by which the class may be recognized in any one of its members. A description of a natural class must be founded upon samples of it or typical examples.” (Peirce CP 1.223)

“Natural classes” thus are a testable means to organize the real objects in the world, the individual particulars of what we call “entities”. In Structured Dynamics’ usage, we define an entity as something that is an individual object, either real or mental such as an idea, either a part or a whole, and that has:

  • identity, which can be referred to via symbolic names
  • context in relation to other objects, and
  • characteristic attributes, with some expressing the essence of what type of object it is.
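As a rough sketch of this definition, the three facets might be captured in a small Python structure. The class and field names (Entity, names, context, attributes) and the sample values are assumptions chosen for illustration; they are not a Structured Dynamics data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Identity: symbolic names by which the entity can be referred to.
    names: list
    # Context: named relations to other objects.
    context: dict = field(default_factory=dict)
    # Characteristic attributes; some express what type of object it is.
    attributes: dict = field(default_factory=dict)


# Hypothetical example built on the camera mentioned earlier in the post.
camera = Entity(
    names=["Canon EOS 7D Mark II"],
    context={"made_by": "Canon"},
    attributes={"sensor": "digital", "viewfinder": "single-lens reflex"},
)
```

The attributes are what a typology would later inspect to decide which type the entity belongs to.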

The key to classification of entities into categories (or “types” as we use herein) is based on this intensional understanding of attributes. Further, Peirce was expansive in his recognition of what kinds of objects could be classified, specifically including ideas, with application to areas such as social classes, man-made objects, the sciences, chemical elements and living organisms [17]. Again, here are some of Peirce’s own words on the classification of entities [15]:

“All classification, whether artificial or natural, is the arrangement of objects according to ideas. A natural classification is the arrangement of them according to those ideas from which their existence results.” (Peirce CP 1.231)

“The natural classification of science must be based on the study of the history of science; and it is upon this same foundation that the alcove-classification of a library must be based.” (Peirce CP 1.268)

“All natural classification is then essentially, we may almost say, an attempt to find out the true genesis of the objects classified. But by genesis must be understood, not the efficient action which produces the whole by producing the parts, but the final action which produces the parts because they are needed to make the whole. Genesis is production from ideas. It may be difficult to understand how this is true in the biological world, though there is proof enough that it is so. But in regard to science it is a proposition easily enough intelligible. A science is defined by its problem; and its problem is clearly formulated on the basis of abstracter science.” (Peirce CP 1.227)

A natural classification system is one, then, that logically organizes entities with shared attributes into a hierarchy of types, with each type inheriting attributes from its parents and being distinguished by what Peirce calls its final cause, or purpose. This hierarchy of types is thus naturally termed a typology.
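As a minimal sketch of what such a typology might look like in code, the Python below models each type as a node that holds only its own distinguishing attributes and inherits the rest from its parents. The class name `EntityType`, its fields, and the camera-related types and attributes are all hypothetical, chosen only for illustration; they are not drawn from any particular knowledge base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityType:
    """A node in a typology: a named type with its own distinguishing attributes."""
    name: str
    parent: Optional["EntityType"] = None
    own_attributes: frozenset = frozenset()

    def attributes(self) -> frozenset:
        """All attributes of this type: its own plus those inherited from ancestors."""
        inherited = self.parent.attributes() if self.parent else frozenset()
        return inherited | self.own_attributes

    def lineage(self) -> list:
        """Type names from the root of the typology down to this type."""
        above = self.parent.lineage() if self.parent else []
        return above + [self.name]


# Hypothetical example types and attributes (assumptions for illustration only).
product = EntityType("Product", own_attributes=frozenset({"manufacturer"}))
camera = EntityType("Camera", product, frozenset({"lens", "sensor"}))
slr = EntityType("SLRCamera", camera, frozenset({"reflex mirror"}))

print(slr.lineage())
print(sorted(slr.attributes()))
```

Each type states only what distinguishes it from its parent; everything else arrives by inheritance, which is what keeps the hierarchy simple to test and extend.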

An individual that is a member of a natural class has the same kinds of attributes as other members, all of which share this essence of the final cause or purpose. We look to Peirce for guidance in this area because his method of classification is testable, based on discernible attributes, and grounded in logic. Further, that logic is itself grounded in his theory of signs, which ties these understandings ultimately to natural language.

Logic and the Typology Design

Unlike more interconnected knowledge graphs (which can have many network linkages), typologies are organized strictly along these lines of shared attributes, which is both simpler and provides an orthogonal means for investigating type class membership. Further, because the essential attributes or characteristics across entities in an entire domain can differ broadly (such as living vs. inanimate things, natural vs. man-made things, or ideas vs. physical objects), it is possible to make disjointness assertions between entire groupings of natural entity classes. Disjointness assertions combined with logical organization and inference mean a typology design that lends itself to reasoning and tractability.
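To make the reasoning benefit concrete, here is a small Python sketch, under stated assumptions, of how a single disjointness assertion between two top-level branches can be inferred down to every subtype. The type names, the `PARENT` map, and the `DISJOINT` set are hypothetical, and a real system would normally rely on an OWL reasoner rather than hand-written code like this.

```python
# Hypothetical typology as a child -> parent map (None marks a root branch).
PARENT = {
    "LivingThing": None,
    "Animal": "LivingThing",
    "Bird": "Animal",
    "Artifact": None,
    "Camera": "Artifact",
    "SLRCamera": "Camera",
}

# Disjointness is asserted once, between whole branches (an assumption for illustration).
DISJOINT = {frozenset({"LivingThing", "Artifact"})}


def ancestors(type_name):
    """Return the type itself plus all of its ancestors, nearest first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def are_disjoint(a, b):
    """True if some ancestor of a is asserted disjoint with some ancestor of b."""
    return any(
        frozenset({x, y}) in DISJOINT
        for x in ancestors(a)
        for y in ancestors(b)
    )


print(are_disjoint("Bird", "SLRCamera"))
print(are_disjoint("Bird", "Animal"))
```

One assertion at the top of the hierarchy is enough to rule out membership overlaps between any pair of descendants, which is where the tractability comes from.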

The idea of nested, hierarchical types organized into broad branches of different entity typologies also provides a very flexible design for interoperating with a diversity of world views and degrees of specificity. The photographer, as I discussed above, is interested in different camera types and even how specific cameras can relate to a detailed entity typing structure. Another party more interested in products across the board may have a view to greater breadth, but lesser depth, about cameras and related equipment. A typology design, logically organized and placed into a consistent grounding of attributes, can readily interoperate with these different world views.

A typology design for organizing entities can thus be visualized as a kind of accordion or squeezebox, expandable when detail requires, or collapsed to a more coarse-grained form when relating to broader views. The organization of entity types also has a different structure than the more graph-like organization of higher-level conceptual schema, or knowledge graphs. In the cases of broad knowledge bases, such as UMBEL or Wikipedia, where 70 percent or more of the overall schema is related to entity types, more attention can now be devoted to aspects of concepts or relations.
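The accordion idea can be sketched in a few lines of Python: a fine-grained type is "collapsed" by mapping it to its ancestor at whatever depth a broader view requires. The type names, the `PARENT` map, and the helper functions `lineage` and `collapse` are hypothetical, introduced only to illustrate the design.

```python
# Hypothetical typology as a child -> parent map (None marks the root).
PARENT = {
    "Product": None,
    "Camera": "Product",
    "SLRCamera": "Camera",
    "DigitalSLRCamera": "SLRCamera",
}


def lineage(type_name):
    """Return the path of type names from the root down to type_name."""
    path = []
    current = type_name
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path[::-1]


def collapse(type_name, max_depth):
    """Map a type to its ancestor at max_depth (the root is depth 0).

    Types already at or above max_depth are returned unchanged.
    """
    path = lineage(type_name)
    return path[min(max_depth, len(path) - 1)]


# The photographer's detailed view keeps full depth; a broad product view squeezes it.
print(collapse("DigitalSLRCamera", 3))
print(collapse("DigitalSLRCamera", 1))
```

Because collapsing only walks up the existing hierarchy, both views remain consistent with one another: the broad view is always a logical generalization of the detailed one.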

The idea that knowledge bases can be purposefully crafted to support knowledge-based artificial intelligence, or KBAI, flows from these kinds of realizations. We begin to see that we can tease out different aspects of a knowledge base, each with its own logic and relation to the other aspects. Concepts, entities, attributes and relations — including the natural classes or types that can logically organize them — all deserve discrete attention and treatment.

Peirce’s consistent belief that the real world can be logically conceived and organized provides guidance for how we can continue to structure our knowledge bases into computable form. We now have a coherent base for treating entities and their natural classes as an essential component to that thinking. We can continue to be more fine-grained so long as there are unique essences to things that enable them to be grouped into natural classes.


[1] The label “entity” can also refer to what is known as the root node in some systems, such as SUMO (see also http://virtual.cvut.cz/kifb/en/toc/229.html). In the OWL language and RDF data model we use, the root node is known as “thing”. Clearly, our use of the term “entity” is much different from SUMO’s and resides at a subsidiary place in the overall TBox hierarchy. In this case, and frankly for most semantic matches, equivalences should be judged with care, with context the crucial deciding factor.
[2] N. Chinchor, 1997. “Overview of MUC-7,” MUC-7 Proceedings, 1997.
[3] While all of these are indeed entity types, the early MUCs also tested dates, times, percentages, and monetary amounts.
[4] Ada Brunstein, 2002. “Annotation Guidelines for Answer Types”. LDC Catalog, Linguistic Data Consortium. Aug 3, 2002.
[5] See the Sekine Extended Entity Types; the listing also includes attributes info at bottom of source page.
[6] For example, try this query, https://scholar.google.com/scholar?q=”fine-grained+entity”, also without quotes.
[7] Xiao Ling and Daniel S. Weld, 2012. “Fine-Grained Entity Recognition,” in AAAI. 2012.
[8] Dan Gillick, Nevena Lazic, Kuzman Ganchev, Jesse Kirchner, and David Huynh, 2014. “Context-Dependent Fine-Grained Entity Type Tagging,” arXiv preprint arXiv:1412.1820 (2014).
[9] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann, 2009. “DBpedia-A Crystallization Point for the Web of Data.” Web Semantics: science, services and agents on the world wide web 7, no. 3 (2009): 154-165; 170 classes in this paper. That has grown to more than 700; see http://mappings.dbpedia.org/server/ontology/classes/ and http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-04/dataset-2015-04-statistics.
[10] The listing is under some dynamic growth. This is the official count as of September 8, 2015, from http://schema.org/docs/full.html. Current updates are available from Github.
[11] Joanna Biega, Erdal Kuzey, and Fabian M. Suchanek, 2013. “Inside YAGO2: A Transparent Information Extraction Architecture,” in Proceedings of the 22nd international conference on World Wide Web, pp. 325-328. International World Wide Web Conferences Steering Committee, 2013. Also see Mohamed Amir Yosef, Sandro Bauer, Johannes Hoffart, Marc Spaniol, Gerhard Weikum, 2012. “HYENA: Hierarchical Type Classification for Entity Names,” in Proceedings of the 24th International Conference on Computational Linguistics, Coling 2012, Mumbai, India, 2012.
[13] This figure and some of the accompanying text come from a prior article, M.K. Bergman, “Creating a Platform for Machine-based Artificial Intelligence”, AI3:::Adaptive Information blog, September 21, 2015.
[14] Realism is often contrasted to idealism, nominalism or conceptualism, wherein how the world exists is a function of how we think about or name things. Descartes, for example, summarized his conceptualist view with his aphorism “I think, therefore I am.”
[15] See the electronic edition of The Collected Papers of Charles Sanders Peirce, reproducing Vols. I-VI, Charles Hartshorne and Paul Weiss, eds., 1931-1935, Harvard University Press, Cambridge, Mass., and Arthur W. Burks, ed., 1958, Vols. VII-VIII, Harvard University Press, Cambridge, Mass. The citation scheme is volume number using Arabic numerals followed by section number from the collected papers, shown as, for example, CP 1.208.
[16] M.K. Bergman, 2015. “‘Natural’ Classes in the Knowledge Web”, AI3:::Adaptive Information blog, July 13, 2015.
[17] See, for example, Menno Hulswit, 2000. “Natural Classes and Causation”, in the online Digital Encyclopedia of Charles S. Peirce.