Posted:November 7, 2022

CognontoMy Formal Companies Come to an End

Closing the doors on a virtual company almost seems like a category error. But metaphors by definition are not literal so I guess it all still fits.

The last time I had a third party sign my pay check was back in 1989, when I left the American Public Power Association to found the first of a series of what turned out to be six different companies, some of which overlapped. With the last, Cognonto, only recently providing my individual consulting services, there is no longer any need to keep the doors open with the requisite federal and state corporate and tax filings. Though I still do the occasional consulting gig, I can just as easily do that as a self-employed individual rather than under some corporate banner.

Skinnying down this complexity brings a sense of relieved joy, like paying off student loans or home mortgages. It feels good.

KBpedia will remain active. In fact, some exciting developments are occurring around this top-level knowledge graph. While my postings will likely remain infrequent, it will be exciting once we can speak of new developments on the KBpedia horizon.

As I wish Cognonto a fond adieu, I’d also like the thank the dozens of co-workers and colleagues that I have had the privilege of working with over the tenure of my various companies. Thanks for the great ride!

Posted:July 13, 2020

KBpediaNew KBpedia ID Property and Mappings Added to Wikidata

Wikidata editors last week approved adding a new KBpedia ID property (P8408) to their system, and we have therefore followed up by adding nearly 40,000 mappings to the Wikidata knowledge base. We have another 5000 to 6000 mappings still forthcoming that we will be adding in the coming weeks. Thereafter, we will continue to increase the cross-links, as we partially document below.

This milestone is one I have had in my sights for at least the past few years. We want to both: 1) provide a computable overlay to Wikidata; and 2) increase our dependence and use of Wikidata’s unparalleled structured data resources as we move KBpedia forward. Below I give a brief overview of the status of Wikidata, share some high-level views of our experience in putting forward and then mapping a new Wikidata property, and conclude with some thoughts of where we might go next.

The Status of Wikidata

Wikidata is the structured data initiative of the Wikimedia Foundation, the open-source group that oversees Wikipedia and many other notable Web-wide information resources. Since its founding in 2012, Wikidata’s use and prominence have exceeded expectations. Today, Wikidata is a multi-lingual resource with structured data for more than 95 million items, characterized by nearly 10,000 properties. Items are being added to Wikidata at a rate of nearly 5 million per month. A rich ecosystem of human editors and bots patrols the knowledge base and its entries to enforce data quality and consistency. The ecosystem includes tools for bulk loading of data with error checks, search including structured SPARQL queries, and navigation and visualization. Errors and mistakes in the data occur, but the system ensures such problems are removed or corrected as discovered. Thus, as data growth has occurred, so has quality and usefulness improved.

From KBpedia’s standpoint, Wikidata represents the most complete complementary instance data and characterization resource available. As such, it is the driving wheel and stalking horse (to mix eras and technologies) to guide where and how we need to incorporate data and its types. These have been the overall motivators for us to embrace a closer relationship with Wikidata.

As an open governance system, Wikidata has adopted its own data models, policies, and approval and ingest procedures for adopting new data or characterizations (properties). You might find it interesting to review the process and ongoing dialog that accompanied our proposal for a KBpedia ID as a property in Wikidata. As of one week ago, KBpedia ID was assigned Wikidata property P8408. To date, more than 60% of Wikidata properties have been such external identifiers, and IDs are the largest growing category of properties. Since most properties that relate to internal entity characteristics have already been identified and adopted, we anticipate mappings to external systems will continue to be a dominant feature of the growth in Wikidata properties to come.

Our Mapping Experience

There are many individuals that spend considerable time monitoring and overseeing Wikidata. I am not one of them. I had never before proposed a new property to Wikidata, and had only proposed one actual Q item (Q is the standard prefix for an entity or concept in Wikidata) for KBpedia prior to proposing our new property.

Like much else in the Wikimedia ecosystem, there are specific templates put in place for proposing a new Q item or P proposal (see the examples of external identifiers, here). Since there are about 10,000 times more Q items than properties, the path for getting a new property approved is more stringent.

Then, once a new property is granted, there are specific paths like QuickStatements or others that need to be followed in order to submit new data items (Q ids) or characteristics (property by Q ids). I made some newbie mistakes in my first bulk submissions, and fortunately had a knowledgeable administrator (@Epidosis) help guide me through making the fixes. For example, we had to back off about 10,000 updates because I had used the wrong form for referencing a claim. Once reclaimed, we were able to again upload the mappings.

As one might imagine, updates and changes are being submitted by multiple human agents and (some) bots at all times into the system. The facilities like QuickStatements are designed to enable batch uploads, and allow re-submissions due to errors. You might want to see what is currently active on the system by checking out this current status.

With multiple inputs and submitters, it takes time for large claims to be uploaded. In the case our our 40,000 mappings, we also accompanied each of those with a source and update data characterization, leading to a total upload of more than 120,000 claims. We split our submissions over multiple parts or batches, and then re-submitted if initial claims error-ed out (for example, if the base claim had not been fully registered, the next subsidiary claims might error due to lack of the registered subject; upon a second pass, the subject would be there and so no error). We ran our batches at off times for both Europe and North America, but the total time of the runs still took about 12 hours. 

Once loaded, the internal quality controls of Wikidata kick in. There are both bots and human editors that monitor concepts, both of which can flag (and revert) the mapping assignments made. After three days of being active on Wikidata, we had a dozen reverts of initial uploaded mappings, representing about 0.03% of our suggested mappings, which is gratifyingly low. Still, we expect to hear of more such errors, and we are committed to fix all identified. But, at this juncture, it appears our initial mappings were of pretty high quality.

We had a rewarding and learning experience in uploading mappings to Wikidata and found much good will and assistance from knowledgeable members. Undoubtedly, everything should be checked in advance to ensure quality assertions when preparing uploads to Wikidata. But, if done, the system and its editors also appear quite capable to identify and enforce quality control and constraints as encountered. Overall, I found the entire data upload process to be impressive and rewarding. I am quite optimistic of this ecosystem continuing to improve moving forward.

The result of our external ID uploads and mappings can be seen in these SPARQL queries regarding the KBpedia ID property on Wikidata:

As of this writing, the KBpedia ID is now about the 500th most prevalent property on Wikidata.

What is Next?

Wikidata is clearly a dynamic data environment. Not only are new items being added by the millions, but existing items are being better characterized and related to external sources. To deal with the immense scales involved requires automated quality checking bots with human editors committed to the data integrity of their domains and items. To engage in a large-scale mapping such as KBpedia also requires a commitment to the Wikidata ecosystem and model.

Initiatives that appear immediately relevant to what we have put in place relating to Wikidata include to:

  • Extend the current direct KBpedia mappings to fix initial mis-assignments and to extend coverage to remaining sections of KBpedia
  • Add additional cross-mappings that exist in KBpedia but have not yet been asserted in Wikidata (for example, there are nearly 6,000 such UNSPSC IDs)
  • Add equivalent class (P1709) and possible superproperties (P2235) and subproperties (P2236) already defined in KBpedia
  • Where useful mappings are desirable, add missing Q items used in KBpedia to Wikidata
  • And, most generally, also extend mappings to the 5,000 or so shared properties between Wikidata and KBpedia.

I have been impressed as a user of Wikidata for some years now. This most recent experience also makes me enthused about contributing data and data characterizations directly.

To Learn More

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and splits in predicates based on attributes, external relations, and pointers or indexes, all informed by Charles Peirce‘s writings related to knowledge representation. KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is partially sponsored by Cognonto Corporation. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted:June 15, 2020

KBpediaNew Version Finally Meets the Hurdle of Initial Vision

I am pleased to announce that we released a powerful new version of KBpedia today with e-commerce and logistics capabilities, as well as significant other refinements. The enhancement comes from adding the United Nations Standard Products and Services Code as KBpedia’s seventh core knowledge base. UNSPSC is a comprehensive and logically organized taxonomy for products and services, organized into four levels, with third-party crosswalks to economic and demographic data sources. It is a leading standard for many industrial, e-commerce, and logistics applications.

This was a heavy lift for us. Given the time and effort involved, Fred Giasson, KBpedia’s co-editor, and I decided to also tackle a host of other refinements we had on our plate. All told, we devoted many thousands of person-hours and more than 200 complete builds from scratch to bring this new version to fruition. Proudly I can say that this version finally meets the starting vision we had when we first began KBpedia’s development. It is a solid baseline to build from for all sorts of applications and to make broad outreach for adoption in 2020. Because of the extent of changes in this new version, we have leapfrogged KBpedia’s version numbering from 2.21 to 2.50.

KBpedia is a knowledge graph that provides a computable overlay for interoperating and conducting machine learning across its constituent public knowledge bases of Wikipedia, Wikidata, GeoNames, DBpedia, schema.org, OpenCyc, and, now, UNSPSC. KBpedia now contains more than 58,000 reference concepts and their mappings to these knowledge bases, structured into a logically consistent knowledge graph that may be reasoned over and manipulated. KBpedia acts as a computable scaffolding over these broad knowledge bases with the twin goals of data interoperability and knowledge-based artificial intelligence (KBAI).

KBpedia is built from a expandable set of simple text ‘triples‘ files, specified as tuples of subject-predicate-object (EAVs to some, such as Kingsley Idehen) that enable the entire knowledge graph to be constructed from scratch. This process enables many syntax and logical tests, especially consistency, coherency, and satisfiability, to be invoked at build time. A build may take from one to a few hours on a commodity workstation, depending on the tests. The build process outputs validated ontology (knowledge graph) files in the standard W3C OWL 2 semantic language and mappings to individual instances in the contributing knowledge bases.

As Fred notes, we continue to streamline and improve our build procedures. Major changes like what we have just gone through, be it adding a main source like UNSPSC or swapping out or adding a new SuperType (or typology), often require multiple build iterations to pass the system’s consistency and satisfiability checks. We need these build processes to be as easy and efficient as possible, which also was a focus of our latest efforts. One of our next major objectives is to release KBpedia’s build and maintenance codes, perhaps including a Python option.

Incorporation of UNSPSC

Though UNSPSC is consistent with KBpedia’s existing three-sector economic model (raw products, manufactured products, services), adding it did require structural changes throughout the system. With more than 150,000 listed products and services in UNSPSC, incorporating it needed to balance with KBpedia’s existing generality and scope. The approach was to include 100% of the top three levels of UNSPSC — segments, families, and classes — plus more common and expected product and service ‘commodities’ in its fourth level. This design maintains balance while providing a framework to tie-in any remaining UNSPSC commodities of interest to specific domains or industries. This approach led to integrating 56 segments, 412 families, 3700+ classes, and 2400+ commodities into KBpedia. Since some 1300 of these additions overlapped with existing KBpedia reference concepts, we checked, consolidated, and reconciled all duplicates.

We fully specified and integrated all added reference concepts (RCs) into the existing KBpedia structure, and then mapped these new RCs to all seven of KBpedia’s core knowledge bases. Through this process, for example, we are able to greatly expand the coverage of UNSPSC items on Wikidata from 1000 or so Q (entity) identifiers to more than 6500. Contributing such mappings back to the community is another effort our KBpedia project will undertake next.

Lastly with respect to UNSPSC, I will be providing a separate article on why we selected it as KBpedia’s products and services template, and how we did the integration and what we found as we did. For now, the quick point is that UNSPSC is well-structured and organized according to the three-sector model of the economy, which matches well with Peirce’s three universal categories underlying our design of KBpedia.

Other Major Refinements

These changes were broad in scope. Effecting them took time and broke open core structures. Opportunities to rebuild the structure in cleaner ways arise when the Tinkertoys get scattered and then re-assembled. Some of the other major refinements the project undertook during the builds necessary to create this version were to:

  • Further analyze and refine the disjointedness between KBpedia’s 70 or so typologies. Disjoint assertions are a key mechanism for sub-set selections, various machine learning tasks, querying, and reasoning
  • Increase the number of disjointedness assertions 62% over the prior version, resulting in better modularity. (However, note the actual RCs affected by these improvements is lower than this percentage since many were already specified in prior disjoint pools)
  • Add 37% more external mappings to the system (DBpedia and UNSPSC, principally)
  • Complete 100% of the definitions for RCs across KBpedia
  • Greatly expand the altLabel entries for thousands of RCs
  • Improve the naming consistency across RC identifiers
  • Further clean the structure to ensure that a given RC is specified only once to its proper parent in an inheritance (subsumption) chain, which removes redundant assertions and improves maintainability, readability, and inference efficiency
  • Expand and update the explanations within the demo of the upper KBpedia Knowledge Ontology (KKO) (see kko-demo.n3). This non-working ontology included in the distro makes it easier to relate the KKO upper structure to the universal categories of Charles Sanders Peirce, which provides the basic organizational framework for KKO and KBpedia, and
  • Integrate the mapping properties for core knowledge bases within KBpedia’s formal ontology (as opposed to only offering as separate mapping files); see kbpedia-reference-concepts-mappings.n3 in the distro.

Current Status of the Knowledge Graph

The result of these structural and scope changes was to add about 6,000 new reference concepts to KBpedia, then remove the duplicates, resulting in a total of more than 58,200 RCs in the system. This has increased KBpedia’s size about 9% over the prior release. KBpedia is now structured into about 73 mostly disjoint typologies under the scaffolding of the KKO upper ontology. KBpedia has fully vetted, unique mappings (nearly all one-to-one) to these key sources:

  • Wikipedia – 53,323 (including some categories)
  • DBpedia – 44,476
  • Wikidata – 43,766
  • OpenCyc – 31,154
  • UNSPSC – 6,553
  • schema.org – 842
  • DBpedia ontology – 764
  • GeoNames – 680
  • Extended vocabularies – 249.

The mappings to Wikidata alone link to more than 40 million unique Q instance identifiers. These mappings may be found in the KBpedia distro. Most of the class mapping are owl:equivalentClass, but a minority may be subClass or superClass or isAbout predicates as well.

KBpedia also includes about 5,000 properties, organized into a multi-level hierarchy of attributes, external relations, and representations, most derived from Wikidata and schema.org. Exploiting these properties and sub-properties is also one of the next priorities for KBpedia.

To Learn More

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and splits in predicates based on attributes, external relations, and pointers or indexes, all informed by Charles Peirce‘s prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files. KBpedia distributions provide the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record. Such reach-throughs are straightforward to construct.) See further the Github site for further downloads.

KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is partially sponsored by Cognonto Corporation. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted:December 4, 2019

KBpediaVersion 2.20 of the Knowledge Graph Now Prepped for Release on Public Repositories

Fred Giasson and I, as co-editors, are pleased to announce today the release of version 2.20 of the open-source KBpedia system. KBpedia is a knowledge graph that provides an overlay for interoperating and conducting machine learning across its constituent public knowledge bases of Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc. KBpedia contains more than 53,000 reference concepts and their mappings to these knowledge bases, structured into a logically consistent knowledge graph that may be reasoned over and manipulated. KBpedia acts as a computable scaffolding over these broad knowledge bases.

We are preparing to register KBpedia on many public repository sites, and we wanted to make sure quality was a high as possible as we begin this process. Since KBpedia is a system built from many constituent knowledge bases, duplicates and inconsistencies can arise when combining them. The rationale for this release was to conduct a comprehensive manual review to identify and remove most of these issues.

We made about 10,000 changes in this newest release. The major changes we made to KBpedia resulting from this inspection include:

  • Removal of about 2,000 reference concepts (RCs) and their mappings and definitions pertaining to individual plant and animal species, which was an imbalance in relation to the other generic RCs in the system;
  • Manual inspection and fixes to the 70 or so typologies (for instance, Animals or Facilities) that are used to cluster the RCs into logical groupings;
  • Removal of references to UMBEL, one of KBpedia’s earlier constituent knowledge bases, due to retirement of the UMBEL system;
  • Fixes due to user comments and suggestions since the prior release of version 2.10 in April 2019; and
  • Adding some select new RCs in order to improve the connectivity and fill gaps with the earlier version.

Without a doubt this is now the cleanest and highest quality release for the knowledge graph. We are now in position to extend the system to new mappings, which will be the focus of future releases. (Expect the next after the first of the year.) The number and structure of KBpedia’s typologies remain unchanged from prior versions. The number of RCs now stands at 53,465, smaller than the 55,301 reference concepts in the prior version.

Besides combining the six major public knowledge bases of Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc, KBpedia includes mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks. KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is sponsored by Cognonto Corporation.

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and splits in predicates (or relations) based on attributes, external relations, and pointers or indexes, all informed by Charles Peirce‘s prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files. KBpedia distributions provide the links. However, you can access these entities through the KBpedia explorer on the project’s Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record. Such reachthroughs are straightforward to construct.) See further the Github site for further downloads. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Posted:July 24, 2019

AI3 Pulse

I would like to thank Andreas Blumauer of the Semantic Web Company and the SEMANTiCS Conference for their recent interview with me. The questions covered the waterfront of topics related to knowledge representation and semantic artificial intelligence. The emphasis was on my recent book, A Knowledge Representation Practionary, the polymath who stimulated that book, Charles Sanders Peirce, and our open-source KBpedia knowledge graph and base. 

Andreas, who conducted the interview, is always good at teasing out where people think the industry and technology are headed. This interview was no exception. Knowledge graphs and their emerging role in knowledge-based AI have been common themes across multiple recent conferences.

I’d also like to thank Alan Morrison for his keynote at the prior 2018 conference, and his kind mention of our work. Alan and Andreas are two of the most effective spokespeople around for the practical trends currently shaping our industry.

There are rumors that the SEMANTiCS conference may be coming to North America in 2020. Let’s hope the organizers see fit to continue to spread their wonderful meeting to our side of the pond.

You can see my interview in full here.