Posted: November 17, 2016

AI3 Pulse: Invited Article Provides Good KBAI Summary

I am pleased to point to a new invited article, “Wrestling Knowledge into Computable Intelligence,” published today on ODBMS.org. My article provides a high-level summary of recent trends in knowledge-based artificial intelligence and the mindsets and designs necessary to move KBAI forward. I think the article provides a pretty good summary (if I say so myself!) of the approach we take at Cognonto.

I’d like to thank Roberto Zicari for the invite to write in a style and brevity not typical of my normal articles. 😉  Enjoy!

Posted: November 15, 2016

Cognonto: Upper Structure, Typologies Updated

Cognonto today released version 1.10 of its KBpedia knowledge structure. KBpedia integrates six major knowledge bases (Wikipedia, Wikidata, OpenCyc, GeoNames, DBpedia and UMBEL), plus mappings to another 20 leading knowledge vocabularies, under the KBpedia Knowledge Ontology (KKO). KBpedia’s explicit purpose is to provide a foundation for knowledge-based artificial intelligence (KBAI) by supporting the (nearly) automatic creation of training corpuses and positive and negative training sets and feature sets for deep, unsupervised and supervised machine learning.

This new release focused on two major updates. First, certain aspects of the upper structure of the KKO were streamlined. And, second, KBpedia’s core typologies, which capture the overwhelming majority of reference concepts that are classified as entity types, were further organized to create tighter taxonomic structures.

The upper portion of the KBpedia knowledge graph required cleanup because it was still using some of the abstract-tangible distinctions used in Cyc. These distinctions are no longer used following the adoption of the universal categories of Charles S. Peirce (see my earlier article for more on this architectural design). This cleanup removed nearly 25% of the upper-level links from the prior version, all of which were superfluous to the disjoint design of KBpedia. The typology organizations are part of an ongoing effort to streamline and tighten these structures.

Last week Cognonto’s CTO, Fred Giasson, described the general build processes we have in place for KBpedia. This release is another example of that process in action.

KBpedia contains nearly 40,000 reference concepts (RCs) and about 20 million entities. The combination of these and KBpedia’s structure results in over 6 billion logical connections across the system, as these KBpedia statistics show:

Measure                                          Value
No. KBpedia reference concepts (RCs)             39,052
No. mapped vocabularies                          27
    Core knowledge bases                         6
    Extended vocabularies                        21
No. mapped classes                               138,987
    Core knowledge bases                         137,322
    Extended vocabularies                        1,665
No. typologies (SuperTypes)                      63
    Core entity types                            33
    Other core types                             5
    Extended                                     25
Typology assignments                             372,967
No. of “triples” in KBpedia ontology             1,347,818
No. aspects                                      80
Direct entity assignments                        68,026,551
Inferred entity aspects                          204,704,905
No. unique entities                              19,643,718
Inferred no. of entity mappings                  2,541,684,526
Total no. of “triples”                           3,689,849,183
Total no. of inferred and direct assertions      6,251,177,427

KBpedia v. 1.10 Statistics
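
As a quick sanity check on these figures, the short Python sketch below confirms that the sub-counts roll up to their parent rows. It also shows that the grand total happens to equal the sum of the total triples, the inferred entity mappings and the unique entities. That last grouping is only an observation from the arithmetic; the statistics do not state how the total was derived. The dictionary keys are illustrative names, not KBpedia identifiers.

```python
# Figures copied from the KBpedia v. 1.10 statistics.
# The key names are made up for this sketch.
stats = {
    "mapped_vocabularies": 27,
    "core_knowledge_bases": 6,
    "extended_vocabularies": 21,
    "mapped_classes": 138_987,
    "mapped_classes_core": 137_322,
    "mapped_classes_extended": 1_665,
    "typologies": 63,
    "core_entity_types": 33,
    "other_core_types": 5,
    "extended_typologies": 25,
    "unique_entities": 19_643_718,
    "inferred_entity_mappings": 2_541_684_526,
    "total_triples": 3_689_849_183,
    "total_assertions": 6_251_177_427,
}


def check_rollups(s):
    """Return a mapping of rollup name -> whether the parts sum to the whole."""
    return {
        "vocabularies": s["core_knowledge_bases"] + s["extended_vocabularies"]
        == s["mapped_vocabularies"],
        "classes": s["mapped_classes_core"] + s["mapped_classes_extended"]
        == s["mapped_classes"],
        "typologies": s["core_entity_types"]
        + s["other_core_types"]
        + s["extended_typologies"]
        == s["typologies"],
        # Assumption: this grouping is inferred from the numbers alone.
        "assertions": s["total_triples"]
        + s["inferred_entity_mappings"]
        + s["unique_entities"]
        == s["total_assertions"],
    }


for name, ok in check_rollups(stats).items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```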

This release of KBpedia is part of an ongoing series of releases to improve and extend the knowledge structure, as well as to increase its mappings to still additional external vocabularies. You can inspect the upper portions of the KBpedia knowledge graph on the Cognonto Web site. Also, if you have an ontology editor, you can download and inspect the open source KKO directly.

About Cognonto

The insight behind Cognonto is that existing knowledge bases can be staged to automate much of the tedium and reduce the costs now required to set up and train machine learners for knowledge purposes. Cognonto’s mission is to make knowledge-based artificial intelligence (KBAI) cheaper, repeatable, and applicable to enterprise needs.

Cognonto (a portmanteau of ‘cognition’ and ‘ontology’) exploits large-scale knowledge bases and semantic technologies for machine learning, data interoperability and mapping, and fact and entity extraction and tagging. Cognonto puts its insight into practice through a knowledge structure, KBpedia, designed to support AI, and a management framework, the Cognonto Platform, for integrating enterprise and external data to gain the advantage of KBpedia’s structure.

Cognonto automates away much of the tedium and reduces costs in many areas. Cognonto offers a number of use cases for how the Cognonto Platform and KBpedia in combination with enterprise information assets may be applied.

Posted: November 8, 2016

Cognonto: Build and Testing Processes Are the Third Leg of Cognonto’s Capabilities

Knowledge is inherently dynamic and constantly changing. We learn new things; make connections between things that were previously hidden; revise our understandings in light of new discoveries; and embrace new domain relationships and facts. In the case of Cognonto’s knowledge graph, KBpedia, and its six major contributing knowledge bases (KBs) and mappings to a further 20 ontologies, this dynamism takes place at warp speed. It is evident from the thousands of changes made daily in each of Wikipedia and Wikidata, two of KBpedia’s major KBs.

Cognonto’s services are based on three capabilities. The first is KBpedia, which we have discussed elsewhere. The second is the Cognonto Platform, the means for accessing and using KBpedia in conjunction with enterprise or domain information. The third is our set of build and testing routines, scripts, logs and processes. It is through this third capability that we keep KBpedia current, and it is essential infrastructure for our entire suite of services.

Cognonto’s CTO, Frédérick Giasson, has today published on LinkedIn an overview article on the principal components within this build and testing infrastructure. This is significant, but largely hidden, work. We have honed this infrastructure over a period of years, and are continuously adding to our roster of scripts and procedures.

What is remarkable about this infrastructure is the speed with which we can completely rebuild KBpedia from scratch (less than two hours) and the shortness of the entire cycle of producing a new major version of the system (less than two weeks). This infrastructure is all the more impressive when one considers that KBpedia has and maps to hundreds of thousands of concepts, millions of entities, and billions of assertions. Yet, despite this complexity, each new build of KBpedia is logically consistent, satisfiable, and coherent. Our build and testing scripts are what help ensure this quality.

Fred’s article explains this infrastructure in greater detail. Our build and testing infrastructure brings essential stability to Cognonto’s overall offerings. Great work, Fred!

Posted: October 24, 2016

Cognonto: Search Both Reference Concepts and Entities

We have expanded the search function for Cognonto’s KBpedia knowledge graph. When first released last month, the KBpedia search was limited to reference concepts (RCs) only. With today’s upgrade, all of KBpedia’s 20 million entities can now be searched.

Though you can start investigating the Knowledge Graph (KG) simply by clicking links, to really discover items of specific interest you will need to do a search. These search functions for the Knowledge Graph are described below.

Two Search Options

The Knowledge Graph (KG) may be searched for either reference concepts (RCs) or entities.

On the main KG page (and for all other pages in the KG section except for actual results pages), the search box has a dropdown list function next to the search button. Via this dropdown, you may select to search either Reference Concepts or Entities. Depending on your selection, that choice also shows in the title to the search box. Whatever choice you make in the dropdown list is retained until you select a different option. The default option when you first encounter the Knowledge Graph is to search Reference Concepts.

Once your search specification is entered in the search box, you must click the Search button to invoke the actual search.

If you pick the Reference Concepts main search option, there are two different behaviors available to you.

With Autocompletion

Concept Search Autocompletion

Note that our search box has Reference Concepts as its title. As you type in the search box, the system’s autocompletion will return RC candidates in a dropdown list. Only preferred labels (the canonical “name”) and terminal URI strings will match for autocomplete. Semsets and terms in the RC’s description will not match.

Each newly entered character in the search box narrows the possible matching results. If your query string ceases prompting with RC candidates, there are no matches for the current substring query on preferred labels or URI fragments in the system.
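
The matching rule described here can be sketched in a few lines of Python. This is only an illustration of the behavior as described, not the code behind the KBpedia search box. The `ReferenceConcept` class, the example.org URIs, the case-insensitive comparison and the result limit are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceConcept:
    """Hypothetical record for a reference concept (RC)."""
    pref_label: str
    uri: str
    alt_labels: tuple = ()
    description: str = ""


def uri_fragment(uri):
    """Return the terminal part of a slash-delimited URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def autocomplete(query, concepts, limit=10):
    """Return RCs whose preferred label or URI fragment contains the query.

    Alternative labels (semsets) and descriptions are deliberately ignored,
    mirroring the behavior described in the text.
    """
    q = query.strip().lower()
    if not q:
        return []
    hits = [
        c for c in concepts
        if q in c.pref_label.lower() or q in uri_fragment(c.uri).lower()
    ]
    return sorted(hits, key=lambda c: c.pref_label.lower())[:limit]


# Made-up sample data for illustration only.
concepts = [
    ReferenceConcept("Mammal", "http://example.org/rc/Mammal",
                     ("mammalian",), "Warm-blooded vertebrate animal"),
    ReferenceConcept("Marine mammal", "http://example.org/rc/MarineMammal"),
    ReferenceConcept("Bird", "http://example.org/rc/Bird",
                     (), "Feathered vertebrate animal"),
]

print([c.pref_label for c in autocomplete("ma", concepts)])
print([c.pref_label for c in autocomplete("mari", concepts)])
```

Each extra character can only shrink the candidate list, and a query that matches only an alternative label or a description returns nothing, which is the cue to fall back to the broader search.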

Over time you may notice styling and naming conventions in RC preferred labels and URI fragments that help you formulate more effective queries. Keep in mind, however, that RCs can be named and referenced in multiple ways. If autocomplete does not match what you think the RC name might be, try the search without autocompletion, described under Without Autocompletion below.

If you do get matches to the query string, you will be presented with one or more live link options to specific RCs in the dropdown list box. Pick the option you are seeking to go to the RC Record for your specific reference concept of interest.

Without Autocompletion

The search without autocompletion is a broader one, and is often useful when you have been unable to formulate an effective query for the preferred label of an RC of interest. To conduct a search without autocompletion, simply provide a query string in the search box without picking one of the autocompletion dropdown prompts (should they occur). Clicking the Search Concepts button runs a standard search of the concepts in KBpedia and brings you to a paginated set of search results pages, with 20 results per page.

Concept Search Options

Once you are on a results listing page, your search options change. Again you are given a dropdown list box, whereby you may restrict the actual concept search to one of these fields:

  • Preferred label — this is the title or standard name for the reference concept
  • Alternative labels — these are semset labels for synonyms for the RC
  • Description — this is the definition or description field for the reference concept; most RCs have this field
  • URI — this is the actual Web identifier for the RC; search is based on a sub-string search of the URI, or
  • All content — all four fields above are included in the search.
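
To make the field options concrete, here is a minimal Python sketch of a field-restricted, paginated search. It is not the KBpedia search implementation. The record layout, field names and substring matching across all fields are assumptions; only the four fields, the “all content” option and the 20 results per page come from the description above.

```python
PAGE_SIZE = 20  # results per page, as stated in the text

# Hypothetical field names standing in for the four searchable fields.
FIELDS = ("pref_label", "alt_labels", "description", "uri")


def search_concepts(query, records, field="all", page=1):
    """Return (results for the requested page, total number of hits)."""
    if field != "all" and field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    fields = FIELDS if field == "all" else (field,)
    q = query.lower()

    def matches(record):
        for name in fields:
            value = record.get(name, "")
            # alt_labels holds several strings; the other fields hold one.
            values = value if isinstance(value, (list, tuple)) else [value]
            if any(q in v.lower() for v in values):
                return True
        return False

    hits = [r for r in records if matches(r)]
    start = (page - 1) * PAGE_SIZE
    return hits[start:start + PAGE_SIZE], len(hits)


# Made-up records for illustration only.
records = [
    {
        "pref_label": f"Concept {i}",
        "alt_labels": [f"synonym {i}"],
        "description": "A hypothetical reference concept",
        "uri": f"http://example.org/rc/Concept{i}",
    }
    for i in range(45)
]

first_page, total = search_concepts("concept", records, field="pref_label")
print(len(first_page), "of", total)
```

Restricting the field narrows the hit count, while the “all” option searches every field and so returns the largest result set.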

When you pick a result from the search list, you are taken to the RC Record report (see the How to Use the Knowledge Graph page).

The standard search box gives you a dropdown where you can choose to conduct an Entities or Reference Concepts search (see first figure above). If you choose Entities, that is so indicated in the label to the search button.

When the Entities search is chosen, you have a choice to restrict the actual search to one of these fields:

Entity Search Options

  • Preferred label — this is the title or standard name for the entity
  • Description — this is the definition or description field for the entity; not all entities have this field
  • URI — this is the actual Web identifier for the entity; search is based on a sub-string search of the URI, or
  • All content — all three fields above are included in the search.

Whichever of these four choices you make, the selection appears as the title to the search box and remains in effect until you change the dropdown option. The default when you first invoke the Entities search is ‘All’.

To actually conduct the search, you need to click on the Search Entities button.

Depending on which optional field you selected, the results count will vary. Obviously, the largest number of results arises from the ‘All’ field choice.

Entities Search Results

Search results for entities generally present fewer details than those for RCs, as this figure shows:

Entities Search Results

 

Some results are limited to a title (prefLabel). Others may include altLabels or descriptions, depending on the nature of the record. Each result has an icon to the upper right indicating the source of the entity record. Clicking that icon (background highlighted text) presents the standard entity results listing as described on the How to Use the Knowledge Graph page.

Enjoy!

Posted by AI3's author, Mike Bergman Posted on October 24, 2016 at 8:45 am in KBpedia, Searching | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/1991/kbpedia-knowledge-graph-gets-expanded-search/
Posted: October 17, 2016

Cognonto: Extending KBpedia with Domain and Enterprise Data is a Key Theme

Cognonto, our recently announced venture in knowledge-based artificial intelligence (KBAI), has just published three use cases. Two of these use cases are based on extending KBpedia with enterprise or domain data. KBpedia is the KBAI knowledge structure at the heart of Cognonto.

The Cognonto Web site contains longer descriptions of these use cases, with statistics and results where appropriate. We intend to continue to publish more use cases. Notable ones will be broadly announced.

Use Case #1: Word Embedding Corpuses

word2vec is an artificial intelligence ‘word embedding’ model that can establish similarities between terms. These similarities can be used to cluster or classify documents by topic, or to characterize them by sentiment, or for recommendations. The rich structure and entity types within Cognonto’s KBpedia knowledge structure can be used, with one or two simple queries, to create relevant domain “slices” of tens of thousands of documents and entities upon which to train word2vec models. This approach eliminates the majority of effort normally associated with word2vec for domain purposes, enabling available effort to be spent on refining the parameters of the model for superior results.

Some key findings are:

  • Domain-specific training corpuses work better with less ambiguity than general corpuses for these problems
  • Cognonto (through KBpedia) speeds and eases the creation of domain-specific training corpuses for word2vec (and other corpus-based models)
  • Other public and private text sources may be readily added to the KBpedia baseline in order to obtain still further domain-relevant models
  • Such domain-specific training corpuses can be used to establish similarity between local text documents or HTML web pages
  • This method can also be combined with Cognonto’s topics analyzer to first tag text documents using KBpedia reference concepts, and then inform or augment these domain-specific training corpuses, and
  • These capabilities enable rapid testing and refinement of different combinations of “seed” concepts to obtain better desired results.
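
The idea of carving out a domain “slice” from a few “seed” concepts can be sketched as follows. This is a simplified illustration under stated assumptions, not Cognonto’s actual queries: the concept hierarchy, document tags and all names in the sample data are hypothetical. The sketch walks down from the seeds through sub-class links and keeps every document tagged with a concept in that scope.

```python
from collections import deque


def descendants(seeds, sub_classes):
    """Return the seeds plus every concept reachable through child links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        concept = queue.popleft()
        for child in sub_classes.get(concept, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def domain_slice(seeds, sub_classes, tagged_docs):
    """Return ids of documents tagged with any concept under the seeds."""
    scope = descendants(seeds, sub_classes)
    return [
        doc_id for doc_id, concepts in tagged_docs.items()
        if scope & set(concepts)
    ]


# Hypothetical hierarchy (parent -> children) and document tags.
sub_classes = {
    "Music": ["Musician", "MusicalInstrument"],
    "MusicalInstrument": ["Guitar"],
    "Sport": ["Football"],
}
tagged_docs = {
    "doc1": ["Guitar"],
    "doc2": ["Football"],
    "doc3": ["Musician", "Football"],
}

print(domain_slice(["Music"], sub_classes, tagged_docs))
```

The text of the selected documents would then be tokenized and handed to whichever word2vec implementation is in use. Changing the seed list is all it takes to test a different slice.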

Use Case #2: Integrating Private Data

KBpedia provides a rich set of 20 million entities in its standard configuration. However, by including relevant entity lists, which may already be in the possession of the enterprise or from specialty domain datasets, significant improvements can be achieved across all of the standard metrics used for entity recognition and tagging. Here is an example of the standard metrics applied by Cognonto in its efforts:

confusion-matrix-wikipedia.png

Cognonto’s standard methodology also includes the creation of reference, or “gold standards”, for measuring the benefits of adding more data or performing other tweaks on the entity extraction algorithms.
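
For readers less familiar with these measures, the following Python sketch computes the usual metrics derived from a confusion matrix when predictions are scored against a gold standard. Which metrics Cognonto reports, and with what values, is covered in the full use case and is not reproduced here. The counts below are invented purely for illustration.

```python
def tagging_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives measured against a gold standard.
    """
    def ratio(numerator, denominator):
        # Avoid division by zero when a class has no examples.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Invented counts, not results from the use case.
for name, value in tagging_metrics(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {value:.3f}")
```

Running the same function before and after adding a dataset, against the same gold standard, gives a like-for-like view of whether the extra data helped.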

Some key findings from this use case in adding private data to KBpedia include:

  • In the example used, adding private enterprise data results in more than a doubling of accuracy (108%) over the standard, baseline KBpedia for identifying the publishing organization of a Web page
  • Some datasets may have a more significant impact than others, but each dataset contributes to the overall improvement of the predictions. Generally, adding more data improves results across all measured metrics
  • “Gold standards” are an essential component for testing the value of adding specific datasets or refining machine learning parameters
  • Approx. 500 training instances are sufficient to build a useful “gold standard” for entity tagging; negative training examples are also advisable
  • Even if all specific entities are not identified, flagging a potential “unknown” entity is an important means for targeted next efforts of adding to the current knowledge base
  • KBpedia is a very useful structure and starting point for an entity tagging effort, but adding domain data is probably essential to gain the overall accuracy desired for enterprise requirements, and
  • This use case is broadly applicable to any entity recognition and tagging initiative.

Use Case #3: The Cognonto Mapper

The Cognonto Mapper includes standard baseline capabilities found in other mappers such as string and label comparisons, attribute comparisons, and the like. But, unlike conventional mappers, the Cognonto Mapper is able to leverage both the internal knowledge graph structure and its use of typologies (most of which do not overlap with one another) to add structural comparators as well. These capabilities lead to more automation at the front end of generating good, likely mapping candidates, leading to faster acceptance by analysts of the final mappings. This approach is in keeping with Cognonto’s philosophy to emphasize “semi-automatic” mappings that combine fast final assignments with the highest quality. Maintaining mapping quality is the sine qua non of knowledge-based artificial intelligence.
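
To illustrate how a structural comparator can complement a label comparator, here is a small Python sketch. It is emphatically not the Cognonto Mapper’s algorithm. The weighting scheme, the all-or-nothing typology check, and the typology names and sample labels are all assumptions. It only shows why non-overlapping typologies help separate candidates whose labels look alike.

```python
from difflib import SequenceMatcher


def label_score(a, b):
    """String similarity between two labels, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structural_score(source_types, candidate_types):
    """1.0 if the two items share a typology, else 0.0 (a crude assumption)."""
    return 1.0 if set(source_types) & set(candidate_types) else 0.0


def rank_candidates(source, candidates, label_weight=0.6):
    """Score each candidate and return (score, label) pairs, best first."""
    scored = []
    for cand in candidates:
        score = (
            label_weight * label_score(source["label"], cand["label"])
            + (1 - label_weight)
            * structural_score(source["typologies"], cand["typologies"])
        )
        scored.append((round(score, 3), cand["label"]))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical source item and mapping candidates.
source = {"label": "Jaguar", "typologies": ["Animals"]}
candidates = [
    {"label": "Jaguar", "typologies": ["Products"]},
    {"label": "Jaguar (cat)", "typologies": ["Animals"]},
]

for score, label in rank_candidates(source, candidates):
    print(score, label)
```

In this toy example the exact label match loses to the candidate in the matching typology, so the analyst sees the more plausible mapping first and still makes the final call.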

Some key findings from this use case are:

  • A capable mapper, including structural considerations, is essential for quickly generating mapping candidates for entities, concepts, vocabularies, schema and ontologies
  • A capable mapper, including structural considerations, is essential for accurate mapping candidates for entities, concepts, vocabularies, schema and ontologies
  • Quality mappings require manual vetting for final acceptance
  • The Cognonto Mapper generates quick and accurate candidates for linking entities, concepts, vocabularies, schema or ontologies
  • The Cognonto Mapper can generate “gold standards” in 15% of the time of standard approaches
  • The Cognonto Mapper scoring can help point to likely mappings even where the candidates are ambiguous
  • The Cognonto Mapper can identify likely, but unknown, entities for inspection and commitment to the knowledge base, and
  • Other structural aspects of KBpedia, such as aspects, relations or attributes, can inform other comparators and mappers.

See the original use case links for further details, code examples, and results and statistics. As noted, we will announce additional use cases as they are published.