Posted: September 1, 2020

Making the Transition to Methods and Modules

With this installment, we transition to the third major part of our Cooking with Python and KBpedia series. We have evaluated and decided upon our alternatives, then installed and configured them while gaining some exposure, and now we turn to applying those tools to develop our first methods. This transition will culminate with us packaging our first module in the KBpedia system, in the process beginning to undertake bulk modifications. These bulk capabilities are at the heart of adopting and then extending KBpedia for your own domain purposes. Of course, in still later installments we will probe more advanced methods and capabilities, but this current part moves us in that direction by setting the Python groundwork. Besides this intro article, this third major part is almost entirely devoted to Python code and code management.

When I begin posting that code, you will note that I change the standard blue message box at the conclusion of each installment. Yes, I am a newbie: though I have some exposure to programming best practices, I am still most decidedly an amateur. One of the fun things in working with Python is the multiplicity of packages, modules, and styles available to you as a programmer (amateur or not). There are great books and great online resources, which I often cite as we move forward, but I have found interactive coding with Jupyter Notebook to be an absolute blast. One can literally search for Python coding options on the Web and then test them directly in the notebook. I love the immediate testing, the tactile sense of interacting with the important code blocks. Knowing this, I find it helpful to bring forward the same environment and domain each time I work with the system, which means I am always working with information of relevance and testing routines of importance. I also like the ability to really do Knuth's literate programming, with its interspersing of comment and context.

So, as we kick off this new part, I wanted to start with a largely narrative introduction. I know where I want to go with this series, but since I am documenting as I go, I really don't know for sure the path we will take to get there. I thought, therefore, that a trace of how I think about these problems might inform your own way of thinking about them. Thinking in programmatic terms is more dynamic than report writing or project planning, my two main activities for decades. Coding is a faster, more consuming sport.

Why the Idea of ‘Roundtripping’?

Our experience over the past decade has brought three main lessons to the fore. First, knowledge — and, therefore, the knowledge graphs that represent it — is dynamic and must be updated and maintained on a constant basis. A static knowledge graph is probably better than none at all, but is a far cry from the usefulness that can be gained from making knowledge currency an explicit objective of a knowledge graph (and its supporting knowledge bases).

Second, while they may be complex in expression, knowledge systems are fundamentally simple and understandable. The complexity of a knowledge system emerges from simple rules interacting in exponentially large ways. Implications are deductive, predictions are inductive, and new knowledge arises from abductive ways of interacting with these systems. We should be able to break down our knowledge systems into fundamentally simple structures, modify those simple structures, and then build them back up again to their non-linear dynamics.

And, third, we have multiple ways we need to interact with knowledge graphs and bases, and multiple best-of-breed tools for doing so. Sometimes we want to build and maintain a knowledge structure as something unto itself, with logic and integrity checks and tests and expansions of capabilities. Other actions might have us staging data to third-party applications for AI or machine learning. We also may need to make bulk modifications for specific application purposes or to tailor the graph to our current domain. The different tools that might support these and other activities are best served when something akin to a common data interchange format is found. In our case, that is CSV in UTF-8 format, often expressed as N3 semantic triples.

Once the idea of a common exchange framework emerges, the sense of 'roundtripping' becomes obvious. We use one tool for one purpose, export its information in a format with semantics sufficient for another tool to ingest it, make changes to it, and export it back again in a format with semantics readable by the first tool. Actually, in practice, good roundtripping more resembles a hub-and-spoke design, with the common representation framework at the hub and common to the spokes.

In our design of the KBpedia processing system moving forward, then, we will want to break down, or 'decompose,' working, fully specified, and logically tested knowledge graphs into component parts that we can work with and modify offline, so to speak. We may work with and modify these component parts quite extensively in this 'offline' mode. We could, for example, swap out entire modules for specific domains with our own favored representations of those domains. We may also want to isolate all of our language strings to translate the knowledge graph into other languages. Or we may want to prune some areas while we expand the specificity of others. We may even make changes in big chunks to the grounded upper structure of our knowledge graph, because our design is inherently malleable. A huge source of ossification in knowledge graphs is the inability to decompose them into re-processable building blocks.

A Mindset of Patterns

These big design considerations have a complementary part at the implementation level of the code. The same drivers of hierarchy and generalizability that govern a modular architecture also govern code design, or so it seems to me. Maybe it is because of this pattern of break-down and build-up of the specification components of KBpedia that I also see repeatability in code steps. We start with a file and its context. We process that file to extract its mapped semantics and data. We manipulate that storehouse of assertions to support many applications. We are continually learning and adding to the storehouse. We make bulk moves and bulk changes to our underlying data. We are constantly opening and writing to files, and representing our information as two-dimensional arrays of records (rows) and characteristics (columns). We need to monitor changes and log errors and events to file while processing. We need to find our stored procedures and save our work so we may readily find it again.

The idea of patterns, and the power it brings to questions of scoping, abstraction, and design, is substantial. I agree with the dictum that if you do something three times you should generalize and code it. My guess is that the search for the better algorithm and design is a key motivator for the professional programmer. For my purposes, however, this mindset is really just one of thinking through the generic activities a given code block is intended to address, and then assessing whether more than three applications of that block (or parts of it) are likely across the intended code base. Once so stated, it is pretty obvious that 'generalizability' is very much a function of current use and context, so one dynamic aspect of programming is the continual refactoring of prior code to make it more general. When stated in words that way, it sounds perhaps a little crazy. But, in practice, generalizability of code leads to further simplicity, maintainability, and (hopefully) efficiency.

Python has many wonderful features to support patterns. One may, for example, adopt a 'functional' programming style in working with Python, despite the language not being initially designed for it. Python accommodates such extensions of style and functionality alongside whatever programming approach you already use.
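As a small illustration of this point, here is a minimal sketch of the same filtering task written first imperatively and then in a more functional, expression-oriented style. The data and names here are hypothetical, not drawn from any CWPK file:

# Hypothetical data: (id, label) pairs from a knowledge graph extract
records = [('rc.Mammal', 'mammal'), ('rc.Bat', 'bat'), ('rc.Gear', 'gear')]

# Imperative style: accumulate matches in a loop
matches = []
for iri, label in records:
    if label.startswith('m'):
        matches.append(iri)

# Functional style: the same result as a single comprehension expression
matches = [iri for iri, label in records if label.startswith('m')]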

Any information passed to such generic routines should also be abstracted to logical names within input records. Automation only occurs through generalization. Like the simplicity argument made above, simple machines, like automatons, are easier to orchestrate and manage, even if their outcomes appear chaotic. So, what we would like to do, in the totally abstract, is have a limited number of functional method primitives to which we pass generic instructions and information using a relatively small subset of named objects. Again, this is one of the key strengths of Python: the objectification of the language linked to nameable spaces.
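To make that abstract idea concrete, here is a hedged sketch of one such 'method primitive': a generic record processor whose behavior is steered entirely by named arguments. All of the names here are hypothetical, not part of any later CWPK module:

import csv

def process_records(in_file, out_file, transform, delimiter=','):
    # Generic primitive: read rows from in_file, apply transform, write out
    with open(in_file, newline='', encoding='utf-8') as f_in, \
         open(out_file, 'w', newline='', encoding='utf-8') as f_out:
        reader = csv.reader(f_in, delimiter=delimiter)
        writer = csv.writer(f_out, delimiter=delimiter)
        for row in reader:
            writer.writerow(transform(row))

# Usage: uppercase the first column of a hypothetical CSV extract
# process_records('in.csv', 'out.csv', transform=lambda r: [r[0].upper()] + r[1:])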

High-level Build Overview

In its most general terms, we build KBpedia from three pieces (actually four; I cheat, and will explain in a bit). The first is the structural scaffolding of concepts and their 'is-a' hierarchical relationships. The second is the properties of the instances that the concepts represent, and how we understand, qualify, and quantify those things. The third piece is the way we label or describe or point to or indicate those things.

From these components we can build, and in the process logically test, the entire KBpedia from scratch. Since that now works in our internal implementations with Clojure, that is a de minimis capability we want to capture in Python. While the build process begins with these input files and adds to the core starting point (the 'bootstrap', as best understood), we do not have that Python build code as we start out. Further, in a strange way, we never did have such a starting point for KBpedia in Clojure anyway. The code base for KBpedia we inherited from the previous-generation UMBEL. And UMBEL had some historical methods for building its knowledge graph directly from OpenCyc. The modular build routines had never been re-factored into the core routines of either UMBEL or KBpedia!

Fundamentally, this is not a big deal, since our modular approach and additions and modifications present no conceptual or implementation challenges. Still, the fact remains that our Clojure build routines do not begin at the root build premise. The easier way to bootstrap into a complete code base for roundtripping, then, is to first extract away the logical pieces from the coherent full KBpedia, until there is nothing left but the 'core' of the ontology. This core, of course, is the KBpedia Knowledge Ontology, or KKO. For the bootstrapping process to work, we begin with a KKO-specified core, and then extract or add pieces to it. We extract when we are capturing changes to the ontology graph that might have been made while in production or development using something like the Protégé IDE. We build when we are submitting our modifications to the 'core' and its existing components while testing for consistency and satisfiability.

Thus, while we may be tackling specific tasks a little backwards by dealing with extraction first, in the spirit of roundtripping these are merely questions of where one breaks into the system. For this CWPK series, that starts with extraction.

By the way, what was that reference to the fourth piece? Well, it is mapping KBpedia to external sources to facilitate retrieval and integration. We will cover that topic as well toward the end of our series. We are able to defer this topic since the mapping question is a bit of a secondary orbit from the central question of building and modifying KBpedia (or its derivatives).

A Caution and Some Basic Intuitions

My caution is just to reiterate that the Python code to come is one approach among certainly many options, many of which I am sure would be easier to understand or better performing than what I am offering. Yet there is much to be said for getting 'first twitch' from these Jupyter Notebook installments and being able to test and extend these notions on your own.

And, what are these notions? Given the functional richness of the Python landscape it is only fair that I share some of my prejudices and intuitions about the specific methods put forth in the remaining code. Here are a few:

  • I like the idea of 'generators'. Much of what we deal with in these scripts and in KBpedia itself can be expressed in the 'generator' style of efficiently looping or iterating over specific sets (see the sketch after this list)
  • A 'set' notation is at the heart of the W3C standards (though sometimes masked as such), and the Python built-in set manipulation methods seem to be a powerful way of manipulating and comparing very large datasets. The set terminology includes intersection, union, difference, disjoint, subset, update, etc., also illustrated below
  • And, our view of CSV files as a central standard likely means we need to investigate, compare, and choose among the multiple CSV options in Python.
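To preview the first two intuitions, here is a minimal, hedged sketch of a generator and of Python's built-in set operations, using made-up concept IDs rather than anything drawn from the actual KBpedia build files:

# A generator lazily yields items one at a time rather than building a list
def prefixed(ids, prefix='rc.'):
    for i in ids:
        yield prefix + i

graph_a = set(prefixed(['Mammal', 'Bat', 'Gear']))
graph_b = set(prefixed(['Mammal', 'Wheel']))

# Built-in set operations for comparing large collections of identifiers
print(graph_a & graph_b)              # intersection: {'rc.Mammal'}
print(graph_a | graph_b)              # union: all four concept IDs
print(graph_a - graph_b)              # difference: {'rc.Bat', 'rc.Gear'}
print(graph_a.isdisjoint(graph_b))    # False, since they share 'rc.Mammal'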

Once we get these basic coding methods in place, it is time to turn our efforts to packaging them as a standard Python module. Our transition will be aided by working the Spyder IDE into our code-development workflow toward the end of this third part.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available both as an online interactive file or as a direct download to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.

Posted: August 31, 2020

Two Standards Come Pre-packaged with Owlready2

We introduce OWL (knowledge graph) reasoners in this installment of the Cooking with Python and KBpedia series. A reasoner has two purposes. First, based on deductive reasoning, a reasoner can infer new class and property assignments that are logically entailed by the assertions in an ontology (that is, by its axioms) but not otherwise explicitly stated. Once inferred, these additional assignments can be written to an inferred version of the ontology for faster lookups and analysis. Second, reasoners can evaluate the stated axioms to determine whether the ontology is consistent or satisfiable. This second purpose is a key step when building or modifying a knowledge graph to ensure that illogical assertions are not introduced into the system. Reasoners thus often have explanation routines that point out where the inconsistencies or problems occur, helping the analyst fix the errors before committing the graph to productive use. In later installments we will focus especially on these coherency tests when we discuss the build procedures for KBpedia.

Consistency means that no assertion (axiom) in the knowledge graph contradicts another. If there is a contradiction, the graph is termed inconsistent. Satisfiability individually evaluates the classes in the graph and checks whether each can have instances without contradicting other asserted axioms. Unsatisfiability indicates a conflicting or missing assignment that needs to be corrected. It is a particularly useful check when disjoint assertions have been made between classes, one of the design pillars of KBpedia.

Owlready2 is distributed with two OWL reasoners:

  • HermiT, developed by the department of Computer Science of the University of Oxford, and
  • Pellet, a reasoner developed specifically to support the OWL language.

Both HermiT and Pellet are written in Java, so they require access to a JVM on your system. If you have difficulty running these systems, it is likely because you: 1) do not have a recent version of Java installed on your system; or 2) do not have a proper PATH statement in your environment variables for finding the Java executable. If you encounter such problems, please consult third-party sources to get Java properly configured for your system before continuing with this installment.
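Before continuing, here is a quick, hedged way to verify from within Python that a Java executable is actually reachable on your PATH (standard library only, nothing Owlready2-specific):

import shutil
import subprocess

# shutil.which searches the PATH the same way your shell would
if shutil.which('java') is None:
    print('No java executable found; install a JVM or fix your PATH.')
else:
    subprocess.run(['java', '-version'])   # prints the Java version banner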

Test Ontology

To make sure that your system is configured properly, go ahead and shift+enter or Run this cell that enters a small example ontology from the owlready2 documentation:

from owlready2 import *

onto = get_ontology("http://test.org/onto.owl")

with onto:
    class Drug(Thing):
        def take(self): print("I took a drug")

    class ActivePrinciple(Thing):
        pass

    class has_for_active_principle(Drug >> ActivePrinciple):
        python_name = "active_principles"

    class Placebo(Drug):
        equivalent_to = [Drug & Not(has_for_active_principle.some(ActivePrinciple))]
        def take(self): print("I took a placebo")

    class SingleActivePrincipleDrug(Drug):
        equivalent_to = [Drug & has_for_active_principle.exactly(1, ActivePrinciple)]
        def take(self): print("I took a drug with a single active principle")

    class DrugAssociation(Drug):
        equivalent_to = [Drug & has_for_active_principle.min(2, ActivePrinciple)]
        def take(self): print("I took a drug with %s active principles" % len(self.active_principles))

acetaminophen   = ActivePrinciple("acetaminophen")
amoxicillin     = ActivePrinciple("amoxicillin")
clavulanic_acid = ActivePrinciple("clavulanic_acid")

AllDifferent([acetaminophen, amoxicillin, clavulanic_acid])

drug1 = Drug(active_principles = [acetaminophen])
drug2 = Drug(active_principles = [amoxicillin, clavulanic_acid])
drug3 = Drug(active_principles = [])

close_world(Drug)

Then, run the HermiT reasoner with the single command:

sync_reasoner()
* Owlready2 * Running HermiT...
java -Xmx2000M -cp C:\1-PythonProjects\Python\lib\site-packages\owlready2\hermit;C:\1-PythonProjects\Python\lib\site-packages\owlready2\hermit\HermiT.jar org.semanticweb.HermiT.cli.CommandLine -c -O -D -I file:///C:/Users/mike/AppData/Local/Temp/tmpbnxf7755
* Owlready2 * HermiT took 0.4851553440093994 seconds
* Owlready * Reparenting onto.drug2: {onto.Drug} => {onto.DrugAssociation}
* Owlready * Reparenting onto.drug1: {onto.Drug} => {onto.SingleActivePrincipleDrug}
* Owlready * Reparenting onto.drug3: {onto.Drug} => {onto.Placebo}
* Owlready * (NB: only changes on entities loaded in Python are shown, other changes are done but not listed)

The feedback you get to screen should indicate that you are 'Reparenting' the three drugs from one class (Drug) to their appropriate subclasses. By the way, you can turn off the debug reports to screen by adding an argument to the command: sync_reasoner(debug = 0).

You can also confirm this move for drug2:

print("drug2 new Classes:", drug2.__class__)
drug2 new Classes: onto.DrugAssociation

And, then, in the next three cells, confirm how you took those three drugs:

drug1.take()
I took a drug with a single active principle
drug2.take()
I took a drug with 2 active principles
drug3.take()
I took a placebo

And, last, in the next two cells discover if any inconsistent classes remain (they do not), which is equivalent to a class being assigned to the Nothing class in OWL:

list(default_world.inconsistent_classes())
[]
if Nothing in Drug.equivalent_to:
    print("Drug is inconsistent!")

General Load Method

OK, now that we have seen the HermiT reasoner is configured properly and working, we are ready to test our KBpedia knowledge graph. Go ahead and select Kernel → Restart & Clear Output from the main menu to begin the next activities from a clean slate.

Then execute what has become our standard load procedure:

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (#) out.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

HermiT Reasoner

We again invoke the HermiT reasoner:

sync_reasoner()
* Owlready2 * Running HermiT...
java -Xmx2000M -cp C:\1-PythonProjects\Python\lib\site-packages\owlready2\hermit;C:\1-PythonProjects\Python\lib\site-packages\owlready2\hermit\HermiT.jar org.semanticweb.HermiT.cli.CommandLine -c -O -D -I file:///C:/Users/mike/AppData/Local/Temp/tmpxglvdub2
* Owlready2 * HermiT took 0.42046189308166504 seconds
* Owlready * (NB: only changes on entities loaded in Python are shown, other changes are done but not listed)

There is also an optional argument, infer_property_values = True:

sync_reasoner(infer_property_values = True)
* Owlready2 * Running HermiT...
java -Xmx2000M -cp C:\1-PythonProjects\Python\lib\site-packages\owlready2\hermit;C:\1-PythonProjects\Python\lib\site-packages\owlready2\hermit\HermiT.jar org.semanticweb.HermiT.cli.CommandLine -c -O -D -I file:///C:/Users/mike/AppData/Local/Temp/tmpxkc92ws4 -Y
* Owlready2 * HermiT took 0.416165828704834 seconds
* Owlready * (NB: only changes on entities loaded in Python are shown, other changes are done but not listed)

We see that the ontology is consistent, which we can confirm with this additional command:

list(world.inconsistent_classes())
[]

Pellet Reasoner

The second of our reasoners, Pellet, operates under a similar set of arguments. We invoke Pellet through the modified reasoner command:

sync_reasoner_pellet()
* Owlready2 * Running Pellet...
java -Xmx2000M -cp C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\antlr-3.2.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\antlr-runtime-3.2.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\aterm-java-1.6.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\commons-codec-1.6.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\httpclient-4.2.3.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\httpcore-4.2.2.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jcl-over-slf4j-1.6.4.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-arq-2.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-core-2.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-iri-0.9.5.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-tdb-0.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jgrapht-jdk1.5.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\log4j-1.2.16.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\owlapi-distribution-3.4.3-bin.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\pellet-2.3.1.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\slf4j-api-1.6.4.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\slf4j-log4j12-1.6.4.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\xercesImpl-2.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\xml-apis-1.4.01.jar pellet.Pellet realize --loader Jena --input-format N-Triples --ignore-imports C:\Users\mike\AppData\Local\Temp\tmp7_rotl4_
* Owlready2 * Pellet took 1.017836093902588 seconds
* Owlready * (NB: only changes on entities loaded in Python are shown, other changes are done but not listed)

Pellet, too, is configured to run in a debug mode. If you wish, you may turn it off with sync_reasoner_pellet(debug = 0).

As with HermiT, we can also use infer_property_values. But, unlike HermiT, we may additionally set infer_data_property_values = True with Pellet:

sync_reasoner_pellet(infer_property_values = True, infer_data_property_values = True)
* Owlready2 * Running Pellet...
java -Xmx2000M -cp C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\antlr-3.2.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\antlr-runtime-3.2.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\aterm-java-1.6.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\commons-codec-1.6.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\httpclient-4.2.3.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\httpcore-4.2.2.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jcl-over-slf4j-1.6.4.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-arq-2.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-core-2.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-iri-0.9.5.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jena-tdb-0.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\jgrapht-jdk1.5.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\log4j-1.2.16.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\owlapi-distribution-3.4.3-bin.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\pellet-2.3.1.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\slf4j-api-1.6.4.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\slf4j-log4j12-1.6.4.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\xercesImpl-2.10.0.jar;C:\1-PythonProjects\Python\lib\site-packages\owlready2\pellet\xml-apis-1.4.01.jar pellet.Pellet realize --loader Jena --input-format N-Triples --infer-prop-values --infer-data-prop-values --ignore-imports C:\Users\mike\AppData\Local\Temp\tmpcr2dw8yi
* Owlready2 * Pellet took 0.6863009929656982 seconds
* Owlready * (NB: only changes on entities loaded in Python are shown, other changes are done but not listed)
list(world.inconsistent_classes())
[]

SWRL

As long as we are introducing these capabilities, we should also mention that Owlready2 supports SWRL (the Semantic Web Rule Language) 'if ... then' type statements. To the best of my knowledge, Owlready2 supports all of the standard SWRL constructs. It is also possible to mix Python and OWL code together, though these are topics we will not be addressing further in this CWPK series.
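For a flavor of the rule syntax only, here is a minimal sketch of defining a SWRL rule with Owlready2, adapted in spirit from the package's documentation; the classes and properties are a toy example, not KBpedia:

from owlready2 import *

onto = get_ontology("http://test.org/drug_rules.owl")

with onto:
    class Drug(Thing): pass
    class price(Drug >> float, FunctionalProperty): pass
    class number_of_tablets(Drug >> int, FunctionalProperty): pass
    class price_per_tablet(Drug >> float, FunctionalProperty): pass

    # An 'if ... then' rule: derive price_per_tablet from price and count
    rule = Imp()
    rule.set_as_rule("""Drug(?d), price(?d, ?p), number_of_tablets(?d, ?n),
                        divide(?r, ?p, ?n) -> price_per_tablet(?d, ?r)""")

drug = Drug(price = 10.0, number_of_tablets = 5)
sync_reasoner_pellet(infer_property_values = True, infer_data_property_values = True)
print(drug.price_per_tablet)   # expect 2.0 after the reasoner runs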

Save and Exit

When we are finished with our tests, we can File → Save and Checkpoint and Rename our output file, or specify the output file directly in the save command:

kb.save(file = 'files/kbpedia_reference_concepts-pellet.owl', format = 'rdfxml')

Additional Documentation

Here are links to appropriate Owlready2 documentation:


Posted: August 28, 2020

Now, We Open Up the Power

In our recent installments we have been looking at how to search — ultimately, of course, related to how to extract — information from our knowledge graph, KBpedia, and the various large-scale knowledge bases to which it maps, such as Wikipedia, DBpedia, and Wikidata. We’ve seen that owlready2 offers us some native search capabilities, and that we can extend that by indexing additional attributes. What is powerful about knowledge graphs, however, is that all nodes and all edges are structural from the get-go, and we can easily add meaningful structure to our searches by how we represent the pieces (nodes) and by how we relate, or connect, them using the edges.

Today’s knowledge graphs are explicit in organizing information by structure. The exact scope of this structure varies across representations, and certainly one challenge in getting information to work together from multiple locations and provenances is the diversity of these representations. Those are the questions of semantics, and, fortunately, semantic technologies and parsers give us rich ways to retrieve and relate that structure. So, great, we now have structure galore! What are we going to do with it?

Well, this structured information exists, literally, everywhere. We have huge online structured datastores, trillions of semi-structured Web pages and records, and meaningful information and analysis across a rich pastiche of hierarchies and relationships. What is clear in any attempt to solve a meaningful problem is that we need much external information as well as much grounding in our internal circumstances. Problem solving cannot be separated from obtaining and integrating meaningful information.

Thus, it is essential that we be able to query external information stores on an equivalent basis to our local ones. This equivalence requires that both internal and external sources be structured and queryable in comparable ways, which is where the W3C standards and SPARQL come in.

The Role of SPARQL

I think one can argue that the purpose of semantic technologies like RDF and OWL is to enable a machine-readable format for human symbolic information. As a result, we now have a rich suite of standards and implementations using those standards.

The real purpose, and advantage, of SPARQL is to make all of the structural aspects of a knowledge graph explicit for inspection and query. Because of this intimate relationship with the graph's structure, SPARQL is more often than not the most capable and precise language for extracting information from ontologies or knowledge graphs. SPARQL, pronounced "sparkle", is a recursive acronym for SPARQL Protocol and RDF Query Language; it has many syntactical and structural parallels with the SQL database query language.

All explicit assignments of a semantic term in RDF or OWL or their semantic derivatives can be used as a query basis in SPARQL. Thus, SPARQL is the sine qua non option for obtaining information from an ontology or knowledge graph. SPARQL is the most flexible and responsive way to manipulate a semantically structured information store.

Let’s inspect the general components of a SPARQL query specification:

Figure 1: SPARQL Query Specification

This figure is from Lee Feigenbaum’s SPARQL slides, included with other useful links under the Additional Documentation below.

Note that every SPARQL query gets directed to a specific endpoint, where access to the underlying RDF datastore takes place. These endpoints can be either local or accessed via the Web, with both examples shown below. In a standalone query, the dataset to be queried can be indicated by the FROM keyword. In our examples using RDFLib via Owlready2, these locations are set to a Python object.
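Since the figure itself does not reproduce here, a generic anatomy of a SPARQL query, expressed as the kind of Python multi-line string we use throughout, looks roughly like this (all names are placeholders):

query_skeleton = """
PREFIX ex: <http://example.org/ns#>          # prologue: abbreviate long IRIs
SELECT ?subject ?label                       # result clause
FROM <http://example.org/graph>              # dataset clause (optional)
WHERE {                                      # graph pattern the data must match
  ?subject a ex:Thing ;
           ex:label ?label .
}
ORDER BY ?label                              # solution modifiers
LIMIT 10
"""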

Extended Startup

Let’s start again with the start-up script we used in the last installment, only now also opening rdflib and relating its namespace graph to the world namespace of KBpedia.

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (#) out.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

import rdflib

graph = world.as_rdflib_graph()

We could have put the import statement for the RDFLib package at the top, but anywhere prior to formatting the query is fine.

We now may manipulate the knowledge graph in the standard way using (in this case) the world namespace for owlready2, and access all of the additional functionality available via RDFLib using (in this case) the graph namespace. This is a great example of the Python ecosystem at work.

Further, because of even greater integration, some native commands in Owlready2 have been mapped to RDFLib, making the syntax and conventions of working with both libraries easier.

Basic SPARQL Forms

In the last installment we presented two wrinkles for how to express SPARQL queries to your local datastore. The first form, as I noted, looks closer to the standard SPARQL expression shown in Figure 1:

form_1 = list(graph.query_owlready("""
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel  ?label. 
  }
"""))

print(form_1)
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]
* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:
http://kbpedia.org/kko/rc/Person
http://kbpedia.org/kko/rc/HomoSapiens

The query above produces a warning message we can ignore, and lists all of the direct subclasses of Mammal in KBpedia.

The last installment also offered a second form, which is the one I will be using hereafter. I am doing so because this form, and its further abstraction, is a more repeatable approach. In general, the advantage is that we can take this format and abstract it into a 'wrapper' that encapsulates the mechanics of making the SPARQL call, separate from the actual SPARQL specification. We will increasingly touch on these topics, but for now this is the format we will take:

form_2 = """
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel  ?label. 
  }
"""

results = list(graph.query_owlready(form_2))
print(results)
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]

These two examples cover how to access the local datastore.
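To make the 'wrapper' idea mentioned above concrete, here is a minimal sketch of what such an abstraction might look like. The function name run_query is hypothetical, not part of Owlready2 or of the later CWPK modules:

def run_query(graph, query):
    # Run a SPARQL query string against the local RDFLib graph wrapper
    return list(graph.query_owlready(query))

# Usage with the form_2 query defined above:
# results = run_query(graph, form_2)
# print(results)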

External SPARQL Examples

We really like what we have seen with the SPARQL querying of the internal datastore using RDFLib within Owlready2. But what of querying outside sources? (And, would it not be cool to be able to mix-and-match internal and external stuff?)

As we try to use RDFLib as-is against external SPARQL endpoints, we quickly see that we are not adequately identifying ourselves to and talking with these sites. Well, we have been here before: the nature of Python and its packages and dependencies often requires adding another capability.

Some quick poking turns up that we are lacking an HTTP-aware 'wrapper' for external sites. We find a promising package in sparqlwrapper. We discover it is on conda-forge, so we back out of the system and add the package at the command line:

$ conda install sparqlwrapper

We again get the feedback to the screen as the Anaconda configuration manager does its thing. When finally installed and the prompt returns, we again load up Jupyter Notebook and return to this notebook page.

We are now ready to try our first external example, this time to Wikidata, after we import SPARQLwrapper and set our endpoint target to Wikidata (https://query.wikidata.org/sparql):

from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

sparql.setQuery("""
  PREFIX schema: <http://schema.org/>
  SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {
  VALUES ?item { wd:Q25297630
  wd:Q537127
  wd:Q16831714
  wd:Q24398318
  wd:Q11755880
  wd:Q681337
}
  ?item wdt:P910 ?subClass.

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results)
{'head': {'vars': ['item', 'itemLabel', 'wikilink', 'itemDescription', 'subClass', 'subClassLabel']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q537127'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8667674'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'road bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge that carries road traffic'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Road bridges'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q11755880'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8656043'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'residential building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building mainly used for residential purposes'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Residential buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q16831714'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q6259373'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'government building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building built for and by the government, such as a town hall'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Government buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q24398318'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q5655238'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'religious building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building intended for religious worship or other activities related to a religion; ceremonial structures that are related to or concerned with religion'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Religious buildings and structures'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q25297630'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7344076'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'international bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge built across a geopolitical boundary'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:International bridges'}}]}}

Great! It works, and we have our first information retrieval from an external site!

Let me point out a couple of things about this format. First, the endpoint already has some built-in prefixes (wd: and wdt:) so we did not need to declare them in the query header. Second, there are some unique query capabilities of the Wikidata site noted by the SERVICE designation.

When first querying a new site it is perhaps best to stick to vanilla forms of SPARQL, but as one learns more it is possible to tailor queries more specifically. We also see that our setup will allow us to take advantage of what each endpoint gives us.

So, let’s take another example, this one using the DBpedia endpoint, to show how formats may also differ from endpoint to endpoint:

from SPARQLWrapper import SPARQLWrapper, RDFXML
from rdflib import Graph

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX schema: <http://schema.org/>

    CONSTRUCT {
      ?lang a schema:Language ;
      schema:alternateName ?iso6391Code .
    }
    WHERE {
      ?lang a dbo:Language ;
      dbo:iso6391Code ?iso6391Code .
      FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values
    }
""")

sparql.setReturnFormat(RDFXML)
results = sparql.query().convert()
print(results.serialize(format='xml'))

Notice again how the structure of our query code is quite patterned. We also see in the two examples how we can specify different query result serializations (JSON and RDFXML in these examples) for our result sets.
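For reference, SPARQLWrapper ships constants for the common return formats; which ones actually work depends on the endpoint and the query type (SELECT vs. CONSTRUCT), so treat this as a starting point rather than a guarantee:

from SPARQLWrapper import SPARQLWrapper, JSON, XML, N3, RDFXML

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)   # or XML, N3, RDFXML, depending on the query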

Additional Documentation

A SPARQL tutorial is outside the defined scope of this CWPK series. But the power of SPARQL is substantial, and it is well worth the time to learn more about this flexible language, which reminds one of SQL in many ways but has its own charms and powers. Here are some great starting links about SPARQL:


Posted: August 27, 2020

It’s Time to Add a New Semantic Tool to the Toolbox

In CWPK #17 of this Cooking with Python and KBpedia series, we discussed what we would need in an API to OWL. Our work so far with owlready2 continues to be positive, leading us to believe it will prove to be the right API solution for our objectives. But in that same CWPK #17 review we also indicated intrigue with the RDFLib option. We know there are some soft spots in owlready2, such as format support, where RDFLib is strong. It is also the case that owlready2 lacks a SPARQL query option, another area in which RDFLib is strong. In fact, the data exchange methods we use in KBpedia rely directly on simple variants of RDF, especially the N3 notation.

In recognition of these synergies, just as it had in embracing SQLite as a lightweight native quad store, owlready2 has put in place many direct relations to RDFLib, including in the data store. What I had feared would be a difficult challenge of integrating Python, Anaconda, Jupyter Notebook, owlready2, and RDFLib turned out in fact to be a very smooth process. We introduce the newest RDFLib piece in today's installment.

RDFLib is a Python library for working with the Resource Description Framework (RDF) language. It has been actively maintained for over 15 years and is presently in version 5.x. RDFLib is particularly strong in the areas of RDF format support, SPARQL querying of endpoints (including local stores), and CSV file functionality. Our hope in incorporating RDFLib is to provide the most robust RDF/OWL platform available in Python.

Installing RDFLib

Enter this at the command prompt:

$ conda install rdflib

You will see the standard feedback to the terminal that the package is being downloaded and then integrated with the other packages in the system. The simple install command is possible because we had already installed conda-forge as a channel within the Anaconda distribution system for Python as described in CWPK #9.

We are now ready to use RDFLib.

Basic Setup

OK, so we steer ourselves to the 24th installment in the CWPK directory and we fire up the system by invoking the command window from this directory. We enter $ jupyter notebook at the prompt and then proceed through the Jupyter file manager to this cwpk-24-intro-rdflib.ipynb file. We pick it, and then enter our standard set of opening commands to KBpedia:

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, use this address for kbpedia_reference_concepts.owl
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

We could have done this first, but we need to import the RDFLib package into our active environment:

import rdflib

Depending on our use of RDFLib going forward, we could restrict this import to only certain modules in the package, but we load it all in this case.
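For instance, a narrower import might look like the following; which modules you actually need depends on later use, and these names are just common RDFLib entry points:

# Import only selected RDFLib classes rather than the whole package
from rdflib import Graph, Namespace, URIRef, Literal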

Now, here is where the neat trick used by owlready2 in working with RDFLib comes into play. Owlready2's quad store is backed by SQLite, and that same store can be exposed directly to RDFLib. So, we create the graph object (it could be any name) that RDFLib expects, but we assign it to the namespace (in this case, world) already recognized by owlready2:

graph = world.as_rdflib_graph()

We now may manipulate the knowledge graph in the standard way using (in this case) the world namespace for owlready2, and access all of the additional functionality available via RDFLib using (in this case) the graph namespace. This is a great example of the Python ecosystem at work.

Further, because of even greater integration, some native commands in owlready2 have been mapped to RDFLib, making the syntax and conventions of working with both libraries easier.

Initial SPARQL Examples

Of course, the reason we brought RDFLib into the picture at this point was to continue our exploration of querying the knowledge graph that began in our last installment, CWPK #23. We devote the next installment to a discussion of SPARQL queries in some depth, but let’s first test to see if our configuration is working properly.

In our first of two examples we present a fairly simple query in SPARQL format to our internal KBpedia reference concept store under the namespace graph.

r = list(graph.query_owlready("""
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel  ?label. 
  }
"""))

print(r)
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]

The above format looks more akin to a standard SPARQL query. The example below, while a bit different, is a more Python-like expression. Note as well that the triple-quote convention tells Python to expect a multi-line string:

r = """
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel  ?label. 
  }
"""

results = list(graph.query_owlready(r))
print(results)
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]

Additional Documentation

In the next installment we will provide SPARQL documentation. Here, however, are a couple of useful links to learn more about RDFLib and its capabilities:


Posted: August 26, 2020

Using the Direct Approach with Owlready2

In this installment of the Cooking with Python and KBpedia series, we explore ways to directly search knowledge graph text from within the owlready2 API. We first introduced this topic in CWPK #19; we explain further some of the nuances here.

Recall that owlready2 uses its own local datastore, SQLite, for storing its knowledge graphs. Besides the search functionality added in Owlready2, we will also be taking advantage of the full-text search (FTS) functionality within SQLite.

Load Full Knowledge Graph

To get started, we again load our working knowledge graph. In this instance we will use the full KBpedia knowledge graph, kbpedia_reference_concepts.owl, because it has a richer set of contents.

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, use this address for kbpedia_reference_concepts.owl
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/owlready2/kg/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

To execute the load, pick shift+enter to execute the cell contents, or pick Run from the main menu.

Besides changing our absolute file input, note we have added another scoping assignment, world, to our load. In Owlready2, a World encompasses the SQLite storage space the package uses. Note we assign all of our ontologies (knowledge graphs) to this namespace so that we may invoke some of the FTS functionality later in this installment.

Basic Search Functions

As the owlready2 documentation explains, it contains some pre-loaded search capabilities that can be performed with the .search() query method. This method can accept one or several keyword arguments:

  • iri – for searching entities by their full IRIs
  • type – for searching instances for a given class
  • subclass_of – for searching subclasses of a given class
  • is_a – for searching both instances and subclasses of a given class, or object, data or annotation property name.

Special arguments that may be added to these arguments are:

  • _use_str_as_loc_str – whether to treat plain Python strings as strings in any language (default is True)
  • _case_sensitive – whether to take lower/upper case into consideration (default is True).

Our search queries may accept quoted phrases and prefix or suffix wildcards (*). Let’s look at some examples combining these arguments and methods. Our first one is similar to what we presented in CWPK #19:

world.search(iri = "*luggage*")
[]

Notice our result here is an empty set, in other words, no matches. Yet we know there are IRIs in KBpedia that include the term ‘luggage’. We suspect the reason for not seeing a match is that the term might start with upper case in our IRIs. We will set the case sensitivity argument to false and try again:

world.search(iri = "*luggage*", _case_sensitive = False)

Great! We are now seeing the results we expected.

Note in the query above that we used the wildcard (*) to allow for either prefix or suffix matches. As you can see from the results above, most of the search references match the interior part of the IRI string.

The iri argument takes a search string as its assignment. The other three keyword assignments noted above take an object name, as this next example shows:

world.search(subclass_of=rc.Mammal)

We get a tremendous number of matches on this query, so much so that I cleared away the current cell output (via Cell → Current Outputs → Clear, when highlighting this cell). To winnow this results set further, we can combine search terms as the next example shows. We will add to our initial search a string search in the IRIs for which prior results might represent ‘Bats’:

world.search(subclass_of=rc.Mammal, iri = "*Bat*")
[rc.Bat-Mammal, rc.SacWingedBat, rc.BulldogBat, rc.FreeTailedBat, rc.HorseshoeBat, rc.SchreibersBat, rc.WesternSuckerFootedBat, rc.AfricanLongFingeredBat, rc.AfricanYellowBat, rc.AllensYellowBat, rc.AsianPartiColoredBat, rc.DaubentonsBat, rc.EasternRedBat, rc.GreatEveningBat, rc.GreaterTubeNosedBat, rc.GreyLongEaredBat, rc.HawaiianHoaryBat, rc.KobayashisBat, rc.LesserYellowBat, rc.LittleTubeNosedBat, rc.NewGuineaBigEaredBat, rc.NewZealandLongTailedBat, rc.NorthernLongEaredBat, rc.PallidBat, rc.SilverHairedBat, rc.SpottedBat, rc.AfricanSheathTailedBat, rc.AmazonianSacWingedBat, rc.BeccarisSheathTailedBat, rc.ChestnutSacWingedBat, rc.DarkSheathTailedBat, rc.EcuadorianSacWingedBat, rc.EgyptianTombBat, rc.FrostedSacWingedBat, rc.GraySacWingedBat, rc.GreaterSacWingedBat, rc.GreaterSheathTailedBat, rc.GreenhallsDogFacedBat, rc.HamiltonsTombBat, rc.HildegardesTombBat, rc.LargeEaredSheath-TailedBat, rc.LesserSacWingedBat, rc.LesserSheathTailedBat, rc.MauritianTombBat, rc.NorthernGhostBat, rc.PacificSheathTailedBat, rc.PelsPouchedBat, rc.PeterssSheathTailedBat, rc.ProboscisBat, rc.RaffraysSheathTailedBat, rc.SerisSheathtailBat, rc.SeychellesSheathTailedBat, rc.ShaggyBat, rc.ShortEaredBat, rc.SmallAsianSheathTailedBat, rc.TheobaldsTombBat, rc.ThomassSacWingedBat, rc.TroughtonsPouchedBat, rc.AntilleanFruitEatingBat, rc.BidentateYellowEaredBat, rc.BigEaredWoolyBat, rc.CommonVampireBat, rc.HairyLeggedVampireBat, rc.HonduranWhiteBat, rc.LesserLongNosedBat, rc.MexicanLongNosedBat, rc.SpectralBat, rc.VampireBat, rc.WhiteWingedVampireBat, rc.GreaterBulldogBat, rc.LesserBulldogBat, rc.BigCrestedMastiffBat, rc.BigFreeTailedBat, rc.BlackBonnetedBat, rc.BroadEaredBat, rc.EuropeanFreeTailedBat, rc.GallaghersFreeTailedBat, rc.IncanLittleMastiffBat, rc.LittleGoblinBat, rc.MexicanFreeTailedBat, rc.NatalFreeTailedBat, rc.PetersWrinkleLippedBat, rc.SumatranMastiffBat, rc.WroughtonsFreeTailedBat, rc.LesserFalseVampireBat, rc.YellowWingedBat, rc.BigNakedBackedBat, rc.ParnellsMustachedBat, rc.NewZealandGreaterShortTailedBat, rc.NewZealandLesserShortTailedBat, rc.BatesSlitFacedBat, rc.LargeSlitFacedBat, rc.DayakFruitBat, rc.LittleMarianaFruitBat, rc.LivingstonesFruitBat, rc.MarianaFruitBat, rc.PetersDiskWingedBat, rc.Bat-earedFox, rc.AboBat, rc.JapaneseHouseBat, rc.BigBrownBat, rc.NorthernBat, rc.GrayBat, rc.GreaterMouseEaredBat, rc.HodgsonsBat, rc.IkonnikovsBat, rc.IndianaBat, rc.LittleBrownBat, rc.NatterersBat, rc.GreaterNoctuleBat, rc.VirginiaBigEaredBat, rc.GreaterHorseshoeBat, rc.LamottesRoundleafBat, rc.LesserHorseshoeBat, rc.MalayanRoundleafBat, rc.VietnamLeafNosedBat, rc.PersianTridentBat, rc.BondaMastiffBat, rc.VelvetyFreeTailedBat, rc.WesternMastiffBat, rc.MexicanFunnelEaredBat, rc.AndersensFruitEatingBat, rc.JamaicanFruitBat]

Again, we get a large number of results. There are clearly many mammals and bats within the KBpedia reference graph!

Per the listing above, there are a number of these pre-configured search arguments directly available through Owlready2.

We can also instruct the FTS system in SQLite that we want to index still additional fields. Since we are interested in a term we know occurs in KBpedia’s annotations relating some reference concepts to the UN standard products and services codes (UNSPSC) we try that search directly:

world.search(entered = "*UNSPSC*")
[]

Hmm, this tells us there are no results. We must be missing an indexed field. So, let’s instruct the system to add indexing to the definition property where we suspect the reference may occur. We do so using the .append method to add a new field for our RC definitions (skos.definition) to the available FTS index structure:

world.full_text_search_properties.append(skos.definition)

Since this call simply updates the FTS index structure, when we Run the cell we get no results output.

However, that assignment now allows us to invoke the internal FTS (full-text search) argument:

world.search(definition = FTS("UNSPSC*"))

If you get an ‘operational error’ that means you did not Run the .append instruction above.

Like some of the other listings, this command results in a very large number of results, a couple of which are warnings we can ignore, so we again Clear the Cell. We can get a smaller listing with another keyword search, this time for the wildcarded ‘gear*’ search:

world.search(definition = FTS("gear*"))
[rc.undercarriage, rc.number-of-forward-gears, rc.vehicle-transmission, rc.AutomaticTransmission, rc.BearingBushingWheelGear, rc.BevelGear, rc.Bicycle-MultiGear, rc.BoeingAH-64Apache, rc.BugattiVeyron, rc.ChildrensWebSite, rc.CombatSportsEvent, rc.Commercialism, rc.CyclingClothing, rc.Device-FunctionallyDefective, rc.FirstNorthAmericansNovels, rc.Fishery, rc.FreeDiving, rc.Game-EquipmentSet, rc.Gear, rc.GearManufacturingMachine, rc.Gearing-Mechanical, rc.GearlessElectricDrive, rc.Goggles, rc.Harness-Animal, rc.Helmet, rc.IlyushinIl-30, rc.LandingGearAssembly, rc.MachineProtocol, rc.Mechanism-Technology, rc.Overdrive-Mechanics, rc.PinionGear, rc.ProtectiveEquipment-Human, rc.ProtectiveGear, rc.ScubaGear, rc.ScubaSnorkelingGear, rc.ShockAndAwe-MilitaryTactic, rc.Supercharger, rc.TeacherTrainingProgram, rc.Trek, rc.Wheel, rc.WildernessBackpacking, rc.WormGear]

Notice in this search that we are able to use the suffix wildcard (*) character. However, unlike the standard Owlready2 search, we are not able to use a wildcard (*) prefix search.

Since we have added a new indexed search table to our system, we may want to retain this capability. So, we decide to save the entire graph to the database, as the last example shows:

world.set_backend(filename = 'cwpk-23-text-searching-kbpedia.db', exclusive = False)

This now means our database has been saved persistently to disk.

If you run this multiple times you may get an operational error since you have already set the backend filename.

We can then .save() our work and exit the notebook.

world.save()
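As a closing aside, a quad store persisted this way can be re-opened directly in a later session. A minimal sketch, assuming the same filename used in the set_backend call above:

from owlready2 import *

# Re-open the saved SQLite quad store, including its FTS index,
# without re-parsing the original OWL files
world = World(filename = 'cwpk-23-text-searching-kbpedia.db')
print(list(world.ontologies))   # IRIs of the ontologies stored in the backend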

Additional Documentation

Here is additional information on the system’s text searching capabilities:

