We Continue the Theme of Structural Extraction
In this installment of the Cooking with Python and KBpedia series, we continue the theme of extracting the structural backbone of KBpedia. Our attention shifts now from classes to properties, the predicates found in the middle of a subject – predicate – object semantic triple. An s-p-o triple is the basic assertion in the RDF and OWL languages.
There are three types of predicate properties in OWL. Object properties relate a subject to another named entity, one which may be found at an IRI address, local or on the Web. Data properties are a value characterization of the subject, and may be represented by strings (labels or text) or date, time, location, or numeric values. Data properties are represented by datatypes, not IRIs. Annotation properties are pointers or descriptors to the subject and may be either a datatype or an IRI, but there is no reasoning over annotations across subjects. All property types can be represented in hierarchies using a subPropertyOf predicate similar to subClassOf for classes.
In addition, KBpedia uses a triadic split of predicates based on the universal categories of Charles Sanders Peirce. These map fairly closely to the OWL splits, but with some minor differences (not important to our current processing tasks). Representations are pointers, indicators, or descriptors to the thing at hand, the subject. These map closely to the OWL annotation properties. Attributes are the characterizations of the subject, intensional in nature, and are predominantly data properties (though it is not a violation to assign object properties where a value is one of an enumerated list). Direct relations are extensional relations between two entities, where the predicate in s-p-o must be an object property and the object is an entity represented by an IRI.
In most of today’s practice subPropertyOf is little used, though KBpedia is becoming active in exploring this area. In terms of semantic inheritance, properties are classes, though with important distinctions. Object and data properties may have functional roles, restrictions as to the size and nature of their sets, and specifications as to what types of subject they may represent (domain) or what type of object they may connect (range).
Though supporting these restrictions, owlready2 has less robust support for properties than classes. In the last installment’s work on the class backbone we saw the advantage of the .descendants() method for collecting children or grandchildren throughout the subsumption tree of class descent. Owlready2 does not document or expose this method for properties, but with properties a sub-class of class in the owlready2 code, I found I could use many of the class methods. Woohoo!
What I outline below is a parallel structure extraction to what we saw in the last installment regarding classes. In the next installment we will transition from structure extraction to annotation extraction.
Starting and Load
We begin with our standard opening routine:
    main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
    # main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
    skos_file = 'http://www.w3.org/2004/02/skos/core'
    kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
    # kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

    from owlready2 import *
    world = World()
    kb = world.get_ontology(main).load()
    rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

    skos = world.get_ontology(skos_file).load()
    kb.imported_ontologies.append(skos)
    core = world.get_namespace('http://www.w3.org/2004/02/skos/core#')

    kko = world.get_ontology(kko_file).load()
    kb.imported_ontologies.append(kko)
    kko = kb.get_namespace('http://kbpedia.org/ontologies/kko#')
Again, we execute each cell as we progress down this notebook page by pressing shift+enter for the highlighted cell or by choosing Run from the notebook menu.
Let’s first begin by inspecting the populated lists of our three types of properties, beginning with object (prefix po_), and then data (prefix pd_) and annotation (prefix pa_) properties, checking the length for the number of records as well:
    po_set = list(world.object_properties())
    list(po_set)

    pd_set = list(world.data_properties())
    list(pd_set)

    pa_set = list(world.annotation_properties())
    list(pa_set)
You may want to Cell → All Output → Clear to remove these long listings from your notebook.
Getting the Subsets Right
When we inspect these lists, however, we see that many of the predicates are ‘standard’ ones that we have in our core KBpedia Knowledge Ontology (see the KKO image). Recall that our design has us nucleating our knowledge graph build efforts with a starting ontology. In KBpedia’s case that is KKO.
Now we could just build all of the properties each time from scratch. But, similar to our typology design for a modular class structure, we very much like our more direct mapping of predicates to Peirce’s universal categories.
So, we test whether we can use the same .descendants() approach we used in the prior installment, only now applied to properties. In the case of annotation properties, the root corresponds to our kko.representations predicate. So, we test this:
    root = kko.representations
    pa_set = root.descendants()
    len(pa_set)
We can see that we dropped 11 predicates that were in our first approach.
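To see which predicates dropped out, not just how many, a set difference is enough. The helper below is a minimal sketch with a hypothetical name (dropped_members), and plain strings stand in for the property objects so that it runs on its own. In the notebook, the first argument would be the earlier list(world.annotation_properties()) and the second the result of root.descendants():

```python
def dropped_members(full_listing, rooted_subset):
    """Return the items of full_listing that are missing from rooted_subset."""
    # key=str gives a stable order whether the items are strings or entities.
    return sorted(set(full_listing) - set(rooted_subset), key=str)

# Hypothetical stand-in values; real runs would pass owlready2 property objects.
all_annotations = ["demo.noteA", "demo.noteB", "core.labelLike", "demo.root"]
rooted = {"demo.root", "demo.noteA", "demo.noteB"}

print(dropped_members(all_annotations, rooted))
```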
We can list the set and verify that nearly all of our descendant properties are indeed in the reference concept (rc) namespace (we will address the minor exceptions in some installments to come), so we have successfully separated our additions from the core KKO starting point:
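One way to run that check without reading the whole listing is to partition the set by IRI prefix. This is a sketch only: split_by_namespace is a hypothetical helper, the stand-in objects carry just an iri attribute (which owlready2 entities also have), and the demoNote IRI is invented. The rc base IRI is the one used in the opening routine:

```python
from types import SimpleNamespace

RC_BASE = "http://kbpedia.org/kko/rc/"  # rc namespace from the opening routine

def split_by_namespace(entities, base_iri):
    """Partition entities into (inside, outside) by whether their IRI starts with base_iri."""
    inside, outside = [], []
    for entity in entities:
        (inside if entity.iri.startswith(base_iri) else outside).append(entity)
    return inside, outside

# Hypothetical stand-ins; in the notebook, pass pa_set instead.
sample = [
    SimpleNamespace(iri="http://kbpedia.org/kko/rc/demoNote"),
    SimpleNamespace(iri="http://kbpedia.org/ontologies/kko#representations"),
]

in_rc, not_rc = split_by_namespace(sample, RC_BASE)
print(len(in_rc), "in rc;", len(not_rc), "outside rc")
```

Anything left in the second list is a candidate exception worth a closer look.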
Since I like keeping the core ontology design idea, I will continue to use this more specific way to set the roots for KBpedia properties for these extraction routines. It adds a few more files to process down the road, but it can all be automated and I am better able to keep the distinction between KKO and the specific classes and properties that populate it for the domain at hand. It does mean that all new properties introduced to the system must be made an rdfs:subPropertyOf of one of the tie-in roots, but that also enforces the explicit treatment of new properties in relation to the Peircean universal categories.
Under this approach, the root for annotation properties is kko.representations as noted. For object properties, the root is kko.predicateProperties. (The other two main branches are skos.skosProperties, which we consider central to KKO.) For data properties, the root is kko.predicateDataProperties. The other data properties are also built in to KKO.
If one wanted to adopt the code base in this CWPK series for other purposes, perhaps with a different core or bootstrap, other design choices could be made. But this approach feels correct for the design and architecture of KBpedia.
Iterating Sub Properties
Now that we have decided this scope question, let’s try the final code block from the last installment (also based on is_a) to see if and how it works in the property context. We make two changes to the last installment routine in that we now specify the rdfs:subPropertyOf property and replace our iterated set with pa_set:
    o_frag = set()
    s_frag = set()
    p_item = 'rdfs:subPropertyOf'
    for s_item in pa_set:
        o_set = s_item.is_a
        for o_item in o_set:
            print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='')
            o_frag.add(o_item)
        s_frag.add(s_item)
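If the goal is a file rather than screen output, the same walk over is_a can feed Python's csv module. The following is a sketch under assumptions, not the routine this series uses: subproperty_rows and StandIn are hypothetical names, and StandIn merely mimics the two things the loop needs from an owlready2 property (a printable form and an is_a list), so the example runs without loading KBpedia:

```python
import csv
import io

class StandIn:
    """Hypothetical stand-in for a property: a printable name plus an is_a list."""
    def __init__(self, name, is_a=()):
        self.name = name
        self.is_a = list(is_a)

    def __str__(self):
        return self.name

def subproperty_rows(properties, predicate="rdfs:subPropertyOf"):
    """Yield one (subject, predicate, object) row per parent found in is_a."""
    for s_item in properties:
        for o_item in s_item.is_a:
            yield (str(s_item), predicate, str(o_item))

root = StandIn("kko.representations")
child = StandIn("rc.demoNote", [root])   # rc.demoNote is an invented name

buffer = io.StringIO()                   # swap for open(path, "w", newline="") to write a file
csv.writer(buffer).writerows(subproperty_rows([root, child]))
print(buffer.getvalue())
```

Because the helper only touches is_a and str(), it should accept pa_set directly in the notebook.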
Great, again! Our prior logic is directly transferable. The nice thing about this code applied to properties is that we also get the specifications for creating a new property, useful when roundtripping the information for build routines.
So, we clear out the currently active cell and are ready to move on. But first, I made some nice discoveries in working out today’s installment, so I will end with a couple of tips.
Bonus Tip #1
While doing the research for this installment, I came across a nifty method within owlready2 for controlling how these extraction retrievals display, with or without full IRIs or namespaces. First, run the original script for listing the pa_set above. Then, for contrast, try these two options:
    def render_using_label(entity):
        return entity.label.first() or entity.name

    set_render_func(render_using_label)
    list(pa_set)

    def render_using_iri(entity):
        return entity.iri

    set_render_func(render_using_iri)
    list(pa_set)
These two suggestions came from the owlready2 documentation. After trying them, though, I wanted to get back to the original (default) formatting, and the documentation is silent on how to do that. After poking through the code a bit, I found this initialization method for returning to the default. Again, try it:
Bonus Tip #2
Here is a nice method for getting a listing of all of the properties applied to a given class: