Posted: September 3, 2020

CWPK #29: Extracting Object and Data Properties

We Continue the Theme of Structural Extraction

In this installment of the Cooking with Python and KBpedia series, we continue the theme of extracting the structural backbone of KBpedia. Our attention shifts now from classes to properties, the predicates found in the middle of a subject – predicate – object semantic triple. An s-p-o triple is the basic assertion in the RDF and OWL languages.

There are three types of predicate properties in OWL. Object properties relate a subject to another named entity, one which may be found at an IRI address, local or on the Web. Data properties are a value characterization of the subject, and may be represented by strings (labels or text) or date, time, location, or numeric values. Data properties are represented by datatypes, not IRIs. Annotation properties are pointers or descriptors to the subject and may be either a datatype or an IRI, but there is no reasoning over annotations across subjects. All property types can be represented in hierarchies using a subPropertyOf predicate similar to subClassOf for classes.
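To make the three property types concrete, here is a minimal pure-Python sketch that does not use owlready2. The `IRI` wrapper, the `ex:` names, and the `object_kind` helper are all hypothetical, introduced only for illustration. It shows that what separates an object property assertion from a data property assertion is the kind of thing sitting in the object position of the triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for a named entity identified by an IRI."""
    value: str


def object_kind(obj):
    """Return 'iri' if the object is a named entity, otherwise 'literal'."""
    return "iri" if isinstance(obj, IRI) else "literal"


# Hypothetical s-p-o triples; none of these names come from KBpedia.
triples = [
    (IRI("ex:Fido"), "ex:hasOwner", IRI("ex:Alice")),  # object property: object is an IRI
    (IRI("ex:Fido"), "ex:weightKg", 12.5),             # data property: object is a typed value
    (IRI("ex:Fido"), "ex:nickname", "Fido"),           # annotation-style label: a plain string
]

kinds = [object_kind(obj) for _subject, _predicate, obj in triples]
print(kinds)
```

Because an annotation may carry either a literal or an IRI, the object position alone cannot identify an annotation property. That distinction comes from how the property itself is declared.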

In addition, KBpedia uses a triadic split of predicates based on the universal categories of Charles Sanders Peirce. These map fairly closely with the OWL splits, but with some minor differences (not important to our current processing tasks). Representations are pointers, indicators, or descriptors to the thing at hand, the subject. These map closely to the OWL annotation properties. Attributes are the characterizations of the subject, intensional in nature, and are predominantly data properties (though it is not a violation to assign object properties where a value is one of an enumerated list). Direct relations are extensional relations between two entities, where the object in s-p-o must be an object property, represented by an IRI.

In most of today’s practice subPropertyOf is little used, though KBpedia is becoming active in exploring this area. In terms of semantic inheritance, properties are classes, though with important distinctions. Object and data properties may have functional roles, restrictions as to the size and nature of their sets, and specifications as to what types of subject they may represent (domain) or what type of object they may connect (range).
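The idea of domain and range can be sketched in plain Python as a pair of type checks. Everything here is an assumption made for illustration: the `Animal` and `Person` classes, the `PROPERTY_SPECS` table, and the `conforms` helper are hypothetical and are not part of owlready2 or KBpedia. The sketch also simplifies by treating domain and range as validation rules, whereas an OWL reasoner uses them to infer the types of the subject and object.

```python
class Animal:
    """Hypothetical subject type."""


class Person:
    """Hypothetical object type."""


# Hypothetical property specifications: allowed subject type (domain)
# and allowed object type (range) for each predicate.
PROPERTY_SPECS = {
    "hasOwner": {"domain": Animal, "range": Person},  # object-property style
    "weightKg": {"domain": Animal, "range": float},   # data-property style
}


def conforms(subject, predicate, obj):
    """Check one assertion against the declared domain and range."""
    spec = PROPERTY_SPECS[predicate]
    return isinstance(subject, spec["domain"]) and isinstance(obj, spec["range"])


fido, alice = Animal(), Person()
print(conforms(fido, "hasOwner", alice))   # subject and object match the spec
print(conforms(alice, "hasOwner", fido))   # subject and object are swapped
print(conforms(fido, "weightKg", 12.5))    # literal value in the range
```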

Though supporting these restrictions, owlready2 has less robust support for properties than classes. In the last installment’s work on the class backbone we saw the advantage of the .descendants() method for collecting children or grandchildren throughout the subsumption tree of class descent. Owlready2 does not document or expose this method for properties, but because properties are a sub-class of class in the owlready2 code, I found I could use many of the class methods. Woohoo!

What I outline below is a parallel structure extraction to what we saw in the last installment regarding classes. In the next installment we will transition from structure extraction to annotation extraction.

Starting and Load

We begin with our standard opening routine:

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (#) out.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')               

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)
core = world.get_namespace('http://www.w3.org/2004/02/skos/core#')

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
kko = kb.get_namespace('http://kbpedia.org/ontologies/kko#')

Again, we execute each cell as we progress down this notebook page by pressing shift+enter for the highlighted cell or by choosing Run from the notebook menu.
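The choice between local files and the remote copies could also be made in code rather than by commenting lines in and out. The following is only a sketch: `pick_source` is a hypothetical helper, not part of owlready2 or the CWPK code base, and the path and URL in the usage lines are placeholders.

```python
import os


def pick_source(local_path, remote_url):
    """Return the local path if that file exists, otherwise the remote URL."""
    return local_path if os.path.isfile(local_path) else remote_url


# Placeholder values; substitute your own directory and the matching remote file.
source = pick_source("/no/such/dir/example.owl", "https://example.org/example.owl")
print(source)
```

The returned string could then be passed to the load routine in place of a hard-coded location.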

Let’s begin by inspecting the populated lists of our three types of properties, starting with object (prefix po_), and then data (prefix pd_) and annotation (prefix pa_) properties, checking the length of each for the number of records as well:

po_set = list(world.object_properties())
list(po_set)
len(po_set)
1309
pd_set = list(world.data_properties())
list(pd_set)
len(pd_set)
pa_set = list(world.annotation_properties())
list(pa_set)
len(pa_set)

You may want to Cell → All Output → Clear to remove these long listings from your notebook.

Getting the Subsets Right

When we inspect these lists, however, we see that many of the predicates are ‘standard’ ones that we have in our core KBpedia Knowledge Ontology (see the KKO image). Recall that our design has us nucleating our knowledge graph build efforts with a starting ontology. In KBpedia’s case that is KKO.

Now we could just build all of the properties each time from scratch. But, similar to our typology design for a modular class structure, we very much like our more direct mapping of predicates to Peirce’s universal categories.

So, we test whether we can use the same .descendants() approach we used in the prior installment, only now applied to properties. In the case of annotation properties, the root corresponds to our kko.representations predicate. So, we test this:

root = kko.representations
pa_set=root.descendants()

len(pa_set)

We can see that we dropped 11 predicates that were in our first approach.
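Why rooting the extraction at one property drops others can be illustrated with a small pure-Python traversal. This is a sketch under stated assumptions, not owlready2's implementation: only the root name kko.representations comes from this installment, while the other property names, the `PARENT_OF` table, and the `descendants` function are invented for the example. The root is included in the result, which is understood to match the default behavior of owlready2's method.

```python
from collections import defaultdict

# Hypothetical sub-property assertions (child -> parent). Only the root name
# kko.representations comes from the article; the rest are invented.
PARENT_OF = {
    "rc.nickname": "kko.representations",
    "rc.shortName": "rc.nickname",
    "rc.caption": "kko.representations",
    "ex.coreNote": "ex.coreAnnotations",  # lives outside the chosen root
}


def descendants(root, parent_of):
    """Collect root plus everything reachable downward through parent_of."""
    children = defaultdict(set)
    for child, parent in parent_of.items():
        children[parent].add(child)
    found, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


all_properties = set(PARENT_OF) | set(PARENT_OF.values())
under_root = descendants("kko.representations", PARENT_OF)
dropped = all_properties - under_root
print(sorted(under_root))
print(sorted(dropped))
```

Properties that do not descend from the chosen root, here the two `ex.` names, fall out of the set, which mirrors the smaller count seen when starting from kko.representations.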

We can list the set and verify that nearly all of our descendant properties are indeed in the reference concept (rc) namespace (we will address the minor exceptions in some installments to come), so we have successfully separated our additions from the core KKO starting point:

list(pa_set)

Since I like keeping the core ontology design idea, I will continue to use this more specific way to set the roots for KBpedia properties for these extraction routines. It adds a few more files to process down the road, but it can all be automated and I am better able to keep the distinction between KKO and the specific classes and properties that populate it for the domain at hand. It does mean that all new properties introduced to the system must be made an rdfs:subPropertyOf of one of the tie-in roots, but that also enforces the explicit treatment of new properties in relation to the Peircean universal categories.

Under this approach, the root for annotation properties is kko.representations as noted. For object properties, the root is kko.predicateProperties. (The other two main branches are kko.mappingProperties and skos.skosProperties, which we consider central to KKO.) For data properties, the root is kko.predicateDataProperties. The other data properties are also built into KKO.

If one wanted to adopt the code base in this CWPK series for other purposes, perhaps with a different core or bootstrap, other design choices could be made. But this approach feels correct for the design and architecture of KBpedia.

Iterating Sub Properties

Now that we have decided this scope question, let’s try the final code block from the last installment (also based on .descendants() and is_a) to see if and how it works in the property context. We make two changes to the last installment routine in that we now specify the rdfs:subPropertyOf property and replace our iterated set with pa_set:

o_frag = set()
s_frag = set()
p_item = 'rdfs:subPropertyOf'
for s_item in pa_set:
    o_set = s_item.is_a
    for o_item in o_set:
        print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='')
        o_frag.add(o_item)
    s_frag.add(s_item)

Great, again! Our prior logic is directly transferable. The nice thing about this code applied to properties is that we also get the specifications for creating a new property, useful when roundtripping the information for build routines.
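For roundtripping, the same loop can write rows through Python's standard csv module instead of printing them. The sketch below is an assumption-laden illustration rather than the series' own routine: `FakeProperty` is a hypothetical stand-in for an owlready2 property (it only mimics a name and an is_a list), rc.nickname is an invented property name, and the in-memory buffer takes the place of a real output file.

```python
import csv
import io


class FakeProperty:
    """Hypothetical stand-in for an owlready2 property with a name and is_a list."""

    def __init__(self, name, parents=()):
        self.name = name
        self.is_a = list(parents)

    def __repr__(self):
        return self.name


root = FakeProperty("kko.representations")
nickname = FakeProperty("rc.nickname", [root])  # invented property name
pa_set = [nickname]

# An in-memory buffer keeps the sketch self-contained; a real run would
# open an output file with newline='' instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
p_item = "rdfs:subPropertyOf"
for s_item in pa_set:
    for o_item in s_item.is_a:
        writer.writerow([s_item, p_item, o_item])

print(buffer.getvalue(), end="")
```

Letting the csv module handle delimiters and quoting avoids hand-assembling commas, which matters once labels or definitions containing commas enter the picture.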

So, we clear out the currently active cell and are ready to move on. But first, we also made some nice discoveries in working out today’s installment, so I will end with a couple of tips.

Bonus Tip

While doing the research for this installment, I came across a nifty method within owlready2 for controlling how these extraction retrievals display, with full IRIs, namespaces, or not. First, run the original script for listing the pa_set above. Then, for contrast, try these two options:

def render_using_label(entity):
    return entity.label.first() or entity.name

set_render_func(render_using_label)
list(pa_set)

def render_using_iri(entity):
    return entity.iri

set_render_func(render_using_iri)
list(pa_set)

These two suggestions came from the owlready2 documentation. After trying them, however, I wanted to get back to the original (default) formatting, and the documentation is silent on this question. After poking through the code a bit, I found this initialization method for returning to the default. Again, try it:

set_render_func(default_render_func)
list(pa_set)

Bonus Tip #2

Here is a nice method for getting a listing of all of the properties applied to a given class:

rc.Mammal.get_class_properties()

Additional Documentation

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available either as an online interactive file or as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.
