Posted: September 9, 2020

CWPK #32: Iterating Over a Full Extraction

It is Time to Explore Python Dictionaries and Packaging

In our last coding installments in this Cooking with Python and KBpedia series, prior to our single-installment detour to learn about files, we developed extraction routines for both structure (rdfs:subClassOf) and annotations (of various properties) using the fantastic package owlready2. In practice, these generic routines will loop over populations of certain object types in KBpedia, such as typologies or property types. We want a way to feed these variations to the generic routines in an efficient and understandable way.

Python lists are one way to do so, and we have begun to gain a bit of experience in our prior work with lists and sets. But there is another structure in Python called a ‘dictionary’ that stores key-value pairs and promises more flexibility and power. Each pair relates a key (a name) to a value, quite similar to objects in JSON or to associative arrays in other languages. The values in a dictionary can be any object in Python, including functions or other dictionaries; the latter allows ‘record’-like data structures. However, keys must be unique within a given dictionary (though the same key names may be reused in other dictionaries without conflict).

Dictionaries ('dicts') are like Python lists except list elements are accessed by their position in the list using a numeric index, while we access dict elements via keys. This makes tracing the code easier. We have also indicated that dictionary structures may be forthcoming in other uses of KBpedia, such as CSV or master data. So, I decided to start gaining experience with 'dicts' in this installment.
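As a minimal sketch of the difference between positional and key-based access, the lines below use a small sample of names borrowed from the typology listing later in this installment. The variable names (typology_list, typology_refs, records) and the 'kind' field are made up for illustration only.

```python
# Hypothetical sample data, used only to contrast the two access styles.
typology_list = ['Animals', 'Plants', 'Products']
typology_refs = {
    'Animals': 'kko.Animals',
    'Plants': 'kko.Plants',
    'Products': 'kko.Products',
}

# A list element is reached by its numeric position.
second_item = typology_list[1]

# A dict element is reached by its key, whatever its position.
plants_ref = typology_refs['Plants']

# Values may be any object, including another dict ('record'-like data).
records = {
    'Animals': {'ref': 'kko.Animals', 'kind': 'typology'},
}
animals_kind = records['Animals']['kind']

print(second_item, plants_ref, animals_kind)
```

Reading typology_refs['Plants'] says what is being fetched; typology_list[1] does not, which is the tracing advantage noted above.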

(Other apparent advantages of dictionaries not directly related to our immediate needs include:

  • Dictionaries can be expanded without altering what is already there
  • From Python 3.7 onward, the order entered into a dict is preserved in loops
  • Dictionaries can handle extremely large data sets
  • Dicts are fast because they are implemented as a hash table, and
  • They can be directly related to a Pandas DataFrame should we go that route.)
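Two of these points (expansion in place and preserved insertion order) are easy to check directly. This is only a small sketch, assuming Python 3.7 or later; prop_refs is a hypothetical name, and the entries mirror the properties dictionary defined later in this installment.

```python
# Start with one entry, then expand without touching what is already there.
prop_refs = {'objectProperties': 'kko.predicateProperties'}
prop_refs['dataProperties'] = 'kko.predicateDataProperties'
prop_refs['annotationProperties'] = 'kko.representations'

# Looping (or list()) over a dict yields keys in the order they were entered.
key_order = list(prop_refs)

# Membership tests go through the hash table rather than scanning every entry.
has_data = 'dataProperties' in prop_refs

print(key_order, has_data)
```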

We can inspect the methods available on dictionaries with our standard statement:

dir(dict)
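The full dir() listing includes many internal 'dunder' names. One possible way to trim it to the everyday methods is to filter out names starting with an underscore; public_methods is just an illustrative variable name.

```python
# Keep only the names that do not start with an underscore.
public_methods = [name for name in dir(dict) if not name.startswith('_')]
print(public_methods)
```

The shortened list should include the methods used in this installment, such as values and update.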

The Basic Iteration Approach

In installments CWPK #28, CWPK #29, and CWPK #30, we created generic prototype routines for extracting structure from typologies and properties and then annotations from classes (including typologies as a subset) and properties as well. We thus have generic extraction routines for:

Structure       Annotations
classes         classes
typologies      typologies (possible)
properties      properties

Our basic iteration approach, then, is to define dictionaries for the root objects in these categories and loop over them invoking these generic routines. In the process we want to write out results for each iteration, provide some progress messages, and then complete the looping elements for each root object. Labels and internal lookups to the namespace objects come from the dictionary. In generic terms, then, here is how we want these methods to be structured:

  • Initialize method
  • Message: starting method
  • Get dict iterator:
    • Message: iterating current element
    • Get owlready2 set iterator for element:
      • Populate row
      • Print to file
  • Return to prompt without error message.
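The outline above can be sketched in a few lines of Python. This is not the actual extraction routine: the sample dictionary, the function name extract_all, and the 'memberOf' label are all made up, and plain lists stand in for the sets that owlready2 would return. An in-memory buffer stands in for the output file.

```python
import io

# Hypothetical stand-in data: in the real routine the members would come
# from an owlready2 call on each root object, not from hard-coded lists.
sample_dict = {
    'Animals': ['Mammal', 'Bird'],
    'Plants': ['Tree'],
}

def extract_all(roots, out_put):
    """Loop over a dict of roots and write one row per member."""
    print('Beginning extraction . . .')            # message: starting method
    rows = 0
    for name, members in roots.items():            # dict iterator
        print('   . . . processing', name)         # message: current element
        for member in members:                     # iterator for the element
            row = f'{member},memberOf,{name}'      # populate row ('memberOf' is a placeholder)
            print(row, file=out_put)               # print to file
            rows += 1
    return rows

buffer = io.StringIO()    # in-memory stand-in for a file opened with open()
row_count = extract_all(sample_dict, buffer)
lines = buffer.getvalue().splitlines()
```

Returning a row count is one simple way to confirm that every element in the dictionary was visited.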

Starting and Load

To demonstrate this progression, we begin with our standard opening routine:

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (#) out.
kbpedia = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# kbpedia = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(kbpedia).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')               

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)
core = world.get_namespace('http://www.w3.org/2004/02/skos/core#')

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
kko = kb.get_namespace('http://kbpedia.org/ontologies/kko#')

As always, we execute each cell as we progress down this notebook page by pressing shift+enter for the highlighted cell or by choosing Run from the notebook menu.

Creating the Dictionaries

We will now create dictionaries for typologies and properties. We will construct them using our standard internal name as the ‘key’ for each element, with the value being the internal reference including the namespace prefix (easier than always concatenating using strings). I’ll begin with the smaller properties dictionary and explain the syntax afterwards:

prop_dict = {
        'objectProperties'    : 'kko.predicateProperties',
        'dataProperties'      : 'kko.predicateDataProperties',
        'annotationProperties': 'kko.representations',
}

A dictionary is declared either with curly brackets ({ }), using a colon to separate each key:value pair, or by using the d = dict([(<key>, <value>)]) form. A string ‘key’ must be quoted; an unquoted name is treated as a variable, which must already be defined, and its value becomes the key. The ‘value’ field in this instance is a string holding the internal owlready2 notation of <namespace> + <class>. There is no need to align the colons except to enhance readability.
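Here is a short sketch of the two declaration forms side by side, plus the case of an unquoted key. The variable names (literal_form, pairs_form, group, var_key_form) are illustrative only.

```python
# Form 1: curly brackets with key: value pairs.
literal_form = {
    'objectProperties': 'kko.predicateProperties',
    'dataProperties': 'kko.predicateDataProperties',
}

# Form 2: dict() applied to a list of (key, value) tuples.
pairs_form = dict([
    ('objectProperties', 'kko.predicateProperties'),
    ('dataProperties', 'kko.predicateDataProperties'),
])

# Both forms build equal dictionaries.
same = literal_form == pairs_form

# An unquoted name in the key position is read as a variable,
# so the variable's current value becomes the key.
group = 'annotationProperties'
var_key_form = {group: 'kko.representations'}

print(same, var_key_form)
```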

Our longer listing is the typology one:

typol_dict = {
        'ActionTypes'           : 'kko.ActionTypes',
        'AdjunctualAttributes'  : 'kko.AdjunctualAttributes',
        'Agents'                : 'kko.Agents',
        'Animals'               : 'kko.Animals',
        'AreaRegion'            : 'kko.AreaRegion',
        'Artifacts'             : 'kko.Artifacts',
        'Associatives'          : 'kko.Associatives',
        'AtomsElements'         : 'kko.AtomsElements',
        'AttributeTypes'        : 'kko.AttributeTypes',
        'AudioInfo'             : 'kko.AudioInfo',
        'AVInfo'                : 'kko.AVInfo',
        'BiologicalProcesses'   : 'kko.BiologicalProcesses',
        'Chemistry'             : 'kko.Chemistry',
        'Concepts'              : 'kko.Concepts',
        'ConceptualSystems'     : 'kko.ConceptualSystems',
        'Constituents'          : 'kko.Constituents',
        'ContextualAttributes'  : 'kko.ContextualAttributes',
        'CopulativeRelations'   : 'kko.CopulativeRelations',
        'Denotatives'           : 'kko.Denotatives',
        'DirectRelations'       : 'kko.DirectRelations',
        'Diseases'              : 'kko.Diseases',
        'Drugs'                 : 'kko.Drugs',
        'EconomicSystems'       : 'kko.EconomicSystems',
        'EmergentKnowledge'     : 'kko.EmergentKnowledge',
        'Eukaryotes'            : 'kko.Eukaryotes',
        'EventTypes'            : 'kko.EventTypes',
        'Facilities'            : 'kko.Facilities',
        'FoodDrink'             : 'kko.FoodDrink',
        'Forms'                 : 'kko.Forms',
        'Generals'              : 'kko.Generals',
        'Geopolitical'          : 'kko.Geopolitical',
        'Indexes'               : 'kko.Indexes',
        'Information'           : 'kko.Information',
        'InquiryMethods'        : 'kko.InquiryMethods',
        'IntrinsicAttributes'   : 'kko.IntrinsicAttributes',
        'KnowledgeDomains'      : 'kko.KnowledgeDomains',
        'LearningProcesses'     : 'kko.LearningProcesses',
        'LivingThings'          : 'kko.LivingThings',
        'LocationPlace'         : 'kko.LocationPlace',
        'Manifestations'        : 'kko.Manifestations',
        'MediativeRelations'    : 'kko.MediativeRelations',
        'Methodeutic'           : 'kko.Methodeutic',
        'NaturalMatter'         : 'kko.NaturalMatter',
        'NaturalPhenomena'      : 'kko.NaturalPhenomena',
        'NaturalSubstances'     : 'kko.NaturalSubstances',
        'OrganicChemistry'      : 'kko.OrganicChemistry',
        'OrganicMatter'         : 'kko.OrganicMatter',
        'Organizations'         : 'kko.Organizations',
        'Persons'               : 'kko.Persons',
        'Places'                : 'kko.Places',
        'Plants'                : 'kko.Plants',
        'Predications'          : 'kko.Predications',
        'PrimarySectorProduct'  : 'kko.PrimarySectorProduct',
        'Products'              : 'kko.Products',
        'Prokaryotes'           : 'kko.Prokaryotes',
        'ProtistsFungus'        : 'kko.ProtistsFungus',
        'RelationTypes'         : 'kko.RelationTypes',
        'RepresentationTypes'   : 'kko.RepresentationTypes',
        'SecondarySectorProduct': 'kko.SecondarySectorProduct',
        'Shapes'                : 'kko.Shapes',
        'SituationTypes'        : 'kko.SituationTypes',
        'SocialSystems'         : 'kko.SocialSystems',
        'Society'               : 'kko.Society',
        'SpaceTypes'            : 'kko.SpaceTypes',
        'StructuredInfo'        : 'kko.StructuredInfo',
        'Symbolic'              : 'kko.Symbolic',
        'Systems'               : 'kko.Systems',
        'TertiarySectorService' : 'kko.TertiarySectorService',
        'Times'                 : 'kko.Times',
        'TimeTypes'             : 'kko.TimeTypes',
        'TopicsCategories'      : 'kko.TopicsCategories',
        'VisualInfo'            : 'kko.VisualInfo',
        'WrittenInfo'           : 'kko.WrittenInfo'
}

To get a listing of entries in a dictionary, simply reference its name and run:

prop_dict
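Beyond echoing the whole dictionary, we can also walk through it. The sketch below repeats the prop_dict definition so that it runs on its own, then uses the standard .keys(), .values() and .items() methods; the names keys and values are just illustrative.

```python
prop_dict = {
    'objectProperties': 'kko.predicateProperties',
    'dataProperties': 'kko.predicateDataProperties',
    'annotationProperties': 'kko.representations',
}

# .items() yields each key together with its value.
for key, value in prop_dict.items():
    print(key, '->', value)

# .keys() and .values() give each side separately.
keys = list(prop_dict.keys())
values = list(prop_dict.values())
```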

There are a variety of methods for nesting or merging dictionaries. We do not have need at present for that, but one example shows how we can create a new dictionary as a copy of an existing one, and then update (or merge) it with another dictionary, using the two dictionaries from above as examples:

total_dict = dict(typol_dict)
total_dict.update(prop_dict)
print(total_dict)

This now gives us a merged dictionary. However, .update overwrites the value of any key that already exists in the target dictionary, so cases where the keys overlap need to be evaluated individually. The .update method may not always be an appropriate approach.
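To see why overlapping keys matter, here is a small sketch with two made-up dictionaries that share one key. The dictionaries first and second, and the value 'rc.Plant', are hypothetical and chosen only to force a collision.

```python
first = {'Animals': 'kko.Animals', 'Plants': 'kko.Plants'}
second = {'Plants': 'rc.Plant', 'Products': 'kko.Products'}   # 'Plants' appears in both

merged = dict(first)      # copy, so that first itself is left unchanged
merged.update(second)     # the value from second replaces the one from first

# Checking for shared keys beforehand shows where values would be overwritten.
overlap = first.keys() & second.keys()

print(merged)
print(overlap)
```

In our case the typology and property dictionaries share no keys, so nothing is lost in the merge.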

In these dicts, we now have the population of items (sets) from which we want to obtain all of their members and get the individual extractions. We also have them organized into dictionaries that we can iterate over to complete a full extraction from KBpedia.

Marrying Iterators and Routines

We can now return to our generic extraction prototypes and enhance them a bit to loop over these iterators. Let’s take the structure extraction of rdfs:subPropertyOf from CWPK #29 to extract the structural aspects of our properties. I will keep the form from the earlier installment and comment all lines of code added to accommodate the iteration loops and message feedback. First we will add the iterator:

for value in prop_dict.values():      # iterates over dictionary 'values' with each occurrence a 'value'
  root = eval(value)                  # need to convert value 'string' to internal variable
  p_set = root.descendants()

#  o_frag = set()                     # left over from prior development; commented out
#  s_frag = set()                     # left over from prior development; commented out
  p_item = 'rdfs:subPropertyOf'
  for s_item in p_set:
    o_set = s_item.is_a
    for o_item in o_set:
      print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='')
#      o_frag.add(o_item)             # left over from prior development; commented out
#    s_frag.add(s_item)               # left over from prior development; commented out

You could use len() (for example, on p_set) to check counts against the output lines, or make other tests to ensure you are iterating over all of the property groupings.

The eval() function evaluates the string in value as a Python expression in the current session and in this case returns the owlready2 property object, which then allows proper processing of the .descendants() code. Because eval() will execute whatever expression it is given, it can open security holes when the strings come from untrusted sources. I think it is OK in our case since we are doing local or internal processing, and not exposing this as a public method.
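If eval() ever became a concern, one possible alternative is to split the string on its dot and look the name up with getattr(). The sketch below is an assumption-laden stand-in, not the routine used in this series: kko here is a SimpleNamespace with placeholder values rather than the real owlready2 namespace, and namespaces and resolve are hypothetical names. Since the code above already reaches these objects by attribute access (kko.predicateProperties), the same getattr() lookup should apply to the real namespace object.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the owlready2 namespace object named kko;
# the attribute values are placeholders, not real property objects.
kko = SimpleNamespace(
    predicateProperties='<object properties root>',
    representations='<annotation properties root>',
)

# Only prefixes listed here can be resolved.
namespaces = {'kko': kko}

def resolve(ref):
    """Turn a 'prefix.name' string into the object it names, without eval()."""
    prefix, name = ref.split('.', 1)
    return getattr(namespaces[prefix], name)

root = resolve('kko.predicateProperties')
print(root)
```

Unlike eval(), this lookup can only return attributes of the namespaces we list; an unknown prefix raises a KeyError instead of running arbitrary code.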

We’ll continue with this code block, but now print to file and remove the commented lines:

out_file = 'C:/1-PythonProjects/kbpedia/sandbox/prop_struct_out.csv'                 # variable to physical file
with open(out_file, mode='w', encoding='utf8') as out_put:                        # std file declaration (CWPK #31)
  for value in prop_dict.values():      
    root = eval(value)                  
    p_set=root.descendants()
    p_item = 'rdfs:subPropertyOf'
    for s_item in p_set:
      o_set = s_item.is_a
      for o_item in o_set:
        print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='', file=out_put) # add output file here
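The print() approach writes each value exactly as it is, which works as long as no value itself contains a comma. As a hedged alternative, Python's standard csv module quotes such values automatically. The triples below are invented sample data, not KBpedia content, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Hypothetical sample rows; real values would come from the nested loops.
triples = [
    ('rc.sampleChildProperty', 'rdfs:subPropertyOf', 'kko.predicateProperties'),
    ('rc.sampleChildProperty', 'rdfs:label', 'sample label, with a comma'),
]

# Stand-in for open(out_file, mode='w', newline='', encoding='utf8').
out_put = io.StringIO(newline='')
writer = csv.writer(out_put)
for s_item, p_item, o_item in triples:
    writer.writerow([s_item, p_item, o_item])    # quoting is handled by the writer

csv_lines = out_put.getvalue().splitlines()
print(csv_lines)
```

Note that this drops the trailing '.' that the print() version adds to each line, so it is a different output format, not a drop-in replacement.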

And, then, we’ll add some messages to the screen to see output as it whizzes by:

print('Beginning property structure extraction . . .')                            # print message
out_file = 'C:/1-PythonProjects/kbpedia/sandbox/prop_struct_out.csv'
with open(out_file, mode='w', encoding='utf8') as out_put:
  for value in prop_dict.values():
    print('   . . . processing', value)                                           # loop print message
    root = eval(value)                  
    p_set=root.descendants()
    p_item = 'rdfs:subPropertyOf'
    for s_item in p_set:
      o_set = s_item.is_a
      for o_item in o_set:
        print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='', file=out_put)

Beginning property structure extraction . . .
   . . . processing kko.predicateProperties
   . . . processing kko.predicateDataProperties
   . . . processing kko.representations

OK, so this looks to be a complete routine as we desire. However, we are starting to accumulate a fair number of lines in our routines, and we need additional routines very similar to what is above for extracting classes, typologies and annotations.

It is time to bring a bit more formality to our code writing and management, which I address in the next installment.

Additional Documentation

Here is additional documentation related to today’s CWPK installment:

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available both as an online interactive file and as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.
