Posted: September 22, 2020

CWPK #40: Looping and Multiple Structure File Ingest

We Build Up Our Ingest Routine to All Structure

Now that we have a template for structure builds in this Cooking with Python and KBpedia series, we continue to refine that template to generalize the routine and expand it to looping over multiple input files and to apply it to property structure as well. These are the topics we cover in this current installment, with a detour as I explain below.

In order to prep for today’s material, I encourage you to go back and look at the large routine we developed in the last installment. We can see three areas we need to address in order to generalize this routine:

  • First, last installment’s structure build routine (as designed) requires three passes to complete file ingest. Each one of those passes has a duplicate code section to convert our file input forms to required shorter versions. We would like to extract these duplicates as a helper function in order to lessen code complexity and improve readability,
  • Second, we need a more generic way of specifying the input file or files to be processed by the routine, preferably including being able to loop over and process all of the files in a given input dictionary (as housed in config.py), and
  • Third, we would like to generalize the approach to dealing with class hierarchical structure to also deal with property ingest and hierarchical structure.

So, with these objectives in mind, let’s begin.

Adding a Helper Function

For reference, here is the code block in the prior installment that we repeat three times, and for which we would like to develop a helper function (BTW, this code block will not run here in isolation):

id = row['id']                                                 
parent = row['parent']                                         
id = id.replace('http://kbpedia.org/kko/rc/', 'rc.')          
id = id.replace('http://kbpedia.org/ontologies/kko#', 'kko.')
id_frag = id.replace('rc.', '')
id_frag = id_frag.replace('kko.', '')
parent = parent.replace('http://kbpedia.org/kko/rc/', 'rc.') 
parent = parent.replace('http://kbpedia.org/ontologies/kko#', 'kko.')
parent = parent.replace('owl:', 'owl.')
parent_frag = parent.replace('rc.', '')
parent_frag = parent_frag.replace('kko.', '')
parent_frag = parent_frag.replace('owl.', '')

We will call our helper function row_clean since its purpose is to convert the full IRIs of the CSV input rows to shorter forms required by owlready2 (sometimes object names with a namespace prefix, other times just with the shortened object name). We also need these to work on either the subject of the row (‘id’) or the object of the row (‘parent’ in this case). That leads to four combinations of 2 row objects by 2 shortened forms.

Note that the second argument (‘iss’) passed to the function below is a keyword argument, which is always shown with an equal sign in the function definition. Here the default is an empty string, but if you assign a meaningful value in the definition, that value becomes the default for the keyword, and the calling code does not need to supply it. (NB: Indeed, many built-in Python functions have multiple arguments that are infrequently exposed. I have found it frequently helpful to run help() or dir() on functions to discover their broader capabilities.)
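To make the default-value behavior concrete, here is a minimal sketch. The function name shorten and its default of 'rc.' are hypothetical, chosen only for illustration; they are not part of cowpoke.

```python
# Hypothetical helper, for illustration only (not part of cowpoke)
def shorten(value, prefix='rc.'):            # 'prefix' is a keyword argument with a default
    # Swap the full RC IRI stem for whatever prefix was requested
    return value.replace('http://kbpedia.org/kko/rc/', prefix)

iri = 'http://kbpedia.org/kko/rc/AlarmSignal'
with_default = shorten(iri)                  # keyword omitted, so the default 'rc.' is used
with_override = shorten(iri, prefix='')      # keyword supplied, so the default is overridden

print(with_default)
print(with_override)
```

Calling help(shorten) in a notebook cell shows the signature, including the default for the keyword.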

### Here is the helper function

def row_clean(value, iss=''):                                # arg values come from calling code
    if iss == 'i_id':                                        # check to see which replacement method
        value = value.replace('http://kbpedia.org/kko/rc/', 'rc.')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', 'kko.')
        return value                                         # returns the calculated value to calling code
    if iss == 'i_id_frag':
        value = value.replace('http://kbpedia.org/kko/rc/', '')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', '')
        return value
    if iss == 'i_parent':
        value = value.replace('http://kbpedia.org/kko/rc/', 'rc.')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', 'kko.')
        value = value.replace('owl:', 'owl.')
        return value
    if iss == 'i_parent_frag':
        value = value.replace('http://kbpedia.org/kko/rc/', '')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', '')
        value = value.replace('owl:', '')
        return value
        
### Here is the code we will put in the main calling routine:
        
# r_id = row['id']                                           # this is the version we will actually keep
r_id = 'http://kbpedia.org/kko/rc/AlarmSignal'               # temporary assignment just to test code
# r_parent = row['parent']
r_parent = 'http://kbpedia.org/kko/rc/SoundsByType'
id = row_clean(r_id, iss='i_id')                             # send the two arguments to helper function
id_frag = row_clean(r_id, iss='i_id_frag')
parent = row_clean(r_parent, iss='i_parent')
parent_frag = row_clean(r_parent, iss='i_parent_frag')

print('id:', id)                                             # temporary print to check if results OK
print('id_frag', id_frag)
print('parent:', parent)
print('parent_frag:', parent_frag)

Because we have entered some direct assignments, the code block above does run (via Run or shift+enter).

Note in the main calling routine code that to get our routine values we are calling the row_clean function and passing the required two arguments: the value for either the ‘id’ or ‘parent’ in that row, and whether we want prefixed or shortened fragments.

I strongly suspect there are better and shorter ways to remove this duplicate code, but this approach with a helper function, even in a less optimal form, still has cut the original code length in half (36 lines to 18 lines due to three duplicates). Expect to see a similar form to this in our code going forward. (NB: I am finding that looking for these duplicate code blocks is forcing me to learn function definitions and seek shorter but more expressive forms.)
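As one possible shorter form, the four branches can be driven from a lookup table. This is only a sketch of an alternative, not the version used in the rest of this series; the names row_clean_table, REPLACEMENTS, RC_IRI and KKO_IRI are made up for this example.

```python
# Alternative sketch; all names here are hypothetical
RC_IRI = 'http://kbpedia.org/kko/rc/'
KKO_IRI = 'http://kbpedia.org/ontologies/kko#'

# Each 'iss' mode maps to an ordered list of (old, new) replacements
REPLACEMENTS = {
    'i_id':          [(RC_IRI, 'rc.'), (KKO_IRI, 'kko.')],
    'i_id_frag':     [(RC_IRI, ''), (KKO_IRI, '')],
    'i_parent':      [(RC_IRI, 'rc.'), (KKO_IRI, 'kko.'), ('owl:', 'owl.')],
    'i_parent_frag': [(RC_IRI, ''), (KKO_IRI, ''), ('owl:', '')],
}

def row_clean_table(value, iss=''):
    # Unknown modes return None, which is also what row_clean does implicitly
    if iss not in REPLACEMENTS:
        return None
    for old, new in REPLACEMENTS[iss]:
        value = value.replace(old, new)
    return value

print(row_clean_table('http://kbpedia.org/kko/rc/AlarmSignal', iss='i_id'))
print(row_clean_table('owl:Thing', iss='i_parent_frag'))
```

Adding a fifth mode then means adding one dictionary entry rather than another if block.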

Looping Over Files

If you recall our extraction steps of getting flat CSV files out of KBpedia in CWPK #28 to CWPK #35, we can end up with close to 100 extraction files. These splits encourage modularity and are easier to work on or substitute. Still, when it comes time to building KBpedia back up again after we complete a roundtrip, a complete build requires we process many files. We thus need looping routines across our build files to automate this process.

The first thought is to simply put groupings of files in individual directories and then point the routine at a directory and instruct it to loop over all files. If we have concerns that the directories may have more file types than we want to process with our current routine, we could also introduce some file name string checks to filter by name, fragment, or extension. These options would enable us to generalize a file looping routine to apply to many conditions.
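For reference, the directory approach could look something like the following sketch. It is not the design used in this installment; the function name find_build_files and the demo file names are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def find_build_files(directory, prefix='', ext='.csv'):
    # Return sorted paths of files in 'directory' whose names start with
    # 'prefix' and end with 'ext'; everything else is ignored
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.name.endswith(ext)
    )

# Small self-contained demo using a throwaway directory and made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ('typol_AudioInfo.csv', 'typol_Animals.csv', 'notes.txt'):
        (Path(tmp) / name).write_text('id,subClassOf,parent\n', encoding='utf8')
    found = [p.name for p in find_build_files(tmp, prefix='typol_')]

print(found)
```

The prefix and extension checks are the name filters mentioned above; notes.txt is skipped because it matches neither.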

But, I’ve decided to take a different choice. Since our extractions are driven by Python dictionaries, and we can direct those extractions to any directory prefix, we can re-use these same specifications for build processes. Should we later discover that a general file harvester makes sense, we can generalize at that time from this dictionary design. Also, by applying the same dictionary approach to extraction or building, we help reinforce our roundtripping mindset in how we name and process files.

So, we already have the unique names that distinguish our input classes (in the typol_dict dictionary in config.py) and our properties (in the prop_dict dictionary), and foresee using additional dictionaries going forward in this CWPK series. We only need enter a directory root and the appropriate dictionary to loop over the unique terms associated with our various building blocks. For classes, the typology listing is a great lookup.

We will take our generic class build template from the last installment, and put it into a function that loops over opening our file set, running the routine, and then saving to our desired output location. For now, to get the logic right, I will just set this up as a wrapper before actually plopping in the full build loop routine. (Note: we have to import a couple of modules because we have not yet fully set the environment for today’s installment):

from cowpoke.config import *
import csv                                                

def class_builder(**build_deck):
    print('Beginning KBpedia class structure build . . .')
    r_default = ''
    r_label = ''
    r_iri = ''
# probably want the run specification here (see CWPK #35 for render in struct_extractor)
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    class_loop = build_deck.get('class_loop')
    base = build_deck.get('base')
    ext = build_deck.get('ext')
    if loop != 'class_loop':                                 # compare string values with !=, not identity
        print("Needs to be a 'class_loop'; returning program.")
        return
    for loopval in loop_list:
        print('   . . . processing', loopval)
        frag = loopval.replace('kko.','')
        in_file = (base + frag + ext)
        x = 1
        with open(in_file, mode='r', encoding='utf8') as input:                                           
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])                 
            for row in reader:
## Here is where we place the real class build routine                
                if x <= 2:
                    r_id = row['id']
                    r_parent = row['parent']
                    print(r_id, r_parent)
                    x = x + 1
        input.close()
        
class_builder(**build_deck)        

OK. We now know how to loop over our class build input files. Now, we can pick Kernel → Restart & Clear Outputs, and then confirm with Restart and Clear All Outputs (which should be a familiar red button to you if using Jupyter Notebook), to get ourselves to a clean starting place and begin setting up our structure build environment.

Setting Up the Build Environment

As before with our extract routines, we now have a build_deck dictionary of build configuration settings in config.py. If you see some unfamiliar switches as we proceed through this build process, you may want to inspect that file. The settings are pretty close analogs to the same types of settings for our extractions, as specified in the run_deck dictionary. Most all of this code will migrate to the new build module.

We begin by importing our necessary modules and setting our file settings for the build:

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service or local files. The example below is based on using local files, which given the complexity of the routines that are emerging, is probably your better choice. Make sure to modify the URIs for your local directories.
from owlready2 import * 
from cowpoke.config import *
# from cowpoke.__main__ import *
import csv                                                
import types

world = World()

kb_src = every_deck.get('kb_src')                         # we get the build setting from config.py
#kb_src = 'standard'                                      # we can also do quick tests with an override

if kb_src is None:
    kb_src = 'standard'
if kb_src == 'sandbox':
    kbpedia = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
    kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
elif kb_src == 'standard':
    kbpedia = 'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts.owl'
    kko_file = 'C:/1-PythonProjects/kbpedia/v300/build_ins/stubs/kko.owl'
elif kb_src == 'start':
    kbpedia = 'C:/1-PythonProjects/kbpedia/v300/build_ins/stubs/kbpedia_rc_stub.owl'
    kko_file = 'C:/1-PythonProjects/kbpedia/v300/build_ins/stubs/kko.owl'
else:
    print('You have entered an inaccurate source parameter for the build.')
skos_file = 'http://www.w3.org/2004/02/skos/core' 
    

We load our ontologies into owlready2 and set our namespaces:

kb = world.get_ontology(kbpedia).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')               

#skos = world.get_ontology(skos_file).load()
#kb.imported_ontologies.append(skos)
#core = world.get_namespace('http://www.w3.org/2004/02/skos/core#')

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
kko = kb.get_namespace('http://kbpedia.org/ontologies/kko#')

Since we’ve cleared memory and our workspace, we again add back in our new row_clean helper function:

def row_clean(value, iss=''):                                # arg values come from calling code
    if iss == 'i_id':                                        # check to see which replacement method
        value = value.replace('http://kbpedia.org/kko/rc/', 'rc.')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', 'kko.')
        return value                                         # returns the calculated value to calling code
    if iss == 'i_id_frag':
        value = value.replace('http://kbpedia.org/kko/rc/', '')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', '')
        return value
    if iss == 'i_parent':
        value = value.replace('http://kbpedia.org/kko/rc/', 'rc.')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', 'kko.')
        value = value.replace('owl:', 'owl.')
        return value
    if iss == 'i_parent_frag':
        value = value.replace('http://kbpedia.org/kko/rc/', '')           
        value = value.replace('http://kbpedia.org/ontologies/kko#', '')
        value = value.replace('owl:', '')
        return value

Running the Complete Class Build

We then add our class build template to our new routine for iterating over all of our class input build files. CAUTION: processing all inputs to KBpedia, which is best done with the single assignment of the Generals typology (since all other typologies not already included in KKO are children of it), takes about 70 min on a conventional desktop.

You may notice that we made some slight changes to named variables in the draft template developed in the last installment:

  • src_file → in_file
  • csv_file → input

And, we have placed it into a defined function, class_struct_builder:

def class_struct_builder(**build_deck):                                    # Note 1
    print('Beginning KBpedia class structure build . . .')                 # Note 5
    kko_list = typol_dict.values()                                         # Note 2
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    class_loop = build_deck.get('class_loop')
    base = build_deck.get('base')
    ext = build_deck.get('ext')
    if loop != 'class_loop':                                               # compare string values with !=
        print("Needs to be a 'class_loop'; returning program.")
        return
    for loopval in loop_list:
        print('   . . . processing', loopval)                              # Note 5
        frag = loopval.replace('kko.','')
        in_file = (base + frag + ext)
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])                 
            for row in reader:
                r_id = row['id'] 
                r_parent = row['parent']
                id = row_clean(r_id, iss='i_id')                           # Note 3
                id_frag = row_clean(r_id, iss='i_id_frag')
                parent = row_clean(r_parent, iss='i_parent')
                parent_frag = row_clean(r_parent, iss='i_parent_frag')
                if is_first_row:                                       
                    is_first_row = False
                    continue      
                with rc:                                                
                    kko_id = None
                    kko_frag = None
                    if parent_frag == 'Thing':                                                        
                        if id in kko_list:                                
                            kko_id = id
                            kko_frag = id_frag
                        else:    
                            id = types.new_class(id_frag, (Thing,))       
                if kko_id != None:                                         
                    with kko:                                                
                        kko_id = types.new_class(kko_frag, (Thing,))  
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])
            for row in reader:                                                
                r_id = row['id'] 
                r_parent = row['parent']
                id = row_clean(r_id, iss='i_id')
                id_frag = row_clean(r_id, iss='i_id_frag')
                parent = row_clean(r_parent, iss='i_parent')
                parent_frag = row_clean(r_parent, iss='i_parent_frag')
                if is_first_row:
                    is_first_row = False
                    continue          
                with rc:
                    kko_id = None                                   
                    kko_frag = None
                    kko_parent = None
                    kko_parent_frag = None
                    if parent_frag != 'Thing':
                        if id in kko_list:
                            continue
                        elif parent in kko_list:
                            kko_id = id
                            kko_frag = id_frag
                            kko_parent = parent
                            kko_parent_frag = parent_frag
                        else:   
                            var1 = getattr(rc, id_frag)               
                            var2 = getattr(rc, parent_frag)
                            if var2 == None:                            
                                continue
                            else:                                
                                var1.is_a.append(var2)
                if kko_parent != None:                                         
                    with kko:                
                        if kko_id in kko_list:                               
                            continue
                        else:
                            var1 = getattr(rc, kko_frag)
                            var2 = getattr(kko, kko_parent_frag)                     
                            var1.is_a.append(var2)
        with open(in_file, 'r', encoding='utf8') as input:                 # Note 4
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])
            for row in reader:                                              
                r_id = row['id'] 
                r_parent = row['parent']
                id = row_clean(r_id, iss='i_id')
                id_frag = row_clean(r_id, iss='i_id_frag')
                parent = row_clean(r_parent, iss='i_parent')
                parent_frag = row_clean(r_parent, iss='i_parent_frag')
                if is_first_row:
                    is_first_row = False
                    continue
                if parent_frag == 'Thing': 
# This is the new code section, replacing the commented out below          # Note 4                   
                    var1 = getattr(rc, id_frag)
                    var2 = getattr(owl, parent_frag)
                    try:
                        var1.is_a.remove(var2)
                    except Exception:
#                        var1 = getattr(kko, id_frag)
#                        print(var1)
#                        var1.is_a.remove(owl.Thing)
#                        print('Last step in removing Thing')
                        continue
#                    print(var1, var2)
#                    if id in thing_list:                                     
#                        continue
#                    else:
#                        if id in kko_list:                                    
#                            var1 = getattr(kko, id_frag)
#                            thing_list.add(id)
#                        else:                                                 
#                            var1 = getattr(rc, id_frag)
#                            var2 = getattr(owl, parent_frag)
#                            if var2 == None:
#                                print('Empty Thing:')
#                                print('var1:', var1, 'var2:', var2)                            
#                            try:
#                                var1.is_a.remove(var2)
#                            except ValueError:
#                                print('PROBLEM:')
#                                print('var1:', var1, 'var2:', var2)                
#                                if len(thing_list) == 0:
#                                    print('thing_list is empty.')
#                                else:
#                                    print(*thing_list)
#                                break
#                        print(var1, var2)
#                        thing_list.append(id)
#                        thing_list.add(id)
    out_file = 'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/build_stop.csv'
    with open(out_file, 'w', encoding='utf8') as f:                        # the with block closes the file for us
        print('KBpedia class structure build is complete.')
        f.write('KBpedia class structure build is complete.')              # Note 5

Our function call pulls up the same keyword argument passing that we discussed for the extraction routines earlier (1). The double asterisk (**build_deck) argument means to bring in any of that dictionary’s keyword values if referenced in the routine. We can readily pick up loop or lookup specifications by referencing a dictionary (2). The kko_list is a handy one since it gives us a basis for selecting between KKO objects and the reference concepts (RCs) in KBpedia. The revised routine above also brings in our new helper function (3).
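The double-asterisk mechanics can be seen in isolation in this minimal sketch. The demo_deck dictionary and the show_settings function are hypothetical stand-ins; the real build_deck lives in config.py and has more entries.

```python
# Hypothetical settings, standing in for the real build_deck in config.py
demo_deck = {
    'loop': 'class_loop',
    'base': 'build_ins/typologies/typol_',
    'ext': '.csv',
}

def show_settings(**build_deck):
    # Inside the function, build_deck is an ordinary dict of the keywords passed in
    loop = build_deck.get('loop')
    missing = build_deck.get('not_there')      # .get() returns None for absent keys
    return loop, missing

# Same as calling show_settings(loop='class_loop', base=..., ext='.csv')
result = show_settings(**demo_deck)
print(result)
```

Because .get() returns None for a key that is absent, a routine can reference settings that a given dictionary does not define without raising an error.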

Pretty much the next portions of the routine are as described in the last installment, until we come up to Pass #3 (4), which is where we hit a major roadblock (coming up around the next bend in the road). We also added some print statements (5) that give feedback when the routine is running.

To run this file locally you will need to have the cowpoke project installed and know where to find your build_ins/typology directory. You also need to make sure your settings in config.py are properly set for your conditions. Assuming you have done so, you can invoke this routine (best with only a subset of your typology dictionary, assigned to, say, custom_dict):

class_struct_builder(**build_deck)

Realize everything has to be configured properly for this code to run. You will need to review earlier installments if you run into problems. Assuming you have gotten it to run to completion without error, you may want to then save it. We need to preface our ‘save’ statement with the ‘kb’ ontology identifier. I also have chosen to use the ‘working’ directory for saving these temporary results:

kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts.owl', format='rdfxml') 

However, I ran into plenty of problems myself. Indeed, the large code block commented out above (4) caused me hours of fits trying to troubleshoot and get the routine to act as I wanted. This whole effort put up a roadblock in my plan, sufficient that I had to add another installment. I explain this detour next.

A Brief History of Going Crazy

If we set as an objective being able to specify multiple input files for a current build, a couple of issues immediately arise. Recall, we designed our typology extraction files to be self-contained, which means that every class used as an object must also be declared as its own class subject. To speed up our extractions, we do not keep track of which objects have already been declared, so each encounter triggers another class declaration. Multiple duplicate declarations do not cause a problem when loading the ontology, but some tricky problems arise when these files are used as the specification input across multiple passes.

One obvious contributor to the difficulty is the need to identify and separately keep track of (and sometimes differentially process) our ‘kko’ and ‘rc’ namespaces. We need to account for this distinction in every loop and every assignment or removal that we make to the ontology while building it in memory. That can all be trapped for when in the class build cycle, which is the first two passes of the routine (first create the class, second add to parents), but gets decidedly tricky when removing the excess owl:Thing declarations.

To appreciate this issue a bit, here is the basic statement for removing a ‘Bird’ class from a parent ‘Reptile’:

rc.Bird.is_a.remove(rc.Reptile)

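Our inputs cannot be strings, but in loops our variables often are, so they first need to be resolved to the actual objects, via calls such as var1 = getattr(rc, id_frag) and var2 = getattr(rc, parent_frag), before we can call var1.is_a.remove(var2).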

Unfortunately, when we make a rc.Bird.is_a.remove(rc.Reptile) request after the relationship has already been removed, there is nothing left to remove and owlready2 throws an error (as does Python when trying to remove an item that is not in a list). So, while we are able to extract without keeping track, we eventually must do so when it comes time to build. Thus, as each file is processed, we need to account for prior removals and make sure we do not make the request again.
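The failure is easy to reproduce with an ordinary Python list. In this sketch the list and its string items are stand-ins for an is_a list and its class objects; that substitution is an assumption for illustration, since the real objects need a loaded ontology.

```python
# A plain list stands in for an is_a list (an assumption for illustration);
# the strings stand in for class objects
is_a = ['owl.Thing', 'rc.SoundsByType']

is_a.remove('owl.Thing')                # first removal succeeds
try:
    is_a.remove('owl.Thing')            # second removal: the item is no longer there
    second_removal = 'ok'
except ValueError as err:
    second_removal = 'failed: ' + str(err)

print(is_a)
print(second_removal)
```

For a plain list the exception raised is ValueError, which is what the second print reports.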

The later part of the code listing above (4) kept processing most of the files well, but not when too many were processed. I had the curious error of seeing the routine fail on the first entry of some files. It appeared to me perhaps the list accumulator I was using to keep track of prior removals was limited in size in some manner (it is not) or some counter or loop was not being cleared or initialized in the right location. If it ain’t perfect, it don’t run.

As a newbie with no prior experience to fall back on, here are some of the things I looked at and tested in trying to debug this Pass #3 owl:Thing deletion routine:

  • memory – was it a memory problem? Well, there are some performance issues we continue with in the next installment, but, no, Python seems to grab the memory it needs and does (apparently) a fair job of garbage cleanup. It was also not a problem with the notebook memory
  • loops – there are lots of ways to initiate loops or iterate over different structures from lists, sets, dictionaries, length and counters, etc. How loops are called and incremented differ by the iterator type chosen. I suspect this is where the issue still resides, because I continue to not have a native feel for:
    • sets v lists
    • clearing before loops
    • referencing the right loops
  • using the right fragment – the interplay of namespaces with scope is also not yet intuitive to me. Sometimes it is important to use the namespace prefixed reference to an object, other times not so. I am still learning about scope
  • not much worried about syntax because REPL was always running
  • list length limitations – I discussed this one above, and was able to eliminate it as the source
  • indentations – it is sometimes possible to put what one thinks is the closing statement to a routine at the wrong indentation, so that it runs, but is not affecting the correct code block. In my debugging efforts so far I often find this a source of the problem, especially when there is too much complexity or editing of the code. This is another reason to generalize duplicate code
  • code statement placement in order – in a similar way, counters and loop initializations can easily be placed into the wrong spots. The routine often may run, but still not do what you think it does, and
  • many others – I’m a newbie, right?

It was so frustrating trying to get this correct because I could get most everything working like I wanted, but then perhaps the routine would fail in the midst of processing a long list or would complete, but, upon inspection, may have missed some items or treated them incorrectly.

What little I do know about such matters tells me to try to pinpoint and isolate the problem. When processing long lists, that means testing for possible error conditions and liberally sprinkling various print statements with different text and different echoing of current values to the screen. For example, in an else: condition of an if: statement, I might put a print like:

  print('In kko_list loop of None error trap:', var1, var2)

But pinpointing a problem does not indicate how to solve it, though it does help to narrow attention. I had done so in the routine above, but I was still erroring out of some files, and it was still unclear what the offending part might be. When Python errors like that, it provides an error message and traceback, but sometimes that information is cryptic. The failure point may occur any time after the last message to screen. Again, I was being pricked by needles in the haystack, but I still had not specifically found and removed them.

Error Trapping

I knew from my Python reading that it had a fairly good exception mechanism. Since print() statements were only taking me so far, I decided I needed to bite the bullet (for the needle pricks in my hand!) and start learning more about error trapping.

The basic approach for allowing a program to continue to run when an error condition is met is through the Python exception. It basically looks like this kind of routine:

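for statement2 in (2, 0, 5):                      # continue needs an enclosing loop
    statement1 = 10
    try:
        non_zero = statement1 / statement2        # raises ZeroDivisionError when statement2 is 0
    except ZeroDivisionError:
        print('Oops, dividing by 0!')
        continue                                  # skip to the next value instead of stopping
    print(non_zero)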

I was exploring this more graceful way to treat errors when I realized, duh, that same approach also captured exactly what I was trying to accomplish with avoiding multiple deletions in the first place! That is, I could continue to ‘try’ to delete the next instance of the owl:Thing assignment, and if it had already been deleted (which throws an exception, the very thing I was trying to fix!), I could exit gracefully and move on. Further, this would allow me to embed specific print() statements at the exact point of failure.
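Here is a minimal sketch of that pattern with a print at the point of failure. Plain lists again stand in for is_a lists, and the classes dictionary, the rows list and the skipped list are hypothetical names for this example only.

```python
# Plain lists stand in for each class's is_a list (an assumption for illustration)
classes = {
    'AlarmSignal': ['owl.Thing', 'rc.SoundsByType'],
    'SoundsByType': ['owl.Thing'],
}
# The same class shows up twice, as happens across self-contained input files
rows = ['AlarmSignal', 'SoundsByType', 'AlarmSignal']

skipped = []
for row_num, id_frag in enumerate(rows, start=1):
    try:
        classes[id_frag].remove('owl.Thing')
    except ValueError:
        # Already removed on an earlier row: report exactly where, then move on
        print('Row', row_num, '- owl.Thing already removed from', id_frag)
        skipped.append(row_num)
        continue

print(classes)
```

The sketch catches the narrower ValueError, whereas the routine above catches Exception; the narrower form lets unrelated errors still surface.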

After this aha! moment, I changed the code as shown above (4). I suspect it is a slow way to process the huge numbers I have, but it works. I will continue to look for better means, but at least with this approach I was able to move on with the project.
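One other means worth testing is to check for membership before removing, which avoids raising the exception at all. This is only a sketch on a plain list, and no claim is made that it is faster on the real ontology; the helper name remove_if_present is hypothetical.

```python
def remove_if_present(seq, item):
    # Remove 'item' from 'seq' only if it is there; report whether we did
    if item in seq:
        seq.remove(item)
        return True
    return False

# A plain list stands in for an is_a list (an assumption for illustration)
is_a = ['owl.Thing', 'rc.SoundsByType']
first = remove_if_present(is_a, 'owl.Thing')     # item present, so it is removed
second = remove_if_present(is_a, 'owl.Thing')    # item already gone, nothing happens
print(first, second, is_a)
```

The boolean return value gives the calling loop a simple way to count or log the duplicates it skipped.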

Still, whether for this reason or others not yet contemplated, once we start processing huge numbers with multiple KBpedia build files, I am seeing performance much slower than what I would like. We address those topics in the next installment, which will also cause us to detour still further before we can get back on track to completing our property structure additions to the build.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available both as an online interactive file and as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.

3 thoughts on “CWPK #40: Looping and Multiple Structure File Ingest”

  1. I have noticed throughout the past CWPKs that whenever I was looping through the typologies, I would get error messages such as
    “* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:”
    with some reference concepts linked below this error. Two of the specific concepts linked here were http://kbpedia.org/kko/rc/Person and http://kbpedia.org/kko/rc/HomoSapiens. Moreover, the large code block for the function class_struct_builder did not work for me and raised a TypeError in the Pass #2 part of the code block with rc, where the code reads as var1.is_a.append(var_2). The error reads as: TypeError: a __bases__ item causes an inheritance cycle.

    For reference, my build_deck has the base as ‘kbpedia/v300/build_ins/typologies/typol_’ in relation to CWPK 39 running the similar code block with src_file = ‘kbpedia/v300/build_ins/typologies/typol_AudioInfo.csv’. Since there isn’t really a v300 folder on the github for kbpedia, I used the typologies folder from the sandbox folder in CWPK. Perhaps I wasn’t supposed to do this.

    What does the owlready2 warning mean? I will try to figure this out on my own, but if you have come across this error in your troubleshooting, I would appreciate the guidance on how to fix it.

  2. Hi Varun,

    Yes, as I mentioned first in CWPK #25, you can ignore these warning messages. In the case of Person and HomoSapiens, this warning is the result of a purposeful design decision where we represent humans with two concepts, one related to ‘personhood’ and the other related to ‘biological animals’. This separation enables us to treat the Persons and Animals typologies as distinct. Some might argue with this design decision, but we chose to take it because the scope of each of those typologies is distinct in our view. In early versions of owlready2 such cyclic references caused the code to throw an error. But, for similar reasons to what we do, the developer changed the code to merely show a warning. Again, you may ignore (or, for your own KGs, make sure that both concepts are not asserted as subclasses of the other, which will remove the cycle and the warnings).

    As for the v300 reference, that comes about because of the ongoing development of the code base. We will eventually be producing a new version 3.00 in this series, but have not yet gotten to that installment. To make sure this routine works, make sure that all of the typologies called by the dictionary are indeed in the folder you are referencing. I suspect you are missing one or more, or perhaps have a name mismatch. If that does not solve the problem, let me know and I can work with you offline to make sure your environment is clean.

    (BTW, unfortunately, the working integrity of the code base may need to await the completion of the series when all files are written and covered. I’m trying to make sure things work every step of the way, but that is kind of hard with the dynamic changes happening daily. 😉 )

    Best, Mike

  3. Hi Mike,

    I have checked whether all of the typologies in typol_dict are actually in the folder that I am referencing, and it turns out that it does indeed have all of them. I will troubleshoot a bit more until the weekend to try to pin down the issue more accurately. If it still doesn’t work, I will send you an email.

    BTW, I noticed that you are using print statements as a kind of progress update in a lot of these code blocks. I would recommend using the tqdm package, which displays the progress of your work: https://github.com/tqdm/tqdm. It only requires changing loop_list into tqdm(loop_list) in your code, and it gives you a nice progress bar for these large looping jobs.

    Best,
    Varun
