CWPK #47: Summary of the Extract-Build Roundtrip
AI3:::Adaptive Information

Here is the Master Listing of Extraction and Build Steps

We are near the end of this major part in our Cooking with Python and KBpedia series in which we cover how to build KBpedia from a series of flat-text (CSV) input files. Though these CSV files may have been modified substantially offline (see, in part, CWPK #36), they are initially generated in an extraction loop, which we covered in CWPK #28-35. We have looked at these various steps in an incremental fashion, building up our code base function by function. This approach is perhaps good from a teaching perspective, but makes it kind of murky how all of the pieces fit together.

In this installment, I will list all of the steps, in sequence, for proceeding from the initial flat-file extractions, to offline modifications of those files, and then to building KBpedia again from the resulting new inputs. Since the behavior of each step depends critically on the configuration settings in place before it runs, I also try to capture the main configuration settings appropriate to each step. The steps outlined here cover a full extract-build ‘roundtrip’ cycle. In the next installment, we will address some of the considerations that go into doing incremental or partial extractions or builds.

Please note that the actual functions in our code modules may differ slightly from what we presented in our interactive notebook files. These minor changes were needed to cover gaps or slight errors uncovered during full build and extraction runs. As an example, my initial passes of class annotation extractions overlooked the kko.superClassOf and rdfs.isDefinedBy properties. Some issues in the CSV extraction and build settings were also discovered that led to excess quoting of strings. The “official” code, then, is what is contained in the cowpoke modules, and not necessarily exactly what is in the notebook pages.

Therefore, of the many installments in this CWPK series, this one is perhaps among the most important for you to keep and reference. We will have occasion to summarize other steps in our series, but this installment offers the most comprehensive view of the extract-and-build ‘roundtrip’ cycle.

Summary of Extraction and Build Steps

Here are the basic steps in a complete roundtrip from extracting to building the knowledge graph anew:

1. Startup
2. Extraction
   A. Structure Extraction of Classes
   B. Structure Extraction of Properties
   C. Annotation Extraction of Classes
   D. Annotation Extraction of Properties
   E. Extraction of Mappings
3. Offline Development and Manipulation
4. Clean and Test Build Input Files
5. Build
   A. Build Class Structure
   B. Build Property Structure
   C. Build Class Annotations
   D. Build Property Annotations
   E. Ingest of Mappings
6. Test Build

Classes and properties must be extracted, and later built, first in each phase, because these resources need to be registered in the knowledge graph before anything else can refer to them. Once that is done, however, there is no required order between mappings and annotations. Since annotations are likely to change in every new version or build, I have listed them before mappings, but that is only a matter of preference.

Each of these steps is described below, plus some key configuration settings as appropriate. We begin with our first step, startup:

1. Startup

from cowpoke.__main__ import *
from cowpoke.config import *
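Every extractor and builder in this installment takes its settings as one dictionary (`extract_deck` or `build_deck`) unpacked with `**` and then read back with `dict.get()`. Here is a minimal sketch of that calling pattern; `show_settings()` and `demo_deck` are hypothetical stand-ins, not the actual cowpoke configuration:

```python
def show_settings(**deck):
    # Keyword arguments arrive as an ordinary dict; .get() returns None
    # for any key the deck does not define
    loop = deck.get('loop')
    render = deck.get('render')
    missing = deck.get('property_loop')
    return loop, render, missing

# Hypothetical deck; the real settings live in config.py
demo_deck = {
    'loop': 'class_loop',
    'render': 'r_iri',
    'descent_type': 'descent',
}

print(show_settings(**demo_deck))   # ('class_loop', 'r_iri', None)
```

One consequence of this design is that a misspelled or missing key fails quietly: `.get()` returns `None`, so a routine falls through to its "incorrect ... method" message rather than raising a `KeyError`.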

We will recap the entire breakdown and build process here, beginning with structure extraction, first of classes and then of properties:

2. Extraction

The purpose of a full extraction is to retrieve all assertions in KBpedia aside from those in the upper (also called top-level) KBpedia Knowledge Ontology, or KKO.

A. Structure Extraction of Classes

We begin with the (mostly) hierarchical typologies and their linkages into KKO and to one another. Since all of the reference concepts (RCs) in KBpedia are subsumed by the top-level category of Generals, we can specify it alone as a means to retrieve all of the RCs in KBpedia:

### KEY CONFIG SETTINGS (see extract_deck in config.py) ###
# 'krb_src'       : 'extract'                                          # Set in master_deck
# 'descent_type'  : 'descent',
# 'loop'          : 'class_loop',
# 'loop_list'     : custom_dict.values(),                              # Single 'Generals' specified       
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/extractions/classes/Generals_struct_out.csv',
# 'render'        : 'r_iri',

def struct2_extractor(**extract_deck):
    print('Beginning structure extraction . . .')
    # Select how entities are rendered in the output (owlready2 set_render_func)
    render = extract_deck.get('render')
    if render == 'r_default':
        set_render_func(default_render_func)
    elif render == 'r_label':
        set_render_func(render_using_label)
    elif render == 'r_iri':
        set_render_func(render_using_iri)
    else:
        print('You have assigned an incorrect render method--execution stopping.')
        return
    # Custom extractions are driven entirely by these deck settings
    loop_list = extract_deck.get('loop_list')
    loop = extract_deck.get('loop')
    out_file = extract_deck.get('out_file')
    descent_type = extract_deck.get('descent_type')
    cur_list = set()                       # classes that already have their owl:Thing row
    s_set = set()                          # consolidated set of entities to extract
    new_class = 'owl:Thing'
    with open(out_file, mode='w', encoding='utf8', newline='') as output:
        csv_out = csv.writer(output)
        if loop == 'class_loop':                                             
            header = ['id', 'subClassOf', 'parent']
            p_item = 'rdfs:subClassOf'
        else:
            header = ['id', 'subPropertyOf', 'parent']
            p_item = 'rdfs:subPropertyOf'
        csv_out.writerow(header)       
        # Gather every entity under each root in loop_list into one set
        for value in loop_list:
            print('   . . . processing', value)                                           
            root = eval(value)             # e.g., 'kko.Generals' -> class object
            if descent_type == 'descent':
                s_set.update(root.descendants())
            elif descent_type == 'single':
                s_set.add(root)
            else:
                print('You have assigned an incorrect descent method--execution stopping.')
                return                         
        print('   . . . processing consolidated set.')
        # One row per asserted parent; each class also gets a single owl:Thing row
        for s_item in s_set:
            for o_item in s_item.is_a:
                csv_out.writerow((s_item, p_item, o_item))
                if loop == 'class_loop' and s_item not in cur_list:
                    csv_out.writerow((s_item, p_item, new_class))
                    cur_list.add(s_item)
    print('Total unique IDs written to file:', len(s_set))
    print('The structure extraction for the ', loop, 'is completed.')

struct2_extractor(**extract_deck)
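Stripped of owlready2, the extractor does two things: it gathers every entity under each root in `loop_list` into one de-duplicated set, then writes one `(id, subClassOf, parent)` row per asserted parent, plus one extra `owl:Thing` row per class that the class build later uses to declare it. A minimal sketch, assuming a hypothetical toy hierarchy (a plain dict standing in for the ontology) and an in-memory CSV; for simplicity this toy `descendants()` leaves out the root itself:

```python
import csv
import io

# Hypothetical toy hierarchy: child -> list of direct parents
PARENTS = {
    'rc.Animal': ['kko.Generals'],
    'rc.Mammal': ['rc.Animal'],
    'rc.Dog': ['rc.Mammal'],
}

def descendants(root):
    """Return every class below root (transitively), excluding root itself."""
    found = set()
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def extract_structure(roots, out):
    writer = csv.writer(out)
    writer.writerow(['id', 'subClassOf', 'parent'])
    s_set = set()
    for root in roots:
        s_set |= descendants(root)          # union across all roots
    for s_item in sorted(s_set):            # sorted only to make output stable
        for o_item in PARENTS[s_item]:
            writer.writerow([s_item, 'rdfs:subClassOf', o_item])
        writer.writerow([s_item, 'rdfs:subClassOf', 'owl:Thing'])
    return len(s_set)

buf = io.StringIO()
count = extract_structure(['kko.Generals'], buf)
print(count)                                 # 3
print(buf.getvalue())
```

Because the set is consolidated before anything is written, listing overlapping roots (say, a typology and one of its sub-branches) does not produce duplicate rows.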

B. Structure Extraction of Properties

See above with the following changes/notes:

### KEY CONFIG SETTINGS (see extract_deck in config.py) ###
# 'krb_src'       : 'extract'                                          # Set in master_deck
# 'descent_type'  : 'descent',
# 'loop'          : 'property_loop',
# 'loop_list'     : prop_dict.values(),
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/extractions/properties/prop_struct_out.csv',
# 'render'        : 'r_default',

C. Annotation Extraction of Classes

Annotations require a different method, though with a similar composition to the prior ones. It was during testing of the full extract-build roundtrip that I realized our initial class annotation extraction routine was missing the rdfs.isDefinedBy and kko.superClassOf properties. The code in extract.py has been updated to reflect these changes.

Again, we begin with classes. Note: by convention, I have shifted a couple structural:

### KEY CONFIG SETTINGS (see extract_deck in config.py) ###                
# 'krb_src'       : 'extract'                                          # Set in master_deck
# 'descent_type'  : 'descent',
# 'loop'          : 'class_loop',
# 'loop_list'     : custom_dict.values(),                              # Single 'Generals' specified 
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/extractions/classes/Generals_annot_out.csv',
# 'render'        : 'r_label',

def annot2_extractor(**extract_deck):
    print('Beginning annotation extraction . . .') 
    # Select how entities are rendered in the output (owlready2 set_render_func)
    render = extract_deck.get('render')
    if render == 'r_default':
        set_render_func(default_render_func)
    elif render == 'r_label':
        set_render_func(render_using_label)
    elif render == 'r_iri':
        set_render_func(render_using_iri)
    else:
        print('You have assigned an incorrect render method--execution stopping.')
        return    
    loop_list = extract_deck.get('loop_list')
    loop = extract_deck.get('loop')
    out_file = extract_deck.get('out_file')
    descent_type = extract_deck.get('descent_type')
    x = 0                                                  # count of unique IDs written
    cur_list = set()                                       # IDs already written
    with open(out_file, mode='w', encoding='utf8', newline='') as output:
        csv_out = csv.writer(output)                                       
        if loop == 'class_loop':                                             
            header = ['id', 'prefLabel', 'subClassOf', 'altLabel', 
                      'definition', 'editorialNote', 'isDefinedBy', 'superClassOf']
        else:
            header = ['id', 'prefLabel', 'subPropertyOf', 'domain', 'range', 
                      'functional', 'altLabel', 'definition', 'editorialNote']
        csv_out.writerow(header)    
        for value in loop_list:                                            
            print('   . . . processing', value)                                           
            root = eval(value)                             # e.g., 'kko.Generals' -> class object
            if descent_type == 'descent':
                p_set = root.descendants()
            elif descent_type == 'single':
                p_set = [root]
            else:
                print('You have assigned an incorrect descent method--execution stopping.')
                return    
            for p_item in p_set:
                if p_item in cur_list:                     # skip IDs already written
                    continue
                a_pref = str(p_item.prefLabel)[1:-1].strip('"\'')
                # Multiple values are joined into one cell, separated by '||'
                a_sub = '||'.join(str(a) for a in p_item.is_a)
                if loop == 'property_loop':   
                    a_dom = '||'.join(str(a) for a in p_item.domain)
                    a_rng = str(p_item.range)[1:-1]
                    a_func = ''
                a_alt = '||'.join(str(a) for a in p_item.altLabel)
                a_def = str(p_item.definition)[2:-2]
                a_note = str(p_item.editorialNote)[1:-1]
                if loop == 'class_loop':                                  
                    a_isby = str(p_item.isDefinedBy)[2:-2] + '/'
                    a_super = '||'.join(str(a) for a in p_item.superClassOf)
                    row_out = (p_item, a_pref, a_sub, a_alt, a_def, a_note, a_isby, a_super)
                else:
                    row_out = (p_item, a_pref, a_sub, a_dom, a_rng, a_func,
                               a_alt, a_def, a_note)
                csv_out.writerow(row_out)                               
                cur_list.add(p_item)
                x = x + 1
    print('Total unique IDs written to file:', x)  
    print('The annotation extraction for the', loop, 'is completed.')

annot2_extractor(**extract_deck)
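Multi-valued annotations (alternative labels, parents, `superClassOf` entries) are flattened into a single CSV cell with the values separated by `||`, and the build routines split them apart again with `split('||')`. A short sketch of that round trip; `flatten()` and `unflatten()` are hypothetical helper names, and the empty-cell case shows why the builders test for `['']` rather than an empty list:

```python
def flatten(values):
    # Join the values of a multi-valued annotation into one '||'-separated cell
    return '||'.join(str(v) for v in values)

def unflatten(cell):
    # str.split always returns at least one element, so an empty cell
    # yields [''] rather than []; treat that case as "no values"
    parts = cell.split('||')
    return [] if parts == [''] else parts

alt_labels = ['dog', 'domestic dog', 'Canis familiaris']
cell = flatten(alt_labels)
print(cell)                            # dog||domestic dog||Canis familiaris
print(unflatten(cell) == alt_labels)   # True
print(''.split('||'))                  # ['']
print(unflatten(''))                   # []
```

The convention only round-trips cleanly if no individual value itself contains `||`, which is worth keeping in mind when editing these files offline.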

# Inspect the defaults of the 'excel' dialect, which csv.writer uses unless told otherwise
import csv

d = csv.get_dialect('excel')
print("Delimiter: ", d.delimiter)
print("Doublequote: ", d.doublequote)
print("Escapechar: ", d.escapechar)
print("lineterminator: ", repr(d.lineterminator))
print("quotechar: ", d.quotechar)
print("Quoting: ", d.quoting)
print("skipinitialspace: ", d.skipinitialspace)
print("strict: ", d.strict)
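The dialect printout reports `Quoting: 0`, which is `csv.QUOTE_MINIMAL`: only fields containing special characters such as the delimiter, the quote character, or line-ending characters get quoted, and embedded quotes are doubled. A short sketch contrasting that default with `csv.QUOTE_ALL` on a hypothetical annotation row, and confirming that both forms read back to the same values:

```python
import csv
import io

# Hypothetical annotation row: one plain cell, one '||' cell, one with a comma and quotes
row = ['rc.Dog', 'dog||domestic dog', 'A "loyal" animal, often kept as a pet']

def write_row(quoting):
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting).writerow(row)
    return buf.getvalue()

minimal = write_row(csv.QUOTE_MINIMAL)    # the 'excel' dialect default
quote_all = write_row(csv.QUOTE_ALL)      # every field wrapped in quotes
print(repr(minimal))
print(repr(quote_all))

# Reading either form back yields the original values
for text in (minimal, quote_all):
    assert next(csv.reader(io.StringIO(text))) == row
```

Under minimal quoting, only the definition cell is wrapped in quotes; the ID and the `||` cell are written bare.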

D. Annotation Extraction of Properties

See above with the following changes/notes:

### KEY CONFIG SETTINGS (see extract_deck in config.py) ###                
# 'krb_src'       : 'extract'                                          # Set in master_deck
# 'descent_type'  : 'descent',
# 'loop'          : 'property_loop',
# 'loop_list'     : prop_dict.values(),                              
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/extractions/properties/prop_annot_out.csv',
# 'render'        : 'r_default',

E. Extraction of Mappings

Mappings to external sources are an integral part of KBpedia, as is likely the case for any similar, large-scale knowledge graph. As such, extracting the existing mappings is also a logical step in the overall extraction process.

Though we will not address mappings until CWPK #49, those steps belong here in the overall set of procedures for the extract-build roundtrip process.

3. Offline Development and Manipulation

The above extraction steps capture changes that have been made over time with an ontology editing tool such as Protégé. Once the knowledge graph has reached a state of readiness in Protégé and more substantial changes are desired, it is sometimes easier to work with the flat files in bulk. I discussed some of my own steps using spreadsheets in CWPK #36, and I will walk through some refactorings using bulk files in our next installment, CWPK #48. That case study will illustrate at least a few of the circumstances that warrant bulk refactoring. Major additions to or changes in the typologies are also occasions for such bulk activities.

At any rate, this step in the overall roundtripping process is where such modifications are made before rebuilding the knowledge graph anew.
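As a small illustration of the kind of bulk change that can be easier in flat files than in an editor, here is a sketch that renames one reference concept everywhere it appears in a structure file, both as a child in `id` and as a `parent`. The `rename_rc()` function and the sample rows are hypothetical; it assumes the three-column `id,subClassOf,parent` layout produced by the structure extraction and matches the exact ID strings as they appear in the file:

```python
import csv
import io

def rename_rc(in_text, old_id, new_id):
    """Return CSV text with old_id replaced by new_id in the id and parent columns."""
    reader = csv.DictReader(io.StringIO(in_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in ('id', 'parent'):
            if row[col] == old_id:          # whole-cell match only
                row[col] = new_id
        writer.writerow(row)
    return out.getvalue()

sample = (
    'id,subClassOf,parent\n'
    'rc.Dog,rdfs:subClassOf,rc.Canine\n'
    'rc.Canine,rdfs:subClassOf,rc.Mammal\n'
)
print(rename_rc(sample, 'rc.Canine', 'rc.Canid'))
```

Matching whole cells, rather than doing a text search-and-replace, avoids accidentally renaming IDs that merely contain the old name as a substring.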

4. Clean and Test Build Input Files

We covered these topics in CWPK #45. As you may recall, cleaning and testing of input files logically occurs at this point, but we delayed discussing it in detail until we had covered the overall build steps. This is why the installment numbers appear a bit out of order here.
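The actual cleaning and testing routines are the ones in CWPK #45. As a reminder of what "vetted for encoding" can mean in practice, here is a hypothetical pre-flight check (`preflight()` is not part of cowpoke) that decodes a file strictly as UTF-8 and flags rows whose field count differs from the header:

```python
import csv
import io

def preflight(raw_bytes):
    """Return a list of problem descriptions for a CSV build file (empty if clean)."""
    try:
        text = raw_bytes.decode('utf-8')        # strict decoding raises on bad bytes
    except UnicodeDecodeError as err:
        return [f'not valid UTF-8 at byte {err.start}']
    problems = []
    rows = csv.reader(io.StringIO(text))
    header = next(rows, [])
    for row_no, row in enumerate(rows, start=2):
        if len(row) != len(header):
            problems.append(f'row {row_no}: expected {len(header)} fields, got {len(row)}')
    return problems

good = 'id,subClassOf,parent\nrc.Dog,rdfs:subClassOf,rc.Mammal\n'.encode('utf-8')
short = 'id,subClassOf,parent\nrc.Dog,rdfs:subClassOf\n'.encode('utf-8')
bad = b'id,subClassOf,parent\nrc.Caf\xe9,rdfs:subClassOf,rc.Place\n'   # Latin-1 byte
print(preflight(good))     # []
print(preflight(short))    # ['row 2: expected 3 fields, got 2']
print(preflight(bad))
```

The `bad` sample shows the classic symptom of a file saved from a spreadsheet in a legacy encoding rather than UTF-8.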

5. Build

The build cycle starts with all structure, annotation, and mapping files in proper shape and vetted for encoding and quality.

(Note: where ‘Generals’ is specified, keep the initial capitalization, since it is also generated as such from the extraction routines and is consistent with typology naming.)

A. Build Class Structure

We start with the knowledge graph classes and their subsumption relationships, as specified in one or more class structure CSV input files. In this case, we are doing a full build, so we begin with the KKO and RC stubs, plus run our Generals typology since it is inclusive:

### KEY CONFIG SETTINGS (see build_deck in config.py) ###             # Option 1: from Generals
# 'kb_src'        : 'start'                                           # Set in master_deck; only step with 'start'
# 'loop_list'     : custom_dict.values(),                             # Single 'Generals' specified 
# 'loop'          : 'class_loop',
# 'base'          : 'C:/1-PythonProjects/kbpedia/v300/build_ins/classes/',              
# 'ext'           : '_struct_out.csv',                                # Note change           
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts.csv',

### KEY CONFIG SETTINGS (see build_deck in config.py) ###             # Option 2: from all typologies
# 'kb_src'        : 'start'                                           # Set in master_deck; only step with 'start'
# 'loop_list'     : typol_dict.values(),                               
# 'loop'          : 'class_loop',
# 'base'          : 'C:/1-PythonProjects/kbpedia/v300/build_ins/classes/',              
# 'ext'           : '.csv',                                           # Note change           
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts.csv',

import csv                                                             # used directly below
import types                                                           # for types.new_class()
from cowpoke.build import *

def class2_struct_builder(**build_deck):                                  
    print('Beginning KBpedia class structure build . . .')               
    kko_list = typol_dict.values()                                      
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    base = build_deck.get('base')
    ext = build_deck.get('ext')
    out_file = build_deck.get('out_file')
    if loop != 'class_loop':
        print("Needs to be a 'class_loop'; returning program.")
        return
    for loopval in loop_list:
        print('   . . . processing', loopval)                           
        frag = loopval.replace('kko.','')
        in_file = (base + frag + ext)
        # Pass 1: declare each class whose parent is owl:Thing (KKO ids in kko, all others in rc)
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])                 
            for row in reader:
                r_id = row['id'] 
                r_parent = row['parent']
                id = row_clean(r_id, iss='i_id')                         
                id_frag = row_clean(r_id, iss='i_id_frag')
                parent = row_clean(r_parent, iss='i_parent')
                parent_frag = row_clean(r_parent, iss='i_parent_frag')
                if is_first_row:                                       
                    is_first_row = False
                    continue      
                with rc:                                                
                    kko_id = None
                    kko_frag = None
                    if parent_frag == 'Thing':                                                        
                        if id in kko_list:                                
                            kko_id = id
                            kko_frag = id_frag
                        else:    
                            id = types.new_class(id_frag, (Thing,))       
                if kko_id is not None:                                         
                    with kko:                                                
                        kko_id = types.new_class(kko_frag, (Thing,))  
        # Pass 2: add the actual subClassOf links (RC to RC, and RC to its KKO parent)
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])
            for row in reader:                                                
                r_id = row['id'] 
                r_parent = row['parent']
                id = row_clean(r_id, iss='i_id')
                id_frag = row_clean(r_id, iss='i_id_frag')
                parent = row_clean(r_parent, iss='i_parent')
                parent_frag = row_clean(r_parent, iss='i_parent_frag')
                if is_first_row:
                    is_first_row = False
                    continue          
                with rc:
                    kko_id = None                                   
                    kko_frag = None
                    kko_parent = None
                    kko_parent_frag = None
                    if parent_frag != 'Thing':
                        if id in kko_list:
                            continue
                        elif parent in kko_list:
                            kko_id = id
                            kko_frag = id_frag
                            kko_parent = parent
                            kko_parent_frag = parent_frag
                        else:   
                            var1 = getattr(rc, id_frag)               
                            var2 = getattr(rc, parent_frag)
                            if var2 is None:                            
                                continue
                            else:
                                print(var1, var2)
                                var1.is_a.append(var2)
                if kko_parent is not None:                                         
                    with kko:                
                        if kko_id in kko_list:                               
                            continue
                        else:
                            var1 = getattr(rc, kko_frag)
                            var2 = getattr(kko, kko_parent_frag)                     
                            var1.is_a.append(var2)
        # Pass 3: remove the temporary owl:Thing parent added in pass 1
        with open(in_file, 'r', encoding='utf8') as input:                
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subClassOf', 'parent'])
            for row in reader:                                              
                r_id = row['id'] 
                r_parent = row['parent']
                id = row_clean(r_id, iss='i_id')
                id_frag = row_clean(r_id, iss='i_id_frag')
                parent = row_clean(r_parent, iss='i_parent')
                parent_frag = row_clean(r_parent, iss='i_parent_frag')
                if is_first_row:
                    is_first_row = False
                    continue
                if parent_frag == 'Thing':               
                    var1 = getattr(rc, id_frag)
                    var2 = getattr(owl, parent_frag)
                    try:
                        var1.is_a.remove(var2)
                    except Exception:
                        continue
    kb.save(out_file, format="rdfxml")      
    print('KBpedia class structure build is complete.')

class2_struct_builder(**build_deck)
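The class structure builder reads the same input file three times. Pass one declares every class whose parent is `owl:Thing` (KKO classes into `kko`, everything else into `rc`), pass two adds the real `subClassOf` links, and pass three drops the temporary `owl:Thing` parent from the RCs. A pure-Python sketch of that sequence, assuming a dict of parent sets in place of owlready2 classes and hypothetical rows written in a short prefix form for readability:

```python
# Hypothetical rows as (id, parent) pairs, as read from a *_struct_out.csv file
ROWS = [
    ('kko.Generals', 'owl.Thing'),
    ('rc.Animal', 'owl.Thing'),
    ('rc.Animal', 'kko.Generals'),
    ('rc.Mammal', 'owl.Thing'),
    ('rc.Mammal', 'rc.Animal'),
]
KKO_LIST = {'kko.Generals'}     # stands in for typol_dict.values()

def three_pass_build(rows):
    graph = {}                  # class id -> set of parent ids
    # Pass 1: declare each class under owl:Thing so later passes can find it
    for cid, parent in rows:
        if parent == 'owl.Thing':
            graph.setdefault(cid, set()).add('owl.Thing')
    # Pass 2: add the actual parent links (KKO classes themselves are skipped,
    # and links to undeclared parents are ignored)
    for cid, parent in rows:
        if parent != 'owl.Thing' and cid not in KKO_LIST and parent in graph:
            graph[cid].add(parent)
    # Pass 3: remove the temporary owl:Thing parent from non-KKO classes
    for cid, parent in rows:
        if parent == 'owl.Thing' and cid not in KKO_LIST:
            graph[cid].discard('owl.Thing')
    return graph

print(three_pass_build(ROWS))
```

Declaring everything first is what lets pass two link a child to a parent that only appears later in the file.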

B. Build Property Structure

After classes, we then add the property structure to the system. Note, however, that we now switch to our normal ‘standard’ kb source:

### KEY CONFIG SETTINGS (see build_deck in config.py) ###             
# 'kb_src'        : 'standard'                                        # Set in master_deck
# 'loop_list'     : prop_dict.values(),                             
# 'loop'          : 'property_loop',
# 'base'          : 'C:/1-PythonProjects/kbpedia/v300/build_ins/properties/',              
# 'ext'           : '_struct_out.csv',                                         
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts.csv',
# 'frag'          : set in code block; see below

def prop2_struct_builder(**build_deck):
    print('Beginning KBpedia property structure build . . .')
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    base = build_deck.get('base')
    ext = build_deck.get('ext')
    out_file = build_deck.get('out_file')
    if loop != 'property_loop':
        print("Needs to be a 'property_loop'; returning program.")
        return
    for loopval in loop_list:
        print('   . . . processing', loopval)
        frag = 'prop'                                    
        in_file = (base + frag + ext)
        print(in_file)
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'subPropertyOf', 'parent'])
            for row in reader:
                if is_first_row:
                    is_first_row = False                
                    continue
                r_id = row['id']
                r_parent = row['parent']
                value = r_parent.find('owl.')
                if value == 0:                                        
                    continue
                value = r_id.find('rc.')
                if value == 0:
                    id_frag = r_id.replace('rc.', '')
                    parent_frag = r_parent.replace('kko.', '')
                    var2 = getattr(kko, parent_frag)                 
                    with rc:                        
                        r_id = types.new_class(id_frag, (var2,))
    kb.save(out_file, format="rdfxml")
    print(kbpedia)
    print(out_file)
    print('KBpedia property structure build is complete.')

prop2_struct_builder(**build_deck)
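The property structure builder keys entirely off string prefixes: rows whose parent starts with `owl.` are skipped, and for rows whose id starts with `rc.` the prefixes are stripped to get the fragment names passed to `getattr()` on the `kko` and `rc` namespaces. A sketch of just that parsing step; `parse_prop_row()` and the sample property names are hypothetical, and it slices off the `rc.` prefix rather than calling `replace()`, so an `rc.` appearing later in a name would be left alone:

```python
def parse_prop_row(r_id, r_parent):
    """Return (id_frag, parent_frag) for rows the builder acts on, else None."""
    if r_parent.startswith('owl.'):
        return None                      # skipped, as in prop2_struct_builder
    if not r_id.startswith('rc.'):
        return None                      # only rc. properties are created
    id_frag = r_id[len('rc.'):]
    parent_frag = r_parent.replace('kko.', '')
    return id_frag, parent_frag

# Hypothetical rows
rows = [
    ('kko.exampleTop', 'owl.topObjectProperty'),
    ('rc.hasAuthor', 'kko.exampleRelation'),
    ('kko.exampleChild', 'kko.exampleRelation'),
]
for r_id, r_parent in rows:
    print(r_id, '->', parse_prop_row(r_id, r_parent))
```

Only the middle row survives; in the builder it would become `hasAuthor`, created in `rc` as a subproperty of the `kko` property named `exampleRelation`.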

C. Build Class Annotations

With the subsumption structure built, we next load our annotations, beginning with the class ones:

### KEY CONFIG SETTINGS (see build_deck in config.py) ###                  
# 'kb_src'        : 'standard'                                        
# 'loop_list'     : file_dict.values(),                           # see 'in_file'
# 'loop'          : 'class_loop',
# 'in_file'       : 'C:/1-PythonProjects/kbpedia/v300/build_ins/classes/Generals_annot_out.csv',
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts.csv',

def class2_annot_build(**build_deck):
    print('Beginning KBpedia class annotation build . . .')
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    class_loop = build_deck.get('class_loop')
    out_file = build_deck.get('out_file')
    if loop != 'class_loop':
        print("Needs to be a 'class_loop'; returning program.")
        return
    for loopval in loop_list:
        print('   . . . processing', loopval) 
        in_file = loopval
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'prefLabel', 'subClassOf', 
                                   'altLabel', 'definition', 'editorialNote', 'isDefinedBy', 'superClassOf'])                 
            for row in reader:
                if is_first_row:                                       # skip header before any lookup
                    is_first_row = False
                    continue
                r_id = row['id']
                id = getattr(rc, r_id)
                if id is None:
                    print(r_id)
                    continue
                r_pref = row['prefLabel']
                r_alt = row['altLabel']
                r_def = row['definition']
                r_note = row['editorialNote']
                r_isby = row['isDefinedBy']
                r_super = row['superClassOf']
                id.prefLabel.append(r_pref)
                i_alt = r_alt.split('||')
                if i_alt != ['']: 
                    for item in i_alt:
                        id.altLabel.append(item)
                id.definition.append(r_def)        
                i_note = r_note.split('||')
                if i_note != ['']:   
                    for item in i_note:
                        id.editorialNote.append(item)
                id.isDefinedBy.append(r_isby)
                i_super = r_super.split('||')
                if i_super != ['']:   
                    for item in i_super:
                        item = 'http://kbpedia.org/kko/rc/' + item
#                        Code block to be used if objectProperty; 5.5 hr load
#                        item = getattr(rc, item)
#                        if item == None:
#                            print('Failed assignment:', r_id, item)
#                            continue
#                        else:                                
                        id.superClassOf.append(item)
    kb.save(out_file, format="rdfxml") 
    print('KBpedia class annotation build is complete.')

class2_annot_build(**build_deck)
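Because the builders pass an explicit `fieldnames` list to `csv.DictReader`, the header line comes back as an ordinary data row, which is why each loop carries an `is_first_row` flag. That flag has to be checked before anything else in the loop: if a lookup on the header text (such as `getattr(rc, 'id')`) comes back empty and triggers a `continue` first, the flag is never cleared and the first real data row is dropped instead. A sketch of two alternatives that avoid the flag altogether, using hypothetical in-memory data:

```python
import csv
import io

data = 'id,prefLabel\nDog,dog\nCat,cat\n'
FIELDS = ['id', 'prefLabel']

# Option 1: keep explicit fieldnames but consume the header row up front
reader = csv.DictReader(io.StringIO(data), fieldnames=FIELDS)
next(reader)                                   # discard the header line
option1 = [row['id'] for row in reader]

# Option 2: let DictReader take its field names from the header row
reader = csv.DictReader(io.StringIO(data))
option2 = [row['id'] for row in reader]

print(option1, option2)   # ['Dog', 'Cat'] ['Dog', 'Cat']
```

Option 1 keeps the column names under the code's control, while option 2 trusts the file's own header, which only works if the offline edits never rename a column.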

D. Build Property Annotations

And then the property annotations:

### KEY CONFIG SETTINGS (see build_deck in config.py) ###                  
# 'kb_src'        : 'standard'                                        
# 'loop_list'     : file_dict.values(),                           # see 'in_file'
# 'loop'          : 'property_loop',
# 'in_file'       : 'C:/1-PythonProjects/kbpedia/v300/build_ins/properties/prop_annot_out.csv',
# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts.csv',

def prop2_annot_build(**build_deck):
    print('Beginning KBpedia property annotation build . . .')
    xsd = kb.get_namespace('http://www.w3.org/2001/XMLSchema#')
    wgs84 = kb.get_namespace('http://www.opengis.net/def/crs/OGC/1.3/CRS84')    
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    out_file = build_deck.get('out_file')
    x = 1
    if loop != 'property_loop':
        print("Needs to be a 'property_loop'; returning program.")
        return
    for loopval in loop_list:
        print('   . . . processing', loopval) 
        in_file = loopval
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'prefLabel', 'subPropertyOf', 'domain',  
                                   'range', 'functional', 'altLabel', 'definition', 'editorialNote'])                 
            for row in reader:
                if is_first_row:                                       # skip header before any lookup
                    is_first_row = False
                    continue
                r_id = row['id']                
                r_pref = row['prefLabel']
                r_dom = row['domain']
                r_rng = row['range']
                r_alt = row['altLabel']
                r_def = row['definition']
                r_note = row['editorialNote']
                r_id = r_id.replace('rc.', '')
                id = getattr(rc, r_id)
                if id is None:
                    continue
                id.prefLabel.append(r_pref)
                i_dom = r_dom.split('||')
                if i_dom != ['']: 
                    for item in i_dom:
                        if 'kko.' in item:
                            item = item.replace('kko.', '')
                            item = getattr(kko, item)
                            id.domain.append(item) 
                        elif 'owl.' in item:
                            item = item.replace('owl.', '')
                            item = getattr(owl, item)
                            id.domain.append(item)
                        elif item:
                            # Any other non-empty entry is looked up in the rc namespace
                            item = getattr(rc, item)
                            if item is not None:
                                id.domain.append(item)
                        else:
                            # Empty entry, e.g. from a stray '||' separator
                            print('No domain assignment:', 'Item no:', x, item)
                if 'owl.' in r_rng:
                    r_rng = r_rng.replace('owl.', '')
                    r_rng = getattr(owl, r_rng)
                    id.range.append(r_rng)
                elif 'string' in r_rng:    
                    id.range = [str]
                elif 'decimal' in r_rng:
                    id.range = [float]
                # Case-sensitive test: 'xsd.anyURI' does not match and falls to the else branch (see log)
                elif 'anyuri' in r_rng:
                    id.range = [normstr]
                elif 'boolean' in r_rng:    
                    id.range = [bool]
                elif 'datetime' in r_rng:    
                    id.range = [datetime.datetime]   
                elif 'date' in r_rng:    
                    id.range = [datetime.date]      
                elif 'time' in r_rng:    
                    id.range = [datetime.time] 
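                elif 'wgs84.' in r_rng:
                    r_rng = r_rng.replace('wgs84.', '')
                    r_rng = getattr(wgs84, r_rng)
                    id.range.append(r_rng)
                elif r_rng == '':
                    print('r_rng = empty:', id)
                else:
                    # Unhandled ranges are reported here; in the run below these are
                    # xsd.anyURI (mixed case), xsd.hexBinary, and a bare 'wgs84'
                    print('r_rng = else:', r_rng, id)
#                    id.range.append(r_rng)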
                i_alt = r_alt.split('||')    
                if i_alt != ['']: 
                    for item in i_alt:
                        id.altLabel.append(item)
                id.definition.append(r_def)        
                i_note = r_note.split('||')
                if i_note != ['']:   
                    for item in i_note:
                        id.editorialNote.append(item)
                x = x + 1        
    kb.save(out_file, format="rdfxml") 
    print('KBpedia property annotation build is complete.')

prop2_annot_build(**build_deck)

Beginning KBpedia property annotation build . . .
   . . . processing C:/1-PythonProjects/kbpedia/v300/build_ins/properties/prop_annot_out.csv
r_rng = else: xsd.anyURI rc.release_notes
r_rng = else: xsd.anyURI rc.schema_version
r_rng = else: xsd.anyURI rc.unit_code
r_rng = else: xsd.anyURI rc.property_id
r_rng = else: xsd.anyURI rc.ticket_token
r_rng = else: xsd.anyURI rc.role_name
r_rng = else: xsd.anyURI rc.feature_list
r_rng = else: xsd.hexBinary rc.associated_media
r_rng = else: xsd.hexBinary rc.encoding
r_rng = else: xsd.hexBinary rc.encodings
r_rng = else: xsd.hexBinary rc.photo
r_rng = else: xsd.hexBinary rc.photos
r_rng = else: xsd.hexBinary rc.primary_image_of_page
r_rng = else: xsd.hexBinary rc.thumbnail
r_rng = else: xsd.anyURI rc.code_repository
r_rng = else: xsd.anyURI rc.content_url
r_rng = else: xsd.anyURI rc.discussion_url
r_rng = else: xsd.anyURI rc.download_url
r_rng = else: xsd.anyURI rc.embed_url
r_rng = else: xsd.anyURI rc.install_url
r_rng = else: xsd.anyURI rc.map
r_rng = else: xsd.anyURI rc.maps
r_rng = else: xsd.anyURI rc.payment_url
r_rng = else: xsd.anyURI rc.reply_to_url
r_rng = else: xsd.anyURI rc.service_url
r_rng = else: xsd.anyURI rc.significant_link
r_rng = else: xsd.anyURI rc.significant_links
r_rng = else: xsd.anyURI rc.target_url
r_rng = else: xsd.anyURI rc.thumbnail_url
r_rng = else: xsd.anyURI rc.tracking_url
r_rng = else: xsd.anyURI rc.url
r_rng = else: xsd.anyURI rc.related_link
r_rng = else: xsd.anyURI rc.genre_schema
r_rng = else: xsd.anyURI rc.same_as
r_rng = else: xsd.anyURI rc.action_platform
r_rng = else: xsd.anyURI rc.fees_and_commissions_specification
r_rng = else: xsd.anyURI rc.requirements
r_rng = else: xsd.anyURI rc.software_requirements
r_rng = else: xsd.anyURI rc.storage_requirements
r_rng = else: xsd.anyURI rc.artform
r_rng = else: xsd.anyURI rc.artwork_surface
r_rng = else: xsd.anyURI rc.course_mode
r_rng = else: xsd.anyURI rc.encoding_format
r_rng = else: xsd.anyURI rc.file_format_schema
r_rng = else: xsd.anyURI rc.named_position
r_rng = else: xsd.anyURI rc.surface
r_rng = else: wgs84 rc.geo_midpoint
r_rng = else: xsd.anyURI rc.memory_requirements
r_rng = else: wgs84 rc.aerodrome_reference_point
r_rng = else: wgs84 rc.coordinate_location
r_rng = else: wgs84 rc.coordinates_of_easternmost_point
r_rng = else: wgs84 rc.coordinates_of_northernmost_point
r_rng = else: wgs84 rc.coordinates_of_southernmost_point
r_rng = else: wgs84 rc.coordinates_of_the_point_of_view
r_rng = else: wgs84 rc.coordinates_of_westernmost_point
r_rng = else: wgs84 rc.geo
r_rng = else: xsd.anyURI rc.additional_type
r_rng = else: xsd.anyURI rc.application_category
r_rng = else: xsd.anyURI rc.application_sub_category
r_rng = else: xsd.anyURI rc.art_medium
r_rng = else: xsd.anyURI rc.sport_schema
KBpedia property annotation build is complete.
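The `r_rng = else:` lines in the log above trace back to the substring tests in the range branch: `'anyuri' in r_rng` is case-sensitive, so `xsd.anyURI` never matches; `xsd.hexBinary` has no branch; and a bare `wgs84` lacks the `wgs84.` prefix the test expects. A minimal sketch of an alternative, not the cowpoke code, parses the same CSV without owlready2. It lets `csv.DictReader` take its field names from the header row (with explicit `fieldnames`, as in the listing, the header comes back as a data row, which is why the listing needs `is_first_row`), splits `||` multi-value fields, and matches xsd local names exactly and case-insensitively. It assumes the CSV's first row carries the column names used in the listing; the helper names are hypothetical, and `anyURI` maps to plain `str` here rather than owlready2's `normstr`.

```python
import csv
import datetime
import io

# Hypothetical lookup: lowercase xsd local name -> Python type.
# anyURI maps to plain str for illustration; the listing above uses owlready2's normstr.
XSD_TYPES = {
    'string': str,
    'decimal': float,
    'anyuri': str,
    'boolean': bool,
    'datetime': datetime.datetime,
    'date': datetime.date,
    'time': datetime.time,
}


def split_multi(value: str) -> list:
    """Split a '||'-delimited field, dropping empty entries such as a trailing '||'."""
    return [item for item in value.split('||') if item]


def resolve_range(r_rng: str):
    """Return the Python type for an 'xsd.LocalName' range, or None if it is not a mapped xsd datatype."""
    prefix, _, local = r_rng.strip().rpartition('.')
    if prefix.lower() != 'xsd':
        return None  # owl., wgs84 and rc. ranges need a namespace lookup instead
    return XSD_TYPES.get(local.lower())  # exact match, so 'date' never shadows 'datetime'


def read_prop_annotations(handle) -> list:
    """Parse property-annotation rows; without explicit fieldnames, DictReader uses the header row as keys."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            'id': row['id'].removeprefix('rc.'),  # str.removeprefix needs Python 3.9+
            'prefLabel': row['prefLabel'],
            'domain': split_multi(row['domain']),
            'range': resolve_range(row['range']),
            'altLabel': split_multi(row['altLabel']),
        })
    return rows


if __name__ == '__main__':
    # Illustrative sample row, not taken from the KBpedia input files
    sample = io.StringIO('id,prefLabel,domain,range,altLabel\n'
                         'rc.url,url,owl.Thing||,xsd.anyURI,link||web address\n')
    for parsed in read_prop_annotations(sample):
        print(parsed)
```

Matching exact local names also removes any dependence on branch order, which the substring version needs so that `datetime` is tested before `date`.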

E. Ingest of Mappings

Mappings to external sources are an integral part of KBpedia, as is likely the case for any similar, large-scale knowledge graph. As such, ingest of new or revised mappings is also a logical step in the overall build process, and occurs at this point in the sequence.

Though we will not address mappings until CWPK #49, those steps belong here in the overall set of procedures for the extract-build roundtrip process.

6. Test Build

We then conduct our series of logic tests (CWPK #43). This portion of the process may actually be the longest of all, since it may take multiple iterations to pass all of these tests. Conversely, the build tests may go quite quickly when relatively few changes were made between versions.

Wrap Up

Of course, these steps could be embedded in an overall ‘complete’ extract and build routine, but I have not done so.
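If one did want a single driver, a minimal sketch might run each step in order with its own settings merged over a shared deck, stopping at the first failure. The step functions here are hypothetical placeholders that only record their label; in cowpoke each would be a real function called with keyword settings, the way `prop2_annot_build(**build_deck)` is called above. The setting names (`loop`, `log`) and the `class_loop` value are illustrative assumptions.

```python
from typing import Callable


def placeholder_step(label: str) -> Callable[..., None]:
    """Hypothetical stand-in for a cowpoke step; it only records that it ran."""
    def step(**deck) -> None:
        print(f"{label} (loop={deck.get('loop')})")
        deck['log'].append(label)
    return step


# Shared settings; 'log' is an illustrative list used to record completed steps.
BASE_DECK = {'log': []}

# (label, step function, per-step overrides); 'class_loop' is an assumed value.
BUILD_STEPS = [
    ('Build Class Structure', placeholder_step('Build Class Structure'), {'loop': 'class_loop'}),
    ('Build Property Structure', placeholder_step('Build Property Structure'), {'loop': 'property_loop'}),
    ('Build Class Annotations', placeholder_step('Build Class Annotations'), {'loop': 'class_loop'}),
    ('Build Property Annotations', placeholder_step('Build Property Annotations'), {'loop': 'property_loop'}),
]


def run_steps(steps, base_deck: dict) -> bool:
    """Run steps in order with merged settings; stop at the first exception."""
    for label, func, overrides in steps:
        deck = {**base_deck, **overrides}  # per-step settings override shared ones
        try:
            func(**deck)
        except Exception as exc:
            print(f'Stopped at {label}: {exc}')
            return False
    return True


if __name__ == '__main__':
    run_steps(BUILD_STEPS, BASE_DECK)
```

Stopping at the first failure mirrors the manual practice of checking each step's output before running the next; the shallow merge keeps the shared `log` list common to all steps.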

Before we conclude this major part in our CWPK series, we will next show how all of the steps may be combined to achieve a rather large re-factoring of all of KBpedia.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site. The cowpoke Python code listing covering the series is also available from GitHub.

NOTE: This CWPK installment is available both as an online interactive file or as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.

I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.

Schema.org Markup

headline:

CWPK #47: Summary of the Extract-Build Roundtrip

alternativeHeadline:

Here is the Master Listing of Extraction and Build Steps

author:

Mike Bergman

image:

https://www.mkbergman.com/wp-content/uploads/2020/07/cooking-with-kbpedia-785.png

description:

In this CWPK installment, we list all of the steps in sequence for proceeding from the initial flat file extractions, to offline modifications of those files, and then the steps to build KBpedia again from the resulting new inputs. We also capture the main configuration settings appropriate to each step.

articleBody:

see above

datePublished:

October 5, 2020

Posted: October 5, 2020