This Installment Defines KBpedia’s ‘Bootstraps’
We begin the build process with this installment in the Cooking with Python and KBpedia series. We do so by creating the ‘bootstraps’ of ‘core’ ontologies that are the targets as we ingest new classes, properties, and annotations for KBpedia. The general process we outline herein is appropriate to building any large knowledge graph. You may swap out your own starting ontology and semantic scaffoldings to apply this process to different knowledge graphs.
The idea of a ‘bootstrap’ in computer science means a core set of rudimentary instructions that is called at immediate initialization of a program. This bootstrapped core provides all of the parent instructions that are called by the subsequent applications that actually do the desired computer tasks. The bootstrap is the way those applications can perform basic binary operations like allocating registers, creating files, pushing or popping instructions to the stack, and other low-level functions.
In the case of KBpedia and our approach to the build process for knowledge graphs, the ‘bootstrap’ consists of the basic calls to the semantic languages such as RDF or OWL, plus the creation of a top-level parental set of classes and properties to which we connect the subsequent knowledge graph content. We call these starting bootstraps ‘stubs’.
These ‘stubs’ are created outside of the build process, generally using an ontology IDE like Protégé. In our case, we have already created the ‘stubs’ used in the various KBpedia build processes. As we create new versions, we must make some minor modifications to these ‘stubs’. However, in general, the stubs are rather static in nature and may only rarely need to be changed in a material manner. As you will see from inspection, these stubs are minimal in structure and rather easy to create on your own with your own favorite ontology editor.
The KBpedia build processes use one core ontology stub, the KBpedia Knowledge Ontology (KKO), and two supporting stubs: one for building the full KBpedia knowledge graph and one for building individual typologies.
Overview of the Build Process
Our first activity is to set up a new directory structure with appropriate starting files. The build itself starts with a pre-ingest step that checks input files for proper encoding and runs other ‘cleaning’ tests. Upon passing these checks, we are ready to continue with the build.
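As a rough illustration of the encoding part of that pre-ingest check, here is a minimal sketch using only the Python standard library. It assumes the build inputs are meant to be UTF-8; the function name `check_utf8` is hypothetical and not part of cowpoke. It flags files that fail to decode, plus files that start with a byte-order mark, since Python's `csv` module reading with plain `utf-8` will otherwise keep the BOM inside the first column name.

```python
from pathlib import Path

UTF8_BOM = b'\xef\xbb\xbf'

def check_utf8(paths):
    """Return (path, problem) pairs for files failing a basic UTF-8 check.

    Hypothetical helper for illustration; assumes inputs should be UTF-8.
    """
    problems = []
    for p in map(Path, paths):
        raw = p.read_bytes()
        # A BOM decodes fine but ends up glued to the first CSV header field
        if raw.startswith(UTF8_BOM):
            problems.append((str(p), 'has UTF-8 byte-order mark'))
        try:
            raw.decode('utf-8')
        except UnicodeDecodeError as err:
            problems.append((str(p), f'not valid UTF-8 at byte {err.start}'))
    return problems
```

In practice you would pass it something like `Path('build_ins').rglob('*.csv')` and fix or re-save any flagged files before the build proceeds.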
The build process begins by loading the stub. This loaded stub then becomes the target for all subsequent ingest steps.
The ingest process has two phases. In the first phase we ingest build files that specify the structural nature of the knowledge graph, in this case, KBpedia. This structural scaffolding consists of, first, class statements, and then object property or data property ‘is-a’ statements. In the case of classes, the binding predicate is the rdfs:subClassOf property. In the case of properties, it is the rdfs:subPropertyOf property.
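To make the two binding predicates concrete, here is a small standard-library sketch that turns child/parent rows into is-a triples. The CSV column names `id` and `parent` and the `ex:` identifiers are hypothetical placeholders, not the actual KBpedia build-file layout; the real cowpoke routines load these statements into the stub rather than returning tuples.

```python
import csv
import io

RDFS = 'http://www.w3.org/2000/01/rdf-schema#'

# The 'is-a' predicate depends on what kind of item is being ingested
ISA_PREDICATE = {
    'class': RDFS + 'subClassOf',        # class is-a class
    'property': RDFS + 'subPropertyOf',  # object/data property is-a property
}

def isa_triples(csv_text, kind):
    """Build (child, predicate, parent) triples from CSV text.

    Assumes hypothetical 'id' and 'parent' columns; kind is 'class' or 'property'.
    """
    predicate = ISA_PREDICATE[kind]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row['id'], predicate, row['parent']) for row in reader]
```

For example, `isa_triples("id,parent\nex:Dog,ex:Mammal\n", 'class')` yields one `rdfs:subClassOf` triple linking `ex:Dog` to `ex:Mammal`.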
This phase sets the structure over which we can reason and infer with the knowledge graph. Thus, we also have the optional steps in this phase to check whether our ingests have been consistent and satisfiable. If the structural scaffolding meets these tests, we are ready for the second phase.
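Full consistency and satisfiability checks require a description logic reasoner. A much cheaper sanity check, which can run before that, is to look for cycles in the is-a pairs. Note that a subclass cycle is not an OWL inconsistency (it simply makes the classes equivalent), but in a typology assembled from parent files it usually signals a data-entry error. The sketch below is a plain depth-first search of our own; the name `find_isa_cycle` is hypothetical.

```python
from collections import defaultdict

def find_isa_cycle(pairs):
    """Return one cycle among (child, parent) is-a pairs as a list, else None."""
    parents = defaultdict(list)
    for child, parent in pairs:
        parents[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = defaultdict(int)       # every node starts WHITE
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for p in parents[node]:
            if color[p] == GREY:   # back edge: p is already on the current path
                return path[path.index(p):] + [p]
            if color[p] == WHITE:
                found = visit(p)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

The search is recursive, which is fine for hierarchies of modest depth; very deep chains would need an iterative version.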
The second phase is to bring in the many annotations that we have gathered for the classes and properties. A description and preferred label are requirements for each item. These are best supplemented with alternative labels (synonyms in the broadest sense) and other properties. We can then load either mapping or additional annotation properties should we desire them.
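Because a description and a preferred label are required for every item, it helps to check annotation records before loading them. This sketch assumes hypothetical field names (`id`, `prefLabel`, `definition`, `altLabel`) and a hypothetical `||` separator for multiple alternative labels; adjust these to match your own annotation files.

```python
REQUIRED = ('prefLabel', 'definition')  # hypothetical field names

def missing_annotations(records):
    """Map each item id to the required annotations it lacks (blank counts as missing)."""
    problems = {}
    for rec in records:
        missing = [f for f in REQUIRED if not (rec.get(f) or '').strip()]
        if missing:
            problems[rec.get('id', '?')] = missing
    return problems

def alt_labels(rec, sep='||'):
    """Split an optional altLabel field into a clean list of synonyms."""
    return [s.strip() for s in (rec.get('altLabel') or '').split(sep) if s.strip()]
```

Items reported by `missing_annotations` can then be fixed offline before the annotation phase is rerun.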
These steps are not inviolate. Files that we know are clean can skip the pre-clean steps, for example. Or, we may already have a completed and vetted knowledge graph to which we only want to supplement some information. In other words, the build routines can also be used in different orders and with only partial input sets once we have a working system.
Steps to Prep
We will assume that you have already done your offline work to add to or modify your build input files. (As we proceed installment-by-installment during this build discussion we will provide a listing of required files as appropriate.) Depending on the given project, working on these offline build files may actually represent the bulk of your overall efforts. You might be querying outside sources to add to annotations, or changing or adding to your knowledge graph’s structure, or trying new top-level ontologies, etc., etc.
Once you deem this offline work to be complete, you need to do some prep to support the new build process (which in the simplest case uses the extraction files we just discussed in this CWPK series). Your first task is to create a new skeletal directory structure under a new version parent, similar to what is shown in Figure 2 in the prior CWPK #37 installment. One way to avoid typing in all of the new directory names is to copy a prior version directory to the new version location and then delete the irrelevant files. (Further, if you know you may do this multiple times, you may then keep a copy of this shell structure for subsequent versions.)
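If you would rather not delete files by hand, Python's `shutil.copytree` accepts an `ignore` callable, which lets you copy only the directory tree and skip every file. The function name `copy_skeleton` and the version folder names in the usage note are hypothetical.

```python
import os
import shutil

def copy_skeleton(prior_dir, new_dir):
    """Copy only the directory tree of prior_dir into new_dir, skipping all files."""
    def files_only(dirpath, names):
        # copytree calls this once per directory; returned names are not copied
        return [n for n in names if not os.path.isdir(os.path.join(dirpath, n))]

    # new_dir must not already exist (copytree's default behavior)
    shutil.copytree(prior_dir, new_dir, ignore=files_only)
```

For example, `copy_skeleton(r'C:\1-PythonProjects\kbpedia\v250', r'C:\1-PythonProjects\kbpedia\v300')` would recreate the empty folder structure for the new version, assuming a `v250` folder exists there.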
You then need to copy over all of the prior stub files from the prior version to the new ‘stub’ directory. Depending on what you have been doing locally, you may need to make further changes to mirror your needed work preferences.
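Copying the stub files themselves can be scripted the same way. This minimal sketch copies every `.owl` file from the prior version's stub folder into the new one, preserving file timestamps via `shutil.copy2`; the helper name and the `*.owl` pattern are assumptions for illustration.

```python
import shutil
from pathlib import Path

def copy_stubs(prior_stubs, new_stubs, pattern='*.owl'):
    """Copy stub files matching pattern from the prior version; return copied names."""
    new_stubs = Path(new_stubs)
    new_stubs.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(prior_stubs).glob(pattern)):
        shutil.copy2(src, new_stubs / src.name)  # copy2 also keeps modification times
        copied.append(src.name)
    return copied
```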
Each stub file then needs to be brought into an ontology editor (Protégé, of course, in our case) and updated for new version number, as this diagram indicates:
Note that every ontology has a base IRI, and you should update the reference or version number (http://kbpedia.org/kbpedia/v250 in our case) (1) in the ontology URI field. You then need to copy the text under your current owl:versionInfo annotation, and paste it into a new owl:priorVersion (2) annotation. You may need to make some minor editing changes to reflect past tense for the prior version. Then, last, you need to update the owl:versionInfo (3) annotation.
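We do these three edits in Protégé, but for reference, here is a standard-library sketch of the same changes applied to an RDF/XML stub. It assumes the stub has a single `owl:Ontology` element with an `rdf:about` IRI and an existing `owl:versionInfo`; the function name and the version strings are hypothetical. Be aware that ElementTree drops XML comments and may rewrite namespace prefixes, so compare the output against the original before relying on it.

```python
import xml.etree.ElementTree as ET

OWL = 'http://www.w3.org/2002/07/owl#'
RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
ET.register_namespace('owl', OWL)  # keep familiar prefixes on output
ET.register_namespace('rdf', RDF)

def bump_version(xml_text, new_iri, new_info):
    """Apply steps (1)-(3): new ontology IRI, old versionInfo -> priorVersion, new versionInfo."""
    root = ET.fromstring(xml_text)
    onto = root.find(f'{{{OWL}}}Ontology')
    if onto is None:
        raise ValueError('no owl:Ontology element found')
    info = onto.find(f'{{{OWL}}}versionInfo')
    if info is None:
        raise ValueError('no owl:versionInfo to carry forward')
    onto.set(f'{{{RDF}}}about', new_iri)                  # (1) ontology IRI
    prior = ET.SubElement(onto, f'{{{OWL}}}priorVersion')  # (2) old text moves here
    prior.text = info.text
    info.text = new_info                                  # (3) new version text
    return ET.tostring(root, encoding='unicode')
```

As noted in the text above, you may still want to hand-edit the carried-over wording into the past tense.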
You may, of course, make other ontology metadata changes at this time.
KKO: The Core Stub
The KKO stub is the core one for the build process. It represents its own standalone ontology, but also is the top-level ontology used by KBpedia.
KKO is also the most likely of the three stubs to need modification before a new run. Recall that KKO is organized under three main branches corresponding to the universal categories of Charles Sanders Peirce. Two of the branches, Monads and Particulars, do not participate in a KBpedia build. (Though future version releases of KKO may affect these branches, in which case the KKO stub should be updated.) But the third branch, Generals, is very much involved in a KBpedia build. All roots (parents) of KBpedia’s typologies tie in under the Generals branch.
You will need, then, to make changes to the Generals branch of KKO prior to starting a build if any of these conditions is met:
- You are dropping or removing any typologies or
- You are adding any typologies or
If you are only modifying a typology, you need not change KKO. Loading the modified typology during the full build process will accomplish this modification.
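The rule above reduces to a set comparison: KKO needs editing only when typology roots have been added or dropped, not when an existing typology has merely changed. Here is a minimal sketch; the typology names in the usage note are hypothetical.

```python
def kko_changes_needed(prior_roots, new_roots):
    """Compare typology root lists between versions and report whether KKO must change."""
    prior, new = set(prior_roots), set(new_roots)
    added = sorted(new - prior)
    removed = sorted(prior - new)
    # Modified-but-retained typologies appear in neither list, so they need no KKO edit
    return {'added': added, 'removed': removed, 'edit_kko': bool(added or removed)}
```

For example, going from hypothetical roots `['Animals', 'Plants']` to `['Animals', 'Foods']` reports `Foods` as added, `Plants` as removed, and `edit_kko` as `True`.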
Like the other two stubs, you also need to make sure you have updated your version references. As distributed with cowpoke as part of these CWPK installments, here is the KKO stub as used in this project (remember, to see the file, choose Run from the notebook menu or press shift+enter when highlighting the cell):
with open(r'C:\1-PythonProjects\kbpedia\v300\build_ins\stubs\kko.owl', 'r', encoding='utf8') as f:
    print(f.read())
The KBpedia Stub
The KBpedia stub is the ‘umbrella’ above the entire project. It incorporates the KKO stub and is the general target for all subsequent steps in the full-build process. When viewed in code, as the file below shows, this ‘umbrella’ is rather sparse. However, if you look at it in, say, Protégé, you will also see all of KKO, because KKO is imported.
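One quick way to confirm which ontologies a stub pulls in is to list its `owl:imports` statements. This standard-library sketch assumes the stub is serialized as RDF/XML with `owl:imports rdf:resource="..."` elements; the function name and the `example.org` IRIs in the check are placeholders, not the real KKO IRI.

```python
import xml.etree.ElementTree as ET

OWL = 'http://www.w3.org/2002/07/owl#'
RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

def stub_imports(xml_text):
    """Return the IRIs named by every owl:imports element in an RDF/XML stub."""
    root = ET.fromstring(xml_text)
    return [imp.get(f'{{{RDF}}}resource') for imp in root.iter(f'{{{OWL}}}imports')]
```

Reading the KBpedia stub file's text into this function should return the KKO import, which is why Protégé displays all of KKO alongside the sparse umbrella file.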
Again, the KBpedia stub should have its version updated prior to a new version build:
with open(r'C:\1-PythonProjects\kbpedia\v300\build_ins\stubs\kbpedia_rc_stub.owl', 'r', encoding='utf8') as f:
    print(f.read())
The Typology Stub
The typology stub is the simplest of the three. Its use is merely to provide a ‘header’ sufficient for loading an individual typology into an editor such as Protégé.
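Since this stub is little more than a header, it can also be generated. The sketch below builds a minimal RDF/XML ontology header with an IRI, optional imports, and a version note. It is our own illustration with hypothetical names and placeholder IRIs, not a reproduction of the distributed typology stub, which may carry additional metadata.

```python
import xml.etree.ElementTree as ET

OWL = 'http://www.w3.org/2002/07/owl#'
RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
for prefix, uri in (('owl', OWL), ('rdf', RDF)):
    ET.register_namespace(prefix, uri)

def typology_stub(ontology_iri, version_info, imports=()):
    """Return a minimal RDF/XML ontology header as a string (hypothetical helper)."""
    root = ET.Element(f'{{{RDF}}}RDF')
    onto = ET.SubElement(root, f'{{{OWL}}}Ontology', {f'{{{RDF}}}about': ontology_iri})
    for iri in imports:
        ET.SubElement(onto, f'{{{OWL}}}imports', {f'{{{RDF}}}resource': iri})
    ET.SubElement(onto, f'{{{OWL}}}versionInfo').text = version_info
    return ET.tostring(root, encoding='unicode')
```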
However, despite being listed last, it is the typology stub we will first work with in developing our build routines, because it is our simplest possible starting point. Again, assuming you have made your version updates, here is the file:
with open(r'C:\1-PythonProjects\kbpedia\v300\build_ins\stubs\typology_stub.owl', 'r', encoding='utf8') as f:
    print(f.read())
OK, so our stubs are now updated and set up. We are ready to begin some ingest coding . . . .