Posted: August 4, 2020

CWPK #7: Getting Familiar with KBpedia Files

Setting Up Your Baseline KBpedia File Structure

Now that we have begun exploring KBpedia in this series on Cooking with Python and KBpedia, it is time to set up the entire system locally so we can begin working with it. To do so, we will download the entire suite of available KBpedia files and install them on your local file system. To make this task easier, we have provided three zip files on the KBpedia GitHub code site. Download these files (about 50 MB in total), place the first into a new directory you have created (say, kbpedia-test or the like), and unzip it. Then place the other two files into the new ‘target’ subdirectory created by the first unzip, and extract (unzip) them into that subdirectory.
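If you prefer to script these steps, here is a minimal Python sketch that uses only the standard library. It assumes that extracting the first archive places a ‘target’ subdirectory directly inside your chosen base directory. The archive names in the example comment are hypothetical placeholders; use the names of the files you actually downloaded.

```python
import zipfile
from pathlib import Path


def setup_kbpedia(base_dir, main_zip, target_zips):
    """Extract the first archive into base_dir, then the others into base_dir/target.

    Assumes the first archive unpacks so that 'target' sits directly under base_dir.
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(main_zip) as zf:
        zf.extractall(base)

    target = base / "target"
    target.mkdir(exist_ok=True)  # normally already created by the first extraction
    for zip_path in target_zips:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
    return target


# Example (hypothetical file names):
# setup_kbpedia("kbpedia-test", "first.zip", ["second.zip", "third.zip"])
```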

When you visit GitHub, you will see that each KBpedia version lives in its own sub-folder of the versions folder, named for its version number, such as 1.60 or 2.50. Each version folder duplicates the structure discussed in this article, though earlier versions may be organized slightly differently. Still, you can access and re-create KBpedia in its earlier instantiations if you wish. For a new project intended for more than test use, we recommend using only the current version, since each release has been an improvement over its predecessor.
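If you script against a local copy of the versions folder, a small helper can pick out the current version. This sketch assumes version folders are named with dot-separated integers in a consistent style, as in 1.60 and 2.50. Comparing the parts as integers avoids the trap of plain string sorting, which would rank a hypothetical 10.00 below 2.50.

```python
def latest_version(folder_names):
    """Return the highest dot-separated numeric folder name, or None if there is none."""

    def parts(name):
        return name.split(".")

    # Keep only names such as "1.60" or "2.50"; skip entries like "README.md".
    versions = [n for n in folder_names if all(p.isdecimal() for p in parts(n))]
    if not versions:
        return None
    # Compare part by part as integers, so (10, 0) ranks above (2, 50).
    return max(versions, key=lambda n: tuple(int(p) for p in parts(n)))


# Example with illustrative names:
# latest_version(["1.60", "2.50", "README.md"]) returns "2.50"
```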

Note: Alternatively, you may manage KBpedia files and updates through the standard Git ‘pull’ process. This ‘pull’ approach is the best way to stay current with KBpedia’s development, but is extra work for newbies simply interested in learning. See these GitHub resources if you wish to employ the ‘pull’ method. We do not discuss that method further here.

When you extract the zip file locally, here is the directory structure you will see, which I explain after the listing:

| reference-concepts-add-alt-labels.csv
| reference-concepts-add-definition.csv
| reference-concepts-add-sub-class-of.csv
| reference-concepts-fixes.csv
| reference-concepts-remove-sub-class-of.csv
| super_types.csv
| |
| +---metrics
| |
| \---unsatisfiables
| |
| +---core
| | dbpedia-ontology.csv
| | dbpedia.csv
| | geonames.csv
| | same-as.csv
| |
| | wikidata.csv
| | wikipedia.csv
| |
| +---general
| | bibo.csv
| | cc.csv
| | dc.csv
| | doap.csv
| | event.csv
| | foaf.csv
| | frbr.csv
| | geo.csv
| | mo.csv
| | oo.csv
| | org.csv
| | po.csv
| | rss.csv
| | sioc.csv
| | time.csv
| | transit.csv
| |
| +---property
| | dbpedia-ontology.csv
| | geonames.csv
| | opencyc.csv
| |
| | unspsc.csv
| | wikidata.csv
| | wikipedia.csv
| |
| \---special
| wikipedia-categories.csv
| new-concepts.csv
| kbpedia_reference_concepts.n3
| kko-demo.n3
| kko.n3
| skos-owl1-dl.owl
|
+---properties
| schema.csv
| wikidata.csv
|
+---target
| bibo.n3
| cc.n3
| dbpedia-ontology.n3
| dc.n3
| doap.n3
| event.n3
| foaf.n3
| frbr.n3
| geo.n3
| geonames.n3
| kbpedia_reference_concepts.n3
| kbpedia_reference_concepts_linkage.n3
| kbpedia_reference_concepts_linkage_inferrence_extended.n3
| mo.n3
| oo.n3
| opencyc.n3
| org.n3
| po.n3
| rss.n3
| same-as.n3
|
| sioc.n3
| time.n3
| transit.n3
| wikidata.n3
| wikipedia.n3
|
\---typologies
| ActionTypes-typology.n3
| AdjunctualAttributes-typology.n3
| Agents-typology.n3
| Animals-typology.n3
| AreaRegion-typology.n3
| Artifacts-typology.n3
| Associatives-typology.n3
| AtomsElements-typology.n3
| AttributeTypes-typology.n3
| AudioInfo-typology.n3
| AVInfo-typology.n3
| BiologicalProcesses-typology.n3
| Chemistry-typology.n3
| Concepts-typology.n3
| ConceptualSystems-typology.n3
| Constituents-typology.n3
| ContextualAttributes-typology.n3
| CopulativeRelations-typology.n3
| Denotatives-typology.n3
| DirectRelations-typology.n3
| Diseases-typology.n3
| Drugs-typology.n3
| EconomicSystems-typology.n3
| EmergentKnowledge-typology.n3
| Eukaryotes-typology.n3
| EventTypes-typology.n3
| Facilities-typology.n3
| FoodDrink-typology.n3
| Forms-typology.n3
| Generals-typology.n3
| Geopolitical-typology.n3
| Indexes-typology.n3
| Information-typology.n3
| InquiryMethods-typology.n3
| IntrinsicAttributes-typology.n3
| KnowledgeDomains-typology.n3
| LearningProcesses-typology.n3
| LivingThings-typology.n3
| LocationPlace-typology.n3
| Manifestations-typology.n3
| MediativeRelations-typology.n3
| Methodeutic-typology.n3
| NaturalMatter-typology.n3
| NaturalPhenomena-typology.n3
| NaturalSubstances-typology.n3
| OrganicChemistry-typology.n3
| OrganicMatter-typology.n3
| Organizations-typology.n3
| Persons-typology.n3
| Places-typology.n3
| Plants-typology.n3
| Predications-typology.n3
| PrimarySectorProduct-typology.n3
| Products-typology.n3
| Prokaryotes-typology.n3
| ProtistsFungus-typology.n3
| RelationTypes-typology.n3
| RepresentationTypes-typology.n3
| SecondarySectorProduct-typology.n3
| Shapes-typology.n3
| SituationTypes-typology.n3
| SocialSystems-typology.n3
| Society-typology.n3
| SpaceTypes-typology.n3
| StructuredInfo-typology.n3
| Symbolic-typology.n3
| Systems-typology.n3
| TertiarySectorService-typology.n3
| Times-typology.n3
| TimeTypes-typology.n3
| TopicsCategories-typology.n3
| VisualInfo-typology.n3
| WrittenInfo-typology.n3

The ‘target’ directory is perhaps the most important in this listing, since this is where the built files reside after the build process (which we take up beginning with CWPK #37). This directory contains the output mapping files to external sources, based on the input specifications found in the ‘mappings\core’, ‘mappings\general’ and ‘mappings\special’ directories. In other words, input specifications are tested and then ingested into the build process from the ‘mappings’ directory; when the build succeeds, it outputs those mappings to the ‘target’ directory in KBpedia’s canonical N3 format.
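As a quick local sanity check after unzipping, the following sketch lists mapping input files that have no same-named N3 file in ‘target’. It assumes the ‘mappings’ and ‘target’ directories both sit directly under the root you pass in. Names are not strictly one-to-one (the listing shows mappings\core\dbpedia.csv but no dbpedia.n3 in ‘target’), so treat the result as a prompt for inspection rather than a list of errors.

```python
from pathlib import Path

# The three input groups named in the text; the 'property' group is left out here.
MAPPING_GROUPS = ("core", "general", "special")


def unmatched_mappings(root):
    """Return mapping CSVs (relative POSIX paths) with no same-named .n3 file in target/."""
    root = Path(root)
    target_dir = root / "target"
    target_files = target_dir.glob("*.n3") if target_dir.is_dir() else []
    target_stems = {p.stem for p in target_files}

    unmatched = []
    for group in MAPPING_GROUPS:
        group_dir = root / "mappings" / group
        if not group_dir.is_dir():
            continue  # skip groups that are absent in this copy
        for csv_path in sorted(group_dir.glob("*.csv")):
            if csv_path.stem not in target_stems:
                unmatched.append(csv_path.relative_to(root).as_posix())
    return unmatched


# Example (path is illustrative): print(unmatched_mappings("kbpedia-test"))
```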

The output ‘target’ directory also includes these three pivotal knowledge graph files:

file name | description
 | the core KBpedia reference concepts structure, with all 58k concepts
 | the same structure as above, with all the linkages to other ontologies added
kbpedia_reference_concepts_linkage_ | the same structure including the linkages, plus all inferred relationships between the concepts and their links to other ontologies

In the GitHub listing, we provide the other ‘target’ outputs under the sub-folder called ‘linkages’, which has one file per linked ontology.

Other input specifications are provided through the ‘indexes’, ‘new-concepts’, ‘owl’, ‘properties’ and ‘fixes’ directories. The ‘indexes’ directory contains the direct assignments to KBpedia’s 70 or so typologies, or SuperTypes, which we discuss below under the output ‘typologies’ directory. The ‘new-concepts’ directory is where the major specifications for the KBpedia reference concepts (RCs) are located. The new-concepts.csv file is the single most important input file in the system, since this is where we initially specify all (or most) of the RCs found in KBpedia. Complete entries in this file require multiple input fields, as we will detail in a later installment. For now, just recognize this file as one of the most central to the system.
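Because the exact input fields of new-concepts.csv are only covered in a later installment, this sketch avoids assuming any column names. It reads the header row, counts the entries, and flags rows with an empty field, which can help spot incomplete entries. It assumes the file is UTF-8 encoded and begins with a header row; the path in the example comment is illustrative.

```python
import csv


def summarize_concepts(csv_path):
    """Return (field names, row count, line numbers of rows with any empty field)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames or []
        count = 0
        incomplete = []
        for row in reader:
            count += 1
            # Short rows yield None for missing fields; treat those as empty too.
            if any(not (row.get(name) or "").strip() for name in fields):
                incomplete.append(reader.line_num)
        return fields, count, incomplete


# Example (path is illustrative):
# fields, count, incomplete = summarize_concepts("kbpedia-test/new-concepts/new-concepts.csv")
```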

The ‘fixes’ directory contains a number of input files, processed among the last of the build steps, which add to or overwrite specification information provided in earlier input files. These updates should be migrated into the initial input files over time, but they are kept in a separate directory because it is a convenient, more easily managed location for making or testing input updates during active builds. It is best to consider this directory a temporary one, useful while testing and evaluating new builds. However, updates in this directory can remain there indefinitely, and they will be the last processed during a build.
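The "last processed wins" behavior of the ‘fixes’ directory can be illustrated with a small in-memory model. This is a hypothetical sketch, not the actual KBpedia build code: the operation names echo the fix file names in the listing (add-definition, add-alt-labels, add-sub-class-of, remove-sub-class-of), but the record format and the semantics given to each operation are assumptions made for illustration.

```python
import copy


def apply_fixes(specs, fixes):
    """Return a new spec dict with fix records applied in order, so later fixes win.

    specs: {rc_id: {"definition": str, "alt_labels": list, "sub_class_of": set}}
    fixes: iterable of (operation, rc_id, value) tuples  (hypothetical format)
    """
    result = copy.deepcopy(specs)  # leave the earlier input specs untouched
    for op, rc, value in fixes:
        entry = result.setdefault(
            rc, {"definition": "", "alt_labels": [], "sub_class_of": set()}
        )
        if op == "add-definition":
            entry["definition"] = value  # overwrites any earlier definition
        elif op == "add-alt-labels":
            if value not in entry["alt_labels"]:
                entry["alt_labels"].append(value)
        elif op == "add-sub-class-of":
            entry["sub_class_of"].add(value)
        elif op == "remove-sub-class-of":
            entry["sub_class_of"].discard(value)
        else:
            raise ValueError(f"unknown fix operation: {op}")
    return result
```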

The ‘properties’ directory holds the inputs to the property listings in KBpedia. It is the subject of a later article, so we can skip over it for the time being.

The remaining input directory is ‘owl’, which holds two important input files. The first, kko.n3, is the fully specified upper ontology of KBpedia (KKO). It is fairly static, often not changing at all from build to build. It is a fully specified ontology file with complete metadata, and it serves as the integral central scaffolding of the KBpedia build process. Nowhere else is KKO specified. The other important input file in this ‘owl’ directory is kbpedia_reference_concepts.n3, which is basically a stub header with the metadata used in the build process; the build populates it with all of the RCs and their specifications. The output from this build, combined with the header, is the full KBpedia ontology in kbpedia_reference_concepts.n3 in the ‘target’ output directory. The kko-demo.n3 file in the ‘owl’ directory is a non-working, labeled version of KKO that relates the upper concepts to Peirce’s universal categories. Finally, the skos-owl1-dl.owl file is an unusual version of the SKOS ontology that the build process normally accesses offsite; it is provided here in case remote connections are lost.
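To make the idea of a stub header being populated with RCs more concrete, here is a hypothetical sketch that appends one N3 block per RC to a header string. The record layout (a label plus a set of parent RCs) and the output shape are assumptions, and the sketch writes only a label and parent classes for each RC. It also assumes the header already declares the rc:, owl:, rdfs: and skos: prefixes.

```python
def n3_literal(text):
    """Escape backslashes and double quotes for an N3 string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_ontology_text(header, concepts):
    """Return the header followed by one N3 block per RC, sorted by RC id.

    concepts: {rc_id: {"label": str, "sub_class_of": set of rc_ids}}  (hypothetical layout)
    """
    blocks = [header.rstrip() + "\n"]
    for rc_id, spec in sorted(concepts.items()):
        statements = ["a owl:Class", f'skos:prefLabel "{n3_literal(spec["label"])}"@en']
        parents = sorted(spec.get("sub_class_of", ()))
        if parents:  # omit rdfs:subClassOf entirely when there are no parents
            statements.append("rdfs:subClassOf " + ", ".join(f"rc:{p}" for p in parents))
        blocks.append(f"rc:{rc_id} " + " ;\n    ".join(statements) + " .\n")
    return "\n".join(blocks)
```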

Besides the ‘target’ directory, there are two other output directories in this listing. The first is the ‘logs’ directory. Here is where we direct error messages and stats that may arise from the build process. The directory is normally blanked out after a new version build is successful and completed. But, for your local tests, the ‘logs’ directory will be a key one as we work through build steps and issues in later installments. This directory is where we will find the diagnostic information to debug an unsuccessful build.
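Since the ‘logs’ directory is normally blanked out after a successful build, one quick check during local testing is to list any log files that actually contain something. This sketch assumes ‘logs’ sits directly under the root you pass in, and it treats any non-empty file in any of its subdirectories as worth a look.

```python
from pathlib import Path


def nonempty_logs(root):
    """Return relative POSIX paths of non-empty files anywhere under root/logs."""
    logs = Path(root) / "logs"
    if not logs.is_dir():
        return []
    return sorted(
        p.relative_to(logs).as_posix()
        for p in logs.rglob("*")
        if p.is_file() and p.stat().st_size > 0
    )


# Example (path is illustrative): print(nonempty_logs("kbpedia-test"))
```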

The remaining output directory, ‘typologies’, is a special one. As separate steps at the completion of a successful build, we run some additional routines that extract each of the individual main branches, or typologies, and report them separately as individual ontologies. We also run many placement tests on these typologies to gauge how complete or fragmented they may be. Sometimes considerable effort at the end of a build is devoted to inspecting each of KBpedia’s SuperTypes and its members to improve the placement and consistent treatment of RCs. In actual use, these typologies are highly interconnected and integral to the entire KBpedia structure as seen in the full ontology, kbpedia_reference_concepts.n3. But within the ‘typologies’ directory we tease the typologies out separately for easier inspection and refinement, as well as for possible use as inputs to external applications.
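Conceptually, teasing out a typology means collecting a SuperType’s root and everything beneath it through subclass links. The following is a minimal breadth-first sketch over plain (child, parent) pairs. The concept names are illustrative, and this is not the actual KBpedia routine for this step.

```python
from collections import defaultdict, deque


def extract_typology(sub_class_of, root):
    """Return the root plus all of its descendants, given (child, parent) pairs."""
    children = defaultdict(set)
    for child, parent in sub_class_of:
        children[parent].add(child)

    members = {root}
    queue = deque([root])
    while queue:  # breadth-first walk down the subclass hierarchy
        node = queue.popleft()
        for child in children[node]:
            if child not in members:  # guards against revisiting shared descendants
                members.add(child)
                queue.append(child)
    return members
```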

Most of the 30 or so core typologies in KBpedia do not overlap with one another, a property known as disjointness. Disjointness enables powerful reasoning and subset selection (filtering) to be performed on the KBpedia graph. There are also upper typologies useful for further organizing the core typologies, plus providing homes for shared concepts. Living Things, for example, can capture concepts shared by all plants and animals (that is, by all life), which then enables better segregation of those life forms into separate Plants and Animals branches. These natural segregations are applied across the KKO structure.
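Disjointness is only safe to rely on if the typologies in question really share no members. Here is a hypothetical sketch that checks declared disjoint pairs against typology membership sets, and then uses a typology for simple filtering. Upper typologies such as Living Things are expected to overlap Plants and Animals, so only the pairs you declare disjoint are tested; all names and memberships here are illustrative.

```python
def disjointness_violations(typologies, disjoint_pairs):
    """Return {(a, b): shared members} for each declared-disjoint pair that overlaps.

    typologies: {name: set of RC ids}; disjoint_pairs: iterable of (name, name).
    """
    violations = {}
    for a, b in disjoint_pairs:
        shared = typologies[a] & typologies[b]
        if shared:
            violations[(a, b)] = shared
    return violations


def exclude_typology(candidates, typologies, name):
    """Filter out candidate RCs that belong to the named typology."""
    return {c for c in candidates if c not in typologies[name]}
```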

Of course, you can learn more about this structure by using the online KBpedia Knowledge Graph explorer. Possible matching concepts are presented as you type. Once you enter the knowledge graph, you can explore and navigate in many different ways. See the KBpedia site for more instructions.

Lastly, please note that we have highlighted three files from this directory structure in red. We will explore these files further with Protégé in our next installment.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
