Posted:August 5, 2020

CWPK #8: Getting Familiar with the KBpedia Structure

A Survey of the Upper Ontology, Full Knowledge Graph, and Typology Design

Cooking with KBpedia

Now that we have become a bit familiar with Protégé and have our local file structure set, let’s conduct a brief survey of the main components of the KBpedia structure. We’ll be using Protégé exclusively during this installment of the Cooking with Python and KBpedia series.

Start up Protégé as we discussed in CWPK #6. This time, however, use File → Open . . . and navigate to your KBpedia directory, and then the ‘owl’ subdirectory and the kko-demo.n3 file, highlight it, and pick Open. You should see a screen similar to Figure 1 after going to the Classes tab and expanding parts of the tree to expose some of the structure under the SuperTypes or Generals branch:

KKO Demo #1

Figure 1: KKO Demo View – IRI Short Name

You will note (1) that sub-items under each leaf are listed in alphabetical order. If you recall, however, we have organized the upper structure of KBpedia according to Charles Peirce‘s universal categories of Firstness (1ns), Secondness (2ns), and Thirdness (3ns). Roughly speaking, these categories correspond to qualities or possibilities (1ns), actuals or particulars (2ns), or generals (3ns).[1] Though we have provided metadata as to which of these three categories is assigned to each KBpedia reference concept (RC), we can not see these assignments in this listing. The purpose of the kko-demo.n3 file is to make these assignments explicit via the Protégé interface. We do this by assigning a prefLabel (preferred label) to each RC prefixed with its universal category number. (This change makes this particular file inoperable, but it does serve a didactic purpose to better understand KBpedia’s upper structure.)

To see this assignment, we need to change the basis for how Protégé renders its labels. To do so, proceed to the File → Preferences . . . and then the Renderer tab as shown in Figure 2. Depending on how your system is configured, you may need to both select the ‘Render by annotation property (e.g., rdfs:label, skos:prefLabel’ radio button and use the Configure . . . button to instruct what property to use for this label. If your Configure . . . popup screen does not show the http://www.w3c.org/2004/02/skos/core#prefLabel option in the top position, either move it to be the top option or choose the Add Annotation button at the upper left of the dialog; you will find the skos:prefLabel under the rdfs:label entry. Note, via these steps you can configure Protégé to display a variety of label choices.

Protégé Change Rendering

Figure 2: Change Protégé Rendering

With this change made, we then expand out the Class hierarchy tree to the same point. Only now, we see labels prefixed with the universal category numbers, which also acts to re-order the entries, as Figure 3 shows:

KKO Demo #2

Figure 3: KKO Demo View – Preferred Label

With this labeling now operating, we can navigate to the OWLViz tab, as we begin to show in Figure 4. (If OWLViz is not active in your installation, please refer to the OWLViz plug-in page and follow the installation instructions to activate that plug-in and get it set-up properly.) Since we want to see the layout structure of the Generals node where the KBpedia typologies reside, we first navigate to that node in the Class hierarchy tree (1). By picking the rightmost button in the display pane header (2) we can configure the depth to be shown in this display (3). We pick ‘5’ levels to track, resulting in the display below:

Protégé Configure Owlviz

Figure 4: Configure Protégé Owlviz

With this level of expansion, there are too many items to see within the available pane. We need to scroll to see the full extent of the structure. However, if we want a more comprehensive view, we can also export the entire image to file.

We do so, as Figure 5 indicates, by picking the next to rightmost pane header button (1) and then picking to display only the asserted items (2). When we pick Next we are given a dialog that enables us to set the image format type and to possibly scale the image. We accept the defaults, and then proceed to save our image with a name we prefer to our desired disk location.

Protégé Export Owlviz Image

Figure 5: OWLViz Image Export

This now produces for us a full rendering of the KKO ‘Generals’ structure, as shown in Figure 6:

KKO Structure

Figure 6: KKO – The Upper KBpedia ‘Generals’ Structure

Note we would have gotten the entire KKO structure if we had chosen the owl:Thing node as our starting location for the OWLViz graph rendering as opposed to ‘Generals’.

It is worth your time to study this KKO structure closely. The ‘Generals’ branch, in particular, is where all of KBpedia’s typologies reside (and therefore the great bulk of the RCs in KBpedia). Most every node in KKO under the Generals branch is itself the root node for a corresponding typology.

Though there are nearly 70 typologies in the KBpedia system, about 30 of them host the largest number of RCs and also have disjoint (non-overlapping) assertions between them. Here are the 30 or so core typologies organized in the KKO graph, with some upper typologies that cluster them:

Constituents	Natural Phenomena	This typology includes natural phenomena and natural processes such as weather, weathering, erosion, fires, lightning, earthquakes, tectonics, etc. Clouds and weather processes are specifically included. Also includes climate cycles, general natural events (such as hurricanes) that are not specifically named. Biochemical processes and pathways are specifically excluded, occurring under its own typology.
	Area or Region	The AreaRegion typology includes all nameable or definable areas or regions that may be found within “space”. Though the distinction is not sharp, this typology is meant to be distinct from specific points of interest (POIs) that may be mapped (often displayed as a thumbtack). Areas or regions are best displayed on a map as a polygon (area) or path (polyline).
	Location or Place	The LocationPlace typology is for bounded and defined points in “space”, which can be positiioned via some form of coordinate system and can often be shown as points of interest (POIs) on a map. This typology is distinguished by areas or locations, which are often best displayed as polygons or polylines on a map.
	Shapes	The Shapes typology captures all 1D, 2D and 3D shapes, regular or irregular. Most shapes are geometrically describable things. Shapes has only a minor disjointedness role, with more than half of KKO reference concepts having some aspect of a Shapes specification.
	Forms	This typology category includes all aspects of the shapes that objects take in space; Forms is thus closely related to Shapes. The Forms typology is also the collection of natural cartographic features that occur on the surface of the Earth or other planetary bodies, as well as the form shapes that naturally occurring matter may assume. Positive examples include Mountain, Ocean, and Mesa. Artificial features such as canals are excluded. Most instances of these natural features have a fixed location in space.
Time-related	Activities	These are ongoing activities that result (mostly) from human effort, often conducted by organizations to assist other organizations or individuals (in which case they are known as services, such as medicine, law, printing, consulting or teaching) or individual or group efforts for leisure, fun, sports, games or personal interests (activities). Generic, broad grouping of actions that apply to generic objects are also included in this typology.
	Events	These are nameable occasions, games, sports events, conferences, natural phenomena, natural disasters, wars, incidents, anniversaries, holidays, or notable moments or periods in time. Events have a finite duration, with a beginning and end. Individual events (such as wars, disasters, newsworthy occasions) may also have their own names.
	Times	This typology is for specific time or date or period (such as eras, or days, weeks, months type intervals) references in various formats.
	Situations	Situations are the contexts in which activities and events occur; situations are temporal in nature in that they are a confluence of many factors, some of which are temporal.
Natural Matter	Atoms and Elements	The Atoms and Elements typology contains all known chemical elements and the constituents of atoms.
	Natural Substances	The Natural Substances typology are minerals, compounds, chemicals, or physical objects that are not living matter, not the outcome of purposeful human effort, but are found naturally occurring. Other natural objects (such as rock, fossil, etc.) are also found under this typology. Chemicals can be Natural Substances, but only if they are naturally occurring, such as limestone or salt.
	Chemistry	This typology covers chemical bonds, chemical composition groupings, and the like. It is formed by what is not a natural substance or living thing (organic) substance. Organic Chemistry and Biological Processes are, by definition, separate typologies. This Chemistry typology thus includes inorganic chemistry, physical chemistry, analytical chemistry, materials chemistry, nuclear chemistry, and theoretical chemistry.
Organic Matter	Organic Chemistry	The Organic Chemistry typology is for all chemistry involving carbon, including the biochemistry of living organisms and the materials chemistry (including polymers) of organic compounds such as fossil fuels.
	Biochemical Processes	The Biochemical Processes typology is for all sequences of reactions and chemical pathways associated with living things.
Living Things	Prokaryotes	The Prokaryotes include all prokaryotic organisms, including the Monera, Archaebacteria, Bacteria, and Blue-green algas. Also included in this typology are viruses and prions.
	Protists & Fungus	This is the remaining cluster of eukaryotic organisms, specifically including the fungus and the protista (protozoans and slime molds).
	Plants	This typology includes all plant types and flora, including flowering plants, algae, non-flowering plants, gymnosperms, cycads, and plant parts and body types. Note that all plant parts are also included.
	Animals	This large typology includes all animal types, including specific animal types and vertebrates, invertebrates, insects, crustaceans, fish, reptiles, amphibia, birds, mammals, and animal body parts. Animal parts are specifically included. Also, groupings of such animals are included. Humans, as an animal, are included (versus as an individual Person). Diseases are specifically excluded. Animals have many of the similar overlaps to Plants. However, in addition, there are more terms for animal groups, animal parts, animal secretions, etc. Also Animals can include some human traits (posture, dead animal, etc.)
	Diseases	Diseases are atypical or unusual or unhealthy conditions for living things, generally known as conditions, disorders, infections, diseases or syndromes. Diseases only affect living things and sometimes are caused by living things. This typology also includes impairments, disease vectors, wounds and injuries, and poisoning.
Agents	Persons	The appropriate typology for all named, individual human beings. This typology also includes the assignment of formal, honorific or cultural titles given to specific human individuals. It further includes names given to humans who conduct specific jobs or activities (the latter case is known as an avocation). Examples include steelworker, waitress, lawyer, plumber, artisan. Ethnic groups are specifically included. Persons as living animals are included under the Animals typology.
	Organizations	Organizations is a broad typology and includes formal collections of humans, sometimes by legal means, charter, agreement or some mode of formal understanding. Examples these organizations include geopolitical entities such as nations, municipalities or countries; or companies, institutes, governments, universities, militaries, political parties, game groups, international organizations, trade associations, etc. All institutions, for example, are organizations. Also included are informal collections of humans. Informal or less defined groupings of humans may result from ethnicity or tribes or nationality or from shared interests (such as social networks or mailing lists) or expertise (“communities of practice”). This dimension also includes the notion of identifiable human groups with set members at any given point in time. Examples include music groups, cast members of a play, directors on a corporate Board, TV show members, gangs, teams, mobs, juries, generations, minorities, etc.
	Geopolitical	Named places that have some informal or formal political (authorized) component. Important subcollections include Country, IndependentCountry, State_Geopolitical, City, and Province.
Artifacts	Products	The Products typology includes any instance offered for sale or barter or performed as a commercial service. A Product is often a physical object made by humans that is not a conceptual work or a facility (which have their own typologies), such as vehicles, cars, trains, aircraft, spaceships, ships, foods, beverages, clothes, drugs, weapons. Besides general hierarchies related to Devices or Goods, this SuperType has three main splits into the classifications of a three-sector economy: PrimarySectorProducts, SecondarySectorProducts, and TertiarySectorServices. This is where most of the UNSPSC products and services codes are mapped.
	Food or Drink	This typology is any edible substance grown, made or harvested by humans. The category also specifically includes the concept of cuisines.
	Drugs	This typology is a drug, medication or addictive substance, or a toxin or a poison.
	Facilities	Facilities are physical places or buildings constructed by humans, such as schools, public institutions, markets, museums, amusement parks, worship places, stations, airports, ports, carstops, lines, railroads, roads, waterways, tunnels, bridges, parks, sport facilities, monuments. All can be geospatially located. Facilities also include animal pens and enclosures and general human “activity” areas (golf course, archeology sites, etc.). Importantly Facilities include infrastructure systems such as roadways and physical networks. Facilities also include the component parts that go into making them (such as foundations, doors, windows, roofs, etc.). Facilities can also include natural structures that have been converted or used for human activities, such as occupied caves or agricultural facilities. Finally, facilities also include workplaces. Workplaces are areas of human activities, ranging from single person workstations to large aggregations of people (but which are not formal political entities).
Information	Audio Info	This typology is for any audio-only human work. Examples include live music performances, record albums, or radio shows or individual radio broadcasts
	Visual Info	The Visual Info typology is for any still image or picture or streaming video human work, with or without audio. Examples include graphics, pictures, movies, TV shows, individual shows from a TV show, etc.
	Written Info	This typology includes any general material written by humans including books, blogs, articles, manuscripts, but any written information conveyed via text.
	Structured Info	This information typology is for all kinds of structured information and datasets, including computer programs, databases, files, Web pages and structured data that can be presented in tabular form.
Social	Finance & Economy	This typology pertains to all things financial and with respect to the economy, including chartable company performance, stock index entities, money, local currencies, taxes, incomes, accounts and accounting, mortgages and property.
	Society	This category includes concepts related to political systems, laws, rules or cultural mores governing societal or community behavior, or doctrinal, faith or religious bases or entities (such as gods, angels, totems) governing spiritual human matters. Culture, Issues, beliefs and various activisms (most -isms) are included.

Table 1: 30 ‘Core’ KBpedia Typologies

Once you have gained a feel for the upper KKO structure, it is useful to see how that organizes the entire content across the full KBpedia knowledge graph. So, once you are done inspecting KKO, go back to the File → Open recent . . . dialog, pick the ‘ . . . \target\kbpedia_reference_concepts.n3‘ full ontology file (the one we earlier inspected in CWPK #6) and answer Yes to the ‘Do you want to open the ontology in the current window?’ dialog. Also agree to let the shared items remain in the workspace.

With the familiarity gained from our inspection of the KKO, it becomes a bit easier to see how the detailed RCs fit within this overall KKO upper structure. Note, however, that we no longer have the universal category prefixes that our non-working kko-demo.n3 view gave us. Again, spend some time continuing to get familiar with the scope of reference concepts in KBpedia including use of the search function as explained in CWPK #6.

Another step we can take is to review the individual typologies in isolation. If we return to the File → Open dialog we can navigate to the ‘typologies’ directory within our chosen KBpedia file structure. Let’s scroll through that list and then select one, say, Facilities-typology.n3. We will again in the dialog accept to open the ontology in the current window by choosing Yes. The facilities typology will then load, presenting to us the opening Protégé screen with most of the metadata blank. Then, select the Classes tab and begin expanding the Classes hierarchy tree as shown in Figure 7:

Initial Class View

Figure 7: KBpedia Facilities Typology

Remember from our brief build overview in CWPK #2 that one of the last steps in the build process was to create these typology files. Thus, while the expandable Class hierarchy pane (1) shows the expected tree structure, the annotations pane (2) is empty and the descriptions pane (3) only shows the direct subClassOf linkages. That is because these typology ontologies are an extraction from the full knowledge graph and are not meant to be usable on their own.

In this manner you can inspect any and all of the KBpedia typologies that may be of interest to you in isolation. Though many RCs have more than one typology assignment, this isolated view removes that complexity. These isolated typology views are particularly helpful when adding a new typology to the system or when trying to understand the scope of a given typology.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.

Endnotes:

[1] See Bergman, Michael K. 2016. “A Foundational Mindset: Firstness, Secondness, Thirdness.” AI3:::Adaptive Information. https://www.mkbergman.com/1932/a-foundational-mindset-firstness-secondness-thirdness/ (September 18, 2017).

Posted:August 4, 2020

CWPK #7: Getting Familiar with KBpedia Files

Setting Up Your Baseline KBpedia File Structure

Cooking with KBpedia

Now that we have begun exploring KBpedia in this series on Cooking with Python and KBpedia, it is time for us to set up the entire system locally for beginning our work with it. To do so, we will download the entire suite of available KBpedia files and install them on your local file system. To make this task easier, we have provided three files on the KBpedia GitHub code site, https://github.com/Cognonto/kbpedia/current-zip/kbpedia-250.zip, https://github.com/Cognonto/kbpedia/current-zip/kbpedia-250-target-1.zip, and https://github.com/Cognonto/kbpedia/current-zip/kbpedia-250-target-2.zip. Download these files (about 50 MB in total), place the first into a new directory you have established (say, kbpedia-test or the like), and unzip it. Then, for the next two files, place them into the new ‘target’ subdirectory that has been created from the first unzip and then extract them (unzip) to that subdirectory.

When you visit GitHub, you will see that each KBpedia version exists in its own versions folder. Each sub-folder is a version folder such as 1.60 or 2.50. Each version folder duplicates what is discussed in this article, though earlier versions may have a slightly different structure. Still, you can access and re-create KBpedia in its earlier instantiations if you wish. For a new project intended for more than test use, we would only use the current version. Each release has been an improvement over its predecessor.

Note: Alternatively, you may manage KBpedia files and updates through the standard Git ‘pull’ process. This ‘pull’ approach is the best way to stay current with KBpedia’s development, but is extra work for newbies simply interested in learning. See these GitHub resources if you wish to employ the ‘pull’ method. We do not discuss that method further here.

When you extract the zip file locally, here is the directory structure you will see, which I explain after the listing:

      C:.
      |      
      +---fixes
      |       reference-concepts-add-alt-labels.csv
      |       reference-concepts-add-definition.csv
      |       reference-concepts-add-sub-class-of.csv
      |       reference-concepts-fixes.csv
      |       reference-concepts-remove-sub-class-of.csv
      |      
      +---indexes
      |       super_types.csv
      |      
      +---logs
      |   |  
      |   +---metrics
      |   |      
      |   \---unsatisfiables
      |          
      +---mappings
      |   |  
      |   +---core
      |   |       dbpedia-ontology.csv
      |   |       dbpedia.csv
      |   |       geonames.csv
      |   |       same-as.csv
      |   |       schema.org.csv
      |   |       wikidata.csv
      |   |       wikipedia.csv
      |   |      
      |   +---general
      |   |       bibo.csv
      |   |       cc.csv
      |   |       dc.csv
      |   |       doap.csv
      |   |       event.csv
      |   |       foaf.csv
      |   |       frbr.csv
      |   |       geo.csv
      |   |       mo.csv
      |   |       oo.csv
      |   |       org.csv
      |   |       po.csv
      |   |       rss.csv
      |   |       sioc.csv
      |   |       time.csv
      |   |       transit.csv
      |   |      
      |   +---property
      |   |       dbpedia-ontology.csv
      |   |       geonames.csv
      |   |       opencyc.csv
      |   |       schema.org.csv
      |   |       unspsc.csv
      |   |       wikidata.csv
      |   |       wikipedia.csv
      |   |      
      |   \---special
      |            wikipedia-categories.csv
      |          
      +---new-concepts
      |       new-concepts.csv
      |      
      +---owl
      |       kbpedia_reference_concepts.n3
      |       kko-demo.n3
      |       kko.n3
      |       skos-owl1-dl.owl
      |      
      +---properties
      |       schema.csv
      |       wikidata.csv
      |      
      +---target
      |       bibo.n3
      |       cc.n3
      |       dbpedia-ontology.n3
      |       dc.n3
      |       doap.n3
      |       event.n3
      |       foaf.n3
      |       frbr.n3
      |       geo.n3
      |       geonames.n3
      |       kbpedia_reference_concepts.n3
      |       kbpedia_reference_concepts_linkage.n3
      |       kbpedia_reference_concepts_linkage_inferrence_extended.n3
      |       mo.n3
      |       oo.n3
      |       opencyc.n3
      |       org.n3
      |       po.n3
      |       rss.n3
      |       same-as.n3
      |       schema.org.n3
      |       sioc.n3
      |       time.n3
      |       transit.n3
      |       wikidata.n3
      |       wikipedia.n3
      |      
      \---typologies
          |   ActionTypes-typology.n3
          |   AdjunctualAttributes-typology.n3
          |   Agents-typology.n3
          |   Animals-typology.n3
          |   AreaRegion-typology.n3
          |   Artifacts-typology.n3
          |   Associatives-typology.n3
          |   AtomsElements-typology.n3
          |   AttributeTypes-typology.n3
          |   AudioInfo-typology.n3
          |   AVInfo-typology.n3
          |   BiologicalProcesses-typology.n3
          |   Chemistry-typology.n3
          |   Concepts-typology.n3
          |   ConceptualSystems-typology.n3
          |   Constituents-typology.n3
          |   ContextualAttributes-typology.n3
          |   CopulativeRelations-typology.n3
          |   Denotatives-typology.n3
          |   DirectRelations-typology.n3
          |   Diseases-typology.n3
          |   Drugs-typology.n3
          |   EconomicSystems-typology.n3
          |   EmergentKnowledge-typology.n3
          |   Eukaryotes-typology.n3
          |   EventTypes-typology.n3
          |   Facilities-typology.n3
          |   FoodDrink-typology.n3
          |   Forms-typology.n3
          |   Generals-typology.n3
          |   Geopolitical-typology.n3
          |   Indexes-typology.n3
          |   Information-typology.n3
          |   InquiryMethods-typology.n3
          |   IntrinsicAttributes-typology.n3
          |   KnowledgeDomains-typology.n3
          |   LearningProcesses-typology.n3
          |   LivingThings-typology.n3
          |   LocationPlace-typology.n3
          |   Manifestations-typology.n3
          |   MediativeRelations-typology.n3
          |   Methodeutic-typology.n3
          |   NaturalMatter-typology.n3
          |   NaturalPhenomena-typology.n3
          |   NaturalSubstances-typology.n3
          |   OrganicChemistry-typology.n3
          |   OrganicMatter-typology.n3
          |   Organizations-typology.n3
          |   Persons-typology.n3
          |   Places-typology.n3
          |   Plants-typology.n3
          |   Predications-typology.n3
          |   PrimarySectorProduct-typology.n3
          |   Products-typology.n3
          |   Prokaryotes-typology.n3
          |   ProtistsFungus-typology.n3
          |   RelationTypes-typology.n3
          |   RepresentationTypes-typology.n3
          |   SecondarySectorProduct-typology.n3
          |   Shapes-typology.n3
          |   SituationTypes-typology.n3
          |   SocialSystems-typology.n3
          |   Society-typology.n3
          |   SpaceTypes-typology.n3
          |   StructuredInfo-typology.n3
          |   Symbolic-typology.n3
          |   Systems-typology.n3
          |   TertiarySectorService-typology.n3
          |   Times-typology.n3
          |   TimeTypes-typology.n3
          |   TopicsCategories-typology.n3
          |   VisualInfo-typology.n3
          |   WrittenInfo-typology.n3

The ‘target’ directory is perhaps the most important of this listing, since this is where the built files reside after the build process (which we take up beginning with CWPK #37). This directory contains the output mapping files to external sources, based on the input specifications found in the ‘mappings\core’, ‘mappings\general’ and ‘mapping\special’ directories. In other words, input specifications get tested and then ingested into the build process from the ‘mappings’ directory, which, when successful, outputs those mappings to the ‘target’ directory in KBpedia’s canonical N3 format.

The output ‘target’ directory also includes these three pivotal knowledge graph files:

file name	description
`kbpedia_reference_concepts.zip`	This is the code KBpedia reference concepts structure with all the 58k concepts
`kbpedia_reference_concepts_linkage.zip`	This is the same structure as above where we added all the linkages to other ontologies
`kbpedia_reference_concepts_linkage_ inferrence_extended.zip`	This is the same structure that includes the linkages, but we add all inferred relationships between the concepts and their links to other ontologies

In the GitHub listing, we provide the other ‘target’ outputs under the sub-folder called ‘linkages’, which has one file per linked ontology.

Other input specifications are provided through the ‘indexes’, ‘new-concepts’, ‘owl’, ‘properties’ and ‘fixes’ directories. The ‘indexes’ directory contains the direct assignments to KBpedia’s 70 or so typologies, or SuperTypes. We discuss these in a bit under the output ‘typologies’ directory. The ‘new-concepts’ directory is where the major specifications for the KBpedia resource concepts (RCs) are located. The new-concepts.csv file is the single most important input file in the system, since this is where we initially specify all (most) RCs found in KBpedia. Complete entries in this listing require multiple input fields, as we will detail in a later installment. For now, just recognize this file as one of the most central to the system.

The ‘fixes’ directory contains a number of input files, processed as some of the last in the build steps, which add or overwrite specification information provided in earlier input files. These updates should be migrated into the initial input files over time, but are provided here as a separate directory as a convenient and more-easily managed location for making or testing input updates during active builds. It is best to consider this directory as a temporary one, useful while testing and evaluating new builds. However, updates contained in this directory can remain there indefinitely and will be the last processed during a build.

The ‘properties’ directory is for inputs to the property listings in KBpedia. It is the subject for a later article that we can skip over for the time being.

The remaining input directory is ‘owl’. Two important input files are found here. The first, kko.n3, is the fully specified upper ontology to KBpedia. It is fairly static, often not changing at all from build to build. It is a fully specified ontology file with complete metadata, and an integral central scaffolding to the KBpedia build process. No where else is KKO specified. The other important input file in this ‘owl’ directory is the kbpedia_reference_concepts.n3 file, which is basically the stub header with metadata used in the build process that gets populated with all of the RCs and their specifications. The output from this build with the header is the full KBpedia ontology in kbpedia_reference_concepts.n3 in the ‘target’ output directory. The kko-demo.n3 file in the ‘owl’ directory is a non-working labeled version of KKO that relates the upper concepts to Peirce’s universal categories. The skos-owl1-dl.owl file is an unusual version of the SKOS ontology used in the build process that is normally accessed offsite during the build process, but is provided here in case remote connections are lost.

Besides the ‘target’ directory, there are two other output directories in this listing. The first is the ‘logs’ directory. Here is where we direct error messages and stats that may arise from the build process. The directory is normally blanked out after a new version build is successful and completed. But, for your local tests, the ‘logs’ directory will be a key one as we work through build steps and issues in later installments. This directory is where we will find the diagnostic information to debug an unsuccessful build.

The remaining output directory of ‘typologies’ is a special one. As separate steps at the completion of a successful build, we run some additional routines that extract out each of the individual main branches, or typologies, and report them separately as individual ontologies. We also do many placement tests on these typologies for how complete or fragmented they may be. Sometimes considerable effort at the end of a build might be devoted to inspecting each of KBpedia’s SuperTypes and its members to improve the placement and consistent treatment of RCs. In actual use, these typologies are much interconnected and integral to the entire KBpedia structure when seen in the full ontology, kbpedia_reference_concepts.n3. But, within the ‘typologies’ directory we do tease the typologies out separately for easier inspection and refinement, as well as possible use as inputs to external applications.

Most of the 30 or so core typologies in KBpedia do not overlap with one another, what is known as disjoint. Disjointness enables powerful reasoning and subset selection (filtering) to be performed on the KBpedia graph. There are upper typologies useful for further organizing the core ontologies, plus providing homes for shared concepts. Living Things, for example, can capture concepts shared by all plants and animals, by all life, which then enables better segregation of those life forms into separate Plants and Animals branches. These natural segregations are applied across the KKO structure.

Of course, you can learn more about this structure by using the online KBpedia Knowledge Graph explorer. Possible matching concepts are presented as you type. Once you enter the knowledge graph, you can explore and navigate in many different ways. See the KBpedia site for more instructions.

Lastly, please note that we have highlighted three files from this directory structure in red. We will explore these files further with Protégé in our next installment.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.

Posted:August 3, 2020

CWPK #6: Initial KBpedia Inspection

We’ll Begin with Two Central Files

Cooking with KBpedia

We’ll get more formal in a few installments from now in this Cooking with Python and KBpedia series for how to handle the entire KBpedia file structure. For now, however, let’s begin by getting familiar with the two central files in the package.[1] You should have already installed the Protégé desktop editor from the previous article.

First, go to the KBpedia GitHub repository and download the two files of the kko.n3 and kbpedia_reference_concepts.zip. In the case of kko.n3, which is the small upper ontology for KBpedia, you will copy-and-paste the code to a local file and name it the same. In the case of kbpedia_reference_concepts.zip, which contains the main substance of KBpedia, you should download the file and then unzip it in a directory you can find on your local machine. The unzipped file is called kbpedia_reference_concepts.n3. For simplicity, put this and kko.n3 file into the same directory. (We will later get a little more complicated in our file structure layout as we begin editing the files in earnest.)

Next, to start up Protégé, invoke the executable in your Protégé directory. It will take a few seconds for the program to load. Once the main screen appears, go to File and then Open, and then navigate to the directory to where you stored kbpedia_reference_concepts.n3. Pick that file and click the Open button.

The first time you load KBpedia you are likely to get the following error message:

Possible Loading Error

Figure 1: Possible Error Message Upon Loading

Follow the instructions on the screen to find the second needed file, kko.n3, which I just suggested you store in the same directory. (Once you save your current session, the next time you start up this error will not appear.) Also, next you work with the system, you can open KBpedia by using the File → Open Recent option. Lastly, you may encounter some performance or display issues; see the previous installment on Protégé.

Let’s first move to the Classes tab screen, the most important to understanding the hierarchy and structure of KBpedia. Note when we change tabs that the border colors also change. Each tab in Protégé is demarked with its own color. The actual class structure is shown in the left-hand pane (1) in Figure 2. The tree structure may be expanded or collapsed by clicking on the triangles shown for a given item (items without the triangle are terminal nodes). The direction the triangle points indicates the expand or collapse mode. Depending on your Protégé settings, the default opening for this tree may be expanded (by levels) or collapsed. What we are showing in Figure 2 is the highest structure of KBpedia, which can also be separately inspected with the kko.n3 file alone. Because KBpedia is an organized, computable structure of types (classes), the majority of the items in KBpedia may be found under the SuperTypes branch (1). This is where you will spend most of your time inspecting the existing 58 K reference concepts (RCs).

Another thing to note is the multi-paned structure of the layout (2), which I noted before. These panes are configurable, and may be moved and resized at any location across the tab. Figure 2 is close to the default Protégé settings.

Initial Class View

Figure 2: Initial View from the Class Tab

Search (3) is one of the most important functions in the system, since it is the primary way to find specific RCs when there are thousands. Search is also useful for all other information in the system. Given this importance, let’s take another short detour to the search screen. Click search.

That brings up the search screen, as shown in the next Figure 3. There is some interesting functionality here, worth calling out individually. Let’s begin a search for ‘mammal’:

Class View Using 'Mammal'

Figure 3: Class View After Doing A ‘Mammal’ Search

As we enter the search term, only ‘mamma’ so far in the case shown, there is a lookahead (auto-complete) function to match the entered text (1), beginning with three characters. It is also important to note there are some pretty powerful search options (2); I often use the Show all results choice, though sometimes lists can grow to be huge! (Using few search characters for common letter combinations, for example).

The search screen organizes its results into multiple categories (3) (scroll down), including descriptions and annotations. The most important matches, namely to preferred labels and IRIs, appear at the top of the listing. It is also possible to highlight results on these lists and create copies (4) for posting to the clipboard. I use this functionality frequently.

Once we have selected ‘Mammal’ from the search results list, the search screen remains open (useful for testing many putative matches), and the tree in the Class view updates and more RC results are automatically displayed, as Figure 4 shows (in this case, I have closed the search screen so as to not obscure the main screen):

Class View of 'Mammal'

Figure 4: Class View of ‘Mammal’

We now see a much-expanded tree in the left Class hierarchy pane (1). We can again click the triangles to collapse or expand that portion of the tree.

For the selected item in the tree, again ‘Mammal’ in this case, we can see its annotations and linkage relationships (2), including labels, descriptions, notes and links. The Descriptions pane (3) shows us the formal relationships and other assertions for this RC in the knowledge graph. (Since we are not working with all KBpedia files, this portion may not be as complete as when all files are included.)

Thie general process can be repeated over and over to gain an understanding. You can navigate the tree via scrolling and expanding and collapsing nodes, or searching for terms or stems as you encounter then. Of course, both navigation and searching are done concurrently during discovery mode. It is this process, in my view, that best leads to first twitch for KBpedia by better understanding the structure, scope and relationships for the graph’s 58 K reference concepts.

These same conventions and approaches may also be used for understanding the properties (relations) in KBpedia, as I show in Figure 5. First, note (1) we have split our properties into three groups: object properties, data properties, and annotation properties:

Initial Object Property View

Figure 5: Initial View from the Object Property Tab

These are the standard splits in the OWL language. In essence, object properties are those that connect to an item (with a URI or IRI) already in the system; data properties are literal strings and descriptions connected to the subject item; and annotation properties are those that describe or point to the item. We’ll just use an object property example here, though the use and navigation applies to the other two property categories as well.

The Object properties tab in Figure 5 also has a search function (2), exactly similar to what was described for classes. We also see a tree structure at the left that works the same as for classes (3). However, besides the relations splits due to Peirce, there are two other major property differences for KBpedia compared to most knowledge graphs or ontologies. The first difference is the sheer number of properties, more than 5 K in the case of KBpedia. The second is the logical organization of those properties, beginning with the three splits due to Peirce, but extending down to an emerging, logical hierarchy of property types.

To see some of this, let’s do a search for the property ‘color’ [(2) in Figure 5]. The result, again working similar to what we saw for classes, I show in Figure 6:

Object Property View Using 'Color'

Figure 6: Object Property View for ‘color’

Like before, we now see an expanded tree highlighting the ‘color’ property (1), again accompanied by metadata and other structural aspects of the Object properties (2).

As before, you can use a combination of scrolling, tree expansions and searching to discover the properties in KBpedia. Do make sure and check out the Data properties and Annotation properties tabs as well.

I encourage you to spend some time navigating the classes and properties tabs and searching for things of interest across the structure. Look horizontally across the many higher-level categories under the Generals main branch. Find some areas of interest and continue to expand the tree to dive deeper into those categories. Spend time using the search function and restrict searches by turning the various search options (found at the top of the Search window; see Figure 3). You can also highlight portions of results in the search pane and copy them to the clipboard for pasting into other applications.

There are many views, tabs, and plugins available to Protégé, importantly including reasoners and other extended capabilities such as visualization, documentation, querying, or exporting. We will find occasion to instruct in the use of some of these throughout the CWPK series.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.

Endnotes:

[1] Parts of this article were posted in a previous blog post, Bergman, Michael K. 2019. “First Twitch with KBpedia.” AI3:::Adaptive Information. https://www.mkbergman.com/2202/first-twitch-with-kbpedia/ (April 1, 2020).

Posted:July 31, 2020

CWPK #5: Overview and Installation of Protégé

This Standard Ontology Editor/IDE is an Essential Part of Your Toolkit

Cooking with KBpedia

Though there are commercial alternatives, one essential part of your starting toolkit to work with ontologies (a term we use interchangeably with knowledge graph, though not all researchers do) is the Protégé editor. Protégé is an open-source ontology development framework (IDE) with more than 370,000 users. Protégé comes in two versions: one for the desktop, now in version 5.x, and one that is Web-based. We will be working with the desktop version for the Cooking with Python and KBpedia series.[1][2]

If you already have Protégé installed and are pretty comfortable with it, you may skip this installment. Otherwise, let’s spend about 15-30 min of effort so that you can set up your own local environment to work with KBpedia.

You first need to download and install Protégé. Go to the Protégé download page and follow the instructions for your particular operating system. You should fill out the new user registration (though you can claim you are already registered and still download it directly). The version I installed for this example is version 5.50 (though any of the version 5.2 forward should be fine as well.) The Protégé distribution comes as a zip file, so you should unzip it into a directory of your choice. To complete the set-up you will also need the most recent version of Java installed on your machine; it you do not have it, here are installation instructions.

Next, to start up Protégé, invoke the executable in your Protégé directory. It will take a few seconds for the program to load. Once the main screen appears, go to File and then Open from URL, and then pick, say, http://protege.stanford.edu/ontologies/camera.owl, as shown by (1):

Protégé Open URL Screen

Figure 1: Protégé Open URL Screen

We’ll get into KBpedia in earnest in the next installment, but if you want an early peek, you could also enter either https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/kko.n3 (KBpedia upper ontology) or https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/kbpedia_reference_concepts.zip (the full KBpedia, which you will need to unzip in a Web-accessible location and update this URL) into the dialog box in Figure 1. (Note: you may need to update the version reference to a later version depending on when you read this.) You will note that the next screen shots use the ‘full’ KBpedia example.

Upon entry, you will see the Protégé main screen as shown in Figure 2. Let me briefly cover some of the main conventions of the program. The three key structural aspects of the Protégé program are its main menu, its tab structure, and the views (or panes) shown for each tab where it appears on the standard interface (5). At start-up we always begin at the Active ontology tab, for which I highlight some of its key panes and functionality:

Main Protégé Screen

Figure 2: Main Protégé Screen

The ontology header section (1) is where all of the metadata for the knowledge graph resides. Such material includes title, creators, version notes and so forth. The metrics for the ontology resides in the second view (2). In this case, for example, this version of KBpedia has about 58,000 classes (reference concepts) and more than 5,000 properties. We also see in the third view (3) that KBpedia requires the SKOS and KKO ontology imports. Also note the search button (4), which we will use frequently, and the tab structure and order (5). We will modify that structure in later installments.

Because Protégé, like many integrated development environments (IDEs), is highly configurable, let’s detour for a short step to see how we can modify how our program looks. I am going to delete and add tabs to make the tab structure conform to the remaining screen shots.

To change tabs in Protégé, let’s refer to Figure 3:

Adding Tab Views

Figure 3: Adding Tab Views to Protégé

We effect the general layout of the system using the Window → Tabs option from the main menu. You delete a tab by clicking on the arrow shown for each tab as presented in the standard interface. You add tabs by selecting one of the options in the Tabs menu (2). Note that active tabs are indicated by the checkmark ( ✓ ). New tabs are added to the right of the tab sequence (3). Thus, to change the ordering of tabs, one must delete and then add tabs in the order desired. You can follow these steps if you want the tab ordering to reflect the screen shots below. This same main menu Window option is where you can change the views (panes) for each tab.

When these class tabs are to your liking, we can apply these same conventions and approaches to the properties (relations) for the knowledge graph, as I show in Figure 4. First, note (1) we have split our properties into three groups: object properties, data properties, and annotation properties:

Initial Object Property View

Figure 4: Initial View from the Object Property Tab

These are the standard splits in the OWL language. How we use these splits and their relation to the guidance of Charles Sanders Peirce is described in later installments. In essence, object properties are those that connect to an item (with a URI or IRI) already in the system; data properties are literal strings and descriptions connected to the subject item; and annotation properties are those that describe or point to the item. We’ll just use an object property example here, though the use and navigation applies to the other two property categories as well.

The Object properties tab in Figure 4 also has a search function (2), exactly similar to what was described for classes. We also see a tree structure at the left that works the same as for classes (3). As before, you can use a combination of scrolling, tree expansions and searching to discover the other properties in your knowledge graph. Do make sure and check out the Data properties and Annotation properties tabs as well.

Throughout this CWPK series we will be using examples from Protégé and comparing them to direct interaction with the code base using Python. These later installments will cover most of the standard use and maintenance cases you will likely encounter with your knowledge graphs.

A Note on Performance and Preferences

You may experience some performance issues with Protégé as it comes out of the box, especially as we begin working with the relative large KBpedia in earnest. One likely cause are the memory settings that you may find in the run.bat file that you can find in the main directory where you installed Protégé. As a quick fix, try updating these settings in that file to these values before the next time you start the application:

-Xmx2500M -Xms2000M

Also note there are many customization options in Protégé. If you get captivated with the tool, I encourage you to explore the plugins available and the ways to modify the application interface. See especially File → Preferences, with the Renderer and Plugin tabs good places to look. Again, we will touch on some of these aspects in later articles.

Some Suggested Protégé Resources

Protégé 5 Introductory Documentation
Protégé 5 Documentation
Pizza Tutorial (Protégé 4) (full listing)
Pizzas in 10 Minutes
Protégé Plug-ins
Protégé mailing list.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.

Endnotes:

[1] Parts of this article were posted in a previous blog post, Bergman, Michael K. 2019. “First Twitch with KBpedia.” AI3:::Adaptive Information. https://www.mkbergman.com/2202/first-twitch-with-kbpedia/ (April 1, 2020).

[2] The Web-based version is great for collaboration, but does not include all of the features of the desktop version and can not handle very large ontologies, such as KBpedia as fully expressed.

Posted:July 30, 2020

CWPK #4: The Baseline Architecture

We’ll Try to be as ‘Pythonic’ as Possible in the Design

Cooking with KBpedia

In past efforts, we have produced self-contained semantic technology platforms — for one, the Open Semantic Framework, since retired — based on similar objectives to what we have set for this CWPK series. However, with Cooking with Python and KBpedia, our audience is the newbie committed to learn more, not the enterprise. It may be that the approaches presented in this series may be adapted for enterprise use, but in order to maximize the training value of this series we prefer to emphasize off-the-shelf ‘glue-together’ components utilizing a fairly easy to learn and common language, Python. Our objective here is not commercial performance and security, but learnability and understandability.

Our design places the knowledge graph at the center, as shown below, surrounded by Python-based applications shown in yellow. The knowledge graph in our instance, KBpedia, is written in the W3C standard Web ontology language of OWL 2. However, what we are outlining here, including the possible extensions of KBpedia into your own domain of interest, can apply to any knowledge graph using World Wide Web Consortium (W3C) open standards. The language, as we implement it, embraces the other W3C standards of the Resource Description Framework (RDF) and its schema extension (RDFS). We also use an implementation of RDFS called SKOS (Simplified Knowledge Organization System), which is useful for providing a language of hierarchies and classification and labels familiar to librarians and information scientists.[1] Note all of these standards are completely independent of Python, or any programming language for that matter. These standards follow description logics and enable logical manipulation and analysis of their knowledge representations (KR).

Historically, many programming languages have been used to manage, store, and manipulate these W3C standard KR languages. For at least the past 15 years, Java has been the dominant programming language for semantic technology applications, most often accounting for more than half of all tools.[2] From an enterprise standpoint, Java-based applications may still be the most defensible choice. But we want our architecture to embrace a single language, Python, that has great connections in some areas, perhaps weak ones in others. Nonetheless, like any language choice, there are trade-offs. Working through those trade-offs for Python is an explicit topic in this CWPK series.

The architecture diagram below reflects these considerations. At the top we have inputs into the Python-based system, based on electronic notebooks, Web templates where user interactions send directives to the system, or direct command line interfaces (CLI). Because they are interactive and can display invoked apps, we will be using the electronic notebook interface for most installments in this series. We include some CLI stuff for quick responses. And, we include Web page examples of how one might drive these Python-based applications based on choices by users in their Web site interactions. This latter input style is very important, since interaction with knowledge graphs should be a distributed activity across normal workflows. Stopping to invoke a separate application space whenever new knowledge is encountered or questioned is unnatural and leads to little or no adoption. If we are to take advantage of these knowledge technologies, we must integrate them into our current work activities.

These possible sources of input would be best served by having a Python interface or API that maps the basic class, instance, property, and value perspectives of the W3C standards into native Python constructs. This will allow us to abstract knowledge graph specifications into natural Python code. We show this unspecified (at this time) ‘OWL API / Mappings’ component in green in the diagram. This pivotal component will receive much attention throughout the ensuing series.

This Python input is geared to access and manage the knowledge graph, shown at the bottom of the diagram. The knowledge graph needs its own storage to be persistent. (We do not spend further time on this component, other than to say that systems should be designed to interface with external storage, not incorporate specific ones. Storage is a commodity component.) Ontologies, or knowledge graphs, already have an excellent open-source integrated design environment (IDE) in the Protégé application, developed by Stanford University.

We can see these major components in the following diagram. The Python components are shown in yellow; the knowledge graph (KBpedia) in gray; and external tools for the knowledge graph in blue. Two split boxes show that both existing, external apps and Python ones are possible for those functions:

CWPK Basic Architecture

Figure 1: CWPK Basic Architecture

The diagram shows that inputs or requests of the knowledge graph may come from specific functional components such as querying (SPARQL), rule-setting (SWRL), or programmatic ones coming from user interfaces or external requests (yellow and orange). Also, in a loosely-coupled manner, we want outputs from our system to be flexible enough to tailor to various file formats or external APIs. This interface point is where using the system to, say, power machine learning or natural language applications, among all external systems, resides. Knowing how to stage and format outputs is a key task of the design.

Protégé plays an integral role in this architecture. It is firstly the common denominator for talking about the system, since this tool is ubiquitous in the semantic technology space. Secondly, most users have only manipulated knowledge graphs through this interface. Our Python-based system must duplicate this functionality, plus show how we can greatly bridge past it. Moreover, there are many ontology or knowledge graph management tasks where Protégé is the go-to choice. Searching, navigating, and visualizing are some of the key strengths of Protégé. The objective is not to replace Protégé, but to complement it. Protégé has an organizational view of knowledge graphs; what we want is a knowledge view of knowledge graphs. We thus use Protégé as a common touchstone as we work through our installments.

Protégé can host reasoners, as can our Python code, which is why that component is shown in dual blue-yellow colors. Another dual component is the build routines. This part of the architecture is deceptively critical, since we need to both: 1) logically test the knowledge graph for coherence and consistency as we add to or build it; and 2) enable round-tripping between build and W3C formats.

Among perhaps others, I see two payoffs to the pursuit of an architecture such as this. One, we can gain a dual programmatic and interactive environment for managing and keeping a knowledge graph current. And, two, we provision an engine for feeding external APIs in areas such as machine learning, natural language understanding, and interoperability.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.

Endnotes:

[1] At certain points in this CWPK series we will offer links to learning resources about these W3C languages. However, we assume you know their basics. The emphasis here is on the programming language Python to interoperate
with these standards.

[2] Sweet Tools.