We’ll Begin with Two Central Files
In a few installments, we’ll get more formal in this Cooking with Python and KBpedia series about how to handle the entire KBpedia file structure. For now, however, let’s get familiar with the two central files in the package. You should already have installed the Protégé desktop editor following the previous article.
First, go to the KBpedia GitHub repository and download the two files, kko.n3 and kbpedia_reference_concepts.zip. In the case of kko.n3, which is the small upper ontology for KBpedia, copy and paste the code into a local file with the same name. In the case of kbpedia_reference_concepts.zip, which contains the main substance of KBpedia, download the file and then unzip it into a directory you can find on your local machine. The unzipped file is called kbpedia_reference_concepts.n3. For simplicity, put this file and the kko.n3 file into the same directory. (We will later get a little more complicated in our file structure layout as we begin editing the files in earnest.)
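If you prefer to script this setup step rather than unzip by hand, here is a minimal Python sketch using only the standard library. It assumes the .n3 file sits at the top level of the zip archive and that you have already saved kko.n3 into the same target directory; the function name prepare_kbpedia_dir and the folder names are hypothetical.

```python
import zipfile
from pathlib import Path


def prepare_kbpedia_dir(zip_path: Path, target_dir: Path) -> list[Path]:
    """Unzip the reference-concepts archive into target_dir, then confirm that
    both central files (kko.n3 and kbpedia_reference_concepts.n3) are there.

    Assumption: the archive holds kbpedia_reference_concepts.n3 at its top
    level, and kko.n3 was copied into target_dir separately.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
    needed = [target_dir / "kko.n3", target_dir / "kbpedia_reference_concepts.n3"]
    missing = [p.name for p in needed if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {target_dir}: {', '.join(missing)}")
    return needed
```

For example, prepare_kbpedia_dir(Path("kbpedia_reference_concepts.zip"), Path("kbpedia")) would unzip the archive into a kbpedia folder and raise FileNotFoundError if either central file is absent, which is the same situation that later triggers Protégé's missing-file prompt.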
Next, start up Protégé by invoking the executable in your Protégé directory. The program takes a few seconds to load. Once the main screen appears, go to File and then Open, navigate to the directory where you stored kbpedia_reference_concepts.n3, pick that file, and click the Open button.
The first time you load KBpedia, you are likely to get the following error message:
Follow the instructions on the screen to find the second needed file, kko.n3, which I suggested you store in the same directory. (Once you save your current session, this error will not appear the next time you start up.) Also, the next time you work with the system, you can open KBpedia by using the File → Open Recent option. Lastly, you may encounter some performance or display issues; if so, see the previous installment on Protégé.
Let’s first move to the Classes tab, the most important screen for understanding the hierarchy and structure of KBpedia. Note that when we change tabs, the border colors also change; each tab in Protégé is marked with its own color. The actual class structure is shown in the left-hand pane (1) in Figure 2. The tree may be expanded or collapsed by clicking the triangle shown for a given item (items without a triangle are terminal nodes), and the direction the triangle points shows whether that item is currently expanded or collapsed. Depending on your Protégé settings, the tree may open expanded (by levels) or collapsed. Figure 2 shows the top-level structure of KBpedia, which can also be inspected separately by loading the kko.n3 file alone. Because KBpedia is an organized, computable structure of types (classes), most of its items are found under the SuperTypes branch (1). This is where you will spend most of your time inspecting the existing 58 K reference concepts (RCs).
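To make the expand and collapse behavior concrete, here is a small Python sketch of how a class tree can be built from subclass relationships and rendered with a marker on expandable nodes only, so that terminal nodes show none. The class names and subclass pairs are made up for illustration and are not taken from the KBpedia files; Protégé's own rendering is considerably more involved.

```python
from collections import defaultdict

# Hypothetical (child, parent) subclass pairs for illustration only; the real
# KBpedia hierarchy is read from the .n3 files and is far larger.
SUBCLASS_OF = [
    ("Animal", "Thing"),
    ("Mammal", "Animal"),
    ("Bird", "Animal"),
    ("Dog", "Mammal"),
    ("Cat", "Mammal"),
]


def build_children(pairs):
    """Map each parent class to an alphabetically sorted list of its direct subclasses."""
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)
    for kids in children.values():
        kids.sort()
    return dict(children)


def render_tree(node, children, depth=0, lines=None):
    """Return the hierarchy below `node` as indented text lines.

    Nodes that have subclasses get a "+ " marker (standing in for Protégé's
    triangle); terminal nodes get two spaces instead.
    """
    if lines is None:
        lines = []
    marker = "+ " if children.get(node) else "  "
    lines.append("    " * depth + marker + node)
    for kid in children.get(node, []):
        render_tree(kid, children, depth + 1, lines)
    return lines


def terminal_nodes(pairs):
    """Classes that never appear as a parent, i.e. the leaves of the tree."""
    parents = {parent for _, parent in pairs}
    return sorted({child for child, _ in pairs} - parents)


children = build_children(SUBCLASS_OF)
for line in render_tree("Thing", children):
    print(line)
```

With the sample pairs, Bird, Cat and Dog render without a marker because nothing is declared beneath them, just as items without a triangle are terminal nodes in the Classes tab.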
Another thing to note is the multi-paned layout (2), which I mentioned before. These panes are configurable and may be moved and resized anywhere across the tab. Figure 2 is close to the default Protégé settings.
Search (3) is one of the most important functions in the system, since it is the primary way to find specific RCs when there are thousands. Search is also useful for all other information in the system. Given this importance, let’s take another short detour to the search screen. Click search.
That brings up the search screen, shown in Figure 3. Some of its functionality is interesting enough to call out individually. Let’s begin a search for ‘mammal’:
As we enter the search term (only ‘mamma’ so far in the case shown), a lookahead (auto-complete) function matches the entered text (1), starting once three characters have been typed. Note also the quite powerful search options (2). I often use the Show all results choice, though the result lists can sometimes grow huge (for example, when a few search characters form a common letter combination).
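The lookahead behavior can be approximated in a few lines of Python. This sketch assumes simple case-insensitive prefix matching over a hypothetical list of labels; Protégé's actual matching rules may differ (for instance, it may also match inside words). The min_chars parameter mirrors the three-character threshold, and limit mimics a truncated list versus Show all results.

```python
# Hypothetical labels for illustration; not taken from KBpedia.
LABELS = ["Mammal", "Mammalian Cell", "Mammary Gland", "Mammoth", "Marsupial"]


def lookahead(typed, labels, min_chars=3, limit=None):
    """Suggest labels that start with the typed text (case-insensitive).

    Nothing is suggested until at least `min_chars` characters are entered.
    `limit` caps the number of suggestions; None returns every match.
    """
    prefix = typed.strip().lower()
    if len(prefix) < min_chars:
        return []
    matches = sorted(label for label in labels if label.lower().startswith(prefix))
    return matches if limit is None else matches[:limit]


print(lookahead("ma", LABELS))     # too short, so no suggestions
print(lookahead("mamma", LABELS))  # labels beginning with "mamma"
```

Raising min_chars trades responsiveness for shorter, more useful suggestion lists, which is the same tension you see when a short search string returns a huge result set.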
The search screen organizes its results into multiple categories (3) (scroll down), including descriptions and annotations. The most important matches, namely those to preferred labels and IRIs, appear at the top of the listing. You can also highlight results in these lists and copy them (4) to the clipboard. I use this functionality frequently.
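The idea of grouping hits by category, with preferred labels and IRIs listed first, can be sketched as follows. The records, field names, and example.org IRIs are hypothetical, and the substring matching and fixed category order are assumptions for illustration rather than a description of how Protégé ranks its results.

```python
# Hypothetical search records: (subject, field, text). Identifiers and IRIs are made up.
RECORDS = [
    ("Mammal", "prefLabel", "Mammal"),
    ("Mammal", "iri", "http://example.org/rc/Mammal"),
    ("Dog", "description", "A domesticated mammal of the family Canidae."),
    ("Whale", "altLabel", "marine mammal"),
    ("Oak", "prefLabel", "Oak"),
]

# Assumed display order: preferred labels and IRIs first, then other categories.
FIELD_ORDER = ["prefLabel", "iri", "altLabel", "description"]


def grouped_search(term, records, field_order=FIELD_ORDER):
    """Case-insensitive substring search with hits grouped by field.

    Groups come back in `field_order`; fields not listed there follow in the
    order their first hit is found. Empty groups are omitted.
    """
    needle = term.lower()
    groups = {field: [] for field in field_order}
    for subject, field, text in records:
        if needle in text.lower():
            groups.setdefault(field, []).append(subject)
    return [(field, hits) for field, hits in groups.items() if hits]


for field, hits in grouped_search("mammal", RECORDS):
    print(field, hits)
```

Keeping categories separate, rather than merging everything into one ranked list, lets the strongest kinds of match (labels and IRIs) stay visible at the top even when descriptions produce many weaker hits.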
Once we select ‘Mammal’ from the search results list, the search screen remains open (useful for testing many putative matches), and the tree in the Class view updates to automatically display more RC results, as Figure 4 shows (here I have closed the search screen so it does not obscure the main screen):
We now see a much-expanded tree in the left Class hierarchy pane (1). We can again click the triangles to collapse or expand that portion of the tree.
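Revealing a selected class in the tree amounts to expanding every node on the path from a root down to that class. This Python sketch computes those paths from a hypothetical child-to-parents map and assumes the hierarchy has no cycles. Because a class can have more than one parent, there may be more than one path, so the same class can show up at more than one place in the tree.

```python
# Hypothetical child -> parents map; a class may have several parents.
PARENTS = {
    "Dog": ["Mammal"],
    "Mammal": ["Animal", "Vertebrate"],
    "Vertebrate": ["Animal"],
    "Animal": ["Thing"],
}


def paths_to_root(node, parents):
    """Return every root-to-node path as a list of class names.

    Each path lists the nodes a tree view must expand to reveal `node`.
    Assumes the hierarchy is acyclic; a node with no parents is a root.
    """
    if not parents.get(node):
        return [[node]]
    paths = []
    for parent in parents[node]:
        for path in paths_to_root(parent, parents):
            paths.append(path + [node])
    return paths


for path in paths_to_root("Mammal", PARENTS):
    print(" > ".join(path))
```

For the sample data, paths_to_root("Mammal", PARENTS) returns two paths, one directly through Animal and one through Vertebrate.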
For the selected item in the tree, again ‘Mammal’ in this case, we can see its annotations and linkage relationships (2), including labels, descriptions, notes and links. The Descriptions pane (3) shows us the formal relationships and other assertions for this RC in the knowledge graph. (Since we are not working with all KBpedia files, this portion may not be as complete as when all files are included.)
This general process can be repeated over and over to build understanding. You can navigate the tree by scrolling and by expanding and collapsing nodes, or search for terms or stems as you encounter them. Of course, in discovery mode you will often navigate and search together. In my view, it is this process that best gives you a first feel for KBpedia, by building your understanding of the structure, scope and relationships of the graph’s 58 K reference concepts.
These same conventions and approaches may also be used to understand the properties (relations) in KBpedia, as Figure 5 shows. First, note (1) that our properties are split into three groups: object properties, data properties, and annotation properties:
These are the standard splits in the OWL language. In essence, object properties connect the subject item to another item (with a URI or IRI) already in the system; data properties connect the subject item to literal values such as strings and descriptions; and annotation properties describe or point to the item. We’ll just use an object property example here, though the same use and navigation apply to the other two property categories.
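The three-way split can be illustrated by grouping property IRIs according to their rdf:type in a set of (subject, predicate, object) triples. The OWL and RDF namespace IRIs below are the standard ones (note that OWL names data properties owl:DatatypeProperty); the example.org property IRIs are hypothetical. In practice, a library such as rdflib could parse the .n3 files into triples for a function like this one.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Standard OWL declaration types mapped to the three property groups.
PROPERTY_KINDS = {
    OWL + "ObjectProperty": "object",
    OWL + "DatatypeProperty": "data",
    OWL + "AnnotationProperty": "annotation",
}

# Hypothetical triples; the example.org IRIs are made up for illustration.
TRIPLES = [
    ("http://example.org/hasColor", RDF_TYPE, OWL + "ObjectProperty"),
    ("http://example.org/birthDate", RDF_TYPE, OWL + "DatatypeProperty"),
    ("http://example.org/editorialNote", RDF_TYPE, OWL + "AnnotationProperty"),
    ("http://example.org/Mammal", RDF_TYPE, OWL + "Class"),  # a class, so ignored
]


def split_properties(triples):
    """Group property IRIs by OWL property kind, based on their rdf:type triples."""
    groups = {"object": [], "data": [], "annotation": []}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in PROPERTY_KINDS:
            groups[PROPERTY_KINDS[obj]].append(subject)
    return groups


for kind, iris in split_properties(TRIPLES).items():
    print(kind, iris)
```

This grouping by declared type is essentially what determines which of the three property tabs an item appears on.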
The Object properties tab in Figure 5 also has a search function (2) that works just like the one described for classes. We also see a tree structure at the left (3) that works the same way as for classes. However, besides the relation splits due to Peirce, there are two other major differences between KBpedia's properties and those of most knowledge graphs or ontologies. The first is the sheer number of properties, more than 5 K in KBpedia's case. The second is the logical organization of those properties, beginning with the three splits due to Peirce but extending down to an emerging, logical hierarchy of property types.
To see some of this, let’s search for the property ‘color’ [(2) in Figure 5]. Figure 6 shows the result, which again works much as it did for classes:
Like before, we now see an expanded tree highlighting the ‘color’ property (1), again accompanied by metadata and other structural aspects of the Object properties (2).
As before, you can combine scrolling, tree expansion and searching to discover the properties in KBpedia. Be sure to check out the Data properties and Annotation properties tabs as well.
I encourage you to spend some time navigating the classes and properties tabs and searching for things of interest across the structure. Look horizontally across the many higher-level categories under the Generals main branch. Find some areas of interest and continue to expand the tree to dive deeper into those categories. Spend time using the search function, and restrict searches by toggling the various search options (found at the top of the Search window; see Figure 3). You can also highlight portions of results in the search pane and copy them to the clipboard for pasting into other applications.
There are many views, tabs, and plugins available for Protégé, importantly including reasoners and other extended capabilities such as visualization, documentation, querying, and exporting. We will cover the use of some of these throughout the CWPK series.