Shifting Our Focus to How to Use the Knowledge Graph
Today’s installment marks something of a turning point in our Cooking with Python and KBpedia series. Now that our procedures for building, extracting, and managing the knowledge graph itself are in place, we can shift gears to explore how we may use this knowledge artifact. In today’s installment, I introduce a number of categories of tools for doing so, and point to their more formal treatment in ensuing installments.
Some of our tools treat the knowledge graph as an object unto itself, providing statistics, logging, and traversals. Some tools allow us to publish these interactive Notebooks or enable external access to the knowledge graph. These measures are largely independent of the specific content in KBpedia. Some tools aid visualizations, sometimes dynamic. Some of the tools enable us to do advanced analysis or machine learning. We can do much with natural language processing and understanding. We can also find novel representations of our knowledge graph (as a graph, as documents, as relations, or as terms) that we can embed in our learners.
These are the topics of most of our remaining installments in this CWPK series. Consider this installment as an introduction, then, to the remainder of this series. I present these tool clusters in approximate order of treatment.
There is a wealth of counts and distributions of various resources within the KBpedia knowledge graph (or any graph, for that matter). Some of these statistics are automatically provided when using something like the Protégé editor. We will accept these statistics as given and concentrate on other counts and statistics not provided by Protégé that we may calculate directly from the knowledge graph with Python. We address stats in the next CWPK installment.
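As a rough illustration of the kind of counts meant here, the following minimal sketch tallies nodes, predicate usage, and direct subclasses over a small set of triples. The triples, the predicate names, and the `graph_stats` helper are all hypothetical stand-ins for illustration; they are not KBpedia data or part of cowpoke.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); NOT actual KBpedia content.
TRIPLES = [
    ("Mammal", "subClassOf", "Animal"),
    ("Bird", "subClassOf", "Animal"),
    ("Dog", "subClassOf", "Mammal"),
    ("Cat", "subClassOf", "Mammal"),
    ("Dog", "relatedTo", "Cat"),
]


def graph_stats(triples):
    """Return simple counts for a list of (subject, predicate, object) triples."""
    nodes = set()
    predicate_counts = Counter()
    direct_subclasses = Counter()
    for subject, predicate, obj in triples:
        nodes.update((subject, obj))
        predicate_counts[predicate] += 1
        # Count how many direct children each parent class has.
        if predicate == "subClassOf":
            direct_subclasses[obj] += 1
    return {
        "node_count": len(nodes),
        "predicate_counts": dict(predicate_counts),
        "direct_subclasses": dict(direct_subclasses),
    }


stats = graph_stats(TRIPLES)
print(stats)
```

With a real knowledge graph, the triples would come from the loaded ontology rather than a hard-coded list, but the counting logic would look much the same.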
Python comes with a very capable
logging module that is more useful than
There are many wonderful charting packages available in Python, some of which also are designed to work nicely with interactive Notebook pages. We’ll survey these options and present some charting utilities as part of cowpoke in CWPK #55. We will use some of the stats calculated in the prior installment to provide the data for these charts.
Graphing and Graph Extraction
A knowledge graph, duh, has a graph or network structure. Many properties (edges) connect multiple nodes (classes or reference concepts in the case of KBpedia). It is difficult to visualize these structures in their entirety, and sometimes computationally intense to render them. It is also possible to extract local portions of the graph and present a simpler sub-graph representation.
Graph visualization and extraction have many fewer options than charting, and ease-of-use and performance can also be challenges. We will inspect what is available with Python and make some visualization selections in CWPK #56.
Our progress in covering these topics means that we have met sufficient criteria for going live with the Cooking with Python and KBpedia series, and it is time to start the public release of the series. In the process, we will need to set up remote instances of cowpoke and KBpedia, establish an endpoint for querying it, and begin to publish our Notebook series. We want to publish those Notebooks such that they retain their interactivity.
Researching, deciding upon, and then implementing the choices for these tools occupy four installments from CWPK #57 to #60.
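To make the idea of extracting a local portion of the graph concrete, here is a minimal sketch that collects every node within a fixed number of hops of a starting node, then keeps only the edges among those nodes. It uses a breadth-first traversal over a toy edge list; the data and the `extract_subgraph` helper are assumptions for illustration, not the approach the series eventually adopts.

```python
from collections import deque

# Hypothetical toy edges, treated as undirected for simplicity; NOT KBpedia data.
EDGES = [
    ("Animal", "Mammal"),
    ("Animal", "Bird"),
    ("Mammal", "Dog"),
    ("Mammal", "Cat"),
    ("Bird", "Eagle"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def extract_subgraph(edges, start, max_hops):
    """Return (nodes, edges) for the neighborhood within max_hops of start."""
    adjacency = build_adjacency(edges)
    if start not in adjacency:
        return set(), []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    # Keep only the edges whose two endpoints both fall inside the neighborhood.
    sub_edges = [(a, b) for a, b in edges if a in seen and b in seen]
    return seen, sub_edges


nodes, sub_edges = extract_subgraph(EDGES, "Mammal", 1)
print(sorted(nodes))
print(sub_edges)
```

A small sub-graph like this is far cheaper to render than the full structure, which is the main reason to extract before visualizing.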
Natural Language Processing
In our Part VI, which concludes the new coding and substantive portions of this series (CWPK #61 to CWPK #71), we will have much occasion to explore some additional specialty topics in natural language processing and understanding. These examples will tie into some of the most important Python packages available in NLP.
One major use of knowledge graphs is providing the grist to various embedding models. Because of KBpedia’s mappings to full-length articles in Wikipedia, we also have the option of employing document or word-rich embedding models. We will tee up the embedding approach in CWPK #63 and then develop specific models in CWPK #64. We will stage and test a number of different embedding models, ranging from single term or concept ones to ones that leverage virtually all structural aspects of the knowledge graph.
The biggest chunk of installments in the CWPK series involves machine learning in a variety of forms. These examples undertake some aggressive uses of the knowledge graph, and the series then concludes with summaries of operating procedures and steps. We first look at ‘standard’ machine learning, with an emphasis on selecting models, creating training sets, setting parameters, and tuning performance. We then devote four installments (CWPK #67 to CWPK #70) to so-called ‘deep learning’ models. We are then able to assemble and compare results from all classifier learning activities in CWPK #71.
The concluding installments are all narrative in nature; they wrap up the series and provide summaries of use steps and general guidance.