Posted:August 28, 2020

CWPK #25: Querying KBpedia with SPARQL

Now, We Open Up the Power

In our recent installments we have been looking at how to search — ultimately, of course, related to how to extract — information from our knowledge graph, KBpedia, and the various large-scale knowledge bases to which it maps, such as Wikipedia, DBpedia, and Wikidata. We’ve seen that owlready2 offers us some native search capabilities, and that we can extend that by indexing additional attributes. What is powerful about knowledge graphs, however, is that all nodes and all edges are structural from the get-go, and we can easily add meaningful structure to our searches by how we represent the pieces (nodes) and by how we relate, or connect, them using the edges.

Today’s knowledge graphs are explicit in organizing information by structure. The exact scope of this structure varies across representations, and certainly one challenge in getting information to work together from multiple locations and provenances is the diversity of these representations. Those are the questions of semantics, and, fortunately, semantic technologies and parsers give us rich ways to retrieve and relate that structure. So, great, we now have structure galore! What are we going to do with it?

Well, this structured information exists, literally, everywhere. We have huge online structured datestores, trillions of semi-structured Web pages and records, and meaningful information and analysis across a rich pastiche of hierarchies and relationships. What is clear in any attempt to solve a meaningful problem is that we need much external information as well as much grounding in our internal circumstances. Problem solving can not be separated from obtaining and integrating meaningful information.

Thus, it is essential that we be able to query external information stores on an equivalent basis to our local ones. This equivalence requires both internal and external sources be structured and queriable on an equivalent basis, which is where the W3C-enabled standards and SPARQL come in.

The Role of SPARQL

I think one can argue that the purpose of semantic technologies like RDF and OWL is to enable a machine-readable format for human symbolic information. As a result, we now have a rich suite of standards and implementations using those standards.

The real purpose, and advantage, of SPARQL is to make explicit all of the structural aspects of a knowledge graph to inspection and query. Because of this intimate relationship, SPARQL is more often than not the most capable and precise language for extracting information from ontologies or knowledge graphs. SPARQL, pronounced “sparkle”, is a recursive acronym for SPARQL Protocol and RDF Query Language, and has many syntactical and structural parallels with the SQL database query language.

All explicit assignments of a semantic term in RDF or OWL or their semantic derivatives can be used as a query basis in SPARQL. Thus, SPARQL is the sine qua non option for obtaining information from an ontology or knowledge graph. SPARQL is the most flexible and responsive way to manipulate a semantically structured information store.

Let’s inspect the general components of a SPARQL query specification:

SPARQL Query Specification
Figure 1: SPARQL Query Specification

This figure is from Lee Feigenbaum’s SPARQL slides, included with other useful links under the Additional Documentation below.

Note that every SPARQL query gets directed to a specific endpoint, where access to the underlying RDF datastore takes place. These endpoints can be either local or accessed via the Web, with both examples shown below. In a standalone situation, the endpoint location is indicated by the FROM keyword. In our examples using RDFLib via Owlready2, these locations are set to a Python object.

Extended Startup

Let’s start again with the start-up script we used in the last installment, only now also opening rdflib and relating its namespace graph to the world namespace of KBpedia.

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (#) out.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

import rdflib

graph = world.as_rdflib_graph()

We could have put the import statement for the RDFLib package at the top, but anywhere prior to formatting the query is fine.

We now may manipulate the knowledge graph as we would in a standard way using (in this case) the namespace world for owlready2 and access all of the additional functionality available via RDFLib using the (in this case) the graph namespace. This is a great example of the Python ecosystem at work.

Further, because of even greater integration, there are some native commands in Owlready2 that have been mapped to RDFLib making the syntax and conventions in working with both libraries easier.

Basic SPARQL Forms

In the last installment we presented two wrinkles for how to express your SPARQL queries to your local datastore. This form I noted looked closer to a standard SPARQL expression shown in Figure 1:

form_1 = list(graph.query_owlready("""
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel  ?label. 
  }
"""))

print(form_1)
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]
* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:
http://kbpedia.org/kko/rc/Person
http://kbpedia.org/kko/rc/HomoSapiens

The query above has a warning message we can ignore and lists all of the direct sub-classes to Mammal in KBpedia.

The last installment also offered a second form, which is the one I will be using hereafter. I am doing so because this form, and its further abstraction, is a more repeatable approach. In general, this advantage is because we can take this format and abstract it into a ‘wrapper’ that encapsulates the method of making the SPARQL call separate, abstracted from the actual SPARQL specification. We will increasingly touch on these topics, but for now this is the format we will take:

form_2 = """
  PREFIX rc: <http://kbpedia.org/kko/rc/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT DISTINCT ?x ?label
  WHERE
  {
    ?x rdfs:subClassOf rc:Mammal.
    ?x skos:prefLabel  ?label. 
  }
"""

results = list(graph.query_owlready(form_2))
print(results)
[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]

These two examples cover how to access the local datastore.

External SPARQL Examples

We really like what we have seen with the SPARQL querying of the internal data store using RDFLib within Owlready2. But what of querying outside sources. (And, would it not be cool to be able to mix-and-match internal and external stuff?)

As we try to use RDFLib as is against external SPARQL endpoints we quickly see that we are not adequately identifying and talking with these sites. Well, we have been here before, but the nature of stuff with Python and packages and dependencies and such often requires another capability.

Some quick poking turns up that we are lacking a HTTP-aware ‘wrapper’ to external sites. We turn up a promising package in sparqlwrapper. We discover it is on conda-forge so we back out the system, and at the command line add the package:

$ conda install sparqlwrapper

We again get the feedback to the screen as the Anaconda configuration manager does its thing. When finally installed and the prompt returns, we again load up Jupyter Notebook and return to this notebook page.

We are now ready to try our first external example, this time to Wikidata, after we import SPARQLwrapper and set our endpoint target to Wikidata (https://query.wikidata.org/sparql):

from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

sparql.setQuery("""
  PREFIX schema: <http://schema.org/>
  SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {
  VALUES ?item { wd:Q25297630
  wd:Q537127
  wd:Q16831714
  wd:Q24398318
  wd:Q11755880
  wd:Q681337
}
  ?item wdt:P910 ?subClass.

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results)
{'head': {'vars': ['item', 'itemLabel', 'wikilink', 'itemDescription', 'subClass', 'subClassLabel']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q537127'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8667674'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'road bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge that carries road traffic'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Road bridges'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q11755880'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8656043'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'residential building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building mainly used for residential purposes'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Residential buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q16831714'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q6259373'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'government building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building built for and by the government, such as a town hall'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Government buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q24398318'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q5655238'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'religious building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building intended for religious worship or other activities related to a religion; ceremonial structures that are related to or concerned with religion'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Religious buildings and structures'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q25297630'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7344076'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'international bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge built across a geopolitical boundary'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:International bridges'}}]}}

Great! It works, and our first information retrieval from an external site!

Let me point out a couple of things about this format. First, the endpoint already has some built-in prefixes (wd: and wdt:) so we did not need to declare them in the query header. Second, there are some unique query capabilities of the Wikidata site noted by the SERVICE designation.

When first querying a new site it is perhaps best to stick to vanilla forms of SPARQL, but as one learns more it is possible to tailor queries more specifically. We also see that our setup will allow us to take advantage of what each endpoint gives us.

So, let’s take another example, this one using the DBpedia endpoint, to show how formats may also differ from endpoint to endpoint:

from SPARQLWrapper import SPARQLWrapper, RDFXML
from rdflib import Graph

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX schema: <http://schema.org/>

    CONSTRUCT {
      ?lang a schema:Language ;
      schema:alternateName ?iso6391Code .
    }
    WHERE {
      ?lang a dbo:Language ;
      dbo:iso6391Code ?iso6391Code .
      FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values
    }
""")

sparql.setReturnFormat(RDFXML)
results = sparql.query().convert()
print(results.serialize(format='xml'))

Notice again how the structure of our query code is pretty patterned. We also see in the two examples how we can specify different query results serializations (JSON and RDFXML in these examples) for our results sets.

Additional Documentation

The idea of a SPARQL tutorial is outside of the defined scope of this CWPK series. But, the power of SPARQL is substantial and it is well worth the time to learn more about this flexible language, that reminds one of SQL in many ways, but has its own charms and powers. Here are some great starting links about SPARQL:

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available both as an online interactive file or as a direct download to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.

Schema.org Markup

headline:
CWPK #25: Querying KBpedia with SPARQL

alternativeHeadline:
Now, We Open Up the Power

author:

image:
https://www.mkbergman.com/wp-content/uploads/2020/07/cooking-with-kbpedia-785.png

description:
SPARQL is a query language of knowledge graphs that enables explicit retrieval of all structural aspects of the graph, whether it be local or remote via an endpoint. We further explicate the SPARQL capability in this CWPK installment.

articleBody:
see above

datePublished:

Leave a Reply

Your email address will not be published. Required fields are marked *