Posted: August 26, 2020

CWPK #23: Text Searching KBpedia

Using the Direct Approach with Owlready2

In this installment of the Cooking with Python and KBpedia series, we explore ways to directly search knowledge graph text from within the owlready2 API. We first introduced this topic in CWPK #19; we explain further some of the nuances here.

Recall that owlready2 uses its own local datastore, SQLite, for storing its knowledge graphs. Besides the search functionality added in Owlready2, we will also be taking advantage of the full-text search (FTS) functionality within SQLite.

Load Full Knowledge Graph

To get started, we again load our working knowledge graph. In this instance we will use the full KBpedia knowledge graph, kbpedia_reference_concepts.owl, because it has a richer set of contents.

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, use the commented-out ‘raw’ addresses for kbpedia_reference_concepts.owl and kko.owl instead.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/owlready2/kg/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)

To execute the load, pick shift+enter to execute the cell contents, or pick Run from the main menu.

Besides changing our absolute file input, note we have added another scoping assignment, world, to our load. Here world is simply our variable name for an instance of Owlready2's World class, which encompasses the SQLite storage space (the quadstore) used by Owlready2. Note we load all of our ontologies (knowledge graphs) into this world so that we may invoke some of the FTS functionality later in this installment.

Basic Search Functions

As the owlready2 documentation explains, it contains some pre-loaded search capabilities that can be performed with the .search() query method. This method can accept one or several keyword arguments:

  • iri – for searching entities by their full IRIs
  • type – for searching instances for a given class
  • subclass_of – for searching subclasses of a given class
  • is_a – for searching both instances and subclasses of a given class
  • any object, data or annotation property name – for searching on the values of that property.

Special arguments that may be added to these arguments are:

  • _use_str_as_loc_str – whether to treat plain Python strings as strings in any language (default is True)
  • _case_sensitive – whether to take lower/upper case into consideration (default is True).

Our search queries may accept quoted phrases and prefix or suffix wildcards (*). Let’s look at some examples combining these arguments and methods. Our first one is similar to what we presented in CWPK #19:

world.search(iri = "*luggage*")
[]

Notice our result here is an empty list; in other words, no matches. Yet we know there are IRIs in KBpedia that include the term ‘luggage’. We suspect the reason for not seeing a match is that the term might start with upper case in our IRIs. We will set the case sensitivity argument to False and try again:

world.search(iri = "*luggage*", _case_sensitive = False)

Great! We are now seeing the results we expected.

Note in the query above that we used the wildcard (*) to allow for either prefix or suffix matches. As you can see from the results above, most of the search references match the interior part of the IRI string.
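To make the case-sensitivity behavior concrete, here is a minimal sketch using only Python's standard library. It is an analogy, not how Owlready2 implements .search() internally; the IRIs and the glob_search helper are hypothetical and made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical IRIs, for illustration only (not read from KBpedia)
iris = [
    "http://kbpedia.org/kko/rc/Luggage",
    "http://kbpedia.org/kko/rc/LuggageTag",
    "http://kbpedia.org/kko/rc/Mammal",
]

def glob_search(pattern, candidates, case_sensitive=True):
    """Return the candidates matching a '*' wildcard pattern."""
    if case_sensitive:
        return [c for c in candidates if fnmatchcase(c, pattern)]
    # Fold both sides to lower case to ignore case differences
    return [c for c in candidates if fnmatchcase(c.lower(), pattern.lower())]

strict = glob_search("*luggage*", iris)                          # no matches
relaxed = glob_search("*luggage*", iris, case_sensitive=False)   # two matches

print(strict)
print(relaxed)
```

The strict search comes back empty because every sample IRI spells the term with a capital ‘L’, which mirrors what we saw with the first query above.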

The iri argument takes a search string as its assignment. The other three keyword assignments noted above take an object name, as this next example shows:

world.search(subclass_of=rc.Mammal)

We get a tremendous number of matches on this query, so much so that I cleared away the current cell output (via Cell → Current Outputs → Clear, when highlighting this cell). To winnow this result set further, we can combine search terms, as the next example shows. We will add to our initial search a string search on the IRIs to pick out those prior results that might represent ‘Bats’:

world.search(subclass_of=rc.Mammal, iri = "*Bat*")
[rc.Bat-Mammal, rc.SacWingedBat, rc.BulldogBat, rc.FreeTailedBat, rc.HorseshoeBat, rc.SchreibersBat, rc.WesternSuckerFootedBat, rc.AfricanLongFingeredBat, rc.AfricanYellowBat, rc.AllensYellowBat, rc.AsianPartiColoredBat, rc.DaubentonsBat, rc.EasternRedBat, rc.GreatEveningBat, rc.GreaterTubeNosedBat, rc.GreyLongEaredBat, rc.HawaiianHoaryBat, rc.KobayashisBat, rc.LesserYellowBat, rc.LittleTubeNosedBat, rc.NewGuineaBigEaredBat, rc.NewZealandLongTailedBat, rc.NorthernLongEaredBat, rc.PallidBat, rc.SilverHairedBat, rc.SpottedBat, rc.AfricanSheathTailedBat, rc.AmazonianSacWingedBat, rc.BeccarisSheathTailedBat, rc.ChestnutSacWingedBat, rc.DarkSheathTailedBat, rc.EcuadorianSacWingedBat, rc.EgyptianTombBat, rc.FrostedSacWingedBat, rc.GraySacWingedBat, rc.GreaterSacWingedBat, rc.GreaterSheathTailedBat, rc.GreenhallsDogFacedBat, rc.HamiltonsTombBat, rc.HildegardesTombBat, rc.LargeEaredSheath-TailedBat, rc.LesserSacWingedBat, rc.LesserSheathTailedBat, rc.MauritianTombBat, rc.NorthernGhostBat, rc.PacificSheathTailedBat, rc.PelsPouchedBat, rc.PeterssSheathTailedBat, rc.ProboscisBat, rc.RaffraysSheathTailedBat, rc.SerisSheathtailBat, rc.SeychellesSheathTailedBat, rc.ShaggyBat, rc.ShortEaredBat, rc.SmallAsianSheathTailedBat, rc.TheobaldsTombBat, rc.ThomassSacWingedBat, rc.TroughtonsPouchedBat, rc.AntilleanFruitEatingBat, rc.BidentateYellowEaredBat, rc.BigEaredWoolyBat, rc.CommonVampireBat, rc.HairyLeggedVampireBat, rc.HonduranWhiteBat, rc.LesserLongNosedBat, rc.MexicanLongNosedBat, rc.SpectralBat, rc.VampireBat, rc.WhiteWingedVampireBat, rc.GreaterBulldogBat, rc.LesserBulldogBat, rc.BigCrestedMastiffBat, rc.BigFreeTailedBat, rc.BlackBonnetedBat, rc.BroadEaredBat, rc.EuropeanFreeTailedBat, rc.GallaghersFreeTailedBat, rc.IncanLittleMastiffBat, rc.LittleGoblinBat, rc.MexicanFreeTailedBat, rc.NatalFreeTailedBat, rc.PetersWrinkleLippedBat, rc.SumatranMastiffBat, rc.WroughtonsFreeTailedBat, rc.LesserFalseVampireBat, rc.YellowWingedBat, rc.BigNakedBackedBat, rc.ParnellsMustachedBat, 
rc.NewZealandGreaterShortTailedBat, rc.NewZealandLesserShortTailedBat, rc.BatesSlitFacedBat, rc.LargeSlitFacedBat, rc.DayakFruitBat, rc.LittleMarianaFruitBat, rc.LivingstonesFruitBat, rc.MarianaFruitBat, rc.PetersDiskWingedBat, rc.Bat-earedFox, rc.AboBat, rc.JapaneseHouseBat, rc.BigBrownBat, rc.NorthernBat, rc.GrayBat, rc.GreaterMouseEaredBat, rc.HodgsonsBat, rc.IkonnikovsBat, rc.IndianaBat, rc.LittleBrownBat, rc.NatterersBat, rc.GreaterNoctuleBat, rc.VirginiaBigEaredBat, rc.GreaterHorseshoeBat, rc.LamottesRoundleafBat, rc.LesserHorseshoeBat, rc.MalayanRoundleafBat, rc.VietnamLeafNosedBat, rc.PersianTridentBat, rc.BondaMastiffBat, rc.VelvetyFreeTailedBat, rc.WesternMastiffBat, rc.MexicanFunnelEaredBat, rc.AndersensFruitEatingBat, rc.JamaicanFruitBat]

Again, we get a large number of results. There are clearly many mammals and bats within the KBpedia reference graph!
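Combining keyword arguments narrows the results to entities that satisfy both conditions. The sketch below mimics that idea with plain Python structures; the parent_of table and the descendants helper are hypothetical stand-ins, not Owlready2 calls. It also shows why a result such as rc.Bat-earedFox can appear above: the IRI filter is purely a string match and knows nothing about meaning.

```python
from fnmatch import fnmatchcase

# Hypothetical child -> parent links, for illustration only
parent_of = {
    "Bat": "Mammal",
    "FruitBat": "Bat",
    "Dog": "Mammal",
    "BatteringRam": "Weapon",
}

def descendants(root):
    """Collect every class that has root somewhere among its ancestors."""
    found = []
    for name in parent_of:
        node = name
        while node in parent_of:
            node = parent_of[node]
            if node == root:
                found.append(name)
                break
    return found

# First condition: subclasses of Mammal (direct or indirect)
mammals = descendants("Mammal")

# Second condition: the name also matches the wildcard pattern
bats = [name for name in mammals if fnmatchcase(name, "*Bat*")]

print(mammals)
print(bats)
```

Note that ‘BatteringRam’ matches the string pattern but is dropped because it is not a subclass of ‘Mammal’; both conditions must hold.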

Per the listing above, there are a number of these pre-configured search arguments directly available through Owlready2.

We can also instruct the FTS system in SQLite that we want to index still additional fields. Since we are interested in a term we know occurs in KBpedia’s annotations relating some reference concepts to the UN standard products and services codes (UNSPSC) we try that search directly:

world.search(entered = "*UNSPSC*")
[]

Hmm, this tells us there are no results. We must be missing an indexed field. So, let’s instruct the system to add indexing to the definition property where we suspect the reference may occur. We do so using the .append method to add a new field for our RC definitions (skos.definition) to the available FTS index structure:

world.full_text_search_properties.append(skos.definition)

Since this is just a simple assignment, when we Run the cell we get no results output.

However, that assignment now allows us to invoke the internal FTS (full-text search) argument:

world.search(definition = FTS("UNSPSC*"))

If you get an ‘operational error’ that means you did not Run the .append instruction above.

Like some of the other listings, this command results in a very large number of results, a couple of which are warnings we can ignore, so we again Clear the Cell. We can get a smaller listing with another keyword search, this time for the wildcarded ‘gear*’ search:

world.search(definition = FTS("gear*"))
[rc.undercarriage, rc.number-of-forward-gears, rc.vehicle-transmission, rc.AutomaticTransmission, rc.BearingBushingWheelGear, rc.BevelGear, rc.Bicycle-MultiGear, rc.BoeingAH-64Apache, rc.BugattiVeyron, rc.ChildrensWebSite, rc.CombatSportsEvent, rc.Commercialism, rc.CyclingClothing, rc.Device-FunctionallyDefective, rc.FirstNorthAmericansNovels, rc.Fishery, rc.FreeDiving, rc.Game-EquipmentSet, rc.Gear, rc.GearManufacturingMachine, rc.Gearing-Mechanical, rc.GearlessElectricDrive, rc.Goggles, rc.Harness-Animal, rc.Helmet, rc.IlyushinIl-30, rc.LandingGearAssembly, rc.MachineProtocol, rc.Mechanism-Technology, rc.Overdrive-Mechanics, rc.PinionGear, rc.ProtectiveEquipment-Human, rc.ProtectiveGear, rc.ScubaGear, rc.ScubaSnorkelingGear, rc.ShockAndAwe-MilitaryTactic, rc.Supercharger, rc.TeacherTrainingProgram, rc.Trek, rc.Wheel, rc.WildernessBackpacking, rc.WormGear]

Notice in this search that we are able to use the suffix wildcard (*) character. However, unlike the standard Owlready2 search, we are not able to use a wildcard (*) prefix search.
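We can see the same trailing-wildcard behavior by working with SQLite's full-text search directly. This is a minimal sketch that assumes your Python's sqlite3 module was built with the FTS5 extension; the defs table, its columns, and the sample rows are hypothetical and are not the schema Owlready2 creates internally.

```python
import sqlite3

# Hypothetical in-memory table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE defs USING fts5(name UNINDEXED, definition)")
conn.executemany(
    "INSERT INTO defs (name, definition) VALUES (?, ?)",
    [
        ("WormGear", "A gear arrangement in which a worm meshes with a worm wheel."),
        ("Helmet", "Protective headgear worn to shield the head."),
        ("ScubaGear", "Gears and equipment used for underwater diving."),
        ("Mammal", "A warm-blooded vertebrate animal."),
    ],
)

def fts(query):
    """Run a full-text query against the indexed definition column."""
    sql = "SELECT name FROM defs WHERE defs MATCH ? ORDER BY name"
    return [row[0] for row in conn.execute(sql, (query,))]

# A trailing * matches any token that starts with 'gear'
prefix_hits = fts("gear*")
print(prefix_hits)

# FTS5 query syntax has no leading-wildcard form, so this query is
# expected to be rejected rather than to match 'headgear'
try:
    fts("*gear")
    leading_wildcard_ok = True
except sqlite3.OperationalError:
    leading_wildcard_ok = False
print(leading_wildcard_ok)
```

The ‘Helmet’ row is not returned for ‘gear*’ because its only related token is ‘headgear’, which ends with, rather than starts with, the search term.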

Since we have added a new indexed search table to our system, we may want to retain this capability. So, we decide to save the entire graph to the database, as the last example shows:

world.set_backend(filename = 'cwpk-23-text-searching-kbpedia.db', exclusive = False)

This now means our database has been saved persistently to disk.

If you run this multiple times you may get an operational error since you have already set the backend filename.

We can then .save() our work and exit the notebook.

world.save()

Additional Documentation

Here is additional information on the system’s text searching capabilities:

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available both as an online interactive file and as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.
