Using the Direct Approach with Owlready2
In this installment of the Cooking with Python and KBpedia series, we explore ways to directly search knowledge graph text from within the owlready2 API. We first introduced this topic in CWPK #19; we explain further some of the nuances here.
Recall that owlready2 uses its own local datastore, SQLite, for storing its knowledge graphs. Besides the search functionality added in Owlready2, we will also be taking advantage of the full-text search (FTS) functionality within SQLite.
Load Full Knowledge Graph
To get started, we again load our working knowledge graph. In this instance we will use the full KBpedia knowledge graph, kbpedia_reference_concepts.owl, because it has a richer set of contents.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core'
kko_file = 'C:/1-PythonProjects/owlready2/kg/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *

# Load everything into one explicit world so the FTS calls later share it
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
To execute the load, press shift+enter to execute the cell contents, or pick Run from the main menu.
Besides changing our absolute file input, note we have added another scoping assignment, world, to our load.
Here world is an instance of Owlready2’s World class, which encompasses the SQLite storage space used by Owlready2. Note we load all of our ontologies (knowledge graphs) into this world so that we may invoke some of the FTS functionality later in this installment.
Basic Search Functions
As the owlready2 documentation explains, the package contains some pre-loaded search capabilities that can be performed with the .search() query method. This method can accept one or several keyword arguments:
iri – for searching entities by their full IRIs
type – for searching instances of a given class
subclass_of – for searching subclasses of a given class
is_a – for searching both instances and subclasses of a given class
any object, data or annotation property name.
Special arguments that may be added to these arguments are:
_use_str_as_loc_str – whether to treat plain Python strings as strings in any language (default is True)
_case_sensitive – whether to take lower/upper case into consideration (default is True).
Our search queries may accept quoted phrases and prefix or suffix wildcards (*). Let’s look at some examples combining these arguments and methods. Our first one is similar to what we presented in CWPK #19:
world.search(iri = "*luggage*")
Notice our result here is an empty set, in other words, no matches. Yet we know there are IRIs in KBpedia that include the term ‘luggage’. We suspect the reason for not seeing a match is that the term might start with upper case in our IRIs. We will set the case sensitivity argument to false and try again:
world.search(iri = "*luggage*", _case_sensitive = False)
Great! We are now seeing the results we expected.
Note in the query above that we used the wildcard (*) to allow for either prefix or suffix matches. As you can see from the results above, most of the search references match the interior part of the IRI string.
The iri argument takes a search string as its assignment. The other three keyword assignments noted above take an object name, as this next example shows:
world.search(subclass_of = rc.Mammal)
We get a tremendous number of matches on this query, so much so that I cleared away the current cell output (via Cell → Current Outputs → Clear, when highlighting this cell). To winnow this result set further, we can combine search terms, as the next example shows. We will add to our initial search a string search on the IRIs to pick out the results that might represent ‘Bats’:
world.search(subclass_of = rc.Mammal, iri = "*Bat*")
[rc.Bat-Mammal, rc.SacWingedBat, rc.BulldogBat, rc.FreeTailedBat, rc.HorseshoeBat, rc.SchreibersBat, rc.WesternSuckerFootedBat, rc.AfricanLongFingeredBat, rc.AfricanYellowBat, rc.AllensYellowBat, rc.AsianPartiColoredBat, rc.DaubentonsBat, rc.EasternRedBat, rc.GreatEveningBat, rc.GreaterTubeNosedBat, rc.GreyLongEaredBat, rc.HawaiianHoaryBat, rc.KobayashisBat, rc.LesserYellowBat, rc.LittleTubeNosedBat, rc.NewGuineaBigEaredBat, rc.NewZealandLongTailedBat, rc.NorthernLongEaredBat, rc.PallidBat, rc.SilverHairedBat, rc.SpottedBat, rc.AfricanSheathTailedBat, rc.AmazonianSacWingedBat, rc.BeccarisSheathTailedBat, rc.ChestnutSacWingedBat, rc.DarkSheathTailedBat, rc.EcuadorianSacWingedBat, rc.EgyptianTombBat, rc.FrostedSacWingedBat, rc.GraySacWingedBat, rc.GreaterSacWingedBat, rc.GreaterSheathTailedBat, rc.GreenhallsDogFacedBat, rc.HamiltonsTombBat, rc.HildegardesTombBat, rc.LargeEaredSheath-TailedBat, rc.LesserSacWingedBat, rc.LesserSheathTailedBat, rc.MauritianTombBat, rc.NorthernGhostBat, rc.PacificSheathTailedBat, rc.PelsPouchedBat, rc.PeterssSheathTailedBat, rc.ProboscisBat, rc.RaffraysSheathTailedBat, rc.SerisSheathtailBat, rc.SeychellesSheathTailedBat, rc.ShaggyBat, rc.ShortEaredBat, rc.SmallAsianSheathTailedBat, rc.TheobaldsTombBat, rc.ThomassSacWingedBat, rc.TroughtonsPouchedBat, rc.AntilleanFruitEatingBat, rc.BidentateYellowEaredBat, rc.BigEaredWoolyBat, rc.CommonVampireBat, rc.HairyLeggedVampireBat, rc.HonduranWhiteBat, rc.LesserLongNosedBat, rc.MexicanLongNosedBat, rc.SpectralBat, rc.VampireBat, rc.WhiteWingedVampireBat, rc.GreaterBulldogBat, rc.LesserBulldogBat, rc.BigCrestedMastiffBat, rc.BigFreeTailedBat, rc.BlackBonnetedBat, rc.BroadEaredBat, rc.EuropeanFreeTailedBat, rc.GallaghersFreeTailedBat, rc.IncanLittleMastiffBat, rc.LittleGoblinBat, rc.MexicanFreeTailedBat, rc.NatalFreeTailedBat, rc.PetersWrinkleLippedBat, rc.SumatranMastiffBat, rc.WroughtonsFreeTailedBat, rc.LesserFalseVampireBat, rc.YellowWingedBat, rc.BigNakedBackedBat, rc.ParnellsMustachedBat, 
rc.NewZealandGreaterShortTailedBat, rc.NewZealandLesserShortTailedBat, rc.BatesSlitFacedBat, rc.LargeSlitFacedBat, rc.DayakFruitBat, rc.LittleMarianaFruitBat, rc.LivingstonesFruitBat, rc.MarianaFruitBat, rc.PetersDiskWingedBat, rc.Bat-earedFox, rc.AboBat, rc.JapaneseHouseBat, rc.BigBrownBat, rc.NorthernBat, rc.GrayBat, rc.GreaterMouseEaredBat, rc.HodgsonsBat, rc.IkonnikovsBat, rc.IndianaBat, rc.LittleBrownBat, rc.NatterersBat, rc.GreaterNoctuleBat, rc.VirginiaBigEaredBat, rc.GreaterHorseshoeBat, rc.LamottesRoundleafBat, rc.LesserHorseshoeBat, rc.MalayanRoundleafBat, rc.VietnamLeafNosedBat, rc.PersianTridentBat, rc.BondaMastiffBat, rc.VelvetyFreeTailedBat, rc.WesternMastiffBat, rc.MexicanFunnelEaredBat, rc.AndersensFruitEatingBat, rc.JamaicanFruitBat]
Again, we get a large number of results. There are clearly many mammals and bats within the KBpedia reference graph!
Per the listing above, there are a number of these pre-configured search arguments directly available through Owlready2.
Full Text Search
We can also instruct the FTS system in SQLite to index additional fields. We are interested in a term, UNSPSC (the UN standard products and services codes), that we know occurs in the KBpedia annotations relating some reference concepts to those codes, so we try that search directly:
Hmm, this tells us there are no results. We must be missing an indexed field. So, let’s instruct the system to add indexing to the definition property, where we suspect the reference may occur. We do so using the .append method to add a new field for our RC definitions (skos.definition) to the available FTS index structure:
world.full_text_search_properties.append(skos.definition)
Since this is just a simple assignment, when we Run the cell we get no results output.
However, that assignment now allows us to invoke the internal FTS (full-text search) argument:
If you get an ‘operational error’, that means you did not Run the .append instruction above.
Like some of the other listings, this command returns a very large number of results, a couple of which are warnings we can ignore, so we again Clear the Cell. We can get a smaller listing with another keyword search, this time for the wildcarded ‘gear*’ search:
[rc.undercarriage, rc.number-of-forward-gears, rc.vehicle-transmission, rc.AutomaticTransmission, rc.BearingBushingWheelGear, rc.BevelGear, rc.Bicycle-MultiGear, rc.BoeingAH-64Apache, rc.BugattiVeyron, rc.ChildrensWebSite, rc.CombatSportsEvent, rc.Commercialism, rc.CyclingClothing, rc.Device-FunctionallyDefective, rc.FirstNorthAmericansNovels, rc.Fishery, rc.FreeDiving, rc.Game-EquipmentSet, rc.Gear, rc.GearManufacturingMachine, rc.Gearing-Mechanical, rc.GearlessElectricDrive, rc.Goggles, rc.Harness-Animal, rc.Helmet, rc.IlyushinIl-30, rc.LandingGearAssembly, rc.MachineProtocol, rc.Mechanism-Technology, rc.Overdrive-Mechanics, rc.PinionGear, rc.ProtectiveEquipment-Human, rc.ProtectiveGear, rc.ScubaGear, rc.ScubaSnorkelingGear, rc.ShockAndAwe-MilitaryTactic, rc.Supercharger, rc.TeacherTrainingProgram, rc.Trek, rc.Wheel, rc.WildernessBackpacking, rc.WormGear]
Notice in this search that we are able to use the suffix wildcard (*) character. However, unlike the standard Owlready2 search, we are not able to use a wildcard (*) prefix search.
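This trailing-wildcard-only behavior is what one would expect from a token-based index. As a rough sketch in plain SQLite, assuming an FTS5-style index is available in your Python build (the table, column names and sample rows are made up for illustration, and this is not Owlready2's own schema), a trailing * extends a token, while a word that merely ends in 'gear' is never reached:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical FTS5 table standing in for an indexed annotation field.
conn.execute("CREATE VIRTUAL TABLE definitions USING fts5(concept UNINDEXED, body)")
conn.executemany(
    "INSERT INTO definitions VALUES (?, ?)",
    [
        ("Gear", "A toothed wheel that meshes with other gears."),
        ("ScubaGear", "Gear worn for underwater diving."),
        ("Gearing", "The gearing of a machine sets its drive ratio."),
        ("Headgear", "Protective headgear such as a helmet."),
    ],
)

def match(query):
    """Return the concepts whose indexed body matches the FTS query."""
    sql = "SELECT concept FROM definitions WHERE definitions MATCH ? ORDER BY concept"
    return [row[0] for row in conn.execute(sql, (query,))]

print(match("gear"))   # ['ScubaGear']
print(match("gear*"))  # ['Gear', 'Gearing', 'ScubaGear']
```

Here 'gear*' picks up the tokens 'gears' and 'gearing' as well as 'gear', but 'headgear' is a different token that only ends in the term, so neither query finds it.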
Since we have added a new indexed search table to our system, we may want to retain this capability. So, we decide to save the entire graph to the database, as the last example shows:
world.set_backend(filename = 'cwpk-23-text-searching-kbpedia.db', exclusive = False)
This now means our database has been saved persistently to disk.
If you run this multiple times you may get an operational error since you have already set the backend filename.
We can then .save() our work and exit the notebook.
Here is additional information on the system’s text searching capabilities: