Posted: September 2, 2020

CWPK #28: Extracting Structure for Typologies

We Extract a Typology Scaffolding from an Active KG

In this installment of the Cooking with Python and KBpedia series, we work out in a Python code block how to extract a single typology from the KBpedia knowledge graph. To refresh your memory, KBpedia has an upper, ‘core’ ontology, the KBpedia Knowledge Ontology (KKO) that has a bit fewer than 200 top-level concepts. About half of these concepts are connecting points we call ‘SuperTypes’, that also function as tie-in points to underlying tree structures of reference concepts (RCs). (Remember there are about 58,000 RCs across all of KBpedia.)

We call each tree structure a ‘typology’, which has a root concept that is one of the upper SuperType concepts. The tree structures in each typology are built from rdfs:subClassOf relations, also known as ‘is-a’. The typologies range in size from a few hundred RCs to multiple thousands in some cases. The combination of the upper KKO structure and its supporting 70 or so typologies provides the conceptual backbone to KBpedia. We discussed this general terminology in our earlier CWPK #18 installment.

Each typology extracted from KBpedia can be inspected as a standalone ontology in something like the Protégé IDE. Typologies can be created or modified offline and then imported back into KBpedia, steps we will address in later installments. The individual typologies are modular in nature, and a bit easier to inspect and maintain when dealt with independently of the entire KBpedia structure.

Starting and Load

We begin with our standard opening routine, though we are a bit more specific about identifying prefixes in our name spaces:

Which environment? The specific load routine you should choose below depends on whether you are using the online MyBinder service (the ‘raw’ version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (#) out.
main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'
# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'
skos_file = 'http://www.w3.org/2004/02/skos/core' 
kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'
# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'

from owlready2 import *
world = World()
kb = world.get_ontology(main).load()
rc = kb.get_namespace('http://kbpedia.org/kko/rc/')               

skos = world.get_ontology(skos_file).load()
kb.imported_ontologies.append(skos)
core = world.get_namespace('http://www.w3.org/2004/02/skos/core#')

kko = world.get_ontology(kko_file).load()
kb.imported_ontologies.append(kko)
kko = kb.get_namespace('http://kbpedia.org/ontologies/kko#')

As always, we execute each cell as we progress down this notebook page by pressing shift+enter for the highlighted cell or by choosing Run from the notebook menu.

We will start by picking one of our smaller typologies, InquiryMethods, since its listing is a little easier to handle than one of the bigger typologies (such as Products or Animals). Note that, unlike nearly all of the other RCs, which are labeled in the singular, we use plural names for these SuperType RCs.

The SuperType is also the ‘root’ of the typology. What we are going to do is use the owlready2 built-in descendants() method to extract a listing of all children, grandchildren, etc., starting with our root. (Another method, ancestors(), navigates in the opposite direction to grab parents, grandparents, etc., all the way up to the ultimate root of any OWL ontology, owl:Thing.) Note in these commands that we are also removing the starting node from our listing, as shown in the last statement:

root = kko.InquiryMethods
s_set=root.descendants()
s_set.remove(root)
* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:
http://kbpedia.org/kko/rc/Cognition
http://kbpedia.org/kko/rc/AnimalCognition
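
The warning above reports a cycle in the subclass relations involving the two listed classes. As a rough illustration of what a descendants-style traversal does, and why a cycle need not trap it, here is a minimal sketch on a toy hierarchy. The `children` dictionary and the `toy_descendants` function are hypothetical stand-ins written for this example; they are not owlready2's implementation, and the hierarchy is not KBpedia's actual structure.

```python
# Toy parent -> children mapping (hypothetical; not KBpedia data).
children = {
    'InquiryMethods': ['Research', 'Thinking'],
    'Research': ['FieldResearch'],
    'Thinking': ['Cognition'],
    'Cognition': ['AnimalCognition'],
    'AnimalCognition': ['Cognition'],   # deliberate cycle
}

def toy_descendants(root, include_self=True):
    """Collect every class reachable from root via child links."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue          # already visited, so cycles terminate
        seen.add(node)
        stack.extend(children.get(node, []))
    if not include_self:
        seen.discard(root)    # mimic dropping the starting node
    return seen

s_set_toy = toy_descendants('InquiryMethods', include_self=False)
print(sorted(s_set_toy))
```

The `seen` set is what keeps the loop from revisiting a class, so the cyclic pair is simply collected once each.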

Owlready2 has an alternate way to not include the starting class in its listing, using the include_self = False argument. You may want to clear your memory to test this one:

root = kko.InquiryMethods
s_set=root.descendants(include_self = False)

We can then see the members of s_set:

list(s_set)
[rc.DriverVisionTest,
rc.StemCellResearch,
rc.AnalyticNumberTheory,
rc.ComputationalGroupTheory,
rc.HeuristicSearching,
rc.MedicalResearch,
rc.Comparing,
rc.YachtDesign,
rc.PGroups,
rc.SolarSystemModel,
rc.AirNavigation,
rc.CriticismOfMarriage,
rc.ScientificObservation,
rc.PokerStrategy,
rc.MesoscopicPhysics,
rc.Reasoning,
rc.SalesContractNegotiation,
rc.SocraticDialogue,
rc.ArgumentFromMorality,
rc.GramStainTest,
rc.Checking-Evaluating,
rc.TwinStudies,
rc.ComputationalNumberTheory,
rc.Surveillance,
rc.MethodsOfProof,
rc.InfiniteGroupTheory,
rc.Examination-Investigation,
rc.MedicalEvaluationWithImaging,
rc.Diagnosing,
rc.TragedyOfTheCommons,
rc.Survey,
rc.RepresentationTheory,
rc.SportsTraining,
rc.CelestialNavigation,
rc.Metatheorem,
rc.ModelingAndSimulation,
rc.CriticismOfMormonism,
rc.QuantumPhase,
rc.Evaluating,
rc.LatticeModel,
rc.BreastCancerScreening,
rc.SolvingAProblem,
rc.NetworkTheory,
rc.AnalyzingSomething,
rc.TransfiniteCardinal,
rc.PointGroup,
rc.CriminalInvestigation,
rc.AuthenticationEvent,
rc.FailingSomething,
rc.BargainingTheory,
rc.AdministrativeCourt,
rc.Circumnavigation,
rc.AcademicTesting,
rc.CriticismOfTheUnitedNations,
rc.ScientificTheory,
rc.NavalIntelligence,
rc.InterpretationsOfQuantumMechanics,
rc.AtomicModel,
rc.UndercoverOperation-LawEnforcement,
rc.HearingTest,
rc.IntegerSequence,
rc.ThoughtExperimentsInQuantumMechanics,
rc.Models,
rc.AdditiveCategory,
rc.UnitedStatesDiplomaticCablesLeak,
rc.CausalFallacy,
rc.ResearchEthics,
rc.VerificationOfCredit,
rc.FundamentalStockAnalysis,
rc.Gentrification,
rc.EvolutionaryGameTheory,
rc.CategoryTheoreticCategory,
rc.Geolocation,
rc.WeaponsTesting,
rc.AtmosphericDispersionModeling,
rc.FilmCriticismOnline,
rc.MathematicalTheory,
rc.ProbabilityAssessment,
rc.SetTheory,
rc.MathematicalQuantization,
rc.RapidStrepTest,
rc.Contrast,
rc.ForensicToxicology,
rc.RandomGraph,
rc.MedicalTesting,
rc.MonteCarloMethod,
rc.CategoricalLogic,
rc.PopulationModel,
rc.CognitiveBias,
rc.AmericanCollegeTestingProgramAssessment,
rc.VettingASource,
rc.TomographyScan,
rc.BodyFarm,
rc.ClosedCategory,
rc.EurovisionSongThatScoredNoPoints,
rc.TheoreticalPhysics,
rc.CosmologicalSimulation,
rc.StochasticProcess,
rc.NonlinearSystem,
rc.HiddenVariableTheory,
rc.SurveillanceScandal,
rc.DrugTestWithUrine,
rc.LatticePoint,
rc.GraduateManagementAdmissionTest,
rc.SystemsThinking,
rc.NeutralBuoyancyTraining,
rc.ClinicalHumanDrugTrial,
rc.ProbabilityInterpretation,
rc.ScientificModeling,
rc.InductiveInferenceProcess,
rc.TheoryOfProbabilityDistribution,
rc.UrbanExploration,
rc.SchroedingerEquation,
rc.ChoiceModelling,
rc.MedicalResearchProject,
rc.MedicalPhotographyAndIllustration,
rc.AuditingFinancialRecords,
rc.ClinicalTrial,
rc.ElementaryNumberTheory,
rc.DaggerCategory,
rc.RealTimeSimulation,
rc.SyntheticApertureRadar,
rc.VerificationOfTruth,
rc.LocalAuthoritySearch,
rc.BiomedicalResearchService,
rc.RequestingInformation,
rc.DualityTheory,
rc.FiniteModelTheory,
rc.CriticismOfIslamism,
rc.TheoryOfGravitation,
rc.FinancialRatio,
rc.QuantumMeasurement,
rc.MedicalUltrasonography,
rc.Experimenting,
rc.ForensicPhotography,
rc.ModularArithmetic,
rc.GroupAutomorphism,
rc.JobInterview,
rc.SatelliteMeteorologyAndRemoteSensing,
rc.PathologyResearchService,
rc.Functor,
rc.RobotNavigation,
rc.Evaluation,
rc.HiddenMarkovModel,
rc.CriticismOfMonotheism,
rc.RegressionDiagnostic,
rc.ExteriorInspection,
rc.PositronEmissionTomography,
rc.QuadraticForm,
rc.ForensicEntomology,
rc.UniversalAlgebra,
rc.WebBasedSimulation,
rc.PropositionalFallacy,
rc.Staring,
rc.HumanAttributeTesting,
rc.BritishNuclearTestsAtMaralinga,
rc.HigherCategoryTheory,
rc.Intention,
rc.PreclassicalEconomics,
rc.AbductiveInferenceProcess,
rc.NonparametricRegression,
rc.DrugTest,
rc.ModularForm,
rc.FoundationalQuantumPhysics,
rc.SimulationSoftware,
rc.Radiography,
rc.DiracEquation,
rc.GraduateRecordExamination,
rc.FreeAlgebraicStructure,
rc.PsychiatricModel,
rc.ClinicalResearch,
rc.VerificationOfEmployment,
rc.DrugEvaluation,
rc.DecisionTheory,
rc.LimitsCategoryTheory,
rc.CriticalThinking,
rc.WoodenArchitecture,
rc.RegressionWithTimeSeriesStructure,
rc.TheoryOfRelativity,
rc.Rejecting-CommunicationAct,
rc.Thinking-NonPurposeful,
rc.InfiniteGraph,
rc.ScientificMethod,
rc.Scrutiny,
rc.TechnologyDevelopment,
rc.CuringADisease,
rc.GaugeTheory,
rc.DigitalForensics,
rc.HomologicalAlgebra,
rc.LatentVariableModel,
rc.LegalReasoning,
rc.BiblicalCriticism,
rc.AutomaticIdentificationAndDataCapture,
rc.PerformanceReview,
rc.Morphism,
rc.LanguageModeling,
rc.CriticismOfCreationism,
rc.RobustRegression,
rc.PsychologicalTesting,
rc.Discipline,
rc.ElectroweakTheory,
rc.DeductiveInferenceProcess,
rc.ProbabilityFallacy,
rc.Remedy,
rc.AlternativesToAnimalTesting,
rc.Parastatistics,
rc.Verification,
rc.MedicalCollegeAdmissionTest,
rc.NeuropsychologicalTest,
rc.BirdWatching,
rc.InformationAnalysis,
rc.MassIntelligenceGatheringSystem,
rc.Census,
rc.Negotiating,
rc.TheoryOfConstraints,
rc.CriticismOfWelfare,
rc.RegressionVariableSelection,
rc.TypeTheory,
rc.GroupTheory,
rc.IntegrableSystem,
rc.PublicOwnership,
rc.ChildrensLiteratureCriticism,
rc.Evidence,
rc.Declaring-Evaluating,
rc.ExperimentalMedicineService,
rc.Supersymmetry,
rc.BusinessIntelligence,
rc.SubgroupProperty,
rc.QuantumLatticeModel,
rc.ArchitecturalElement,
rc.NuclearProgram,
rc.RejectingSomething,
rc.ErgodicTheory,
rc.SheafTheory,
rc.ThoughtExperimenting,
rc.MakingAPlan,
rc.NewCriticism,
rc.AutomaticNumberPlateRecognition,
rc.ComputerModeling,
rc.StatisticalOutlier,
rc.SelfOrganization,
rc.StandardModel,
rc.QuantumOptics,
rc.Simulation-Activity,
rc.Modeling,
rc.DatabaseSearching,
rc.CivilianChemicalResearchProgram,
rc.FinancialRiskEvaluation,
rc.HIVVaccineResearch,
rc.Exploration,
rc.MoonshineTheory,
rc.PrerogativeWrit,
rc.Criticism,
rc.Argument,
rc.ProbabilityTheoryParadox,
rc.ToposTheory,
rc.CreditScoring,
rc.VisualThinking,
rc.TheoryOfDeduction,
rc.TheatreCriticism,
rc.InspectingOfHome,
rc.AxiomOfSetTheory,
rc.PauliExclusionPrinciple,
rc.WatchingSomething,
rc.EnergyDevelopment,
rc.EmailAuthentication,
rc.StoolTest,
rc.IntelligenceAnalysisProcess,
rc.BasicConceptsInSetTheory,
rc.IntelligenceGathering,
rc.CombinatorialGroupTheory,
rc.SpinModel,
rc.Deontic-AgencyReasoning,
rc.ArchitecturalTheory,
rc.ArgumentsForTheExistenceOfGod,
rc.LogicalFallacy,
rc.GraduateSchoolEntranceTest,
rc.AlgebraicGraphTheory,
rc.Imagination,
rc.BusinessProcessModelling,
rc.CriticismOfJehovahsWitnesses,
rc.AlternativeMedicalDiagnosticMethod,
rc.CategoryTheory,
rc.Apprenticeship,
rc.GraphRewriting,
rc.InternetSearching,
rc.GenomeProject,
rc.UrineTest,
rc.PerformanceTesting,
rc.IntelligenceTest,
rc.ProductRecall,
rc.Inquiry,
rc.HypothesisTesting,
rc.ResearchProject,
rc.TypeOfScientificFallacy,
rc.Swarming,
rc.ComputationalProblemsInGraphTheory,
rc.TheoryOfCryptography,
rc.TRIZ,
rc.PhilosophicalTheory,
rc.ChaoticMap,
rc.GraphTheory,
rc.TestDrive,
rc.MagneticMonopole,
rc.NuclearPhysics,
rc.MilitaryChemicalWeaponsProgram,
rc.FairIsaacCreditScoring,
rc.RevealingTrueInformation,
rc.ResearchAndDevelopment,
rc.Canceling-Declaring-Evaluating,
rc.OilfieldProductionModel,
rc.ElectronicStructureMethod,
rc.Teleportation,
rc.ComputerModel,
rc.MethodsInSociology,
rc.Testimony,
rc.ProposedEnergyProject,
rc.TelevisionProgramming,
rc.ProblemSolving,
rc.FloatingArchitecture,
rc.ResearchByField,
rc.MonoidalCategory,
rc.Explanation-Thinking,
rc.EconomicsTheorem,
rc.CriticismOfTheBible,
rc.CohortStudyMethod,
rc.FormalTheoriesOfArithmetic,
rc.InventingSomething,
rc.LanglandsProgram,
rc.QuantumState,
rc.WitchHunt,
rc.AnthropologicalStudy,
rc.SocialConstructionism,
rc.Counting,
rc.MedicalEthics,
rc.PhenomenologicalMethodology,
rc.FunctionalSubgroup,
rc.EconomicTheory,
rc.Skepticism,
rc.FrenchLiteraryCriticism,
rc.OpenProblem,
rc.ScientificTechnique,
rc.ProbabilityTheorem,
rc.ObjectCategoryTheory,
rc.MarketFailure,
rc.FinancialChart,
rc.ReconnaissanceInForce-MilitaryOperation,
rc.Consumption-Economics,
rc.ArchitecturalDesign,
rc.NumberTheory,
rc.MagicalThinking,
rc.MultiplicativeFunction,
rc.AtomicPhysics,
rc.RegressionAnalysis,
rc.DreamInterpretation,
rc.GaloisTheory,
rc.ClinicalPsychologyTest,
rc.TermLogic,
rc.ArchitectureRecord,
rc.ResearchAdministration,
rc.ComputerSurveillance,
rc.BiochemistryMethod,
rc.NuclearIsomer,
rc.DempsterShaferTheory,
rc.ExtensionsAndGeneralizationsOfGraphs,
rc.Thought,
rc.PolarExploration,
rc.UrbanRenewal,
rc.ConsistencyModel,
rc.AppliedLearning,
rc.CriticalPhenomena,
rc.DensityFunctionalTheory,
rc.EnergyModel,
rc.Magnification-Process,
rc.Inspecting,
rc.GeometricGroupTheory,
rc.CognitiveTest,
rc.Architecture,
rc.ArchitecturalCommunication,
rc.OffenderProfiling,
rc.MassSurveillance,
rc.RandomMatrix,
rc.ExtremalGraphTheory,
rc.PolynesianNavigation,
rc.Voyage,
rc.EconometricModel,
rc.SemiempiricalQuantumChemistryMethod,
rc.Reliabilism,
rc.LearningThat,
rc.Spinor,
rc.PerturbationTheory,
rc.Investigation,
rc.ExactlySolvableModel,
rc.CommunicationOfFalsehood,
rc.SocialResearch,
rc.CannabisResearch,
rc.CardinalNumber,
rc.UrbanAndRegionalPlanning,
rc.ArchitecturalCompetition,
rc.SearchAndSeizure,
rc.GeometricGraphTheory,
rc.ChartPattern,
rc.AgeOfDiscovery,
rc.SustainableArchitecture,
rc.SubstanceTheory,
rc.StatisticalFieldTheory,
rc.Hypothesis,
rc.Research,
rc.ModelTheory,
rc.EnvironmentalResearch,
rc.SocialEngineering-PoliticalScience,
rc.Electrocardiogram,
rc.CancerResearch,
rc.Determinacy,
rc.IntelligenceTesting,
rc.QuantumModel,
rc.Negotiation,
rc.AnimalTesting,
rc.Crystallizing,
rc.GraphColoring,
rc.CandlestickPattern,
rc.ScientificExploration,
rc.BuildingInformationModeling,
rc.RadarNetwork,
rc.ForensicScience,
rc.LearningByDoing,
rc.DescriptiveSetTheory,
rc.FramingSocialSciences,
rc.ResearchMethod,
rc.ContractNegotiation,
rc.Theorizing,
rc.SocialEngineering-Security,
rc.MammographyExam,
rc.MilitaryNuclearWeaponsProgram,
rc.ForcingMathematics,
rc.ConceptualDistinction,
rc.BridgeDesign,
rc.CollegeEntranceTest,
rc.GraphConnectivity,
rc.Amniocentesis,
rc.GeneralizedLinearModel,
rc.MedicalImaging,
rc.Memorizing,
rc.DiophantineEquation,
rc.ScholasticAptitudeTest,
rc.FirstOrderMethod,
rc.MineralModel,
rc.Bargaining,
rc.MilitaryWMDProgram,
rc.PapSmearTest,
rc.InnerModelTheory,
rc.ElectronicDataSearching,
rc.ConceptualAbstraction,
rc.CensusInPeru,
rc.LandscapeArchitecture,
rc.Voyeurism,
rc.LawSchoolAdmissionTest,
rc.GraphEnumeration,
rc.ControllingSomething-Experimenting,
rc.BloodPressureTest,
rc.EstimationTheory,
rc.NuclearWeaponsTesting,
rc.AnomaliesInPhysics,
rc.ForensicMeteorology,
rc.RevealingInformation,
rc.LogLinearModel,
rc.StringBasedSearching,
rc.PregnancyTest,
rc.MeasureSetTheory,
rc.IntelligenceGatheringDiscipline,
rc.VetoingSomething,
rc.AchievementTest,
rc.Ordering,
rc.TheoryOfAging,
rc.NavalArchitecture,
rc.Psychopathy,
rc.GoOpening,
rc.GraphMinorTheory,
rc.MaritimePilotage,
rc.TrueOrFalseTest,
rc.MarkovModel,
rc.VideoSurveillance,
rc.QuantumFieldTheory,
rc.FieldResearch,
rc.GameTheory,
rc.LearningToRead,
rc.ConformalFieldTheory,
rc.StochasticModel,
rc.OrnithologicalEquipmentOrMethod,
rc.EyeContact,
rc.ThroatCultureTest,
rc.Niche,
rc.OrdinalNumber,
rc.EngineProblem,
rc.Polytely,
rc.ScientificControl,
rc.ReligiousArchitecture,
rc.ProbabilisticArgument,
rc.InfraredImaging,
rc.Aleph-1,
rc.ExoticProbability,
rc.GraphOperation,
rc.RealEstateValuation,
rc.DiastolicBloodPressureTest,
rc.MathematicalModeling,
rc.UrbanPlanning,
rc.CIAActivitiesInTheAmericas,
rc.QuantumMechanics,
rc.BiologicalWeaponsTesting,
rc.Matching,
rc.Theories,
rc.Bias,
rc.AstronomyProject,
rc.InternationalCriminalCourtInvestigation,
rc.LifeExtension,
rc.IndependenceResult,
rc.CounterIntelligence,
rc.MemoryTest,
rc.MediaProgramming,
rc.TheoreticalBiology,
rc.TeleologicalArgument,
rc.GeochronologicalDatingMethod,
rc.LeastSquares,
rc.GraphInvariant,
rc.ChartOverlay,
rc.KnowledgeSharing,
rc.EyeTest,
rc.OilfieldDrillingModel,
rc.FormalMethod,
rc.HolonomicBrainTheory,
rc.LanguageAcquisition,
rc.StringTheory,
rc.Rationalization,
rc.DeterminingInterrelationship,
rc.Appraising,
rc.AlzheimersDiseaseResearch,
rc.SetTheoreticUniverse,
rc.PersonalityTesting,
rc.DiscoveringSomething,
rc.TheoreticalChemistry,
rc.ProbabilisticModel,
rc.DeductiveReasoning,
rc.ComputerSimulation,
rc.RegressionAndCurveFittingSoftware,
rc.TechnicalIndicator,
rc.EconomicsQuantitativeMethod,
rc.ThyroidologicalMethod,
rc.DiophantineApproximation,
rc.Identification,
rc.Analysis,
rc.ChaosTheory,
rc.Comparison-Examination,
rc.MilitaryBiologicalWeaponsProgram,
rc.SystemsOfSetTheory,
rc.PersonalityTest,
rc.Practicing-Preparing,
rc.MathematicalEconomics,
rc.SyllogisticFallacy,
rc.MacroeconomicsAndMonetaryEconomics,
rc.Thinking,
rc.BusinessModel,
rc.DynamicSystemsDevelopmentMethod,
rc.SpecialRelativityMt,
rc.GraphTheoryObject,
rc.ForensicPathology,
rc.OilfieldEconomicModel,
rc.Simulation,
rc.Syllogism,
rc.AstronomySurvey,
rc.Urelement,
rc.RorschachTest,
rc.AdministrativeHearing,
rc.ComputabilityTheory,
rc.ForestModelling,
rc.Kantianism,
rc.Biosimulation,
rc.CentralLimitTheorem,
rc.ProbabilityTheory,
rc.GreatNorthernExpedition,
rc.SpaceGroup,
rc.LearningMethod,
rc.Counterintelligence,
rc.ChemicalWeaponsTesting,
rc.ArithmeticFunction,
rc.Superstring,
rc.RemoteSensing,
rc.ArgumentsAgainstTheExistenceOfGod,
rc.MedicalScience,
rc.Wellfoundedness,
rc.InvalidatingSomething,
rc.TerroristPlot,
rc.InductiveReasoning,
rc.LargeDeviationsTheory,
rc.UniversityEntryTest,
rc.Observing,
rc.MammographicBreastCancerScreening,
rc.QuantumBiology,
rc.InformationGathering,
rc.ConceptualModel,
rc.SocialEngineering,
rc.DomainDecompositionMethod,
rc.CholesterolTest,
rc.ContinuedFraction,
rc.ForensicAnthropology,
rc.RoboticsProject,
rc.InductiveFallacy,
rc.PsychiatricResearch,
rc.GameArtificialIntelligence,
rc.Interviewing,
rc.AbelianGroupTheory,
rc.StatisticalModel,
rc.ComputationalLearningTheory,
rc.CriticismOfAtheism,
rc.Designing,
rc.HilbertSpace,
rc.Wiretap,
rc.SurveyMethodology,
rc.HIVTest,
rc.SchoolOfThought,
rc.GeometryOfNumbers,
rc.ForensicPalynology,
rc.CivilianEnergyProgram,
rc.ReligiousCriticism,
rc.SystolicBloodPressureTest,
rc.Navigating,
rc.ChessTheory,
rc.PublicInquiry,
rc.PreliminaryHearing,
rc.Productivity,
rc.CriticismOfCapitalism,
rc.ProbabilisticInequality,
rc.DrugTestWithBlood,
rc.BloodTest,
rc.Annulment,
rc.CrossExamination,
rc.CivilianBiogeneticsProgram,
rc.BreastExam,
rc.Hearing-LegalProceeding,
rc.ForensicPsychology,
rc.AlgebraicNumberTheory,
rc.Zero-Number,
rc.PoliticalEconomicModel,
rc.MagneticResonanceImaging,
rc.CriticismOfBullfighting,
rc.TechnicalStockAnalysis,
rc.CombinatorialGameTheory,
rc.CreditScore-UnitedStates,
rc.AidsToNavigation,
rc.PersonalityTheory,
rc.CriticismOfFeminism,
rc.LiverFunctionTest,
rc.StettingSomething]

After doing some counts (len(s_set) for example) and inspections of the list, we determine that the code block so far is providing the entire list of sub-classes under the root. Now we want to start formatting our output similar to the flat files we are using. We begin by prefixing our variable names with s_, p_, o_ to correspond to our subject – predicate – object triples close to the native N3 format. We’ll continue to see this pattern over multiple variables in multiple code blocks for multiple installments.

We also set up an iterator to loop over the s_set, generating an s_item for each element encountered in the list. We add a print to generate back to screen each line:

o_frag = list()
s_frag = list()
p_item = 'rdfs:subClassOf'
for s_item in s_set:
   o_item = s_item.is_a
   print(s_item,p_item,o_item)

Hmm, we see many of the o_item entries are in fact lists with more than one member. This means, of course, that a given entry has multiple parents. For input specification purposes, each one of those parents needs to have its own triple assertion. Thus, we also need to iterate over the parent entries, which we now call o_set, to generate a single assignment for each. So, we need to insert another for loop, and indent it as Python expects. Notice, too, that the for statements that open these loops all terminate with a ‘:’.

o_frag = list()
s_frag = list()
p_item = 'rdfs:subClassOf'
for s_item in s_set:
   o_set = s_item.is_a
   for o_item in o_set:
       print(s_item,p_item,o_item)
       o_frag.append(o_item)
       s_frag.append(s_item) 

We test with the length function (len()) to see if we have picked up items.

len(o_frag)

Hmmm, that’s not good. The sizes of o_frag and s_frag are showing to be the same, but we already saw there were multiple objects for the subjects. Clearly, we’re still not counting and processing this right.
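
To see why the two counts come out equal, consider a toy version of the loop. The `is_a_toy` mapping below is made up for illustration. The point is only that two lists appended inside the same inner loop always end up the same length, whereas sets keep one copy of each distinct item.

```python
# Hypothetical child -> parents mapping (not KBpedia data).
is_a_toy = {
    'HIVTest': ['BloodTest', 'MedicalTesting'],
    'BloodTest': ['MedicalTesting'],
}

s_list, o_list = [], []
s_unique, o_unique = set(), set()
for s_item, o_set in is_a_toy.items():
    for o_item in o_set:
        s_list.append(s_item)   # appended once per parent
        o_list.append(o_item)
        o_unique.add(o_item)
    s_unique.add(s_item)        # added once per subject

print(len(s_list), len(o_list))      # lists: one entry per triple
print(len(s_unique), len(o_unique))  # sets: distinct items only
```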

So, we need to make two final changes to this routine. First, we want to get the population of our sets correct. We can see in our prior example that we were counting o_frag and s_frag as part of the same loop, but that is not correct. The s_frag needs to be linked with processing the subject set. We change the indent to assign this correctly, and we switch o_frag and s_frag from lists to sets so that each class is counted only once. (Testing this may require you to Kernel → Restart & Clear Output and then run all of the above cells.)

The second change we want is for our output to begin to conform to a CSV file with leading and trailing white spaces removed and entries separated by commas, moving us again toward an N3 format. Here are the resulting changes:

o_frag = set()
s_frag = set()
p_item = 'rdfs:subClassOf'
for s_item in s_set:
   o_set = s_item.is_a
   for o_item in o_set:
       print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='')
       o_frag.add(o_item)
   s_frag.add(s_item) 

Getting rid of the leading and trailing white spaces is a little tricky. The sep='' and end='' arguments above are keyword arguments of the print() function, which is standard in Python 3; the older Python 2 print statement does not accept them and would fail on this line. Since I have no legacy Python code I can afford to rely on the latest versions of the language. But little nuances such as this are something to be aware of as you research various methods, commands and arguments.
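
As an aside on formatting, the same comma-separated line can also be built as a single string and then printed, which sidesteps separator handling altogether. This is a minimal sketch; `n3_like_line` is a hypothetical helper and `rc.Child` / `rc.Parent` are placeholder names, none of which come from the CWPK code or KBpedia.

```python
def n3_like_line(s, p, o):
    """Join the three parts with commas and end with a period."""
    return ','.join(str(part) for part in (s, p, o)) + '.'

# print() with sep='' inserts nothing between its arguments.
print('rc.Child', ',', 'rdfs:subClassOf', ',', 'rc.Parent', '.', sep='')

# The helper builds the same text as one string before printing.
line = n3_like_line('rc.Child', 'rdfs:subClassOf', 'rc.Parent')
print(line)
```

Both print calls should emit identical lines. Building the string first can be convenient later if the rows are written to a file rather than to the screen.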

We can also check counts again to ensure everything is now correct:

len(s_frag)

And we can start playing around with some of the set methods, in this case the .intersection between our two sets:

len(o_frag.intersection(s_frag))
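
For readers new to Python sets, here is a small sketch of what the intersection tells us, alongside two related set methods. The members of `s_toy` and `o_toy` are invented placeholders, not KBpedia classes.

```python
s_toy = {'A', 'B', 'C'}          # subjects: classes inside the typology
o_toy = {'A', 'B', 'Root', 'X'}  # objects: parents seen via is_a

internal = o_toy.intersection(s_toy)  # parents that are also subjects
external = o_toy.difference(s_toy)    # parents outside the subject set
leaves = s_toy.difference(o_toy)      # subjects that are never a parent

print(sorted(internal), sorted(external), sorted(leaves))
```

Under these toy assumptions, the intersection counts the classes that act as both child and parent within the typology, while the differences pick out parents that sit outside it and classes with no children.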

This is all looking pretty good, though we have not yet dealt with putting the full URIs into the triples. That is straightforward so we can afford to put that off until we are ready to generate the actual typologies. But we realize we also have missed one final piece of the logic necessary to have our typologies readable as separate ontologies: declaring all of our classes as such under the standard owl:Thing. These new classes correspond to each of the entries in the s_frag set, so we add another line in a print statement to do so. The code below also adds a check, if o_item in s_set, so that a subClassOf triple is printed only when the parent is itself a member of s_set.

o_frag = set()
s_frag = set()
p_item = 'rdfs:subClassOf'
new_class = 'owl:Thing'
for s_item in s_set:
   o_set = s_item.is_a
   for o_item in o_set:
       if o_item in s_set:
           print(s_item,',',p_item,',',o_item,'.','\n', sep='', end='')
           o_frag.add(o_item)
   s_frag.add(s_item)
   print(s_item,',','a',',',new_class,'.','\n', sep='', end='')
len(s_frag)

Great, our logic appears correct and our counts do, too. So we can consider this code block as developed enough for assembly into a formal method and then module. Let’s now move on to prototyping other components in the KBpedia structure.
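
As a hedged illustration of what assembling this code block into a method might look like, the sketch below wraps the same loop logic in a generator and writes the rows with Python's csv module. The function name `typology_rows`, its parameters, and the toy data are all assumptions made for this example; a real version would take the owlready2 s_set and read each class's is_a instead.

```python
import csv
import io

def typology_rows(s_set, parents_of, p_item='rdfs:subClassOf',
                  new_class='owl:Thing'):
    """Yield (subject, predicate, object) rows for one typology."""
    for s_item in sorted(s_set):
        for o_item in parents_of(s_item):
            if o_item in s_set:            # keep only in-typology parents
                yield (s_item, p_item, o_item)
        yield (s_item, 'a', new_class)     # declare each class

# Hypothetical toy data standing in for s_set and s_item.is_a.
toy_parents = {'B': ['A', 'Outside'], 'C': ['A', 'B'], 'A': ['Root']}
toy_set = set(toy_parents)

rows = list(typology_rows(toy_set, lambda s: toy_parents[s]))

buffer = io.StringIO()                     # in-memory stand-in for a file
csv.writer(buffer, lineterminator='\n').writerows(rows)
print(buffer.getvalue(), end='')
```

Passing the parent lookup in as a function keeps the sketch independent of any particular ontology library, and sorting the subjects gives a repeatable row order, which a plain set does not guarantee.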

Additional Documentation


NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.
NOTE: This CWPK installment is available either as an online interactive file or as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment — which is part of the fun of Python — and to notify me should you make improvements.
