We are proceeding apace with the first release of the UMBEL (Upper-level Mapping and Binding Exchange Layer) lightweight subject concept ontology. The internal working version presently has 21,580 subject nodes, though further review will certainly change that number before public release of the first draft.
UMBEL defines “subject concepts” as a distinct subset of the more broadly understood notion of a concept, such as that used in the SKOS RDFS controlled vocabulary, in formal concept analysis, or in the very general concepts common to some upper ontologies. Subject concepts are a special kind of concept: ones that are concrete, subject-related and non-abstract. We further contrast these with named entities, which are the real things or instances in the world that are members of these subject concept classes.
Thus, in UMBEL parlance, there are abstract concepts, subject concepts and named entities.
The “backbone” of UMBEL is its set of these reference (“canonical” if you will) subject concepts. These subject concepts are being derived from the OpenCyc version of the Cyc knowledge base. The resulting roughly 22 K nodes of this subject structure are the graph’s vertices; the predicates subclassof and type that relate them are the graph’s edges. The graph pictures herein are the first glimpse of this UMBEL backbone structure.
We can take the full network graph and do a bit of simulation of diving deep into its structure, as the following figures show.
So, here is the big graph, with all nodes and edges (blue) displayed. This is just about at the limit of our graphing program, Cytoscape, which we estimate is limited to about 30 K nodes:
Through the manipulation of the topological coefficient, which is a relative measure for the extent to which a node shares neighbors with other nodes, we can zoom in on the Top 750 (actually, 759!) node gateways or hubs. There are other ways to evaluate key nodes in a network, but this one fairly nicely approximates the upper structure or hierarchy within the graph:
By tightening the coefficient further, we can get a view of the Top 350 (actually, the top 336). Were the system live and not a captured jpeg, we could zoom in and read the actual node labels.
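For readers who want a feel for what this measure does, here is a minimal Python sketch. It assumes the definition commonly given for NetworkAnalyzer: for a node n with k neighbors, average over every other node m that shares at least one neighbor with n the count of shared neighbors (plus one if n and m are directly linked), then divide by k; nodes with fewer than two neighbors get 0. The toy edge list, the function names and the cutoff values are all hypothetical. This post does not state the actual thresholds used, nor whether nodes above or below a cutoff were kept, so the filter simply takes a range.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def topological_coefficient(adj, n):
    """Topological coefficient of n under the assumed definition above."""
    neighbors = adj.get(n, set())
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined for fewer than two neighbors
    # Every other node that shares at least one neighbor with n.
    partners = set()
    for nb in neighbors:
        partners |= adj[nb]
    partners.discard(n)
    if not partners:
        return 0.0
    total = 0
    for m in partners:
        shared = len(neighbors & adj[m])
        if m in neighbors:
            shared += 1  # a direct n-m link counts once more
        total += shared
    return total / len(partners) / k


def nodes_in_range(adj, low, high):
    """Nodes whose coefficient falls in [low, high] (hypothetical filter)."""
    return sorted(
        n for n in adj if low <= topological_coefficient(adj, n) <= high
    )


# Hypothetical toy graph, not taken from UMBEL.
toy_edges = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("D", "E")]
toy_adj = build_adjacency(toy_edges)

for node in sorted(toy_adj):
    print(node, round(topological_coefficient(toy_adj, node), 3))
```

Tightening the range passed to `nodes_in_range` shrinks the selection, which is the same kind of move as going from the larger hub view to the smaller one.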
The real value from a graph structure, of course, is that now we can make selections based on relationships, neighbors and distances for various reasoning, inference or relatedness purposes. This diagram begins by inputting “saab” as my car concept, and then getting all nodes within two links:
Alternatively, for the same “saab” car concept, I asked for all directly related links (in yellow) and did some pruning of car types to make the subgraph more readable and interesting:
This ability to manipulate and navigate this large subject backbone at will should bring immense benefits. And, because of its common sense grounding, the early explorations of this first-glimpse UMBEL structure look very logical and clean.
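The two “saab” selections described above (everything within two links, and direct links with some pruning) both amount to a bounded breadth-first walk from a starting node. The following is only a sketch of that idea in plain Python, not the Cytoscape procedure itself; the node labels, the `neighborhood` function and its `exclude` parameter are hypothetical and do not come from the UMBEL graph.

```python
from collections import deque


def neighborhood(adj, start, max_hops=2, exclude=frozenset()):
    """Return {node: hop distance} for nodes within max_hops of start.

    Nodes listed in exclude are pruned: never visited, never expanded.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, ()):
            if nb in dist or nb in exclude:
                continue
            dist[nb] = dist[node] + 1
            queue.append(nb)
    return dist


# Hypothetical undirected toy graph; labels are illustrative only.
toy = {
    "saab": {"automobile", "car-brand"},
    "automobile": {"saab", "volvo", "vehicle"},
    "car-brand": {"saab", "volvo"},
    "volvo": {"automobile", "car-brand"},
    "vehicle": {"automobile", "artifact"},
    "artifact": {"vehicle"},
}

two_links = neighborhood(toy, "saab", max_hops=2)
direct_pruned = neighborhood(toy, "saab", max_hops=1, exclude={"car-brand"})
print(sorted(two_links))
print(sorted(direct_pruned))
```

Here `two_links` stops short of the node three links away, and `direct_pruned` keeps only the immediate neighbors that were not pruned.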
Once we complete the next packaging and draft release steps, anyone will be able to play with and manipulate this UMBEL structure at will. The ontology and the tools we are using to manipulate it are all open source.
Our next steps on UMBEL will have us publishing the technical report (TR) of how we screened and vetted the subject concepts from the Cyc knowledge base, using an updated OpenCyc version. That document will hopefully gain some broader review and scrutiny for the canonical listing of subject concepts.
Of course, all of that is merely leading up to Release 0 of the published ontology. We are working diligently to get that posted as well in the very near future.
These graphs were built using the super Cytoscape large-graph visualization framework, which I previously reviewed with glowing praise. The subgraph extractions were greatly aided by a fantastic add-in called NetworkAnalyzer from the Max-Planck-Institut für Informatik. I will be writing more about this add-in at a later time, including some guidance for how to use it for meaningful ontology analysis. But, in the meantime, do check this add-in tool out. Mucho cool, and another winner!