Big Structure has a Foundation in Reference Structures, But Any Structure Aids Interoperability
Big Structure is built on a foundation of reference structures, with domain structures capturing the domain at hand. Together, these are the targets to which schema are mapped and into which data in the wild is transformed to reach an operable, canonical form. Any structure, even the most lightweight of lists and metadata, can contribute to and be mapped into this model, as this wall of structure shows:
Described below are some of these structures, in rough descending order of completeness and usefulness for making data interoperable. Note that any of these structures might be available as linked data.
In both semantics and artificial intelligence, and certainly in the realm of data interoperability, there is always the problem of symbol grounding. In the conceptual realm, symbol grounding means that when we use a term or phrase, we are all referring to the same thing. In the data value realm, symbol grounding means that when we refer to an object or a number, we are all referring to the same measure.
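For the data value realm, a minimal sketch of grounding might look like the following. The unit table and the `ground_length` function are hypothetical names introduced only for illustration; the point is simply that two values become comparable once both are expressed in one canonical measure.

```python
# Hypothetical conversion factors to a canonical unit (meters); illustrative only.
TO_METERS = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def ground_length(value, unit):
    """Return the length expressed in the canonical unit (meters)."""
    try:
        return value * TO_METERS[unit]
    except KeyError:
        # An unknown unit cannot be grounded, so refuse rather than guess.
        raise ValueError(f"no grounding known for unit {unit!r}") from None


# Two native values that look different but refer to the same measure.
a = ground_length(5, "km")
b = ground_length(5000, "m")
same_measure = abs(a - b) < 1e-9
```

Refusing an unknown unit, rather than passing the raw number through, keeps ungrounded values from silently mixing with grounded ones.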
UMBEL is the standard reference ontology used by Structured Dynamics. It contains 28,000 concepts (classes and relationships) derived from the Cyc knowledge base. The reference concepts of UMBEL are mapped to Wikipedia, schema.org (used in Google’s knowledge graph), DBpedia ontology classes, GeoNames and PROTON. Similar reference structures are used to ground the actual data values and attributes.
Other reference structures may be used, so long as they are rather complete in scope and coherent in their relationships. Logical consistency is a key requirement for grounding.
Knowledge bases combine schema with data in a logical manner; well-constructed ones support computations, inference and reasoning. To date, the two primary knowledge bases that we use are Wikipedia and Cyc. However, many specific domain knowledge bases also exist.
Knowledge bases are important sources for symbol grounding. In addition, because of their computability, they may be used with artificial intelligence methods to both extend the knowledge base and to refine the feature estimates used in the AI algorithms.
Domain ontologies, constructed as graphs, are the principal working structures in data interoperability. Though best practices recommend they be grounded in the reference structures, the domain structures are the ones that specifically capture the concepts and data attributes of the target information domain. More effort should be focused on this level in the wall of structure than on any other.
Domain structures provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness. Further, these domain structures can be layered on top of existing information assets, which means they enhance rather than displace prior investments. They may also be matured incrementally, which keeps their development cost-effective.
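As a rough illustration of why connectedness helps, the sketch below treats a domain ontology as a set of subclass edges, some of which ground domain concepts in reference concepts. All concept names (`ex:...`, `ref:...`) and the `ancestors` helper are hypothetical; a real ontology would carry far richer relationships.

```python
# Each pair is (narrower concept, broader concept); all names are hypothetical.
SUBCLASS_OF = {
    ("ex:ElectricCar", "ex:Car"),
    ("ex:Car", "ref:Vehicle"),  # domain concept grounded in a reference concept
    ("ref:Vehicle", "ref:Artifact"),
}


def ancestors(concept, edges=SUBCLASS_OF):
    """Collect every broader concept reachable from `concept`."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in edges:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


# A record typed only as ex:ElectricCar is also discoverable as a ref:Vehicle.
broader = ancestors("ex:ElectricCar")
```

Maturing the structure incrementally then amounts to adding edges to the set; existing records and queries are left untouched.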
Data and schema in the wild need to be mapped and transformed into these canonical structures. What is known as data wrangling is an aspect of these mappings and transformations. Mappings thus become the glue that ties native data to interoperable forms.
Mapping is the critical bridging function in data interoperability. It requires tools and background intelligence to suggest possible correspondences; how well this is done is a key to making the semi-automatic mapping process as efficient as possible. Mapping structures are the result of the final correspondences. Mapping effort is a function of the scope of Big Structure, not the volume of Big Data.
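One simple way to suggest candidate correspondences is fuzzy string matching between native field names and reference labels, sketched here with `difflib.get_close_matches` from the Python standard library. The labels, the `suggest_mappings` function, and the 0.6 cutoff are assumptions chosen for illustration; production mappers typically draw on much more background knowledge than label similarity.

```python
from difflib import get_close_matches

# Hypothetical reference labels; a real reference structure has thousands.
REFERENCE_LABELS = ["Person", "Organization", "Postal Address", "Product"]


def suggest_mappings(native_fields, reference_labels=REFERENCE_LABELS, cutoff=0.6):
    """Suggest reference labels for each native field, best match first."""
    lowered = {label.lower(): label for label in reference_labels}
    suggestions = {}
    for field in native_fields:
        # Normalize common schema naming habits before comparing.
        normalized = field.replace("_", " ").lower()
        matches = get_close_matches(normalized, list(lowered), n=3, cutoff=cutoff)
        suggestions[field] = [lowered[m] for m in matches]
    return suggestions


candidates = suggest_mappings(["person", "postal_address", "zzzz"])
```

The function only ever sees field names, never rows, which reflects the point that mapping effort tracks the scope of the structure rather than the volume of data. An empty suggestion list marks a field that needs a human decision.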
A broad variety of structures occur in the wild — from database schema and taxonomies to dictionaries and lists — that need to be represented in a common form and then mapped in order to support interoperability. The common representation used by Structured Dynamics is the RDF data model.
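To make the idea of a common representation concrete, here is a minimal sketch that reduces two wild structures, a nested taxonomy and a flat list, to subject-predicate-object triples in the spirit of the RDF data model. Plain tuples and prefixed strings stand in for a real RDF library, and the `ex:` names and both helper functions are hypothetical; `rdfs:subClassOf` and `rdf:type` are the standard RDF Schema and RDF terms.

```python
def taxonomy_to_triples(taxonomy, parent=None):
    """Flatten a nested dict taxonomy into (subject, predicate, object) triples."""
    triples = []
    for term, children in taxonomy.items():
        if parent is not None:
            triples.append((term, "rdfs:subClassOf", parent))
        triples.extend(taxonomy_to_triples(children, parent=term))
    return triples


def list_to_triples(items, list_class):
    """Represent a flat list as instances of a single class."""
    return [(item, "rdf:type", list_class) for item in items]


# Hypothetical inputs: a small taxonomy and a lightweight list.
taxonomy = {"ex:Vehicle": {"ex:Car": {"ex:ElectricCar": {}}, "ex:Bicycle": {}}}
colors = ["ex:red", "ex:blue"]

triples = taxonomy_to_triples(taxonomy) + list_to_triples(colors, "ex:Color")
```

Once both sources are expressed as triples, they can be merged into one collection and mapped against the same reference structures.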
Scripting and tooling are essential to help create Big Structure efficiently.