Posted: July 9, 2012
Abrogans; earliest glossary (from Wikipedia)

There are many semantic technology terms relevant to the context of a semantic technology installation [1]. Some of these are general terms related to language standards, as well as to ontologies or the dataset concept.

ABox
An ABox (for assertions, the basis for A in ABox) is an “assertion component”; that is, a fact associated with a terminological vocabulary within a knowledge base. ABox statements are TBox-compliant statements about instances belonging to the concepts of an ontology.
Adaptive ontology
An adaptive ontology is a conventional knowledge representational ontology that has added to it a number of specific best practices, including modeling the ABox and TBox constructs separately; information that relates specific types to different and appropriate display templates or visualization components; use of preferred labels for user interfaces, as well as alternative labels and hidden labels; defined concepts; and a design that adheres to the open world assumption.
Administrative ontology
Administrative ontologies govern internal application use and user interface interactions.
Annotation
An annotation, specifically as an annotation property, is a way to provide metadata or to describe vocabularies and properties used within an ontology. Annotations do not participate in reasoning or coherency testing for ontologies.
Atom
The name Atom applies to a pair of related standards. The Atom Syndication Format is an XML language used for web feeds, while the Atom Publishing Protocol (APP for short) is a simple HTTP-based protocol for creating and updating Web resources.
Attributes
These are the aspects, properties, features, characteristics, or parameters that objects (and classes) may have. They are the descriptive characteristics of a thing. Key-value pairs match an attribute with a value; the value may be a reference to another object, an actual value or a descriptive label or string. In an RDF statement, an attribute is expressed as a property (or predicate or relation). In intensional logic, all attributes or characteristics of similarly classifiable items define the membership in that set.
Axiom
An axiom is a premise or starting point of reasoning. In an ontology, each statement (assertion) is an axiom.
Binding
Binding is the creation of a simple reference to something that is larger and more complicated and used frequently. The simple reference can be used instead of having to repeat the larger thing.
Class
A class is a collection of sets or instances (or sometimes other mathematical objects) which can be unambiguously defined by a property that all of its members share. In ontologies, classes may also be known as sets, collections, concepts, types of objects, or kinds of things.
Closed World Assumption
CWA is the presumption that what is not currently known to be true is false. CWA also has a logical formalization. CWA is the most common logic applied to relational database systems, and is particularly useful for transaction-type systems. In knowledge management, the closed world assumption is used in at least two situations: 1) when the knowledge base is known to be complete (e.g., a corporate database containing records for every employee), and 2) when the knowledge base is known to be incomplete but a “best” definite answer must be derived from incomplete information. Contrast with the open world assumption.
Data Space
A data space may be personal, collective or topical, and is a virtual “container” for related information irrespective of storage location, schema or structure.
Dataset
An aggregation of similar kinds of things or items, mostly comprised of instance records.
DBpedia
A project that extracts structured content from Wikipedia, and then makes that data available as linked data. There are millions of entities characterized by DBpedia in this way. As such, DBpedia is one of the largest — and most central — hubs for linked data on the Web.
DOAP
DOAP (Description Of A Project) is an RDF schema and XML vocabulary to describe open-source projects.
Description logics
Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.
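As a rough illustration of this split, the sketch below keeps schema statements and instance assertions in two separate sets of plain Python tuples. The class names, property names and the simple compliance check are hypothetical and chosen only for this example; a real system would use an RDF/OWL toolkit rather than bare tuples.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.

# TBox: the schema, i.e. concepts and the relationships between them.
tbox = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("hasOwner", "domain", "Animal"),
}

# ABox: assertions about individual instances, using the TBox vocabulary.
abox = {
    ("rex", "type", "Dog"),
    ("rex", "hasOwner", "alice"),
    ("rex", "name", "Rex"),
}

# Formally, a knowledge base is the combination of the two.
knowledge_base = tbox | abox


def classes_used(abox_triples):
    """Return the classes that instances are asserted to belong to."""
    return {o for (_, p, o) in abox_triples if p == "type"}


def classes_defined(tbox_triples):
    """Return every class mentioned in a subClassOf statement."""
    defined = set()
    for s, p, o in tbox_triples:
        if p == "subClassOf":
            defined.update((s, o))
    return defined


# In this toy check, the ABox is "TBox-compliant" if every class it uses
# is defined somewhere in the TBox.
is_compliant = classes_used(abox) <= classes_defined(tbox)
```

Keeping the two sets apart means instance records can be loaded, replaced or validated without touching the schema.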
Domain ontology
Domain (or content) ontologies embody more of the traditional ontology functions such as information interoperability, inferencing, reasoning and conceptual and knowledge capture of the applicable domain.
Entity
An individual object or member of a class; when affixed with a proper name or label is also known as a named entity (thus, named entities are a subset of all entities).
Entity–attribute–value model
EAV is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In the EAV data model, each attribute-value pair is a fact describing an entity. EAV systems trade off simplicity in the physical and logical structure of the data for complexity in their metadata, which, among other things, plays the role that database constraints and referential integrity do in standard database designs.
Extensional
The extension of a class, concept, idea, or sign consists of the things to which it applies, in contrast with its intension. For example, the extension of the word “dog” is the set of all (past, present and future) dogs in the world. The extension is most akin to the attributes or characteristics of the instances in a set defining its class membership.
FOAF
FOAF (Friend of a Friend) is an RDF schema for machine-readable modeling of homepage-like profiles and social networks.
Folksonomy
A folksonomy is a user-generated set of open-ended labels called tags organized in some manner and used to categorize and retrieve Web content such as Web pages, photographs, and Web links.
GeoNames
GeoNames integrates geographical data such as names of places in various languages, elevation, population and others from various sources.
GRDDL
GRDDL is a markup format for Gleaning Resource Descriptions from Dialects of Languages; that is, for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT.
High-level Subject
A high-level subject is both a subject proxy and category label used in a hierarchical subject classification scheme (taxonomy). Higher-level subjects are classes for more atomic subjects, with the height of the level representing broader or more aggregate classes.
Individual
See Instance.
Inferencing
Inference is the act or process of deriving logical conclusions from premises known or assumed to be true. The logic within and between statements in an ontology is the basis for inferring new conclusions from it, using software applications known as inference engines or reasoners.
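A minimal sketch of what a reasoner does, assuming only two hand-written rules (transitivity of a hypothetical subClassOf predicate, and inheritance of class membership through a hypothetical type predicate). Production reasoners support far richer rule sets; this only shows the forward-chaining idea.

```python
def infer_closure(triples):
    """Forward-chain two rules until no new triples appear.

    Rule 1: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type B)       and (B subClassOf C)  =>  (x type C)
    """
    inferred = set(triples)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            if p1 not in ("type", "subClassOf"):
                continue
            for (c, p2, d) in inferred:
                if p2 == "subClassOf" and c == b:
                    new.add((a, p1, d))
        if new <= inferred:
            # Fixed point reached: nothing new can be derived.
            return inferred
        inferred |= new


asserted = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}

closure = infer_closure(asserted)
new_facts = closure - asserted  # conclusions that were never stated explicitly
```

Here the premises never say that rex is an Animal; that conclusion follows from the logic between the statements.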
Instance
Instances are the basic, “ground level” components of an ontology. An instance is an individual member of a class, also used synonymously with entity. The instances in an ontology may include concrete objects such as people, animals, tables, automobiles, molecules, and planets, as well as abstract instances such as numbers and words. An instance is also known as an individual, with member and entity also used somewhat interchangeably.
Instance record
An instance with one or more attributes also provided.
irON
irON (instance record and Object Notation) is an abstract notation and associated vocabulary for specifying RDF (Resource Description Framework) triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF.
Intensional
The intension of a class is what is intended as a definition of what characteristics its members should have; it is akin to a definition of a concept and what is intended for a class to contain. It is therefore like the schema aspects (or TBox) in an ontology.
Key-value pair
Also known as a name–value pair or attribute–value pair, a key-value pair is a fundamental, open-ended data representation. All or part of the data model may be expressed as a collection of tuples <attribute name, value> where each element is a key-value pair. The key is the defined attribute and the value may be a reference to another object or a literal string or value. In RDF triple terms, the subject is implied in a key-value pair by nature of the instance record at hand.
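The implied subject can be made explicit with a small sketch. The record below, its keys and the example.org identifiers are hypothetical; the point is only that each key-value pair becomes one triple once the record's identifier is supplied as the subject.

```python
# A hypothetical instance record expressed as key-value pairs.
record = {
    "id": "http://example.org/id/rex",
    "type": "Dog",
    "name": "Rex",                           # value is a literal string
    "owner": "http://example.org/id/alice",  # value is a reference to another object
}


def record_to_triples(rec, id_key="id"):
    """Expand a key-value record into (subject, predicate, object) triples.

    The subject is implied by the record itself, so it is taken from the
    identifier field and repeated for every remaining key-value pair.
    """
    subject = rec[id_key]
    return [(subject, key, value) for key, value in rec.items() if key != id_key]


triples = record_to_triples(record)
```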
Kind
Used synonymously herein with class.
Knowledge base
A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized. Formally, the combination of a TBox and ABox is a knowledge base.
Linkage
A specification that relates an object or attribute name to its full URI (as required in the RDF language).
Linked data
Linked data is a set of best practices for publishing and deploying instance and class data using the RDF data model, and uses uniform resource identifiers (URIs) to name the data objects. The approach exposes the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.
Mapping
A considered correlation of objects in two different sources to one another, with the relation between the objects defined via a specific property. Linkage is a subset of possible mappings.
Member
Used synonymously herein with instance.
Metadata
Metadata (metacontent) is supplementary data that provides information about one or more aspects of the content at hand such as means of creation, purpose, when created or modified, author or provenance, where located, topic or subject matter, standards used, or other annotation characteristics. It is “data about data”, or the means by which data objects or aggregations can be described. Contrasted to an attribute, which is an individual characteristic intrinsic to a data object or instance, metadata is a description about that data, such as how or when created or by whom.
Metamodeling
Metamodeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems.
Microdata
Microdata is a proposed specification used to nest semantics within existing content on web pages. Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of using RDFa or microformats.
Microformats
A microformat (sometimes abbreviated μF or uF) is a piece of mark up that allows expression of semantics in an HTML (or XHTML) web page. Programs can extract meaning from a web page that is marked up with one or more microformats.
Natural language processing
NLP is the process of a computer extracting meaningful information from natural language input and/or producing natural language output. NLP is one method for assigning structured data characterizations to text content for use in semantic technologies. (Hand assignment is another method.) Some of the specific NLP techniques and applications relevant to semantic technologies include automatic summarization, coreference resolution, machine translation, named entity recognition (NER), question answering, relationship extraction, topic segmentation and recognition, word segmentation, and word sense disambiguation, among others.
OBIE
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. Ontology-based information extraction (OBIE) is the use of an ontology to inform a “tagger” or information extraction program when doing natural language processing. Input ontologies thus become the basis for generating metadata tags when tagging text or documents.
Ontology
An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. Loosely defined, ontologies on the Web can have a broad range of formalism, or expressiveness or reasoning power.
Ontology-driven application
Ontology-driven applications (or ODapps) are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies), as supplemented by UI and instruction sets and validations and rules.
Open Semantic Framework
The open semantic framework, or OSF, is a combination of a layered architecture and an open-source, modular software stack. The stack combines many leading third-party software packages with open source semantic technology developments from Structured Dynamics.
Open World Assumption
OWA is a formal logic assumption that the truth-value of a statement is independent of whether or not it is known by any single observer or agent to be true. OWA is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption. The OWA limits the kinds of inference and deductions an agent can make to those that follow from statements that are known to the agent to be true. OWA is useful when we represent knowledge within a system as we discover it, and where we cannot guarantee that we have discovered or will discover complete information. In the OWA, statements about knowledge that are not included in or inferred from the knowledge explicitly recorded in the system may be considered unknown, rather than wrong or false. Semantic Web languages such as OWL make the open world assumption. Contrast with the closed world assumption.
OPML
OPML (Outline Processor Markup Language) is an XML format for outlines, and is commonly used to exchange lists of web feeds between web feed aggregators.
OWL
The Web Ontology Language (OWL) is designed for defining and instantiating formal Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. There are also a variety of OWL dialects.
Predicate
See Property.
Property
Properties are the ways in which classes and instances can be related to one another. Properties are thus a relationship, and are also known as predicates. Properties are used to define an attribute relation for an instance.
Punning
In computer science, punning refers to a programming technique that subverts or circumvents the type system of a programming language, by allowing a value of a certain type to be manipulated as a value of a different type. When used for ontologies, it means to treat a thing as both a class and an instance, with the use depending on context.
RDF
Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats. The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object.
RDFa
RDFa 1.0 is a set of extensions to XHTML that is a W3C Recommendation. RDFa uses attributes from meta and link elements, and generalizes them so that they are usable on all elements allowing annotation markup with semantics. A W3C Working draft is presently underway that expands RDFa into version 1.1 with HTML5 and SVG support, among other changes.
RDF Schema
RDFS or RDF Schema is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources.
Reasoner
A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms.
Reasoning
Reasoning is one of many logical tests using inference rules as commonly specified by means of an ontology language, and often a description language. Many reasoners use first-order predicate logic to perform reasoning; inference commonly proceeds by forward chaining or backward chaining.
Record
As used herein, a shorthand reference to an instance record.
Relation
Used synonymously herein with attribute.
RSS
RSS (an acronym for Really Simple Syndication) is a family of web feed formats used to publish frequently updated digital content, such as blogs, news feeds or podcasts.
schema.org
Schema.org is an initiative launched by the major search engines of Bing, Google and Yahoo!, and later joined by Yandex, in order to create and support a common set of schemas for structured data markup on web pages. schema.org provided a starter set of schema and extension mechanisms for adding to them. schema.org supports markup in microdata, microformat and RDFa formats.
Semantic enterprise
An organization that uses semantic technologies and the languages and standards of the semantic Web, including RDF, RDFS, OWL, SPARQL and others to integrate existing information assets, using the best practices of linked data and the open world assumption, and targeting knowledge management applications.
Semantic technology
Semantic technologies are a combination of software and semantic specifications that encodes meanings separately from data and content files and separately from application code. This approach enables machines as well as people to understand, share and reason with data and specifications separately. With semantic technologies, adding, changing and implementing new relationships or interconnecting programs in a different way can be as simple as changing the external model that these programs share. New data can also be brought into the system and visualized or worked upon based on the existing schema. Semantic technologies provide an abstraction layer above existing IT technologies that enables bridging and interconnection of data, content, and processes.
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a “web of data”. It builds on the W3C’s Resource Description Framework (RDF).
Semset
A semset is the use of a series of alternate labels and terms to describe a concept or entity. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept.
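One way to picture a semset is as a lookup from many surface labels to a single concept. The concept URI, the labels and the field names pref_label and alt_labels below are hypothetical; this is a sketch of the idea, not any particular vocabulary's structure.

```python
# A hypothetical semset: one preferred label plus looser alternatives
# (synonyms, jargon, slang) that usage suggests refer to the same concept.
semsets = {
    "http://example.org/concept/Automobile": {
        "pref_label": "automobile",
        "alt_labels": ["car", "auto", "motorcar", "wheels"],
    },
}


def build_label_index(semsets):
    """Map every lower-cased label to the set of concepts it may refer to."""
    index = {}
    for concept, labels in semsets.items():
        for label in [labels["pref_label"], *labels["alt_labels"]]:
            index.setdefault(label.lower(), set()).add(concept)
    return index


index = build_label_index(semsets)
matches = index.get("Car".lower(), set())  # case-insensitive lookup
```

The index maps each label to a set of concepts because a slang term may legitimately point at more than one concept; disambiguation is a separate step.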
SIOC
Semantically-Interlinked Online Communities Project (SIOC) is based on RDF and is an ontology defined using RDFS for interconnecting discussion methods such as blogs, forums and mailing lists to each other.
SKOS
SKOS or Simple Knowledge Organisation System is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary; it is built upon RDF and RDFS.
SKSI
Semantic Knowledge Source Integration provides a declarative mapping language and API between external sources of structured knowledge and the Cyc knowledge base.
SPARQL
SPARQL (pronounced “sparkle”) is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language.
Statement
A statement is a “triple” in an ontology, which consists of a subject – predicate – object (S-P-O) assertion. By definition, each statement is a “fact” or axiom within an ontology.
Subject
A subject is always a noun or compound noun and is a reference or definition to a particular object, thing or topic, or groups of such items. Subjects are also often referred to as concepts or topics.
Subject extraction
Subject extraction is an automatic process for retrieving and selecting subject names from existing knowledge bases or data sets. Extraction methods involve parsing and tokenization, and then generally the application of one or more information extraction techniques or algorithms.
Subject proxy
A subject proxy is a canonical name or label for a particular object; other terms or controlled vocabularies may be mapped to this label to assist disambiguation. A subject proxy is always representative of its object but is not the object itself.
Tag
A tag is a keyword or term associated with or assigned to a piece of information (e.g., a picture, article, or video clip), thus describing the item and enabling keyword-based classification of information. Tags are usually chosen informally by either the creator or consumer of the item.
TBox
A TBox (for terminological knowledge, the basis for T in TBox) is a “terminological component”; that is, a conceptualization associated with a set of facts. TBox statements describe a conceptualization, a set of concepts and properties for these concepts. The TBox is sufficient to describe an ontology (best practice often suggests keeping the instance records, or ABox, separate from the TBox schema).
Taxonomy
In the context of knowledge systems, taxonomy is the hierarchical classification of entities of interest of an enterprise, organization or administration, used to classify documents, digital assets and other information. Taxonomies can cover virtually any type of physical or conceptual entities (products, processes, knowledge fields, human groups, etc.) at any level of granularity.
Topic
The topic (or theme) is the part of the proposition that is being talked about (predicated). In topic maps, the topic may represent any concept, from people, countries, and organizations to software modules, individual files, and events. Topics and subjects are closely related.
Topic Map
Topic maps are an ISO standard for the representation and interchange of knowledge. A topic map represents information using topics, associations (similar to a predicate relationship), and occurrences (which represent relationships between topics and information resources relevant to them), quite similar in concept to the RDF triple.
Triple
A basic statement in the RDF language, which is comprised of a subject – property – object construct, with the subject and property (and object optionally) referenced by URIs.
Type
Used synonymously herein with class.
UMBEL
UMBEL, short for Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts, designed to provide common mapping points for relating different ontologies or schema to one another, and a vocabulary for aiding that ontology mapping, including expressions of likelihood relationships distinct from exact identity or equivalence. This vocabulary is also designed for interoperable domain ontologies.
Upper ontology
An upper ontology (also known as a top-level ontology or foundation ontology) is an ontology that describes very general concepts that are the same across all knowledge domains. An important function of an upper ontology is to support very broad semantic interoperability between a large number of ontologies that rank “under” this upper ontology.
Vocabulary
A vocabulary, in the sense of knowledge systems or ontologies, is a controlled vocabulary. Vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other forms of knowledge organization systems.
WordNet
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools can be downloaded and used freely. Multiple language versions exist, and WordNet is a frequent reference structure for semantic applications.
YAGO
“Yet another great ontology” is a WordNet structure placed on top of Wikipedia.

[1] This glossary is based on the one provided on the OSF TechWiki. For the latest version, please refer to this link.
Posted: July 2, 2012

Example Ontology (from Wikipedia)

Conventional IT Systems are Poorly Suited to Knowledge Applications

Frequently customers ask me why semantic technologies should be used instead of conventional information technologies. In the areas of knowledge representation (KR) and knowledge management (KM), there are compelling reasons and benefits for selecting semantic technologies over conventional approaches. This article attempts to summarize these rationales from a layperson perspective.

It is important to recognize that semantic technologies are orthogonal to the buzz around some other current technologies, including cloud computing and big data. Semantic technologies are also not limited to open data: they are equivalently useful to private or proprietary data. It is also important to note that semantic technologies do not imply some grand, shared schema for organizing all information. Semantic technologies are not “one ring to rule them all,” but rather a way to capture the world views of particular domains and groups of stakeholders. Lastly, semantic technologies done properly are not a replacement for existing information technologies, but rather an added layer that can leverage those assets for interoperability and to overcome the semantic barriers between existing information silos.

Nature of the World

The world is a messy place. Not only is it complicated and richly diverse, but our ways of describing and understanding it are made more complex by differences in language and culture.

We also know the world to be interconnected and interdependent. Effects of one change can propagate into subtle and unforeseen effects. And, not only is the world constantly changing, but so is our understanding of what exists in the world and how it affects and is affected by everything else.

This means we are always uncertain to a degree about how the world works and the dynamics of its working. Through education and research we continually strive to learn more about the world, but often in that process find what we thought was true is no longer so and even our own human existence is modifying our world in manifest ways.

Knowledge is very similar to this nature of the world. We find that knowledge is never complete and it can be found anywhere and everywhere. We capture and codify knowledge in structured, semi-structured and unstructured forms, ranging from “soft” to “hard” information. We find that the structure of knowledge evolves with the incorporation of more information.

We often see that knowledge is not absolute, but contextual. That does not mean that there is no such thing as truth, but that knowledge should be coherent, to reflect a logical consistency and structure that comports with our observations about the physical world. Knowledge, like the world, is constantly changing; we thus must constantly adapt to what we observe and learn.

Knowledge Representation, Not Transactions

These observations about the world and knowledge are not platitudes but important guideposts for how we should organize and manage information, the field known as “information technology.” For IT to truly serve the knowledge function, its logical bases should be consistent with the inherent nature of the world and knowledge.

By knowledge functions we mean those areas of various computer applications that come under the rubrics of search, business intelligence, competitive intelligence, planning, forecasting, data federation, data warehousing, knowledge management, enterprise information integration, master data management, knowledge representation, and so forth. These applications are distinctly different from the earliest and traditional concerns of IT systems: accounting and transactions.

A transaction system — such as calculating revenue based on seats on a plane, the plane’s occupancy, and various rate classes — is a closed system. We can count the seats, we know the number of customers on board, and we know their rate classes and payments. Much can be done with this information, including yield and profitability analysis and other conventional ways of accounting for costs or revenues or optimizations.

But, as noted, neither the world nor knowledge is a closed system. Trying to apply legacy IT approaches to knowledge problems is fraught with difficulties. That is the reason that for more than four decades enterprises have seen massive cost overruns and failed projects in applying conventional IT approaches to knowledge problems: traditional IT is fundamentally mismatched to the nature of the problems at hand.

What works efficiently for transactions and accounting is a miserable failure when applied to knowledge problems. Traditional relational databases work best with structured data; are inflexible and fragile when the nature (schema) of the world changes; and thus require constant (and expensive) re-architecting in the face of new knowledge or new relationships.

Of course, often knowledge problems do consider fixed entities with fixed attributes to describe them. In these cases, relational data systems can continue to act as valuable contributors and data managers of entities and their attributes. But, in the role of organizing across schema or dealing with semantics and differences of definition and scope – that is, the common types of knowledge questions – a much different integration layer with a much different logic basis is demanded.

The New Open World Paradigm

The first change that is demanded is to shift the logic paradigm of how knowledge and the world are modeled. In contrast to the closed-world approach of transaction systems, IT systems based on the logical premise of the open world assumption (OWA) mean:

  • Lack of a given assertion does not imply whether it is true or false; it simply is not known
  • A lack of knowledge does not imply falsity
  • Everything is permitted until it is prohibited
  • Schema can be incremental without re-architecting prior schema (“extensible”), and
  • Information at various levels of incompleteness can be combined.

Much more can be said about OWA, including formal definitions of the logics underlying it [1], but even from the statements above, we can see that the right logic for most knowledge representation (KR) problems is the open world approach.
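The difference between the two assumptions can be sketched with a toy query function. The fact sets, names and the three-valued Truth type below are hypothetical and far simpler than the formal logics; they only show how the same missing assertion is answered under each assumption.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts: what is explicitly asserted, and what is explicitly denied.
known_true = {("alice", "worksFor", "acme")}
known_false = {("alice", "worksFor", "globex")}


def ask_closed_world(statement):
    """Closed world: anything not known to be true is taken to be false."""
    return Truth.TRUE if statement in known_true else Truth.FALSE


def ask_open_world(statement):
    """Open world: a missing assertion is simply not known."""
    if statement in known_true:
        return Truth.TRUE
    if statement in known_false:
        return Truth.FALSE
    return Truth.UNKNOWN


query = ("bob", "worksFor", "acme")  # nothing has been recorded about bob
closed_answer = ask_closed_world(query)
open_answer = ask_open_world(query)
```

Under the closed world assumption the query is answered "false"; under the open world assumption it stays "unknown" until further information arrives, which is what allows partial information from different sources to be combined without contradiction.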

This logic mismatch is perhaps the most fundamental cause of failures, cost overruns, and disappointing deliverables for KM and KR projects over the years. But, like the fingertip between the eyes that cannot be seen because it is too close at hand, the importance of this logic mismatch strangely continues to be overlooked.

Integrating All Forms of Information

Data exists in many forms and of many natures. As one classification scheme, there are:

  • Structured data — information presented according to a defined data model, often found in relational databases or other forms of tabular data
  • Semi-structured data — does not conform to the formal structure of data models, but contains tags or other markers to denote fields within the content. Markup languages embedded in text are a common form of such sources
  • Unstructured data — information content, generally oriented to text, that lacks an explicit data model or schema; structured information can be obtained from it via data mining or information extraction.

Further, these types of data may be “soft”, such as social information or opinion, or “hard”, more akin to measurable facts or quantities.

These various forms may also be serialized in a variety of data formats or data transfer protocols, ranging from plain text with a myriad of syntaxes or markup vocabularies to scripts or forms that are encoded or binary.

Still further, any of these data forms may be organized according to a separate schema that describes the semantics and relationships within the data.

These variations further complicate the inherently diverse nature of the world and knowledge of it. A suitable data model for knowledge representation must therefore have the power to be able to capture the form, format, serialization or schema of any existing data within the diversity of these options.

The Resource Description Framework (RDF) data model has such capabilities [2]. Any extant data form or schema (from the simple to the complex) can be converted to the RDF data model. This capability enables RDF to act as a “universal solvent” for all information.

Once converted to this “canonical” form, RDF can then act as a single representation around which to design applications and other converters (for “round-tripping” to legacy systems, for example), as illustrated by this diagram:

Generic tools can then be driven by the RDF data model, which leads to fewer applications required and lower overall development costs.

Lastly, RDF can represent simple assertions (“Jane runs fast”) to complex vocabularies and languages. It is in this latter role that RDF can begin to represent the complexity of an entire domain via what is called an “ontology” or “knowledge graph.”
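As a rough illustration of the conversion idea, the sketch below turns one row of a table into subject-predicate-object statements and adds the simple assertion mentioned above. It uses plain strings where real RDF would use URIs, and the table, column names and the row_to_triples helper are hypothetical.

```python
def row_to_triples(table, key, row):
    """Turn one table row into (subject, predicate, object) statements.

    The subject is built from the table name and the row's key value;
    every other non-empty column becomes one statement.
    """
    subject = f"{table}/{row[key]}"
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != key and value is not None
    ]


# A hypothetical row from a relational table named "person".
row = {"id": 7, "name": "Jane", "city": "Iowa City", "team": None}
triples = row_to_triples("person", "id", row)

# A simple assertion sits in the same structure alongside the converted data.
triples.append(("person/7", "runs", "fast"))

for triple in triples:
    print(triple)
```

Because every source ends up in the same three-part shape, a generic tool only needs to understand that one shape.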

Example Ontology Growth

Connections Create Graphs

When representing knowledge, more things and concepts get drawn into consideration. In turn, the relationships of these things lead to connections between them to capture the inherent interdependence and linkages of the world. As still more things get considered, more connections are made and proliferate.

This process naturally leads to a graph structure, with the things in the graphs represented as nodes and the relationships between them represented as connecting edges. More things and more connections lead to more structure. Insofar as this structure and its connections are coherent, the natural structure of the knowledge graph itself can help lead to more knowledge and understanding.

How one such graph may emerge is shown by this portion of the recently announced Google Knowledge Graph [3], showing female Nobel prize winners:

Unlike traditional data tables, graphs have a number of inherent benefits, particularly for knowledge representations. They provide:

  • A coherent way to navigate the knowledge space
  • Flexible entry points for each user to access that knowledge (since every node is a potential starting point)
  • Inferencing and reasoning structures about the space
  • Connections to related information
  • Ability to connect to any form of information
  • Concept mapping, and thus the ability to integrate external content
  • A framework to disambiguate concepts based on relations and context, and
  • A common vocabulary to drive content “tagging”.

Graphs are the natural structures for knowledge domains.

Network Analysis is the New Algebra

Once built, graphs offer some analytical capabilities not available through traditional means of information structure. Graph analysis is a rapidly emerging field, but already some unique measures of knowledge domains are now possible to gauge:

  • Influence
  • Relatedness
  • Proximity
  • Centrality
  • Inference
  • Clustering
  • Shortest paths
  • Diffusion.

As science is coming to appreciate, graphs can represent any extant structure or schema. This gives graphs a universal character in terms of analytic tools. Further, many structures can only be represented by graphs.
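Two of the measures listed above, centrality and shortest paths, can be sketched with nothing more than the Python standard library. The small example graph and the function names are assumptions for illustration; a production system would more likely rely on a dedicated graph library.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on one shortest path."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no connection between the two nodes


# A tiny illustrative graph of laureates and prizes.
edges = [
    ("Marie Curie", "Nobel Prize in Physics"),
    ("Marie Curie", "Nobel Prize in Chemistry"),
    ("Marie Curie", "Irène Joliot-Curie"),
    ("Irène Joliot-Curie", "Nobel Prize in Chemistry"),
    ("Dorothy Hodgkin", "Nobel Prize in Chemistry"),
]
graph = build_graph(edges)
centrality = degree_centrality(graph)
path = shortest_path(graph, "Dorothy Hodgkin", "Nobel Prize in Physics")
print(centrality["Marie Curie"])
print(path)
```

Neither function knows anything about prizes or people; the same code applies to any domain once it is expressed as a graph.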

Information and Interaction is Distributed

The nature of knowledge is such that relevant information is everywhere. Further, because of the interconnectedness of things, we can also appreciate that external information needs to be integrated with internal information. Meanwhile, the nature of the world is such that users and stakeholders may be anywhere.

These observations suggest a knowledge representation architecture that needs to be truly distributed. Both sources and users may be found in multiple locations.

In order to preserve existing information assets as much as possible (see further below) and to codify the earlier observation regarding the broad diversity of data formats, the resulting knowledge architecture should also attempt to put in place a thin layer or protocol that provides uniform access to any source or target node on the physical network. A thin, uniform abstraction layer – with appropriate access rights and security considerations – means knowledge networks may grow and expand at will at acceptable costs with minimal central coordination or overhead.

Properly designed, then, such architectures are not only necessary to represent the distributed nature of users and knowledge, but can also facilitate and contribute to knowledge development and exchange.

The Web is the Perfect Medium

The items above suggest the Web as an appropriate protocol for distributed access and information exchange. When combined with the following considerations, it becomes clear that the Web is the perfect medium for knowledge networks:

  • Potentially, all information may be accessed via the Web
  • All information may be given unique Web identifiers (URIs)
  • All Web tools are available for use and integration
  • All Web information may be integrated
  • Web-oriented architectures (WOA) have proven:
      • Scalability
      • Robustness
      • Substitutability
  • Most Web technologies are open source.

It is not surprising that the largest extant knowledge networks on the globe – such as Google, Wikipedia, Amazon and Facebook – are Web-based. These pioneers have demonstrated the wisdom of WOA for cost-effective scalability and universal access.

The combination of RDF with Web identifiers also means that any and all information from a given knowledge repository may be exposed and made available to others as linked data. This approach makes the Web a global, universal database. And it is in keeping with the general benefits of integrating external information sources.

Leveraging – Not Replacing – Existing IT Assets

Existing IT assets represent massive sunk costs, legacy knowledge and expertise, and (often) stakeholder consensus. Yet, these systems are still largely stovepiped.

Strategies that counsel replacement of existing IT systems risk wasting existing assets and are therefore unlikely to be adopted. Ways must be found to leverage the value already embodied in these systems, while promoting interoperability and integration.

The beauty of semantic technologies – properly designed and deployed in a Web-oriented architecture – is that a thin interoperability layer may be placed over existing IT assets to achieve these aims. The knowledge graph structure may be used to provide the semantic mappings between schema, while the Web service framework that is part of the WOA provides the source conversion to the canonical RDF data model.
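One way to picture such a thin layer is as a set of mappings that leave the legacy systems untouched and only translate their field names on the way out. The sketch below assumes two hypothetical systems ("crm" and "billing") with invented field names; it is an illustration of the idea, not a description of any specific framework.

```python
# Hypothetical mappings from each legacy schema to shared predicates.
MAPPINGS = {
    "crm": {"cust_name": "name", "cust_email": "email"},
    "billing": {"client": "name", "contact_mail": "email"},
}


def to_canonical(source, record_id, record):
    """Convert one legacy record to statements that use the shared predicates.

    Fields without a mapping are skipped rather than guessed at.
    """
    mapping = MAPPINGS[source]
    subject = f"{source}/{record_id}"
    return [
        (subject, mapping[field], value)
        for field, value in record.items()
        if field in mapping
    ]


crm_statements = to_canonical(
    "crm", 12, {"cust_name": "Jane Doe", "cust_email": "jane@example.org"}
)
billing_statements = to_canonical(
    "billing", "A-7", {"client": "Jane Doe", "balance": 40}
)
print(crm_statements + billing_statements)
```

Both systems keep their own schema; only the mapping table needs to change when a new source is added.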

Via these approaches, prior investments in knowledge, information and IT assets may be preserved while enabling interoperability. The existing systems can continue to provide the functionality for which they were originally designed and deployed. Meanwhile, the KR-related aspects may be exposed and integrated with other knowledge assets on the physical network.

Democratizing the Knowledge Function

These kinds of approaches represent a fundamental shift in power and roles with respect to IT in the enterprise. IT departments and their bottlenecks in writing queries and bespoke application development can now be bypassed; the departments may be relegated to more appropriate support roles. Developers and consultants can now devote more of their time to developing generic applications driven by graph structures [4].

In turn, the consumers of knowledge applications – namely subject matter experts, employees, partners and stakeholders – now become the active contributors to the graphs themselves, focusing on reconciling terminology and ensuring adequate entity and concept coverage. Knowledge graphs are relatively straightforward structures to build and maintain. Those who rely on them can also be those who have the lead role in building and maintaining them.

Thus, graph-driven applications can be made generic by function with broader and more diverse information visualization capabilities. Simple instructions in the graphs can indicate what types of information can be displayed with what kind of widget. Graph-driven applications also mean that those closest to the knowledge problems will also be those directly augmenting the graphs. These changes act to democratize the knowledge function, and lower overall IT costs and risks.
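A minimal sketch of how such display instructions might work: each type in the graph may name a widget, and a type without its own instruction inherits one from its parent. The type names, widget names and the widget_for function are all hypothetical.

```python
# Hypothetical fragment of a graph: parent types and display instructions.
SUBCLASS_OF = {"City": "Place", "Place": "Thing", "Person": "Thing"}
DISPLAY_WIDGET = {"Place": "map", "Thing": "record-table"}


def widget_for(type_name):
    """Walk up the type hierarchy until a display instruction is found."""
    current = type_name
    while current is not None:
        if current in DISPLAY_WIDGET:
            return DISPLAY_WIDGET[current]
        current = SUBCLASS_OF.get(current)
    return None  # nothing in the graph says how to display this type


print(widget_for("City"))    # inherits the instruction given for Place
print(widget_for("Person"))  # falls back to the instruction given for Thing
```

Changing which widget displays a type is then an edit to the graph, not to the application code.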

Seven Pillars of the Semantic Enterprise

Elsewhere we have discussed the specific components that go into enabling the development of a semantic enterprise, what we have termed the seven pillars [5]. Most of these points have been covered to one degree or another in the discussion above.

There are off-the-shelf starter kits for enterprises to embrace to begin this process. The major starting requirements are to develop appropriate knowledge graphs (ontologies) for the given domain and to convert existing information assets into appropriate interoperable RDF form.

Beyond that, enterprise staff may be readily trained in the use and growth of the graphs, and in the staging and conversion of data. With an appropriate technology transfer component, these semantic technology systems can be maintained solely by the enterprise itself without further outside assistance.

Summary of Semantic Technology Benefits

Unlike conventional IT systems with their closed-world approach, semantic technologies that adhere to these guidelines can be deployed incrementally at lower cost and with lower risk. Further, we have seen that semantic technologies offer an excellent integration approach, with no need to re-do schema because of changed circumstances. The approach further leverages existing information assets and brings the responsibility for the knowledge function more directly to its users and consumers.

Semantic technologies are thus well-suited for knowledge applications. With their graph structures and the ability to capture semantic differences and meanings, these technologies can also accommodate multiple viewpoints and stakeholders. There are also excellent capabilities to relate all available information – from documents and images and metadata to tables and databases – into a common footing.

These advantages will immediately accrue through better integration and interoperability of diverse information assets. But, for early adopters, perhaps the most immediate benefit will come from visible leadership in embracing these enabling technologies in advance of what will surely become the preferred approach to knowledge problems.

Note: there is a version of this article on Slideshare:

[1] For more on the open world assumption (OWA), see the various entries on this topic on Michael Bergman’s AI3:::Adaptive Information blog. This link is a good search string to discover more.
[2] M.K. Bergman, 2009. Advantages and Myths of RDF, white paper from Structured Dynamics LLC, April 22, 2009, 13 pp. See http://www.mkbergman.com/wp-content/themes/ai3v2/files/2009Posts/Advantages_Myths_RDF_090422.pdf.
[4] For the most comprehensive discussion of graph-driven apps, see M. K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps’ to see related content.
[5] M.K. Bergman, 2010. “Seven Pillars of the Open Semantic Enterprise,” in AI3:::Adaptive Information blog, January 12, 2010; see http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/.
Posted:May 21, 2012

Modularization Also Leads to Big Graph Visualization

We are pleased to announce the release of version 1.05 of UMBEL, which now has linkages to schema.org [6] and GeoNames [1]. UMBEL has also been split into ‘core’ and ‘geo’ modules. The resulting smaller size of UMBEL ‘core’ (now some 26,000 reference concepts) has also enabled us to create a full visualization of UMBEL’s content graph.

The first notable change in UMBEL v. 1.05 is its mapping to schema.org. schema.org is a collection of schema (usable as HTML tags) that webmasters can use to mark up their pages in ways recognized by major search providers. schema.org was first developed and organized by the major search engines of Bing, Google and Yahoo!; later Yandex joined as a sponsor. Now many groups are supporting schema.org and contributing vocabularies and schema.

I was one of the first to hail schema.org hours after its announcement [7]. It seemed only fair that we put our money where our mouth is and map UMBEL to it as well.

The UMBEL-schema.org mapping was manually done by, firstly, searching and inspecting the current UMBEL concept base for appropriate matches. If that mapping failed to find a rather direct correspondence between existing UMBEL concepts and the types in schema.org, the source concept reference of OpenCyc was then inspected in a similar manner. Failing a match from either of these two sources, the decision was to add a new concept to the ‘core’ UMBEL. This new concept was then appropriately placed into the UMBEL reference concept subject structure.

The net result of this process was to add 298 mapped schema.org types to UMBEL. This mapping required a further three concepts from OpenCyc, and a further 78 new reference concepts, to be added to UMBEL. Along with the new updates to UMBEL and its mappings, the section of Key Files below provides further explanatory links. We are reserving the addition of schema.org properties for a later time, when we plan to re-organize the Attributes SuperType within UMBEL.

Modularization of the UMBEL Vocabulary

Even in the early development of UMBEL there was a tension about the scope and level of what geographic information to include in its concept base. The initial decision was to support country and leading-country province and state concepts, and some leading cities. This decision was in the spirit of a general reference structure, but still felt arbitrary.

GeoNames is devoted to geographical information and concepts (both natural and human artifacts) and has become the go-to resource for geo-locational information. The decision was thus made to split out the initial geo-locational information in UMBEL and replace it with mappings to GeoNames. This decision also had the advantage of beginning a process of modularization of UMBEL.

Two sets of reference concepts were identified as useful for splitting out from the ‘core’ UMBEL in a geo-locational aspect:

  1. Geopolitical places and places of human activities and facilities
  2. Natural geographical places and features.

These removed concepts were then placed into a separate ‘geo’ module of UMBEL, including all existing annotations and relations, resulting in a module of 1,854 concepts. That left 26,046 concepts in UMBEL ‘core’. Because of some shared parent concepts, there is some minor overlap between the two modules. These are now the modular splits in UMBEL version 1.05.

Mapping to GeoNames

GeoNames has a different structure to UMBEL. It has few classes and distinguishes its geographic information on the basis of some 671 feature codes. These codes span from geopolitical divisions — such as countries, states or provinces, cities, or other administrative districts — to splits and aggregations by natural and human features. Types of physical terrain — above ground and underwater — are denoted, as well as regions and landscape features governed by human activities (such as vineyards or lighthouses) [1]. We wanted to retain this richness in our mappings.

We needed a bridge between feature codes and classes, a sort of umbrella property generally equivalent to owl:sameAs in nature, but with some possible inexactitude or degree of approximation. The appropriate choice here is umbel:correspondsTo, which was designed specifically for this purpose [2]. This predicate is thus the basis for the mappings.

The 671 GeoNames feature codes were manually mapped to corresponding classes in the UMBEL concepts, in a manner identical to what was described for schema.org above. The result was to add a further three OpenCyc concepts and 88 new UMBEL reference concepts to accommodate the full set of GeoNames feature codes. We thus now have a complete representation of the full structure and scope of GeoNames in UMBEL.

There are three modes in which one can now work with UMBEL:

  1. With UMBEL ‘core’ alone, recommended when your concept space is not concerned with geographical information
  2. UMBEL ‘core’ plus the UMBEL ‘geo’ module — equivalent to prior versions of UMBEL, or
  3. UMBEL ‘core’ plus GeoNames, recommended where geographical information is important to your concept space.

In the latter case, you may use SPARQL queries with the umbel:correspondsTo predicate to achieve the desired retrievals. If more logic is required, you will likely need to look to a rules-based addition such as SWRL [3] or RIF [4] to capture the umbel:correspondsTo semantics.
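As a sketch of what such a retrieval might look like, the Python below builds a SPARQL query string that checks the predicate in both directions and then mimics the same lookup over a few in-memory statements. Several things here are assumptions rather than facts from the release: the namespace IRIs, the form of the GeoNames feature code IRI, the Lighthouse reference concept, the direction in which the mapping is stored, and both helper functions.

```python
# Assumed namespace and IRIs; check them against the actual UMBEL and
# GeoNames files before relying on them.
UMBEL = "http://umbel.org/umbel#"
CORRESPONDS_TO = UMBEL + "correspondsTo"
LIGHTHOUSE_CODE = "http://www.geonames.org/ontology#S.LTHSE"
LIGHTHOUSE_CONCEPT = "http://umbel.org/umbel/rc/Lighthouse"  # hypothetical


def corresponds_query(term_iri):
    """Build a SPARQL query that matches the predicate in either direction."""
    return (
        f"PREFIX umbel: <{UMBEL}>\n"
        "SELECT DISTINCT ?match WHERE {\n"
        f"  {{ <{term_iri}> umbel:correspondsTo ?match }}\n"
        "  UNION\n"
        f"  {{ ?match umbel:correspondsTo <{term_iri}> }}\n"
        "}"
    )


def corresponding_terms(triples, term_iri):
    """Plain-Python stand-in for running the query over in-memory statements."""
    matches = set()
    for subject, predicate, obj in triples:
        if predicate != CORRESPONDS_TO:
            continue
        if subject == term_iri:
            matches.add(obj)
        elif obj == term_iri:
            matches.add(subject)
    return matches


# Illustrative statements only, not taken from the actual mapping files.
TRIPLES = [
    (LIGHTHOUSE_CONCEPT, CORRESPONDS_TO, LIGHTHOUSE_CODE),
    (LIGHTHOUSE_CONCEPT, "http://www.w3.org/2004/02/skos/core#prefLabel", "lighthouse"),
]

print(corresponds_query(LIGHTHOUSE_CODE))
print(corresponding_terms(TRIPLES, LIGHTHOUSE_CODE))
```

The UNION is there because a plain SPARQL endpoint does not know that the mapping can be read in either direction; that kind of knowledge is what a rules layer would otherwise supply.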

New Big Graph Visualization

Because of the UMBEL modularization, it has now become tractable to graph the main ontology in its entirety. The core UMBEL ontology contains about 26,000 reference concepts organized according to 33 super types. There are more than 60,000 relationships amongst these concepts, resulting in a graph structure of very large size.

It is difficult to grasp this graph in the abstract. Thus, using methods earlier described in our use of the Gephi visualization software [5], we present below a dynamic, navigable rendering of this graph of UMBEL core:

Note: at standard resolution, if this graph were rendered at actual size, it would be larger than 34 feet by 34 feet at full zoom! Hint: that is about 1,200 square feet, or half the size of a typical American house!

This UMBEL graph displays:

  • All 26,000 concepts (“nodes”) with labels, and with connections shown (though you must zoom to see them)
  • The color-coded relation of these nodes to the 33 or so major SuperTypes in UMBEL, as well as the relative position of these clusters with respect to one another, and
  • When zooming (use scroll wheel or + icon) or panning (via mouse down moves), wait a couple of seconds to get the clearest image refresh:

You may also want to inspect a static version of this big graph by downloading a PDF.

Key Files and Links

Lastly, we fully updated the UMBEL Web site and re-released the UMBEL wiki.


[1] For more information on GeoNames, see http://www.geonames.org/. The complete mapping to GeoNames is based on its 671 feature codes, which describe natural, geopolitical, and human activity geo-locational information; see further http://www.geonames.org/statistics/total.html

[2] Approximate relationships are discussed in M.K. Bergman, 2010. “The Nature of Connectedness on the Web,” AI3:::Adaptive Information blog, November 22, 2010; see http://www.mkbergman.com/935/the-nature-of-connectedness-on-the-web/. One option, for example, is the x:coref predicate from the UMBC Ebiquity group; see further Jennifer Sleeman and Tim Finin, 2010. “Learning Co-reference Relations for FOAF Instances,” Proceedings of the Poster and Demonstration Session at the 9th International Semantic Web Conference, November 2010; see http://ebiquity.umbc.edu/_file_directory_/papers/522.pdf. In the words of Tim Finin of the Ebiquity group:

The solution we are currently exploring is to define a new property to assert that two RDF instances are co-referential when they are believed to describe the same object in the world. The two RDF descriptions might be incompatible because they are true at different times, or the sources disagree about some of the facts, or any number of reasons, so merging them with owl:sameAs may lead to contradictions. However, virtually merging the descriptions in a co-reference engine is fine — both provide information that is useful in disambiguating future references as well as for many other purposes. Our property (:coref) is a transitive, symmetric property that is a super-property of owl:sameAs and is paired with another, :notCoref that is symmetric and generalizes owl:differentFrom.

When we look at the analog properties noted above, we see that the property objects tend to share reflexivity, symmetry and transitivity. We specifically designed the umbel:correspondsTo predicate to capture these close, nearly equivalent relationships of uncertain degree.

[3] SWRL (Semantic Web Rule Language) combines sublanguages of the OWL Web Ontology Language (OWL DL and Lite) with those of the Rule Markup Language (Unary/Binary Datalog). SWRL has the full power of OWL DL, but at the price of decidability and practical implementations. See further http://www.w3.org/Submission/SWRL/.
[4] The Rule Interchange Format (RIF) is a W3C Recommendation. RIF is based on the observation that there are many “rules languages” in existence, and what is needed is to exchange rules between them. RIF includes three dialects, a Core dialect which is extended into a Basic Logic Dialect (BLD) and Production Rule Dialect (PRD). See further http://www.w3.org/2005/rules/wiki/RIF_FAQ.
[5] See further, M.K. Bergman, 2011. “A New Best Friend: Gephi for Large-scale Networks,” AI3:::Adaptive Information blog, August 8, 2011.
[6] schema.org lists its various contributing schema and also provides an OWL ontology of the system.
[7] See further, M.K. Bergman, 2011. “Structured Web Gets Massive Boost,” AI3:::Adaptive Information blog, June 2, 2011.

Posted by AI3's author, Mike Bergman Posted on May 21, 2012 at 12:26 am in Ontologies, Structured Web, UMBEL | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/999/new-umbel-release-gains-schema-org-geonames-capabilities/
Posted:December 5, 2011

Open Semantic Framework Ontology Modularization and Roles within an OSF Instance

For some time now, Structured Dynamics (SD) has been touting the unique advantages of ODapps, or ontology-driven applications [1]. ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies), as supplemented by UI and instruction sets and validations and rules. When these supplements are added to standard ontology functions, we collectively term them adaptive ontologies [2].

To further the discussion around ODapps, today we are publishing two new documents, using the semantic technology foundation of the open semantic framework (OSF). OSF is a comprehensive, open source stack of SD and external tools that provides a turnkey environment for enterprises to adopt semantic technologies and approaches. OSF has been designed from the ground up to be an ontology-driven application framework.

The first new document, posted on Fred Giasson’s blog, provides a detailed discussion of the dozen or so roles ontologies can play within an OSF installation. Fred’s document is geared more to specific properties and configurations useful to deploy this framework; that is, the “drivers” in an ODapp setting. The second new document — this one — is more of a broad overview of the modularization and architecture of the constituent ontologies that make up an OSF installation. Both documents have also been posted to SD’s open content TechWiki [3], which now has about 360 technical articles on understanding and implementing an OSF installation, importantly including its ontologies.

OSF Constituent Ontologies

As presently configured, an OSF installation may typically utilize most or all of the following internal ontologies:

  • The SCO Ontology (Semantic Component Ontology)
  • The WSF Ontology (Web Service Framework Ontology)
  • The AGGR Ontology (Aggregation Ontology)
  • The irON Ontology (Instance Record and Object Notation Ontology)
  • One or more domain ontologies, to capture the concepts and relationships for the purposes of a given OSF installation, and
  • Possibly UMBEL (optional) or other upper-level concept ontologies, used for linkages to external systems.

(Note: the internal wiki links to each of these ontologies also provide links to the actual ontology specifications on Github.)

Depending on the specific OSF installation, of course, multiple external ontologies may also be employed. Some of the common external ones used in an OSF installation are described by the external ontologies document on the TechWiki. These external ontologies are important — indeed essential in order to ensure linkage to the external world — but have little to do with internal OSF control structures. That is why the rest of this discussion is focused on internal ontologies only.

The OSF Ontologies Architecture

The actual relationships between these ontologies are shown in the following diagram. Note that the ontologies tend to cluster into two main areas:

  1. Content (or domain) ontologies, which tend to embody more of the traditional ontology functions such as information interoperability, inferencing, reasoning and conceptual and knowledge capture of the applicable domain; and
  2. Administrative ontologies, which govern internal application use and user interface interactions.

This ontology architecture supports the broader open semantic framework:

The WSF ontology plays a special role in that it sets the overall permission and access rights to the other components and ontologies. The UMBEL ontology (or other upper-level ontologies that might be chosen) is also optional. Such vocabularies are included when interoperability with external applications or knowledge bases is desired.

Summary of OSF Roles

We can further disaggregate these ontology splits with respect to the specific dozen or so ontology roles discussed in Fred’s complementary piece on ontology roles in OSF. These dozen roles are shown by the rows with interactions marked for the various ontologies:

Ontologies (table columns): SCO, AGGR, WSF, irON, Domain, UMBEL

Ontology roles (table rows):

  • Define record descriptions
  • Inform interface displays
  • Integrate different data sources
  • Define component selections
  • Define component behaviors
  • Guide template selection
  • Provide reasoning and inference
  • Guide content filtering (with and without inference)
  • Tag concepts in text documents
  • Help organize and navigate Web portals
  • Manage datasets and ontologies
  • Set access permissions and registrations

One of the unique aspects of adaptive ontologies is their added role in informing user interfaces and supporting specific semantic tools. Note, for example, the role of the content ontologies in informing interface displays, as well as their use in tagging concepts (via information extraction). These additional roles are the reason that these ontologies are shown as straddling both content and administrative functions in the first figure.

See Fred’s piece to learn more about these dozen roles.

Interactions Are More Complex than Arrows

Naturally, a simple drawn arrow between ontologies (first figure) or a checkmark on a matrix (table above) can hide important details of how these interactions between ontologies and components actually work. In an earlier article, we discussed how the whole workflow takes place between users and user interface selections affecting the types of data returned by those selections, and then the semantic components (widgets) used to display them. This example interaction is shown by the following animation:

The blue nodes show the ontology interactions. These, in turn, instruct how the various components (yellow) and code (green) need to operate. These interactions are the essence of an ontology-driven app. The software is expressly designed to respond to specifications in the ontology(ies) used, and the ontologies themselves embrace some additional properties specific to driving those apps.

Possible Future Directions

ODapps are a relatively new paradigm, and we continue to learn more about their uses and potential. We have wanted to write the first versions of these two new documents for some time, but held off as we learned about and further exploited the latent potential in this design. As it stands, we see further potential in this approach, and will therefore likely be adding new ontologies and capabilities to the general system for some time.

Some of the areas that look promising to us include:

  • A generalized statistical ontology, especially as it can inform data displays in the semantic components
  • Even more capable widgets in business intelligence (BI) uses, with a concomitant expansion of the vocabulary (predicates and classes) in some of the underlying ontologies
  • More aggregation and summation functions supported by the AGGR ontology, and
  • Still further improved permissions and access layers in the WSF ontology.

These potentials arise from the native power of the design basis for ontology-driven apps. Conceptually, the design is simplicity itself. Operationally, the system is extremely flexible and robust. Strategically, it means that development and specification efforts can now move from coding and programmers to ontologies and the subject matter users who define and depend on them. With these advantages, who can argue with that?


[1] For the most comprehensive discussion of ODapps, see M. K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps’ to see related content.
[2] See M.K. Bergman, 2009. “Ontologies as the ‘Engine’ for Data-Driven Applications”, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, but the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.
[3] Slight revisions of these documents have been posted to the TechWiki as Role and Use of Ontologies in OSF and OSF Ontologies Modularization and Architecture, respectively.

Posted by AI3's author, Mike Bergman Posted on December 5, 2011 at 12:01 pm in Ontologies, Open Semantic Framework | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/989/an-ontologies-architecture-for-ontology-driven-apps/
Posted:November 15, 2011

Improved Ontology Navigation and Management in Read-only and Editable Forms

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part four discusses structOntology, the online ontology viewing and management tool that is an integral part of the open semantic framework (OSF), the framework that hosts the UMBEL portal.

Ontologies are the central governing structure or “brains” of a semantic installation. As provided by the OSF framework, ontologies are also the basis for instructing user interface labels and how the interface behaves. The Web is about global access, immediacy, flexibility and adaptability. Why can’t our use of ontologies be the same?

Unlike similar tools of the past, structOntology exists on the same installation as the ontology that drives it. It is a backoffice ontology editing and management tool that is part of the conStruct tool suite, accessible via the OSF admin panel. There is no need to go off to a separate application, make changes, re-import, and then test. structOntology allows all of that to occur locally with the instance in which it resides. Also, some important functional differences, especially in search and in finding and selecting items, set structOntology apart from existing, conventional tools.

Yet, that being said, structOntology is also not the complete Swiss Army knife for ontology management. It is designed for local and immediate use. Its spectrum of functionality is not as complete as other ontology frameworks (for example, supporting reasoners, consistency testers or plug-ins). So, for immediate and locally relevant use, structOntology appears to be the appropriate tool. For more detailed ontology work or testing, other frameworks are perhaps more useful. And, in recognition of these roles, structOntology also has robust import and export capabilities that enable these dual local-detailed use scenarios. For these distinctions, see further the structOntology v Protégé? document.

structOntology comes in two versions. First, there is the read-only version, which can be made publicly available, that is a great aid to ontology navigation and discovery. This is the version viewable on the UMBEL portal. Second, there is an editable version, which is only available to administrators via a back office function within an OSF instance. Some screen shots of this version, plus pointers to more documentation about it, are provided below.

OWL API as a First-class Citizen

What enables OSF to treat ontologies as a first-class citizen, viewable and editable from within the applications in which they operate, is the incorporation of the OWL API as one of the major engines underlying the structWSF Web services framework, the key foundational basis to an OSF installation. As noted in Part 2 of this series, the OWL API is one of the four major engines supporting structWSF:

The OWL API is the same engine used by Protégé 4, which is why both structOntology and Protégé are fully interoperable.

Besides interoperability, the use of the OWL API also means that other OWL API-based tools, such as reasoners or mappers, may be linked into the system. This design is in keeping with our normative view of an ontology tooling landscape, which Structured Dynamics keeps pursuing in a steady, incremental manner [2]. Further, because of its sibling engines, the OWL API and OSF are also able to leverage the other engines supporting structWSF, such as Solr for advanced search or efficient indexing in the RDF triplestore. (The advantages go both ways, too, such as enabling the OWL API to feed appropriate ontology specifications to the GATE text processing area for uses such as ontology-based information extraction [OBIE].) All of this makes for a most powerful and capable foundation to an OSF instance.

The Read-Only Version (UMBEL)

Since UMBEL is a reference ontology and the UMBEL portal is an access point to those references and specifications, we really don’t want casual users making modifications to the ontology [3]. For this reason, only a read-only version of structOntology is provided on the portal.

Access to the structOntology function occurs via the Ontology link on the UMBEL portal. Upon access, you are presented with the main structOntology interface:

The organization of the structOntology application presents all currently available and active ontologies listed in the left panel; UMBEL, of course, is the one selected here. Since this is a read-only version, only the View button shows up in the right-hand panel. (For the options available in the editable version, see below.)

View Option

Upon invoking the View option, the hierarchical tree for the selected ontology appears on the left; structural information and definitions appear on the right.

You may expand the tree and explore the structure deeper by either clicking on the tree nodes in the left-hand panel or the item links in the right-hand panel. If there are further levels in the tree, you will get the JavaScript ‘working’ icon and then see the tree expanded with the new node information shown to the right.
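
The expand-on-click behavior described above can be sketched as a tree whose children are fetched only when a node is first expanded. This is a minimal illustration under stated assumptions, not structOntology's own code: the `LazyNode` class, the `fetch_children` callback and the sample concept names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the server: maps a concept to its direct sub-classes.
SAMPLE_HIERARCHY: Dict[str, List[str]] = {
    "Thing": ["Facility", "Event"],
    "Facility": ["CommunityFacility"],
    "CommunityFacility": [],
    "Event": [],
}


def fetch_children(label: str) -> List[str]:
    """Pretend remote call; a real tool would query the ontology service."""
    return SAMPLE_HIERARCHY.get(label, [])


@dataclass
class LazyNode:
    label: str
    fetch: Callable[[str], List[str]] = fetch_children
    children: Optional[List["LazyNode"]] = None  # None means "not loaded yet"

    def expand(self) -> List["LazyNode"]:
        # Load children only on first expansion, then reuse the cached list.
        if self.children is None:
            self.children = [
                LazyNode(name, self.fetch) for name in self.fetch(self.label)
            ]
        return self.children


root = LazyNode("Thing")
first_level = [node.label for node in root.expand()]
```

Only the nodes a user actually opens are ever requested, which is one plausible reason a large tree can appear quickly.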

Also note that your interaction with the structOntology application is recounted via the “breadcrumbs” listing at the upper left of the application. The green arrow icon allows you to expand or collapse various sections in the display.

Tooltips

The tree labels are themselves based on the preferred labels assigned to things. However, if you want to see the actual ontology URI reference, you can do so via the tooltip when mousing over the item:


The tooltip shows the full URI path (unique identifier) of the selected item.

Classes Tab

This example has been based on the Classes tab, which lists the reference concepts in the UMBEL context. In read-only mode, the basic information presented is the tree structure, the item description and prefLabel, and super and sub class information in the right-hand panel. (More options are available in the editable version; see below.)

Properties Tab

Properties (that is, the relations or predicates between items or nodes) are presented in a similar manner to that for Classes. The Properties tab has the same basic layout and operations as the Classes tab, including similar right-hand panels:

The Editable Version

The editable version of structOntology shares all of the functionality of the read-only version. Besides adding editing capabilities, the editable version also has other functionality related to general ontology creation and management. There is separate documentation for the editable version; the examples below are from a different instance than UMBEL.

The editable version is accessed via the backoffice admin function within an OSF instance. When invoked, it also has more management options presented in the right-hand panel:

We’ll highlight some of the differences from the read-only version below.

Create New Option

The first notable addition is the ability to create ontologies (as well as to delete, or Remove, them):

The URL (such as http://purl.org/ontology/myont#) becomes the base URI for the new ontology. The new ontology is created with a basic structure, from which you only need fill in your new concepts or classes and relationships:

Basic stubbing is provided for the new ontology to help bootstrap its development (not shown). Once created, this new ontology also now appears on the available local ontologies when first invoking the structOntology application.
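
To make the role of the base URI concrete, here is a minimal sketch of how a new ontology stub might be serialized as RDF/XML using only the Python standard library. The stub that structOntology actually generates is not shown in this post, so the structure below (an ontology header plus one labeled class) is an assumption, as are the `stub_ontology` function and the sample class name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def stub_ontology(base_uri: str, first_class: str, label: str) -> str:
    """Build a tiny RDF/XML document: one owl:Ontology and one owl:Class."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    ET.register_namespace("owl", OWL)

    root = ET.Element(f"{{{RDF}}}RDF")
    # Ontology header, identified here by the base URI without its trailing '#'.
    ET.SubElement(
        root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": base_uri.rstrip("#")}
    )
    # New concepts are minted by appending a local name to the base URI.
    cls = ET.SubElement(
        root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": base_uri + first_class}
    )
    ET.SubElement(cls, f"{{{RDFS}}}label").text = label
    return ET.tostring(root, encoding="unicode")


xml_text = stub_ontology(
    "http://purl.org/ontology/myont#", "CommunityFacility", "community facility"
)
```

Every class or property added later would share the same base URI prefix, which is why the choice made at creation time matters.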

View Option

Most screens are quite similar to the read-only version with the obvious change of replacing labels with edit boxes. It is via these edit fields that the ontology becomes editable. This change is quite evident for the View screen:


Searching

Searching can take place on the currently active ontology or all loaded (available) ontologies. Note that selection was made above via the radio button under the search box.

Depending on settings, searching can take place on only the preferred label, or also on alternative labels and descriptions (in fact, all annotations).

When entering search terms, the system automatically attempts to complete the matching search phrase. A minimum of three entered characters guides this auto-completion functionality:

When search is initiated, the potential results list also auto-completes for what you have already typed into the search box. Upon selection of one of these items (or completion of the full search phrase), the structOntology system issues a search query to the remote server, which then acts to auto-populate the ontology tree on the left-hand panel. In this case, we have selected ‘community facilities’:

The desired search results then automatically expand the ontology tree. This is really helpful for longer ontologies (the example one shown has about 3000 concepts and about 6000 axioms) and means quicker initial tree loading. Once completed, the (multiple) occurrences of the search item are shown in highlight throughout the tree.

Note this search is not necessarily restricted to the actual node label. Alternative labels and descriptions may also be used to find the search results. This greatly expands the findability of the search function. Here is a great example of matching the OWL API engine to Solr underneath a structWSF instance.
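
The search behavior described in this section can be sketched in a few lines. This is only an in-memory illustration of the rules stated above (a three-character minimum for auto-completion, and matching on either the preferred label alone or on all annotations); the real system delegates search to Solr on the server. The `Concept` class, the `suggest` and `search` functions, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

MIN_CHARS = 3  # the post states that auto-completion starts at three characters


@dataclass
class Concept:
    uri: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    description: str = ""


def suggest(prefix: str, concepts: List[Concept]) -> List[str]:
    """Return preferred labels starting with the typed prefix (case-insensitive)."""
    if len(prefix) < MIN_CHARS:
        return []
    p = prefix.lower()
    return sorted(c.pref_label for c in concepts if c.pref_label.lower().startswith(p))


def search(term: str, concepts: List[Concept], all_annotations: bool = True) -> List[str]:
    """Return URIs of concepts matching in the prefLabel, or in any annotation."""
    t = term.lower()
    hits = []
    for c in concepts:
        texts = [c.pref_label]
        if all_annotations:
            texts += c.alt_labels + [c.description]
        if any(t in text.lower() for text in texts):
            hits.append(c.uri)
    return hits


CONCEPTS = [
    Concept("http://example.org/ont#CommunityFacility", "community facilities",
            ["civic buildings"], "Places that serve a local community."),
    Concept("http://example.org/ont#Library", "libraries",
            ["book lending"], "A community facility that lends books."),
]
```

With this sample data, a search restricted to preferred labels finds only the first concept, while a search over all annotations also finds the library through its description.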

Tab Structure

The editable version of structOntology offers more detail in the right-hand panel when Viewing an item. These sections include:

  • Annotations
  • Structural relationships
  • Instances
  • Linkage to characteristics, and
  • Advanced settings.

Each section is editable. All have auto-complete. Each section may also be expanded or collapsed.

General Operations

Each panel has an expand and collapse arrow shown at the upper right of its panel. These cause the panel’s individual entries to either be exposed or hidden. At the right of each entry, new entries can be invoked with the green plus symbol; existing entries can be deleted with the red minus symbol. (See Structural Relationships below.)

In working with each panel, note that each entry also has the search and auto-complete features earlier noted. Drag-and-drop into these panels is also contextual: whether a drop is allowed depends on the nature of the item selected in the left-hand panel (tree).

Annotations

Annotations provide the descriptions about the thing at hand and its associated metadata. (These are separately defined under the Properties tab, or as part of the imported ontology specification.) The available annotations are displayed in this panel when expanded:

Entries are simply provided by entering values into the text fields and then Saving.

Structural Relationships

The structural relationships are the means to set parent and child relations between concepts, as well as to instruct disjoint or equivalent class relations. The Structural Relationships panel is the key one for setting the interconnections within the graph structure at the heart of the governing ontology.

Most of the key structural relationships in OWL are provided by this panel. (However, note there are some additional and rarely used structural specifications in OWL. These must be set via a third-party external application. Such potential interactions are made possible via the flexible import and export options with structOntology).
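
As a rough illustration of what these relations mean, the sketch below stores parent links and disjointness assertions and checks whether a proposed new parent would conflict with them. As noted earlier, structOntology does not bundle reasoners or consistency testers, so this check is purely illustrative; the class names, the data structures and the `can_add_parent` helper are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Child class -> its direct parent classes (sample data).
sub_class_of: Dict[str, Set[str]] = {
    "Library": {"CommunityFacility"},
    "CommunityFacility": {"Facility"},
    "Facility": set(),
    "Event": set(),
}

# Unordered pairs of classes asserted to be disjoint.
disjoint_with: Set[FrozenSet[str]] = {frozenset({"Facility", "Event"})}


def ancestors(cls: str) -> Set[str]:
    """All super classes reachable by following parent links."""
    seen: Set[str] = set()
    stack = list(sub_class_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(sub_class_of.get(parent, ()))
    return seen


def can_add_parent(child: str, new_parent: str) -> bool:
    """False if the new parent (or its ancestors) is disjoint with the child's lineage."""
    existing = ancestors(child) | {child}
    candidate = ancestors(new_parent) | {new_parent}
    return not any(
        frozenset({a, b}) in disjoint_with for a in existing for b in candidate
    )
```

Here `Library` inherits from `Facility` through `CommunityFacility`, so making it a child of `Event` would contradict the disjointness assertion.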

Instances (Individuals)

Another right-hand panel provides the facility to assign individuals to the classes (or concepts) established under the prior two panels. In this case, we are looking at some specific ‘community facilities’ to assign to that concept:

As with the prior panels, a new instance may be added or discarded ones deleted. Individual instances and their characteristics may also be updated or changed.

Linkage to Characteristics

Another aspect to OSF ontologies is the ability to relate concepts to various metadata characteristics or attributes that might describe that concept’s instances. This relationship is done via the dedicated hasCharacteristic property, which is assigned via this right-hand panel:

This option has the specific behavior of allowing one or more properties (characteristics) to be asserted for a given class (concept).

Advanced Options

Display, widget and other options are set under the Advanced Options panel. One item to note is the set of widgets that may be assigned for displaying a given information item:

The relationship of widgets (or semantic components) to information items is a deserving topic in its own right. For more information about this topic, see the semantic components category.

Contextual Drag-and-Drop

In edit mode, it is possible to drag items from the left-hand tree panel into the specifications at the right. This is contextual. In this first example, we see an attempt to drop a “class” result (or concept) into the annotation panel, which violates the structure of the system and is therefore not allowed (as shown by the visual red X cues):

However, if we drag and drop from the tree in an allowable structural definition, we get the visual green check as a cue the move is legal:

This functionality and feedback means that only allowable assignments can be dropped into a new structural definition.
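
One simple way to picture this contextual check is as a lookup table of which kinds of tree item each panel accepts. The table below is an assumption made for illustration; the post only confirms that a class cannot be dropped into the annotation panel and that legal drops show a green check. The panel keys, item kinds and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical rule table: the kinds of tree item each right-hand panel accepts.
ALLOWED_DROPS: Dict[str, Set[str]] = {
    "annotations": {"annotation_property"},
    "structural_relationships": {"class"},
    "instances": {"individual"},
    "characteristics": {"property"},
}


def can_drop(item_kind: str, panel: str) -> bool:
    """True if the panel accepts this kind of item; unknown panels accept nothing."""
    return item_kind in ALLOWED_DROPS.get(panel, set())


def drop_cue(item_kind: str, panel: str) -> str:
    """Visual cue shown while dragging, following the post's red X / green check."""
    return "green check" if can_drop(item_kind, panel) else "red X"
```

Keeping the rules in data rather than in branching code means a new panel or item kind only needs a new table entry.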

Export Option

Another piece of functionality in the editable version is the export option. When invoked, Export brings up the save dialog with the ability to assign an ontology file name:

Upon saving, it stores the currently active ontology in RDF/XML format:

Export is not active in UMBEL due to the large size of the ontology. If you want to obtain it directly, you may do so from the UMBEL ontology CVS.

Import Option

An Import option is available in the editable version. structOntology import supports all OWL API serializations, specifically RDF/XML, N3, Manchester Syntax and Turtle. When import is invoked, a file open dialog is presented that enables you to find the ontology on your local hard drive:

The Import feature has no file extension limitations; take care to pick and assign the proper types for importation.

Via the Import and Export buttons, it is possible to work locally with structOntology while exporting to more capable third-party tools. Then, once use of those tools is complete, Import provides the ability to re-import the updated ontology back into the local collection.

File Options

Finally, because structOntology is a server-based system accessed via Web services, there are some slightly different concepts to keep in mind when using the editable version: you might be working with the local version or with the one on the main server. The file options are:

  • Save: saves all modifications to the file on the server. All modifications will then be used if you do a Reload
  • Unload: removes the currently active ontology from the local instance, but does NOT remove it from the server. It merely acts to remove that ontology for local use in the current session
  • Remove: a full delete of the ontology, both locally and on the server
  • Update: recreates the serialization files created from these ontologies, such as the .SRZ files used by structWSF and conStruct, the ironXML schema used by the semantic components, etc. The Update option is the most common one when updating an ontology locally, for which you want the persistent version on the remote server to be kept in sync
  • Reload: reloads the server version. If prior local work had not been updated, then a reload acts as a way to restore the remote instance to the local one without change.

These are all available via buttons under the main right-hand panel in structOntology and are more fully described in the edit version documentation.
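
The distinction between the local working copy and the server copy can be modeled with a small class. This is a simplified mental model of the file options as described above, not the tool's implementation: the `OntologySession` class, its `load` and `edit` helpers, and the treatment of Update as regenerating a derived artifact are all assumptions.

```python
from typing import Dict, List


class OntologySession:
    """Toy model: 'server' holds persistent copies, 'local' holds working copies."""

    def __init__(self, server: Dict[str, List[str]]):
        self.server = server
        self.local: Dict[str, List[str]] = {}
        self.derived: Dict[str, int] = {}  # stand-in for regenerated serialization files

    def load(self, uri: str) -> None:  # hypothetical helper
        self.local[uri] = list(self.server[uri])

    def edit(self, uri: str, axiom: str) -> None:  # hypothetical helper
        self.local[uri].append(axiom)

    def save(self, uri: str) -> None:
        self.server[uri] = list(self.local[uri])

    def unload(self, uri: str) -> None:
        self.local.pop(uri, None)  # the server copy is untouched

    def remove(self, uri: str) -> None:
        self.local.pop(uri, None)
        self.server.pop(uri, None)

    def update(self, uri: str) -> None:
        self.derived[uri] = len(self.server[uri])  # rebuild derived files

    def reload(self, uri: str) -> None:
        self.local[uri] = list(self.server[uri])  # unsaved local edits are dropped


uri = "http://example.org/ont#"
session = OntologySession({uri: ["Facility"]})
session.load(uri)
session.edit(uri, "Library")
session.reload(uri)                 # the unsaved edit is discarded
after_reload = list(session.local[uri])
session.edit(uri, "Library")
session.save(uri)                   # now the server copy has the edit
session.unload(uri)                 # gone locally, still on the server
```

In this model, only Save changes the server copy and only Remove deletes it, which matches the distinctions drawn in the list above.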

Additional Information

Additional information on structOntology may be found in an online video:


This is the fourth of a multi-part series on the newly updated UMBEL services. Other articles in this series are:


[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.
[2] See especially the second figure and the accompanying discussion in this document.
[3] The appropriate pathway for suggested changes to the UMBEL ontology itself is via its official mailing list.

Posted by AI3's author, Mike Bergman Posted on November 15, 2011 at 1:33 pm in Ontologies, Open Semantic Framework, UMBEL | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/988/umbel-services-part-4-structontology/