Posted: March 15, 2017

Dialing In Queries from the General to the Specific

Inferencing is a common term heard in association with semantic technologies, but one that is rarely defined and still less frequently described as to value and rationale. I try to redress this gap in part with this article.

Inferencing is the drawing of new facts, probabilities or conclusions based on reasoning over existing evidence. Charles Sanders Peirce classed inferencing into three modes: deductive reasoning, inductive reasoning and abductive reasoning. Deductive reasoning extends from premises known to be true and clear to infer new facts. Inductive reasoning looks at the preponderance of evidence to infer what is probably true. And abductive reasoning poses possible explanations or hypotheses based on available evidence, often winnowing through the possibilities based on the total weight of evidence at hand or what is the simplest explanation. Though all three reasoning modes may be applied to knowledge graphs, the standard and most used form is deductive reasoning.

An inference engine may be applied to a knowledge graph and its knowledge bases in order to deduce new knowledge. Inference engines apply either backward- or forward-chaining deductive reasoning. In backward chaining, the reasoning tests are conducted “backwards” from a current consequent or “fact” to determine what antecedents can support that conclusion, based on the rules used to construct the graph. (“What reasons bring us to this fact?”) In forward chaining, the opposite occurs: the engine starts from the existing facts and applies the rules of the graph to derive new facts, iterating until a stated goal is reached or no further facts can be derived. (“Given these facts, what else must be true?”) New knowledge, in terms of heretofore unstated connections, may then be added to the knowledge base.
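
To make the forward-chaining direction concrete, the sketch below expresses a single such derivation step as a SPARQL CONSTRUCT query over a class hierarchy; it is illustrative only, not drawn from KBpedia itself:

==============
# A minimal sketch of one forward-chaining step: materialize the new
# fact that any instance of a class is also an instance of its parents.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

construct { ?inst a ?super . }
where
{
  ?inst a ?sub .
  ?sub rdfs:subClassOf ?super .
}
==============

A forward-chaining reasoner in effect repeats steps like this, adding the constructed triples back to the store, until no new triples emerge.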

Inference engines can be applied at the time of graph building or extension to test the consistency and logic of the new additions. Or, semantic reasoners may be applied to a current graph in order to expand queries for semantic search or for other reasoning purposes. In the case of Cognonto’s KBpedia knowledge structure, which is written in OWL 2, the groundings are in first-order logic (FOL) and description logics, though the terminology differs slightly. These logical foundations provide the standard rules by which reasoners can be applied to the knowledge graph [1]. In this article, we will not be looking at how inferencing is applied during graph construction, a deserving topic in its own right. Rather, we will be looking at how inferencing may be applied to the existing graph.

Use of Reasoning at Run Time

Once a completed graph passes its logic tests during construction, perhaps importantly after being expanded for the given domain coverage, its principal use is as a read-only knowledge structure for making subset selections or querying. The standard SPARQL query language, occasionally supplemented by rule-based queries using SWRL or by bulk actions using the OWL API, is the means by which we access the knowledge graph in real time. In many instances, such as for the KBpedia knowledge graph, these are patterned queries. In such instances, we substitute variables in the queries and pass those from the HTML to query templates.
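
As an illustration, a patterned query of this kind might look like the following sketch, where %%LABEL%% is a hypothetical placeholder (our naming, not KBpedia’s) substituted with the user’s input before the query is sent to the endpoint:

==============
# Hypothetical query template; %%LABEL%% is replaced at run time with
# the search string passed in from the HTML form.
select ?s
from <http://kbpedia.org/1.40/>
where
{
  ?s <http://www.w3.org/2004/02/skos/core#prefLabel> "%%LABEL%%"@en .
}
==============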

When doing machine learning, we generally retrieve slices of the graph via query and then stage them for the learner. A similar approach is taken to generate entity lists for tasks such as training recognizers and taggers. Some of these actions may also traverse the graph in order to retrieve the applicable subset.

However, the main real-time use of the knowledge structure is search, which relies entirely on SPARQL. We discuss some options for how this is controlled below.

Hyponymy, Subsumption and Natural Classes

Select Your Spray

The principal reasoning in the knowledge graph rests on hierarchical, hyponymous relations and instance types. These establish the parent-child lineages, and enable individuals (or instances, which might be entities or events) to be related to their natural kinds, or types. Entities belong to types that share certain defining essences and descriptive attributes.

For inferencing to be effective, it is important to try to classify entities into the most natural kinds possible. I have spoken elsewhere about this topic [2]; clean classing into appropriate types is one way to ensure the benefits from related search and related querying are realized. Types may also have parental types in a hyponymous relation. This ‘accordion-like’ design is an important aspect that enables external schema to be tied into multiple points in KBpedia [3].

Disjointedness assertions, where two classes are logically distinct, and other relatedness options provide other powerful bases for winnowing potential candidates and testing placements and assignments. Each of these factors also may be used in SPARQL queries.
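
For instance, a query along the following lines would list every class asserted to be disjoint with a given concept, a quick test when winnowing candidate placements (a sketch; whether disjointness is asserted for this particular concept is illustrative):

==============
# Sketch: find all classes asserted to be logically disjoint with a
# given concept, checking the assertion in either direction.
select ?disjointClass
from <http://kbpedia.org/1.40/>
where
{
  { <http://kbpedia.org/kko/rc/Ontology>
      <http://www.w3.org/2002/07/owl#disjointWith> ?disjointClass . }
  union
  { ?disjointClass
      <http://www.w3.org/2002/07/owl#disjointWith> <http://kbpedia.org/kko/rc/Ontology> . }
}
==============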

These constructs of semantic Web standards, combined with a properly constructed knowledge graph and the use of synonymous and related vocabularies in semsets as described in a previous use case, provide powerful mechanisms for querying a knowledge base. By using these techniques, we may dial in or broaden our queries, much in the same way that we choose different spray patterns for a garden watering hose. We can focus our queries to the particular need at hand. We explain some of these techniques in the next sections.

Adjusting Query Focus

We can see a crude application of this control when browsing the KBpedia knowledge graph. When we enter a particular query, in this case ‘knowledge graph’, one result entry is for the concept of ontology in information science. We see that a direct query gives us a single answer:

Direct Query

However, by picking the inferred option, we now see a listing of some 83 super classes for our ontology concept:

Inferred Query

By invoking deductive inference, we are actually broadening our query to include all of the parental links in the subsumption chain within the graph. Ultimately, this inference chain traces upward to the highest-order concept in the graph, namely owl:Thing. (By convention, owl:Thing itself is excluded from these inferred results.)

By invoking inference in this case, we have indeed broadened the query, but quite indiscriminately: we retrieve all of the ancestors of our subject concept, all the way to the root of the graph. This broadening is perhaps more than what we actually seek.

Scoping Queries via Property Paths

Among many other options, SPARQL also gives us the ability to query specific property paths [4]. We can invoke these options either in our query templates or programmatically in order to control the breadth and depth of our desired query results.

Let’s first begin with a SPARQL query that matches ‘knowledge graph’ as an altLabel:

==============
select ?s ?p ?o
from <http://kbpedia.org/1.40/>
where
{
  ?s <http://www.w3.org/2004/02/skos/core#altLabel> "Knowledge graph"@en ;
     ?p ?o .
}
==============

You can see from the results below that only the concept of ontology (information science) is returned as a prefLabel result, with the concept’s other altLabels also shown:

      ==============
      s     p     o

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      http://www.w3.org/2002/07/owl#Class
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#isDefinedBy
      http://kbpedia.org/kko/rc/
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#subClassOf
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#subClassOf
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2000/01/rdf-schema#subClassOf
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#prefLabel "Ontology (information science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontological distinction (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontological distinction(computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology Language"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology media"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontologies"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "New media relations"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Strong ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontologies (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology library (information science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology Libraries (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontologing"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Computational ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Ontology library (computer science)"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Populated ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Knowledge graph"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#altLabel "Domain ontology"@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://www.w3.org/2004/02/skos/core#definition    
      "In computer science and information science, an ontology is a formal
      naming and definition of the types, properties, and interrelationships of
      the entities that really or fundamentally exist for a particular domain
      of discourse."@en
      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://kbpedia.org/ontologies/kko#superClassOf
      http://wikipedia.org/wiki/Ontology_(information_science)
      ==============

This result gives us the basis for now asking for the direct parents of our ontology concept, using this query:

      ==============
      select ?directParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        ?directParent .
      }
      ==============

We see that the general concepts of knowledge representation-CW and ontology are parents to our concept, as well as the external Wikipedia result on ontology (information science):

      ==============
      directParent

      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/Ontology
      http://wikipedia.org/wiki/Ontology_(information_science)

      ==============

If we turn on the inferred option, we will get the full listing of the 83 concepts noted earlier. This is way too general for our current needs.

While the inferred option gives us no way to specify a depth, SPARQL property paths allow us to control the extent of the query results from the source concept. (Note: the {n,m} counting form used below appeared in drafts of SPARQL 1.1 and remains supported as an extension by some engines, such as Virtuoso, but was dropped from the final recommendation.) In this case, we specify a maximum path length of 1:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,1}
        ?inferredParent .
      }
      ==============

This produces results equivalent to the “direct” search (namely, direct parents only):

      ==============
      inferredParent

      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/Ontology
      http://wikipedia.org/wiki/Ontology_(information_science)
      ==============

However, by expanding our path length to two, we now can request the parents and grandparents for the ontology (information science) concept:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
          <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,2}
        ?inferredParent .
      }
      =============

This now gives us 15 results from the parental chain:

      ==============
      inferredParent

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://umbel.org/umbel/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/PropositionalConceptualWork
      http://wikipedia.org/wiki/Knowledge_representation
      http://sw.opencyc.org/concept/Mx4r4e_7xpGBQdmREI4QPyn0Gw
      http://umbel.org/umbel/rc/Ontology
      http://kbpedia.org/kko/rc/StructuredInformationSource
      http://kbpedia.org/kko/rc/ClassificationSystem
      http://wikipedia.org/wiki/Ontology
      http://sw.opencyc.org/concept/Mx4rv7D_EBSHQdiLMuoH7dC2KQ
      http://kbpedia.org/kko/rc/Technology-Artifact
      http://www.wikidata.org/entity/Q324254

      ==============

Similarly, we can expand our query request to a path length of 3, which gives us the parental chain of parents + grandparents + great-grandparents:

      ==============
      select ?inferredParent
      from <http://kbpedia.org/1.40/>
      where
      {
        <http://kbpedia.org/kko/rc/OntologyInformationScience>
          <http://www.w3.org/2000/01/rdf-schema#subClassOf>{,3}
        ?inferredParent .
      }
      =============

In this particular case, we do not add any further results for great-grandparents:

      ==============

      inferredParent

      http://kbpedia.org/kko/rc/OntologyInformationScience
      http://wikipedia.org/wiki/Ontology_(information_science)
      http://kbpedia.org/kko/rc/Ontology
      http://kbpedia.org/kko/rc/KnowledgeRepresentation-CW
      http://umbel.org/umbel/rc/KnowledgeRepresentation-CW
      http://kbpedia.org/kko/rc/PropositionalConceptualWork
      http://wikipedia.org/wiki/Knowledge_representation
      http://sw.opencyc.org/concept/Mx4r4e_7xpGBQdmREI4QPyn0Gw
      http://umbel.org/umbel/rc/Ontology
      http://kbpedia.org/kko/rc/StructuredInformationSource
      http://kbpedia.org/kko/rc/ClassificationSystem
      http://wikipedia.org/wiki/Ontology
      http://sw.opencyc.org/concept/Mx4rv7D_EBSHQdiLMuoH7dC2KQ
      http://kbpedia.org/kko/rc/Technology-Artifact
      http://www.wikidata.org/entity/Q324254
      ==============

Without a property path specification, our inferred request would produce the listing of 83 results shown by the Inferred tab on the KBpedia knowledge graph, as shown in the screen capture provided earlier.
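
For comparison, the unbounded one-or-more form of the same property path (standard SPARQL 1.1) retrieves the full transitive closure directly, which should correspond closely to that inferred listing:

==============
select ?inferredParent
from <http://kbpedia.org/1.40/>
where
{
  <http://kbpedia.org/kko/rc/OntologyInformationScience>
    <http://www.w3.org/2000/01/rdf-schema#subClassOf>+
  ?inferredParent .
}
==============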

The online knowledge graph does not use these property path restrictions in its standard query templates. But these examples show how it is possible, programmatically, to broaden or narrow our searches of the graph, depending on the relation chosen (subClassOf in this example) and the length of the specified property path.
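
The same mechanism also works in the opposite direction. The sketch below inverts the property with the ^ operator to walk down the hierarchy instead, retrieving children and grandchildren (assuming, for illustration, that the concept has subclasses in the graph and that the engine supports the {n,m} extension in combination with inverse paths):

==============
select ?inferredChild
from <http://kbpedia.org/1.40/>
where
{
  <http://kbpedia.org/kko/rc/OntologyInformationScience>
    ^<http://www.w3.org/2000/01/rdf-schema#subClassOf>{,2}
  ?inferredChild .
}
==============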

Many More Options and Potential for Control

This use case is but a small example of the ways in which SPARQL may be used to dial in or control the scope of queries posed to the knowledge graph. Besides all of the standard query options provided by the SPARQL standard, we may also remove duplicates, identify negated items, and search inverses, selected named graphs or selected graph patterns.
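
As a single hedged sketch of several of these devices working together: distinct removes duplicate bindings, the graph clause restricts the search to a selected named graph, and the filter negates the external Wikipedia parents seen in the earlier results:

==============
# Sketch: distinct direct parents from a selected named graph, with
# external Wikipedia mappings filtered (negated) out.
select distinct ?parent
where
{
  graph <http://kbpedia.org/1.40/>
  {
    <http://kbpedia.org/kko/rc/OntologyInformationScience>
      <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?parent .
    filter(!strstarts(str(?parent), "http://wikipedia.org/"))
  }
}
==============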

Beyond SPARQL and now using SWRL, we may also apply abductive reasoning and hypothesis generation to our graphs, as well as mimic the action of expert systems in AI through if-then rule constructs based on any structure within the knowledge graph. A nice tutorial with examples that highlights some of the possibilities in combining OWL 2 with SWRL is provided by [5].

A key use of inference is its application to natural language understanding and the extension of our data systems to include unstructured text as well as structured data. For this potential to be fully realized, it is important that we chunk (“parse”) our natural language using primitives that are themselves built upon logical foundations. Charles S. Peirce made many contributions in this area as well. Semantic grammars that tie directly into logic tests and reasoning would be a powerful addition to our standard semantic technologies. Revisions to the approach taken to Montague grammars may be one way to achieve this elusive aim. This is a topic we will likely return to in the months to come.

Finally, of course, inference is a critical method for testing the logic and consistency of our knowledge graphs as we add new concepts, make new relations or connections, or add attribute data to our instances. All of these changes need to be tested for consistency moving forward. Nurturing graphs by testing added concepts, entities and connections is an essential prerequisite to leveraging inferencing at run time as well.

This article is part of an occasional series describing non-machine learning use cases and applications for Cognonto’s KBpedia knowledge graph. Most center around the general use and benefits of knowledge graphs, but best practices and other applications are also discussed. Prior machine learning use cases, and the ones from this series, may be found on the Cognonto Web site under the Use Cases main menu item.

[1] See, for example, Markus Krötzsch, Frantisek Simancik, and Ian Horrocks, 2012. “A Description Logic Primer,” arXiv preprint, arXiv:1201.4089; and Franz Baader, 2009. “Description Logics,” in Sergio Tessaris, Enrico Franconi, Thomas Eiter, Claudio Gutierrez, Siegfried Handschuh, Marie-Christine Rousset, and Renate A. Schmidt, editors, Reasoning Web: Semantic Technologies for Information Systems – 5th International Summer School, volume 5689 of LNCS, pages 1–39. Springer, 2009.
[2] M.K. Bergman, 2015. “‘Natural Classes’ in the Knowledge Web,” in AI3:::Adaptive Information blog, July 13, 2015.
[3] M.K. Bergman, 2016. “Rationales for Typology Designs in Knowledge Bases,” in AI3:::Adaptive Information blog, June 6, 2016.
[4] Steve Harris and Andy Seaborne, eds., 2013. SPARQL 1.1 Query Language, World Wide Web Consortium (W3C) Recommendation, 21 March 2013; see especially Section 9 on property paths.
[5] Martin Kuba, 2012. “OWL 2 and SWRL Tutorial,” from Kuba’s Web site.

Posted by AI3's author, Mike Bergman Posted on March 15, 2017 at 12:10 pm in Cognonto, KBpedia, Searching, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/2030/uses-and-control-of-inferencing-in-knowledge-graphs/
Posted: October 24, 2016

Search Both Reference Concepts and Entities

We have expanded the search function for Cognonto’s KBpedia knowledge graph. When first released last month, the KBpedia search was limited to reference concepts (RCs) only. With today’s upgrade, all of KBpedia’s 20 million entities can now be searched.

Though you can start investigating the Knowledge Graph (KG) simply by clicking links, to really discover items of specific interest you will need to do a search. These search functions for the Knowledge Graph are described below.

Two Search Options

The Knowledge Graph (KG) may be searched for either reference concepts (RCs) or entities.

On the main KG page (and for all other pages in the KG section except for actual results pages), the search box has a dropdown list function next to the search button. Via this dropdown, you may select to search either Reference Concepts or Entities. Depending on your selection, that choice also shows in the title to the search box. Whatever choice you make in the dropdown list is retained until you select a different option. The default option when you first encounter the Knowledge Graph is to search Reference Concepts.

Once your search specification is entered in the search box, you must click the Search button to invoke the actual search.

If you pick the Reference Concepts main search option, there are two different behaviors available to you.

With Autocompletion

Concept Search Autocompletion

Note our search box has Reference Concepts as its title. As you type in the search box, the system’s autocompletion will return RC candidates in a dropdown list. Only preferred labels (the canonical “name”) and terminal URI strings will match for autocomplete. Semsets and terms in the RC’s description will not match.

Each newly entered character in the search box narrows the possible matching results. If your query string ceases prompting with RC candidates, there are no matches for the current substring query on preferred labels or URI fragments in the system.

There are some styling and naming conventions used to assign RC preferred labels and URI fragments; observing these over time may make your RC queries more effective. Realize, however, that there are multiple ways RCs can be named and referenced. If autocomplete does not match what you think the RC name might be, next try the search without autocompletion (see below).

If you do get matches to the query string, you will be presented with one or more live link options to specific RCs in the dropdown list box. Pick the option you are seeking to go to the RC Record for your specific reference concept of interest.

Without Autocompletion

The search without autocompletion is a broader one, and is often useful when you have been unable to formulate an effective query for the preferred label of an RC of interest. To conduct a search without autocompletion, simply provide a query string in the search box without picking one of the autocompletion dropdown prompts (should they occur). Clicking the Search Concepts button will take you to a standard search of the concepts in KBpedia. Doing so brings you to a paginated set of search results pages, with 20 results per page.

Concept Search Options

Once you are on a results listing page, your search options change. Again you are given a dropdown list box, whereby you may restrict the actual concept search to one of these fields:

  • Preferred label — this is the title or standard name for the reference concept
  • Alternative labels — these are semset labels for synonyms for the RC
  • Description — this is the definition or description field for the reference concept; most RCs have this field
  • URI — this is the actual Web identifier for the RC; search is based on a sub-string search of the URI, or
  • All content — all four fields above are included in the search.

When you pick a result from the search list, you are taken to the RC Record report (see the How to Use the Knowledge Graph page).

The standard search box gives you a dropdown where you can choose to conduct an Entities or Reference Concepts search (see first figure above). If you choose Entities, that is so indicated in the label to the search button.

When the Entities search is chosen, you have a choice to restrict the actual search to one of these fields:

Entity Search Options

  • Preferred label — this is the title or standard name for the entity
  • Description — this is the definition or description field for the entity; not all entities have this field
  • URI — this is the actual Web identifier for the entity; search is based on a sub-string search of the URI, or
  • All content — all three fields above are included in the search.

Whichever of these four choices you make, the selection appears as the title to the search box and remains in effect until you change the dropdown option. The default when you first invoke the Entities search is ‘All’.

To actually conduct the search, you need to click on the Search Entities button.

Depending on which optional field you selected, the results count will vary. Obviously, the largest number of results arise from the ‘All’ field choice.

Entities Search Results

Search results for entities generally present fewer details than those for RCs, as this figure shows:

Entities Search Results

 

Some results are limited to a title (prefLabel). Others may include altLabels or descriptions, depending on the nature of the record. Each result has an icon to the upper right indicating the source of the entity record. Clicking that icon (background highlighted text) presents the standard entity results listing as described on the How to Use the Knowledge Graph page.

Enjoy!

Posted by AI3's author, Mike Bergman Posted on October 24, 2016 at 8:45 am in KBpedia, Searching | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/1991/kbpedia-knowledge-graph-gets-expanded-search/
Posted: February 11, 2013

The Semantic Enterprise: Part 5 in the Enterprise-scale Semantic Systems Series

We become such captives to our language and what we think we are saying. Many basic words or concepts, such as “search,” seem to fit into that mould. A weird thing about “search” is that twenty years ago the term and its prominence were quite different. Today, “search” is ubiquitous, and for many its embodiment is Google, such that “to google” has become shorthand for “to search.”

When we actually do “search”, we submit a query. The same forces that have insinuated their tendrils into search have also caused the idea of a query to become synonymous with standard text (Google) search.

But, there’s more, much more, both within the meaning and the execution of “to search”.

Enterprises, familiar with structured query language (SQL), have understood for quite some time that queries and search were more than text searches to search engines. Semantic technologies have their own structured query approach, SPARQL. Enterprises know the value of search from discovery to research and navigation. And they also intuitively know that they waste much time and do not often get what they want from search. U.S. businesses alone could derive $33 billion in annual benefits from being better able to re-find previously discovered Internet information, an amount equal to $10 million per firm for the 1,000 largest businesses [1]. And those benefits, of course, are only for Internet searches. There are much larger costs arising from unnecessary duplicated effort because of weaknesses in internal search [1].

The thing that’s great about semantic search — done right — is it combines conventional text search with structured search, adds more goodies, and basically overcomes most current search limitations.

Many Kinds of Search

The Webster definition of “search” is “to look into or over carefully or thoroughly in an effort to find or discover something.”

There are two telling aspects to this definition. First, search may be either casual or careful, from “looking” into something to being “thorough”. Second, search may have as its purpose finding or discovery. Finding, again, implies purpose or research. Discovery can range from serendipity to broadening one’s understanding or horizons given a starting topic.

Prior to relational systems, network databases represented the state of the art. One of E.F. Codd’s stated reasons for developing the relational approach and its accompanying SQL query language was to shift the orientation of databases from links and relationships (the network approach) to query and focused search [2]. By virtue of the technology design put forward, relational databases shifted the premise to structured information and direct search queries. Yet, as noted, this only represents the purposeful end of the search spectrum; navigation and discovery became secondary.

Text search and (text) search engines then came to the fore, offering a still-different model of indexing and search. Each term became a basis for document retrieval, leading to term-based means of scoring (the famous Salton TF-IDF statistical model), but with no actual understanding of the semantic structure or meaning of the document. Other term-based retrieval approaches, such as latent semantic indexing, were put forward, but these were based on the statistical relationships between terms in documents, not the actual meaning of the text or natural language within the documents.

What we see in the early evolution of “search” is kind of a fragmented mess. Structured search swung from navigation to purposeful queries. Text search showed itself to be term-based and reliant on Boolean logic. Each approach and information store thus had its own way to represent or index the data and a different kind of search function to access it. Web search, with its renewal of links and relationships, further shifted the locus back to the network model.

State-of-the-art semantic search, as practiced by Structured Dynamics, has found a way to combine these various underlying retrieval engines with the descriptive power of the graph and semantic technologies to provide a universal search mechanism across all types of information stores. We describe this basis more fully below, but what is important to emphasize at the outset is that this approach fundamentally addresses all aspects of search within the enterprise. As a compelling rationale for trying and then adopting semantic technologies, semantic search is the primary first interest for most enterprises.

Unique Advantages to Semantic Search

The first advantage of semantic search is that all content within the organization can be combined and searched at once. Structured stuff . . . documents . . . image metadata . . . databases . . . can now all be characterized and put on an equivalent search footing. As we just discussed regarding text as a first-class citizen, this power of indexing all content types is the real dynamo underneath semantic search.

The universality of search, being able to search all available content at once, is powerful in itself. But adding the dimension of relationships between things means that the semantic graph takes information exploration to a totally new level.

The simplest way to understand semantic search is to de-construct the basic RDF triple down to its fundamentals. This first tells us that the RDF data model is able to represent any thing, that is, an object or idea. And, we can represent that object in virtually any way that any viewer would care to describe it, in any language. Do we want it to be big, small? blue, green? meaningful, silly? smart, stupid? The data model allows this and more. We can capture how diverse users describe the same thing in diverse ways.

But, now that I have my world populated with things and descriptions of them, how do they connect? What are the relationships between these things? It is the linkages — the connections, the relationships — between things that give us context, the basis for classifying, and as a result, the means to ascertain the similarity or adjacency of those things. These sorts of adjacencies then enable us to understand the “pattern” of the thing, which is ultimately the real basis for organizing our world.

The rich brew of things (‘nouns”) and the connections between them (“verbs”) starts to give our computers a basis for describing the world more akin to our real language. It is not perfect, and even if it were, it would still suffer from the communication challenges that occur between all of us as humans. Language itself is another codified way of transmitting messages, which will always suffer some degree of loss [3]. But in this comment we can also glean a truth: humans interacting with their computer servants will be more effective the more “human” their interfaces are. And this truth can also give us some insight into what search must do.

First, we are interested in classifying and organizing things. The idea of “facets”, the arrangement of search results into categories based on indexed terms, is not a new one in search. In conventional approaches, “facets” are approached as a kind of dimension, one that is purposefully organized, sometimes hierarchically. In Web interfaces, facets most often appear as a listing in a left-hand column from which one or more of these dimensions might be selected, sometimes with a count number of potential results after the facet or sometimes with a checkbox or such by which multiple of these facets might be combined. In essence, these facets act as structural or classificatory “filters” for the content at hand. This is made potentially more powerful when also combined with basic keyword search.

In semantic search, facets may be derived from not only what types of things exist in the search space, but also what kinds of attributes (or properties) connect them. And, this all comes for free. Unlike conventional faceting, no one needs to decide what are the important “dimensions” or any such. With semantic search, the very basis of describing the domain at hand creates an organization of all things in the space. As a result of semantic search, this combination of entities and properties leads to what could be called “global faceting”. The structure of how the domain is described is the sole basis required to gain — and make universal to the information space — these facets.
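
In SPARQL terms, such global facets fall out of simple aggregate queries over the graph’s own typing structure; a minimal sketch:

==============
# Sketch: facet counts derived solely from how the domain is described;
# the same pattern can be repeated per property to facet on attributes.
select ?type (count(?s) as ?count)
where
{
  ?s a ?type .
}
group by ?type
order by desc(?count)
==============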

Whoa! How did that happen? All we did is describe our information space, but now we have all of this rich structure. This is when the first important enterprise realization sets in: how we describe the information in our domain is the driving, critical factor. Semantic search is but easy pickings from this baseline. What is totally cool about the nature of semantic search is that slicing-and-dicing would put a sushi restaurant to shame. Every property represents a different pathway; and every entry (node) is an entry point.

Second, because we have based all of this on an underlying logic model in description logics, we gain a huge Archimedes’ lever over our information space. We do not need to state all of the relationships and organizations in our information space; we can infer them from the assertions already made. Two parents have a child, and that child has a sibling? Then we can infer that the sibling has the same parents. The “facts” that one might assume about a given domain can grow by 10x or more when inference is included.
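
Expressed as a rule, the sibling example might look like the following CONSTRUCT sketch (the ex: namespace and property names are hypothetical, for illustration only):

==============
# Sketch: if a child with known parents has a sibling, infer that the
# sibling shares those same parents.
prefix ex: <http://example.org/>

construct { ?sibling ex:hasParent ?parent . }
where
{
  ?child ex:hasParent ?parent ;
         ex:hasSibling ?sibling .
}
==============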

Now we can begin to see where the benefits and return from semantic search becomes evident. Semantic search also enables a qualitatively different content enrichment: we can use these broad understandings of our content to do better targeting, tagging, highlighting or relating concepts to one another. The fact that semantic search is simply a foundation to semantic publishing is noteworthy. We will discuss this topic in a later part to this series.

SD’s Approach: RDF Triple Store + Solr + OWL API

In recognition of the primacy of search, we at Structured Dynamics were one of the first in the semantic Web community to add Solr (based on Lucene) full-text indexing to the structured search of an RDF triple store [4]. We later added the OWL API to gain even more power in our structured queries [5]. These three components give us the best of unstructured and structured search, and enable us to handle all kinds of search with additional flexibility at scale. Since we historically combined RDF and Solr first, let’s discuss it first.

We first adopted Solr because traditional text search of RDF triple stores is not sufficiently performant and makes it difficult to retrieve logical (user) labels in place of the URIs used in semantic technologies. While RDF and its graph model provide manifest benefits (see below), text search is a relatively mature technology and Solr provided commercial-grade features and performance in an open source option.

In our design, the triple store is the data orchestrator. The RDF data model and its triple store are used to populate the Solr schema index. The structural specifications (schema) in the triple store guide the development of facets and dynamic fields within Solr. These fields and facets in Solr give us the ability to gain Solr advantages such as aggregates, autocompletion, spell checkers and the like. We also are able to capture the full text if the item is a document, enabling standard text search to be combined with the structural aspects orchestrated from the RDF. On the RDF side, we can leverage the schema of the underlying ontologies to also do inferencing (via forward chaining). This combination gives us an optimal search platform to do full-text search, aggregates and filtering.
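
A simplified sketch of the kind of SPARQL used to stage records for the Solr index follows; the field choices are illustrative, not SD’s actual schema:

==============
# Sketch: pull the label, type and (optional) definition per resource,
# one result row per Solr document to be indexed.
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select ?s ?label ?type ?definition
where
{
  ?s skos:prefLabel ?label ;
     a ?type .
  optional { ?s skos:definition ?definition . }
}
==============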

Since our initial adoption of Solr, and Solr’s own continued growth, we have been able to (more-or-less) seamlessly embrace geo-locational search, time-based search, the use of multiple search profiles, and ranking and scoring approaches (using Solr’s powerful Extended DisMax (edismax) query parser), among other advantages. We now have nearly five years of experience with the RDF + Solr combination. We continue to discover new functionality and power in this combination. We are extremely pleased with this choice.

On the structured data side, RDF and its graph model have many inherent advantages, as earlier described. One of those advantages is the graph structure itself:

Example Taxonomy Structure vs. Example Ontology Structure

A distinguishing characteristic of ontologies compared to conventional hierarchical structures is their degree of connectedness, their ability to model coherent, linked relationships

Another advantage over conventional structured search (SQL) with relational databases is performance. For example, as Rik Van Bruggen recently explained [6], RDBMS searches that need to obtain information from more than one table require a “join.” The indexes in all applicable tables need to be scanned recursively to find all the data elements fitting the query criteria. Conversely, in a graph database, the index needs only be accessed once to find the starting point in the graph, after which the relationships in the graph are “walked” to traverse the graph to find the next applicable data elements. The need for complete scans is what makes “joins” computationally expensive. Graph queries are incredibly fast because index lookups are hugely reduced.

Queries that experienced DBAs with relational databases would never attempt because of the excessive need for joins are trivial in a graph search.

Various graph databases provide canned means for traversing or doing graph-based operations. And that brings us to the second addition we made to the RDF triple store: inclusion of the OWL API. While it is true that our standard triple store, Virtuoso, supports simple inferencing and forward chaining, the fact that our semantic technologies are based on OWL 2 means that we can bring more power to bear with an ontology-specific API, including reasoners. The OWL API allows all or portions of the ontology specification to be manipulated separately, with a variety of serializations. Changes made to the ontology can also be tested for validity. Most leading reasoners can interact directly with the API. Protégé 4 also interacts directly with the API, as can various rules engines. Additionally, other existing APIs, notably the Alignment API with its own mapping tools and links to other tools such as S-Match, can interact with the OWL API.

Thus, besides the advantages of RDF and graph-based search, we can now reason over and manipulate the ontologies themselves to bring even more search power to the system. Because of the existing integrations between the triple store and Solr, these same retrieval options can also be used to inform Solr query retrievals.

Shaking Hands with the Enterprise

On the face of it, a search infrastructure based on three components — triple store + Solr + OWL API — appears more complex than a single solution. But, enterprises already have search provided in many different guises involving text or SQL-based queries. Structured Dynamics now has nearly five years’ experience with this combined search configuration. Each deployment results in better installation and deployment procedures, including scripting and testing automation. The fact there are three components to the search stack is not really the challenge for enterprise adoption.

This combined approach to search really poses two classes of challenges to the enterprise. The first, and more fundamental one, is the new mindset that semantic search requires. Facets need to be understood and widely embraced; graphs and graph traversals are quite new concepts; full incorporation of tagging to make text a first-class citizen with structured search needs to be embraced; and, the pivotal role of ontologies in driving the whole structural understanding of the domain and all the various ways to describe it means a shift in thinking from dedicated applications for specific purposes to generic ontology-driven applications. These new mindsets require concerted knowledge transfer and training. Many of the new implementers are now the subject matter experts and content editors within the enterprise, rather than developers. Dedicated effort is also necessary — and needs to be continually applied — to enable ontologies to properly and adaptively capture the enterprise’s understanding of its applicable domain.

These are people-oriented aspects that require documentation, training materials, tools and work processes. These topics, actually some of the most critical to our own services, are discussed in later parts to this ESSS series.

The second challenge is in the greater variability and diversity of the “dials and knobs” now available to the enterprise to govern how these search capabilities actually work. The ranking of search results can now embrace many fields and attributes; many different types of content; and potentially different contexts. Weights (or “boosts” in Solr terms) can be applied to every single field involved in a search. Fields may be included or excluded in searches, thereby acting as filters. Different processors or parsers may be applied to handle such things as text case (upper or lower), stemming for dealing with plurals and variants, spelling variants such as between British and American English, invoking or not synonyms, handling multiple languages, and the like.

This level of control means that purposeful means and frameworks must be put in place that enable responsible managers in the enterprise to decide such settings. Understanding of these “dials and knobs” must therefore also be transferred to the enterprise. Then, easily used interfaces for changing and selecting options and then comparing the results of those changes must be embedded in tools and transferred. (This latter area is quite exciting and one area of innovation SD will be reporting on in the near future.)

The Productivity Benefits

There are actually many public Web sites that are doing fantastic and admirable jobs of bringing broad, complicated, structured search to users, all without much if any semantic technologies in the back end. Some quick examples that come to mind are Trulia in real estate; Fidelity in financial products; Amazon in general retail, etc. One difficulty that semantic search has in comparison to the alternatives is that first-blush inspection of Web sites may not show many large differences.

The real advantages of semantic search come in its productivity and flexibility. Semantic search frameworks are easier to construct, easier to extend, easier to modify and cheaper to build. Semantic search frameworks are inherently robust. Adding entirely new domains of scope — say, moving from a department level to the entire enterprise, or accommodating a new acquisition — can be implemented in a fraction of the time without the need for rework.

It will be necessary to document the use case experience of early adopting enterprises to quantify these productivity and flexibility benefits. From Structured Dynamics’ experience, however, these advantages are in the range of one to two orders of magnitude in reduced deployment and maintenance costs compared to RDBMs-based approaches.

The Tie-in with Semantic Publication

Another hot topic of late has been “semantic publishing,” which is of keen interest to media and content-intensive sites on the Web. What is interesting about semantic publishing, however, is that it is completely founded on semantic search. All of the presentation or publishing of content in the interface (or in an exported form) is the result of search. Remember, due to Structured Dynamics’ semantic technology design with its structWSF interfaces, all interaction with the underlying engines and system occurs via queries.

We will be talking much about semantic publishing toward the conclusion of this series. We will cover content enrichment, new kinds of products such as topic pages and semantic forms and widgets, and the fact that semantic publishing is available almost for “free” when your stack is based on semantic technologies with semantic search, SD-style.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

[1] M.K. Bergman, 2004. “Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents,” BrightPlanet Corporation White Paper, December 2004, 41 pp. Published on this blog at https://www.mkbergman.com/82/untapped-assets-the-3-trillion-value-of-us-enterprise-documents/.
[2] See, for instance, the Wikipedia entry on the historical development of databases.
[3] M.K. Bergman, 2012. “What is Structure?,” AI3:::Adaptive Information blog, May 28, 2012.
[4] F. Giasson, 2009. “RDF Aggregates and Full Text Search on Steroids with Solr,” Fred Giasson’s blog, April 9, 2009.
[5] M.K. Bergman, 2010. “A New Landscape in Ontology Development Tools,” AI3:::Adaptive Information blog, September 7, 2010.
[6] See, for example, Rik Van Bruggen, 2013. “Demining the ‘Join Bomb’ with Graph Queries,” Neo4J blog, January 28, 2013.
Posted: January 28, 2013

The Semantic Enterprise: Part 4 in the Enterprise-scale Semantic Systems Series

Text, text everywhere, but no information to link!

For at least a quarter of a century the amount of information within an enterprise embedded in text documents has been understood to be on the order of 80%; more recent estimates put that contribution at 90%. But, whatever the number, or no matter how you slice it, the percentage of information in documents has been overwhelming for enterprises.

The first document management systems, Documentum being a notable pioneer, helped keep track of versions and characterized their document stores with some rather crude metadata. As document management systems evolved — and enterprise search became a go-to application in its own right — full-text indexing and search were added to characterize the document store. Search allowed better access and retrieval of those documents, but still kept documents as a separate information store from the true first citizens of information in enterprises — structured databases.

That is now changing — and fast. Particularly with semantic technologies, it is now possible to “tag” or characterize documents not only in terms of administrative and manually assigned tags, but with concepts and terminology appropriate to the enterprise domain.

Early systems tagged with taxonomies or thesauri of controlled vocabulary specific to the domain. Larger enterprises also often employ MDM (master data management) to help ensure that these vocabularies are germane across the enterprise. Yet, even still, such systems rarely interoperate with the enterprises’ structured data assets.

Semantic technologies offer a huge leverage point to bridge these gaps. Being able to incorporate text as a first-class citizen into the enterprise’s knowledge base is a major rationale for semantic technologies.

Explaining the Basis

Let’s start with a couple of semantic givens. First, as I have explained many times on this blog, ontologies — that is, knowledge graphs — can capture the rich relationships between things for any given domain. Second, this structure can be more fully expressed via expanded synonyms, acronyms, alternative terms, alternative spellings and misspellings, all in multiple languages, to describe the concepts and things represented in this graph (a construct we have called “semsets”). That means that different people talking about the same thing with different terminology can still communicate. This capability is an outcome of following SKOS-based best practices in ontology construction.

Then, we take these two semantic givens and stir in two further ingredients from NLP. We first prepare the unstructured document text with parsing and other standard text processing. These steps are also a precursor to search; they provide the means for natural language processing to obtain the “chunks” of information in documents as structured data. Then, using the ontologies with their expanded SKOS labels, we add the next ingredient of OBIE (ontology-based information extraction) to automatically “tag” candidate items in the source text.
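
A sketch of the kind of lookup that feeds such a tagger, gathering every label variant (the semset) per concept via a property-path alternation, might be:

==============
# Sketch: retrieve all label variants for each concept so candidate
# mentions in the parsed text can be matched against the full semset.
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select ?concept ?label
where
{
  ?concept skos:prefLabel|skos:altLabel|skos:hiddenLabel ?label .
  filter(lang(?label) = "en")
}
==============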

Editors are presented these candidates to accept or reject, plus to add others, in review interfaces as part of the workflow. The result is the final subject “tags” assignment. Because it is important to tag both subject concepts or named entities in the candidate text, Structured Dynamics calls this approach “scones”. We have reusable structures and common terminology and syntax (irON) as canonical representations of these objects.

Add Conventional Metadata

Of course, not all descriptive information you would want to assign to a document is only what it is about. Much other structural information describing the document goes beyond what it is about.

Some of this information relates to what the document is: its size, its format, its encoding. Some of this information relates to provenance: who wrote it? who published it? when? when was it revised? And, some of this information relates to other descriptive relationships: where to download it? a picture of it; other formats of it. Of course, any additional information useful to describe the document can be also tagged on at this point.

This latter category is quite familiar to enterprise information architects. These metadata characterizations have been what is common for standard document management systems reaching back for three decades or more now.

So, naturally, this information has proven the test of time and also must have a pathway for getting assigned to documents. What is different is that all of this information can now be linked into a coherent knowledge graph of the domain.

Some Interface and Workflow Considerations

What we are seeking is a framework and workflow that naturally allows all existing and new documents to be presented through a pipeline that extends from authoring and review to metadata assignments. This workflow and the user interface screens associated with it are the more difficult aspects of the challenge. It is relatively straightforward to configure and set up a tagger (though, of course, better accuracy and suitability of the candidate tags can speed overall processing time). Making final assignments for subject tags from the candidates and then ensuring all other metadata are properly assigned can be either eased or impeded by the actual workflows and interfaces.

The trick to such semi-automatic processes is to get these steps right. There are the needs for manual overrides when the suggested, candidate tags are not right. Sometimes new terms and semset entries are found when reviewing the processed documents; these need to be entered and then placed into the overall domain graph structure as discovered. The process of working through steps on the tag processing screens should be natural and logical. Some activities benefit from very focused, bespoke functionality, rather than calling up a complicated or comprehensive app.

In enterprise settings these steps need to be recorded, subject to reviews and approvals, and with auditing capabilities should anything go awry. This means there needs to be a workflow engine underneath the entire system, recording steps and approvals and enabling things to be picked up at any intermediate, suspended point. These support requirements tend to be unique to each enterprise; thus, an underlying workflow system that can be readily modified and tailored — perhaps through scripting or configuration interfaces — is favored. Since Drupal is our standard content and user interface framework, we tend to favor workflow engines like State Machine over more narrow, out-of-the-box setups such as the Workflow module.

These screens and workflows are not integral to the actual semantic framework that governs tagging, but are essential complements to it. It is but another example of how the semantic technologies in an enterprise need to be embedded and integrated into a non-semantic environment (see the prior architecture piece in this series).

But, Also Some Caveats

Yet, what we have described above is the technology and process of assigning structured information to documents so that they can interoperate with other data in the enterprise. Once linked into the domain’s knowledge graph and once characterized by the standard descriptive metadata, there is now the ability to search, slice, filter, navigate or discover text content just as if it were structured data. The semantic graph is the enabler of this integration.

Thus, the entire ability of this system to work derives from the graph structure itself. Creating, populating and maintaining these graph structures can be accomplished by users and subject matter experts from within the enterprise, but that requires new training and new skills. It is impossible to realize the benefits of semantic technologies without knowledgeable editors to maintain these structures. Because of its importance, a later part in this series deals directly with ontology management.

While ontology development and management are activities that do not require programming skills or any particular degrees, they do not happen by magic. Concepts need to be taught; tools need to be mastered; and responsibilities need to be assigned and overseen to ensure the enterprise’s needs are being met. It is exciting to see text become a first-class information citizen in the enterprise, but like any purposeful human activity, success ultimately depends on the people involved.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.
Posted: January 14, 2013

The Semantic Enterprise: Part 2 in the Enterprise-scale Semantic Systems Series

Those involved with the semantic Web are passionate as to why they are involved. This passion and the articulateness behind it are notable factors in why there is indeed a ‘semantic Web community.’ Like few other fields — perhaps including genomics or 3D manufacturing — semantic technologies tend to attract exceptionally smart, committed and passionate people.

Across this spectrum of advocates there are thousands of pages of PDFs and academic treatises as to semantic this or semantic that. There is gold in these hills, and much to mine. But, both in grants and in approaching customers, it always comes down to the questions of: What is the argument for semantic technologies? What are the advantages of a semantic approach? What is the compelling reason for spending time and money on semantics as opposed to alternatives?

Fred Giasson and I at Structured Dynamics feel we have done a pretty fair job of answering these questions. Of course, it is always hard to prove a negative — how do the arguments we make stack up against those we have not made? We will never know.

Yet, on the other hand, we have found dedicated customers and steady and growing support from the arguments we do make. At least we know we are not scaring potential customers away. Frankly, we suspect our market arguments are pretty compelling. While we discuss many aspects of semantic technologies in our various writings and communications, we have also tended to continually hone and polish our messages. We keep trying to focus. Fewer points are better than more, and points that resonate with the market — those that address the “pain points” in common parlance — have the greatest impact.

It is also obvious that the arguments an academic needs to make to a funding agency or commission are much different than what is desired by commercial customers. (Not to mention the US intelligence community, which is the largest — yet silent — funder of semantic technologies.) Much of what one can gain from the literature is more of this academic nature, as are most discussions on mailing lists and community fora. We distinctly do not have the academic perspective. Our viewpoint is that of the enterprise, profit-making or non-profit. Theory takes a back seat to pragmatics when there are real problems to solve.

Our three main selling points to this enterprise market relate to data integration and interoperability; search and discovery; and leveraging existing information assets with low risk. How we paint a compelling picture around these topics is discussed for each point below. We conclude with some thoughts on how and in what manner we communicate these arguments, background that others might find useful in making such arguments themselves.

“Semantic Technologies Enable Data Integration and Interoperability”

As I have experienced firsthand and have argued many times [1], the Holy Grail of enterprise information technology over the past thirty years has been achieving true data integration and interoperability. It is, I believe, the primary motivating interest for almost all IT efforts not directly related to conventional transaction systems. Yet, because of this longstanding and abiding interest, enterprise IT managers react with justifiable skepticism every time new advances in interoperability are claimed.

The claims for semantic technologies are not an exception. But, even in its positioning, there is something in the descriptive phrasing of “semantic technologies” that resonates with the market. Moreover, to overcome the initial skepticism, we also tend to emphasize two bolstering arguments promoting interoperability:

  1. Semantic technologies matched with natural language processing (NLP) techniques work to integrate unstructured data, finally incorporating the 80% of enterprise information locked up in documents and overcoming the limitations of manually assigned tags, and
  2. The RDF data model is capable of capturing any existing data relationship, and ontologies are capable of capturing any existing information schema (see the sketch just below).
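The second point can be demonstrated in a few lines. The sketch below, again using Python’s rdflib with a hypothetical vocabulary, shows how a row from a conventional relational table decomposes into RDF triples, after which relationships the original schema never anticipated can be added without any migration:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.com/schema/")  # hypothetical vocabulary
g = Graph()

# A row from a conventional table, employees(id, name, dept),
# decomposes naturally into subject-predicate-object triples
row = {"id": "e7", "name": "Pat Lee", "dept": "d3"}
emp = EX["employee/" + row["id"]]
g.add((emp, EX.name, Literal(row["name"])))
g.add((emp, EX.memberOf, EX["dept/" + row["dept"]]))

# A relationship the original schema never anticipated is simply
# another triple; nothing that came before needs to change
g.add((emp, EX.mentors, EX["employee/e9"]))

print(g.serialize(format="turtle"))
```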

Since these are two of the core aspects of data integration, have heretofore been limited under conventional approaches, and can be demonstrated rather quickly, trust can be placed in the ultimate interoperability argument.

In the end, the ability of semantic technologies to promote rather complete data integration and interoperability will prove to be their most compelling rationale. Yet achieving this will require more time and broader scope than what has been instituted to date. By starting smaller and simpler, a more credible entry argument can be made that is also on the direct pathway to interoperability benefits.

“Semantic Technologies Improve Search and Discovery”

On the face of it, search engines and the search function are nearly ubiquitous. Further, search is generally effective in eventually finding information of interest, though sometimes the process of getting there is lengthy and painful.

This inefficiency results because search has three abiding problems. First, there is too much ambiguity in what kind of thing is being requested; disambiguation to the context at hand is lacking. Second, there is a relative lack of richness in the kinds of relationships between things that are presented. We are learning through Web innovations like Wikipedia or the Google Knowledge Graph that there are many attributes that can be related to the things we search. The natural desire is now to see such relationships in enterprise search as well, including some of this public, external content. And, third, because of the first two factors, search is not yet an adequate means for discovering new insights and knowledge. We see the benefits of serendipitous discovery, but we have not yet learned how to do this with purpose or in a repeatable way.

More often than not, customers see search, with better display of results, at the heart of the budget rationale for semantic projects. The graph structure of semantic schema means that any node can become an entry point to the knowledge space for discovery. The traversal of information relationships occurs via the selection of the predicates or properties that create this graph structure in the first place. This richness of characterization also means we can query or traverse this space in multiple languages, or via the full spectrum by which we describe or characterize things. Semantic knowledge graphs offer a potential explosion of richness in how things are characterized and in how those characterizations get made and referred to by any stakeholder. Search structure need not be preordained by some group of designers or information architects, but can actually reflect its user community. It should not be surprising that search offers the quickest and most visible path to conveying the benefits of semantic technologies.
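A small sketch can make the entry-point idea tangible. Using rdflib and invented names once more, any node that a user’s search term resolves to, through labels in whatever language, can serve as the start of a traversal:

```python
from rdflib import Graph, Literal, Namespace, RDFS

EX = Namespace("http://example.com/kb/")   # hypothetical namespace
g = Graph()
g.add((EX.SolarPanel, RDFS.label, Literal("solar panel", lang="en")))
g.add((EX.SolarPanel, RDFS.label, Literal("panneau solaire", lang="fr")))
g.add((EX.SolarPanel, EX.partOf, EX.SolarArray))
g.add((EX.SolarArray, EX.locatedIn, EX.PlantAlpha))

# Enter the graph at whichever node the user's term (English or French)
# resolves to, then walk outward along its predicates
def neighbors(node):
    for _, pred, obj in g.triples((node, None, None)):
        yield pred, obj

entry = EX.SolarPanel
for pred, obj in neighbors(entry):
    print(pred, "->", obj)
```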

These arguments, too, represent a relatively quick win. We can rapidly put in place the semantic structures that make improved search benefits evident. There are two nice things about this argument. First, it is not necessary to comprehensively capture the full knowledge domain of the customer’s interests to show these benefits; relatively bounded projects or subsets of the domain are sufficient to show the compelling advantages. And, second, as this initial foothold gets expanded, the basis for the next argument also becomes evident.

“Semantic Technologies Leverage Existing Assets with Low Risk”

I have often spoken about the incremental nature of how semantic technologies might be adopted and the inherent benefits of the open world mindset. This argument is less straightforward to make since it requires the market to contemplate assumptions they did not even know they had.

But, one thing the market does know is the brittleness and (often) high failure rates of knowledge-based internal IT projects. An explication of these causes of failure can help, via the inverse, to make the case for semantic technologies.

We know (or strongly suspect), for example, that these are typically the causes of knowledge-based IT failures:

  • Too broad a scope, or the need to embrace too much of the information basis of the domain
  • Changing knowledge and circumstances that cause initial design imperatives to shift over the course of a project
  • High visibility for multiple audiences and stakeholders, with no workable means for finding a common view or consensus as to objectives (let alone terminology) for the project amongst these stakeholders

Getting recognition for these types of failures or challenges creates the opening for discussing the logic underpinnings of conventional IT approaches. The conventional closed-world approach, which is an artifact of using information systems developed for transaction and accounting purposes, is unsuited to open-ended knowledge purposes. The argument and justification for semantic technologies for knowledge systems is that simple.

The attentive reader will have seen that the first two arguments presented above already reify this open world imperative. The integration argument shows the incorporation of unstructured content as a first-class citizen into the information space. The search argument shows increased scale and richness of relationships as new topics and entities get added to the search function, all without adversely impacting any of the prior work or schema. For both arguments, we have expanded our scope and schema alike without needing to re-architect any of the semantic work that preceded it. This is tangible evidence for the open world argument in the context of semantic technologies applied to knowledge problems.
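This additive behavior can even be shown in miniature. In the hypothetical rdflib sketch below, expanding the graph’s scope leaves every prior query answer intact while new answers simply accrue:

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.com/kb/")   # hypothetical namespace
g = Graph()
g.add((EX.Pump, RDFS.subClassOf, EX.Equipment))
g.add((EX.p1, RDF.type, EX.Pump))

# An existing query over the original scope of the graph
q = """
SELECT ?x WHERE {
  ?x rdf:type ?c .
  ?c rdfs:subClassOf* ex:Equipment .
}
"""
ns = {"ex": EX, "rdf": RDF, "rdfs": RDFS}
before = set(g.query(q, initNs=ns))

# Under the open world assumption, new scope is purely additive:
# no schema migration, no re-architecting of what came before
g.add((EX.Turbine, RDFS.subClassOf, EX.Equipment))
g.add((EX.t1, RDF.type, EX.Turbine))

after = set(g.query(q, initNs=ns))
assert before <= after   # prior answers preserved; new ones accrue
```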

This evidence, plus the fact that we have been incorporating ever more sources of information of varied structure, most of which already exist within the enterprise’s information assets, shows that semantic technologies can deliver benefits from existing assets at low risk. At this point, if we have told our story well, it should be evident that the semantic approach can be expanded at whatever pace and scope the enterprise finds beneficial, all without impacting what has been previously implemented.

Actually, the argument that semantic technologies leverage existing assets with low risk is perhaps the most revolutionary of the three. Most prior initiatives in the enterprise knowledge space have required wholesale changes or swapping out of existing systems. The unique contribution of semantic technologies is that they can achieve their benefits as a capability layered over existing assets, all without disruption to their existing systems and infrastructure. The degree to which this layering takes place can be driven solely by available budgets with minimal risk to the enterprise.

Ambassadors and Archivists, as well as Entrepreneurs

There are, of course, other messages that can be made, and we ourselves have made them in other circumstances and articles. The three main arguments listed herein, however, are the ones we feel are most useful at the time of early engagement with the customer.

Our messages and arguments gain credibility because we are not just trying to “sell” something. We understand that semantic technologies and the mindsets behind them are not yet commonplace. We need to be ambassadors for our passion and work to explain these salient differences to our potential markets. As later parts in this series will discuss, with semantic technologies, one needs to constantly make the sale.

The best semantic technology vendors understand that market education is a core component to commercial success. Once one gets beyond the initial sale, it is a constant requirement to educate the customer with the next set of nuances, opportunities and technologies.

We acknowledge that vendors have other ways to generate “buzz” and “hotness,” and we certainly see the consumer space filled with all sorts of silliness and bad business models. But our pragmatic approach is to back up our messaging with full documentation and market outreach. We write much and contribute much, all of which we document on vehicles such as our blogs, commercial Web site, or TechWiki knowledge base. New market participants need to learn and need to be armed with material and arguments for their own internal constituencies. Insofar as we are the agents making these arguments, we also get perceived as knowledgeable subject matter experts in the semantic technology space.

I have talked in my Of Flagpoles and Fishes article of the challenges of marketing to a nascent market where most early sales prospects remain hidden. At this stage in the market, our best approach is to share and communicate with new market prospects in a credible and helpful way. Then, we hope that some of those seeking more information are also in a position to commission real work. If we are at all instrumental in those early investigations, we are likely to be considered as a potential vendor to fulfill the commercial need.

Of course, each new engagement in the marketplace means new lessons and new applications. Thus, too, it is important that we become archivists as well. We need to capture those lessons and feed them back to the marketplace in a virtuous circle of learning, sharing, and further market expansion. Targeted messages delivered by credible messengers are the keys to unlocking the semantic technologies market.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

[1] Simply conduct a search on https://www.mkbergman.com/?s=interoperability+integration to see how frequently this topic is a focus of my articles.