Posted: July 16, 2014

Battle of Niemen, WWI, photo from Wikimedia

Are We Losing the War? Was it Even the Right One?

Cinephiles will readily recognize Akira Kurosawa’s Rashomon film of 1951. And, in the 1960s, one of the most popular book series was Lawrence Durrell’s The Alexandria Quartet. Both, each in its own way, tried to get at the question of what is truth by telling the same story from the perspective of different protagonists. Whether you saw this movie or read these books you know the punchline: the truth was very different depending on the point of view and experience (including self-interest and delusion) of each protagonist. All of us recognize this phenomenon of the blind men’s view of the elephant.

I have been making my living and working full time on the semantic Web and semantic technologies now for a full decade. So has my partner at Structured Dynamics, Fred Giasson. Others have certainly worked longer in this field. The original semantic Web article appeared in Scientific American in 2001 [1], and the foundational Resource Description Framework data model dates from 1999. Fred and I have our own views of what has gone on in the trenches of the semantic Web over this period. We thought a decade was a good point to look back, share what we’ve experienced, and discover where to point our next offensive thrusts.

What Has Gone Well?

The vision of the semantic Web in the Scientific American article painted a picture of globally interconnected data leveraged by agents or bots designed to make our lives easier and more automated. However, by the time that I got directly involved, nearly five years after standards first started to be published, Tim Berners-Lee and many leading proponents of RDF were beginning to shift focus to linked data. The agents, and automation, and ontologies of the initial vision were being downplayed in favor of effective means to publish and consume data based on RDF. In many ways, linked data resembled a re-branding.

This break had been coming for a while, memorably captured by a 2008 ISWC session led by Peter F. Patel-Schneider [2]. This internal division of viewpoint likely split effort that would have been better spent proselytizing and improving tools. Some of that effort was also diverted into internal squabbles. While many others have pointed to the tactical mistake of using an XML serialization for early versions of RDF as a key factor in slowing initial adoption, a factor I agree was at play, my own suspicion is that the philosophical split taking place in the community was the heavier burden.

Whatever the cause, many of the hopes of the heady days of the initial vision have not been attained over the past fifteen years, though there have been notable successes.

The biomedical community has been the shining exemplar for data interoperability across an entire discipline, with earth sciences, ecology and other science-based domains also showing interoperability success [3]. Families of ontologies accompanied by tooling and best practices have characterized many of these efforts. Sadly, though, most other domains have not followed suit, and commercial interoperability is nearly non-existent.

Most all of the remaining success has resided in single-institution data integration and knowledge representation initiatives. IBM’s Watson and Apple’s Siri are two amazing capabilities run and managed by single institutions, as is Google’s Knowledge Graph. Also, some individual commercial and government enterprises, willing to pay support to semantic technology experts, have shown success in data integration, using RDF, SKOS and OWL.

We have seen the close kinship between natural language, text, and Q & A with the semantic Web, also demonstrated by Siri and more recent offshoots. We have seen a trend toward pairing great-performing open source text engines, notably Solr, with RDF and triple stores. Recommendation systems have shown some success. Linked data publishing has also had some notable examples, including the first of the lot, DBpedia, with certain institutional publishers (such as the Library of Congress, Eurostat, The Getty, Europeana, OpenGLAM [galleries, archives, libraries, and museums]) showing leadership and the commitment of significant vocabularies to linked data form.

On the standards front, early experience led to new and better versions of the SPARQL query language (SPARQL 1.1 was greatly improved in the last decade and appears to be one capability that sells triple stores), RDF 1.1 and OWL 2. Certain open source tools have become prominent, including Protégé, Virtuoso (open source) and Jena (among unnamed others, of course). At least in the early part of this history, tool development was rapid and flourishing, though the innovation pace has dropped substantially according to my tracking database Sweet Tools.

What Has Disappointed?

My biggest disappointments have been, first, the complete lack of distributed data interoperability, and, second, the lack or inability of commercial enterprises to embrace and adopt semantic technologies on their own. The near absence of discussion about instance records and their attributes helps frame the current maturity of the semantic Web. Namely, it has yet to crack the real nuts of data integration and interoperability across organizations. Again, with the exception of the biomedical community, neither in the linked data realm nor in the broader semantic Web, can we point to information based on semantic Web principles being widely shared between systems and organizations.

Some in the linked data community have explicitly acknowledged this. The abstract for the upcoming COLD 2014 workshop, for example, states [4]:

. . . applications that consume Linked Data are not yet widespread. Reasons may include a lack of suitable methods for a number of open problems, including the seamless integration of Linked Data from multiple sources, dynamic discovery of available data and data sources, provenance and information quality assessment, application development environments, and appropriate end user interfaces.

We have written about many issues with linked data, ranging from the use of improper mapping predicates; to the difficulty in publishing; and to dereferencing URIs on the Web since they are sparse and not always properly implemented [5]. But ultimately, most linked data is just instance data that can be represented in simpler attribute-value form. By shunning a knowledge representation language (namely, OWL) at the processing end, we have put too much burden on what are really just instance records. Linked data does not get the balance of labor right. It ignores the reality that data consumers want actionable information over being able to click from data item to data item, with overall quality reduced to the lowest common denominator. If a publisher has the interest and capability to publish quality linked data, great! It should become part of the data ingest pool and the data becomes easy to consume. But to insist on linked data across the board creates unnecessary barriers. Linked data growth has not nearly kept pace with broader structured data growth on the Web [6].
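To make the attribute-value point concrete, here is a minimal sketch in Python that groups subject-predicate-object triples into one flat record per subject. The `ex:` identifiers, the sample values, and the function name `triples_to_records` are all invented for illustration and do not come from any real dataset or library.

```python
from collections import defaultdict

def triples_to_records(triples):
    """Group (subject, predicate, object) triples into attribute-value records.

    Each subject becomes one record; each predicate becomes an attribute whose
    value is the list of objects asserted for it.
    """
    records = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        records[subject][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {s: dict(attrs) for s, attrs in records.items()}

# Hypothetical instance data, invented for illustration only.
sample_triples = [
    ("ex:item/1", "ex:label", "Widget"),
    ("ex:item/1", "ex:color", "red"),
    ("ex:item/1", "ex:color", "blue"),
    ("ex:item/2", "ex:label", "Gadget"),
]

records = triples_to_records(sample_triples)
```

Nothing in the resulting records requires a triple store or dereferenceable URIs to consume, which is the sense in which such data is "just instance records."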

At the enterprise level, the semantic technology stack is hard to grasp and understand for newcomers. RDF and OWL awareness and understanding are nearly nil in companies without prior semantic Web experience, or 99.9% of all companies. This is not a failure of the enterprises; it is the failure of us, the advocates and suppliers. While we (Structured Dynamics) have developed and continue to refine the turnkey Open Semantic Framework stack, and have spent more efforts than most in documenting and explicating its use, the systems are still too complicated. We combine complicated content management systems as user front-ends to a complicated semantic technology stack that needs to be driven by a complicated (to develop) ontology. And we think we are doing some of the best technology transfer around!

Moreover, while these systems are good at integrating concepts and schema, they are virtually silent on the question of actual data integration. It is shocking to say, but the semantic Web has no vocabularies or tools sufficient to enable data items for the same entity from two different datasets to be combined or reconciled [7]. These issues can be solved within the individual enterprise, but again the system breaks when distributed interoperability is the desire. General Web-based inconsistencies, such as in HTML coding or mime types, impose hurdles on distributed interoperability. These are some of the reasons why we see the successes in the context (generally) of single institutions, as opposed to anything that is truly yet Web-wide.

These points, as is often the case with software-oriented technologies, come down to a disappointing state of tooling. Markets drive developer interest, and market share has been disappointing; thus, fewer tools. Tool interest comes from commercial engagements, and not generally grants, the major source of semantic Web funding, particularly in the European Union. Pragmatic tools that solve real problems in user adoption are rarely a sufficient basis for getting a Ph.D.

The weaknesses in tooling extend from basic installation, to configuration, unit and integrated tests, data conversion and lifting, and, especially, all things ontology. Weaknesses in ontology tooling include (critically) mapping, consistency and coherency checking, authoring, managing, version control, re-factoring, optimization, and workflows. All of these issues are solvable; they are standard software challenges. But it is hard to conquer markets largely with the wrong army pursuing the wrong objectives in response to the wrong incentives.

Yet, despite the weaknesses in tooling, we believe we have been fairly effective in transferring technology to our clients. It takes more documentation and more training and, often, accompanying tool development or improvement in the workflow areas critical to the project. But clients need to be told this as well. In these still early stages, successful clients are going to have to expend more staff effort. With reasonable commitment, it is demonstrable that an enterprise can take over and manage a large-scale semantic engagement on its own. Still, for semantic technologies to have greater market penetration, it will be necessary to lower those commitments.

How Has the Environment Changed?

Of course, over the period of this history, the environment as a whole has changed markedly. The Web today is almost unrecognizable from the Web of 15 years ago. If one assumes that Web technologies tend to have a five year or so period of turnover, we have gone through at least two to three generations of change on the Web since the initial vision for the semantic Web.

The most systemic changes in this period have been cloud computing and the adoption of the smartphone. These, plus the network of workstations approach to data centers, have radically changed what is desirable in a large-scale, distributed architecture. APIs have become RESTful and database infrastructures have become flatter and more distributed. These architectures and their supporting infrastructure — such as virtual servers, MapReduce variants, and many applications — have in turn opened the door to performant management of large volumes of flat (key-value or graph) data, or big data.

On the Web side, JavaScript, just a few years older than the semantic Web, is now dominant in Web pages and taking on server-side roles (such as through Node.js). In turn, JSON has now grown in popularity as a form of data representation and transfer and is being adapted to the semantic Web (along with codifying CSV). Mobile, too, affects the Web side because of the need for multiple-platform deployments, touchscreen use, and different user interface paradigms and layout designs. The app ecosystem around smartphones has become a huge source for change and innovation.

Extremely germane to the semantic Web (indeed, to artificial intelligence overall) has been the emergence of knowledge-based AI (KBAI). The marrying of electronic Web knowledge bases (such as Wikipedia, or internal ones like the Google search index or its Knowledge Graph) with improvements in machine-learning algorithms is systematically mowing down what used to be called the Grand Challenges of computing. Sensors are also now entering the picture, from our phones to our homes and our cars, exposing the higher-order requirement for data integration combined with semantics. NLP kits have improved in terms of accuracy and execution speed; many semantic tasks such as tagging or categorizing or questioning already perform at acceptable levels for most projects.

On the tooling side, nearly all building blocks for what needs to be done next are available in open source, with some platform areas quite functional (including OSF, of course). We have also been successful in finding clients that agree to open source the development work we do for them, since they are benefiting from the open source development that went on before them.

What Did We Set Out to Achieve?

When Structured Dynamics entered the picture, there were already many tools available and core languages had been released. Our view of the world at that time led us to adopt two priorities for what we thought might be a five year or so plan. We have achieved the objectives we set for ourselves then, though it has taken us a couple of years longer to realize.

One priority was to develop a reference structure for concepts to serve as a “grounding” basis for relating datasets, vocabularies, schema, taxonomies, or ontologies. We achieved this with our first commercial release (v 1.00) of UMBEL in February 2011. Subsequent to that we have progressed to v 1.05. In the coming months we will see two further major updates that have been under active effort for about eight months.

The other priority was to create a turnkey foundation for a semantic enterprise. This, too, has been achieved, with many more releases. The Open Semantic Framework (OSF) is now in version 3.00, backed by a 500-article training documentation and technical wiki. Support tooling now includes automated installation, testing, and data transfer and synchronization.

Because our corporate objectives were largely achieved it was time to look at lessons learned and set new directions. This article, in part, is a result of that process.

How Did Our Priorities Evolve Over the Decade?

I thought it would be helpful to use the content of this AI3 blog to track how concerns and priorities changed for me and Structured Dynamics over this history. Since I started my blog quite soon after my entry into the semantic Web, the record of my perspectives was contemporaneous and rather complete.

The fifty articles below trace my evolution in knowledge and skills, as well as a progression from structured data to the semantic Web. These 50 articles represent about 11% of all articles in my chronological archive; they were selected as being the most germane to the question of evolution of the semantic Web.

After early ramp up, most of the formative discussion below occurred in the early years. Posts have declined most recently as implementation has taken over. Note that most of the links below have PDFs available from their main pages.

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

The early years of this history were concentrated on gathering background information and getting educated. The release of DBpedia in 2007 showed how knowledge bases would become essential to the semantic Web. We also identified that a lack of shared reference concepts was making it difficult to “ground” different semantic Web datasets or schema to one another. Another key theme was the diversity of native data structures on the Web, but also how all of them could be readily represented in RDF.

By 2008 we began to study the logical underpinnings to the semantic Web as we were coming to understand how it should be practiced. We also began studying Web-oriented architectures as key design guidance going forward. These themes continued into 2009, though now informed by clients and applications, which were expanding our understanding of requirements (and, sometimes, shortcomings) in the enterprise marketplace. The importance of an open world approach to the basic open nature of knowledge management was cementing a clarity of the role and fit of semantic solutions in the overall information space. The general community shift to linked data was beginning to surface worries.

2010 marked a shift for us to become more of a popularizer of semantic technologies in the enterprise, useful to attract and inform prospects. The central role of ontologies as the guiding structures (either as codified knowledge structures or as instruction sets for the platform) for OSF opened realizations that generic functional software could be designed that can be re-used in most any knowledge domain by simply changing the data and ontologies guiding them. This increased our efforts in ontology tooling and training, now geared more to the knowledge worker. The importance of groundings for aligning schema and data caused us to work hard on UMBEL in 2011 to get it to a commercial release state.

All of these efforts were converging on design thoughts about the nature of information and how it is signified and communicated. The bases of an overall philosophy regarding our work emerged around the teachings of Charles S. Peirce and Claude Shannon. Semantics and groundings were clearly essential to convey accurate messages. Simple forms, so long as they are correct, are always preferred over complex ones because message transmittal is more efficient and less subject to losses (inaccuracies). How these structures could be represented in graphs affirmed the structural correctness of the design approach. The now obvious re-awakening of artificial intelligence helps to put the semantic Web in context: a key subpart, but still a subset, of artificial intelligence. The percentage of formative articles directly related to the semantic Web drops markedly over these last couple of years, as the emphasis continues to shift to tech transfer.

What Else Did We Learn?

Not all lessons learned warranted an article on their own. So, we have also reflected on what other lessons we learned over this decade. The overall theme is: Simpler is better.

Distributed data interoperability across the Web is a fundamental weakness. There are no magic tricks to integrate data. Data mapping and integration will always require massaging. Each data integration activity needs its own solution. However, it can greatly be helped with ontologies and with better tooling.

In keeping with the lesson of grounding, a reference ontology for attributes is missing. It is needed as a bridge across disparate datasets describing similar entities or with different attributes for the same entities. It is also a means to reduce the pairwise combinatorial issue of integrating multiple datasets. And, whatever is done in the data integration area, an open world approach will be essential given the nature of knowledge information.
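The pairwise combinatorial issue can be put in numbers. Mapping every dataset directly to every other requires n(n-1)/2 mappings, whereas mapping each dataset once to a shared reference requires only n. The short Python sketch below computes both; the function names are illustrative, and it assumes the simplest case of one mapping per pair or per dataset.

```python
def pairwise_mappings(n):
    """Mappings needed when each pair of n datasets is mapped directly."""
    return n * (n - 1) // 2

def hub_mappings(n):
    """Mappings needed when each of n datasets maps once to a shared reference."""
    return n

# Compare the two approaches as the number of datasets grows.
for n in (3, 5, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

With three datasets the two approaches break even at three mappings each. With ten datasets the direct approach needs 45 mappings against 10 for the shared reference, and the gap widens quadratically from there.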

There is good design and best practice for distributed architectures. The larger these installations become, the more important it is to use a lightweight, loosely-coupled design. RESTful Web services and their interfaces are key. Simpler services with fewer functions can be designed to complement one another and increase throughput effectiveness.

Functional programming languages align well with the data and schema in knowledge management functions. Ontologies, as structures, also fit well with functional languages. The ability to create DSLs should continue to improve bringing the knowledge management function directly into the hands of its users, the knowledge workers.

In a broader sense, alluded to above, the semantic Web is but a set of concepts. There are multiple ways to use it. It can be leveraged without requiring “core” semantic Web tools such as triple stores. Solr can act as a semantic store because semantics, NLP and search are naturally married. But, the semantic Web, in turn, needs to become re-embedded in artificial intelligence, now backed by knowledge bases, which are themselves creatures of the semantic Web.

Design needs to move away from linked data or the semantic Web as the goals. The building blocks are there, though perhaps not yet combined or expressed well. The real improvements now to the overall knowledge function will result from knowledge bases, artificial intelligence, and the semantic Web working together. That is the next frontier.

Overall, we perhaps have been in the wrong war for the wrong reasons. Linked data is certainly not an end and mostly appears to represent work, rather than innovation. The semantic Web is no longer the right war, either, because improvements there will not come so much from arguing semantic languages and paradigms. Learning how to master distributed data integration will teach the semantic Web much, and coupling artificial intelligence with knowledge bases will do much to improve the most labor-intensive stumbling blocks in the knowledge management workflow: mappings and transformations. Further, these same bases will extend the reach into analytical and statistical realms.

The semantic Web has always been an infrastructure play to us. On that basis, it will be hard to ever judge market penetration or dominance. So, maybe in terms of a vision from 15 years ago the growth of the semantic Web has been disappointing. But, for Fred and me, we are finally seeing the landscape clearly and in perspective, even if from a viewpoint that may be different from others’. From our vantage point, we are at the exciting cusp of a new, broader synthesis.

NOTE: This is Part I of a two-part series. Part II will appear shortly.

[1] Tim Berners-Lee, James Hendler, and Ora Lassila, “The Semantic Web,” in Scientific American 284(5): pp 34-43, 2001. See http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2.
[2] For those with a spare 90 minutes or so, you may also want to view this panel session and debate that took place on “An OWL 2 Far?” at ISWC ’08 in Karlsruhe, Germany, on October 28, 2008. The panel was chaired by Peter F. Patel-Schneider (Bell Labs, Alcatel-Lucent) with the panel members of Stefan Decker (DERI Galway), Michel Dumontier (Carleton University), Tim Finin (University of Maryland) and Ian Horrocks (University of Oxford), with much audience participation. See http://videolectures.net/iswc08_panel_schneider_owl/
[3] Open Biomedical Ontologies (OBO) is an effort to create controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO formed part of the resources of the U.S. National Center for Biomedical Ontology (NCBO). As of the date of this article, there were 376 ontologies listed on the NCBO’s BioOntology site. Both OBO and BioOntology provide tools and best practices.
[4] Fifth International Workshop on Consuming Linked Data (COLD 2014), co-located with the 13th International Semantic Web Conference (ISWC) in Riva del Garda, Italy, October 19-20.
[7] See the thread on the W3C semantic web mailing list beginning at http://lists.w3.org/Archives/Public/semantic-web/2014Jul/0129.html.

Posted: January 20, 2013

The Semantic Enterprise Part 3 in the Enterprise-scale Semantic Systems Series

The interests of enterprise architects and semantic technologists do not align. An enterprise architect has the viewpoint of the enterprise and its full breadth of IT requirements, from security to access to content and maintainability (all of which needs to be justified to non-IT managers). The semantic technologist tends to view his entire world through the lens of semantic technologies.

If one is a resident within the semantic technology community, more often than not today’s assessment is that semantics have yet to be successful. If the deployment somehow does not have semantic technologies front and center, then it is largely invisible. The fact that semantic technologies are the core enablers for initiatives ranging from Siri to Pandora to Google and recommendation engines is not embraced and credited: the semantic contribution is hidden.

If one is an enterprise architect, the primacy of whether semantic technologies are in play or not is a non-issue. There are many piece parts to be fulfilled; the system and overall architecture are the concern, not any individual component. The architecture must be broken apart, with the assessment of the suitability of any individual component not based solely on its standalone capabilities, but also as part of an inter-operating whole.

Semantic technology has generally not penetrated well into the enterprise (though it sometimes has in some of the consumer plays as noted above) because its advocates (and, therefore, deployers) have not understood its role. Sometimes semantic technologies are visible, but, more often than not, they are not. The natural role of semantic technologies is in content and schema mediation, functions which reside generally at the repository level and not that of the user.

Two rending forces arise from the wrongful perception that somehow semantic technologies must be evident. The first dissonance is that semantic advocates are often indiscriminate in where they focus their advocacies. While semantic approaches can, theoretically, be applied from the user content management level to applications, these are neither the pain points nor the focus of enterprise architects. EAs are interested in semantic technologies for content integration and interoperability, as often evidenced by superior search, not other uses. The second dissonance is that, not recognizing its natural role, semantic technologists are not paying attention to making their capabilities inter-operable with the rest of the enterprise stack.

Actual enterprise deployments have a rhythm and hierarchy of scrutiny and decision-making. For semantics to become an integral contributor to enterprise solutions, it is important to recognize where this function can fit today. There should be no arrogance in this discussion whatsoever. Like a Galileo thermometer, it is important to find the natural resting point for semantic contributions . . . .

A Basic Architecture

As other discussions by Fred Giasson and me have put forward, the nature of our (Structured Dynamics) semantic stack, what we call the open semantic framework (OSF), has Drupal as its resident content management piece, with Virtuoso the RDF triple store, and many additional open source parts. Our TechWiki explains more of that in detail.

The OSF architecture, though, is generalized enough such that these two components, or any of the other open source pieces in the stack, could be swapped out for others. It is the Web service glue underneath OSF, SD’s structWSF framework, that is the real enabler of the entire semantic stack.

Yet when one is done with design of an enterprise architecture, the actual semantic portion (shown in green below) becomes itself a mere component, all embedded into the full suite of enterprise requirements. This illustrative architecture, generalized across clients, again uses Drupal as the content management framework, with the new service being hosted in the cloud:

We see that a security component now governs all interactions. Middleware has been inserted into the standard OSF stack (Drupal + semantic services) and now takes over the functions of logging, messaging, an enterprise service bus, security, version control, and data governance. All of our hardware, network services, and Web servers are provided in the cloud. We also need to conform to existing content and data sources and provide the means to harvest or get updates from them.

The semantic component — OSF in our case — has in effect been surrounded by existing or external sources and services. The semantic management responsibility resides at the core of this architecture, thus making the content repository very important. But, in order for the repository to perform its work, it must interface with all of these existing and required systems. In order for semantics to make enterprise contributions, it must become, in effect, a “hidden” or “buried” service.

When targeting enterprise customers, this role for semantic technologies is a reality. For systems to be adopted, which is the first step to being effective, it helps to warmly embrace the fact that your installation will be involved as much with interfaces and external sources and systems as with semantics alone. Embracing this viewpoint means you are being adopted.

This reality does place a premium on a Web service architecture for the semantic stack. All endpoints can be communicated with via HTTP, and all endpoints have a common and published API. Each re-factoring stresses making the interfaces distinct and clean, and embracing common syntax and protocols for communicating with the endpoints.

The Natural Resting Place for Semantic Technologies

We can expand the green portion of the diagram above — the semantic components or what corresponds to OSF — and show them in more detail, as in the next architecture diagram below. We are now enumerating the Web services in the stack, and are showing the interaction with datasets (important for the security aspects, which a later installment of this series will address). The various engines that power the OSF stack are shown at bottom:

While it is true that the semantic components are “buried” within all portions of the enterprise stack, we can ease the integration challenge by narrowing the interface points to the non-semantic portions. At the top level, in the interaction with the content management framework (Drupal in our case), we have aggregated all Web service interface calls and made them available programmatically through the structWSF PHP API. (Multiples of these can be developed if the programmatic interfaces need to be in languages other than PHP, such as Java.)

The structWSF API provides a consolidated point for writing endpoint calls and queries using PHP. This not only makes it more efficient for developing endpoint connectors (whose purpose is to enable Drupal methods and modules for interfacing with the repository), but also provides a common API and methods. Though it is possible to issue queries directly to any structWSF Web service endpoint, the structWSF API module is a faster and more consistent interface for doing so. This consolidation also means that developers interacting with the semantic components need only worry about the dedicated API module, and not the code or location in the more than 20 individual endpoints.
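The idea of a single consolidated module in front of many endpoints can be sketched as follows. This is written in Python rather than PHP purely for illustration, and it is not the structWSF API: the class name `SemanticApiClient`, the endpoint paths, the parameter names, and the `example.org` base URL are all hypothetical placeholders. The sketch only builds requests; it sends nothing over the network.

```python
from urllib.parse import urlencode, urljoin

class SemanticApiClient:
    """Hypothetical single entry point that builds requests for many endpoints."""

    def __init__(self, base_url):
        # Ensure the base ends with "/" so urljoin appends rather than replaces.
        self.base_url = base_url if base_url.endswith("/") else base_url + "/"

    def build_request(self, endpoint, **params):
        """Return (url, encoded_query) for an endpoint without sending anything."""
        url = urljoin(self.base_url, endpoint.strip("/") + "/")
        # Sort parameters so identical calls always produce identical queries.
        query = urlencode(sorted(params.items()))
        return url, query

    # Each public method hides one endpoint's location and parameter names.
    def search(self, text, dataset):
        return self.build_request("search", query=text, datasets=dataset)

    def read(self, uri, dataset):
        return self.build_request("crud/read", uri=uri, dataset=dataset)

client = SemanticApiClient("http://example.org/ws")
search_url, search_query = client.search("semantic web", "http://example.org/datasets/demo/")
read_url, read_query = client.read("http://example.org/records/1", "http://example.org/datasets/demo/")
```

Callers work only with `search` and `read`; if an endpoint moves or a parameter is renamed, the change is confined to this one module.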

A similar philosophy is applied to narrowing the security interfaces. We treat security as a black box. Granting access and rights is proxied at the middleware layer. If these rights are granted, the query payload is presented to the Auth:Validator endpoint via a registered security gateway IP. The verification of the IP by Auth:Validator enables the query to be submitted, with a results set also returned via the same pathway.
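A gateway check of this kind might look like the sketch below, again in Python and under stated assumptions. The class `GatewayValidator`, its methods, and the addresses (taken from the reserved documentation ranges) are hypothetical; the actual Auth:Validator logic is not described here and may differ. The sketch assumes rights were already granted upstream and verifies only that the query arrived from a registered gateway address.

```python
import ipaddress

class GatewayValidator:
    """Hypothetical check that a query arrived via a registered security gateway."""

    def __init__(self, registered_gateways):
        # Accept single addresses or CIDR ranges; strict=False tolerates host bits.
        self.networks = [
            ipaddress.ip_network(gateway, strict=False)
            for gateway in registered_gateways
        ]

    def is_registered(self, source_ip):
        try:
            address = ipaddress.ip_address(source_ip)
        except ValueError:
            return False  # Malformed addresses are rejected, never trusted.
        return any(address in network for network in self.networks)

    def submit(self, source_ip, query, handler):
        """Run the query only for registered gateways; results go back to the caller."""
        if not self.is_registered(source_ip):
            raise PermissionError(f"unregistered gateway: {source_ip}")
        return handler(query)

validator = GatewayValidator(["192.0.2.10"])
result = validator.submit(
    "192.0.2.10",
    {"q": "demo"},
    lambda query: ["result for " + query["q"]],
)
```

Because the semantic layer trusts only the gateway address, it needs no knowledge of how users were authenticated, which keeps security a black box from its point of view.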

Three design mindsets govern this architectural design. First, interface points are narrowed and standardized, generally with a formal API. Second, important external services are treated as “black boxes”; how they do their work is immaterial. Only vetted requests and calls approved at these other layers are able or authorized to access the services at the semantic layer. And, third, we are not trying to embrace non-semantic functionality at the semantic layer. These important services — but ancillary ones from a semantic standpoint — are understood as being out of scope to the semantic requirements. This design also makes it easier to “plug” the semantic components into other enterprise stack configurations with other non-semantic services from other sources or vendors.

Some Development Gaps and Imperatives

This design makes sense from a theoretical standpoint, but can pose problems in practice.

The first challenge is that our OSF approach is based on RESTful Web services, in a true Web-oriented architecture. Many of the non-semantic legacy components were originally designed for formal big WS-* approaches drawn from the SOAP perspective. Though most of these existing interfaces have evolved to embrace RESTful alternatives, these interfaces are not always as well tested and complete as the original WS-* ones. This relative immaturity can pose issues with respect to completeness of parameter or function support or inadequate testing.

A second challenge, also related to a RESTful Web service perspective, is the size of payloads in both query and results set objects. Long HTTP queries with many parameter requests and large results sets can be a problem to handle, especially in the security layer. In some cases, we have had to look at ways to minimize and package (consolidate) parameter options in order to make endpoint requests more efficient.

Encoding mismatches are a further challenge. It is generally best, for example, to adhere to a standard UTF-8 encoding via all semantic component interfaces. This requires attention and coordination on both sides of the interface.
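As one illustration of the coordination involved, the sketch below normalizes a payload to UTF-8 before it crosses the interface. The function name and the Latin-1 fallback are assumptions made for the example; a real deployment would agree on the legacy encoding with the other side rather than guess it.

```python
def to_utf8(payload, source_encoding="latin-1"):
    """Return the payload as UTF-8 bytes.

    Text is encoded directly. Bytes that already decode as UTF-8 pass
    through unchanged; anything else is assumed (hypothetically) to be in
    `source_encoding` and is transcoded.
    """
    if isinstance(payload, str):
        return payload.encode("utf-8")
    try:
        payload.decode("utf-8")
        return payload
    except UnicodeDecodeError:
        return payload.decode(source_encoding).encode("utf-8")


legacy_bytes = "café".encode("latin-1")   # what a legacy component might send
normalized = to_utf8(legacy_bytes)
print(normalized)
```

Note the limitation: bytes in some other encoding that happen to be valid UTF-8 pass through untouched, which is why explicit agreement at the boundary is preferable to detection.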

The more fundamental challenge, however, is one of mindset. Effective interfaces require effective communication among the participating vendors across the boundary. The terminology, concepts, logic and open-world approach of semantic technologies are not easily communicated to nor immediately understood by traditional practitioners. Communication must be worked at constantly in order to overcome past practices and embrace the flexibilities provided by semantic technologies.

The Mismatch is Not Long-term

But these challenges are more ones of degree and practice than anything more fundamental. As semantic components get deployed in an enterprise stack, the benefits of faceting and the underlying structure become apparent. Such awareness propels further understanding and a willingness to learn more about underlying foundations. Ultimately, with a design emphasizing relatively few, focused interfaces, semantic components can be effectively integrated within enterprise stacks.

The more telling lesson is the understanding of the natural role that semantic technologies play within enterprise-scale systems. Semantic technologies are the natural integration framework for federating and interoperating virtually any and all non-transaction information assets of the enterprise. That places semantic technologies at the core of the enterprise stack, even if it is not terribly evident to all users. The natural role for semantic technologies for the nearest term appears to be in repositories and for content integration.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.
Posted:July 2, 2012

Example Ontology (from Wikipedia)

Conventional IT Systems are Poorly Suited to Knowledge Applications

Frequently customers ask me why semantic technologies should be used instead of conventional information technologies. In the areas of knowledge representation (KR) and knowledge management (KM), there are compelling reasons and benefits for selecting semantic technologies over conventional approaches. This article attempts to summarize these rationales from a layperson perspective.

It is important to recognize that semantic technologies are orthogonal to the buzz around some other current technologies, including cloud computing and big data. Semantic technologies are also not limited to open data: they are equivalently useful to private or proprietary data. It is also important to note that semantic technologies do not imply some grand, shared schema for organizing all information. Semantic technologies are not “one ring to rule them all,” but rather a way to capture the world views of particular domains and groups of stakeholders. Lastly, semantic technologies done properly are not a replacement for existing information technologies, but rather an added layer that can leverage those assets for interoperability and to overcome the semantic barriers between existing information silos.

Nature of the World

The world is a messy place. Not only is it complicated and richly diverse, but our ways of describing and understanding it are made more complex by differences in language and culture.

We also know the world to be interconnected and interdependent. Effects of one change can propagate into subtle and unforeseen effects. And, not only is the world constantly changing, but so is our understanding of what exists in the world and how it affects and is affected by everything else.

This means we are always uncertain to a degree about how the world works and the dynamics of its working. Through education and research we continually strive to learn more about the world, but often in that process find what we thought was true is no longer so and even our own human existence is modifying our world in manifest ways.

Knowledge is very similar to this nature of the world. We find that knowledge is never complete and it can be found anywhere and everywhere. We capture and codify knowledge in structured, semi-structured and unstructured forms, ranging from “soft” to “hard” information. We find that the structure of knowledge evolves with the incorporation of more information.

We often see that knowledge is not absolute, but contextual. That does not mean that there is no such thing as truth, but that knowledge should be coherent, to reflect a logical consistency and structure that comports with our observations about the physical world. Knowledge, like the world, is constantly changing; we thus must constantly adapt to what we observe and learn.

Knowledge Representation, Not Transactions

These observations about the world and knowledge are not platitudes but important guideposts for how we should organize and manage information, the field known as “information technology.” For IT to truly serve the knowledge function, its logical bases should be consistent with the inherent nature of the world and knowledge.

By knowledge functions we mean those areas of various computer applications that come under the rubrics of search, business intelligence, competitive intelligence, planning, forecasting, data federation, data warehousing, knowledge management, enterprise information integration, master data management, knowledge representation, and so forth. These applications are distinctly different from the earliest and traditional concerns of IT systems: accounting and transactions.

A transaction system — such as calculating revenue based on seats on a plane, the plane’s occupancy, and various rate classes — is a closed system. We can count the seats, we know the number of customers on board, and we know their rate classes and payments. Much can be done with this information, including yield and profitability analysis and other conventional ways of accounting for costs or revenues or optimizations.

But, as noted, neither the world nor knowledge is a closed system. Trying to apply legacy IT approaches to knowledge problems is fraught with difficulties. That is the reason that for more than four decades enterprises have seen massive cost overruns and failed projects in applying conventional IT approaches to knowledge problems: traditional IT is fundamentally mismatched to the nature of the problems at hand.

What works efficiently for transactions and accounting is a miserable failure applied to knowledge problems. Traditional relational databases work best with structured data; are inflexible and fragile when the nature (schema) of the world changes; and thus require constant (and expensive) re-architecting in the face of new knowledge or new relationships.

Of course, often knowledge problems do consider fixed entities with fixed attributes to describe them. In these cases, relational data systems can continue to act as valuable contributors and data managers of entities and their attributes. But, in the role of organizing across schema or dealing with semantics and differences of definition and scope – that is, the common types of knowledge questions – a much different integration layer with a much different logic basis is demanded.

The New Open World Paradigm

The first change that is demanded is to shift the logic paradigm of how knowledge and the world are modeled. In contrast to the closed-world approach of transaction systems, IT systems based on the logical premise of the open world assumption (OWA) mean:

  • Lack of a given assertion does not imply whether it is true or false; it simply is not known
  • A lack of knowledge does not imply falsity
  • Everything is permitted until it is prohibited
  • Schema can be incremental without re-architecting prior schema (“extensible”), and
  • Information at various levels of incompleteness can be combined.
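The difference between the two premises can be shown with a small sketch. The facts and the helper names (`ask_closed`, `ask_open`) are invented for illustration; the point is only that a closed-world query collapses "not stated" into "false," while an open-world query keeps a third answer.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Illustrative facts only: what has been asserted, and what has been
# explicitly stated to be false.
asserted = {("Jane", "worksFor", "Acme")}
denied = {("Jane", "worksFor", "Initech")}


def ask_closed(triple):
    """Closed world: anything not asserted is taken to be false."""
    return Answer.TRUE if triple in asserted else Answer.FALSE


def ask_open(triple):
    """Open world: absence of an assertion means the answer is not known."""
    if triple in asserted:
        return Answer.TRUE
    if triple in denied:
        return Answer.FALSE
    return Answer.UNKNOWN


question = ("Jane", "worksFor", "Globex")   # never mentioned either way
print(ask_closed(question), ask_open(question))
```

Under the open-world reading, a later assertion about Globex can simply be added; nothing already concluded has to be retracted.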

Much more can be said about OWA, including formal definitions of the logics underlying it [1], but even from the statements above, we can see that the right logic for most knowledge representation (KR) problems is the open world approach.

This logic mismatch is perhaps the most fundamental cause of failures, cost overruns, and disappointing deliverables for KM and KR projects over the years. But, like the fingertip between the eyes that cannot be seen because it is too close at hand, the importance of this logic mismatch strangely continues to be overlooked.

Integrating All Forms of Information

Data exists in many forms and of many natures. As one classification scheme, there are:

  • Structured data — information presented according to a defined data model, often found in relational databases or other forms of tabular data
  • Semi-structured data — does not conform to the formal structure of data models, but contains tags or other markers to denote fields within the content. Markup languages embedded in text are a common form of such sources
  • Unstructured data — information content, generally oriented to text, that lacks an explicit data model or schema; structured information can be obtained from it via data mining or information extraction.

Further, these types of data may be “soft”, such as social information or opinion, or “hard”, more akin to measurable facts or quantities.

These various forms may also be serialized in a variety of data formats or data transfer protocols, ranging from straight text with a myriad of syntax or markup vocabularies to scripts and encoded or binary forms.

Still further, any of these data forms may be organized according to a separate schema that describes the semantics and relationships within the data.

These variations further complicate the inherently diverse nature of the world and knowledge of it. A suitable data model for knowledge representation must therefore have the power to be able to capture the form, format, serialization or schema of any existing data within the diversity of these options.

The Resource Description Framework (RDF) data model has such capabilities [2]. Any extant data form or schema (from the simple to the complex) can be converted to the RDF data model. This capability enables RDF to act as a “universal solvent” for all information.

Once converted to this “canonical” form, RDF can then act as a single representation around which to design applications and other converters (for “round-tripping” to legacy systems, for example), as illustrated by this diagram:

Generic tools can then be driven by the RDF data model, which leads to fewer applications required and lower overall development costs.

Lastly, RDF can represent everything from simple assertions (“Jane runs fast”) to complex vocabularies and languages. It is in this latter role that RDF can begin to represent the complexity of an entire domain via what is called an “ontology” or “knowledge graph.”
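As a rough sketch of what conversion to the canonical form involves, the following turns a tabular record into subject-predicate-object triples. The function name, the `http://example.org/` namespace and the field names are assumptions for illustration; a real converter would map fields onto properties defined in an ontology and would likely use an RDF library.

```python
def record_to_triples(record, id_field, base="http://example.org/"):
    """Convert one flat record (a dict) into a list of triples.

    The identifier field becomes the subject URI; every other non-empty
    field becomes one (subject, predicate, object) statement.
    """
    subject = f"{base}id/{record[id_field]}"
    return [
        (subject, f"{base}prop/{field}", value)
        for field, value in record.items()
        if field != id_field and value is not None
    ]


# A row as it might come from a relational table (illustrative data).
row = {"id": "jane", "name": "Jane", "runs": "fast", "email": None}
triples = record_to_triples(row, "id")
for triple in triples:
    print(triple)
```

Missing values simply produce no statement, which fits the open-world premise: the absence of an email address is not an assertion that none exists.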

Example Ontology Growth

Connections Create Graphs

When representing knowledge, more things and concepts get drawn into consideration. In turn, the relationships of these things lead to connections between them to capture the inherent interdependence and linkages of the world. As still more things get considered, more connections are made and proliferate.

This process naturally leads to a graph structure, with the things in the graphs represented as nodes and the relationships between them represented as connecting edges. More things and more connections lead to more structure. Insofar as this structure and its connections are coherent, the natural structure of the knowledge graph itself can help lead to more knowledge and understanding.

How one such graph may emerge is shown by this portion of the recently announced Google Knowledge Graph [3], showing female Nobel prize winners:

Unlike traditional data tables, graphs have a number of inherent benefits, particularly for knowledge representations. They provide:

  • A coherent way to navigate the knowledge space
  • Flexible entry points for each user to access that knowledge (since every node is a potential starting point)
  • Inferencing and reasoning structures about the space
  • Connections to related information
  • Ability to connect to any form of information
  • Concept mapping, and thus the ability to integrate external content
  • A framework to disambiguate concepts based on relations and context, and
  • A common vocabulary to drive content “tagging”.

Graphs are the natural structures for knowledge domains.

Network Analysis is the New Algebra

Once built, graphs offer some analytical capabilities not available through traditional means of information structure. Graph analysis is a rapidly emerging field, but already some unique measures of knowledge domains are now possible to gauge:

  • Influence
  • Relatedness
  • Proximity
  • Centrality
  • Inference
  • Clustering
  • Shortest paths
  • Diffusion.
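Two of these measures, shortest paths and (degree) centrality, can be sketched in a few lines over a toy undirected graph. The node names and edges are invented for the example, and degree centrality is only one of several centrality measures; dedicated graph libraries offer many more.

```python
from collections import deque

# Toy undirected graph (illustrative only).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D"), ("B", "E")]


def build_adjacency(edge_list):
    """Build an adjacency list, adding each edge in both directions."""
    adj = {}
    for left, right in edge_list:
        adj.setdefault(left, []).append(right)
        adj.setdefault(right, []).append(left)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the node list or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adj.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    others = len(adj) - 1
    return {node: len(neighbors) / others for node, neighbors in adj.items()}


adjacency = build_adjacency(edges)
print(shortest_path(adjacency, "A", "D"))
print(degree_centrality(adjacency))
```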

As science is coming to appreciate, graphs can represent any extant structure or schema. This gives graphs a universal character in terms of analytic tools. Further, many structures can only be represented by graphs.

Information and Interaction is Distributed

The nature of knowledge is such that relevant information is everywhere. Further, because of the interconnectedness of things, we can also appreciate that external information needs to be integrated with internal information. Meanwhile, the nature of the world is such that users and stakeholders may be anywhere.

These observations suggest a knowledge representation architecture that needs to be truly distributed. Both sources and users may be found in multiple locations.

In order to preserve existing information assets as much as possible (see further below) and to codify the earlier observation regarding the broad diversity of data formats, the resulting knowledge architecture should also attempt to put in place a thin layer or protocol that provides uniform access to any source or target node on the physical network. A thin, uniform abstraction layer – with appropriate access rights and security considerations – means knowledge networks may grow and expand at will at acceptable costs with minimal central coordination or overhead.

Properly designed, then, such architectures are not only necessary to represent the distributed nature of users and knowledge, but can also facilitate and contribute to knowledge development and exchange.

The Web is the Perfect Medium

The items above suggest the Web as an appropriate protocol for distributed access and information exchange. When combined with the following considerations, it becomes clear that the Web is the perfect medium for knowledge networks:

  • Potentially, all information may be accessed via the Web
  • All information may be given unique Web identifiers (URIs)
  • All Web tools are available for use and integration
  • All Web information may be integrated
  • Web-oriented architectures (WOA) have proven:
      • Scalability
      • Robustness
      • Substitutability
  • Most Web technologies are open source.

It is not surprising that the largest extant knowledge networks on the globe – such as Google, Wikipedia, Amazon and Facebook – are Web-based. These pioneers have demonstrated the wisdom of WOA for cost-effective scalability and universal access.

The combination of RDF with Web identifiers also means that any and all information from a given knowledge repository may be exposed and made available to others as linked data. This approach makes the Web a global, universal database. And it is in keeping with the general benefits of integrating external information sources.

Leveraging – Not Replacing – Existing IT Assets

Existing IT assets represent massive sunk costs, legacy knowledge and expertise, and (often) stakeholder consensus. Yet, these systems are still largely stovepiped.

Strategies that counsel replacement of existing IT systems risk wasting existing assets and are therefore unlikely to be adopted. Ways must be found to leverage the value already embodied in these systems, while promoting interoperability and integration.

The beauty of semantic technologies – properly designed and deployed in a Web-oriented architecture – is that a thin interoperability layer may be placed over existing IT assets to achieve these aims. The knowledge graph structure may be used to provide the semantic mappings between schema, while the Web service framework that is part of the WOA provides the source conversion to the canonical RDF data model.
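A much-simplified sketch of such a thin layer follows. Two hypothetical legacy sources (called `crm` and `billing` here) keep their own field names; a small mapping table, standing in for the semantic mappings a knowledge graph would hold, translates each into shared terms. All names are invented for illustration.

```python
# Stand-in for mappings the knowledge graph would provide: each legacy
# field name is tied to a shared, canonical property name.
MAPPINGS = {
    "crm": {"cust_name": "name", "cust_city": "city"},
    "billing": {"client": "name", "town": "city"},
}


def to_canonical(source, record):
    """Re-express a legacy record in the shared vocabulary.

    Fields without a mapping are left behind in the source system; the
    legacy record itself is not modified.
    """
    mapping = MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


crm_row = {"cust_name": "Acme Corp", "cust_city": "Iowa City", "crm_flag": 7}
billing_row = {"client": "Acme Corp", "town": "Iowa City"}

# Once in canonical form, records from either system can be compared.
same_entity = to_canonical("crm", crm_row) == to_canonical("billing", billing_row)
print(same_entity)
```

The existing systems are untouched; only the mapping layer knows how their schema relate, and adding a third source means adding one more mapping entry.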

Via these approaches, prior investments in knowledge, information and IT assets may be preserved while enabling interoperability. The existing systems can continue to provide the functionality for which they were originally designed and deployed. Meanwhile, the KR-related aspects may be exposed and integrated with other knowledge assets on the physical network.

Democratizing the Knowledge Function

These kinds of approaches represent a fundamental shift in power and roles with respect to IT in the enterprise. IT departments and their bottlenecks in writing queries and bespoke application development can now be bypassed; the departments may be relegated to more appropriate support roles. Developers and consultants can now devote more of their time to developing generic applications driven by graph structures [4].

In turn, the consumers of knowledge applications – namely subject matter experts, employees, partners and stakeholders – now become the active contributors to the graphs themselves, focusing on reconciling terminology and ensuring adequate entity and concept coverage. Knowledge graphs are relatively straightforward structures to build and maintain. Those that rely on them can also be those that have the lead role in building and maintaining them.

Thus, graph-driven applications can be made generic by function with broader and more diverse information visualization capabilities. Simple instructions in the graphs can indicate what types of information can be displayed with what kind of widget. Graph-driven applications also mean that those closest to the knowledge problems will also be those directly augmenting the graphs. These changes act to democratize the knowledge function, and lower overall IT costs and risks.
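The idea of a graph instructing a generic application can be sketched as follows. The attribute names, the `displayWith` key and the widget names are hypothetical; they stand in for whatever annotation a given graph uses to say how an attribute may be displayed.

```python
# Stand-in for display hints carried in the graph (illustrative only).
graph_hints = {
    "population": {"kind": "number series", "displayWith": "line_chart"},
    "location": {"kind": "geo coordinates", "displayWith": "map"},
    "description": {"kind": "text"},
}


def pick_widget(attribute, hints=graph_hints, default="text_panel"):
    """Choose a display widget from the graph's hints.

    The application carries no per-attribute logic; attributes that are
    unknown or carry no hint fall back to a generic default.
    """
    return hints.get(attribute, {}).get("displayWith", default)


for attribute in ("population", "location", "description", "unlisted"):
    print(attribute, "->", pick_widget(attribute))
```

Changing how an attribute is presented then means editing the graph, which subject matter experts can do, rather than editing and redeploying the application.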

Seven Pillars of the Semantic Enterprise

Elsewhere we have discussed the specific components that go into enabling the development of a semantic enterprise, what we have termed the seven pillars [5]. Most of these points have been covered to one degree or another in the discussion above.

There are off-the-shelf starter kits for enterprises to embrace to begin this process. The major starting requirements are to develop appropriate knowledge graphs (ontologies) for the given domain and to convert existing information assets into appropriate interoperable RDF form.

Beyond that, enterprise staff may be readily trained in the use and growth of the graphs, and in the staging and conversion of data. With an appropriate technology transfer component, these semantic technology systems can be maintained solely by the enterprise itself without further outside assistance.

Summary of Semantic Technology Benefits

Unlike conventional IT systems with their closed-world approach, semantic technologies that adhere to these guidelines can be deployed incrementally at lower cost and with lower risk. Further, we have seen that semantic technologies offer an excellent integration approach, with no need to re-do schema because of changed circumstances. The approach further leverages existing information assets and brings the responsibility for the knowledge function more directly to its users and consumers.

Semantic technologies are thus well-suited for knowledge applications. With their graph structures and the ability to capture semantic differences and meanings, these technologies can also accommodate multiple viewpoints and stakeholders. There are also excellent capabilities to relate all available information – from documents and images and metadata to tables and databases – into a common footing.

These advantages will immediately accrue through better integration and interoperability of diverse information assets. But, for early adopters, perhaps the most immediate benefit will come from visible leadership in embracing these enabling technologies in advance of what will surely become the preferred approach to knowledge problems.

Note: there is a version of this article on Slideshare.

[1] For more on the open world assumption (OWA), see the various entries on this topic on Michael Bergman’s AI3:::Adaptive Information blog. This link is a good search string to discover more.
[2] M.K. Bergman, 2009. Advantages and Myths of RDF, white paper from Structured Dynamics LLC, April 22, 2009, 13 pp. See http://www.mkbergman.com/wp-content/themes/ai3v2/files/2009Posts/Advantages_Myths_RDF_090422.pdf.
[4] For the most comprehensive discussion of graph-driven apps, see M. K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps’ to see related content.
[5] M.K. Bergman, 2010. “Seven Pillars of the Open Semantic Enterprise,” in AI3:::Adaptive Information blog, January 12, 2010; see http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/.
Posted:March 7, 2011

from Wikimedia Commons

The Time and Technology is Here to Stand Software Engineering on its Head

As an information society we have become a software society. Software is everywhere, from our phones and our desktops, to our cars, homes and every location in between. The amount of software used worldwide is unknowable; we do not even have agreed measures to quantify its extent or value [1]. We suspect there are at least 1 trillion lines of code that have accumulated over time [1,2]. On the order of $875 billion was spent worldwide on software in 2010, of which about half was for packaged software and licenses and the rest for programmer services, consulting and outsourcing [3]. In the U.S. alone, about 2 million people work as programmers or in related roles [4].

It goes without saying that software is a very big deal.

No matter what the metrics, it is expensive to develop and maintain software. This is also true for open source, which has its own costs of ownership [5]. Designing software faster, with fewer mistakes and with more re-use and robustness, has clearly been an emphasis in computer science and the discipline of programming from its inception.

This attention has caused a myriad of schools and practices to develop over time. Some of the earlier efforts included computer-aided software engineering (CASE) or Grady Booch’s (already cited in [1]) object-oriented design (OOD). Fourth-generation languages (4GLs) and rapid application development (RAD) were popular in the 1980s and 1990s. Most recently, agile software development or extreme programming have grabbed mindshare.

Altogether, there are dozens of software development philosophies, each with its passionate advocates. These express themselves through a variety of software development methodologies that might be characterized or clustered into the prototyping or waterfall or spiral camps.

In all instances, of course, the drivers and motivations are the same: faster development, more re-use, greater robustness, easier maintainability, and lower development costs and total costs of ownership.

The Ontology Perspective in this Mix

For at least the past decade, ontologies and semantic Web-related approaches have also been part of this mix. A good summary of these efforts comes from Michael Uschold in an invited address at FOIS 2008 [6]. In this review, he points to these advantages for ontology-based approaches to software engineering:

  • Re-use — abstract/general notions can be used to instantiate more concrete/specific notions, allowing more reuse
  • Reduced development times — producing software artifacts that are closer to how we think, combined with reuse and automation that enables applications to be developed more quickly
  • Increased reliability — formal constructs with automation reduce human error
  • Decreased maintenance costs — increased reliability and the use of automation to convert models to executable code reduce errors. A formal link between the models and the code makes software easier to comprehend and thus maintain.

These first four items are similar to the benefits argued for other software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:

  • Reduced conceptual gap — application developers can interact with the tools in a way that is closer to their thinking
  • Facilitate automation — formal structures are amenable to automated reasoning, reducing the load on the human, and
  • Agility/flexibility — ontology-driven information systems are more flexible, because you can much more easily and reliably make changes in the model than in code.

In making these arguments, Uschold picks up on the “ontology-driven information systems” moniker first put forward by Nicola Guarino in 1998 [7]. The ideas around ODIS have had substantial impact on the semantic Web community, especially in the use of formal ontologies and modeling approaches. The FOIS series of conferences, and most recently the ODiSE series, have been spawned from these ideas. There is also, for example, a fairly rich and developed community working on the integration of UML via ontologies as the drivers or specifiers of software [8].

Yet, as Uschold is careful to point out, the idea of ODIS extends beyond software engineering to encompass all of information systems. My own categorization of how ontologies may contribute to information systems is:

  1. Domain modeling — this category includes the domain knowledge representations and reasoning and inference bases that are the traditional understanding of ontologies in the semantic space. The structural aspects are akin to a database schema definition; the unique aspects of ontologies reside in their logic foundations and graph structures, which offer more power in inferencing, reasoning and graph analysis than conventional approaches
  2. Model-driven architectures (MDA) — like UML, these are platform-independent specifications that provide the functional and dataflow definitions of “models” executed by the system. These are the natural progeny of earlier CASE approaches, for example. Such systems also potentially allow graphical or visual means for building or hooking together components as a substitute to direct coding
  3. Program specifications and executables — though fairly experimental at present, these approaches use the languages of RDF, OWL or direct use of logic languages to create the equivalent of executable software programs. A couple of experimental systems, Fhat and Neno for example, point to possible future directions in this area [9]
  4. Runtime or utility components — proper construction of ontologies can be a source for labels and prompts within user interfaces and other runtime uses. Because of the ontology basis, these contributions may also be contextual [10]
  5. Automated agents — based on context, user choices and the governing ontologies, new instruction sets can be generated via what some term automated agents or “robots” to instruct subsequent steps in the software, including potentially analysis or validation. Mission Critical IT [11] is apparently the most advanced in this area; we discuss their ODASE approach more below
  6. Bespoke drivers of generic applications — through using and combining a number of the aspects above, in its totality this approach is a very different paradigm, as we describe below.

When we look at this list from the standpoint of conventional software or software engineering, we see that #1 shares overlaps with conventional database roles and #2, #3 and #4 with conventional programmer or software engineering responsibilities. The other portions, however, are quite unique to ontology-based approaches.

But Is Software Engineering Even the Right Focus?

For decades, issues related to how to develop apps better and faster have been proposed and argued about. We still have the same litany of challenges and issues from expense to re-use and brittleness. And, unfortunately, despite many methodologies du jour, we still see bottlenecks in the enterprise relating to such matters as:

  • data access
  • queries
  • data transformations
  • data integration or federation
  • reports
  • other data presentations
  • business analysis, and
  • targeted, specialty functionality.

Promises such as self-service reporting touted at the inception of data warehousing two decades ago are still to be realized [12]. Enterprises still require the overhead and layers of IT to write SQL for us and prepare and fix reports. If we stand back a bit, perhaps we can come to see that the real opportunity resides in turning the whole paradigm of software engineering upside down.

Our objective should not be software per se. Software is merely an intermediary artifact to accomplish some given task. Rather than engineering software, the focus should be on how to fulfill those tasks in an optimal manner. How can we keep the idea of producing software from becoming this generation’s new buggy whip example [13]?

For reasons we delve into a bit more below, it perhaps has required a confluence of some new semantic technologies and ontologies to create the opening for a shift in perspective. That shift is one from software as an objective in itself to one of software as merely a generic intermediary in an information task pipeline.

Though this shift may not apply (at least with current technologies) to transactional and process-based software, I submit it may be fundamental to the broad category of knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. These are the real areas where integration and reports and queries and analysis remain frustrating bottlenecks for knowledge workers. And, interestingly, these are also the same areas most amenable to embracing an open world (OWA) mindset [14].

If we stand back and take a systems perspective to the question of fulfilling functional KM tasks, we see that the questions are both broader and narrower than software engineering alone. They are broader because this systems perspective embraces architecture, data, structures and generic designs. The questions are narrower because software, within this broader context, can now be generalized as artifacts that fulfill classes of functions.

ODapps: The Ontology-Driven Application Approach

Ontology-driven applications (ODapps for short) based on adaptive ontologies are a topic we have been nibbling around and discussing for some time. In our oft-cited seven pillars of the semantic enterprise we devote two pillars specifically (#4 and #3, respectively) to these two components [15]. However, in keeping with the systems perspective relevant to a transition from software engineering to generic apps, we should also note that canonical data models (via RDF) and a Web-oriented architecture are two additional pillars in the vision.

ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies as noted under #1 above), as supplemented by the UI and instruction sets and validations and rules (as noted under #4 and #5 above). The combination of these specifications as provided by both properly constructed domain ontologies and supplementary utility ontologies is what we collectively term adaptive ontologies [16].

ODapps fulfill specific generic tasks, consistent with their bespoke design (#6 above) to respond to adaptive ontologies. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.

In fact, the widget idea from Web 2.0 is a key precursor to the ODapps design. What we see in Web 2.0 are dedicated single-purpose widgets that perform a display operation (such as Google Maps) based on the properly structured data fed to them (structured geolocational information in the case of GMaps).

In Structured Dynamics‘ early work with RDF-based applications by our predecessor company, Zitgist, we demonstrated how the basic Web 2.0 widget idea could be extended by “triggering” which kind of mashup widget got invoked by virtue of the data type(s) fed to it. The Query Builder presented contextual choices for how to build a SPARQL query via UI based on what prior dropdown list choices were made. The DataViewer displayed results with different widgets (maps, profiles, etc.) depending on which part of a query’s results set was inspected (by responding to differences in data types). These two apps, in our opinion, remain some of the best developed in the semantic Web space, even though development on both ceased nearly four years ago.

This basic extension of data-driven applications — as informed by a bit more structure — naturally evolved into a full ontology-driven design. We discovered that — with some minor best practice additions to conventional ontologies — we could turn ontologies into powerhouses that informed applications through:

  • An understanding of the kind of things under consideration, including their inference chains
  • The types of data in results sets, and how that informs the nature of the widget(s) (maps, calendars, timelines, charts, tabular reports, images, stories, media, etc.) appropriate to display and manipulate that information, and
  • UI and utility functions such as interface labels, mouseovers, auto-suggests, spelling suggestions, synonym matches, etc.
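
As a rough illustration of the three points above, the following Python sketch treats a small dictionary as a stand-in for an adaptive ontology. The concept identifiers, the `prefLabel`/`altLabels`/`parent` keys and the helper names are all hypothetical. They are not the OSF vocabulary, only a way to show labels, synonym matches and inference chains being read from a knowledge structure rather than coded into the application.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of an adaptive ontology: each concept carries a
# preferred label, alternative labels (synonyms) and a parent concept.
ONTOLOGY: Dict[str, dict] = {
    "ex:Product": {"prefLabel": "Product", "altLabels": ["item", "goods"], "parent": None},
    "ex:Camera": {"prefLabel": "Camera", "altLabels": ["photo camera"], "parent": "ex:Product"},
    "ex:DigitalCamera": {"prefLabel": "Digital Camera", "altLabels": ["digicam"], "parent": "ex:Camera"},
}


def label_for(concept_id: str) -> str:
    """The interface label comes from the ontology, not from application code."""
    return ONTOLOGY[concept_id]["prefLabel"]


def suggest(prefix: str) -> List[str]:
    """Auto-suggest: concepts whose preferred or alternative label starts with prefix."""
    needle = prefix.lower()
    matches = []
    for concept_id, spec in ONTOLOGY.items():
        labels = [spec["prefLabel"]] + spec["altLabels"]
        if any(label.lower().startswith(needle) for label in labels):
            matches.append(concept_id)
    return sorted(matches)


def inference_chain(concept_id: str) -> List[str]:
    """Follow parent links from a concept up to the root."""
    chain: List[str] = []
    current: Optional[str] = concept_id
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current]["parent"]
    return chain


print(suggest("digi"))                      # ['ex:DigitalCamera']
print(inference_chain("ex:DigitalCamera"))  # ['ex:DigitalCamera', 'ex:Camera', 'ex:Product']
```

Adding a synonym or renaming a label here means editing the dictionary only; none of the three functions changes.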

Like the earlier Zitgist discoveries, basing the applications on only one or two canonical data models and serializations (RDF and a simple data exchange XML, which Fred Giasson calls structXML) provides the input uniformity to make a library of generic applications tractable. And, embedding the entire framework in a Web-oriented architecture means it can be distributed and deployed anywhere accessible by HTTP.

Booch has maintained for years that in software design abstraction is good, but not if too abstract [1]. ODapps are a balanced abstraction within the framework of canonical architectures, data models and data structures. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures. The KM emphasis can shift from programming and software to logic and terminology [16].

In the sub-sections below, we peel back some portions of this layered design to unveil how some of these major pieces interact.

Built Upon an Ontology- and Web-based Architecture

Again, to cite Booch, the most fundamental software design decision is architecture [1]. In the case of Structured Dynamics and its support for ODapps, its open semantic framework (OSF) is embedded in a Web-oriented architecture (WOA). The OSF itself is a layered design that starts from a kernel of existing assets (data and structures), proceeds through conversion to Web service access, and then moves to ontology organization and management via ODapps [17]. The major layers in the OSF stack are:

  • Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
  • scones / irON – the conversion layer, in part consisting of information extraction of subject concepts or named entities (scones) or the instance record and Object Notation (irON) for conveying XML, JSON or spreadsheets (CSV) in RDF-ready form (via irON or RDFizers)
  • structWSF – a platform-independent suite of more than 20 RESTful Web services, organized for managing structured data datasets; it provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
  • Ontologies — are the layer containing the structured assets “driving” the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave
  • conStruct – connecting modules to enable structWSF and sComponents to be hosted/embedded in Drupal, and
  • sComponents – (mostly) Flex semantic components (widgets) for visualizing and manipulating structured data.

Not all of these layers, or even their specifics, are necessary for an ontology-driven app design [18]. However, the general foundations of generic apps, properly constructed adaptive ontologies, and canonical data models and structures should be preserved in order to operationalize ODapps in other settings.

OSF is the Basis for Domain-specific Instantiations

The power of this design is that by swapping out adaptive ontologies and relevant data, the entire OSF stack as is can be used to deploy multiple instantiations. Potential uses can be as varied as the domain coverage of the domain ontologies that drive this framework.

The OSF semantic framework is a completely open and generic one. The same set of tools and capabilities can be applied to any domain that needs to manage and understand its own information. With the existing ODapps in hand, this ranges from unstructured text or documents to conventional structured databases.
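
The idea of swapping ontologies and data beneath an unchanged application can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `render_report` function, the dictionary layout, and the two toy domains are not drawn from OSF itself. What the sketch shows is one generic function producing domain-appropriate output purely from the specification it is handed.

```python
from typing import Dict, List


def render_report(ontology: Dict, records: List[Dict]) -> List[str]:
    """Generic 'app': which attributes to show, and their labels, come from the ontology."""
    lines = []
    for record in records:
        spec = ontology["classes"][record["type"]]
        parts = [
            f"{ontology['labels'][attr]}: {record[attr]}"
            for attr in spec["display"]
            if attr in record
        ]
        lines.append(f"{spec['label']} | " + "; ".join(parts))
    return lines


# Two hypothetical domain ontologies; only these structures differ by domain.
GOVERNMENT = {
    "classes": {"Neighborhood": {"label": "Neighborhood", "display": ["name", "population"]}},
    "labels": {"name": "Name", "population": "Population"},
}
RETAIL = {
    "classes": {"Camera": {"label": "Camera", "display": ["model", "price"]}},
    "labels": {"model": "Model", "price": "Price (USD)"},
}

gov = render_report(GOVERNMENT, [{"type": "Neighborhood", "name": "Riverside", "population": 4200}])
shop = render_report(RETAIL, [{"type": "Camera", "model": "E520", "price": 499}])
print(gov)   # ['Neighborhood | Name: Riverside; Population: 4200']
print(shop)  # ['Camera | Model: E520; Price (USD): 499']
```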

What changes from domain to domain are the data structures (the ontologies, schema and entity references) and their instance data (which can also be converted from existing to canonical forms). Here is an illustration of how this generic framework can be leveraged for different deployments. Note that Citizen Dan is a local government example of the OSF framework with relatively complete online demos:

Structured Dynamics continues to refine this basic design for different clients and different industries. As we round out the starting set of ODapps (see below), the major effort in adapting this generic design to different uses is to tailor the ontologies and “RDFize” existing data assets.

Lower Layers

Conversion of existing assets to RDF and canonical forms is not discussed further here. See the irON and scones documentation or the TechWiki for more information on these topics.

The structWSF Web Services Layer

The first suite of ODapps occurs at the structWSF Web services layer. structWSF provides a set of generic functions and endpoints to:

  • Import or export datasets
  • Create, update, delete (CRUD) or otherwise manage data records
  • Search records with full-text and faceted search
  • Browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria, or
  • Process results sets through workflows of various natures, involving specialized analysis, information extraction or other functions.

Here is a listing of current ODapp functions within structWSF (with links to details for each):

WSF management Web services
User-oriented Web services

At this level the information access and processing is done largely on the basis of structured results sets. Other visualization and display ODapps are listed in the next subsection.

The Semantic Components Layer

The visualization and data display and manipulation ODapps are provided via the semantic components layer. Structured Dynamics’s sComponents are Flex-based widgets that conform to a standard, generic design. Other developers using the OSF framework are developing JavaScript versions [19]. Here is the current library (with links to details for each):

New Components
Components Extending Flex

These components can be used in combination with any of the structWSF ODapps, meaning the filtering, searching, browsing, import/export, etc., may be combined as an input or output option with the above.

The next animated figure shows how the basic interaction flow works with these components:

Using the ODapp structure it is possible to “drive” queries and results set selections either through direct HTTP requests to endpoints (not shown) or through simple dropdown selections on HTML forms or Flex widgets (shown). This design enables the entire system to be driven via simple selections or interactions without the need for any programming or technical expertise.

As the diagram shows, these various sComponents get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.
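
A hedged sketch of that step, in Python: user selections are turned into a SPARQL query, which is then encoded as an HTTP GET request. The function names, the `example.org` IRIs and the endpoint URL are placeholders invented for this example and are not the structWSF API. The only convention borrowed from a standard is passing the query text in a `query` parameter, as the SPARQL Protocol does.

```python
from typing import Dict
from urllib.parse import urlencode


def build_sparql(type_iri: str, filters: Dict[str, str], limit: int = 25) -> str:
    """Turn UI selections (a type plus attribute/value pairs) into a SPARQL SELECT."""
    patterns = [f"?record a <{type_iri}> ."]
    for attribute_iri, value in sorted(filters.items()):
        # Escape backslashes and double quotes so the literal stays well-formed.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        patterns.append(f'?record <{attribute_iri}> "{escaped}" .')
    body = "\n  ".join(patterns)
    return f"SELECT ?record WHERE {{\n  {body}\n}} LIMIT {limit}"


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical selections from two dropdowns; the endpoint is a placeholder.
query = build_sparql(
    "http://example.org/ontology#Camera",
    {"http://example.org/ontology#brand": "Olympus"},
)
url = request_url("http://example.org/ws/sparql/", query)
print(query)
```

The person making the dropdown selections never sees the query text; it exists only between the component and the endpoint.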

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the sComponents. When combined with the results set data, and attribute information in the irON ontology, plus the domain understanding in the domain ontology, a synthetic schema is constructed that instructs what the interface may do next. Here is an example schema:

These instructions are then presented to the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas.
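
The matching step can be pictured with the following Python sketch. It is an assumption-laden simplification: `COMPONENT_ONTOLOGY` is a toy stand-in for SCO, `select_widgets` plays the role described for sControl, and the data type names and widget names are invented for the example. The actual components are Flex-based and considerably richer.

```python
from typing import Dict, List

# Hypothetical stand-in for the Semantic Component Ontology (SCO):
# attribute data types mapped to the widgets able to display them.
COMPONENT_ONTOLOGY: Dict[str, List[str]] = {
    "geo:Point": ["map"],
    "xsd:date": ["timeline"],
    "xsd:decimal": ["bar_chart", "pie_chart"],
    "xsd:string": ["tabular_template"],
}


def select_widgets(results_set: List[Dict[str, str]]) -> List[str]:
    """Return every widget warranted by the attribute types found in a results set."""
    widgets: List[str] = []
    for record in results_set:
        for datatype in record.values():
            for widget in COMPONENT_ONTOLOGY.get(datatype, []):
                if widget not in widgets:  # keep first-seen order, no duplicates
                    widgets.append(widget)
    return widgets


# Each record maps an attribute name to the data type of its value
# (the values themselves are omitted for brevity).
results = [
    {"name": "xsd:string", "location": "geo:Point"},
    {"name": "xsd:string", "founded": "xsd:date"},
]
print(select_widgets(results))  # ['tabular_template', 'map', 'timeline']
```

Supporting a new kind of display then means adding an entry to the mapping rather than changing the selection logic.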

As new user interactions occur with the resulting displays and components, the iteration cycle is generated anew, again starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.

Self-service Reporting

Since self-service reporting has been such a disappointment [12], it is worth noting another aspect of this ODapp design. Every “thing” that can be presented in the interface can have a specific display template associated with it. Absent another definition, for example, any given “thing” will default to the template of its parental type (which, ultimately, is “Thing”, the generic template display for anything without a definition; this generally defaults to a presentation of all attributes for the object).

However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:

Thing
Product
Camera
Digital Camera
SLR Digital Camera
Olympus Evolt E520

At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.

This design is meant to provide placeholders for any “thing” in any domain, while also providing the latitude to tailor and customize to every “thing” in the domain.
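
The fallback behavior along the camera path above can be sketched as a short Python lookup. The `PARENT` links mirror that sample path; the `TEMPLATES` registry and the template names are hypothetical, chosen so that only two types define their own template.

```python
from typing import Dict, Optional

# Parent links for the sample inference path.
PARENT: Dict[str, Optional[str]] = {
    "Olympus Evolt E520": "SLR Digital Camera",
    "SLR Digital Camera": "Digital Camera",
    "Digital Camera": "Camera",
    "Camera": "Product",
    "Product": "Thing",
    "Thing": None,
}

# Hypothetical template registry: only some types have their own template.
TEMPLATES: Dict[str, str] = {
    "Thing": "generic-all-attributes",
    "Camera": "camera-summary",
}


def resolve_template(thing_type: str) -> str:
    """Walk up the inference path and return the most specific template found."""
    current: Optional[str] = thing_type
    while current is not None:
        if current in TEMPLATES:
            return TEMPLATES[current]
        current = PARENT[current]
    raise LookupError(f"no template on the path from {thing_type!r}")


print(resolve_template("Olympus Evolt E520"))  # camera-summary
print(resolve_template("Product"))             # generic-all-attributes
```

Registering a template for "Olympus Evolt E520" would immediately take precedence for that model, while every other type keeps its inherited display.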

It is critical that generic apps through an ODapp approach also provide the underpinnings for self-service reporting. The ultimate metric is whether consumers of information can create the reports they need without any support or intervention by IT.

Adaptive Analysis

The Mission Critical IT reference provided earlier [11] helps point to the potentials of this paradigm in a different way. Mission Critical also shows user interfaces contextually chosen based on prior selections, but extends that advantage with context-specific analysis and validation through the SWRL rules-based semantic language. This is an exciting extension of the base paradigm that confirms the applicability of this approach to business intelligence and general enterprise analytics.

Standing Software Engineering on its Head

All of this points to a very exciting era for enterprise and consumer apps moving into the future. We perhaps should no longer talk about “killer apps”; we can shift our focus to the information we have at hand and how we want to structure and analyze it.

Using ontologies to write or specify code or to compete as an alternative to conventional software engineering approaches seems too much like more of the same. The systems basis in which methodologies such as MDA reside has not fixed the enterprise software challenges of decades-long standing. Rather, generic applications driven by adaptive ontologies (ODapps) look to shift the locus from software and programming to data and knowledge structures.

This democratization of IT means that everything in the knowledge management realm can become “self service.” We can create our own analyses; develop our own reports; and package and disseminate what we and our colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we can turn prior decades of software engineering practices on their head.

What Structured Dynamics and a handful of other vendors are showing is by no means yet complete. Our roster of ODapp widgets and templates still needs much filling out. The toolsets available for creating, maintaining, mapping and extending the ontologies underlying these systems are still woefully inadequate [20]. These are important development needs for the near term.

And, of course, none of this means the end of software development either. Process and transactions systems still likely reside outside of this new, emerging paradigm. Creating great and solid generic ODapps still requires software. Further, ODapps and their potential are completely silent on how we create that software and with what languages or methodologies. The era of software engineering is hardly at an end.

What is exceptionally powerful about the prospects in ontology-driven apps is their potential to speed time to understanding and to place information manipulation directly in the hands of the knowledge worker. This is a vision of information access and control that has been frustrated for decades. Perhaps, with ontologies and these semantic technologies, that vision is now near at hand.


[1] This estimate is from Grady Booch, 2005. “The Complexity of Programming Models,” see http://www.cs.nott.ac.uk/~nem/complexity.pdf. He comments on the weakness of software lines of code as a meaningful measure. At the time in 2005, he estimated perhaps 800 billion lines of code had accumulated, which given growth and the vagaries of such guesstimates I have updated to the 1 billion number noted.
[2] For a wildly different estimate, that has been criticized somewhat, see Blackduck Software, 2009. “Estimating the Development Cost of Open Source Software,” at http://www.blackducksoftware.com/development-cost-of-open-source. According to Blackduck’s research there are over 200,000 OSS projects on the Internet representing more than 4.9 billion lines of available code from 4,000 sites that the company monitors. Blackduck estimates that reproducing this OSS would cost $387 billion for “typical” SLOC estimating bases. While Blackduck is likely in the best place of any organization to track open source given their business model, others have criticized the estimates because only a portion (fewer than 10%, consistent with my own research) of open source projects are active, and many active projects also share significant code bases. Nonetheless, there is still a huge disparity between the 1 billion SLOC estimate in [1] and this estimate of 5 billion for open source alone. This disparity is an indicator of the measurement challenges.
[3] See IMAP, 2010. Computing & Internet Software Global Report — 2010, 40 pp, see http://imap.com/imap/media/resources/HighTechReport_WEB_89B4E29C01817.pdf. The relative splits they show for software packages and licenses, IT consulting or outsourcing are 48%, 29% and 23%, respectively, of the total shown. Note however, that Gartner estimates are as high as 2x these amounts, again showing the uncertainty of measuring software; see, for example, http://www.gartner.com/it/page.jsp?id=1209913.
[4] For this and related measures, see Business Software Alliance, 2009. Software Industry Facts and Figures, see http://www.bsa.org/country/Public%20Policy/~/media/Files/Policy/Security/General/sw_factsfigures.ashx.
[5] Simply conduct a Web search on ‘”open source” “cost of ownership”‘ to see the many studies in this area. Depending on advocacy, estimates may be as high as proprietary software to a lower, but still substantial percentage. In no cases are open source understood to be fully “free” once maintenance, upgrades, modifications, and site adaptations are considered.
[6] Michael Uschold, 2008. “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, pp 3-20; see http://mba.eci.ufmg.br/downloads/recol/FormalOntologyinInformationSystems2008.pdf.
[7] Nicola Guarino, 1998. “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. Amsterdam, IOS Press, pp. 3-15; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.1776&rep=rep1&type=pdf.
[8] See Phil Tetlow et al., eds., 2006. Ontology Driven Architectures and Potential Uses of the Semantic Web in Software Engineering, a W3C Editor’s Draft on Best Practices, February 11, 2006; see http://www.w3.org/2001/sw/BestPractices/SE/ODA/. UML class diagrams have close resemblance to certain ontology structures. This effort was part of a formal collaboration between W3C and the Object Management Group (OMG), which resulted among other things in the production of the Ontology Definition Metamodel (ODM). In the OMG’s model-driven architecture (MDA) initiative, models are used not only for design and maintenance purposes, but as a basis for generating executable artifacts for downstream use. The MDA approach grew out of much of the standards work conducted in the 1990s in the Unified Modeling Language (UML).
[9] Neno is a semantic network programming language and Fhat is a virtual machine that works off of it. These two projects have been largely abandoned. A related project is Ripple, a relational, stack-based dataflow language by Joshua Shinavier, which is episodically updated.
[10] Holger Knublauch of TopQuadrant has made the point that ontologies can also have runtime uses as well: “In contrast to conventional Model-Driven Architecture known from object-oriented systems, semantic applications use their data models not only at design time, but also as runtime components. The rich declarative semantics of ontological data models can be exploited to drive user interfaces and to control an application’s behavior.” See H. Knublauch, 2007. “From Ontology Design to Deployment: Semantic Application Development with TopBraid,” presented at the 2007 Semantic Technology Conference, San Jose, CA; see http://www.semantic-conference.com/2007/sessions/l5.html.
[11] Mission Critical IT describes its ODASE platform (Ontology Driven Architecture for Software Engineering) as a set of tools to facilitate the creation of working applications from a semantic business model (an ontology), using the open standards OWL, SWRL and RDF. The ODASE code generators (a.k.a “robots”) generate an API based on the business terminology defined by the OWL+SWRL+RDF business model, which the ODASE platform then uses to execute the rules and reasoning as contextual choices are made by the user. Among other links, the company has an impressive online demo that shows a consumer telecommunications purchase example; there is also a video explaining the rules basis of the ODASE framework.
[12] See Wayne W. Eckerson, 2007. “The Myth of Self-Service Business Intelligence,” in TDWI Online, October 18, 2007; see http://tdwi.org/articles/2007/10/18/the-myth-of-selfservice-bi.aspx.
[13] The buggy whip industry as a major economic entity ceased to exist with the introduction of the automobile, and is cited in economics and marketing as an example of an industry ceasing to exist because its market niche, and the need for its product, disappears. Not recognizing what industry or business purpose is being served is an oft-cited cause for obsolescence. Thus, software engineering is a practice that serves the creation of software, which itself is only a means to a functional end.
[14] See M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[15] See M.K. Bergman, 2010. Seven Pillars of the Open Semantic Enterprise, AI3:::Adaptive Information blog, January 12, 2010.
[16] See M.K. Bergman, 2009. Ontologies as the ‘Engine’ for Data-Driven Applications, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, but the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.
[17] Some 250 pp of complete technical documentation for these projects is provided on the Structured Dynamics’ open source OpenStructs TechWiki.
[18] For more discussion of semantic components, see F. Giasson, 2010. “Semantic Components,” in his blog, July 5, 2010. For more discussion of the layered OSF design, see M.K. Bergman, 2010. Domain-specific Instantiations based on the Open Semantic Framework, AI3:::Adaptive Information blog, June 17, 2010.
[19] To find these groups and follow the open source OSF developments, see xxx. So long as the basic design comports with the foundations herein, sComponents may be developed in any rich Internet application (RIA) environment.
[20] Ontology development, management and mapping is the emerging imperative in the semantic technology space. For some thoughts on how Structured Dynamics is approaching this question, see a Normative Landscape of Ontology Tools on the TechWiki.
Posted:July 6, 2010

Consolidating Under the Open Semantic Framework
Release of Semantic Components Adds Final Layer, Leads to Streamlined Sites

Yesterday Fred Giasson announced the release of code associated with Structured Dynamics’ open source semantic components (also called sComponents). A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.

Though not all layers are by any means complete, from an architectural standpoint the release of these semantic components provides the last and missing layer to complete our open semantic framework. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.

The OSF “Semantic Muffin”

We first announced the open semantic framework — or OSF — a couple of weeks back. Refer to that original post for more description of the general design [1]. However, we can show this framework with the semantic components layer as illustrated by what some have called the “semantic muffin”:

Incremental Layers of the Open Semantic Framework

The OSF stack consists of these layers, moving from existing assets upward through increasing semantics and usability:

  • Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
  • scones / irON — this layer is for general conversion of non-RDF data and data schema to RDF (via irON or RDFizers) or for information extraction of subject concepts or named entities (scones)
  • structWSF — is the pivotal Web services framework layer, and provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
  • Semantic components — the highlighted layer in the “semantic muffin”; in essence, this is the visualization and data interaction layer in the OSF stack; see more below
  • Ontologies — are the layer containing the structured assets “driving” the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave, and
  • conStruct — is the content management system (CMS) layer based on Drupal and the thinnest layer with respect to OSF; this optional layer provides the theming, user rights and permissions, or other functionality drawn from Drupal’s 6500 third-party modules.

Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.

The Semantic Components Layer

Current semantic components, or widgets, include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.

Though Fred’s post goes into more detail — with subsequent posts to get into the technical nuances of the semantic components — the main idea of these components is shown by the diagram below.

These various semantic components get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the semantic components. These instructions are presented via the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:

Semantic Components Workflow

New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.

Consolidating and Rationalizing Web Sites and Mailing Lists

As the release of the semantic components drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the open semantic framework enabled us to consolidate and rationalize these resources.

Our first change was to consolidate all OSF-related material under the existing OpenStructs.org Web site. It already contained the links and background material to structWSF and irON. To that, we added the conStruct and OSF material as well. This consolidation also allowed us to retire the previous conStruct Web site, which now re-directs to OpenStructs.

We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new Open Semantic Framework Google group and mailing list. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the OSF group.

There has also been a reinvigoration of the developers’ community Web site at http://community.openstructs.org/. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.

Actual code SVN repositories are unchanged. These code repositories may be found at:

We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!


[1] An alternative view of this layer diagram is shown by the general Structured Dynamics product stack and architecture.