Evolution
AI³
Adaptive Information
Adaptive Innovation
Adaptive Infrastructure
a·dap·tive adj. Showing or having a capacity to make fit for new or special situations; flexible; a successful adjustment.

Blogasbörd (cloud version):
Send Email   Get SIOC Profile   Get FOAF Profile   Syndicate full contents for this site using RSS 20
Main Links
Categories
Calendar
September 2010
S M T W T F S
« Aug    
 1234
567891011
12131415161718
19202122232425
2627282930  
Archives
More . . .  
Search
Affiliations
structWSF
Credits
Blog software courtesy of WordPress Obtain Technorati profile Subscribe with Bloglines
View Mike's profile on LinkedIn
Date:   August 9, 2010

Ontologies are the structural frameworks for organizing information on the semantic Web and within semantic enterprises. They provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness; that is, their ability to represent conceptual relationships. Ontologies can be layered on top of existing information assets, which means they are an enhancement and not a displacement for prior investments. And ontologies may be developed and matured incrementally, which means their adoption may be cost-effective as benefits become evident [1].

What Is an Ontology?

Ontology may be one of the more daunting terms for those exposed for the first time to semantic technologies. Not only is the word long and without common antecedents, but it is also a term that has widely divergent use and understanding within the community. It can be argued that this not-so-little word is one of the barriers to mainstream understanding of the semantic Web.

The root of the term is the Greek ontos, or being or the nature of things. Literally — and in classical philosophy — ontology was used in relation to the study of the nature of being or the world, the nature of existence. Tom Gruber, among others, made the term popular in relation to computer science and artificial intelligence about 15 years ago when he defined ontology as a “formal specification of a conceptualization.”

Much like taxonomies or relational database schema, ontologies work to organize information. No matter what the domain or scope, an ontology is a description of a world view. That view might be limited and miniscule, or it might be global and expansive. However, unlike those alternative hierarchical views of concepts such as taxonomies, ontologies often have a linked or networked “graph” structure. Multiple things can be related to other things, all in a potentially multi-way series of relationships.

Example Taxonomy Structure Example Ontology Structure
A distinguishing characteristic of ontologies compared to conventional hierarchical structures is their degree
of connectedness, their ability to model coherent, linked relationships

Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Ontologies thus provide a similar role for the organization of data that is provided by relational data schema. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data.

When one uses the idea of “world view” as synonomous with an ontology, it is not meant to be cosmic, but simply a way to convey how a given domain or problem area can be described. One group might choose to describe and organize, say, automobiles, by color; another might choose body styles such as pick-ups or sedans; or still another might use brands such as Honda and Ford. None of these views is inherently “right” (indeed multiples might be combined in a given ontology), but each represents a particular way — a “world view” — of looking at the domain.

Though there is much latitude in how a given domain might be described, there are both good ontology practices and bad ones. We offer some views as to what constitutes good ontology design and practice in the concluding section.

What Are Its Benefits?

A good ontology offers a composite suite of benefits not available to taxonomies, relational database schema, or other standard ways to structure information. Among these benefits are:

  • Coherent navigation by enabling the movement from concept to concept in the ontology structure
  • Flexible entry points because any specific perspective in the ontology can be traced and related to all of its associated concepts; there is no set structure or manner for interacting with the ontology
  • Connections that highlight related information and aid and prompt discovery without requiring prior knowledge of the domain or its terminology
  • Ability to represent any form of information, including unstructured (say, documents or text), semi-structured (say, XML or Web pages) and structured (say, conventional databases) data
  • Inferencing, whereby by specifying one concept (say, mammals) one knows that we are also referring to a related concept (say, that mammals are a kind of animal)
  • Concept matching, which means that even though we may describe things somewhat differently, we can still match to the same idea (such as glad or happy both referring to the concept of a pleasant state of mind)
  • Thus, this means that we can also integrate external content by proper matching and mapping of these concepts
  • A framework for disambiguation by nature of the matching and analysis of concepts and instances in the ontology graph, and
  • Reasoning, which is the ability to use the coherence and structure itself to inform questions of relatedness or to answer questions.

How Are Ontologies Used?

The relationship structure underlying an ontology provides an excellent vehicle for discovery and linkages. “Swimming through” this relationship graph is the basis of the Concept Explorer (also known as the Relation Browser) and similar widgets.

The most prevalent use of ontologies at present is in semantic search. Semantic search has benefits over conventional search in terms of being able to make inferences and matches not available to standard keyword retrieval.

The relationship structure also is a powerful and more general and more nuanced way to organize information. Concepts can relate to other concepts through a richness of vocabulary. Such predicates might capture subsumption, precedence, parts of relationships (mereology), preferences, or importances along virtually any metric. This richness of expression and relationships can also be built incrementally over time, allowing ontologies to grow and develop in sophistication and use as desired.

The pinnacle application for ontologies, therefore, is as coherent reference structures whose purpose is to help map and integrate other structures and information. Given the huge heterogeneity of information both within and without organizations, the use of ontologies as integration frameworks will likely emerge as their most valuable use.

What Makes for a Good Ontology?

Good ontology practice has aspects both in terms of scope and in terms of construction.

Scope Considerations

Here are some scoping and design questions that we believe should be answered in the positive in order for an ontology to meet good practice standards:

  • Does the ontology provide balanced coverage of the subject domain? This question gets at the issue of properly scoping and bounding the subject coverage of the ontology. It also means that the breadth and depth of the coverage is roughly equivalent across its scope
  • Does the ontology embed its domain coverage into a proper context? A major strength of ontologies is their potential ability to interoperate with other ontologies. Re-using existing and well-accepted vocabularies and including concepts in the subject ontology that aid such connections is good practice. The ontology should also have sufficient reference structure for guiding the assignment of what content “is about”
  • Are the relationships in the ontology coherent? The essence of coherence is that it is a state of logical, consistent connections, a logical framework for integrating diverse elements in an intelligent way. So while context supplies a reference structure, coherence means that the structure makes sense. Is the hip bone connected to the thigh bone, or is the skeleton incorrect?
  • Has the ontology been well constructed according to good practice? See next.

If these questions can be answered affirmatively, then we would deem the ontology ready for production-grade use.

Fundamental to the whole concept of coherence is the fact that experts and practitioners within domains have been looking at the questions of relationships, structure, language and meaning for decades. Though perhaps today we now finally have a broad useful data and logic model in RDF, the fact remains that massive time and effort has already been expended to codify some of these understandings in various ways and at various levels of completeness and scope. Good practice also means, therefore, that maximum leverage is made to springboard ontologies from existing structural and vocabulary assets.

And, because good ontologies also embrace the open world approach, working toward these desired end states can also be incremental. Thus, in the face of common budget or deadline constraints, it is possible initially to scope domains as smaller or to provide less coverage in depth or to use a small set of predicates, all the while still achieving productive use of the ontology. Then, over time, the scope can be expanded incrementally.

Construction Considerations

To achieve their purposes, ontologies must be both human-readable and machine-processable. Also, because they represent conceptual structures, they must be built with a certain composition.

Good ontologies therefore are constructed such that they have:

  • Concept definitions – the matching and alignment of things is done on the basis of concepts (not simply labels) which means each concept must be defined
  • A preferred label that is used for human readable purposes and in user interfaces
  • A “semset” – which means a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept
  • Clearly defined relationships (also known as properties, attributes, or predicates) for relating two things to one another
  • All of which is written in a machine-processable language such as OWL or RDF Schema (among others).

In the case of ontology-driven applications using adaptive ontologies, there are also additional instructions contained in the system (often via administrative ontologies) that tell the system which types of widgets need to be invoked for different data types and attributes. This is different than the standard conceptual schema, but is nonetheless essential to how such applications are designed.


[1] This posting was at the request of a couple of Structured Dynamics‘ customers that desired a way to describe ontologies to non-technical management. For a more in depth treatment, see M.K. Bergman, 2007. “An Intrepid Guide to Ontologies,” AI3:::Adaptive Information blog, May 16, 2007.

Posted by AI3's author, Mike Bergman

Posted on August 9, 2010 at 12:53 am in Ontologies, Ontology Best Practices, Semantic Enterprise, Semantic Web | Comments (3)
The URI link reference to this post is: http://www.mkbergman.com/900/an-executive-intro-to-ontologies/
The URI to trackback this post is: http://www.mkbergman.com/900/an-executive-intro-to-ontologies/trackback/
Date:   July 15, 2010

Cisco Video is a Good Starting Intro for Management

Like the seminal linked data publication by PricewaterhouseCoopers of about a year ago (see PWC Dedicates Quarterly Technology Forecast to Linked Data, May 29, 2009), a video released by Cisco yesterday is another signal of the emergence of the semantic enterprise.

The Cisco tech brief on The Semantic Enterprise is a quite accessible — but a bit eerie — seven-minute introduction.  The video was prepared by Cisco’s Internet Business Solutions Group (IBSG), with Shaun Kirby, its Director of Innovations Architectures, as the narrator:

YouTube: http://www.youtube.com/watch?v=3lUzs2I8BKI

Well, as for being eerie, when the video first came up, I thought I was looking at an advanced, next generation avatar, perhaps a reincarnation of Douglas Adams’ Hyperland. Maybe this semantic stuff was closer at hand than we thought!

But, as it turned out, that first blush was only a reaction to how the video was shot. As it gets rolling, the Cisco video is extremely well done and informative. It is a great intro for sharing with management when contemplating your own moves to becoming a semantic enterprise.

I suggest you first view — and then bookmark — this one.

Posted by AI3's author, Mike Bergman

Posted on July 15, 2010 at 3:06 pm in Information Automation, Semantic Enterprise | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/897/another-milestone-in-semantic-enterprise-awareness/
The URI to trackback this post is: http://www.mkbergman.com/897/another-milestone-in-semantic-enterprise-awareness/trackback/
Date:   July 12, 2010

Benefits from an Incremental Approach

Using Incremental, Low-risk Semantic and Open World Approaches

OK. So, you’re looking at your garage … or your bedroom closet … or your office and its files. They are a mess, and you can’t find anything and you can’t stuff anything more into the nooks, cubbies, crannies or cabinets. What do you do?

Well, when you finally get fed up and have a rainy day or some other excuse, you tackle the mess. Maybe you grab a big mug of coffee to prepare for the pending battle. Maybe you strip down to comfort clothes. Then, if you’re like me, you begin to organize stuff into piles. Labeled piles and throwaway piles and any other piles that can provide a means to start bringing order to the chaos.

In the semantic Web world, there is a phrase coined by Jim Hendler that captures this approach: A little semantics goes a long way [1]. A little semantics, just like your labeled piles, helps to bring order to information chaos.

Mind you, this is not fancy or expensive stuff. In the case of my office, it is colored sheets of paper labeled with Magic Markers as “Taxes” or “Internal” or “Blog Posts” or whatever. Then, I begin sifting and distributing. In the case of the semantic world, these are classifying things into like categories and simply relating them to other categories with simple relationships, such as “is Part Of” or “is Narrower Than”.

Of course, I could have approached my mess in a different way. I could have hired an efficiency expert to come in, interview me and all of my employees and colleagues, gotten a written analysis and report, and then committed to a multi-week project to completely store and place every single last piece of paper in my office or organize every rake and set of abandoned golf clubs in my garage. When done, I would have shelled out much money and I suspect still not have been able to find anything.

Sort of sounds like the traditional way IT does its business, doesn’t it? To clean up their information messes, enterprises need to find a better strategy.

I’m not too long from having returned from the SemTech conference, which overall was quite an excellent show. But despite its emphasis on semantic technologies and their usefulness to businesses and enterprises, I found one critical theme unspoken: the ability of semantic approaches to change how enterprise IT actually does business. New ways have got to be found to clean up the many and growing information piles emerging all around us.

The Changing Nature of IT

IT is — and has been — going through a fundamental set of changes for decades. In the last decade, these changes have led to lowered relative spending, a shift in spending priorities toward services, less innovation, and less productivity. Some data and observations by researchers and analysts document these trends.

The following chart, using US Bureau of Economic Analysis data [2], shows the clear 50-year trend in declining hardware costs for enterprises, mostly resulting from the observation known as Moore’s Law. These massive hardware cost reductions (logarithmic scale) have also resulted in lower prices for IT as a whole. In 2008, for example, total relative IT prices were about two-thirds what they were a mere decade earlier:

US IT Prices in Relation to Each Other, 1960 - 2008

Source: M.K. Bergman and Bureau of Economic Analysis [2] (click for full size)

In contrast, relative prices for software and services have remained remarkably flat over this entire period, including for the past decade. This is somewhat surprising given the emergence of packaged software and more recently open source. However, relative percentage expenditures for custom software and software developed in-house have also remained strong over the past decade [3].

The mid- to late-1990s represented the high-water mark on many bases for enterprise IT, expenditures and vendors. Roughly in 1997 or so, the number of public enterprise software vendors peaked as did venture funding [4] and relative expenditures for IT in relation to GDP. There was a major uptick in relation to preparing for Y2K and a major downtick due to the dot-com bubble, and then of course the past two years or so have seen a global economic downturn. But, as the figure below shows (red), the long-term trend tends to suggest a relative plateau for IT expenditures in relation to GDP somewhat around 2000:

IT and Software Expenditures in Relation to GDP, 1960 - 2008

Source: M.K. Bergman and Bureau of Economic Analysis [2] (click for full size)

Yet, like the first chart, software seems to be bucking this trend (blue lines above). Though perhaps the rate of growth in expenditures for software is slowing a bit, it is still on a growth upslope, especially in relation to overall IT expenditures. The next chart, in fact, specifically compares software expenditures to total IT expenditures. Software expenditures are some 40% higher in relation to total IT than they were a mere decade ago:

US Software Expenditures in Relation to Total IT, 1960 - 2008

Source: M.K. Bergman and Bureau of Economic Analysis [2] (click for full size)

The mix of these software expenditures is also changing in major ways while stagnating in others.

The changing aspect is coming about from the shift of expenditures from license and maintenance fees to services. A number of software vendors began to see revenues from services overcome that from licensing in the 1990s. By the early 2000s, this was true for the enterprise software sector as a whole [4]. Today, service revenues account for 70% or so of aggregate sector revenues. Combined with the emergence of open source and other alternatives such as software as a service (SaaS), I think it fair to say that the era of proprietary software with exceedingly high margins from monopoly rents is over [5].

The stagnating aspect occurs in how the software expenditures are applied. According to Gartner, in the US, more than 70% of IT expenditures are devoted to simply running existing systems, with only about 11% of budgets devoted to innovation; other parts of the world spend nearly double on innovation and much lower for operations [6]. This relative lack of support for innovation and high percentages for running existing systems has held true for about a decade. Meanwhile, IT’s contribution to US productivity has been declining since 2001 [7].

What is the Cause for IT’s Ills?

Last year, PricewaterhouseCoopers published a major report with the provocative title, “Why Isn’t IT Spending Creating More Value?[7]. The 42-page report covered many of the aspects above. Among other factors, the PWC authors speculated that:

As consumption of IT increases and as technologies change and advance, businesses have been left to cobble together disparate software and hardware systems and tools. The end result? Unchecked IT spending, unneeded complexity, redundant systems, underutilized hardware and data centers, the need for expensive IT security, and, inevitably, diminishing returns from IT. In short, low levels of IT productivity create conditions for an IT cost crisis. [7]

I suppose one could add to this litany other factors such as the growth and emergence of the Internet, sector consolidations through mergers and acquisitions, the rise of open source and alternatives such as SaaS, etc.

But which of these are causes? Which are symptoms? And which might only be consequences or coincident?

To be sure, all recognize the explosion of digital data and information, with sources and formats springing up faster than Whack-a-Mole. It is such an evident and ubiquitous phenomenon that pointing to it as a cause appears on the face of it quite obvious. Also obvious is that these new sources carry with them a diversity of systems and tools. While not categorically stated as such, it appears that PWC fingers the difficulties of “cobbling” these systems together as the root cause for low productivity and thus the IT cost crisis.

I agree totally that these are symptoms of what we see in IT’s current circumstance. I would even say these factors are a proximate cause to these ills. But I disagree they are the root cause. To discover that root, I believe, we must look deeper to mindset and assumptions.

Closed World Mindset as the Root Cause

There are some phenomena that are so obvious that they are easily missed. Not seeing your fingertip six inches between your eyes is one of these. We aren’t used to focusing on things so near at hand.

So, let’s look for a moment at the closed world assumption (CWA), a key underpinning to most standard relational data systems and enterprise schema and logics. CWA is the logic assumption that what is not currently known to be true, is false. If CWA is not directly familiar to you that is understandable; it is an implied assumption of these systems and logics. As such, it is not often inspected directly and therefore not often questioned [8].

With regard to standard IT systems, the closed world assumption has two important aspects:

  1. The assumption is that the information domain at hand is complete [9], and
  2. The related negation as failure, which assumes every predicate to be false that cannot be proved to be true.

On the face of them, these assumptions seem tame enough. And, indeed, there are some enterprise data systems that absolutely rely on them for efficient processing and completion times, such as most transaction systems. CWA is absolutely the appropriate design for such applications.

However, for knowledge management or representation applications — that is, applications which involve combining or using heterogeneous data or information from multiple data sources, which are exactly the same sources requiring information “cobbling” noted above by PWC — there are two very critical implications of the closed-world assumption (CWA):

  1. Efforts or projects can not be undertaken incrementally; if done in pieces, each piece must be complete and consistent, which is expensive to scope and do
  2. To be consistent and explicit, the predicates (properties or relationships) must also be complex to model the “reality” of the system, which is also expensive to scope and do [10].

The net effect, which I have argued before, most notably in a major piece about the open world assumption [11], is that typical projects with a knowledge management aspect have become costly, take very long to complete, often fail, and require much planning and coordination. These facts have been true for three decades as enterprises have attempted to extract knowledge from their electronic information using closed world approaches based on relational systems. And, as recognized by PWC, these problems are only getting worse with growth in diversity and scope of systems.

The implications of closed world v. open world approaches are absolutely at the root of the causes leading to declining productivity, low innovation, significant failures and increasing costs — all exacerbated with more data and more systems — now characterizing traditional enterprise IT. Moreover, it is not a problem for open world systems to link to and incorporate closed world approaches. With open world, there is no need for Hobson’s choices. Unfortunately, such is not true when one begins with a closed world premise.

Incremental is Good: Pay as You Go

As best as I can tell, Alon Halevy was the first to use the phrase “pay as you go” in 2006 to describe the incremental aspect of the open world approach in relation to the semantic Web [12]. The “pay as you go” phrase had been applied earlier to data management and storage and had also been used to describe phone calling plans.

Incremental concepts and “agility” have been popular topics for the past five to ten years in IT, most often related to software development. And, while “incremental” sounds good in relation to enterprise projects, especially of a knowledge management or information integration/federation nature, the actual methodologies put forward were anything but incremental in their conceptual underpinnings.

Unfortunately, the “pay as you go” phrase has (and still is) largely confined to incremental, open world approaches involving the semantic Web. How this approach might apply and benefit enterprises has yet to be articulated. Nonetheless, I like the phrase, and I think it evokes the right mindset. In fact, I think with linked data and many other aspects of the current semantic Web we are seeing such approaches come to fruition. Inch-by-inch, brick-by-brick, data on the Web is getting exposed and interlinked. “Pay as you go” is incremental, and that is good.

Purposeful is Better: Pay as You Benefit

Yet the idea of “pay as you benefit” is more purposeful, able to be planned and implemented, and founded on standard enterprise cost-benefit principles. I think it is a better (and more nuanced) expression of the “pay as you go” mindset in an enterprise setting. What it means is you can start small and be incomplete. You can target any domain or department or scope that is most useful and illustrative for your organization. You can deploy your first stand-ups as proofs-of-concept or sandboxes. And, you can build on each prior step with each subsequent one.

One of the reasons we (Structured Dynamics) embraced the MIKE2.0 methodology [13] was its inherent incremental character. (Government deployments often call them “spirals”.) In general, the five phases of MIKE2.0 can be represented as follows:

Five Phases of MIKE2.0

(click for full size)

It is specifically during the fifth phase, testing and improvement, that quantitative and qualitative benefits from the current increment are calculated and documented. This evolving methodology is where the enterprise can assess the results of its prior investment and scope and budget for the next one. These can be quick, rapid increments, or more involved ones, depending on the schedule, prior results and risk profile of the enterprise (or department) at that time.

Much is made of “incremental” or “agile” deployments within enterprises, but the nature of the traditional data system (and its closed world assumption) can act to undermine these laudable steps. The inherent nature of an open world approach, matched with methodologies and best practices, can work wonderfully with KM-related projects.

Quite Simply a Different Way to Do Business

We see in our current IT circumstances a number of embedded practices and assumptions. We have been assuming control and completeness — the closed world opposite to the open world approach. We have thus embraced and promoted “global” or enterprise-wide solutions: be they desktop operating systems or browsers or expensive enterprise-level proprietary software solutions. This scope leads to immense hurdle rates and risks: we better get our choices right up front, because if we don’t, the department or enterprise are at risk. We have an inward focus about our own resources, our own networks, our own systems. Meanwhile, when we look outward, we wonder how all of these new Web companies can grow and expand so rapidly in comparison to us.

Clearly, we are seeing shifts to more services than products, more open source, more outsourcing, and more software as a service. Yet, because of the legacy of decades-long commitments from prior IT investment and the failures of many hyped “solutions” such as ERP or BI or data warehousing or a dozen others, we also see a decline and a reluctance for IT to embrace new and transforming approaches. Our prior choices were practically tantamount to “betting the enterprise.” What if our new approaches fail as so many of their predecessors did? In a demanding, competitive environment can we afford to make such wrong choices again with such immense implications?

Yet, now that information technology is a given, it only seems natural that its role becomes an integral part of the enterprise, and not a special function. Like procurement, IT has matured to become a support function. Businesses should not succeed or fail based on the types of pencils and paper stock they use; so should they not depend on the software support choices that IT makes. Enterprises are now past the need to get “computerized”; they are thoroughly so. But our understanding of IT’s role and position has not evolved with its own success.

The first whiffs of these challenges to IT’s initial hegemony came from the departmental introduction of PCs and local networks in the early 1980s. It has continued with desktop software, spreadsheets and Web portals and sites. Large, mature companies awoke in horror in the last decade to discover they had hundreds — sometimes thousands — of Web sites and content dissemination points over which IT had little or no control. Such is the nature of entropy, and it is a fact for any organization of any size.

So, now, with strategies such as “pay as you benefit,” there is no longer an excuse not to innovate. There is not a justification to put off testing and discovering benefits that the open world and semantic approaches can bring to your organization. There is now a basis to make the case and set the affordable budgets within desirable timelines for becoming a semantic enterprise.

Mindsets and expectations do require some adjustment. For example, not everything will be known or modeled in early phases. But, is that also not true in any “real” real world? We’re not talking high-throughput transaction systems here, but beginning to pull together and link the information that is important to your organization strategically.

Remember the intro statement that “a little semantics goes a long way”? Well, that truth — and it is true — when combined with incremental deployment firmly tied to demonstrable results, promises quite simply a different way to do business. Never before have enterprises had working and winnable approaches such as this to test and innovate and learn and discover. Jump on in; the water is clear and warm.

And, oh, as to that mess in your closet or garage? Well, if you adhere to CWA, you will need to define a place for everything to go before you can start cleaning things up. I say: forget those false hurdles. If you’d really want to make a dent in the mess, grab a broom and start cleaning.


[1] Jim Hendler, “a little semantics goes a long way.” See http://www.cs.rpi.edu/~hendler/LittleSemanticsWeb.html.
[2] All starting data is for the United States only and comes from the U.S. Bureau of Economic Analysis, U.S. Department of Commerce. The data tables were downloaded from the BEA Web site at http://www.bea.gov/national/nipaweb/SelectTable.asp. GDP data is from Section 1; enterprise private investment data from Section 5. For reasons as described in the text, all relative BEA numbers were re-adjusted from a 2005 baseline to 1997 based on absolute figures. Software figures and expenditures include packaged software, custom software and software developed in-house, but excludes software bundled or included within hardware.
[3] Data not shown; see the “Software Investment and Prices, by Type” data on the BEA Web page http://www.bea.gov/national/info_comm_tech.htm.
[4] Michael A. Cusumano, 2008. “The Changing Software Business: Moving from Products to Services,” Massachusetts Institute of Technology, in Computer, Vol 41 (1): 20-27, January 2008. See http://www.iae.univ-lille1.fr/SitesProjets/bmcommunity/Research/cusumano.pdf. This shift has occurred despite the recognition that potential gross margins from software packages can exceed 90% due to zero costs of reproduction. As Cusumano notes in a rule, “99 percent of zero is zero: The great profit opportunity from software products becomes theoretical and not practical” if not sold. Also, another interesting observation made by Cusumano is that in the shift to services vendors with both low percentages and high percentages of services, or what he calls the “sweet spots”, show higher contributions to profitability than vendors in the middle. He posits that low percentage vendors are getting mostly profitable maintenance fees, while those above 60% in services show profitability due to learning more replicable and systematic processes and approaches for service delivery.
[5] While we may occasionally see some vendors successfully buck this trend, I suspect these will only occur for established vendors with established platform advantages or for isolated applications where the innovating vendors have a significant first-mover advantage.
[6] Garnter calls the innovation category “transform”; see Gartner, Incorporated, 2009. “IT Software and Services, 2007-2010,” see http://www.slideshare.net/rsink/gartner-report-it-spending-2010. Also, see Jed Rubin and Howard Rubin, 2006. “Worldwide IT Benchmark Service New Trends & Findings for 2007: Strategic Performance Management and Measurement,” from Gartner Consulting Worldwide IT Benchmark Service; see http://www.gartner.com/teleconferences/attributes/attr_161183_115.pdf.
[7] PricewaterhouseCoopers, 2009. “Why Isn’t IT Spending Creating More Value?”, see http://www.pwc.com/en_US/us/increasing-it-effectiveness/assets/it_spending_creating_value.pdf.
[8] Though relational database systems did not begin with an understanding of CWA, but rather Edgar Codd’s 12 rules, the understandings of these were formulated later by Raymond Reiter.  Reiter first described the basis of CWA in 1978, and then provided an axiomatization of relational databases and their deductive generalizations and basis in CWA in 1984; see http://prism.cs.umd.edu/papers/Min02:reiter_memoriam/memoriam-tplp.pdf.
[9] Relational database systems also assume unique names for objects, which, while not perhaps the best design for federated systems, can be overcome in other ways.
[10] For semantics-related projects there is a corollary problem to the use of CWA which is the need for upfront agreement on what all predicates “mean”, which is difficult if not impossible in reality when different perspectives are the explicit purpose for the integration.
[11] See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[12] This was also the first instance (I believe) of Alon coining the “dataspace” term. First use of the “pay as you go” phrase was, Alon Halevy, Michael Franklin, and David Maier, 2006. “Principles of Dataspace Systems,” in Proceedings of ACM Symposium on Principles of Database Systems, pp: 1-9. See also the slides accompanying that talk, Alon Halevy, 2006. “Principles of Dataspace Systems (PODS),” June 26, 2006; see http://www.cs.washington.edu/homes/alon/files/pods06-keynote.ppt, 2006. More explicitly the next year see Jayant Madhavan, Shirley Cohen, Xin (Luna) Dong, Alon Y. Halevy, Shawn R. Jeffery, David Ko, and Cong Yu, 2007. “Web-scale Data Integration: You Can Afford to Pay as You Go.” in 3rd Conf. on Innovative Data Systems Research (CIDR), pp 342-350, see http://research.yahoo.com/files/paygo.pdf. The term has been picked up by many others, notably Rada Chirkova, Dongfeng Cheny, Fereidoon Sadriz and Timo J. Salo, 2007. “Pay-As-You-Go Information Integration: The Semantic Model Approach,” see ftp://ftp.csc.ncsu.edu/pub/tech/2007/TR-2007-30.pdf; and most recently papers by Gerhard Weikum on RDF-3X; see http://domino.mpi-inf.mpg.de/internet/reports.nsf/c125634c000710cec125613300585c64/70e8f906d8090e6bc125757f00448ec9!OpenDocument&ExpandSection=-1.
[13] See M.K. Bergman, 2010. “MIKE2.0: Open Source Information Development in the Enterprise,” AI3 Blog posting, February 23, 2010; and M.K. Bergman, 2010. Open SEAS: A Framework to Transition to a Semantic Enterprise,” AI3 Blog posting, March 1, 2010.

Posted by AI3's author, Mike Bergman

Posted on July 12, 2010 at 10:57 pm in Adaptive Innovation, MIKE2.0, Semantic Enterprise | Comments (4)
The URI link reference to this post is: http://www.mkbergman.com/896/pay-as-you-benefit-a-new-enterprise-it-strategy/
The URI to trackback this post is: http://www.mkbergman.com/896/pay-as-you-benefit-a-new-enterprise-it-strategy/trackback/
Date:   June 17, 2010

Structured Dynamics logo

Structured Dynamics Completes Design Phase; Citizen Dan First Exemplar

Structured Dynamics has been in a fervent — and, we believe, fruitful — design phase for the past 18 months. All of the working parts related to how to embrace becoming a semantic enterprise have now been defined and designed. Actual tools and components accompany many of these parts and have been deployed.

Recently, I have been speaking and blogging much about rationale, process, mindset and approach for how to bring semantics into the organization. But, prior to now, we have not spoken much about the overall design behind our approach. Today, as we complete our design phase and introduce our first exemplar instance of it — Citizen Dan [1] — we are finally in a position to describe this overall approach.

We term our approach the open semantic framework, also OSF. The open semantic framework is a combination of a layered architecture and modular software. The open semantic framework represents the software component of the four-component total open solution, recently described in a three part series. I return to this topic in the conclusion of this post.

Revisiting Design Objectives

Over the past nine months, I have been focusing my writing largely on the semantic enterprise, with more specificity regarding our Open SEAS (Semantic Enterprise Adoption and Solutions) initiative. In bits and pieces, these writings have tended to reflect a number of objectives:

  • Leverage existing information assets (data + structure) as much as possible
  • Develop incrementally, and validate and justify as you go
  • Emphasize, where possible, open standards and open software
  • Employ Web-oriented architectures
  • Adopt an open-world approach that acknowledges that information is most often incomplete; the approach is a key enabler for incremental deployments
  • Use URIs as object identifiers, and use linked data where practical
  • Embrace any data format found in the wild, but use RDF as the ultimate integration data model
  • Design architectures and APIs that avoid “lock-in” and support multiple tools options across the stack
  • Provide systems and capabilities that put all information sources — text, media, semi-structured and conventional databases — on an equal footing
  • Promote designs that bring the ability to create useful results into the hands of users and decisionmakers; relegate IT to a support role.

To date, the result of these design objectives is perhaps best captured in my Seven Pillars of the Open Semantic Enterprise posting, as well as our general discussions regarding adaptive ontologies. Yet, still, these writings have been somewhat piecemeal. What this document attempts to do is to place all of these perspectives into a single, coherent whole.

The Incremental Layers of the Open Semantic Framework

Structured Dynamics has been a strong advocate for layered architectures, with clear APIs between layers as appropriate. But these layers are not “laminates” that completely cover the layer below, nor are they all needed or necessary. Depending on the circumstance, some layers are unneeded or superfluous. Layers may be added or not incrementally.

In this manner, then, the open semantic framework is perhaps more akin to a pearl, than to a laminate or cocoon. Each subsequent layer does not “embed” the layer prior to it, and some layers actually may inter-operate with multiple layers below or above it (this is notably true for the “ontologies” layer, which has interactions up and down the stack).

Nonetheless, we can envision this pearl of the open semantic framework and its layers as follows:

Incremental Layers of the Open Semantic Framework

(click for full size)

Others have termed this the “semantic muffin” or even “semantic muppet” or “semantic blob”. Whatever (hehe). The real idea is that layers may accrete (as in the growth of a pearl) and occur over time and be uneven. Each layer, though, does have a role to play (though it may not be needed in a given deployment), and does act to augment existing information assets in the transition to a semantic framework. Beginning at the core, each of these layers — with external references as appropriate for more details — is described below.

Existing Assets Layer

The open semantic framework is premised on leveraging existing information assets. Sure, once the framework is in place, new information can be brought into it in a more direct, semantic manner. But, the real thrust and benefit of this framework is to provide an incremental pathway for finally inter-operating and federating prior decades of data, structure and information assets.

These information assets may reside inside or outside the enterprise. They may (and DO!) exist in many formats and are described by many schema. They may come from internal transaction systems or warehouses, or may exist external on the Web or at supplier or partner sites. These information assets may span from conventional databases and relational data systems to XML interchange standards, Web pages and standard internal text or documents. In short, there is NO information asset that is not amenable to be included in this framework.

The Information Transformation (scones/irON) Layer

The information transformation layer provides either: 1) extraction of concepts and entities as structured metadata from source text or documents; or 2) conversion of existing data assets to interoperable form. As implemented by Structured Dynamics, the extractions are conducted by either scones (Subject Concept or Named EntitieS) or third-party utilities, and the conversions occur via irON (instance record Object Notation) or third-party “RDFizers“.

Depending on the source, the net result of the transformation is to produce interoperable data and information that can be ingested and used by other layers in the framework.

Though not strictly analogous, this layer bears some resemblance to the ETL (extract, transfer, load) utilities used in many enterprise information integration applications. Unlike those conventional systems, this information transformation layer also may capture and represent some of the source schema.

In all cases, however, these transformations are relatively simple and get parsed against the available structure (the ontologies, schema and entity reference lists) in the system to generate the semantic metadata (tags).

At this point, the extracted structure is generally at the level of instance records, or the ABox, with simple assertions of attribute-value pairs for specific records [2]. Little schema transformation or mapping occurs at this layer (if such is needed, that occurs at the structWSF layer; see next). Actual federation or interoperation occurs at later layers based on the TBox structures [2].

This modular portion of the framework is explicitly designed with APIs to allow third-party tools to be plugged in and substituted.

The structWSF Layer

The major workhorse of the open semantic framework is the structWSF (Web services framework) layer. structWSF is the most complicated of the OSF layers and has many supporting software packages and capabilities. The structWSF layer provides the standard, common interface (”canonical”) layer by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack.

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies; see below).

The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework comes packaged with a baseline set of about twenty Web services in CRUD, browse, search and export and import. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and optionally a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML. An internal representation, structXML [3], is used for internal communications across all structWSF Web services and with other layers.

structWSF has a central service that governs access rights and permissions. These rights occur at the level of the dataset, which gives immense flexibility to how data may be accessed, read, modified, created or deleted (or not). Datasets within a given structWSF instance may be accessed directly via API or via SPARQL queries to the instance’s endpoint. Depending on rights and query, results sets may be returned from a given structWSF instance in an infinite variety of ways.

This latter capability is the essential interface for subsequent layers in the open semantic framework stack. Depending on those subsequent components, pre-staged data and results sets may be returned for an essentially limitless variety of purposes.

Each structWSF instance also has a unique Web address that enables one or a multitude of instances to communicate and share with one another. This simple, but elegant, method enables structWSF instances to participate or not in potentially global or restricted local networks and collaboration environments. This is currently the largest untapped potential of structWSF with respect to its existing deployments.

The Semantic Components Layer

The newest layer in the stack is the semantic components layer. This layer takes results sets — most often generated by a specific query or data slice request — from one or more structWSF instances and then presents that information via a variety of data visualization or data presentation widgets (what we specifically call ‘semantic components‘ due to their design [4]). The operation and sensitivity of these display components are themselves driven by a presentation and data analysis (including statistics) ontology.

Current display widgets include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable without modification to any domain.

As presently implemented by Structured Dynamics, this layer consists either of Flex data visualization components or structured data display templates based on Smarty. The inherent design allows for updates to other bases (such as HTML5). The layer may also be swapped out or substituted with third-party capabilities.

The strength and power of this system is governed by its own ontology, the Semantic Component Ontology (SCO) (see next).

This is an extremely flexible layer in the open semantic framework stack. Expect an ongoing series of explanatory blog posts and online resources in the upcoming weeks to explain this innovative capability.

The Ontologies Layer

The ontologies layer actually refers to all structured assets driving the system. As such, this layer might be considered the “brain” (though rather simply specified!) of the open semantic framework.

At a true schema or TBox level [2], the ontologies layer represents the concept and relationships of the domain at hand. This layer also hosts the specific local entities and prominent things (people, places, events, etc.) useful for extracting local and domain-specific relevance. However, those views are also supplemented with some administrative ontologies (two examples are SCO and irON) that guide how the user interfaces or widgets in the system should behave.

The concept level represents the “world view” of the specific instantiation of the open semantic framework at hand. This conceptual (TBox) view provides the structural organization of information, inferencing capabilities, and navigation, faceting and explorer structure. The entity (ABox) view provides tagging for prominent individuals and instances important to the domain at hand, and guides the structure behind data visualizations of attribute or indicator data.

The administrative level uses simple roles and relationships for attributes and indicators to inform the framework as to how and with what widget to display information. For example, a “type” of information that is geographically related can be instructed to use the map component as an option for display. Whether some information is used for totals, comparison purposes, or other specifications useful to data visualization and graphing may also be specified.

The language and relationships (predicates or properties) of these administrative ontologies are simple and straightforward. It is, for example, relatively easy to define data display functions at the broad dataset and attributes level. Simple determinations drive how results sets and their associated results types may be displayed, no matter what datasets or slices may be generated as a result of the queries or requests fed to the system.

The structure in these layers can be replaced by other structures for other instantiations and circumstances. Indeed, all other layers in the open semantic framework can remain relatively fixed while tailoring the instance to new domains solely via this layer. The ontologies layer is what gives any given instantiation of OSF — such as Citizen Dan — its unique focus and scope.

The Content Management System (conStruct) Layer

The thinnest layer (that is, least substantial with respect to this framework) is the content management system (CMS) layer. In its current form, the open semantic framework uses the Drupal CMS via our conStruct plug-in modules. The design of the framework, however, has explicitly accommodated the possibility that other CMSs may substitute for this role.

The CMS layer is optional if structWSF endpoints are sufficient or if simple Web pages hosting semantic components are deemed as adequate. Very small organizations or deployments may reasonably choose to have no CMS layer at all.

However, for most sites or portals with more than a few active users, it is desirable to have broad flexibility in theming (”skinning”), user rights and permissions, or other functionality. These are the roles of the CMS layer. Drupal, for example, is presently supported by more than 4500 third-party modules in every conceivable function, from polling to blogs and rating systems and bulletin boards.

For such generalized portals or collaboration environments, it makes sense to adopt and install a flexible CMS system, such as Drupal. Much of the user experience and functional environment can be provided through such means.

The open semantic framework is thus designed to reside easily in a CMS while also providing the hooks to take advantage of the generalized user rights and functionality of the CMS. In this manner, the open semantic framework is able to stay focused on its structured data and interoperability purposes, while still gaining the advantages of rich-featured content management systems.

The OSF is a Web-oriented Architecture

With its inherent open-world orientation [5] and distributed and collaborative potential, the open semantic framework was designed from the outset to be Web-capable and Web-oriented:

Open Semantic Framework is a Web-oriented Architecture

(click for full size)

A Web-oriented architecture (WOA) has a number of understood requirements, to which the open semantic framework adheres. Specifically, these design considerations support the framework as being part of WOA:

  • Data and objects are all identified with Web addresses (URIs)
  • Data is generally exposed (and universally available) as linked data
  • SPARQL endpoints and APIs are generally RESTful in design
  • The overall architecture is modular, with inherent decentralized and distributed aspects
  • All display and visualization aspects are cross-browser ready and capable.

OSF is the Basis for Domain-specific Instantiations

Citizen Dan is our first exemplar instance of this open semantic framework. The details page for the project goes into some of Citizen Dan’s functionality and capabilities.

Citizen Dan is specifically geared to local governments and localities, with an emphasis on community indicator systems (CIS). CIS have become a popular way of measuring and tracking measures of local economic and social well-being; they are closely related to sustainability and how to measure it as used in many economic and environmental domains.

However, in the context of this post, what is really interesting about Citizen Dan is that its semantic framework is a completely open and generic one. The same set of tools and capabilities described on its details page can be applied to any domain that needs to manage and understand information in its own domain. This includes from unstructured text or documents to conventional structured databases.

What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists; see above) that are fed to this open semantic framework. By swapping out new structures, what can be called Citizen Dan in one instance can morph to become Curriculum Carla in say, the education instance or Doctor Doolittle in the veterinary science instance [6].

We can illustrate these multiple instances as follows:

The Open Semantic Framework can Spawn Many Different Domain Instances

(click for full size)

What this figure illustrates is that even a branded expression of the framework — such as Citizen Dan — is merely an instance of that framework. And, actually, when expressed in such a packaged manner, we can more accurately call the standard and bundled suite of generic functions and accompanying structure of Citizen Dan as an instantiation of the open semantic framework:

in·stan·ti·ate \in-ˈstan(t)-shē-āt\ (transitive verb) is to:

  1. (transitive) to represent an abstract concept by a concrete instance
  2. (transitive, object orientated computing) to create an object (an instance) of a specific class

in·stan·ti·a·tion \in-stan(t)-shē-ā-shən\ (noun) [7]

By replacing the structure bases, and by tailoring the function suite appropriate to a given market and use, we can create many instantiations of the open semantic framework for different domains and markets. In this manner, Citizen Dan can be seen as an early exemplar of the framework, but not as a definer and limiter to it.

Total Open Solution

OSF is the Software Leg to a ‘Total Open Solution

So far, this discussion has focused solely on considerations of software and architecture. While we see the power of the open semantic framework, highly useful in itself, this is inadequate alone to achieve acceptance and success in the enterprise (as we noted in our most recent posts). The very forces that are compelling enterprises to look at new options, are also the same ones that pose difficult hurdle rates for acceptance of open source.

To address this issue, we have developed a four-legged foundation to what we termed the total open solution. The solution involves software, structure, documentation and methods (or best practices). Each of these connect and relate to the other foundations.

The open semantic framework is clearly the software (and architecture) leg to this foundation. Again, however, what is interesting is that the mere swapping out of the structure can also make the system relatively ready for other domains.

We see these relationships in the following diagram, that also shows that the DocWiki portions of the solution embody the documentation (aside from code-level comments) and methods legs of the foundation:

DocWiki is a Natural Complement to the Open Semantic Framework

(click for full size)

Differences between domains may also lead to differences as to which components are included or not in that domain’s desired instantiation.

The hugely important implied point, however, from the diagram above, is to show how nearly universal the content and methods in the DocWiki may be to other domains. Because the deltas between domains largely result from structure and what specific functional components are included or not, it becomes clear that most documentation and practices shared with the DocWiki will be applicable across domains. Sure, the use cases and some of the specific terminology may change, but we can also now see a high degree of re-usability of documentation and knowledge base across markets. This realization makes the usefulness and leverage of the DocWiki even higher.

A Common Language and Framework for Moving Forward

Developing “common language” by which to describe and convey things — especially new things like semantics that also have strong technical aspects — is tough, very tough. We are only now beginning on this process; we look to many in the community and elsewhere to help define informative and evocative terminology.

Per the original design objectives above, Structured Dynamics has approached the challenge of the semantic enterprise in what we think is both a pragmatic and a new way. The insistence on preserving and respecting existing information assets, matched with the opportunities and different mindsets arising from an open-world approach [5], have necessitated thinking through new designs and developing new concepts. Any time such new thinking and concepts occurs, new language and new metaphors must accompany it.

While certainly there are components and various software packages that populate and comprise an open semantic framework, the framework is also just as importantly a world view or way to think about information, information development, and its architecture. For example, a pivotal concept is that an open semantic framework is built around generic tools responsive to the information structures fed to them. This realization shifts the locus of emphasis from software development per se to creating, managing and adapting data and information structures. While this democratizes the information development process and is more inclusive of all knowledge workers, it also imposes needs for new toolsets and business processes. We are only at the nascent stages of understanding and learning about these differences.

Similarly, a development approach that is inherently incremental and leverages (rather than replaces or displaces) existing information assets means IT projects need to be considered in a new light. Small projects with more emphasis on tangible and demonstrable benefits will alter budgets, lower risks, and place a need for quicker turnaround. Like the architecture of the open semantic framework itself, projects based on OSF are also more distributed, decentralized and modular.

With such decentralization also comes the need for mechanisms and systems to overcome vendor “lock-in” and proprietary systems. A key thrust in support of what we have called the total open solution and its mixture of documentation and methods to accompany software and structure is specifically targeted at this issue. Tools and means for collaboration and concurrent contributions are another possible answer. Prior software practices in agile development and version control will see extensions to all manner of information development across the enterprise.

We are proud of our design work and proof-testing with clients over the past 18 months. We believe the open semantic framework and its implications to be a fundamental shift in how organizations need to think about their information development, existing information assets, and IT budgets and processes. We know widescale adoption is not yet at hand — enterprises are justifiably conservative when it comes to new thinking. But, given global competition and tight pocketbooks, the open semantic framework is a formulation to which enterprises and governments should pay very close attention.


[1] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.
[2] Structured Dynamics’ best practices approach makes explicit splits between the “ABox” (for instance data) and “TBox” (for ontology schema) in accordance with our working definition for description logics, a fundamental underpinning for how we use RDF:

“Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
[3] A subsequent post will document this rather straightforward XML schema.
[4] Contact Structured Dynamics for a early sneak peek. The Citizen Dan application will be publicly released as an online sandbox and demo by the end of summer 2010.
[5] See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Anothe way to say is it that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[6] Of course, things are always not so simple as this. The CMS layer gives the open semantic framework the ready ability to change themes and layouts (”skins), not to mention the breadth and specifics of what ancillary site functionality might be provided. Moreover, the module basis of the open semantic framework also means that entire clusters of functionality might be dropped from a given instantiation (or added to it!) without violating or negating this framework.
[7] Dictionary references are from Merriam-Webster and Wikitionary.

Posted by AI3's author, Mike Bergman

Posted on June 17, 2010 at 11:56 am in Adaptive Innovation, Ontology Best Practices, Open Source, Semantic Enterprise, Software Development, Structured Dynamics, Web-oriented Architecture | Comments (4)
The URI link reference to this post is: http://www.mkbergman.com/891/domain-specific-instantiations-based-on-the-open-semantic-framework/
The URI to trackback this post is: http://www.mkbergman.com/891/domain-specific-instantiations-based-on-the-open-semantic-framework/trackback/
Date:   May 9, 2010

We’re Speaking on Rich Interfaces and MIKE2.0

SemTech 2010 Conference
As I reported about a year ago after my first attendance, I think the Semantic Technology Conference is the best venue going for pragmatic discussion of semantic approaches in the enterprise. I’m really pleased that I will be attending again this year. The conference (#SemTech) will be held at the Hilton Union Square in downtown San Francisco on June 21-25, 2010. Now in its sixth year and the largest of its kind, it is again projected to attract 1500 attendees or so.

I will be presenting two papers this year, covering rather dramatically different topics. Such is the business of a young company like Structured Dynamics that wears many hats!

Rich User Interfaces for Semantic Technologies

A really exciting presentation for us is, Sizzle for the Steak: Rich, Visual Interfaces for Ontology-driven Apps, on Wed, June 23 in the 2:00 PM – 3:00 PM session.

A nagging gap in the semantic technology stack is acceptable — better still, compelling — user experiences. After our exile for a couple of years doing essential infrastructure work, we have been unshackled over the past year or so to innovate on user interfaces for semantic technologies.

Our unique approach uses adaptive ontologies to drive rich Internet applications (RIAs) through what we call “semantic components.” This framework is unbelievably flexible and powerful and can seamlessly interact with our structWSF Web services framework and its conStruct Drupal implementations.

We will be showing these rich user interfaces for the first time in this session. We will show concept explorers, “slicer-and-dicers”, dashboards, information extraction and annotation, mapping, data visualization and ontology management. Get your visualization anyway you’d like, and for any slice you’d like!

While we will focus on the sizzle and demos, we will also explain a bit of the technology that is working behind Oz’s curtain.

Methods for Becoming a Semantic Enterprise

A more informal, interactive F2F discussion will be, MIKE2.0 for the Semantic Enterprise, on Thurs, June 24 in the 4:45 PM – 5:45 PM slot.

MIKE2.0 (Method for an Integrated Knowledge Environment) is an open source methodology for enterprise information management that is coupled with a collaborative framework for information development. It is oriented around a variety of solution “offerings”, ranging from the comprehensive and the composite to specific practices and technologies. A couple of months back, I gave an overview of MIKE2.0 that was pretty popular.

We have been instrumental in adding a semantic enterprise component to MIKE2.0, with our specific version of it called Open SEAS. In this Face-to-Face session, experts and desirous practitioners will join together to discuss how to effectively leverage this framework. While I will intro and facilitate, expect many other MIKE2.0 aficionados to participate.

This is perhaps a new concept to many, but what is exciting about MIKE2.0 is that it provides a methodology and documentation complement to technology alone. When combined with that technology, all pieces comprise what might be called a total open solution. I personally think it is the next logical step beyond open source.

Hope to See You There!

So, if you have not already made plans, consider adjusting your schedule today. And, contact me in advance (mike at structureddynamics dot com) if you’ll be there. We’d love to chat!

Posted by AI3's author, Mike Bergman

Posted on May 9, 2010 at 11:06 pm in MIKE2.0, Open SEAS, Semantic Enterprise, Semantic Web Tools, Structured Dynamics | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/881/two-presentations-at-semtech-2010/
The URI to trackback this post is: http://www.mkbergman.com/881/two-presentations-at-semtech-2010/trackback/
Date:   April 2, 2010

Friday  Brown Bag Lunch

Semantic mediation — that is, resolving semantic heterogeneities — must address more than 40 discrete categories of potential mismatches from units of measure, terminology, language, and many others. These sources may derive from structure, domain, data or language.

Earlier postings in this recent series traced the progress in climbing the data federation pyramid to today’s current emphasis on the semantic Web. Partially this series is aimed at disabusing the notion that data extensibility can arise simply by using the XML (eXtensible Markup Language) data representation protocol. As Stonebraker and Hellerstein correctly observe:

XML is sometimes marketed as the solution to the semantic heterogeneity problem . . . . Nothing could be further from the truth. Just because two people tag a data element as a salary does not mean that the two data elements are comparable. One could be salary after taxes in French francs including a lunch allowance, while the other could be salary before taxes in US dollars. Furthermore, if you call them “rubber gloves” and I call them “latex hand protectors”, then XML will be useless in deciding that they are the same concept. Hence, the role of XML will be limited to providing the vocabulary in which common schemas can be constructed.[1]

This series also covers the ontologies and the OWL language (written in XML) that now give us the means to understand and process these different domains and “world views” by machine. According to Natalya Noy, one of the principal researchers behind the Protégé development environment for ontologies and knowledge-based systems:

How are ontologies and the Semantic Web different from other forms of structured and semi-structured data, from database schemas to XML? Perhaps one of the main differences lies in their explicit formalization. If we make more of our assumptions explicit and able to be processed by machines, automatically or semi-automatically integrating the data will be easier. Here is another way to look at this: ontology languages have formal semantics, which makes building software agents that process them much easier, in the sense that their behavior is much more predictable (assuming they follow the specified explicit semantics–but at least there is something to follow). [2]

Again, however, simply because OWL (or similar) languages now give us the means to represent an ontology, we still have the vexing challenge of how to resolve the differences between different “world views,” even within the same domain. According to Alon Halevy:

When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies–or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas. Without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel. [3]

In the sections below, we describe the sources for how this heterogeneity arises and classify the many different types of heterogeneity. I then describe some broad approaches to overcoming these heterogeneities, though a subsequent post looks at that topic in more detail.

Causes and Sources of Semantic Heterogeneity

There are many potential circumstances where semantic heterogeneity may arise (partially from Halevy [3]):

  • Enterprise information integration
  • Querying and indexing the deep Web (which is a classic data federation problem in that there are literally tens to hundreds of thousands of separate Web databases) [4]
  • Merchant catalog mapping
  • Schema v. data heterogeneity
  • Schema heterogeneity and semi-structured data.

Naturally, there will always be differences in how differing authors or sponsors create their own particular “world view,” which, if transmitted in XML or expressed through an ontology language such as OWL may also result in differences based on expression or syntax. Indeed, the ease of conveying these schema as semi-structured XML, RDF or OWL is in and of itself a source of potential expression heterogeneities. There are also other sources in simple schema use and versioning that can create mismatches [3]. Thus, possible drivers in semantic mismatches can occur from world view, perspective, syntax, structure and versioning and timing:

  • One schema may express a similar “world view” with different syntax, grammar or structure
  • One schema may be a new version of the other
  • Two or more schemas may be evolutions of the same original schema
  • There may be many sources modeling the same aspects of the underlying domain (”horizontal resolution” such as for competing trade associations or standards bodies), or
  • There may be many sources that cover different domains but overlap at the seams (”vertical resolution” such as between pharmaceuticals and basic medicine).

Regardless, the needs for semantic mediation are manifest, as are the ways in which semantic heterogeneities may arise.

Classification of Semantic Heterogeneities

The first known classification scheme applied to data semantics that I am aware of is from William Kent nearly 20 years ago.[5] (If you know of earlier ones, please send me a note.) Kent’s approach dealt more with structural mapping issues (see below) than differences in meaning, which he pointed to data dictionaries as potentially solving.

The most comprehensive schema I have yet encountered is from Pluempitiwiriyawej and Hammer, “A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources.” [6] They classify heterogeneities into three broad classes:

  • Structural conflicts arise when the schema of the sources representing related or overlapping data exhibit discrepancies. Structural conflicts can be detected when comparing the underlying DTDs. The class of structural conflicts includes generalization conflicts, aggregation conflicts, internal path discrepancy, missing items, element ordering, constraint and type mismatch, and naming conflicts between the element types and attribute names.
  • Domain conflicts arise when the semantic of the data sources that will be integrated exhibit discrepancies. Domain conflicts can be detected by looking at the information contained in the DTDs and using knowledge about the underlying data domains. The class of domain conflicts includes schematic discrepancy, scale or unit, precision, and data representation conflicts.
  • Data conflicts refer to discrepancies among similar or related data values across multiple sources. Data conflicts can only be detected by comparing the underlying DOCs. The class of data conflicts includes ID-value, missing data, incorrect spelling, and naming conflicts between the element contents and the attribute values.

Moreover, mismatches or conflicts can occur between set elements (a “population” mismatch) or attributes (a “description” mismatch).

The table below builds on Pluempitiwiriyawej and Hammer’s schema by adding the fourth major explicit category of language, leading to about 40 distinct potential sources of semantic heterogeneities:

Class

Category

Subcategory

STRUCTURAL Naming Case Sensitivity
Synonyms
Acronyms
Homonyms
Generalization / Specialization
Aggregation Intra-aggregation
Inter-aggregation
Internal Path Discrepancy
Missing Item Content Discrepancy
Attribute List Discrepancy
Missing Attribute
Missing Content
Element Ordering
Constraint Mismatch
Type Mismatch
DOMAIN Schematic Discrepancy Element-value to Element-label Mapping
Attribute-value to Element-label Mapping
Element-value to Attribute-label Mapping
Attribute-value to Attribute-label Mapping
Scale or Units
Precision
Data Representation Primitive Data Type
Data Format
DATA Naming Case Sensitivity
Synonyms
Acronyms
Homonyms
ID Mismatch or Missing ID
Missing Data
Incorrect Spelling
LANGUAGE Encoding Ingest Encoding Mismatch
Ingest Encoding Lacking
Query Encoding Mismatch
Query Encoding Lacking
Languages Script Mismatches
Parsing / Morphological Analysis Errors (many)
Syntactical Errors (many)
Semantic Errors (many)

Most of these line items are self-explanatory, but a few may not be:

  • Homonyms refer to the same name referring to more than one concept, such as Name referring to a person v. Name referring to a book
  • A generalization/specialization mismatch can occur when single items in one schema are related to multiple items in another schema, or vice versa. For example, one schema may refer to “phone” but the other schema has multiple elements such as “home phone,” “work phone” and “cell phone”
  • Intra-aggregation mismatches come when the same population is divided differently (Census v. Federal regions for states, or full person names v. first-middle-last, for examples) by schema, whereas inter-aggregation mismatches can come from sums or counts as added values
  • Internal path discrepancies can arise from different source-target retrieval paths in two different schema (for example, hierarchical structures where the elements are different levels of remove)
  • The four sub-types of schematic discrepancy refer to where attribute and element names may be interchanged between schema
  • Under languages, encoding mismatches can occur when either the import or export of data to XML assumes the wrong encoding type. While XML is based on Unicode, it is important that source retrievals and issued queries be in the proper encoding of the source. For Web retrievals this is very important, because only about 4% of all documents are in Unicode, and earlier BrightPlanet provided estimates there may be on the order of 25,000 language-encoding pairs presently on the Internet
  • Even should the correct encoding be detected, there are significant differences in different language sources in parsing (white space, for example), syntax and semantics that can also lead to many error types.

It should be noted that a different take on classifying semantics and integration approaches is taken by Sheth et al. [7] Under their concept, they split semantics into three forms: implicit, formal and powerful. Implicit semantics are what is either largely present or can easily be extracted; formal languages, though relatively scarce, occur in the form of ontologies or other descriptive logics; and powerful (soft) semantics are fuzzy and not limited to rigid set-based assignments. Sheth et al.’s main point is that first-order logic (FOL) or descriptive logic is inadequate alone to properly capture the needed semantics.

From my viewpoint, Pluempitiwiriyawej and Hammer’s [6] classification better lends itself to pragmatic tools and approaches, though the Sheth et al. approach also helps indicate what can be processed in situ from input data v. inferred or probabalistic matches.

Importance of Reference Standards

An attractive and compelling vision  — perhaps even a likely one  — is that standard reference ontologies become increasingly prevalent as time moves on and semantic mediation is seen as more of a mainstream problem. Certainly, a start on this has been seen with the use of the Dublin Core metadata initiative, and increasingly other associations, organizations, and major buyers are busy developing “standardized” or reference ontologies.[8] Indeed, there are now more than 10,000 ontologies available on the Web.[9] Insofar as these gain acceptance, semantic mediation can become an effort mostly at the periphery and not the core.

But, such is not the case today. Standards only have limited success and in targeted domains where incentives are strong. That acceptance and benefit threshold has yet to be reached on the Web. Until such time, a multiplicity of automated methods, semi-automated methods and gazetteers will all be required to help resolve these potential heterogeneities.

Friday  Brown Bag Lunch This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on June 6, 2006. No changes have been made to the original posting. Current approaches to dealing with these heterogeneities would be to use “bridging” ontologies that map the mismatches.

[1] Michael Stonebraker and Joey Hellerstein, “What Goes Around Comes Around,” in Joseph M. Hellerstein and Michael Stonebraker, editors, Readings in Database Systems, Fourth Edition, pp. 2-41, The MIT Press, Cambridge, MA, 2005. See http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf.
[2] Natalya Noy, “Order from Chaos,” ACM Queue vol. 3, no. 8, October 2005 See http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=341&page=1
[3] Alon Halevy, “Why Your Data Won’t Mix,” ACM Queue vol. 3, no. 8, October 2005. See http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=336.
[4] Michael K. Bergman, “The Deep Web: Surfacing Hidden Value,” BrightPlanet Corporation White Paper, June 2000. The most recent version of the study was published by the University of Michigan’s Journal of Electronic Publishing in July 2001. See http://www.press.umich.edu/jep/07-01/bergman.html.
[5] William Kent, “The Many Forms of a Single Fact”, Proceedings of the IEEE COMPCON, Feb. 27-Mar. 3, 1989, San Francisco. Also HPL-SAL-88-8, Hewlett-Packard Laboratories, Oct. 21, 1988. [13 pp]. See http://www.bkent.net/Doc/manyform.htm.
[6] Charnyote Pluempitiwiriyawej and Joachim Hammer, “A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources,” Technical Report TR00-004, University of Florida, Gainesville, FL, 36 pp., September 2000. See ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf.
[7] Amit Sheth, Cartic Ramakrishnan and Christopher Thomas, “Semantics for the Semantic Web: The Implicit, the Formal and the Powerful,” in Int’l Journal on Semantic Web & Information Systems, 1(1), 1-18, Jan-March 2005. See http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html
[8] See, among scores of possible examples, the NIEM (National Information Exchange Model) agreed to between the US Departments of Justice and Homeland Security; see http://www.niem.gov/.
[9] OWL Ontologies: When Machine Readable is Not Good Enough

Posted by AI3's author, Mike Bergman

Posted on April 2, 2010 at 10:22 am in Adaptive Information, Brown Bag Lunch, Ontologies, Semantic Enterprise, Semantic Web | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/
The URI to trackback this post is: http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/trackback/
Date:   March 23, 2010

Optimus Prime transformer

Open Source, Open World, Web, and Semantics to Transform the Enterprise

Ten years ago the message was the end of obscene rents from proprietary enterprise software licenses. Five years ago the message was the arrival and fast maturing of open source. Today, the message is the open world and semantics.

These forces are conspiring to change much within enterprise IT. And, this change will undoubtedly be for the good — for the enterprise. But these forces are not necessarily good news within conventional IT departments and definitely not for traditional vendors unwilling to transform their business models.

I have been beating the tom-tom on this topic for a few months, specifically in regards to the semantic enterprise. But I have by no means been alone nor unique. The last two weeks have seen an interesting confluence of reports and commentaries by others that richen the story of the changing information technology landscape. I’ll be drawing on the observations of Thomas Wailgum (CIO magazine) [1], John Blossom [2] and Andy Mulholland, CTO of Capgemini [3].

The New Normal

“After nearly five decades of gate-keeping prominence, corporate IT is in trouble and at a crossroads like never before in its mercurial and storied history as a corporate function. You may be too big to fail, but you’re not too big to succeed. What will you do?”

– Thomas Wailgum [1]

Wailgum describes the “New Normal” and how it might kill IT [1]. He picks up on the viewpoint that ties the recent meltdowns in the financial sector as a seismic force for changes in information technology. While he acknowledges many past challenges to IT from PCs and servers and Y2K and software becoming a commodity, he puts the global recession’s impact on business — the “New Normal”– into an entirely different category.

His basic thesis is that these financial shocks are forcing companies to scrutinize IT as never before, in particular “unfavorable licensing agreements and much-too-much shelfware; ill-conceived purchasing and integration strategies; and questionable software married to entrenched business processes.”

Yet, he also argues that IT and its systems are too ingrained into the core business processes of the enterprise to be allowed to fail. IT systems are now thoroughly intertwined with:

  • ERP systems – the financial, administrative and procurement backbone of every organization
  • Business development and BI
  • Operations and forecasting
  • Customer service and call centers
  • Networking and security
  • Sales and marketing via CRM and lead generation
  • Supply chain applications in manufacturing and shipping.

But top management is disappointed and disaffected. IT systems gobble up too many limited resources. They are inflexible. They are old and require still more limited resources to modernize. They are complex. They create and impose delays. And all of these negatives lead to huge losses in opportunity costs. Wailgum notes Gartner, for instance, as saying that by 2012 perhaps 20 percent of businesses will own no IT assets at all in their desire to outsource this headache.

“Enterprise systems are doing it wrong. And not just a little bit, either. Orders of magnitude wrong. Billions and billions of dollars worth of wrong. Hang-our-heads-in-shame wrong. It’s time to stop the madness.”

– Tim Bray, as quoted in [1]

I think this devastating diagnosis is largely correct, though perhaps incomplete in that no mention is made of the flipside: what IT has failed to deliver. I think this flipside is equally damning.

Despite decades of trying, IT still has not broken down the data stovepipes in the enterprise. Rather, they have proliferated like rabbits. And, IT has failed to unlock the data in the 80% of enterprise information contained within documents (unstructured data).

Unfortunately, after largely zeroing in and mostly diagnosing the situation, Wailgum’s remedy comes off sounding like a tired 12-step program. He argues for new mindsets, better communications, getting in touch with customers, being willing to take risks, and being nimble. Well, duh.

So, over the decades of IT failures there has been accompanying decades of criticism, hand-wringing, and hackneyed solutions. Without some more insightful thinking, this analysis can make our understanding of the New Normal look pretty old.

Not Necessarily Good News for Vendors

John Blossom [2] picks up on these arguments and looks at the issues from the vendor’s perspective. Blossom characterizes Wailgum’s piece as “outlining the enormous value gap that’s been arising in enterprise information technologies.” And, while clearly new approaches are needed and farming them out may become more prevalent, Blossom cautions this is not necessarily good news for vendors.

“. . . the trend towards agnosticism in finding solutions to information problems is only going to get stronger. Whatever platform, tool or information service can solve the job today will get used, as long as it’s affordable and helps major organizations adapt to their needs.”

“. . . many solutions oriented at first towards small to medium enterprises are likely to scale up cost-effectively as platforms from which more targeted information services can be launched to meet the needs of larger enterprises. If agility favors smaller companies that lack the legacy of failed IT investments that larger organizations must still bear, then there will be increasing pressure on large organizations to adopt similar methods.”
“. . . if you thought that your business could be segregated from the Web as a whole, increasingly you’ll be dead wrong.”
– John Blossom [2]

As Blossom puts it, “what seems to be happening is that many of the business processes through which these enterprises survived and thrived over the past several decades are shooting blanks. . . . many of the fundamental concepts of IT that have been promoted for the past few decades no longer give businesses operational advantages but they have to keep spending on them anyway.”

As he has been arguing for quite some time, one fundamental change agent has been the Web itself. “The Web has accelerated the flow of information and services that can lead to effective decision-making far more rapidly than enterprise IT managers have been able to accommodate.”

Web search engines and social media tools can begin to replace some of the dedicated expenditures and systems within the enterprise. Moreover, the extent, growth and value of external data and content is readily apparent. Without outreach and accommodation of external data — even if it can solve its own internal data federation challenges — the individual enterprise is at risk of itself becoming a stovepipe.

Prior focuses on strategy and capturing workflows are perhaps being supplanted by the need for operational flexibility and on-the-fly aggregation and rapid service development tools. In an increasingly interconnected and rapidly changing world with massive information growth, being able to control workflows and to depend on central IT platforms may become last decade’s “Old Normal.” Floating on top of these massive forces and riding with their tides is a better survival tactic than digging fixed emplacements in the face of the tsunami.

These factors of Web, open source, agnosticism as to platform or software applications, and the need to mash up innovations from anywhere are not the traditional vendor game. Just as businesses and their IT departments must get leaner, so must the expectation of vendors to extort exorbitant rents from their clients. “Fasten your seatbelts, it’s going to be a bumpy night!” [4]

So, Blossom agrees with the Wailgum diagnosis, but also helps us begin to understand parts of the cure. Blossom argues the importance of:

  • Web approaches and architectures
  • Incorporation of external data
  • Leverage of Web applications, and
  • Use of open standards and APIs to avoid vendor lock-in.

Much, if not all of this, can be provided by open source. But open source is not a sine qua non: commercial products that embrace these approaches can also be compatible components across the stack.

A Semantic Lever on An Open World Fulcrum

But — even with these components — a full cure still lacks a couple of crucial factors.

These remaining gaps are emphasized in Andy Mulholland’s recent blog post [3]. His post was occasioned by the press announcement that Structured Dynamics (my firm) had donated its Semantic Enterprise Adoption and Solutions, or SEAS, methodology to MIKE2.0 [5]. Mulholland was suggesting his audience needed to know about this Method for an Integrated Knowledge Environment because some of the major audit partnerships have decided to get behind MIKE2.0 with its explicit and open source purpose of managing knowledge environments and their data and provenance.

“In ‘closed’ — or some might say normal — IT environments where all data sources can be carefully controlled, all statements are taken to be false unless explicitly known to be true. However most ‘new’ data is from the ‘open’ environment of the web and in semantic data. If this is not specifically flagged as true it is categorised as ‘unknown’ rather then false. This single characteristic to me is in many ways the most crucial issue to understand as we go forward into using mixed data sets to support complex ‘business intelligence’ or ‘decision support’ around externally driven events, and situations.”

– Andy Mulholland, CTO, Capgemini [3]

As Mulholland notes, “. . . it’s not just more data, it’s the forms of data, and what the data is used for, all of which add to the complications. . . . Sadly the proliferation of data has mostly been in unstructured data in formats suitable for direct human use.”

So, one remaining factor is thus how to extract meaning from unstructured (text) content. It is here that semantics and various natural language processing (NLP) components come in. Implied in the incorporation of data extracted from unstructured sources is a data model expressly designed for such integration.

Yet, without a fulcrum, the semantic lever can still not move the world. Mulholland insightfully nails this fundamental missing piece — the “most crucial issue” — as the use of the open world assumption.

From an enterprise perspective and in relation to the points of this article, an open world assumption is not merely a different way to look at the world. More fundamentally, it is a different way to do business and a very different way to do IT.

I have summarized these points before, but they deserve reiteration. Open world frameworks provide some incredibly important benefits for knowledge management applications in the enterprise:

  • Domains can be analyzed and inspected incrementally
  • Schema can be incomplete and developed and refined incrementally
  • The data and the structures within these open world frameworks can be used and expressed in a piecemeal or incomplete manner
  • We can readily combine data with partial characterizations with other data having complete characterizations
  • Systems built with open world frameworks are flexible and robust; as new information or structure is gained, it can be incorporated without negating the information already resident, and
  • Open world systems can readily bridge or embrace closed world subsystems.

Archimedes is attributed to the apocryphal quote, “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.” [6] I have also had lawyer friends tell me that the essence of many court cases is found in a single pivotal assertion or statement in the arguments. I think it fair to say that the open world approach plays such a central role in unlocking the adaptive way for IT to move forward.

Bringing the Factors Together via Open SEAS

Open SEAS
As Mulholland notes, we have donated our Open SEAS methodology [7] to MIKE2.0 in the hopes of seeing greater adoption and collaboration. This is useful, and all are welcome to review, comment and contribute to the methodology, indeed as is the case for all aspects of MIKE2.0.

But the essential point of this article is that Open SEAS also embraces most — if not all — of the factors necessary to address the New Normal IT function.

Pillars of the Open Semantic Enterprise

Open SEAS is explicitly designed to facilitate becoming an open semantic enterprise. Namely, this means an organization that uses the languages and standards of the semantic Web, including RDF, RDFS, OWL, SPARQL and others to integrate existing information assets, using the best practices of linked data and the open world assumption, and targeting knowledge management applications. It does so based on Web-oriented architectures and approaches and uses ontologies as an “integration layer” across existing assets.

The foundational approaches to the open semantic enterprise do not necessarily mean open data nor open source (though they are suitable for these purposes with many open source tools available). The techniques can equivalently be applied to internal, closed, proprietary data and structures. The techniques can themselves be used as a basis for bringing external information into the enterprise. ‘Open’ is in reference to the critical use of the open world assumption.

These practices do not require replacing current systems and assets; they can be applied equally to public or proprietary information; and they can be tested and deployed incrementally at low risk and cost. The very foundations of the practice encourage a learn-as-you-go approach and active and agile adaptation. While embracing the open semantic enterprise can lead to quite disruptive benefits and changes, it can be accomplished as such with minimal disruption in itself. This is its most compelling aspect.

We believe this offers IT an exciting, incremental and low-risk path for moving forward. All existing assets can be left in place and — in essence — modernized in place. No massive shifts and no massive commitments are required. As benefits and budgets allow, the extent of the semantic interoperability layer may be extended as needed and as affordable.

The open semantic enterprise is not magic nor some panacea. Simply consider it as bringing rationality to what has become a broken IT system. Embracing the open semantic enterprise can help the New Normal be a good and more adaptive normal.


[1] Thomas Wailgum, 2010. “Why the New Normal Could Kill IT,” March 12, 2010 online story in CIO Magazine; see http://www.cio.com/article/575563/Why_the_New_Normal_Could_Kill_IT.
[2] John Blossom, 2010. “Enterprise Publishing and the “New Normal” in I.T. – Are You Missing the Trend?,” March 15, 2010 blog post on ContentBlogger; see http://www.shore.com/commentary/weblogs/2010/03/enterprise-publishing-and-new-normal-in.html.
[3] Andy Mulholland, 2010. “Meet MIKE – Methodology for Managing Data and its Use,” March 12, 2010 ePractice.eu Blog post; see http://www.epractice.eu/en/blog/309352. Also, see Mulholland’s Capgemini CTO Blog.
[4] Bette Davis (as Margo Channing) uttered this famous line in All About Eve (1950).
[5] MIKE2.0 is a Method for an Integrated Knowledge Environment is an open source methodology for enterprise information management that provides a framework for information development. The MIKE2.0 Methodology is part of the overall Open Methodology Framework and is a collaborative effort to help organisations who have invested heavily in applications and infrastructures, but haven’t focused on the data and information needs of the business.
[6] As quoted in The Lever, “”Archimedes, however, in writing to King Hiero, whose friend and near relation he was, had stated that given the force, any given weight might be moved, and even boasted, we are told, relying on the strength of demonstration, that if there were another earth, by going into it he could remove this.” from Plutarch (c. 45-120 AD) in the Life of Marcellus, as translated by John Dryden (1631-1700).
[7] The Open SEAS framework is part of the MIKE2.0 Semantic Enterprise Solution capability. It adds some 40 new resources to this area, importantly including reasoning for the validity of statements in ‘open’ situations.

Posted by AI3's author, Mike Bergman

Posted on March 23, 2010 at 1:36 pm in MIKE2.0, Open SEAS, Open Source, Semantic Enterprise, Structured Dynamics, Web-oriented Architecture | Comments (4)
The URI link reference to this post is: http://www.mkbergman.com/873/changing-it-for-good/
The URI to trackback this post is: http://www.mkbergman.com/873/changing-it-for-good/trackback/
Date:   March 1, 2010

New Release Builds on the MIKE2.0 Methodology and Deliverables

Open SEAS

Today, Structured Dynamics is pleased to release Open SEAS, its methodology for Semantic Enterprise Adoption and Solutions. At the same time, we are donating the framework to the open source MIKE2.0 Method for an Integrated Knowledge Environment project.

Open SEAS provides a framework for the enterprise to establish a coherent, consistent and interoperable layer across its information assets. It is compliant with the MIKE2.0 Semantic Enterprise Solution Offering.

Open SEAS has been developed for enterprises desiring to initiate or extend their involvement with semantic technologies. It is inherently incremental, low-cost and low-risk.

Donation and Relation to MIKE2.0

Concurrent with this release, Structured Dynamics is also donating the methodology and all of its related intellectual assets to the MIKE2.0 project. Under Creative Commons license and MIKE2.0’s content governance policies, the community’s current 2000+ members are now free to expand and use the Open SEAS methodology in any manner they see fit.
MIKE2.0 Logo

Last week, I began to introduce MIKE2.0 and its methodology to the readers of this blog. MIKE2.0 provides a complete delivery environment and methodology for information management projects in the enterprise. Solutions — from the specific to the composite — are described and packaged with respect to plans, management communications, products (open source and proprietary), activities, benchmarks, and deliverables. Delivery is accomplished over multiple increments, split into five phases from definition and planning to deployment. The assets associated with this framework first are based on templates and guidelines that can be applied to any information management area. The framework allows for multiple projects to be combined and inter-related, all under a common methodology. More information and a good entry point is provided on the What is MIKE2.0? page on the project’s main Web site.

MIKE2.0 presently has some 800 resources across about 40 solution areas. With Structured Dynamics’ donation, there are now about 40 resources related to the semantic enterprise, many of them major, accompanied by many images and figures. This contribution makes the Semantic Enterprise Solution Offering instantly one of the more complete within MIKE2.0. As noted below, this contribution is also just a beginning of our commitment.

Basic Overview of Open SEAS

The Open SEAS framework is Structured Dynamics’ specific implementation framework for MIKE2.0’s Semantic Enterprise Solution Offering. This section overviews some of Open SEAS‘ key facets.

A Grounding in the Open World Approach

Many enterprise information systems, particularly relational ones, embody a closed world assumption that holds that any statement that is not known to be true is false. This premise works well where there is complete coverage of specific items, such as the enumeration of all customers or all products.

Yet, in most areas of the real (”open”) world there is no guarantee or likelihood of complete coverage. Under an open world assumption the lack of a given assertion or fact does not imply whether that possible assertion is true or false: it simply is not known. An open world assumption is one of the key factors that defines the open Semantic Enterprise Offering and enables it to be deployed incrementally. It is also the basis for enabling linkage to external (often incomplete) datasets.
Pillars of the Open Semantic Enterprise

Fortunately, there is no requirement for enterprises to make some philosophical commitment to either closed- or open-world systems or reasoning. It is perfectly acceptable to combine traditional closed-world relational systems with open-world reasoning. It is also not necessary to make any choices or trade-offs about using public v. private data or combinations thereof. All combinations are acceptable when the basis for integration is an open-world one.

Open SEAS is grounded in this “open” style. It can be employed in virtually any enterprise circumstance and at any scope, and expanded in a similar way as budget and needs allow.

Other Basic Pillars to the Framework

Open SEAS is based on seven pillars, which themselves inform the basis for the MIKE2.0 Guiding Principles for the Open Semantic Enterprise. These principles cover data model, architecture, deployment practices and approach for how an enterprise can begin and then extend its use of semantics for information interoperability.

Important aspects are linked data or Web-oriented architecture, but it is really the unique combination of open-world approach and the RDF data model and its semantic power that provide the distinctive differences for Open SEAS. An exciting prospect — but still in its early stages of discovery and implementation — is the role of adaptive ontologies to power ontology-driven applications. These prospects, if fully realized, could totally remake how knowledge workers interact and specify the applications that manage their information environment.

Embracing the Layered Semantic Enterprise Architecture

Open SEAS also fully embraces the Layered Semantic Enterprise Architecture of MIKE2.0’s Semantic Enterprise Offering. This architecture acts as a subsequent set of functions or middleware with respect to the MIKE2.0’s standard SAFE Architecture. Most of the existing SAFE architecture resides in the Existing Assets layer. The specific aspects of Open SEAS resides in the layers above, namely Access/Conversion, Ontologies and the Applications Layers.

Using (Mostly) Open Source to Fill Gaps in the Technology Stack

Stitching together this interoperability layer above existing information and infrastructure assets requires many diverse tools and products, and there still are gaps. The layer figure below shows the semantic enterprise architecture overlaid with some representative open source projects and tools that plug some of those gaps.

Open SEAS also maintains a comprehensive roster of open source and proprietary tools in all aspects of semantic technology, ranging from data storage and converters, to Web services and middleware, and then to ultimate user applications. A database of nearly 1,000 tools in all areas is maintained for potential applicability to the methodology.

Open SEAS Architecture w/ Examples
Click to expand

Quick, Adaptive, Agile Increments

The inherently incremental nature of the Open SEAS framework encourages experimentation, affordable deployments, and experience gathering. Because the systems and deployments put into place with this framework are based on the open world approach and use the extensible RDF data model, expansions in scope, sophistication or domain can be incorporated at any time without adverse effects on existing assets or systems or prior Open SEAS deployments.

Quick and (virtually) risk-free increments means that adopting semantic approaches in the enterprise can be accelerated (or not) based on empirical benefits and available budgets.

An Emphasis on Learning

The Open SEAS framework is built on a solid foundation, but it also one that is incomplete. Deployments of semantic technologies and approaches are still quite early in the enterprise, whether measured in numbers, scope or depth. In order for the framework — and the practice of semantic adoption in general — to continue to expand and be relevant in the enterprise, active learning and documentation is essential. One of the reasons for the affiliation of Open SEAS with MIKE2.0 is to leverage these strong roots in methodological learning.

Where Do We Go From Here?

The nature of Open SEAS and its parent Semantic Enterprise Solution Offering touches most offerings within the MIKE2.0 framework. There is much to be done to integrate the semantic enterprise perspective into these other possibilities, plus much that needs to be learned and documented for the offering itself. The concept of the semantic enterprise, after all, is relatively new with few prominent case studies.

As the offering points out, there are some dozens of addition necessary resources that are available and ready to be packaged and moved into the MIKE2.0 framework. These efforts are a priority, and will continue over the coming weeks.

But, more importantly, beyond that, the experience and practitioner base needs to grow. Much is unknown regarding key aspects of the offering:

  • What are the priority application areas which promise the greatest return on investment?
  • What are best practices for adoption and technologies across the entire semantic enterprise stack?
  • Many tools and techniques are still legacies and outgrowths of the research and academic communities. How can these be adopted and modified to meet enterprise standards and expectations?
  • What are the “best” ontology and vocabulary building blocks upon which to model and help frame the enterprise’s interoperability needs?
  • What are the most cost-effective strategies for leveraging existing information and infrastructure assets, while transitioning away from them where appropriate?

Despite these questions, emergence is the way complex systems arise out of a multiple of relatively simple interactions, exhibiting new and unforeseen properties in the process. RDF is an emergent model. It begins as simple “fact” statements of triples, that may then be combined and expanded into ever-more complex structures and stories. As an internal, canonical data model, RDF has advantages for information federation and development over any other approach. It can represent, describe, combine, extend and adapt data and their organizational schema flexibly and at will. Applications built upon RDF can explore and analyze in ways not easily available with other models.

Combined with an open-world approach, new information can be brought in and incorporated to the framework step-by-step. Perhaps the greatest promise in an ongoing transition to become a semantic enterprise is how an inherently incremental and building-block approach might alter prior practices and risks across the entire information management spectrum.

We invite you to join us and to contribute to this effort. I encourage you to join MIKE2.0 if you have not already done so, and check out announcements on this blog for ongoing developments.

Posted by AI3's author, Mike Bergman

Posted on March 1, 2010 at 12:19 pm in Adaptive Innovation, MIKE2.0, Ontology Best Practices, Open SEAS, Open Source, Semantic Enterprise, Structured Dynamics, Web-oriented Architecture | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/868/open-seas-a-framework-to-transition-to-a-semantic-enterprise/
The URI to trackback this post is: http://www.mkbergman.com/868/open-seas-a-framework-to-transition-to-a-semantic-enterprise/trackback/
Copyright © 2004–2010 Michael K. Bergman.   This work is licensed under a Creative Commons License