Posted: February 14, 2006

How often do you see vendor literature or system or application descriptions that claim extensibility simply because of a heavy reliance on XML? I find it amazing how common the claim is and how prevalent are the logical fallacies surrounding this notion.

Don’t get me wrong. As a data exchange format, eXtensible Markup Language (XML) does provide data representation extensibility. That contribution is great, and XML’s widespread adoption is a major factor in its own right helping to bring down the Tower of Babel. But the simple use of XML is insufficient on its own to provide extensibility.
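To see what data representation extensibility buys in practice, here is a minimal sketch (element names and values are hypothetical): a consumer written against the original document shape keeps working when a later producer adds elements it never anticipated.

```python
import xml.etree.ElementTree as ET

# The document shape the consumer was originally written against
v1 = "<contact><name>Ada</name></contact>"

# An extended document: a newer producer adds an element the
# old consumer has never seen
v2 = "<contact><name>Ada</name><email>ada@example.org</email></contact>"

def read_name(doc):
    # The consumer only looks for <name>; unknown sibling
    # elements are simply ignored
    return ET.fromstring(doc).findtext("name")

print(read_name(v1))  # Ada
print(read_name(v2))  # Ada -- the extension did not break the consumer
```

This tolerance of unknown content is exactly what XML delivers; as the list below argues, it is necessary but far from sufficient.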

Fully extensible systems need to have at least these capabilities:

  • Extensible data representation so that any data type and form can be transmitted between two disparate systems. XML and its more structured cousins such as RDF and OWL perform this role. Note, however, that standard data exchange formats have been an active topic of research and adoption for at least 20 years, with other notable formats such as ASN.1, CDF and EDI also performing this task, now largely overtaken by XML
  • Extensible semantics, since bringing more than one source of data into an extended environment likely introduces new semantics and heterogeneities. These mismatches fall into the classic challenge areas of data federation. The key point, however, is that simply being able to ingest extended data accomplishes nothing if the meaning of that data is not also captured. Semantic extensibility requires more structured data representations (RDF-S or OWL, for example), reference vocabularies and ontologies, and utilities and means to map the meanings between different schemas
  • Extensible data management. Though native XML databases and other extensions to conventional data systems have been attempted, truly extensible data management systems have not yet been developed that: 1) perform at scale; 2) can be extended without re-architecting the schema; 3) can be extended without re-processing the original source data; and 4) perform efficiently. Until extensible infrastructure with these capabilities is available, extensibility will not become viable at the enterprise level and will remain an academic or startup curiosity, and
  • Extensible capabilities through extendable and interoperable applications or tools. Though we are now moving up the stack into the application layer, real extensibility comes from true interoperability. Service-oriented architectures (SOAs) and other approaches allow the registry and message brokering amongst extended apps and services. But centralized vs. decentralized systems, inclusion or not of business process interoperability, and even the accommodation of the other extensible imperatives above make this last layer potentially fiendishly difficult.
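The semantics point above is the easiest to under-appreciate, so here is a toy sketch of it in plain Python rather than RDF (all field names, values and the mapping table are hypothetical). Ingesting the second source is trivial; the data only become usable once a mapping layer records that its fields carry the same meaning, and units, as the first source's.

```python
# Two sources describe people with different field names and units
source_a = {"surname": "Curie", "height_cm": 160}
source_b = {"family_name": "Curie", "height_in": 63}

# A mapping table captures that the fields mean the same thing,
# including a unit conversion -- the "meaning" layer that mere
# ingestion of extended data lacks
mappings = {
    "family_name": ("surname", lambda v: v),
    "height_in":   ("height_cm", lambda v: round(v * 2.54)),
}

def to_canonical(record):
    # Rewrite a record into the canonical schema; unmapped
    # fields pass through unchanged
    out = {}
    for key, value in record.items():
        target, convert = mappings.get(key, (key, lambda v: v))
        out[target] = convert(value)
    return out

print(to_canonical(source_b))  # {'surname': 'Curie', 'height_cm': 160}
```

Real systems express such mappings declaratively (e.g., with OWL equivalence axioms) rather than in code, but the burden is the same: someone has to assert and maintain the correspondences.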

These challenges are especially daunting in a completely decentralized, chaotic, distributed environment such as the broader Internet. This environment requires peer-to-peer protocols and significant error checking and validation, and it therefore suffers the inefficiencies of excessive protocol layering. Moreover, there are always competing standards, and few incentives and fewer rewards for gaining compliance or adherence.

Thus it is likely that whatever progress is made on these extensibility and interoperability fronts will show itself soonest in the enterprise. Enterprises can better enforce and reward centralized standards. Yet even in this realm, while perhaps virtually all of the extensible building blocks and nascent standards exist, pulling them together into a cohesive whole, in which the standards themselves are integrated and cohesive, is the next daunting challenge.

Thus, the next time you hear about a system with its amazing extensibility, look more closely at it in terms of these threshold criteria. The claims will likely fail. And, even if they do appear to work in a demo setting, make sure you look around carefully for the wizard’s curtain.

Posted: February 12, 2006

The W3C has just published an update to "A Survey of RDF/Topic Maps Interoperability Proposals." This note, dated February 10, updates the previous version from one year ago.

It is well and good to embrace standards for semantic content such as RDF or OWL, but without mechanisms for expressing schemas in a standard way it is difficult to actually map and resolve semantic heterogeneities. This introductory survey is useful from the standpoint of topic maps.

Posted by AI3's author, Mike Bergman, on February 12, 2006 at 11:52 am in Adaptive Information, Semantic Web | Comments (0)
Posted: February 1, 2006

IBM has announced it has completed the first step of making the Unstructured Information Management Architecture (UIMA) available to the open source community by publishing the UIMA source code. UIMA is an open software framework to aid the creation, development and deployment of technologies for unstructured content. IBM first unveiled UIMA in December of 2004. The source code for the IBM reference implementation of UIMA is currently available for download. In addition, the IBM UIMA SDK, with additional facilities and components, can be downloaded for free.

UIMA has received support from the Defense Advanced Research Projects Agency (DARPA) and is currently in use as part of DARPA’s new human language technology research and development program called GALE (Global Autonomous Language Exploitation). UIMA is also embedded in various IBM products for processing unstructured information.

Posted: December 14, 2005

I just finished participating in a discussion that has mirrored many others I have observed in the past:  We have a complicated problem with much data before us, and we don’t know where it may evolve or trend.  Can we architect a single database schema up front that handles all possible options?

Every programmer or database administrator (DBA) will recommend keeping designs to a single database, schema and vendor.  It makes life easier for them.

However, every real-world application and community points to the natural outcome of multiple schemas and databases. This reality, in fact, is what has led to the whole topic of data federation and the various needs to resolve physical, semantic, syntactic and other schema heterogeneities.

Designers can certainly be clever with respect to anticipating growth, changes seen in the past, and so forth. Leaving "open slots" or "generic fields" in schemas is often posited and perhaps may allow for a little bit of growth. Also, perhaps quite a bit of mitigation for schema evolution can be anticipated up front.
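One common form of the "generic fields" idea is an entity-attribute-value (EAV) layout, sketched below with SQLite (the table and attribute names are hypothetical). It absorbs unanticipated attributes without a schema change, but at the cost of weak typing and more awkward queries, which is why it mitigates rather than solves schema evolution.

```python
import sqlite3

# A generic entity-attribute-value layout: new attributes can
# appear without ALTER TABLE, at the cost of weaker typing and
# clumsier querying
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eav (entity TEXT, attr TEXT, value TEXT)")

rows = [
    ("sample-1", "species", "E. coli"),
    ("sample-1", "ph", "7.2"),
    # An attribute nobody anticipated at design time --
    # no schema change needed to store it
    ("sample-1", "sequencing_run", "run-0042"),
]
con.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

# Reassemble one entity's attributes into a record
attrs = dict(con.execute(
    "SELECT attr, value FROM eav WHERE entity = ?", ("sample-1",)))
print(attrs["sequencing_run"])  # run-0042
```

Note that every value is now a string and the database can no longer enforce types or constraints per attribute; the flexibility is real, but so is the price.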

But the reality of diversity remains. The semantic Web and the proliferation of user-generated metadata will only exacerbate these challenges. Simply ask the biological or physics communities what they have seen in seeking a single "grand schema": they haven't found one, they can't, and it is a chimera.

Thus, smart design does not begin with a naive single database premise.  It recognizes that information exists in many forms in many places and in many transmutations from many viewpoints.  And what is important today will surely change tomorrow.  Explicit recognition of these realities is critical to successful upfront information management design.

Viva la multiple databases!

Posted by AI3's author, Mike Bergman, on December 14, 2005 at 2:31 pm in Adaptive Information, Semantic Web | Comments (0)
Posted: December 13, 2005

I just came across an easily readable and accessible short series on the semantic Web by Sunil Goyal of the enventure blog. The four-part series consists of:

  • Part 1 — introduction and overview of various Web services
  • Part 2 — the challenges of data integration
  • Part 3 — RDF and OWL data models and service-oriented middleware, and
  • Part 4 — user applications, enterprise systems, research applications and themes and services.

If you are new to this topic, you may find this series an easy first introduction.

Posted by AI3's author, Mike Bergman, on December 13, 2005 at 1:31 pm in Adaptive Information, Semantic Web | Comments (0)