Posted: March 20, 2006

Machine-readable, standard formats are revolutionizing the transfer and interoperability of data within the semantic Web.  This trend is all to the good, and its implications are only beginning to be seen.

I’ve been working with OWL for some time now.  OWL (the Web Ontology Language) is the next step up the food chain for defining the semantics of a given data source and for addressing semantic heterogeneity between sources.  And in my dealings with OWL, I have also begun using the Protégé viewer and editor from Stanford University.

But something has been lacking.

Machines may like formats in certain ways, but we as analysts and designers also need to work with the schema.  So, while I like the idea of these standards, I’m actually pretty shocked at the huge gap between the representation of the data and a visualization of its schema that is useful to us humans.  Why is this gap important?

The Growing Proliferation of Online Ontologies

Today, if you go to the ontology search site, Swoogle, you will see listed about 10,000 ontologies across manifold subject areas.  Similarly, if you pose the query ‘ontology filetype:owl’ to Google, you will see more than 13,300 results.

The number of ontologies is growing, and these ontologies will increasingly become valuable resources for capturing the structure of knowledge and the world.  Indeed, in a recent paper by Harith Alani, “Ontology Construction from Online Ontologies,” in Proceedings of the 15th World Wide Web Conference (2006), Edinburgh, Scotland (see here for the PDF reference), we can see that mechanisms for identifying, harvesting, and assembling constituent ontologies into meaningful sets are already becoming a source of active academic investigation.

A key insight is that ontologies demand reuse and recombination.  Constructing an individual ontology can be difficult, and most are small in scope.  Though everyone has a unique view and lens on the world, it does not make sense to construct each knowledge representation anew.  Rather, knowledge structures are likely to combine what has been designed before, as all knowledge accretes.  And, further, knowledge structures and the ontologies that guide them will grow in size and scope to match true real-world problems.

Mixing-and-matching, combining and culling, is a human activity that requires different ways to present the structure and its schema.

Spreadsheets:  Better Visualization of OWL?

For some time now, BrightPlanet has provided a spreadsheet format as one alternative for managing and organizing large taxonomies or directory structures with its Deep Query Manager™ Publisher portal product.  This spreadsheet alternative works great when dealing with large structures, large block moves, major structure re-organizations, structure merges and the like.  In short, a spreadsheet supports the very same manipulations that may be required in combining existing ontologies.

When fine-tuning individual nodes or establishing relationships, a node-by-node approach makes better sense than a spreadsheet.  The point, however, is that the best form for data manipulation and visualization depends on the task at hand.  Bulk actions and gross structure overviews require a different approach than fine-tuning placements and relationships.  This lesson has apparently yet to be learned by ontology editors and OWL editing tools.

This same observation has been made by Richard Searle in his post on OWL in spreadsheet form. According to Searle:

The vast majority of OWL/RDF UIs use a graph/tree representation. That can be useful for understanding the structure but does not scale beyond a dozen or so subjects. Some systems use a HTML browser style (e.g., Sesame and Piggybank) where the links correspond to the properties. Neither corresponds to the de facto representation of business data: the spreadsheet. Leveraging that familiar structure should provide a way to leverage the power of RDF in a form that is more accessible to the end user. . . .

A spreadsheet UI is based on displaying a set of subjects. These could be defined in a number of ways:

  1. SPARQL query
  2. OWL class
  3. Fragments of an RDF document (# style)
  4. Children of an RDF document (/ style)
  5. Properties of a particular Subject (or Object).

These are issues that I am now looking at carefully. Though the spreadsheet may not be the most elegant metaphor, it is also a framework that has gained significant user expertise and familiarity for large-scale structure and data manipulation. There is much to be said for the workable, often at the expense of the elegant.
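As a rough illustration of Searle’s first point, a set of subjects and their properties can be pivoted into a spreadsheet-style table in a few lines of plain Python. This is a minimal sketch, assuming triples are already in hand as simple tuples; no RDF library is used, and the ex: names are hypothetical:

```python
from collections import defaultdict

def triples_to_table(triples):
    """Pivot (subject, property, object) triples into a spreadsheet-style
    table: one row per subject, one column per property.  Multi-valued
    properties are joined with '; '."""
    rows = defaultdict(lambda: defaultdict(list))
    columns = []
    for s, p, o in triples:
        if p not in columns:
            columns.append(p)            # column order follows first appearance
        rows[s][p].append(o)
    table = [["subject"] + columns]
    for s in rows:
        table.append([s] + ["; ".join(rows[s].get(p, [])) for p in columns])
    return table

# Hypothetical triples about two subjects
triples = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob",   "rdf:type", "ex:Person"),
]
for row in triples_to_table(triples):
    print(row)
```

The same pivot works whether the subject set comes from a SPARQL query, an OWL class, or any of Searle’s other selection methods; only the source of the triples changes.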

So far, the standards community has done a good job in satisfying the machines. It is now time to make these structures and standards truly usable for meaningful work — at scale, and by humans.

Posted by AI3's author, Mike Bergman Posted on March 20, 2006 at 10:24 pm in Semantic Web | Comments (2)
Posted: March 10, 2006

According to a press release issued this week, Metatomix announced the availability of its Semantic Toolkit free to Eclipse developers. The toolkit may be downloaded from

The toolkit provides:

  • A Web Ontology Language (OWL) editor that allows users to build complex, nested ontologies for describing a concept or a domain. Users can add classes, properties, and constraints via selection boxes and text fields, as well as share ontologies among projects.
  • A Resource Definition Framework (RDF) editor that allows users to create and edit the RDF content in a project, and provides RDF triples and directed graph views.

The Semantic Toolkit is a component of Metatomix’s MetaStudio(TM), an Eclipse-based development and execution environment for Metatomix’s semantic composite applications.

Posted by AI3's author, Mike Bergman Posted on March 10, 2006 at 11:25 am in Semantic Web, Semantic Web Tools | Comments (0)
Posted: March 9, 2006

Through the courtesy of Hewlett-Packard Labs’ Semantic Blogging Demonstrator, I came across this excellent HP technical paper by Dave Reynolds, Carol Thompson, Jishnu Mukerji, and Derek Coleman, "An Assessment of RDF/OWL Modelling," HPL Technical Report 2005-189, October 28, 2005, 24 pp.

According to the paper’s authors:

We identify the primary strengths of RDF/OWL as:

  • Support for information integration and reuse of shared vocabularies
  • Handling of semi-structured data
  • Separation of syntax from data modelling
  • Web embedding
  • Extensibility and resilience to change
  • Support for inference and classification, based on a formal semantics
  • Representation flexibility, especially ability to model graph structures
  • Ability to represent instance and class information in the same formalism and hence combine them.
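The last strength, representing instance and class information in the same formalism, can be sketched in plain Python (no RDF library; the ex: names are hypothetical). Because rdf:type and rdfs:subClassOf statements are both just triples, a simple fixed-point rule can combine them to infer the full set of types for an instance:

```python
# Class statements (rdfs:subClassOf) and instance statements (rdf:type)
# share one triple formalism, so inference can operate on both at once.
triples = {
    ("ex:Dog",    "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Fido",   "rdf:type",        "ex:Dog"),
}

def infer_types(triples):
    """Apply the RDFS rule: if ?x rdf:type ?c and ?c rdfs:subClassOf ?d,
    then ?x rdf:type ?d.  Iterate until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {(x, "rdf:type", d)
               for (x, p1, c) in inferred if p1 == "rdf:type"
               for (c2, p2, d) in inferred if p2 == "rdfs:subClassOf" and c2 == c}
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = infer_types(triples)
print(("ex:Fido", "rdf:type", "ex:Animal") in closure)  # Fido is classified as an Animal
```

This is of course only a toy classifier; real OWL reasoners handle far richer constructs, but the uniform triple representation is what makes the combination possible.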

Weaknesses noted are:

  • Weak ability to validate documents
  • Expressivity limitations, particularly in terms of correlating across different properties of a resource
  • Performance
  • XML serialization issues and impedance mismatch with XML tooling
  • Lack of familiarity and potentially high learning curve
  • Inability to natively represent uncertain data and continuous domains
  • No built-in representation of processes and change

We conclude that RDF/OWL is particularly suited to modelling applications which involve distributed information problems such as integration of data from multiple sources, publication of shared vocabularies to enable interoperability and development of resilient networks of systems which can cope with changes to the data models. It has less to offer in closed world or point-to-point processing problems where the data models are stable and the data is not to be made available to other clients.

This report is highly recommended for those just becoming familiar with semantic Web protocols who need to assess their trade-offs.

Posted by AI3's author, Mike Bergman Posted on March 9, 2006 at 12:38 pm in Semantic Web | Comments (0)
Posted: February 14, 2006

How often do you see vendor literature or system or application descriptions that claim extensibility simply because of a heavy reliance on XML? I find it amazing how common the claim is and how prevalent are the logical fallacies surrounding this notion.

Don’t get me wrong. As a data exchange format, the eXtensible Markup Language (XML) does provide data representation extensibility. This contribution is great, and XML’s widespread adoption is a major factor in its own right in helping to bring down the Tower of Babel. But the use of XML alone is insufficient to provide extensibility.

Fully extensible systems need to have at least these capabilities:

  • Extensible data representation so that any data type and form can be transmitted between two disparate systems. XML and its other structured cousins such as RDF and OWL perform this role. Note, however, that standard data exchange formats have been an active topic of research and adoption for at least 20 years, with other notable formats such as ASN.1, CDF, and EDI also performing this task, though now largely overtaken by XML
  • Extensible semantics, since bringing more than one source of data into an extended environment likely introduces new semantics and heterogeneities. These mismatches fall into the classic challenge areas of data federation. The key point, however, is that simply being able to ingest extended data does nothing if the meaning of that data is not also captured. Semantic extensibility requires more structured data representations (RDF-S or OWL, for example), reference vocabularies and ontologies, and utilities and means to map the meanings between different schemas
  • Extensible data management. Though native XML databases and other extensions to conventional data systems have been attempted, truly extensible data management systems have not yet been developed that: 1) perform at scale; 2) can be extended without re-architecting the schema; 3) can be extended without re-processing the original source data; and 4) perform efficiently. Until extensible infrastructure with these capabilities is available, extensibility will not become viable at the enterprise level and will remain an academic or startup curiosity, and
  • Extensible capabilities through extendable and interoperable applications or tools. Though we are now moving up the stack into the application layer, real extensibility comes from true interoperability. Service-oriented architectures (SOAs) and other approaches allow the registry and message brokering among extended apps and services. But centralized vs. decentralized systems, the inclusion or not of business process interoperability, and even the accommodation of the other extensible imperatives above make this last layer potentially fiendishly difficult.
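To make the semantic-extensibility point concrete, the simplest form of mapping meanings between schemas is an explicit field-name mapping onto a shared reference vocabulary, applied before data from different sources is merged. A minimal Python sketch, with all vocabulary and field names hypothetical:

```python
# Each source's field names are mapped onto a shared reference vocabulary.
# Real-world mappings must also reconcile units, value codes, and structure;
# this sketch handles only field-name heterogeneity.
SOURCE_A_MAP = {"surname": "familyName", "given": "givenName"}
SOURCE_B_MAP = {"last_name": "familyName", "first_name": "givenName"}

def to_reference(record, mapping):
    """Rename a record's keys into the shared vocabulary.  Keys without a
    mapping are kept as-is so new (extended) fields still flow through."""
    return {mapping.get(k, k): v for k, v in record.items()}

a = to_reference({"surname": "Smith", "given": "Jo"}, SOURCE_A_MAP)
b = to_reference({"last_name": "Jones", "first_name": "Al", "dept": "R&D"}, SOURCE_B_MAP)
print(a)
print(b)
```

The point of the sketch is the separation of concerns: ingesting the extended field dept requires no schema change, while its meaning is preserved until a mapping for it is defined.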

These challenges are especially daunting in a completely decentralized, chaotic, distributed environment such as the broader Internet. This environment requires peer-to-peer protocols, significant error checking and validation, and therefore suffers the inefficiencies of excessive protocol layering. Moreover, there are always competing standards, few incentives, and fewer rewards for gaining compliance or adherence.

Thus it is likely that whatever progress is made on these extensibility and interoperability fronts will show itself soonest in the enterprise. Enterprises can better enforce and reward centralized standards. Yet even in this realm, while perhaps virtually all of the extensible building blocks and nascent standards exist, pulling them together into a cohesive whole, in which the standards themselves are integrated and cohesive, is the next daunting challenge.

Thus, the next time you hear about a system with its amazing extensibility, look more closely at it in terms of these threshold criteria. The claims will likely fail. And even if they do appear to work in a demo setting, make sure you look around carefully for the wizard’s curtain.

Posted: February 13, 2006

Conventional service-oriented architectures (SOAs) have been found to have:

  • Slow and inefficient bindings
  • Complete duplication of information processing across requests because of the lack of caching, and
  • Generally slow performance because of RDBMS storage.

These problems are especially acute at scale.

Frank Cohen recently posted a paper on IBM’s developerWorks, "FastSOA: Accelerate SOA with XML, XQuery, and native XML database technology: The role of a mid-tier SOA cache architecture," that presents some interesting alternatives to this conundrum.

The specific FastSOA proposal may or may not be your preferred solution if you are working with complex SOA environments at scale. But the general overview of conventional SOA constraints (in the SOAP framework) is very helpful and highly recommended.
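The mid-tier cache idea can be sketched generically: identical service requests reuse a stored response until a time-to-live expires, eliminating the duplicated processing noted above. A minimal Python sketch follows; this is not Cohen’s actual FastSOA design, and all names are illustrative:

```python
import time

class MidTierCache:
    """Sketch of a mid-tier SOA cache: responses to identical service
    requests are reused until a TTL expires, avoiding duplicate
    back-end processing."""
    def __init__(self, backend, ttl_seconds=60.0):
        self.backend = backend          # callable performing the real service call
        self.ttl = ttl_seconds
        self._store = {}                # request key -> (timestamp, response)

    def request(self, key):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]               # cache hit: no back-end work
        response = self.backend(key)    # cache miss: call the service
        self._store[key] = (now, response)
        return response

calls = []
def slow_service(key):
    calls.append(key)                   # track how often the back end is hit
    return f"result-for-{key}"

cache = MidTierCache(slow_service, ttl_seconds=60.0)
cache.request("q1"); cache.request("q1"); cache.request("q2")
print(len(calls))  # back end invoked twice, not three times
```

A real deployment would also need cache invalidation when back-end data changes, which is exactly where the hard design trade-offs of a mid-tier cache live.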

Posted by AI3's author, Mike Bergman Posted on February 13, 2006 at 9:59 am in Information Automation, Semantic Web | Comments (0)