Posted: March 20, 2006

OWL Ontologies: When Machine Readable is Not Good Enough

Machine readable, standard formats are revolutionizing the transfer and interoperability of data within the semantic Web.  This trend is all to the good, and its implications are only beginning to be seen.

I’ve been working with OWL for some time now.  OWL (Web Ontology Language) is the next step up the food chain to deal with semantic definition from given data sources and semantic heterogeneity between sources.  And in my dealings with OWL, I have also begun using the Protégé viewer and editor from Stanford University.

But something has been lacking.

Machines may like formats in certain ways, but we as analysts and designers also need to work with the schema.  So, while I like the idea of these standards, I’m actually pretty shocked at the huge gap between the representation of the data and the visualization of its schema useful to us humans.  Why is this gap important?

The Growing Proliferation of Online Ontologies

Today, if you go to the ontology distribution site, Swoogle, you will see listed about 10,000 ontologies across many subject areas.  Similarly, if you pose the query ‘ontology filetype:owl’ to Google, you will see more than 13,300 results.

The number of ontologies is growing, and ontologies will increasingly become valuable resources for capturing the structure of knowledge and the world.  Indeed, in a recent paper by Harith Alani, “Ontology Construction from Online Ontologies,” in Proceedings of 15th World Wide Web Conference (2006), Edinburgh, Scotland (see here for the PDF reference), we can see that mechanisms for identifying, harvesting, and assembling constituent ontologies into meaningful sets are already becoming a subject of active academic investigation.

A key insight is that ontologies demand reuse and recombination.  Constructing an individual ontology can be difficult, and most are only small in scope.  Though everyone has a unique view and lens on the world, it does not make sense to construct each knowledge representation anew.  Rather, it is likely that knowledge structures will combine what has been designed before, as all knowledge accretes.  And, further, it is also the case that knowledge structures and the ontologies that guide them will grow in size and scope to match true real-world problems.

Mixing-and-matching, combining and culling, is a human activity that requires different ways to present the structure and its schema.

Spreadsheets:  Better Visualization of OWL?

For some time now, BrightPlanet has provided a spreadsheet format as one alternative for managing and organizing large taxonomies or directory structures with its Deep Query Manager™ Publisher portal product.  This spreadsheet alternative works great when dealing with large structures, large block moves, major structure re-organizations, structure merges and the like.  In short, a spreadsheet supports the very same manipulations that may be required in combining existing ontologies.

When fine-tuning individual nodes or establishing relationships, a node-by-node approach makes better sense than a spreadsheet.  The point, however, is that the best form for data manipulation and visualization depends on the task at hand.  Bulk actions and gross structure overviews require a different approach than fine-tuning placements and relationships.  This lesson has apparently yet to be learned by ontology editors and OWL editing tools.

This same observation has been made by Richard Searle in his post on OWL in spreadsheet form. According to Searle:

The vast majority of OWL/RDF UIs use a graph/tree representation. That can be useful for understanding the structure but does not scale beyond a dozen or so subjects. Some systems use a HTML browser style (e.g., Sesame and Piggybank) where the links correspond to the properties. Neither corresponds to the de facto representation of business data: the spreadsheet. Leveraging that familiar structure should provide a way to leverage the power of RDF in a form that is more accessible to the end user. . . .

A spreadsheet UI is based on displaying a set of subjects. These could be defined in a number of ways:

  1. SPARQL query
  2. OWL class
  3. Fragments of an RDF document (# style)
  4. Children of an RDF document (/ style)
  5. Properties of a particular Subject (or Object).

These are issues that I am now looking at carefully. Though the spreadsheet may not be the most elegant metaphor, it is also a framework that has gained significant user expertise and familiarity for large-scale structure and data manipulation. There is much to be said for the workable, often at the expense of the elegant.
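To make the spreadsheet idea a little more concrete, here is a minimal sketch of how a set of subjects and their properties might be pivoted into rows and columns. It is an illustration only: the sample triples, the prefixed names, and the function names `triples_to_rows` and `rows_to_csv` are all assumptions made up for this example, and it uses plain Python tuples rather than any real RDF or OWL library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: (subject, property, value) triples.
# These names are invented for illustration, not taken from a real ontology.
TRIPLES = [
    ("ex:Dog", "rdf:type", "owl:Class"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Dog", "rdfs:label", "Dog"),
    ("ex:Cat", "rdf:type", "owl:Class"),
    ("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdf:type", "owl:Class"),
]


def triples_to_rows(triples):
    """Pivot triples into a table: one row per subject, one column per property."""
    cells = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        cells[subject][prop].append(value)

    # Every property seen anywhere becomes a column, in sorted order.
    columns = sorted({prop for _, prop, _ in triples})
    rows = [["subject"] + columns]
    for subject in sorted(cells):
        # Multiple values for one property share a cell, separated by "; ".
        rows.append([subject] + ["; ".join(cells[subject][c]) for c in columns])
    return rows


def rows_to_csv(rows):
    """Serialize the table as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(triples_to_rows(TRIPLES)))
```

Once the structure is in this form, block moves and bulk edits are ordinary spreadsheet operations. Getting the edited sheet back into OWL is the harder half of the problem, and this sketch does not attempt it.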

So far, the standards community has done a good job in satisfying the machines. It is now time to make these structures and standards truly usable for meaningful work — at scale, and by humans.

2 thoughts on “OWL Ontologies: When Machine Readable is Not Good Enough”

  1. One might wonder why Swoogle has found only 10K ontologies yet Google reports over 13K files with the file extension .owl.

    Swoogle discovers and indexes RDF documents and, in some cases, HTML and XHTML documents with embedded RDF data. These are found in files with all kinds of names and extensions. (I believe .rdf is the most common).

    Swoogle uses a heuristic to distinguish semantic web ontologies from semantic web data files. The exact nature of the heuristic is not yet fixed, since we’ve been experimenting with different ones. The basic idea, though, is that if a significant percentage of the document’s RDF triples are involved in defining terms (e.g., classes and properties) then it is an ontology. If not, it is an RDF data file.

    So, of the ~1.2M web documents with RDF data, Swoogle considers only about 10K of them to be bona fide ontologies.
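The basic idea of the heuristic described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the comment says the real heuristic is not fixed, so the 0.5 threshold, the sets of "term-defining" types and predicates, and the function names here are all hypothetical and are not Swoogle's actual rule.

```python
# Assumed vocabulary: which rdf:type values and predicates count as
# "defining terms". Both sets are guesses chosen for illustration.
DEFINING_TYPES = {
    "owl:Class",
    "rdfs:Class",
    "rdf:Property",
    "owl:ObjectProperty",
    "owl:DatatypeProperty",
}
DEFINING_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
}


def is_defining(triple):
    """Return True if a (subject, predicate, object) triple helps define a term."""
    _, predicate, obj = triple
    if predicate == "rdf:type":
        return obj in DEFINING_TYPES
    return predicate in DEFINING_PREDICATES


def ontology_ratio(triples):
    """Fraction of a document's triples that are involved in defining terms."""
    if not triples:
        return 0.0
    return sum(1 for t in triples if is_defining(t)) / len(triples)


def looks_like_ontology(triples, threshold=0.5):
    """Classify a document as an ontology when enough triples define terms.

    The default threshold is an arbitrary assumption, standing in for the
    "significant percentage" mentioned in the comment.
    """
    return ontology_ratio(triples) >= threshold
```

Under these assumptions, a document whose triples mostly declare classes and properties is treated as an ontology, while one that mostly asserts facts about individuals is treated as a data file.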
