Posted: September 19, 2006

Jo Walsh, a geospatial and semantic software specialist and one of the authors of Mapping Hacks, has written a very sensible and down-to-earth musing on metadata titled Have a nice metadata. Her overall point is that metadata must actually be generated and used (a process she argues should be both fun and relatively easy) or else no benefits will be realized. Jo’s jumping-off point was last week’s FOSS4G, the conference on Free and Open Source Software for Geospatial, where she engaged many people in the hallways while preparing for her talk toward the end of the meeting.

When one of Jo’s colleagues observed that metadata is useless without a focus on re-use, client-side applications, data visualization and analytics, Jo kept the “useless” thread running. As she states:

Without good metadata, data is useless. To make data dance for others, I need to clearly know what a web service will offer; to know how I can legally recombine and republish data in different ways. It helps to know how others have classified, verified, contextualised, represented data.

Without reusable, intelligible data, software is useless; machines will fall short of their capacity for making humans' lives easier; semiautomation that allows us to bootstrap communication, to generate seemingly-meaningful new conclusions, faster.

If there is no clear provision of a simple metadata model and exchange mechanism for geographic information, it's tempting to me to look at this space and declare: "standards are useless."

In part, the target of these musings is the standards communities themselves, which too often strive to be comprehensive and complete rather than useful. It is frustrating to haggle endlessly over details when practical results could be quickly at hand if the guidelines instead aimed to be useful and kept things simple.

There have been quite a few hand-wringing posts recently claiming that the semantic Web will never get here, that it is all an academic pipe dream, and similar observations that (to me) betray a lack of perspective and of history.

Actually, Peter Rip has observed in another context that he wished he had said that the semantic Web is taking place today, but in a piecemeal manner and often not under the semantic Web banner. Peter went on to note that the issues facing the semantic Web are ones of now versus later, not never, and of many semantic Webs versus the semantic Web. As he states, “Meaning is contextual, not universal. There will be many semantic webs, all of which are ways to organize the information available on the world wide web.”

It would be facile to frame some of the current anxieties as simply academics v. practitioners, or global v. incremental advances, or even curmudgeons v. visionaries, though these tensions obviously are in play.

The fact is that useful standards presently exist, many powerful extraction, annotation and management tools exist, great starting ontologies exist, and most of the theory looks pretty complete. What appears to be lacking is the right match applied to the right tinder in this dry forest of ideas and tools. Of course, Jo Walsh is correct: standards bodies will not strike that match. So, who will?

It all kinda reminds one of the World Wide Web and HTML before the innovation of the Mosaic browser, doesn’t it? Contrarians, look past the anxieties: it is time to invest.

Posted: September 14, 2006

Dion Hinchcliffe, whom I have lauded before and whose insights on his Enterprise Web 2.0 site I generally like, seems to have gone a bit too native with everything being “2.0”! OK, it is cool to talk about new stuff such as enterprise mashups, wikis, and the like (no news to us), but going gonzo over standard Web developments seems pretty silly.

So, take Dion’s most recent post, Finding Web 2.0. The interesting thing is that a bit of historical probing would have shown a more compelling view of “2.0”, namely Hyperland from 16 years ago! As I indicated earlier in my A Hitchhiker's Guide to the Semantic Web, this is Must See TV!

P.S. I ‘trackbacked’ this post rather than commenting directly because it was a pain in the ass to add my comment to the ZDNet site. Corporitis detritus.

Posted: September 12, 2006

My recent post on Hyperland got me thinking about the use of avatars, and then, lo and behold, I almost immediately saw an ad for one. Actually, I'm not sure whether they are effective (the vendor claim) or creepy (my sneaking suspicion).

While I love the idea of Tom in Hyperland as the "obsequious, fully customizable" agent, it feels a little different when one tries to model oneself. Perhaps that is the mistake. Don't cast yourself as the talking head; find a more attractive one of the opposite sex! (Or, the avatar as psychotherapy!)

So, I’ve given one a go; try it (you will need to use the back arrow to return):

Actually, what I'm doing here is perhaps a bit of HTML hacking on what is normally SitePal's email come-on for their service. Do check their service out: you can mold your own avatar within limits and make it speak as you wish. (Final tailoring requires actually leasing the creation!) At any rate, though I spent no more than five minutes configuring my alter ego above, let me know whether you think this is a:

  1. Wave of the future
  2. Creepy
  3. Needs improvement
  4. Get a babe!
  5. Get a life!

Oh, well. Perhaps not all of us can be talking heads, and perhaps we don’t need more of them.

Thanks.

Posted: September 9, 2006

The late Douglas Adams, of Doctor Who and The Hitchhiker’s Guide to the Galaxy fame, produced an absolutely fascinating, prescient and entertaining TV program for BBC2 16 years ago presaging the Internet. Called Hyperland (see also the IMDb write-up), this self-labelled ‘fantasy documentary’, a 50-minute video from 1990, can now be seen in its entirety on Google Video. Mind you, this was well in advance of the World Wide Web (remember the source for ‘www’?) and the browser, though both that name and hypertext are liberally sprinkled throughout the show.

The presentation, written by and starring Adams as a protagonist having a fantasy dream, features Tom, the semantic simulacrum (actually, Tom Baker from Doctor Who), the “obsequious, and fully customizable” personal software agent who introduces, anticipates and guides Adams through what is in actuality a semantic Web of interconnected information. Laptops (actually an early Apple), pointing devices, icons and avatars sprinkle this tour de force, an uncanny glimpse into the (now) future.

Sure, some details are wrong and perhaps there is a bit too much emphasis (given today’s realities) on virtual reality, but the vision presented is exactly that promised by the semantic Web and an interconnected global digital library of information and multimedia. Wow! And entertaining and fun to boot!

This is definitely Must See TV!

I’d like to thank Buzzsort for first writing about the availability of this video. Apparently fans and aficionados have been clamoring for some time to see this show again; it has only recently been posted. Indeed, access to an archived video such as this is itself a great example of Hyperland coming to reality.

An AI3 Jewels & Doubloons Winner
Posted: September 8, 2006

John Newton (a co-founder of Documentum, now of Alfresco) puts a telling marker on the table in his recent post on the Commoditization of ECM. Though noting that the term "enterprise content management" did not even exist prior to 1998, he goes on to observe that the definition of what ECM covers expanded, and the leading players consolidated, rapidly. He concludes that this process has commoditized the market, with competitive differentiation now based on market size rather than functionality. The platforms from the leading vendors (IBM, Microsoft and EMC-Documentum) can all manage documents, Web content, images, forms and records via basic library services, metadata management, search and retrieval, workflow, portal integration, and development kits.

If such consolidation and standardization of functionality were Newton’s only point, one could say “ho, hum”; the same has been true in all major enterprise software markets.

But, in my reading, he goes on to make two more important and fundamental points, both of which existing enterprise software vendors ignore at their peril.

Poor Foundations and Poor Performance

Newton notes that ECM applications are never bought based on the nature of their repositories, but an inefficient repository can get a system rejected. He also acknowledges that ECM installations are costly to set up and maintain, are difficult to use, perform poorly, and lack essential automation (such as classification). (Kind of sounds like most enterprise software initiatives, doesn’t it?)

Indeed, I have repeatedly documented these gaps for virtually all large-scale document-centric or federated applications. The root cause, besides rampantly poor interface design, has in my opinion been poorly suited data management foundations. Relational and IR-based systems each perform poorly, for different reasons, in managing semi-structured data. This problem will not be solved by open source per se (see below), though some interesting options emerging from open source may point the way to new alternatives, as do incipient designs from BrightPlanet and others.

The Proprietary Killers of Open Standards and Open Source

Service-oriented architectures (SOA), the various Web services standards (WS-*), certain JSRs (170 and 283 for content repositories, plus 168 for portlets and others), and the various XML and semantic derivatives are all moving rapidly, with the very real prospect of “pluggability” and the substitution of various packages, components and applications across the entire enterprise stack.
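To make that “pluggability” concrete, here is a toy Java sketch, my own illustration rather than any vendor’s or standard’s actual API (the ContentRepository interface, the InMemoryRepository class and the method names are all hypothetical). The point is simply that callers code against a stable interface, so a JSR 170-style or database-backed implementation can be substituted without touching the calling code:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical repository contract; real JCR (JSR 170/283) interfaces
    // are much richer, but the substitution principle is the same.
    interface ContentRepository {
        void store(String id, String content);
        String retrieve(String id);
    }

    // One pluggable implementation; a database- or JCR-backed class could
    // be dropped in behind the same interface with no change to callers.
    class InMemoryRepository implements ContentRepository {
        private final Map<String, String> docs = new HashMap<>();
        public void store(String id, String content) { docs.put(id, content); }
        public String retrieve(String id) { return docs.get(id); }
    }

    public class PluggabilityDemo {
        public static void main(String[] args) {
            // The only line that names a concrete class; all other code
            // sees just the interface.
            ContentRepository repo = new InMemoryRepository();
            repo.store("doc-1", "Commoditization of ECM");
            System.out.println(repo.retrieve("doc-1")); // prints the stored content
        }
    }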

To take Newton’s case at Alfresco: by aggregating the following existing open source components, the company was able to get its ECM product ready in less than one year (a small indexing/search sketch for the Lucene piece follows the list):

  • Spring – A framework that provides the wiring of the repository and the tools to extend capabilities without rebuilding the repository (aspect-oriented programming)
  • Hibernate – An object-relational mapping tool that stores content metadata in a database and handles the idiosyncrasies of each SQL dialect
  • Lucene – An Internet-scale, general-purpose full-text and information retrieval engine that supports federated, taxonomic, XML and full-text search
  • EHCache – Distributed, intelligent caching of content and metadata in a loosely coupled environment
  • jBPM – A full-featured enterprise production workflow and business process engine that includes BPEL4WS support
  • Chiba – A complete XForms interface that can be used for the configuration and management of the repository
  • OpenOffice.org – Provides server-based, Linux-compatible transformation of MS Office content
  • ImageMagick – Supports transformation and watermarking of images.
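As a minimal sketch of just the Lucene piece, assuming a recent Lucene release (the API has changed considerably since 2006) and field names of my own invention, the same engine can index exact-match metadata alongside full text:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class EcmSearchSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory(); // in-memory index for the demo
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Index one document with an exact-match metadata field plus an
            // analyzed full-text field, the basic split an ECM store relies on.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new StringField("mimeType", "application/pdf", Store.YES));
                doc.add(new TextField("content", "quarterly sales report for Q3", Store.YES));
                writer.addDocument(doc);
            }

            // Run a full-text query and read back the stored metadata.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                QueryParser parser = new QueryParser("content", analyzer);
                for (ScoreDoc hit : searcher.search(parser.parse("sales"), 10).scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("mimeType"));
                }
            }
        }
    }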

Moreover, the combination of these components yielded an inherent architecture with pluggable modules, rules and templating engines, workflow and business process management, security, and other enterprise-level capabilities. In prior times, I estimate, no proprietary vendor could have accomplished this with even ten times the effort.

Similar Trends and Challenges in the Entire Enterprise Space

Newton is obviously well placed to comment on these trends within ECM. But similar trends can be seen in every major enterprise software space. For virtually every component one can imagine, there is a very capable open source offering. Many of the newer open source ventures are indeed centered on aggregating and integrating various open source components, with either dual licensing or support services as the basis of their business models. At its most extreme, this trend has expanded to the whole process of enterprise application integration (EAI) itself through offerings such as LogicBlaze FUSE, with its SOA-oriented standards and open source components. Initiatives such as SCA (service component architecture) will continue to fuel this trend.

So, enterprise software vendors, listen to your wake-up call. It is as if gold doubloons, pearls and jewels are lying all over the floor; if you and your developers don’t take the time to bend over and pick them up, someone else will. As Joel Mokyr has compellingly researched, innovation in systems, in how to integrate the pieces, can be every bit as important as the ‘Aha!’ discovery. Open source is now giving a whole new breed of bakers new ingredients for baking the cake.