Posted: August 20, 2008

In a recent posting on the Ontolog forum, Toby Considine discussed the difficulty of describing to several business CEOs the concept of an ontology. He noted that when one of the CEOs finally got it, he explained it thus to the others:

“Ontology is always a value proposition, how a company makes money. Each company, and perhaps each sales professional must be able to define his own ontology and explain it to his customers. We need semantic alignment to create a common basis for discussing value. If it is a good semantic set, then the ontologies that each sales director creates will be better; better to produce sales differentiation, and better to produce true long-term value for the company.
“A general purpose ontology gives us a framework to develop and discuss our own value propositions. But those value propositions, and their underlying ontologies must remain proprietary, or else every company is just building to the lowest common denominator, and innovation and value creation end.”

BTW, Toby is chair of the OASIS Open Building Information Exchange (oBIX) Technical Committee (see http://www.oasis-open.org), and is accustomed to discussing standards and technical matters with business audiences.

This discussion came up in relation to the use of the Cyc knowledge base and the possible role of “lightweight” or “foundational” reference ontologies.

There are a number of interesting points embedded and implied in this discussion. At the risk of reading too much into them, they include:

  • Foundational, reference ontologies have an important role, but as frameworks and for external interoperability
  • Each enterprise has its own world view, which can be expressed as an ontology and represents its “value proposition”; in this regard, internal ontologies work similarly to current legacy schema
  • Semantic “alignment” (and therefore interoperability) is important to discuss value
  • For a business enterprise, the real focus of its ontologies is to express its value proposition, how it makes money.

I think these sentiments are just about right, with the last point especially profound.

We have supported UMBEL as an important reference structure, and see the role for ever more specific ones. But, at the other end of the spectrum, ontologies are also specific world views, and can and should be private for proprietary enterprises. Yet this is in no way in conflict with interoperation, in increasingly widening circles, using shared structures (ontologies).

The balance and integration of the private and public in semantic Web ontologies is still being worked out. But, I truly believe it is appropriate and necessary that both the public and the private be embraced.

Toby’s CEO got it almost right: innovation depends on reserving some proprietary aspects. But the complete story, I think, is that embracing ontologies themselves and interoperable linked data frameworks in that context is also a key source of innovation and added value.

Posted by AI3's author, Mike Bergman, on August 20, 2008 at 9:31 pm in Adaptive Information, Ontologies, Semantic Web, UMBEL
The URI link reference to this post is: http://www.mkbergman.com/452/a-business-perspective-on-ontologies/
Posted: July 25, 2008

'Dust Motes Dancing in Sunlight, Interior from the Artist's Home, Strandgade 30,' Vilhelm Hammershøi, 1900; courtesy of http://www.thecityreview.com/copen.html

Structure Demands Context; But, is that Enough?

Last week marked a red-letter day in my professional life with release of the UMBEL subject concept structure. UMBEL began as a gleam in the eye more than a year ago when I observed that semantic Web techniques, while powerful — especially with regard to the RDF data model as a universal and simple (at its basics) means for representing any information and its structure — still lacked something. It took me a while to recognize that the first something was context.

Now, I have written and talked much about context before on this blog, with The Semantics of Context being the most salient article for the present discussion.

This is my mental image of Web content without context: unconnected dust motes floating through a sunlit space, moving slowly, randomly, and without connections, sort of like Brownian motion. Think of the sunlight on dust shown by the picture to the left.

By providing context, my vision was that we could freeze these moving dust motes and place them into a fixed structure, perhaps something like constellations in the summer sky. Or, at least, something more stable, floating less aimlessly and unconnected.

So, my natural response was to look for structural frameworks to provide that context. And that was the quest I set forward at UMBEL’s initiation.

At the time of UMBEL’s genesis, the impact of Wikipedia and other sources of user-generated content (UGC) such as del.icio.us or Flickr or many, many others was becoming clear. The usefulness of tags, folksonomies, microformats and other forms of “bottom-up” structure was proven.

The evident — and to me, exciting — aspect of globally-provided UGC was that this was the ultimate democratic voice: the World has spoken, and the article about this or the tag about that had been vetted in the most interactive, exposed, participatory and open framework possible. Moreover, as the World changed and grew, these new realizations would also be fed back into the system in a self-correcting goodness. Final dot.

Through participation and collective wisdom, therefore, we could gain consensus and acceptance and avoid the fragility and arbitrariness of “wise man” or imposed from the “top-down” answers. The people have spoken. All voices have been heard. The give and take of competing views has found its natural resting point. Again, I thought, final dot.

Thus, when I first announced UMBEL, my stated desire (and hope) was that something like Wikipedia could or would provide that structural context. Here is a quote from the announcement of UMBEL, nearly one year ago to this day:

The selection of the actual subject proxies within the UMBEL core are to be based on consensus use. The subjects of existing and popular Web subject portals such as Wikipedia and the Open Directory Project (among others) will be intersected with other widely accepted subject reference systems such as WordNet and library classification systems (among others) in order to derive the candidate pool of UMBEL subject proxies.

Yet, that is not the basis of the structure announced last week for UMBEL. Why?

The Strengths of User-Generated Content

Before we probe the negative, let’s rejoice in the positive.

User-generated content (UGC) works, and has rapidly proven itself in venues from authoritative subjects (Wikipedia) to photos (Flickr), bookmarking and tagging (del.icio.us), blogs, video (YouTube) and every other Web space imaginable. This is new, was not foreseen by most a few years ago, and has totally remade our perception of content and how it can be generated. Wow!

The nature of this user-generated content, of course, as is true for the Web itself, is that it has arisen from a million voices without coercion, coordination or a plan, spontaneously in relation to chosen platforms and portals. Yet, still, today, as to what makes one venue more successful than others, we are mostly clueless. My suspicion is that — akin to financial markets — when Web portals or properties are successful, they readily lend themselves to retrospective books and learned analysis explaining that success. But, just try to put down that “recipe” in advance, and you will most likely fail.

So, prognostication is risky business around these parts.

There is a reason why both the head and sub-head of this article are stated as questions: I don’t know. For the reasons stated above, I would still prefer to see user-generated structure (UGS) emerge in the same way that topic- and entity-specific content has on Wikipedia. However, what I can say is this: for the present, this structure has not yet emerged in a coherent way.

Might it? Actually, I hope so. But, I also think it will not arise from systems or environments exactly like Wikipedia and, if it does arise, it will take considerable time. I truly hope such new environments emerge, because user-mediated structure will also have legitimacy and wisdom that no “expert” approach may ever achieve.

But these are what if‘s, and nice to have‘s and wouldn’t it be nice‘s. For my purposes, and the clients my company serves, what is needed must be pragmatic and doable today — all with acceptable risk, time to delivery and cost.

So, I think it safe to say that UGC works well today at the atomic level of the individual topic or data object, what might be called the nodes in global content, but not in the connections between those nodes, its structure. And, the key to the answer of why user-generated structure (UGS) has not emerged in a bottom-up way resides in that pivotal word above: coherence.

Coherence was the second something, accompanying context, that was a missing piece for the semantic Web.

Coherence in Context

What is it to be coherent? The tenth edition of Merriam-Webster’s Collegiate Dictionary (and the online version) defines it as:

coherent \kō-ˈhir-ənt\ adj.; Middle French or Latin; Middle French cohérent, from Latin cohaerent-, cohaerens, present participle of cohaerēre. Date: ca. 1555. 1: a: logically or aesthetically ordered or integrated : consistent <coherent style> <a coherent argument> b: having clarity or intelligibility : understandable <a coherent person> <a coherent passage>
2: having the quality of cohering; especially : cohesive, coordinated <a coherent plan for action>
3: a: relating to or composed of waves having a constant difference in phase <coherent light> b: producing coherent light <a coherent source>.

Another online source I like for visualization purposes is Visuwords, which displays the accompanying graph relationships view based on WordNet.

Of course, coherent is just the adjectival property of having coherence. Again, the Merriam Webster dictionary defines coherence as 1: the quality or state of cohering: as a: systematic or logical connection or consistency b: integration of diverse elements, relationships, or values.

Decomposing even further, we can see that coherence is itself the state of the verb, cohere. Cohere, as in its variants above, has as its etymology a derivation from the Latin cohaerēre, from co- + haerēre to stick, namely “to stick with”. Again, the Merriam Webster dictionary defines cohere as 1: a: to hold together firmly as parts of the same mass; broadly: stick, adhere b: to display cohesion of plant parts 2: to hold together as a mass of parts that cohere 3: a: to become united in principles, relationships, or interests b: to be logically or aesthetically consistent.

These definitions capture the essence of coherence in that it is a state of logical, consistent connections, a logical framework for integrating diverse elements in an intelligent way. In the sense of a content graph, this means that the right connections (edges or predicates) have been drawn between the object nodes (or content) in the graph.

Bottom-up UGC: The Hip Bone is Connected to the Arm Bone

Structure without coherence is where connections are being drawn between object nodes, but those connections are incomplete or wrong (or, at least, inconsistent or unintelligible). The nature of the content graph lacks logic. The hip bone is not connected to the thigh bone, but perhaps to something wrong or silly, like the arm or cheek bone.

Ambiguity is one source for such error, as when, for example, the object “bank” is unclear as to whether it is a financial institution, billiard shot, or edge of a river. If we understand the object to be the wrong thing, then connections can get drawn that are in obvious error. This is why disambiguation is such a big deal in semantic systems.

However, ambiguity tends not to be a major source of error in user-generated content (UGC) systems because the humans making the connections can see the context and resolve the meanings. Context is thus a very important basis for resolving ambiguities.

A second source of possible incoherence is the organizational structure or schema of the actual concept relationships. This is the source that poses the most difficulty to UGC systems such as folksonomies or Wikipedia.

Remember in the definitions above that logic, consistency and intelligibility were some of the key criteria for a coherent system. Bottom-up UGS (user-generated structure) is prone to fail the test in all three areas.

“In the context of an information organization framework, a structure is a cohesive whole or ‘container’ that establishes qualified, meaningful relationships among those activities, events, objects, concepts which, taken together, comprise the ‘bounded space’ of the universe of interest.” – J.T. Tennis and E.K. Jacob [1]

Logic and consistency almost by definition imply the application of a uniform perspective, a single world view. Multiple authors and contributors working without a common frame of reference or viewpoint are unable to bring this consistency of perspective. For example, how time is treated with regard to famous people’s birth dates in Wikipedia is very different from how it is discussed with respect to topics on geological eras, and Wikipedia contains no mechanisms for relating those time dimensions or making them consistent.

Logic and intelligibility suggest that the structure should be testable and internally consistent. Is the hip bone connected with the arm bone? No? and why not? In UGC systems, individual connections are made by consensus and at the object-to-object level. There are no mechanisms, at least in present systems, for resolving inconsistencies as these individual connections get aggregated. We can assign dogs as mammals and dogs as pets, but does that mean that all pets are mammals? The connections can get complicated fast and such higher-order relationships remain unstated or more often than not wrong.
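The dogs-pets-mammals point can be made concrete with a small sketch (a toy representation assumed purely for illustration, not any actual UGC system): when role assignments and taxonomic assignments share the same untyped relation, naive aggregation cannot tell which generalizations actually hold.

```python
# Toy illustration (not any real UGC system): category assignments
# recorded as simple (subject, relation, object) triples.
assignments = [
    ("dog", "is_a", "mammal"),     # taxonomic claim
    ("dog", "is_a", "pet"),        # role claim, but same relation name
    ("goldfish", "is_a", "pet"),
]

def instances_of(category):
    """Collect everything assigned to a category via 'is_a'."""
    return {s for s, r, o in assignments if r == "is_a" and o == category}

# A naive aggregator might conclude pets and mammals coincide because
# 'dog' sits in both categories -- but the goldfish shows otherwise.
pets, mammals = instances_of("pet"), instances_of("mammal")
print(pets <= mammals)   # False: not all pets are mammals
```

Because the relation carries no logic (no distinction between class membership and role assignment, no subsumption semantics), nothing in the structure itself can detect or block the faulty generalization.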

Note as well that in UGC systems items may be connected (“assigned”) to categories, but their “factual” relation is not being asserted. Again, without a consistency of how relations are treated and the ability to test assertions, the structures may not only be wrong in their topology, but totally lack any inference power. Is the hip bone connected with the cheek bone? UGC structures lack such fundamental logic underpinnings to test that, or any other, assertion.

From the first days of the Web, notably Yahoo! in its beginnings but many other portals as well, we have seen many taxonomies and organizational structures emerge. As simple heuristic devices for clustering large amounts of content, this is fine (though certainly there, too, there are some structures that are better at organizing along “natural” lines than others). Wikipedia itself, in its own structure, has useful organizational clustering.

But once a system is proposed, such as UMBEL, with the purpose of providing broad referenceability to virtually any Web content, the threshold condition changes. It is no longer sufficient to merely organize. The structure must now be more fully graphed, with intelligent, testable, consistent and defensible relations.

Full Circle to Cyc and UGC

Once the seemingly innocent objective of being a lightweight subject reference structure was established for UMBEL, the die was cast. Only a coherent structure would work, since anything else would be fragile and rapidly break in the attempt to connect disparate content. Relating content coherently itself demands a coherent framework.

As noted in the lead-in, this was not a starting premise. But, it became an unavoidable requirement once the UMBEL effort began in earnest.

I have spoken elsewhere about other potential candidates as possibly providing the coherent underlying structure demanded by UMBEL. We have also discussed why Cyc, while by no means perfect, was chosen as the best starting framework for contributing this coherent structure.

I anticipate we will see many alternative structures proposed to UMBEL based on other frameworks and premises. This is, of course, natural and the nature of competition and different needs and world views.

However, it will be most interesting to see if either ad hoc structures or those derived from bottom-up UGC systems like Wikipedia can be robust and coherent enough to support data interoperability at Web scale.

I strongly suspect not.


[1] Joseph T. Tennis and Elin K. Jacob, 2008. “Toward a Theory of Structure in Information Organization Frameworks,” upcoming presentation at the 10th International Conference of the International Society for Knowledge Organization (ISKO 10), in Montréal, Canada, August 5th-8th, 2008. See http://www.ebsi.umontreal.ca/isko2008/documents/abstracts/tennis.pdf.
Posted: July 6, 2008

Breakthroughs in the Basis, Nature and Organization of Information Across Human History

I’m pleased to present a timeline of 100 or so of the most significant events and developments in the innovation and management of information and documents, from cave paintings (ca. 30,000 BC) to the present. Click on the link to the left or on the screen capture below to go to the actual interactive timeline.

This timeline has fast and slow scroll bands — including bubble popups with more information and pictures for each of the entries offered. (See the bottom of this posting for other usage tips.)

Note the timeline only presents non-electronic innovations and developments, from alphabets to writing to printing and information organization and conventions. Because there are so many innovations and they are concentrated in the last 100 years, digital and electronic communications are somewhat arbitrarily excluded from the listing.

I present below some brief comments on why I created this timeline, some caveats about its contents, and some basic use tips. I conclude with thanks to the kind contributors.

Why This Timeline?

Readers of this AI3 blog or my detailed bio know that information — whether biological, embodied in genes, or cultural, embodied in human artefacts — has been my lifelong passion. I enjoy making connections between the biological and cultural with respect to human adaptivity and future prospects, and I like to dabble on occasion as an amateur economic or information science historian.

About 18 months ago I came across David Huynh‘s nifty Exhibit lightweight data display widget, gave it a glowing review, and then proceeded to convert my growing Sweet Tools listing of semantic Web and related tools to that format. Exhibit still powers the listing (which I just updated yesterday for the twelfth time or so).

At the time of first rolling out Exhibit I also noted that David had earlier created another lightweight timeline display widget that looked similarly cool (and which was also the first API for rendering interactive timelines in Web pages). (In fact, Exhibit and Timeline are but two of the growing roster of excellent lightweight tools from David.) Once I completed adopting Exhibit, I decided to find an appropriate set of chronological or time-series data to play next with Timeline.

I had earlier been ruminating on one of the great intellectual mysteries of human development: Why, roughly beginning in 1820 to 1850 or so, did the historical economic growth patterns of all prior history suddenly take off? I first wrote on this about two years ago in The Biggest Disruption in History: Massively Accelerated Growth Since the Industrial Revolution, with a couple of follow-ups and expansions since then.

I realized that, in developing my thesis that wood pulp paper and mechanized printing were the key drivers for this major inflection change in growth (as they affected literacy and the broadscale access to written information), I already had the beginnings of a listing of various information innovations throughout history. So, a bit more than a year ago, I began adding to that list in terms of how humans learned to write, print, share, organize, collate, reproduce and distribute information and when those innovations occurred.

There are now about 100 items in this listing (I’m still looking for and researching others; please send suggestions at any time. ;) ). Here are the current items in rough chronological order:

cave paintings, ideographs, calendars, cuneiform, papyrus (paper), hieroglyphs, ink, alphabet, Phaistos Disc, logographs, maps, scrolls, manuscripts, glossaries, dictionaries, parchment (paper), bibliographies, concept of categories, library, classification system (library), zero, paper, codex, woodblock printing, tree diagram, quill pen, library catalog, movable type, almanacs, paper (rag), word spaces, registers, intaglio, printing press, advertising (poster), bookbinding, pagination, punctuation, library catalog (printed), public lending library, dictionaries (alphabetic), newspapers, information graphics, scientific journal, footnotes, copyrights, encyclopedia, capitalization, magazines, taxonomy (binomial classification), statistics, timeline, data graphs, card catalogs, lithography, punch cards, steam-powered (mechanized) papermaking, book (machine-paper), chemical symbols, mechanical pencil, chromolithography, paper (wood pulp), rotary press, mail-order catalog, fountain pen, microforms, thesaurus, pencil (mass produced), rotary perfection press, catalogues, typewriter, periodic table, chemical pulp (sulfite), classification (Dewey), linotype, mimeograph machine, kraft process (pulp), flexography, classification (LoC), classification (UDC), offset press, screenprinting, ballpoint pen, xerographic copier, hyperlink, metadata (MARC)

So, off and on, I have been working with and updating the data and display of this timeline in draft. (I may someday also post my notes about how to effectively work with the Timeline widget.)

With the listing above, completion was sufficient to finally post this version. One of the neat things with Timeline is the ability to drive the display from a simple XML listing. I will update the timeline when I next have an opportunity to fill in some of the missing items still remaining on my innovations list such as alphabetization, citations, and table of contents, among many others.
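As a sketch of that simple XML listing, here is a minimal generator (the `<event>` element with `start` and `title` attributes follows the Timeline documentation's event format; the dates, titles and descriptions are illustrative placeholders, not entries from my actual data file):

```python
# Sketch: build a minimal SIMILE Timeline XML data file. Event dates,
# titles and text are illustrative only.
from xml.etree.ElementTree import Element, SubElement, tostring

data = Element("data")
for start, title, text in [
    ("1450", "Printing press", "Gutenberg's movable-type press."),
    ("1844", "Paper (wood pulp)", "Mechanical pulping lowers paper cost."),
]:
    event = SubElement(data, "event", start=start, title=title)
    event.text = text  # body text becomes the bubble popup description

print(tostring(data).decode())
```

Pointing the Timeline widget's event source at a file like this is all that is needed to populate the scrolling bands.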

Some Interpretation Caveats

Of course, rarely can an innovation be traced to a single individual or a single moment in time. Historians are increasingly documenting the cultural milieu and multiple individuals that affect innovation.

In these regards, then, a timeline such as this one is simplistic and prone to much error and uncertainty. We have no real knowledge, for example, of the precise time certain historical innovations occurred, and others (the ballpoint pen being one case in point) are a matter of interpretation as to what and when constituted the first expression. For instances where the record indicated multiple dates, I chose to use the date when released to the public.

Nonetheless, given the time scales here of more than 30,000 years, I do think broad trends and rough time frames can be discerned. As long as one interprets this timeline as indicative and not meant as definitive in any scholarly sense, I believe this timeline can inform and provide some insight and guidance for how information has evolved over human history.

Some Use Tips

The operation of Timeline is pretty straightforward and intuitive. Here are a couple of tips to get a bit more out of playing with it:

  • The timeline has two scrolling panels, fast and slow. For rapid scrolling, use mouse down and left or right movement on the lower panel
  • The lower panel also shows small ticks for each innovation in the upper panel
  • Clicking any icon or label in the upper panel will cause a bubble popup to appear with a bit more detail and a picture for the item; click the ‘X’ to close the bubble
  • Each entry is placed in one or more categories keyed by icon. You may “filter” results by using keywords such as: alphabets, book, calendars, libraries, maps, mechanization, paper, papermaking, printing, organizing, scripts, standardization, statistics, timelines, or typography. Partial strings also match
  • Similarly, you may enter one of those same terms into one of the four color highlight boxes. Partial strings also match.

Sources, Contributions and Thanks

For the sake of consistency, nearly all entries and pictures on the timeline are drawn from the respective entries within Wikipedia. Subsequent updates may add to this listing by reference to original sources, at which time all sources will be documented.

The timeline icons are from David Vignoni’s Nuvola set, available under the LGPL license. Thanks David!

The fantastic Timeline was developed by David Huynh while he was a graduate student at MIT. Timeline and its sibling widgets were developed under funding from MIT’s Simile program. Thanks to all in the program and best wishes for continued funding and innovation.

Finally, my sincere thanks go to Professor Michael Buckland of the School of Information at the University of California, Berkeley, for his kind suggestions, input and provision of additional references and sources. Of course, any errors or omissions are mine alone. I also thank Professor Buckland for his admonitions about use and interpretation of the timeline dates.

Posted: June 23, 2008

We Offer a Definition and Some Answers to Enterprise Questions

The recent LinkedData Planet conference in NYC marked, I think, a real transition point. The conference signaled the beginning movement of the Linked Data approach from the research lab to the enterprise. As a result, there was something of a schizophrenic aspect at many different levels to the conference: business and research perspectives; realists and idealists; straight RDF and linked data RDF; even the discussions in the exhibit area versus some of the talks presented from the podium.

Like any new concept, my sense was a struggle around terminology and common language and the need to bridge different perspectives and world views. Like all human matters, communication and dialog were at the core of the attendees’ attempts to bridge gaps and find common ground. Based on what I saw, much great progress occurred.

The reality, of course, is that Linked Data is still very much in its infancy, and its practice within the enterprise is just beginning. Much of what was heard at the conference was theory versus practice and use cases. That should and will change rapidly.

In an attempt to help move the dialog further, I offer a definition and Structured Dynamics’ perspective to some of the questions posed in one way or another during the conference.

Linked Data Defined

Sources such as the four principles of Linked Data in Tim Berners-Lee’s Design Issues: Linked Data and the introductory statements on the Linked Data Wikipedia entry approximate — but do not completely express — an accepted or formal or “official” definition of Linked Data per se. Building from these sources and attempting to be more precise, here is the definition of Linked Data we used internally:

Linked Data is a set of best practices for publishing and deploying instance and class data using the RDF data model, naming the data objects using uniform resource identifiers (URIs), and exposing the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.

All references to Linked Data below embrace this definition.
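As a minimal sketch of this definition in practice (plain Python standing in for an RDF library; the example.org URIs are invented placeholders, while the predicate is the real FOAF "knows" property): every element of the triple is named with an HTTP-dereferenceable URI.

```python
# Sketch: an RDF-style triple in the Linked Data manner -- data objects
# named with HTTP URIs. The example.org URIs are placeholders.
subject   = "http://example.org/id/mike-bergman"
predicate = "http://xmlns.com/foaf/0.1/knows"
obj       = "http://example.org/id/toby-considine"

triple = (subject, predicate, obj)

def is_linked_data_style(t):
    """True if every element of the triple is an HTTP(S) URI."""
    return all(part.startswith(("http://", "https://")) for part in t)

print(is_linked_data_style(triple))  # True
```

The point of the URI-plus-HTTP discipline is that any consumer, human or machine agent, can follow each identifier on the Web to discover more about the thing it names.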

Some Clarifying Questions

I’m sure many other questions were raised, but listed below are some of the more prominent ones I heard in the various conference Q&A sessions and hallway discussions.

1. Does Linked Data require RDF?

Yes. Though other approaches can also model the first-order predicate logic of subject-predicate-object at the core of the Resource Description Framework data model, RDF is the one based on the open standards of the W3C. RDF and FOL are powerful because of their simplicity, ability to express complex schema and relationships, and suitability for modeling all extant data frameworks for unstructured, semi-structured and structured data.

2. Is publishing RDF sufficient to create Linked Data?

No. Linked Data represents a set of techniques applied to the RDF data model that names all objects as URIs and makes them accessible via the HTTP protocol (as well as other considerations; see the definition above and further discussion below).

Some vendors and data providers claim Linked Data support, but if their data is not accessible via HTTP using URIs for data object identification, it is not Linked Data. Fortunately, it is relatively straightforward to convert non-compliant RDF to Linked Data.
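A hedged sketch of such a conversion (the base namespace, vocabulary URI and identifiers here are assumptions for illustration): mint HTTP URIs for any objects named only with blank-node or local identifiers.

```python
# Sketch: upgrade non-Linked RDF by minting HTTP URIs for objects
# identified only by blank nodes or local names. BASE is an assumed
# namespace for illustration.
BASE = "http://example.org/id/"

def mint_uri(identifier):
    """Return an HTTP URI for a blank-node or local identifier."""
    if identifier.startswith("_:"):       # blank-node syntax
        return BASE + identifier[2:]
    if not identifier.startswith("http"):
        return BASE + identifier
    return identifier                     # already a dereferenceable URI

triples = [("_:n1", "http://example.org/vocab/partOf", "building7")]
linked = [tuple(mint_uri(x) for x in t) for t in triples]
# linked[0] now names every element with an HTTP URI
```

In practice the minted URIs must also be made to resolve over HTTP (returning useful RDF when dereferenced), but the renaming step above is the heart of the conversion.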

3. How does one publish or deploy Linked Data?

There are some excellent references for how to publish Linked Data. Examples include a tutorial, How to Publish Linked Data on the Web, and a white paper, Deploying Linked Data, using the example of OpenLink’s Virtuoso software. There are also recommended approaches and ways to use URI identifiers, such as the W3C’s working draft, Cool URIs for the Semantic Web.

However, there are not yet published guidelines for how to also meet the Zitgist definition above, with its emphasis on class and context matching. A number of companies and consultants, including Zitgist, presently provide such assistance.

The key principles, however, are to make links aggressively between data items with appropriate semantics (properties or relations; that is, the predicate edges between the subject and object nodes of the triple) using URIs for the object identifiers, all being exposed and accessible via the HTTP Web protocol.

4. Is Linked Data just another term or branding for the Semantic Web?

Absolutely not, though this is a source of some confusion at present.

The Semantic Web is probably best understood as a vision or goal where semantically rich annotation of data is used by machine agents to make connections, find information or do things automatically in the background on behalf of humans. We are on a path toward this vision or goal, but under this interpretation the Semantic Web is more of a process than a state. By understanding that the Semantic Web is a vision or goal we can see why a label such as ‘Web 3.0’ is perhaps simplistic and incomplete.

Linked Data is a set of practices somewhere in the early middle of the spectrum from the initial Web of documents to this vision of the Semantic Web. (See my earlier post at bottom for a diagram of this spectrum.)

Linked Data is here today, doable today, and pragmatic today. Meaningful semantic connections can be made and there are many other manifest benefits (see below) with Linked Data, but automatic reasoning in the background or autonomic behavior is not yet one of them.

Strictly speaking, then, Linked Data represents doable best practices today within the context both of Web access and of this yet unrealized longer-term vision of the Semantic Web.

5. Does Linked Data only apply to instance data?

Definitely not, though early practice has been interpreted by some as such.

One of the stimulating, but controversial, keynotes of the conference was from Dr. Anant Jhingran of IBM, who made the strong and absolutely correct observation that Linked Data requires the interplay and intersection of people, instances and schema. From his vantage, early exposed Linked Data has been dominated by instance data from sources such as Wikipedia and has lacked the schema (class) relationships that enterprises are based upon. The people aspect, in terms of connections, collaboration and joint buy-in, is also the means for establishing trust and authority for the data.

In Zitgist’s terminology, class-level mappings ‘explode the domain’ and produce information benefits similar to Metcalfe’s Law as a function of the degree of class linkages [1]. While this network effect is well known to the community, it has not yet been widely demonstrated in current Linked Data sets. As Anant pointed out, schemas define enterprise processes and knowledge structures. Demonstrating schema (class) relationships is the next appropriate task for the Linked Data community.
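As a toy sketch of why class-level mappings ‘explode the domain’ (the dataset, class and instance names below are all hypothetical), consider how a single schema-level link implies many instance-level connections:

```python
# Toy illustration of the network effect from one class-level mapping.
# Suppose two datasets each hold instances of classes that a single
# mapping (e.g., ex:Camera owl:equivalentClass ex2:DigitalCamera)
# declares equivalent. That one schema link makes every cross-dataset
# instance pair navigable, without asserting each link individually.

cameras_a = ["a:nikon-d300", "a:canon-40d", "a:pentax-k20d"]  # instances of ex:Camera
cameras_b = ["b:dslr-001", "b:dslr-002"]                      # instances of ex2:DigitalCamera

# All cross-products become implicitly connected via the shared class:
implied_links = [(a, b) for a in cameras_a for b in cameras_b]

print(len(implied_links))  # 3 x 2 = 6 connections from one schema link
```

The multiplicative growth of such connections, rather than one-by-one instance links, is what drives the Metcalfe-like value curve.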

6. What role do “ontologies” play with Linked Data?

In an RDF context, “ontologies” are the vocabularies and structures that capture the schema structures noted above. Ontologies embody the class and instance definitions and the predicate (property) relations that enable legacy schemas and data to be transformed into Linked Data graphs.

Though many public RDF vocabularies and ontologies presently exist, and should be re-used where possible and where the semantics match the existing legacy information, enterprises will require specific ontologies reflective of their own data and information relationships.

Despite the newness of the “ontology” term, and the intimidation perhaps associated with it, ontologies are no more complex (indeed, they are simpler and more powerful) than the standard relational schema familiar to enterprises. If you’d like, simply substitute schema for ontology and you will be saying much the same thing in an RDF context.
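To make that schema-to-ontology substitution concrete, here is a minimal sketch (with made-up namespace and property names) of a relational row restated as RDF-style triples:

```python
# A relational row, as an enterprise would know it...
employee = {"id": 42, "name": "Jane Doe", "dept": "Sales"}

# ...restated as (subject, predicate, object) triples. The 'ex:'
# namespace and property names are hypothetical, for illustration only.
subject = f"ex:employee/{employee['id']}"
triples = [
    (subject, "rdf:type", "ex:Employee"),        # the class plays the role of the table
    (subject, "ex:name", employee["name"]),      # properties play the role of columns
    (subject, "ex:department", employee["dept"]),
]

print(triples[0])
```

The table becomes a class, the columns become properties, and each row becomes a URI-identified resource; nothing about the underlying information changes.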

7. Is Linked Data a centralized or federated approach?

Neither, really, though the rationale and justification for Linked Data is grounded in federating widely disparate sources of data that can also vary widely in existing formalism and structure.

Because Linked Data is a set of techniques and best practices for expressing, exposing and publishing data, it can easily be applied to either centralized or federated circumstances.

However, the real world where any and all potentially relevant data can be interconnected is by definition a varied, distributed, and therefore federated world. Because of its universal RDF data model and Web-based techniques for data expression and access, Linked Data is the perfect vehicle, finally, for data integration and interoperability without boundaries.

8. How does one maintain context when federating Linked Data?

The simple case is where two data sources refer to the exact same entity or instance (individual) with the same identity. The standard owl:sameAs predicate is used to assert the equivalence in such cases.
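A minimal sketch of such an equivalence assertion and a naive merge based on it (the URIs and property names are hypothetical, and a real RDF store handles this natively):

```python
# Two sources describe the same city under different URIs.
dbpedia = {("dbpedia:Iowa_City", "ex:population", "67000")}
geo     = {("geo:iowa-city", "ex:state", "Iowa")}

# An owl:sameAs assertion declares the two URIs identify one entity.
same_as = {("dbpedia:Iowa_City", "owl:sameAs", "geo:iowa-city")}

# Naive merge: rewrite aliased subjects to one canonical URI so the
# facts from both sources attach to a single node in the graph.
alias = {o: s for (s, p, o) in same_as if p == "owl:sameAs"}
merged = {(alias.get(s, s), p, o) for (s, p, o) in dbpedia | geo}

print(sorted(merged))
```

After the merge, both the population and state facts hang off the single canonical URI, which is exactly the behavior an RDF store's sameAs reasoning provides.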

The more important case is where the data sources are about similar subjects or concepts, in which case a structure of well-defined reference classes is employed. Furthermore, if these classes can themselves be expressed in a graph structure capturing the relationships amongst the concepts, we now have some fixed points in the conceptual information space for relating and tying together disparate data. Still further, such a conceptual structure provides the means to relate the people, places, things, organizations, events, etc., of the individual instances of the world to one another as well.

Any reference structure that is composed of concept classes that are properly related to each other may provide this referential “glue” or “backbone”.

One such structure, provided in open source by Zitgist, is the 21,000 subject concept node structure of UMBEL, itself derived from the Cyc knowledge base. In any event, such broad reference structures may often be accompanied by more specific domain ontologies to provide focused, domain-specific context.

9. Does data need to be “open” to qualify as Linked Data?

No, absolutely not.

While Linked Data has, to date, been demonstrated using public Web data, and many desire to expose more through the open data movement, there is nothing preventing private, proprietary or subscription data from being Linked Data.

The Linking Open Data (LOD) group, formed about 18 months ago to showcase Linked Data techniques, began with open data. To sever the idea that the approach applies only to open data, François-Paul Servant has specifically identified the parallel concept of Linking Enterprise Data (and see also the accompanying slides).

For example, with Linked Data (as opposed to the more restrictive LOD sense), two or more enterprises or private parties can legitimately exchange private Linked Data over a private network using HTTP. As another example, Linked Data may be exchanged on an intranet between different departments, etc.

So long as the principles of URI naming, HTTP access, and linking predicates where possible are maintained, the approach qualifies as Linked Data.
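A small illustration of the HTTP access principle (the intranet host below is hypothetical): the same content negotiation used on the open Web lets a consumer ask a Linked Data URI for RDF rather than HTML:

```python
from urllib.request import Request

# Hypothetical Linked Data URI on a private corporate intranet; the
# identical HTTP machinery applies to a public Web URI.
uri = "http://data.example.corp/resource/pump-114"

# Content negotiation: the Accept header asks the server for an RDF
# representation of the resource rather than a human-readable page.
req = Request(uri, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # application/rdf+xml
```

Whether the request crosses the public Internet or stays behind the firewall, the URI-naming and HTTP-access principles are the same; only the network boundary differs.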

10. Can legacy data be expressed as Linked Data?

Absolutely yes, without reservation. Indeed, non-transactional legacy data perhaps should be expressed as Linked Data in order to gain its manifest benefits. See #14 below.

11. Can enterprise and open or public data be intermixed as Linked Data?

Of course. Since Linked Data can be applied to any data formalism, source or schema, it is perfectly suited to integrating data from inside and outside the firewall, open or private.

12. How does one query or access Linked Data?

The basic query language for Linked Data is SPARQL (pronounced “sparkle”), which bears a close resemblance to SQL but applies to an RDF data graph. The actual datastores used for RDF may also add a fourth element to each triple to denote its named graph, which can bring access and scale efficiencies. In these cases, the system is known as a “quad store”. Additional filtering techniques may be applied prior to the SPARQL query for further efficiencies.
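As a rough sketch (with a hypothetical vocabulary, and a plain Python filter standing in for a real quad store), here is what a basic SPARQL query and the fourth, named-graph element look like:

```python
# A minimal SPARQL query, shown as a string; the ex: vocabulary is
# made up for illustration.
query = """
SELECT ?name WHERE {
  ?person rdf:type ex:Employee .
  ?person ex:name  ?name .
}
"""

# A 'quad store' extends each (s, p, o) triple with a fourth element,
# the named graph it belongs to:
quads = [
    ("ex:emp/1", "rdf:type",  "ex:Employee", "graph:hr"),
    ("ex:emp/1", "ex:name",   "Jane Doe",    "graph:hr"),
    ("ex:emp/1", "ex:salary", "90000",       "graph:payroll"),
]

# Restricting evaluation to one named graph (FROM NAMED in SPARQL)
# lets the store skip irrelevant data, one source of the efficiencies
# noted above.
hr_only = [q for q in quads if q[3] == "graph:hr"]

print(len(hr_only))  # 2
```

A real engine evaluates the graph pattern against the store, of course; the point here is only the shape of the query and the role of the fourth element.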

Templated SPARQL queries and related techniques, often applied by Zitgist and other vendors, can lead to very efficient and rapid deployment of various Web services and reports. For example, all Zitgist DataViewer views and UMBEL Web services are expressed using such SPARQL templates.

This SPARQL templating approach may also be combined with the use of templating standards such as Fresnel to bind instance data to display templates.
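A minimal sketch of the templating idea, using Python's string.Template with a hypothetical UMBEL subject URI as the slot value:

```python
from string import Template

# A reusable report expressed as a SPARQL template with a $subject slot.
template = Template("""
SELECT ?label WHERE {
  <$subject> rdfs:label ?label .
}
""")

# Filling the slot at request time yields a concrete query; the URI
# below is hypothetical.
query = template.substitute(subject="http://umbel.org/umbel/sc/Mammal")

print(query)
```

One template can thus back an entire family of Web service endpoints or report views, with only the slot values varying per request.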

13. How is access control or security maintained around Linked Data?

In Zitgist’s view, access control or security occurs at the layer of the HTTP access and protocols, and not at the Linked Data layer. Thus, the same policies and procedures that have been developed for general Web access and security are applicable to Linked Data.

However, standard data level or Web server access and security can be enhanced by the choice of the system hosting the data. Zitgist, for example, uses OpenLink’s Virtuoso universal server that has proven and robust security mechanisms. Additionally, it is possible to express security and access policies using RDF ontologies as well. These potentials are largely independent of Linked Data techniques.

The key point is that there is nothing unique or inherent to Linked Data with respect to access control or security that is not inherent in standard Web access. If a given link points to a data object from a source that has limited or controlled access, its results will not appear in the final results graph for those users subject to access restrictions.

14. What are the enterprise benefits of Linked Data? (Why adopt it?)

For more than 30 years — since the widespread adoption of electronic information systems by enterprises — the Holy Grail has been complete, integrated access to all data. With Linked Data, that promise is now at hand. Here are some of the key enterprise benefits to Linked Data, which provide the rationales for adoption:

  • Via the RDF model, equal applicability to unstructured, semi-structured, and structured data and content
  • Elimination of internal data “silos”
  • Integration of internal and external data
  • Easy interlinkage of enterprise, industry-standard, open public and public subscription data
  • Complete data modeling of any legacy schema
  • Flexible and easy updates and changes to existing schema
  • An end to the need to re-architect legacy schema resulting from changes to the business or M & A
  • Report creation and data display based on templates and queries, not IT departments
  • Data access, analysis and manipulation pushed out to the user level, and, generally
  • The ability of internal Linked Data stores to be maintained by existing DBA procedures and assets.

15. What are early applications or uses of Linked Data?

Linked Data is well suited to traditional knowledge base or knowledge management applications. Its near-term application to transactional or material process applications is less apparent.

Of special use is the value-added from connecting existing internal and external content via the network effect from the linkages [1].

A Hearty Thanks

Johnnie Linked Data is starting to grow up. Our little semantic Web toddler is moving beyond ga-ga-goo-goo to saying his first real sentences. Language acquisition will come rapidly, and, as all of us have seen with our own children, he will grow up faster than we can imagine.

There were so many at this meeting who brought impact and meaning to this exciting transition point that I won’t list specific names for fear of leaving others off. Those of you who made so many great observations or stayed up late interacting with passion know who you are. Let me simply say: Thanks!

The LinkedData Planet conference has shown, to me, that enterprises are extremely interested in what our community has developed and now proven. They are asking hard questions and will be difficult taskmasters, but we need to listen and respond. The attendees were a selective and high-quality group, understanding of their own needs and looking for answers. We did an OK job of providing those answers, but we can do much, much better.

I reflect on these few days now knowing something I did not truly know before: the market is here and it is real. The researchers who have brought us to this point will continue to have much to research. But, those of us desirous of providing real pragmatic value and getting paid for it, can confidently move forward knowing both the markets and the value are real. Linked Data is not magic, but when done with quality and in context, it delivers value worth paying for.

To all of the fellow speakers and exhibitors, to all of the engaged attendees, and to the Jupitermedia organizers and Bob DuCharme and Ken North as conference chairs, let me add my heartfelt thanks for a job well done.

Next Steps and Next Conference

The next LinkedData Planet conference and expo will be October 16-17, 2008, at the Santa Clara Hyatt in Santa Clara, California. The agenda has not been announced, but hopefully we will see a continuing enterprise perspective and some emerging use cases.

Zitgist as a company will continue to release and describe its enterprise products and services, and I will continue to blog on Linked Data matters of specific interest to the enterprise. Pending topics include converting legacy data to Linked Data, converting relational data and schema to Linked Data, placing context to Linked Data, and many others. We think you will like the various announcements as they arise. ;)

Zitgist is also toying with the use of a distinctive icon to indicate the availability of Linked Data conforming to the principles embodied in the questions above. (The color choice is an adoption of the semantic Web logo from the W3C.) The use of a distinctive icon is similar to what RSS feeds or microformats have done to alert users to their specific formats. Drop me a line and let us know what you think of this idea.


[1] Metcalfe’s law states that the value of a telecommunications network is proportional to the square of the number of users of the system (n²), where the linkages between users (nodes) exist by definition. For information bases, the data objects are the nodes. Linked Data works to add the connections between the nodes. We can thus modify the original sense to become Zitgist’s Law: the value of a Linked Data network is proportional to the square of the number of links between the data objects.
Posted: June 14, 2008

The Flood of 2008 Brings New Perspectives

I live in Coralville, Iowa, a sister community to Iowa City, home of the University of Iowa and the Hawkeyes. Iowa City is in eastern Iowa on I-80, about one hour west of the state’s eastern border at the Mississippi River and about 35 miles south of Cedar Rapids, home of one of the major mills for Quaker Oats. Iowa City is a tight-knit and vibrant community set amid bucolic rolling hills and pretty vistas.

In between Iowa City and Cedar Rapids is my local commercial airport, the Eastern Iowa Airport, and Coralville Lake, which is an Army Corps of Engineers flood-control reservoir controlling the Iowa River that flows through Iowa City from the northwest on its way to the Mississippi. Just north of the Iowa River is the Cedar River, a major tributary to the Iowa River that flows through Cedar Rapids before joining the Iowa River southeast of our location near the Mississippi.

From the conclusion of winter we have been getting a lot of rain around here. I mean, a lot.

I am just one of the 450,000 or so residents in the broader area encompassing both of these Iowa hubs, but my experience may have some interest as I prepared for the upcoming LinkedData Planet conference in New York City. The kind of information needs posed by a natural disaster such as is now occurring with our ‘Flood of 2008‘ point to, I think, a compelling use case for Linked Data.

Cedar Rapids First to be Hit

As the rains continued and the rivers rose, local residents discussed the previous large flood that occurred in 1993. That one caused much disruption and devastation. However, the general consensus was that we were unlikely to see a repeat of that 100-yr event. But unfortunately, the rains continued, and the last week saw multiple occasions of inches of rain within 24-hr periods.

[Photo courtesy of The Register, Harry Baumert]

The first issue became apparent for the Cedar River.

By Wednesday, alarm began to set in, and the floodwaters were rising at unforecasted rates. The city worked furiously, especially right around Mercy Hospital downtown, but by Thursday the cause was lost and the hospital was completely abandoned.

Thursday also saw a major railway bridge, laden down with rock-filled rail cars to keep it from floating away, collapse, which then also became a dam catching floating trailers and other portable buildings. [Photo courtesy of The Register, Harry Baumert]

Meanwhile, a major employer in that city, Quaker Oats, was flooding up to the middle of its second story. It is shown at lower right, underneath the aerial Cedar Rapids city view. You can see another major thoroughfare, the north-south transiting I-380, snaking around town and next to the Quaker production complex. All grain and important machinery has been lost in that complex.

The water peaked on Friday, and the Cedar River is now very slowly receding. However, as of today, Saturday, about 25,000 city residents have been evacuated and some 450 city blocks are still underwater. Early estimates for damage run to about $1 billion.

[Photo courtesy of The Register, Harry Baumert]

I-380 remains the only way to cross town and all other city bridges are flooded and closed.

Getting Ready for the Conference

Meanwhile at mid-week, concerned that the footprint of flooding was spreading, I finished up my materials for duplication early for the LinkedData Planet conference and headed to downtown Coralville to get my order in.

Locally, the Iowa River flow is controlled by the local Coralville Dam and was not on the same cycle as the Cedar River.

Volunteers were busy sandbagging behind the printing business and things were holding steady on Coralville’s commercial strip. I delivered my materials and had some interactions with the business as the job was run, but basically things were calm though steady precautions were underway. The rising river was forecasted to peak in about 5-7 days as rains from the 3,000 square mile drainage continued to flow into the reservoir.

However, two events occurred late Thursday that greatly complicated matters and shifted the eye of concern to Iowa City.

Iowa City and Coralville are Next

[Photo courtesy of The Register, Harry Baumert]

The first event was the overflow of the Coralville Dam spillway. The overflow had been forecast, but its degree was not. The spillway had been breached only once before, during the 1993 flood. (That earlier event also exposed a rich bed of Devonian fossils.)

When it occurred in 1993, the water inflow into the river peaked at about 28,000 cfs. (The standard dam bypass had been running at about half that rate prior to the spillway being breached, shown after the breach on the left-hand side of the image to the left.) As of today, the estimate is that the rate will rise to 44,000 cfs, perhaps with a 2-3 ft crest overflowing the spillway.

The second event was the breach of a railroad embankment on the Clear Creek spur along the Coralville strip. Overnight the area flooded rapidly.

When I awoke Friday morning, I tried to make my way downtown to pick up my printed materials. Unfortunately, the printer’s shop was now marooned. I could see it, and only about 50 yds of water separated it from me, but power was out and emergency personnel were preventing any access. By late afternoon, that breach had also joined up with more direct overflow from the Iowa River to create the totally flooded commercial strip shown in the picture to the lower right. (The print shop is under water at the lower third on the left; I assume with my materials floating inside.) [Photo courtesy of The Register, Harry Baumert]

By this point, navigating around the area was quite difficult. Most of the major bridges were now closing, and the breach had caught many by surprise, so most people were at work or school for a standard business day.

As closings began and people tried to save businesses or homes, traffic got totally snarled. With some scrambling and friendly assistance from another local printer, I was able to get my job completed just as that facility announced its closing.

Meanwhile, elsewhere in the community, real mobilization was occurring in earnest.

One great aspect of Iowans is their community support and spirit. The alarming rise of river levels has energized the entire community, not to mention the National Guard and Coast Guard. There are dozens of locations with major sandbag operations underway. My family and the university students, along with thousands of others, have been bagging and stacking. Much effort continued through the entire night into Saturday morning. [Photo courtesy of Press-Citizen, Dave Schwarz]

The amazing thing is that supplies and sand seem to run out well in advance of the volunteers. Local officials have continuously extended bagging efforts beyond what they anticipated (in fact, never shutting down) and all of the local TV stations have been providing nearly constant news and video coverage.

Of course, the eventual effort in clean-up and recovery will also far exceed the current frantic efforts to hold off Mother Nature today. I’m sure we will see the same can-do spirit continue until normalcy is restored.

Dealing with the Mundane

Travel is a nightmare. My flight for tomorrow (Sunday) is normally a 20-min drive from my house. However, since the reservoir is squarely in between, all bridges have been closed. If the one bridge over I-80 in Iowa City closes, our community will be completely and totally cut off from the rest of the state.

As it is, the official detour to the airport now requires a 2-hr trip west to Des Moines, an hour north, and then 2 hrs back, for about a 5-hr one-way detour. (I discovered a shorter “secret” back-road approach today, but it may not remain open either.)

For all citizens, potable water is becoming a concern. As the University of Iowa struggles mightily with flooding, it is losing power, and its major computing center and main library are at risk. A book brigade yesterday extricated all of the books from the first floor of the library. All university summer camps and current summer session classes have been cancelled. All university personnel, including my professor wife, have been ordered to secure labs and offices and depart the campus. Efforts are now focused on keeping power and operations at the University hospital, which fortunately does not appear to be at risk of flooding.

The effects of water, travel and power go beyond the direct flooding, but of course those with homes and businesses under water are experiencing the worst and feeling much stress. Fortunately, direct injuries and loss of life have been minimal so far. It is a real mess around here.

A ‘500-Year Flood’

[Photo courtesy of The Register, Harry Baumert]

The discussion is now that Iowa City is experiencing a ‘500-yr flood’ that opens new questions and new uncertainties.

Because the Iowa River cycle is later, our area is not projected to see the flood water peak until Tuesday or so. Though we have already seen major damage (as of mid-day Saturday), the crest is at least two days away. The Iowa River, presently at 30.5 feet, is expected to reach 33 feet to 34 feet, well in excess of the 25-foot flood stage.

The only bridge connecting the east and west sides of downtown Iowa City is likely to close soon. The river surged 2 feet in just 12 hours on Friday. The projected further rise of 2-3 feet is unprecedented. Major stress will be placed on the current temporary levees and sandbag banks.

We have clearly not seen the peak nor the worst. The floodwaters will also take much time to recede and recovery may be long. In 1993, for example, one of Iowa City’s major thoroughfares was closed for 82 days. This year the effects will likely be much, much worse.

When not working at the lines or directly viewing the flooding, there is a bit of a surreal quality. Up until the past half hour or so, the skies were clear, the birds were chirping, and the waters were rising all around. I grew up in Southern California and have directly experienced earthquakes, and have been trying for years to come to grips with tornadoes. But creeping flooding, with the occasional breach, just has such a weird feeling.

And, now, as I write and the afternoon unfolds, the sky has again darkened with more severe thunderstorms forming and on the way.

The Role of Linked Data

Circumstances like this demand the ability to assemble relevant information for the topic at hand. In this case, one wants information for floods and flooding, bridge and road closings, curfews, routes, airport openings and flight delays, weather forecasts, stream and river rise forecasts, hotel room availabilities, traffic delays, closings and official announcements, photo galleries to give perspective, and the like for an area encompassing eastern Iowa and municipalities including Iowa City, Solon, Cedar Rapids and Coralville.

Of course, better information and massive human effort cannot by themselves hold off Mother Nature when she is angry. But better information can enable us to mobilize and use resources more efficiently and with less loss of limb and property.

In short, that is what Linked Data is all about. It represents techniques and capabilities that exist today to appropriately tag and annotate content to make it “smarter”, to put all of the relevant information in context and on demand.
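As a sketch of what such tagging might look like (all URIs and property names below are hypothetical), flood facts become triples that different agencies could join on shared identifiers:

```python
# Hypothetical flood information expressed as (subject, predicate, object)
# triples; the ex: namespace and the bridge name are made up.
triples = [
    ("ex:ParkRoadBridge", "rdf:type",           "ex:Bridge"),
    ("ex:ParkRoadBridge", "ex:locatedIn",       "dbpedia:Iowa_City"),
    ("ex:ParkRoadBridge", "ex:status",          "ex:Closed"),
    ("ex:IowaRiver",      "ex:gaugeHeightFeet", "30.5"),
]

# Once expressed this way, road closings, river gauges and forecasts
# from different agencies can be joined on shared URIs. For example,
# list everything currently reported closed:
closed = [s for (s, p, o) in triples if p == "ex:status" and o == "ex:Closed"]

print(closed)  # ['ex:ParkRoadBridge']
```

A county's closure feed, the Corps of Engineers' gauge readings, and a weather forecast could each publish such triples independently, yet answer a combined query like "open routes to the airport" through the shared identifiers.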

So, while we have the techniques available, we do not yet have widespread application. Linked Data is not yet of help to Iowa City.

The question of doing this is not one of technology, but of business models and incentives.

So, that is what the LinkedData Planet conference is all about: making connections that matter, and doing so simply and in context. Assuming I’m able to wend my way to drier ground and get on the plane, stop by Zitgist‘s booth at the Roosevelt Hotel on June 17-18. I’d love to chat and shake a dry hand!

Posted by AI3's author, Mike Bergman Posted on June 14, 2008 at 3:59 pm in Adaptive Information | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/446/linkeddata-planet-and-the-www-wet-wild-world/
The URI to trackback this post is: http://www.mkbergman.com/446/linkeddata-planet-and-the-www-wet-wild-world/trackback/