Posted:June 14, 2008

The Flood of 2008 Brings New Perspectives

I live in Coralville, Iowa, a sister community to Iowa City, home of the University of Iowa and the Hawkeyes. Iowa City is in eastern Iowa on I-80 about one hour west of the state’s eastern border at the Mississippi River, and about 35 miles south of Cedar Rapids, home of one of the major mills for Quaker Oats. Iowa City is a very together and vibrant community of bucolic rolling hills and pretty vistas.

In between Iowa City and Cedar Rapids is my local commercial airport, the Eastern Iowa Airport, and Coralville Lake, which is an Army Corps of Engineers flood-control reservoir controlling the Iowa River that flows through Iowa City from the northwest on its way to the Mississippi. Just north of the Iowa River is the Cedar River, a major tributary to the Iowa River that flows through Cedar Rapids before joining the Iowa River southeast of our location near the Mississippi.

From the conclusion of winter we have been getting a lot of rain around here. I mean, a lot.

I am just one of the 450,000 or so residents in the broader area encompassing both of these Iowa hubs, but my experience may have some interest as I prepare for the upcoming LinkedData Planet conference in New York City. The kinds of information needs posed by a natural disaster such as is now occurring with our ‘Flood of 2008’ point to, I think, a compelling use case for Linked Data.

Cedar Rapids First to be Hit

As the rains continued and the rivers rose, local residents discussed the previous large flood that occurred in 1993. That one caused much disruption and devastation. However, the general consensus was that we were unlikely to see a repeat of that 100-yr event. But unfortunately, the rains continued, and the last week saw multiple occasions of inches of rain within 24-hr periods.

Photo Courtesy of The Register, Harry Baumert

The first issue became apparent for the Cedar River.

By Wednesday, alarm began to set in, and the floodwaters were rising at unforecasted rates. The city worked furiously, especially right around Mercy Hospital downtown, but by Thursday the cause was lost and the hospital was completely abandoned.

Thursday also saw the collapse of a major railway bridge, which had been laden with rock-filled rail cars to keep it from floating away. The wreckage then became a dam catching floating trailers and other portable buildings.

Photo Courtesy of The Register, Harry Baumert

Meanwhile, a major employer in that city, Quaker Oats, was flooding up to the middle of its second story. It is shown at lower right, underneath the aerial Cedar Rapids city view. You can see another major thoroughfare, the north-south transiting I-380, snaking around town and next to the Quaker production complex. All grain and important machinery have been lost in that complex.

The water peaked on Friday, and the Cedar River is now very slowly receding. However, as of today, Saturday, about 25,000 city residents have been evacuated and some 450 city blocks are still underwater. Early estimates for damage run to about $1 billion.

Photo Courtesy of The Register, Harry Baumert

I-380 remains the only way to cross town and all other city bridges are flooded and closed.

Getting Ready for the Conference

Meanwhile at mid-week, concerned that the footprint of flooding was spreading, I finished up my materials for duplication early for the LinkedData Planet conference and headed to downtown Coralville to get my order in.

Locally, the Iowa River flow is controlled by the Coralville Dam and was not on the same cycle as the Cedar River.

Volunteers were busy sandbagging behind the printing business and things were holding steady on Coralville’s commercial strip. I delivered my materials and had some interactions with the business as the job was run, but basically things were calm though steady precautions were underway. The rising river was forecasted to peak in about 5-7 days as rains from the 3,000 square mile drainage continued to flow into the reservoir.

However, two events occurred late Thursday that greatly complicated matters and shifted the eye of concern to Iowa City.

Iowa City and Coralville are Next

Photo Courtesy of The Register, Harry Baumert

The first event was the overflow of the Coralville Dam spillway. That had been forecasted, but its degree was not. The spillway had only breached one time before, and that was during the 1993 flood. (That earlier event also exposed a rich bed of Devonian fossils.)

When it occurred in 1993, the water inflow into the river peaked at about 28,000 cfs (cubic feet per second). (The standard dam bypass had been running about half that prior to the spillway being breached, shown after breach on the left-hand side of the image to the left.) As of today, the estimate is that the rate will rise to 44,000 cfs, and perhaps will actually have a 2-3 ft crest overflowing the spillway.

The second event that occurred was that a railroad embankment on the spur Clear Creek on the Coralville strip breached. Overnight the area flooded rapidly.

When I awoke Friday morning, I tried to make my way downtown to pick up my printed materials. Unfortunately, the printer’s shop was now marooned. I could see it and there was only about 50 yds of water separating it from me, but power was out and emergency personnel were preventing any access. By late afternoon, that breach also joined up with more direct overflow from the Iowa River to create the totally flooded commercial strip shown in the picture to the lower right. (The print shop is under water at the lower third on the left; I assume with my materials floating inside.)

Photo Courtesy of The Register, Harry Baumert

By this point, navigating around the area was quite difficult. Most of the major bridges were now closing and the breach had caught many by surprise, so that most were at work or school for a standard business day.

As closings began and people tried to save businesses or homes, traffic got totally snarled. With some scrambling and friendly assistance from another local printer, I was able to get my job completed just as that facility announced its closing.

Meanwhile, elsewhere in the community, real mobilization was occurring in earnest.

One great aspect of Iowans is their community support and spirit. The alarming rise of river levels has literally energized the entire community, not to mention the presence of the National Guard and Coast Guard. There are dozens of locations with major sandbag operations underway. My family and the university students, not to mention thousands of others, have been bagging and stacking. Much effort occurred through the entire night to Saturday morning.

Photo Courtesy of Press-Citizen, Dave Schwarz

The amazing thing is that supplies and sand seem to run out well in advance of the volunteers. Local officials have continuously extended bagging efforts beyond what they anticipated (in fact, never shutting down) and all of the local TV stations have been providing nearly constant news and video coverage.

Of course, the eventual effort in clean-up and recovery will also far exceed the current frantic efforts to hold off Mother Nature today. I’m sure we will see the same can-do spirit continue until normalcy is restored.

Dealing with the Mundane

Travel is a nightmare. The airport for my flight tomorrow (Sunday) is normally a 20-min drive from my house. However, since the reservoir is squarely in between, all bridges have been closed. If the one bridge over I-80 in Iowa City closes, our community will be completely and totally cut off from the rest of the state.

As it is, the official detour to the airport now requires a 2-hr trip west to Des Moines, an hour north, and then 2 hrs back, for about a 5-hr one-way detour. (I discovered a shorter “secret” back road approach today, but it may not remain open either.)

For all citizens, potable water is becoming a concern. As the University of Iowa struggles mightily with flooding, it is losing power and its major computing center and main library are at risk. A book brigade yesterday extricated all of the books from the first floor of the library. All university summer camps and current summer session classes have been cancelled. All university personnel, including my professor wife, have been ordered to secure labs and offices and depart the campus. Efforts are now focused on keeping power and operations to the University hospital, which fortunately is itself not apparently at risk of flooding.

The effects of water, travel and power go beyond the direct flooding, but of course those with homes and businesses under water are experiencing the worst and feeling much stress. Fortunately, direct injuries and loss of life have been minimal so far. It is a real mess around here.

A ‘500-Year Flood’

Photo Courtesy of The Register, Harry Baumert

The discussion is now that Iowa City is experiencing a ‘500-yr flood’ that opens new questions and new uncertainties.

Because the Iowa River cycle is later, our area is not projected to see the flood water peak until Tuesday or so. Though we have already seen major damage (as of mid-day Saturday), the crest is at least two days away. The Iowa River, presently at 30.5 feet, is expected to reach 33 feet to 34 feet, well in excess of the 25-foot flood stage.

The only bridge connecting the east and west sides of downtown Iowa City is likely to close soon. The river surged 2 feet in just 12 hours on Friday. The further projected 2-3 feet of rise is unprecedented. Major stress will be placed on the current temporary levees and sandbag banks.

We have clearly not seen the peak nor the worst. The floodwaters will also take much time to recede and recovery may be long. In 1993, for example, one of Iowa City’s major thoroughfares was closed for 82 days. This year the effects will likely be much, much worse.

When not working at the lines or directly viewing the flooding, there is a bit of a surreal quality. Up until the past half hour or so, the skies were clear, the birds were chirping, and the waters were rising all around. I grew up in Southern California and have directly experienced earthquakes, and have been trying for years to come to grips with tornadoes. But, creeping flooding with the occasional breach just has such a weird feeling.

And, now, as I write and the afternoon unfolds, the sky has again darkened with more severe thunderstorms forming and on the way.

The Role of Linked Data

Circumstances like this demand the ability to assemble relevant information for the topic at hand. In this case, one wants information for floods and flooding, bridge and road closings, curfews, routes, airport openings and flight delays, weather forecasts, stream and river rise forecasts, hotel room availabilities, traffic delays, closings and official announcements, photo galleries to give perspective, and the like for an area encompassing eastern Iowa and municipalities including Iowa City, Solon, Cedar Rapids and Coralville.

Of course, better information and massive human effort cannot themselves hold off Mother Nature when she is angry. But, better information can enable us to mobilize and use resources more efficiently and with less loss of limb and property.

In short, that is what Linked Data is all about. It represents techniques and capabilities that exist today to appropriately tag and annotate content to make it “smarter”, to put all of the relevant information in context and on demand.
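To make that a bit more concrete, here is a minimal sketch in Python of what "in context and on demand" could look like for a flood. It assumes a tiny in-memory list of statements rather than any real Linked Data service. Every identifier with the `ex:` prefix, the bridge and road names, and the helper functions `match` and `closed_bridges_in` are hypothetical and invented purely for illustration.

```python
from typing import List, Optional, Tuple

# One statement: (thing, relation, value). All "ex:" names are hypothetical.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("ex:BridgeA", "rdf:type", "ex:Bridge"),
    ("ex:BridgeA", "ex:locatedIn", "ex:IowaCity"),
    ("ex:BridgeA", "ex:status", "ex:Closed"),
    ("ex:BridgeB", "rdf:type", "ex:Bridge"),
    ("ex:BridgeB", "ex:locatedIn", "ex:IowaCity"),
    ("ex:BridgeB", "ex:status", "ex:Open"),
    ("ex:BridgeC", "rdf:type", "ex:Bridge"),
    ("ex:BridgeC", "ex:locatedIn", "ex:CedarRapids"),
    ("ex:BridgeC", "ex:status", "ex:Closed"),
    ("ex:RoadD", "rdf:type", "ex:Road"),
    ("ex:RoadD", "ex:locatedIn", "ex:IowaCity"),
    ("ex:RoadD", "ex:status", "ex:Closed"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def closed_bridges_in(triples: List[Triple], place: str) -> List[str]:
    """Combine three simple patterns: is a bridge, is in `place`, is closed."""
    bridges = {s for s, _, _ in match(triples, p="rdf:type", o="ex:Bridge")}
    located = {s for s, _, _ in match(triples, p="ex:locatedIn", o=place)}
    closed = {s for s, _, _ in match(triples, p="ex:status", o="ex:Closed")}
    return sorted(bridges & located & closed)


if __name__ == "__main__":
    print(closed_bridges_in(TRIPLES, "ex:IowaCity"))
```

The point of the sketch is only that once closings, locations and types are tagged consistently, a question like "which bridges in my town are closed?" becomes a simple combination of patterns, no matter who published each statement.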

So, while we have the techniques available, we do not yet have widespread application. Linked Data is not yet of help to Iowa City.

The question of doing this is not one of technology, but of business models and incentives.

So, that is what the LinkedData Planet conference is all about: Making connections that matter, and doing so simply and in context. Assuming I’m able to wend my way to drier ground and get on the plane, stop by Zitgist’s booth at the Roosevelt Hotel on June 17-18. I’d love to chat and shake a dry hand!

Posted by AI3's author, Mike Bergman Posted on June 14, 2008 at 3:59 pm in Adaptive Information | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/446/linkeddata-planet-and-the-www-wet-wild-world/
Posted:June 7, 2008

A New Report from Norway Compares Topic Maps, RDF/OWL, UML and Others

No, the title does not refer to NO as in no, nyet, nein, non or ne, but NO as in Norway.

NorStella, the Norwegian Foundation for E-business and Trade Procedures, has published a 51-pp PDF report edited by Øyvind Aassve, called The SIM Report: A Comparative Study of Semantic Technologies. Though Norway is perhaps the most aggressive country in the use of topic maps, and Steve Pepper and Lars Marius Garshol of Ontopia (among the ten contributors) are also noted advocates for topic maps, the report’s treatment of its five comparative semantic technologies is balanced and informative.

This group of ten contributors formed over beers about two years ago under the banner of what they called, “Semantic Interoperability Models” (SIM). As they note in the report,

. . . we found that we as representatives for different semantic technologies were talking different languages. We were explaining our technologies with different terms to mean the same thing and the same term to mean different things, and our models for addressing the issue of semantic interoperability were so different it was hard to get our message across to people who already identified deeply with their own way of thinking. We were supposed to be experts on semantic technologies, but we experienced a complete breakdown of semantic interoperability among ourselves.

How true. It is bold of this group to recognize this gap and to commit to help bridge understanding.

As I have noted many times regarding basic semantic Web approaches, even within the fold, so to speak, there are large challenges of semantics and usage. Yet, when such perspectives are extended across multiple technologies and mindsets, the challenges become still more daunting.

The interest in this group was to find common ground to explain their different but complementary approaches to the lay public and to the enterprises to which many of them consult. The attempt correctly recognizes that there will always be competing approaches and mindsets and technologies geared to roughly the same aims, though perhaps suited for different use cases or strengths.

The group set as its objective to increase understanding of five candidate semantic technologies in order to better promote semantic interoperability. They defined semantic interoperability as the “ability to share information (or knowledge) based on its meaning — i.e., what it is about.” The report notes that “semantic interoperability thus focuses on the benefits that can be achieved by mediating structures, meanings and contexts within relatively confined and well-understood domains.”

As such, this report is a very effective introduction and a good explication of the issues and trade-offs in semantic representation.

And the Contestants Are . . .

The first part of the report is dedicated to a balanced discussion of the five semantic technologies chosen for comparison:

  • RDF/OWL — RDF (Resource Description Framework) and OWL (Web Ontology Language) are two of the technologies that form the foundation of the semantic Web
  • Topic Maps — a standard (ISO 13250) for describing knowledge structures based on a formal model built around topics (subjects or concepts), assertions about them, and the relationships (associations) between them
  • UN/CEFACT Core Components — the Core Component Technical Specification (CCTS) is designed for simple, transparent and effective processes for global commerce, with a focus on machine-to-machine exchange of business documents often with a transactional nature such as orders or billing
  • ISO 15926 — another standard, though with a different purpose: to facilitate integration of data to support the life-cycle activities and processes of process plants, and
  • UML — the Unified Modeling Language (UML) is a standardized specification language for object modeling with current wide use in software design and object components and relationships visualization, often used for business process modeling, other modeling, flowcharts, or work or organizational structures.

Note that XML is not included in this list because it is geared to syntax representation and serialization, not the underlying semantics.

Now, Some Commonalities

Though based on different perspectives and use cases, this group found some common characteristics amongst the approaches:

  • Componentization of information elements
  • Use of unique identifiers
  • Focus on reuse of externally defined conceptual models
  • Agility

These are the commonalities explicitly listed. But, we also see in the group’s diagram an interesting commitment across these approaches to what is now recognized as the subject-predicate-object “triple”.


There are real fundamental concepts and principles underlying this embracing of first-order logic and the triple diagram of object-action-referent expressed in so many different ways by philosophers and semanticists.

Sure, there are real differences (most of which are likely well beyond my ken). But, when boiled down, we can see that whether we call things “things” or objects or subjects or referents or entities or the myriad of ways that we try to discuss the same concepts or topics or subjects, everyone is still basically trying to capture the same things and meanings.

So, in our triples, we can call the relations between an object (“subject”) and its referent (“object”) by many different things: verbs, predicates, properties, associations, and, indeed, even relations. In the end, we are all simply trying to express the “facts” of our observed world, our “assertions”.
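As a small illustration of that point, the following Python sketch holds one assertion in a single three-part structure and then labels its middle term with any of the names listed above. The `Assertion` type, the `as_record` helper and the `editedBy` relation name are my own hypothetical choices for this example and are not drawn from the report or from any of the five technologies.

```python
from typing import Dict, NamedTuple


class Assertion(NamedTuple):
    """One 'fact': something, how it relates, and what it relates to."""
    subject: str
    relation: str
    referent: str


# The different names the text gives for the middle term of a triple.
RELATION_NAMES = ("verb", "predicate", "property", "association", "relation")


def as_record(a: Assertion, relation_name: str = "predicate") -> Dict[str, str]:
    """Present the same assertion using a chosen label for the relation."""
    if relation_name not in RELATION_NAMES:
        raise ValueError(f"unknown label for the relation: {relation_name}")
    return {
        "subject": a.subject,
        relation_name: a.relation,
        "object": a.referent,
    }


# Example assertion; "editedBy" is a made-up relation name.
fact = Assertion("The SIM Report", "editedBy", "Oyvind Aassve")

if __name__ == "__main__":
    for name in RELATION_NAMES:
        print(as_record(fact, name))
```

Whatever label is chosen, the values underneath do not change, which is really all the paragraph above is claiming.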

Strengths and Weaknesses

While the authors of the NO study are careful not to use these terms, they do nibble around the edges of the idea that, whether by design, scope or expressiveness, each of the five candidate approaches may be better suited for different uses. It is useful to understand domain and scope when choosing one of these options.

Is the purpose to model software systems or use cases? Transactions (such as order processing systems)? Knowledge representation? Visualization or analytic modeling? Use and application of controlled vocabularies? Control structures or process systems?

Of course, in the end, the market will decide about all of these approaches. But, because of unique strengths in representation or historical or local uptake, each of the five is likely to survive to some extent.

My Take

In terms of scope and coverage, I found the report’s Section 8 FAQ to be quite useful. I would have preferred more citations, especially of a reference or seminal nature, since the five approaches each had its own advocates. The earlier detailed work on RDF-Topic Maps relations and its nature [1] deserved more than a simple note in passing. And, the sections on tools were very weak.

But these are small quibbles. As the opening quote notes eloquently, our broader challenge is that we are saddled with semantic, world view, and domain differences.

I really respect and like efforts like this. There is no single truth. In the reality of the Web and our global commons, there are many approaches and viewpoints.

What we see in these more comprehensive approaches is a reflection of other Web standards such as microformats or tagging or RDFa or, frankly, any of the means by which any of us attempt to provide structure and enhanced meaning (“metadata”) to the content we encounter and process. What I find remarkable about the methods outlined in the NO study is how many similarities there are in approach and viewpoint. Surely, we can overcome and forgive the differences. We will also find common ground for translation and interoperability.

As the RDF-Topic Maps studies mentioned before show, in the end, I think it really does not matter what flavor of semantic technology you prefer. Speak your language and in your own way. We are now seeing the emergence of sufficiently similar frameworks such that we’ll be able to collectively figure out how to bridge differences and effectively communicate.


[1] See http://www.w3.org/TR/2005/WD-rdftm-survey-20050329/ and http://topicmaps.wordpress.com/2008/05/11/topic-maps-and-the-semantic-web/.

Posted by AI3's author, Mike Bergman Posted on June 7, 2008 at 11:38 am in Adaptive Information, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/445/no-semantic-technologies/
Posted:June 4, 2008

The Foundational Bibliographic Ontology is Released

After a year of diligent effort, Frédérick Giasson and Bruce D’Arcus have just announced the release of the Bibliographic Ontology. The Bibliontology, or BIBO for short, is a much anticipated foundational piece for the semantic Web. It is designed to provide flexible and comprehensive RDF representation of citations and bibliographic materials and collections.

One of the very exciting parts of the project is its possible use as an underlying schema for Zotero. Those familiar with this blog know I have been a Zotero fan from practically the first day of its release, and have pointed to it repeatedly as how application-capable functionality can be incorporated as a browser plug-in (based on Firefox in the case of Zotero). I’m hoping the automatic exposure of my citation and reference information as linked data via the ontology is a day now nearly at hand.

A Zotero-BIBO-exposed linked data RDF chain would dramatically change the game and prove a powerful demo and story for how all of this linked data can truly interoperate. Moreover, once that information is related to its subject contexts using UMBEL, watch the fireworks!

So again, congratulations to Fred and Bruce for a real labor of love. And, thanks to the scores of individuals who provided review and commentary on the BIBO forum during this initial development. Now that it is out in the wild, I’m sure we will see still further development and improvements.

BTW, Zitgist has been a proud sponsor and funder of Fred’s contributions to the project.

Posted:May 23, 2008

The Pace of Change is Almost Breathtaking

Michel Bauwens of the P2P Foundation just posted The future of the web: semantic, or just structured?, which consists almost entirely of what I had written in a post about 10 months ago. The point of my earlier post was that structured Web content, short of the full-blown aspects and aspirations of the semantic Web, was where the Web then stood, and that it represented the natural transition point between documents and meaning interpretable by machines.

I’m honored by Michel’s treatment of these comments and I thank him. It is always nice to see one’s name (and words) in lights, as it were.

But, I also had forgotten about that original post and its sentiments.

This re-surfacing caused me to go back and read and think about it again. (Not too bad, if I may say!) But, with the passage of now 10 months, I also don’t think I got it exactly right. As I commented back to Michel:

But being directed to these words by my feed reader (from a basis that is now, what, 10 months ago?) also got me to thinking. I stand by the words here (and in the full article that I just re-read after a long hiatus), but I would likely say them differently today.

Linked data and the semantic Web are moving at warp speed. Linked data, in particular, is showing the way to pragmatic, meaningful connections. And, for me and my company, Zitgist, that is also leading to a quicker exploration of semantics, class relationships, ontology mappings, and much that is closer to the "semantic" end of the spectrum rather than the "structured" end of the spectrum implied by my comments above.

I frankly did not see this rapidity of uptake and it is very, very exciting.

The lesson, I think, is that structure, yes, makes sense, but, once we taste it, we want to take it further. The role of structure driving the demand for semantics is pretty compelling.

In that regard, I would not equate "premature" nearly so strongly to the semantic Web.

The freight train is now moving through the yard but without any slowdown. The train left the station quite a bit ago. Like most trends, it is hard to see the start and initial pace.

Fortunately, we are also finding that items such as context, named entities, and quality are now becoming themes within our community’s posts. I say Hear, hear!, and Rah, rah! Good show, all. The great thing about opining is that we never get it right!

Posted by AI3's author, Mike Bergman Posted on May 23, 2008 at 12:29 am in Semantic Web, Structured Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/442/structured-web-v-semantic-web/
Posted:May 11, 2008

Unattributed cover drawing from The Economist, August 4, 2007; see http://www.economist.com/

Squeezed Between Two World Views at the Infocline

I remember one of my formative jobs at the American Public Power Association when we were running all of APPA’s technical activities. While mostly a lobbying outfit, we were after all in Washington, DC, and were perceived by many to be near some nexus of power and influence. And, because my group did technical stuff, we were a natural magnet for some inventors seeking an edge. And, believe me, some of those folks were real crackpots.

We’d hear claims how this person or that invented radar, or perpetual energy machines, or bendable concrete, and, once, even, cold fusion. Hehe.

In monitoring various ontology and semantic Web mailing lists, claims sometimes arise about the “ontology of everything” or the single, universal ontology that cures cancer or walks on one leg. It makes me smile and think about those past wild claims about radar or perpetual energy.

Some, I believe, when we mention the UMBEL lightweight subject concept reference structure, conjure up similar visions of a universal “ontology of everything”. That is wrong and not our intent. But, UMBEL is trying to straddle two different worlds and world views, and that can often lead to misunderstandings and misperceptions.

This posting is not the first and surely will not be the last on the subject, but it is worthwhile again to try to explain the role of UMBEL from these different angles and with slightly different analogies.

UMBEL is an Infocline

In prior posts, we have described UMBEL as a backbone, as a roadmap to related content, as a lightweight ontology or a lightweight reference structure, and as middleware. In this post, I am going to concentrate on its role as middleware, in its role as residing at the infocline between two different worlds and world views.

The Greek base -cline is often applied to gradual transition layers or changes in gradients or slope. A thermocline, for example, represents the layer between the deep and surface ocean. While there is mixing across this layer, it is slower than within the two parts that it separates. Both parts and the thermocline layer itself have quite different properties and temperatures, even though all are ocean and salty water.

The UMBEL infocline acts in a similar manner. On one side of the UMBEL layer is the Cyc knowledge base, with its self-contained, more-or-less closed world of higher order logics, microtheories regarding thousands of knowledge domains, rich predicates, and coherence. It is venerable, solid and proven, but with its own language and world view. Its purpose is also directed to reasoning and inference, driven from a foundation of (generally not codified outside of Cyc) common sense. It was designed well in advance of the creation of RDF or OWL, indeed in advance of the Internet and Web itself.

On the other side of the UMBEL infocline is the entire Web. This is a chaotic, decentralized, distributed knowledge environment representing untold numbers of world views. The specifications of the semantic Web and its languages and vocabularies have been designed expressly with these differences in mind and the means and structures to link and interrelate them. The Web environment — though not exactly incoherent — is also not in its ground state coherent. Indeed, it is the very purpose of existing semantic Web standards and UMBEL to help provide that coherence.

A key aspect of the Web is its “open world” assumption, defined in SKOS [1] as:

"RDF and OWL Full are designed for systems in which data may be widely distributed ( e.g., the Web). As such a system becomes larger, it becomes both impractical and virtually impossible to "know" where all of the data in the system is located. Therefore, one cannot generally assume that data obtained from such a system is "complete", i.e., if some data appears to be "missing", one has to assume, in general, that the data might exist somewhere else in the system. This assumption, roughly speaking, is known as the "open world" assumption."

What this means for the Web is that we must assume that the system’s knowledge is incomplete, and that if a statement cannot be inferred from what is explicitly expressed, we still cannot infer it to be false. Adding new information never falsifies a previous conclusion, and most of what we can know about the world will remain unknown. Cyc, on the other hand, can make closed-world assumptions under appropriate conditions.
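The difference can be sketched in a few lines of Python. This is a deliberately simplified illustration: "inference" is reduced to checking whether a statement is explicitly present, and the `Answer` enum, the `ask` function and the `ex:` statements are all hypothetical names invented for the example, not anything from Cyc, RDF or OWL tooling.

```python
from enum import Enum
from typing import Set, Tuple

# A statement is a (subject, predicate, object) tuple; names are made up.
Fact = Tuple[str, str, str]


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def ask(facts: Set[Fact], query: Fact, closed_world: bool) -> Answer:
    """Answer a query under the closed-world or open-world assumption.

    Simplification: a statement counts as 'inferred' only if it is
    explicitly present in `facts`.
    """
    if query in facts:
        return Answer.TRUE
    # Closed world: what is not known is false.
    # Open world: what is not known is simply not known.
    return Answer.FALSE if closed_world else Answer.UNKNOWN


known = {("ex:BridgeA", "ex:status", "ex:Closed")}
query = ("ex:BridgeB", "ex:status", "ex:Closed")

open_before = ask(known, query, closed_world=False)
closed_before = ask(known, query, closed_world=True)

# New information arrives from somewhere else on the Web.
known_later = known | {query}
open_after = ask(known_later, query, closed_world=False)
closed_after = ask(known_later, query, closed_world=True)

if __name__ == "__main__":
    print("open world:  ", open_before.value, "->", open_after.value)
    print("closed world:", closed_before.value, "->", closed_after.value)
```

Under the open-world setting the answer only moves from unknown to true, so no earlier conclusion is overturned. Under the closed-world setting the earlier answer of false has to be withdrawn once the new statement shows up.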

UMBEL thus must act as a mediator, or middleware, in its role as the interface between these world views. It can lead to tension and turbulence when contemplating or transiting this infocline layer.

The Cyc Reference Knowledge Base

The central purpose of UMBEL is to provide a context for relating information. Once such a purpose for context is embraced, the natural next question is: And what shall be the basis for this context?

A previous post discussed why Cyc was chosen over alternatives as this contextual basis. Ultimately, the reasons for choosing Cyc come down to real practical tools and capabilities such as helping to disambiguate the identities of named entities, mapping ontologies and schema, doing natural language processing, and the sheer provenness of the concept relationships that are at the core of UMBEL.

(Also, as noted many times, others could just as reasonably choose other bases for providing context. The important point, again, is to provide some context over no context.)

We can view the Cyc knowledge base as a complete, albeit large, world unto itself. Like the Earth, it is complex and varied and self-contained. It has its own atmosphere and perspective on the broader universe:

But, like other planets or celestial bodies, Cyc is a world, not the world. There are many different possible worlds with different atmospheres and gravities and temperatures and compositions. And, of course, Cyc is not a physical world at all, but a conceptual “world” representing knowledge and its relationships. We will, however, represent it with the Earth image below.

There are 25 years and perhaps close to 300 person-years of development behind Cyc. It has thousands of able practitioners around the world and has been used in hundreds of meaningful projects and engagements. Since its release in 2002, there have been well in excess of 100,000 downloads of its open-source OpenCyc version.[2]

This legacy and history lead to distinct functional and terminology differences from current semantic Web perspectives. For example, the richness of Cyc predicates does not lead to simple mappings to existing OWL and RDFS properties. The notion of class is different from the closest analog in Cyc, the ‘collection’. The concept and treatment of individuals and types is different. The 1000 or so microtheory domains in Cyc are not easily transferred or mapped to OWL constructs. Cyc uses reification aggressively to functionally combine concepts from constituent elements, such as “apple tree”. Higher-order logic is not transferable in all cases to the first-order logic (FOL) of the semantic Web. And so forth . . . .

Perhaps most important, however, is that Cyc has been designed, built and extended by professional ontologists and related researchers. This brings a degree of consistency and quality control that Web-wide initiatives cannot hope to approach.

In our work with Cyc there has been nothing but good will and professionalism from the staff at Cycorp and the Cyc Foundation. But there are clearly times when world view and terminology can differ, sometimes leading to translation problems and issues. Moreover, attempts to bridge from the Cyc world to the open world assumptions of the general Web mean the translation is “lossy”, much like what happens in moving from a palette of 16 million 24-bit colors to something less.

Cleaning Cyc

Here are some statistics showing the relative size and scope of the ResearchCyc and OpenCyc versions of Cyc, current as of the last official distributions [3]:

| Category | OpenCyc | ResearchCyc |
|---|---|---|
| Reified Terms (Constants and NARTs) | 263,332 | 303,340 |
| Assertions | 2,040,330 | 2,964,161 |
| Deductions | 323,751 | 1,305,354 |
| Unique Predicates (Properties) [2] | 17 (OWL) | ~16,000 |
| Disk Storage (KBs) [4] | 495,104 | 566,400 |

The sheer size and sophistication of either version is too great for easy comprehension and linkage by standard Web resources. Thus, the UMBEL project set out to determine and derive the most fundamental concepts from within OpenCyc. What was desired was a tractable set of subject concept “hub” nodes from within OpenCyc. A further design criterion was to maintain 100% consistency with OpenCyc for this subset of subject concepts so that UMBEL preserves linkage into the Cyc knowledge base.

A subsequent post will relate in detail the nine-month (and continuing!) vetting and extraction process applied to Cyc, the result of which is currently the identification of about 21,000 subject concepts. These are schematically illustrated by the yellow dots on our Cyc Earth representation:

Of course, these yellow dots are not really physical locations on a globe. Rather, they represent important “hub” locations within the virtual Cyc knowledge “space”.

UMBEL: A Lightweight Skein

Once these subject concepts are extracted from the broader knowledge base, we have a simple skein of 21,000 subject concepts and their interrelations. We can show this lightweight structure as a ball of subject concept nodes connected to one another via their graph edges. It has similarities to a hairnet, with the nodes represented by the knots (in red) in the net:

This simplistic wireframe representation has been presented before for all 21,000 UMBEL nodes via the Cytoscape graph visualization software (see figure right).

The following table shows that the overall size and complexity of Cyc has been reduced by 1-2 orders of magnitude through this cleaning exercise, resulting in a lightweight UMBEL structure about 5-10% of the original size:

| Category | OpenCyc | ResearchCyc | UMBEL |
|---|---|---|---|
| Terms or Concepts | 263,332 | 303,340 | 21,057 |
| Assertions | 2,040,330 | 2,964,161 | 285,700 |
| Deductions | 323,751 | 1,305,354 | |
| Unique Predicates (Properties) [2,5] | 17 (OWL) | ~16,000 | 18 |
| Disk Storage (KBs) [4] | 495,104 | 566,400 | 14,445 |
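
For readers who want to check the size reduction themselves, here is a small Python sketch that works the arithmetic from the table above. The figures are copied from the table; the function and variable names (`reduction`, `summarize`, `SIZES`) are just illustrative choices, and the Deductions and Unique Predicates rows are left out because they do not have directly comparable UMBEL counts.

```python
import math

# Figures copied from the table above, in the order (OpenCyc, ResearchCyc, UMBEL).
SIZES = {
    "Terms or Concepts": (263_332, 303_340, 21_057),
    "Assertions": (2_040_330, 2_964_161, 285_700),
    "Disk Storage (KBs)": (495_104, 566_400, 14_445),
}


def reduction(original, reduced):
    """Return (reduced size as a percent of original, orders of magnitude removed)."""
    share = reduced / original
    return 100.0 * share, -math.log10(share)


def summarize(sizes=SIZES):
    """Compare the UMBEL figure against each Cyc baseline for every category."""
    rows = {}
    for category, (opencyc, researchcyc, umbel) in sizes.items():
        rows[category] = {
            "vs OpenCyc": reduction(opencyc, umbel),
            "vs ResearchCyc": reduction(researchcyc, umbel),
        }
    return rows


if __name__ == "__main__":
    for category, row in summarize().items():
        for baseline, (pct, orders) in row.items():
            print(f"{category:20s} {baseline:15s} "
                  f"{pct:5.1f}% of original, {orders:.2f} orders of magnitude")
```

Run this way, the term count comes out at roughly 8% of OpenCyc and the disk footprint at roughly 3%, so the exact percentage depends on which row and which baseline you pick.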

Very striking is the predicate reduction, which is both a key source of “lossiness” and a challenge in maintaining a meaningful OWL and RDFS correspondence with the original Cyc. However, since the purpose of UMBEL is context and not reasoning or inference, this reduction is appropriate and understandable.

The ‘Hairnet Over the Basketball’

Metaphorically, we can now re-apply this UMBEL skein over the Cyc knowledge base. We have described this visual metaphor as the “hairnet over the basketball,” with UMBEL being the hairnet, and Cyc (Earth) the basketball:

Note that the UMBEL skein can be used either fully independently of the underlying Cyc structure or in conjunction with it.

21,000 Docking Ports for an Open World

This UMBEL lightweight skein or wireframe structure now is ready to act as middleware, to play its role as an infocline. Each of UMBEL’s 21,000 subject concepts is, in effect, a “docking port” to which external Web data can “attach”. Once attached, this data can then be related to other Web data via the subject concept relationships in the UMBEL skein. This docking and attachment mechanism can be visualized as follows:

If you mentally remove the Earth figure (Cyc) above, the UMBEL skein acts solely as a context reference structure for other Web data through its lightweight SKOS taxonomy structure (narrowerTransitive and broaderTransitive). These are the internal edge relationships of the wireframe structure with the red nodes above.
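
To make the docking idea a bit more concrete, here is a toy Python sketch of a skein whose edges are "broader" links, with external resources attached to concept nodes. It is only an illustration under stated assumptions: the `Skein` class, the concept labels (`Flood`, `Hurricane`, `NaturalDisaster`, `Event`) and the `example.org` URLs are all hypothetical and are not actual UMBEL identifiers or any real API. The transitive walk mirrors the intent of SKOS broaderTransitive.

```python
from collections import defaultdict


class Skein:
    """Toy subject-concept graph: nodes are concepts, edges are 'broader' links."""

    def __init__(self):
        self.broader = defaultdict(set)  # concept -> directly broader concepts
        self.docked = defaultdict(set)   # concept -> external resources attached

    def add_broader(self, narrower, broader):
        self.broader[narrower].add(broader)

    def dock(self, resource, concept):
        """Attach an external Web resource to a subject concept (a 'docking port')."""
        self.docked[concept].add(resource)

    def broader_transitive(self, concept):
        """All concepts reachable by following broader links from `concept`."""
        seen, stack = set(), [concept]
        while stack:
            for parent in self.broader.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def contexts(self, resource):
        """Every concept a resource falls under: its ports plus all broader ones."""
        found = set()
        for concept, resources in self.docked.items():
            if resource in resources:
                found |= {concept} | self.broader_transitive(concept)
        return found

    def shared_context(self, res_a, res_b):
        """Concepts under which two docked resources both fall."""
        return self.contexts(res_a) & self.contexts(res_b)


def build_example():
    """Hypothetical mini-skein; the labels are made up for illustration."""
    skein = Skein()
    skein.add_broader("Flood", "NaturalDisaster")
    skein.add_broader("Hurricane", "NaturalDisaster")
    skein.add_broader("NaturalDisaster", "Event")
    skein.dock("http://example.org/news/flood-report", "Flood")
    skein.dock("http://example.org/data/hurricane-tracks", "Hurricane")
    return skein


if __name__ == "__main__":
    s = build_example()
    print(s.shared_context("http://example.org/news/flood-report",
                           "http://example.org/data/hurricane-tracks"))
```

The two resources know nothing about each other, yet once each is docked to a concept they become related through the broader concepts they share. That is the whole point of a context structure: the relating is done by the skein, not by the data.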

Though lightweight, this structure is surprisingly powerful in that it also enables tie-ins with external ontology classes — what Fred Giasson has called ‘exploding the domain‘ — and provides a reference context for Web data. Without these docking ports via UMBEL’s subject concepts, there is no contextual frame of reference and these Web data bits essentially tumble aimlessly in a dark knowledge space.[6]

But one need not stop at the infocline wireframe layer of UMBEL. Because each subject concept (“docking port”) has a direct correspondence to Cyc, we can dive more deeply into the Cyc knowledge environment. First through OpenCyc and then (via licensing or other arrangements) into ResearchCyc or the full Cyc, another dimension of tools and capabilities can become available. We now have backup and support to assess mappings and assignments and inferences and reasoning.

Will everyone want such capabilities? Most will not.

But it also surely does not hurt to have these value-added pathways so readily available for use and exploitation.

Some Context is Better than No Context at All

Some perhaps in the Cyc community may look at this picture and say, Whoa! We’re giving Web denizens loaded Cyc guns via the UMBEL infocline to harm themselves and others.

Perhaps so. But this is also why we have courses on firearms safety and practice ranges for gaining the experience. Ontology mapping of any nature in an open world requires attention and skill to maintain quality.

Open world circumstances have already exposed challenges with sameAs assignments, and these challenges will certainly be exacerbated as we extend to class mappings in ontologies and to inferencing. Quality and provenance will assert their prominence. Who do you trust and who is capable? But haven’t these always been operative questions?

Some perhaps in the broader Web community may go, Whoa! We are free and independent actors who hate any sniff of possible centralized Big Brother crap. Why UMBEL? Why Cyc? I want to free-form tag and twitter to my heart’s content.

OK, well, sure. But how can the Web of data meaningfully expand without reference points, structure and context? Though we may have foundational semantic Web standards in place, if we are going to meaningfully inter-relate data, we also need context and semantics.

UMBEL and Cyc offer one set of contexts, semantics and tools. Whether they are the best or not is a matter for the market to decide. But I think it will rapidly become clear that future Linked Data that is published without context will remain largely unused data. The question now going forward is not the rejection of context but deciding what contextual frameworks work better, are easy to implement, and are readily understood.

So, I think the game has changed and I’d like to believe for the better. UMBEL has placed a marker down — and it’s smack dab in the middle.

Yes I’m stuck in the middle with you,
And I’m wondering what it is I should do,
It’s so hard to keep this smile from my face,
And knowledge, yeah, is all over the place,
Cyc to the left of me, Open Web to the right,
Here I am, stuck in the middle with you. [7]


[1] W3C, SKOS Simple Knowledge Organization System Reference, W3C Working Draft, 25 January 2008; see http://www.w3.org/TR/2008/WD-skos-reference-20080125/#L881.
[2] See Priming the Pump and Threshold Conditions for the ResearchCyc estimate; the OpenCyc predicate count is from the OWL distribution version; the standard OpenCyc predicate count is not calculated.
[3] Obtained by running the CycL query, (kb-statistics), against a local instance of the distribution. The OpenCyc version is 5006; the ResearchCyc version is 7117.
[4] Only the distribution size for the World model is reported; there are additional executables and supporting files not included.
[5] For UMBEL, 9 of the 18 are new properties, the rest are from existing OWL and RDFS vocabularies. These predicates include: type (RDF); subClassOf (RDFS); equivalentClass (OWL Full); language (DC, including all lingvoj instances); prefLabel, altLabel, definition, broaderTransitive, and narrowerTransitive (SKOS; also, there are some SKOS notes properties not listed); and the new properties of superClassOf, hasSemset, isAligned, withOverlap, linksConcept, isAbout, linksEntity, isLike, and withLikelihood (UMBEL). The forthcoming UMBEL technical documentation will explain this vocabulary in detail.
[6] While the current and common use of the sameAs relation provides linkage between instances or named entity identities in various datasets, this predicate does nothing to orient or provide frames of reference for the datasets themselves.
[7] Stuck in the Middle with You, with all due respect to Bob Dylan and Stealers Wheel.

Posted by AI3's author, Mike Bergman, on May 11, 2008 at 4:16 pm in Adaptive Information, Semantic Web, Structured Web, UMBEL
The URI link reference to this post is: https://www.mkbergman.com/441/the-role-of-umbel-stuck-in-the-middle-with-you/