<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI3:::Adaptive Information &#187; UMBEL</title>
	<atom:link href="http://www.mkbergman.com/category/umbel/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Wed, 10 Mar 2010 05:21:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Structured Dynamics&#8217; Product Stack</title>
		<link>http://www.mkbergman.com/842/structured-dynamics-product-stack/</link>
		<comments>http://www.mkbergman.com/842/structured-dynamics-product-stack/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 22:54:24 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Information Automation]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[scones]]></category>
		<category><![CDATA[semantic enterprise]]></category>
		<category><![CDATA[slideshow]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=842</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structured Dynamics&#8217; Product Stack&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Information Automation&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/842/structured-dynamics-product-stack/&amp;rft.language=English"></span>

A New Slide Show Consolidates, Explains Recent Developments
Much has been happening on the Structured Dynamics front of late. Besides welcoming Steve Ardire as a senior advisor to the company, we also have been issuing a steady stream of new products from our semantic Web pipeline.
This new slide show attempts to capture these products and relate [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structured Dynamics&#8217; Product Stack&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Information Automation&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/842/structured-dynamics-product-stack/&amp;rft.language=English"></span>
<p><a href="http://structureddynamics.com/"><img style="border: 0px solid; width: 260px; height: 60px; float: left; margin-right: 10px;" title="Structured Dynamics LLC" src="../wp-content/themes/ai3/images/sd_logo_260.png" alt="Structured Dynamics LLC" hspace="5" vspace="5" align="left" /></a></p>
<h2>A New Slide Show Consolidates, Explains Recent Developments</h2>
<p>Much has been happening on the <a href="http://structureddynamics.com">Structured Dynamics</a> front of late. Besides welcoming <a href="http://www.linkedin.com/in/sardire">Steve Ardire</a> as a senior advisor to the company, we also have been issuing a steady stream of new <a href="http://structureddynamics.com/products.html">products</a> from our semantic Web pipeline.</p>
<p>This new slide show attempts to capture these products and relate them to the various layers in Structured Dynamics&#8217; enterprise product stack:</p>
<div class="center_ok center">
<div id="__ss_2406783" class="center_ok" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="Structured Dynamics's Semantic Technologies Product Stack" href="http://www.slideshare.net/mkbergman/structured-dynamicss-semantic-technologies-product-stack">Structured Dynamics&#8217;s Semantic Technologies Product Stack</a><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sdproductstack20091102-091102163620-phpapp01&amp;stripped_title=structured-dynamicss-semantic-technologies-product-stack" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sdproductstack20091102-091102163620-phpapp01&amp;stripped_title=structured-dynamicss-semantic-technologies-product-stack" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</div>
</div>
<p>The show depicts the roles of <a href="http://structureddynamics.com/scones.html">scones</a>, <a href="http://openstructs.org/iron">irON</a>, <a href="http://openstructs.org/structwsf">structWSF</a>, <a href="http://umbel.org/">UMBEL</a>, <a href="http://constructscs.com/">conStruct</a> and other products, and how they leverage existing information assets to enable the semantic enterprise. And, oh, by the way, all of this is done via Web-accessible <a href="http://structureddynamics.com/linked_data.html">linked data</a> and our practical <a href="http://structureddynamics.com/technology.html">technologies</a>.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/842/structured-dynamics-product-stack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SD Now Hosting UMBEL Web Services</title>
		<link>http://www.mkbergman.com/799/sd-now-hosting-umbel-web-services/</link>
		<comments>http://www.mkbergman.com/799/sd-now-hosting-umbel-web-services/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 22:05:51 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web services]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=799</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=SD Now Hosting UMBEL Web Services&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/799/sd-now-hosting-umbel-web-services/&amp;rft.language=English"></span>

Fred Giasson has just announced that Structured Dynamics has moved, and is now hosting, the new UMBEL Web services. Check out his &#8220;New Home for UMBEL Web Services&#8221; post to learn more.
I should mention that Structured Dynamics has also used this migration to update parts of its Web site.
]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=SD Now Hosting UMBEL Web Services&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/799/sd-now-hosting-umbel-web-services/&amp;rft.language=English"></span>
<p><a href="http://umbel.structureddynamics.com"><img style="float: left; margin-right: 10px;" title="umbel_ws" src="http://fgiasson.com/blog/wp-content/uploads/2008/10/umbel_ws.png" alt="umbel_ws" width="170" height="74" /></a></p>
<p>Fred Giasson has just announced that Structured Dynamics has moved, and is now hosting, the new <a href="http://umbel.org">UMBEL</a> Web services. Check out his &#8220;<a href="http://fgiasson.com/blog/index.php/2009/09/18/a-new-home-for-umbel-web-services/">New Home for UMBEL Web Services</a>&#8221; post to learn more.</p>
<p>I should mention that <a href="http://structureddynamics.com">Structured Dynamics</a> has also used this migration to update parts of its Web site.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/799/sd-now-hosting-umbel-web-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8216;SuperTypes&#8217; and Logical Segmentation of Instances</title>
		<link>http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/</link>
		<comments>http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 21:23:20 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[cyc]]></category>
		<category><![CDATA[instances]]></category>
		<category><![CDATA[named entities]]></category>
		<category><![CDATA[superTypes]]></category>
		<category><![CDATA[TBox]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=759</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=&#8216;SuperTypes&#8217; and Logical Segmentation of Instances&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/&amp;rft.language=English"></span>
 
The Significant Advantages to a Logically Segmented TBox
The Message Understanding Conferences (MUC) were initiated in 1987 and financed by DARPA to encourage the development of new and better methods of information [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=&#8216;SuperTypes&#8217; and Logical Segmentation of Instances&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 200px; height: 172px; float: left; margin-right: 10px;" title="Segmented" src="../wp-content/themes/ai3/images/2009Posts/090831_segmented.jpg" alt="Segmented" hspace="5" vspace="5" align="left" /> <a href="http://www.umbel.org/"><img style="border: 0px solid; margin-left: 5px; width: 100px; height: 50px; float: right;" src="../wp-content/themes/ai3/images/umbel_logo_100.png" alt="UMBEL (Upper Mapping and Binding Exchange Layer)" /></a></p>
<h2>The Significant Advantages to a Logically Segmented TBox</h2>
<p>The Message Understanding Conferences (<a href="http://en.wikipedia.org/wiki/Message_Understanding_Conference">MUC</a>) were initiated in 1987 and financed by <a href="http://en.wikipedia.org/wiki/DARPA">DARPA</a> to encourage the development of new and better methods of <a href="http://en.wikipedia.org/wiki/Information_extraction">information extraction</a> (IE). It was a seminal series that established the basic measures of retrieval and semantic efficacy, <a href="http://en.wikipedia.org/wiki/Precision_and_recall">recall</a> (R), <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision</a> (P) and the combined <a href="http://en.wikipedia.org/wiki/F-score">F-measure</a>, as well as other core terminology and constructs used in IE today.</p>
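<p>For readers who want the arithmetic behind these measures, here is a minimal Python sketch of precision, recall and the F-measure computed from raw extraction counts. It is an illustration only, not the official MUC scoring software; the function name and example counts are hypothetical. With the default <code>beta = 1</code> it yields the balanced F-measure, the harmonic mean of P and R.</p>

```python
def precision_recall_f(true_positives: int, false_positives: int,
                       false_negatives: int, beta: float = 1.0) -> tuple[float, float, float]:
    """Return (precision, recall, F-beta) from raw extraction counts."""
    # Precision: share of the system's extractions that are correct.
    predicted = true_positives + false_positives
    precision = true_positives / predicted if predicted else 0.0
    # Recall: share of the correct answers that the system actually found.
    actual = true_positives + false_negatives
    recall = true_positives / actual if actual else 0.0
    # F-measure: weighted harmonic mean of P and R; beta = 1 weights them equally.
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical run: 80 entities found correctly, 20 spurious, 40 missed.
p, r, f = precision_recall_f(80, 20, 40)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.800 R=0.667 F1=0.727
```

<p>Because the harmonic mean is pulled toward the smaller of the two values, a system cannot score a high F-measure by maximizing only one of precision or recall.</p>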
<p>By the sixth conference in the series (MUC-6), in 1995, the tasks of recognizing <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">named entities</a> and <a href="http://en.wikipedia.org/wiki/Coreference">coreference</a> were added. That initial slate of named entities included the basic building blocks of <span style="font-style: italic;">person</span> (PER), <span style="font-style: italic;">location</span> (LOC) and <span style="font-style: italic;">organization</span> (ORG); to these were added the numeric building blocks of <span style="font-style: italic;">time</span>, <span style="font-style: italic;">percentage</span> and <span style="font-style: italic;">quantity</span>. The term <span style="font-style: italic;">named entity</span> itself was coined for this seminal meeting, as was the idea of inline markup <a href="#st1">[1]</a>.</p>
<h3>What is a &#8216;Nameable Thing&#8217;?</h3>
<p>The intuition surrounding &#8220;named entity&#8221; and nameable &#8220;things&#8221; was that they were discrete and disjoint. A <span style="font-style: italic;">rock</span> is not a <span style="font-style: italic;">person</span> and is not a <span style="font-style: italic;">chemical</span> or an <span style="font-style: italic;">event</span>. As initially used, all &#8220;named entities&#8221; were distinct individuals. But there also emerged the understanding that some classes of things could be treated as more-or-less distinct nameable &#8220;things&#8221;: <span style="font-style: italic;">beetles</span> are not the same as <span style="font-style: italic;">frogs</span>, and neither is the same as <span style="font-style: italic;">rocks</span>. While some of these &#8220;things&#8221; might be true individuals with discrete names, such as <a href="http://en.wikipedia.org/wiki/Kermit_the_Frog">Kermit the Frog</a> or <a href="http://en.wikipedia.org/wiki/The_Rock_%28Northwestern_University%29">The Rock</a> at Northwestern University, most instances of such things are unnamed.</p>
<p>The &#8220;nameability&#8221; (or logical categorization) of things is perhaps best kept separate from the other epistemological issues of distinguishing <span style="font-style: italic;">sets</span>, <span style="font-style: italic;">collections</span> or <span style="font-style: italic;">classes</span> from <span style="font-style: italic;">individuals</span>, <span style="font-style: italic;">members</span> or <span style="font-style: italic;">instances</span>.</p>
<p>In a closed-world system it is easier to enforce clean distinctions. The <a href="http://en.wikipedia.org/wiki/Cyc">Cyc knowledge base</a>, for example, which is the basis for <a href="http://umbel.org/">UMBEL</a> (<span style="font-style: italic;">Upper Mapping and Binding Exchange Layer</span>), makes a clear distinction between <span style="font-style: italic;">individuals</span> and <span style="font-style: italic;">collections</span>. In the semantic Web and RDF, this distinction can become smeared a bit, with the favored terminology shifting to <span style="font-style: italic;">instances</span> and <span style="font-style: italic;">classes</span>. And in pragmatic, real-world terms, we (as humans) readily distinguish John Smith from Jane Doe, but we don&#8217;t generally (unless we&#8217;re entomologists!) make such distinctions for individual beetles, or even for entire genera or species of beetles.</p>
<p>Under precise conditions, these distinctions are important. The fact that Cyc, for example, is assiduous in applying them is a major reason for the overall <a href="../450/when-is-content-coherent/">coherence</a> of its knowledge base. But in most circumstances, we think it is OK to accept a distinction between &#8220;nameable&#8221; things such as frogs and beetles, while also accepting that those groupings may at times contain nameable individuals, such as Kermit, that are true individuals in that more refined sense.</p>
<p>This digression sets the background for a natural progression from that first MUC-6 conference. If we could cluster <span style="font-style: italic;">persons</span> or <span style="font-style: italic;">organizations</span>, why not other categories of distinct and disjoint things such as <span style="font-style: italic;">frogs</span> or <span style="font-style: italic;">beetles</span> or <span style="font-style: italic;">rocks</span>?</p>
<p>From the first six entity categories of MUC-6 we began to see an expansion to broader coverage. Readers of this blog will recall that I have long been a fan of the expanded coverage of the 64 entity classes proposed by BBN or the 200 proposed by Sekine <a href="#st2">[2]</a> (as discussed, for example, in the April 2008 <a style="font-style: italic;" href="../432/subject-concepts-and-named-entities/">Subject Concepts and Named Entities</a> article). Again, the intuition was that real things in the real world could be logically categorized into discrete and disjoint categories.</p>
<p>Thus, &#8220;named entities&#8221; inexorably evolved into a categorization system, where the degree of familiarity and distinction dictated whether the individual (with a unique name, such as <span style="font-style: italic;">Abraham Lincoln</span> or <span style="font-style: italic;">Mt. Rushmore</span>) or the grouping (such as animal or plant species and their common names, like <span style="font-style: italic;">beetle</span> or <span style="font-style: italic;">oak</span>) served as the standard &#8220;handle&#8221; for assigning a name to the &#8220;nameable thing&#8221;.</p>
<p>While one can argue about these individual &lt;&#8211;&gt; grouping distinctions, and about whether we are talking about true, unique, named individuals or names of convenience, I think (at least for this blog post and discussion) that such arguments miss the real, fundamental point.</p>
<p>The real, fundamental point is that some &#8220;things&#8221; (whether <span style="font-style: italic;">individuals</span>, <span style="font-style: italic;">instances</span> or <span style="font-style: italic;">classes</span>) are distinct from other &#8220;things&#8221;. Such disjoint distinctions are a powerful concept that should not be lost sight of in &#8220;<a href="http://en.wikipedia.org/wiki/How_many_angels_can_dance_on_the_head_of_a_pin%3F">angels dancing on the head of a pin</a>&#8221; epistemological arguments. A <span style="font-style: italic;">frog</span> is not a <span style="font-style: italic;">rock</span>, even though neither is an &#8220;individual&#8221;. How can we take advantage of that reality?</p>
<h3>What Works for Entities, Works for Concepts</h3>
<p>Nearly from the outset of our work with UMBEL as a &#8216;TBox&#8217; <a href="#st3">[3]</a> &#8212; that is, as a set of 20,000 or so common &#8220;subject concepts&#8221; &#8212; the natural question was how these concepts related to the underlying &#8220;things&#8221; (entities) that they organized. As we probed the disjoint categories within the Sekine 200 entity types, for example, we began to see significant parallels and overlap. Also gnawing at our sense of order was the rather artificial and arbitrary class of concepts in UMBEL that we termed &#8220;Abstract Concepts&#8221;.</p>
<p>We <a href="../430/a-re-introduction-to-umbel/">introduced Abstract Concepts</a> in the first release of UMBEL. At that time, we defined &#8220;<em>Abstract concepts</em> [as] representing abstract or ephemeral notions such as truth, beauty, evil or justice, or [as] thought constructs useful to organizing or categorizing things but are not readily seen in the experiential world.&#8221; In pragmatic terms, Abstract Concepts in UMBEL were often pivotal nodes in the UMBEL subject graph, needed to maintain a high degree of concept interconnectivity.</p>
<p>In any world view that attempts to be more-or-less comprehensive, there is a gradation of concepts from the concrete and observable to the abstract and ephemeral. The recognition that some of these concepts may be more abstract, then, was not the issue. The issue was that there was no definable basis for segregating a concrete Subject Concept from a more Abstract Concept. Where was the bright line? What was the actionable distinction?</p>
<p>Off and on we have probed this question for more than a year, looking at what might constitute a more natural and logical ordering and segmentation within UMBEL. After many tests and detailed analysis, we are now releasing the first results of our investigations.</p>
<p>As with nameable entities or things, we can see a logical segmentation of (mostly) disjoint concepts within the UMBEL TBox. Here are the summary percentages of these high-level splits:</p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td>Disjoint Concepts</td>
<td style="text-align: right;">90%</td>
</tr>
<tr>
<td>Attributes</td>
<td style="text-align: right;">1%</td>
</tr>
<tr>
<td>Classifications</td>
<td style="text-align: right;">9%</td>
</tr>
<tr>
<td>TOTAL</td>
<td style="text-align: right;">100%</td>
</tr>
</tbody>
</table>
<p>(Because the analysis is still being refined, exact counts and         percentages for the 20,000 concepts in UMBEL are not provided.)</p>
<h3>Why a Logical Segmentation?</h3>
<p>As we dove deeper into these ideas, we could see not only the basis for a logical segmentation within UMBEL&#8217;s concepts, but also manifest benefits from doing so. Remember that UMBEL&#8217;s concept structure performs two main roles. It: 1) provides a coherent framework for relating and &#8220;mapping&#8221; other external ontologies; and 2) provides conceptual binding points for organizing entities and instances <a href="#st4">[4]</a>. Logical segmentation brings benefits to both roles.</p>
<p>Here are some of the broad areas of benefit from a logical UMBEL         segmentation that we have identified:</p>
<ul>
<li>Template-driven &#8212; as we <a href="../492/ontology-best-practices-for-data-driven-applications-part-3/">discuss elsewhere</a>, <a href="http://structureddynamics.com/">Structured Dynamics</a> also uses its ontologies to &#8220;drive applications&#8221; and the user interfaces (UI) that support them. By properly segmenting UMBEL concepts, we are able to determine to which &#8220;cluster&#8221; of things (which we call either <span style="font-style: italic;">dimensions</span> or <span style="font-style: italic;">superTypes</span>; see below) a given thing belongs. This identification also tells us how best to display information about that &#8220;thing&#8221;, including the attributes or the display templates appropriate for it. For example, location-based or time-based things might invoke map, calendar or timeline displays. Moreover, because of the logical segmentation of concepts, we can also use the power of the concept graph to infer more generic display templates when specific matches are absent</li>
<li>Computational Efficiency &#8212; as the percentages above indicate, once we identify the <span style="font-style: italic;">superType</span> to which a given instance belongs, we can eliminate nearly all remaining UMBEL concepts from consideration. This logical winnowing leads to computational efficiencies at all levels in the system. The fastest computational work is the work not done, and when large chunks of data are removed from consideration, many performance advantages accrue</li>
<li>Disambiguation &#8212; via this approach we can now assess concept matches in addition to entity matches, which means we can triangulate between the two assessments to aid disambiguation. Because of these logical segmentations, we also have multiple &#8220;clusters&#8221; (that is, the <span style="font-style: italic;">concept</span>, <span style="font-style: italic;">type</span>, <span style="font-style: italic;">superType</span> or <span style="font-style: italic;">dimension</span>) on which to perform disambiguation evaluations, either between concepts and entities or within the various concept clusters. We can do so via either multiple <a href="http://en.wikipedia.org/wiki/Vector_space_model">semantic vectors</a> (for statistical methods) or multiple <a href="http://en.wikipedia.org/wiki/Features_%28pattern_recognition%29">features</a> (for <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> methods). In other words, logical segmentation increases the informational power of our concept graph</li>
<li>Structure and Integrity Testing &#8212; the very mindset of looking for logical segmentation has taught us much about the UMBEL structure and the OpenCyc knowledge base on which it is based. In the process, missing nodes (concepts), erroneous assignments and superfluous nodes are all being discovered. Further, many of these tests can be automated using basic logical and inference approaches. The net result is continual improvement to the scope and completeness of the structure. Lastly, these same approaches can be applied when mapping external ontologies to UMBEL, providing similar consistency benefits.</li>
</ul>
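<p>The template fallback and the computational winnowing described above can be illustrated with a small Python sketch. Everything in it is hypothetical: it assumes a toy concept graph stored as single-parent links, plus made-up <span style="font-style: italic;">superType</span> assignments and display-template names that do not come from UMBEL itself. It shows two things: picking the nearest ancestor that has a display template, and keeping only the candidate concepts that share an instance&#8217;s <span style="font-style: italic;">superType</span>.</p>

```python
# Hypothetical toy concept graph: each concept points to a single parent.
PARENT = {
    "City": "Place",
    "Place": "Thing",
    "Beetle": "Insect",
    "Insect": "Animal",
    "Animal": "Organism",
    "Organism": "Thing",
}

# Hypothetical superType assignments made at pivotal concepts.
SUPERTYPE = {"Place": "Places", "Organism": "LivingThings"}

# Hypothetical display templates; "Thing" acts as the generic fallback.
TEMPLATES = {"Place": "map_view", "Animal": "species_card", "Thing": "generic_record"}


def lineage(concept):
    """Yield the concept itself, then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def super_type_of(concept):
    """Return the superType of the nearest assigned ancestor, or None."""
    return next((SUPERTYPE[c] for c in lineage(concept) if c in SUPERTYPE), None)


def template_for(concept):
    """Pick the most specific display template, falling back to broader concepts."""
    return next((TEMPLATES[c] for c in lineage(concept) if c in TEMPLATES), None)


def candidate_concepts(instance_concept, all_concepts):
    """Winnow candidates down to the concepts sharing the instance's superType."""
    target = super_type_of(instance_concept)
    return [c for c in all_concepts if super_type_of(c) == target]


print(template_for("City"))    # map_view
print(template_for("Beetle"))  # species_card
print(candidate_concepts("Beetle", list(PARENT)))  # ['Beetle', 'Insect', 'Animal', 'Organism']
```

<p>In this toy graph, <code>City</code> has no template of its own and inherits <code>map_view</code> from <code>Place</code>, while a <code>Beetle</code> instance only needs to be compared against the four living-thing concepts. A real ontology would have multiple parents per concept, so the walk would be a graph traversal rather than a single chain.</p>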
<p>With these benefits in mind, we have undertaken a concerted analysis of UMBEL to discern what this &#8220;logical segmentation&#8221; might be. This investigation took place over three concentrated periods during the past year. (Intervening priorities and other work prevented us from concentrating solely on this task.)</p>
<p>We have now completed our first full iteration of this investigation. The fruits of this effort should be evident in this post and in the subsequent release of UMBEL version 0.80 in the coming weeks. However, we are still learning much from this new mindset and approach, so refinement of the UMBEL structure is likely to continue for some time to come.</p>
<h3>UMBEL Analysis</h3>
<p>Most things, and the concepts about them, are based on real, observable, physical things in the real world. Because two such things generally cannot occupy the same location in physical space at the same moment in time, a useful criterion for looking at these things and concepts is <a href="http://en.wikipedia.org/wiki/Disjoint-set_data_structure">disjointedness</a>.</p>
<p>In a broad sense, then, we can split our concepts of the world into those that are disjoint, because they pertain to separable objects or ideas, and those that are cross-cutting, organizational or classificatory. Attributes, such as color (pink, for example), are often cross-cutting in that they can be used to describe quite disparate things. Classification schemes such as academic fields of study or library catalog systems &#8212; while useful ways to organize the world &#8212; are not themselves in-and-of the world or discrete from other ideas. Thus, classificatory or organizational concepts are inherently not disjoint.</p>
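<p>The disjointness criterion can be made concrete with ordinary set operations. The following Python sketch is illustrative only; the concept names and instance memberships are invented for the example. It treats two concepts as disjoint when they share no instances, and flags a concept as cross-cutting (like the color pink) when it overlaps several otherwise separate concepts.</p>

```python
from itertools import combinations

# Hypothetical instance memberships for a few concepts (not real UMBEL data).
MEMBERS = {
    "Frog": {"kermit", "frog_17"},
    "Rock": {"the_rock_nu", "rock_42"},
    "Beetle": {"beetle_3"},
    "Pink": {"frog_17", "rock_42"},  # an attribute describing disparate things
}


def disjoint(a: str, b: str) -> bool:
    """Two concepts are disjoint when no instance belongs to both."""
    return MEMBERS[a].isdisjoint(MEMBERS[b])


def overlapping_pairs(concepts):
    """List every pair of concepts that shares at least one instance."""
    return [(a, b) for a, b in combinations(concepts, 2) if not disjoint(a, b)]


def cross_cutting(concepts, min_overlaps: int = 2):
    """Flag concepts that overlap at least `min_overlaps` other concepts."""
    counts = {c: 0 for c in concepts}
    for a, b in overlapping_pairs(concepts):
        counts[a] += 1
        counts[b] += 1
    return sorted(c for c, n in counts.items() if n >= min_overlaps)


print(overlapping_pairs(MEMBERS))  # [('Frog', 'Pink'), ('Rock', 'Pink')]
print(cross_cutting(MEMBERS))      # ['Pink']
```

<p>Frogs, rocks and beetles never overlap one another; the only overlaps come through the attribute, which is exactly the pattern that sets classificatory concepts apart from disjoint ones.</p>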
<p>With the criterion of disjointedness in hand, we then began evaluating the UMBEL subject concepts. We looked to organizational schema such as the entity types of Sekine or BBN for some starting guidance. We also kept in mind that we wanted our categories to inform logical clusterings of possible data presentation, such as media types or locations or time.</p>
<p>For terminology, we adopted the term <span style="font-style: italic;">superType</span> to denote the largest cluster designation at which this disjointedness may occur. As a way to test the basic coherence of these <span style="font-style: italic;">superTypes</span>, we also collected them into larger groups, which we termed <span style="font-style: italic;">dimensions</span>.</p>
<p>Our analysis process began with branch-by-branch testing of the UMBEL concept graph using automated scripts, attempting to find pivotal nodes whose child instance members were disjoint from those of other <span style="font-style: italic;">superTypes</span>. We term this the &#8220;top-down&#8221; method.</p>
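<p>A top-down pass of this kind might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the actual scripts used for UMBEL: the toy tree, instance names and function names are all hypothetical. Here a child concept counts as pivotal when none of the instances under it also appear under any concept that is neither its ancestor nor its descendant; otherwise the pass descends into its children. Leaves that still overlap are returned separately, for the kind of manual review a bottom-up pass would provide.</p>

```python
# Hypothetical toy concept tree (parent -> children) and direct instance members.
CHILDREN = {
    "Thing": ["LivingThing", "Place", "Product"],
    "LivingThing": ["Animal", "Plant"],
    "Product": ["Food", "Vehicle"],
}
DIRECT = {
    "Animal": {"kermit", "beetle_3"},
    "Plant": {"oak_1", "apple_1"},
    "Place": {"mt_rushmore"},
    "Food": {"apple_1", "bread_1"},  # apple_1 is also a Plant member
    "Vehicle": {"car_9"},
}


def descendants(node):
    """Return the node plus every concept below it."""
    result = {node}
    for child in CHILDREN.get(node, []):
        result |= descendants(child)
    return result


def members(nodes):
    """Union of the direct instance members of the given concepts."""
    return set().union(*(DIRECT.get(n, set()) for n in nodes))


ALL_NODES = descendants("Thing")


def top_down(node="Thing", ancestors=frozenset()):
    """Return (pivotal, unresolved) concepts found below `node`."""
    pivotal, unresolved = [], []
    lineage = ancestors | {node}
    for child in CHILDREN.get(node, []):
        inside = descendants(child)
        # Concepts that are neither ancestors nor descendants of this child.
        outside = ALL_NODES - inside - lineage
        if members(inside).isdisjoint(members(outside)):
            pivotal.append(child)
        elif CHILDREN.get(child):
            p, u = top_down(child, lineage)  # overlap: look one level deeper
            pivotal += p
            unresolved += u
        else:
            unresolved.append(child)  # overlapping leaf: flag for manual review
    return pivotal, unresolved


print(top_down())  # (['Animal', 'Place', 'Vehicle'], ['Plant', 'Food'])
```

<p>On this toy data, <code>apple_1</code> sits under both Plant and Food, so the automated pass cannot separate those two concepts and leaves them for a human decision.</p>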
<p>This automated analysis was then supplemented with a complete manual inspection of all unassigned and assigned concepts, with a &#8220;bottom-up&#8221; assignment of concepts or corrections to the automated approach. This inspection also led to new insights and to the identification of missing concepts that needed to be added to UMBEL.</p>
<p>We are still converging between these two methods. Optimally, we should be able to tease out all UMBEL <span style="font-style: italic;">superTypes</span> with a relatively small number of <span style="font-weight: bold;">union</span>, <span style="font-weight: bold;">intersection</span>, or <span style="font-weight: bold;">complement</span> <a href="http://en.wikipedia.org/wiki/Set_theory#Basic_concepts">set operations</a>. In its current form the analysis is close, but there are still some rough spots.</p>
<p>Nonetheless, this analysis method has led us to identify some 33 <span style="font-style: italic;">superTypes</span> <a href="#st5">[5]</a>, clustered into 9 dimensions. Of these, 29 <span style="font-style: italic;">superTypes</span> are mostly disjoint. The remaining four are cross-cutting <span style="font-style: italic;">superTypes</span> (Attributes in the Descriptive dimension, plus the three in the Classificatory dimension) that capture attributes and organizational schema applicable to any of the 29 disjoint <span style="font-style: italic;">superTypes</span>.</p>
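<p>The three set operations can be sketched with Python&#8217;s built-in <span style="font-family: monospace;">set</span> type. In this hypothetical example, each branch of the concept graph is reduced to a set of member concepts, and the complement is taken relative to a parent branch (that is, as a set difference). The branch names and members are illustrative assumptions, not actual UMBEL content.</p>

```python
from itertools import combinations

# Hypothetical member sets for a few concept-graph branches (illustrative only).
HUMAN_MADE = {"Car", "Hammer", "Bread", "Cheese", "Aspirin", "Museum", "Bridge"}
FACILITY_BRANCH = {"Museum", "Bridge"}
FOOD_BRANCH = {"Bread", "Cheese", "Apple"}
DRUG_BRANCH = {"Aspirin", "Penicillin"}
ADDICTIVE_BRANCH = {"Nicotine"}

# Complement (relative to HUMAN_MADE): human-made things that are not
# facilities, foods or drugs.
products = HUMAN_MADE.difference(FACILITY_BRANCH, FOOD_BRANCH, DRUG_BRANCH)
# Intersection: keep only the food concepts that are also human-made.
food_or_drink = FOOD_BRANCH.intersection(HUMAN_MADE)
# Union: medications plus other addictive substances.
drugs = DRUG_BRANCH.union(ADDICTIVE_BRANCH)


def overlapping_pairs(named_sets):
    """Return (name, name) pairs whose member sets are not disjoint."""
    return [
        (a, b)
        for (a, sa), (b, sb) in combinations(named_sets.items(), 2)
        if not sa.isdisjoint(sb)
    ]


super_types = {"Products": products, "Food or Drink": food_or_drink, "Drugs": drugs}
print(sorted(products))                # ['Car', 'Hammer']
print(overlapping_pairs(super_types))  # []
```

<p>An empty result from <span style="font-family: monospace;">overlapping_pairs</span> means the derived sets are mutually disjoint; any pair it does return points to concepts worth a manual look.</p>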
<h4>UMBEL superTypes</h4>
<p>Here is the schema, with the descriptions of each:</p>
<table style="border-collapse: collapse; width: 684px;" border="0" cellspacing="0" cellpadding="8">
<col style="width: 110pt;" width="146"></col>
<col style="width: 125pt;" width="166"></col>
<col style="width: 449pt;" width="599"></col>
<tbody>
<tr style="height: 25.5pt;" height="34">
<td style="height: 25.5pt; width: 110pt; font-weight: bold; background-color: #cccccc; text-align: center;">Dimension</td>
<td style="border-left: medium none; width: 125pt; font-weight: bold; background-color: #cccccc; text-align: center;" width="166">superType</td>
<td style="border-left: medium none; width: 449pt; font-weight: bold; background-color: #cccccc; text-align: center;" width="599">Description/Sub-types</td>
</tr>
<tr style="height: 63.75pt;" height="85">
<td style="border-top: medium none; height: 63.75pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Natural World</td>
<td style="border-top: medium none; font-weight: bold;">Natural Phenomena</td>
<td style="border-top: medium none; width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes               natural phenomena and natural processes such as weather,               weathering, erosion, fires, lightning, earthquakes, tectonics,               etc. Clouds and weather processes are specifically included. Also               includes climate cycles, general natural events (such as               hurricanes) that are not specifically named, and biochemical               processes and pathways.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold;">Natural Substances</td>
<td style="width: 449pt;" width="599">Notable inclusions are minerals, compounds, chemicals, or physical objects that are not the outcome of purposeful human effort but occur naturally. Other natural objects (such as rocks, fossils, etc.) are also found under this <span style="font-style: italic;">superType</span>.</td>
</tr>
<tr style="height: 102pt;" height="136">
<td style="height: 102pt; font-weight: bold; background-color: #cccccc;" height="136"></td>
<td style="font-weight: bold;">Earthscape</td>
<td style="width: 449pt;" width="599"><p>The Earthscape <span style="font-style: italic;">superType</span> consists mostly of the collection of cartographic features that occur on the surface of the Earth. Positive examples include Mountain, Ocean, and Mesa. Artificial features such as canals are excluded. Most instances of these features have a fixed location in space.</p>
<p>Underground and underwater features are also explicitly included.</p>
<p>This <span style="font-style: italic;">superType</span> is explicitly disjoint with Extraterrestrial (see below).</p></td>
</tr>
<tr style="height: 28.5pt;" height="38">
<td style="height: 28.5pt; font-weight: bold; background-color: #cccccc;" height="38"></td>
<td style="font-weight: bold;">Extraterrestrial</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes all natural things not specifically terrestrial, including celestial bodies (planets, asteroids, stars, galaxies, etc., that can be located within a sky map).</td>
</tr>
<tr style="height: 30pt;" height="40">
<td style="border-top: medium none; height: 30pt; font-weight: bold; background-color: white; vertical-align: top;">Living Things</td>
<td style="border-top: medium none; font-weight: bold;">Prokaryotes</td>
<td style="border-top: medium none; width: 449pt;" width="599">The Prokaryotes include all prokaryotic organisms, including the Monera, Archaebacteria, Bacteria, and blue-green algae. Also included in this <span style="font-style: italic;">superType</span> are viruses and prions.</td>
</tr>
<tr style="height: 28.5pt;" height="38">
<td style="height: 28.5pt; font-weight: bold; background-color: white;" height="38"></td>
<td style="font-weight: bold;">Protists or Fungus</td>
<td style="width: 449pt;" width="599">This is the remaining cluster of eukaryotic organisms, specifically including the fungi and the protista (protozoans and slime molds).</td>
</tr>
<tr style="height: 41.25pt;" height="55">
<td style="height: 41.25pt; font-weight: bold; background-color: white;" height="55"></td>
<td style="font-weight: bold;">Plants</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes all plant types and flora, including flowering plants, algae, non-flowering plants, gymnosperms, cycads, and plant body types. All plant parts are also included.</td>
</tr>
<tr style="height: 63.75pt;" height="85">
<td style="height: 63.75pt; font-weight: bold; background-color: white;" height="85"></td>
<td style="font-weight: bold;">Animals</td>
<td style="width: 449pt;" width="599">This large <span style="font-style: italic;">superType</span> includes all animal types, including specific animal types and vertebrates, invertebrates, insects, crustaceans, fish, reptiles, amphibians, birds and mammals. Animal body parts are specifically included, as are groupings of such animals. Humans, as an animal, are included (versus as an individual Person). Diseases are specifically excluded.</td>
</tr>
<tr style="height: 56.25pt;" height="75">
<td style="height: 56.25pt; font-weight: bold; background-color: white;" height="75"></td>
<td style="font-weight: bold;">Diseases</td>
<td style="width: 449pt;" width="599">Diseases are atypical or unusual or unhealthy conditions for (mostly human) living things, generally known as conditions, disorders, infections, diseases or syndromes. Diseases only affect living things and are sometimes caused by living things. This <span style="font-style: italic;">superType</span> also includes impairments, disease vectors, wounds and injuries, and poisoning.</td>
</tr>
<tr style="height: 63.75pt;" height="85">
<td style="height: 63.75pt; font-weight: bold; background-color: white;" height="85"></td>
<td style="font-weight: bold;">Person Types</td>
<td style="width: 449pt;" width="599">The appropriate <span style="font-style: italic;">superType</span> for all named, individual               human beings. This <span style="font-style: italic;">superType</span> also includes the               assignment of formal, honorific or cultural titles given to               specific human individuals. It further includes names given to               humans who conduct specific jobs or activities (the latter case               is known as an avocation). Examples include steelworker,               waitress, lawyer, plumber, artisan. Ethnic groups are               specifically included.</td>
</tr>
<tr style="height: 181.5pt;" height="242">
<td style="border-top: medium none; height: 181.5pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Human Activities</td>
<td style="border-top: medium none; font-weight: bold;">Organizations</td>
<td style="border-top: medium none; width: 449pt;" width="599"><p>Organization is a broad <span style="font-style: italic;">superType</span> and includes formal collections of humans, sometimes by legal means, charter, agreement or some mode of formal understanding. Examples include geopolitical entities such as nations, municipalities or countries; or companies, institutes, governments, universities, militaries, political parties, game groups, international organizations, trade associations, etc. All institutions, for example, are organizations.</p>
<p>Also included are informal collections of humans. Informal or less defined groupings of humans may result from ethnicity or tribes or nationality or from shared interests (such as social networks or mailing lists) or expertise (&#8220;communities of practice&#8221;). This <span style="font-style: italic;">superType</span> also includes the notion of identifiable human groups with set members at any given point in time. Examples include music groups, cast members of a play, directors on a corporate Board, TV show members, gangs, mobs, juries, generations, minorities, etc.</p>
<p>Finally, Organizations contain the concepts of Industries, Programs and Communities.</p></td>
</tr>
<tr style="height: 42pt;" height="56">
<td style="height: 42pt; font-weight: bold; background-color: #cccccc;" height="56"></td>
<td style="font-weight: bold;">Finance &amp; Economy</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> pertains to all things financial and economic, including chartable company performance, stock index entities, money, local currencies, taxes, incomes, accounts and accounting, mortgages and property.</td>
</tr>
<tr style="height: 54pt;" height="72">
<td style="height: 54pt; font-weight: bold; background-color: #cccccc;" height="72"></td>
<td style="font-weight: bold;">Culture, Issues, Beliefs</td>
<td style="width: 449pt;" width="599">This category includes concepts related to political systems, laws, rules or cultural mores governing societal or community behavior, or doctrinal, faith or religious bases or entities (such as gods, angels, totems) governing spiritual human matters. Culture, issues, beliefs and various activisms (most -isms) are included.</td>
</tr>
<tr style="height: 53.25pt;" height="71">
<td style="height: 53.25pt; font-weight: bold; background-color: #cccccc;" height="71"></td>
<td style="font-weight: bold;">Activities</td>
<td style="width: 449pt;" width="599">These are ongoing activities that result (mostly) from human effort, often conducted by organizations to assist other organizations or individuals (in which case they are known as services, such as medicine, law, printing, consulting or teaching), or individual or group efforts for leisure, fun, sports, games or personal interests (activities).</td>
</tr>
<tr style="height: 51pt;" height="68">
<td style="border-top: medium none; height: 51pt; font-weight: bold; background-color: white; vertical-align: top;">Human Works</td>
<td style="border-top: medium none; font-weight: bold;">Products</td>
<td style="border-top: medium none; width: 449pt;" width="599">This is the largest <span style="font-style: italic;">superType</span> and includes any instance offered for sale or performed as a commercial service. Its members are often physical objects made by humans that are not conceptual works or facilities, such as vehicles, cars, trains, aircraft, spaceships, ships, foods, beverages, clothes, drugs and weapons. Products also include the concept of &#8216;state&#8217; (e.g., on/off).</td>
</tr>
<tr style="height: 25.5pt;" height="34">
<td style="height: 25.5pt; font-weight: bold; background-color: white;" height="34"></td>
<td style="font-weight: bold;">Food or Drink</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is any edible substance grown, made or harvested by humans. The category also specifically includes the concept of cuisines.</td>
</tr>
<tr>
<td style="height: 12.75pt; font-weight: bold; background-color: white;"></td>
<td style="font-weight: bold;">Drugs</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is for any drug, medication or addictive substance.</td>
</tr>
<tr style="height: 143.25pt;" height="191">
<td style="height: 143.25pt; font-weight: bold; background-color: white;" height="191"></td>
<td style="font-weight: bold;">Facilities</td>
<td style="width: 449pt;" width="599"><p>Facilities are physical places or buildings constructed by humans, such as schools, public institutions, markets, museums, amusement parks, places of worship, stations, airports, ports, car stops, lines, railroads, roads, waterways, tunnels, bridges, parks, sports facilities and monuments. All can be geospatially located.</p>
<p>Facilities also include animal pens and enclosures and general human &#8220;activity&#8221; areas (golf courses, archeology sites, etc.). Importantly, Facilities include infrastructure systems such as roadways and physical networks.</p>
<p>Facilities also include the component parts that go into making them (such as foundations, doors, windows, roofs, etc.).</p></td>
</tr>
<tr style="height: 39.75pt;" height="53">
<td style="border-top: medium none; height: 39.75pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Information</td>
<td style="border-top: medium none; font-weight: bold;">Chemistry (n.o.c)</td>
<td style="border-top: medium none; width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is a residual category (n.o.c., not otherwise categorized) for chemical bonds, chemical composition groupings, and the like. It covers chemistry concepts that are neither natural substances nor organic (living-thing) substances.</td>
</tr>
<tr style="height: 27.75pt;" height="37">
<td style="height: 27.75pt; font-weight: bold; background-color: #cccccc;" height="37"></td>
<td style="font-weight: bold;">Audio Info</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is for any audio-only human work. Examples include live music performances, record albums, radio shows and individual radio broadcasts.</td>
</tr>
<tr style="height: 27.75pt;" height="37">
<td style="height: 27.75pt; font-weight: bold; background-color: #cccccc;" height="37"></td>
<td style="font-weight: bold;">Visual Info</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes any still image, picture or streaming video human work, with or without audio. Examples include graphics, pictures, movies, TV shows, individual episodes of a TV show, etc.</td>
</tr>
<tr style="height: 28.5pt;" height="38">
<td style="height: 28.5pt; font-weight: bold; background-color: #cccccc;" height="38"></td>
<td style="font-weight: bold;">Written Info</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes any general material written by humans, including books, blogs, articles and manuscripts, as well as any other written information conveyed via text.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold;">Structured Info</td>
<td style="width: 449pt;" width="599">This information <span style="font-style: italic;">superType</span> is for all kinds of structured information and datasets, including computer programs, databases, files, Web pages and structured data that can be presented in tabular form.</td>
</tr>
<tr style="height: 127.5pt;" height="170">
<td style="height: 127.5pt; font-weight: bold; background-color: #cccccc;" height="170"></td>
<td style="font-weight: bold;">Notations &amp; References</td>
<td style="width: 449pt;" width="599"><p>Akin to conceptual works, these are codified means of human expression. Examples range from human languages themselves to more domain-specific cases such as chemical symbols, genetic code (A-G-C-T), protocols, computer languages, and mathematical and set notations.</p>
<p>Identifiers (numeric or alphanumeric identifiers for objects, often in a highly patterned way, such as phone numbers, URLs, zip and postal codes, SKUs, product codes, etc.), Units (any of the various ways in which measurement, space, volume, weight, speed, intensity, temperature, calories, seismic intensity or other quantitative descriptions of phenomena can be made) and key reference types are also included in this <span style="font-style: italic;">superType</span>.</p></td>
</tr>
<tr style="height: 16.5pt;" height="22">
<td style="height: 16.5pt; font-weight: bold; background-color: #cccccc;"></td>
<td style="font-weight: bold;">Numbers</td>
<td style="width: 449pt;">This unique <span style="font-style: italic;">superType</span> is for any abstract representation of numbers and numerics.</td>
</tr>
<tr style="height: 27pt;" height="36">
<td style="border-top: medium none; height: 27pt; font-weight: bold; background-color: white; vertical-align: top;">Human Places</td>
<td style="border-top: medium none; font-weight: bold;">Geopolitical</td>
<td style="border-top: medium none; width: 449pt;">Named places that have some informal or formal political               (authorized) component. Important subcollections include Country,               IndependentCountry, State_Geopolitical, City, and Province.</td>
</tr>
<tr style="height: 27pt;" height="36">
<td style="height: 27pt; font-weight: bold; background-color: white;"></td>
<td style="font-weight: bold;">Workplaces, etc.</td>
<td style="width: 449pt;">These are various workplaces and areas of human activities, ranging from single-person workstations to large aggregations of people (but which are not formal political entities).</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="border-top: medium none; height: 38.25pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Time-related</td>
<td style="border-top: medium none; font-weight: bold;">Events</td>
<td style="border-top: medium none; width: 449pt;">These are nameable occasions, games, sports events, conferences, natural phenomena, natural disasters, wars, incidents, anniversaries, holidays, or notable moments or periods in time.</td>
</tr>
<tr style="height: 27.75pt;" height="37">
<td style="height: 27.75pt; font-weight: bold; background-color: #cccccc;"></td>
<td style="font-weight: bold;">Time</td>
<td style="width: 449pt;">This <span style="font-style: italic;">superType</span> is for references to specific times, dates or periods (such as eras, or intervals such as days, weeks and months) in various formats.</td>
</tr>
<tr style="height: 51pt;" height="68">
<td style="border-top: medium none; height: 51pt; font-weight: bold; background-color: white; vertical-align: top;">Descriptive</td>
<td style="border-top: medium none; font-weight: bold; background-color: #ffffcc;">Attributes</td>
<td style="border-top: medium none; width: 449pt; background-color: #ffffcc;">This general <span style="font-style: italic;">superType</span> category is for descriptive attributes of all kinds. Think of the specific attributes in Wikipedia &#8220;infoboxes&#8221; to understand the purpose and coverage of this <span style="font-style: italic;">superType</span>. It includes colors, shapes, sizes, or other descriptive characteristics of an object.</td>
</tr>
<tr style="height: 51pt;" height="68">
<td style="border-top: medium none; height: 51pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Classificatory</td>
<td style="border-top: medium none; font-weight: bold; background-color: #ffffcc;">Abstract-level</td>
<td style="border-top: medium none; width: 449pt; background-color: #ffffcc;" width="599">This general <span style="font-style: italic;">superType</span> category is largely composed of the former AbstractConcepts, and represents some of the more abstract upper-level nodes for connecting the UMBEL structure together. This <span style="font-style: italic;">superType</span> also includes theories, processes or methods for humans to do things, as well as any human technology.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold; background-color: #ffffcc;">Topics/Categories</td>
<td style="width: 449pt; background-color: #ffffcc;" width="599">This largely subject-oriented <span style="font-style: italic;">superType</span> is a means for using controlled vocabularies and classification schemes for characterizing what content &#8220;is about&#8221;. The key constituents of this category are Types, Classifications, Concepts, Topics, and controlled vocabularies.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold; background-color: #ffffcc;">Markets &amp; Industries</td>
<td style="width: 449pt; background-color: #ffffcc;" width="599">This <span style="font-style: italic;">superType</span> is a               specialized classificatory system for markets and industries. It               could be combined with the <span style="font-style: italic;">superType</span> above, but is kept               separate in order to provide a separate, economy-oriented system.</td>
</tr>
</tbody>
</table>
<p>These may undergo some further refinement prior to the release of UMBEL v0.80, and some of the definitions will be tightened up.</p>
<p>(Note: Some of these <span style="font-style: italic;">superTypes</span> also lend themselves to further splits and analysis. The Products <span style="font-style: italic;">superType</span>, for example, is ripe for such treatment.)</p>
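<p>For programmatic use (for example, tagging records by <span style="font-style: italic;">superType</span>), the table above could be held in a simple lookup structure. The following Python sketch is one possible encoding of our own devising, not an UMBEL file format; it also checks the counts given earlier (9 dimensions, 33 <span style="font-style: italic;">superTypes</span>, 29 of them disjoint).</p>

```python
# One possible encoding of the superType table: dimension to superTypes.
# The structure is an assumption for illustration, not an UMBEL file format.
SCHEMA = {
    "Natural World": ["Natural Phenomena", "Natural Substances", "Earthscape",
                      "Extraterrestrial"],
    "Living Things": ["Prokaryotes", "Protists or Fungus", "Plants", "Animals",
                      "Diseases", "Person Types"],
    "Human Activities": ["Organizations", "Finance & Economy",
                         "Culture, Issues, Beliefs", "Activities"],
    "Human Works": ["Products", "Food or Drink", "Drugs", "Facilities"],
    "Information": ["Chemistry (n.o.c)", "Audio Info", "Visual Info",
                    "Written Info", "Structured Info", "Notations & References",
                    "Numbers"],
    "Human Places": ["Geopolitical", "Workplaces, etc."],
    "Time-related": ["Events", "Time"],
    "Descriptive": ["Attributes"],
    "Classificatory": ["Abstract-level", "Topics/Categories",
                       "Markets & Industries"],
}

# The four cross-cutting superTypes (shaded in the table).
CROSS_CUTTING = {"Attributes", "Abstract-level", "Topics/Categories",
                 "Markets & Industries"}

# Reverse index from each superType to its dimension.
DIMENSION_OF = {st: dim for dim, sts in SCHEMA.items() for st in sts}
DISJOINT = [st for st in DIMENSION_OF if st not in CROSS_CUTTING]

print(len(SCHEMA), len(DIMENSION_OF), len(DISJOINT))  # 9 33 29
print(DIMENSION_OF["Drugs"])                          # Human Works
```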
<h4>Distribution of superTypes</h4>
<p>The following diagram shows the distribution of the 20,000 UMBEL concepts across the major <span style="font-style: italic;">superTypes</span>. By far the largest <span style="font-style: italic;">superType</span> is Products, even after splitting out Food or Drink and Drugs (Pharmaceuticals). The next largest are the Person, Places and Events <span style="font-style: italic;">superTypes</span>, with Organizations and Animals not far behind:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090831_supertypes_count.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 527px;" title="Click to expand" src="../wp-content/themes/ai3/images/2009Posts/090831_supertypes_count.png" alt="# of superTypes by Category" width="792" height="696" /></a></div>
<p>Even in its generic state, UMBEL provides a very rich vocabulary for         describing things or for tying in more detailed external ontologies.         There are nearly 5,000 concepts across products of all types, for         example.</p>
<h4>Possible Overlaps (non-disjoint) between superTypes</h4>
<p>You may recall that our analysis showed 29 of the <span style="font-style: italic;">superTypes</span> to be &#8220;mostly disjoint.&#8221;          This is because there are some concepts &#8212; say, <span style="font-family: monospace;">MusicPerformingAgent</span> &#8212;         that can apply to either a person or a group (band or orchestra, for         example). Thus, for this concept alone, we have a bit of overlap         between the normally disjoint Person and Organization <span style="font-style: italic;">superTypes</span>.</p>
<p>The following shows the resulting interaction matrix where there may be         some overlap between <span style="font-style: italic;">superTypes</span>:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090831_UMBELmatrix.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 513px;" title="Click to expand" src="../wp-content/themes/ai3/images/2009Posts/090831_UMBELmatrix.png" alt="Instance superTypes Overlap" width="856" height="732" /></a></div>
<p>This kind of interaction diagram is also useful for further analyzing the concept graph structure.</p>
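<p>A matrix like this can be tallied by counting, for every pair of <span style="font-style: italic;">superTypes</span>, the concepts assigned to both. The Python sketch below assumes each concept&#8217;s <span style="font-style: italic;">superType</span> assignments are available as a set; apart from <span style="font-family: monospace;">MusicPerformingAgent</span>, the concepts and their assignments are hypothetical.</p>

```python
from collections import Counter
from itertools import combinations

# Hypothetical superType assignments per concept; only MusicPerformingAgent's
# dual Person/Organization assignment comes from the discussion above.
ASSIGNMENTS = {
    "MusicPerformingAgent": {"Person Types", "Organizations"},
    "Guitarist": {"Person Types"},
    "Orchestra": {"Organizations"},
    "Aspirin": {"Drugs", "Products"},
    "Wine": {"Food or Drink", "Products"},
    "Car": {"Products"},
}


def interaction_counts(assignments):
    """Count, for each unordered pair of superTypes, the concepts in both."""
    counts = Counter()
    for super_types in assignments.values():
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(super_types), 2):
            counts[pair] += 1
    return counts


def interaction_matrix(counts, names):
    """Symmetric matrix (list of rows) of pairwise overlap counts."""
    return [[0 if a == b else counts[tuple(sorted((a, b)))] for b in names]
            for a in names]


counts = interaction_counts(ASSIGNMENTS)
names = ["Person Types", "Organizations", "Products", "Drugs"]
for name, row in zip(names, interaction_matrix(counts, names)):
    print(f"{name:15}", row)
```

<p>Missing pairs simply read as zero, because a <span style="font-family: monospace;">Counter</span> returns 0 for keys it has never seen; a mostly zero matrix is what a largely disjoint schema should produce.</p>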
<h4>Even Where Overlaps Occur, They are Minor</h4>
<p>Of the 29 &#8220;mostly&#8221; disjoint <span style="font-style: italic;">superTypes</span>, only relatively few show potential interactions, and then only in minor ways. We can illustrate this (drawn to scale) for the interaction between the Products, Food &amp; Drink and Drugs (Pharmaceuticals) <span style="font-style: italic;">superTypes</span>, with the Organizations <span style="font-style: italic;">superType</span>, which is fully disjoint from all three, thrown in for comparison:</p>
<div style="margin: 10px 0px;"><img class="center_ok" style="border: 0px solid; width: 380px; height: 519px;" title="Example superTypes Overlap" src="../wp-content/themes/ai3/images/2009Posts/090831_supertypes_venn.png" alt="Example superTypes Overlap" width="380" height="519" /></div>
<p>Across all 20,000 concepts, then, fully 85% are disjoint from one another (5% is lost due to overlaps between &#8220;mostly&#8221; disjoint <span style="font-style: italic;">superTypes</span>). This is a surprisingly high percentage, which makes UMBEL even more likely to deliver the benefits of disjointness noted earlier.</p>
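<p>Percentages like these come from classifying each concept by how many disjoint <span style="font-style: italic;">superTypes</span> it falls into. This Python sketch shows one way to do the tally; the four concepts and their assignments are hypothetical, so the shares it prints illustrate the calculation only and are not UMBEL&#8217;s figures.</p>

```python
# Hypothetical assignments: concept to the set of superTypes it falls under.
ASSIGNMENTS = {
    "Guitarist": {"Person Types"},
    "Orchestra": {"Organizations"},
    "MusicPerformingAgent": {"Person Types", "Organizations"},
    "Pink": {"Attributes"},
}
CROSS_CUTTING = {"Attributes", "Abstract-level", "Topics/Categories",
                 "Markets & Industries"}


def disjointness_summary(assignments, cross_cutting):
    """Share of concepts in exactly one disjoint superType ("disjoint"), in two
    or more ("overlap"), or in none, i.e. only cross-cutting ones."""
    tally = {"disjoint": 0, "overlap": 0, "cross-cutting only": 0}
    for super_types in assignments.values():
        n = len(super_types - cross_cutting)  # ignore cross-cutting superTypes
        if n == 0:
            tally["cross-cutting only"] += 1
        elif n == 1:
            tally["disjoint"] += 1
        else:
            tally["overlap"] += 1
    total = len(assignments)
    return {kind: count / total for kind, count in tally.items()}


print(disjointness_summary(ASSIGNMENTS, CROSS_CUTTING))
# {'disjoint': 0.5, 'overlap': 0.25, 'cross-cutting only': 0.25}
```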
<h3>Interim Conclusions and Observations</h3>
<p>These are exciting findings that bode well for UMBEL&#8217;s ongoing role and usefulness. Also, the very detailed analysis that has led to these interim findings strongly reaffirms the wisdom of basing UMBEL on Cyc. Cyc showed itself to be admirably coherent and remarkably complete. (It also appears that the first versions of UMBEL were extracted well, with good coverage.)</p>
<p>This approach now gives us an understandable and defensible basis for the logical segmentation of UMBEL. It also provides a much-desired alternative to the earlier Abstract Concepts, which will now be dropped entirely as a schema concept.</p>
<p>One area deserving further attention is the Attributes <span style="font-style: italic;">superType</span>. We are in the process, for example, of analyzing attributes across Wikipedia and need to look at this <span style="font-style: italic;">superType</span> through a slightly different lens <a href="#st6">[6]</a>. This area is also important because of its strong interaction with the <a href="../478/making-linked-data-reasonable-using-description-logics-part-4/">Instance Record Vocabulary</a> (IRV) that accompanies this effort on the entity side.</p>
<p>Another lesson for us has been to back away from the terminology of <em>named entity</em>, introduced at MUC-6. The extension of that idea to other &#8220;nameable&#8221; things has led us to embrace the &#8220;instance&#8221; nomenclature, as evidenced by our emerging IRV.</p>
<p>It is rewarding to prepare this next iteration of UMBEL with its new mindset of logical segmentation and disjointness. But what is also clear is that many treasures remain to be mined from the inherent structure of UMBEL and its Cyc parent.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st1"></a> [1] The original labels were ENAMEX for <span style="font-style: italic;">entity named expression</span> and NUMEX for <span style="font-style: italic;">numeric expression</span>. The specified markup format was SGML. For an interesting history of this MUC-6 watershed, see Ralph Grishman and Beth Sundheim, 1996. <em><a title="http://acl.ldc.upenn.edu/C/C96/C96-1079.pdf" rel="nofollow" href="http://acl.ldc.upenn.edu/C/C96/C96-1079.pdf">Message Understanding Conference-6: A Brief History</a></em>, in <em>Proceedings of the 16th International Conference on Computational Linguistics (COLING),</em> Vol. I, Copenhagen, 1996, pp. 466–471.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st2"></a> <p>[2] In <em>named entity</em>, the word <em>named</em> restricts the term to entities for which one or more &#8220;rigid designators,&#8221; as defined by Kripke, stand for the referent. For instance, the automotive company created by Henry Ford in 1903 is referred to as Ford or Ford Motor Company. Rigid designators include proper names as well as certain natural kind terms, such as biological species and substances.</p>
<p><span style="font-size: x-small;">Sekine’s <a href="http://nlp.cs.nyu.edu/ene/version6_1_0eng.html">extended hierarchy</a>, proposed in 2002, is made up of 200 subtypes, with 32 larger clusters within that. Here is the top level of the Sekine type system:</span></p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td><span style="font-size: x-small;">Name-Other</span></td>
<td><span style="font-size: x-small;">Title</span></td>
<td><span style="font-size: x-small;">Timex</span></td>
<td><span style="font-size: x-small;">Frequency</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Person</span></td>
<td><span style="font-size: x-small;">Unit</span></td>
<td><span style="font-size: x-small;">Periodx</span></td>
<td><span style="font-size: x-small;">Rank</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Organization</span></td>
<td><span style="font-size: x-small;">Vocation</span></td>
<td><span style="font-size: x-small;">Numex-Other</span></td>
<td><span style="font-size: x-small;">Age</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Location</span></td>
<td><span style="font-size: x-small;">Disease</span></td>
<td><span style="font-size: x-small;">Money</span></td>
<td><span style="font-size: x-small;">School Age</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Facility</span></td>
<td><span style="font-size: x-small;">God</span></td>
<td><span style="font-size: x-small;">Stock Index</span></td>
<td><span style="font-size: x-small;">Latitude Longitude</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Product</span></td>
<td><span style="font-size: x-small;">ID Number</span></td>
<td><span style="font-size: x-small;">Point</span></td>
<td><span style="font-size: x-small;">Measurement</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Event</span></td>
<td><span style="font-size: x-small;">Color</span></td>
<td><span style="font-size: x-small;">Percent</span></td>
<td><span style="font-size: x-small;">Countx</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Natural Object</span></td>
<td><span style="font-size: x-small;">Time-Other</span></td>
<td><span style="font-size: x-small;">Multiplication</span></td>
<td><span style="font-size: x-small;">Ordinal Number</span></td>
</tr>
</tbody>
</table>
<p><span style="font-size: x-small;">Though developed separately and for different purposes, the <a href="http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html">BBN categories</a>, also proposed in 2002, consist of 29 types and 64 subtypes. Here are the BBN types (note: BBN counts 29 types because the first five entries have double entries or considerations):</span></p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td><span style="font-size: x-small;">Person</span></td>
<td><span style="font-size: x-small;">Time</span></td>
<td><span style="font-size: x-small;">Animal</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">NORP (adjectival GPEs)</span></td>
<td><span style="font-size: x-small;">Percent</span></td>
<td><span style="font-size: x-small;">Substance</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Facility</span></td>
<td><span style="font-size: x-small;">Money</span></td>
<td><span style="font-size: x-small;">Disease</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Organization</span></td>
<td><span style="font-size: x-small;">Quantity</span></td>
<td><span style="font-size: x-small;">Work of Art</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">GPE (geopolitical places)</span></td>
<td><span style="font-size: x-small;">Ordinal</span></td>
<td><span style="font-size: x-small;">Law</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Location</span></td>
<td><span style="font-size: x-small;">Cardinal</span></td>
<td><span style="font-size: x-small;">Language</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Product</span></td>
<td><span style="font-size: x-small;">Events</span></td>
<td><span style="font-size: x-small;">Contact Info</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Date</span></td>
<td><span style="font-size: x-small;">Plant</span></td>
<td><span style="font-size: x-small;">Game</span></td>
</tr>
</tbody>
</table>
<p><span style="font-size: x-small;">Of course, other entity extraction systems have similar clusterings and approaches. Though less formal, with neither a hierarchy nor a claim of complete entity coverage, here for example is the listing of entity types within <a href="http://opencalais.com/documentation/calais-web-service-api/api-metadata/entity-index-and-definitions">Calais</a>:</span></p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td><span style="font-size: x-small;">Anniversary</span></td>
<td><span style="font-size: x-small;">FaxNumber</span></td>
<td><span style="font-size: x-small;">NaturalFeature</span></td>
<td><span style="font-size: x-small;">RadioProgram</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">City</span></td>
<td><span style="font-size: x-small;">Holiday</span></td>
<td><span style="font-size: x-small;">OperatingSystem</span></td>
<td><span style="font-size: x-small;">RadioStation</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Company</span></td>
<td><span style="font-size: x-small;">IndustryTerm</span></td>
<td><span style="font-size: x-small;">Organization</span></td>
<td><span style="font-size: x-small;">Region</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Continent</span></td>
<td><span style="font-size: x-small;">MarketIndex</span></td>
<td><span style="font-size: x-small;">Person</span></td>
<td><span style="font-size: x-small;">SportsEvent</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Country</span></td>
<td><span style="font-size: x-small;">MedicalCondition</span></td>
<td><span style="font-size: x-small;">PhoneNumber</span></td>
<td><span style="font-size: x-small;">SportsGame</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Currency</span></td>
<td><span style="font-size: x-small;">Movie</span></td>
<td><span style="font-size: x-small;">Position</span></td>
<td><span style="font-size: x-small;">SportsLeague</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">EmailAddress</span></td>
<td><span style="font-size: x-small;">MusicAlbum</span></td>
<td><span style="font-size: x-small;">Product</span></td>
<td><span style="font-size: x-small;">Technology</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">EntertainmentAwardEvent</span></td>
<td><span style="font-size: x-small;">MusicGroup</span></td>
<td><span style="font-size: x-small;">ProgrammingLanguage</span></td>
<td><span style="font-size: x-small;">TVShow</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Facility</span></td>
<td><span style="font-size: x-small;">NaturalDisaster</span></td>
<td><span style="font-size: x-small;">ProvinceOrState</span></td>
<td><span style="font-size: x-small;">TVStation</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><span style="font-size: x-small;"> </span></td>
<td style="vertical-align: top;"><span style="font-size: x-small;"> </span></td>
<td style="vertical-align: top;"><span style="font-size: x-small;">PublishedMedium</span></td>
<td style="vertical-align: top;"><span style="font-size: x-small;">URL</span></td>
</tr>
</tbody>
</table>
<p><span style="font-size: x-small;">See further the Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">named entity         recognition</a>.</span></div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st3"></a> [3] We use the term “<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>” in accordance with our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:</p>
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.&#8221;</div>
</div>
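<p>To make the TBox/ABox split concrete, here is a minimal Python sketch. It is an illustration only, not UMBEL&#8217;s or any reasoner&#8217;s actual data model, and all class and instance names are hypothetical. The TBox holds subclass relations (the schema or taxonomy); the ABox holds class-membership assertions and attributes about instances; a query then uses the TBox to interpret the ABox.</p>

```python
# Hypothetical TBox: terminological knowledge (schema/taxonomy), child -> parent.
TBOX_SUBCLASS_OF = {
    "Company": "Organization",
    "Organization": "Agent",
    "Person": "Agent",
}

# Hypothetical ABox: assertions about instances (class membership and attributes).
ABOX_TYPES = {
    "ford_motor_company": "Company",
    "henry_ford": "Person",
}
ABOX_ATTRIBUTES = {
    "ford_motor_company": {"foundedIn": 1903},
}


def superclasses(cls):
    """Return cls plus every ancestor reachable through the TBox."""
    result = [cls]
    while cls in TBOX_SUBCLASS_OF:
        cls = TBOX_SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(cls):
    """ABox instances whose asserted class is cls or one of its TBox subclasses."""
    return sorted(i for i, c in ABOX_TYPES.items() if cls in superclasses(c))


print(instances_of("Agent"))         # ['ford_motor_company', 'henry_ford']
print(instances_of("Organization"))  # ['ford_motor_company']
```

<p>Note that the query never lists &#8220;Company&#8221; explicitly: membership in Organization follows from the TBox, while the fact itself lives in the ABox.</p>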
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st4"></a> [4] UMBEL also provides a <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a>-based vocabulary extension for describing other domains and mappings between classes and instances. This purpose, however, is outside the scope of this article.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st5"></a> [5] As a reference roadmap, UMBEL was specifically designed <span style="font-weight: bold; font-style: italic;">not</span> to include <a href="http://en.wikipedia.org/wiki/Meronymy">meronymous</a> (part of) relationships (see further this reference). Instead, all &#8220;part of&#8221; type concepts were assigned to the <span style="font-style: italic;">superType</span> category of the whole of which they are a part. Thus, &#8220;animal parts&#8221; are assigned to the <span style="font-style: italic;">superType</span> Animal, and &#8220;car parts&#8221; to the <span style="font-style: italic;">superType</span> Product.</div>
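<p>The assignment rule in note [5] can be sketched in a few lines of Python. The concept names and both lookup tables below are hypothetical illustrations, not UMBEL&#8217;s actual data; the point is that a &#8220;part of&#8221; concept simply inherits the <span style="font-style: italic;">superType</span> of its whole instead of receiving a meronymous relation.</p>

```python
# Hypothetical "part of" concepts, each mapped to the whole it belongs to.
PART_OF = {
    "AnimalPart": "Animal",
    "Tire": "Wheel",  # a part of a part still resolves to the whole
    "Wheel": "Car",
}

# Hypothetical superType assignments for whole concepts.
SUPERTYPE = {
    "Animal": "Animal",
    "Car": "Product",
}


def assign_supertype(concept):
    """Follow part-of links up to the whole, then return that whole's superType."""
    seen = set()
    while concept in PART_OF:
        if concept in seen:
            raise ValueError("cycle in part-of table at " + concept)
        seen.add(concept)
        concept = PART_OF[concept]
    return SUPERTYPE[concept]


print(assign_supertype("AnimalPart"))  # Animal
print(assign_supertype("Tire"))        # Product
```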
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st6"></a> [6] For a general discussion of attributes and their relation to         entities, see Satoshi Sekine, 2008. Extended Named Entity Ontology with         Attribute Information, in <span style="font-style: italic;">Proceedings         of the 6th edition of the Language Resources and Evaluation Conference         (LREC 2008)</span>. Marrakech, Morocco. See <a href="http://www.lrec-conf.org/proceedings/lrec2008/pdf/21_paper.pdf">http://www.lrec-conf.org/proceedings/lrec2008/pdf/21_paper.pdf</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Confronting Misconceptions with Adaptive Ontologies</title>
		<link>http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/</link>
		<comments>http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 17:59:00 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[data federation]]></category>
		<category><![CDATA[data-driven applications]]></category>
		<category><![CDATA[interoperability]]></category>
		<category><![CDATA[Ontology]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[TBox]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=553</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Confronting Misconceptions with Adaptive Ontologies&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/&amp;rft.language=English"></span>

Ontology Best Practices for Data-driven Applications: Part 4
The earlier portions of this occasional  series have set the groundwork for the role of ontologies in  data-driven applications. In this part, I address many of the current  misconceptions of what ontologies do or do not do. For, as practiced by  Structured Dynamics, our [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Confronting Misconceptions with Adaptive Ontologies&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/&amp;rft.language=English"></span>
<p><a href="http://structureddynamics.com/"><img style="border: 0px solid; width: 204px; height: 236px; float: left; margin-right: 10px;" title="Structured Dynamics LLC" src="../wp-content/themes/ai3/images/2008Posts/080505_impossible3.gif" alt="Structured Dynamics LLC" hspace="0" vspace="5" align="left" /></a></p>
<h2>Ontology Best Practices for Data-driven Applications: Part 4</h2>
<p>The earlier portions of this <a href="../category/ontology-best-practices/">occasional series</a> have set the groundwork for the role of ontologies in data-driven applications. In this part, I address many of the current misconceptions about what ontologies do or do not do. As practiced by <a href="http://structureddynamics.com/">Structured Dynamics</a>, our adaptive TBox-level ontologies <a href="#adapto1">[1]</a> are definitely not your grandfather&#8217;s Oldsmobile.</p>
<p>To share the punch line early, these modern ontologies are fast to  develop, easy to change, adaptive to new knowledge and perceptions,  robust and flexible. Indeed, it is the structure and nature of these  adaptive ontologies that is the heart and secret of <span style="font-weight: bold;">data-driven applications</span>.</p>
<p>Any knowledge worker can understand and refine the organization and relationships of information via these structures. And, most importantly, the resulting ontologies are sufficient to drive the generic applications that are based on them. The emphasis now shifts to data and structure. We can now remove prior bottlenecks arising from the need to customize applications, configure report writers, or wait for IT to generate SQL queries.</p>
<p>But not all ontologies are created equal, and not all practitioners explain or see them in the same way. The purpose of this Part 4 in <a href="../category/ontology-best-practices/">our series</a> is to present many of these misconceptions, offering a score of takeaway messages on how properly considered and constructed ontologies can achieve these benefits.</p>
<h3><span style="font-style: italic;">Misconception:</span> No &#8216;Big Bang&#8217;  Needed</h3>
<p>To be sure, there are many very large and comprehensive ontologies.  Some are focused on specific applications or domains; some are general;  and some are the result of large and well-funded projects <a href="#adapto2">[2]</a>. I am not  arguing that such efforts do not have their role and place. But when  viewed as exemplars or notable cases, these complex and comprehensive  ontologies can create a misconception that such a scope is an  imperative of proper ontology design.</p>
<p>I believe quite the opposite to be true.</p>
<p>An incredible strength of RDF and OWL ontologies is that they can be built incrementally. So long as additions remain coherent and reasonably self-consistent with the world view in which they are represented, any of an ontology&#8217;s constituent concepts, predicates, entities and datasets can be added and enhanced as needed. This makes ontologies a very different cat from relational schemas, which are notoriously brittle and require expensive re-architecting whenever scope or schema changes.</p>
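<p>A minimal sketch of why incremental growth is cheap: if the ontology and its data are held as a set of (subject, predicate, object) triples, new classes, predicates and records are simply added alongside the existing ones, and queries written earlier keep working unchanged. This uses plain Python sets rather than an RDF library, and every name in it is hypothetical.</p>

```python
# A tiny triple store: a set of (subject, predicate, object) tuples.
graph = {
    ("Customer", "rdfs:subClassOf", "Agent"),
    ("acme", "rdf:type", "Customer"),
    ("acme", "hasName", "Acme Corp."),
}


def objects(g, subject, predicate):
    """All objects for a given subject and predicate (an earlier, fixed query)."""
    return {o for s, p, o in g if s == subject and p == predicate}


before = objects(graph, "acme", "hasName")

# Later: extend the scope with a new class, a new predicate and new records.
# No existing triple is altered, so there is no schema migration step.
graph |= {
    ("Supplier", "rdfs:subClassOf", "Agent"),
    ("globex", "rdf:type", "Supplier"),
    ("acme", "buysFrom", "globex"),
}

after = objects(graph, "acme", "hasName")
print(before == after)                     # True
print(objects(graph, "acme", "buysFrom"))  # {'globex'}
```

<p>By contrast, adding a comparable relationship to a relational design typically means altering a table definition that existing code already depends on.</p>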
<p>Enterprise consultants who advocate &#8220;big&#8221; upfront ontology development efforts are doing their clients a massive disservice. They are also cynically playing on their clients&#8217; experience with relational schemas. As soon as the marketplace begins to realize that ontologies are incredibly plastic and malleable, this huge advantage of ontologies over the relational model for data federation will ring clear.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #1</span><span style="text-decoration: underline;">:</span> Ontologies can (and should!) start small.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #2</span><span style="text-decoration: underline;">:</span> Ontologies can (and should!) grow incrementally.</p>
<h3><span style="font-style: italic;">Misconception:</span> No &#8216;One Ring to  Rule Them All&#8217;</h3>
<p>As a practitioner, two of the most boring arguments I hear are:  <span style="font-style: italic;">Ontology X is better than other  ontologies and here is why</span>; and, <span style="font-style: italic;">Use of some reference or upper ontology reduces  choice and freedom</span>. Both arguments are somewhat grounded in the  <a href="http://en.wikipedia.org/wiki/One_ring_to_rule_them_all">&#8216;one  ring to rule them all&#8217;</a> mindset &#8212; though coming from opposing perspectives &#8212; that I think fundamentally  misreads the role and purpose of ontologies.</p>
<p>Ontologies provide an organizing context for relating disparate information together and for making meaningful inferences. Without such a framework these purposes cannot be achieved. But the framework itself is a function of the world view, context and domain scope at hand. As a result, there is only context, and not some single, universal &#8220;truth.&#8221; As they say, it all depends.</p>
<p>The trick, then, to properly designed ontologies is to maintain internal coherence and self-consistency <a href="#adapto3">[3]</a>. Once that is done, it becomes possible to relate disparate information and data to other data and to make intelligent business inferences.</p>
<p>So, the use of an ontology does not limit freedom. It sets the context  for making connections and setting relations. And, as long as it is  coherent, the &#8220;correct&#8221; ontology is the one that best captures the  scope and domain at hand. Arguing for one ontology <span style="font-style: italic;">v</span> another is wasted energy. Just get on  with it.</p>
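<p>One simple, mechanical form of the self-consistency discussed here can be sketched as a check that no instance falls, directly or through subclassing, into two classes declared disjoint. This is an illustrative Python sketch with hypothetical class and instance names; real OWL reasoners perform far more general consistency checking.</p>

```python
# Hypothetical TBox: subclass links and one disjointness axiom.
SUBCLASS_OF = {
    "Company": "Organization",
    "Employee": "Person",
}
DISJOINT = [("Organization", "Person")]

# Hypothetical ABox type assertions; "acme" is (wrongly) typed twice.
TYPES = {
    "acme": {"Company", "Employee"},
    "jane": {"Employee"},
}


def ancestors(cls):
    """cls plus all of its superclasses."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def incoherent_instances():
    """Return (instance, class_a, class_b) for every disjointness violation."""
    problems = []
    for inst, classes in sorted(TYPES.items()):
        closure = set().union(*(ancestors(c) for c in classes))
        for a, b in DISJOINT:
            if a in closure and b in closure:
                problems.append((inst, a, b))
    return problems


print(incoherent_instances())  # [('acme', 'Organization', 'Person')]
```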
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #3</span><span style="text-decoration: underline;">:</span> There is no single &#8220;truth&#8221;, only coherence and relevant context.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Such Thing  as an &#8216;Ontological Commitment&#8217;</h3>
<p>One of the more pernicious ideas promoted by some practitioners or  advocates is the idea of &#8216;<a href="http://en.wikipedia.org/wiki/Ontological_commitment">ontological  commitment</a>.&#8217; Though some definitions are relatively benign, such as  the one offered by the Stanford Knowledge Systems Laboratory (KSL) <a href="#adapto4">[4]</a>,  the unfortunate use of the term &#8220;commitment&#8221; implies permanence and  immutability. (In fact, most definitions of this phrase affirm this  interpretation.)</p>
<p>This is really unfortunate, as it again tends to reinforce the  inaccurate analogies with brittle and inflexible relational schema.</p>
<p>A much better way to view ontologies is not as a &#8220;commitment,&#8221; but as a  vehicle for developing a common world view within the enterprise. Under  this viewpoint, ontology development is somewhat analogous to <a href="http://en.wikipedia.org/wiki/Master_Data_Management">master data  management</a> (MDM) or corporate taxonomies <a href="#adapto5">[5]</a>. In this broader  sense, then, ontology development can become a means for  developing and refining a common language within the enterprise through  consensual or community processes.</p>
<p>For the reasons noted above, as language, conceptual relationships or understandings change, so can the vocabulary or structural character of the ontology. There is no &#8220;lock in&#8221;; there is no &#8220;commitment&#8221;. As long as it is coherent, the ontology can morph to reflect the scope and understandings of the current snapshot in time.</p>
<p>This flexibility results from the fact that the ontologies, properly  constructed, can drive a generic set of tools and applications that  express themselves based on the underlying structure and vocabulary  within those ontologies. The ontologies can thus change at will without  any adverse effects whatsoever on the applications based on them.</p>
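<p>To illustrate the data-driven idea, here is a minimal Python sketch of a generic &#8220;record viewer&#8221; whose output is determined entirely by an ontology-like description of classes and properties. The <code>ONTOLOGY</code> structure and the <code>render</code> function are hypothetical and are not Structured Dynamics&#8217; actual tools; the point is only that relabeling or adding a property changes what the application displays without any change to the application code.</p>

```python
# Hypothetical ontology: for each class, the properties to show and their labels.
ONTOLOGY = {
    "Customer": {"hasName": "Name", "hasCity": "City"},
}


def render(record, ontology):
    """Generic viewer: it knows nothing about customers, only about the ontology."""
    props = ontology[record["type"]]
    lines = [record["type"]]
    for prop, label in props.items():
        lines.append("  " + label + ": " + str(record.get(prop, "(unknown)")))
    return "\n".join(lines)


record = {
    "type": "Customer",
    "hasName": "Acme Corp.",
    "hasCity": "Springfield",
    "hasSegment": "Retail",
}

print(render(record, ONTOLOGY))

# Change the ontology, not the code: relabel one property and expose another.
ONTOLOGY["Customer"]["hasCity"] = "Location"
ONTOLOGY["Customer"]["hasSegment"] = "Market segment"
print(render(record, ONTOLOGY))
```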
<p>This data-driven aspect, as noted throughout this series, is quite different from any prior paradigm. So, under this view ontologies have considerably more focus and importance than even some of the strongest ontology advocates claim, yet paradoxically without the theoretical bloat or heaviness many purport. As with human languages, the language and concepts within our ontologies change as our world and perceptions change.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #4</span><span style="text-decoration: underline;">:</span> There is no &#8220;lock-in&#8221; with ontologies; they may be modified and changed  at will.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #5</span><span style="text-decoration: underline;">:</span> Like corporate taxonomies or MDM, ontologies provide a framework for  enterprises to develop internally consistent common languages or  vocabularies.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway Message #6</span><span style="text-decoration: underline;">:</span> Unlike corporate taxonomies or MDM, ontologies can directly drive generic tools and applications.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Need for  Completeness or Comprehensiveness</h3>
<p>Ontology development is not some imperative for conceptual &#8220;truth&#8221;;  rather, it is a very adaptable means for stating, testing and refining  stuff. Like <a href="http://en.wikipedia.org/wiki/Agile_development">agile development</a> for software, this refining approach can and should proceed  incrementally. Too often ontology efforts get caught like deer in the  headlights awaiting some &#8220;completeness&#8221; threshold before release.</p>
<p>One means to promote this approach is to tackle single datasets or data  stores individually before moving on. Having a sense of the eventual  scope is useful, of course. But it is also quite acceptable to only  fill out those portions of the structure with data available at hand.</p>
<p>These observations reflect a prejudice to action and release, rather  than theory. If mistakes are made, fine: simply correct them.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #7</span><span style="text-decoration: underline;">:</span> Understand the full scope, but only build out for the data in hand.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Need for  Predicate Bloat</h3>
<p>It is advisable to keep relationships (predicates) simple at first. Again, as with learning a human language, keeping the verbs simple until fluency is gained is another best practice.</p>
<p>While all of us can see nuances and subtleties heading into a project, trying to accommodate those predicates (relationships) at the outset can introduce unnecessary complexity. This is in no way an argument for inaccurate predicates; rather, it is a suggestion to err on the side of the general and broader at first.</p>
<p>For organizations familiar with taxonomies, the <a href="http://www.w3.org/2004/02/skos/">SKOS</a> vocabulary is a good focus, and some other standard ontologies also provide a good starting base of predicates <a href="#adapto6">[6]</a>. Then, as you work with your data and its requirements, you can expand to more sophisticated relationships.</p>
<p>In taking this approach you will still see immediate benefits from the value of connected data through the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span> <a href="#adapto7">[7]</a>. At the same time, you will be starting with a simpler language and gaining fluency as you go.</p>
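<p>As an example of such a simple starting point, a small taxonomy can be expressed with just a handful of core SKOS terms (<code>skos:Concept</code>, <code>skos:prefLabel</code>, <code>skos:broader</code>). The sketch below holds such triples in plain Python using prefixed names; the <code>ex:</code> concepts are hypothetical, and in practice you would serialize them with an RDF library or write Turtle directly.</p>

```python
# Hypothetical mini-taxonomy, encoded with a few core SKOS terms.
triples = [
    ("ex:Vehicle", "rdf:type", "skos:Concept"),
    ("ex:Vehicle", "skos:prefLabel", "Vehicle"),
    ("ex:Car", "rdf:type", "skos:Concept"),
    ("ex:Car", "skos:prefLabel", "Car"),
    ("ex:Car", "skos:broader", "ex:Vehicle"),
    ("ex:SportsCar", "rdf:type", "skos:Concept"),
    ("ex:SportsCar", "skos:prefLabel", "Sports car"),
    ("ex:SportsCar", "skos:broader", "ex:Car"),
]


def broader_chain(concept):
    """Walk skos:broader links upward (the closure skos:broaderTransitive covers)."""
    chain = []
    current = concept
    while True:
        parents = [o for s, p, o in triples if s == current and p == "skos:broader"]
        if not parents:
            return chain
        current = parents[0]  # this sketch assumes at most one broader concept
        chain.append(current)


def label(concept):
    """Return the skos:prefLabel of a concept."""
    return next(o for s, p, o in triples if s == concept and p == "skos:prefLabel")


print([label(c) for c in broader_chain("ex:SportsCar")])  # ['Car', 'Vehicle']
```

<p>Richer, domain-specific predicates can be layered on later without disturbing these triples.</p>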
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #8</span><span style="text-decoration: underline;">:</span> Use  simple, well-defined and documented predicates (properties or  attributes).</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #9</span><span style="text-decoration: underline;">:</span> You  are building a common language for the enterprise; do so purposefully.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Need for  Expensive Up-front Engineering</h3>
<p>All of these observations lead to the conclusion that upfront ontology  development need not be expensive. Any consultant selling six-figure  ontology development to businesses ought to be seriously challenged.  Start small and focused. Frankly, a simple spreadsheet taxonomy or  quick conversion of existing XML or metadata or vocabulary standards is  A-OK to get started.</p>
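<p>As an illustration of how little is needed, here is a Python sketch that turns a spreadsheet taxonomy exported as CSV into class, label and subclass triples using only the standard library. The column names (<code>concept</code>, <code>parent</code>, <code>label</code>), the <code>ex:</code> prefix and the sample rows are assumptions made for the example.</p>

```python
import csv
import io

# A hypothetical spreadsheet export: one row per concept, blank parent = top level.
SHEET = """concept,parent,label
Product,,Product
Vehicle,Product,Vehicle
Car,Vehicle,Car
Truck,Vehicle,Truck
"""


def sheet_to_triples(text, prefix="ex:"):
    """Convert concept/parent/label rows into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = prefix + row["concept"]
        triples.append((subject, "rdf:type", "owl:Class"))
        triples.append((subject, "rdfs:label", row["label"]))
        if row["parent"]:  # top-level concepts get no subclass triple
            triples.append((subject, "rdfs:subClassOf", prefix + row["parent"]))
    return triples


for t in sheet_to_triples(SHEET):
    print(t)
```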
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #10</span><span style="text-decoration: underline;">:</span> Start small with stakeholders to build acceptance and best practices.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #11</span><span style="text-decoration: underline;">:</span> Start immediately to organize and federate existing information.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Need to  Reinvent the Wheel</h3>
<p>While it is true that ontologies as advocated by Structured Dynamics are more useful than other constructs, these ontologies are still just a more capable representation of knowledge structures that have been around in various other forms for years. For decades enterprises have created schemas, taxonomies, controlled vocabularies, standards, and other knowledge structures that represent untold time, dollars and effort. It would be a waste not to fully leverage these sunk investments.</p>
<p>Further, many ontologies and interoperable structures also exist  external to the enterprise, many open source and freely available. And,  even if not all are already in proper ontological form, like internal  structures these other constructs can be relatively easily leveraged  and turned into ontology-ready form.</p>
<p>So, what we are doing with adaptive ontologies is not creating new structures or new representations from scratch, but leveraging the expressions of our current world views. These have been hard-earned, codified over years of effort, and are legacy expressions of the enterprise&#8217;s knowledge base.</p>
<p>In this vein, then, any organization already has much richness available on which to base its ontology efforts. Use it, and gain great leverage.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #12</span><span style="text-decoration: underline;">:</span> Aggressively mine and re-use existing knowledge and structure.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #13</span><span style="text-decoration: underline;">:</span> Leverage and re-use appropriate portions of the &#8220;best&#8221; existing,  external ontologies.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Requirement  to Displace Existing Assets</h3>
<p>Continuing in this same spirit, it is a mistake to see adaptive  ontologies and the associated systems advocated by Structured Dynamics  as a replacement for existing data assets. Rather, the idea and  advantage is to keep data records <span style="font-style: italic;">in  situ</span> as much as possible. These are already performing  investments that can be left largely as is. The role of the adaptive  ontologies is to act as a federation layer that bridges across these  existing assets.</p>
<p>This leverage of existing data assets can occur via the architecture of  the system (generally <a href="../category/web-oriented-architecture-woa/">Web-oriented  architecture</a> <a href="#adapto8">[8]</a>) and a design of the data system and structures  providing proper allocation between the ABox and TBox <a href="#adapto1">[1]</a>.</p>
<p>Keeping existing assets in place is aided by the ability to convert in-place data to ontology-ready RDF form. This is a separate topic in its own right and one I discuss elsewhere <a href="#adapto9">[9]</a>. It is also necessary to make sure that the attributes of the underlying instance records (generally, the columns within a relational table) are properly modeled within the adaptive ontology. This is part of the best practices guidelines.</p>
<p>Of course, how much of the existing assets can be leveraged &#8220;as is&#8221; and  what degree of modification or conversion might be necessary needs to  be evaluated on a case-by-case basis. Generally, however, these  mappings can be pretty straightforward and leave in place all existing  hardware, software and administration procedures.</p>
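<p>A rough sketch of such a mapping: the records stay in the existing relational store, and a thin layer turns each row into ABox assertions, with each column mapped to a property that the TBox ontology also declares. It uses Python&#8217;s built-in <code>sqlite3</code> module with an in-memory table standing in for an existing database; the table, the column-to-property mapping and the subject naming pattern are all hypothetical.</p>

```python
import sqlite3

# Stand-in for an existing enterprise table, left as is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Acme Corp.", "Springfield"), (2, "Globex", None)],
)

# Hypothetical mapping: table -> TBox class, column -> TBox property.
MAPPING = {
    "class": "ex:Customer",
    "columns": {"name": "ex:hasName", "city": "ex:hasCity"},
}


def rows_to_abox(connection, table, mapping):
    """Read rows in place and emit (subject, predicate, object) ABox triples."""
    columns = list(mapping["columns"])
    # Table and column names come from the trusted mapping, not from user input.
    query = "SELECT id, " + ", ".join(columns) + " FROM " + table + " ORDER BY id"
    triples = []
    for row in connection.execute(query):
        subject = "ex:" + table + "/" + str(row[0])
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, value in zip(columns, row[1:]):
            if value is not None:  # a missing value asserts nothing
                triples.append((subject, mapping["columns"][column], value))
    return triples


for t in rows_to_abox(conn, "customer", MAPPING):
    print(t)
```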
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway Message #14</span><span style="text-decoration: underline;">:</span> Leverage your existing databases as rich sources of instance records (&#8220;ABox&#8221;).</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #15</span><span style="text-decoration: underline;">:</span> Explicitly design your TBox ontologies to be an interoperability layer  over these existing record stores.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #16</span><span style="text-decoration: underline;">:</span> Reconcile the semantics across the enterprise&#8217;s data stores at this  interoperable TBox layer.</p>
<h3><span style="font-style: italic;">Misconception:</span> No Closed World  Assumptions</h3>
<p>A <a title="Closed world assumption" href="http://en.wikipedia.org/wiki/Closed_world_assumption">closed world assumption</a> holds that any  statement that is not known to be true is false. Most enterprise  database and transaction systems are based on this premise. It works  well where there is complete coverage of the entities within a  knowledge base, such as the enumeration of all customers or all  products of an enterprise.</p>
<p>Yet, in the real (&#8220;open&#8221;) world there is no guarantee or likelihood of complete coverage. Thus, under an <a href="http://en.wikipedia.org/wiki/Open_world_assumption">open world assumption</a> the absence of a given assertion or fact implies neither that the assertion is true nor that it is false: it simply is not known.</p>
<p>An open world assumption is one of the key factors for enabling adaptive ontologies to grow incrementally. It is also the basis for enabling linkage to external (and surely incomplete) datasets.</p>
<p>In fact, systems designed around the open world assumption can still  achieve closed world reasoning where the circumstances and completeness  of the knowledge base permit. But, rather than being a logical outcome  of the framework, such completeness axioms need to be explicitly  stated. Thus, open world systems can achieve the same ends as closed  ones where applicable, but with greater flexibility and extensibility.</p>
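<p>The difference can be sketched with a three-valued answer. Under the closed world assumption an absent fact is reported as false; under the open world assumption it is reported as unknown (<code>None</code> here), unless a completeness statement has been made explicitly for that predicate. This is an illustrative Python sketch with hypothetical facts and a hypothetical <code>COMPLETE</code> set, not the semantics of any particular reasoner.</p>

```python
# Known facts as (subject, predicate, object) triples; names are hypothetical.
FACTS = {
    ("acme", "customerOf", "ourCompany"),
    ("acme", "locatedIn", "Springfield"),
}

# Predicates for which we explicitly state that our knowledge is complete,
# e.g. the enterprise's customer list is a full enumeration.
COMPLETE = {"customerOf"}


def ask_closed(triple):
    """Closed world: anything not known to be true is false."""
    return triple in FACTS


def ask_open(triple):
    """Open world: absent facts are unknown unless completeness was declared."""
    if triple in FACTS:
        return True
    if triple[1] in COMPLETE:
        return False  # closed-world answer, enabled by an explicit statement
    return None       # simply not known


q1 = ("globex", "customerOf", "ourCompany")
q2 = ("acme", "locatedIn", "Shelbyville")

print(ask_closed(q1), ask_open(q1))  # False False
print(ask_closed(q2), ask_open(q2))  # False None
```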
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #17</span><span style="text-decoration: underline;">:</span> No  enterprise is an island; design according to the <a href="http://en.wikipedia.org/wiki/Open_world_assumption">open world  assumption.</a></p>
<h3><span style="font-style: italic;">Misconception:</span> No Restriction  to a Dedicated Priesthood</h3>
<p>Consultants often make their money, and academics their reputations, by making things more obscure and jargon-laden than they need be. Ontologies (heck, even the name itself) are no exception.</p>
<p>But what we have laid out as general guidelines herein and their  reduction to practice does not require a priesthood. Sure, there are  some things to learn and some practices to follow, but these are  certainly easier to understand and master than, say, a programming or  scripting language. Adaptive ontologies done right can be a  participatory activity within most any organization.</p>
<p>Some guidance and mentoring would certainly be helpful. Make sure to pick the right individuals, ones who truly embrace these perspectives.</p>
<p>Also helpful would be the assistance of groups skilled in team building and group participation <a href="#adapto10">[10]</a>.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #18</span><span style="text-decoration: underline;">:</span> Engage all knowledge stakeholders in ontology creation, review and  refinement.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway  Message #19</span><span style="text-decoration: underline;">:</span> Use selected ontology engineers to help ensure consistency, but not  necessarily structure.</p>
<h3>Design for Data-driven Apps</h3>
<p>The above addresses misconceptions related to how the market perceives  current ontologies or how some advocates push the concept. But there  are some unique perspectives that Structured Dynamics brings to  ontology development specific to the purpose of <span style="font-weight: bold;">data-driven applications</span>. From a best  practices standpoint, these considerations should also be included.</p>
<p>In order to properly &#8220;drive&#8221; applications, user interfaces and reports, specific design attention needs to be given to:</p>
<ul>
<li>Linked data, and the use and accessibility of URIs as resource  identifiers</li>
<li>Context- and instance-sensitive data display, including templates,  and</li>
<li>Driving user interfaces via the inclusion of preferred and  alternate labels in the ontology.</li>
</ul>
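<p>As a rough illustration of the labeling and template points above, the Python sketch below assumes an ontology fragment held in a plain dictionary, with keys that mirror SKOS <code>prefLabel</code> and <code>altLabel</code> plus a hypothetical <code>template</code> annotation. The <code>example.org</code> URIs and all function names are illustrative, not part of any Structured Dynamics product.</p>

```python
# Hypothetical ontology fragment keyed by class URI. The "prefLabel" and
# "altLabels" keys mirror skos:prefLabel / skos:altLabel; "template" is an
# assumed annotation naming the display template for instances of the class.
ONTOLOGY = {
    "http://example.org/ont#Organization": {
        "prefLabel": "Organization",
        "altLabels": ["Company", "Firm", "Business"],
        "template": "org_card",
    },
    "http://example.org/ont#Person": {
        "prefLabel": "Person",
        "altLabels": ["Individual", "Human"],
        "template": "person_card",
    },
}

DEFAULT_TEMPLATE = "generic_card"


def display_label(class_uri: str) -> str:
    """UI text comes from the preferred label; fall back to the URI fragment."""
    entry = ONTOLOGY.get(class_uri)
    return entry["prefLabel"] if entry else class_uri.rsplit("#", 1)[-1]


def resolve_search_term(term: str) -> list[str]:
    """Match user input against preferred and alternate labels, ignoring case."""
    needle = term.casefold()
    return [
        uri
        for uri, entry in ONTOLOGY.items()
        if needle in {label.casefold() for label in [entry["prefLabel"], *entry["altLabels"]]}
    ]


def template_for(instance: dict) -> str:
    """Choose a display template from the instance's type, else a generic one."""
    entry = ONTOLOGY.get(instance.get("type", ""))
    return entry["template"] if entry else DEFAULT_TEMPLATE


if __name__ == "__main__":
    print(resolve_search_term("firm"))  # ['http://example.org/ont#Organization']
    print(display_label("http://example.org/ont#Organization"))  # Organization
    print(template_for({"type": "http://example.org/ont#Person"}))  # person_card
```

<p>Because labels and templates live in the ontology rather than in application code, renaming a concept or adding a synonym changes the interface without touching the tool.</p>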
<p>Of course, there are other considerations that come to bear. But these  lend themselves to some rather simple checklist guidelines during  ontology development and maintenance.</p>
<p style="margin-left: 40px;"><span style="font-weight: bold; text-decoration: underline;">Takeaway Message #20</span><span style="text-decoration: underline;">:</span> Follow some relatively straightforward best practices to gain all of the advantages of adaptive ontologies.</p>
<div class="boxYellowDotted">This post is part of an occasional <span style="color: #993300; font-weight: bold;">AI3</span> series on  <a href="../category/ontologies/">ontology</a> <a href="../category/ontology-best-practices/">best  practices</a>.</div>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto1" name="adapto1"></a>[1] We use the reference to &#8220;<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>&#8221; in accordance with our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:
<div class="boxGraySolid">&quot;Description logics and their semantics traditionally split  <span style="font-style: italic;">concepts</span> and their  relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and  roles, expressed as fact assertions. The concept split is known as  the TBox (for <em>terminological</em> knowledge, the basis for  <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or  taxonomy of the domain at hand. The TBox is the structural and  intensional component of conceptual relationships. The second split  of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of  instances (and individuals), the roles between instances, and other  assertions about instances regarding their class membership with the  TBox concepts.&quot;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto2" name="adapto2"></a>[2] Chemicals, petroleum and  pharmaceuticals are renowned for large-scale, vertical ontologies.  Examples of general or upper-level ontologies include the Suggested  Upper Merged Ontology (<a href="http://en.wikipedia.org/wiki/Suggested_Upper_Merged_Ontology">SUMO</a>),  the Descriptive Ontology for Linguistic and Cognitive Engineering  (<a title="http://wonderweb.semanticweb.org/deliverables/documents/D18.pdf" rel="nofollow" href="http://wonderweb.semanticweb.org/deliverables/documents/D18.pdf">DOLCE</a>), <a href="http://proton.semanticweb.org/D1_8_1.pdf">PROTON</a>, <a href="http://en.wikipedia.org/wiki/Cyc">Cyc</a>, <a title="http://www.ifomis.org/bfo" rel="nofollow" href="http://www.ifomis.org/bfo">BFO</a> (Basic Formal  Ontology) and <a href="http://umbel.org/">UMBEL</a> (Upper Mapping and  Binding Exchange Layer). Many of the large exemplar ontology projects  are funded under EU auspices; see write-ups for the <a href="http://cordis.europa.eu/fp7/ict/">7th ICT</a> (Information and  Communications Technologies) program for the EU and prior <a href="http://cordis.europa.eu/ictresults/index.cfm?section=home&amp;tpl=home"> ICT projects</a> for more information.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto3" name="adapto3"></a>[3] See, for example, my posting on  <a style="font-style: italic;" title="Permanent Link to When is Content &lt;em&gt;&lt;u&gt;Coherent&lt;/u&gt;&lt;/em&gt;?" rel="bookmark" href="../450/when-is-content-coherent/"> When is Content Coherent?</a> from about one year ago.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto4" name="adapto4"></a>[4] See, for example, the Stanford KSL discussion on <a href="http://www-ksl.stanford.edu/kst/what-is-an-ontology.html"><span style="font-style: italic;">What is an Ontology?</span></a> One part of that document explains ontological commitments as &#8220;agreements to use the shared vocabulary in a coherent and consistent manner,&#8221; which is benign enough. But other discussions and venues imply much more with respect to the &#8220;commitment&#8221; term. This same Stanford source is also useful for general philosophical discussions of ontologies.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto5" name="adapto5"></a>[5] With respect to corporate taxonomies, see, for example, Trish O&#8217;Kane, &#8220;<a href="http://findarticles.com/p/articles/mi_qa3937/is_200607/ai_n17176092/">United by a Common Language: Developing a Corporate Taxonomy</a>,&#8221; Information Management Journal. FindArticles.com. 15 Aug, 2009. http://findarticles.com/p/articles/mi_qa3937/is_200607/ai_n17176092/.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto6" name="adapto6"></a>[6] Some of the standard starting  vocabularies that Structured Dynamics recommends include many of the  ones listed on this useful <a href="http://www.freebase.com/view/base/ontologies/ontology">ontology table  from Freebase</a>, and specifically include <a href="http://purl.org/dc/terms/">Dublin Core</a>, Friend-Of-A-Friend  (<a href="http://www.foaf-project.org/">FOAF</a>), <a href="http://www.geonames.org/">GeoNames</a>, <a href="http://sioc-project.org/">SIOC</a>, <a href="http://www.w3.org/2004/02/skos/">SKOS</a>, <a href="http://www.w3.org/TR/rdf-schema/">RDF Schema</a>, <a href="http://www.w3.org/XML/Schema">XML Schema</a>, <a href="http://www.w3.org/2007/OWL/wiki/OWL_Working_Group">OWL</a>, <a href="http://www.umbel.org/">UMBEL</a>, and <a href="http://bibliontology.com/">BIBO</a>. These are typically supplemented  with domain-specific ontologies appropriate to the scope at hand.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto7" name="adapto7"></a>[7] The <a href="../533/structure-the-world/"><span style="font-weight: bold; font-style: italic;">Linked Data Law</span></a> states that the value of a linked data network is proportional to the square of the number of links between data objects. It is a derivative of <a href="http://en.wikipedia.org/wiki/Metcalfe%27s_law">Metcalfe&apos;s law</a>, which states that the value of a telecommunications network is proportional to the square of the number of users of the system (n<sup>2</sup>), where the linkages between users (nodes) exist by definition. For information bases, the data objects are the nodes. Linked data works to add the connections between the nodes. This concept was first presented in <a style="font-style: italic;" href="../?p=447">What is Linked Data?</a> and then formalized in [9].</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto8" name="adapto8"></a>[8] In WOA, discrete functions are  packaged into modular and shareable elements (services), then made  available in a distributed and loosely coupled manner using <a href="http://en.wikipedia.org/wiki/REST"><span style="font-style: italic;">Representational State Transfer</span></a>. REST  provides principles for how resources are defined and used with simple  interfaces without additional messaging layers. REST is a foundation to  the HTTP protocol and a key reason for the success and scalability of  the Web.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto9" name="adapto9"></a>[9] See further my posting,  <a style="font-style: italic;" title="Permanent Link to Structure the World" rel="bookmark" href="../533/structure-the-world/">Structure the World</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="adapto10" name="adapto10"></a>[10] As a matter of full disclosure, Structured Dynamics has neither expertise nor strengths in these areas.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Structure the World</title>
		<link>http://www.mkbergman.com/533/structure-the-world/</link>
		<comments>http://www.mkbergman.com/533/structure-the-world/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 03:23:03 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Bibliographic Knowledge Network]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[BKN]]></category>
		<category><![CDATA[data federation]]></category>
		<category><![CDATA[data-driven applications]]></category>
		<category><![CDATA[Description Logics]]></category>
		<category><![CDATA[Ontology]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[structWSF]]></category>
		<category><![CDATA[web oriented architecture]]></category>
		<category><![CDATA[web service]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=533</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structure the World&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Bibliographic Knowledge Network&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-03&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/533/structure-the-world/&amp;rft.language=English"></span>

Multiple Techniques and Data Structs can Make the Vision a Reality
Linked  data and subject and domain ontologies provide the organizing  framework. Techniques for converting, tagging and authoring structure  provide the content. In combination, we now have in hand the necessary  pieces to enable all of us to &#8220;structure the World.&#8221;
In this [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structure the World&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Bibliographic Knowledge Network&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-03&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/533/structure-the-world/&amp;rft.language=English"></span>
<p><a href="http://upload.wikimedia.org/wikipedia/commons/9/97/The_Earth_seen_from_Apollo_17.jpg"><img style="border: 0px solid; width: 250px; height: 250px; float: left; margin-right: 10px;" title="The &quot;Blue Marble&quot;: The Earth seen from Apollo 17.jpg from Wikipedia.org" src="../wp-content/themes/ai3/images/2009Posts/The_Earth_seen_from_Apollo_17_240px.jpg" alt="The &quot;Blue Marble&quot;: The Earth seen from Apollo 17.jpg from Wikipedia.org" hspace="5" vspace="5" align="left" /></a></p>
<h2>Multiple Techniques and Data Structs can Make the Vision a Reality</h2>
<p><a href="http://structureddynamics.com/linked_data.html">Linked  data</a> and subject and domain ontologies provide the organizing  framework. Techniques for converting, tagging and authoring structure  provide the content. In combination, we now have in hand the necessary  pieces to enable all of us to &#8220;structure the World.&#8221;</p>
<p>In this vision, the nature of the links or connections between data  need not be complicated to gain tremendous benefit. Similar to <a href="http://en.wikipedia.org/wiki/Metcalf%27s_law">Metcalfe&#8217;s Law</a> for  the increasing value of networks as more nodes (users) get added,  adding connections to existing data is a powerful force multiplier.</p>
<p>We can call this the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span>: the  value of a linked data network is proportional to the square of the  number of links between data objects <a href="#structure1">[1]</a>. Further, if we are purposeful  to include connective links where appropriate as we add more data (that  is, nodes), this multiplier effect becomes even stronger.</p>
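<p>The arithmetic behind this claim is simple to check. The Python sketch below treats the law as a plain proportionality with an arbitrary constant <code>k</code> (an assumption, since the law states no units) and shows that doubling the number of links quadruples the relative value.</p>

```python
def linked_data_value(num_links: int, k: float = 1.0) -> float:
    """Relative network value under the Linked Data Law: k * links**2.

    k is an arbitrary scaling constant; only ratios between values mean anything.
    """
    return k * num_links ** 2


def value_gain(links_before: int, links_after: int) -> float:
    """How many times more valuable the network becomes after adding links."""
    return linked_data_value(links_after) / linked_data_value(links_before)


if __name__ == "__main__":
    print(value_gain(1_000, 2_000))  # 4.0: doubling the links quadruples the value
    print(value_gain(1_000, 1_500))  # 2.25
```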
<p><a href="http://structureddynamics.com/">Structured Dynamics</a> is dedicated to helping make this prospect real. Meaningful progress requires only relatively few moving parts or techniques. Yet, because we sometimes bounce between talking about or focusing on one part versus the others, we can lose context or sight of the overarching vision. The purpose of this article is to reset and recalibrate that overall vision.</p>
<h3><span style="font-style: italic;">The Vision</span>: Data Federation of  Any Desired Content</h3>
<p>The vision is to get all data and information to interoperate, regardless of legacy or form. Much of this data is already structured, either from databases or simpler forms of data structs. Some of this information is unstructured or semi-structured, requiring extraction and tagging techniques. And new information is constantly being generated, which warrants better means for authoring it and staging it for interchange and interoperability.</p>
<p>No matter the provenance, all information has context and scope. As a  chunk from here, and a piece from there, gets added to our linked data  mix, having means to characterize what that data is about and how it  can be meaningfully inter-related becomes crucial. Sometimes these  contexts are informed by existing schema; sometimes they are not. But,  in any case, it is the role of ontologies to both position these  datasets into an &#8220;aboutness&#8221; framework and to help guide how the data  can be described and related to other data. This part of the vision  invokes semantics and coherent structures (schema or ontologies) for  positioning and mapping datasets to one another.</p>
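<p>One way to picture an ontology &#8220;positioning&#8221; datasets is as a mapping from each source&#8217;s local field names to shared vocabulary terms. In the Python sketch below, the two datasets, their fields and the <code>vocab:</code> terms are all hypothetical, and records are matched on name purely for simplicity; real reconciliation would rely on URIs and richer identity rules.</p>

```python
# Two hypothetical record stores describing the same customer differently.
CRM_RECORDS = [{"cust_name": "Acme Corp", "cust_city": "Toledo"}]
BILLING_RECORDS = [{"client": "Acme Corp", "town": "Toledo", "balance": "120.00"}]

# Hypothetical mappings from each store's local fields to shared terms.
# The "vocab:" terms are placeholders, not a published ontology.
MAPPINGS = {
    "crm": {"cust_name": "vocab:name", "cust_city": "vocab:locality"},
    "billing": {"client": "vocab:name", "town": "vocab:locality",
                "balance": "vocab:balance"},
}


def to_shared_terms(source: str, record: dict) -> dict:
    """Rename a record's fields to shared terms; unmapped fields are dropped."""
    mapping = MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def merge_by_name(records) -> dict:
    """Combine mapped records that share a vocab:name value (a naive identity rule)."""
    merged: dict[str, dict] = {}
    for record in records:
        merged.setdefault(record["vocab:name"], {}).update(record)
    return merged


if __name__ == "__main__":
    mapped = [to_shared_terms("crm", r) for r in CRM_RECORDS]
    mapped += [to_shared_terms("billing", r) for r in BILLING_RECORDS]
    print(merge_by_name(mapped))
```

<p>Neither source store is altered; the shared terms act as the interoperability layer over both.</p>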
<p>As both the means for representing any extant data format and as the  means for describing these conceptual relationships or schema, RDF  provides the canonical data model. A single target representation and  common data model also means we can develop and design a smaller  universe of tools to operate and provide functionality over all of this  data. Indeed, because our RDF data model and its ontologies are so  richly structured, we can design our tools with generic functionality,  the specific operation and expression of which is based on the inherent  structure within the data and its relationships. This vision of  <span style="font-weight: bold; font-style: italic;">data-driven  apps</span> leads to extreme leverage, incredible flexibility, and  inherent &#8220;meshup&#8221; capabilities for tools.</p>
<p>Further, because we use Web identifiers (<a href="http://en.wikipedia.org/wiki/URI">URIs</a>) for our data and concepts, and because we expose and access this linked data via the Web, we can design our systems around the proven and scalable architecture of the Web itself. This <a href="../category/web-oriented-architecture-woa/"><span style="font-style: italic;">Web-oriented architecture</span></a> (WOA) provides a completely decentralized and loosely coupled deployment model that works in settings ranging from public and open to private and proprietary, and that applies to data and participants alike.</p>
<p>From the outset, it is essential to recognize that thousands of  contributors are enabling this vision. So, while Structured Dynamics  naturally uses its own tools and techniques to flesh out the various  parts of this vision below, realize there are many players and many  tools from which to choose <a href="#structure2">[2]</a>. For that is another aspect of this  vision that is quite powerful: providing choice and avoiding lock-in.</p>
<h3><span style="font-style: italic;">RDF</span>: The Canonical Data Model</h3>
<p>The core construct &#8212; or fulcrum, if you will &#8212; of the vision is the  <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> (Resource Description Framework) data model <a href="#structure3">[3]</a>. I have written  elsewhere on the <a style="font-style: italic;" title="Permanent Link to Advantages and Myths of RDF" rel="bookmark" href="../483/advantages-and-myths-of-rdf/">Advantages and Myths of  RDF</a>, which explains more precisely the advantages of that model.  RDF provides a common data model to which any external format or schema  can be converted and represented. It also provides a logic model and  basis for building vocabularies that can inform and drive generic  tools.</p>
<p>In the context of data interoperability, a critical premise is that a  single, canonical data model is highly desirable. Why?</p>
<p>Simply because of 2N v N<sup>2</sup>. That is, a single reference (&#8220;canon&#8221;) structure means that fewer tool variants and converters need be developed to talk to the myriad of data formats in the wild. With a canonical data model, talking to external sources and formats (N) only requires converters to and from the canonical form (2N). Without a canonical model, the <a href="http://en.wikipedia.org/wiki/Combinatorial_explosion">combinatorial explosion</a> of required format converters becomes N<sup>2</sup> <a href="#structure4">[4]</a>.</p>
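<p>The converter arithmetic can be checked directly. In this Python sketch the N<sup>2</sup> of the text is treated as an order of growth: counting one converter for every ordered pair of distinct formats gives exactly N(N-1).</p>

```python
def converters_with_canonical(n_formats: int) -> int:
    """Each format needs one converter into and one out of the canonical form: 2N."""
    return 2 * n_formats


def converters_pairwise(n_formats: int) -> int:
    """Without a canonical form, every ordered pair of distinct formats needs
    its own direct converter: N * (N - 1), which grows on the order of N**2."""
    return n_formats * (n_formats - 1)


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, converters_with_canonical(n), converters_pairwise(n))
    # 3 6 6
    # 10 20 90
    # 100 200 9900
```

<p>Under these assumptions the canonical approach needs fewer converters as soon as there are more than three formats (2N &lt; N(N-1) whenever N &gt; 3), and the gap widens rapidly after that.</p>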
<p>Note, in general, such a canonical data model merely represents the  agreed-upon internal representation. It need not affect data transfer  formats. Indeed, in many cases, data systems employ quite different  internal data models from what is used for data exchange. Many, in  fact, have two or three favored flavors of data exchange such as XML,  JSON or the like. More on this is discussed in a section below.</p>
<p>As this diagram shows, then, we have a single internal representation  that is the target for all data and format converters and upon which  all tools operate. These tools are themselves expressed as Web services  so that they may be distributed and conform to general WOA guidelines.  In addition, there may be multiple external &#8220;hubs&#8221; that represent  alternative data models or formats or schema conversions (say, for  relational databases). So long as we have converters between these  alternate &#8220;hubs&#8221; and our canonical RDF form we can allow a thousand  flowers to bloom:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090628_data_model_relationships.png"> <img class="center_ok" style="border: 0pt none;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/090628_data_model_relationships.png" border="0" alt="structWSF Data Model Relationships" width="600" height="364" /></a></div>
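<p>The hub pattern in the diagram can be sketched in a few lines of Python. Here a plain list of (subject, predicate, object) tuples stands in for the internal RDF representation, and each external format registers only a reader into it or a writer out of it. The XML layout, the registry names and the <code>convert</code> helper are assumptions made for illustration; they are not the structWSF interfaces.</p>

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable

Triple = tuple[str, str, str]

# Registries of converters into and out of the canonical triple form.
READERS: dict[str, Callable[[str], list[Triple]]] = {}
WRITERS: dict[str, Callable[[list[Triple]], str]] = {}


def register(table: dict, fmt: str):
    """Decorator that files a converter function under a format name."""
    def decorate(fn):
        table[fmt] = fn
        return fn
    return decorate


@register(READERS, "xml")
def read_xml(text: str) -> list[Triple]:
    """Assumed layout: <records><record id="..."><field>value</field></record></records>."""
    triples = []
    for record in ET.fromstring(text).findall("record"):
        subject = record.get("id")
        triples.extend((subject, child.tag, child.text or "") for child in record)
    return triples


@register(READERS, "json")
def read_json(text: str) -> list[Triple]:
    """Assumed layout: {"subject": {"predicate": "object", ...}, ...}."""
    return [(s, p, o) for s, props in json.loads(text).items() for p, o in props.items()]


@register(WRITERS, "json")
def write_json(triples: list[Triple]) -> str:
    """Group triples by subject. In this simplified writer, a repeated
    predicate on the same subject keeps only its last value."""
    resources: dict[str, dict[str, str]] = {}
    for s, p, o in triples:
        resources.setdefault(s, {})[p] = o
    return json.dumps(resources, sort_keys=True)


def convert(text: str, source_fmt: str, target_fmt: str) -> str:
    """Every conversion passes through the canonical triples, so adding a
    format means writing one reader and/or one writer, not N new converters."""
    return WRITERS[target_fmt](READERS[source_fmt](text))


if __name__ == "__main__":
    xml_doc = ('<records><record id="acme"><name>Acme Corp</name>'
               '<city>Toledo</city></record></records>')
    print(convert(xml_doc, "xml", "json"))
```

<p>Tools that operate on the triples never need to know which external format the data arrived in, which is the practical payoff of a single internal representation.</p>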
<p>Other canonical forms could be advocated. Yet RDF has the logical basis  to represent any data form and any schema or conceptual structure. It  is based on a robust set of open standards and languages and tools. It  may be serialized in many formats. It can be grounded in description  logics and, in appropriate forms, reasoned over and expressed in  vocabularies and schema suitable for the most complex of conceptual  structures and semantics. RDF is the data model explicitly designed for  the Web, the clear global information basis for the foreseeable future.</p>
<p>For more than 30 years &#8212; since the widespread adoption of electronic  information systems by enterprises &#8212; the Holy Grail has been complete,  integrated access to all data. With the canonical RDF data model, that  promise is now at hand.</p>
<h3><span style="font-style: italic;">Conversion</span>: So Many Structs,  So Little Time</h3>
<p>Diversity is a truism of human communications as captured by the  biblical <a href="http://en.wikipedia.org/wiki/Tower_of_Babel">Tower of  Babel</a> and the many thousands of current <a href="http://en.wikipedia.org/wiki/Language">human languages</a>. Diversity  in data formats, serializations, notations and languages is a similar  truism. We term the expression of each of these varied forms of data a  <span style="font-style: italic;">struct</span>.</p>
<p>While an internal canonical representation of data makes sense for the reasons noted above, pragmatic information systems must recognize the inherent diversity and chaos of data in the real world. Past attempts to find single representations or to impose standards by fiat have singularly failed. That will continue to be so, due in part to inertia and legacy, sunk investments, existing infrastructure, and the purposes for the data.</p>
<p>In pursuing a vision of data interoperability, then, conversion is the essential glue that binds our understanding to what exists now and to what will exist.</p>
<h4>RDB-to-RDF</h4>
<p>Arguably the largest sources of structured data are enterprise and government information systems, where the predominant data representation is the relational data model managed by relational schema. Much of this data is also cleaner and more mission critical than data from other sources in the wild. Fortunately, there are many logical and conceptual affinities between the relational model and the one for RDF <a href="#structure5">[5]</a>.</p>
<p>Just as there are many RDFizers for simpler forms of data structs (see next), there are also nice ways to convert relational schema to RDF automatically. Given these overall conceptual and logical affinities, the W3C is also in the process of graduating an incubator group to an official working group, <a href="http://www.w3.org/2005/Incubator/rdb2rdf/WG-draft-charter/">RDB2RDF</a>, focused on methods and specifications for mapping relational schema to RDF.</p>
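<p>To show why such conversion can be automatic, here is a deliberately naive table-to-triples mapping in Python, using the standard <code>sqlite3</code> module. It assumes a hypothetical base URI, mints one subject URI per row from the table name and primary key, types each row with a class named after the table, and turns every other non-NULL column into a predicate. It is a sketch of the general idea only, not the mapping specification the RDB2RDF group is chartered to produce.</p>

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI for minted identifiers
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_triples(conn: sqlite3.Connection, table: str, pk: str) -> list[tuple[str, str, str]]:
    """Map each row of `table` to RDF-style triples keyed on its primary key."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[pk]}"
        triples.append((subject, RDF_TYPE, f"{BASE}{table}"))
        for column, value in record.items():
            if column != pk and value is not None:  # NULLs yield no triple
                triples.append((subject, f"{BASE}{table}#{column}", str(value)))
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Acme Corp", "Toledo"), (2, "Globex", None)])
    for triple in table_to_triples(conn, "customer", "id"):
        print(triple)
```

<p>Because the schema itself supplies the classes and predicates, no hand-written rules are needed for a first pass; an ontology layered on top can then relate these generated terms to shared vocabularies.</p>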
<p>Amongst all techniques covered in this paper, Structured Dynamics views  the layering of RDF ontologies over existing relational data stores as  one of the most promising and important. Given the advantages of RDF  for interoperability, this area should be a major emphasis of current  and new vendors and service providers.</p>
<h4>RDFizers</h4>
<p>Much data, however, resides in much smaller datasets and often for less  formal purposes than what is found in enterprise databases. Some of  this data is geared for exchange or standardization; much is emerging  from Web and Internet applications and uses; and much might be local or  personal in nature, such as simple lists or spreadsheets.</p>
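<p>For the simplest of these cases, a spreadsheet exported as CSV, an RDFizer can be very small. The Python sketch below assumes a hypothetical namespace, treats one column as the row identifier, and emits each remaining non-empty cell as an N-Triples statement with a plain literal object. Production RDFizers would also map columns to established vocabularies and handle datatypes.</p>

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/list/"  # hypothetical namespace for the converted list


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rdfize_csv(text: str, id_column: str) -> str:
    """Convert CSV rows to N-Triples: the id column names the subject and
    every other non-empty cell becomes a literal-valued statement."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{quote(row[id_column])}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}{quote(column)}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    sheet = "id,title,author\nb1,Moby Dick,Herman Melville\nb2,Emma,\n"
    print(rdfize_csv(sheet, "id"))
```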
<p>RDF is well suited to convert (&#8220;RDFize&#8221;) these simpler and more naïve data formats. In my original census about 18 months ago, as reported in <a style="font-style: italic;" title="Permanent Link to 'Structs': Naïve Data Formats and the ABox" rel="bookmark" href="../?p=471">&#8216;Structs&#8217;: Naïve Data Formats and the ABox</a>, I listed about 90 converters. My most recent <a href="http://openstructs.org/resources/rdfizers">update</a> now lists nearly double that number, with about 150 converters <a href="#structure6">[6]</a>:</p>
<div style="margin: 15px; font-size: 10px;">
<table class="center_ok" style="text-align: left; margin-left: 0px; width: 90%;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top; width: 25%;">
<p style="font-weight: bold;">URN handlers (in addition to IRI and URI):</p>
<ul>
<li>DOI</li>
<li>LSID</li>
<li>OAI</li>
</ul>
<p style="font-weight: bold;">RDF</p>
<ul>
<li>Serialization formats:
<ul>
<li>N3</li>
<li>RDF/XML</li>
<li>Turtle</li>
</ul>
</li>
<li>Languages and ontologies:
<ul>
<li>AB Meta</li>
<li>Annotea</li>
<li>APML</li>
<li>AtomOWL</li>
<li>Bibliographic Ontology</li>
<li>Creative Commons</li>
<li>EXIF</li>
<li>FOAF</li>
<li> <a title="Java RDFizer" href="http://simile.mit.edu/wiki/Java_RDFizer">Java</a></li>
<li> <a title="Javadoc RDFizer" href="http://simile.mit.edu/wiki/Javadoc_RDFizer">Javadoc</a></li>
<li> <a title="MARC/MODS RDFizer" href="http://simile.mit.edu/wiki/MARC/MODS_RDFizer">MARC/MODS</a></li>
<li>Meta Standards</li>
<li>Music Ontology</li>
<li> <a title="http://cypher.monrai.com" rel="nofollow" href="http://cypher.monrai.com/">Natural Language</a></li>
<li>Open Archives Initiative Protocol for Metadata  Harvesting (OAI-PMH)</li>
<li>Open Geospatial</li>
<li>OWL</li>
<li>SIOC</li>
<li>SIOCT</li>
<li>SKOS</li>
<li>UMBEL</li>
<li>vCard</li>
<li> <a title="http://rhizomik.net/content/" href="http://rhizomik.net/content/">XML</a></li>
<li>Others</li>
</ul>
</li>
<li>(X)HTML pages</li>
<li>Embedded Microformats and GRDDL <a href="#structure7">[7]</a>:
<ul>
<li>DC</li>
<li>eRDF</li>
<li>geoURL</li>
<li>Google Base</li>
<li>hAudio</li>
<li>hCalendar</li>
</ul>
</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>Embedded Microformats and GRDDL (con&#8217;t):
<ul>
<li>hCard</li>
<li>hListing</li>
<li>hResume</li>
<li>hReview</li>
<li>HR-XML</li>
<li>Ning</li>
<li>RDFa</li>
<li>relLicense</li>
<li>SVG</li>
<li>XBRL</li>
<li>XFN</li>
<li>xFolk</li>
<li>XR-XML</li>
<li>XSLT</li>
</ul>
</li>
<li>Syndication Formats:
<ul>
<li>Atom</li>
<li>OPML</li>
<li>OCS</li>
<li>RSS 1.1</li>
<li>RSS 2.0</li>
<li>XBEL (for bookmarks)</li>
</ul>
</li>
<li>REST-style Web service APIs:
<ul>
<li>Amazon</li>
<li>Apple</li>
<li>Calais</li>
<li>CrunchBase</li>
<li>Del.icio.us</li>
<li>Digg</li>
<li>Discogs</li>
<li>Disqus</li>
<li>eBay</li>
<li>Facebook</li>
<li>Flickr</li>
<li>Freebase (MQL)</li>
<li>FriendFeed</li>
<li> <a title="http://www.w3.org/2000/10/swap/pim/fromGarmin.py" href="http://www.w3.org/2000/10/swap/pim/fromGarmin.py"> Garmin</a></li>
<li>Get Satisfaction</li>
<li>Google</li>
<li>Hoover&#8217;s</li>
<li>HTTP (raw)</li>
<li>ISBN DB</li>
<li>Last.fm</li>
<li>Library Thing</li>
<li>Magnolia</li>
</ul>
</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>REST-style Web service APIs (con&#8217;t):
<ul>
<li>Meetup</li>
<li>MusicBrainz</li>
<li>New York Times</li>
<li>New York Times Campaign Finance (NYTCF)</li>
<li>New York Times tags</li>
<li>Open Library</li>
<li>Open Social</li>
<li>Open Street</li>
<li>OpenLink (facets)</li>
<li>O&#8217;Reilly</li>
<li>Picasa</li>
<li>Radio Pop (BBC)</li>
<li>Rhapsody</li>
<li>Salesforce</li>
<li>Slideshare</li>
<li>Slidy</li>
<li>Technorati</li>
<li>They Work For You</li>
<li>Twine</li>
<li>Twitter</li>
<li> <a title="Weather RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=Weather_RDFizer&amp;action=edit"> Weather</a></li>
<li>Wikipedia</li>
<li>World Bank</li>
<li>Yahoo! Finance</li>
<li>Yahoo! Maps</li>
<li>Yahoo! Weather</li>
<li>YouTube</li>
<li>Zemanta</li>
</ul>
</li>
<li>Files (multitude of file formats and MIME types,  including):
<ul>
<li>audio (general)</li>
<li>BibJSON</li>
<li> <a title="BibTeX RDFizer" href="http://simile.mit.edu/wiki/BibTeX_RDFizer">BibTeX</a> and <a title="http://www.l3s.de/~siberski/bibtex2rdf/" href="http://www.l3s.de/%7Esiberski/bibtex2rdf/">others</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/rdfizers/" href="http://www.inf.unideb.hu/%7Ejeszy/rdfizers/">BitTorrent</a></li>
<li> <a title="http://www.mindswap.org/%7Erreck/excel2rdf.shtml" href="http://www.mindswap.org/%7Erreck/excel2rdf.shtml"> CSV</a></li>
<li> <a title="http://www.w3.org/2000/10/swap/util/fink2n3.py" href="http://www.w3.org/2000/10/swap/util/fink2n3.py">Fink</a></li>
<li> <a title="Flat RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=Flat_RDFizer&amp;action=edit"> Flat files</a></li>
<li> <a title="JPEG RDFizer" href="http://simile.mit.edu/wiki/JPEG_RDFizer">JPEG</a></li>
<li>JSON</li>
<li>images</li>
<li>MS Office</li>
<li>OpenOffice</li>
<li>Open Document Format</li>
<li> <a title="http://dev.w3.org/cvsweb/2001/palmagent" href="http://dev.w3.org/cvsweb/2001/palmagent">Palm</a></li>
<li> <a title="http://rdf123.umbc.edu/" href="http://rdf123.umbc.edu/">RDF123</a></li>
<li>video</li>
<li> <a title="http://www.mindswap.org/%7Erreck/excel2rdf.shtml" href="http://www.mindswap.org/%7Erreck/excel2rdf.shtml"> XLS</a></li>
<li>etc.</li>
</ul>
</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>Metadata extractors:
<ul>
<li> <a title="CRW RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=CRW_RDFizer"> CRW</a></li>
<li> <a title="DEB RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=DEB_RDFizer"> DEB</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/xmp/" href="http://www.inf.unideb.hu/%7Ejeszy/xmp/">EXIF</a></li>
<li> <a title="OCW RDFizer" href="http://simile.mit.edu/wiki/OCW_RDFizer">OCW</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/rdfizers/" href="http://www.inf.unideb.hu/%7Ejeszy/rdfizers/">RPM</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/xmp/" href="http://www.inf.unideb.hu/%7Ejeszy/xmp/">XMP</a></li>
</ul>
</li>
<li>Email formats:
<ul>
<li> <a title="Email RDFizer" href="http://simile.mit.edu/wiki/Email_RDFizer">EMail</a></li>
<li> <a title="http://www.w3.org/2000/10/swap/pim/lookout.py" href="http://www.w3.org/2000/10/swap/pim/lookout.py">Outlook</a></li>
<li> <a title="http://www.w3.org/2000/04/maillog2rdf/aboutMsg.py" href="http://www.w3.org/2000/04/maillog2rdf/aboutMsg.py">RFC822</a></li>
</ul>
</li>
<li>Version control and related systems:
<ul>
<li>Bugzilla</li>
<li> <a title="Jira RDFizer" href="http://simile.mit.edu/wiki/Jira_RDFizer">Jira</a></li>
<li> <a title="Maven POM RDFizer" href="http://simile.mit.edu/wiki/Maven_POM_RDFizer">POM</a></li>
<li> <a title="Subversion RDFizer" href="http://simile.mit.edu/wiki/Subversion_RDFizer">Subversion</a></li>
</ul>
</li>
<li>Other Web service frameworks:
<ul>
<li>BPEL</li>
<li>WSDL</li>
<li>XBRL</li>
<li>XBEL</li>
</ul>
</li>
<li>Data exchange formats:
<ul>
<li>iCalendar</li>
<li> <a title="http://www.w3.org/2000/10/swap/pim/ldif2n3.py" href="http://www.w3.org/2000/10/swap/pim/ldif2n3.py">LDIF</a></li>
<li>vCalendar</li>
<li>vCard</li>
</ul>
</li>
<li>Relational databases and related:
<ul>
<li> <a title="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/index.htm" href="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/index.htm"> D2RQ</a></li>
<li> <a title="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm" href="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm"> D2RMAP</a></li>
<li> <a href="http://www.openlinksw.com/virtuoso/Whitepapers/html/rdf_views/virtuoso_rdf_views_example.html"> RDF Views</a></li>
</ul>
</li>
<li>Virtuoso VADs</li>
<li>OpenLink license files</li>
<li>Third party metadata extraction frameworks:
<ul>
<li> <a href="http://aperture.sourceforge.net/">Aperture</a></li>
<li>Spotlight</li>
</ul>
</li>
<li>Miscellaneous and other related converters:
<ul>
<li> <a title="http://rhizomik.net/redefer/" href="http://rhizomik.net/redefer/">MPEG-7/CS</a> → OWL</li>
<li>Random</li>
<li> <a title="http://rhizomik.net/redefer/" href="http://rhizomik.net/redefer/">XSD</a> → OWL</li>
</ul>
</li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p>Many of the sources above come from new and emerging Web-based APIs, which are also huge sources of content growth. Note also that alternative formats to RDF (<span style="font-style: italic;">e.g.</span>, microformats) and leading serializations and encodings (<span style="font-style: italic;">e.g.</span>, XML, JSON) have many converter options.</p>
<p>For many typical naïve data structs, the data is represented as  attribute-value pairs, which easily lend themselves to conversion to  RDF as instance records <a href="#structure8">[8]</a>. See further the <span style="font-weight: bold; font-style: italic;">Authoring</span> section  below.</p>
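<p>As a rough illustration of that conversion, the Python sketch below mints a subject IRI from a record identifier, turns each attribute name into a predicate and each value into a literal object. The <code>http://example.org/</code> namespace and the sample record are hypothetical, and real converters also handle datatypes, language tags and links to existing vocabularies.</p>

```python
# Minimal sketch: map one naive attribute-value record to RDF-style triples.
# BASE is a hypothetical namespace used only for illustration.
BASE = "http://example.org/"


def record_to_triples(record_id, record):
    """Return (subject, predicate, object) tuples for a single instance record.

    The subject IRI is minted from the record id, each attribute name becomes a
    predicate IRI, and each value becomes a literal object. List-valued
    attributes yield one triple per item.
    """
    subject = BASE + "record/" + record_id
    triples = []
    for attribute, value in record.items():
        predicate = BASE + "attr/" + attribute
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, str(item)))
    return triples


record = {"name": "Example dataset", "format": "CSV", "tags": ["rdf", "tools"]}
for triple in record_to_triples("r1", record):
    print(triple)
```

<p>The attribute-value shape is what makes the mapping so direct: every key/value pair becomes one statement about the record.</p>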
<h3><span style="font-style: italic;">Tagging</span>: The 80% Solution</h3>
<p>An apocryphal statistic is that 80% to 85% of all information resides  in unstructured text <a href="#structure9">[9]</a>. Besides lacking recent validation, this  claim from a decade ago often attributed to Merrill Lynch also precedes  much of the Internet and the emergence of metadata and tagging.  Nevertheless, what is true is that written text  content is ubiquitous and the majority of it remains untagged or  uncharacterized by any form of metadata.</p>
<p>While such information can be searched, a search only finds it when the exact terms match. This means that related information, particularly in the form of conceptual relationships and inferencing, cannot be applied to untagged text content.</p>
<p>While information extraction &#8212; the basis by which tags for entities and concepts can be obtained &#8212; has been an active topic of research for two decades, it is only recently that we have begun to see Web-scale extractors appear. Examples include Yahoo&#8217;s <a href="http://developer.yahoo.com/search/content/V1/termExtraction.html">term extractor</a>, Thomson Reuters&#8217; <a href="http://www.opencalais.com/">Calais</a>, or Google&#8217;s <a href="http://www.google.com/squared">Squared</a>, to name but a few.</p>
<p><a href="http://www.umbel.org/"><img style="border: 0px solid; margin-right: 10px; width: 104px; height: 24px; float: left;" src="../wp-content/themes/ai3/images/scones_100.png" alt="scones - Subject Concepts or Named Entities" align="left" /></a> In  Structured Dynamics&#8217; case we have been working on the <span style="font-weight: bold; font-style: italic;">scones</span> (Subject  Concepts Or Named EntitieS) extractor for quite a while. <span style="font-weight: bold; font-style: italic;">scones</span> uses rather  simple natural language processing (NLP) methods as informed by concept  ontologies and named entity (instance record) dictionaries to help  guide the extraction process. The co-occurrence of matches between  concepts and entities also aids the disambiguation task (though  additional modules may be invoked with alternative disambiguation  methods). In prototype forms, the resulting tags can be managed  separately or fed to user interfaces or re-injected back into the  original content as RDFa.</p>
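<p>The general idea of dictionary-guided tagging with co-occurrence disambiguation can be sketched in a few lines of Python. This is not the <span style="font-weight: bold; font-style: italic;">scones</span> code; the concept and entity dictionaries and their identifiers are hypothetical placeholders, and matching is plain word-boundary lookup rather than real NLP.</p>

```python
import re

# Hypothetical dictionaries standing in for a concept ontology and a
# named-entity (instance record) dictionary; names are placeholders.
CONCEPTS = {"animal": "Animal", "car": "Automobile", "search engine": "SearchEngine"}
ENTITIES = {
    # Each surface form maps to candidate entities plus the concepts
    # those entities are usually associated with.
    "solr": [("Apache_Solr", {"SearchEngine"})],
    "jaguar": [("Jaguar_Cars", {"Automobile"}), ("Jaguar_Animal", {"Animal"})],
}


def mentions(term, text):
    """True if term occurs in text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def tag(text):
    lowered = text.lower()
    # Step 1: concept tags by plain dictionary lookup.
    concepts = {cid for term, cid in CONCEPTS.items() if mentions(term, lowered)}
    tags = set(concepts)
    # Step 2: entity tags; for an ambiguous surface form, pick the candidate
    # whose associated concepts overlap most with the concepts found in the
    # same text (ties go to the first candidate listed).
    for term, candidates in ENTITIES.items():
        if mentions(term, lowered):
            best = max(candidates, key=lambda c: len(c[1].intersection(concepts)))
            tags.add(best[0])
    return tags


print(sorted(tag("The jaguar is a large animal")))
# ['Animal', 'Jaguar_Animal']
```

<p>Here the concept <code>Animal</code> found in the text tips the ambiguous surface form &#8220;jaguar&#8221; toward the animal rather than the car maker.</p>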
<p>There are literally dozens of such extractors and services presently available on the Web, and many of them are available as open source or commercial products. Some are mostly algorithm-based, using machine-learning techniques or statistics, while others are gazetteer- or dictionary-driven.</p>
<p>These systems will lead to rapid tagging of existing content and the  removal of some of the early &#8220;chicken-and-egg&#8221; challenges associated  with the semantic Web. These systems will also be combined with the  many existing bookmarking and tagging services.</p>
<p>So, just as we will see federation and interoperability of conventional  data, we will also see linkages to relevant and supporting text content  accompanying it. This combination, in turn, will also lead to richer  browsing and discovery experiences.</p>
<h3><span style="font-style: italic;">Authoring</span>: The Neglected Third  Leg of the Stool</h3>
<p>In addition to <span style="font-style: italic;">conversion</span> and <span style="font-style: italic;">tagging</span>, <span style="font-style: italic;">authoring</span> is the third leg of the stool for exposing structured data. It is the neglected leg, yet an important one for making it easy to expose datasets as RDF linked data.</p>
<p>One of the reasons for the proliferation of data structs has been the  interest in finding notations and conventions for easier reading and  authoring of small datasets. There have literally been hundreds of  <a href="http://en.wikipedia.org/wiki/Lightweight_markup_language">various</a> formats proposed over decades for conveying lightweight data  structures. Most have been proprietary or limited to specific domains  or users. Some, such as <a href="http://en.wikipedia.org/wiki/Fielded_text">fielded text</a>, <a href="http://www.zope.org/Documentation/Articles/STX">structured text</a>,  <a href="http://en.wikipedia.org/wiki/Simple_Declarative_Language">simple  declarative language</a> (SDL), or more recently <a href="http://en.wikipedia.org/wiki/YAML">YAML</a> or its simpler cousin  <a href="http://en.wikipedia.org/wiki/JSON">JSON</a>, have become more  widely adopted and supported by formal specifications, tools or APIs.  JSON, especially, is a preferred form for Web 2.0 applications.</p>
<p>What has been less clear or intuitive in these forms, again mostly  based on an attribute-value pair orientation, is how to adequately  relate them to a more capable data model, such as RDF. In JSON or YAML,  for example, the notations include the concepts of objects, arrays and  datatypes (among other conventions). Other structures lack even these  constructs.</p>
<p>To take the case of JSON as it relates to RDF, there are a couple of efforts from <a href="http://n2.talis.com/wiki/RDF_JSON_Specification">Talis</a> and <a href="http://www.gbv.de/wikis/cls/RDF_in_JSON">GBV</a> to define conventions for serializing RDF in JSON. An idea was also floated for an RDF version of JSON called <a href="http://lists.w3.org/Archives/Public/semantic-web/2007Jul/0323.html">RDFON</a>, which has since evolved into the <a href="http://www.urf.name/">TURF</a> approach. <a href="http://jdil.org/">JDIL</a> (JSON data integration layer) describes how to add namespaces to JSON to enable encoding RDF. <a href="http://jibbering.com/rdf-parser/">Jim Ley</a>, <a href="http://www.kanzaki.com/works/2006/misc/0308turtle.html">Kanzaki Masahide</a> and <a href="http://librdf.org/rasqal/roqet.html">Dave Beckett</a> (likely among others) have written simple and straightforward RDF and <a href="http://www.dajobe.org/2004/01/turtle/">Turtle</a> parsers and converters for JSON. Still further examples are Beckett&#8217;s <a href="http://triplr.org/">Triplr</a> and <a href="http://www.uni-leipzig.de/">Sören Auer</a>&#8217;s <a href="http://aksw.org/">AKSW</a> <a href="http://triplify.org/Overview">Triplify</a>, lightweight conversion services involving many different formats.</p>
<p>Because JSON is easily readable, can drive many Web 2.0 applications  and widgets, and lends itself to fast conversions and tools in various  scripting languages, Structured Dynamics was commissioned by the  <a href="http://bibkn.org/">Bibliographic Knowledge Network</a> (BKN) to  formalize a BibJSON specification suitable for <a href="http://en.wikipedia.org/wiki/BibTeX">BibTeX</a>-like data records and  citations with an extensible schema to be converted to RDF.</p>
<p>The emerging result of that BibJSON effort will be published shortly. The specification includes conventions and vocabularies for creating bibliographic and citation instance records, for specifying structural schema, and for creating linkage files between the attributes in the record files and existing and new schema. BibJSON is itself grounded in <span style="font-style: italic;">irON</span>, an instance record and object notation developed by Structured Dynamics that can be serialized as JSON (called <span style="font-style: italic;">irJSON</span>), XML (called <span style="font-style: italic;">irXML</span>) or comma-separated values (CSV files, called <span style="font-style: italic;">commON</span>).</p>
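<p>The role of a linkage file can be illustrated with a minimal sketch, assuming a linkage is simply a map from record attribute names to properties in an existing vocabulary. The map below (which points at Dublin Core terms), the fallback namespace and the sample citation are all hypothetical; they do not reproduce the actual BibJSON or irON linkage format.</p>

```python
# Hypothetical linkage map from record attribute names to existing
# vocabulary properties (here Dublin Core terms); it is not the BibJSON
# linkage format itself.
LINKAGE = {
    "title": "http://purl.org/dc/terms/title",
    "author": "http://purl.org/dc/terms/creator",
    "year": "http://purl.org/dc/terms/issued",
}
# Hypothetical namespace for attributes the linkage map does not cover.
FALLBACK = "http://example.org/attr/"


def link_record(subject, record, linkage=LINKAGE):
    """Turn one record into triples, preferring linked schema properties."""
    triples = []
    for attribute, value in record.items():
        predicate = linkage.get(attribute, FALLBACK + attribute)
        triples.append((subject, predicate, value))
    return triples


citation = {"title": "An Example Paper", "author": "A. Author", "pages": "1-10"}
for triple in link_record("http://example.org/citation/1", citation):
    print(triple)
```

<p>Keeping the linkage separate from the records means authors can write short attribute names, while the mapping to shared schema is maintained in one place.</p>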
<p>The purpose of these notations and serializations is to provide easier  authoring environments and scripting support to RDF-ready datasets.  This approach has the advantage of shielding most users from the  nuances or lengthiness of RDF (though the N3 serialization also works  well).</p>
<p>The design and development of commON was especially geared to using  spreadsheets as authoring environments that would enable easy creation  of instance record tables or simple hierarchical or outline structures.  For example, here is a sample portion of <a href="../new-version-sweet-tools-sem-web/"><span style="color: #990000; font-weight: bold;"> Sweet Tools</span></a> specified in a  spreadsheet using the commON notation:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090801_swt_spreadsheet.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 379px;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/090801_swt_spreadsheet.png" alt="Sweet Tools Sample Spreadsheet" width="2406" height="1521" /></a></div>
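<p>A spreadsheet saved as CSV can be read into instance records with very little code. The sketch below assumes a plain CSV with a header row and an <code>id</code> column naming each record; the actual commON notation defines further conventions that are not reproduced here, and the sample rows are hypothetical.</p>

```python
import csv
import io

# Hypothetical sheet content; assumes a header row plus an "id" column.
SHEET = """id,name,category,homepage
tool1,Example Tool A,Parser,http://example.org/a
tool2,Example Tool B,Browser,
"""


def sheet_to_records(text):
    """Read CSV text into {record_id: {attribute: value}} dicts."""
    records = {}
    for row in csv.DictReader(io.StringIO(text)):
        record_id = row.pop("id")
        # Skip empty cells so they do not become empty-valued attributes.
        records[record_id] = {key: value for key, value in row.items() if value}
    return records


records = sheet_to_records(SHEET)
print(records["tool2"])  # {'name': 'Example Tool B', 'category': 'Browser'}
```

<p>Each resulting record is a naïve attribute-value struct, ready for the kind of RDF conversion discussed earlier.</p>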
<p>Once the philosophy and role of naïve data structs is embraced &#8212; with an appreciation of the many converters now available or easily  written for translating to RDF &#8212; it becomes easier to determine data forms  appropriate to the tools and natural work flow of the users and tasks at  hand. Under this mindset, the role of RDF is to be the eventual  conversion target, but not necessarily what is used for intermediate work  tasks, and in particular not for authoring.</p>
<h3>Getting it All Organized</h3>
<p>OK, so now all of this stuff is converted, tagged or authored. How does  it relate? What is the relation of one dataset to another dataset? Is  there a context or framework for laying out these conceptual roadmaps?</p>
<p><a href="http://www.umbel.org/"><img style="border: 0px solid; margin-right: 10px; width: 100px; height: 50px; float: left;" src="../wp-content/themes/ai3/images/umbel_logo_100.png" alt="UMBEL (Upper Mapping and Binding Exchange Layer)" align="left" /></a> Two years ago, as we looked at the state of RDF and the incipient semantic Web as promised via linked data, we saw that such a specific framework was lacking. (Though higher-level ontologies existed, their complexity or design was not well suited to these purposes.) It was at that time that <a href="http://fgiasson.com/blog">Frédérick Giasson</a> and I began to formulate the <a href="http://umbel.org/intro.html">UMBEL</a> (<em>Upper Mapping and Binding Exchange Layer</em>) ontology, which eventually led to our more formal business partnership and Structured Dynamics.</p>
<p>What we sought to achieve with UMBEL was a coherent reference framework of about 20,000 subject concepts, connected and acting like constellations in the information sky for orienting content and new datasets. At the same time, we wanted to create a general vocabulary and approach that would lend itself to the creation of domain-specific ontologies, which would also naturally tie in and inter-relate to the more general UMBEL structure.</p>
<p>This objective was achieved, though UMBEL deserves an upgrade to OWL 2 and some other pending improvements. A number of domain ontologies have been created and now relate to UMBEL. So, rather than being an end in itself, UMBEL was one of the necessary infrastructure pieces to help make the vision herein a reality.</p>
<p>Similar approaches may be taken by others with new domain ontologies  based on the UMBEL vocabulary with tie-in as appropriate to existing  subject concepts, or by mapping to the existing UMBEL structure.</p>
<p>Of course, UMBEL is not an absolute prerequisite for the vision herein. However, insofar as users desire to see multiple datasets inter-related, including the use of existing public Web data, something akin to UMBEL and related domain ontologies will be necessary to provide a similar roadmap.</p>
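<p>One way to picture how a domain ontology &#8220;ties in&#8221; to a reference structure is to follow <code>subClassOf</code> links upward: once a domain class is placed under a reference concept, it inherits that concept&#8217;s broader context. In the sketch below, the <code>domain:</code> and <code>umbel:</code> names are hypothetical placeholders, not actual UMBEL subject concepts.</p>

```python
from collections import deque

# Hypothetical subClassOf edges (child -> parents); "umbel:" names are placeholders.
SUBCLASS_OF = {
    "domain:JazzAlbum": ["domain:Album"],
    "domain:Album": ["umbel:MusicalComposition"],  # tie-in to the reference structure
    "umbel:MusicalComposition": ["umbel:CreativeWork"],
    "umbel:CreativeWork": [],
}


def ancestors(concept):
    """Return every superclass reachable from concept (breadth-first closure)."""
    seen = set()
    queue = deque([concept])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("domain:JazzAlbum")))
# ['domain:Album', 'umbel:CreativeWork', 'umbel:MusicalComposition']
```

<p>A single tie-in edge is enough: every class below <code>domain:Album</code> becomes reachable from the reference concepts above it.</p>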
<h3>Making it All Available</h3>
<p>The parts and techniques discussed so far pertain almost exclusively to data and content. But the structures so created can now inform data-driven applications, which must also be deployed. To do so, Structured Dynamics is committed to what is known as a <a href="../category/web-oriented-architecture-woa/"><em>Web-oriented architecture</em></a> (WOA):</p>
<div style="margin-left: 40px; margin-bottom: 15px;"><a href="http://en.wikipedia.org/wiki/Web_Oriented_Architecture">WOA</a> =  <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">SOA</a> +  <a href="http://en.wikipedia.org/wiki/World_Wide_Web">WWW</a> +  <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST</a></div>
<p>WOA is a subset of the <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">service-oriented architectural</a> style, wherein discrete functions are packaged into modular and shareable elements (&#8220;services&#8221;) that are made available in a distributed and loosely coupled manner. WOA generally uses the representational state transfer (REST) architectural style defined by <a href="http://en.wikipedia.org/wiki/Roy_Fielding">Roy Fielding</a> in his 2000 <a href="http://www.ics.uci.edu/%7Efielding/pubs/dissertation/top.htm">doctoral thesis</a>; Fielding is also one of the principal authors of the <a href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">Hypertext Transfer Protocol</a> (HTTP) specification.</p>
<p>REST provides principles for how resources are defined and used and  addressed with simple interfaces without additional messaging layers  such as <a href="http://en.wikipedia.org/wiki/SOAP">SOAP</a> or  <a href="http://en.wikipedia.org/wiki/Remote_procedure_call">RPC</a>.  The principles are couched within the framework of a generalized  architectural style and are not limited to the Web, though they are a  foundation to it.</p>
<p><a href="http://openstructs.org/"><img style="border: 0px solid; margin-right: 5px; width: 150px; height: 36px; float: left;" src="../wp-content/themes/ai3/images/structWSF_150.png" alt="structWSF Web Services Framework" align="left" /></a>Within this design we need a suite of generic  functions and tools that are driven by the structure of the available  datasets. The deployment vehicle and design we have implemented to  provide this WOA design is <a href="http://openstructs.org/">structWSF</a> <a href="#structure10">[10]</a>.</p>
<p>structWSF is a platform-independent Web services framework for  accessing and exposing structured RDF data. Its central organizing  perspective is that of the dataset. These datasets contain instance  records, with the structural relationships amongst the data and their  attributes and concepts defined via ontologies (schema with  accompanying vocabularies). The master or controlling Web service in  the framework is the module for granting access and use rights to  datasets based on permissions.</p>
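<p>The permission check at the heart of the framework can be sketched as a lookup keyed on user and dataset. This is a conceptual Python illustration only, assuming a simple in-memory registry of CRUD rights; the user names, dataset IRI and <code>authorize</code> function are hypothetical and do not reflect structWSF&#8217;s actual interfaces.</p>

```python
from http import HTTPStatus

# Hypothetical access registry: (user, dataset IRI) -> permitted CRUD operations.
# It mirrors the idea of a controlling access service, not structWSF's actual API.
DATASET = "http://example.org/datasets/tools/"
PERMISSIONS = {
    ("alice", DATASET): {"create", "read", "update", "delete"},
    ("bob", DATASET): {"read"},
}


def authorize(user, dataset, operation):
    """Return the HTTP status a service might send before running a request."""
    allowed = PERMISSIONS.get((user, dataset), set())
    return HTTPStatus.OK if operation in allowed else HTTPStatus.FORBIDDEN


print(authorize("bob", DATASET, "delete").phrase)  # Forbidden
```

<p>Because every other service would consult this check first, access rules live in one place rather than being re-implemented in each service.</p>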
<p>The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The initial structWSF framework comes packaged with a baseline set of about a dozen Web services for CRUD, browse, search, and import and export. More services can readily be added to the system.</p>
<p>All Web services are exposed via APIs and SPARQL endpoints. Each  request to an individual Web service returns an HTTP status and a  document of resultsets (if the query result is not null). Each results  document can be serialized in many ways, and may be expressed as either  RDF or pure XML.</p>
<p>In initial release, structWSF has direct interfaces to the <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/">Virtuoso</a> RDF triple store (via ODBC, and later HTTP) and the <a href="http://lucene.apache.org/solr/">Solr</a> faceted, full-text search  engine (via HTTP). However, structWSF has been designed to be fully  platform-independent. The framework is open source (Apache 2 license)  and designed for extensibility.</p>
<h3>No End in Sight</h3>
<p>Like all visions, there are many aspects and many improvements  possible. This vision is definitely a work-in-progress with no end in  sight.</p>
<p>But meaningful movement embracing the full scope of this vision is doable today. Structured Dynamics welcomes <a href="mailto:mike%20at%20structureddynamics%20dot%20com">inquiries</a> regarding any of these aspects, improvements to them, or their application to your specific needs and problems.</p>
<p>We also welcome you to come back and visit our blogs (Fred&#8217;s is found  <a href="http://fgiasson.com/blog">here</a>). We try to speak on  various aspects of this vision in all of our posts and are pleased to  share our experience and insights as gained.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure1" name="structure1"></a> [1] Metcalfe&#8217;s law states  that the value of a telecommunications network is proportional to the  square of the number of users of the system (n<sup>2</sup>), where the  linkages between users (nodes) exist by definition. For information  bases, the data objects are the nodes. Linked data works to add the  connections between the nodes. We can thus modify the original sense to  become the Linked Data Law: the value of a linked data network is  proportional to the square of the number of links between the data  objects. I first presented this formulation about a year ago in  <a style="font-style: italic;" href="../?p=447">What is Linked Data?</a></div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure2" name="structure2"></a> [2] This piece introduces for  the first time a couple of efforts-in-progress by Structured Dynamics.  For a general tools listing, see my own <a href="../new-version-sweet-tools-sem-web/"><span style="color: #990000; font-weight: bold;"> Sweet Tools</span></a> listing of about 800 semantic Web and -related  tools.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure3" name="structure3"></a> [3] As quoted in <a style="font-style: italic;" href="http://www.math.nyu.edu/%7Ecrorres/Archimedes/Lever/LeverQuotes.html">The Lever</a>, &#8220;Archimedes, however, in writing to King Hiero, whose friend and near relation he was, had stated that given the force, any given weight might be moved, and even boasted, we are told, relying on the strength of demonstration, that if there were another earth, by going into it he could remove this.&#8221; from <a href="http://www.utexas.edu/depts/classics/chaironeia/">Plutarch</a> (<span style="font-style: italic;">c.</span> 45-120 <span>AD</span>) in the <a href="http://classics.mit.edu/Plutarch/marcellu.html"><em>Life of Marcellus</em></a>, as translated by <a href="http://www.newadvent.org/cathen/05167b.htm">John Dryden</a> (1631-1700).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure4" name="structure4"></a> [4] The canonical data model  is especially prevalent in <a title="Enterprise application integration" href="http://en.wikipedia.org/wiki/Enterprise_application_integration">enterprise application  integration</a>. An interesting animated visualization of the canonical  data model may be found at: <a href="http://soa-eda.blogspot.com/2008/03/canonical-data-model-visualized.html"> http://soa-eda.blogspot.com/2008/03/canonical-data-model-visualized.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure5" name="structure5"></a> [5] An excellent piece on those relations was written by Andrew Newman a bit over a year ago; see Andrew Newman, 2007. &#8220;A Relational View of the Semantic Web,&#8221; published on <a href="http://xml.com/">XML.com</a>, March 14, 2007; <a href="http://www.xml.com/pub/a/2007/03/14/a-relational-view-of-the-semantic-web.html"> http://www.xml.com/pub/a/2007/03/14/a-relational-view-of-the-semantic-web.html</a>. RDF can be modeled relationally as a single table with three columns corresponding to the <span style="font-style: italic;">subject</span>-<span style="font-style: italic;">predicate</span>-<span style="font-style: italic;">object</span> triple. Conversely, a relational table can be modeled in RDF with the <em>subject</em> <a href="http://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">IRI</a> derived from the primary key or a blank node; the <em>predicate</em> from the column identifier; and the <em>object</em> from the cell value. Because of these affinities, it is also possible to store RDF data models in existing relational databases. (In fact, most RDF &#8220;triple stores&#8221; are relational database systems with a tweak, sometimes as &#8220;quad stores&#8221; where the fourth element is the <em>graph</em>.) Moreover, these affinities mean that RDF stored in this manner can also take advantage of the historical learnings around RDBMS and SQL query optimizations.</div>
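<p>The relational-to-RDF mapping described in this note can be sketched as follows, assuming rows arrive as Python dicts. The base IRI, table name and rows are hypothetical; real mapping tools such as D2RQ add datatypes, foreign-key links and configurable IRI patterns.</p>

```python
# Sketch of the row-to-triples mapping described in note [5].
# BASE, the table name and the rows are hypothetical.
BASE = "http://example.org/db/"


def table_to_triples(table, primary_key, rows):
    """Map each row to triples: subject from the key, predicate from the column,
    object from the cell value."""
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[primary_key]}"
        for column, cell in row.items():
            if column == primary_key or cell is None:
                # The key is already encoded in the subject; NULLs yield no triple.
                continue
            triples.append((subject, f"{BASE}{table}#{column}", cell))
    return triples


rows = [
    {"id": 1, "name": "Widget", "color": "blue"},
    {"id": 2, "name": "Gadget", "color": None},
]
# The result is itself a three-column "table" of subject, predicate, object.
for triple in table_to_triples("product", "id", rows):
    print(triple)
```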
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure6" name="structure6"></a> [6] The largest source for  RDFizers, which it calls Sponger cartridges, is from <a href="http://www.openlinksw.com/">OpenLink Software</a> in relation to its  <a href="http://www.openlinksw.com/virtuoso/">Virtuoso</a> universal  server. Most of its converters use XSLT stylesheets to translate to  RDF, but the system has other conversion capabilities as well. Two  additional OpenLink resources are a <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/ClickableVirtSpongerCloud"> clickable diagram</a> of converters and relationships with links and an  online storehouse of <a href="http://github.com/openlink/Virtuoso-RDFIzer-Mapper-Scripts/tree/master"> available XSLT converters</a>. In addition, two other sources &#8212; the  W3C&#8217;s Semantic Web wiki with <a href="http://esw.w3.org/topic/ConverterToRdf?highlight=%28converter%29">converter  listings</a> and MIT&#8217;s Simile program and <a href="http://simile.mit.edu/wiki/RDFizers">listing of RDFizers</a> &#8212; have a  rich set of listings. Note that many of the categories shown on the table also have multiple  sources of converters, so that the absolute number of converters has  also grown faster than the unique formats supported.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure7" name="structure7"></a> [7] <a href="http://www.w3.org/TR/grddl/">GRDDL</a> (Gleaning Resource Descriptions from Dialects of Languages) is a W3C markup format for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT. GRDDL accommodates a wide variety of dialects (see <a href="http://esw.w3.org/topic/CustomRdfDialects">one listing</a>) and can be combined with arbitrary transformation mechanisms (though currently mostly based on XSLTs).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure8" name="structure8"></a> [8] We characterize <a href="../478/making-linked-data-reasonable-using-description-logics-part-4/"> instance records</a> as representing the &#8220;ABox&#8221;, in accordance with our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split  <span style="font-style: italic;">concepts</span> and their  relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and  roles, expressed as fact assertions. The concept split is known as  the TBox (for <em>terminological</em> knowledge, the basis for  <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or  taxonomy of the domain at hand. The TBox is the structural and  intensional component of conceptual relationships. The second split  of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of  instances (and individuals), the roles between instances, and other  assertions about instances regarding their class membership with the  TBox concepts.&#8221;</div>
</div>
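<p>As a rough illustration of the TBox/ABox split, the sketch below sorts triples by what they assert. The rule is a deliberate simplification that assumes only a handful of RDFS and OWL schema predicates and types matter; a full treatment would consider all OWL axioms. The <code>http://example.org/</code> resources are hypothetical.</p>

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Predicates treated as terminological (schema) statements in this sketch.
TBOX_PREDICATES = {RDFS + "subClassOf", RDFS + "subPropertyOf", RDFS + "domain", RDFS + "range"}
# Declaring a resource to be a class or property is also terminological.
SCHEMA_TYPES = {RDFS + "Class", OWL + "Class", RDF + "Property",
                OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def split_boxes(triples):
    """Partition triples into (tbox, abox) lists."""
    tbox, abox = [], []
    for s, p, o in triples:
        if p in TBOX_PREDICATES or (p == RDF + "type" and o in SCHEMA_TYPES):
            tbox.append((s, p, o))
        else:
            # Everything else, including rdf:type links from an individual to a
            # class (class membership), is an assertion about instances.
            abox.append((s, p, o))
    return tbox, abox


EX = "http://example.org/"
triples = [
    (EX + "Dog", RDFS + "subClassOf", EX + "Animal"),
    (EX + "Dog", RDF + "type", OWL + "Class"),
    (EX + "rex", RDF + "type", EX + "Dog"),
    (EX + "rex", EX + "name", "Rex"),
]
tbox, abox = split_boxes(triples)
print(len(tbox), len(abox))  # 2 2
```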
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure9" name="structure9"></a> [9] One of the more recent  discussions of this percentage is by Seth Grimes, <a style="font-style: italic;" href="http://clarabridge.com/default.aspx?tabid=137&amp;ModuleID=635&amp;ArticleID=551"> Unstructured Data and the 80 Percent Rule</a>, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure10" name="structure10"></a> [10] structWSF is also  designed to integrate with third-party apps and content management  systems (CMSs) to provide the user interfaces to these functions. The  first implementation of this design is <a href="http://constructscs.com/">conStruct SCS</a>, a structured content  system that extends the basic Drupal content management framework.  conStruct enables structured data and its controlling vocabularies  (ontologies) to drive applications and user interfaces.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/533/structure-the-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ontologies as the &#8216;Engine&#8217; for Data-Driven Applications</title>
		<link>http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/</link>
		<comments>http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 19:36:49 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[data federation]]></category>
		<category><![CDATA[data-driven applications]]></category>
		<category><![CDATA[interoperability]]></category>
		<category><![CDATA[Ontology]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[TBox]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=492</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Ontologies as the &#8216;Engine&#8217; for Data-Driven Applications&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-06-10&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/&amp;rft.language=English"></span>

Ontology Best Practices for Data-driven Applications: Part 3
In my Intrepid Guide to Ontologies from a  couple of years back, I noted that &#8220;Ontology is one of the more  daunting terms for those exposed for the first time to the semantic  Web.&#8221; And, for sure, if one starts to peruse the current discussions [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Ontologies as the &#8216;Engine&#8217; for Data-Driven Applications&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-06-10&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/&amp;rft.language=English"></span>
<p><a href="http://structureddynamics.com/"><img style="border: 0px solid; width: 209px; height: 163px; float: left; margin-right: 10px;" title="Structured Dynamics LLC" src="../wp-content/themes/ai3/images/2009Posts/090610_impossible4.gif" alt="Structured Dynamics LLC" hspace="0" vspace="5" align="left" /></a></p>
<h2>Ontology Best Practices for Data-driven Applications: Part 3</h2>
<p>In my <a href="../?p=374"><span style="font-style: italic">Intrepid Guide to Ontologies</span></a> from a  couple of years back, I noted that &#8220;Ontology is one of the more  daunting terms for those exposed for the first time to the semantic  Web.&#8221; And, for sure, if one starts to peruse the current discussions  ranging from the <a href="http://ontolog.cim3.net/cgi-bin/wiki.pl/">Ontolog Forum</a> to major  academic symposia (not meaning to single anyone out), it is clear that  the idea of developing &#8220;ontologies&#8221; is often freighted with much  weight, hot air, and (by implication) cost.</p>
<p>This is a shame for two reasons. First, such freight is unnecessary and often unwarranted. Second, the whole pragmatic idea of what an ontology is and what it can do has often gotten lost in the shuffle.</p>
<p>To be sure, there have been massive standards efforts and EU-funded  mega-projects devoted to ontologies. There are certainly cases where  coordination of specific domains such as petroleum or integration with  a complicated supplier base such as in airline manufacture warrant  these massive, complicated ontology development efforts.</p>
<p>But, from my vantage, these extremes overshadow the vast majority of  more prosaic, pragmatic applications of ontologies. Remember,  ontologies are merely a means of describing a conceptual view of the  world <a href="#onto3-1">[1]</a>. If one defines that &#8220;world&#8221; within focused and appropriate  scope, it is surprising (we believe) how much mileage can be extracted  from these suckers.</p>
<p>Interest is breaking through in applying semantic Web and <a href="http://structureddynamics.com/linked_data.html">linked data</a> principles to the enterprise. This is wonderfully described in the recent seminal <a href="http://www.pwc.com/techforecast">PricewaterhouseCoopers</a> quarterly <a href="http://www.pwc.com/extweb/pwcpublications.nsf/docid/C85F6867F37F5307852575BA00633FF7">Technology Forecast</a>, which also notably places a prominent <a href="../?p=490">focus on ontologies</a>. Given this, I think it is time to train all guns on prior bad assumptions and bad anecdotes. To wit:</p>
<ul>
<li>Ontology development need not be a comprehensive, self-contained definition of a &#8220;big picture.&#8221; Ontologies can be focused and limited, and can grow and change as needed</li>
<li>Ontology development need not be expensive. Whoever is selling  six-figure ontology development to businesses ought to be taken out and  shot. Start small and focused; frankly, a simple spreadsheet taxonomy  or quick conversion of existing XML or metadata or vocabulary standards  is A-OK to get started</li>
<li>Ontology development is not massive and static: rather, it is small  and flexible and incremental as more is brought in and more is learned</li>
<li>Ontology development is not some imperative for conceptual &#8220;truth&#8221;;  rather, it is a very adaptable means for stating, testing and refining  stuff</li>
<li>Ontology development is certainly no massive relational schema; by  its nature it is malleable with nary a whiff of &#8220;lock-in&#8221;, and</li>
<li>Most importantly, ontology development is a way of &#8220;driving&#8221;  applications and user interfaces and reports.</li>
</ul>
<p>In fact, it is the last point that no one is discussing today, but it  is the most important of the lot: Ontologies, properly crafted, can be  the &#8216;engines&#8217; for data-driven applications.</p>
<p>It is this latter point that is a true paradigm shift and one of the  most exciting prospects of ontologies.</p>
<h3>Manifest Uses</h3>
<p>Ontologies, for sure, are a formal representation of conceptual  relations, a &#8220;world view.&#8221; But, that world view need not carry with it  the freight of trying to describe all human knowledge. It can (and  should) be restricted to an understandable scope (domain) and purpose.  In that vein, what does such a &#8220;world view&#8221; need?</p>
<p>Let&#8217;s first talk about scope. We don&#8217;t need a &#8220;global ontology&#8221; that is accepted by everybody on Earth. What we need are focused ontology(ies) for describing things within a given problem space (whose data may reside in a single dataset or aggregate of datasets). We need to communicate how this system describes the things within its domain and how it understands the concepts and attributes associated with its problem space and data. This communication is published as the ontology. Rather than a global, comprehensive schema, we simply need these well constructed bricks, one by one.</p>
<p>Then, the ontology itself needs to be understandable and manageable. Ontologies should be readable by machines, but too many people see ontologies solely through the lens of machines. I believe that is a mistake. Ontologies must be designed for machine ingest, but I believe their real purpose is to serve humans. How do we label things? How do we describe and define things? How do we find things? How do we organize things? How can these understandings be brought before us in the software that we use?</p>
<p>These types of questions lead us to the pragmatic and pull us back from  the abstract. If we keep foremost the simple idea that ontologies are  merely structures for how to organize (schema) and describe  (vocabularies) our problem space at hand, then we can actually get on  with cutting the bull and getting real stuff done.</p>
<p>Let&#8217;s take as an example our structWSF Web services framework that I will be announcing and demoing for the first time at <a href="http://www.semantic-conference.com/session/1806/">SemTech 2009</a> next week. We developed a simple and flexible ontology to describe what a &#8220;Web services framework&#8221; should be. Then, we developed and implemented the software to make it happen. This means that an ontology development task can be seen as a specification task, too.</p>
<h3>Pragmatic Applications of Ontologies</h3>
<p>So, OK, what do these exhortations mean? Without respect to any  particular scope or domain, let me then list below some important  functional areas to which ontologies &#8212; properly and pragmatically  designed &#8212; can contribute.</p>
<h4>Conceptual Relationships</h4>
<p>The traditional lens for viewing ontologies is as a means to express  conceptual relationships. We agree.</p>
<p>However, ontologies need not have large and nuanced predicate  (relationship) vocabularies in order to be useful. Relatively simple  but powerful structures with hierarchical or part-of relationships can  be very effectively employed for inferencing or faceted searching. From  a pragmatic standpoint, let&#8217;s first agree on what &#8220;things&#8221; (nouns)  there are in our domains, then let&#8217;s worry about how they relate (verbs).</p>
<p>The idea here has long been known as successive approximation: Let&#8217;s  first get ourselves into the right country, then right province, then  right city, then right neighborhood, then right house, and then right  room. Only then should we worry about the condition of the paint or the  age of the floors.</p>
<p>Endless harangues about &#8220;true&#8221; conceptual relations are a hindrance,  not a help, to this perspective. It is much better (and faster, cheaper  and more pragmatic) to put forward simple but coherent relationships  than to worry about what all of that &#8220;really&#8221; means. From a business  perspective, isn&#8217;t being able to utilize the assertion that the hip  bone is connected to the thigh bone more important than having to await  a full explication of all of the muscles and ligaments and tendons that  might comprise them?</p>
<p>Once such simple relationship structures are embraced, then amazing  inferencing power comes to the fore. If one searches on thigh bone,  inferencing can also bring forward the hip because of its relationships  to the leg.</p>
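<p>The short Python sketch below shows the thigh-bone-to-hip inference using nothing more than part-of links. The concept names and the <code>PART_OF</code> table are hypothetical. Treating part-of as transitive and expanding a search to "siblings" that share a parent are assumptions of this sketch. It does not describe how any particular reasoner or product performs inferencing.</p>

```python
from collections import defaultdict

# Hypothetical, deliberately simple part-of assertions (child -> parent),
# echoing the hip / thigh bone / leg example in the text.
PART_OF = {
    "thigh bone": "leg",
    "hip": "leg",
    "shin bone": "leg",
    "leg": "body",
    "arm": "body",
}

def ancestors(concept, part_of):
    """Follow part-of links upward (part-of treated as transitive).
    The `seen` set guards against accidental cycles in the assertions."""
    chain, seen = [], {concept}
    while concept in part_of and part_of[concept] not in seen:
        concept = part_of[concept]
        seen.add(concept)
        chain.append(concept)
    return chain

def children_index(part_of):
    """Invert the part-of table: parent -> set of direct parts."""
    index = defaultdict(set)
    for child, parent in part_of.items():
        index[parent].add(child)
    return index

def expand_query(term, part_of, levels=1):
    """Return the search term plus every concept that shares an ancestor
    with it, looking at most `levels` steps up the part-of chain."""
    index = children_index(part_of)
    related = {term}
    for parent in ancestors(term, part_of)[:levels]:
        related.add(parent)
        related |= index[parent]
    return related
```

<p>With one level of expansion, a search on "thigh bone" also returns "hip" and "shin bone" because they share the parent "leg". Raising <code>levels</code> widens the net, for example reaching "arm" through "body". This mirrors the country-to-room successive approximation described above.</p>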
<h4>Integrating Instance Data</h4>
<p>OK, so now at least we have a coherent scaffolding of concepts and their straightforward relationships. That is, is concept A a &#8220;bigger&#8221; one (class or superset) than concept B? (Other simple relations could be substituted.) If so, we now see a bit of an organizing &#8220;world view&#8221; begin to emerge.</p>
<p>So, now we begin to bring in external data. But that data and its  schema describe themselves differently. In one realm it is &#8220;foo&#8221;; in  the other, &#8220;bar&#8221;.</p>
<p>While this different terminology for the same &#8220;thing&#8221; or related things  may not be known at the outset, it is discoverable. And, when  discovered, it is quite easy to associate the idea or concept of &#8220;foo&#8221;  in one dataset to &#8220;bar&#8221; in another. In this manner, through learning  and accretion, we are able to associate more and more similar things to  one another.</p>
<p>We did not need to begin with some global, cosmic view to begin  relating this data to one another. We only needed the right framework  and structure that allowed this association to evolve as the learning  occurred.</p>
<p>And, oh, by the way: this very same process is akin to documenting the  organization&#8217;s institutional memory.</p>
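<p>One way to record the incremental "foo" in one dataset is "bar" in another associations is sketched below as a union-find structure in Python. Each new assertion merges two groups, and the transitive consequences come for free. The identifiers (<code>crm:foo</code>, <code>erp:bar</code>, <code>web:baz</code>) and the <code>ConceptMap</code> class are hypothetical. The sketch also assumes the mappings are strict equivalences. In an RDF setting such assertions would more likely be written as <code>owl:equivalentClass</code> or <code>skos:exactMatch</code> statements, and weaker "related" links need a different structure.</p>

```python
class ConceptMap:
    """Union-find over concept identifiers. Each merge records one
    discovered 'these mean the same thing' assertion."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            # Path halving keeps lookups cheap as mappings accumulate.
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def assert_same(self, a, b):
        """Record that concept `a` and concept `b` denote the same thing."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def same(self, a, b):
        return self._find(a) == self._find(b)

    def group(self, x):
        """All identifiers currently known to be equivalent to `x`."""
        root = self._find(x)
        return {y for y in list(self._parent) if self._find(y) == root}
```

<p>For example, asserting <code>crm:foo</code> = <code>erp:bar</code> today and <code>erp:bar</code> = <code>web:baz</code> next month lets the map answer that <code>crm:foo</code> and <code>web:baz</code> are the same. No global schema was needed up front.</p>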
<h4>Orienting to Other Knowledge and Domains</h4>
<p>Being able to relate and &#8220;classify&#8221; or &#8220;organize&#8221; some things to other  things also means that we are now beginning to create a roadmap for how  &#8220;stuff&#8221; in a broad sense relates to other &#8220;stuff.&#8221; For example, if I  develop a detailed understanding of the hip bone, I can now bring that body of knowledge into  the context above to relate this new information to the thigh and to the leg.</p>
<p>Frankly, at this juncture, the precise nature of the relationship may ultimately matter, but it is helpful merely to know that Domain A (hip) is somehow related to Domain B (thigh). Think of the issue more like trying to get into the right map vicinity on the globe, and not whether individual streets intersect.</p>
<p>Again, the mindset here is one of letting ontologies and their concepts get related knowledge bases into the same ballpark. Whether we are trying to match Little League ballplayers with Major League ballplayers is beside the point: accept that both are playing baseball, then decide the importance and specifics of the relationship in a later step.</p>
<p>Put simply, &#8220;ballpark&#8221; is more helpful than no connection whatsoever. Silly statements about &#8220;ontological commitments&#8221; really mean nothing. Ontologies, like any other tool, can play different roles at different times. When helping to get like-related things into the same ballpark, ontologies are easy and quite effective.</p>
<p>(As an aside, it is useful to note here that our efforts with the  <a href="http://umbel.org/">UMBEL</a> upper-level subject ontology are  solely premised on this &#8220;roadmap&#8221; purpose. In and of itself, UMBEL is  not a very complicated explication of the world. But it does provide a  comprehensive set of 20,000 subject concepts for orienting quite  disparate datasets and information to one another. This very same approach could be replicated and then applied to the granularity of individual domains, kind of like zooming in on <a href="http://maps.google.com/maps?hl=en&amp;tab=gl">Google Maps</a>, to  provide similar benefits at smaller scale for domain-specific roadmaps. In fact, that is a common approach we apply in our own client ontologies, which we then also make sure we tie into UMBEL for global orientation.)</p>
<h4>Mapping to Other Schema</h4>
<p>OK, so with this foundation now built, we can raise the bar a bit further. Once one begins to express these &#8220;world views&#8221; formally as an ontology, even with the reduced ambitions presented above, one still ends up with a formal specification of that conceptualization. That means we now also have a basis, using standard languages, for mapping two disparate or separately developed ontologies to one another.</p>
<p>This is powerful. Through such mapping, we end up, in the memorable  phrase of my colleague, <a href="http://fgiasson.com/blog">Fred  Giasson</a>, &#8220;<a href="http://fgiasson.com/blog/index.php/2008/09/04/exploding-dbpedias-domain-using-umbel/">exploding  the domain</a>.&#8221;</p>
<p>Moreover, we also have found a means for stitching together datasets  with disparate schema to one another. Voilà: We now have met the Holy  Grail of data interoperability.</p>
<p>In my opinion, this is the money shot from all of this effort. But,  again, if we set the deployment threshold to the unrealistic levels that some ontology  pundits suggest, this payoff is unlikely to happen. We are not trying to state  absolute, universal truth about anything nor to be unrealistically  comprehensive. All we are trying to do is make defensible assertions  that one portion of a world view is similar or related to a portion of  a different world view.</p>
<p>Now, does that sound so scary? No, of course not. It is merely a reasonable and pragmatic means for relating two structures together.</p>
<p>A key aspect to this mapping ability is to enrich the description of  our concepts with what we call &#8220;semsets.&#8221; Semsets are a listing of  related terms and phrases that provide synonyms, aliases, jargon and  related context for alternative ways to describe or bound the concept  at hand. This terminological &#8220;grist&#8221; is the basis for relatively  straightforward natural language processing techniques to suggest  matches between concepts in different ontologies (which might also be combined with other ontology components such as preferred labels, descriptions or structural placements in the schema).</p>
<p>Like many of the points above, these semsets can be built incrementally  and over time as new jargon and terminology is discovered.</p>
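<p>The Python sketch below illustrates the kind of "relatively straightforward" technique semsets enable. It tokenizes each concept's label plus semset, then suggests cross-ontology matches by Jaccard overlap of those token sets. The concept IDs, the example semsets and the 0.3 threshold are hypothetical. Jaccard overlap is just one simple choice of similarity measure and is not necessarily the technique Structured Dynamics uses. The output is a ranked list of suggestions for a human to confirm, not an automatic mapping.</p>

```python
import re

def tokens(phrases):
    """Lower-cased word tokens drawn from a label plus its semset."""
    out = set()
    for phrase in phrases:
        out.update(re.findall(r"[a-z0-9]+", phrase.lower()))
    return out

def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def suggest_matches(source, target, threshold=0.3):
    """Candidate (source, target, score) pairs, best first.
    `source` and `target` map concept id -> label plus semset strings."""
    target_tokens = {tid: tokens(p) for tid, p in target.items()}
    suggestions = []
    for sid, phrases in source.items():
        s_tok = tokens(phrases)
        for tid, t_tok in target_tokens.items():
            score = jaccard(s_tok, t_tok)
            if score >= threshold:
                suggestions.append((sid, tid, round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])

# Hypothetical concepts from two separately built ontologies
SOURCE = {
    "a:Camera": ["Camera", "digital camera", "photo camera"],
    "a:Lens": ["Lens", "camera lens", "optics"],
}
TARGET = {
    "b:PhotographicCamera": ["photographic camera", "digicam", "digital camera"],
    "b:Tripod": ["tripod", "camera stand"],
}
```

<p>Here only <code>a:Camera</code> and <code>b:PhotographicCamera</code> clear the threshold (score 0.4). As the semsets grow with newly discovered jargon, the suggestions improve without any change to the matching code.</p>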
<h4>Linked Data, with Federated and Comprehensive Data</h4>
<p>These techniques of mapping datasets and their ontology structures can  be leveraged still further with the proper application of <a href="http://structureddynamics.com/linked_data.html">linked data</a> practices. Via linked data, we place our data into Web-accessible  (HTTP) networks and give them Web-scalable identifiers (URIs). This  means we can now integrate and interoperate with much external public  Web data and break down our own internal data silos.</p>
<p>Our instance records can be fleshed out with supplementary sources to provide more comprehensive attributes and characterizations. Uniformity of treatment and coverage is promoted. Data interoperability is finally at hand.</p>
<p>A key best practice here, of course, is recognizing that not all data or information is public, and that not all users have the same roles or should have the same access to different sets of data. Thus, to embrace global mechanisms for data interoperability, there must also be local methods for enforcing access, privacy and confidentiality.</p>
<p>Properly designed ontologies can fulfill this requirement, as well. By  organizing information into datasets and setting profiles for access  and CRUD (<em>create &#8211; read &#8211; update &#8211; delete</em>) rights, an effective  environment for data sharing and federation is established.</p>
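<p>The Python sketch below shows dataset-level access profiles with CRUD rights. The group names, dataset names and the rule that a user's rights are the union of the grants to all of their groups are hypothetical choices made for illustration. This is not the access model of any specific framework.</p>

```python
from enum import Flag, auto

class Right(Flag):
    CREATE = auto()
    READ = auto()
    UPDATE = auto()
    DELETE = auto()

# Hypothetical access profiles: (group, dataset) -> granted CRUD rights
PROFILES = {
    ("public", "products"): Right.READ,
    ("staff", "products"): Right.READ | Right.UPDATE,
    ("staff", "hr-records"): Right.READ,
    ("admin", "hr-records"): Right.CREATE | Right.READ | Right.UPDATE | Right.DELETE,
}

def allowed(groups, dataset, needed, profiles=PROFILES):
    """True if the union of the user's group grants on `dataset`
    covers every right in `needed`; unknown pairs grant nothing."""
    granted = Right(0)
    for group in groups:
        granted |= profiles.get((group, dataset), Right(0))
    return (granted & needed) == needed
```

<p>A public visitor may read the shared <code>products</code> dataset but not update it, and staff may read but not delete HR records. The same data can therefore be federated while each dataset keeps its own local access rules.</p>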
<h4>Context- and Instance-sensitive Data Display</h4>
<p>To this point, we have taken almost an exclusively data- or  schema-centric view of ontologies. But, as structures, pure and simple,  their structural nature can be exploited in other ways. It is here,  frankly, that less is spoken of the potential for ontologies than in  the more &#8220;conceptual&#8221; areas noted above.</p>
<p>The first of these new areas is in instance-sensitive data display. Each  instance record is associated with an instance type in a governing  ontology. Detecting this type means that context-sensitive display  templates can then be invoked.</p>
<p>Detecting that something refers to a city, for example, can  invoke a template providing a map, population figures, area size, city  governance method and the like. In contrast, detecting an instance as a  camera might invoke an entirely different display template focusing on  product features or price or store and purchasing locations. Such  instance-type displays are common; they are known as &#8220;infoboxes&#8221; within Wikipedia articles, as one example.</p>
<p>But this power of data display templates can be generalized further.  What if we detect our instance represents a camera but do not have a display  template specific to cameras? Well, the ontology and simple inferencing  can tell us that cameras are a form of digital or optical products,  which more generally are part of a product concept, which more  generally is a form of a human-made artifact, or similar.</p>
<p>By tracing this inferencing chain from the specific to the more general, we can &#8220;fall back&#8221; until a somewhat OK display template is discovered, even in the absence of the better and more specific one. Then, if we find we are trying to display information on cameras frequently, we need only take one of the more general, parent templates and modify it specifically for the desired camera attributes. We also keep presentation separate from data so that the styling and presentation mode of these templates is also freely modifiable.</p>
<p>This parallel set of display structures to the domain ontology provides  a highly reusable and leveraged data presentation framework. For 30  years organizations have struggled with report generators and all sorts  of complicated systems for responsive reporting and data display. When  driven by ontologies, this challenge is greatly simplified.</p>
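<p>The template fallback described above can be sketched in a few lines of Python. The class names, the <code>SUBCLASS_OF</code> chain and the template file names are hypothetical. The sketch also assumes a single parent per class. Real ontologies often allow multiple parents, which would call for a breadth-first walk over all of them instead.</p>

```python
# Hypothetical class hierarchy (class -> parent), echoing the camera example
SUBCLASS_OF = {
    "Camera": "OpticalProduct",
    "OpticalProduct": "Product",
    "Product": "Artifact",
    "City": "Place",
}

# Hypothetical display templates, registered for only some classes
TEMPLATES = {
    "City": "city_infobox.html",
    "Product": "product_infobox.html",
    "Artifact": "generic_artifact.html",
}

DEFAULT_TEMPLATE = "generic_record.html"

def pick_template(instance_type, subclass_of=SUBCLASS_OF, templates=TEMPLATES):
    """Walk from the instance's type toward more general classes and return
    the first registered template, else a generic default."""
    seen = set()
    cls = instance_type
    while cls is not None and cls not in seen:
        if cls in templates:
            return templates[cls]
        seen.add(cls)
        cls = subclass_of.get(cls)
    return DEFAULT_TEMPLATE
```

<p>A camera record falls back to the product template until someone registers a camera-specific one. After that, the same lookup picks the specialized template, and no application code changes.</p>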
<h4>Driving User Interfaces</h4>
<p>The careful reader of the above will note that our ontologies now have  a number of interesting characteristics, all of which can be leveraged  within the user interface. For example, we have:</p>
<ul>
<li>Human-readable labels for our &#8220;things&#8221;</li>
<li>Alternative labels in our semsets that can characterize those same  &#8220;things&#8221;</li>
<li>A readable description of each &#8220;thing&#8221;</li>
<li>An organized and logical schema for how each &#8220;thing&#8221; relates to  other things.</li>
</ul>
<p>This very information, when indexed in a supplementary full-text search  engine with faceting capabilities (such as the <a href="http://fgiasson.com/blog/index.php/2009/04/29/rdf-aggregates-and-full-text-search-on-steroids-with-solr/">Solr engine we use</a>), can be  leveraged in the user interface for these types of desired UI  capabilities:</p>
<ul>
<li>Attribute labels and tooltips</li>
<li>Navigation and browsing structures and trees</li>
<li>Menu structures</li>
<li>Auto-completion of entered data</li>
<li>Contextual dropdown list choices</li>
<li>Spell checkers</li>
<li>Online help systems</li>
<li>Etc.</li>
</ul>
<p>This is absolutely mindblowing power!</p>
<p>We can now design generic tools that do patterned functions. Then,  based on the data at hand and the ontologies that describe them, we can  now see completely modified and tailored interfaces. And all of this is done without modifying a single  line of application code!</p>
<p>Applications in this brave new world now consist of assembling a proper  suite of generic tools, and then spending the bulk of our time on  describing and characterizing our data via ontologies and refining  templates for displaying or reporting the types of specific instances within our current  problem space.</p>
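<p>Auto-completion is one of the UI capabilities listed above. The Python sketch below drives it from nothing but each concept's preferred label and semset. It uses a small in-memory sorted index as a stand-in and does not show Solr's API or how our Solr index is configured. The concept records are hypothetical, and matching is by prefix of the whole label or alias only.</p>

```python
from bisect import bisect_left

# Hypothetical concept records: preferred label plus a semset of aliases
CONCEPTS = {
    "Camera": {"label": "Camera", "semset": ["digital camera", "digicam", "photo camera"]},
    "Camcorder": {"label": "Camcorder", "semset": ["video camera"]},
    "City": {"label": "City", "semset": ["town", "municipality"]},
}

def build_index(concepts):
    """Sorted (lower-cased term, concept id) pairs covering each concept's
    preferred label and every alias in its semset."""
    entries = []
    for cid, record in concepts.items():
        for term in [record["label"], *record.get("semset", [])]:
            entries.append((term.lower(), cid))
    entries.sort()
    return entries

def complete(index, concepts, prefix, limit=5):
    """Suggest preferred labels whose label or alias starts with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(index, (prefix, ""))
    results = []
    for term, cid in index[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        label = concepts[cid]["label"]
        if label not in results:
            results.append(label)
        if len(results) == limit:
            break
    return results

INDEX = build_index(CONCEPTS)
```

<p>Typing "dig" suggests "Camera" through the alias "digicam", even though the preferred label never contains those letters. This is the same semset "grist" used for mapping, now reused in the interface.</p>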
<h3>Conclusions</h3>
<p>All of the points made above are doable and being done today. Properly  designed ontologies can readily deliver all of the aspects noted above.  Later parts in this ongoing series will address many of those aspects  in greater detail.</p>
<p>Ontologies are not magic. Properly done &#8212; an important emphasis &#8212;  ontologies are the pivot point for faster and more adaptable ways of  doing business. A simple, pragmatic mindset can help.</p>
<p>Our perspective is that ontologies are really the &#8220;flour that gets baked into the cake&#8221;. While viewable and definable as their own structures, properly constructed ontologies actually should exist everywhere within applications and contribute everywhere <span style="font-weight: bold; font-style: italic">to</span> applications. This is what we mean by &#8220;data-driven applications.&#8221;</p>
<p>To be sure, we are suggesting a paradigm shift from 30 years of IT  frustrations: schema no longer must be fragile; reports no longer must  be costly and delayed; and data can finally be made interoperable.</p>
<p>We will continue to give you our best thinking on these topics over the  coming weeks and how they might be important to you.</p>
<p>Sound too good to be true? Read the material above again. And, then, we  welcome getting your <a href="http://structureddynamics.com/about.html">call</a>.</p>
<div class="boxYellowDotted">This post is part of an occasional <span style="color: #993300; font-weight: bold">AI3</span> series on  <a href="../?cat=96">ontology</a> <a href="../?cat=173">best practices</a>.</div>
<hr style="margin: 15px 0px" size="1" />
<div style="margin: 10px 0pt; font-size: 90%"><a title="onto3-1" name="onto3-1"></a> [1] As used in knowledge representation or information science,  &#8216;ontology&#8217; is most often defined using Tom Gruber&#8217;s  &#8220;explicit specification of a conceptualization.&#8221; See Thomas R.  Gruber, 1993. &#8220;A Translation Approach to Portable Ontology  Specifications,&#8221; in <span style="font-style: italic">Knowledge  Acquisition</span> <span style="font-weight: bold">5(2):</span> 199-220; see <a href="http://tomgruber.org/writing/ontolingua-kaj-1993.pdf">http://tomgruber.org/writing/ontolingua-kaj-1993.pdf</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UMBEL Now Included in SearchMonkey</title>
		<link>http://www.mkbergman.com/485/umbel-now-included-in-searchmonkey/</link>
		<comments>http://www.mkbergman.com/485/umbel-now-included-in-searchmonkey/#comments</comments>
		<pubDate>Tue, 21 Apr 2009 22:18:50 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Searching]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[SearchMonkey]]></category>
		<category><![CDATA[vocabularies]]></category>
		<category><![CDATA[Yahoo!]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=485</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=UMBEL Now Included in SearchMonkey&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Searching&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-04-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/485/umbel-now-included-in-searchmonkey/&amp;rft.language=English"></span>

SearchMonkey&#8217;s Recommended Vocabularies a Useful Resource
I am pleased to report that UMBEL is now included as one of the recommended vocabularies for the Yahoo! SearchMonkey service. Using SearchMonkey, developers and site owners can use structured data to enhance the value of standard Yahoo! search results and customize their presentation, including through &#8220;infobars&#8221;. SearchMonkey is integral [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=UMBEL Now Included in SearchMonkey&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Searching&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-04-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/485/umbel-now-included-in-searchmonkey/&amp;rft.language=English"></span>
<p><a href="http://ycorpblog.com/wp-content/uploads/2008/04/searchmonkey.jpg"><img src="../wp-content/themes/ai3/images/2009Posts/searchmonkey.jpg" style="border: 0px solid ; width: 205px; height: 212px; float: left; margin-right: 10px" alt="SearchMonkey" /></a></p>
<h2>SearchMonkey&#8217;s Recommended Vocabularies a Useful Resource</h2>
<p>I am pleased to report that <a href="http://umbel.org/">UMBEL</a> is now included as one of the recommended vocabularies for the <a href="http://www.yahoo.com/">Yahoo!</a> <a href="http://developer.yahoo.com/searchmonkey/">SearchMonkey</a> service. Using SearchMonkey, developers and site owners can use structured data to enhance the value of standard Yahoo! search results and customize their presentation, including through &#8220;<a href="http://developer.yahoo.com/searchmonkey/smguide/presentation.html">infobars</a>&#8221;. SearchMonkey is integral to a concerted effort by Yahoo! to embrace structured data, RDF and the semantic Web.</p>
<p>SearchMonkey was first announced in February 2008, with a beta release in April and then a public release in May with 28 supported vocabularies. Then, last October, an additional set of common, external vocabularies was recommended for the system, including <a href="http://www.dbpedia.org/">DBpedia</a>, <a href="http://www.freebase.com/">Freebase</a>, <a href="http://developer.yahoo.com/searchmonkey/smguide/gr.html">GoodRelations</a> and <a href="http://developer.yahoo.com/searchmonkey/smguide/sioc.html">SIOC</a>. At the same time, some further internal Yahoo! vocabularies and standard Web languages (<em>e.g</em>., OWL, XHTML) were also added.</p>
<p>This is the first vocabulary update since then.  Besides UMBEL, the <a href="http://abmeta.org/">AB Meta</a> and <a href="http://semantictagging.org/Home">Semantic Tags</a> vocabularies have also been added to this latest revision. (There have also been a few deprecations over time.)</p>
<p>A recommended vocabulary means that its namespace prefix is recognized by SearchMonkey, and the namespaces for the recommended vocabularies are reserved. Site owners may still customize and add new SearchMonkey structure, but any such additions must be explicitly defined in specific DataRSS feeds.</p>
<p>Structured data may be included in Yahoo! search results from these sources:</p>
<ul>
<li><span class="bold"><strong>Yahoo! Index</strong></span> &#8212;  the core Yahoo! search data with limited structure such as the page&#8217;s title, summary, file size, MIME type, etc. This structure is only provided by Yahoo!</li>
<li><span class="bold"><strong>Semantic Web Data</strong></span> &#8212;  including <a href="http://developer.yahoo.com/searchmonkey/smguide/semantic_web.html#microformats" title="Microformats">microformats</a> and <a href="http://developer.yahoo.com/searchmonkey/smguide/rdf.html" title="RDF">RDF data</a> embedded in the host page</li>
<li><span class="bold"><strong>Data Feed</strong></span> &#8212; A feed of Yahoo! native DataRSS provided by a third party site</li>
<li><span class="bold"><strong>Custom Data Service</strong></span> &#8212; Any data extracted from an (X)HTML page or web service and represented within SearchMonkey as DataRSS.</li>
</ul>
<p>As a recommended vocabulary, UMBEL namespace references can now be embedded and recognized (and then presented) in Yahoo! search results.</p>
<h3>The Current Vocabulary Set</h3>
<p>Here are the 34 current vocabularies (plus five deprecated) recognized by the system:</p>
<div style="margin: 15px" align="center">
<table border="0">
<tr>
<td style="background-color: #dddddd" align="center">Prefix</td>
<td style="background-color: #dddddd" align="center">Name</td>
<td style="background-color: #dddddd" align="center">Namespace</td>
</tr>
<tr>
<td style="color: #33cc00">abmeta</td>
<td style="color: #33cc00">AB Meta</td>
<td style="color: #33cc00">http://www.abmeta.org/ns#</td>
</tr>
<tr>
<td>action</td>
<td>SearchMonkey Actions</td>
<td>http://search.yahoo.com/searchmonkey/action/</td>
</tr>
<tr>
<td style="color: #f00000">assert</td>
<td style="color: #f00000">SearchMonkey Assertions (deprecated)</td>
<td style="color: #f00000">http://search.yahoo.com/searchmonkey/assert/</td>
</tr>
<tr>
<td>cc</td>
<td>Creative Commons</td>
<td>http://creativecommons.org/ns#</td>
</tr>
<tr>
<td>commerce</td>
<td>SearchMonkey Commerce</td>
<td>http://search.yahoo.com/searchmonkey/commerce/</td>
</tr>
<tr>
<td style="color: #f00000">context</td>
<td style="color: #f00000">SearchMonkey Context (deprecated)</td>
<td style="color: #f00000">http://search.yahoo.com/searchmonkey/context/</td>
</tr>
<tr>
<td>country</td>
<td>SearchMonkey Country Datatypes</td>
<td>http://search.yahoo.com/searchmonkey-datatype/country/</td>
</tr>
<tr>
<td>currency</td>
<td>SearchMonkey Currency Datatypes</td>
<td>http://search.yahoo.com/searchmonkey-datatype/currency/</td>
</tr>
<tr>
<td>dbpedia</td>
<td>DBPedia</td>
<td>http://dbpedia.org/resource/</td>
</tr>
<tr>
<td>dc</td>
<td>Dublin Core</td>
<td>http://purl.org/dc/terms/</td>
</tr>
<tr>
<td>fb</td>
<td>Freebase</td>
<td>http://rdf.freebase.com/ns/</td>
</tr>
<tr>
<td>feed</td>
<td>SearchMonkey Feed</td>
<td>http://search.yahoo.com/searchmonkey/feed/</td>
</tr>
<tr>
<td>finance</td>
<td>SearchMonkey Finance</td>
<td>http://search.yahoo.com/searchmonkey/finance/</td>
</tr>
<tr>
<td>foaf</td>
<td>FOAF</td>
<td>http://xmlns.com/foaf/0.1/</td>
</tr>
<tr>
<td>geo</td>
<td>GeoRSS</td>
<td>http://www.georss.org/georss#</td>
</tr>
<tr>
<td>gr</td>
<td>GoodRelations</td>
<td>http://purl.org/goodrelations/v1#</td>
</tr>
<tr>
<td>job</td>
<td>SearchMonkey Jobs</td>
<td>http://search.yahoo.com/searchmonkey/job/</td>
</tr>
<tr>
<td>media</td>
<td>SearchMonkey Media</td>
<td>http://search.yahoo.com/searchmonkey/media/</td>
</tr>
<tr>
<td>news</td>
<td>SearchMonkey News</td>
<td>http://search.yahoo.com/searchmonkey/news/</td>
</tr>
<tr>
<td>owl</td>
<td>OWL ontology language</td>
<td>http://www.w3.org/2002/07/owl#</td>
</tr>
<tr>
<td style="color: #f00000">page</td>
<td style="color: #f00000">SearchMonkey Page (deprecated)</td>
<td style="color: #f00000">http://search.yahoo.com/searchmonkey/page/</td>
</tr>
<tr>
<td>product</td>
<td>SearchMonkey Product</td>
<td>http://search.yahoo.com/searchmonkey/product/</td>
</tr>
<tr>
<td>rdf</td>
<td>RDF</td>
<td>http://www.w3.org/1999/02/22-rdf-syntax-ns#</td>
</tr>
<tr>
<td>rdfs</td>
<td>RDF Schema</td>
<td>http://www.w3.org/2000/01/rdf-schema#</td>
</tr>
<tr>
<td>reference</td>
<td>SearchMonkey Reference</td>
<td>http://search.yahoo.com/searchmonkey/reference/</td>
</tr>
<tr>
<td style="color: #f00000">rel</td>
<td style="color: #f00000">SearchMonkey Relations (deprecated)</td>
<td style="color: #f00000">http://search.yahoo.com/searchmonkey-relation/</td>
</tr>
<tr>
<td>resume</td>
<td>SearchMonkey Resume</td>
<td>http://search.yahoo.com/searchmonkey/resume/</td>
</tr>
<tr>
<td>review</td>
<td>Review</td>
<td>http://purl.org/stuff/rev#</td>
</tr>
<tr>
<td>sioc</td>
<td>SIOC</td>
<td>http://rdfs.org/sioc/ns#</td>
</tr>
<tr>
<td>social</td>
<td>SearchMonkey Social</td>
<td>http://search.yahoo.com/searchmonkey/social/</td>
</tr>
<tr>
<td style="color: #33cc00">stag</td>
<td style="color: #33cc00">Semantic Tags</td>
<td style="color: #33cc00">http://semantictagging.org/ns#</td>
</tr>
<tr>
<td style="color: #f00000">tagspace</td>
<td style="color: #f00000">SearchMonkey Tagspace (deprecated)</td>
<td style="color: #f00000">http://search.yahoo.com/searchmonkey/tagspace/</td>
</tr>
<tr>
<td style="color: #33cc00">umbel</td>
<td style="color: #33cc00">UMBEL</td>
<td style="color: #33cc00">http://umbel.org/umbel/sc/</td>
</tr>
<tr>
<td>use</td>
<td>SearchMonkey Use Datatypes</td>
<td>http://search.yahoo.com/searchmonkey-datatype/use/</td>
</tr>
<tr>
<td>vcal</td>
<td>VCalendar</td>
<td>http://www.w3.org/2002/12/cal/icaltzd#</td>
</tr>
<tr>
<td>vcard</td>
<td>VCard</td>
<td>http://www.w3.org/2006/vcard/ns#</td>
</tr>
<tr>
<td>xfn</td>
<td>XFN</td>
<td>http://gmpg.org/xfn/11#</td>
</tr>
<tr>
<td>xhtml</td>
<td>XHTML</td>
<td>http://www.w3.org/1999/xhtml/vocab#</td>
</tr>
<tr>
<td>xsd</td>
<td>XML Schema Datatypes</td>
<td>http://www.w3.org/2001/XMLSchema#</td>
</tr>
</table>
</div>
<p>In addition, there are a number of standard datatypes recognized by SearchMonkey, mostly a superset of <a href="http://www.w3.org/TR/xmlschema-2/">XSD</a> (XML Schema datatypes).</p>
<p>What is emerging from this Yahoo! initiative is a very useful set of structured data definitions and vocabularies. These same resources can be great starting points for non-SearchMonkey applications as well.</p>
<h3>For More Information</h3>
<p>There is quite a bit of online material now available for SearchMonkey, with new expansions and revisions also accompanying this most recent release. As some starting points, I recommend:</p>
<ul>
<li style="font-style: italic"><a href="http://developer.search.yahoo.com/start">SearchMonkey: Getting Started</a></li>
<li>The online <a href="http://developer.yahoo.com/searchmonkey/smguide/" style="font-style: italic">SearchMonkey Guide</a> for developers, and</li>
<li> The 243-pp <a href="http://developer.yahoo.com/searchmonkey/searchmonkey_manual.pdf" style="font-style: italic">SearchMonkey Guide</a> in PDF.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/485/umbel-now-included-in-searchmonkey/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making Linked Data Reasonable using Description Logics, Part 4</title>
		<link>http://www.mkbergman.com/478/making-linked-data-reasonable-using-description-logics-part-4/</link>
		<comments>http://www.mkbergman.com/478/making-linked-data-reasonable-using-description-logics-part-4/#comments</comments>
		<pubDate>Mon, 23 Feb 2009 19:14:05 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Description Logics]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[ABox]]></category>
		<category><![CDATA[BKN]]></category>
		<category><![CDATA[owl]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[TBox]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=478</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Making Linked Data <em>Reasonable</em> using Description Logics, Part 4&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-02-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/478/making-linked-data-reasonable-using-description-logics-part-4/&amp;rft.language=English"></span>

Concluding with a Simplified Instance Record Vocabulary for Linked Data ABoxes 
In Part 1 of this series, I advocated the placement of linked data in an ABox construct from description logics [1] based on a separation of concerns argument. In Part 2, I reinforced that argument from the perspective of the work to be done [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Making Linked Data <em>Reasonable</em> using Description Logics, Part 4&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-02-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/478/making-linked-data-reasonable-using-description-logics-part-4/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; margin: 0pt 15px 10px 0pt; width: 200px; height: 184px; float: left;" src="../wp-content/themes/ai3/images/2009Posts/090122_abc_struct.jpg" alt="structs" /></p>
<h2>Concluding with a Simplified Instance Record Vocabulary for Linked Data ABoxes</h2>
<p>In <a href="../?p=474">Part 1</a> of this series, I advocated the placement of <a href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a> in an ABox construct from <a href="../?cat=126">description logics</a> <a href="#ld-pt4-1">[1]</a> based on a <a href="http://en.wikipedia.org/wiki/Separation_of_concerns">separation of concerns</a> argument. In <a href="../?p=476">Part 2</a>, I reinforced that argument from the perspective of the <span style="font-weight: bold; font-style: italic">work</span> to be done within a knowledge base. In <a href="../?p=477">Part 3</a>, we surveyed some of the key literature, finding justification for the split of the TBox from the ABox and for the use of specialty RDFS and OWL dialects for work-oriented reasoning within the context of an integral logic.</p>
<p>We now conclude this series and try to bring these threads full circle to address what might be a vocabulary for an ABox instance record design. We&#8217;d very much like to thank <a href="http://www.stat.berkeley.edu/%7Epitman/">Dr. Jim Pitman</a> of the <a href="http://bibkn.org/">Bibliographic Knowledge Network</a> project for having stimulated much of the thinking about the benefits and design of simple, human-authored and -readable instance records.</p>
<h3>A Re-cap</h3>
<p>Up until about six to eight months ago, <a href="http://fgiasson.com/blog/">Fred Giasson</a> and I were spending much of our thinking and design time on <a href="http://www.umbel.org/">UMBEL</a>, ontologies and what we now more precisely define as the TBox. Our intent all along was to get our process and thinking down pat there, and then turn to the representation of the actual entity data.</p>
<p>We have wanted to keep data records separate from logic and structure all along. Some clients have their own specific data records but may still want to interact with Web stuff or apply similar logic. Moreover, some client data is proprietary, some public. By organizing the data into &#8220;named entity dictionaries&#8221;, we could modularize the architecture so that data could be swapped in and out as appropriate to the customer or circumstance at hand.</p>
<p>Our initial design, and what we share publicly, uses UMBEL and various standard public ontologies (<a href="http://www.foaf-project.org/">FOAF</a>, <a href="http://dublincore.org/documents/dcmi-terms/">DC</a>, <a href="http://sioc-project.org/">SIOC</a>, <a href="http://bibliontology.com/">BIBO</a>, etc.) for the TBox, with Wikipedia entities and data from the BBC at the entity level (the ABox).</p>
<p>However, earlier work with another client showed us that our initial named entity structure was not sufficiently general or robust. That company&#8217;s records have complex relationships, such as affiliations for entities embedded in the same data record.</p>
<p class="boxGreenDotted" style="margin: 10px 0px 10px 10px; padding: 10px; float: right; width: 320px; text-align: center;"><big><span style="font-style: italic">For linked data to become truly successful, we need to find <span class="double_u">easier</span> ways for data publishers to write, expose and share structured data on the Web.</span></big></p>
<p>In order to improve the design, we went back to the drawing board to see if we could find guidance from the literature and other researchers as to how to &#8220;best&#8221; architect instance data in relation to the logic in the TBox (though we were not yet thinking about or framing our questions in terms of description logics, or DL).</p>
<p>This series of postings itself, and some of its predecessor articles, were motivated by probing the description logics space and the guidance it might provide to help determine performant architectures and designs.</p>
<h3>Folks, We&#8217;re Making Linked Data Just Too Tough</h3>
<p>For linked data to become truly successful, we need to find easier ways for data publishers to write, expose and share structured data on the Web.</p>
<p>As anyone who reads my blog knows, I frequently rail against poor semantics or other aspects of the linked data space that I feel are counterproductive. At the same time, I&#8217;d like to think that I am also a vocal advocate and proponent for linked data. I am indeed a fan.</p>
<p>To me, the fundamental precepts of RDF as a data model able to capture virtually any data structure or relationship, and the use of Web URIs as linkable identifiers for a global &#8216;Web of Data&#8217;, are simply foundational and game changing. Stuff like this quickens my pulse.</p>
<p>But look at what it takes for someone today to publish linked data:</p>
<ul>
<li>He must understand the terminology and standards and best practices &#8212; and actually, even amongst current practitioners, few do</li>
<li>She must assign Web identifiers (URIs) to her data objects, which means finding them and making them (gawd, I hate this word) &#8220;dereferenceable&#8221;</li>
<li>He must understand the semantics of the relationships and linkages his data asserts (which, unfortunately, many don&#8217;t)</li>
<li>She must present her data in serialized subject-predicate-object &#8220;triples&#8221;, which are arcane and difficult for most to understand, and</li>
<li>They both often confuse data and instances with structure and world views.</li>
</ul>
<p>Now, come on. This is not the recipe to success.</p>
<p><span style="font-style: italic">Simple </span>and <span style="font-style: italic">unbreakable</span> and <span style="font-style: italic">forgiving</span> is the recipe to success.</p>
<p>As I noted in an <a href="../?p=471">earlier posting</a>, there are many different data structures (&#8216;<span style="font-style: italic">structs</span>&#8217;) for describing and conveying (transmitting) data records. Most of these are easy to understand and easy to read. We know that <a href="http://en.wikipedia.org/wiki/Microformats">microformats</a> have tried to capture a part of this space, but so, in other ways, have data serializations such as JSON. What can we learn from such formats?</p>
<p>Well, one thing I have learned is that many on the Web positively want to expose their data. Another is that much structured data will not get exposed unless the hurdle rates are <span style="font-weight: bold; font-style: italic">small</span>.</p>
<h3>Revenge of the ABox</h3>
<p>The phrase &#8216;revenge of the ABox&#8217; comes from Heiko Stoermer&#8217;s thesis <a href="#ld-pt4-2">[2]</a>; it conveys well, I think, the fact that everyone wants to capture and structure &#8220;world views&#8221; via ontologies and the big picture, but many do not want to grub around at the level of individual instances and data records. As he states, &#8220;. . . the most valuable knowledge is typically the one about individuals, but research on ontology integration has traditionally concentrated on concepts and relations.&#8221;</p>
<p>(The perverse outcome of this is that even though linked data as practiced to date is almost 100% about instance data, the discussion rarely looks at ABox-level work or instance data integrity.)</p>
<p>As this series and its predecessor posts have argued, description logics (DL) is an excellent guiding framework for making architectural and design decisions about linked data. DL and the ABox/TBox split have meshed beautifully with our earlier intuition to separate ontologies and a structural and organizational view of the world (TBox) from the instance records (ABox, or what we had been calling internally our &#8216;named entity dictionaries&#8217;).</p>
<p>Not only can we gain a better conceptual understanding and realization of some of this semantic Web stuff by using DL, but many of today&#8217;s silly or inefficient design practices may also be remedied by better grounding our architectures in these logics.</p>
<p>For example, one thing that has helped us a great deal is getting away from the confusing terminology of &#8216;individuals&#8217; vs. &#8216;instances&#8217;. Once we come to see an instance record as just that (which is why collections can stand on an equal footing with individual things, for example), we need only worry about asserting the attributes of the instance. We can defer all of the logic and reasoning about individuals and members and sets and collections and classes, etc., to the TBox and just get on with capturing and conveying our instance record, as an ABox.</p>
<p>For this reason alone (but there are others), <a href="http://structureddynamics.com/">Structured Dynamics</a> has now abandoned the terminology of a &#8216;named entity dictionary&#8217; in favor of &#8216;instance dictionaries&#8217; or ABox (either term of which is understood to contain one or more instance records).</p>
<h3>The &#8216;Instance Record&#8217;</h3>
<p>An instance record is simply a means to either represent or convey the information (&#8220;attributes&#8221;) of a given instance. An instance is the thing at hand, and need not represent an individual; it could, for example, represent the entire holdings or collection of books in a given library.</p>
<p>An instance record may convey information about multiple instances, but each block of information for each instance is about that instance alone. Thus, for example, if the instance is a paper citation, the instance is the paper. If as attributes it asserts multiple authors, each with different institutional affiliations, those affiliations get asserted in a separate instance for each author. They are attributes of the authors, not of the paper.</p>
<p>In this manner it is easy to see attributes as pertaining only to a given instance. If the overall information to be conveyed covers attributes for multiple instances, then the instance record presents, in series, each instance that is characterized.</p>
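<p>To make the citation example concrete, here is a minimal Python sketch (an illustration only, not part of any released vocabulary). The flat input record and the names <span style="font-family: monospace">title</span>, <span style="font-family: monospace">hasAuthor</span> and <span style="font-family: monospace">affiliation</span> are hypothetical; the point is that each affiliation ends up on its author&#8217;s own instance, while the paper merely points to its authors.</p>

```python
# Hypothetical flat citation: a paper plus two authors with affiliations.
citation = {
    "id": "paper1",
    "title": "Linked Data and Description Logics",
    "authors": [
        {"id": "author1", "name": "A. Smith", "affiliation": "University X"},
        {"id": "author2", "name": "B. Jones", "affiliation": "Institute Y"},
    ],
}


def split_into_instances(record):
    """Return one attribute block per instance; each block describes only itself."""
    paper = {"id": record["id"], "title": record["title"], "hasAuthor": []}
    instances = [paper]
    for author in record["authors"]:
        # The paper only points at its authors...
        paper["hasAuthor"].append(author["id"])
        # ...while the affiliation is asserted on the author's own instance.
        instances.append(
            {"id": author["id"], "name": author["name"], "affiliation": author["affiliation"]}
        )
    return instances


for instance in split_into_instances(citation):
    print(instance)
```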
<h4>The Simplicity of Key-Value Pairs</h4>
<p>The objective is to make it easy for data owners to write, read and publish data. This means the starting format should be a human readable, easily writable means for authoring and conveying these instance records (that is, instances and their attributes and assigned values).</p>
<p>The simplest, naïve format (independent of syntax or serialization) is the <a href="http://en.wikipedia.org/wiki/Key-value_pair">key-value</a> (name-value) pair. In the key-value pair, the <span style="font-style: italic">subject</span> is always implied. So, for me, MikeBergman, as the subject:</p>
<dl style="font-family: monospace">
<dd>first_name:Mike</dd>
<dd>sex:male</dd>
<dd>citizenship:USA</dd>
<dd>town:Iowa City</dd>
</dl>
<p>Because an instance record only describes attributes for a single instance at a time, all assertions can easily be transformed into the <span style="font-style: italic">subject</span>-<span style="font-style: italic">predicate</span>-<span style="font-style: italic">object</span> (<span style="font-style: italic">s</span>-<span style="font-style: italic">p</span>-<span style="font-style: italic">o</span>) &#8220;triples&#8221; of RDF. So,</p>
<dl style="font-family: monospace">
<dd>&lt;subject:MikeBergman&gt; &lt;hasFirstName&gt; &lt;Mike&gt;</dd>
</dl>
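<p>Because the subject is fixed for the whole block, the transformation is mechanical. Here is a minimal Python sketch, assuming a hypothetical naming convention that turns a key such as <span style="font-family: monospace">first_name</span> into a predicate such as <span style="font-family: monospace">hasFirstName</span> (mirroring the example above):</p>

```python
def to_predicate(key):
    """Turn a key such as 'first_name' into a predicate name such as 'hasFirstName'."""
    return "has" + "".join(part.capitalize() for part in key.split("_"))


def key_values_to_triples(subject, pairs):
    """Attach the implied subject to every key-value pair, yielding s-p-o triples."""
    return [(subject, to_predicate(key), value) for key, value in pairs.items()]


record = {
    "first_name": "Mike",
    "sex": "male",
    "citizenship": "USA",
    "town": "Iowa City",
}

for triple in key_values_to_triples("MikeBergman", record):
    print(triple)
# ('MikeBergman', 'hasFirstName', 'Mike')
# ('MikeBergman', 'hasSex', 'male')
# ('MikeBergman', 'hasCitizenship', 'USA')
# ('MikeBergman', 'hasTown', 'Iowa City')
```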
<p>Now, of course, in conventional linked data many of these entries need to be expressed as URIs in order to &#8220;define&#8221; the item. Our design allows for that, but also allows the user to simply provide literals (that is, not identifiers, but text strings, numbers or other actual values) for each item. Thus, a &#8220;new&#8221; attribute is declared simply by using it, with its value stated just as simply.</p>
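<p>Deciding whether a supplied value is an identifier or a literal can start with a simple heuristic. The Python sketch below assumes, purely for illustration, that only absolute http(s) URLs count as URIs; anything else stays a literal for later lookup services to resolve.</p>

```python
from urllib.parse import urlparse


def classify_value(value):
    """Label a raw value as a 'uri' or a plain 'literal'.

    Heuristic (an assumption): only absolute http(s) URLs count as URIs;
    everything else is kept as a literal for later lookup services.
    """
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("uri", value)
    return ("literal", value)


print(classify_value("http://dbpedia.org/resource/Iowa_City"))
# ('uri', 'http://dbpedia.org/resource/Iowa_City')
print(classify_value("Iowa City"))
# ('literal', 'Iowa City')
```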
<p>Separate, specialized services (see below) may be (and often will need to be!) employed to look up and de-reference URIs, do datatype or data instance validation checks, evaluate identity relationships, disambiguate terms and so forth. The data supplier may choose to publish more-or-less complete &#8220;records&#8221; on their own, or they may not.</p>
<p>Through this design, nothing need change with regard to how linked data is being done today (other than the addition of some simple converters to accommodate the new format; see below). But, by shifting testing and validation work to external services, we can make it much easier for more data to get exposed and published. It is now time for linked data intermediaries and services to evolve in the linked data ecosystem.</p>
<p>In its most naïve form, this key-value pair format allows for fast and easy instance record creation, with the ability to create instances and new attributes on the fly. Sure, these assertions need to be checked, but so does most data when it is asked to participate in any meaningful work.</p>
<p>This simple design, then, is very much in keeping with the limited roles and work associated with an ABox. Only attributes and metadata for an instance are being asserted. Conceptual relationships and specialized work that might be applied against the ABox to determine data validity or whatever is shifted to be external to the instance record, where it properly and logically belongs.</p>
<h4>Relation to RDF</h4>
<p>In <a href="../?p=477">Part 3</a> we discussed how fragments of the RDF and OWL languages can be used for specialized purposes within a knowledge base while keeping the overall logics of the system integral and decidable. Clearly, this instance record approach where the sole purpose is to assert attributes and values for an instance does not require any OWL. In fact, most linked data to date only brings OWL into the picture for the <span style="font-family: monospace">owl:sameAs</span> property, the common errors of which we discussed in <a href="../?p=476">Part 2</a>.</p>
<p>The instance record only requires a small subset of the RDF language. But it does require use of RDFS (Schema) because of the appropriate use of datatypes within the instance data record.</p>
<p>At the level of the TBox and the &#8220;specialized work&#8221; areas, there are other fragments of OWL, now called profiles in the soon-to-be-released OWL 2 <a href="#ld-pt4-3">[3]</a>, that similarly can be applied to areas such as instance checking and validation, identity relation testing, etc., that I mentioned above. In other words, we can logically fragment RDF and OWL to do the individual parts of a complete system in order to simplify things and aid performance and computational efficiency.</p>
<h3>The Instance Record Vocabulary</h3>
<p>We are implementing this design internally through what we call the <span style="font-style: italic">Instance Record Vocabulary</span> (<a href="http://en.wikipedia.org/wiki/QName">QName</a>: irv). It is still quite experimental and we are testing some important aspects, some of which we describe below. As we get these nuances worked out better, we will release this vocabulary publicly for any to use and comment.</p>
<p>As we presently see it, the namespace languages required for the IRV vocabulary are <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, <a href="http://en.wikipedia.org/wiki/RDF_schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Dublin_Core">DCterms</a> and <a href="http://en.wikipedia.org/wiki/XSD">XSD</a>. RDFS (RDF Schema) is required, at minimum, because of the incorporation of XML Schema datatypes (XSD), which we think is a desirable requirement for what is, after all, an instance data specification and transfer protocol. However, the actual RDF and RDFS vocabulary used would be extremely minimal, with no OWL required.</p>
<p>In pseudo-form, with many serializations and simple syntaxes possible, this <span style="font-style: italic">Instance Record Vocabulary</span> has the following properties. Note, as discussed above, that the &lt;<span style="font-style: italic">s</span>&gt; in <span style="font-style: italic">s</span>-<span style="font-style: italic">p</span>-<span style="font-style: italic">o</span> is implied. Thus, in its naïve or handwritten form, it could be expressed in pretty simple key-value pairs:</p>
<div class="boxGraySolid">
<pre>&lt;InstanceRecord&gt;
<span class="lineIndent20">&lt;Instance&gt;</span>
<span class="lineIndent40">&lt;hasLabel&gt; &lt;[literal]&gt; @en</span>
<span class="lineIndent40">&lt;hasAltLabel&gt; &lt;[literal]&gt; @en</span>
<span class="lineIndent40">&lt;hasURI&gt; &lt;[URI]&gt;</span>
<span class="lineIndent40">&lt;hasDescription&gt; &lt;[literal]&gt; @en</span>
<span class="lineIndent40">&lt;Attribute&gt;</span>
<span class="lineIndent60">&lt;hasAttribute1&gt; &lt;[literal with optional XSD (@en) <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent60">&lt;hasAttribute2&gt; &lt;[literal with optional XSD (@en) <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent60">&lt;hasAttribute3&gt; &lt;[literal with optional XSD (@en) <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent60">&lt;hasAttributeX&gt; &lt;[literal with optional XSD (@en) <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent40">&lt;/Attribute&gt;</span>
<span class="lineIndent40">&lt;assertIdentity&gt; &lt;[literal <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent40">&lt;assertType&gt; &lt;[literal <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent40">&lt;hasSource&gt; &lt;[literal <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent40">&lt;hasVetting&gt; &lt;[literal <span style="font-weight: bold">or</span> URI]&gt;</span>
<span class="lineIndent20">&lt;/Instance&gt;
<span class="lineIndent20">&lt;Instance&gt;</span>
<span class="lineIndent40">. . . repeat as needed . . . </span>
<span class="lineIndent20">&lt;/Instance&gt;</span>
</span>&lt;/InstanceRecord&gt;</pre>
</div>
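<p>One way to see the shape of this pseudo-form is as a small data structure. The Python sketch below is a hypothetical model, not the IRV itself: the namespace is a placeholder (the vocabulary has not been published), attribute keys are used as predicates as-is, and an instance without <span style="font-family: monospace">hasURI</span> gets a local blank-node-style identifier.</p>

```python
from dataclasses import dataclass, field

# Placeholder namespace: the IRV namespace has not been published.
IRV = "http://example.org/irv#"


@dataclass
class Instance:
    label: str                                       # hasLabel (required)
    alt_labels: list = field(default_factory=list)   # hasAltLabel
    uri: str = ""                                    # hasURI (optional)
    description: str = ""                            # hasDescription
    attributes: dict = field(default_factory=dict)   # hasAttribute1 .. hasAttributeX
    identity: list = field(default_factory=list)     # assertIdentity
    types: list = field(default_factory=list)        # assertType
    sources: list = field(default_factory=list)      # hasSource
    vettings: list = field(default_factory=list)     # hasVetting

    def __post_init__(self):
        # The text names hasLabel as required; this sketch checks only that one.
        if not self.label:
            raise ValueError("hasLabel is required")


@dataclass
class InstanceRecord:
    instances: list = field(default_factory=list)

    def to_triples(self):
        """Flatten every instance into (subject, predicate, object) tuples."""
        triples = []
        for n, inst in enumerate(self.instances):
            # Without hasURI, fall back to a local blank-node-style identifier.
            subject = inst.uri or f"_:instance{n}"
            triples.append((subject, IRV + "hasLabel", inst.label))
            if inst.uri:
                triples.append((subject, IRV + "hasURI", inst.uri))
            for alt in inst.alt_labels:
                triples.append((subject, IRV + "hasAltLabel", alt))
            if inst.description:
                triples.append((subject, IRV + "hasDescription", inst.description))
            # Attribute keys are used as predicates as-is.
            for prop, value in inst.attributes.items():
                triples.append((subject, prop, value))
            for prop, values in (
                ("assertIdentity", inst.identity),
                ("assertType", inst.types),
                ("hasSource", inst.sources),
                ("hasVetting", inst.vettings),
            ):
                for value in values:
                    triples.append((subject, IRV + prop, value))
        return triples


record = InstanceRecord([
    Instance(label="Mike Bergman", attributes={"town": "Iowa City"}, types=["Person"]),
])
for triple in record.to_triples():
    print(triple)
```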
<p>Note that most values allow either literal or URI specifications. Some of the properties are obviously optional; others, such as <span style="font-family: monospace">hasLabel</span>, will be required. <span style="font-family: monospace">hasURI</span>, for example, is an optional property that may then require a separate lookup service to complete the instance as a linked data record.</p>
<p>Instance records with literal specifications would need to be validated and checked before being used for standard linked data or other meaningful data purposes. However, this approach is already well proven through, for example, <a href="http://www.openlinksw.com/">OpenLink&#8217;s</a> <a href="http://virtuoso.openlinksw.com/">Virtuoso</a> <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger">Sponger</a> cartridges and design. Sure, some work would need to be done at time of ingest, but there are no technical challenges.</p>
<p>The language used to write a literal can be specified for any kind of attribute (metadata or not). The language is specified by appending an &#8220;<span style="font-family: monospace">@lang-tag</span>&#8221; to the end of the literal. This mirrors the N3 serialization of RDF and is equivalent to using the &#8220;<span style="font-family: monospace">xml:lang</span>&#8221; attribute in the XML serialization of RDF.</p>
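<p>Splitting such a tag off a hand-written literal is straightforward. This Python sketch assumes a simplified BCP 47 shape for the tag (a real parser would follow the full grammar), so that values such as e-mail addresses are not mistaken for tagged literals.</p>

```python
import re

# Simplified BCP 47 shape (an assumption): a primary subtag plus optional subtags.
LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def split_lang_tag(raw):
    """Split 'Iowa City @en' into ('Iowa City', 'en'); untagged values get None."""
    text, sep, tag = raw.rpartition("@")
    if sep and text.strip() and LANG_TAG.fullmatch(tag):
        return text.strip(), tag
    return raw, None


print(split_lang_tag("Iowa City @en"))     # ('Iowa City', 'en')
print(split_lang_tag("Iowa City@en-US"))   # ('Iowa City', 'en-US')
print(split_lang_tag("mike@example.com"))  # ('mike@example.com', None)
```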
<h4>Metadata</h4>
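<p>The first few properties (<span style="font-family: monospace">hasLabel</span>, <span style="font-family: monospace">hasAltLabel</span>, <span style="font-family: monospace">hasURI</span> and <span style="font-family: monospace">hasDescription</span>) are simply metadata describing the instance. The label and description strings could be qualified by language.</p>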
<h4>Attributes</h4>
<p>The bulk of the instance record is devoted to the attributes and their values. Attributes could be optionally declared with XSD datatypes. URI references could be specified or later substituted by vetting services (see below).</p>
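<p>A datatype check for such declared attributes could look like the following Python sketch. It covers only a handful of XSD datatypes, and it is approximate: Python&#8217;s parsers are somewhat more lenient than the XSD lexical rules (for example, <span style="font-family: monospace">int()</span> accepts underscores), so a production checker would validate the lexical form first.</p>

```python
from datetime import date
from decimal import Decimal, InvalidOperation

XSD = "http://www.w3.org/2001/XMLSchema#"


def _boolean(text):
    # xsd:boolean allows exactly these four lexical forms.
    mapping = {"true": True, "1": True, "false": False, "0": False}
    if text not in mapping:
        raise ValueError(f"not an xsd:boolean: {text!r}")
    return mapping[text]


# A few XSD datatypes mapped to parsers; a real checker would cover many more.
PARSERS = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "boolean": _boolean,
    XSD + "date": date.fromisoformat,
}


def check_typed_literal(text, datatype):
    """Return the parsed value, or raise ValueError if text does not fit the datatype."""
    parser = PARSERS.get(datatype)
    if parser is None:
        raise ValueError(f"unsupported datatype: {datatype}")
    try:
        return parser(text)
    except InvalidOperation:
        # Decimal signals bad input with InvalidOperation rather than ValueError.
        raise ValueError(f"not an xsd:decimal: {text!r}") from None


print(check_typed_literal("2009-02-23", XSD + "date"))  # 2009-02-23
print(check_typed_literal("42", XSD + "integer"))       # 42
```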
<p>Attributes could also optionally be characterized in a list format, similar to the Lists specification for <a href="http://www.w3.org/DesignIssues/Notation3.html">Notation 3</a> (N3).</p>
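<p>In N3, a list such as <span style="font-family: monospace">( a b )</span> stands for an RDF collection built from <span style="font-family: monospace">rdf:first</span>, <span style="font-family: monospace">rdf:rest</span> and <span style="font-family: monospace">rdf:nil</span>. A minimal Python sketch of that expansion, with generated blank-node labels and a hypothetical <span style="font-family: monospace">hasAuthors</span> attribute in the usage example:</p>

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_collection(items, prefix="_:list"):
    """Expand a list into RDF collection triples (rdf:first, rdf:rest, rdf:nil).

    Returns (head, triples); head is rdf:nil for an empty list.
    """
    if not items:
        return RDF + "nil", []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        triples.append((node, RDF + "first", item))
        # Each cell points to the next one; the last cell points to rdf:nil.
        rest = nodes[i + 1] if i + 1 != len(nodes) else RDF + "nil"
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


# Hypothetical attribute linking an instance to an ordered list of authors.
head, list_triples = list_to_collection(["Fred Giasson", "Mike Bergman"])
all_triples = [("_:paper", "hasAuthors", head)] + list_triples
for triple in all_triples:
    print(triple)
```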
<h4>Asserted Relations</h4>
<p>Identity and class membership (<span style="font-family: monospace">rdf:type</span>) assertions could be made; these could later be checked for correctness, or for identity relations, by external or specialized services. The <span style="font-family: monospace">assertIdentity</span> property, in particular, is intended to replace <span style="font-family: monospace">owl:sameAs</span> with semantics more appropriate to the ABox.</p>
<h4>hasSource</h4>
<p>A separate Source record is being developed to cover source or dataset characterizations. A single instance extraction from a Web page, for example, would be accompanied by a simple source characterization. Instances of particular types, such as <a href="http://en.wikipedia.org/wiki/Microformats">microformats</a>, would be so noted (as they might invoke specialized processors or carry certain authority). Instances from large datasets would have a still longer list of possible characterizations.</p>
<p>The design of this property may draw closely on what is also being done for the <a href="http://rdfs.org/ns/void/html">voiD</a> dataset vocabulary.</p>
<p>Certain parameters in a Source record, such as language for example, may also be applied in special ways by the IRV parser at time of ingest with respect to specific literal specifications.</p>
<p>In any event, this is one of the properties still needing much more thought and definition.</p>
<h4>hasVetting</h4>
<p>This property, too, needs much more thought and definition.</p>
<p>The <span style="font-family: monospace">hasVetting</span> property, for which multiples are allowed, would identify the specific checks and services applied to the instance data. Depending on service, such checks might include URI lookup or de-referencing, identity relations and testing, record completeness and sufficiency checks, data type checking and validation, general instance checking, disambiguation, and so forth (see &#8220;<span style="font-style: italic">Specialized Work</span>&#8221; below).</p>
<p>Some services might also re-write the instance record with corrected values or URIs returned in place of literals.</p>
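<p>Such a rewrite could be sketched as follows. The lookup table stands in for a real URI lookup service, the service URI recorded under <span style="font-family: monospace">hasVetting</span> is a hypothetical placeholder, and the instance is modeled as a plain Python dict.</p>

```python
# Hypothetical lookup table standing in for a real URI lookup service.
KNOWN_URIS = {
    "Iowa City": "http://dbpedia.org/resource/Iowa_City",
    "USA": "http://dbpedia.org/resource/United_States",
}

# Hypothetical URI identifying this vetting service.
LOOKUP_SERVICE = "http://example.org/services/uri-lookup"


def vet_uri_lookup(instance):
    """Return a rewritten copy: known literals become URIs and a hasVetting entry is appended."""
    vetted = dict(instance)
    vetted["attributes"] = {
        key: KNOWN_URIS.get(value, value)
        for key, value in instance.get("attributes", {}).items()
    }
    vetted["hasVetting"] = list(instance.get("hasVetting", [])) + [LOOKUP_SERVICE]
    return vetted


raw = {"hasLabel": "Mike Bergman", "attributes": {"town": "Iowa City", "sex": "male"}}
print(vet_uri_lookup(raw)["attributes"])
# {'town': 'http://dbpedia.org/resource/Iowa_City', 'sex': 'male'}
```

<p>Returning a copy rather than editing in place keeps the publisher&#8217;s original record intact, so several services can each be applied and recorded in turn.</p>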
<p>Best practice for external services would suggest identifying them by URI, though literals would also be allowed to identify internal checks or for lookup purposes.</p>
<p>This property is meant to be a key indicator of how third parties may want to rely on the data. Combined with <span style="font-family: monospace">hasSource</span>, these <span style="font-family: monospace">hasVetting</span> entries provide essential authority and provenance information about the data at hand.</p>
<h3>Putting it All Together</h3>
<p>This diagram attempts to show the relationship of how many of these pieces may interact:</p>
<div style="margin: 10px 0px"><a href="../wp-content/themes/ai3/images/2009Posts/090221_abox_flow.png"><img style="border: 0px solid; width: 600px; height: 297px;" title="Click to expand" src="../wp-content/themes/ai3/images/2009Posts/090221_abox_flow.png" alt="Information flow to the ABox" /></a></div>
<p>Some of these bubbles deserve some additional commentary.</p>
<h4>Hand-crafted Input</h4>
<p>An important objective in this design is to allow naïve, simple text specifications to be hand-crafted for instance records. There are many relatively simple formats for specifying key-value pairs with relatively few conventions, ranging from BibTeX to YAML and JSON and others. There are literally hundreds of such formats available, as my earlier overview of <a style="font-style: italic" href="../?p=471">Naïve Representations and Structs</a> discussed.</p>
<p>Whether there is justification for yet another format in relation to this <span style="font-style: italic">Instance Record Vocabulary</span> is still under active discussion.</p>
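<p>For illustration only, here is a Python sketch of a parser for one such naïve convention. The format is an assumption (one <span style="font-family: monospace">key: value</span> pair per line, a blank line between instances, <span style="font-family: monospace">#</span> for comments, repeated keys collected into a list), not a proposed IRV syntax.</p>

```python
def parse_key_values(text):
    """Parse a naive hand-written format (an assumed convention, not a spec).

    * one 'key: value' pair per line, split on the first colon;
    * a blank line starts the next instance;
    * lines starting with '#' are comments.
    Repeated keys collect their values into a list.
    """
    instances, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                instances.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        if key in current:
            existing = current[key]
            current[key] = (existing if isinstance(existing, list) else [existing]) + [value]
        else:
            current[key] = value
    if current:
        instances.append(current)
    return instances


sample = """
# two hand-written instances
label: Mike Bergman
town: Iowa City
type: Person
type: Author

label: Fred Giasson
"""
for record in parse_key_values(sample):
    print(record)
```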
<h4>External Structs</h4>
<p>However, whether there is a separate format or not, that same earlier piece overviewed the many simple data <span style="font-style: italic">structs</span> presently out there. It also noted the nearly 100 existing converters for these forms to RDF. These same converters, with quite slight modifications, could all output the <span style="font-style: italic">Instance Record Vocabulary</span> in an appropriate serialization as well.</p>
<h4>Hooks to Functional and Scripting Languages</h4>
<p>Another option is to combine this design with a functional language front-end to generate these records (though they could be produced in other ways as well). For example, <a href="http://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a> or even a <a href="http://en.wikipedia.org/wiki/Domain-specific_language">domain-specific language</a> (DSL) could be used to create this very simple record generator. This simple system, in turn, could have a straightforward API that would allow existing scripting languages (such as Python) to be used as well.</p>
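<p>As a rough illustration of such an API, the Python sketch below is a small fluent builder (an internal DSL rather than lambda calculus); the class and method names are hypothetical.</p>

```python
class RecordBuilder:
    """Hypothetical fluent API for producing instance records from a script."""

    def __init__(self):
        self._instances = []

    def instance(self, label):
        # Start a new instance; hasLabel is the property the text marks as required.
        self._instances.append({"hasLabel": label, "attributes": {}})
        return self

    def attr(self, key, value):
        # New attributes are declared simply by using them.
        if not self._instances:
            raise RuntimeError("call instance() before attr()")
        self._instances[-1]["attributes"][key] = value
        return self

    def build(self):
        return {"InstanceRecord": list(self._instances)}


record = (
    RecordBuilder()
    .instance("Mike Bergman").attr("town", "Iowa City").attr("citizenship", "USA")
    .instance("Fred Giasson")
    .build()
)
print(record)
```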
<h4>Specialized Work</h4>
<p>So, in fact, we can also now see the specialized work (see also <a href="../?p=476">Part 2</a>) that itself is not part of the ABox but can and often should be applied to the instance data in the ABox:</p>
<div class="smallIndent" style="margin: 15px">
<table style="text-align: left; margin-left: 0px; width: 90%;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top; width: 25%;">
<ul>
<li>Record sufficiency checking</li>
<li>De-duplication</li>
<li>Membership testing</li>
<li>Most specific concept identifying</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>Datatype checking</li>
<li>Identity relation testing</li>
<li>New attribute checking</li>
<li>ABox consistency testing</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>Data range checking</li>
<li>Disambiguation</li>
<li>Source-specific testing</li>
<li>Uniqueness testing</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>URI lookup</li>
<li>URI de-referencing</li>
<li>Satisfiability checking</li>
<li>Others . . .</li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p>Though, strictly speaking, such specialty work could be seen to occur at the TBox level, it is actually different and separate logic from &#8220;standard&#8221; inferencing or reasoning. Specialized work can therefore often occur as separate tests or in batch mode with fragments of OWL or other dedicated indexes and algorithms. Some of this specialized work may take advantage of the conceptual relationships in the TBox, but need not do so. In this manner, the inferencing work of the TBox can be kept clean and efficient.</p>
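<p>To illustrate running specialized work as separate batch tests, here is a minimal Python sketch of two checks from the table above, record sufficiency checking and de-duplication. The rules are assumptions: <span style="font-family: monospace">hasLabel</span> is treated as the only required property, and duplicates are flagged when labels match after folding case, accents and whitespace.</p>

```python
import unicodedata
from collections import defaultdict

REQUIRED = ("hasLabel",)  # assumption: hasLabel is the only required property


def check_sufficiency(instances):
    """Record sufficiency checking: flag instances missing required properties."""
    return [
        (i, f"missing {prop}")
        for i, inst in enumerate(instances)
        for prop in REQUIRED
        if not inst.get(prop)
    ]


def normalize(label):
    # Fold case, accents and spacing so near-identical labels compare equal.
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def check_duplicates(instances):
    """De-duplication: flag instances whose normalized labels collide."""
    seen = defaultdict(list)
    for i, inst in enumerate(instances):
        if inst.get("hasLabel"):
            seen[normalize(inst["hasLabel"])].append(i)
    return [(ids[0], f"possible duplicates: {ids}") for ids in seen.values() if len(ids) != 1]


def run_batch(instances, checks=(check_sufficiency, check_duplicates)):
    """Run each check separately over the whole ABox and collect the findings."""
    return {check.__name__: check(instances) for check in checks}


abox = [
    {"hasLabel": "Iowa City"},
    {"hasLabel": "iowa  city"},
    {"town": "Iowa City"},
]
print(run_batch(abox))
```

<p>Because each check is an independent function over the whole ABox, checks can be added, dropped or scheduled separately without touching TBox reasoning.</p>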
<h3>Beyond Browsing and Unvalidated Queries</h3>
<p>Today, linked data has largely been used for browsing and providing unvalidated responses to queries; focus and attention to its ABox roles are important to move beyond this baseline into meaningful work <a href="#ld-pt4-2">[2]</a>. In those limited instances where this linked data has been looked at and evaluated as a complete knowledge base, such as the <a href="http://swse.org/">SWSE</a> search engine with the SAOR approach as discussed in <a href="../?p=477">Part 3</a>, more than 97% of the RDF triples provided in those cases were removed from consideration, often for logical or mis-assertion reasons <a href="#ld-pt4-4">[4]</a>.</p>
<p>The ideas presented here for a simpler linked data specification that can be easily represented in readable text are not new. RDF in JSON has been looked at in this way by <a href="http://n2.talis.com/wiki/RDF_JSON_Specification">Talis</a> and <a href="http://jdil.org/">JDIL</a>, <a href="http://search.cpan.org/%7Eautrijus/RDF-YAML-0.11/lib/RDF/YAML.pm">YAML</a> has been looked at similarly, and similar, <a href="http://tinytim.sourceforge.net/docs/2.0/mio/rdf-import.html">simpler approaches</a> have been looked at closely for <a href="http://www.garshol.priv.no/blog/176.html">topic maps</a>. There are other examples.</p>
<p>A key thrust of these efforts is to make it easier for the data publisher, thereby encouraging the exposure of more structured data.</p>
<p>These emerging ideas do not change the usefulness of current linked data in any way. Our suggested approach interoperates seamlessly with current practices and easily co-resides with them. These ideas do, however:</p>
<ul>
<li>Provide a simpler path for writing and publishing human-readable instance data</li>
<li>Provide an ABox instance record structure against which many kinds of specialized work can be applied consistently, and</li>
<li>Contribute to an overall logic and architecture that is performant and scalable for doing meaningful work.</li>
</ul>
<p>This broad outline of the roles, architecture and structure of the ABox still needs further thought and refinement. Even so, it supplies the last missing piece of Structured Dynamics&#8217; overall approach to linked data and RDF. Much time, thought and research have gone into it. Again, we&#8217;d very much like to thank Jim Pitman for his ideas, which helped catalyze this design <a href="#ld-pt4-5">[5]</a>.</p>
<p>We think a generalized <span style="font-style: italic">Instance Record Vocabulary</span> paired with a simple, text-based key-value pair input format may prove to be a winning combination. The vocabulary can be reasoned over for ABox-level data checking.</p>
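<p>To make this pairing concrete, here is a minimal Python sketch. It rests on stated assumptions: the <code>key: value</code> record syntax, the attribute names and the <code>VOCAB</code> table are hypothetical stand-ins, not the actual Instance Record Vocabulary or any published input format. The sketch parses human-readable records into ABox assertions, then applies a simple vocabulary-driven check to each record.</p>

```python
# Hypothetical "instance record vocabulary": each attribute maps to the set of
# classes whose records may carry it (None means any class). These names and
# the key: value syntax below are illustrative assumptions, not a published spec.
VOCAB = {
    "type": None,
    "name": None,
    "worksFor": {"ex:Person"},
    "isbn": {"ex:Book"},
}


def parse_records(text):
    """Parse blank-line-separated blocks of 'key: value' lines into dicts
    mapping each key to a list of values (keys may repeat)."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


def to_triples(record):
    """Turn one parsed record into (subject, attribute, value) assertions."""
    subject = record["id"][0]
    return [(subject, key, value)
            for key, values in record.items() if key != "id"
            for value in values]


def check_record(record, vocab=VOCAB):
    """ABox-level check: flag undeclared attributes, and attributes used on
    records whose asserted types fall outside the attribute's allowed classes."""
    rid = record.get("id", ["(no id)"])[0]
    types = set(record.get("type", []))
    problems = []
    for key in record:
        if key == "id":
            continue
        if key not in vocab:
            problems.append(f"{rid}: unknown attribute {key!r}")
        elif vocab[key] is not None and types.isdisjoint(vocab[key]):
            problems.append(f"{rid}: {key!r} not expected on {sorted(types)}")
    return problems


SAMPLE = """
id: ex:r1
type: ex:Person
name: Jane Doe
worksFor: ex:org1

id: ex:r2
type: ex:Book
name: An Example Title
worksFor: ex:org1
colour: red
"""

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(to_triples(record))
        print(check_record(record))
```

<p>With this sample the first record passes. The second is flagged twice: <code>worksFor</code> is not declared for <code>ex:Book</code>, and <code>colour</code> is not in the vocabulary at all. Keeping the input format this plain is the point. Publishers write readable records, and the checking burden moves to the vocabulary.</p>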
<hr style="margin: 15px 0px" size="1" />
<div style="margin: 10px 0pt; font-size: 90%"><a title="ld-pt4-1" name="ld-pt4-1"></a>[1] This is our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../?p=466">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:
<div class="boxGrayDotted">&#8220;Description logics and their semantics traditionally split <span style="font-style: italic">concepts</span> and their relationships from the different treatment of <span style="font-style: italic">instances</span> and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for <em>terminological</em> knowledge, the basis for <span style="font-style: italic">T</span> in <span style="font-style: italic">TBox</span>) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for <span style="font-style: italic">assertions</span>, the basis for <span style="font-style: italic">A</span> in <span style="font-style: italic">ABox</span>) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.&#8221;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%"><a title="ld-pt4-2" name="ld-pt4-2"></a>[2] Heiko Stoermer, 2008. <span style="font-style: italic">Okkam: Enabling Entity-centric Information Integration in the Semantic Web</span>, Ph.D. thesis presented to the DIT &#8211; University of Trento, January 2008, 185 pp. See <a href="http://eprints.biblio.unitn.it/archive/00001394/01/dissertation_camera_ready.pdf">http://eprints.biblio.unitn.it/archive/00001394/01/dissertation_camera_ready.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%"><a title="ld-pt4-3" name="ld-pt4-3"></a>[3] Boris Motik <span style="font-style: italic">et al.</span>, eds., 2008. &#8220;OWL 2 Web Ontology Language: Profiles,&#8221; a <span style="font-style: italic">W3C Working Draft</span>, December 2, 2008. See <a href="http://www.w3.org/TR/owl2-profiles/">http://www.w3.org/TR/owl2-profiles/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%"><a title="ld-pt4-4" name="ld-pt4-4"></a>[4] Aidan Hogan, Andreas Harth and Axel Polleres, 2008. &#8220;Scalable Authoritative OWL Reasoning on a Billion Triples,&#8221; in <span style="font-style: italic">Proceedings of Billion Triple Semantic Web Challenge 2008</span>, at the <span style="font-style: italic">7th International Semantic Web Conference (ISWC2008)</span>, Karlsruhe, Germany, 2008. See <a href="http://sw.deri.org/%7Eaidanh/docs/saor_billiontc08.pdf">http://sw.deri.org/~aidanh/docs/saor_billiontc08.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%"><a title="ld-pt4-5" name="ld-pt4-5"></a>[5] This input has come as a result of research supported in part by NSF Award 0835851, <a href="http://bibkn.org/">Bibliographic Knowledge Network</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/478/making-linked-data-reasonable-using-description-logics-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
