<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI3:::Adaptive Information &#187; Semantic Web</title>
	<atom:link href="http://www.mkbergman.com/category/semantic-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Wed, 10 Mar 2010 05:21:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Two Contrasting Styles for the Semantic Enterprise</title>
		<link>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/</link>
		<comments>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 15:36:49 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[semantic enterprise]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=866</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
Our Own Approach is Adaptive and Incremental
It is gratifying to see the emergence of the term semantic enterprise, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
<h2><img style="border: 0px solid; width: 225px; height: 225px; float: left; margin-right: 10px;" title="Two Faces in Circle, from http://energeticrelations.com/" src="../wp-content/themes/ai3/images/2010Posts/100214_two_faces_in_circle.jpg" alt="Two Faces in Circle, from http://energeticrelations.com/" />Our Own Approach is Adaptive and Incremental</h2>
<p>It is gratifying to see the emergence of the term <span style="font-style: italic;">semantic enterprise</span>, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single (nor best, depending on         circumstance) way to approach becoming a semantic enterprise.</p>
<p>In this piece I contrast two styles. The more traditional and familiar         one is comprehensive, complete and &#8220;engineered&#8221; in its approach. The         second, and emerging style, is more adaptive and incremental. While         <a href="http://structureddynamics.com/">Structured Dynamics</a> is a         proponent and thought leader for the adaptive style, the use and         applicability of either approach is really a function of objectives and         circumstances. The choice of approach depends on use case, and should not be a dogmatic one.</p>
<p>Any time a contrast is posed, one should be on guard about         setting up a rhetorical strawman. There may perhaps be a bit of this         flavor in this article; if so, it is unintended. It is probably best to         realize that there is a gradient &#8212; or spectrum &#8212; of possible         approaches between these contrasting styles. The real message is to         understand these differences such that you can comfortably place your         own organization at the right points along this spectrum.</p>
<h3>A Spectrum of Advantages and Differences</h3>
<p>The general idea of semantics in the enterprise precedes the use of the term, having been somewhat captured before by the ideas of <a href="http://en.wikipedia.org/wiki/Enterprise_application_integration">enterprise application integration</a>, <a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration">enterprise information integration</a> and other concepts even related to <a href="http://en.wikipedia.org/wiki/Federated_database_system">data federation</a> and <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a> stretching back to the 1980s. However, as a specific label, we can look back to the first mentions in the late 1990s and more concerted attention beginning from about 2002 or so onward <a href="#styles1">[1]</a>. As another indicator, since 2005 the Semantic Technology Conference has given specific prominence to the enterprise <a href="#styles2">[2]</a>.</p>
<p>Throughout this period, the emphasis from academic papers, many vendors, and most pundits <a href="#styles3">[3]</a> has been on things like automated reasoning, machine-aided decision making, aspects of artificial intelligence, and so forth. The general tone is often framed as &#8220;revolution&#8221; or &#8220;massive changes&#8221; or something &#8220;entirely new.&#8221; If you are a consultant or software/implementation vendor &#8212; especially where VC money is backing the venture with hopes for big returns and home runs &#8212; it may make cynical sense to sell such large and costly change.</p>
<p>I believe there are circumstances where the <span style="font-style: italic;">Semantic Enterprise</span> writ this large may make sense and be financially justified. But, this kind of &#8220;big change&#8221; view has also seen relatively few visible (or successful) deployments. It has colored what it means to be a semantic enterprise. And, I believe, it has weakened market credibility by perhaps overpromising and underdelivering. The conventional view of what it is to be a semantic enterprise deserves to be balanced.</p>
<p>So, as we balance this understanding of the semantic enterprise to one that is more nuanced, we can contrast the characteristics of the two opposing styles as follows:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style</td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style</td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>A focus on a more complete, comprehensive coverage of the                 semantics in the domain</li>
<li>More enterprise-wide, less partial or departmental</li>
<li>Greater emphasis on &#8220;<a href="http://en.wikipedia.org/wiki/Closed_world_assumption">closed                 world</a>&#8221; approaches <a href="#styles4">[4]</a>; more akin to relational database                 architecting and schema</li>
<li>Expansion is possible, but effort may be somewhat complex</li>
<li>A general implication is to replace or supplant existing                 information structures with semantic ones</li>
<li>Not necessarily based on semantic Web standards and                 languages <a href="#styles5">[5]</a> (<span style="font-style: italic;">e.g.</span>,                 may include <a href="http://en.wikipedia.org/wiki/Common_logic">Common Logic</a>,                 <a href="http://en.wikipedia.org/wiki/Frame_%28artificial_intelligence%29"> frame logics</a>, etc.)</li>
<li>Richer set of predicates (relations)</li>
<li>Though a distinction is maintained between                 schema and instances, their separation may not be consistently                 (physically) enforced</li>
<li>Often more complicated inferencing and logic tests</li>
<li>More complete enumeration and characterization of items</li>
<li>Much process around semantics agreement across groups</li>
<li>Fairly well-developed implementation tools, including for                 ontology engineering</li>
<li>Implementation times in months to years</li>
<li>Implementation costs akin to traditional large-scale IT                 projects</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>An emphasis on a simpler, incremental, &#8220;learn as you go&#8221;                 approach</li>
<li>Start with single departments or limited vertical apps</li>
<li>Embedded in the &#8220;<a href="http://en.wikipedia.org/wiki/Open_world_assumption">open                 world</a>&#8221; approach <a href="#styles4">[4]</a>, with incorporation of external                 information</li>
<li>Design and approach inherently allows incremental expansion                 and adaptation</li>
<li>A key premise is to build from and leverage existing                 information structures, vocabularies and assets</li>
<li>Fully based on semantic Web standards and languages <a href="#styles5">[5]</a>,                 often including linked data <a href="#styles6">[6]</a></li>
<li>Tends to start simply with hierarchical or related concepts                 (<span style="font-style: italic;">e.g.</span>, SKOS)</li>
<li>Conscious distinction in the structure for                 handling schema separate from instances <a href="#styles7">[7]</a></li>
<li>Inferencing logic based more on concept matching, or                 parent-child or part-of relationships</li>
<li>Degree of item characterization based on current scope</li>
<li>Initial semantic matching can be driven from existing                 assets</li>
<li>Fairly well-developed implementation tools, <span style="font-style: italic; text-decoration: underline;">except</span> for how to engage publics in the development process</li>
<li>Implementation times in weeks to months</li>
<li>Implementation costs driven by available budgets (and thus                 scope)</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Note we have labeled the conventional approach as the &#8220;comprehensive, engineered&#8221; style; its contrast, and the one we align with more closely, is the &#8220;adaptive, incremental&#8221; style.</p>
<p style="margin-left: 30px; margin-right: 30px;">[Others have posited contrasting styles, most often as &#8220;top down&#8221; <span style="font-style: italic;">v.</span> &#8220;bottom up.&#8221; However, in one interpretation of that distinction, &#8220;top down&#8221; means a layer on top of the existing Web <a href="#styles8">[8]</a>. On the other hand, &#8220;top down&#8221; is more often understood in the sense of a &#8220;comprehensive, engineered&#8221; view, consistent with my own understanding <a href="#styles9">[9]</a>. Yet no matter which characterization, neither captures what I feel to be the more important considerations of mindset, logic and premise.]</p>
<p>Though the table above contrasts many points, I think there are two         main distinctions to the adaptive approach. First, it firmly embraces         the open world assumption. OWA is key to an incremental, &#8220;learn as you         go&#8221; deployment that is also well suited to incorporation of external         information. The second main distinction is to leverage and build from         existing assets.</p>
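<p>To make the first distinction concrete, here is a minimal sketch in Python of how the same missing statement is read under each assumption. The fact set and both function names are hypothetical, invented for illustration. The open world side is also deliberately simplified: a real reasoner can conclude that something is false when explicit negation or disjointness is asserted, which this sketch does not model.</p>

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts, invented for illustration only.
FACTS: Set[Triple] = {
    ("acme", "supplies", "widgets"),
    ("acme", "locatedIn", "Iowa"),
}


def holds_closed_world(triple: Triple, facts: Set[Triple]) -> bool:
    """Closed world: anything not stated is treated as false."""
    return triple in facts


def holds_open_world(triple: Triple, facts: Set[Triple]) -> Optional[bool]:
    """Open world: a missing statement is unknown (None), not false."""
    return True if triple in facts else None


query = ("acme", "supplies", "gears")
print(holds_closed_world(query, FACTS))  # False
print(holds_open_world(query, FACTS))    # None

# Newly learned information extends the store; the earlier open world
# answer ("unknown") is refined rather than contradicted.
FACTS.add(query)
print(holds_open_world(query, FACTS))    # True
```

<p>The closed world answer had to flip from false to true once the new fact arrived, whereas the open world answer only moved from unknown to known. That is the property that makes incremental, &#8220;learn as you go&#8221; deployment and the incorporation of external information less disruptive.</p>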
<h3>A Spectrum of Applications</h3>
<p>Yet as noted in the opening, which of these approaches makes better         sense depends on circumstance. One aspect of circumstance is available         budget and deployment times for pilots or proofs-of-concept. Another         aspect, of course, is the planned use or application         for the deployment.</p>
<p>These are by no means hard distinctions, but in general we can see         these contrasting approaches applying to the following uses:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more CWA driven)</span></td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more OWA driven)</span></td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>Bounded, &#8220;inward&#8221; applications (high degree of control and                 completeness)</li>
<li>Engineering enterprises</li>
<li>Technical domains and organizations</li>
<li>Aeronautics</li>
<li>Pharmaceuticals</li>
<li>Chemicals</li>
<li>Petroleum</li>
<li>Energy</li>
<li>A/E firms (construction)</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>External facing applications, organizations (customers,                 incorporation of external data)</li>
<li>Faceted Search</li>
<li>Taxonomy updates</li>
<li>Multi-domain master data management (MDM)</li>
<li>Simple (initially) inferencing</li>
<li>Consumer products</li>
<li>Finance</li>
<li>Health care</li>
<li>Knowledge enterprises</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>A critical distinction is the nature of the enterprise itself.         &#8220;External-facing&#8221; enterprises or functions that want or need to         incorporate much external information (say, marketing or competitive intelligence) are advised to look closely at         the adaptive approach. Organizations that have more complete control         over their circumstances should perhaps focus on the conventional         approach.</p>
<h3>Adoption Thresholds and Risks</h3>
<p>In previous writings I have pointed to the manifest benefits that can accrue to the semantic enterprise [see, esp. <a href="#styles10">10</a>]. But we also have witnessed nearly a decade of promotion for semantics in the enterprise, with perhaps a lack of progress in some areas or unmet promises in others. These raise questions and skepticism about the real eventual costs and benefits.</p>
<p>I believe some of this skepticism is inherent in anything new &#8212; the general IT fatigue from what the current &#8220;next great thing&#8221; might be. But I also believe that some of this skepticism results from an approach to semantics in the enterprise that is both lengthy to deploy and costly.</p>
<p>The key advantage of the adaptive, incremental approach is that the         whole IT game in the enterprise can change. An open world approach         enables adoption as it proves itself and as budgets allow. Commitments         made under this approach have, in essence, permanent value. Past fears         and concerns about making &#8220;wrong&#8221; bets no longer apply. With learning,         targets can be re-adjusted, structure re-defined and applications         re-focused, all as new discoveries and broadening scope dictate.</p>
<p>This does not make the adaptive approach better than the conventional         one. But, it does make it less risky and, well, more <span style="font-style: italic;">adaptive</span>.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles1"></a>[1] For example, the earliest Google mentions on &#8220;semantic enterprise&#8221;         date to about 1998 or 1999. In 2002, the University of Georgia and Amit         Sheth offered the first known academic course on the Semantic         Enterprise; see <a href="http://lsdis.cs.uga.edu/SemanticEnterprise/">http://lsdis.cs.uga.edu/SemanticEnterprise/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles2"></a>[2] See the conference guide for the <a href="http://www.wilshireconferences.com/webfiles/STC05/Stc05Final.pdf">Semantic         Technology Conference 2005</a>. The sixth one, the <a href="http://www.semantic-conference.com/">2010 Semantic Technology         Conference</a>, is upcoming on June 21-25 in San Francisco.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles3"></a>[3] See, for example, Mitchell Ummell, ed., 2009. “The Rise of         the Semantic Enterprise,” special dedicated edition of the         <span style="font-style: italic;">Cutter IT Journal</span>, Vol. 22(9),         40 pp., September 2009. See <a href="http://www.cutter.com/offers/semanticenterprise.html">http://www.cutter.com/offers/semanticenterprise.html</a> (after filling out contact form). Partially in response to this         conventional view, I wrote <a href="#styles10">[10]</a>. In that article I offered as a working         definition that &#8220;<span style="font-style: italic;">a</span> <span style="font-weight: bold; font-style: italic;">semantic         enterprise</span> <span style="font-style: italic;">is one that adopts         the languages and standards of the</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Semantic_Web">semantic Web</a> <span style="font-style: italic;">. . .</span> <span style="font-style: italic;">and applies them to the issues of information         interoperability, preferably using the best practices of</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a><span style="font-style: italic;">.</span>&#8221; That happens to be Structured Dynamics&#8217;         preferred definition, though as this posting indicates, there is a         spectrum of definitions of the term.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles4"></a>[4] See M.K. Bergman, 2009. <a href="../852/the-open-world-assumption-elephant-in-the-room/">“The Open World Assumption: Elephant in the Room</a>”, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, December 21, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles5"></a>[5] See for example <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a>, <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a>, <a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and <a href="http://en.wikipedia.org/wiki/Semantic_Web#Components">others</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles6"></a>[6] <a href="http://en.wikipedia.org/wiki/Linked_data">Linked data</a> is a set of best practices for publishing and deploying instance and class data using the RDF data model. Two of the best practices are to name the data objects using uniform resource identifiers (URIs), and to expose the data for access via the HTTP protocol. Both of these practices enable the Web to become a distributed database, which also means that Web architectures can be readily employed.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles7"></a>[7] We use a basis in <a href="http://en.wikipedia.org/wiki/Description_logics">description         logics</a> for defining the roles and splits in schema and instances.         As we define it:</p>
<div class="boxGraySolid">“Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.”</div>
</div>
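<p>The TBox and ABox split quoted in note [7] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the individual <code>vin123</code> and the helper <code>classes_of</code> are all hypothetical, and plain dictionaries stand in for what would normally be RDFS or OWL statements held in a triple store. The point is that the schema (TBox) and the instance assertions (ABox) live in separate structures, and simple parent-child inferencing walks the former on behalf of the latter.</p>

```python
from typing import Dict, Set

# TBox: hypothetical concept hierarchy (child -> parent); schema only.
TBOX: Dict[str, str] = {
    "Sedan": "Car",
    "Car": "Vehicle",
}

# ABox: hypothetical instance assertions (individual -> asserted class).
ABOX: Dict[str, str] = {
    "vin123": "Sedan",
}


def classes_of(individual: str, abox: Dict[str, str], tbox: Dict[str, str]) -> Set[str]:
    """Return the asserted class plus every ancestor found in the TBox."""
    found: Set[str] = set()
    current = abox.get(individual)
    # Walk up the hierarchy; the membership check guards against cycles.
    while current is not None and current not in found:
        found.add(current)
        current = tbox.get(current)
    return found


print(sorted(classes_of("vin123", ABOX, TBOX)))  # ['Car', 'Sedan', 'Vehicle']
```

<p>Because the two structures are separate, the hierarchy can be extended (say, by adding a parent above <code>Vehicle</code>) without touching any instance record, and new instances can be added without revisiting the schema.</p>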
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles8"></a>[8] One article that got quite a bit of play a few years back was A. Iskold, 2007. &#8220;<a href="http://www.readwriteweb.com/archives/the_top-down_semantic_web.php">Top Down: A New Approach to the Semantic Web</a>,&#8221; in <em>ReadWrite Web</em>, Sept. 20, 2007. The problem with this terminology is that it offers a completely different sense of &#8220;top down&#8221; from traditional uses. In Iskold&#8217;s argument, his &#8220;top down&#8221; is a layering on top of the existing Web.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles9"></a>[9] The more traditional view of &#8220;top down&#8221; with respect to the semantic Web is in relation to how the system is constructed. This is reflected well in a presentation from the <a href="http://lsdis.cs.uga.edu/SemNSF/SemWebWorkshopAgenda.htm">NSF Workshop on DB &amp; IS Research for Semantic Web and Enterprises</a>, April 3, 2002, entitled &#8220;<a href="http://lsdis.cs.uga.edu/%7Ekashyap/talks/SWWS%20Panel.ppt">The &#8216;Emergent, Semantic Web: Top Down Design or Bottom Up Consensus?</a>&#8221;. Under this view, top down is design and committee-driven; bottom up is more decentralized and based on social processes, which is more akin to Iskold&#8217;s &#8220;top down.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles10"></a>[10] M.K. Bergman, 2009. &#8220;<a href="../825/fresh-perspectives-on-the-semantic-enterprise/">Fresh         Perspectives on the Semantic Enterprise</a>,&#8221; <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, Sept.         28, 2009.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Updates Posted to Sweet Tools, SWEETpedia</title>
		<link>http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/</link>
		<comments>http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 19:15:30 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[information extraction]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[owl]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[Sweet Tools]]></category>
		<category><![CDATA[sweetpedia]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=861</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Updates Posted to Sweet Tools, SWEETpedia&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-25&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/&amp;rft.language=English"></span>

Minor Updates Provided to these Standard AI3 Datasets
If you are like me, you like to clear the decks before the start of major new projects. In Structured Dynamics&#8216; case, we actually have multiple new initiatives getting underway, so the deck clearing has been especially focused this time.
As a result, we have updated Sweet   [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Updates Posted to Sweet Tools, SWEETpedia&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-25&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/&amp;rft.language=English"></span>
<p><img title="Sweet Tools Listing" src="../wp-content/themes/ai3/images/sweetsearchlogo80.png" alt="Sweet Tools Listing" hspace="5" vspace="0" width="89" height="80" align="left" /></p>
<h2>Minor Updates Provided to these Standard AI3 Datasets</h2>
<p>If you are like me, you like to clear the decks before the start of major new projects. In <a href="http://structureddynamics.com">Structured Dynamics</a>&#8216; case, we actually have multiple new initiatives getting underway, so the deck clearing has been especially focused this time.</p>
<p>As a result, we have updated <span style="color: #993300;"><strong><a href="../?page_id=325">Sweet Tools</a></strong></span>, <span style="color: maroon;"><strong>AI3</strong></span>&#8217;s listing of semantic Web and -related tools, with the addition of some 30 new tools, updates to others, and deletions of five expired entries. The dataset now lists 835 tools. And, as before, there is also a <a href="http://constructscs.com/conStruct/browse/">structured data view via conStruct</a> (pick the <span style="color: #990000; font-weight: bold;">Sweet Tools</span> dataset).</p>
<p>We have also updated <strong><a href="http://www.mkbergman.com/sweetpedia/">SWEETpedia</a></strong>, a listing of 246 research articles that use Wikipedia in one way or another to do semantic Web-related research. Some 20 new papers were added to this update.</p>
<p>Please use the comments section on this post to suggest new tools or new research articles for inclusion in future updates.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Seven Pillars of the Open Semantic Enterprise</title>
		<link>http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/</link>
		<comments>http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 20:26:54 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Description Logics]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[adaptive ontologies]]></category>
		<category><![CDATA[ontology-driven apps]]></category>
		<category><![CDATA[open semantic enterprise]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[semantic enterprise]]></category>
		<category><![CDATA[web oriented architecture]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=859</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Seven Pillars of the Open Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/&amp;rft.language=English"></span>

Guideposts for How to Make the Transition
The beginning of a new year and a new decade is a perfect opportunity         to take stock of how the world is changing and how we can change with         it. Over the past [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Seven Pillars of the Open Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 250px; height: 211px; float: left; margin-right: 10px;" title="Seven Pillars of the Open Semantic Enterprise" src="../wp-content/themes/ai3/images/2010Posts/100110_7pillars.png" alt="Seven Pillars of the Open Semantic Enterprise" align="left" /></p>
<h2>Guideposts for How to Make the Transition</h2>
<p>The beginning of a new year and a new decade is a perfect opportunity         to take stock of how the world is changing and how we can change with         it. Over the past year I have been writing on many foundational topics         relevant to the use of semantic technologies in enterprises.</p>
<p>In this post I bring those threads together to present a unified view         of these foundations &#8212; some seven pillars &#8212; to the <span style="font-weight: bold; font-style: italic;">open semantic         enterprise</span>.</p>
<p>By <span style="font-weight: bold; font-style: italic;">open semantic enterprise</span> we mean an organization that uses the languages and standards of the <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic Web</a>, including <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a>, <a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and <a href="http://en.wikipedia.org/wiki/Semantic_Web#Components">others</a> to integrate existing information assets, using the best practices of <a href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a> and the <a href="http://en.wikipedia.org/wiki/Open_world_assumption">open world assumption</a>, and targeting knowledge management applications. It does so using some or all of the seven foundational pieces (&#8220;pillars&#8221;) noted herein.</p>
<p>The foundational approaches to the open semantic enterprise do not necessarily mean open data nor open source (though they are suitable for these purposes with many open source tools available <a href="#ose3">[3]</a>). The techniques can equivalently be applied to internal, closed, proprietary data and structures. The techniques can themselves be used as a basis for bringing external information into the enterprise. &#8216;Open&#8217; is in reference to the critical use of the open world assumption.</p>
<p>These practices do not require replacing current systems and assets;         they can be applied equally to public or proprietary information; and         they can be tested and deployed incrementally at low risk and cost. The         very foundations of the practice encourage a learn-as-you-go approach         and active and agile adaptation. While embracing the open semantic         enterprise can lead to quite disruptive benefits and changes, it can be         accomplished as such with minimal disruption in itself. This is its         most compelling aspect.</p>
<p>Like any change in practice or learning, embracing the open semantic enterprise is fundamentally a people process. This is the pivotal piece to the puzzle, but also the one that does not lend itself to ready formulas about pillars or best practices. Leadership and vision are necessary to begin the process. People are the fuel for impelling it. So, we&#8217;ll take this fuel as a given below, and concentrate instead on the mechanics and techniques by which this vision can be achieved. In this sense, then, there are really <span style="font-style: italic; text-decoration: underline;">eight</span> pillars to the open semantic enterprise, with people residing at the apex.</p>
<p>This article is synthetic, with links to (largely) my preparatory blog         postings and topics that preceded it. Assuming you are interested in         becoming one of those leaders who wants to bring the benefits of an         open semantic enterprise to your organization, I encourage you to         follow the reference links for more background and detail.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar0.png" alt="Benefits" /> A Review of the Benefits</h3>
<p>OK, so what&#8217;s the big deal about an open semantic enterprise and why         should my organization care?</p>
<p>We should first be clear that the natural scope of the open semantic enterprise is in knowledge management and representation <a href="#ose1">[1]</a>. Suitable applications include data federation, data warehousing, search, enterprise information integration, business intelligence, competitive intelligence, knowledge representation, and so forth <a href="#ose2">[2]</a>. In the knowledge domain, the benefits of embracing the open semantic enterprise can be summarized as <span class="double_u">greater insight</span> with <span class="double_u">lower risk</span>, <span class="double_u">lower cost</span>, <span class="double_u">faster deployment</span>, and more <span class="double_u">agile responsiveness</span>.</p>
<p>The intersection of knowledge domain, semantic technologies and the         approaches herein means it is possible to start small in testing the         transition to a semantic enterprise. These efforts can be done         incrementally and with a focus on early, high-value applications and         domains.</p>
<p>There is absolutely no need to abandon past practices. There         is much that can be done to leverage existing assets. Indeed, those         prior investments are often the requisite starting basis to inform         semantic initiatives.</p>
<p>Embracing the pillars of the open semantic enterprise brings these knowledge management benefits:</p>
<ul>
<li>Domains can be analyzed and inspected incrementally</li>
<li>Schema can be incomplete and developed and refined incrementally</li>
<li>The data and the structures within these frameworks can be used and         expressed in a piecemeal or incomplete manner</li>
<li>Data with partial characterizations can be combined with other data         having complete characterizations</li>
<li>Systems built with these frameworks are flexible and robust; as new         information or structure is gained, it can be incorporated without         negating the information already resident, and</li>
<li>Both open and closed world subsystems can be bridged.</li>
</ul>
<p>Moreover, by building on successful Web architectures, we can also put         in place loosely coupled, distributed systems that can grow and         interoperate in a decentralized manner. These also happen to be perfect         architectures for flexible collaboration systems and networks.</p>
<p>These benefits arise both from individual pillars in the open semantic         enterprise foundation, as well as in the interactions between them.         Let&#8217;s now re-introduce these seven pillars.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar1.png" alt="Pillar #1" />Pillar         #1: The RDF Data Model</h3>
<p>As I stated on the occasion of the 10th birthday of the <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">Resource Description Framework</a> data model, I believe RDF is the single most important foundation to the open semantic enterprise <a href="#ose4">[4]</a>. RDF can be applied equally to all structured, semi-structured and unstructured content. By defining new types and predicates, it is possible to create more expressive vocabularies within RDF. This expressiveness enables RDF to define controlled vocabularies with exact semantics. These features make RDF a powerful data model and language for data federation and interoperability across disparate datasets.</p>
<p>Via various processors or extractors, RDF can capture and convey the         metadata or information in unstructured (say, text), semi-structured         (say, HTML documents) or structured sources (say, standard databases).         This makes RDF almost a “universal solvent” for         representing data structure.</p>
<p>Because of this universality, there are now more than 150 off-the-shelf         ‘RDFizers’ for converting various non-RDF notations (data         formats and serializations) to RDF <a href="#ose5">[5]</a>. Because of its diversity of         serializations and simple data model, it is also easy to create new         converters. Once in a common RDF representation, it is easy to         incorporate new datasets or new attributes. It is also easy to         aggregate disparate data sources as if they came from a single source.         This enables meaningful compositions of data from different applications         regardless of format or serialization.</p>
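<p>To make the idea concrete, here is a minimal sketch of what an &#8216;RDFizer&#8217; does, using plain Python tuples as stand-ins for RDF triples. It is not any of the converters referenced above; the base URI, the column names and the sample records are all hypothetical, and a real converter would emit a proper RDF serialization rather than tuples.</p>

```python
import csv
import io

# Hypothetical base URI and column names, for illustration only.
BASE = "http://example.org/"

def rdfize_csv(text, id_column, base=BASE):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + "id/" + row[id_column]
        for column, value in row.items():
            if column != id_column and value:
                triples.add((subject, base + "prop/" + column, value))
    return triples

def rdfize_records(records, id_key, base=BASE):
    """Do the same for a list of dicts (say, parsed JSON)."""
    triples = set()
    for record in records:
        subject = base + "id/" + str(record[id_key])
        for key, value in record.items():
            if key != id_key:
                triples.add((subject, base + "prop/" + key, str(value)))
    return triples

crm_csv = "id,name,city\n42,Acme Corp,Iowa City\n"
billing = [{"id": 42, "balance": 1200}]

# Aggregation is a plain set union once everything is triples.
graph = rdfize_csv(crm_csv, "id") | rdfize_records(billing, "id")

def describe(graph, subject):
    """Collect everything known about one subject, whatever its source."""
    return {p.rsplit("/", 1)[-1]: o for s, p, o in graph if s == subject}

print(describe(graph, BASE + "id/42"))
```

<p>Because both sources name the same subject with the same URI, the union behaves as if the data came from a single source, and adding a third source or a new attribute requires no schema change.</p>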
<p>What this practically means is that the integration layer can be based         on RDF, but that all source data and schema can still reside in their         native forms <a href="#ose6">[6]</a>. If it is easier or more convenient to author,         transfer or represent data in non-RDF forms, great <a href="#ose7">[7]</a>. RDF is only         necessary at the point of federation, and not all knowledge workers         need be versed in the framework.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar2.png" alt="Pillar #2" /> Pillar #2: Linked Data Techniques</h3>
<p>Linked data is a set of best practices for publishing and deploying instance and class data using the RDF data model. Two of the best practices are to name the data objects using uniform resource identifiers (URIs), and to expose the data for access via the HTTP protocol. Both of these practices enable the Web to become a distributed database, which also means that Web architectures can be readily employed (see Pillar #5 below).</p>
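<p>These two practices can be sketched in a few lines of Python. The namespace below is hypothetical, the naming rule (spaces to underscores) is just one possible convention, and the request is only constructed, never sent; <code>text/turtle</code> is the registered media type for the Turtle serialization of RDF.</p>

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical namespace; a real deployment would use a domain it controls.
NAMESPACE = "http://data.example.com/resource/"

def mint_uri(local_name, namespace=NAMESPACE):
    """Name a data object with an HTTP URI (spaces become underscores)."""
    return namespace + quote(local_name.strip().replace(" ", "_"))

def rdf_request(uri):
    """Build (but do not send) an HTTP GET asking for an RDF serialization."""
    return Request(uri, headers={"Accept": "text/turtle"}, method="GET")

uri = mint_uri("Acme Corp")
request = rdf_request(uri)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

<p>The same URI thus serves as both the name of the thing and the address where a description of it can be fetched, which is what lets ordinary Web infrastructure act as the access layer.</p>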
<p>Linked data is applicable to public or enterprise data, open or         proprietary. It is really straightforward to employ. Structured         Dynamics has published a <a href="http://structureddynamics.com/linked_data.html">useful FAQ</a> on         linked data.</p>
<p>Additional linked data best practices relate to how to characterize and         classify data, especially in the use of predicates with the proper         semantics for establishing the degree of relatedness for linked data         items from disparate sources.</p>
<p>Linked data has been a frequent topic of this blog, including how         adding linkages creates value for existing data, with a four-part         series about a year ago on linked data best practices <a href="#ose8">[8]</a>. As advocated         by Structured Dynamics, our linked data best practices are geared to         data interconnections, interrelationships and context that is equally         useful to both humans and machine agents.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar3.png" alt="Pillar #3" /> Pillar #3: Adaptive Ontologies</h3>
<p>Ontologies are the guiding structures for how information is         interrelated and made coherent using RDF and its related schema and         ontology vocabularies, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a> and <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a> <a href="#ose10">[10]</a>.         Thousands of off-the-shelf ontologies exist &#8212; a minority of which are         suitable for re-use &#8212; and new ones appropriate to any domain or scope         at hand can be readily constructed.</p>
<p>In standard form, semantic Web ontologies may range from the small and         simple to the large and complex, and may perform the roles of defining         relationships among concepts, integrating instance data, orienting to         other knowledge and domains, or mapping to other schema <a href="#ose11">[11]</a>. These are         explicit uses in the way that we construct ontologies; we also believe         it is important to keep concept definitions and relationships expressed         separately from instance data and their attributes <a href="#ose9">[9]</a>.</p>
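<p>A small sketch may help show why that separation is useful. The concept names and instances below are hypothetical, and plain Python dictionaries stand in for what would normally be RDFS or OWL statements; the point is only that schema (TBox) and instance assertions (ABox) live in separate structures and are combined at query time.</p>

```python
# TBox: concepts and their relationships (the schema).
# All names here are hypothetical examples.
TBOX_SUBCLASS_OF = {
    "Supplier": "Organization",
    "Customer": "Organization",
    "Organization": "Agent",
}

# ABox: assertions about instances, kept apart from the schema.
ABOX_TYPES = {"acme": "Supplier", "jane": "Agent"}
ABOX_ATTRIBUTES = {"acme": {"city": "Iowa City"}}

def superclasses(concept):
    """Walk the TBox hierarchy upward from one concept."""
    chain = []
    while concept in TBOX_SUBCLASS_OF:
        concept = TBOX_SUBCLASS_OF[concept]
        chain.append(concept)
    return chain

def classes_of(instance):
    """Combine an ABox assertion with TBox structure to infer membership."""
    asserted = ABOX_TYPES[instance]
    return [asserted] + superclasses(asserted)

print(classes_of("acme"))
```

<p>Refining the concept hierarchy changes what is inferred about every instance without touching a single instance record, and new instance data can be loaded without editing the schema.</p>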
<p>But, in addition to these standard roles, we also look to ontologies to stand on their own as guiding structures for ontology-driven applications (see next pillar). With relatively few minor, new best practices, ontologies can take on the double role of informing user interfaces in addition to standard information integration.</p>
<p>In this vein we term our structures <span style="font-style: italic;">adaptive ontologies</span> [<a href="#ose11">11</a>,<a href="#ose12">12</a>,<a href="#ose13">13</a>]. Some of         the user interface considerations that can be driven by adaptive         ontologies include: attribute labels and tooltips; navigation and         browsing structures and trees; menu structures; auto-completion of         entered data; contextual dropdown list choices; spell checkers; online         help systems; etc. Put another way, what makes an ontology adaptive is         to supplement the standard machine-readable purpose of ontologies to         add human-readable labels, synonyms, definitions and the like.</p>
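<p>As a rough illustration, assume an ontology fragment in which each concept carries a preferred label, alternative labels and a definition. The property names (<code>prefLabel</code>, <code>altLabels</code>, <code>definition</code>) and the concepts are assumptions made for this sketch, not a prescribed vocabulary. Two of the interface behaviors listed above then fall out almost for free:</p>

```python
# Hypothetical ontology fragment: each concept carries human-readable
# annotations alongside its machine-readable identifier.
ONTOLOGY = {
    "ex:Supplier": {
        "prefLabel": "Supplier",
        "altLabels": ["Vendor", "Provider"],
        "definition": "An organization that sells goods or services to us.",
    },
    "ex:Customer": {
        "prefLabel": "Customer",
        "altLabels": ["Client", "Purchaser"],
        "definition": "An organization or person that buys from us.",
    },
}

def autocomplete(prefix, ontology=ONTOLOGY):
    """Suggest concepts whose preferred or alternative label starts with prefix."""
    wanted = prefix.lower()
    hits = []
    for concept, notes in sorted(ontology.items()):
        labels = [notes["prefLabel"]] + notes["altLabels"]
        if any(label.lower().startswith(wanted) for label in labels):
            hits.append((concept, notes["prefLabel"]))
    return hits

def tooltip(concept, ontology=ONTOLOGY):
    """Use the definition as tooltip text for a form field or menu item."""
    return ontology[concept]["definition"]

print(autocomplete("ve"))
print(autocomplete("p"))
print(tooltip("ex:Customer"))
```

<p>Typing &#8220;ve&#8221; finds the Supplier concept through its synonym &#8220;Vendor&#8221;, so improving the interface becomes a matter of editing labels in the ontology rather than changing code.</p>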
<p>A neat trick occurs with this slight expansion of roles. The knowledge         management effort can now shift to the actual description, nature and         relationships of the information environment. In other words,         ontologies themselves become the focus of effort and development. The         KM problem no longer needs to be abstracted to the IT department or         third-party software. The actual concepts, terminology and relations         that comprise coherent ontologies now become the explicit focus of KM         activities.</p>
<p>Any existing structure (or multiples thereof) can become a starting         basis for these ontologies and their vocabularies, from spreadsheets to         naïve data structures and lists and taxonomies. So, while producing an         operating ontology that meets the best practice thresholds noted herein         has certain requirements, kicking off or contributing to this process         poses few technical or technology demands.</p>
<p>The skills needed to create these adaptive ontologies are logic,         coherent thinking and domain knowledge. That is, any subject matter         expert or knowledge worker likely has the necessary skills to         contribute to useful ontology development and refinement. With adaptive         ontologies powering ontology-driven apps (see next), we thus see a shift         in roles and responsibilities away from IT to the knowledge workers         themselves. This shift acts to democratize the knowledge management         function and flatten the organization.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar4.png" alt="Pillar #4" /> Pillar #4: Ontology-driven Applications</h3>
<p>The complement to adaptive ontologies is <span style="font-style: italic;">ontology-driven applications</span>. By definition, ontology-driven apps are modular, generic software applications designed to operate in accordance with the specifications contained in an adaptive ontology. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies, as supplemented by the human and user interface roles noted above [<a href="#ose11">11</a>,<a href="#ose12">12</a>,<a href="#ose13">13</a>].</p>
<p>Ontology-driven apps fulfill specific generic tasks. Examples of         current ontology-driven apps include imports and exports in various         formats, dataset creation and management, data record creation and         management, reporting, browsing, searching, data visualization, user         access rights and permissions, and similar. These applications provide         their specific functionality in response to the specifications in the         ontologies fed to them.</p>
<p>The applications are designed more similarly to widgets or API-based         frameworks than to the dedicated software of the past, though the         dedicated functionality (<span style="font-style: italic;">e.g.</span>,         graphing, reporting, etc.) is obviously quite similar. The major change         in these ontology-driven apps is to accommodate a relatively common         abstraction layer that responds to the structure and conventions of the         guiding ontologies. The major advantage is that single generic         applications can supply shared functionality based on any properly         constructed adaptive ontology.</p>
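<p>Here is a toy version of that abstraction layer, offered as a sketch only. The two ontologies, their property lists and the <code>required</code> flag are hypothetical; the thing to notice is that the reporting and validation functions contain no knowledge of people or parts.</p>

```python
# Two hypothetical ontologies fed to the same generic component.
PEOPLE = {
    "class": "Person",
    "properties": [
        {"name": "fullName", "label": "Full name", "required": True},
        {"name": "email", "label": "E-mail", "required": False},
    ],
}
PARTS = {
    "class": "Part",
    "properties": [
        {"name": "sku", "label": "SKU", "required": True},
        {"name": "weightKg", "label": "Weight (kg)", "required": False},
    ],
}

def render_report(ontology, record):
    """Generic report: field order and labels come from the ontology."""
    lines = [ontology["class"]]
    for prop in ontology["properties"]:
        value = record.get(prop["name"], "(unknown)")
        lines.append("  " + prop["label"] + ": " + str(value))
    return "\n".join(lines)

def missing_required(ontology, record):
    """Generic validation: which required properties lack a value?"""
    return [p["name"] for p in ontology["properties"]
            if p["required"] and p["name"] not in record]

print(render_report(PEOPLE, {"fullName": "Ada Lovelace"}))
print(render_report(PARTS, {"sku": "X-100", "weightKg": 2.5}))
print(missing_required(PARTS, {"weightKg": 2.5}))
```

<p>Supporting a third domain means supplying a third ontology, not writing a third application.</p>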
<p>This design thus limits software brittleness and maximizes software         re-use. Moreover, as noted above, it shifts the locus of effort from         software development and maintenance to the creation and modification         of knowledge structures. The KM emphasis can shift from programming and         software to logic and terminology <a href="#ose12">[12]</a>.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar5.png" alt="Pillar #5" /> Pillar #5: A Web-oriented Architecture</h3>
<p>A Web-oriented architecture (WOA) is a subset of the <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">service-oriented architectural</a> (SOA) style, wherein discrete functions are packaged into modular and shareable elements (“services”) that are made available in a distributed and loosely coupled manner. WOA uses the representational state transfer (REST) style. REST provides principles for how resources are defined and used and addressed with simple interfaces without additional messaging layers such as <a href="http://en.wikipedia.org/wiki/SOAP">SOAP</a> or <a href="http://en.wikipedia.org/wiki/Remote_procedure_call">RPC</a>. The principles are couched within the framework of a generalized architectural style and are not limited to the Web, though they are a foundation to it <a href="#ose14">[14]</a>.</p>
<p>REST and WOA stand in contrast to earlier Web service styles that are         often known by the WS-* acronym (such as <a href="http://en.wikipedia.org/wiki/Web_Services_Description_Language">WSDL</a>,         <a href="http://en.wikipedia.org/wiki/List_of_Web_service_specifications">etc</a>.).         WOA has proven itself to be highly scalable and robust for         decentralized users since all messages and interactions are         self-contained.</p>
<p>Enterprises have much to learn from the Web’s success. WOA has a         simple design with REST and idempotent operations, simple messaging,         distributed and modular services, and simple interfaces. It has a         natural synergy with linked data via the use of URI identifiers and the         HTTP transport protocol. As we see with the explosion of searchable         dynamic databases exposed via the Web, so too can we envision the same         architecture and design providing a distributed framework for data         federation. Our daily experience with browser access of the Web shows         how incredibly diverse and distributed systems can meaningfully         interoperate <a href="#ose15">[15]</a>.</p>
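<p>Idempotency is worth a quick illustration, since it is much of what makes retried or duplicated messages harmless in a decentralized setting. The class below is a hypothetical in-memory stand-in for a RESTful service (no network involved); the resource path and status-code choices are assumptions that follow common HTTP convention.</p>

```python
class ResourceStore:
    """In-memory stand-in for a RESTful service (hypothetical, no network)."""

    def __init__(self):
        self.resources = {}

    def put(self, uri, representation):
        """PUT replaces the resource at uri; repeating it changes nothing more."""
        created = uri not in self.resources
        self.resources[uri] = dict(representation)
        return 201 if created else 200

    def get(self, uri):
        """GET is safe: it reads state without modifying it."""
        if uri in self.resources:
            return 200, dict(self.resources[uri])
        return 404, None

    def delete(self, uri):
        """DELETE is idempotent too: the resource is gone either way."""
        existed = self.resources.pop(uri, None) is not None
        return 204 if existed else 404

store = ResourceStore()
uri = "/datasets/customers/42"
first = store.put(uri, {"name": "Acme Corp"})
second = store.put(uri, {"name": "Acme Corp"})  # a retried message
print(first, second, store.get(uri))
```

<p>The status code differs between the first and the repeated PUT, but the stored state does not: one resource, one representation. Each message also carries everything needed to process it, so the service keeps no conversational state between calls.</p>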
<p>This same architecture has worked beautifully in linking documents; it         is now pointing the way to linking data; and we are seeing but the         first phases of linking people and groups together via meaningful         collaboration. While generally based on only the most rudimentary basis         of connections, today&#8217;s social networking platforms are changing the         nature of contacts and interaction.</p>
<p>The foundations herein provide a basis for marrying data and documents         in a design geared from the ground up for collaboration. These         capabilities are proven and deployable today. The only unclear aspects         will be the scale and nature of the benefits <a href="#ose16">[16]</a>.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar6.png" alt="Pillar #6" /> Pillar #6: An Incremental, Layered Approach</h3>
<p>To this point, you&#8217;ll note that we have been speaking in what are         essentially &#8220;layers&#8221;. We began with existing assets, both internal and         external, in many diverse formats. These are then converted or         transformed into RDF-capable forms. These various sources are then         exposed via a WOA Web services layer for distributed and         loosely-coupled access. Then, we integrate and federate this         information via adaptive ontologies, which then can be searched,         inspected and managed via ontology-driven apps. We have presented this         layered architecture before <a href="#ose13">[13]</a>, and have also expressed this design         in relation to current Structured Dynamics&#8217; products <a href="#ose17">[17]</a>.</p>
<p>A slight update of this layered view is presented below, made even more         general for the purposes of this foundational discussion:</p>
<div style="margin: 10px; text-align: center;"><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091213_open_enterprise.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 500px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091213_open_enterprise.png" alt="Open Enterprise Architecture" width="982" height="818" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>Semantic technology does not change or alter the fact that most         activities of the enterprise are transactional, communicative or         documentary in nature. Structured, relational data systems for         transactions or records are proven, performant and understood. On its         very face, it should be clear that the <span style="font-style: italic;">meaning</span> of these activities — their         <span style="font-style: italic;">semantics</span>, if you will —         is by nature an augmentation or added layer to how to conduct the         activities themselves.</p>
<p>This simple truth affirms that semantic technologies are not a starting         basis, then, for these activities, but a way of expressing and         interoperating their outcomes. Sure, some semantic understanding and         common vocabularies at the front end can help bring consistency and a         common language to an enterprise’s activities. This is good         practice, and the more that can be done within reason while not         stifling innovation, all the better. But we all know that the budget         department and function has its own way of doing things separate from         sales or R&amp;D. And that is perfectly OK and natural.</p>
<p>Clearly, then, an obvious benefit to the semantic enterprise is to         federate across existing data silos. This should be an objective of the         first semantic &#8220;layer&#8221;, and to do so in a way that leverages existing         information already in hand. This approach is inherently incremental;         if done right, it is also low cost and low risk.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar7.png" alt="Pillar #7" /> Pillar #7: The Open World Mindset</h3>
<p>As these pillars took shape in our thinking and arguments over the past year, an elusive piece seemed always to be missing. It was like having one of those meaningful dreams, and then waking up in the morning racking your memory trying to recall that essential, missing insight.</p>
<p>As I most recently wrote <a href="#ose1">[1]</a>, that missing piece for <span style="font-weight: bold; font-style: italic; text-decoration: underline;">this</span> story is the open world assumption (OWA). I argue that this somewhat         obscure concept holds within it the key as to why there have been         decades of too-frequent failures in the enterprise in <a href="http://en.wikipedia.org/wiki/Business_intelligence">business         intelligence</a>, <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a>,         <a href="http://en.wikipedia.org/wiki/Data_integration">data         integration</a> and <a href="http://en.wikipedia.org/wiki/Federated_database_system">federation</a>,         and <a href="http://en.wikipedia.org/wiki/Knowledge_management">knowledge         management</a>.</p>
<p>Enterprises have been captive to the mindset of traditional relational         data management and its (most often unstated) <a href="http://en.wikipedia.org/wiki/Closed_World_Assumption">closed world         assumption</a> (CWA). Given the success of relational systems for         transaction and operational systems &#8212; applications for which they are         still clearly superior &#8212; it is understandable and not surprising         that this same mindset has seemed logical for knowledge management         problems as well.  But knowledge and KM are by their nature         incomplete, changing and uncertain. A closed-world mindset carries with         it certainty and logic implications not supportable by real         circumstances.</p>
<p>This is not an esoteric point, but a fundamental one. How one thinks         about the world and evaluates it is pivotal to what can be learned and         how and with what information. Transactions require completeness and         performance; insight requires drawing connections in the face of         incompleteness or unknowns.</p>
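<p>The difference shows up as soon as you ask about something that has not been recorded. In the sketch below the facts and names are hypothetical, and Python&#8217;s <code>None</code> is used to stand for &#8220;unknown&#8221;; real OWL reasoners are far richer than this, but the contrast in the answers is the essential point.</p>

```python
# Hypothetical facts; anything not listed is simply not stated.
FACTS = {
    ("acme", "suppliesTo", "globex"),
    ("acme", "locatedIn", "Iowa City"),
}
# Explicit negative assertions, where we actually know something is false.
NEGATED = {("acme", "suppliesTo", "initech")}

def ask_closed_world(fact):
    """CWA: whatever is not recorded is taken to be false."""
    return fact in FACTS

def ask_open_world(fact):
    """OWA: absence of a statement means unknown, not false."""
    if fact in FACTS:
        return True
    if fact in NEGATED:
        return False
    return None  # unknown

question = ("acme", "suppliesTo", "umbrella")
print(ask_closed_world(question))  # False
print(ask_open_world(question))    # None, i.e. unknown
```

<p>Under the closed world assumption a gap in the records becomes a definite &#8220;no&#8221;; under the open world assumption it stays an open question that later information can settle either way without contradicting what was already asserted.</p>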
<p>The absolute applicability of the semantic Web stack to an open-world         circumstance is the elephant in the room <a href="#ose1">[1]</a>. By itself, the open world mindset         provides no assurance of gaining insight or wisdom. But, absent it, we         place thresholds on information and understanding that may neither be         affordable nor achievable with traditional, closed-world approaches.</p>
<p>And, by either serendipity or some cosmic beauty, the open world         mindset also enables incremental development, testing and refinement.         Even if my basic argument of the open world advantage for knowledge         management purposes is wrong, we can test that premise at low cost and         risk. So, within available budget, pick a doable proof-of-concept, and         decide for yourself.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_7pillars_small.png" alt="Seven Pillars" /> The Foundations for the <span style="font-style: italic;">Open Semantic         Enterprise</span></h3>
<p>The seven pillars above are not magic bullets and each is likely not         absolutely essential. But, based on today&#8217;s understandings and with         still-emerging use cases being developed, we can see our <span style="font-weight: bold; font-style: italic;">open semantic         enterprise</span> as resulting from the interplay of these seven         factors:</p>
<div style="margin: 10px;"><img class="center_ok" style="border: 0px solid; width: 414px; height: 404px;" title="Seven Pillars of the Open Semantic Enterprise" src="http://mkbergman.com/wp-content/themes/ai3/images/2010Posts/100110_ose.png" alt="Open Semantic Enterprise" width="414" height="404" /></div>
<p>Thirty years of disappointing knowledge management projects and much wasted money and effort compel us to find better ways. On the other hand, until recently, too much of the semantic Web discussion has been either revolutionary (<span style="font-style: italic;">&#8220;change everything!!&#8221;</span>) or argued from pie-in-the-sky bases. Something needs to give.</p>
<p>Our work over the past few years &#8212; but especially as focused in the last 12 months &#8212; tells us that meaningful semantic Web initiatives can be mounted in the enterprise with potentially huge benefits, all at manageable risks and costs. These seven pillars point the way to how this might happen. What is now required is that eighth pillar &#8212; you.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose1"></a> [1] See, M.K. Bergman, 2009. <a href="../852/the-open-world-assumption-elephant-in-the-room/"> &#8220;The Open World Assumption: Elephant in the Room</a>&#8221;, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, December 21, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose2"></a> [2] In most instances, semantic technologies are poorly suited to transactional or operational applications. Also, there are instances in modeling specific closed-world domains where ontologies can be quite useful, such as in aerospace, petrochemicals, engineering, etc., where the scope of the domain can be precisely bounded and defined. Such efforts tend to be high cost with lengthy lead times. There are vendors who support efforts in these areas, though my company, <a href="http://structureddynamics.com/">Structured Dynamics</a>, does not. Our focus, and what we believe is the more generally suitable case for semantic technologies, is knowledge representation and management.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose3"></a> [3] The standard <a style="font-weight: bold; font-style: italic; color: #990000;" href="../new-version-sweet-tools-sem-web/">Sweet         Tools</a> listing on my <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog contains more than 800 semantic Web and         -related tools, most of which are open source, which can be inspected         via filtered and faceted search.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose4"></a> [4] See, M.K. Bergman, 2009. <a href="../483/advantages-and-myths-of-rdf/">&#8220;Advantages         and Myths of RDF&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog, April 8, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose5"></a> [5] For example, see this listing of more than 150 specific <a href="http://openstructs.org/resources/rdfizers">format options</a> available as open source. These converters can also work directly with         major application APIs.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose6"></a> [6] For an expansion on RDF as a canonical data model, see further M.K.         Bergman, 2009. <a href="../533/structure-the-world/">&#8220;Structure the         World&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog, August 3, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose7"></a> [7] For example, for dataset authoring, Structured Dynamics has         developed <a href="http://openstructs.org/iron"><span style="font-style: italic; font-weight: bold;">irON</span></a>, an instance         record and object notation that can be serialized as JSON (called         <span style="font-style: italic;">irJSON</span>), XML (called         <span style="font-style: italic;">irXML</span>) or comma-separated         values (or CSV comma-delimited files, called <span style="font-style: italic;">commON</span>). The purpose of these notations is         to provide easier authoring environments and scripting support to         RDF-ready datasets. The advantage is to shield users from the nuances         of RDF. The design of <span style="font-style: italic;">commON</span> is especially geared to using spreadsheets as authoring environments         for instance record tables or simple outline structures.  See         further the <a href="http://openstructs.org/iron/iron-specification"><span style="font-style: italic; font-weight: bold;">irON</span> specification</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose8"></a> [8] For a general listing of linked data articles, please see <a href="../category/linked-data/">that category</a> on         this <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog. Specific articles of interest include the         four-part series on &#8220;Making Linked Data Reasonable Using Description         Logics&#8221; [9] (<a href="../474/making-linked-data-reasonable-using-description-logics-part-1/">February         11</a>, <a href="../476/making-linked-data-reasonable-using-description-logics-part-2/"> February 15</a>, <a href="../477/making-linked-data-reasonable-using-description-logics-part-3/"> February 18</a> and <a href="../478/making-linked-data-reasonable-using-description-logics-part-4/"> February 23</a>, 2009) and the <a href="../837/the-law-of-linked-data/">&#8220;The Law of         Linked Data&#8221;</a> (October 11, 2009).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose9"></a> [9] Our best practices approach makes explicit splits between the &#8220;<a href="http://en.wikipedia.org/wiki/Abox">ABox</a>&#8221; (for instance data) and &#8220;<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>&#8221; (for ontology schema) in accordance with our <a title="Permanent Link to Thinking &#8216;Inside the Box&#8217; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>, a fundamental underpinning for how we use RDF:
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.&#8221;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose10"></a> [10] Those unfamiliar with the term <span style="font-style: italic;">ontology</span> might be interested in my first introduction to the subject: M.K. Bergman, 2007. <a href="../374/an-intrepid-guide-to-ontologies/">&#8220;An Intrepid Guide to Ontologies&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, May 16, 2007.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose11"></a> [11] See M.K. Bergman, 2009. <a href="../492/ontology-best-practices-for-data-driven-applications-part-3/">&#8220;Ontologies as the ‘Engine’ for Data-Driven Applications&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, June 10, 2009. This is the most detailed explanation, but the specific term <span style="font-style: italic;">adaptive ontology</span> was not yet used. The first dedicated focus on adaptive ontologies was in <a href="../553/confronting-misconceptions-with-adaptive-ontologies/">&#8220;Confronting Misconceptions with Adaptive Ontologies&#8221;</a> (August 17, 2009). See also [12] and [13].</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose12"></a> [12] See, M.K. Bergman, 2009. <a href="../847/ontology-driven-applications-using-adaptive-ontologies/"> &#8220;Ontology-driven Applications Using Adaptive Ontologies&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog,         November 23, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose13"></a> [13] See, M.K. Bergman, 2009. <a href="../825/fresh-perspectives-on-the-semantic-enterprise/"> &#8220;Fresh Perspectives on the Semantic Enterprise&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog,         September 28, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose14"></a> [14] See, M.K. Bergman, 2009. <a href="../486/a-general-web-oriented-architecture-woa-for-structured-data/"> &#8220;A General Web-oriented Architecture (WOA) for Structured Data&#8221;</a>,         <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, May         3, 2009. Also, see the related <a href="../category/web-oriented-architecture-woa/">WOA         category</a> for other articles in this area.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose15"></a> [15] See, M.K. Bergman, 2008. <a href="../459/woa-a-new-enterprise-partner-for-linked-data/"> &#8220;WOA: A New Enterprise Partner for Linked Data&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog,         October 12, 2008.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose16"></a> [16] See, M.K. Bergman, 2009. <a href="../497/structwsf-a-framework-for-collaboration-networks/"> &#8220;structWSF: A Framework for Collaboration Networks&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, July         2, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose17"></a> [17] See <a href="http://structureddynamics.com/products.html">http://structureddynamics.com/products.html</a> for a general descriptive illustration of Structured Dynamics&#8217; product         stack. There is also a longer <a href="http://www.slideshare.net/mkbergman/structured-dynamicss-semantic-technologies-product-stack"> slideshow</a>, with particular reference to slide #37.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>The Open World Assumption: Elephant in the Room</title>
		<link>http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/</link>
		<comments>http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 04:20:14 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Description Logics]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[closed world assumption]]></category>
		<category><![CDATA[cwa]]></category>
		<category><![CDATA[knowledge management]]></category>
		<category><![CDATA[open world assumption]]></category>
		<category><![CDATA[owa]]></category>
		<category><![CDATA[owl]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[relational model]]></category>
		<category><![CDATA[semantic enterprise]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=852</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Open World Assumption: Elephant in the Room&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-12-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/&amp;rft.language=English"></span>

OWA Enables Incremental, Low-risk Wins for the Semantic Enterprise
In speaking of the semantic Web, it is not         infrequent that the open world         assumption (OWA) gets mentioned. What this post argues is that this       [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Open World Assumption: Elephant in the Room&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-12-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 250px; height: 276px; margin-right: 10px;" title="Open World" src="../wp-content/themes/ai3/images/2009Posts/091221_open_globe_elephant.png" alt="Open World" width="367" height="405" align="left" /></p>
<h2>OWA Enables Incremental, Low-risk Wins for the Semantic Enterprise</h2>
<p>In speaking of the <a href="http://en.wikipedia.org/wiki/Semantic_web">semantic Web</a>, it is not         infrequent that the <a href="http://en.wikipedia.org/wiki/Open_world_assumption">open world         assumption</a> (OWA) gets mentioned. What this post argues is that this         somewhat obscure concept may hold within it the key as to why there         have been decades of too-frequent failures in the enterprise in         <a href="http://en.wikipedia.org/wiki/Business_intelligence">business         intelligence</a>, <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a>,         <a href="http://en.wikipedia.org/wiki/Data_integration">data         integration</a> and <a href="http://en.wikipedia.org/wiki/Federated_database_system">federation</a>,         and <a href="http://en.wikipedia.org/wiki/Knowledge_management">knowledge         management</a>.</p>
<p>This is a fairly bold assertion. In order to support it, we first need to look to the logic and mindset assumptions associated with traditional relational data management and the semantic Web. We then need to look to the nature of knowledge itself and its relation to data federation. It is in this intersection that the key to decades of faulty premises may reside.</p>
<p>The main argument is that the <a href="http://en.wikipedia.org/wiki/Closed_World_Assumption">closed world         assumption</a> (CWA) and its prevalent mindset in traditional database         systems have hindered the ability of enterprises and the vendors that         support them to adopt incremental, low-risk means to knowledge systems         and management. CWA, in turn, has led to over-engineered schema,         too-complicated architectures and massive specification efforts that         have led to high deployment costs, blown schedules and brittleness.</p>
<p>The good news is that abandoning these failed practices and embracing         the open world approach can be done immediately based on existing         assets. Simply shifting from the closed world to open world premise         can, I argue, improve the odds for enterprise IT success in these         areas.</p>
<p>It is time to meet the elephant in the room.</p>
<h3>Scope and Some Root Causes of Enterprise IT Failures</h3>
<p>It is, of course, a bit of editorial hyperbole to label most enterprise         initiatives in business intelligence and knowledge management as being         failures over the past few decades. And, insofar as failures have         occurred, I also do not believe they are the result of vendor greed or         cynicism, or IT management mistakes or incompetence. Rather, I believe         the fault resides in the attempt to pound a square peg (relational         model) into a round hole (knowledge representation).</p>
<p>The scope of these failures is not known. We have seen anecdotal claims of trillions of dollars in annual losses due to IT project failures worldwide; failure rates for major IT projects in the 65% to 80% range; and analyses of waste and failures in individual firms that are fairly eye-popping <a href="#owa1">[1]</a>. The real point of this post is not to try to quantify these problems. However, in my many years within IT it has been a common perception and concern that many &#8212; if not most &#8212; large-scale information technology deployments have disappointed in one way or another.</p>
<p>These disappointments range from cost overruns, to late delivery, to         unmet objectives, or to low user acceptance. Many initiatives are         simply cancelled before any such metrics can be documented. Whatever         the absolute quantification, I think most experienced IT managers and         executives would agree that these failures and disappointments have         been all too commonplace.</p>
<div class="boxGreenDotted" style="margin: 5px 0pt 5px 10px; float: right; text-align: center; width: 400px; font-style: italic; color: #666666; font-weight: bold; font-size: 110%;">“Business       Intelligence projects are famous for low success rates, high costs and       time overruns. The economics of BI are visibly broken, and have been for       years. Yet BI remains the #1 technology priority according to       Gartner.”<span style="font-size: x-small;"><a href="#owa2">[2]</a></span></div>
<p>Why might this be?</p>
<p>I truly believe the reasons for these disappointments do not reside in         bad faith or incompetence. The potential importance of IT knowledge         projects to improve competitive position, lower costs, or aid         innovation for new markets is understood by all. <a href="http://en.wikipedia.org/wiki/Dilbert">Dilbert</a> aside, I find it         simply incomprehensible that disappointments or failures are rooted in         these causes.</p>
<p>Rather, I suspect the root cause resides in the success of the         relational model in the enterprise.</p>
<p>For transaction systems and for modeling narrowly bounded, structured domains (such as products, inventory or customer lists), the relational model, with its proven and optimized RDBMSs and SQL query language, has been a resounding success. It is natural to take a successful approach and try to extend it to other areas.</p>
<p>However, beginning with data warehouses in the 1980s, business         intelligence (BI) systems in the 1990s, and the general issue of most         enterprise information being bound up in documents for decades, the         application of the relational model to these areas has been         disappointing.</p>
<p>The reasons for this do not reside in areas such as storage or         hardware; these areas have seen remarkable improvements over the         decades. Rather, the problem resides in the nature of the relational         model itself, and its lack of suitability to knowledge-based problems.</p>
<h3>Technical Aspects of OWA, Broadly Defined</h3>
<p>I have noted the importance of the open world assumption to the         semantic enterprise in many of my more recent posts [<a href="#owa3">3</a>,<a href="#owa4">4</a>]. But I, like         many others, often refer to the open world assumption with facile         summaries such as it means that a lack of information does not imply         the missing information to be false. Yet to fully understand the         implications of OWA and many of its associated assumptions, it is         necessary to delve deeper.</p>
<p>I am using here a shorthand that poses the closed world assumption (CWA) <span style="font-style: italic;">vs.</span> the open world assumption (OWA). Actually, the data models behind these approaches (<a href="http://en.wikipedia.org/wiki/Datalog">Datalog</a> or <a href="http://en.wikipedia.org/wiki/Non-monotonic_logic">non-monotonic logic</a> in the case of CWA; <a href="http://en.wikipedia.org/wiki/Monotonic#Monotonic_logic">monotonic</a> in the case of OWA <a href="#owa5">[5]</a>; OWA is also firmly grounded in description logics <a href="#owa4">[4]</a>) tend to be coupled with a few other assumptions. I use the shorthand of relational approach <span style="font-style: italic;">vs.</span> (open) semantic Web approach to contrast these two models.</p>
<p>There are instances where the relational model can embrace the open         world assumption (for example, the <a href="http://en.wikipedia.org/wiki/Null_%28SQL%29">null in SQL</a>) and         there are instances where semantic Web approaches can be closed world         (as with frame logic or Prolog or other special considerations; see         conclusion). But, as generally applied and as generally understood,         this contrast between typical relational practice and the semantic Web         (based on RDF and OWL) tends to hold.</p>
<p>From a theoretical standpoint, I have found the treatment of Patel-Schneider and Horrocks <a href="#owa6">[6]</a> to be most useful in comparing these approaches. However, the <span style="font-style: italic;">Description Logics Handbook</span> and some other varied sources are also helpful [<a href="#owa7">7</a>,<a href="#owa5">5</a>]. Many of the technical aspects summarized in the table below are from these sources; I refer you to them for more informed technical discussions:</p>
<table class="center_ok" style="text-align: left; width: 620px;" border="1" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td style="vertical-align: top; font-weight: bold; font-size: 100%; width: 300px; text-align: center; background-color: #ffffcc;">Relational Approach</td>
<td style="vertical-align: top; font-weight: bold; font-size: 100%; width: 300px; text-align: center; background-color: #ffffcc;">(Open) Semantic Web Approach</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Closed World Assumption (CWA)</p>
<p style="font-size: 90%;">That which is not known to be true is presumed to be false; it                 needs to be explicitly stated as true. <span style="font-style: italic;">Negation as failure</span> (NAF) is a                 related assumption, since it assumes as false every predicate                 that cannot be proven to be true. Under CWA, any statement not                 known to be true is false.</p>
<p style="font-size: 90%;">Everything is prohibited until it is permitted.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Open World Assumption (OWA)</p>
<p style="font-size: 90%;">The lack of a given assertion or fact being available does not                 imply whether that possible assertion is true or false: it                 simply is not known. In other words, lack of knowledge does not                 imply falsity.</p>
<p style="font-size: 90%;">Everything is permitted until it is prohibited.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Unique Name Assumption (UNA)</p>
<p style="font-size: 90%;">The unique name assumption (UNA) holds that different names always refer to different entities in the world.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Duplicate Labels Allowed</p>
<p style="font-size: 90%;">OWL allows different synonym labels to be used for the same                 object; same names may refer to different objects. Identity                 assertions must be explicitly stated.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Complete Information</p>
<p style="font-size: 90%;">The data system at hand is assumed to be complete. (Missing                 information is often handled via the <a href="http://en.wikipedia.org/wiki/Null_%28SQL%29">null statement in                 SQL</a>, but that has been controversial and contentious in its                 own right.) This is also known as the <span style="font-style: italic;">domain-closure assumption</span>.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Incomplete Information</p>
<p style="font-size: 90%;">A central tenet of OWA is that information is incomplete. A                 corollary is that the attributes of specific objects or                 instances may also be incomplete or partially known.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Single Schema (one world)</p>
<p style="font-size: 90%;">A single schema is necessary to define the scope and                 interpretation of the world (domain at hand).</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Many World Interpretations</p>
<p style="font-size: 90%;">Schema and data instance assertions are kept separate. Multiple                 interpretations (worlds) for the same data are possible.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Integrity Constraints</p>
<p style="font-size: 90%;">Integrity constraints prevent “incorrect” values from being asserted in the relational model. They are useful for validation/parsing/data input and are related to the single model that contains only the facts asserted. Strict cardinality is used for checking validation.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Logical Axioms (restrictions)</p>
<p style="font-size: 90%;">Logical axioms provide restrictions through property domains and ranges. Everything can be true unless proven otherwise, and multiple possible models can satisfy the axioms. This provides more powerful inferencing, though it can also be unintuitive at times. Cardinality and range restrictions exhibit different behavior for objects (inferred) or datatypes.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Non-monotonic Logic</p>
<p style="font-size: 90%;">The set of conclusions warranted on the basis of a given knowledge base does not necessarily increase (in fact, it can shrink) as the knowledge base grows <a href="#owa5">[5]</a>.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Monotonic Logic</p>
<p style="font-size: 90%;">The hypotheses of any derived fact may be freely extended with additional assumptions. Additional assertions narrow the set of models that satisfy the knowledge base, but they never invalidate earlier inferences or entailments. A new piece of knowledge cannot reduce what is known <a href="#owa5">[5]</a>. New knowledge can arise through inference.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Fixed and Brittle</p>
<p style="font-size: 90%;">Changing the schema requires re-architecting the database; not                 inherently extensible.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Reusable and Extensible</p>
<p style="font-size: 90%;">Designed from the ground up to reuse existing ontologies                 (axioms) and to be extensible. Database design and management                 can be more agile, with schema evolving incrementally.</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Flat Structure; Strong Typing</p>
<p style="font-size: 90%;">Information organized into flat tables; linkages and                 connections between tables based on foreign keys or joins.                 Strong data typing orientation.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Graph Structure; Open Typing</p>
<p style="font-size: 90%;">Inherent graph structure, supportive of linkage and connectivity analysis. Datatypes are inherently loose, though axioms can add strong types. Datatypes are treated in the same way as classes, and datatype values are treated in the same way as individual identifiers (<span style="font-style: italic;">i.e.</span>, a data value is treated as referring to an object).</p>
</td>
</tr>
<tr>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Querying and Tooling</p>
<p style="font-size: 90%;">SQL and query optimizers are well developed, as is tooling. Disjunction is not supported; negation must be accommodated through approaches such as NAF. Sums and counts are easier due to the unique name premise. Answer closure (one answer passable to a next calculation) is easier than under OWA. Most tools are not suitable for any arbitrary schema.</p>
</td>
<td style="vertical-align: top;">
<p style="font-weight: bold; text-align: center; font-size: 90%;">Querying and Tooling</p>
<p style="font-size: 90%;">SPARQL and emerging rule languages used for querying;                 performance at scale and with broad distribution a concern.                 Queries require contextual information for proper set                 selection. Negation and disjunction are allowed and are                 powerful constructs. Tools generally less developed. Exciting                 opportunities for <span style="font-style: italic;">ontology-driven applications</span> working against a small set of generic tools.</p>
</td>
</tr>
</tbody>
</table>
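<p>The first row of the table can be illustrated with a small sketch. It is only a toy model under stated assumptions: the fact tuples, the names <code>KNOWN_FACTS</code>, <code>ask_closed_world</code> and <code>ask_open_world</code> are hypothetical, and Python's <code>None</code> stands in for &#8220;unknown&#8221;. It is not how any particular database or reasoner is implemented.</p>

```python
# Hypothetical fact base: (subject, predicate, object) tuples known to be true.
KNOWN_FACTS = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
}


def ask_closed_world(fact, facts=KNOWN_FACTS):
    """CWA: anything not recorded as true is presumed false."""
    return fact in facts


def ask_open_world(fact, facts=KNOWN_FACTS, known_false=frozenset()):
    """OWA: answer True, False, or None (unknown).

    A fact is False only when it has been explicitly asserted as false.
    """
    if fact in facts:
        return True
    if fact in known_false:
        return False
    return None  # a missing statement is not evidence of falsity


query = ("carol", "worksFor", "acme")
print(ask_closed_world(query))  # False
print(ask_open_world(query))    # None
```

<p>The same missing statement yields a definite &#8220;no&#8221; under the closed world reading and an &#8220;unknown&#8221; under the open world reading; only an explicit negative assertion turns the open world answer into <code>False</code>.</p>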
<p>In well-characterized or self-contained domains (seats on a plane,         books in a library, customers of a company, products sold via         distribution channels), the traditional relational model works well. A         closed-world assumption is performant for transaction operations with         easier data validation. The number of negative facts about a given         domain is typically much greater than the number of the positive ones.         So, in many bounded applications, the number of negative facts is so         large that their explicit representation can become practically         impossible <a href="#owa7">[7]</a>. In such cases, it is simpler and shorter to state known         &#8220;true&#8221; statements than to enumerate all &#8220;false&#8221; conditions.</p>
<p>However, the relational model is a paradigm where the information must         be complete and it must be described by a single schema. Traditional         databases require an agreement on a schema, which must be made before         data can be stored and queried. The relational model assumes that the         only objects and relationships that exist in the domain are those that         are explicitly represented in the database, and that names uniquely         identify objects in this domain. The result of these assumptions is         that there is a <span style="font-style: italic;">single</span> (canonical) model for relational systems where objects and         relationships are in a one-to-one correspondence with the data in the         database <a href="#owa6">[6]</a>.</p>
<p>This makes CWA and its related assumptions a very poor choice when         attempting to combine information from multiple sources, to deal with         uncertainty or incompleteness in the world, or to try to integrate         internal, proprietary information with external data.</p>
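<p>The unique name premise is one place where combining sources goes wrong. As a rough sketch (the supplier names and the <code>count_entities</code> helper are hypothetical, and the merge logic is a bare-bones union-find rather than anything an OWL reasoner actually does), counting entities gives different answers depending on whether names are presumed unique or identity is asserted explicitly:</p>

```python
def count_entities(names, same_as=()):
    """Count distinct entities among the given names.

    With no same_as pairs, every name counts as its own entity, which is
    the result the unique name assumption gives. Each (a, b) pair in
    same_as is an explicit identity assertion that merges two names.
    Both names in a pair are assumed to appear in the names list.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of this name's group.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in same_as:
        parent[find(a)] = find(b)

    return len({find(name) for name in names})


suppliers = ["IBM", "International Business Machines", "Acme Corp"]
print(count_entities(suppliers))  # 3
print(count_entities(
    suppliers,
    same_as=[("IBM", "International Business Machines")],
))  # 2
```

<p>Under UNA the three labels are three suppliers. Once the identity statement is asserted, two of them collapse into one, which is why sums and counts need more care in an open world setting.</p>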
<p>The process of describing an open, semantic Web &#8220;world&#8221; can proceed         incrementally, sequentially asserting new statements or conditions. The         schema in the open semantic Web &#8212; the <span style="font-style: italic;">ontology</span> &#8212; consists of sets of statements         (called axioms) that describe characteristics that must be satisfied by         the ontology designer&#8217;s idea of “reasonable” states of the         world. Formally, such statements correspond to logical sentences, and         an ontology corresponds to a logical theory <a href="#owa6">[6]</a>.</p>
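<p>A minimal sketch of this incremental style follows, assuming a toy knowledge base with only two statement kinds (class membership and subclass axioms). The tuple layout and the <code>entailed_types</code> function are illustrative inventions, not part of RDF or OWL tooling. The point it shows is monotonicity: statements added later extend what can be concluded without withdrawing anything concluded earlier.</p>

```python
def entailed_types(statements):
    """Return every (individual, class) membership that follows.

    Statements are tuples of the form ("type", individual, cls) or
    ("subClassOf", sub, sup).
    """
    types = {(s[1], s[2]) for s in statements if s[0] == "type"}
    subclass = {(s[1], s[2]) for s in statements if s[0] == "subClassOf"}

    changed = True
    while changed:  # apply the subclass rule until nothing new is derived
        changed = False
        for individual, cls in list(types):
            for sub, sup in subclass:
                if cls == sub and (individual, sup) not in types:
                    types.add((individual, sup))
                    changed = True
    return types


kb = [("type", "acme", "Supplier")]
before = entailed_types(kb)

# Assert more axioms later; nothing already concluded is withdrawn.
kb = kb + [
    ("subClassOf", "Supplier", "Organization"),
    ("subClassOf", "Organization", "Agent"),
]
after = entailed_types(kb)

print(before.issubset(after))  # True
print(sorted(after))
```

<p>Here the first statement alone supports one conclusion; the two later axioms add two more memberships by inference, and the original conclusion still holds.</p>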
<p>Irregularity and incompleteness are toxic to relational model design. In the open semantic Web, data that is structured differently can still be stored together via RDF triple statements (<span style="font-style: italic;">subject</span> &#8211; <span style="font-style: italic;">predicate</span> &#8211; <span style="font-style: italic;">object</span>). For example, OWA allows suppliers without cities and names to be stored alongside suppliers with that information. Information can be combined about similar objects or individuals even though they have different or non-overlapping attributes. Duplicate checking now occurs based on the logic of the system and not on unique name evaluations. Data validation in OWA systems can become both more complicated (via testing against restriction statements) and partially easier (via inference).</p>
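<p>The supplier example can be sketched with plain tuples standing in for RDF triples. The identifiers, predicates and values below are made up for illustration, and the <code>describe</code> and <code>lookup</code> helpers are hypothetical; a real system would use an RDF library and a triple store. The sketch also keeps only one value per predicate, which real RDF does not require.</p>

```python
from collections import defaultdict

# Hypothetical supplier data as (subject, predicate, object) triples.
triples = [
    ("supplier:1", "name", "Acme Corp"),
    ("supplier:1", "city", "Iowa City"),
    ("supplier:2", "name", "Globex"),      # no city known
    ("supplier:3", "phone", "555-0100"),   # no name or city known
]


def describe(triples):
    """Group triples by subject into one attribute dictionary per subject."""
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        records[subject][predicate] = obj
    return dict(records)


def lookup(records, subject, predicate):
    """Return the stored value, or None when nothing has been asserted."""
    return records.get(subject, {}).get(predicate)


records = describe(triples)
print(lookup(records, "supplier:1", "city"))  # Iowa City
print(lookup(records, "supplier:3", "city"))  # None
```

<p>No schema had to be agreed in advance, and the supplier with only a phone number sits in the same store as the fully described one. A missing city is simply not asserted rather than being an error or a forced null column.</p>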
<p>It is interesting to note that the theoretical underpinnings of CWA by Reiter <a href="#owa8">[8]</a> began to be understood about the same time (1978) that data federation and knowledge representation (KR) activities also began to come to the fore. CWA and later work on (for example) default reasoning <a href="#owa5">[5]</a> appear to have informed early work in description logics and its alternative OWA approach. This heavily influenced the development of the semantic Web languages RDF and OWL. However, the early path toward KM work based on the relational model also appears to have been set in this timeframe.</p>
<p>We are still reaping the whirlwind from this unfortunate early choice         of the relational model for KR, KM and BI purposes. Moreover, though         there is quite a bit of theoretical and logical discussion of the         alternative OWA and CWA data models, there are surprisingly few         discussions of what the implications of these models are to the         enterprise. (That is, the elephant in the room.) The next two sections         tackle this gap.</p>
<h3>The Knowledge Management Argument for OWA</h3>
<p>The above should make clear that the relational model and CWA are         appropriate for defined and bounded systems. However, many of the new         <a href="http://en.wikipedia.org/wiki/Knowledge_economy">knowledge         economy</a> challenges are anything but defined and bounded. These         applications all reside in the broad category of <a href="http://en.wikipedia.org/wiki/Knowledge_management">knowledge         management</a> (KM), and include such applications as data federation,         data warehousing, enterprise information integration, business         intelligence, competitive intelligence, knowledge representation, and         so forth.</p>
<p>Let&#8217;s look at the characteristics of such knowledge systems and why they are more appropriately modeled through the open world assumption (OWA) rather than the relational model and CWA:</p>
<ul>
<li style="padding-top: 9px;"> <span style="font-style: italic; font-weight: bold;">Knowledge is           never complete</span> &#8212; gaining and using knowledge is a process,           and is never complete. A completeness assumption around knowledge is           by definition inappropriate</li>
<li style="padding-top: 9px;"> <span style="font-style: italic; font-weight: bold;">Knowledge is found in structured, semi-structured and unstructured forms</span> &#8212; structured databases represent only a portion of structured information in the enterprise (spreadsheets and other non-relational datastores provide the remainder). Further, general estimates are that 80% of information available to enterprises resides in documents, with growing importance for metadata, Web pages, markup documents and other semi-structured sources. A proper data model for knowledge representation should be equally applicable to these various information forms; the open semantic language of RDF is specifically designed for this purpose</li>
<li style="padding-top: 9px;"> <span style="font-weight: bold; font-style: italic;">Knowledge can be found anywhere</span> &#8212; the open world assumption does not imply open information only. However, it is just as true that relevant information about customers, products, competitors, the environment or virtually any knowledge-based topic cannot be gained from internal information alone. The emergence of the Internet and the universal availability of mountains of public and shared information demand its thoughtful incorporation into KM systems. This requirement, in turn, demands OWA data models</li>
<li style="padding-top: 9px;"> <span style="font-weight: bold; font-style: italic;">Knowledge           structure evolves with the incorporation of more information</span> &#8212; our ability to describe and understand the world or our problems           at hand requires inspection, description and definition.           Birdwatchers, botanists and experts in all domains know well how           inspection and study of specific domains leads to more discerning           understanding and &#8220;seeing&#8221; of that domain. Before learning,           everything is just a shade of green or a herb, shrub or tree to the           incipient botanist; eventually, she learns how to discern entire           families and individual plant species, all accompanied by a rich           domain language. This truth of how increased knowledge leads to more           structure and more vocabulary needs to be explicitly reflected in our           KM systems</li>
<li style="padding-top: 9px;"> <span style="font-style: italic; font-weight: bold;">Knowledge is           contextual</span> &#8212; the importance or meaning of given information           changes by perspective and context. Further, exactly the same           information may be used differently or given different importance           depending on circumstance. Still further, what is important to           describe (the &#8220;attributes&#8221;) about certain information also varies by           context and perspective. Large knowledge management initiatives that           attempt to use the relational model and single perspectives or schema           to capture this information are doomed in one of two ways:            either they fail to capture the relevant perspectives of some users;           or they take forever and massive dollars and effort to embrace all           relevant stakeholders&#8217; contexts</li>
<li style="padding-top: 9px;"> <span style="font-weight: bold; font-style: italic;">Knowledge should           be coherent</span> &#8212; <a href="../450/when-is-content-coherent/">coherence</a> is the state of having internal logical consistency. A library of           books organized by the <a href="http://en.wikipedia.org/wiki/Dewey_Decimal_Classification">Dewey           Decimal Classification</a> <span style="font-style: italic;">v.</span> the <a href="http://en.wikipedia.org/wiki/Library_of_Congress_Classification">Library           of Congress Classification</a> <span style="font-style: italic;">v.</span> the <a href="http://en.wikipedia.org/wiki/Colon_classification">Colon           classification</a> system (or others) is not inherently correct or           wrong, but it is important that whatever system is used be applied           consistently. Because of the power of OWA logics in inferencing and           entailments, whatever &#8220;world&#8221; is chosen for a given knowledge           representation should be coherent.  Fantasies such as <a href="http://en.wikipedia.org/wiki/Avatar_%282009_film%29">Avatar</a> and           the <a href="http://en.wikipedia.org/wiki/The_Lord_of_the_Rings_film_trilogy">Lord           of the Rings</a> trilogy, even though not real, can be made           believable and compelling by virtue of their coherence</li>
<li style="padding-top: 9px;"> <span style="font-weight: bold; font-style: italic;">Knowledge is           about connections</span> &#8212; the epistemological nature of <a href="http://en.wikipedia.org/wiki/Knowledge">knowledge</a> can be argued           endlessly, but I submit much of what distinguishes knowledge from           information is that knowledge makes the connections between disparate           pieces of relevant information. As these relationships accrete, the           knowledge base grows. Again, RDF and the open world approach are           essentially connective in nature. New connections and relationships           tend to break brittle relational models, and</li>
<li style="padding-top: 9px;"> <span style="font-weight: bold; font-style: italic;">Knowledge is           about its users defining its structure and use</span> &#8212; since           knowledge is a state of understanding by practitioners and experts in           a given domain, it is also important that those very same users be           active in its gathering, organization (structure) and use. Data           models that allow more direct involvement and authoring and           modification by users &#8212; as is inherently the case with RDF and OWA           approaches &#8212; bring the knowledge process closer to hand. Besides           this ability to manipulate the model directly, there are also the           immediacy advantages of incremental changes, tests and tweaks of the           OWA model. The schema consensus and delays from single-world views           inherent to CWA remove this immediacy, and often result in delays of           months or years before knowledge structures can actually be used and           tested <a href="#owa9">[9]</a>.</li>
</ul>
<p>To be sure, there are many circumstances where large stores of instance         data and their analysis are necessary for knowledge purposes. In these         cases, hybrid CWA-OWA systems (see conclusion) may make sense.</p>
<p>But, as these points emphasize, the general assembly and organization of knowledge is open world in nature. Trying to fit KM and related applications into the straitjacket of the relational model is folly. The relational model and CWA for KM is the elephant in the room. Three decades of failures and disappointments affirm this fact.</p>
<h3>The Business Argument for OWA</h3>
<p>Besides the native match of knowledge systems with OWA, there are sound         business arguments for embracing the (open) semantic enterprise as         well. These arguments can be summarized as <span class="double_u">lower risk</span>, <span class="double_u">lower         cost</span>, <span class="double_u">faster deployment</span>, and         more <span class="double_u">agile responsiveness</span>. What is         there not to love?</p>
<p>It should now be clear that it is possible to start small in testing         the transition to a semantic enterprise. These efforts can be done         incrementally and with a focus on early, high-value applications and         domains.</p>
<p>Open world does not necessarily mean open data and it does not mean         open source. Open world is simply a way to think about the information         we have and how we act on it. OWA technologies are neutral to the         question of open or public sources. The techniques can equivalently be         applied to internal, closed, proprietary data and structures. Moreover,         the technologies can themselves be used as a basis for bringing         external information into the enterprise. An open world assumption         merely asserts that we never have all necessary information and lacking         that information does not itself lead to any conclusions.</p>
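<p>The difference in how missing information is treated can be sketched in a few lines of Python. This is an illustration only: the facts and the two helper functions are hypothetical, and a real OWA reasoner could still answer "false" when a contradiction is explicitly stated or entailed, which this sketch does not model.</p>

```python
# Known facts about who supplies which part (hypothetical data).
facts = {("supplier:1", "supplies", "part:7")}


def holds_cwa(fact, store):
    """Closed world: anything not stated is taken to be false."""
    return fact in store


def holds_owa(fact, store):
    """Open world: an unstated fact is unknown (None), not false."""
    return True if fact in store else None


query = ("supplier:2", "supplies", "part:7")
print(holds_cwa(query, facts))  # False
print(holds_owa(query, facts))  # None
```

<p>Under CWA the absence of the statement is itself a conclusion; under OWA the question simply remains open until more information arrives.</p>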
<p>Further, we need not abandon past practices. There is much that can be         done to leverage existing assets. Indeed, those prior investments are         often the requisite starting basis to inform semantic initiatives.         However, in leveraging those assets, it is important that the         enterprise begin to embrace and understand the open world assumption.</p>
<p>We also see that RDF and OWL, while important behind the scenes as a         canonical data model and languages for organizing this information,         need not be exposed as such to most users. Most instance data can be         expressed as is with the data languages of choice such as XML, JSON or         whatever. We are merely using the techniques of the (open) semantic Web         as the data model to organize our information assets at hand. These         assets need not themselves be represented in the native RDF or OWL         languages.</p>
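<p>To make this concrete, here is a rough Python sketch of instance data arriving as JSON and being organized as triples behind the scenes. The function name, the subject identifier and the use of bare JSON keys as predicates are assumptions made for illustration; a production mapping would use URIs and an agreed vocabulary.</p>

```python
import json


def json_to_triples(subject, document):
    """Flatten one flat JSON object into (subject, predicate, object) triples.

    Assumes the object has no nested values; keys are used directly as
    predicates, which is a simplification for this sketch.
    """
    record = json.loads(document)
    return {(subject, key, value) for key, value in record.items()}


# The source system keeps speaking JSON; only the organizing model changes.
source = '{"name": "Acme", "city": "London"}'
triples = json_to_triples("supplier:1", source)

for triple in sorted(triples):
    print(triple)
```

<p>The user or upstream system never sees the triple form; it exists only as the common model into which the different native formats are mapped.</p>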
<p>Thus, open world frameworks provide some incredibly important benefits         for knowledge management applications in the enterprise:</p>
<ul>
<li>Domains can be analyzed and inspected incrementally</li>
<li>Schema can be incomplete and developed and refined incrementally</li>
<li>The data and the structures within these open world frameworks can         be used and expressed in a piecemeal or incomplete manner</li>
<li>We can readily combine data with partial characterizations with         other data having complete characterizations</li>
<li>Systems built with open world frameworks are flexible and robust;         as new information or structure is gained, it can be incorporated         without negating the information already resident, and</li>
<li>Open world systems can readily bridge or embrace closed world         subsystems.</li>
</ul>
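<p>Several of these benefits can be illustrated together with a small Python sketch (hypothetical data and names, and sets of tuples standing in for a triple store): a partially characterized source is combined with a more complete one, and the new information is added without disturbing what was already known.</p>

```python
# What the enterprise already holds (hypothetical internal data).
internal = {
    ("supplier:1", "name", "Acme"),
    ("supplier:1", "city", "London"),
}

# A later, external source with different and partial coverage.
external = {
    ("supplier:1", "rating", "AA"),    # a property the first source never modelled
    ("supplier:3", "name", "Globex"),  # a newcomer known only by name
}

# Set union: every earlier statement survives, new ones are simply added.
combined = internal | external


def properties_of(subject, store):
    """Collect the predicate/object pairs asserted for one subject."""
    return {predicate: obj for s, predicate, obj in store if s == subject}


print(sorted(properties_of("supplier:1", combined).items()))
print(sorted(properties_of("supplier:3", combined).items()))
```

<p>No schema had to be revised to accept the new "rating" property or the sparsely described third supplier, which is the incremental behavior the list above describes.</p>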
<p>One might argue, as we believe, that the biggest impediment to the         semantic enterprise is the mind shift necessary to start thinking about         and accepting the open world premise. Again, this perspective is not         applicable to all problems and domains. But, where it is, much can be         left in place and leveraged with semantic technologies, so long as the         enterprise begins to look at these existing assets through a different         open-world lens.</p>
<p>In most real world circumstances, there is much we don&#8217;t know and we         interact in complex and external environments. Knowledge management         inherently occupies this space. Ultimately, data interoperability         implies a global context. Open world is the proper logic premise for         these circumstances. Via the OWA framework, we can readily change and         grow our conceptual understanding and coverage of the world, including         incorporation of external ontologies and data. Since this can easily         co-exist with underlying closed-world data, the semantic enterprise can         readily bridge both worlds.</p>
<p>So, we can now define the <span style="font-weight: bold; font-style: italic;">open semantic         enterprise</span> as one that embraces OWA for its knowledge management         applications and engages in rapid and low-risk testing of incremental         learning. The open world assumption is the proper framework to reverse         decades of failure and disappointment for knowledge projects in the         enterprise.</p>
<h3>Some Open Questions about OWA</h3>
<p>In our own discussions about ABox &#8211; TBox splits <a href="#owa10">[10]</a>, we have, in         essence, supported a hybrid OWA-CWA argument for the enterprise. It is         beyond the scope of this current piece to describe these approaches in         detail, but some of the options include local CWA, the addition of rule         languages and constraints to basic OWA, use of the new OWL 2,         TopQuadrant&#8217;s SPIN notation, and others <a href="#owa11">[11]</a>. I will address some of         these in a later post.</p>
<p>There are also questions about performance and scalability with open         semantic technologies. Here, too, progress is rapid, with billion         triple thresholds rapidly falling with daily reports of better         performance <a href="#owa12">[12]</a>. Fortunately, the incremental approach that we         advocate herein dovetails well with these rapid developments. There         should be no arguing the benefits of a successful incremental project         in a smaller domain, perhaps repeated across multiple domains, in         comparison to large, costly initiatives that never produce (even though         their underlying technologies are performant).</p>
<p>There are also architecture issues inherent in these OWA designs. In         one of our next posts, we return to the topic of <a href="../category/web-oriented-architecture-woa/">Web-oriented         architecture</a> and its role in support of these OWA knowledge         management initiatives.</p>
<p>In the end, there is no substitute for doing and learning. KM based on         OWA for the open semantic enterprise can be started today, in a focused         manner with tangible benefits and outcomes, at low cost and risk. Let&#8217;s         push the elephant out of the room and let the learning and doing begin.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa1" name="owa1"></a> [1] For example, see Roger Sessions,         2009. <a style="font-style: italic;" href="http://simplearchitectures.blogspot.com/2009/09/cost-of-it-failure.html"> Cost of IT Failure</a>, September 28, 2009. This analysis suggests         failure rates of 65% with a total estimated worldwide cost of $6.2         trillion in 2009. Commenters have raised questions as to what         constitutes failure and have questioned some of the analysis         assumptions. Nonetheless, even with over-estimates, the scale of the         numbers is alarming; see Jorge Dominguez, 2009. <a style="font-style: italic;" href="file:///F:/5-WebSites/All%20In%20Progress/The%20CHAOS%20Report%202009%20on%20IT%20Project%20Failure">The CHAOS         Report 2009 on IT Project Failure</a>, June 16, 2009, which indicates         combined failure and challenge rates for IT projects have ranged from         65% to 84% over the period 1994 to 2009; see Dan Galorath, 2008.         <a style="font-style: italic;" href="http://www.galorath.com/wp/software-project-failure-costs-billions-better-estimation-planning-can-help.php"> Software Project Failure Costs Billions; Better Estimation &amp;         Planning Can Help</a>, June 7, 2008. In this report, Galorath compares         and combines many of the available IT failure studies and summarizes         that 3 of 5 IT projects do not do what they were supposed to for the         expected costs, with 49% showing budget overruns, 47% showing higher         than expected maintenance costs, and 41% failing to deliver expected         business value; the anecdotal failure rate for years for IT projects         has been claimed as 80%, with business intelligence and data         warehousing particularly failure-prone areas; in 2001, a study by Mark         N. 
Frolick and Keith Lindsey, <a style="font-style: italic;" href="http://www.tdwi.org/research/display.aspx?ID=6592">Critical Factors for Data Warehouse Failures</a>, for the Data Warehousing Institute noted that conventional wisdom puts the failure rate of data warehousing projects at 70 to 80 percent, and that a then-recent study in the insurance industry found a 90-percent failure rate. This report is useful for combining many historical studies.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa2" name="owa2"></a> [2] According to this article, by Antone         Gonsalves, <a><span style="font-style: italic;">Poor Use Of Data         Integration Tools Can Waste $500,000 Annually: Gartner</span></a> (April 27, 2009), which reports on a recent Gartner Report, large         global 2000 companies, using several data integration tools with         overlapping features, can reduce costs by more than $500,000 annually         by eliminating redundant software and leveraging a shared services         model. In a further report by Roman Stanek, <a style="font-style: italic;" href="http://romanstanek.ulitzer.com/node/935202">Business Intelligence         Projects are Famous for Low Success Rates, High Costs and Time         Overruns</a> (April 25, 2009), Gartner is talking about a dirty little         secret in the world of data integration, the fact that the data         integration technology in place is based on generations of data         integration technology being layered in the enterprise over the years.         Thus, technology that was purchased to solve data integration problems,         and reduce costs, is actually making the data integration problem more         complex and no longer cost efficient.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa3" name="owa3"></a> [3] Here are some of my earlier postings         dealing in some degree with OWA: <a style="font-style: italic;" href="../847/ontology-driven-applications-using-adaptive-ontologies/"> Ontology-driven Applications Using Adaptive Ontologies</a>, November         23, 2009; <a style="font-style: italic;" href="../825/fresh-perspectives-on-the-semantic-enterprise/"> Fresh Perspectives on the Semantic Enterprise</a>, September 28, 2009;         <a style="font-style: italic;" href="../553/confronting-misconceptions-with-adaptive-ontologies/"> Confronting Misconceptions with Adaptive Ontologies</a>, August 17,         2009; <a style="font-style: italic;" href="../483/advantages-and-myths-of-rdf/">Advantages         and Myths of RDF</a>, April 8, 2009; <a style="font-style: italic;" href="../476/making-linked-data-reasonable-using-description-logics-part-2/"> Making Linked Data Reasonable using Description Logics, Part 2</a>,         February 15, 2009, which specifically relates OWA to the ABox and TBox         <a href="#owa4">[4]</a>; and, <a style="font-style: italic;" href="../441/the-role-of-umbel-stuck-in-the-middle-with-you/"> The Role of UMBEL: Stuck in the Middle with You . . .</a>, May 11,         2008.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa4" name="owa4"></a> [4] We use the reference to &#8220;<a href="http://en.wikipedia.org/wiki/Abox">ABox</a>&#8221; and “<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>” in accordance with         our <a title="Permanent Link to Thinking ?Inside the Box? with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/"> working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description         logics</a>:</p>
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.&#8221;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa5" name="owa5"></a> [5] <strong style="font-weight: normal;">A <span style="font-style: italic;">model theory</span></strong> is a formal semantic theory which relates expressions to interpretations. A &#8220;model&#8221; refers to a given logical &#8220;interpretation&#8221; or &#8220;world&#8221;. (See, for example, the discussion of interpretation in Patrick Hayes, ed., 2004. <a style="font-style: italic;" href="http://www.w3.org/TR/rdf-mt/">RDF Semantics &#8211; W3C Recommendation</a>, 10 February 2004.) The logic or inference system of classical model theory is <strong style="font-style: italic;">monotonic</strong>. That is, it has the behavior that if S entails E then (S + T) entails E. In other words, adding information to some prior conditions or assertions cannot invalidate a valid entailment. The basic intuition of model-theoretic semantics is that asserting a statement makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the statement true. An assertion amounts to stating a constraint on the possible ways the world might be. 
In comparison, a <strong style="font-style: italic;">non-monotonic</strong> logic system may include         <em>default reasoning</em>, where one assumes a &#8216;normal&#8217; general truth         unless it is contradicted by more particular information (birds         normally fly, but penguins don&#8217;t fly); <em>negation-by-failure</em>,         commonly assumed in logic programming systems, where one concludes,         from a failure to prove a proposition, that the proposition is false;         and <em>implicit closed-world assumptions</em>, often assumed in         database applications, where one concludes from a lack of information         about an entity in some corpus that the information is false         (<span style="font-style: italic;">e.g</span>., that if someone is not         listed in an employee database, that he or she is not an employee.) See         further, <a style="font-style: italic;" href="http://plato.stanford.edu/entries/logic-nonmonotonic/">Non-monotonic         Logic</a> from the <a href="http://plato.stanford.edu/">Stanford         Encyclopedia of Philosophy</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa6" name="owa6"></a> [6] Peter F. Patel-Schneider and Ian Horrocks, 2006. &#8220;Position Paper: A Comparison of Two Modelling Paradigms in the Semantic Web,&#8221; in <em>WWW2006</em>, May 22–26, 2006, Edinburgh, UK. See <a href="http://www.comlab.ox.ac.uk/people/ian.horrocks/Publications/download/2006/PaHo06a.pdf"> http://www.comlab.ox.ac.uk/people/ian.horrocks/Publications/download/2006/PaHo06a.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa7" name="owa7"></a> [7] Other resources include: Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter Patel-Schneider, eds., 2003. <span style="font-style: italic;">The Description Logic Handbook: Theory, Implementation and Applications</span>, Cambridge University Press, 2003. Online access to much of the book is available at <a href="http://www.inf.unibz.it/%7Efranconi/dl/course/">http://www.inf.unibz.it/~franconi/dl/course/</a>; see esp. Chapters 1, 2, 4 and 16, which relate to this topic; Jos de Bruijn, Axel Polleres, Ruben Lara and Dieter Fensel, 2005. <a style="font-style: italic;" href="http://www2005.org/cdrom/docs/p623.pdf">OWL DL vs. OWL Flight: Conceptual Modeling and Reasoning for the Semantic Web</a>, in <span style="font-style: italic;">Proceedings</span> <span style="font-style: italic;">of the Ninth World Wide Web Conference</span>, Japan, May 2005. This paper argues against the use of description logics for the semantic Web; Andrew Newman, 2007. <a style="font-style: italic;" href="http://www.xml.com/pub/a/2007/03/14/a-relational-view-of-the-semantic-web.html"> A Relational View of the Semantic Web</a>, March 14, 2007; Hai Wang, 2006. <a style="font-style: italic;" href="http://protege.stanford.edu/conference/2006/submissions/slides/7.2wang_protege2006.pdf"> Frames and OWL Side by Side</a>, presented at the 9th International Protégé Conference, July 23-26, 2006, Stanford, CA; Nick Drummond and Rob Shearer, 2006. 
<a style="font-style: italic;" href="http://www.cs.manchester.ac.uk/%7Edrummond/presentations/OWA.pdf">The Open World Assumption</a>, PowerPoint presentation at <span style="font-style: italic;">The Chris Date Seminar: The Closed World of Databases Meets the Open World of the Semantic Web</span>, e-Science Institute, Edinburgh, Scotland, 12 October 2006; Yulia Levin, 2008. <a style="font-style: italic;" href="http://www.cs.tau.ac.il/%7Eannaz/teaching/TAU_winter08/Seminar/yulia.pdf"> Closed World Reasoning</a>, presentation at <span style="font-style: italic;">Non-classical Logics and Applications Seminar &#8211; Winter 2008</span>, Tel Aviv University; and Pat Hayes, 2001. &#8220;Why must the web be monotonic?&#8221;, email thread at <a href="http://lists.w3.org/Archives/Public/www-rdf-logic/2001Jul/0067.html">http://lists.w3.org/Archives/Public/www-rdf-logic/2001Jul/0067.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa8" name="owa8"></a> [8] Raymond Reiter, 1978. “On         Closed World Data Bases”, in <span style="font-style: italic;">Logic and Data Bases</span>, H. Gallaire and J.         Minker, eds., New York: Plenum Press, 55-76; see also, Raymond Reiter,         1980. &#8220;A Logic for Default Reasoning,&#8221; <em>Artificial Intelligence</em>,         13:81-132.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa9" name="owa9"></a> [9] See this Google search on <a href="http://www.google.com/custom?domains=mkbergman.com&amp;q=driven+analysis&amp;sitesearch=mkbergman.com&amp;hl=en"> ontology-driven applications</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa10" name="owa10"></a> [10] See this Google search on <a href="http://www.google.com/custom?domains=mkbergman.com&amp;q=abox+tbox&amp;sitesearch=mkbergman.com&amp;hl=en"> ABox-TBox</a> articles.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa11" name="owa11"></a> [11] See, as examples: J. Heflin and H. Munoz-Avila, 2002. LCW-Based Agent Planning for the Semantic Web, in <span style="font-style: italic;">AAAI &#8216;02 Workshop on Ontologies and the Semantic Web</span>, AAAI Press, pp. 63–70. See <a href="http://www.cse.lehigh.edu/%7Eheflin/pubs/lcw-aaai02.pdf">http://www.cse.lehigh.edu/~heflin/pubs/lcw-aaai02.pdf</a> (one of the first local CWA suggestions in specific regard to the semantic Web); K. Golden, O. Etzioni and D. Weld, 1994. Omnipresence Without Omniscience: Efficient Sensor Management for Planning, in <span style="font-style: italic;">Proceedings of AAAI-94</span> (one of the first to propose LCWA in general); Evren Sirin, Michael Smith and Evan Wallace, 2008. <a style="font-style: italic;" href="http://www.webont.org/owled/2008/papers/owled2008eu_submission_30.pdf"> Integrity constraints: Opening, Closing Worlds — On Integrity Constraints</a>, presented at <span style="font-style: italic;">OWL: Experiences and Directions (OWLED 2008), Fifth International Workshop</span>, Karlsruhe, Germany, October 26-27, 2008; Timothy L. Hinrichs, Jui-Yi Kao and Michael R. Genesereth, 2009. <a style="font-style: italic;" href="http://people.cs.uchicago.edu/%7Ethinrich/papers/hinrichs2009inconsistencytr.pdf"> Inconsistency-tolerant Reasoning with Classical Logic and Large Databases</a>, in <span style="font-style: italic;">Proceedings of the Eighth Symposium on Abstraction, Reformulation, and Approximation (SARA2009)</span>, July 2009; S. Gómez, C. Chesñevar and G. Simari, 2008. 
<a style="font-style: italic;" href="http://www.cse.unsw.edu.au/%7Ekr2008/krow-papers/gomez-ea.pdf">An Argumentative Approach to Reasoning with Inconsistent Ontologies</a>, in <span style="font-style: italic;">Proceedings of the KR Workshop on Knowledge Representation and Ontologies</span> (KROW 2008), Conferences in Research and Practice in Information Technology, Vol. 90, pp. 11-20. Eds. T. Meyer, M. Orgun. Australian Computer Society, Sydney, Australia, July 2008; and Holger Knublauch, <a style="font-style: italic;" href="http://composing-the-semantic-web.blogspot.com/2009/01/object-oriented-semantic-web-with-spin.html"> The Object-Oriented Semantic Web with SPIN</a>, Sunday, January 18, 2009, which discusses the SPIN (SPARQL Inferencing Notation) Modeling Vocabulary, a light-weight collection of RDF properties and classes to support the use of SPARQL to specify rules and logical constraints.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="owa12" name="owa12"></a> [12] For example, the BigOWLIM can         perform reasoning against 12 billion explicit statements and loads         about 12,000 statements per second on a standard server; see         <a href="http://www.ontotext.com/owlim/benchmarking/lubm.html">http://www.ontotext.com/owlim/benchmarking/lubm.html</a>;         also, see Orri Erling&#8217;s blog regarding performance of the Virtuoso RDF         triple store (<a href="http://www.openlinksw.com/weblog/oerling/">http://www.openlinksw.com/weblog/oerling/</a>).         In any case, these performance benchmarks continue to rise steadily and         indicate the performance of RDF as an ontology integration layer.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>When Linked Data Rules Fail</title>
		<link>http://www.mkbergman.com/846/when-linked-data-rules-fail/</link>
		<comments>http://www.mkbergman.com/846/when-linked-data-rules-fail/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 17:04:01 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[data.gov]]></category>
		<category><![CDATA[new york times]]></category>
		<category><![CDATA[nyt]]></category>
		<category><![CDATA[vocabularies]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=846</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When Linked Data Rules Fail&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/846/when-linked-data-rules-fail/&amp;rft.language=English"></span>

High Visibility Problems with NYT, data.gov Show Need for Better         Practices
When I say, &#8220;shot&#8221;, what do you think of? A flu shot? A shot of whisky?         A moon shot? A gun shot? What if I add the term &#8220;bank&#8221;? [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When Linked Data Rules Fail&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/846/when-linked-data-rules-fail/&amp;rft.language=English"></span>
<p><a href="http://www.adhd-mindbydesign.com/"><img style="border: 0px solid; width: 220px; height: 223px; float: left; margin-right: 10px;" title="Image Source: www.adhd-mindbydesign.com" src="../wp-content/themes/ai3/images/2009Posts/091115_disconnected.jpg" alt="Image Source: www.adhd-mindbydesign.com" hspace="5" vspace="5" align="left" /></a></p>
<h2>High Visibility Problems with NYT, data.gov Show Need for Better         Practices</h2>
<p>When I say, &#8220;shot&#8221;, what do you think of? A flu shot? A shot of whisky?         A moon shot? A gun shot? What if I add the term &#8220;bank&#8221;? Do you now         think of someone being shot in an armed robbery of a local bank or         similar?</p>
<p>And, now, what if I add a reference to say, <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/The_Hustler_%28film%29">The Hustler</a>,         or Minnesota Fats, or &#8220;Fast Eddie&#8221; Felson? Do you now see the         connection to a pressure-packed banked pool shot in some smoky bar         room?</p>
<p>As humans we need context to make connections and remove ambiguity. For         machines, with their limited reasoning and inference engines, context         and accurate connections are even more important.</p>
<p>Over the past few weeks we have seen announcements of two large and high-visibility <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> projects: one, a first release of references for articles concerning about 5,000 people from the New York Times at <a href="http://data.nytimes.com/">data.nytimes.com</a>; and two, a massive exposure of 5 billion triples from <a href="http://tw.rpi.edu/">data.gov</a> datasets provided by the <a href="http://tw.rpi.edu/">Tetherless World Constellation</a> (TWC) at <a href="http://rpi.edu/">Rensselaer Polytechnic Institute</a> (RPI).</p>
<p>On various grounds from <a href="http://go-to-hellman.blogspot.com/2009/10/new-york-times-blunders-into-linked.html"> licensing</a> to <a href="http://dowhatimean.net/2009/10/linked-data-at-the-new-york-times-exciting-but-buggy"> data characterization</a> and to creating linked data for its <a href="http://www.betaversion.org/%7Estefano/linotype/news/351/">own         sake</a>, some prominent commentators have weighed in on what is good         and what is not so good with these datasets. One of us, Mike, <a href="../843/must-read-data-smoke-and-mirrors/">commented</a> about a week ago that &#8220;we have now moved beyond &#8216;proof of concept&#8217; to         the need for actual useful data of trustworthy provenance and proper         mapping and characterization. Recent efforts are a disappointment that         no enterprise would or could rely upon.&#8221;</p>
<p>Reactions to <a href="../843/must-read-data-smoke-and-mirrors/">that         posting</a> and continued discussion on various <a href="http://lists.w3.org/Archives/Public/public-esw-thes/2009Nov/0000.html"> mailing lists</a> warrant a more precise dissection of what is wrong         and still needs to be done with these datasets <a href="#ld1">[1]</a>.</p>
<h3>Berners-Lee&#8217;s Four Linked Data &#8220;Rules&#8221;</h3>
<p>It is useful, then, to return to first principles, namely the original         four &#8220;rules&#8221; posed by Tim Berners-Lee in his design note on linked data         <a href="#ld2">[2]</a>:</p>
<ol>
<li>Use URIs as names for things</li>
<li>Use HTTP URIs so that people can look up those names</li>
<li>When someone looks up a URI, provide useful information, using the         standards (RDF, SPARQL)</li>
<li>Include links to other URIs so that they can discover more things.</li>
</ol>
<p>The first two rules are definitional to the idea of linked data. They         cement the basis of linked data in the Web, and are not at issue with         either of the two linked data projects that are the subject of this         posting.</p>
<p>However, it is the lack of specifics and guidance in the last two rules         where the breakdowns occur. Both the NYT and the RPI datasets suffer         from a lack of &#8220;providing useful information&#8221; (Rule #3). And,         the <span class="double_u">nature</span> of the links in Rule #4         is a real problem for the NYT dataset.</p>
<h3>What Constitutes &#8220;Useful Information&#8221;?</h3>
<p>The Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> expands on         &#8220;useful information&#8221; by augmenting the original rule with the         parenthetical clause, &#8221; (<span style="font-style: italic;">i.e.</span>,         a structured description — metadata).&#8221; But even that expansion is         insufficient.</p>
<p>Fundamentally, what are we talking about with linked data? Well, we are         talking about instances that are characterized by one or more         attributes. Those instances exist within contexts of various natures.         And, those contexts may relate to other existing contexts.</p>
<p>We can break this problem description down into three parts:</p>
<ul>
<li>A <span style="font-weight: bold; font-style: italic;">vocabulary</span> that defines         the nature of the instances and their descriptive attributes</li>
<li>A <span style="font-weight: bold; font-style: italic;">schema</span> of some nature         that describes the structural relationships amongst instances and their         characteristics, and, optimally,</li>
<li>A <span style="font-weight: bold; font-style: italic;">mapping</span> to existing         external schema or constructs that help place the data into context.</li>
</ul>
<p>At minimum, <span class="double_u">ANY</span> dataset exposed as         linked data needs to be described by a <span style="font-weight: bold; font-style: italic;">vocabulary</span>. Both the         NYT and RPI datasets fail on this score, as we elaborate below. Better         practice is to also provide a <span style="font-weight: bold; font-style: italic;">schema</span> of relationships         in which to embed each instance record. And, best practice is to also         <span style="font-weight: bold; font-style: italic;">map</span> those         structures to external schema.</p>
<p>Lacking this &#8220;useful information&#8221;, especially a defining vocabulary, we         cannot begin to understand whether our instances deal with drinks, bank         robberies or pool shots. This lack, in essence, makes the information         worthless, even though available via URL.</p>
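<p>To make the minimum requirement concrete, here is a small sketch of a check that every property used in a dataset has an actual definition, not just a label. Everything in it is an assumption for illustration: the <code>example.org</code> URIs, the property names, the code values and the dictionary layout are all hypothetical and come from neither dataset.</p>

```python
# Hypothetical vocabulary: property URI -> label and definition.
# URIs, labels and comments are invented for illustration only.
VOCABULARY = {
    "http://example.org/vocab/itemCode": {
        "label": "item code",
        "comment": "Code identifying the expenditure category of a record.",
    },
    "http://example.org/vocab/seasonal": {
        "label": "seasonal",
        "comment": "",  # a label alone is not a definition
    },
}

# Hypothetical instance data as (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/data/row1", "http://example.org/vocab/itemCode", "X1"),
    ("http://example.org/data/row1", "http://example.org/vocab/seasonal", "Y2"),
    ("http://example.org/data/row1", "http://example.org/vocab/periodicityCode", "Z3"),
]


def undefined_properties(triples, vocabulary):
    """Return the sorted predicates that lack a non-empty definition."""
    missing = set()
    for _subject, predicate, _obj in triples:
        entry = vocabulary.get(predicate)
        if entry is None or not entry.get("comment", "").strip():
            missing.add(predicate)
    return sorted(missing)


if __name__ == "__main__":
    for prop in undefined_properties(TRIPLES, VOCABULARY):
        print("no definition for", prop)
```

<p>In this sketch a property with only a label counts as undefined, which is the situation we describe for both datasets.</p>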
<h4>The data.gov (RPI) Case</h4>
<p>With the support of NSF and various grant funding, RPI has set up the         <a href="http://data-gov.tw.rpi.edu/wiki/The_Data-gov_Wiki">Data-Gov         Wiki</a> <a href="#ld3">[3]</a>, which is in the process of converting         the datasets on <a href="http://www.data.gov/">data.gov</a> to RDF,         placing them into a semantic wiki to enable comment and annotation, and         providing that data as RSS feeds. Other demos are also being placed on         the site.</p>
<p>As of the date of this posting, the site had a <a href="http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog">catalog</a> of 116         datasets from the 800 or so available on data.gov, leading to these         statistics:</p>
<ul>
<li>459,412,419 table entries</li>
<li>5,074,932,510 triples, and</li>
<li>7,564 properties (or attributes).</li>
</ul>
<p>We&#8217;ll take one of these datasets, <a href="http://www.data.gov/details/319">#319</a>, and look a bit closer at         it:</p>
<table border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<th style="background-color: #cccccc;"> Wiki</th>
<th style="background-color: #cccccc;"> Title</th>
<th style="background-color: #cccccc;"> Agency</th>
<th style="background-color: #cccccc;"> Name</th>
<th style="background-color: #cccccc;"> data.gov Link</th>
<th style="background-color: #cccccc;"> No. of Properties</th>
<th style="background-color: #cccccc;"> No. of Triples</th>
<th style="background-color: #cccccc;"> RDF File</th>
</tr>
<tr>
<td><a title="Dataset 319" href="http://data-gov.tw.rpi.edu/wiki/Dataset_319">Dataset 319</a></td>
<td>Consumer Expenditure Survey</td>
<td><a title="Department of Labor" href="http://data-gov.tw.rpi.edu/wiki/Department_of_Labor">Department of Labor</a></td>
<td><a title="LABOR-STAT (page does not exist)" href="http://data-gov.tw.rpi.edu/w/index.php?title=LABOR-STAT&amp;action=edit&amp;redlink=1">LABOR-STAT</a></td>
<td><a title="http://www.data.gov/details/319" rel="nofollow" href="http://www.data.gov/details/319">http://www.data.gov/details/319</a></td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">1,583,236</td>
<td><a title="http://data-gov.tw.rpi.edu/raw/319/index.rdf" rel="nofollow" href="http://data-gov.tw.rpi.edu/raw/319/index.rdf">http://data-gov.tw.rpi.edu/raw/319/index.rdf</a></td>
</tr>
</tbody>
</table>
<p>This report was picked solely because it had a small number of         attributes (properties), and is thus easier to screen capture. The         summary report on the wiki is shown by this <a href="http://data-gov.tw.rpi.edu/wiki/Dataset_319">page</a>:</p>
<div style="margin: 10px; text-align: center;"><p><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_wiki_dataset_319.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 611px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_wiki_dataset_319.png" alt="Data-gov-Wiki Dataset #319" width="1093" height="1113" /></a></p>
<p><span style="font-style: italic; font-size: 90%;">(click to         expand)</span></p>
</div>
<p>So, we see that this specific dataset contains 22 of the 7,564 attributes across all datasets.</p>
<p>When we click on one of these attribute names, we are then taken to a         specific wiki page that only reiterates its label. There is no         definition or explanation.</p>
<p>Inspecting this page further, other than the broad characterization of the dataset itself (the bulk of the page), we see at the bottom 22 undefined attributes with labels such as <span style="font-style: italic;">item code</span>, <span style="font-style: italic;">periodicity code</span>, <span style="font-style: italic;">seasonal</span>, and the like. These attributes are the real structural basis for the data in this dataset.</p>
<p>But, what does all of this mean???</p>
<p>To gain a clue, now let&#8217;s go to the source data.gov site for this       <a href="http://www.data.gov/details/319">dataset (#319)</a>. Here is how       that report looks:</p>
<div style="margin: 10px; text-align: center;"><p><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_data_gov_319.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 1146px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_data_gov_319.png" alt="Data.gov Dataset #319" width="1036" height="1978" /></a></p>
<p><span style="font-style: italic; font-size: 90%;">(click to         expand)</span></p>
</div>
<p>Contained within this report we see a listing for additional <a href="ftp://ftp.bls.gov/pub/time.series/cx/cx.txt">metadata</a>. This link         tells us about the various data fields contained in this dataset; we         see many of these attributes are &#8220;codes&#8221; to various data categories.</p>
<p>Probing further into the dataset&#8217;s <a href="http://www.bls.gov/cex/">technical documentation</a>, we see that         there is indeed a rich structure underneath this report, again provided         via various code lookups. There are codes for geography, seasonality         (adjusted or not), consumer demographic profiles and a variety of         consumption categories. (See, for example, the link to this <a href="http://www.bls.gov/cex/csxgloss.htm">glossary page</a>.) These are the         keys to understanding the actual values within this dataset.</p>
<p>For example, one major dimension of the data is captured by the attribute <span style="font-style: italic;">item_code</span>. The survey breaks down consumption expenditures within the broad categories of Food, Housing, Apparel and Services, Transportation, Health Care, Entertainment, and Other. Within a category, there is also a rich structural breakdown. For example, expenditures for Bakery Products within Food are given a <a href="ftp://ftp.bls.gov/pub/time.series/cx/cx.item">code</a> of FHC2.</p>
<p>But, nowhere are these codes defined or unlocked in the RDF datasets.         This absence is true for virtually all of the datasets exposed on this         wiki.</p>
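<p>What &#8220;unlocking&#8221; a code involves can be shown with a minimal sketch of a codebook lookup. Only the pairing of FHC2 with Bakery Products within Food comes from the source documentation cited above; the dictionary layout, the field names and the function <code>describe_item</code> are our own hypothetical choices, not anything published by BLS or RPI.</p>

```python
# Hypothetical lookup table. Only the FHC2 -> "Bakery Products" (within Food)
# pairing comes from the text; the structure and field names are assumptions.
ITEM_CODES = {
    "FHC2": {"label": "Bakery Products", "category": "Food"},
}


def describe_item(code, codebook=ITEM_CODES):
    """Turn an opaque item code into a readable description, if known."""
    entry = codebook.get(code)
    if entry is None:
        return f"{code} (undefined code)"
    return f"{entry['label']} (within {entry['category']})"


if __name__ == "__main__":
    print(describe_item("FHC2"))
    print(describe_item("ZZZ9"))  # ZZZ9 is a made-up code with no entry
```

<p>Without the codebook, every value in the RDF looks like the second case: a string with no meaning attached.</p>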
<p>So, for literally billions of triples, and 8,000 attributes, we have         <span style="font-weight: bold;">ABSOLUTELY NO INFORMATION ABOUT WHAT         THE DATA CONTAINS OTHER THAN A PROPERTY LABEL</span>. There is much,         much rich value here in data.gov, but all of it remains locked up and         hidden.</p>
<p>The sad truth about this data release is that it provides absolutely no         value in its current form. We lack the keys to unlock the value.</p>
<p>To be sure, early essential spade work has been done here to begin putting in place the conversion infrastructure for moving text files, spreadsheets and the like to an RDF form. This is yeoman work important to ultimate access. But, until a <span style="font-weight: bold; font-style: italic;">vocabulary</span> is published that defines the attributes and their codes, this value will remain hidden. And until a <span style="font-weight: bold; font-style: italic;">schema</span> of some nature is also published that connects attributes and relations across datasets, the further value from connecting the dots will remain hidden as well.<img style="width: 160px; height: 218px; float: right; margin-left: 10px;" title="The Hustler" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_the_hustler.jpg" alt="The Hustler" align="right" /></p>
<p>These datasets may meet the partial conditions of providing clickable         URLs, but the crucial &#8220;useful information&#8221; as to what any of this data         means is absent.</p>
<p>Every single dataset on data.gov has supporting references to text         files, PDFs, Web pages or the like that describe the nature of the data         within each dataset. Until that information is exposed and made usable,         we have no linked data.</p>
<p>Until ontologies get created from these technical documents, the value of these data instances remains locked up, and no value can be created from having these datasets expressed in RDF.</p>
<p>The devil lies in the details. The essential hard work has not yet         begun.</p>
<h4>The NYT Case</h4>
<p>Though at a much smaller scale with many fewer attributes, the <a href="http://data.nytimes.com/">NYT dataset</a> suffers from the same         failing: it too lacks a <span style="font-weight: bold; font-style: italic;">vocabulary</span>.</p>
<p>So, let&#8217;s take the case of one of the lead actors in <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/The_Hustler_%28film%29">The Hustler</a>,         Paul Newman, who played the role of &#8220;Fast Eddie&#8221; Felson. Here is the         <a href="http://data.nytimes.com/N31738445835662083893.html">NYT         record</a> for the &#8220;person&#8221; <span style="font-style: italic;">Paul         Newman</span> (which they also refer to as <a href="http://data.nytimes.com/newman_paul_per">http://data.nytimes.com/newman_paul_per</a>).         Note the header title of <span style="font-weight: bold;">Newman,         Paul</span>:</p>
<div style="margin: 10px; text-align: center;"><p><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_nyt_paul_newman.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 593px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_nyt_paul_newman.png" alt="NYT 'Paul Newman Articles' Record" width="988" height="976" /></a></p>
<p><span style="font-style: italic; font-size: 90%;">(click to         expand)</span></p>
</div>
<p>Click on any of the internal labels used by the NYT for its own         attributes (such as <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>), and         you will be given this message:</p>
<div style="margin-left: 40px;">
<p><span style="font-style: italic;">&#8220;An RDFS description and English           language documentation for the NYT namespace will be provided soon.           Thanks for your patience.&#8221;</span></p>
</div>
<p>We again have no idea what is meant by all of this data except for the labels used for its attributes. In this case for <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a> we have a value of &#8220;2001-03-18&#8221;.</p>
<p>Hello? What? What is a &#8220;first use&#8221; for a &#8220;Paul Newman&#8221; of &#8220;2001-03-18&#8221;???</p>
<p>The NYT put the cart before the horse: even if minimal, they should         have released their ontology first — or at least at the same time         — as they released their data instances. (See further <a href="../825/fresh-perspectives-on-the-semantic-enterprise/"> this discussion</a> about how an ontology creation workflow can be         incremental by starting simple and then upgrading as needed.)</p>
<h3>Links to Other Things</h3>
<p>Since there really are no links to other things on the Data-Gov Wiki,         our focus in this section continues with the NYT dataset using our same         example.</p>
<p>We now are in the territory of the fourth &#8220;rule&#8221; of linked data:         <span style="font-style: italic;">4. Include links to other URIs so         that they can discover more things</span>.</p>
<p>This will seem a bit basic at first, but before we can talk about         linking to other things, we first need to understand and define the         starting &#8220;thing&#8221; to which we are linking.</p>
<h4>What is a &#8220;Newman, Paul&#8221; Thing?</h4>
<p>Of course, without its own vocabulary, we are left to deduce what this thing &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; <span class="double_u">is</span> that is shown in the previous screen shot. Our first clue comes from the statement that it is of <span style="font-style: italic;">rdf:type</span> <a href="http://www.w3.org/TR/skos-reference/">SKOS</a> <span style="font-style: italic;">concept</span>. By looking to the SKOS vocabulary, we see that <a href="http://www.w3.org/TR/skos-reference/#concepts"><span style="font-style: italic;">concept</span></a> is a class and is defined as:</p>
<p style="margin-left: 40px; font-style: italic;">A SKOS concept can be viewed as an idea or notion; a unit of thought.         However, what constitutes a unit of thought is subjective, and this         definition is meant to be suggestive, rather than restrictive. The         notion of a SKOS concept is useful when describing the conceptual or         intellectual structure of a knowledge organization system, and when         referring to specific ideas or meanings established within a KOS.</p>
<p>We also see that this instance is given a <a href="http://xmlns.com/foaf/0.1/primaryTopic">foaf:primaryTopic</a> of         <span style="font-style: italic;">Paul Newman</span>.</p>
<p>So, we can deduce so far that this instance is about the concept or idea of <span style="font-style: italic;">Paul Newman</span>. Now, looking to the attributes of this instance (that is, the defining properties provided by the NYT), we see the properties of <a href="http://data.nytimes.com/elements/associated_article_count">nyt:associated_article_count</a>, <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>, <a href="http://data.nytimes.com/elements/latest_use">nyt:latest_use</a> and <a href="http://data.nytimes.com/elements/topicPage">nyt:topicPage</a>. Completing our deductions, and in the absence of its own vocabulary, we can now define this concept instance somewhat as follows:</p>
<p style="margin-left: 40px;"><span style="font-style: italic;">New York Times articles in the period         2001 to 2009 having as their primary topic the actor Paul Newman</span></p>
<p>(BTW, across all records in this dataset, we could see what the         earliest first use was to better deduce the time period over which         these articles have been assembled, but that has not been done.)</p>
<p>We also would re-title this instance more akin to &#8220;2001-2009 NYT         Articles with a Primary Topic of Paul Newman&#8221; or some such and use URIs         more akin to this usage.</p>
<h4>sameAs Woes</h4>
<p>Thus, in order to make links or connections with other data, it is         essential to understand what the nature is of the subject &#8220;thing&#8221; at         hand. There is much confusion about actual &#8220;things&#8221; and the references         to &#8220;things&#8221; and what is the nature of a &#8220;thing&#8221; within the literature         and on mailing lists.</p>
<p>Our belief and usage in matters of the semantic Web is that all         &#8220;things&#8221; we deal with are a reference to whatever the &#8220;true&#8221;, actual         thing is. The question then becomes:  What is the nature (or         scope) of this referent?</p>
<p>There are actually quite easy ways to determine this nature. First, look to one or more instance examples of the &#8220;thing&#8221; being referred to. In our case above, we have the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; instance record. Then, look to the properties (or attributes) the publisher of that record has used to describe that thing. Again, in the case above, we have <a href="http://data.nytimes.com/elements/associated_article_count">nyt:associated_article_count</a>, <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>, <a href="http://data.nytimes.com/elements/latest_use">nyt:latest_use</a> and <a href="http://data.nytimes.com/elements/topicPage">nyt:topicPage</a>.</p>
<p>Clearly, this instance record (that is, its nature) deals with articles or groups of articles. The relation to <span style="font-style: italic;">Paul Newman</span> arises because he is the <span class="double_u">primary topic</span> of these articles, not because the instance describes a <span class="double_u">person</span>. If the nature of the instance were indeed the person <span style="font-style: italic;">Paul Newman</span>, then the attributes of the record would more properly be &#8220;person&#8221; properties such as age, sex, birthdate, death date, marital status, etc.</p>
<p>This confusion by NYT as to the nature of the &#8220;things&#8221; they are         describing then leads to some very serious errors. By confusing the         topic (<span style="font-style: italic;">Paul Newman</span>) of a         record with the nature of that record (articles about topics), NYT next         misuses one of the most powerful semantic Web predicates available,         <span style="font-weight: bold;">owl:sameAs</span>.</p>
<p>By asserting in the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; record that the instance has a <span style="font-weight: bold;">sameAs</span> relationship with external records in <a href="http://rdf.freebase.com/ns/en.paul_newman">Freebase</a> and <a href="http://dbpedia.org/resource/Paul_Newman">DBpedia</a>, the NYT both <a href="http://en.wikipedia.org/wiki/Entailment">entail</a>s that properties from any of the associated records are shared and <a href="http://en.wikipedia.org/wiki/Inference">infers</a> a chain of other types to describe the record. More precisely, the NYT is asserting that the &#8220;things&#8221; referred to by these instances are <strong>identical</strong> resources.</p>
<p>Thus, by the <span style="font-weight: bold;">sameAs</span> statements in the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; record, the NYT is also asserting that that record is an instance of all these things <a href="#id5">[5]</a>:</p>
<table border="0">
<tbody>
<tr>
<td></td>
<td>
<ul>
<li> <a rel="rdf:type" href="http://dbpedia.org/about/html/http://www.w3.org/2002/07/owl%23Thing"> owl:Thing</a></li>
<li> <a href="http://xmlns.com/foaf/spec/#term_Agent">foaf:Agent</a></li>
<li> <a href="http://xmlns.com/foaf/spec/#term_Person">foaf:Person</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/ontology/Actor">dbpedia-owl:Actor</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/JewishActors">http://dbpedia.org/class/yago/JewishActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/PeopleFromCleveland,Ohio">http://dbpedia.org/class/yago/PeopleFromCleveland,Ohio</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/ontology/Artist">dbpedia-owl:Artist</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/ontology/Person">dbpedia-owl:Person</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/Person100007846">http://dbpedia.org/class/yago/Person100007846</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanFilmDirectors">http://dbpedia.org/class/yago/AmericanFilmDirectors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/YaleUniversityAlumni">http://dbpedia.org/class/yago/YaleUniversityAlumni</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/OhioUniversityAlumni">http://dbpedia.org/class/yago/OhioUniversityAlumni</a></li>
<li> <a rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rvVjWoZwpEbGdrcN5Y29ycA"> opencyc:en/MaleHuman</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanFilmActors">http://dbpedia.org/class/yago/AmericanFilmActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/Liberals">http://dbpedia.org/class/yago/Liberals</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/OhioActors">http://dbpedia.org/class/yago/OhioActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/UnitedStatesNavySailors">http://dbpedia.org/class/yago/UnitedStatesNavySailors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/PeopleFromWestport,Connecticut"> http://dbpedia.org/class/yago/PeopleFromWestport,Connecticut</a></li>
<li> <a rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rwQB4UJwpEbGdrcN5Y29ycA"> opencyc:en/JewishPerson</a></li>
<li> <a rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rwMRyTJwpEbGdrcN5Y29ycA"> opencyc:en/ActorInMovies</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/LivingPeople">http://dbpedia.org/class/yago/LivingPeople</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/Actor109765278">http://dbpedia.org/class/yago/Actor109765278</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanVegetarians">http://dbpedia.org/class/yago/AmericanVegetarians</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanPhilanthropists">http://dbpedia.org/class/yago/AmericanPhilanthropists</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/KenyonCollegeAlumni">http://dbpedia.org/class/yago/KenyonCollegeAlumni</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/WesternFilmActors">http://dbpedia.org/class/yago/WesternFilmActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/ActorsStudioAlumni">http://dbpedia.org/class/yago/ActorsStudioAlumni</a></li>
<li>and, a hundred other dbpedia_yago superClasses.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Furthermore, because of its strong, reciprocal entailments, the <span style="font-weight: bold;">owl:sameAs</span> assertion would also now entail that the person <span style="font-style: italic;">Paul Newman</span> has the <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a> and <a href="http://data.nytimes.com/elements/latest_use">nyt:latest_use</a> attributes, clearly illogical for a &#8220;person&#8221; thing.</p>
<p>This connection is clearly wrong in both directions. <span style="font-style: italic;">Articles</span> are not <span style="font-style: italic;">persons</span> and don&#8217;t have <span style="font-style: italic;">marital status</span>; and <span style="font-style: italic;">persons</span> do not have <span style="font-style: italic;">first_uses</span>. By misapplying this         <span style="font-weight: bold;">sameAs</span> linkage relationship, we         have screwed things up in every which way. And the error began with         misunderstanding what kinds of &#8220;things&#8221; our data is about.</p>
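<p>The two-way damage can be reproduced with a deliberately naive sketch of how a reasoner treats <span style="font-weight: bold;">owl:sameAs</span>. This is a simplification under stated assumptions: it handles only symmetry and the copying of statements between subjects, it uses prefixed names as shorthand for the full URIs, and the four starting triples are an abbreviated, hand-picked subset of what the actual records contain.</p>

```python
SAME_AS = "owl:sameAs"

# Abbreviated triples modelled on the example above; prefixed names are
# shorthand for the full URIs and the selection is ours.
TRIPLES = {
    ("nyt:newman_paul_per", "rdf:type", "skos:Concept"),
    ("nyt:newman_paul_per", "nyt:first_use", "2001-03-18"),
    ("nyt:newman_paul_per", SAME_AS, "dbpedia:Paul_Newman"),
    ("dbpedia:Paul_Newman", "rdf:type", "foaf:Person"),
}


def same_as_closure(triples):
    """Naively apply owl:sameAs: make it symmetric, then copy statements
    between identical resources until nothing new is added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p == SAME_AS:
                new.add((o, SAME_AS, s))  # sameAs holds in both directions
                for s2, p2, o2 in inferred:
                    if s2 == s and p2 != SAME_AS:
                        new.add((o, p2, o2))  # o inherits every statement about s
        if not new.issubset(inferred):
            inferred |= new
            changed = True
    return inferred


if __name__ == "__main__":
    for triple in sorted(same_as_closure(TRIPLES) - TRIPLES):
        print("inferred:", triple)
```

<p>Among the inferred statements, the DBpedia person acquires a <code>nyt:first_use</code> date and the NYT record becomes a <code>foaf:Person</code>, which is exactly the two-way error described above.</p>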
<h4>Some Options</h4>
<p>However, there are solutions. First, the <span style="font-weight: bold;">sameAs</span> assertions, at least involving these         external resources, should be dropped.</p>
<p>Second, if linkages are still desired, a vocabulary such as <a href="http://umbel.org/">UMBEL</a> <a href="#ld4">[4]</a> could be used to make an assertion between such a concept and these other related resources. So, even though these resources are not the same, they are <strong>closely</strong> related. The UMBEL ontology helps us to define this kind of relation between related, but non-identical, resources.</p>
<p>Instead of using the <span style="font-weight: bold;">owl:sameAs</span> property, we would suggest the use of <span style="font-weight: bold;">umbel:linksEntity</span>, which links a <span style="font-weight: bold;">skos:Concept</span> to related named entity resources. Additionally, Freebase, which also currently asserts a <span style="font-weight: bold;">sameAs</span> relationship to the NYT resource, could use the <span style="font-weight: bold;">umbel:isAbout</span> relationship to assert that their resource &#8220;is about&#8221; a certain concept, which is the one defined by the NYT.</p>
<p>Alternatively, still other external vocabularies that more precisely         capture the intent of the NYT publishers could be found, or the NYT         editors could define their own properties specifically addressing their         unique linkage interests.</p>
<h4>Other Minor Issues</h4>
<p>As a couple of additional, minor suggestions for the NYT dataset, we         would suggest:</p>
<ul>
<li>Create a <span style="font-weight: bold;">foaf:Organization</span> description of the NYT organization, then use it with <span style="font-weight: bold;">dc:creator</span> and <span style="font-weight: bold;">dcterms:rightsHolder</span> rather than using a         literal, and</li>
<li>The dual URIs such as &#8220;<a href="http://data.nytimes.com/N31738445835662083893">http://data.nytimes.com/N31738445835662083893</a>&#8221;         and &#8220;<a href="http://data.nytimes.com/newman_paul_per">http://data.nytimes.com/newman_paul_per</a>&#8221;         are not wrong in themselves, but the purpose is hard to understand. Why         does a single organization need to create multiple resources for the         <strong>identical resource,</strong> when it comes from the         same system and has the same purpose?</li>
</ul>
<h4>Re-visiting the Linkage &#8220;Rule&#8221;</h4>
<p>There are very valuable benefits from entailment, inference and logic         to be gained from linking resources. However, if the nature of the         &#8220;things&#8221; being linked — or the properties that define these         linkages — are incorrect, then very wrong logical implications         result. Great care and understanding should be applied to linkage         assertions.</p>
<h3>In the End, the Challenge is Not Linked Data, but <span style="font-style: italic; text-decoration: underline;">Connected</span> Data</h3>
<p>Our critical comments are not meant to be disrespectful and are not         being picky. The NYT and TWC are prominent institutions for which we         should expect leadership on these issues. Our criticisms (and we         believe those of others) are also not an expression of a &#8220;<a href="http://en.wikipedia.org/wiki/Hype_cycle">trough of         disillusionment</a>&#8221; as <a href="http://twitter.com/gregboutin/status/5558525462">some</a> have been         pointing out.</p>
<div class="boxYellowDotted" style="margin: 0pt 0pt 0pt 10px; float: right; width: 300px; text-align: center;">This posting has been jointly authored by <a href="http://mkbergman.com/"> Mike Bergman</a> and <a href="http://fgiasson.com/blog">Fred         Giasson</a> and simultaneously published on both of their blogs, hoping         to draw more attention to the need for better practices in publishing         linked data.</div>
<p>This posting is about poor practices, pure and simple. The time to         correct them is now. If asked, we would be pleased to help either         institution establish exemplar practices. This is not automatic, and it         is not always easy. The data.gov datasets, in particular, will require         much time and effort to get right. There is much documentation that         needs to be transitioned and expressed in semantic Web formats.</p>
<p>In a broader sense, we also seem to lack a definition of best practices         related to <span style="font-weight: bold;">vocabularies</span>,         <span style="font-weight: bold;">schema</span> and <span style="font-weight: bold;">mappings</span>. The Berners-Lee rules are         imprecise and insufficient as is. Prior best guidance documents tend to         be more how to publish and make URIs linkable, than to properly         characterize, describe and connect the data.</p>
<p>Perhaps, in part, this is a bit of a semantics issue. The challenge is         not the mechanics of <span style="font-style: italic;">linking         data</span>, but the meaning and basis for <span class="double_u">connecting</span> that data. Connections require logic and         rationality sufficient to reliably inform inference and rule-based         engines. They also need to pass the sniff test as we &#8220;follow our nose&#8221;         by clicking the links exposed by the data.</p>
<p>It is exciting to see high-quality content such as from national         governments and major publishers like the New York Times begin to be         exposed as linked data. When this content finally gets embedded into         usable contexts, we should see manifest uses and benefits emerge. We         hope both institutions take our criticisms in that spirit.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld1" name="ld1"></a> [1] The NYT dataset has been updated with         improvements that fixed multiple issues from the first release. The         problems listed herein, however, still pertain after these         improvements.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld2" name="ld2"></a> [2] Tim Berners-Lee, 2006. Linked Data         (Design Issues), first posted on 2006-07-27; last updated on         2009-06-18. See <a href="http://www.w3.org/DesignIssues/LinkedData.html">http://www.w3.org/DesignIssues/LinkedData.html</a>.         Berners-Lee refers to the steps above as &#8220;rules,&#8221; but he elaborates         they are expectations of behavior. Most later citations refer to these         as &#8220;principles.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld3" name="ld3"></a> [3] Li Ding, Dominic DiFranzo, Sarah         Magidson, Deborah L. McGuinness and Jim Hendler, 2009. Data-GovWiki:         Towards Linked Government Data. See <a href="http://www.cs.vu.nl/%7Epmika/swc/documents/Data-gov%20Wiki-data-gov-wiki-v1.pdf"> http://www.cs.vu.nl/~pmika/swc/documents/Data-gov%20Wiki-data-gov-wiki-v1.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld4" name="ld4"></a> [4] UMBEL <em>(Upper Mapping and Binding         Exchange Layer)</em> is a lightweight ontology structure in development         for relating Web content and data to a standard set of subject         concepts. Its purpose has resulted in the creation of an associated         vocabulary geared to both class-instance and reciprocal relationships,         as well as partial or likelihood relationships. See <a href="http://umbel.org/technical_documentation.html#vocabulary">http://umbel.org/technical_documentation.html#vocabulary</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="id5"></a>[5] We&#8217;d like to thank Denny Vrandecic (see comments) for pointing out an imprecision in our original wording. This phrase was originally stated as, &#8220;Thus, by the sameAs statements in the &#8216;Newman, Paul&#8217; record, the NYT is also asserting that that record is the same as these other things.&#8221;<em> </em></div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/846/when-linked-data-rules-fail/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>A Most un-commON Way to Author Datasets</title>
		<link>http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/</link>
		<comments>http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 02:19:54 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[case study]]></category>
		<category><![CDATA[commON]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[CSV]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[structured data]]></category>
		<category><![CDATA[structWSF]]></category>
		<category><![CDATA[Sweet Tools]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=845</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A Most un-commON Way to Author Datasets&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/&amp;rft.language=English"></span>

A Case Study of Turning Spreadsheets into Structured Data Powerhouses
In a former life, I had the nickname of &#8216;Spreadsheet King&#8217; (perhaps         among others that I did not care to hear). I had gotten the nick         because of my aggressive [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A Most un-commON Way to Author Datasets&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/&amp;rft.language=English"></span>
<p><a href="http://openstructs.org/iron"><img style="border: 0px solid; width: 235px; height: 125px; float: left; margin-right: 10px;" title="irON - instance record and Object Notation" src="../wp-content/themes/ai3/images/iron_logo_235.png" alt="irON - instance record and Object Notation" hspace="5" vspace="5" align="left" /></a></p>
<h2>A Case Study of Turning Spreadsheets into Structured Data Powerhouses</h2>
<p>In a former life, I had the nickname of &#8216;Spreadsheet King&#8217; (perhaps         among others that I did not care to hear). I had gotten the nickname         because of my aggressive use of spreadsheets for financial models,         competitor tracking, time series analyses, and the like. However, in         all honesty, I have encountered many others in my career much more         knowledgeable and capable with spreadsheets than I&#8217;ll ever be. So,         maybe I was really more like a minor duke or a court jester than true         nobility.</p>
<p>Yet, pro or amateur, there are perhaps 1 billion spreadsheet users         worldwide <a href="#commON1">[1]</a>, making spreadsheets undoubtedly the most prevalent data         authoring environment in existence. And, despite moans and wails about         how spreadsheets can lead to chaos, spaghetti code, or violations of         internal standards, they are here to stay.</p>
<p>Spreadsheets often begin as simple notetaking environments. With the         addition of new findings and more analysis, some of these worksheets         may evolve to become full-blown datasets. Alternatively, some         spreadsheets start from Day One as intended datasets or modeling         environments. Whatever the case, clearly there is much accumulated         information and data value &#8220;locked up&#8221; in existing spreadsheets.</p>
<p>How to &#8220;unlock&#8221; this value for sharing and collaboration was a major         stimulus to development of the <span style="font-weight: bold;">commON</span> serialization of <span style="font-weight: bold;">irON</span> (<span style="font-style: italic;">instance record</span> and <span style="font-style: italic;">Object Notation</span>) <a href="#commON2">[2]</a>. I recently published         a <a href="http://openstructs.org/iron/common-swt-annex">case study</a> <a href="#commON3">[3]</a> that describes the reasons and benefits of dataset authoring in a         spreadsheet, and provides working examples and code based on         <span style="font-style: italic;">Sweet Tools</span> <a href="#commON4">[4]</a> to aid users         in understanding and using the <span style="font-weight: bold;">commON</span> notation. I summarize portions of         that study herein.</p>
<div class="boxGreenDotted" style="margin: 5px 0pt 5px 10px; width: 240px; float: right; text-align: center;">This is the second article of a two-part series related to the recent       <span style="font-style: italic;">Sweet Tools</span> <a href="../844/sweet-tools-shatters-the-sound-barrier/">update</a>.</div>
<h3>Background on <span style="font-style: italic;">Sweet Tools</span> and         irON</h3>
<p>The dataset that is the focus of this <a href="http://openstructs.org/iron/common-swt-annex">use case</a>,         <a href="../844/sweet-tools-shatters-the-sound-barrier/"><span style="font-style: italic;">Sweet Tools</span></a>, began as an         informal tracking spreadsheet about four years ago. I began it as a way         to learn about available tools in the semantic Web and -related spaces.         I began publishing it and others found it of value so I continued to         develop it.</p>
<p>As it grew over time, however, it gained in structure and size.         Eventually, it became a reference dataset that many other people         wanted to use and interact with. The current version has well over 800         tools listed, characterized by many structured data attributes such as         type, programming language, description and so forth. As it has grown,         a formal controlled vocabulary has also evolved to bring consistency to         the characterization of many of these attributes.</p>
<p>It was natural for me to maintain this listing as a spreadsheet, which         was also reinforced when I was one of the first to adopt an <a href="../326/converting-sweet-tools-to-an-exhibit/">Exhibit         presentation</a> of the data based on a Google spreadsheet about three         years back. Here is a partial view of this spreadsheet as I maintain it         locally:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_main_screen.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 356px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_main_screen.png" alt="Sweet Tools Main Spreadsheet Screen" width="1279" height="615" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>When we began to develop <span style="font-weight: bold;">irON</span> in earnest as a simple (&#8220;naïve&#8221;) dataset authoring framework, it was         clear that a comma-separated value, or <a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> <a href="#commON5">[5]</a>,         option should join the other two serializations under consideration,         XML and JSON. CSV, though less expressive and capable as a data format         than the other serializations, still has an <a href="http://en.wikipedia.org/wiki/Attribute-value_pair">attribute-value         pair</a> (also known as key-value pairs and many other variants <a href="#commON6">[6]</a>)         orientation. And, via spreadsheets, datasets can be easily authored and         inspected, while also providing a rich functional environment including         sorting, formatting, data validation, calculations, macros, etc.</p>
<p>As a dataset very familiar to us as <span style="font-weight: bold;">irON</span>&#8217;s editors, and directly relevant to         the semantic Web, <span style="font-style: italic;">Sweet Tools</span> provided a perfect prototype case study for helping to guide the         development of <span style="font-weight: bold;">irON</span>, and         specifically what came to be known as the <span style="font-weight: bold;">commON</span> serialization for <span style="font-weight: bold;">irON</span>. The <span style="font-style: italic;">Sweet Tools</span> dataset is relatively large         for a speciality source, has many different types and attributes, and         is characterized by text, images, URLs and similar.</p>
<p>The premise was that if <span style="font-style: italic;">Sweet         Tools</span> could be specified and represented in <span style="font-weight: bold;">commON</span> sufficiently to be parsed and         converted to interoperable RDF, then many similar instance-oriented         datasets could likely be so as well. Thus, as we tried and refined         notation and vocabulary, we tested applicability against the CSV         representation of <span style="font-style: italic;">Sweet Tools</span> in addition to other CSV, JSON and XML datasets.</p>
<h3>Dataset Authoring in a Spreadsheet</h3>
<p>A large portion of the <a href="http://openstructs.org/iron/common-swt-annex">case study</a> describes         the many advantages of authoring small datasets within spreadsheets.         The useful thing about the CSV format is that these full functional         capabilities of the spreadsheet are available during authoring or later         updates and modifications, but, when exported, the CSV provides a         relatively clean format for processing and parsing.</p>
<p>So, some of the reasons for small dataset authoring in a spreadsheet         include:</p>
<ul>
<li> <span style="font-style: italic;">Formatting and on-sheet           management</span> &#8211; the first usefulness of a spreadsheet comes           from being able to format and organize the records. Records can be           given background colors to highlight distinctions (new entries, for           example); live URL links can be embedded; contents can be wrapped and           styled within cells; and the column and row heads can be &#8220;frozen&#8221;,           useful when scrolling large workspaces</li>
<li> <span style="font-style: italic;">Named blocks and sorting</span> &#8211;           named blocks are a powerful feature of modern spreadsheets, useful           for data manipulation, printing and internal referencing by formulas           and the like.  Sorting with named blocks is especially important           as an aid to check consistency of terminology, records completeness,           duplicates checks, missing value checks, and the like. Named blocks           can also be used as references in calculations. All of these features           are real time savers, especially when datasets grow large and           consistency of treatment and terminology is important</li>
<li> <span style="font-style: italic;">Multiple sheets and consolidated           access</span> &#8211; <span style="font-weight: bold;">commON</span> modules can be specified on a single worksheet or multiple worksheets           and saved as individual CSV files; because of its size and relative           complexity, the <span style="font-style: italic;">Sweet Tools</span> dataset is maintained on multiple sheets. Multi-worksheet           environments help keep related data and notes consolidated and more           easily managed on local hard drives</li>
<li> <span style="font-style: italic;">Completeness and counts</span> &#8211; the spreadsheet <span style="font-style: italic;">counta</span> function is useful for counting           cell entries by both column and row, a useful aid to indicate if           an attribute or type value is missing or if a record is           incomplete.  Of course, similar aids and uses can be found among           many of the hundreds of embedded functions within a spreadsheet</li>
<li> <span style="font-style: italic;">Controlled vocabularies and data           entry validation</span> &#8211; quality datasets often hinge on consistency           and uniform values and terminology; the data validation utilities           within spreadsheets can be applied to Boolean, ranges and mins and           maxes, and to controlled vocabulary lists. Here is an example for           <span style="font-style: italic;">Sweet Tools</span>, enforcing           proper tool category assignments from a 50-item pick list:</li>
</ul>
<div style="margin: 10px;"><img class="center_ok" style="border: 0px solid; width: 609px; height: 373px;" title="Controlled Vocabularies and Data Entry Validation" src="http://openstructs.org/sites/openstructs.org/files/images/swt_validation.png" alt="Controlled Vocabularies and Data Entry Validation" width="609" height="373" /></div>
<ul>
<li> <span style="font-style: italic;">Specialized functions and           macros</span> &#8211; <span>all</span> functionality of           spreadsheets may be employed in the development of <span style="font-weight: bold;">commON</span> datasets. Once employed,           only the resulting values within the sheets are exported as CSV.</li>
</ul>
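<p>For readers who later script these checks outside the spreadsheet, the completeness count and pick-list validation described above can be approximated in a few lines of Python. This is a sketch only: the record fields, tool names and allowed categories below are hypothetical, and the count is in the spirit of <span style="font-style: italic;">counta</span> rather than an exact equivalent.</p>

```python
def completeness(records, attributes):
    """Count non-empty values per attribute, similar in spirit to COUNTA on a column."""
    return {a: sum(1 for r in records if r.get(a, "")) for a in attributes}


def invalid_values(records, attribute, allowed):
    """Return (row index, value) for filled-in values outside the controlled vocabulary."""
    return [
        (i, r[attribute])
        for i, r in enumerate(records)
        if r.get(attribute, "") and r[attribute] not in allowed
    ]


# Hypothetical records and pick list, for illustration only.
records = [
    {"name": "Tool A", "category": "Parser"},
    {"name": "Tool B", "category": "Parsr"},   # typo that validation should catch
    {"name": "Tool C", "category": ""},        # missing value
]
allowed_categories = {"Parser", "Editor"}

counts = completeness(records, ["name", "category"])
problems = invalid_values(records, "category", allowed_categories)
print(counts)
print(problems)
```

<p>A lower count for one attribute than for the record identifier flags missing values, while the second function flags entries that a pick list would have rejected at data entry.</p>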
<h3>Staging <span style="font-style: italic;">Sweet Tools</span> for commON</h3>
<p>The next major section of the <a href="http://openstructs.org/iron/common-swt-annex">case study</a> deals         with the minor conventions that must be followed in order to stage         spreadsheets for <span style="font-weight: bold;">commON</span>. Not         much of the specific <span style="font-weight: bold;">commON</span> vocabulary or notation is discussed below; for details, see <a href="#commON7">[7]</a>.</p>
<p>Because you can create multiple worksheets within a spreadsheet, it is         not necessary to modify existing worksheets or tabs. Rather, if you         are reluctant or cannot change existing information, merely create         parallel duplicate sheets of the source information. These duplicate         sheets have as their sole purpose export to <span style="font-weight: bold;">commON</span> CSV. You can maintain your         spreadsheet as is while staging for <span style="font-weight: bold;">commON</span>.</p>
<p>To do so, use the simple <span style="font-style: italic;">=</span> formula to create cross-references between the existing source         spreadsheet tab and the target <span style="font-weight: bold;">commON</span> CSV export tab. (You can also do         this for complete, highlighted blocks from source to target sheet.)         Then, by adding the few minor conventions of <span style="font-weight: bold;">commON</span>, you have now created a staged         export tab without modifying your source information in the slightest.</p>
<p>In standard form and for Excel and Open Office, single quotes, double         quotes and commas when entered into a spreadsheet cell are         automatically &#8216;<a href="http://en.wikipedia.org/wiki/Escape_character">escaped</a>&#8217; when         issued as CSV. <span style="font-weight: bold;">commON</span> allows         you to specify your own delimiter for lists (the standard is the pipe         &#8216;|&#8217; character) and what the parser recognizes as the &#8216;escape&#8217; character         (&#8216;\&#8217; is the standard). However, you probably should not change these defaults under most         conditions.</p>
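<p>As a rough illustration of the list delimiter and escape character, the following Python sketch splits a single cell value on the pipe while honoring a backslash escape. It is not the <span style="font-weight: bold;">commON</span> parser; the function name <code>split_list</code> and the exact escape behavior (the escaped character is kept literally) are assumptions made for the example.</p>

```python
def split_list(cell, delimiter="|", escape="\\"):
    """Split a cell into list items on `delimiter`, treating `escape` + char as a literal char."""
    items, current, i = [], [], 0
    while i < len(cell):
        ch = cell[i]
        if ch == escape and i + 1 < len(cell):
            current.append(cell[i + 1])  # keep the escaped character as-is
            i += 2
            continue
        if ch == delimiter:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    items.append("".join(current).strip())
    return items


# The last item contains an escaped pipe, so it is not split.
values = split_list("RDF | OWL | a\\|b")
print(values)
```

<p>Passing a different <code>delimiter</code> or <code>escape</code> mirrors the ability to override the defaults, though, as noted, there is rarely a reason to do so.</p>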
<p>The standard <span style="font-weight: bold;">commON</span> parsers and         converters are UTF-8 compatible. If your source content has unusual         encodings, try to target UTF-8 as your canonical spreadsheet output.</p>
<p>In the <a href="http://openstructs.org/iron/iron-specification"><span style="font-weight: bold;">irON</span> specification</a> there are a         small number of defined modules or processing sections. In <span style="font-weight: bold;">commON</span>, these         modules are denoted by the double-ampersand character sequence         (&#8216;<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;</span>&#8217;),         and apply to lists of instance records (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span>),         dataset specifications and associated metadata describing the dataset         (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;dataset</span>),         and mappings of attributes and types to existing schema (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;linkage</span>).         Similarly, attributes and types are denoted by a single ampersand         prefix (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeName</span>).</p>
<p>In <span style="font-weight: bold;">commON</span>, any or all of the         modules can occur within a single CSV file or in multiple files. In any         case, the start of one of these processing modules is signaled by the         module keyword and <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;keyword</span> convention.</p>
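<p>The module convention can be sketched in Python as a simple pass over the CSV rows that starts a new module whenever the first cell begins with the double ampersand. This is an illustrative sketch, not the actual <span style="font-weight: bold;">commON</span> parser: it assumes the module keyword sits alone in the first cell of its row, and the sample content (including the <code>&amp;name</code> attribute) is made up.</p>

```python
import csv
import io


def split_modules(csv_text):
    """Group CSV rows under the most recent &&module keyword found in the first column."""
    modules = {}
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank rows
        first = row[0].strip()
        if first.startswith("&&"):
            current = first[2:]
            modules.setdefault(current, [])
        elif current is not None:
            modules[current].append(row)
    return modules


# Hypothetical single-file example holding two modules.
sample = (
    "&&dataset\n"
    "&id,&name\n"
    "swt,Sweet Tools\n"
    "&&recordList\n"
    "&id,&name\n"
    "1,Tool A\n"
)

modules = split_modules(sample)
print(sorted(modules))
```

<p>The same function works whether the modules arrive in one file or several, since each file can simply be passed through it in turn.</p>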
<h4>The RecordList Module</h4>
<p>The first spreadsheet figure above shows a <span style="font-style: italic;">Sweet Tools</span> example for the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> module. The module begins with that keyword, indicating one or more         instance records will follow. Note that the first line after the         <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> keyword is devoted to the listing of attributes and types for the         instance records (designated by the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeName</span> convention in the columns for the first row after the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> keyword is encountered).</p>
<p>The <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> format can also include the <span style="font-style: italic;">stacked</span> style (see similar Dataset example         below) in addition to the single <span style="font-style: italic;">row</span> style shown above.</p>
<p>At any rate, once a worksheet is ready with its instance records         following the straightforward <span style="font-weight: bold;">irON</span> and <span style="font-weight: bold;">commON</span> conventions, it can then be saved as         a CSV file and appropriately named. Here is an example of what this         &#8220;vanilla&#8221; CSV file now looks like when shown again in a spreadsheet:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_spreadsheet_view.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 342px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_spreadsheet_view.png" alt="Spreadsheet View of the CSV File" width="1271" height="587" /></a><span><br />
</span> <span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>Alternatively, you could open this same file in a text editor. Here is         how this exact same instance record view looks in an editor:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_editor_view.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 389px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_editor_view.png" alt="Editor View of the CSV Record File" width="1251" height="657" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>Note that the CSV format separates each column by the comma separator,         with escapes shown for the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;description</span> attribute when it includes a comma-separated clause. Without word wrap,         each record in this format occupies a single row (though, again, for         the <span style="font-style: italic;">stacked</span> style, multiple         entries are allowed on individual rows so long as a new instance record         <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;id</span> is not encountered in the first column).</p>
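<p>To show how the single <span style="font-style: italic;">row</span> style reads back in, here is a small Python sketch that turns the attribute header row and the rows beneath it into one dictionary per record. It relies on Python&#8217;s standard <code>csv</code> module to undo the CSV escaping of embedded commas. The sample record is invented, the <span style="font-style: italic;">stacked</span> style is deliberately not handled, and this is not the actual <span style="font-weight: bold;">commON</span> parser.</p>

```python
import csv
import io


def parse_record_list(rows):
    """Row style only: rows[0] holds &attribute names, each later row is one record."""
    header = [name.strip().lstrip("&") for name in rows[0]]
    records = []
    for row in rows[1:]:
        # Drop empty cells so missing attributes are simply absent from the record.
        records.append({k: v for k, v in zip(header, row) if k and v != ""})
    return records


# Hypothetical exported CSV; the quoted description contains a comma.
csv_text = (
    "&&recordList\n"
    "&id,&name,&description\n"
    '1,Tool A,"Parses, validates and converts"\n'
)

rows = list(csv.reader(io.StringIO(csv_text)))
records = parse_record_list(rows[1:])  # skip the &&recordList keyword row
print(records)
```

<p>The description comes back as a single value with its comma intact, which is the practical effect of the escaping visible in the editor view.</p>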
<h4>The Dataset Module</h4>
<p>The <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;dataset</span> module defines the dataset parameters and provides very flexible         metadata attributes to describe the dataset <a href="#commON8">[8]</a>. Note the dataset         specification is exactly equivalent in form to the instance record         (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span>)         format, and also allows the single <span style="font-style: italic;">row</span> or <span style="font-style: italic;">stacked</span> styles (see these <a href="http://openstructs.org/iron/iron-specification#mozTocId223991">instance         record examples</a>), with this one being the <span style="font-style: italic;">stacked</span> style:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_dataset.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 105px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_dataset.png" alt="The Dataset Module" width="1579" height="223" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<h4>The Linkage Module</h4>
<p>The <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;linkage</span> module is used to map the structure of the instance records to some         structural schema, which can also include external ontologies. The         module has a simple, but specific structure.</p>
<p>Either attributes (presented as the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeList</span>)         or types (presented as the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;typeList</span>)         are listed sequentially by row until the listing is exhausted <a href="#commON8">[8]</a>. By         convention, the second column in the listing is the targeted         <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;mapTo</span> value. Absent a prior <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;prefixList</span> value, the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;mapTo</span> value needs to be a full URL to the corresponding attribute or type in         some external schema:</p>
<div style="margin: 10px;"><img class="center_ok" style="border: 0px solid; width: 537px; height: 595px;" title="The Linkage Module" src="http://openstructs.org/sites/openstructs.org/files/images/swt_linkage.png" alt="The Linkage Module" width="537" height="595" /></div>
<p>Notice in the case of <span style="font-style: italic;">Sweet         Tools</span> that most values are from the actual COSMO mini-ontology         underlying the listing. These need to be listed as well, since absent         the specifications in <span style="font-weight: bold;">commON</span> the system has NO knowledge of linkages and mappings.</p>
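<p>The mapping logic can be approximated with a short Python sketch that resolves each <code>mapTo</code> value to a full URL, expanding a prefix when one has been declared. The layout and names here are assumptions for illustration: the <code>prefix:local</code> form, the sample attributes and the use of the FOAF namespace are not taken from the <span style="font-style: italic;">Sweet Tools</span> linkage file.</p>

```python
def expand(value, prefixes):
    """Return a full URL for a mapTo value, expanding 'prefix:local' if the prefix is known."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return value
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    raise ValueError(f"cannot resolve mapTo value: {value!r}")


def resolve_linkage(rows, prefixes=None):
    """Map each attribute (first column) to its resolved target (second column)."""
    prefixes = prefixes or {}
    return {name.strip().lstrip("&"): expand(target, prefixes) for name, target in rows}


# Hypothetical prefix list and attribute mappings.
prefixes = {"foaf": "http://xmlns.com/foaf/0.1/"}
attribute_rows = [
    ("&name", "foaf:name"),
    ("&homepage", "http://xmlns.com/foaf/0.1/homepage"),
]

resolved = resolve_linkage(attribute_rows, prefixes)
print(resolved)
```

<p>Without a declared prefix the first mapping would raise an error, which matches the rule that an unprefixed <code>mapTo</code> value must be a full URL.</p>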
<h4>The Schema (structure) Module</h4>
<p>In its current state of development, <span style="font-weight: bold;">commON</span> does not support a spreadsheet-based         means for specifying the schema structure (lightweight ontology)         governing the datasets <a href="#commON2">[2]</a>. Another <span style="font-weight: bold;">irON</span> serialization, <span style="font-weight: bold;">irJSON</span>, does. Either via this <span style="font-weight: bold;">irJSON</span> specification or via an offline         ontology, a link reference is presently used by <span style="font-weight: bold;">commON</span> (and, therefore, <span style="font-style: italic;">Sweet Tools</span> for this case study) to         establish the governing structure of the input instance record         datasets.</p>
<p>A spreadsheet-based schema structure for <span style="font-weight: bold;">commON</span> has been designed and tested in         prototype form. <span style="font-weight: bold;">commON</span> should         be enhanced with this capability in the near future <a href="#commON8">[8]</a>.</p>
<h4>Saving and Importing</h4>
<p>If the modules are spread across more than one worksheet, then each worksheet must be saved as its own CSV file. In the case of <span style="font-style: italic;">Sweet Tools</span>, as exhibited by its current reference spreadsheet, <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">sweet_tools_20091110.xls</span>, three individual CSV files get saved. These files can be named whatever you would like. However, it is essential to keep track of the names, since they are referenced later.</p>
<p>My own naming convention is to use a format of <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">appname_date_modulename.csv</span> because it sorts well in a file manager accommodating multiple versions         (dates) and keeps related files clustered. The <span style="font-style: italic;">appname</span> in the case of <span style="font-style: italic;">Sweet Tools</span> is generally <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">swt</span>.         The <span style="font-style: italic;">modulename</span> is generally         the <span style="font-style: italic;">dataset</span>, <span style="font-style: italic;">records</span>, or <span style="font-style: italic;">linkage</span> convention. I tend to use the         <span style="font-style: italic;">date</span> specification in the         YYYYMMDD format. Thus, in the case of the <span style="font-style: italic;">records</span> listings for <span style="font-style: italic;">Sweet Tools</span>, its filename could be         something like:  <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">swt_20091110_records.csv</span>.</p>
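<p>For anyone scripting the save step, the naming convention above reduces to a one-line helper. This is a minimal sketch; the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">module_filename</span> is hypothetical and nothing in <span style="font-weight: bold;">commON</span> requires it.</p>

```python
from datetime import date


def module_filename(appname, saved_on, modulename):
    """Build a name of the form appname_YYYYMMDD_modulename.csv."""
    return f"{appname}_{saved_on:%Y%m%d}_{modulename}.csv"


# One CSV file per worksheet (module) of the Sweet Tools spreadsheet.
saved = date(2009, 11, 10)
filenames = [module_filename("swt", saved, m)
             for m in ("dataset", "records", "linkage")]
print(filenames)
```

<p>Because the date is zero-padded and sits right after the application name, a plain alphabetical sort keeps each version's files together and in date order.</p>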
<p>Once saved, these files are now ready to be imported into a         <span style="font-weight: bold;">structWSF</span> <a href="#commON9">[9]</a> instance, which         is where the CSV parsing and conversion to interoperable RDF occurs<a href="#commON8"> [8]</a>. In this case study, we used the Drupal-based <span style="font-weight: bold;">conStruct SCS</span> system <a href="#commON10">[10]</a>. <span style="font-weight: bold;">conStruct</span> exposes the <span style="font-weight: bold;">structWSF</span> Web services via a user interface         and a user permission and access system. The actual case study write-up         offers more details about the import process.</p>
<h3>Using the Dataset</h3>
<p>We are now ready to interact with the <span style="font-style: italic;">Sweet Tools</span> structured dataset using         <span style="font-weight: bold;">conStruct</span> (assuming you have a         Drupal installation with the <span style="font-weight: bold;">conStruct</span> modules) <a href="#commON10">[10]</a>.</p>
<h4>Introduction to the App</h4>
<p>The screen capture below shows a couple of aspects of the system:</p>
<ul>
<li>First, the left hand panel (according to how this specific Drupal         install was themed) shows the various tools available to <span style="font-weight: bold;">conStruct</span>.  These include (with links         to their documentation) <a href="http://constructscs.com/documentation/instructions/search">Search</a>,         <a href="http://constructscs.com/documentation/instructions/browse">Browse</a>,         <a href="http://constructscs.com/documentation/instructions/view-record">View         Record</a>, <a href="http://constructscs.com/documentation/instructions/import">Import</a>,         <a href="http://constructscs.com/documentation/instructions/export">Export</a>,         <a href="http://constructscs.com/documentation/instructions/datasets"> Datasets</a>, <a href="http://constructscs.com/documentation/instructions/create-record">Create           Record</a>, <a href="http://constructscs.com/documentation/instructions/update-record">Update           Record</a>, <a href="http://constructscs.com/documentation/instructions/delete-record">Delete           Record</a> and <a href="http://constructscs.com/documentation/instructions/settings">Settings</a><a href="#commON11"> [11]</a>;</li>
<li>The Browse tree in the main part of the screen shows the full mini-ontology that classifies <span style="font-style: italic;">Sweet Tools</span>. Via simple inferencing, clicking on any parent link displays all child projects for that category as well <span style="font-style: italic;">(click to expand)</span>:</li>
</ul>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_drupal_browse.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 1907px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_drupal_browse.png" alt="conStruct (Drupal) Browse Screen for Sweet Tools" width="1176" height="3031" /></a><span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>One of the absolutely cool things about this framework is that all         tools, inferencing, user interfaces and data structure are a direct         result of the ontology(ies) underlying the system (plus the         <span style="font-weight: bold;">irON</span> instance ontology, as         well). This means that switching datasets or adding datasets causes the         entire system structure to now reflect those changes — without         lifting a finger!!</p>
<h4>Some Sample Uses</h4>
<p>Here are a few sample things you can do with these generic tools driven         by the <em>Sweet Tools</em> dataset:</p>
<ul>
<li> <a href="http://constructscs.com/conStruct/browse/">Browsing the           ontology tree</a> (then, Browse by Kind)</li>
<li>Viewing an <a href="http://constructscs.com/conStruct/view/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Fswt%2Firon&amp;dataset=http%3A%2F%2Fconstructscs.com%2Fwsf%2Fdatasets%2F181%2F"> instance record</a></li>
<li>Viewing a <a href="http://constructscs.com/conStruct/ontology/view/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Fcosmo%23KRBrowser"> Class Type Report</a></li>
<li>Viewing an <a href="http://constructscs.com/conStruct/ontology/view/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Firon%23description"> Attribute Report</a></li>
<li> <a href="http://constructscs.com/conStruct/search/?filter_types_3=http%3A%2F%2Fpurl.org%2Fontology%2Fcosmo%23KRBrowser&amp;filter_attributes_4=http%3A%2F%2Fpurl.org%2Fontology%2Fcosmo%23status&amp;query=new&amp;filter=on"> Searching by facet</a> (check the tabs)</li>
<li>Doing <a href="http://constructscs.com/conStruct/search/">multi-value filtering</a> (make selections from the various tabs)</li>
<li> <a href="http://constructscs.com/conStruct/export/">Exporting           stuff</a> in a variety of formats.</li>
</ul>
<p>Note, if you access this <span style="font-weight: bold;">conStruct</span> instance you will do so as a         <span style="font-style: italic;">demo</span> user. Unfortunately, as such, you may not be able to see all of the write and update tools, which in this case are reserved for curators or admins. Recall that <span style="font-weight: bold;">structWSF</span> has a comprehensive <a href="../497/structwsf-a-framework-for-collaboration-networks/"> user access and permissions layer</a>.</p>
<h4>Exporting in Alternative Formats</h4>
<p>Of course, one of the real advantages of the <span style="font-weight: bold;">irON</span> and <span style="font-weight: bold;">structWSF</span> designs is to enable different         formats to be interchanged and to interoperate. Upon submission, the         <span style="font-weight: bold;">commON</span> format and its datasets         can then be exported in these alternate formats and serializations <a href="#commON8">[8]</a>:</p>
<ul>
<li>commON</li>
<li>irJSON</li>
<li>irXML</li>
<li>N-Triples/CSV</li>
<li>N-Triples/TSV</li>
<li>RDF+N3</li>
<li>RDF+XML</li>
</ul>
<p>As should be obvious, one of the real benefits of the <span style="font-weight: bold;">irON</span> notation &#8212; in addition to easy         dataset authoring &#8212; is the ability to more-or-less treat RDF, CSV, XML         and JSON as interoperable data formats.</p>
<h3>The Formal Case Study</h3>
<p>The formal <span style="font-style: italic;">Sweet Tools</span> case       study based on <span style="font-weight: bold;">commON</span>, with       sample download files and PDF, is available from <a style="font-style: italic;" href="http://openstructs.org/iron/common-swt-annex">Annex: A commON Case Study       using Sweet Tools, Supplementary Documentation</a> <a href="#commON3">[3]</a>.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON1" name="commON1"></a> [1] In 2003, <a href="http://www.microsoft.com/presspass/press/2003/oct03/10-13vstoofficelaunchpr.mspx"> Microsoft estimated</a> its worldwide users of the Excel spreadsheet,         which then had about a 90% market share globally, at 400 million.         Others at that time estimated unauthorized use to perhaps double that         amount. There has been significant growth since then, and online         spreadsheets such as Google Docs and Zoho have also grown wildly. This         surely puts spreadsheet users globally into the 1 billion range.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON2" name="commON2"></a> [2] See Frédérick Giasson and         Michael Bergman, eds., <span style="font-style: italic;">Instance         Record and Object Notation (irON) Specification, Specification         Document</span>, version 0.82, 20 October 2009.  See <a href="http://openstructs.org/iron/iron-specification">http://openstructs.org/iron/iron-specification</a>.         Also see the <a href="http://openstructs.org/iron"><span style="font-weight: bold;">irON</span> Web site</a>, Google <a href="http://groups.google.com/group/iron-notation">discussion group</a>,         and <a href="http://code.google.com/p/iron-notation/">code distribution         site</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON3" name="commON3"></a> [3] Michael Bergman, 2009.         <span style="font-style: italic;">Annex: A commON Case Study using         Sweet Tools, Supplementary Documentation</span>, prepared by Structured         Dynamics LLC, November 10, 2009. See <a href="http://openstructs.org/iron/common-swt-annex">http://openstructs.org/iron/common-swt-annex</a>.         It may also be downloaded in PDF <a href="http://openstructs.org/sites/openstructs.org/files/downloads/common-case-study.pdf"> <img style="border: 0px solid; width: 13px; height: 16px;" src="http://openstructs.org/sites/openstructs.org/files/icons/pdfdoc.gif" alt="" /></a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON4" name="commON4"></a> [4] See Michael K. Bergman&#8217;s         <a href="http://mkbergman.com/">AI3:::Adaptive Information</a> blog,         <a href="../new-version-sweet-tools-sem-web/"><span style="font-style: italic;"> Sweet Tools (Sem Web)</span></a>. In addition, the <span style="font-weight: bold;">commON</span> version of <span style="font-style: italic;">Sweet Tools</span> is available at the <a href="http://constructscs.com/conStruct/browse/?browse=true&amp;attribute=all&amp;type=all&amp;dataset=http%3A%2F%2Fconstructscs.com%2Fwsf%2Fdatasets%2F122%2F&amp;page=0"> <span style="font-weight: bold;">conStruct</span> site</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON5" name="commON5"></a> [5] The CSV mime type is defined in         <span style="font-style: italic;">Common Format and MIME Type for         Comma-Separated Values (CSV) Files</span> [<a href="http://www.rfc-editor.org/rfc/rfc4180.txt">RFC 4180</a>]. A useful         overview of the CSV format is provided by <a title="http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm" rel="nofollow" href="http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm">The Comma Separated Value (CSV) File Format</a>. Also, see         that author&#8217;s related CTX reference for a discussion of how schema and         structure can be added to the basic CSV framework; see <a href="http://www.creativyst.com/Doc/Std/ctx/ctx.htm">http://www.creativyst.com/Doc/Std/ctx/ctx.htm</a>,         especially the section on the comma-delimited version (<a href="http://www.creativyst.com/Doc/Std/ctx/ctx.htm#CTC">http://www.creativyst.com/Doc/Std/ctx/ctx.htm#CTC</a>).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON6" name="commON6"></a> [6] An <a href="http://en.wikipedia.org/wiki/Attribute-value_system">attribute-value         system</a> is a basic knowledge representation framework comprising a         table with columns designating &#8220;attributes&#8221; (also known as <span style="font-style: italic;">properties</span>, <span style="font-style: italic;">predicates</span>, <span style="font-style: italic;">features</span>, <span style="font-style: italic;">parameters</span>, <span style="font-style: italic;">dimensions</span>, <span style="font-style: italic;">characteristics</span> or <span style="font-style: italic;">independent variables</span>) and rows         designating &#8220;objects&#8221; (also known as <span style="font-style: italic;">entities</span>, <span style="font-style: italic;">instances</span>, <span style="font-style: italic;">exemplars</span>, <span style="font-style: italic;">elements</span> or <span style="font-style: italic;">dependent variables</span>). Each table cell         therefore designates the value (also known as <span style="font-style: italic;">state</span>) of a particular attribute of a         particular object. This is the basic table presentation of a         spreadsheet or relational data table.</p>
<p>Attribute-values can also be presented as pairs in the form of an <a href="http://en.wikipedia.org/wiki/Associative_array">associative array</a>, where the first item listed is the attribute, often followed by a separator such as the colon, and then the value. JSON and many simple data struct notations follow this format. This format may also be called <span style="font-style: italic;">attribute-value pairs</span>, <span style="font-style: italic;">key-value pairs</span>, <span style="font-style: italic;">name-value pairs</span>, <span style="font-style: italic;">alists</span> or others. In these cases the &#8220;object&#8221; is implied, or is introduced as the name of the array.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON7" name="commON7"></a> [7] See especially <a style="font-style: italic;" href="http://openstructs.org/iron/iron-specification#mozTocId603499">SUB-PART         3: commON PROFILE</a> in, Frédérick Giasson and Michael Bergman, eds.,         <span style="font-style: italic;">Instance Record and Object Notation         (irON) Specification, Specification Document</span>, version 0.82, 20         October 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON8" name="commON8"></a> [8] As of the date of this case         study, some of the processing steps in the <span style="font-weight: bold;">commON</span> pipeline are manual. For example,         the parser creates an intermediate N3 file that is actually submitted         to the <span style="font-weight: bold;">structWSF</span>. Within a week         or two of publication, these capabilities should be available as a         direct import to a <span style="font-weight: bold;">structWSF</span> instance. However, there is one exception to this:  the         specification for the schema structure. That module has been         prototyped, but will not be released with the first <span style="font-weight: bold;">commON</span> upgrade. That enhancement is likely         a few weeks off from the date of this posting. Please check the         <a href="http://groups.google.com/group/iron-notation"><span style="font-weight: bold;">irON</span></a> or <a style="font-weight: bold;" href="http://groups.google.com/group/structwsf">structWSF</a> discussion groups for announcements.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON9" name="commON9"></a> [9] <a style="font-weight: bold;" href="http://openstructs.org/">structWSF</a> is a platform-independent         Web services framework for accessing and exposing structured RDF data,         with generic tools driven by underlying data structures. Its central         perspective is that of the dataset. Access and user rights are granted         around these datasets, making the framework enterprise-ready and         designed for collaboration. Since a <span style="font-weight: bold;">structWSF</span> layer may be placed over         virtually any existing datastore with Web access &#8212; including large         instance record stores in existing relational databases &#8212; it is also a         framework for Web-wide deployments and interoperability.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="commON10"></a>[10] <a style="font-weight: bold;" href="http://constructscs.com/">conStruct SCS</a> is a structured content system built on the Drupal content management framework. <span style="font-weight: bold;">conStruct</span> enables structured data and its controlling vocabularies (ontologies) to drive applications and user interfaces. It is based on RDF and SD&#8217;s <span style="font-weight: bold;">structWSF</span> platform-independent Web services framework <a href="#commON9">[9]</a>. In addition to user access control and management and a general user interface, <span style="font-weight: bold;">conStruct</span> provides Drupal-level CRUD, data display templating, faceted browsing, full-text search, and import and export over structured data stores based on RDF.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="commON11"></a> [11] More Web services are being added to <span style="font-weight: bold;">structWSF</span> on a fairly constant basis, and the existing ones have been through a number of upgrades.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Structured Dynamics&#8217; Product Stack</title>
		<link>http://www.mkbergman.com/842/structured-dynamics-product-stack/</link>
		<comments>http://www.mkbergman.com/842/structured-dynamics-product-stack/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 22:54:24 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Information Automation]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[scones]]></category>
		<category><![CDATA[semantic enterprise]]></category>
		<category><![CDATA[slideshow]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=842</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structured Dynamics&#8217; Product Stack&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Information Automation&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/842/structured-dynamics-product-stack/&amp;rft.language=English"></span>

A New Slide Show Consolidates, Explains Recent Developments
Much has been happening on the Structured Dynamics front of late. Besides welcoming Steve Ardire as a senior advisor to the company, we also have been issuing a steady stream of new products from our semantic Web pipeline.
This new slide show attempts to capture these products and relate [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structured Dynamics&#8217; Product Stack&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Information Automation&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/842/structured-dynamics-product-stack/&amp;rft.language=English"></span>
<p><a href="http://structureddynamics.com/"><img style="border: 0px solid; width: 260px; height: 60px; float: left; margin-right: 10px;" title="Structured Dynamics LLC" src="../wp-content/themes/ai3/images/sd_logo_260.png" alt="Structured Dynamics LLC" hspace="5" vspace="5" align="left" /></a></p>
<h2>A New Slide Show Consolidates, Explains Recent Developments</h2>
<p>Much has been happening on the <a href="http://structureddynamics.com">Structured Dynamics</a> front of late. Besides welcoming <a href="http://www.linkedin.com/in/sardire">Steve Ardire</a> as a senior advisor to the company, we also have been issuing a steady stream of new <a href="http://structureddynamics.com/products.html">products</a> from our semantic Web pipeline.</p>
<p>This new slide show attempts to capture these products and relate them to the various layers in Structured Dynamics&#8217; enterprise product stack:</p>
<div class="center_ok center">
<div id="__ss_2406783" class="center_ok" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="Structured Dynamics's Semantic Technologies Product Stack" href="http://www.slideshare.net/mkbergman/structured-dynamicss-semantic-technologies-product-stack">Structured Dynamics&#8217;s Semantic Technologies Product Stack</a><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sdproductstack20091102-091102163620-phpapp01&amp;stripped_title=structured-dynamicss-semantic-technologies-product-stack" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sdproductstack20091102-091102163620-phpapp01&amp;stripped_title=structured-dynamicss-semantic-technologies-product-stack" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">presentations</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/mkbergman">mkbergman</a>.</div>
</div>
</div>
<p>The show indicates the role of <a href="http://structureddynamics.com/scones.html">scones</a>, <a href="http://openstructs.org/iron">irON</a>, <a href="http://openstructs.org/structwsf">structWSF</a>, <a href="http://umbel.org/">UMBEL</a>, <a href="http://constructscs.com/">conStruct</a> and others and how they leverage existing information assets to enable the semantic enterprise. And, oh, by the way, all of this is done via Web-accessible <a href="http://structureddynamics.com/linked_data.html">linked data</a> and our practical <a href="http://structureddynamics.com/technology.html">technologies</a>.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/842/structured-dynamics-product-stack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Law of Linked Data</title>
		<link>http://www.mkbergman.com/837/the-law-of-linked-data/</link>
		<comments>http://www.mkbergman.com/837/the-law-of-linked-data/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 01:16:17 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[linked data law]]></category>
		<category><![CDATA[metcalfe's law]]></category>
		<category><![CDATA[network effects]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=837</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Law of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-10-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/837/the-law-of-linked-data/&amp;rft.language=English"></span>

A Marshal to Bring Order to the Town of Data Gulch
Though not the first, I have been touting the Linked Data Law for a         couple of years now [1]. But in a conversation last week, I found that         my [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Law of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-10-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/837/the-law-of-linked-data/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 150px; height: 158px; float: left; margin-right: 10px;" title="The Marshal Has Come to Town" src="../wp-content/themes/ai3/images/2009Posts/091011_deputy_marshal_badge.jpg" alt="The Marshal Has Come to Town" hspace="5" vspace="0" align="left" /></p>
<h2>A Marshal to Bring Order to the Town of Data Gulch</h2>
<p>Though not the first, I have been touting the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span> for a         couple of years now <a href="#ldl_1">[1]</a>. But in a conversation last week, I found that         my colleague did not find the premise very clear. I suspect that is due         both to cryptic language on my part and the fact no one has really         tackled the topic with focus. So, in this post, I try to redress that         and also comment on the related role of linked data in the semantic         enterprise.</p>
<p>Adding connections to existing information via linked data is a         powerful force multiplier, similar to <a href="http://en.wikipedia.org/wiki/Metcalf%27s_law">Metcalfe&#8217;s law</a> for         how the value of a network increases with more users (nodes). I have         come to call this the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span>: the         value of a linked data network is proportional to the square of the         number of links between data objects.</p>
<div class="boxGreenDotted" style="margin: 5px 0pt 5px 0px; float: right; text-align: center; width: 360px;"><big style="font-style: italic; color: #006600; font-weight: bold;">&#8220;In the       network economy, the connections are as important as the nodes.&#8221;</big> <a href="#ldl_2">[2]</a></div>
<p>An early direct mention of the semantic Web and its possible ability to         generate <a href="http://en.wikipedia.org/wiki/Network_effect">network         effects</a> comes from a 2003 Mitre report for the government <a href="#ldl_3">[3]</a>. In         it, the authors state, &#8220;At present a very small proportion of the data         exposed on the web is marked up using Semantic Web vocabularies like         RDF and OWL. As more data gets mapped to ontologies, the potential         exists to achieve a &#8216;network effect&#8217;.&#8221; Prescient, for sure.</p>
<p>In July 2006, both Henry Story and Dion Hinchliffe discussed Metcalfe&#8217;s law, with Henry specifically looking to relate it to the semantic Web <a href="#ldl_4"> [4]</a>. His initial intuition was that &#8220;the value of your information grows exponentially with your ability to combine it with new information,&#8221; and he was looking for ways to adapt Metcalfe&#8217;s law to the semantic Web.</p>
<p>I picked up on those observations and commented to Henry at that time         and in my own post, &#8220;<a style="font-style: italic;" title="Permanent Link to The Exponential Driver of Combining Information" rel="bookmark" href="../255/the-exponential-driver-of-combining-information/">The         Exponential Driver of Combining Information</a>.&#8221; I have been enamoured         of the idea ever since, and have begun to weave the idea into my         writings.</p>
<p>More recently, in late 2008, James Hendler and Jennifer Golbeck devoted         an entire paper to Metcalfe&#8217;s law and the semantic Web <a href="#ldl_5">[5]</a>. In it, they         note:</p>
<p style="margin-left: 40px;">&#8220;This linking between ontologies, and between instances in documents         that refer to terms in another ontology, is where much of the latent         value of the Semantic Web lies. The vocabularies, and particularly         linked vocabularies using URIs, of the Semantic Web create a graph         space with the ability to link any term to any other. As this link         space grows with the use of RDF and OWL, Metcalfe&#8217;s law will once again         be exploited – the more terms to link to, and the more links         created, the more value in creating more terms and linking them in.&#8221;</p>
<h3>A Refresher on Metcalfe&#8217;s Law</h3>
<p><a href="http://en.wikipedia.org/wiki/Metcalf%27s_law">Metcalfe’s         law</a> states that the value of a telecommunications network is         proportional to the square of the number of users of the system         (<span style="font-style: italic;">n</span>²) (note: it is <span style="font-weight: bold; font-style: italic;">not</span> exponential, as         some of the points above imply). <a href="http://en.wikipedia.org/wiki/Robert_Metcalfe">Robert Metcalfe</a> formulated it about 1980 in relation to Ethernet and fax machines; the         &#8220;law&#8221; was then named for Metcalfe and popularized by <a href="http://en.wikipedia.org/wiki/George_Gilder">George Gilder</a> in 1993.</p>
<p>These attempts to estimate the value of physical networks were in keeping with earlier efforts to estimate the value of a broadcast network. That value is almost universally agreed to be proportional to the number of users, a relationship known as <a href="http://en.wikipedia.org/wiki/Sarnoff%27s_law">Sarnoff&#8217;s law</a> (see further below).</p>
<p>The formula proposed by Metcalfe counts the number of unique connections in a network with <span style="font-style: italic;">n</span> nodes as <em>n</em>(<em>n</em> − 1)/2, which grows in proportion to <em>n</em><sup>2</sup> as <em>n</em> becomes large. This makes Metcalfe&#8217;s law a quadratic growth equation.</p>
<p>As nodes get added, then, we see the following increase in connections:</p>
<div style="margin: 5px 0pt;"><a href="../wp-content/themes/ai3/images/2009Posts/091011_telephone.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 180px;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/091011_telephone.png" alt="Metcalfe Law Network Effect" hspace="5" /></a></p>
<h5 style="color: #820000;">&#8216;Network Effect&#8217; for Physical Networks</h5>
</div>
<p>This diagram, modified from <a href="http://en.wikipedia.org/wiki/File:Network_effect.png">Wikipedia</a> to         be a horizontal image, shows how two telephones can make only one         connection, five can make 10 connections, and twelve can make 66         connections, etc.</p>
<p>By definition, a physical network is a connected network. Thus, every         time a new node is added to the network, connections are added, too.         This general formula has also been embraced as a way to discuss social         connections on the Internet <a href="#ldl_6">[6]</a>.</p>
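<p>The connection counts quoted for the telephone diagram can be checked with a few lines of Python. This is only a minimal sketch of the <em>n</em>(<em>n</em> − 1)/2 formula; the function name <code>unique_connections</code> is an illustrative choice, not something from Metcalfe&#8217;s own writing.</p>

```python
def unique_connections(n: int) -> int:
    """Count the distinct pairwise links among n nodes: n(n - 1)/2."""
    # n * (n - 1) is always even, so integer division is exact.
    return n * (n - 1) // 2


# The three cases mentioned for the telephone diagram.
for n in (2, 5, 12):
    print(n, "nodes:", unique_connections(n), "connections")
# 2 nodes: 1 connections
# 5 nodes: 10 connections
# 12 nodes: 66 connections
```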
<h3>Analogies to Linked Data</h3>
<p>Like physical networks, the interconnectedness of the semantic Web or         semantic enterprise is a graph.</p>
<p>The idea behind <a href="http://structureddynamics.com/linked_data.html">linked data</a> is to make connections between data. Unlike physical telecommunication networks, however, the nodes in the form of datasets and data are (largely) already there. What is missing are the connections. The build-out and growth that produce the <a href="http://en.wikipedia.org/wiki/Network_effect">network effects</a> in a linked data context do not result from adding more nodes, but from the linking or connecting of <span style="font-weight: bold; font-style: italic; text-decoration: underline;">existing</span> nodes.</p>
<p>The fact that adding a node to a physical network carries with it an         associated connection has tended to conjoin these two complementary         requirements of node <span style="font-weight: bold; font-style: italic;">and</span> connection. But, to         grok the real dynamics and to gain network effects, we need to realize:         Both nodes and connections are necessary.</p>
<p>One circumstance of the enterprise is that data nodes are everywhere. The fact that the overwhelming majority are unconnected is why we have adopted the popular colloquialism of data &#8220;silos&#8221;. There are also massive amounts of unconnected data on the Web in the form of dynamic databases only accessible via search forms, and isolated data tables and listings virtually everywhere.</p>
<p>Thus, the essence of the <span style="font-style: italic;">semantic         enterprise</span> and the <span style="font-style: italic;">semantic         Web</span> is no more complicated than connecting — <span class="double_u">meaningfully</span> — data nodes that already exist.</p>
<p>As the following diagram shows, unconnected data nodes or silos look         like random particles caught in the chaos of <a href="http://en.wikipedia.org/wiki/Brownian_motion">Brownian motion</a>:</p>
<div style="margin: 5px 0pt;"><a href="../wp-content/themes/ai3/images/2009Posts/091011_network.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 196px;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/091011_network.png" alt="Linked Data Law Network Effect" hspace="5" /></a></p>
<h5 style="color: #820000;">&#8216;Network Effect&#8217; for Coherent Linked Data</h5>
</div>
<p>As initial connections get made, bits of structure begin to emerge. But, as connections proliferate, in a manner <span style="font-weight: bold; font-style: italic;">exactly</span> equivalent to the network effects of connected networks, coherence and value emerge.</p>
<p>Look at the last part in the series diagram above. We not only see that         the same nodes are now all connected, with the inferences and         relationships that result from those connections, but we can also see         entirely new structures emerge by virtue of those connections. All of         this structure and meaning was totally absent prior to making the         linked data connections.</p>
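<p>One way to make this concrete is to hold the set of nodes fixed and watch what happens as links are added. The sketch below is an illustration only: the silo names and links are hypothetical, and counting connected components with a small union-find is my own stand-in for &#8220;structure emerging&#8221;, not a measure of coherence or value.</p>

```python
def count_components(nodes, links):
    """Count connected groups among a fixed set of nodes, given some links."""
    parent = {node: node for node in nodes}

    def find(x):
        # Walk up to the root of x's group, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        # Merge the two groups that this link connects.
        parent[find(a)] = find(b)

    return len({find(node) for node in nodes})


# Hypothetical enterprise silos; the node set never changes.
silos = ["crm", "erp", "wiki", "hr", "web"]
links = [("crm", "erp"), ("erp", "wiki"), ("hr", "web")]

print(count_components(silos, []))     # 5 isolated silos
print(count_components(silos, links))  # 2 connected groups
```

<p>No node was added between the two calls; only the links changed, which is the distinction drawn above between physical networks and linked data.</p>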
<h3>Quantifying the Network Effect</h3>
<p>So, what is the benefit of this linked data? It depends on the product         of the <span style="font-style: italic;">value</span> of the         connections and the <span style="font-style: italic;">multiplier</span> of the network effect:</p>
<div style="margin: 10px 0pt; text-align: center;">linked data benefit <span style="font-weight: bold; font-family: Arial Black;">=</span> connections         <span style="font-style: italic;">value</span> <span style="font-weight: bold;">X</span> network effect <span style="font-style: italic;">multiplier</span></div>
<p>Just as it is hard to have a conversation via phone with yourself, or         to collaborate with yourself, the ability to gain perspective and         context from data comes from connections. But like some phone calls or         some collaborations, the <span style="font-style: italic;">value</span> depends on the participants. In the case of linked data, that depends         on the quality of the data and its <span style="font-style: italic; font-weight: bold;">coherence</span> <a href="#ldl_7">[7]</a>. The         value &#8220;constant&#8221; for connected linked data depends in some manner on         these factors, as well as the purposes and circumstances to which that         linked data might be applied.</p>
<p>Even in physical networks or social collaboration contexts, the &#8220;value&#8221; of the network has been hard to quantify. And, while academics and researchers will appropriately and naturally call for more research on these questions, we do not need to be so timid. Whatever the <span style="font-style: italic;">alpha</span> constant is for quantifying the value of a linked data network, our intuition should be clear that making connections, finding relationships, making inferences, and making discoveries cannot occur when data is in isolation.</p>
<p>Because I am an advocate, I believe this <span style="font-style: italic;">alpha</span> constant of value to be quite large.         I believe this constant is also higher for circumstances of business         intelligence, knowledge management and discovery.</p>
<p>The second part of the benefit equation is the <span style="font-style: italic;">multiplier</span> for network effects. We&#8217;ve mentioned before the linear growth advantage due to broadcast networks (Sarnoff&#8217;s law) and the standard quadratic growth assumption of physical and social networks (Metcalfe&#8217;s law). Naturally, there have been other estimates and advocacies.</p>
<p>David Reed <a href="#ldl_8">[8]</a>, for example, also adds group effects and has asserted         an exponential multiplier to the network effect (like Henry Story&#8217;s         initial intuition noted above). As he states,</p>
<p style="margin-left: 40px;">&#8220;[E]ven Metcalfe&#8217;s Law understates the value created by a group-forming         network [GFN] as it grows. Let&#8217;s say you have a GFN with <span style="font-style: italic;">n</span> members. If you add up all the potential         two-person groups, three-person groups, and so on that those members         could form, the number of possible groups equals <span>2<sup><em>n</em></sup></span>. So the value of a GFN increases         exponentially, in proportion to <span>2<sup><em>n</em></sup></span>. I call that Reed&#8217;s Law. And its         implications are profound.&#8221;</p>
<p>Yet not all agree with the assertion of an exponential multiplier, or even with the quadratic one of Metcalfe. Odlyzko and Tilly <a href="#ldl_9">[9]</a> note that Metcalfe&#8217;s law would hold if the value that an individual gets personally from a network were directly proportional to the number of people in that network. But they then argue that this assumption does not hold because of local preferences or different qualities of interaction. In a linked data context, such arguments have merit, though you may also want to see Metcalfe&#8217;s own counter-arguments <a href="#ldl_6">[6]</a>.</p>
<p>Hinchcliffe&#8217;s earlier commentary <a href="#ldl_4">[4]</a> provided a nice graphic that shows the implications of these various multipliers on the network effect, as a function of nodes in a network:</p>
<div style="margin: 5px 0pt; text-align: center;"><img class="center_ok" style="border: 0px solid; width: 528px; height: 329px;" src="../wp-content/themes/ai3/images/2009Posts/091011_network_effects.jpg" alt="Potency of the Network Effect from Dion Hinchcliffe" hspace="5" width="528" height="329" /></p>
<h5 style="color: #820000;">Various Estimates for the &#8216;Network Effect&#8217;</h5>
</div>
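<p>The three growth assumptions can also be tabulated directly. The sketch below makes some assumptions of its own: the function names are illustrative, the <code>alpha</code> value is an arbitrary placeholder rather than an estimate, and the Reed count uses 2<sup><em>n</em></sup> − <em>n</em> − 1, the number of possible groups of two or more members, which grows in proportion to 2<sup><em>n</em></sup>.</p>

```python
def sarnoff(n: int) -> int:
    """Broadcast network: multiplier grows linearly with n."""
    return n


def metcalfe(n: int) -> int:
    """Connected network: unique pairwise connections, n(n - 1)/2."""
    return n * (n - 1) // 2


def reed(n: int) -> int:
    """Group-forming network: subgroups of two or more, 2^n - n - 1."""
    return 2 ** n - n - 1


def benefit(alpha: float, multiplier: int) -> float:
    """Benefit = connections value (alpha) x network effect multiplier."""
    return alpha * multiplier


for n in (2, 5, 10, 20):
    print(n, sarnoff(n), metcalfe(n), reed(n))
# 2 2 1 1
# 5 5 10 26
# 10 10 45 1013
# 20 20 190 1048555

# alpha = 0.5 is a made-up value, used only to show the calculation.
print(benefit(0.5, metcalfe(10)))  # 22.5
```

<p>Even at twenty nodes the exponential estimate is more than five thousand times the quadratic one, which is why the choice of multiplier matters far more than the precise value of <em>alpha</em> once a network is large.</p>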
<p>I believe we can dismiss the lower linear bound of this question and         likely the higher exponential one as well (that is, Reed&#8217;s law, because         quality and relevance questions make some linked data connections less         valuable than others). Per the above, that would suggest that the         <span style="font-style: italic;">multiplier</span> of the linked data         network is perhaps closer to the Metcalfe estimate or similar.</p>
<p>In any event, it is also essential to point out that connecting data indiscriminately for linked data&#8217;s sake will likely deliver few, if any, benefits. Connections must still be coherent and logical for the value benefits to be realized.</p>
<h3>The Role and Contribution of Linked Data</h3>
<p>I <a href="../825/fresh-perspectives-on-the-semantic-enterprise/"> elsewhere</a> discuss the role of linked data in the enterprise and         will continue to do so. But, there are some implications in the above         that warrant some further observations.</p>
<p>It should be clear that the graph and network basis of linked data, not         to mention some of the uncertainties as to quantifying benefits,         suggests the practice should be considered apart from mission-critical         or transactional uses in the enterprise. That may change with time and         experience.</p>
<p>There are also open questions about data quality in terms of inputs to linked data and possible erroneous semantics and ontologies to guide the linked connections. Operational uses should be kept off the table for now. As in physical networks, not all links perform well and not all are useful. As with poor connections in physical networks, poorly performing links should be either taken off-ledger or relegated to a back-up basis. Linked data should be understood and treated no differently than networks of variable quality.</p>
<p>Such realism is important — for both internal and external linked         data advocates — to allow linked data to be applied in the right         venues at acceptable risk and with likely demonstrable benefits.         <a href="../553/confronting-misconceptions-with-adaptive-ontologies/"> Elsewhere</a> I have advocated an approach that builds on existing         assets; here I advocate a clear and smart understanding of where linked         data can best deliver network effects in the near term.</p>
<p>And, so, in the nearest term, enterprise applications that best fit         linked data promises and uncertainties include:</p>
<ul>
<li>Establishing frameworks for data federation</li>
<li>Business intelligence</li>
<li>Discovery</li>
<li>Knowledge management and knowledge resources</li>
<li>Reasoning and inference</li>
<li>Development of internal common language</li>
<li>Learning and adopting data-driven apps <a href="#ldl_10">[10]</a>, and</li>
<li>Staging and analysis for data cleaning.</li>
</ul>
<h3>A New Deputy Has Come to Town</h3>
<p>As in the Wild West, the new deputy marshal and his tin badge did not         guarantee prosperity. But a good marshal would deliver law and order.         And those are the preconditions for the town folk to take charge of         building their own prosperity.</p>
<p>Linked data is a practice for starting to bring order and connections         to your existing data. Once some order has been imposed, the framework         then becomes a basis for defining meanings and then gaining value from         those connections.</p>
<p>Once order has been gained, it is up to the good citizens of Data Gulch to then deliver the prosperity. Broad participation and the network effect are one way to promote that aim. But success and prosperity still depend on intelligence and good policies and practice.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_1" name="ldl_1"></a> [1] I first put forward this linked         data aspect in <a style="font-style: italic;" href="../?p=447">What is Linked Data?</a>, dated June         23, 2008. I then formalized it in <a style="font-style: italic;" title="Permanent Link to Structure the World" rel="bookmark" href="../533/structure-the-world/">Structure the         World</a>, dated August 3, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_2" name="ldl_2"></a> [2] Paul Tearnen, 2006. &#8220;Integration in         the Network Economy,&#8221; <span style="font-style: italic;">Information         Management Special Reports</span>, October 2006. See <a href="http://www.information-management.com/specialreports/20061010/1064941-1.html"> http://www.information-management.com/specialreports/20061010/1064941-1.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_3" name="ldl_3"></a> [3] Salim K. Semy, Mark Linderman and         Mary K. Pulvermacher, 2003. &#8220;Information Management Meets the Semantic         Web,&#8221; <span style="font-style: italic;">DOD Report</span> by MITRE         Corporation, November 2003, 10 pp. See <a href="http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA460265&amp;Location=U2&amp;doc=GetTRDoc.pdf"> http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA460265&amp;Location=U2&amp;doc=GetTRDoc.pdf.</a></div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_4" name="ldl_4"></a> [4] On July 15, 2006, Dion Hinchcliffe         wrote, <a style="font-style: italic;" href="http://web2.socialcomputingjournal.com/web_20s_real_secret_sauce_network_effects.htm"> Web 2.0&#8217;s Real Secret Sauce: Network Effects</a>. He produced a couple         of useful graphics and expanded upon some earlier comments to the         <span style="font-style: italic;">Wall Street Journal</span>. Shortly         thereafter, on July 29, Story wrote his own post, <a href="http://blogs.sun.com/bblfish/entry/rdf_and_metcalf_s_law">RDF and         Metcalfe&#8217;s law</a>, as noted. I commented on July 30.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_5" name="ldl_5"></a> [5] James Hendler and Jennifer Golbeck,         2008. &#8220;Metcalfe&#8217;s Law, Web 2.0, and the Semantic Web,&#8221; in <span style="font-style: italic;">Journal of Web Semantics</span> 6(1):14-20, 2008.         See <a href="http://www.cs.umd.edu/%7Egolbeck/downloads/Web20-SW-JWS-webVersion.pdf"> http://www.cs.umd.edu/~golbeck/downloads/Web20-SW-JWS-webVersion.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_6" name="ldl_6"></a> [6] Robert Metcalfe, 2006. <span style="font-style: italic;">Metcalfe’s Law Recurses Down the Long Tail         of Social Networking</span>, see <a href="http://vcmike.wordpress.com/2006/08/18/metcalfe-social-networks/">http://vcmike.wordpress.com/2006/08/18/metcalfe-social-networks/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_7" name="ldl_7"></a> [7] See my <a title="Permanent Link to When is Content &lt;em&gt;&lt;u&gt;Coherent&lt;/u&gt;&lt;/em&gt;?" rel="bookmark" href="../450/when-is-content-coherent/"> When is Content <em>Coherent</em>?</a> posting of July 25, 2008.         &#8216;Coherence&#8217; is a frequent theme of my blog posts; see my <a href="../chronological-listing/">chronological         listing</a> for additional candidates.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_8" name="ldl_8"></a> [8] From David P. Reed, 2001. &#8220;The Law         of the Pack,&#8221; Harvard Business Review, February 2001, pp 23-4. For more         on Reed&#8217;s position, see Wikipedia&#8217;s entry on <a href="http://en.wikipedia.org/wiki/Reed%27s_law">Reed&#8217;s law</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_9" name="ldl_9"></a> [9] Andrew Odlyzko and Benjamin Tilly,         2005. <span style="font-style: italic;">A Refutation of Metcalfe&#8217;s Law         and a Better Estimate for the Value of Networks and Network         Interconnections</span>, personal publication; see <a href="http://www.dtc.umn.edu/%7Eodlyzko/doc/metcalfe.pdf">http://www.dtc.umn.edu/~odlyzko/doc/metcalfe.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_10" name="ldl_10"></a> [10] <span style="font-weight: bold; font-style: italic;">Data-driven applications</span> is the term we have adopted for modular, generic tools that operate and present results to users based on the underlying data structures that feed them. See further the discussion of Structured Dynamics&#8217;s <a href="http://structureddynamics.com/products.html">products</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/837/the-law-of-linked-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
