<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI3:::Adaptive Information &#187; Linked Data</title>
	<atom:link href="http://www.mkbergman.com/category/linked-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Mon, 26 Jul 2010 05:31:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Bipolar Disorder of Linked Data</title>
		<link>http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/</link>
		<comments>http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 23:12:38 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[ABox]]></category>
		<category><![CDATA[structured data]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=880</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Bipolar Disorder of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-28&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/&amp;rft.language=English"></span>

An Acceptance of Its Natural Role is the Prozac Substitute
There has been a bit of a manic-depressive character on the Web waves of late with respect to linked data. On the one hand, we have seen huzzahs and celebrations [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Bipolar Disorder of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-28&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/&amp;rft.language=English"></span>
<p><a href="http://commons.wikimedia.org/wiki/File:VanGogh-starry_night_ballance1.jpg"><img style="border: 0px solid; width: 250px; height: 198px; float: left; margin-right: 10px;" title="The Starry Night, from Vincent Van Gogh" src="../wp-content/themes/ai3/images/2010Posts/250-VanGogh-starry_night_ballance1.jpg" alt="The Starry Night, from Vincent Van Gogh" hspace="5" vspace="5" align="left" /></a></p>
<h2>An Acceptance of Its Natural Role is the Prozac Substitute</h2>
<p>There has been a bit of a manic-depressive character on the Web waves of late with respect to <a href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a>. On the one hand, we have seen huzzahs and celebrations from the likes of <a href="http://www.readwriteweb.com/archives/the_state_of_linked_data_in_2010.php">ReadWriteWeb</a> and <a href="http://www.semanticweb.com/">SemanticWeb.com</a> and, just concluded, the Linked Data on the Web (<a href="http://events.linkeddata.org/ldow2010/">LDOW</a>) workshop at <a href="http://www2010.org/www/">WWW2010</a>. This coverage has tended to tout the coming of the linked data era and to solicit ideas for possible, cool <a href="http://www.readwriteweb.com/archives/10_ideas_for_web_of_data_apps.php">linked data apps</a> <a href="#BPD1">[1]</a>. This rise in visibility has been accomplished by much manic and excited discussion on <a href="http://lists.w3.org/Archives/Public/public-lod/">various</a> <a href="http://lists.w3.org/Archives/Public/semantic-web/">mailing</a> <a href="http://sourceforge.net/mailarchive/forum.php?forum_name=dbpedia-discussion">lists</a>.</p>
<p>On the other hand, we have seen much wringing of hands and gnashing of teeth over why linked data is not being used more and why the broader semantic Web is not seeing more uptake. This depressive &#8220;<a href="http://lists.w3.org/Archives/Public/semantic-web/2010Mar/0160.html">call to arms</a>&#8221; has sometimes felt like ravings, with blame assigned to everything from the poor state of apps and user interfaces, to badly linked data, to the difficulty of publishing it. Actually using linked data for anything productive (other than single sources like <a href="http://dbpedia.org/About">DBpedia</a>) still appears to be an issue.</p>
<p>Meanwhile, among others, <a href="http://www.openlinksw.com/blog/%7Ekidehen/">Kingsley Idehen</a>, a ubiquitous voice on the Twitter <a href="http://twitter.com/search?q=%23linkeddata">#linkeddata</a> channel, has been promoting the separation of the identity of linked data from the notion of the semantic Web. He is also trying to <a href="http://www.openlinksw.com/blog/%7Ekidehen/?1624&amp;title=Data%203.0%20%28a%20Manifesto%20for%20Platform%20Agnostic%20Structured%20Data%29%20Update%203">change the narrative</a> away from the association of linked data with RDF, instead advocating &#8220;Data 3.0&#8221; and an <a href="http://en.wikipedia.org/wiki/Entity-attribute-value_model">entity-attribute-value</a> (EAV) model understanding of structured data.</p>
<p>As someone who has been less engaged in these topics since my own statements about linked data over the past couple of years <a href="#BPD2">[2]</a>, I have my own distanced-yet-still-biased view of what this whole crisis of confidence is about. I think I have a diagnosis for what may be causing this <a href="http://en.wikipedia.org/wiki/Bipolar_disorder">bipolar disorder</a> of linked data <a href="#BPD3">[3]</a>.</p>
<h3>The Semantic Web Boogie Man</h3>
<p>A fairly universal response from enterprise prospects when the topic of the semantic Web is raised is, &#8220;That was a big deal about a decade ago, wasn&#8217;t it? It didn&#8217;t seem to go anywhere.&#8221; And, actually, I think both proponents and keen observers agree with this general sentiment. We have seen the original advocate, Tim Berners-Lee, float the <a href="http://en.wikipedia.org/wiki/Giant_Global_Graph">Giant Global Graph</a> balloon, and now <a href="http://blog.ted.com/2010/03/the_year_open_d.php">Linked Data</a>. Others have touted <a href="../462/how-shall-we-call-web-30-instead-mike-please-indulge-us/">Web 3.0</a> or <a href="http://webofdata.wordpress.com/">Web of Data</a> or, frankly, <a href="http://bnode.org/blog/2008/03/04/semantic-web-aliases">dozens of alternatives</a>. Linked data, which began as a set of techniques for publishing RDF, has emerged as a potential marketing hook and saviour for the tainted original semantic Web term.</p>
<p>And therein, I think, lies the rub and the answer to the bipolar         disorder.</p>
<p>If one looks at the <a href="http://www.w3.org/DesignIssues/LinkedData.html">original principles</a> for putting linked data on the Web or <a href="../846/when-linked-data-rules-fail/">subsequent interpretations</a>, it is clear that linked data (lower case) is merely a set of techniques. Useful techniques, for sure; but really a simple approach to exposing data on the Web, using URLs as the naming convention for objects and their relationships. These techniques provide (1) methods to access data on the Web and (2) ways to specify the relationships that link the data (resources). The first part is mechanistic and not really of further concern here. And, while any predicate can be used to specify a data (resource) relationship, that relationship should also be discoverable with a URL (dereferenceable) to qualify as linked data. Then, to actually be semantically useful, that relationship (predicate) should also have a precise definition and be part of a coherent schema. (Note, this last sentence is actually not part of the &#8220;standard&#8221; principles for linked data, which itself is a <a href="../846/when-linked-data-rules-fail/">problem</a>.)</p>
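<p>To make these distinctions concrete, here is a minimal Python sketch that sorts predicates into three tiers: not linked data (no URL at all), linked but undefined (a URL with no known schema behind it), and linked and defined. The sample triples, the <code>example.org</code> URIs and the <code>DEFINED_PREDICATES</code> set are hypothetical stand-ins for illustration; the check is purely syntactic and does not actually dereference anything over HTTP.</p>

```python
from urllib.parse import urlparse

# Hypothetical toy triples (subject, predicate, object); example.org URIs are illustrative.
TRIPLES = [
    ("http://example.org/id/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/id/bob"),
    ("http://example.org/id/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/id/bob", "http://example.org/vocab/relatedTo", "http://example.org/id/carol"),
    ("http://example.org/id/bob", "relatedTo", "http://example.org/id/dave"),
]

# Assumption: a stand-in for "a coherent schema" whose terms have precise definitions.
DEFINED_PREDICATES = {
    "http://xmlns.com/foaf/0.1/knows",
    "http://xmlns.com/foaf/0.1/name",
}


def is_http_uri(term):
    """True if the term is an HTTP(S) URI, i.e. it could in principle be dereferenced."""
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify_predicate(predicate):
    """Place a predicate in one of the three tiers discussed in the text."""
    if not is_http_uri(predicate):
        return "not linked data"        # no URL, so nothing to dereference
    if predicate not in DEFINED_PREDICATES:
        return "linked, but undefined"  # a URL, but no known schema defines it
    return "linked and defined"         # dereferenceable and part of a schema


if __name__ == "__main__":
    for _, predicate, _ in TRIPLES:
        print(f"{classify_predicate(predicate):22} {predicate}")
```

<p>A real test of dereferenceability would request each URI and check that a useful description comes back; the syntactic check above only rules out terms that cannot be dereferenced at all.</p>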
<p>When used right, these techniques can be powerful and useful. But poor choices or execution in how relationships are specified often lead to saying little or nothing about semantics. Most linked data uses a woefully small vocabulary of data relationships, with an even smaller set ever used for setting linkages <span style="font-weight: bold; font-style: italic;">across</span> existing linked data sets <a href="#BPD4">[4]</a>. Linked data techniques are part of the foundation for overall best practices, but not the total foundation. As I have argued for some time, linked data alone does not speak to issues of <a href="../431/umbel-making-linked-data-classy/">context</a> or <a href="../450/when-is-content-coherent/">coherence</a>.</p>
<p>To speak semantically, linked data is not a synonym for the semantic Web, nor is it the <a style="font-family: monospace;" href="http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf">sameAs</a> the semantic Web. But many proponents have tried to characterize it as such. The general tenor is to blow the horns hard anytime some large data set is &#8220;exposed&#8221; as linked data (no matter whether the data is incoherent, lacks a schema, or is even <a href="../846/when-linked-data-rules-fail/">poorly described and defined</a>). Heralding such events, followed by no apparent usefulness of the data, lets confusion reign supreme and makes disappointment all but inevitable.</p>
<p>The semantic Web (or semantic enterprise or semantic government or similar expressions) is a vision and an ideal. It is also a fairly complete one that potentially embraces machines and agents working in the background to serve us and make us more productive. There is an entire stack of languages, techniques and methods that enable schema to be described and non-conforming data to be interoperated. Now, of course, this ideal is still a work in progress. Does that make it a failure?</p>
<p>Well, maybe so, if one sees the semantic Web as marketing or branding.         But, who said we had to present it or understand it as such?</p>
<p>The issue is not one of marketing and branding, but the lack of benefits. Now, maybe I have it all wrong, but it seems to me that the argument needs to start with what &#8220;linked data&#8221; and the &#8220;semantic Web&#8221; can do for me. What I actually call it is secondary. Rejecting the semantic Web brand in favor of linked data or Web 3.0 or any other such label is still dressing the emperor in new clothes.</p>
<h3>A Nicely Progressing Continuum, Thank You!</h3>
<p>For a couple of years now I have tried in various posts to present         linked data in a broader framework of structured and semantic Web data.         I first tried to capture this continuum in a diagram from <a href="../?p=391">July 2007</a>:</p>
<div style="margin: 18px 0px;">
<table class="center_ok" style="text-align: left; width: 622px;" border="0">
<tbody>
<tr>
<td colspan="4"><img style="width: 599px; height: 205px; margin-left: 15px; vertical-align: top;" src="../wp-content/themes/ai3/images/2007Posts/070720_web_transition.jpg" alt="Transition in Web Structure" /></td>
</tr>
<tr>
<td style="border-bottom: 1px solid; font-weight: bold; text-align: center;">Document Web</td>
<td style="border-bottom: 1px solid; font-weight: bold; text-align: center;" colspan="2">Structured Web</td>
<td style="border-bottom: 1px solid; font-weight: bold; text-align: center;">Semantic Web</td>
</tr>
<tr>
<td style="width: 150px;"></td>
<td style="width: 150px;"></td>
<td style="border-bottom: 1px solid; width: 150px; font-weight: bold; text-align: center;">Linked Data</td>
<td style="width: 150px;"></td>
</tr>
<tr>
<td>
<ul>
<li> <small>Document-centric</small></li>
<li> <small>Document resources</small></li>
<li> <small>Unstructured data and semi-structured data</small></li>
<li> <small>HTML<br />
</small></li>
<li> <small>URL-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 1993</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Structured data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>XML, JSON, RDF, etc.<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 2003</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Linked data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>RDF, RDF-S<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 2006<br />
</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Linked data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>RDF, RDF-S, OWL<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> ???<br />
</small></li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p>Now, three years later, I think the transitional phase of linked data         is reaching an end. OK, we have figured out one useful way to publish         large datasets staged for possible interoperability. Sure, we have         billions of triples and assertions floating out there. But what are we         to do with them? And, is any of it any good?</p>
<h3>The Reality of a Heterogeneous World</h3>
<p>I think Kingsley is right in one sense to point to EAV and structured data. We, too, have not met a structured data format we did not like. There are hundreds of attribute-value pair models of an even more generic nature that also belong in the conversation.</p>
<p>One of my most popular posts on this blog has been <a style="font-style: italic;" href="../471/structs-naive-data-formats-and-the-abox/">‘Structs’: Naïve Data Formats and the ABox</a>, from January 2009. Today, we have a multitude of popular structured data formats, from XML to JSON and even spreadsheets (CSV). Each form has its advocates, place and reasons for existence and popularity (or not). This inherent diversity is a fact and fixture of any discussion of data. It is a major reason why we developed the <a href="http://openstructs.org/iron">irON</a> (<span style="font-style: italic;">instance record</span> and <span style="font-style: italic;">object notation</span>) non-RDF vocabulary to provide a bridge from such forms to RDF, which is accessible on the Web via URIs. irON clearly shows that entities can be usefully described and consumed in either RDF or non-RDF serialized forms.</p>
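<p>The general idea of bridging a naive &#8220;struct&#8221; to RDF-style statements can be sketched in a few lines of Python. This is not irON&#8217;s actual specification or serialization; the <code>BASE</code> and <code>VOCAB</code> namespaces and the convention of using an <code>id</code> key as the subject are hypothetical assumptions chosen only to show that the same record can be read as triples.</p>

```python
# Hypothetical base namespaces; these are NOT irON's actual conventions.
BASE = "http://example.org/record/"
VOCAB = "http://example.org/attr/"


def struct_to_triples(record):
    """Turn a flat, JSON-like dict with an 'id' key into (subject, predicate, object) triples."""
    subject = BASE + str(record["id"])   # the record id becomes the subject URI
    triples = []
    for key, value in record.items():
        if key == "id":
            continue                     # already used as the subject
        # A list value yields one triple per element (a multi-valued attribute).
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, VOCAB + key, item))
    return triples


if __name__ == "__main__":
    record = {"id": "acme", "name": "Acme Corp.", "sector": ["chemicals", "plastics"]}
    for triple in struct_to_triples(record):
        print(triple)
```

<p>The record itself is ordinary instance data; nothing in the conversion adds schema, which is exactly why such bridges deal mostly in ABox-type assertions.</p>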
<p>Though RDF and linked data are a great form for expressing this structured information, other forms can convey the same meaning as well. Of the billions of linked data triples exposed to date, surely more than 99% are of this instance-level, &#8220;ABox&#8221; type of data <a href="#BPD5">[5]</a>. And, more telling, of all of the structured data that is publicly obtainable on the Web, my wild guess is that less than 0.0000000001% of that is even linked RDF data <a href="#BPD6">[6]</a>.</p>
<p>Neither linked data nor RDF alone will &#8212; today or in the near future &#8212; play a pivotal or essential role for instance data. The real contribution from RDF and the semantic Web will come from connecting things together, from interoperation and federation and conjoining. This is the province of the TBox and is a role barely touched by linked data. Publishing data as linked data helps tremendously in simplifying ingest and guiding the eventual connections, but making those connections, and testing them for quality and reliability, are steps beyond the ken or purpose of linked data.</p>
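<p>The ABox/TBox split described in note [5] can be illustrated with a small Python sketch that separates schema statements from instance assertions in a list of triples. Classifying by predicate, plus treating <code>rdf:type</code> statements that point at <code>owl:Class</code> and similar as schema, is a simplifying assumption for illustration only; real description-logic tooling is far more involved. The <code>example.org</code> namespace and sample data are hypothetical.</p>

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/"  # hypothetical namespace for the sample data

# Assumption: schema-level (TBox) statements are recognized by predicate alone.
TBOX_PREDICATES = {
    RDFS + "subClassOf", RDFS + "subPropertyOf", RDFS + "domain", RDFS + "range",
    OWL + "equivalentClass", OWL + "disjointWith",
}
# rdf:type pointing at one of these declares a class or property, so it is TBox too.
SCHEMA_TYPES = {RDFS + "Class", OWL + "Class", RDF + "Property",
                OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def is_tbox(triple):
    """True for schema statements; False for assertions about instances."""
    _, predicate, obj = triple
    if predicate in TBOX_PREDICATES:
        return True
    return predicate == RDF + "type" and obj in SCHEMA_TYPES


def split_abox_tbox(triples):
    """Return (tbox, abox); instance typing such as 'acme is a Company' stays in the ABox."""
    tbox = [t for t in triples if is_tbox(t)]
    abox = [t for t in triples if not is_tbox(t)]
    return tbox, abox


SAMPLE = [
    (EX + "Company", RDF + "type", OWL + "Class"),
    (EX + "Company", RDFS + "subClassOf", EX + "Organization"),
    (EX + "acme", RDF + "type", EX + "Company"),
    (EX + "acme", EX + "name", "Acme Corp."),
]

if __name__ == "__main__":
    tbox, abox = split_abox_tbox(SAMPLE)
    print(len(tbox), "TBox triples;", len(abox), "ABox triples")
```

<p>Note that the class-membership assertion for <code>acme</code> lands in the ABox, consistent with the working definition quoted in note [5]; the connecting work the text describes happens on the TBox side.</p>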
<h3>Promoting Linked Data to its Level of Incompetence</h3>
<p>It seems, then, that we see two different forces and perspectives at         work, each contributing in its own way to today&#8217;s bipolar nature of         linked data.</p>
<p>On the manic side, we see the celebration for the release of each large, linked data set. This perspective seems to care most about volumes and numbers, with less interest in whether the data is of quality or useful. This perspective seems to believe &#8220;post the data, and the public will come.&#8221; This same perspective is also quite parochial in dismissing non-linked data as unsuitable, be it microdata, microformats or any of the older junk.</p>
<p>On the depressed side, linked data has been seen as a more palatable         packaging for the disappointments and perceived failures or slow         adoption of the earlier semantic Web phrasing. When this perspective         sees the lack of structure, defensible connections and other quality         problems with linked data as it presently exists, despair and         frustration ensue.</p>
<p>But both of these perspectives very much miss the mark. Linked data         will never become the universal technique for publishing structured         data, and should not be expected to be such. Numbers are never a         substitute for quality. And linked data lacks the standards, scope and         investment made in the semantic Web to date. Be patient; don&#8217;t despair;         structured data and the growth of semantics and useful metadata is         proceeding just fine.</p>
<p>Unrealistic expectations or wrong roles and metrics simply confuse the         public. We are fortunate that most potential buyers do not frequent the         community&#8217;s various mailing lists. Reduced expectations and an         understanding of linked data&#8217;s natural role is perhaps the best way to         bring back balance.</p>
<h3>Linked Data&#8217;s Natural Role</h3>
<p>We have consciously moved our communications focus from speaking internally to the community to reaching out to the broader enterprise public. Much education, clarification and dialog is now needed with the buying public. The time has moved past software demos and toys to workable, pragmatic platforms, and the methodologies and documentation necessary to support them. Missives like this one, addressed to the founding community, are (perhaps many will say Hurray!) likely to become even rarer as we continue to focus outward.</p>
<p>As Structured Dynamics has stated many times, we are committed to         linked data, presenting our information as such, and providing better         tools for producing and consuming it. We have made it one of the         <a href="../859/seven-pillars-of-the-open-semantic-enterprise/"> seven foundations</a> to our <a href="http://structureddynamics.com/products.html">technology stack</a> and         <a href="http://mike2.openmethodology.org/wiki/Open_SEAS_Framework">methodology</a>.</p>
<p>But, linked data on its own is inadequate as an interoperability         standard. Many practitioners don&#8217;t publish it right, characterize it         right, or link to it right. That does not negate its benefits, but it         does make it a poor candidate to install on the semantic Web throne.</p>
<p>Linked data based on RDF is perhaps the first citizen amongst all         structured data citizens. It is an expressive and readily consumed         means for publishing and relating structured instance data and one that         can be easily interoperated. It is a natural citizen of the Web.</p>
<p>If we can accept and communicate linked data for these strengths, for         what it naturally is &#8212; a useful set of techniques and best practices         for enabling data that can be easily consumed &#8212; we can rest easy at         night and not go crazy. Otherwise, bring on the Prozac.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD1"></a> [1] Actually, in my opinion, the suggested listing of apps from these         discussions is distinctly unimpressive and not compelling. As argued in         the main body of the post, I think this is because linked data is         really just a technique or best practice, and not a basis alone for         enabling compelling apps. As initial developers of such apps as the         <a href="http://umbel.structureddynamics.com/explorer.php?concept=http%3A%2F%2Fumbel.org%2Fumbel%2Fsc%2FMolecule"> UMBEL concept explorer</a> or <a href="http://dataviewer.zitgist.com/?uri=http%3A//fgiasson.com">Dataviewer</a>,         <a href="http://structureddynamics.com/">Structured Dynamics</a> understands the use of linked data and has a defensible basis to         comment on applications. Our own applications intimately integrate         linked data, but only as one of <a href="../859/seven-pillars-of-the-open-semantic-enterprise/"> seven foundations</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD2"></a> [2] Here are some of my relevant posts over the past year discussing         the role of linked data: <a style="font-style: italic;" href="../802/moving-beyond-linked-data/">Moving Beyond         Linked Data</a> (Sept. 20, 2009); <a style="font-style: italic;" href="../825/fresh-perspectives-on-the-semantic-enterprise/"> Fresh Perspectives on the Semantic Enterprise</a> (Sept. 28, 2009);         <a style="font-style: italic;" href="../837/the-law-of-linked-data/">The Law of         Linked Data</a> (Oct. 11, 2009); <a style="font-style: italic;" href="../846/when-linked-data-rules-fail/">When Linked         Data Rules Fail</a> (Nov. 16, 2009).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD3"></a> [3] The current bipolar discussion reminds me of the &#8220;<a href="http://en.wikipedia.org/wiki/Six_phases_of_a_big_project">Six Phases of a Project</a>,&#8221; a copy of which has been a permanent fixture on my office wall:
<ol>
<li>Enthusiasm</li>
<li>Disillusionment</li>
<li>Panic</li>
<li>Search for the guilty</li>
<li>Punishment of the innocent</li>
<li>Honors &amp; praise for the non-participants.</li>
</ol>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD4"></a> [4] See, for example: Harry Halpin, 2009. &#8220;A Query-Driven Characterization of Linked Data,&#8221; paper presented at the Linked Data on the Web (LDOW) 2009 Workshop, April 20, 2009, Madrid, Spain; see <a href="http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf">http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf</a>; Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth, 2010. &#8220;Linked Data is Merely More Data,&#8221; in Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness, <span style="font-style: italic;">Linked Data Meets Artificial Intelligence, Technical Report SS-10-07</span>, AAAI Press, Menlo Park, California, 2010, pp. 82-86; see <a href="http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf">http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf</a>; among others.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD5"></a> [5] Structured Dynamics&#8217; best practices approach makes explicit splits between the “<a href="http://en.wikipedia.org/wiki/Abox">ABox</a>” (for instance data) and “<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>” (for ontology schema) in accordance with our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>, a fundamental underpinning for how we use RDF:
<div class="boxGraySolid">“Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.”</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD6"></a> [6] This topic deserves some analysis in its own right, and my guess is really just that. For example, RSS feeds to mobile devices alone perhaps account for 2,000 petabytes today; see <a href="http://www.tgdaily.com/hardware-features/49167-8000-petabytes-of-mobile-data-traffic-expected-by-2014">http://www.tgdaily.com/hardware-features/49167-8000-petabytes-of-mobile-data-traffic-expected-by-2014</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Seven Pillars of the Open Semantic Enterprise</title>
		<link>http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/</link>
		<comments>http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 20:26:54 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Description Logics]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[adaptive ontologies]]></category>
		<category><![CDATA[ontology-driven apps]]></category>
		<category><![CDATA[open semantic enterprise]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>
		<category><![CDATA[web oriented architecture]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=859</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Seven Pillars of the <i>Open Semantic Enterprise</i>&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/&amp;rft.language=English"></span>

Guideposts for How to Make the Transition
The beginning of a new year and a new decade is a perfect opportunity to take stock of how the world is changing and how we can change with it. Over the past [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Seven Pillars of the <i>Open Semantic Enterprise</i>&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Description Logics&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 250px; height: 211px; float: left; margin-right: 10px;" title="Seven Pillars of the Open Semantic Enterprise" src="../wp-content/themes/ai3/images/2010Posts/100110_7pillars.png" alt="Seven Pillars of the Open Semantic Enterprise" align="left" /></p>
<h2>Guideposts for How to Make the Transition</h2>
<p>The beginning of a new year and a new decade is a perfect opportunity         to take stock of how the world is changing and how we can change with         it. Over the past year I have been writing on many foundational topics         relevant to the use of semantic technologies in enterprises.</p>
<p>In this post I bring those threads together to present a unified view         of these foundations &#8212; some seven pillars &#8212; to the <span style="font-weight: bold; font-style: italic;">open semantic         enterprise</span>.</p>
<p>By <span style="font-weight: bold; font-style: italic;">open semantic enterprise</span> we mean an organization that uses the languages and standards of the <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic Web</a>, including <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a>, <a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and <a href="http://en.wikipedia.org/wiki/Semantic_Web#Components">others</a> to integrate existing information assets, using the best practices of <a href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a> and the <a href="http://en.wikipedia.org/wiki/Open_world_assumption">open world assumption</a>, and targeting knowledge management applications. It does so using some or all of the seven foundational pieces (&#8220;pillars&#8221;) noted herein.</p>
<p>The foundational approaches to the open semantic enterprise do not necessarily mean open data or open source (though they are suitable for these purposes, with many open source tools available <a href="#ose3">[3]</a>). The techniques can equally be applied to internal, closed, proprietary data and structures. The techniques can themselves be used as a basis for bringing external information into the enterprise. &#8216;Open&#8217; refers to the critical use of the open world assumption.</p>
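<p>Because &#8216;open&#8217; hinges on the open world assumption, a minimal Python sketch may help show the difference: under a closed world, a fact missing from the data is treated as false, while under an open world it is merely unknown. The facts and function names here are hypothetical, and modeling negative knowledge as an explicit set is a simplification of how OWL reasoners actually handle negation.</p>

```python
# Hypothetical facts; a real system would hold these as RDF triples.
KNOWN_TRUE = {("acme", "locatedIn", "Ohio")}
KNOWN_FALSE = {("acme", "locatedIn", "Texas")}  # explicitly asserted negative knowledge


def closed_world(fact):
    """Closed world: whatever is not recorded is taken to be false."""
    return fact in KNOWN_TRUE


def open_world(fact):
    """Open world: True or False only when known; otherwise None ('unknown')."""
    if fact in KNOWN_TRUE:
        return True
    if fact in KNOWN_FALSE:
        return False
    return None


if __name__ == "__main__":
    query = ("acme", "locatedIn", "Oregon")
    print("closed world:", closed_world(query))  # False: absence counts as falsehood
    print("open world:", open_world(query))      # None: absence means "not known"
```

<p>The open world reading is what lets new, external information be added later without contradicting earlier answers, which is why it suits incremental, learn-as-you-go integration.</p>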
<p>These practices do not require replacing current systems and assets; they can be applied equally to public or proprietary information; and they can be tested and deployed incrementally at low risk and cost. The very foundations of the practice encourage a learn-as-you-go approach and active, agile adaptation. While embracing the open semantic enterprise can lead to quite disruptive benefits and changes, the transition itself can be accomplished with minimal disruption. This is its most compelling aspect.</p>
<p>Like any change in practice or learning, embracing the open semantic enterprise is fundamentally a people process. This is the pivotal piece of the puzzle, but also the one that does not lend itself to ready formulas about pillars or best practices. Leadership and vision are necessary to begin the process. People are the fuel that drives it. So, we&#8217;ll take this fuel as a given below, and concentrate instead on the mechanics and techniques by which this vision can be achieved. In this sense, then, there are really <span style="font-style: italic; text-decoration: underline;">eight</span> pillars to the open semantic enterprise, with people residing at the apex.</p>
<p>This article is synthetic, with links to (largely) my preparatory blog         postings and topics that preceded it. Assuming you are interested in         becoming one of those leaders who wants to bring the benefits of an         open semantic enterprise to your organization, I encourage you to         follow the reference links for more background and detail.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar0.png" alt="Benefits" /> A Review of the Benefits</h3>
<p>OK, so what&#8217;s the big deal about an open semantic enterprise and why         should my organization care?</p>
<p>We should first be clear that the natural scope of the open semantic enterprise is in knowledge management and representation <a href="#ose1">[1]</a>. Suitable applications include data federation, data warehousing, search, enterprise information integration, business intelligence, competitive intelligence, knowledge representation, and so forth <a href="#ose2">[2]</a>. In the knowledge domain, the benefits of embracing the open semantic enterprise can be summarized as <span class="double_u">greater insight</span> with <span class="double_u">lower risk</span>, <span class="double_u">lower cost</span>, <span class="double_u">faster deployment</span>, and more <span class="double_u">agile responsiveness</span>.</p>
<p>The intersection of knowledge domain, semantic technologies and the         approaches herein means it is possible to start small in testing the         transition to a semantic enterprise. These efforts can be done         incrementally and with a focus on early, high-value applications and         domains.</p>
<p>There is absolutely no need to abandon past practices. There         is much that can be done to leverage existing assets. Indeed, those         prior investments are often the requisite starting basis to inform         semantic initiatives.</p>
<p>Embracing the pillars of the open semantic enterprise brings these knowledge management benefits:</p>
<ul>
<li>Domains can be analyzed and inspected incrementally</li>
<li>Schema can be incomplete and developed and refined incrementally</li>
<li>The data and the structures within these frameworks can be used and         expressed in a piecemeal or incomplete manner</li>
<li>Data with partial characterizations can be combined with other data         having complete characterizations</li>
<li>Systems built with these frameworks are flexible and robust; as new         information or structure is gained, it can be incorporated without         negating the information already resident, and</li>
<li>Both open and closed world subsystems can be bridged.</li>
</ul>
<p>Moreover, by building on successful Web architectures, we can also put         in place loosely coupled, distributed systems that can grow and         interoperate in a decentralized manner. These also happen to be perfect         architectures for flexible collaboration systems and networks.</p>
<p>These benefits arise both from individual pillars in the open semantic         enterprise foundation, as well as in the interactions between them.         Let&#8217;s now re-introduce these seven pillars.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar1.png" alt="Pillar #1" /> Pillar #1: The RDF Data Model</h3>
<p>As I stated on the occasion of the 10th birthday of the <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">Resource Description Framework</a> data model, I believe RDF is the single most important foundation to the open semantic enterprise <a href="#ose4">[4]</a>. RDF can be applied equally to all structured, semi-structured and unstructured content. By defining new types and predicates, it is possible to create more expressive vocabularies within RDF. This expressiveness enables RDF to define controlled vocabularies with exact semantics. These features make RDF a powerful data model and language for data federation and interoperability across disparate datasets.</p>
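<p>To make the data model concrete, here is a minimal sketch in plain Python (no RDF library assumed) that represents each statement as a subject-predicate-object triple and prints it in the line-oriented N-Triples syntax. The http://example.org/vocab# namespace and its Supplier and suppliesPart terms are hypothetical stand-ins for a newly defined type and predicate; literal escaping is simplified.</p>

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples.
# The example.org namespace and its terms are hypothetical illustrations.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/vocab#"
ACME = "http://example.org/id/acme"


def term(value):
    """Render a term in N-Triples form.

    Simplification: any string starting with http:// or https:// is treated
    as an IRI; everything else becomes a plain literal with basic escaping.
    """
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize triples one statement per line, sorted for stable output."""
    return "\n".join(
        f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples)
    )


triples = {
    # Vocabulary: declare a new type and a new predicate.
    (EX + "Supplier", RDF + "type", RDFS + "Class"),
    (EX + "suppliesPart", RDF + "type", RDF + "Property"),
    # Instance data described with that vocabulary.
    (ACME, RDF + "type", EX + "Supplier"),
    (ACME, EX + "suppliesPart", "gear-42"),
}

print(to_ntriples(triples))
```

<p>The same four-statement shape describes the vocabulary and the instance data alike, which is what lets RDF carry both schema and content in one model.</p>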
<p>Via various processors or extractors, RDF can capture and convey the         metadata or information in unstructured (say, text), semi-structured         (say, HTML documents) or structured sources (say, standard databases).         This makes RDF almost a “universal solvent” for         representing data structure.</p>
<p>Because of this universality, there are now more than 150 off-the-shelf         ‘RDFizers’ for converting various non-RDF notations (data         formats and serializations) to RDF <a href="#ose5">[5]</a>. Because of its diversity of         serializations and simple data model, it is also easy to create new         converters. Once in a common RDF representation, it is easy to         incorporate new datasets or new attributes. It is also easy to         aggregate disparate data sources as if they came from a single source.         This enables meaningful compositions of data from different applications         regardless of format or serialization.</p>
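<p>The RDFizer idea can be sketched in a few lines. Assuming two hypothetical sources, a CRM export in CSV and an ERP feed in JSON, each converter below maps records to triples using an id field as the subject; once both are triples, aggregating them is a plain set union. Field names, base URIs and data are illustrative only.</p>

```python
import csv
import io
import json

# Hypothetical base URIs; field names become predicates in this namespace.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab#"


def rdfize_csv(text):
    """Convert CSV rows (with an 'id' column) into (s, p, o) triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row.pop("id")
        for field, value in row.items():
            if value:  # skip empty cells rather than asserting blank values
                triples.add((subject, VOCAB + field, value))
    return triples


def rdfize_json(text):
    """Convert a JSON array of objects (each with an 'id' key) into triples."""
    triples = set()
    for record in json.loads(text):
        subject = BASE + str(record.pop("id"))
        for field, value in record.items():
            triples.add((subject, VOCAB + field, str(value)))
    return triples


crm_csv = "id,name,city\nacme,Acme Corp,Boise\nglobex,Globex,\n"
erp_json = '[{"id": "acme", "creditLimit": 50000}, {"id": "initech", "name": "Initech"}]'

# Once both sources are triples, aggregation is simply a set union.
graph = rdfize_csv(crm_csv) | rdfize_json(erp_json)

acme = sorted((p, o) for s, p, o in graph if s == BASE + "acme")
print(acme)
```

<p>The merged graph answers questions about acme using facts from both systems, while each source file stays in its native format.</p>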
<p>What this practically means is that the integration layer can be based         on RDF, but that all source data and schema can still reside in their         native forms <a href="#ose6">[6]</a>. If it is easier or more convenient to author,         transfer or represent data in non-RDF forms, great <a href="#ose7">[7]</a>. RDF is only         necessary at the point of federation, and not all knowledge workers         need be versed in the framework.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar2.png" alt="Pillar #2" /> Pillar #2: Linked Data Techniques</h3>
<p>Linked data is a set of best practices for publishing and deploying instance and class data using the RDF data model. Two of the best practices are to name the data objects using uniform resource identifiers (URIs), and to expose the data for access via the HTTP protocol. Together, these practices enable the Web to become a distributed database, which means that Web architectures can be readily employed as well (see Pillar #5 below).</p>
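<p>One widely used way to combine URI naming with HTTP access, described in the W3C note &#8220;Cool URIs for the Semantic Web&#8221;, is to answer a request for a thing&#8217;s URI with a 303 See Other redirect to a description of it, choosing RDF or HTML from the client&#8217;s Accept header. The sketch below models that decision as a pure function; the /id/, /data/ and /page/ path layout is a hypothetical convention, and the Accept parsing is deliberately simplified.</p>

```python
# Minimal sketch of the "303 See Other" pattern for linked data URIs.
# The /id/, /data/ and /page/ path layout is a hypothetical convention.
RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml", "application/n-triples")
HTML_MEDIA_TYPES = ("text/html", "application/xhtml+xml", "*/*")


def parse_accept(header):
    """Return media types from an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        quality = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        # -position keeps the header's own order among equal q-values.
        ranked.append((quality, -position, fields[0]))
    return [media for quality, _, media in sorted(ranked, reverse=True) if quality > 0]


def resolve(path, accept):
    """Map GET /id/{name} to (status, Location): RDF for machines, HTML for people."""
    if not path.startswith("/id/"):
        return 404, None
    name = path[len("/id/"):]
    # A missing or empty Accept header is treated as accepting anything.
    for media in parse_accept(accept or "*/*"):
        if media in RDF_MEDIA_TYPES:
            return 303, "/data/" + name
        if media in HTML_MEDIA_TYPES:
            return 303, "/page/" + name
    return 406, None  # nothing we can offer is acceptable


print(resolve("/id/acme", "text/html;q=0.9, text/turtle"))
```

<p>Keeping the identifier for the thing separate from the documents that describe it lets one URI serve both human browsers and machine agents.</p>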
<p>Linked data is applicable to public or enterprise data, open or         proprietary. It is really straightforward to employ. Structured         Dynamics has published a <a href="http://structureddynamics.com/linked_data.html">useful FAQ</a> on         linked data.</p>
<p>Additional linked data best practices relate to how to characterize and         classify data, especially in the use of predicates with the proper         semantics for establishing the degree of relatedness for linked data         items from disparate sources.</p>
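<p>The choice of linking predicate matters because it changes what may be inferred. In this sketch (hypothetical URIs and data), owl:sameAs asserts identity, so facts about either URI may be merged, while skos:closeMatch only asserts that two concepts are similar and carries no such entailment. A small union-find groups the owl:sameAs aliases.</p>

```python
# Linking predicates carry different inference commitments.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
EX = "http://example.org/vocab#"  # hypothetical vocabulary


def facts_about(uri, triples):
    """Collect (predicate, object) facts for uri and all of its owl:sameAs aliases."""
    parent = {}

    def find(x):
        # Union-find root lookup; owl:sameAs is symmetric and transitive.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    root = find(uri)
    return {(p, o) for s, p, o in triples if p != OWL_SAME_AS and find(s) == root}


CRM_ACME = "http://example.org/crm/acme"
ERP_ACME = "http://example.org/erp/acme"
NEWS_ACME = "http://example.org/news/acme-group"

triples = [
    (CRM_ACME, EX + "city", "Boise"),
    (ERP_ACME, EX + "creditLimit", "50000"),
    (NEWS_ACME, EX + "founded", "1952"),
    (CRM_ACME, OWL_SAME_AS, ERP_ACME),        # identity: facts may be merged
    (CRM_ACME, SKOS_CLOSE_MATCH, NEWS_ACME),  # similarity only: no fact transfer
]

print(sorted(facts_about(CRM_ACME, triples)))
```

<p>The closeMatch link itself is kept as a fact about the CRM record, but the founding date of the merely similar news entity is not imported.</p>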
<p>Linked data has been a frequent topic of this blog, including how         adding linkages creates value for existing data, with a four-part         series about a year ago on linked data best practices <a href="#ose8">[8]</a>. As advocated         by Structured Dynamics, our linked data best practices are geared to         data interconnections, interrelationships and context that is equally         useful to both humans and machine agents.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar3.png" alt="Pillar #3" /> Pillar #3: Adaptive Ontologies</h3>
<p>Ontologies are the guiding structures for how information is         interrelated and made coherent using RDF and its related schema and         ontology vocabularies, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a> and <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a> <a href="#ose10">[10]</a>.         Thousands of off-the-shelf ontologies exist &#8212; a minority of which are         suitable for re-use &#8212; and new ones appropriate to any domain or scope         at hand can be readily constructed.</p>
<p>In standard form, semantic Web ontologies may range from the small and         simple to the large and complex, and may perform the roles of defining         relationships among concepts, integrating instance data, orienting to         other knowledge and domains, or mapping to other schema <a href="#ose11">[11]</a>. These are         explicit uses in the way that we construct ontologies; we also believe         it is important to keep concept definitions and relationships expressed         separately from instance data and their attributes <a href="#ose9">[9]</a>.</p>
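<p>The split between concept definitions and instance data noted in <a href="#ose9">[9]</a> can be sketched as two separate statement sets. Here short names stand in for full URIs and the classes are hypothetical; the TBox holds only subClassOf axioms, the ABox only instance assertions, and class membership is inferred by applying the RDFS subclass rule transitively.</p>

```python
# TBox: terminological knowledge (concepts and how they relate).
TBOX = {
    ("Supplier", "subClassOf", "Organization"),
    ("Organization", "subClassOf", "Agent"),
    ("Employee", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
}

# ABox: assertions about instances (class membership and attributes).
ABOX = {
    ("acme", "type", "Supplier"),
    ("acme", "city", "Boise"),
    ("jdoe", "type", "Employee"),
}


def superclasses(cls, tbox):
    """Every class reachable from cls via subClassOf (transitive closure)."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, rel, sup in tbox:
            if rel == "subClassOf" and sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def inferred_types(instance, abox, tbox):
    """Asserted types of an instance plus every superclass the TBox implies."""
    asserted = {o for s, p, o in abox if s == instance and p == "type"}
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, tbox)
    return result


print(sorted(inferred_types("acme", ABOX, TBOX)))
```

<p>Because the schema lives apart from the instance data, adding one axiom such as ("Agent", "subClassOf", "Thing") to the TBox changes the inferred types of every instance without touching a single ABox statement.</p>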
<p>But, in addition to these standard roles, we also look to ontologies to stand on their own as guiding structures for ontology-driven applications (see next pillar). With relatively few new and minor best practices, ontologies can take on the double role of informing user interfaces in addition to standard information integration.</p>
<p>In this vein we term our structures <span style="font-style: italic;">adaptive ontologies</span> [<a href="#ose11">11</a>,<a href="#ose12">12</a>,<a href="#ose13">13</a>]. Some of the user interface considerations that can be driven by adaptive ontologies include: attribute labels and tooltips; navigation and browsing structures and trees; menu structures; auto-completion of entered data; contextual dropdown list choices; spell checkers; online help systems; etc. Put another way, what makes an ontology adaptive is supplementing its standard machine-readable purpose with human-readable labels, synonyms, definitions and the like.</p>
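<p>As a minimal sketch of one ontology serving both roles, the hypothetical terms below carry human-oriented annotations using the standard rdfs:label, skos:altLabel and skos:definition properties, and two generic functions turn them into UI behaviors named above: a field label with tooltip, and auto-completion that also matches synonyms.</p>

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"
EX = "http://example.org/vocab#"  # hypothetical vocabulary

# Human-oriented annotations layered onto (hypothetical) ontology terms.
ONTOLOGY = [
    (EX + "creditLimit", RDFS_LABEL, "Credit limit"),
    (EX + "creditLimit", SKOS_ALT_LABEL, "Credit line"),
    (EX + "creditLimit", SKOS_DEFINITION, "Maximum outstanding balance allowed."),
    (EX + "city", RDFS_LABEL, "City"),
    (EX + "city", SKOS_ALT_LABEL, "Town"),
]


def annotation(term, prop, ontology):
    """First value of an annotation property for a term, or None."""
    return next((o for s, p, o in ontology if s == term and p == prop), None)


def field_widget(term, ontology):
    """Label and tooltip for a form field; falls back to the URI's local name."""
    label = annotation(term, RDFS_LABEL, ontology) or term.rsplit("#", 1)[-1]
    tooltip = annotation(term, SKOS_DEFINITION, ontology) or ""
    return {"label": label, "tooltip": tooltip}


def autocomplete(typed, ontology):
    """Preferred labels of terms whose label or synonym starts with the typed text."""
    typed = typed.lower()
    hits = {s for s, p, o in ontology
            if p in (RDFS_LABEL, SKOS_ALT_LABEL) and o.lower().startswith(typed)}
    return sorted(annotation(t, RDFS_LABEL, ontology) or t for t in hits)


print(field_widget(EX + "creditLimit", ONTOLOGY))
print(autocomplete("to", ONTOLOGY))
```

<p>Typing &#8220;to&#8221; suggests City through its synonym Town; none of the UI functions names a particular term, so editing the annotations is enough to change what users see.</p>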
<p>A neat trick occurs with this slight expansion of roles. The knowledge         management effort can now shift to the actual description, nature and         relationships of the information environment. In other words,         ontologies themselves become the focus of effort and development. The         KM problem no longer needs to be abstracted to the IT department or         third-party software. The actual concepts, terminology and relations         that comprise coherent ontologies now become the explicit focus of KM         activities.</p>
<p>Any existing structure (or multiples thereof) can become a starting         basis for these ontologies and their vocabularies, from spreadsheets to         naïve data structures and lists and taxonomies. So, while producing an         operating ontology that meets the best practice thresholds noted herein         has certain requirements, kicking off or contributing to this process         poses few technical or technology demands.</p>
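<p>For instance, a simple two-column spreadsheet of concepts and their parents can seed an ontology directly. The sketch below, with hypothetical column names and namespace, turns such a sheet (saved as CSV) into SKOS concepts linked by skos:broader, and reports parents that have no row of their own so those gaps can be filled in a later pass.</p>

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/concept/"  # hypothetical namespace


def slug(label):
    """Make a URI-friendly local name from a human label (simplified)."""
    return "_".join(label.split())


def sheet_to_skos(text):
    """Turn (concept, parent) rows into SKOS triples; also report undefined parents."""
    triples, defined, parents = set(), set(), set()
    for row in csv.DictReader(io.StringIO(text)):
        label = row["concept"].strip()
        concept = BASE + slug(label)
        defined.add(concept)
        triples.add((concept, RDF_TYPE, SKOS + "Concept"))
        triples.add((concept, SKOS + "prefLabel", label))
        parent_label = (row.get("parent") or "").strip()
        if parent_label:
            parent = BASE + slug(parent_label)
            parents.add(parent)
            triples.add((concept, SKOS + "broader", parent))
    return triples, parents - defined


sheet = "concept,parent\nVehicles,\nCars,Vehicles\nTrucks,Vehicles\nSports cars,Cars\nBoats,Watercraft\n"
triples, undefined = sheet_to_skos(sheet)
print(len(triples), sorted(undefined))
```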
<p>The skills needed to create these adaptive ontologies are logic,         coherent thinking and domain knowledge. That is, any subject matter         expert or knowledge worker likely has the necessary skills to         contribute to useful ontology development and refinement. With adaptive         ontologies powering ontology-driven apps (see next), we thus see a shift         in roles and responsibilities away from IT to the knowledge workers         themselves. This shift acts to democratize the knowledge management         function and flatten the organization.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar4.png" alt="Pillar #4" /> Pillar #4: Ontology-driven Applications</h3>
<p>The complement to adaptive ontologies is <span style="font-style: italic;">ontology-driven applications</span>. By definition, ontology-driven apps are modular, generic software applications designed to operate in accordance with the specifications contained in an adaptive ontology. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies, as supplemented by the human and user interface roles noted above [<a href="#ose11">11</a>,<a href="#ose12">12</a>,<a href="#ose13">13</a>].</p>
<p>Ontology-driven apps fulfill specific generic tasks. Examples of         current ontology-driven apps include imports and exports in various         formats, dataset creation and management, data record creation and         management, reporting, browsing, searching, data visualization, user         access rights and permissions, and similar. These applications provide         their specific functionality in response to the specifications in the         ontologies fed to them.</p>
<p>The applications are designed more similarly to widgets or API-based         frameworks than to the dedicated software of the past, though the         dedicated functionality (<span style="font-style: italic;">e.g.</span>,         graphing, reporting, etc.) is obviously quite similar. The major change         in these ontology-driven apps is to accommodate a relatively common         abstraction layer that responds to the structure and conventions of the         guiding ontologies. The major advantage is that single generic         applications can supply shared functionality based on any properly         constructed adaptive ontology.</p>
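<p>To illustrate the abstraction layer, here is a hypothetical generic report function. It knows nothing about any domain and decides which columns to show purely from rdfs:domain declarations and rdfs:label annotations in whatever ontology it is given. Treating rdfs:domain as &#8220;properties to display for this class&#8221; is a UI convention assumed for this sketch (formally, RDFS uses rdfs:domain for inference), and nothing here describes any particular product.</p>

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab#"  # hypothetical vocabulary
ID = "http://example.org/id/"     # hypothetical instance namespace


def report(cls, ontology, data):
    """Generic table of all instances of cls; columns come only from the ontology.

    Assumed UI convention: properties whose rdfs:domain is cls become columns,
    titled by their rdfs:label (or the raw URI if none). One value per cell.
    """
    props = sorted(s for s, p, o in ontology if p == RDFS + "domain" and o == cls)
    labels = {s: o for s, p, o in ontology if p == RDFS + "label"}
    header = [labels.get(prop, prop) for prop in props]
    instances = sorted({s for s, p, o in data if p == RDF_TYPE and o == cls})
    rows = []
    for inst in instances:
        values = {p: o for s, p, o in data if s == inst}
        rows.append([values.get(prop, "") for prop in props])
    return header, rows


ontology = [
    (EX + "name", RDFS + "domain", EX + "Supplier"),
    (EX + "name", RDFS + "label", "Name"),
    (EX + "city", RDFS + "domain", EX + "Supplier"),
    (EX + "city", RDFS + "label", "City"),
]
data = [
    (ID + "acme", RDF_TYPE, EX + "Supplier"),
    (ID + "acme", EX + "name", "Acme Corp"),
    (ID + "acme", EX + "city", "Boise"),
    (ID + "globex", RDF_TYPE, EX + "Supplier"),
    (ID + "globex", EX + "name", "Globex"),
]

header, rows = report(EX + "Supplier", ontology, data)
print(header, rows)
```

<p>Adding one rdfs:domain triple for a new property adds a column to every such report, with no change to the code.</p>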
<p>This design thus limits software brittleness and maximizes software         re-use. Moreover, as noted above, it shifts the locus of effort from         software development and maintenance to the creation and modification         of knowledge structures. The KM emphasis can shift from programming and         software to logic and terminology <a href="#ose12">[12]</a>.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar5.png" alt="Pillar #5" /> Pillar #5: A Web-oriented Architecture</h3>
<p>A Web-oriented architecture (WOA) is a subset of the <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">service-oriented architectural</a> (SOA) style, wherein discrete functions are packaged into modular and shareable elements (&#8220;services&#8221;) that are made available in a distributed and loosely coupled manner. WOA uses the representational state transfer (REST) style. REST provides principles for how resources are defined, used and addressed with simple interfaces without additional messaging layers such as <a href="http://en.wikipedia.org/wiki/SOAP">SOAP</a> or <a href="http://en.wikipedia.org/wiki/Remote_procedure_call">RPC</a>. The principles are couched within the framework of a generalized architectural style and are not limited to the Web, though they are a foundation to it <a href="#ose14">[14]</a>.</p>
<p>REST and WOA stand in contrast to earlier Web service styles that are         often known by the WS-* acronym (such as <a href="http://en.wikipedia.org/wiki/Web_Services_Description_Language">WSDL</a>,         <a href="http://en.wikipedia.org/wiki/List_of_Web_service_specifications">etc</a>.).         WOA has proven itself to be highly scalable and robust for         decentralized users since all messages and interactions are         self-contained.</p>
<p>Enterprises have much to learn from the Web’s success. WOA has a         simple design with REST and idempotent operations, simple messaging,         distributed and modular services, and simple interfaces. It has a         natural synergy with linked data via the use of URI identifiers and the         HTTP transport protocol. As we see with the explosion of searchable         dynamic databases exposed via the Web, so too can we envision the same         architecture and design providing a distributed framework for data         federation. Our daily experience with browser access of the Web shows         how incredibly diverse and distributed systems can meaningfully         interoperate <a href="#ose15">[15]</a>.</p>
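<p>Idempotence is what lets a client, or any intermediary, safely retry a request after a timeout. The in-memory sketch below (a hypothetical store with no real HTTP) contrasts PUT and DELETE, whose repetition leaves the same state as a single call, with POST, which mints a new resource every time.</p>

```python
import itertools


class ResourceStore:
    """In-memory stand-in for resources addressed by URI paths (no real HTTP)."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def get(self, uri):
        # Safe: reading never changes state.
        return self.resources.get(uri)

    def put(self, uri, representation):
        # Idempotent: N identical PUTs leave the same state as one.
        self.resources[uri] = representation

    def delete(self, uri):
        # Idempotent: deleting an absent resource leaves the state unchanged.
        self.resources.pop(uri, None)

    def post(self, collection, representation):
        # Not idempotent: every call mints a new URI under the collection.
        uri = f"{collection}/{next(self._ids)}"
        self.resources[uri] = representation
        return uri


store = ResourceStore()
store.put("/suppliers/acme", "Acme Corp")
store.put("/suppliers/acme", "Acme Corp")   # a retried PUT is harmless
first = store.post("/orders", "10 gears")
second = store.post("/orders", "10 gears")  # a retried POST creates a duplicate
print(sorted(store.resources))
```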
<p>This same architecture has worked beautifully in linking documents; it         is now pointing the way to linking data; and we are seeing but the         first phases of linking people and groups together via meaningful         collaboration. While generally based on only the most rudimentary basis         of connections, today&#8217;s social networking platforms are changing the         nature of contacts and interaction.</p>
<p>The foundations herein provide a basis for marrying data and documents         in a design geared from the ground up for collaboration. These         capabilities are proven and deployable today. The only unclear aspects         will be the scale and nature of the benefits <a href="#ose16">[16]</a>.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar6.png" alt="Pillar #6" /> Pillar #6: An Incremental, Layered Approach</h3>
<p>To this point, you&#8217;ll note that we have been speaking in what are         essentially &#8220;layers&#8221;. We began with existing assets, both internal and         external, in many diverse formats. These are then converted or         transformed into RDF-capable forms. These various sources are then         exposed via a WOA Web services layer for distributed and         loosely-coupled access. Then, we integrate and federate this         information via adaptive ontologies, which then can be searched,         inspected and managed via ontology-driven apps. We have presented this         layered architecture before <a href="#ose13">[13]</a>, and have also expressed this design         in relation to current Structured Dynamics&#8217; products <a href="#ose17">[17]</a>.</p>
<p>A slight update of this layered view is presented below, made even more         general for the purposes of this foundational discussion:</p>
<div style="margin: 10px; text-align: center;"><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091213_open_enterprise.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 500px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091213_open_enterprise.png" alt="Open Enterprise Architecture" width="982" height="818" /></a></div>
<p>Semantic technology does not change or alter the fact that most         activities of the enterprise are transactional, communicative or         documentary in nature. Structured, relational data systems for         transactions or records are proven, performant and understood. On its         very face, it should be clear that the <span style="font-style: italic;">meaning</span> of these activities — their         <span style="font-style: italic;">semantics</span>, if you will —         is by nature an augmentation or added layer to how to conduct the         activities themselves.</p>
<p>This simple truth affirms that semantic technologies are not, then, a starting basis for these activities, but a way of expressing and interoperating their outcomes. Sure, some semantic understanding and common vocabularies at the front end can help bring consistency and a common language to an enterprise’s activities. This is good practice, and the more that can be done within reason while not stifling innovation, all the better. But we all know that the budget department and function has its own way of doing things separate from sales or R&amp;D. And that is perfectly OK and natural.</p>
<p>Clearly, then, an obvious benefit to the semantic enterprise is to         federate across existing data silos. This should be an objective of the         first semantic &#8220;layer&#8221;, and to do so in a way that leverages existing         information already in hand. This approach is inherently incremental;         if done right, it is also low cost and low risk.</p>
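<p>A first federation layer can be as modest as a table mapping each silo&#8217;s local field names to a shared vocabulary, leaving the silos themselves untouched. In this sketch the silo names, fields and ex: properties are hypothetical, and matching records on a shared name is a deliberate simplification; a fuller design would link records by URI as in Pillar #2.</p>

```python
# Each silo keeps its own schema; only this mapping table is new.
# Silo names, field names and ex: properties are hypothetical.
MAPPINGS = {
    "sales": {"cust_name": "ex:name", "region": "ex:region"},
    "budget": {"client": "ex:name", "fy_spend": "ex:annualSpend"},
}

SILOS = {
    "sales": [{"cust_name": "Acme Corp", "region": "West", "rep_code": "R7"}],
    "budget": [
        {"client": "Acme Corp", "fy_spend": "120000"},
        {"client": "Globex", "fy_spend": "80000"},
    ],
}


def to_shared(record, mapping):
    """Re-express one native record in shared terms; unmapped fields are left out."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def profile(name, silos, mappings):
    """Merge every record, from any silo, whose shared ex:name equals name."""
    merged = {}
    for silo, records in silos.items():
        for record in records:
            shared = to_shared(record, mappings.get(silo, {}))
            if shared.get("ex:name") == name:
                merged.update(shared)
    return merged


print(profile("Acme Corp", SILOS, MAPPINGS))
```

<p>Bringing in another silo means adding one entry to the mapping table; the budget and sales systems keep doing things their own way.</p>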
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_pillar7.png" alt="Pillar #7" /> Pillar #7: The Open World Mindset</h3>
<p>As these pillars took shape in our thinking and arguments over the past year, an elusive piece always seemed to be missing. It was like having one of those meaningful dreams, and then waking up in the morning racking your memory, trying to recall that essential, missing insight.</p>
<p>As I most recently wrote <a href="#ose1">[1]</a>, that missing piece for <span style="font-weight: bold; font-style: italic; text-decoration: underline;">this</span> story is the open world assumption (OWA). I argue that this somewhat         obscure concept holds within it the key as to why there have been         decades of too-frequent failures in the enterprise in <a href="http://en.wikipedia.org/wiki/Business_intelligence">business         intelligence</a>, <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a>,         <a href="http://en.wikipedia.org/wiki/Data_integration">data         integration</a> and <a href="http://en.wikipedia.org/wiki/Federated_database_system">federation</a>,         and <a href="http://en.wikipedia.org/wiki/Knowledge_management">knowledge         management</a>.</p>
<p>Enterprises have been captive to the mindset of traditional relational         data management and its (most often unstated) <a href="http://en.wikipedia.org/wiki/Closed_World_Assumption">closed world         assumption</a> (CWA). Given the success of relational systems for         transaction and operational systems &#8212; applications for which they are         still clearly superior &#8212; it is understandable and not surprising         that this same mindset has seemed logical for knowledge management         problems as well.  But knowledge and KM are by their nature         incomplete, changing and uncertain. A closed-world mindset carries with         it certainty and logic implications not supportable by real         circumstances.</p>
<p>This is not an esoteric point, but a fundamental one. How one thinks         about the world and evaluates it is pivotal to what can be learned and         how and with what information. Transactions require completeness and         performance; insight requires drawing connections in the face of         incompleteness or unknowns.</p>
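<p>The difference shows up as soon as a question asks about something that is not recorded. In this sketch (hypothetical facts), a closed-world query treats absence as falsehood, while an open-world query answers &#8220;unknown&#8221; unless a recorded fact, or an explicitly recorded negative fact, settles the question.</p>

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical knowledge base: what we know, plus facts known to be false.
FACTS = {("acme", "supplies", "gears")}
NEGATIVE_FACTS = {("acme", "supplies", "bolts")}


def closed_world(triple):
    """CWA: anything not recorded is taken to be false."""
    return Answer.TRUE if triple in FACTS else Answer.FALSE


def open_world(triple):
    """OWA: a missing fact only means we do not know."""
    if triple in FACTS:
        return Answer.TRUE
    if triple in NEGATIVE_FACTS:
        return Answer.FALSE
    return Answer.UNKNOWN


question = ("acme", "supplies", "springs")
print(closed_world(question), open_world(question))
```

<p>For a transaction ledger the closed-world answer is exactly right; for an analyst asking whether Acme might supply springs, &#8220;unknown&#8221; is the honest answer.</p>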
<p>The absolute applicability of the semantic Web stack to an open-world         circumstance is the elephant in the room <a href="#ose1">[1]</a>. By itself, the open world mindset         provides no assurance of gaining insight or wisdom. But, absent it, we         place thresholds on information and understanding that may neither be         affordable nor achievable with traditional, closed-world approaches.</p>
<p>And, by either serendipity or some cosmic beauty, the open world         mindset also enables incremental development, testing and refinement.         Even if my basic argument of the open world advantage for knowledge         management purposes is wrong, we can test that premise at low cost and         risk. So, within available budget, pick a doable proof-of-concept, and         decide for yourself.</p>
<h3><img style="vertical-align: middle;" src="../wp-content/themes/ai3/images/2010Posts/100110_7pillars_small.png" alt="Seven Pillars" /> The Foundations for the <span style="font-style: italic;">Open Semantic         Enterprise</span></h3>
<p>The seven pillars above are not magic bullets and each is likely not         absolutely essential. But, based on today&#8217;s understandings and with         still-emerging use cases being developed, we can see our <span style="font-weight: bold; font-style: italic;">open semantic         enterprise</span> as resulting from the interplay of these seven         factors:</p>
<div style="margin: 10px;"><img class="center_ok" style="border: 0px solid; width: 414px; height: 404px;" title="Seven Pillars of the Open Semantic Enterprise" src="http://mkbergman.com/wp-content/themes/ai3/images/2010Posts/100110_ose.png" alt="Open Semantic Enterprise" width="414" height="404" /></div>
<p>Thirty years of disappointing knowledge management projects and much wasted money and effort make clear that better ways must be found. On the other hand, until recently, too much of the semantic Web discussion has been either revolutionary (<span style="font-style: italic;">&#8220;change everything!!&#8221;</span>) or argued from pie-in-the-sky bases. Something needs to give.</p>
<p>Our work over the past few years, but especially as focused in the last 12 months, tells us that meaningful semantic Web initiatives can be mounted in the enterprise with potentially huge benefits, all at manageable risks and costs. These seven pillars point the way to how this might happen. What is now required is that eighth pillar: you.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose1"></a> [1] See, M.K. Bergman, 2009. <a href="../852/the-open-world-assumption-elephant-in-the-room/">&#8220;The Open World Assumption: Elephant in the Room&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, December 21, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose2"></a> [2] In most instances, semantic technologies are poorly suited to transactional or operational applications. Also, there are instances in modeling specific closed-world domains where ontologies can be quite useful, such as in aerospace, petrochemicals, engineering, etc., where the scope of the domain can be precisely bounded and defined. Such efforts tend to be high cost with lengthy lead times. There are vendors who support efforts in these areas, though my company, <a href="http://structureddynamics.com/">Structured Dynamics</a>, does not. Our focus, and what we believe is the more generally suitable case for semantic technologies, is knowledge representation and management.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose3"></a> [3] The standard <a style="font-weight: bold; font-style: italic; color: #990000;" href="../new-version-sweet-tools-sem-web/">Sweet         Tools</a> listing on my <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog contains more than 800 semantic Web and         -related tools, most of which are open source, which can be inspected         via filtered and faceted search.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose4"></a> [4] See, M.K. Bergman, 2009. <a href="../483/advantages-and-myths-of-rdf/">&#8220;Advantages         and Myths of RDF&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog, April 8, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose5"></a> [5] For example, see this listing of more than 150 specific <a href="http://openstructs.org/resources/rdfizers">format options</a> available as open source. These converters can also work directly with         major application APIs.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose6"></a> [6] For an expansion on RDF as a canonical data model, see further M.K.         Bergman, 2009. <a href="../533/structure-the-world/">&#8220;Structure the         World&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive         Information</span></a> blog, August 3, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose7"></a> [7] For example, for dataset authoring, Structured Dynamics has         developed <a href="http://openstructs.org/iron"><span style="font-style: italic; font-weight: bold;">irON</span></a>, an instance         record and object notation that can be serialized as JSON (called         <span style="font-style: italic;">irJSON</span>), XML (called         <span style="font-style: italic;">irXML</span>) or comma-separated         values (or CSV comma-delimited files, called <span style="font-style: italic;">commON</span>). The purpose of these notations is         to provide easier authoring environments and scripting support to         RDF-ready datasets. The advantage is to shield users from the nuances         of RDF. The design of <span style="font-style: italic;">commON</span> is especially geared to using spreadsheets as authoring environments         for instance record tables or simple outline structures.  See         further the <a href="http://openstructs.org/iron/iron-specification"><span style="font-style: italic; font-weight: bold;">irON</span> specification</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose8"></a> [8] For a general listing of linked data articles, please see <a href="../category/linked-data/">that category</a> on this <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog. Specific articles of interest include the four-part series on &#8220;Making Linked Data Reasonable Using Description Logics&#8221; [9] (<a href="../474/making-linked-data-reasonable-using-description-logics-part-1/">February 11</a>, <a href="../476/making-linked-data-reasonable-using-description-logics-part-2/">February 15</a>, <a href="../477/making-linked-data-reasonable-using-description-logics-part-3/">February 18</a> and <a href="../478/making-linked-data-reasonable-using-description-logics-part-4/">February 23</a>, 2009) and <a href="../837/the-law-of-linked-data/">&#8220;The Law of Linked Data&#8221;</a> (October 11, 2009).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose9"></a> [9] Our best practices approach makes explicit splits between the &#8220;<a href="http://en.wikipedia.org/wiki/Abox">ABox</a>&#8221; (for instance data) and &#8220;<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>&#8221; (for ontology schema) in accordance with our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>, a fundamental underpinning for how we use RDF:
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.&#8221;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose10"></a> [10] Those unfamiliar with the term <span style="font-style: italic;">ontology</span> might be interested in my first introduction to the subject: M.K. Bergman, 2007. <a href="../374/an-intrepid-guide-to-ontologies/"><span style="font-style: italic;">&#8220;</span>An Intrepid Guide to Ontologies<span style="font-style: italic;">&#8221;</span></a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, May 16, 2007.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose11"></a> [11] See M.K. Bergman, 2009. <a href="../492/ontology-best-practices-for-data-driven-applications-part-3/"><span style="font-style: italic;">&#8220;</span>Ontologies as the &#8216;Engine&#8217; for Data-Driven Applications<span style="font-style: italic;">&#8221;</span></a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, June 10, 2009. This is the most detailed explanation, but the specific term <span style="font-style: italic;">adaptive ontology</span> was not yet used. The first dedicated focus on adaptive ontologies was in <a href="../553/confronting-misconceptions-with-adaptive-ontologies/">&#8220;Confronting Misconceptions with Adaptive Ontologies&#8221;</a> (August 17, 2009). See also [12] and [13].</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose12"></a> [12] See M.K. Bergman, 2009. <a href="../847/ontology-driven-applications-using-adaptive-ontologies/">&#8220;Ontology-driven Applications Using Adaptive Ontologies&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, November 23, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose13"></a> [13] See M.K. Bergman, 2009. <a href="../825/fresh-perspectives-on-the-semantic-enterprise/">&#8220;Fresh Perspectives on the Semantic Enterprise&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, September 28, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose14"></a> [14] See M.K. Bergman, 2009. <a href="../486/a-general-web-oriented-architecture-woa-for-structured-data/">&#8220;A General Web-oriented Architecture (WOA) for Structured Data&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, May 3, 2009. Also, see the related <a href="../category/web-oriented-architecture-woa/">WOA category</a> for other articles in this area.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose15"></a> [15] See M.K. Bergman, 2008. <a href="../459/woa-a-new-enterprise-partner-for-linked-data/">&#8220;WOA: A New Enterprise Partner for Linked Data&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, October 12, 2008.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose16"></a> [16] See M.K. Bergman, 2009. <a href="../497/structwsf-a-framework-for-collaboration-networks/">&#8220;structWSF: A Framework for Collaboration Networks&#8221;</a>, <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, July 2, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="ose17"></a> [17] See <a href="http://structureddynamics.com/products.html">http://structureddynamics.com/products.html</a> for a general descriptive illustration of Structured Dynamics&#8217; product         stack. There is also a longer <a href="http://www.slideshare.net/mkbergman/structured-dynamicss-semantic-technologies-product-stack"> slideshow</a>, with particular reference to slide #37.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>When Linked Data Rules Fail</title>
		<link>http://www.mkbergman.com/846/when-linked-data-rules-fail/</link>
		<comments>http://www.mkbergman.com/846/when-linked-data-rules-fail/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 17:04:01 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[data.gov]]></category>
		<category><![CDATA[new york times]]></category>
		<category><![CDATA[nyt]]></category>
		<category><![CDATA[vocabularies]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=846</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When Linked Data Rules Fail&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/846/when-linked-data-rules-fail/&amp;rft.language=English"></span>

High Visibility Problems with NYT, data.gov Show Need for Better         Practices
When I say, &#8220;shot&#8221;, what do you think of? A flu shot? A shot of whisky?         A moon shot? A gun shot? What if I add the term &#8220;bank&#8221;? [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When Linked Data Rules Fail&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/846/when-linked-data-rules-fail/&amp;rft.language=English"></span>
<p><a href="http://www.adhd-mindbydesign.com/"><img style="border: 0px solid; width: 220px; height: 223px; float: left; margin-right: 10px;" title="Image Source: www.adhd-mindbydesign.com" src="../wp-content/themes/ai3/images/2009Posts/091115_disconnected.jpg" alt="Image Source: www.adhd-mindbydesign.com" hspace="5" vspace="5" align="left" /></a></p>
<h2>High-Visibility Problems with NYT, data.gov Show Need for Better Practices</h2>
<p>When I say, &#8220;shot&#8221;, what do you think of? A flu shot? A shot of whisky?         A moon shot? A gun shot? What if I add the term &#8220;bank&#8221;? Do you now         think of someone being shot in an armed robbery of a local bank or         similar?</p>
<p>And now, what if I add a reference to, say, <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/The_Hustler_%28film%29">The Hustler</a>, or Minnesota Fats, or &#8220;Fast Eddie&#8221; Felson? Do you now see the connection to a pressure-packed banked pool shot in some smoky barroom?</p>
<p>As humans, we need context to make connections and remove ambiguity. For machines, with their limited reasoning and inference engines, context and accurate connections are even more important.</p>
<p>Over the past few weeks we have seen announcements of two large, high-visibility <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> projects: first, an initial release of references for articles concerning about 5,000 people from the New York Times at <a href="http://data.nytimes.com/">data.nytimes.com</a>; and second, a massive exposure of 5 billion triples from <a href="http://tw.rpi.edu/">data.gov</a> datasets provided by the <a href="http://tw.rpi.edu/">Tetherless World Constellation</a> (TWC) at <a href="http://rpi.edu/">Rensselaer Polytechnic Institute</a> (RPI).</p>
<p>On various grounds, from <a href="http://go-to-hellman.blogspot.com/2009/10/new-york-times-blunders-into-linked.html">licensing</a> to <a href="http://dowhatimean.net/2009/10/linked-data-at-the-new-york-times-exciting-but-buggy">data characterization</a> to creating linked data for its <a href="http://www.betaversion.org/%7Estefano/linotype/news/351/">own sake</a>, some prominent commentators have weighed in on what is good and what is not so good about these datasets. One of us, Mike, <a href="../843/must-read-data-smoke-and-mirrors/">commented</a> about a week ago that &#8220;we have now moved beyond &#8216;proof of concept&#8217; to the need for actual useful data of trustworthy provenance and proper mapping and characterization. Recent efforts are a disappointment that no enterprise would or could rely upon.&#8221;</p>
<p>Reactions to <a href="../843/must-read-data-smoke-and-mirrors/">that posting</a> and continued discussion on various <a href="http://lists.w3.org/Archives/Public/public-esw-thes/2009Nov/0000.html">mailing lists</a> warrant a more precise dissection of what is wrong with these datasets and what still needs to be done <a href="#ld1">[1]</a>.</p>
<h3>Berners-Lee&#8217;s Four Linked Data &#8220;Rules&#8221;</h3>
<p>It is useful, then, to return to first principles, namely the original         four &#8220;rules&#8221; posed by Tim Berners-Lee in his design note on linked data         <a href="#ld2">[2]</a>:</p>
<ol>
<li>Use URIs as names for things</li>
<li>Use HTTP URIs so that people can look up those names</li>
<li>When someone looks up a URI, provide useful information, using the         standards (RDF, SPARQL)</li>
<li>Include links to other URIs so that they can discover more things.</li>
</ol>
<p>The first two rules are definitional to the idea of linked data. They         cement the basis of linked data in the Web, and are not at issue with         either of the two linked data projects that are the subject of this         posting.</p>
<p>However, it is in the last two rules, which lack specifics and guidance, that the breakdowns occur. Both the NYT and the RPI datasets fail to &#8220;provide useful information&#8221; (Rule #3). And the <span class="double_u">nature</span> of the links in Rule #4 is a real problem for the NYT dataset.</p>
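<p>As a rough illustration of where the breakdowns occur, the following Python sketch (standard library only) checks a single record against the four rules. It rests on readings of our own, not on anything in Berners-Lee&#8217;s note: triples are plain <code>(subject, predicate, object)</code> tuples, Rule #3 is approximated as &#8220;every predicate has a published definition somewhere&#8221;, and Rule #4 as &#8220;at least one link to a URI on another host&#8221;. The function names are hypothetical.</p>
<pre>
```python
from urllib.parse import urlparse

Triple = tuple[str, str, str]  # (subject, predicate, object); objects may be literals


def is_http_uri(value: str) -> bool:
    """Rules 1 and 2: a name that is an HTTP(S) URI can be looked up."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_record(subject: str, triples: list[Triple], defined: set[str]) -> dict[str, bool]:
    """Heuristic check of one record against the four linked data rules.

    `defined` holds the predicate URIs that have a published definition,
    whether in the publisher's own vocabulary or a standard one such as OWL.
    """
    own = [t for t in triples if t[0] == subject]
    home = urlparse(subject).netloc
    outbound = [o for (_, _, o) in own if is_http_uri(o) and urlparse(o).netloc != home]
    return {
        "rules_1_2_http_name": is_http_uri(subject),
        # Rule 3 (assumed reading): every predicate used must be defined somewhere.
        "rule_3_useful_info": bool(own) and all(p in defined for (_, p, _) in own),
        # Rule 4 (assumed reading): at least one link to a URI on another host.
        "rule_4_links_out": bool(outbound),
    }


# The NYT "Newman, Paul" record, reduced to two of its statements.
record = "http://data.nytimes.com/newman_paul_per"
triples = [
    (record, "http://data.nytimes.com/elements/first_use", "2001-03-18"),
    (record, "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/Paul_Newman"),
]
print(check_record(record, triples, defined={"http://www.w3.org/2002/07/owl#sameAs"}))
# {'rules_1_2_http_name': True, 'rule_3_useful_info': False, 'rule_4_links_out': True}
```
</pre>
<p>Rule #3 fails here only because <span style="font-style: italic;">nyt:first_use</span> has no published definition; a real check would dereference each predicate rather than rely on a hand-maintained set.</p>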
<h3>What Constitutes &#8220;Useful Information&#8221;?</h3>
<p>The Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> expands on &#8220;useful information&#8221; by augmenting the original rule with the parenthetical clause &#8220;(<span style="font-style: italic;">i.e.</span>, a structured description — metadata)&#8221;. But even that expansion is insufficient.</p>
<p>Fundamentally, what are we talking about with linked data? Well, we are         talking about instances that are characterized by one or more         attributes. Those instances exist within contexts of various natures.         And, those contexts may relate to other existing contexts.</p>
<p>We can break this problem description down into three parts:</p>
<ul>
<li>A <span style="font-weight: bold; font-style: italic;">vocabulary</span> that defines         the nature of the instances and their descriptive attributes</li>
<li>A <span style="font-weight: bold; font-style: italic;">schema</span> of some nature         that describes the structural relationships amongst instances and their         characteristics, and, optimally,</li>
<li>A <span style="font-weight: bold; font-style: italic;">mapping</span> to existing         external schema or constructs that help place the data into context.</li>
</ul>
<p>At minimum, <span class="double_u">ANY</span> dataset exposed as         linked data needs to be described by a <span style="font-weight: bold; font-style: italic;">vocabulary</span>. Both the         NYT and RPI datasets fail on this score, as we elaborate below. Better         practice is to also provide a <span style="font-weight: bold; font-style: italic;">schema</span> of relationships         in which to embed each instance record. And, best practice is to also         <span style="font-weight: bold; font-style: italic;">map</span> those         structures to external schema.</p>
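<p>The minimum/better/best ladder can be made explicit with a small grading function. This is a sketch under our own simplifying assumptions: a dataset is modeled by the hypothetical <code>DatasetDescription</code> class, &#8220;vocabulary&#8221; means the set of properties that have a published definition, and the mere presence of schema or mapping statements is taken as meeting the higher rungs.</p>
<pre>
```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class DatasetDescription:
    """Hypothetical model of what a publisher exposes alongside its instances."""
    used_properties: set[str]
    # Properties that have a published definition (label, description).
    vocabulary: set[str] = field(default_factory=set)
    # Structural statements, e.g. rdfs:domain, rdfs:range, rdfs:subClassOf.
    schema: list[Triple] = field(default_factory=list)
    # Links from local classes and properties to external schema.
    mappings: list[Triple] = field(default_factory=list)


def practice_level(d: DatasetDescription) -> str:
    """Grade a dataset on the minimum / better / best ladder."""
    if not d.used_properties or not d.used_properties.issubset(d.vocabulary):
        return "below minimum"
    if not d.schema:
        return "minimum"
    if not d.mappings:
        return "better"
    return "best"


# Hypothetical dataset: every property is defined, but no schema is published.
example = DatasetDescription(
    used_properties={"ex:price", "ex:region"},
    vocabulary={"ex:price", "ex:region"},
)
print(practice_level(example))  # minimum
```
</pre>
<p>Ordering the checks this way mirrors the text: without a vocabulary, neither a schema nor mappings can lift a dataset off the bottom rung.</p>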
<p>Lacking this &#8220;useful information&#8221;, especially a defining vocabulary, we cannot begin to understand whether our instances deal with drinks, bank robberies or pool shots. This lack, in essence, makes the information worthless, even though it is available via URL.</p>
<h4>The data.gov (RPI) Case</h4>
<p>With support from NSF and other grant funding, RPI has set up the <a href="http://data-gov.tw.rpi.edu/wiki/The_Data-gov_Wiki">Data-Gov Wiki</a> <a href="#ld3">[3]</a>, which is in the process of converting the datasets on <a href="http://www.data.gov/">data.gov</a> to RDF, placing them into a semantic wiki to enable comment and annotation, and providing that data as RSS feeds. Other demos are also being placed on the site.</p>
<p>As of the date of this posting, the site had a <a href="http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog">catalog</a> of 116         datasets from the 800 or so available on data.gov, leading to these         statistics:</p>
<ul>
<li>459,412,419 table entries</li>
<li>5,074,932,510 triples, and</li>
<li>7,564 properties (or attributes).</li>
</ul>
<p>We&#8217;ll take one of these datasets, <a href="http://www.data.gov/details/319">#319</a>, and look a bit closer at         it:</p>
<table border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<th style="background-color: #cccccc;"> Wiki</th>
<th style="background-color: #cccccc;"> Title</th>
<th style="background-color: #cccccc;"> Agency</th>
<th style="background-color: #cccccc;"> Name</th>
<th style="background-color: #cccccc;"> data.gov Link</th>
<th style="background-color: #cccccc;"> No. of Properties</th>
<th style="background-color: #cccccc;"> No. of Triples</th>
<th style="background-color: #cccccc;"> RDF File</th>
</tr>
<tr>
<td><a title="Dataset 319" href="http://data-gov.tw.rpi.edu/wiki/Dataset_319">Dataset 319</a></td>
<td>Consumer Expenditure Survey</td>
<td><a title="Department of Labor" href="http://data-gov.tw.rpi.edu/wiki/Department_of_Labor">Department of Labor</a></td>
<td><a title="LABOR-STAT (page does not exist)" href="http://data-gov.tw.rpi.edu/w/index.php?title=LABOR-STAT&amp;action=edit&amp;redlink=1">LABOR-STAT</a></td>
<td><a title="http://www.data.gov/details/319" rel="nofollow" href="http://www.data.gov/details/319">http://www.data.gov/details/319</a></td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">1,583,236</td>
<td><a title="http://data-gov.tw.rpi.edu/raw/319/index.rdf" rel="nofollow" href="http://data-gov.tw.rpi.edu/raw/319/index.rdf">http://data-gov.tw.rpi.edu/raw/319/index.rdf</a></td>
</tr>
</tbody>
</table>
<p>This report was picked solely because it has a small number of attributes (properties) and is thus easier to capture in a screenshot. The summary report on the wiki is shown on this <a href="http://data-gov.tw.rpi.edu/wiki/Dataset_319">page</a>:</p>
<div style="margin: 10px; text-align: center;"><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_wiki_dataset_319.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 611px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_wiki_dataset_319.png" alt="Data-gov-Wiki Dataset #319" width="1093" height="1113" /></a></p>
</div>
<p>So, we see that this specific dataset uses 22 of the 7,564 attributes across all datasets.</p>
<p>When we click on one of these attribute names, we are then taken to a         specific wiki page that only reiterates its label. There is no         definition or explanation.</p>
<p>Inspecting this page further, we see that, apart from the broad characterization of the dataset itself (the bulk of the page), there are only 22 undefined attributes at the bottom, with labels such as <span style="font-style: italic;">item code</span>, <span style="font-style: italic;">periodicity code</span>, <span style="font-style: italic;">seasonal</span>, and the like. These attributes are the real structural basis for the data in this dataset.</p>
<p>But, what does all of this mean???</p>
<p>To gain a clue, now let&#8217;s go to the source data.gov site for this       <a href="http://www.data.gov/details/319">dataset (#319)</a>. Here is how       that report looks:</p>
<div style="margin: 10px; text-align: center;"><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_data_gov_319.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 1146px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_data_gov_319.png" alt="Data.gov Dataset #319" width="1036" height="1978" /></a></p>
</div>
<p>Contained within this report we see a listing for additional <a href="ftp://ftp.bls.gov/pub/time.series/cx/cx.txt">metadata</a>. This link         tells us about the various data fields contained in this dataset; we         see many of these attributes are &#8220;codes&#8221; to various data categories.</p>
<p>Probing further into the dataset&#8217;s <a href="http://www.bls.gov/cex/">technical documentation</a>, we see that         there is indeed a rich structure underneath this report, again provided         via various code lookups. There are codes for geography, seasonality         (adjusted or not), consumer demographic profiles and a variety of         consumption categories. (See, for example, the link to this <a href="http://www.bls.gov/cex/csxgloss.htm">glossary page</a>.) These are the         keys to understanding the actual values within this dataset.</p>
<p>For example, one major dimension of the data is captured by the attribute <span style="font-style: italic;">item_code</span>. The survey breaks down consumption expenditures within the broad categories of Food, Housing, Apparel and Services, Transportation, Health Care, Entertainment, and Other. Within each category, there is also a rich structural breakdown. For example, expenditures for Bakery Products within Food are given a <a href="ftp://ftp.bls.gov/pub/time.series/cx/cx.item">code</a> of FHC2.</p>
<p>But, nowhere are these codes defined or unlocked in the RDF datasets.         This absence is true for virtually all of the datasets exposed on this         wiki.</p>
<p>So, for literally billions of triples and more than 7,500 attributes, we have <span style="font-weight: bold;">ABSOLUTELY NO INFORMATION ABOUT WHAT THE DATA CONTAINS OTHER THAN A PROPERTY LABEL</span>. There is much, much rich value in data.gov, but all of it remains locked up and hidden.</p>
<p>The sad truth about this data release is that it provides absolutely no         value in its current form. We lack the keys to unlock the value.</p>
<p>To be sure, early essential spade work has been done here to begin putting in place the conversion infrastructure for moving text files, spreadsheets and the like to an RDF form. This is yeoman work, important to ultimate access. But until a <span style="font-weight: bold; font-style: italic;">vocabulary</span> is published that defines the attributes and their codes, this value will remain hidden. And until a <span style="font-weight: bold; font-style: italic;">schema</span> of some nature is also published that connects attributes and relations across datasets, the real value of connecting the dots will remain hidden as well.<img style="width: 160px; height: 218px; float: right; margin-left: 10px;" title="The Hustler" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_the_hustler.jpg" alt="The Hustler" align="right" /></p>
<p>These datasets may partially meet the conditions by providing clickable URLs, but the crucial &#8220;useful information&#8221; as to what any of this data means is absent.</p>
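<p>As a minimal sketch of what publishing a code vocabulary could look like, the Python below turns a code list into SKOS concepts and then uses them to resolve a raw <span style="font-style: italic;">item_code</span> value. SKOS is our choice here, not something data.gov or BLS prescribes; the <code>http://example.org/cx/</code> namespace and the function names are hypothetical, and the only real entry is the FHC2 code (Bakery Products, within Food) from the survey documentation.</p>
<pre>
```python
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ITEM_NS = "http://example.org/cx/item/"  # hypothetical namespace, not a published one

Triple = tuple[str, str, str]


def code_vocabulary(codes: dict[str, tuple[str, str]]) -> list[Triple]:
    """Turn {code: (label, broader_category)} into SKOS concept triples."""
    triples: list[Triple] = []
    for code, (label, category) in codes.items():
        concept = ITEM_NS + code
        triples += [
            (concept, RDF_TYPE, SKOS + "Concept"),
            (concept, SKOS + "notation", code),
            (concept, SKOS + "prefLabel", label),
            (concept, SKOS + "broader", ITEM_NS + "category/" + category),
        ]
    return triples


def unlock(code: str, vocabulary: list[Triple]) -> Optional[str]:
    """Resolve a raw item_code value to its label, or None if it is undefined."""
    concept = ITEM_NS + code
    for s, p, o in vocabulary:
        if s == concept and p == SKOS + "prefLabel":
            return o
    return None


vocab = code_vocabulary({"FHC2": ("Bakery Products", "Food")})
print(unlock("FHC2", vocab))  # Bakery Products
```
</pre>
<p>In practice such a code list could be generated from the BLS lookup files rather than typed by hand, and the broader categories would themselves become concepts in a schema.</p>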
<p>Every single dataset on data.gov has supporting references to text         files, PDFs, Web pages or the like that describe the nature of the data         within each dataset. Until that information is exposed and made usable,         we have no linked data.</p>
<p>Until ontologies get created from these technical documents, the value of these data instances remains locked up, and no value can be created from having these datasets expressed in RDF.</p>
<p>The devil lies in the details. The essential hard work has not yet         begun.</p>
<h4>The NYT Case</h4>
<p>Though at a much smaller scale with many fewer attributes, the <a href="http://data.nytimes.com/">NYT dataset</a> suffers from the same         failing: it too lacks a <span style="font-weight: bold; font-style: italic;">vocabulary</span>.</p>
<p>So, let&#8217;s take the case of one of the lead actors in <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/The_Hustler_%28film%29">The Hustler</a>,         Paul Newman, who played the role of &#8220;Fast Eddie&#8221; Felson. Here is the         <a href="http://data.nytimes.com/N31738445835662083893.html">NYT         record</a> for the &#8220;person&#8221; <span style="font-style: italic;">Paul         Newman</span> (which they also refer to as <a href="http://data.nytimes.com/newman_paul_per">http://data.nytimes.com/newman_paul_per</a>).         Note the header title of <span style="font-weight: bold;">Newman,         Paul</span>:</p>
<div style="margin: 10px; text-align: center;"><a href="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_nyt_paul_newman.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 593px;" title="Click to expand" src="http://mkbergman.com/wp-content/themes/ai3/images/2009Posts/091115_nyt_paul_newman.png" alt="NYT 'Paul Newman Articles' Record" width="988" height="976" /></a></p>
</div>
<p>Click on any of the internal labels used by the NYT for its own         attributes (such as <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>), and         you will be given this message:</p>
<div style="margin-left: 40px;">
<p><span style="font-style: italic;">&#8220;An RDFS description and English           language documentation for the NYT namespace will be provided soon.           Thanks for your patience.&#8221;</span></p>
</div>
<p>We again have no idea what is meant by all of this data except for the labels used for its attributes. In this case, for <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a> we have a value of &#8220;2001-03-18&#8221;.</p>
<p>Hello? What? What is a &#8220;first use&#8221; for a &#8220;Paul Newman&#8221; of &#8220;2001-03-18&#8221;???</p>
<p>The NYT put the cart before the horse: even if minimal, they should have released their ontology before, or at least at the same time as, their data instances. (See further <a href="../825/fresh-perspectives-on-the-semantic-enterprise/">this discussion</a> about how an ontology creation workflow can be incremental, starting simple and then upgrading as needed.)</p>
<h3>Links to Other Things</h3>
<p>Since there really are no links to other things on the Data-Gov Wiki,         our focus in this section continues with the NYT dataset using our same         example.</p>
<p>We now are in the territory of the fourth &#8220;rule&#8221; of linked data:         <span style="font-style: italic;">4. Include links to other URIs so         that they can discover more things</span>.</p>
<p>This will seem a bit basic at first, but before we can talk about         linking to other things, we first need to understand and define the         starting &#8220;thing&#8221; to which we are linking.</p>
<h4>What is a &#8220;Newman, Paul&#8221; Thing?</h4>
<p>Of course, without its own vocabulary, we are left to deduce what the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; thing shown in the previous screenshot <span class="double_u">is</span>. Our first clue comes from the statement that it is of <span style="font-style: italic;">rdf:type</span> <a href="http://www.w3.org/TR/skos-reference/">SKOS</a> <span style="font-style: italic;">concept</span>. By looking to the SKOS vocabulary, we see that <a href="http://www.w3.org/TR/skos-reference/#concepts"><span style="font-style: italic;">concept</span></a> is a class and is defined as:</p>
<p style="margin-left: 40px; font-style: italic;">A SKOS concept can be viewed as an idea or notion; a unit of thought.         However, what constitutes a unit of thought is subjective, and this         definition is meant to be suggestive, rather than restrictive. The         notion of a SKOS concept is useful when describing the conceptual or         intellectual structure of a knowledge organization system, and when         referring to specific ideas or meanings established within a KOS.</p>
<p>We also see that this instance is given a <a href="http://xmlns.com/foaf/0.1/primaryTopic">foaf:primaryTopic</a> of         <span style="font-style: italic;">Paul Newman</span>.</p>
<p>So, we can deduce so far that this instance is about the concept or idea of <span style="font-style: italic;">Paul Newman</span>. Now, looking to the attributes of this instance (that is, the defining properties provided by the NYT), we see the properties <a href="http://data.nytimes.com/elements/associated_article_count">nyt:associated_article_count</a>, <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>, <a href="http://data.nytimes.com/elements/last_use">nyt:last_use</a> and <a href="http://data.nytimes.com/elements/topicPage">nyt:topicPage</a>. Completing our deductions, and in the absence of its own vocabulary, we can now define this concept instance roughly as follows:</p>
<p style="margin-left: 40px;"><span style="font-style: italic;">New York Times articles in the period         2001 to 2009 having as their primary topic the actor Paul Newman</span></p>
<p>(BTW, across all records in this dataset, we could see what the         earliest first use was to better deduce the time period over which         these articles have been assembled, but that has not been done.)</p>
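<p>That check is straightforward once the records are in hand. A minimal sketch, assuming each record has been reduced to a dict holding its <span style="font-style: italic;">nyt:first_use</span> and <span style="font-style: italic;">nyt:last_use</span> values as ISO dates; the function name is hypothetical and the sample dates are placeholders, not NYT data.</p>
<pre>
```python
from datetime import date


def coverage_period(records: list[dict[str, str]]) -> tuple[date, date]:
    """Earliest nyt:first_use and latest nyt:last_use across all records."""
    firsts = [date.fromisoformat(r["first_use"]) for r in records]
    lasts = [date.fromisoformat(r["last_use"]) for r in records]
    return min(firsts), max(lasts)


# Hypothetical records; real input would be every record in the dataset.
records = [
    {"first_use": "2003-05-01", "last_use": "2009-06-30"},
    {"first_use": "2001-11-12", "last_use": "2008-02-14"},
]
start, end = coverage_period(records)
print(start, end)  # 2001-11-12 2009-06-30
```
</pre>
<p>Parsing to <code>date</code> objects, rather than comparing raw strings, also rejects malformed values instead of silently mis-ordering them.</p>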
<p>We would also re-title this instance as something like &#8220;2001-2009 NYT Articles with a Primary Topic of Paul Newman&#8221; and use URIs that reflect this usage.</p>
<h4>sameAs Woes</h4>
<p>Thus, in order to make links or connections with other data, it is essential to understand the nature of the subject &#8220;thing&#8221; at hand. There is much confusion, in the literature and on mailing lists, about actual &#8220;things&#8221;, references to &#8220;things&#8221;, and the nature of a &#8220;thing&#8221;.</p>
<p>Our belief and usage in matters of the semantic Web is that all         &#8220;things&#8221; we deal with are a reference to whatever the &#8220;true&#8221;, actual         thing is. The question then becomes:  What is the nature (or         scope) of this referent?</p>
<p>There are actually quite easy ways to determine this nature. First,         look to one or more instance examples of the &#8220;thing&#8221; being referred to.         In our case above, we have the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; instance record. Then, look         to the properties (or attributes) the publisher of that record has used         to describe that thing. Again, in the case above, we have <a href="http://data.nytimes.com/elements/associated_article_count">nyt:associated_article_count</a>,         <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>,         <a href="http://data.nytimes.com/elements/latest_use">nyt:last_use</a> and <a href="http://data.nytimes.com/elements/topicPage">nyt:topicPage</a>.</p>
<p>Clearly, this instance record (that is, its nature) deals with articles or groups of articles. The relation to <span style="font-style: italic;">Paul Newman</span> arises because he is the <span class="double_u">primary topic</span> of these articles, not because the instance describes a <span class="double_u">person</span>. If the nature of the instance were indeed the person <span style="font-style: italic;">Paul Newman</span>, then the attributes of the record would more properly be &#8220;person&#8221; properties such as age, sex, birthdate, death date, marital status, etc.</p>
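<p>The heuristic described in the two preceding paragraphs, judging a record&#8217;s nature by the properties its publisher chose, can be sketched as a simple overlap score. The two property &#8220;signatures&#8221; and the function name are hypothetical and deliberately tiny: the person signature uses the example properties just listed, and the article signature uses the four NYT properties.</p>
<pre>
```python
# Hypothetical property signatures for two candidate kinds of "thing".
SIGNATURES = {
    "person": {"age", "sex", "birthdate", "death_date", "marital_status"},
    "article collection": {
        "associated_article_count", "first_use", "last_use", "topicPage",
    },
}


def likely_nature(properties: set[str]) -> str:
    """Return the kind whose signature shares the most properties with the record."""
    scores = {kind: len(properties.intersection(sig)) for kind, sig in SIGNATURES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] else "unknown"  # no overlap at all: do not guess


# Local names of the properties on the NYT "Newman, Paul" record.
newman = {"associated_article_count", "first_use", "last_use", "topicPage"}
print(likely_nature(newman))  # article collection
```
</pre>
<p>A fuller version would compare against the class definitions of real vocabularies (for example, the declared domains of each property) rather than hard-coded sets.</p>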
<p>This confusion by NYT as to the nature of the &#8220;things&#8221; they are         describing then leads to some very serious errors. By confusing the         topic (<span style="font-style: italic;">Paul Newman</span>) of a         record with the nature of that record (articles about topics), NYT next         misuses one of the most powerful semantic Web predicates available,         <span style="font-weight: bold;">owl:sameAs</span>.</p>
<p>By asserting in the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; record that the instance has a <span style="font-weight: bold;">sameAs</span> relationship with external records in <a href="http://rdf.freebase.com/ns/en.paul_newman">Freebase</a> and <a href="http://dbpedia.org/resource/Paul_Newman">DBpedia</a>, the NYT makes an assertion that both <a href="http://en.wikipedia.org/wiki/Entailment">entails</a> that the properties of any of the associated records are shared and licenses the <a href="http://en.wikipedia.org/wiki/Inference">inference</a> of a chain of other types to describe the record. More precisely, the NYT is asserting that the &#8220;things&#8221; referred to by these instances are <strong>identical</strong> resources.</p>
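<p>The type entailment can be sketched with a tiny forward-chaining pass in Python: <span style="font-weight: bold;">owl:sameAs</span> is treated as symmetric and transitive, and every <span style="font-style: italic;">rdf:type</span> asserted for any member of a sameAs group applies to all of its members. This covers only that slice of the OWL semantics and is not a reasoner; the triples are a hand-picked subset, with two DBpedia types drawn from the list that follows plus the NYT record&#8217;s own SKOS concept type. The function name is hypothetical.</p>
<pre>
```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def entailed_types(triples: set[tuple[str, str, str]], subject: str) -> set[str]:
    """Types of `subject` once owl:sameAs is closed under symmetry and transitivity."""
    # Collect the sameAs group containing `subject` by graph traversal.
    group, frontier = {subject}, [subject]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):  # symmetry: follow the link both ways
                if a == node and b not in group:
                    group.add(b)
                    frontier.append(b)
    # Every type asserted for any member applies to all members.
    return {o for s, p, o in triples if p == RDF_TYPE and s in group}


nyt = "http://data.nytimes.com/newman_paul_per"
dbp = "http://dbpedia.org/resource/Paul_Newman"
triples = {
    (nyt, RDF_TYPE, "http://www.w3.org/2004/02/skos/core#Concept"),
    (nyt, SAME_AS, dbp),
    (nyt, SAME_AS, "http://rdf.freebase.com/ns/en.paul_newman"),
    (dbp, RDF_TYPE, "http://dbpedia.org/ontology/Actor"),
    (dbp, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
}
print(sorted(entailed_types(triples, nyt)))  # three types: Actor, skos:Concept, foaf:Person
```
</pre>
<p>Even this toy subset makes the NYT record simultaneously a SKOS concept, a foaf:Person and a DBpedia Actor, which is exactly the kind of category confusion a misapplied sameAs introduces.</p>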
<p>Thus, by the <span style="font-weight: bold;">sameAs</span> statements in the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; record, the NYT is also asserting that that record is an instance of all these things <a href="#ld5">[5]</a>:</p>
<table border="0">
<tbody>
<tr>
<td></td>
<td>
<ul>
<li> <a rel="rdf:type" href="http://dbpedia.org/about/html/http://www.w3.org/2002/07/owl%23Thing"> owl:Thing</a></li>
<li> <a href="http://xmlns.com/foaf/spec/#term_Agent">foaf:Agent</a></li>
<li> <a href="http://xmlns.com/foaf/spec/#term_Person">foaf:Person</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/ontology/Actor">dbpedia-owl:Actor</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/JewishActors">http://dbpedia.org/class/yago/JewishActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/PeopleFromCleveland,Ohio">http://dbpedia.org/class/yago/PeopleFromCleveland,Ohio</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/ontology/Artist">dbpedia-owl:Artist</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/ontology/Person">dbpedia-owl:Person</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/Person100007846">http://dbpedia.org/class/yago/Person100007846</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanFilmDirectors">http://dbpedia.org/class/yago/AmericanFilmDirectors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/YaleUniversityAlumni">http://dbpedia.org/class/yago/YaleUniversityAlumni</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/OhioUniversityAlumni">http://dbpedia.org/class/yago/OhioUniversityAlumni</a></li>
<li> <a rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rvVjWoZwpEbGdrcN5Y29ycA"> opencyc:en/MaleHuman</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanFilmActors">http://dbpedia.org/class/yago/AmericanFilmActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/Liberals">http://dbpedia.org/class/yago/Liberals</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/OhioActors">http://dbpedia.org/class/yago/OhioActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/UnitedStatesNavySailors">http://dbpedia.org/class/yago/UnitedStatesNavySailors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/PeopleFromWestport,Connecticut"> http://dbpedia.org/class/yago/PeopleFromWestport,Connecticut</a></li>
<li> <a rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rwQB4UJwpEbGdrcN5Y29ycA"> opencyc:en/JewishPerson</a></li>
<li> <a rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rwMRyTJwpEbGdrcN5Y29ycA"> opencyc:en/ActorInMovies</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/LivingPeople">http://dbpedia.org/class/yago/LivingPeople</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/Actor109765278">http://dbpedia.org/class/yago/Actor109765278</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanVegetarians">http://dbpedia.org/class/yago/AmericanVegetarians</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanPhilanthropists">http://dbpedia.org/class/yago/AmericanPhilanthropists</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/KenyonCollegeAlumni">http://dbpedia.org/class/yago/KenyonCollegeAlumni</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/WesternFilmActors">http://dbpedia.org/class/yago/WesternFilmActors</a></li>
<li> <a rel="rdf:type" href="http://dbpedia.org/class/yago/ActorsStudioAlumni">http://dbpedia.org/class/yago/ActorsStudioAlumni</a></li>
<li>and a hundred other DBpedia YAGO superclasses.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Furthermore, because of its strong, reciprocal entailments, the <span style="font-weight: bold;">owl:sameAs</span> assertion would also entail that the person <span style="font-style: italic;">Paul Newman</span> has the <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a> and <a href="http://data.nytimes.com/elements/latest_use">nyt:latest_use</a> attributes, which is clearly illogical for a &#8220;person&#8221; thing.</p>
<p>This connection is clearly wrong in both directions. <span style="font-style: italic;">Articles</span> are not <span style="font-style: italic;">persons</span> and don&#8217;t have <span style="font-style: italic;">marital status</span>; and <span style="font-style: italic;">persons</span> do not have <span style="font-style: italic;">first_uses</span>. By misapplying this <span style="font-weight: bold;">sameAs</span> linkage relationship, we have screwed things up every which way. And the error began with misunderstanding what kinds of &#8220;things&#8221; our data is about.</p>
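<p>To make the two-way contamination concrete, here is a minimal Python sketch of what a naive reasoner does with <span style="font-weight: bold;">owl:sameAs</span>. It merges every resource linked by the predicate into one equivalence class, then lets each member carry every statement made about any other member. The abbreviated URIs (such as <code>nyt:newman_paul_per</code>) and the placeholder date literals are illustrative assumptions, not the actual NYT data. A real OWL reasoner does considerably more than this.</p>

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

def same_as_closure(triples):
    """Map each resource to its owl:sameAs equivalence class (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        find(s)
        if p == SAME_AS:
            parent[find(s)] = find(o)  # sameAs is symmetric and transitive
    classes = defaultdict(set)
    for node in list(parent):
        classes[find(node)].add(node)
    return {node: classes[find(node)] for node in parent}

def entailed_description(triples, resource):
    """All (predicate, object) pairs a resource acquires through owl:sameAs."""
    members = same_as_closure(triples).get(resource, {resource})
    return {(p, o) for s, p, o in triples if s in members and p != SAME_AS}

# Hypothetical, abbreviated triples modeled on the "Newman, Paul" example
triples = [
    ("nyt:newman_paul_per", "rdf:type", "skos:Concept"),
    ("nyt:newman_paul_per", "nyt:first_use", "placeholder-date-1"),
    ("nyt:newman_paul_per", "nyt:latest_use", "placeholder-date-2"),
    ("nyt:newman_paul_per", SAME_AS, "dbpedia:Paul_Newman"),
    ("nyt:newman_paul_per", SAME_AS, "freebase:en.paul_newman"),
    ("dbpedia:Paul_Newman", "rdf:type", "foaf:Person"),
    ("dbpedia:Paul_Newman", "rdf:type", "dbpedia-owl:Actor"),
]

person = entailed_description(triples, "dbpedia:Paul_Newman")
record = entailed_description(triples, "nyt:newman_paul_per")
print(("nyt:first_use", "placeholder-date-1") in person)  # True: the person gains a first_use
print(("rdf:type", "foaf:Person") in record)              # True: the NYT record becomes a Person
```

<p>Both checks print <code>True</code>. This matches the error described above: the person picks up NYT bookkeeping attributes, and the NYT topic record picks up person types.</p>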
<h4>Some Options</h4>
<p>However, there are solutions. First, the <span style="font-weight: bold;">sameAs</span> assertions, at least those involving these external resources, should be dropped.</p>
<p>Second, if linkages are still desired, a vocabulary such as <a href="http://umbel.org/">UMBEL</a> <a href="#ld4">[4]</a> could be used to make an assertion between such a concept and these other related resources. So, even though these resources are not the same, they are <strong>closely</strong> related. The UMBEL ontology helps us define this kind of relation between related, but non-identical, resources.</p>
<p>Instead of the <span style="font-weight: bold;">owl:sameAs</span> property, we suggest using <span style="font-weight: bold;">umbel:linksEntity</span>, which links a <span style="font-weight: bold;">skos:Concept</span> to related named-entity resources. Additionally, Freebase, which also currently asserts a <span style="font-weight: bold;">sameAs</span> relationship to the NYT resource, could use the <span style="font-weight: bold;">umbel:isAbout</span> relationship to assert that its resource &#8220;is about&#8221; a certain concept, namely the one defined by the NYT.</p>
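<p>The replacement suggested here can be sketched in a few lines of Python. The sketch assumes the NYT records are typed <span style="font-weight: bold;">skos:Concept</span> and uses hypothetical abbreviated URIs. The function name <code>relink</code> is also hypothetical. It rewrites only those <span style="font-weight: bold;">owl:sameAs</span> triples that touch a concept:</p>
<ul>
<li>links going out from a concept become <span style="font-weight: bold;">umbel:linksEntity</span>;</li>
<li>links coming in from an external resource become <span style="font-weight: bold;">umbel:isAbout</span>.</li>
</ul>
<p>As we read the UMBEL vocabulary, neither property carries the identity semantics of <span style="font-weight: bold;">owl:sameAs</span>. As a result, no types or attributes flow between the topic and the person.</p>

```python
SAME_AS = "owl:sameAs"
LINKS_ENTITY = "umbel:linksEntity"
IS_ABOUT = "umbel:isAbout"

def relink(triples):
    """Hypothetical helper: replace owl:sameAs links touching a skos:Concept."""
    concepts = {s for s, p, o in triples if p == "rdf:type" and o == "skos:Concept"}
    out = []
    for s, p, o in triples:
        if p == SAME_AS and s in concepts:
            out.append((s, LINKS_ENTITY, o))  # concept -> related named entity
        elif p == SAME_AS and o in concepts:
            out.append((s, IS_ABOUT, o))      # external resource is about the concept
        else:
            out.append((s, p, o))             # everything else is kept unchanged
    return out

# Hypothetical, abbreviated triples
triples = [
    ("nyt:newman_paul_per", "rdf:type", "skos:Concept"),
    ("nyt:newman_paul_per", SAME_AS, "dbpedia:Paul_Newman"),
    ("freebase:en.paul_newman", SAME_AS, "nyt:newman_paul_per"),
]

relinked = relink(triples)
for triple in relinked:
    print(triple)
```

<p>Keying the rewrite on the declared type of each end, rather than on the source of the link, keeps the decision tied to what kind of &#8220;thing&#8221; each resource actually is.</p>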
<p>Alternatively, still other external vocabularies that more precisely         capture the intent of the NYT publishers could be found, or the NYT         editors could define their own properties specifically addressing their         unique linkage interests.</p>
<h4>Other Minor Issues</h4>
<p>As a couple of additional, minor suggestions for the NYT dataset, we would recommend:</p>
<ul>
<li>Create a <span style="font-weight: bold;">foaf:Organization</span> description of the NYT organization, then use it with <span style="font-weight: bold;">dc:creator</span> and <span style="font-weight: bold;">dcterms:rightsHolder</span> rather than using a         literal, and</li>
<li>The dual URIs such as &#8220;<a href="http://data.nytimes.com/N31738445835662083893">http://data.nytimes.com/N31738445835662083893</a>&#8221; and &#8220;<a href="http://data.nytimes.com/newman_paul_per">http://data.nytimes.com/newman_paul_per</a>&#8221; are not wrong in themselves, but their purpose is hard to understand. Why does a single organization need to create multiple URIs for the <strong>identical resource,</strong> when it comes from the same system and has the same purpose?</li>
</ul>
<h4>Re-visiting the Linkage &#8220;Rule&#8221;</h4>
<p>Linking resources offers very valuable benefits from entailment, inference and logic. However, if the nature of the &#8220;things&#8221; being linked, or of the properties that define these linkages, is mischaracterized, then very wrong logical implications result. Great care and understanding should be applied to linkage assertions.</p>
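<p>OWL&#8217;s standard way to catch such mistakes is to declare classes <span style="font-weight: bold;">owl:disjointWith</span> one another and let a reasoner report the inconsistency. A lighter-weight check can be run before publishing. The Python sketch below flags any <span style="font-weight: bold;">owl:sameAs</span> link whose two ends carry types listed as incompatible. The incompatible-pair list is a hypothetical, hand-maintained assumption. In particular, pairing <span style="font-weight: bold;">skos:Concept</span> with <span style="font-weight: bold;">foaf:Person</span> is our own judgment, not a disjointness axiom declared in either vocabulary.</p>

```python
SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"

# Hypothetical, hand-maintained list of type pairs treated as incompatible
INCOMPATIBLE = {frozenset({"skos:Concept", "foaf:Person"})}

def types_of(triples):
    """Collect the declared rdf:type values of each subject."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return types

def suspicious_same_as(triples, incompatible=INCOMPATIBLE):
    """Return (subject, object, type1, type2) for sameAs links joining incompatible types."""
    types = types_of(triples)
    flagged = []
    for s, p, o in triples:
        if p != SAME_AS:
            continue
        for ts in types.get(s, ()):
            for to in types.get(o, ()):
                if frozenset({ts, to}) in incompatible:
                    flagged.append((s, o, ts, to))
    return flagged

triples = [
    ("nyt:newman_paul_per", RDF_TYPE, "skos:Concept"),
    ("dbpedia:Paul_Newman", RDF_TYPE, "foaf:Person"),
    ("nyt:newman_paul_per", SAME_AS, "dbpedia:Paul_Newman"),
]
print(suspicious_same_as(triples))
# [('nyt:newman_paul_per', 'dbpedia:Paul_Newman', 'skos:Concept', 'foaf:Person')]
```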
<h3>In the End, the Challenge is Not Linked Data, but <span style="font-style: italic; text-decoration: underline;">Connected</span> Data</h3>
<p>Our critical comments are not meant to be disrespectful, nor are they mere nitpicking. The NYT and TWC are prominent institutions from which we should expect leadership on these issues. Our criticisms (and, we believe, those of others) are also not an expression of a &#8220;<a href="http://en.wikipedia.org/wiki/Hype_cycle">trough of disillusionment</a>&#8221; as <a href="http://twitter.com/gregboutin/status/5558525462">some</a> have been suggesting.</p>
<div class="boxYellowDotted" style="margin: 0pt 0pt 0pt 10px; float: right; width: 300px; text-align: center;">This posting has been jointly authored by <a href="http://mkbergman.com/"> Mike Bergman</a> and <a href="http://fgiasson.com/blog">Fred         Giasson</a> and simultaneously published on both of their blogs, hoping         to draw more attention to the need for better practices in publishing         linked data.</div>
<p>This posting is about poor practices, pure and simple. The time to         correct them is now. If asked, we would be pleased to help either         institution establish exemplar practices. This is not automatic, and it         is not always easy. The data.gov datasets, in particular, will require         much time and effort to get right. There is much documentation that         needs to be transitioned and expressed in semantic Web formats.</p>
<p>In a broader sense, we also seem to lack a definition of best practices related to <span style="font-weight: bold;">vocabularies</span>, <span style="font-weight: bold;">schema</span> and <span style="font-weight: bold;">mappings</span>. The Berners-Lee rules are imprecise and insufficient as they stand. Prior best-practice guidance documents tend to address how to publish data and make URIs linkable more than how to properly characterize, describe and connect the data.</p>
<p>Perhaps, in part, this is a bit of a semantics issue. The challenge is not the mechanics of <span style="font-style: italic;">linking data</span>, but the meaning and basis for <span class="double_u">connecting</span> that data. Connections require logic and rationality sufficient to reliably inform inference and rule-based engines. They also need to pass the sniff test as we &#8220;follow our nose&#8221; by clicking the links exposed by the data.</p>
<p>It is exciting to see high-quality content from national governments and major publishers like the New York Times begin to be exposed as linked data. When this content finally gets embedded into usable contexts, we should see manifest uses and benefits emerge. We hope both institutions take our criticisms in that spirit.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld1" name="ld1"></a> [1] The NYT dataset has since been updated, with improvements that fixed multiple issues from the first release. The problems listed herein, however, still pertain after these improvements.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld2" name="ld2"></a> [2] Tim Berners-Lee, 2006. Linked Data (Design Issues), first posted on 2006-07-27; last updated on 2009-06-18. See <a href="http://www.w3.org/DesignIssues/LinkedData.html">http://www.w3.org/DesignIssues/LinkedData.html</a>. Berners-Lee refers to the steps above as &#8220;rules,&#8221; but he elaborates that they are expectations of behavior. Most later citations refer to these as &#8220;principles.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld3" name="ld3"></a> [3] Li Ding, Dominic DiFranzo, Sarah         Magidson, Deborah L. McGuinness and Jim Hendler, 2009. Data-GovWiki:         Towards Linked Government Data. See <a href="http://www.cs.vu.nl/%7Epmika/swc/documents/Data-gov%20Wiki-data-gov-wiki-v1.pdf"> http://www.cs.vu.nl/~pmika/swc/documents/Data-gov%20Wiki-data-gov-wiki-v1.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld4" name="ld4"></a> [4] UMBEL <em>(Upper Mapping and Binding Exchange Layer)</em> is a lightweight ontology structure in development for relating Web content and data to a standard set of subject concepts. Its purpose has led to the creation of an associated vocabulary geared to both class-instance and reciprocal relationships, as well as partial or likelihood relationships. See <a href="http://umbel.org/technical_documentation.html#vocabulary">http://umbel.org/technical_documentation.html#vocabulary</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="id5"></a>[5] We&#8217;d like to thank Denny Vrandecic (see comments) for pointing out an imprecision in our original wording. This phrase was originally stated as, &#8220;Thus, by the sameAs statements in the &#8216;Newman, Paul&#8217; record, the NYT is also asserting that that record is the same as these other things.&#8221;</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/846/when-linked-data-rules-fail/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Must Read: &#8216;Data Smoke and Mirrors&#8217;</title>
		<link>http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/</link>
		<comments>http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 02:32:29 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[data.gov]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=843</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Must Read: &#8216;Data Smoke and Mirrors&#8217;&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-08&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/&amp;rft.language=English"></span>
Mazzocchi Sounds a Warning to Linked Data Advocates
Stefano Mazzocchi has been a clear thinker for years and an innovative contributor to the community since his early leadership of the Apache Cocoon project. One of his best qualities is that he speaks his mind. Now at Freebase, but previously with MIT&#8217;s Simile program, he is one of [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Must Read: &#8216;Data Smoke and Mirrors&#8217;&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-08&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/&amp;rft.language=English"></span>
<h2>Mazzocchi Sounds a Warning to Linked Data Advocates</h2>
<p>Stefano Mazzocchi has been a clear thinker for years and an innovative contributor to the community since his early leadership of the Apache <a href="http://cocoon.apache.org/">Cocoon</a> project. One of his best qualities is that he speaks his mind. Now at <a href="http://www.freebase.com/">Freebase</a>, but previously with MIT&#8217;s <a href="http://simile.mit.edu/">Simile</a> program, he is one of my dedicated reads via his <a href="http://www.betaversion.org/~stefano/linotype/">Stefano’s Linotype</a> blog.</p>
<p>His aforementioned post, <em><a href="http://www.betaversion.org/~stefano/linotype/news/351/">Data Smoke and Mirrors</a></em>, stands on its own, and I highly recommend it. He particularly focuses on the conversion of <a href="http://data.gov/">data.gov</a> datasets to &#8220;<a href="http://structureddynamics.com/linked_data.html">linked data</a>&#8221; (my quotes are purposeful). Combined with the recent poor conversion of <a href="http://open.blogs.nytimes.com/2009/10/29/first-5000-tags-released-to-the-linked-data-cloud/">New York Times datasets</a> to linked data, I think he is the canary sending out a warning about a disturbing trend.</p>
<p>Posting linked data for its own sake &#8212; whatever the reasons &#8212; risks undercutting the premise.</p>
<p>We have now moved beyond &#8220;proof of concept&#8221; to the need for actual useful data of trustworthy provenance and proper mapping and characterization. Recent efforts are a disappointment that no enterprise would or could rely upon.</p>
<p>Listen up, folks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Structured Dynamics&#8217; Product Stack</title>
		<link>http://www.mkbergman.com/842/structured-dynamics-product-stack/</link>
		<comments>http://www.mkbergman.com/842/structured-dynamics-product-stack/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 22:54:24 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Information Automation]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[scones]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>
		<category><![CDATA[slideshow]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=842</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structured Dynamics&#8217; Product Stack&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Information Automation&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/842/structured-dynamics-product-stack/&amp;rft.language=English"></span>

A New Slide Show Consolidates, Explains Recent Developments
Much has been happening on the Structured Dynamics front of late. Besides welcoming Steve Ardire as a senior advisor to the company, we also have been issuing a steady stream of new products from our semantic Web pipeline.
This new slide show attempts to capture these products and relate [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structured Dynamics&#8217; Product Stack&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Information Automation&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/842/structured-dynamics-product-stack/&amp;rft.language=English"></span>
<p><a href="http://structureddynamics.com/"><img style="border: 0px solid; width: 260px; height: 60px; float: left; margin-right: 10px;" title="Structured Dynamics LLC" src="../wp-content/themes/ai3/images/sd_logo_260.png" alt="Structured Dynamics LLC" hspace="5" vspace="5" align="left" /></a></p>
<h2>A New Slide Show Consolidates, Explains Recent Developments</h2>
<p>Much has been happening on the <a href="http://structureddynamics.com">Structured Dynamics</a> front of late. Besides welcoming <a href="http://www.linkedin.com/in/sardire">Steve Ardire</a> as a senior advisor to the company, we also have been issuing a steady stream of new <a href="http://structureddynamics.com/products.html">products</a> from our semantic Web pipeline.</p>
<p>This new slide show attempts to capture these products and relate them to the various layers in Structured Dynamics&#8217; enterprise product stack:</p>
<div class="center_ok center">
<div id="__ss_2406783" class="center_ok" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="Structured Dynamics's Semantic Technologies Product Stack" href="http://www.slideshare.net/mkbergman/structured-dynamicss-semantic-technologies-product-stack">Structured Dynamics&#8217;s Semantic Technologies Product Stack</a><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sdproductstack20091102-091102163620-phpapp01&amp;stripped_title=structured-dynamicss-semantic-technologies-product-stack" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sdproductstack20091102-091102163620-phpapp01&amp;stripped_title=structured-dynamicss-semantic-technologies-product-stack" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
</div>
</div>
<p>The show describes the role of <a href="http://structureddynamics.com/scones.html">scones</a>, <a href="http://openstructs.org/iron">irON</a>, <a href="http://openstructs.org/structwsf">structWSF</a>, <a href="http://umbel.org/">UMBEL</a>, <a href="http://constructscs.com/">conStruct</a> and others, and how they leverage existing information assets to enable the semantic enterprise. And, oh, by the way, all of this is done via Web-accessible <a href="http://structureddynamics.com/linked_data.html">linked data</a> and our practical <a href="http://structureddynamics.com/technology.html">technologies</a>.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/842/structured-dynamics-product-stack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Law of Linked Data</title>
		<link>http://www.mkbergman.com/837/the-law-of-linked-data/</link>
		<comments>http://www.mkbergman.com/837/the-law-of-linked-data/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 01:16:17 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[linked data law]]></category>
		<category><![CDATA[metcalfe's law]]></category>
		<category><![CDATA[network effects]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=837</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Law of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-10-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/837/the-law-of-linked-data/&amp;rft.language=English"></span>

A Marshal to Bring Order to the Town of Data Gulch
Though not the first, I have been touting the Linked Data Law for a         couple of years now [1]. But in a conversation last week, I found that         my [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Law of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-10-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/837/the-law-of-linked-data/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 150px; height: 158px; float: left; margin-right: 10px;" title="The Marshal Has Come to Town" src="../wp-content/themes/ai3/images/2009Posts/091011_deputy_marshal_badge.jpg" alt="The Marshal Has Come to Town" hspace="5" vspace="0" align="left" /></p>
<h2>A Marshal to Bring Order to the Town of Data Gulch</h2>
<p>Though not the first, I have been touting the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span> for a couple of years now <a href="#ldl_1">[1]</a>. But in a conversation last week, I found that my colleague did not find the premise very clear. I suspect that is due both to cryptic language on my part and to the fact that no one has really tackled the topic with focus. So, in this post, I try to redress that and also comment on the related role of linked data in the semantic enterprise.</p>
<p>Adding connections to existing information via linked data is a         powerful force multiplier, similar to <a href="http://en.wikipedia.org/wiki/Metcalf%27s_law">Metcalfe&#8217;s law</a> for         how the value of a network increases with more users (nodes). I have         come to call this the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span>: the         value of a linked data network is proportional to the square of the         number of links between data objects.</p>
<div class="boxGreenDotted" style="margin: 5px 0pt 5px 0px; float: right; text-align: center; width: 360px;"><big style="font-style: italic; color: #006600; font-weight: bold;">&#8220;In the       network economy, the connections are as important as the nodes.&#8221;</big> <a href="#ldl_2">[2]</a></div>
<p>An early direct mention of the semantic Web and its possible ability to         generate <a href="http://en.wikipedia.org/wiki/Network_effect">network         effects</a> comes from a 2003 Mitre report for the government <a href="#ldl_3">[3]</a>. In         it, the authors state, &#8220;At present a very small proportion of the data         exposed on the web is marked up using Semantic Web vocabularies like         RDF and OWL. As more data gets mapped to ontologies, the potential         exists to achieve a &#8216;network effect&#8217;.&#8221; Prescient, for sure.</p>
<p>In July 2006, both Henry Story and Dion Hinchliffe discussed Metcalfe&#8217;s law, with Henry specifically looking to relate it to the semantic Web <a href="#ldl_4"> [4]</a>. He noted that his initial intuition was that &#8220;the value of your information grows exponentially with your ability to combine it with new information.&#8221; He added that he was trying to find ways to adapt Metcalfe&#8217;s law to the semantic Web.</p>
<p>I picked up on those observations and commented to Henry at that time         and in my own post, &#8220;<a style="font-style: italic;" title="Permanent Link to The Exponential Driver of Combining Information" rel="bookmark" href="../255/the-exponential-driver-of-combining-information/">The         Exponential Driver of Combining Information</a>.&#8221; I have been enamoured         of the idea ever since, and have begun to weave the idea into my         writings.</p>
<p>More recently, in late 2008, James Hendler and Jennifer Golbeck devoted         an entire paper to Metcalfe&#8217;s law and the semantic Web <a href="#ldl_5">[5]</a>. In it, they         note:</p>
<p style="margin-left: 40px;">&#8220;This linking between ontologies, and between instances in documents         that refer to terms in another ontology, is where much of the latent         value of the Semantic Web lies. The vocabularies, and particularly         linked vocabularies using URIs, of the Semantic Web create a graph         space with the ability to link any term to any other. As this link         space grows with the use of RDF and OWL, Metcalfe&#8217;s law will once again         be exploited – the more terms to link to, and the more links         created, the more value in creating more terms and linking them in.&#8221;</p>
<h3>A Refresher on Metcalfe&#8217;s Law</h3>
<p><a href="http://en.wikipedia.org/wiki/Metcalf%27s_law">Metcalfe’s law</a> states that the value of a telecommunications network is proportional to the square of the number of users of the system (<span style="font-style: italic;">n</span>²). Note that it is <span style="font-weight: bold; font-style: italic;">not</span> exponential, as some of the points above imply. <a href="http://en.wikipedia.org/wiki/Robert_Metcalfe">Robert Metcalfe</a> formulated it around 1980 in relation to Ethernet and fax machines; the &#8220;law&#8221; was then named for Metcalfe and popularized by <a href="http://en.wikipedia.org/wiki/George_Gilder">George Gilder</a> in 1993.</p>
<p>These attempts to estimate the value of physical networks were in keeping with earlier efforts to estimate the value of a broadcast network. That value is almost universally agreed to be proportional to the number of users, a relationship known as <a href="http://en.wikipedia.org/wiki/Sarnoff%27s_law">Sarnoff&#8217;s law</a> (see further below).</p>
<p>The actual algorithm proposed by Metcalfe calculates the number of unique connections in a network with <span style="font-style: italic;">n</span> nodes to be <em>n</em>(<em>n</em> − 1)/2. For large <em>n</em>, this is approximately <em>n</em><sup>2</sup>/2, and hence proportional to <em>n</em><sup>2</sup>. This makes Metcalfe&#8217;s law a quadratic growth equation.</p>
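<p>The count follows from a simple pairing argument. Each of the <em>n</em> nodes can connect to <em>n</em> − 1 others, and every connection is counted twice (once from each end). The following writes this out, with <em>C</em>(<em>n</em>) as our own label for the connection count. It also shows why the growth is quadratic rather than exponential.</p>

```latex
\begin{aligned}
C(n) &= \binom{n}{2} = \frac{n(n-1)}{2} = \frac{n^2 - n}{2}, \\
\lim_{n \to \infty} \frac{C(n)}{n^2} &= \frac{1}{2}
\quad\Longrightarrow\quad C(n) \in \Theta(n^2),
\ \text{which grows far more slowly than } 2^n.
\end{aligned}
```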
<p>As nodes get added, then, we see the following increase in connections:</p>
<div style="margin: 5px 0pt;"><a href="../wp-content/themes/ai3/images/2009Posts/091011_telephone.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 180px;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/091011_telephone.png" alt="Metcalfe Law Network Effect" hspace="5" /></a></p>
<h5 style="color: #820000;">&#8216;Network Effect&#8217; for Physical Networks</h5>
</div>
<p>This diagram, modified from <a href="http://en.wikipedia.org/wiki/File:Network_effect.png">Wikipedia</a> to         be a horizontal image, shows how two telephones can make only one         connection, five can make 10 connections, and twelve can make 66         connections, etc.</p>
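<p>The figures in the diagram can be checked directly against the <em>n</em>(<em>n</em> − 1)/2 formula. This short Python sketch does so, cross-checking against the standard library&#8217;s <code>math.comb</code> (Python 3.8+). The function name <code>metcalfe_links</code> is our own.</p>

```python
from math import comb

def metcalfe_links(n: int) -> int:
    """Unique pairwise connections among n nodes: n(n - 1)/2."""
    return n * (n - 1) // 2  # n(n - 1) is always even, so integer division is exact

for n in (2, 5, 12):
    print(n, metcalfe_links(n), comb(n, 2))
# 2 1 1
# 5 10 10
# 12 66 66
```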
<p>By definition, a physical network is a connected network. Thus, every         time a new node is added to the network, connections are added, too.         This general formula has also been embraced as a way to discuss social         connections on the Internet <a href="#ldl_6">[6]</a>.</p>
<h3>Analogies to Linked Data</h3>
<p>Like a physical network, the interconnected semantic Web or semantic enterprise is a graph.</p>
<p>The idea behind <a href="http://structureddynamics.com/linked_data.html">linked data</a> is to make connections between data. Unlike physical telecommunication networks, however, the nodes in the form of datasets and data are (largely) already there. What is missing are the connections. The build-out and growth that produce the <a href="http://en.wikipedia.org/wiki/Network_effect">network effects</a> in a linked data context do not result from adding more nodes, but from linking or connecting <span style="font-weight: bold; font-style: italic; text-decoration: underline;">existing</span> nodes.</p>
<p>The fact that adding a node to a physical network carries with it an associated connection has tended to conjoin these two complementary requirements of node <span style="font-weight: bold; font-style: italic;">and</span> connection. But, to grok the real dynamics and to gain network effects, we need to realize that both nodes and connections are necessary.</p>
<p>One circumstance of the enterprise is that data nodes are everywhere.         The fact that the overwhelming majority are unconnected is why we have         adopted the popular colloquialism of data &#8220;silos&#8221;. There are also         massive amounts of unconnected data on the Web in the form of dynamic         databases only accessible via search form, and isolated data tables and         listings virtually everywhere.</p>
<p>Thus, the essence of the <span style="font-style: italic;">semantic         enterprise</span> and the <span style="font-style: italic;">semantic         Web</span> is no more complicated than connecting — <span class="double_u">meaningfully</span> — data nodes that already exist.</p>
<p>As the following diagram shows, unconnected data nodes or silos look         like random particles caught in the chaos of <a href="http://en.wikipedia.org/wiki/Brownian_motion">Brownian motion</a>:</p>
<div style="margin: 5px 0pt;"><a href="../wp-content/themes/ai3/images/2009Posts/091011_network.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 196px;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/091011_network.png" alt="Linked Data Law Network Effect" hspace="5" /></a></p>
<h5 style="color: #820000;">&#8216;Network Effect&#8217; for Coherent Linked Data</h5>
</div>
<p>As initial connections get made, bits of structure begin to emerge. But, as connections proliferate — <span style="font-weight: bold; font-style: italic;">exactly</span> equivalent to the network effects of connected networks — coherence and value emerge.</p>
<p>Look at the last panel in the diagram series above. Not only are the same nodes now all connected, with the inferences and relationships that result from those connections, but entirely new structures also emerge by virtue of those connections. All of this structure and meaning was absent before the linked data connections were made.</p>
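<p>The point that value comes from linking nodes that already exist, rather than from adding new ones, can be illustrated with a union-find (disjoint-set) structure that counts connected "islands" as links are added. This is a minimal Python sketch: the six silo names and the links between them are hypothetical, and the number of connected components is only one simple proxy for the emerging structure the diagram depicts.</p>

```python
class DisjointSet:
    """Union-find over a fixed set of existing nodes; no nodes are ever added."""

    def __init__(self, nodes):
        self.parent = {node: node for node in nodes}

    def find(self, node):
        # Path halving: point each visited node at its grandparent to keep lookups cheap.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def components(self):
        return len({self.find(node) for node in self.parent})


# Hypothetical data silos: six datasets that already exist but share no links.
silos = ["crm", "erp", "wiki", "hr", "catalog", "support"]
graph = DisjointSet(silos)
print("before linking:", graph.components(), "isolated silos")

# Hypothetical links between existing nodes; the node count never changes.
links = [("crm", "support"), ("erp", "catalog"), ("wiki", "hr"), ("crm", "erp"), ("hr", "crm")]
for a, b in links:
    graph.union(a, b)
    print(f"linked {a}-{b}: {graph.components()} component(s)")
```

<p>Six nodes stay six nodes throughout; only the links change, yet the silos collapse into a single connected structure.</p>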
<h3>Quantifying the Network Effect</h3>
<p>So, what is the benefit of this linked data? It depends on the product         of the <span style="font-style: italic;">value</span> of the         connections and the <span style="font-style: italic;">multiplier</span> of the network effect:</p>
<div style="margin: 10px 0pt; text-align: center;">linked data benefit <span style="font-weight: bold; font-family: Arial Black;">=</span> connections         <span style="font-style: italic;">value</span> <span style="font-weight: bold;">X</span> network effect <span style="font-style: italic;">multiplier</span></div>
<p>Just as it is hard to have a conversation via phone with yourself, or         to collaborate with yourself, the ability to gain perspective and         context from data comes from connections. But like some phone calls or         some collaborations, the <span style="font-style: italic;">value</span> depends on the participants. In the case of linked data, that depends         on the quality of the data and its <span style="font-style: italic; font-weight: bold;">coherence</span> <a href="#ldl_7">[7]</a>. The         value &#8220;constant&#8221; for connected linked data depends in some manner on         these factors, as well as the purposes and circumstances to which that         linked data might be applied.</p>
<p>Even in physical networks or social collaboration contexts, the &#8220;value&#8221; of the network has been hard to quantify. And, while academics and researchers will appropriately and naturally call for more research on these questions, we do not need to be so timid. Whatever the <span style="font-style: italic;">alpha</span> constant is for quantifying the value of a linked data network, our intuition should be clear that making connections, finding relationships, making inferences, and making discoveries cannot occur when data is in isolation.</p>
<p>Because I am an advocate, I believe this <span style="font-style: italic;">alpha</span> constant of value to be quite large.         I believe this constant is also higher for circumstances of business         intelligence, knowledge management and discovery.</p>
<p>The second part of the benefit equation is the <span style="font-style: italic;">multiplier</span> for network effects. We&#8217;ve         mentioned before the linear growth advantage due to broadcast networks         (Sarnoff law) and the standard quadratic growth assumption of physical         and social networks (Metcalfe law). Naturally, there have been other         estimates and advocacies.</p>
<p>David Reed <a href="#ldl_8">[8]</a>, for example, also adds group effects and has asserted         an exponential multiplier to the network effect (like Henry Story&#8217;s         initial intuition noted above). As he states,</p>
<p style="margin-left: 40px;">&#8220;[E]ven Metcalfe&#8217;s Law understates the value created by a group-forming         network [GFN] as it grows. Let&#8217;s say you have a GFN with <span style="font-style: italic;">n</span> members. If you add up all the potential         two-person groups, three-person groups, and so on that those members         could form, the number of possible groups equals <span>2<sup><em>n</em></sup></span>. So the value of a GFN increases         exponentially, in proportion to <span>2<sup><em>n</em></sup></span>. I call that Reed&#8217;s Law. And its         implications are profound.&#8221;</p>
<p>Yet not all agree with the assertion of an exponential multiplier, or even with Metcalfe&#8217;s quadratic one. Odlyzko and Tilly <a href="#ldl_9">[9]</a> note that Metcalfe&#8217;s law would hold if the value that an individual gets personally from a network were directly proportional to the number of people in that network. They then argue that this condition does not hold, because of local preferences and differing qualities of interaction. In a linked data context, such arguments have merit, though you may also want to see Metcalfe&#8217;s own counter-arguments <a href="#ldl_6">[6]</a>.</p>
<p>Hinchcliffe&#8217;s earlier commentary <a href="#ldl_4">[4]</a> provided a useful graphic that shows the implications of these various multipliers for the network effect, as a function of the number of nodes in a network:</p>
<div style="margin: 5px 0pt; text-align: center;"><img class="center_ok" style="border: 0px solid; width: 528px; height: 329px;" src="../wp-content/themes/ai3/images/2009Posts/091011_network_effects.jpg" alt="Potency of the Network Effect from Dion Hinchliffe" hspace="5" width="528" height="329" /></p>
<h5 style="color: #820000;">Various Estimates for the &#8216;Network Effect&#8217;</h5>
</div>
<p>I believe we can dismiss the lower, linear bound of this question, and likely the higher, exponential one (Reed&#8217;s law) as well, because quality and relevance issues make some linked data connections less valuable than others. Per the above, that would suggest that the <span style="font-style: italic;">multiplier</span> of the linked data network is perhaps closer to the Metcalfe estimate or similar.</p>
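<p>To see how far apart these candidate multipliers drift as a network grows, the Python sketch below tabulates them for a few network sizes. The linear, quadratic and exponential entries are the ones named above; the <em>n</em> log <em>n</em> entry is the alternative estimate Odlyzko and Tilly propose in their paper <a href="#ldl_9">[9]</a>. The sizes chosen are arbitrary illustrations.</p>

```python
import math

# Candidate network-effect multipliers as functions of n nodes.
# Natural log is used for n log n; another base only rescales it by a constant.
MULTIPLIERS = {
    "Sarnoff (n)": lambda n: n,
    "Odlyzko-Tilly (n log n)": lambda n: n * math.log(n),
    "Metcalfe (n^2)": lambda n: n ** 2,
    "Reed (2^n)": lambda n: 2 ** n,
}

for n in (10, 20, 40):
    row = ", ".join(f"{name} = {f(n):.3g}" for name, f in MULTIPLIERS.items())
    print(f"n = {n}: {row}")
```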
<p>In any event, it is also essential to point out that connecting data indiscriminately, for linked data&#8217;s sake, will likely deliver few, if any, benefits. Connections must still be coherent and logical for the value benefits to be realized.</p>
<h3>The Role and Contribution of Linked Data</h3>
<p>I <a href="../825/fresh-perspectives-on-the-semantic-enterprise/"> elsewhere</a> discuss the role of linked data in the enterprise and         will continue to do so. But, there are some implications in the above         that warrant some further observations.</p>
<p>It should be clear that the graph and network basis of linked data, not         to mention some of the uncertainties as to quantifying benefits,         suggests the practice should be considered apart from mission-critical         or transactional uses in the enterprise. That may change with time and         experience.</p>
<p>There are also open questions about data quality in terms of inputs to linked data, and about possibly erroneous semantics and ontologies guiding the linked connections. Operational uses should be kept off the table for now. As in physical networks, not all links perform well and not all are useful. Just as poor connections in physical networks get taken off-ledger or relegated to back-up use, so should poor linked data connections. Linked data should be understood and treated no differently than any other network of variable quality.</p>
<p>Such realism is important — for both internal and external linked         data advocates — to allow linked data to be applied in the right         venues at acceptable risk and with likely demonstrable benefits.         <a href="../553/confronting-misconceptions-with-adaptive-ontologies/"> Elsewhere</a> I have advocated an approach that builds on existing         assets; here I advocate a clear and smart understanding of where linked         data can best deliver network effects in the near term.</p>
<p>So, in the near term, the enterprise applications that best fit linked data&#8217;s promises and uncertainties include:</p>
<ul>
<li>Establishing frameworks for data federation</li>
<li>Business intelligence</li>
<li>Discovery</li>
<li>Knowledge management and knowledge resources</li>
<li>Reasoning and inference</li>
<li>Development of internal common language</li>
<li>Learning and adopting data-driven apps <a href="#ldl_10">[10]</a>, and</li>
<li>Staging and analysis for data cleaning.</li>
</ul>
<h3>A New Deputy Has Come to Town</h3>
<p>As in the Wild West, the new deputy marshal and his tin badge did not         guarantee prosperity. But a good marshal would deliver law and order.         And those are the preconditions for the town folk to take charge of         building their own prosperity.</p>
<p>Linked data is a practice for starting to bring order and connections         to your existing data. Once some order has been imposed, the framework         then becomes a basis for defining meanings and then gaining value from         those connections.</p>
<p>Once order has been gained, it is up to the good citizens of Data Gulch to deliver the prosperity. Broad participation and the network effect are one way to promote that aim. But success and prosperity still depend on intelligence and good policies and practices.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_1" name="ldl_1"></a> [1] I first put forward this linked         data aspect in <a style="font-style: italic;" href="../?p=447">What is Linked Data?</a>, dated June         23, 2008. I then formalized it in <a style="font-style: italic;" title="Permanent Link to Structure the World" rel="bookmark" href="../533/structure-the-world/">Structure the         World</a>, dated August 3, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_2" name="ldl_2"></a> [2] Paul Tearnen, 2006. &#8220;Integration in         the Network Economy,&#8221; <span style="font-style: italic;">Information         Management Special Reports</span>, October 2006. See <a href="http://www.information-management.com/specialreports/20061010/1064941-1.html"> http://www.information-management.com/specialreports/20061010/1064941-1.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_3" name="ldl_3"></a> [3] Salim K. Semy, Mark Linderman and         Mary K. Pulvermacher, 2003. &#8220;Information Management Meets the Semantic         Web,&#8221; <span style="font-style: italic;">DOD Report</span> by MITRE         Corporation, November 2003, 10 pp. See <a href="http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA460265&amp;Location=U2&amp;doc=GetTRDoc.pdf"> http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA460265&amp;Location=U2&amp;doc=GetTRDoc.pdf.</a></div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_4" name="ldl_4"></a> [4] On July 15, 2006, Dion Hinchcliffe         wrote, <a style="font-style: italic;" href="http://web2.socialcomputingjournal.com/web_20s_real_secret_sauce_network_effects.htm"> Web 2.0&#8217;s Real Secret Sauce: Network Effects</a>. He produced a couple         of useful graphics and expanded upon some earlier comments to the         <span style="font-style: italic;">Wall Street Journal</span>. Shortly         thereafter, on July 29, Story wrote his own post, <a href="http://blogs.sun.com/bblfish/entry/rdf_and_metcalf_s_law">RDF and         Metcalfe&#8217;s law</a>, as noted. I commented on July 30.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_5" name="ldl_5"></a> [5] James Hendler and Jennifer Golbeck,         2008. &#8220;Metcalfe&#8217;s Law, Web 2.0, and the Semantic Web,&#8221; in <span style="font-style: italic;">Journal of Web Semantics</span> 6(1):14-20, 2008.         See <a href="http://www.cs.umd.edu/%7Egolbeck/downloads/Web20-SW-JWS-webVersion.pdf"> http://www.cs.umd.edu/~golbeck/downloads/Web20-SW-JWS-webVersion.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_6" name="ldl_6"></a> [6] Robert Metcalfe, 2006. <span style="font-style: italic;">Metcalfe’s Law Recurses Down the Long Tail         of Social Networking</span>, see <a href="http://vcmike.wordpress.com/2006/08/18/metcalfe-social-networks/">http://vcmike.wordpress.com/2006/08/18/metcalfe-social-networks/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_7" name="ldl_7"></a> [7] See my <a title="Permanent Link to When is Content &lt;em&gt;&lt;u&gt;Coherent&lt;/u&gt;&lt;/em&gt;?" rel="bookmark" href="../450/when-is-content-coherent/"> When is Content <em>Coherent</em>?</a> posting of July 25, 2008.         &#8216;Coherence&#8217; is a frequent theme of my blog posts; see my <a href="../chronological-listing/">chronological         listing</a> for additional candidates.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_8" name="ldl_8"></a> [8] From David P. Reed, 2001. &#8220;The Law         of the Pack,&#8221; Harvard Business Review, February 2001, pp 23-4. For more         on Reed&#8217;s position, see Wikipedia&#8217;s entry on <a href="http://en.wikipedia.org/wiki/Reed%27s_law">Reed&#8217;s law</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_9" name="ldl_9"></a> [9] Andrew Odlyzko and Benjamin Tilly,         2005. <span style="font-style: italic;">A Refutation of Metcalfe&#8217;s Law         and a Better Estimate for the Value of Networks and Network         Interconnections</span>, personal publication; see <a href="http://www.dtc.umn.edu/%7Eodlyzko/doc/metcalfe.pdf">http://www.dtc.umn.edu/~odlyzko/doc/metcalfe.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ldl_10" name="ldl_10"></a> [10] <span style="font-weight: bold; font-style: italic;">Data-driven         applications</span> are the term we have adopted for modular, generic         tools that operate and present results to users based on the underlying         data structures that feed them. See further the discussion of         Structured Dynamics&#8217;s <a href="http://structureddynamics.com/products.html">products</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/837/the-law-of-linked-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Moving Beyond Linked Data</title>
		<link>http://www.mkbergman.com/802/moving-beyond-linked-data/</link>
		<comments>http://www.mkbergman.com/802/moving-beyond-linked-data/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 01:09:01 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[structured data]]></category>
		<category><![CDATA[UMBEL]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=802</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Moving Beyond Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/802/moving-beyond-linked-data/&amp;rft.language=English"></span>

A Technique is Neither a &#8216;Meme&#8217; nor a Philosophy
I have been a participant in an interesting series of         discussions recently: Whither goes &#8216;linked data&#8217;?
As I described to someone, I was clearly not a father to the idea of         &#8216;linked [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Moving Beyond Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/802/moving-beyond-linked-data/&amp;rft.language=English"></span>
<p><a href="http://en.wikipedia.org/wiki/The_Unbearable_Lightness_of_Being"><img style="border: 0px solid; width: 180px; height: 143px; float: left; margin-right: 10px;" title="The Unbearable Lightness of Being, by Milan Kundera" src="../wp-content/themes/ai3/images/2009Posts/090807_lightness_being.jpg" alt="The Unbearable Lightness of Being, by Milan Kundera" hspace="5" vspace="5" align="left" /></a></p>
<h2>A Technique is Neither a &#8216;Meme&#8217; nor a Philosophy</h2>
<p>I have been a participant in an interesting series of         discussions recently: Whither goes &#8216;linked data&#8217;?</p>
<p>As I described to someone, I was clearly not a father to the idea of &#8216;<a href="http://structureddynamics.com/linked_data.html">linked data</a>&#8217;, but I was handing out cigars pretty close to the birth. Chris Bizer and Richard Cyganiak were the innovators who first proposed the original project to the <a href="http://en.wikipedia.org/wiki/W3C">W3C</a> <a href="#beyond1">[1]</a>. (Thanks guys!)</p>
<p>From that point forward, now a bit over two and a half years ago, we have seen a massive increase in attention to and visibility of the idea of &#8216;linked data.&#8217; I take a small amount of reflected pride in having helped promote the idea with my early writings.</p>
<p>That visibility was well-deserved. After all, here was the concept:</p>
<ul>
<li>Expose your data in an accessible way on the Web</li>
<li>Use Web identifiers (URIs) as the means to uniquely identify that         data</li>
<li>Use <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF         &#8220;triples&#8221;</a> to describe the relationships between the data.</li>
</ul>
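<p>To make these three premises concrete, here is a library-free Python sketch that represents RDF-style triples as plain tuples. The example.org and example.net URIs are hypothetical; rdfs:label and owl:sameAs are standard W3C vocabulary terms used here only as illustration. A real deployment would use an RDF toolkit and a standard serialization rather than this toy structure.</p>

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # Web identifier (URI) of the thing described
    predicate: str  # URI naming the relationship
    obj: str        # another URI, or a literal value


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs minted by two different publishers for the same city.
city_a = "http://example.org/places/Berlin"
city_b = "http://example.net/geo/berlin"

graph = {
    Triple(city_a, RDFS_LABEL, "Berlin"),
    Triple(city_b, RDFS_LABEL, "Berlin"),
    Triple(city_a, OWL_SAME_AS, city_b),  # the link that connects the two datasets
}


def describe(triples, subject):
    """Return (predicate, object) pairs for one subject URI, sorted for stable output."""
    return sorted((t.predicate, t.obj) for t in triples if t.subject == subject)


for predicate, obj in describe(graph, city_a):
    print(predicate, obj)
```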
<p>Much other puffery got layered onto those ideas, but I think those premises are the key basis.</p>
<h3>Early Cracks in the Vision</h3>
<p>My first personal concern with where linked data was going dealt with         an absence of context or conceptual structure for how these new         datasets related to one another. I will not repeat those arguments         here; simply see many of my <a href="../chronological-listing/">blog postings</a> from the past two years or so.         Exposing millions of &#8220;things&#8221; was wonderful, but what did all of that         mean? How does one &#8220;thing&#8221; relate to another &#8220;thing&#8221;? Are some &#8220;things&#8221;         the same as or similar to other things? If nothing else, these concerns         stimulated the genesis of the <a href="http://umbel.org/">UMBEL</a> subject concept ontology, an         outcome for which I need to thank the community.</p>
<p>It would be petty of me to question the approach that got millions of data items exposed through linked data techniques. In fact, the richness we have today in exposed Web data objects comes solely from this linked data initiative. But, nonetheless, my guess is that even the most ardent linked data advocate would have a hard time finding a logical way to present the current linked data reality in context. We see the <a href="http://en.wikipedia.org/wiki/File:Lod-datasets_2009-07-14_colored.png">big bubble diagram</a> of available datasets, but, frankly, the positions of and relationships among the datasets appear somewhat arbitrary. We have lots of bubbles, but little meaning.</p>
<h3>The Constant is Transition</h3>
<p>The semantic Web was in serious crisis prior to linked data. It suffered from poor perception, little delivery, and unmet hype. Linked data at least began to show how exposed and properly characterized data can become interconnected.</p>
<p>For a couple of years now I have tried in various posts to         present linked data in a broader framework of structured and semantic         Web data.  I first tried to capture this         continuum in a diagram from <a href="../?p=391">July 2007</a>:</p>
<div class="center_ok" style="margin: 18px 0px;">
<table class="center_ok" style="text-align: left;" border="0">
<tbody>
<tr>
<td colspan="4"><img class="center_ok" style="width: 599px; height: 205px; margin-left: 15px; vertical-align: top;" src="../wp-content/themes/ai3/images/2007Posts/070720_web_transition.jpg" alt="Transition in Web Structure" /></td>
</tr>
<tr>
<td style="border-bottom: 1px solid; width: 150px; font-weight: bold; text-align: center;">Document Web</td>
<td style="border-bottom: 1px solid; width: 150px; font-weight: bold; text-align: center;" colspan="2">Structured Web</td>
<td style="border-bottom: 1px solid; width: 150px; font-weight: bold; text-align: center;">Semantic Web</td>
</tr>
<tr>
<td style="width: 150px;"></td>
<td style="width: 150px;"></td>
<td style="border-bottom: 1px solid; width: 150px; font-weight: bold; text-align: center;">Linked Data</td>
<td style="width: 150px;"></td>
</tr>
<tr>
<td>
<ul>
<li> <small>Document-centric</small></li>
<li> <small>Document resources</small></li>
<li> <small>Unstructured data and semi-structured data</small></li>
<li> <small>HTML<br />
</small></li>
<li> <small>URL-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 1993</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Structured data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>XML, JSON, RDF, etc.<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 2003</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Linked data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>RDF, RDF-S<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 2006<br />
</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Linked data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>RDF, RDF-S, OWL<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> ???<br />
</small></li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p>The point is not whether those earlier characterizations were &#8220;correct&#8221;, but that linked data should be properly seen as merely a natural step in an ongoing transition. IMO, we are progressing nicely along this spectrum.</p>
<h3>A Caricature of Itself</h3>
<p>Linked data is a set of techniques &#8212; nothing more &#8212; and         certainly not a philosophy or <a href="http://en.wikipedia.org/wiki/Meme">meme</a> (whatever the hell that         means). We have way too many breathy pontifications about &#8220;linked         data <span style="font-style: italic;">this</span>&#8221; and &#8220;linked data         <span style="font-style: italic;">that</span>&#8221; that frankly are         undercutting the usefulness of the practice and making it a caricature         of itself.</p>
<p>In the enterprise world we see similar attempts at marketing that         need to give everything a three-letter acronym. In this case, we have a         bunch of academics and researchers trying to act like market and         business gurus. All it is doing is confusing the marketplace and         hurting the practice.</p>
<p>The elevation of techniques or best practices into roles clearly beyond         their pay grade produces completely the opposite effect:  the idea         comes under question and ridicule. The logic and rationale for why we         should be following these best practices gets lost in the hyperbole. I         spend most of my time hitting the delete button on the mailing lists. I         fear what others new to these practices &#8212; that is, my company&#8217;s         customers and prospects &#8212; perceive when they look into this topic.</p>
<p>Linked data is useful and needed. But come on, folks, these are not         tribal or religious matters.</p>
<h3>Declaring Victory, and Moving On</h3>
<p>With <a href="http://en.wikipedia.org/wiki/Dbpedia">DBpedia</a> as the initial project vehicle, and through the way it nucleated other &#8220;linked&#8221; datasets, the linked data practice certainly went viral. Today, we have many millions of data items available in linked data form. This is unalloyed goodness.</p>
<p>I will continue to use the phrase &#8216;linked data&#8217; to refer to those         useful techniques noted in the opening. Actually, I think it is best to         think of linked data as a set of best practices, but by no means an end         unto itself.</p>
<p>Beyond linked data we need context, we need our data to be embedded and         related to interoperable ontologies, we need much better user         interfaces and attainability, and we need quality in our assertions and         use. These are issues that extend well beyond the techniques of linked         data and form the next set of challenges in gaining broader acceptance         for the semantic Web and the semantic enterprise.</p>
<p>Like most everything else in this world, there are real problems and real needs out there. Thankfully, we have mostly heard the end of the silliness about Web 3.0. Perhaps we can now also broaden our horizons beyond the useful techniques of linked data to tackle the next set of semantic challenges.</p>
<p>So, let me be the first to congratulate the community on a victory well         achieved! As for myself and my company, we will now focus our         attentions on the next tier of challenges. It is time to deprecate the rhetoric. Huzzah!</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="beyond1"></a>[1] For the record, in addition to Bizer and Cyganiak, the first         publication on the project, <a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkingOpenData.pdf">&#8220;Interlinking         Open Data on the Web&#8221;</a>, in the <span style="font-style: italic;">Proceedings Poster Track, ESWC2007</span>,         Innsbruck, Austria, June 2007, by Bizer, Tom Heath, Danny Ayers and         Yves Raimond, also noted the early contributions of Sören Auer, Orri         Erling, Frederick Giasson, Kingsley Idehen, Georgi Kobilarov, Stefano         Mazzocchi, Josh Tauberer, Bernard Vatant and Marc Wick.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/802/moving-beyond-linked-data/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Structure the World</title>
		<link>http://www.mkbergman.com/533/structure-the-world/</link>
		<comments>http://www.mkbergman.com/533/structure-the-world/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 03:23:03 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Bibliographic Knowledge Network]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[BKN]]></category>
		<category><![CDATA[data federation]]></category>
		<category><![CDATA[data-driven applications]]></category>
		<category><![CDATA[Description Logics]]></category>
		<category><![CDATA[Ontology]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[structWSF]]></category>
		<category><![CDATA[web oriented architecture]]></category>
		<category><![CDATA[web service]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=533</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structure the World&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Bibliographic Knowledge Network&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-03&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/533/structure-the-world/&amp;rft.language=English"></span>

Multiple Techniques and Data Structs can Make the Vision a Reality
Linked  data and subject and domain ontologies provide the organizing  framework. Techniques for converting, tagging and authoring structure  provide the content. In combination, we now have in hand the necessary  pieces to enable all of us to &#8220;structure the World.&#8221;
In this [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Structure the World&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Bibliographic Knowledge Network&amp;rft.subject=Linked Data&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.subject=Web-oriented Architecture&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-03&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/533/structure-the-world/&amp;rft.language=English"></span>
<p><a href="http://upload.wikimedia.org/wikipedia/commons/9/97/The_Earth_seen_from_Apollo_17.jpg"><img style="border: 0px solid; width: 250px; height: 250px; float: left; margin-right: 10px;" title="The &quot;Blue Marble&quot;: The Earth seen from Apollo 17.jpg from Wikipedia.org" src="../wp-content/themes/ai3/images/2009Posts/The_Earth_seen_from_Apollo_17_240px.jpg" alt="The &quot;Blue Marble&quot;: The Earth seen from Apollo 17.jpg from Wikipedia.org" hspace="5" vspace="5" align="left" /></a></p>
<h2>Multiple Techniques and Data Structs can Make the Vision a Reality</h2>
<p><a href="http://structureddynamics.com/linked_data.html">Linked  data</a> and subject and domain ontologies provide the organizing  framework. Techniques for converting, tagging and authoring structure  provide the content. In combination, we now have in hand the necessary  pieces to enable all of us to &#8220;structure the World.&#8221;</p>
<p>In this vision, the nature of the links or connections between data  need not be complicated to gain tremendous benefit. Similar to <a href="http://en.wikipedia.org/wiki/Metcalf%27s_law">Metcalfe&#8217;s Law</a> for  the increasing value of networks as more nodes (users) get added,  adding connections to existing data is a powerful force multiplier.</p>
<p>We can call this the <span style="font-weight: bold; font-style: italic;">Linked Data Law</span>: the value of a linked data network is proportional to the square of the number of links between data objects <a href="#structure1">[1]</a>. Further, if we purposefully include connective links where appropriate as we add more data (that is, nodes), this multiplier effect becomes even stronger.</p>
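<p>To make the &#8220;square of the number of links&#8221; concrete, here is a small worked illustration in Python. It is only a sketch: the proportionality constant <code>k</code> is an arbitrary placeholder, since the law speaks to relative growth rather than to any measured value.</p>

```python
# Illustration of the "Linked Data Law": network value V grows with the square
# of the number of links L between data objects. The constant k is an arbitrary
# placeholder; the law only concerns relative growth, not absolute value.
def network_value(links: int, k: float = 1.0) -> float:
    return k * links ** 2

base = network_value(100)  # a network with 100 links

# Doubling the number of links quadruples the value.
print(network_value(200) / base)  # 4.0

# Adding 10 new data objects (nodes), each with 3 connective links, adds 30
# links; adding the same nodes with no links would leave the value unchanged.
print(network_value(100 + 10 * 3) / base)  # 1.69
```

<p>The second case shows why purposeful linking matters: a 30% increase in links yields a 69% increase in value, while unlinked additions contribute nothing under this law.</p>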
<p><a href="http://structureddynamics.com/">Structured Dynamics</a> is dedicated to helping make this prospect real. Meaningful progress requires relatively few moving parts or techniques. Yet, because we sometimes shift our talk or focus from one part to another, we can lose context or sight of the overarching vision. The purpose of this article is to reset and recalibrate that overall vision.</p>
<h3><span style="font-style: italic;">The Vision</span>: Data Federation of  Any Desired Content</h3>
<p>The vision is to get all data and information to interoperate, regardless of legacy or form. Much of this data is already structured, either in databases or in simpler forms of data structs. Some of this information is unstructured or semi-structured, requiring extraction and tagging techniques. And new information is constantly being generated, which calls for better means to author it and stage it for interchange and interoperability.</p>
<p>No matter the provenance, all information has context and scope. As a chunk from here and a piece from there get added to our linked data mix, having the means to characterize what that data is about and how it can be meaningfully inter-related becomes crucial. Sometimes these contexts are informed by existing schema; sometimes they are not. But, in any case, it is the role of ontologies both to position these datasets within an &#8220;aboutness&#8221; framework and to help guide how the data can be described and related to other data. This part of the vision invokes semantics and coherent structures (schema or ontologies) for positioning and mapping datasets to one another.</p>
<p>As both the means for representing any extant data format and as the  means for describing these conceptual relationships or schema, RDF  provides the canonical data model. A single target representation and  common data model also means we can develop and design a smaller  universe of tools to operate and provide functionality over all of this  data. Indeed, because our RDF data model and its ontologies are so  richly structured, we can design our tools with generic functionality,  the specific operation and expression of which is based on the inherent  structure within the data and its relationships. This vision of  <span style="font-weight: bold; font-style: italic;">data-driven  apps</span> leads to extreme leverage, incredible flexibility, and  inherent &#8220;meshup&#8221; capabilities for tools.</p>
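<p>The idea of <em>data-driven apps</em> can be sketched in a few lines: a generic tool that never hard-codes knowledge of a particular dataset, but instead reads the datatype (range) of each property from the schema and picks a display widget accordingly. The schema, property names, ranges and widget names below are all hypothetical, chosen only for illustration.</p>

```python
# Hypothetical schema fragment: each property carries a label and a range
# (datatype). Property names and ranges are invented for illustration.
SCHEMA = {
    "name": {"label": "Name", "range": "xsd:string"},
    "founded": {"label": "Founded", "range": "xsd:date"},
    "location": {"label": "Location", "range": "geo:Point"},
    "homepage": {"label": "Web site", "range": "xsd:anyURI"},
}

# Generic mapping from datatype to display widget; no dataset-specific code.
WIDGETS = {"xsd:date": "timeline", "geo:Point": "map", "xsd:anyURI": "link"}

def build_view(record: dict, schema: dict) -> list:
    """Lay out any record using only the structure its schema describes."""
    view = []
    for prop, value in record.items():
        if prop not in schema:
            continue  # unknown properties are skipped rather than guessed at
        spec = schema[prop]
        view.append((spec["label"], WIDGETS.get(spec["range"], "text"), value))
    return view

record = {"name": "Example Org", "founded": "2001-01-01", "homepage": "http://example.org/"}
for row in build_view(record, SCHEMA):
    print(row)
```

<p>Pointing the same <code>build_view</code> function at a different dataset with a different schema changes what it displays without changing any code, which is the leverage the paragraph above describes.</p>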
<p>Further, because we use Web identifiers (<a href="http://en.wikipedia.org/wiki/URI">URIs</a>) for our data and concepts, and because we expose and access this linked data via the Web, we can use the proven and scalable architectures of the Web itself to design our systems. This <a href="../category/web-oriented-architecture-woa/"><span style="font-style: italic;"> Web-oriented architecture</span></a> (WOA) provides a completely decentralized and loosely coupled deployment model that works in settings ranging from public and open to private and proprietary, and applies to data and participants alike.</p>
<p>From the outset, it is essential to recognize that thousands of  contributors are enabling this vision. So, while Structured Dynamics  naturally uses its own tools and techniques to flesh out the various  parts of this vision below, realize there are many players and many  tools from which to choose <a href="#structure2">[2]</a>. For that is another aspect of this  vision that is quite powerful: providing choice and avoiding lock-in.</p>
<h3><span style="font-style: italic;">RDF</span>: The Canonical Data Model</h3>
<p>The core construct &#8212; or fulcrum, if you will &#8212; of the vision is the  <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> (Resource Description Framework) data model <a href="#structure3">[3]</a>. I have written  elsewhere on the <a style="font-style: italic;" title="Permanent Link to Advantages and Myths of RDF" rel="bookmark" href="../483/advantages-and-myths-of-rdf/">Advantages and Myths of  RDF</a>, which explains more precisely the advantages of that model.  RDF provides a common data model to which any external format or schema  can be converted and represented. It also provides a logic model and  basis for building vocabularies that can inform and drive generic  tools.</p>
<p>In the context of data interoperability, a critical premise is that a  single, canonical data model is highly desirable. Why?</p>
<p>Simply because of 2N v N<sup>2</sup>. That is, a single reference (&#8220;canon&#8221;) structure means that fewer tool variants and converters need be developed to talk to the myriad of data formats in the wild. With a canonical data model, talking to external sources and formats (N) only requires converters to and from the canonical form (2N). Without a canonical model, the <a href="http://en.wikipedia.org/wiki/Combinatorial_explosion">combinatorial explosion</a> of required pairwise format converters grows on the order of N<sup>2</sup> <a href="#structure4">[4]</a>.</p>
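<p>The arithmetic behind 2N versus N<sup>2</sup> is easy to check. This minimal sketch counts directed converters (A-to-B and B-to-A are separate programs), which gives exactly N(N&#8722;1) for the pairwise case; that is the quantity the text approximates as N<sup>2</sup>.</p>

```python
# Converter counts for N external data formats.
# With a canonical hub, each format needs one converter into the hub and one
# out of it: 2N. Without a hub, every ordered pair of distinct formats needs
# its own direct converter: N * (N - 1), which grows on the order of N^2.
def hub_converters(n: int) -> int:
    return 2 * n

def pairwise_converters(n: int) -> int:
    return n * (n - 1)

for n in (5, 20, 100):
    print(n, hub_converters(n), pairwise_converters(n))
# 5 10 20
# 20 40 380
# 100 200 9900
```

<p>At five formats the difference is modest; at one hundred formats the hub approach needs 200 converters instead of 9,900.</p>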
<p>Note, in general, such a canonical data model merely represents the  agreed-upon internal representation. It need not affect data transfer  formats. Indeed, in many cases, data systems employ quite different  internal data models from what is used for data exchange. Many, in  fact, have two or three favored flavors of data exchange such as XML,  JSON or the like. More on this is discussed in a section below.</p>
<p>As this diagram shows, then, we have a single internal representation that is the target for all data and format converters and upon which all tools operate. These tools are themselves expressed as Web services so that they may be distributed and conform to general WOA guidelines. In addition, there may be multiple external &#8220;hubs&#8221; that represent alternative data models or formats or schema conversions (say, for relational databases). So long as we have converters between these alternate &#8220;hubs&#8221; and our canonical RDF form, we can allow a thousand flowers to bloom:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090628_data_model_relationships.png"> <img class="center_ok" style="border: 0pt none;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/090628_data_model_relationships.png" border="0" alt="structWSF Data Model Relationships" width="600" height="364" /></a></div>
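<p>The hub-and-spoke arrangement in the diagram can be sketched as a converter registry. This is a minimal illustration under stated assumptions, not structWSF code: the canonical form is a plain list of (subject, predicate, object) tuples standing in for RDF, and the two formats (a JSON layout and a pipe-delimited layout) are hypothetical. Each format registers one reader into the hub and one writer out of it, and any-to-any conversion is composed through the hub.</p>

```python
import json

# Hypothetical hub-and-spoke converter registry. The canonical form here is a
# plain list of (subject, predicate, object) tuples standing in for RDF. Each
# external format registers one reader (into the hub) and one writer (out of it).
READERS = {}
WRITERS = {}

def register(fmt, reader, writer):
    READERS[fmt] = reader
    WRITERS[fmt] = writer

def convert(data, source_fmt, target_fmt):
    triples = READERS[source_fmt](data)  # source format -> canonical form
    return WRITERS[target_fmt](triples)  # canonical form -> target format

# Format 1: a JSON object keyed by record id, each record a dict of attributes.
def read_json(text):
    records = json.loads(text)
    return [(rid, attr, str(val)) for rid, rec in records.items() for attr, val in rec.items()]

def write_json(triples):
    records = {}
    for s, p, o in triples:
        records.setdefault(s, {})[p] = o
    return json.dumps(records, sort_keys=True)

# Format 2: pipe-delimited "subject|predicate|object" lines.
def read_pipes(text):
    return [tuple(line.split("|", 2)) for line in text.splitlines() if line]

def write_pipes(triples):
    return "\n".join("|".join(t) for t in triples)

register("json", read_json, write_json)
register("pipes", read_pipes, write_pipes)

doc = '{"rec1": {"name": "Sweet Tools", "type": "dataset"}}'
print(convert(doc, "json", "pipes"))
# rec1|name|Sweet Tools
# rec1|type|dataset
```

<p>Adding a third format means writing one more reader and one more writer, after which it can be converted to and from every format already registered.</p>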
<p>Other canonical forms could be advocated. Yet RDF has the logical basis to represent any data form and any schema or conceptual structure. It is based on a robust set of open standards, languages and tools. It may be serialized in many formats. It can be grounded in description logics and, in appropriate forms, reasoned over and expressed in vocabularies and schema suitable for the most complex of conceptual structures and semantics. RDF is the data model explicitly designed for the Web, which is clearly the global information basis for the foreseeable future.</p>
<p>For more than 30 years &#8212; since the widespread adoption of electronic  information systems by enterprises &#8212; the Holy Grail has been complete,  integrated access to all data. With the canonical RDF data model, that  promise is now at hand.</p>
<h3><span style="font-style: italic;">Conversion</span>: So Many Structs,  So Little Time</h3>
<p>Diversity is a truism of human communications as captured by the  biblical <a href="http://en.wikipedia.org/wiki/Tower_of_Babel">Tower of  Babel</a> and the many thousands of current <a href="http://en.wikipedia.org/wiki/Language">human languages</a>. Diversity  in data formats, serializations, notations and languages is a similar  truism. We term the expression of each of these varied forms of data a  <span style="font-style: italic;">struct</span>.</p>
<p>While an internal canonical representation of data makes sense for the reasons noted above, pragmatic information systems must recognize the inherent diversity and chaos of data in the real world. Attempts to find single representations or to impose standards by fiat have singularly failed. That will continue to be so, due in part to inertia and legacy, sunk investments, existing infrastructure, and the purposes for the data.</p>
<p>In pursuing a vision of data interoperability, then, conversion is the essential glue that binds our understanding to the data that exists now and the data that will exist.</p>
<h4>RDB-to-RDF</h4>
<p>Arguably the largest source of structured data is enterprise and government information systems, with the predominant data representation being the relational data model managed by relational schema. Much of this data is also cleaner and more mission-critical than data from other sources in the wild. Fortunately, there are many logical and conceptual affinities between the relational model and the RDF model <a href="#structure5">[5]</a>.</p>
<p>Just as there are many RDFizers for simpler forms of data structs (see next), there are also good ways to convert relational schema to RDF automatically. Given these overall conceptual and logical affinities, the W3C is also in the process of graduating an incubator group to an official working group, <a href="http://www.w3.org/2005/Incubator/rdb2rdf/WG-draft-charter/">RDB2RDF</a>, focused on methods and specifications for mapping relational schema to RDF.</p>
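<p>A minimal sketch conveys the basic idea of an automatic relational-to-RDF mapping: each row becomes a subject URI built from its key, each column becomes a predicate, and each non-NULL cell becomes a literal object. The base namespace and the URI patterns here are hypothetical; production tools such as D2RQ, and the specifications the RDB2RDF group is working toward, also handle foreign keys as links between resources, datatypes and custom mappings.</p>

```python
import sqlite3

# A minimal sketch of a "direct" relational-to-RDF mapping: each row becomes a
# subject URI, each column a predicate, each non-NULL cell a literal object.
# BASE and the URI patterns are hypothetical choices for illustration.
BASE = "http://example.org/db/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def table_to_triples(conn, table, key):
    # Identifiers cannot be bound as SQL parameters; the table name is assumed trusted.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        values = dict(zip(columns, row))
        subject = f"{BASE}{table}/{values[key]}"
        triples.append((subject, RDF_TYPE, f"{BASE}{table}"))  # row is an instance of its table
        for col, val in values.items():
            if val is not None:  # NULLs produce no triple
                triples.append((subject, f"{BASE}{table}#{col}", val))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada', 'London'), (2, 'Alan', NULL)")
for t in table_to_triples(conn, "person", "id"):
    print(t)
```

<p>Because the schema itself drives the mapping, the same function works unchanged on any table, which is why layering RDF over existing relational stores can be largely automated.</p>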
<p>Amongst all techniques covered in this paper, Structured Dynamics views  the layering of RDF ontologies over existing relational data stores as  one of the most promising and important. Given the advantages of RDF  for interoperability, this area should be a major emphasis of current  and new vendors and service providers.</p>
<h4>RDFizers</h4>
<p>Much data, however, resides in much smaller datasets and often for less  formal purposes than what is found in enterprise databases. Some of  this data is geared for exchange or standardization; much is emerging  from Web and Internet applications and uses; and much might be local or  personal in nature, such as simple lists or spreadsheets.</p>
<p>These simpler and more naïve data formats are well suited to conversion to RDF (&#8220;RDFizing&#8221;). In my original census about 18 months ago, as reported in <a style="font-style: italic;" title="Permanent Link to 'Structs': Naïve Data Formats and the ABox" rel="bookmark" href="../?p=471"> &#8216;Structs&#8217;: Naïve Data Formats and the ABox</a>, I listed about 90 converters. My most recent <a href="http://openstructs.org/resources/rdfizers">update</a> now lists about 150 converters, roughly two-thirds more <a href="#structure6">[6]</a>:</p>
<div style="margin: 15px; font-size: 10px;">
<table class="center_ok" style="text-align: left; margin-left: 0px; width: 90%;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top; width: 25%;">
<p style="font-weight: bold;">URN handlers (in addition to IRI and URI):</p>
<ul>
<li>DOI</li>
<li>LSID</li>
<li>OAI</li>
</ul>
<p style="font-weight: bold;">RDF</p>
<ul>
<li>Serialization formats:
<ul>
<li>N3</li>
<li>RDF/XML</li>
<li>Turtle</li>
</ul>
</li>
<li>Languages and ontologies:
<ul>
<li>AB Meta</li>
<li>Annotea</li>
<li>APML</li>
<li>AtomOWL</li>
<li>Bibliographic Ontology</li>
<li>Creative Commons</li>
<li>EXIF</li>
<li>FOAF</li>
<li> <a title="Java RDFizer" href="http://simile.mit.edu/wiki/Java_RDFizer">Java</a></li>
<li> <a title="Javadoc RDFizer" href="http://simile.mit.edu/wiki/Javadoc_RDFizer">Javadoc</a></li>
<li> <a title="MARC/MODS RDFizer" href="http://simile.mit.edu/wiki/MARC/MODS_RDFizer">MARC/MODS</a></li>
<li>Meta Standards</li>
<li>Music Ontology</li>
<li> <a title="http://cypher.monrai.com" rel="nofollow" href="http://cypher.monrai.com/">Natural Language</a></li>
<li>Open Archives Initiative Protocol for Metadata  Harvesting (OAI-PMH)</li>
<li>Open Geospatial</li>
<li>OWL</li>
<li>SIOC</li>
<li>SIOCT</li>
<li>SKOS</li>
<li>UMBEL</li>
<li>vCard</li>
<li> <a title="http://rhizomik.net/content/" href="http://rhizomik.net/content/">XML</a></li>
<li>Others</li>
</ul>
</li>
<li>(X)HTML pages</li>
<li>Embedded Microformats and GRDDL <a href="#structure7">[7]</a>:
<ul>
<li>DC</li>
<li>eRDF</li>
<li>geoURL</li>
<li>Google Base</li>
<li>hAudio</li>
<li>hCalendar</li>
</ul>
</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>Embedded Microformats and GRDDL (con&#8217;t):
<ul>
<li>hCard</li>
<li>hListing</li>
<li>hResume</li>
<li>hReview</li>
<li>HR-XML</li>
<li>Ning</li>
<li>RDFa</li>
<li>relLicense</li>
<li>SVG</li>
<li>XBRL</li>
<li>XFN</li>
<li>xFolk</li>
<li>XR-XML</li>
<li>XSLT</li>
</ul>
</li>
<li>Syndication Formats:
<ul>
<li>Atom</li>
<li>OPML</li>
<li>OCS</li>
<li>RSS 1.1</li>
<li>RSS 2.0</li>
<li>XBEL (for bookmarks)</li>
</ul>
</li>
<li>REST-style Web service APIs:
<ul>
<li>Amazon</li>
<li>Apple</li>
<li>Calais</li>
<li>CrunchBase</li>
<li>Del.icio.us</li>
<li>Digg</li>
<li>Discogs</li>
<li>Disqus</li>
<li>eBay</li>
<li>Facebook</li>
<li>Flickr</li>
<li>Freebase (MQL)</li>
<li>FriendFeed</li>
<li> <a title="http://www.w3.org/2000/10/swap/pim/fromGarmin.py" href="http://www.w3.org/2000/10/swap/pim/fromGarmin.py"> Garmin</a></li>
<li>Get Satisfaction</li>
<li>Google</li>
<li>Hoover&#8217;s</li>
<li>HTTP (raw)</li>
<li>ISBN DB</li>
<li>Last.fm</li>
<li>Library Thing</li>
<li>Magnolia</li>
</ul>
</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>REST-style Web service APIs (con&#8217;t):
<ul>
<li>Meetup</li>
<li>MusicBrainz</li>
<li>New York Times</li>
<li>New York Times Campaign Finance (NYTCF)</li>
<li>New York Times tags</li>
<li>Open Library</li>
<li>Open Social</li>
<li>Open Street</li>
<li>OpenLink (facets)</li>
<li>O&#8217;Reilly</li>
<li>Picasa</li>
<li>Radio Pop (BBC)</li>
<li>Rhapsody</li>
<li>Salesforce</li>
<li>Slideshare</li>
<li>Slidy</li>
<li>Technorati</li>
<li>They Work For You</li>
<li>Twine</li>
<li>Twitter</li>
<li> <a title="Weather RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=Weather_RDFizer&amp;action=edit"> Weather</a></li>
<li>Wikipedia</li>
<li>World Bank</li>
<li>Yahoo! Finance</li>
<li>Yahoo! Maps</li>
<li>Yahoo! Weather</li>
<li>YouTube</li>
<li>Zemanta</li>
</ul>
</li>
<li>Files (multitude of file formats and MIME types,  including):
<ul>
<li>audio (general)</li>
<li>BibJSON</li>
<li> <a title="BibTeX RDFizer" href="http://simile.mit.edu/wiki/BibTeX_RDFizer">BibTeX</a> and <a title="http://www.l3s.de/~siberski/bibtex2rdf/" href="http://www.l3s.de/%7Esiberski/bibtex2rdf/">others</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/rdfizers/" href="http://www.inf.unideb.hu/%7Ejeszy/rdfizers/">BitTorrent</a></li>
<li> <a title="http://www.mindswap.org/%7Erreck/excel2rdf.shtml" href="http://www.mindswap.org/%7Erreck/excel2rdf.shtml"> CSV</a></li>
<li> <a title="http://www.w3.org/2000/10/swap/util/fink2n3.py" href="http://www.w3.org/2000/10/swap/util/fink2n3.py">Fink</a></li>
<li> <a title="Flat RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=Flat_RDFizer&amp;action=edit"> Flat files</a></li>
<li> <a title="JPEG RDFizer" href="http://simile.mit.edu/wiki/JPEG_RDFizer">JPEG</a></li>
<li>JSON</li>
<li>images</li>
<li>MS Office</li>
<li>OpenOffice</li>
<li>Open Document Format</li>
<li> <a title="http://dev.w3.org/cvsweb/2001/palmagent" href="http://dev.w3.org/cvsweb/2001/palmagent">Palm</a></li>
<li> <a title="http://rdf123.umbc.edu/" href="http://rdf123.umbc.edu/">RDF123</a></li>
<li>video</li>
<li> <a title="http://www.mindswap.org/%7Erreck/excel2rdf.shtml" href="http://www.mindswap.org/%7Erreck/excel2rdf.shtml"> XLS</a></li>
<li>etc.</li>
</ul>
</li>
</ul>
</td>
<td style="width: 25%; vertical-align: top;">
<ul>
<li>Metadata extractors:
<ul>
<li> <a title="CRW RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=CRW_RDFizer"> CRW</a></li>
<li> <a title="DEB RDFizer" href="http://simile.mit.edu/mediawiki/index.php?title=DEB_RDFizer"> DEB</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/xmp/" href="http://www.inf.unideb.hu/%7Ejeszy/xmp/">EXIF</a></li>
<li> <a title="OCW RDFizer" href="http://simile.mit.edu/wiki/OCW_RDFizer">OCW</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/rdfizers/" href="http://www.inf.unideb.hu/%7Ejeszy/rdfizers/">RPM</a></li>
<li> <a title="http://www.inf.unideb.hu/~jeszy/xmp/" href="http://www.inf.unideb.hu/%7Ejeszy/xmp/">XMP</a></li>
</ul>
</li>
<li>Email formats:
<ul>
<li> <a title="Email RDFizer" href="http://simile.mit.edu/wiki/Email_RDFizer">EMail</a></li>
<li> <a title="http://www.w3.org/2000/10/swap/pim/lookout.py" href="http://www.w3.org/2000/10/swap/pim/lookout.py">Outlook</a></li>
<li> <a title="http://www.w3.org/2000/04/maillog2rdf/aboutMsg.py" href="http://www.w3.org/2000/04/maillog2rdf/aboutMsg.py">RFC822</a></li>
</ul>
</li>
<li>Version control and related systems:
<ul>
<li>Bugzilla</li>
<li> <a title="Jira RDFizer" href="http://simile.mit.edu/wiki/Jira_RDFizer">Jira</a></li>
<li> <a title="Maven POM RDFizer" href="http://simile.mit.edu/wiki/Maven_POM_RDFizer">POM</a></li>
<li> <a title="Subversion RDFizer" href="http://simile.mit.edu/wiki/Subversion_RDFizer">Subversion</a></li>
</ul>
</li>
<li>Other Web service frameworks:
<ul>
<li>BPEL</li>
<li>WSDL</li>
<li>XBRL</li>
<li>XBEL</li>
</ul>
</li>
<li>Data exchange formats:
<ul>
<li>iCalendar</li>
<li> <a title="http://www.w3.org/2000/10/swap/pim/ldif2n3.py" href="http://www.w3.org/2000/10/swap/pim/ldif2n3.py">LDIF</a></li>
<li>vCalendar</li>
<li>vCard</li>
</ul>
</li>
<li>Relational databases and related:
<ul>
<li> <a title="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/index.htm" href="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/index.htm"> D2RQ</a></li>
<li> <a title="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm" href="http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm"> D2RMAP</a></li>
<li> <a href="http://www.openlinksw.com/virtuoso/Whitepapers/html/rdf_views/virtuoso_rdf_views_example.html"> RDF Views</a></li>
</ul>
</li>
<li>Virtuoso VADs</li>
<li>OpenLink license files</li>
<li>Third party metadata extraction frameworks:
<ul>
<li> <a href="http://aperture.sourceforge.net/">Aperture</a></li>
<li>Spotlight</li>
</ul>
</li>
<li>Miscellaneous and other related converters:
<ul>
<li> <a title="http://rhizomik.net/redefer/" href="http://rhizomik.net/redefer/">MPEG-7/CS</a> → OWL</li>
<li>Random</li>
<li> <a title="http://rhizomik.net/redefer/" href="http://rhizomik.net/redefer/">XSD</a> → OWL</li>
</ul>
</li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p>Many of the sources above come from new and emerging Web-based APIs, which are also huge sources of content growth. Also note that alternative formats to RDF (<span style="font-style: italic;">e.g.</span>, microformats) and leading serializations and encodings (<span style="font-style: italic;">e.g.</span>, XML, JSON) also have many converter options.</p>
<p>For many typical naïve data structs, the data is represented as  attribute-value pairs, which easily lend themselves to conversion to  RDF as instance records <a href="#structure8">[8]</a>. See further the <span style="font-weight: bold; font-style: italic;">Authoring</span> section  below.</p>
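<p>To show how naturally attribute-value pairs map to instance records, here is a small RDFizer sketch for a CSV list. The CSV content, the <code>id</code> column convention and the namespace are hypothetical; every attribute becomes a predicate and every value a plain string literal, serialized as N-Triples. A fuller converter would map attributes to established vocabularies and emit URIs (rather than literals) for URL-valued attributes.</p>

```python
import csv
import io

# A hypothetical naive struct: a small CSV list of attribute-value records.
CSV_TEXT = """id,name,homepage
umbel,UMBEL,http://umbel.org/
bkn,Bibliographic Knowledge Network,http://bibkn.org/
"""

BASE = "http://example.org/records/"  # hypothetical namespace for instances

def rdfize_csv(text, id_column="id"):
    """Turn each row into an instance record: one subject, one triple per attribute."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row[id_column]
        for attribute, value in row.items():
            if attribute != id_column and value:  # skip the key and empty cells
                triples.append((subject, BASE + "attr/" + attribute, value))
    return triples

def to_ntriples(triples):
    """Serialize as N-Triples, treating every object as a plain string literal."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append("<" + s + "> <" + p + '> "' + literal + '" .')
    return "\n".join(lines)

print(to_ntriples(rdfize_csv(CSV_TEXT)))
```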
<h3><span style="font-style: italic;">Tagging</span>: The 80% Solution</h3>
<p>An apocryphal statistic is that 80% to 85% of all information resides in unstructured text <a href="#structure9">[9]</a>. Besides lacking recent validation, this claim, made a decade ago and often attributed to Merrill Lynch, also predates much of the Internet and the emergence of metadata and tagging. Nevertheless, it is true that written text content is ubiquitous and that the majority of it remains untagged or uncharacterized by any form of metadata.</p>
<p>While such information can be searched, a search only succeeds when the exact terms match. This means that related information, particularly in the form of conceptual relationships and inferencing, cannot be applied to untagged text content.</p>
<p>While information extraction &#8212; the basis by which tags for entities and concepts can be obtained &#8212; has been an active topic of research for two decades, only recently have we begun to see Web-scale extractors appear. Examples include Yahoo&#8217;s <a href="http://developer.yahoo.com/search/content/V1/termExtraction.html">term extractor</a>, Thomson Reuters&#8217; <a href="http://www.opencalais.com/">Calais</a>, and Google&#8217;s <a href="http://www.google.com/squared">Squared</a>, to name but a few.</p>
<p><a href="http://www.umbel.org/"><img style="border: 0px solid; margin-right: 10px; width: 104px; height: 24px; float: left;" src="../wp-content/themes/ai3/images/scones_100.png" alt="scones - Subject Concepts or Named Entities" align="left" /></a> In  Structured Dynamics&#8217; case we have been working on the <span style="font-weight: bold; font-style: italic;">scones</span> (Subject  Concepts Or Named EntitieS) extractor for quite a while. <span style="font-weight: bold; font-style: italic;">scones</span> uses rather  simple natural language processing (NLP) methods as informed by concept  ontologies and named entity (instance record) dictionaries to help  guide the extraction process. The co-occurrence of matches between  concepts and entities also aids the disambiguation task (though  additional modules may be invoked with alternative disambiguation  methods). In prototype forms, the resulting tags can be managed  separately or fed to user interfaces or re-injected back into the  original content as RDFa.</p>
<p>There are literally dozens of such extractors and services presently available on the Web, and many more available as open source or commercial products. Some are mostly algorithm-based, using machine-learning techniques or statistics, while others are gazetteer- or dictionary-driven.</p>
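<p>The dictionary-driven approach, including the use of co-occurring matches to disambiguate, can be sketched in a few lines. This is not the <em>scones</em> implementation; the gazetteer entries, concept identifiers and relatedness links are hypothetical, and the sketch only matches single-word terms (real extractors also handle multi-word phrases, inflection and much more).</p>

```python
import re

# Hypothetical gazetteer: surface form -> candidate concept or entity ids.
GAZETTEER = {
    "jaguar": ["concept:Jaguar_Animal", "entity:Jaguar_Cars"],
    "rainforest": ["concept:Rainforest"],
    "sedan": ["concept:Sedan"],
}

# Hypothetical relatedness links, as a concept ontology might supply them.
RELATED = {
    "concept:Jaguar_Animal": {"concept:Rainforest"},
    "entity:Jaguar_Cars": {"concept:Sedan"},
}

def tag(text):
    """Dictionary lookup, then co-occurrence voting to pick among candidates."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = {tok: GAZETTEER[tok] for tok in tokens if tok in GAZETTEER}
    # Terms with a single candidate act as unambiguous context.
    context = {cands[0] for cands in matches.values() if len(cands) == 1}
    tags = {}
    for term, candidates in matches.items():
        # Keep the candidate most related to the unambiguous context
        # (ties go to the first candidate listed in the gazetteer).
        tags[term] = max(candidates, key=lambda c: len(RELATED.get(c, set()).intersection(context)))
    return tags

print(tag("The jaguar prowled the rainforest at dusk."))
# {'jaguar': 'concept:Jaguar_Animal', 'rainforest': 'concept:Rainforest'}
print(tag("She parked the Jaguar sedan outside."))
# {'jaguar': 'entity:Jaguar_Cars', 'sedan': 'concept:Sedan'}
```

<p>The resulting tags could then be stored separately or written back into the source document, for example as RDFa, as the <em>scones</em> description above mentions.</p>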
<p>These systems will lead to rapid tagging of existing content and the  removal of some of the early &#8220;chicken-and-egg&#8221; challenges associated  with the semantic Web. These systems will also be combined with the  many existing bookmarking and tagging services.</p>
<p>So, just as we will see federation and interoperability of conventional  data, we will also see linkages to relevant and supporting text content  accompanying it. This combination, in turn, will also lead to richer  browsing and discovery experiences.</p>
<h3><span style="font-style: italic;">Authoring</span>: The Neglected Third  Leg of the Stool</h3>
<p>In addition to <span style="font-style: italic;">conversion</span> and <span style="font-style: italic;">tagging</span>, <span style="font-style: italic;">authoring</span> is the third leg of the stool for exposing structured data. It is the neglected leg of the structured content stool, yet an important one for making it easy to expose datasets as RDF linked data.</p>
<p>One of the reasons for the proliferation of data structs has been the interest in finding notations and conventions for easier reading and authoring of small datasets. Literally hundreds of <a href="http://en.wikipedia.org/wiki/Lightweight_markup_language">various</a> formats have been proposed over the decades for conveying lightweight data structures. Most have been proprietary or limited to specific domains or users. Some, such as <a href="http://en.wikipedia.org/wiki/Fielded_text">fielded text</a>, <a href="http://www.zope.org/Documentation/Articles/STX">structured text</a>, <a href="http://en.wikipedia.org/wiki/Simple_Declarative_Language">simple declarative language</a> (SDL), or more recently <a href="http://en.wikipedia.org/wiki/YAML">YAML</a> or its simpler cousin <a href="http://en.wikipedia.org/wiki/JSON">JSON</a>, have become more widely adopted and supported by formal specifications, tools or APIs. JSON, especially, is a preferred form for Web 2.0 applications.</p>
<p>What has been less clear or intuitive in these forms, again mostly  based on an attribute-value pair orientation, is how to adequately  relate them to a more capable data model, such as RDF. In JSON or YAML,  for example, the notations include the concepts of objects, arrays and  datatypes (among other conventions). Other structures lack even these  constructs.</p>
<p>To take the case of JSON as it relates to RDF, there are a couple of efforts, from <a href="http://n2.talis.com/wiki/RDF_JSON_Specification">Talis</a> and <a href="http://www.gbv.de/wikis/cls/RDF_in_JSON">GBV</a>, to define representation conventions for serializing RDF. An idea was floated for an RDF version of JSON called <a href="http://lists.w3.org/Archives/Public/semantic-web/2007Jul/0323.html">RDFON</a>, which has since evolved into the <a href="http://www.urf.name/">TURF</a> approach. <a href="http://jdil.org/">JDIL</a> (JSON data integration layer) describes how to add namespaces to JSON to enable encoding RDF. <a href="http://jibbering.com/rdf-parser/">Jim Ley</a>, <a href="http://www.kanzaki.com/works/2006/misc/0308turtle.html">Kanzaki Masahide</a> and <a href="http://librdf.org/rasqal/roqet.html">Dave Beckett</a> (likely among others) have written simple and straightforward RDF and <a href="http://www.dajobe.org/2004/01/turtle/">Turtle</a> parsers and converters for JSON. Still further examples are Beckett&#8217;s <a href="http://triplr.org/">Triplr</a> and <a href="http://www.uni-leipzig.de/">Sören Auer</a>&#8217;s <a href="http://aksw.org/">AKSW</a> <a href="http://triplify.org/Overview">Triplify</a> lightweight conversion services, which involve many different formats.</p>
<p>Because JSON is easily readable, can drive many Web 2.0 applications  and widgets, and lends itself to fast conversions and tools in various  scripting languages, Structured Dynamics was commissioned by the  <a href="http://bibkn.org/">Bibliographic Knowledge Network</a> (BKN) to  formalize a BibJSON specification suitable for <a href="http://en.wikipedia.org/wiki/BibTeX">BibTeX</a>-like data records and  citations with an extensible schema to be converted to RDF.</p>
<p>The emerging result of that BibJSON effort will be published shortly. The specification includes conventions and vocabularies for creating bibliographic and citation instance records, for specifying structural schema, and for creating linkage files between the attributes in the record files and existing and new schema. BibJSON is itself grounded in <span style="font-style: italic;">irON</span>, an instance record and object notation developed by Structured Dynamics that can be serialized as JSON (called <span style="font-style: italic;">irJSON</span>), XML (called <span style="font-style: italic;">irXML</span>) or comma-separated values (CSV files, called <span style="font-style: italic;">commON</span>).</p>
<p>The purpose of these notations and serializations is to provide easier authoring environments and scripting support for RDF-ready datasets. This approach has the advantage of shielding most users from the nuances and verbosity of RDF (though the N3 serialization also works well).</p>
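<p>The separation between record files and linkage files can be sketched as follows. The JSON layout and field names below are invented for illustration and are <em>not</em> the actual irJSON or BibJSON syntax; only the Dublin Core terms URIs are real vocabulary. Authors work with short local attribute names, while the linkage maps each one to a property URI, and any attribute without a linkage is reported rather than silently exported.</p>

```python
import json

# Hypothetical record file and linkage file. These field names are invented
# for illustration and are NOT the actual irJSON or BibJSON syntax.
RECORD_FILE = json.loads("""
{"records": [
  {"id": "smith2009", "title": "An Example Paper", "author": "Jane Smith",
   "year": "2009", "pages": "1-10"}
]}
""")

# Linkage: local attribute name -> property URI in an existing vocabulary
# (here, Dublin Core terms).
LINKAGE = {
    "title": "http://purl.org/dc/terms/title",
    "author": "http://purl.org/dc/terms/creator",
    "year": "http://purl.org/dc/terms/issued",
}

BASE = "http://example.org/bib/"  # hypothetical namespace for records

def records_to_triples(record_file, linkage):
    """Map each record attribute through the linkage; report unlinked attributes."""
    triples, unlinked = [], set()
    for record in record_file["records"]:
        subject = BASE + record["id"]
        for attribute, value in record.items():
            if attribute == "id":
                continue
            if attribute in linkage:
                triples.append((subject, linkage[attribute], value))
            else:
                unlinked.add(attribute)  # needs a linkage before it can be exported
    return triples, unlinked

triples, unlinked = records_to_triples(RECORD_FILE, LINKAGE)
print(len(triples), unlinked)  # 3 {'pages'}
```

<p>Keeping the linkage in its own file means the same records can later be re-linked to a different or richer schema without touching the authored data.</p>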
<p>The design and development of commON was especially geared to using  spreadsheets as authoring environments that would enable easy creation  of instance record tables or simple hierarchical or outline structures.  For example, here is a sample portion of <a href="../new-version-sweet-tools-sem-web/"><span style="color: #990000; font-weight: bold;"> Sweet Tools</span></a> specified in a  spreadsheet using the commON notation:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090801_swt_spreadsheet.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 379px;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/090801_swt_spreadsheet.png" alt="Sweet Tools Sample Spreadsheet" width="2406" height="1521" /></a></div>
<p>Once the philosophy and role of naïve data structs is embraced &#8212; with an appreciation of the many converters now available or easily  written for translating to RDF &#8212; it becomes easier to determine data forms  appropriate to the tools and natural work flow of the users and tasks at  hand. Under this mindset, the role of RDF is to be the eventual  conversion target, but not necessarily what is used for intermediate work  tasks, and in particular not for authoring.</p>
<h3>Getting it All Organized</h3>
<p>OK, so now all of this stuff is converted, tagged or authored. How does  it relate? What is the relation of one dataset to another dataset? Is  there a context or framework for laying out these conceptual roadmaps?</p>
<p><a href="http://www.umbel.org/"><img style="border: 0px solid; margin-right: 10px; width: 100px; height: 50px; float: left;" src="../wp-content/themes/ai3/images/umbel_logo_100.png" alt="UMBEL (Upper Mapping and Binding Exchange Layer)" align="left" /></a> Two years ago, as we looked at the state of RDF and the incipient semantic Web as promised via linked data, we saw that such a specific framework was lacking. (Though higher-level ontologies existed, their complexity or design made them ill-suited to these purposes.) It was at that time that <a href="http://fgiasson.com/blog">Frédérick Giasson</a> and I began to formulate the <a href="http://umbel.org/intro.html">UMBEL</a> (<em>Upper Mapping and Binding Exchange Layer</em>) ontology, which eventually led to our more formal business partnership and Structured Dynamics.</p>
<p>What we sought to achieve with UMBEL was a coherent reference framework  of about 20,000 subject concepts, connected and acting like  constellations in the information sky for orienting content and new  datasets. At the same time, we wanted to create a general vocabulary  and approach that would lend themselves to creation of domain-specific  ontologies, which would also naturally tie in and inter-relate to the  more general UMBEL structure.</p>
<p>This objective was achieved, though UMBEL deserves an upgrade to OWL 2 and some other pending improvements. A number of domain ontologies have been created and now relate to UMBEL. So, rather than being an end in itself, UMBEL was one of the necessary infrastructure pieces to help make the vision herein a reality.</p>
<p>Similar approaches may be taken by others with new domain ontologies  based on the UMBEL vocabulary with tie-in as appropriate to existing  subject concepts, or by mapping to the existing UMBEL structure.</p>
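<p>Why a shared reference structure helps relate datasets can be shown with a toy example. In this sketch, two datasets never link to each other directly; each only maps its local classes to reference concepts, and the broader-concept hierarchy then reveals how they relate. The concept names, the mappings and the &#8220;most specific shared concept&#8221; rule are all assumptions for illustration, not actual UMBEL subject concepts or UMBEL&#8217;s own method.</p>

```python
# Hypothetical reference concept hierarchy (concept -> broader concept),
# standing in for a subject concept structure such as UMBEL. Names are invented.
BROADER = {
    "Musician": "Person",
    "Author": "Person",
    "Person": "Agent",
    "Organization": "Agent",
}

# Hypothetical mappings from each dataset's local classes to reference concepts.
DATASET_A = {"a:Songwriter": "Musician"}
DATASET_B = {"b:Novelist": "Author", "b:Publisher": "Organization"}

def ancestors(concept):
    """The concept followed by its chain of broader concepts."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain

def common_concept(local_a, local_b):
    """Most specific reference concept shared by two datasets' local classes."""
    broader_b = set(ancestors(DATASET_B[local_b]))
    for concept in ancestors(DATASET_A[local_a]):
        if concept in broader_b:
            return concept
    return None

print(common_concept("a:Songwriter", "b:Novelist"))   # Person
print(common_concept("a:Songwriter", "b:Publisher"))  # Agent
```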
<p>Of course, UMBEL is not an absolute prerequisite for the vision herein. However, insofar as users desire to see multiple datasets inter-related, including the use of existing public Web data, something akin to UMBEL and related domain ontologies will be necessary to provide a similar roadmap.</p>
<h3>Making it All Available</h3>
<p>The parts and techniques discussed so far pertain almost exclusively to data and content. But the structures so created can now inform data-driven applications, which must also be deployed. To do so, Structured Dynamics is committed to what is known as a <a href="../category/web-oriented-architecture-woa/"><em>Web-oriented architecture</em></a> (WOA):</p>
<div style="margin-left: 40px; margin-bottom: 15px;"><a href="http://en.wikipedia.org/wiki/Web_Oriented_Architecture">WOA</a> =  <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">SOA</a> +  <a href="http://en.wikipedia.org/wiki/World_Wide_Web">WWW</a> +  <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST</a></div>
<p>WOA is a subset of the <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">service-oriented architectural</a> style, wherein discrete functions are packaged into modular and shareable elements (&#8220;services&#8221;) that are made available in a distributed and loosely coupled manner. WOA generally uses the representational state transfer (REST) architectural style defined by <a href="http://en.wikipedia.org/wiki/Roy_Fielding">Roy Fielding</a> in his 2000 <a href="http://www.ics.uci.edu/%7Efielding/pubs/dissertation/top.htm">doctoral thesis</a>; Fielding is also one of the principal authors of the <a href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">Hypertext Transfer Protocol</a> (HTTP) specification.</p>
<p>REST provides principles for how resources are defined, addressed and used through simple interfaces, without additional messaging layers such as <a href="http://en.wikipedia.org/wiki/SOAP">SOAP</a> or <a href="http://en.wikipedia.org/wiki/Remote_procedure_call">RPC</a>. The principles are couched within a generalized architectural style and are not limited to the Web, though they are a foundation of it.</p>
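<p>The following is a generic, in-process Python sketch of REST&#8217;s uniform interface. It is not structWSF code. Each resource is named by a URI path, it is manipulated only through the standard HTTP methods, and every answer carries an HTTP status. The in-memory <code>store</code> and the <code>/datasets/42</code> path are hypothetical, and no network server is involved.</p>

```python
from __future__ import annotations

from http import HTTPStatus

# Hypothetical in-memory store: resource URI path -> current representation.
store: dict[str, str] = {}


def handle(method: str, path: str, body: str | None = None) -> tuple[HTTPStatus, str | None]:
    """Apply one HTTP method to the resource named by `path`.

    The URI alone identifies the resource and the standard verb says what to
    do with it, so no extra envelope (as in SOAP or RPC) is needed.
    """
    if method == "GET":  # read
        if path in store:
            return HTTPStatus.OK, store[path]
        return HTTPStatus.NOT_FOUND, None
    if method == "PUT":  # create or replace the representation at this URI
        created = path not in store
        store[path] = body or ""
        return (HTTPStatus.CREATED if created else HTTPStatus.OK), None
    if method == "DELETE":  # delete
        if path not in store:
            return HTTPStatus.NOT_FOUND, None
        del store[path]
        return HTTPStatus.NO_CONTENT, None
    return HTTPStatus.METHOD_NOT_ALLOWED, None


status, _ = handle("PUT", "/datasets/42", "description of dataset 42")
print(int(status))  # 201
```

<p>Using <code>PUT</code> for creation keeps the client in control of the resource&#8217;s URI. A service that mints its own identifiers would typically accept <code>POST</code> on a collection instead.</p>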
<p><a href="http://openstructs.org/"><img style="border: 0px solid; margin-right: 5px; width: 150px; height: 36px; float: left;" src="../wp-content/themes/ai3/images/structWSF_150.png" alt="structWSF Web Services Framework" align="left" /></a>Within this design we need a suite of generic functions and tools that are driven by the structure of the available datasets. The deployment vehicle we have implemented to provide this WOA design is <a href="http://openstructs.org/">structWSF</a> <a href="#structure10">[10]</a>.</p>
<p>structWSF is a platform-independent Web services framework for  accessing and exposing structured RDF data. Its central organizing  perspective is that of the dataset. These datasets contain instance  records, with the structural relationships amongst the data and their  attributes and concepts defined via ontologies (schema with  accompanying vocabularies). The master or controlling Web service in  the framework is the module for granting access and use rights to  datasets based on permissions.</p>
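<p>The post does not spell out structWSF&#8217;s permission model. The hypothetical Python sketch below only illustrates the general idea: one controlling service grants per-dataset CRUD rights, and every other service checks with it before acting. The <code>Access</code> record, the requester names and the dataset IRI are all invented for illustration.</p>

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """CRUD rights one requester holds on one dataset (hypothetical model)."""
    create: bool = False
    read: bool = False
    update: bool = False
    delete: bool = False


# Hypothetical registry kept by the controlling service: (requester, dataset IRI) -> rights.
REGISTRY: dict[tuple[str, str], Access] = {
    ("alice", "http://example.org/datasets/wine/"): Access(read=True, update=True),
    ("bob", "http://example.org/datasets/wine/"): Access(read=True),
}

ACTIONS = ("create", "read", "update", "delete")


def is_allowed(requester: str, dataset: str, action: str) -> bool:
    """Return True if `requester` may perform `action` on `dataset`.

    Unknown requesters or datasets get no rights at all (deny by default).
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    rights = REGISTRY.get((requester, dataset), Access())
    return getattr(rights, action)
```

<p>Denying by default means a missing registry entry never silently exposes a dataset. A search or export service would call <code>is_allowed</code> before touching any records.</p>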
<p>The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The initial structWSF framework comes packaged with a baseline set of about a dozen Web services covering CRUD, browse, search, and import and export. More services can readily be added to the system.</p>
<p>All Web services are exposed via APIs and SPARQL endpoints. Each  request to an individual Web service returns an HTTP status and a  document of resultsets (if the query result is not null). Each results  document can be serialized in many ways, and may be expressed as either  RDF or pure XML.</p>
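<p>One request can be answered as RDF or as plain XML from the same resultset. The Python sketch below shows one way to do that by choosing a serializer from the requested media type and returning an HTTP status with the document. The media types, the <code>resultset</code>/<code>record</code>/<code>attribute</code> XML layout, and the choice of 204 for an empty result are assumptions for illustration, not structWSF&#8217;s actual formats. The sketch also compares the <code>Accept</code> value as an exact string and ignores q-values.</p>

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject IRI, predicate IRI, literal) triples as N-Triples."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes, double quotes and newlines inside the literal.
        lit = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{lit}" .')
    return "\n".join(lines) + "\n"


def to_plain_xml(triples: list[tuple[str, str, str]]) -> str:
    """Serialize the same triples as hypothetical 'pure XML' records grouped by subject."""
    root = ET.Element("resultset")
    records: dict[str, ET.Element] = {}
    for s, p, o in triples:
        if s not in records:
            records[s] = ET.SubElement(root, "record", uri=s)
        ET.SubElement(records[s], "attribute", uri=p).text = o
    return ET.tostring(root, encoding="unicode")


# Assumed media types; the real framework's list may differ.
SERIALIZERS = {"application/n-triples": to_ntriples, "text/xml": to_plain_xml}


def respond(triples, accept: str):
    """Return (HTTP status, media type, body) for a resultset and an Accept value."""
    if accept not in SERIALIZERS:
        return 406, None, None  # Not Acceptable: no serializer for this media type
    if not triples:
        return 204, None, None  # No Content: nothing to serialize (assumed convention)
    return 200, accept, SERIALIZERS[accept](triples)
```
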
<p>In its initial release, structWSF has direct interfaces to the <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/">Virtuoso</a> RDF triple store (via ODBC, and later HTTP) and the <a href="http://lucene.apache.org/solr/">Solr</a> faceted, full-text search engine (via HTTP). However, structWSF has been designed to be fully platform-independent. The framework is open source (Apache 2 license) and designed for extensibility.</p>
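<p>Because Solr is reached over HTTP, a faceted full-text query comes down to a URL. The sketch below only builds such a URL with the Python standard library and does not send it. <code>q</code>, <code>rows</code>, <code>wt</code>, <code>facet</code> and <code>facet.field</code> are standard Solr request parameters. The base URL and the field names <code>type</code> and <code>dataset</code> are assumptions, not structWSF&#8217;s actual schema.</p>

```python
from urllib.parse import urlencode


def solr_facet_url(base: str, text: str, facet_fields: list, rows: int = 10) -> str:
    """Build a Solr select URL for a full-text query with field facets.

    `base` is the select handler of an assumed Solr install, e.g.
    "http://localhost:8983/solr/select"; `facet_fields` are index field names.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", f) for f in facet_fields]
    return base + "?" + urlencode(params)


print(solr_facet_url("http://localhost:8983/solr/select", "red wine", ["type", "dataset"]))
# http://localhost:8983/solr/select?q=red+wine&rows=10&wt=json&facet=true&facet.field=type&facet.field=dataset
```
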
<h3>No End in Sight</h3>
<p>Like all visions, this one has many aspects and allows many improvements. It is definitely a work-in-progress with no end in sight.</p>
<p>But meaningful movement embracing the full scope of this vision is doable today. Structured Dynamics welcomes <a href="mailto:mike%20at%20structureddynamics%20dot%20com">inquiries</a> regarding any of these aspects, improvements to them, or their application to your specific needs and problems.</p>
<p>We also welcome you to come back and visit our blogs (Fred&#8217;s is found <a href="http://fgiasson.com/blog">here</a>). We try to speak to various aspects of this vision in all of our posts and are pleased to share our experience and insights as we gain them.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure1" name="structure1"></a> [1] Metcalfe&#8217;s law states  that the value of a telecommunications network is proportional to the  square of the number of users of the system (n<sup>2</sup>), where the  linkages between users (nodes) exist by definition. For information  bases, the data objects are the nodes. Linked data works to add the  connections between the nodes. We can thus modify the original sense to  become the Linked Data Law: the value of a linked data network is  proportional to the square of the number of links between the data  objects. I first presented this formulation about a year ago in  <a style="font-style: italic;" href="../?p=447">What is Linked Data?</a></div>
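<p>Because the Linked Data Law is a proportionality, only ratios are meaningful. Write the value of a linked data network with L links as k times L squared, where k is an unknown constant. The constant then cancels, and doubling the links quadruples the value:</p>

```latex
\begin{aligned}
V_{\mathrm{LD}}(L) &= k\,L^{2} \\
\frac{V_{\mathrm{LD}}(2L)}{V_{\mathrm{LD}}(L)} &= \frac{k\,(2L)^{2}}{k\,L^{2}} = 4
\end{aligned}
```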
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure2" name="structure2"></a> [2] This piece introduces for  the first time a couple of efforts-in-progress by Structured Dynamics.  For a general tools listing, see my own <a href="../new-version-sweet-tools-sem-web/"><span style="color: #990000; font-weight: bold;"> Sweet Tools</span></a> listing of about 800 semantic Web and -related  tools.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure3" name="structure3"></a> [3] As quoted in <a style="font-style: italic;" href="http://www.math.nyu.edu/%7Ecrorres/Archimedes/Lever/LeverQuotes.html">The Lever</a>, &#8220;Archimedes, however, in writing to King Hiero, whose friend and near relation he was, had stated that given the force, any given weight might be moved, and even boasted, we are told, relying on the strength of demonstration, that if there were another earth, by going into it he could remove this.&#8221; from <a href="http://www.utexas.edu/depts/classics/chaironeia/">Plutarch</a> (<span style="font-style: italic;">c.</span> 45-120 <span>AD</span>) in the <a href="http://classics.mit.edu/Plutarch/marcellu.html"><em>Life of Marcellus</em></a>, as translated by <a href="http://www.newadvent.org/cathen/05167b.htm">John Dryden</a> (1631-1700).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure4" name="structure4"></a> [4] The canonical data model  is especially prevalent in <a title="Enterprise application integration" href="http://en.wikipedia.org/wiki/Enterprise_application_integration">enterprise application  integration</a>. An interesting animated visualization of the canonical  data model may be found at: <a href="http://soa-eda.blogspot.com/2008/03/canonical-data-model-visualized.html"> http://soa-eda.blogspot.com/2008/03/canonical-data-model-visualized.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure5" name="structure5"></a> [5] An excellent piece on those relations was written by Andrew Newman a bit over a year ago; see Andrew Newman, 2007. &#8220;A Relational View of the Semantic Web,&#8221; published on <a href="http://xml.com/">XML.com</a>, March 14, 2007; <a href="http://www.xml.com/pub/a/2007/03/14/a-relational-view-of-the-semantic-web.html"> http://www.xml.com/pub/a/2007/03/14/a-relational-view-of-the-semantic-web.html</a>. RDF can be modeled relationally as a single table with three columns corresponding to the <span style="font-style: italic;">subject</span>-<span style="font-style: italic;">predicate</span>-<span style="font-style: italic;">object</span> triple. Conversely, a relational table can be modeled in RDF with the <em>subject</em> <a href="http://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">IRI</a> derived from the primary key or a blank node, the <em>predicate</em> from the column identifier, and the <em>object</em> from the cell value. Because of these affinities, it is also possible to store RDF data models in existing relational databases. (In fact, most RDF &#8220;triple stores&#8221; are relational database systems with a tweak, sometimes as &#8220;quad stores&#8221; where the fourth element is the <em>graph</em>.) Moreover, these affinities mean that RDF stored in this manner can take advantage of the accumulated knowledge of RDBMS and SQL query optimization.</div>
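<p>Both directions of this affinity fit in a small Python sketch that uses the standard library&#8217;s <code>sqlite3</code> as a stand-in relational database. A relational table is turned into triples (subject IRI from the primary key, predicate from the column name, object from the cell value), and the triples are stored in a single three-column table. The <code>person</code> table, its rows and the <code>http://example.org/</code> IRI scheme are hypothetical. A quad store would add a fourth <code>graph</code> column.</p>

```python
import sqlite3

BASE = "http://example.org/"  # assumed base IRI for minted identifiers

# A relational table with hypothetical example rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.executemany("INSERT INTO person VALUES (?, ?, ?)", [(1, "Alice", "Paris"), (2, "Bob", "Oslo")])

# The 'single table with three columns' view of RDF.
db.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")


def table_to_triples(conn, table, key):
    """Yield (subject, predicate, object): subject from the primary key,
    predicate from the column name, object from the cell value.
    The table name is interpolated directly, so it must be trusted."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:
                yield subject, f"{BASE}{table}#{col}", str(value)


# Materialize the triples first, then load them into the triple table.
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", list(table_to_triples(db, "person", "id")))

# Query the triple table much as a (very naive) triple store would.
for (name,) in db.execute(
        "SELECT object FROM triples WHERE predicate = ? ORDER BY object", (f"{BASE}person#name",)):
    print(name)  # prints Alice, then Bob
```
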
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure6" name="structure6"></a> [6] The largest source for  RDFizers, which it calls Sponger cartridges, is from <a href="http://www.openlinksw.com/">OpenLink Software</a> in relation to its  <a href="http://www.openlinksw.com/virtuoso/">Virtuoso</a> universal  server. Most of its converters use XSLT stylesheets to translate to  RDF, but the system has other conversion capabilities as well. Two  additional OpenLink resources are a <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/ClickableVirtSpongerCloud"> clickable diagram</a> of converters and relationships with links and an  online storehouse of <a href="http://github.com/openlink/Virtuoso-RDFIzer-Mapper-Scripts/tree/master"> available XSLT converters</a>. In addition, two other sources &#8212; the  W3C&#8217;s Semantic Web wiki with <a href="http://esw.w3.org/topic/ConverterToRdf?highlight=%28converter%29">converter  listings</a> and MIT&#8217;s Simile program and <a href="http://simile.mit.edu/wiki/RDFizers">listing of RDFizers</a> &#8212; have a  rich set of listings. Note that many of the categories shown on the table also have multiple  sources of converters, so that the absolute number of converters has  also grown faster than the unique formats supported.</div>
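<p>Most of these converters are XSLT stylesheets. Python&#8217;s standard library has no XSLT processor, so the sketch below substitutes ElementTree to show the same idea: a format-specific mapping from one XML dialect to triples. It follows the spirit of an RDFizer but is not an actual Sponger cartridge. The catalog dialect and the subject IRI pattern are hypothetical. The predicates come from the Dublin Core elements namespace.</p>

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace

# Hypothetical source dialect: a tiny product catalog.
SOURCE = """
<catalog>
  <product sku="A-100">
    <name>Widget</name>
    <maker>Acme</maker>
    <color>blue</color>
  </product>
</catalog>
"""

# The converter's dialect-specific mapping: source element -> RDF predicate.
# Elements with no mapping (here <color>) are simply skipped.
MAPPING = {"name": DC + "title", "maker": DC + "creator"}


def rdfize(xml_text: str):
    """Yield (subject IRI, predicate IRI, literal) triples for each <product>."""
    root = ET.fromstring(xml_text.strip())
    for product in root.iter("product"):
        subject = "http://example.org/product/" + product.get("sku")
        for child in product:
            predicate = MAPPING.get(child.tag)
            if predicate and child.text:
                yield subject, predicate, child.text.strip()


for triple in rdfize(SOURCE):
    print(triple)
```
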
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure7" name="structure7"></a> [7] <a href="http://www.w3.org/TR/grddl/">GRDDL</a> (Gleaning Resource Descriptions from Dialects of Languages) is a W3C markup format for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT. GRDDL accommodates a wide variety of dialects (see <a href="http://esw.w3.org/topic/CustomRdfDialects">one listing</a>) and can be combined with arbitrary transformation mechanisms (though currently mostly based on XSLTs).</div>
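<p>The &#8220;explicitly associated&#8221; part of GRDDL means that a document names its own transformations. For a generic XML document, GRDDL does this with a space-separated <code>grddl:transformation</code> attribute on the root element, in the <code>http://www.w3.org/2003/g/data-view#</code> namespace. The Python sketch below only discovers those transformations and resolves them against the document&#8217;s base URI. Applying them (usually with an XSLT engine) is left out. XHTML documents instead use a head profile plus a <code>link</code> element with <code>rel="transformation"</code>, which this sketch does not handle. The catalog document and its URIs are hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace of the GRDDL "data-view" vocabulary.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def transformations(xml_text: str, base_uri: str) -> list:
    """Return absolute IRIs of the transformations a document explicitly names.

    A plain XML document lists them in a space-separated grddl:transformation
    attribute on its root element; relative references resolve against base_uri.
    """
    root = ET.fromstring(xml_text)
    value = root.get("{" + GRDDL_NS + "}transformation", "")
    return [urljoin(base_uri, ref) for ref in value.split()]


DOC = """<catalog xmlns:grddl="http://www.w3.org/2003/g/data-view#"
    grddl:transformation="catalog2rdf.xsl http://example.org/xsl/extra.xsl">
  <product sku="A-100"/>
</catalog>"""

print(transformations(DOC, "http://example.org/data/catalog.xml"))
# ['http://example.org/data/catalog2rdf.xsl', 'http://example.org/xsl/extra.xsl']
```
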
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure8" name="structure8"></a> [8] We characterize <a href="../478/making-linked-data-reasonable-using-description-logics-part-4/">instance records</a> as representing the &#8220;ABox&#8221;, in accordance with our <a title="Permanent Link to Thinking &#8220;Inside the Box&#8221; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split  <span style="font-style: italic;">concepts</span> and their  relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and  roles, expressed as fact assertions. The concept split is known as  the TBox (for <em>terminological</em> knowledge, the basis for  <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or  taxonomy of the domain at hand. The TBox is the structural and  intensional component of conceptual relationships. The second split  of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of  instances (and individuals), the roles between instances, and other  assertions about instances regarding their class membership with the  TBox concepts.&#8221;</div>
</div>
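<p>Here is a rough Python illustration of how the TBox/ABox split applies to stored triples. Schema-level statements about concepts go to the TBox. Assertions about instances, including their class membership via <code>rdf:type</code>, go to the ABox. Classifying by predicate (with a small special case for declaring classes and properties) is a simplifying assumption, not a description-logic reasoner, and the <code>ex:</code> terms are made up.</p>

```python
# Predicates treated as terminological (schema-level) in this sketch.
TBOX_PREDICATES = {"rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain",
                   "rdfs:range", "owl:disjointWith"}
# rdf:type objects that declare schema terms rather than instance membership.
SCHEMA_TYPES = {"owl:Class", "rdfs:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}


def is_tbox(triple):
    """True if the triple states something about concepts rather than instances."""
    s, p, o = triple
    return p in TBOX_PREDICATES or (p == "rdf:type" and o in SCHEMA_TYPES)


def split_tbox_abox(triples):
    """Partition triples into (TBox, ABox), preserving input order."""
    tbox, abox = [], []
    for triple in triples:
        (tbox if is_tbox(triple) else abox).append(triple)
    return tbox, abox


kb = [
    ("ex:Wine", "rdfs:subClassOf", "ex:Beverage"),     # concept relationship
    ("ex:Wine", "rdf:type", "owl:Class"),              # concept declaration
    ("ex:Chianti2005", "rdf:type", "ex:Wine"),         # instance class membership
    ("ex:Chianti2005", "ex:producedIn", "ex:Tuscany"),  # instance attribute
]
tbox, abox = split_tbox_abox(kb)
print(len(tbox), len(abox))  # 2 2
```
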
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure9" name="structure9"></a> [9] One of the more recent  discussions of this percentage is by Seth Grimes, <a style="font-style: italic;" href="http://clarabridge.com/default.aspx?tabid=137&amp;ModuleID=635&amp;ArticleID=551"> Unstructured Data and the 80 Percent Rule</a>, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="structure10" name="structure10"></a> [10] structWSF is also  designed to integrate with third-party apps and content management  systems (CMSs) to provide the user interfaces to these functions. The  first implementation of this design is <a href="http://constructscs.com/">conStruct SCS</a>, a structured content  system that extends the basic Drupal content management framework.  conStruct enables structured data and its controlling vocabularies  (ontologies) to drive applications and user interfaces.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/533/structure-the-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
