<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI3:::Adaptive Information &#187; Semantic Web</title>
	<atom:link href="http://www.mkbergman.com/category/semantic-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Wed, 01 Sep 2010 05:10:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>I Have Yet to Metadata I Didn&#8217;t Like</title>
		<link>http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/</link>
		<comments>http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 05:58:53 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Innovation]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[interoperability]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=902</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=I Have Yet to Metadata I Didn&#8217;t Like&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Innovation&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-08-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/&amp;rft.language=English"></span>

Contrasted with Some Observations on Linked Data
At the SemTech conference earlier this summer there was a kind of vuvuzela-like buzzing in         the background. And, like the World Cup games         on television, in play at the same time as the [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=I Have Yet to Metadata I Didn&#8217;t Like&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Innovation&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-08-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/&amp;rft.language=English"></span>
<p><a href="http://en.wikipedia.org/wiki/Interfaith"><img style="border: 0px solid; width: 276px; height: 277px; float: left; margin-right: 10px;" title="Ecumenical" src="../wp-content/themes/ai3/images/2010Posts/100816_ecumenical2.jpg" alt="Ecumenical" hspace="5" vspace="5" align="left" /></a></p>
<h2>Contrasted with Some Observations on Linked Data</h2>
<p>At the <a href="http://semtech2010.semanticuniverse.com/">SemTech</a> conference earlier this summer there was a kind of <a href="http://en.wikipedia.org/wiki/Vuvuzela">vuvuzela</a>-like buzzing in the background. And, as with the <a href="http://en.wikipedia.org/wiki/2010_FIFA_World_Cup">World Cup</a> games on television, in play at the same time as the conference, I found the droning just as irritating.</p>
<p>That droning was a combination of a sense of righteousness about the superiority of <a href="http://linkeddata.org/">linked data</a>, matched with a reprise of the &#8220;<a href="http://en.wikipedia.org/wiki/Chicken_and_egg">chicken-and-egg</a>&#8221; argument that plagued the early years of semantic Web advocacy <a href="#meta1">[1]</a>. I think both of these premises are misplaced. So, while I have been a fan and explicator of <a href="http://structureddynamics.com/linked_data.html">linked data</a> for some time, I do not worship at its altar <a href="#meta2">[2]</a>. And, for those who do, this post argues for a greater sense of <a href="http://en.wikipedia.org/wiki/Interfaith">ecumenism</a>.</p>
<p>My main points are not against linked data. I think it a very useful         technique and good (if not best) practice in many circumstances. But my         main points get at whether linked data is an objective in itself. By         making it such, I argue our eye misses the ball. And, in so doing, we         miss making the connection with <span style="font-weight: bold; font-style: italic;">meaningful, interoperable         information</span>, which should be our true objective. We need to look         elsewhere than linked data for root causes.</p>
<h3>Observation #1: What Problem Are We Solving?</h3>
<p>When I began this blog more than five years ago &#8212; and when I left my         career in population genetics nearly three decades before that &#8212; I did         so because of my belief in the value of information to <a href="../the-blogasbrd/">confer adaptive         advantage</a>. My perspective then, and my perspective now, was that         adaptive information through genetics and evolution was being uniquely         supplanted within the human species. This change has occurred because humanity is able to record and         carry forward all information gained in its experiences.</p>
<p>Adaptive         innovations from writing to bulk printing to now electronic form         uniquely position the human species to both record its past and         anticipate its future. We no longer are limited to evolution and genetic information encoded in surviving offspring to         determine what information is retained and moves forward. Now,         <span style="font-weight: bold; font-style: italic; text-decoration: underline;">all</span> information can be retained. Further, we can combine and connect         that information in ways that break to         smithereens the biological limits of other species.</p>
<p>Yet, despite the electronic volumes and the potentials, chaos and         isolated content silos have characterized humanity&#8217;s first half century         of experience with digital information. I have spoken before about how we have been steadily <a href="../?p=229">climbing the data federation         pyramid,</a> with Internet technologies and the Web being prime factors         for doing so. Now, with a <a href="../483/advantages-and-myths-of-rdf/">compelling         data model in RDF</a> and standards for how we can relate any type of         information meaningfully, we also have the means for making sense of         it. And connecting it. And learning and adapting from it.</p>
<p>And, so, there is the answer to the rhetorical question: The problem we         are solving is to <span style="font-weight: bold; font-style: italic;">meaningfully connect         information</span>. For, without those meaningful connections and         recombinations, none of that information confers adaptive advantage.</p>
<h3>Observation #2: The Problem is Not a Lack of Consumable Data</h3>
<p>One of the &#8220;chicken-and-egg&#8221; premises in the linked data community is that more linked data needs to be exposed before some threshold is crossed and the <a href="http://en.wikipedia.org/wiki/Network_effect">network effect</a> is triggered. This attitude, I suspect, is one of the reasons why hosannas are always forthcoming each time some outfit announces they have posted another chunk of triples to the Web.</p>
<p><a href="http://fgiasson.com/blog/">Fred Giasson</a> and I earlier tackled that issue with <a style="font-style: italic;" href="../846/when-linked-data-rules-fail/">When Linked         Data Rules Fail</a> regarding some information published for <a href="http://data-gov.tw.rpi.edu/wiki">data.gov</a> and the <a href="http://data.nytimes.com/">New York Times</a>. Our observations on the lack of standards for linked data quality proved to be quite controversial. Rehashing that piece is         not my objective here.</p>
<p>What <span style="font-weight: bold; font-style: italic; text-decoration: underline;">is</span> my objective is to hammer home that we do not need linked data in order         to have data available to consume. Far from it. Though linked data         volumes have been growing, I actually suspect that its growth has been         slower than data availability <span style="font-style: italic;">in         toto</span>. On the Web alone we have searchable deep Web databases,         JSON, XML, microformats, RSS feeds, Google snippets, yada, yada, all in         a veritable deluge of formats, contents and contexts. We are having a         hard time inventing the next 1000-fold description beyond zettabyte and         yottabyte to even describe this deluge <a href="#meta3">[3]</a>.</p>
<p>There is absolutely no voice or observer anywhere that is saying, &#8220;We         need linked data in order to have data to consume.&#8221; Quite the opposite.         The reality is we are drowning in the stuff.</p>
<p>Furthermore, when one dissects what most of this data is about, it is about ways to describe things. Or, put another way, most data is neither schema nor descriptions of conceptual relationships; it consists of records, with attributes and their values used to describe those records. Where is a business located? What political party does a politician belong to? How tall are you? What is the population of Hungary?</p>
<p>These are simple constructs with simple <a href="http://en.wikipedia.org/wiki/Associative_array">key-value pair</a> ways to describe and convey them. This very simplicity is one reason         why naïve data structs or simple data models like JSON or XML have         proven so popular <a href="#meta4">[4]</a>. It is one of the reasons why the so-called         <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL databases</a> have         also been growing in popularity. What we have are lots of atomic facts,         located everywhere, and representable with very simple key-value         structures.</p>
<p>While having such information available in linked data form makes it         easier for agents to consume it, that extra publishing burden is by no         means necessary. There are plenty of ways to consume that data &#8212;         without loss of information &#8212; in non-linked data form. In fact, that         is how the overwhelming percentage of such data is expressed today. This non-linked data is also often easy to understand.</p>
<p>What <span style="font-weight: bold; font-style: italic; text-decoration: underline;">is</span> important is that the data be available electronically with a         description of what the records contain. But that hurdle is met in         many, many different ways and from many, many sources without any         reference whatsoever to linked data. I submit that any form of         desirable data available on the Web can be readily consumed without recourse to linked data principles.</p>
<h3>Observation #3: An Interoperable Data Model Does Not Require a Single         Transmittal Format</h3>
<p>The real advantage of RDF is the simplicity of its data model, which         can be extended and augmented to express vocabularies and relationships         of any nature. As I have stated before, that makes RDF like a <a href="../483/advantages-and-myths-of-rdf/"><span style="font-style: italic;"> universal solvent</span></a> for any extant data structure, form or         schema.</p>
<p>What I find perplexing, however, is how this strength somehow gets         translated into a parallel belief that such a flexible data model is         also the best means for <span style="font-weight: bold; font-style: italic; text-decoration: underline;">transmitting</span> data. As noted, most transmitted data can be         represented through simple key-value pairs. Sure, at some point one         needs to model the structural assumptions of the data model from the         supplying publisher, but that complexity need not burden the         actual transmitted form. So long as schema can be captured and modeled         at the receiving end, data record transmittal can be made quite a bit         simpler.</p>
<p>Under this mindset RDF provides the internal (canonical) data model. Prior to that, format and other converters can be used to consume the source data in its native form. A generalized representation for how this can work is shown in this diagram using <a href="http://structureddynamics.com/">Structured Dynamics</a>&#8217; <a href="http://openstructs.org/structwsf">structWSF</a> Web services framework middleware as the <a href="http://www.mkbergman.com/496/structwsf-a-framework-for-data-mixing/">mediating layer</a>:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090628_data_model_relationships.png"> <img class="center_ok" style="border: 0pt none;" title="Click to enlarge" src="../wp-content/themes/ai3/images/2009Posts/090628_data_model_relationships.png" border="0" alt="structWSF Data Model Relationships" width="600" height="364" /></a></div>
<p>Of course, if the source data is already in linked data form with         understood concepts, relationships and semantics, much of this         conversion overhead can be bypassed. If available, that is a good         thing.</p>
<p>But it is not a required or necessary thing. Insistence on publishing data in certain forms suffers from the same narrowness as cultural or religious zealotry. Why certain publishers or authors prefer different data formats has a diversity of answers. Reasons can range from what is tried and familiar to available toolsets or even what is trendy, as one might argue linked data is in some circles today. There are literally scores of off-the-shelf &#8220;<a href="http://openstructs.org/osf/resources/rdfizers">RDFizers</a>&#8221; for converting native and simple data structs into RDF form. New converters are readily written.</p>
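<p>To make the point concrete, here is a minimal sketch of such a converter in Python. It is not one of the RDFizers linked above, nor part of structWSF; the <code>record_to_triples</code> function, the <code>example.org</code> base URI and vocabulary, and the sample record are all assumptions for illustration. It takes a simple JSON record in its native key-value form and produces subject-predicate-object triples at the receiving end:</p>

```python
import json

# Hypothetical base URI and vocabulary namespace, for illustration only.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab#"


def record_to_triples(record, id_key="id"):
    """Turn one flat key-value record into (subject, predicate, object) triples."""
    subject = BASE + str(record[id_key])
    triples = []
    for key, value in record.items():
        if key == id_key:
            continue  # the identifier becomes the subject, not an attribute
        # A multi-valued attribute yields one triple per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, VOCAB + key, item))
    return triples


# A made-up record as a publisher might transmit it: plain key-value JSON.
source = '{"id": "acme", "name": "Acme Widgets", "city": "Iowa City", "tags": ["retail", "widgets"]}'
triples = record_to_triples(json.loads(source))
for triple in triples:
    print(triple)
```

<p>The publisher&#8217;s record stays as simple as it was; only the consumer needs to know how its keys map to a vocabulary, which is where the schema modeling takes place.</p>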
<p>Adaptive systems, by definition, do not require wholesale changes to         existing practices and do not require effort where none is warranted. By         posing the challenge as a &#8220;chicken-and-egg&#8221; one where publishers         themselves must undertake a change in their existing practices to         conform, or else they fail the &#8220;linked data threshold&#8221;, advocates are         ensuring failure. There is plenty of useful structured data to consume         already.</p>
<p>Accessible structured data, properly         characterized (see below), should be our root interest; not whether that         data has been published as linked data <span style="font-style: italic;">per se</span>.</p>
<h3>Observation #4: A Technique Can Not Carry the Burden of Usefulness or         Interoperability</h3>
<p>Linked data is nothing more than some techniques for         publishing Web-accessible data using the RDF data model. Some have         tried to use the concept of linked data as a replacement for the idea         of the semantic Web, and some have recently tried to re-define linked         data as not requiring RDF <a href="#meta5">[5]</a>. Yet the real issue with all of these         attempts &#8212; correct or not, and a fact of linked data since first         formulated by Tim Berners-Lee &#8212; is that a technique alone can not         carry the burden of usefulness or interoperability.</p>
<p>Despite billions of triples now available, we in fact see little actual         use or consumption of linked data, except in the life science domain.         Indeed, a new workshop by the research community called COLD (Consuming         Linked Data) has been set up for the upcoming ISWC conference to look         into the very reasons why this lack of usage may be occurring <a href="#meta6">[6]</a>.</p>
<p>It will be interesting to monitor what comes out of that workshop, but         I have my own views as to what might be going on here. A number of         factors, applicable frankly to any data, must be layered on top of         linked data techniques in order for it to be useful:</p>
<ul>
<li>Context and coherence (see below)</li>
<li>Curation and quality control (where provenance is used as the         proxy), and</li>
<li>Currency and timeliness.</li>
</ul>
<p>These requirements apply to any data ranging from Census CSV files to Google         search results. But because relationships can also be more readily         asserted with linked data, these requirements are even         greater for it.</p>
<p>It is not surprising that the life sciences have seen more uptake of linked         data. That community has keen experience with curation, and the quality         and linkages asserted there are much superior to other areas of linked         data <a href="#meta7">[7]</a>.</p>
<p>In other linked data areas, it is really in limited pockets such as         <a href="http://factforge.net/">FactForge</a> from <a href="http://www.ontotext.com/">Ontotext</a> or curated forms of <a href="http://en.wikipedia.org/">Wikipedia</a> by the likes of <a href="http://wiki.freebase.com/wiki/WEX">Freebase</a> that we see the most         use and uptake. There is no substitute for consistency and quality         control.</p>
<p>It is really in this area of &#8220;publish it and they will come&#8221; that we see one of the threads of parochialism in the linked data community. You can publish it and they still will <span style="font-weight: bold; font-style: italic;">not</span> come. And, as with any data, they will not come if the quality is poor or the linkages are wrong.</p>
<p>As a technique for making data available, linked data is thus nothing         more than a foot soldier in the campaign to make information         meaningful. Elevating it above its pay grade sets the wrong target and         causes us to lose focus for what is really important.</p>
<h3>Observation #5: 50% of Linked Data is Missing (that is, the Linking         part)</h3>
<p>There is another strange phenomenon in the linked data movement: the almost total disregard for the linking part. Sure, data is getting published as triples with dereferenceable URIs, but where are the links?</p>
<p>At most, what we are seeing is <span style="font-family: monospace;">owl:sameAs</span> assertions and a few others         <a href="#meta8">[8]</a>. Not only does this miss the whole point of linked data, but one         can question whether equivalence assertions are correct in many         instances <a href="#meta9">[9]</a>.</p>
<p>For a couple of years now I have been arguing that the central gap in         linked data has been the absence of <a style="font-weight: bold; font-style: italic;" href="../440/the-semantics-of-context/">context</a> and <a style="font-weight: bold; font-style: italic;" href="../450/when-is-content-coherent/">coherence</a>.         By <span style="font-weight: bold; font-style: italic;">context</span> I mean the use of reference structures to help place and frame what         content is about. By <span style="font-weight: bold; font-style: italic;">coherence</span> I mean that         those contextual references make internal and logical sense, that they         represent a consistent world view. Both require a richer use of links         to concepts and subjects describing the semantics of the content.</p>
<p>It is precisely through these kinds of links that data from disparate         sources and with different frames of reference can be meaningfully         related to other data. This is the essence of the semantic Web and the         purported purpose of linked data. And it is exactly these areas in         which linked data is presently found most lacking.</p>
<p>Of course, these questions are not the sole challenge of linked data.         They are the essential challenge in any attempt to connect or         interoperate structured data within information systems. So, while         linked data is ostensibly designed from the get-go to fulfill these         aims, any data that can find meaning outside of its native silo must         also be placed into <span style="font-weight: bold; font-style: italic;">context</span> in a         <span style="font-weight: bold; font-style: italic;">coherent</span> manner. The unique disappointment for much linked data is its failure to provide these contexts despite its design.</p>
<h3>Observation #6: Pluralism is a Reality; Embrace It</h3>
<p>Yet, having said all of this, Structured Dynamics is still committed to linked data. We present our         information as such, and provide great tools for producing and         consuming it. We have made it one of the <a href="../859/seven-pillars-of-the-open-semantic-enterprise/"> seven foundations</a> to our <a href="http://structureddynamics.com/products.html">technology stack</a> and         <a href="http://mike2.openmethodology.org/wiki/Open_SEAS_Framework">methodology</a>.</p>
<p>But we live in a pluralistic data world. There are reasons and roles for the multitude of popular structured data formats that presently exist. This inherent diversity is a fact in any real-world data context. Thus, we have not met a form of structured data that we didn&#8217;t like, especially if it is accompanied by metadata that puts the data into coherent context. It is a major reason why we developed the <a href="http://openstructs.org/iron">irON</a> (<span style="font-style: italic;">instance record</span> and <span style="font-style: italic;">object notation</span>) non-RDF vocabulary to provide a bridge from such forms to RDF. irON clearly shows that entities can be usefully described and consumed in either RDF or non-RDF serialized forms.</p>
<p>Attitudes that dismiss non-linked data forms or arrogantly insist that         publishers adhere to linked data practices are anything but         pluralistic. They are parochial and short-sighted and are contributing,         in part, to keeping the semantic Web from going mainstream.</p>
<p>Adoption requires simplicity. The simplest way to encourage the greater         interoperability of data is to leverage existing assets in their native         form, with encouragement for minor enhancements to add descriptive         metadata for what the content is about. Embracing such an ecumenical         attitude makes all publishers potentially valuable contributors to a         better information future. It will also nearly instantaneously widen         the tools base available for the common objective of interoperability.</p>
<h3>Parochialism and Root Cause Analysis</h3>
<p>Linked data is a good thing, but not an ultimate thing. By making         linked data an objective in itself we unduly raise publishing         thresholds; we set our sights below the real problem to be solved; and         we risk diluting the understanding of RDF from its natural         role as a flexible and adaptive data model. Paradoxically, too much         parochial insistence on linked data may undercut its adoption and the         realization of the overall semantic objective.</p>
<p><a href="http://en.wikipedia.org/wiki/Root_cause_analysis">Root cause         analysis</a> for what it takes to achieve <span style="font-weight: bold; font-style: italic;">meaningful, interoperable         information</span> suggests that describing source content in terms of         what it <span style="font-style: italic;">is about</span> is the pivotal factor.         Moreover, those contexts should be shared to aid interoperability.         Whichever organizations do an excellent job of providing context and         coherent linkages will be the go-to ones for data consumers. As we have         seen to date, merely publishing linked data triples does not meet this         test.</p>
<p>I have heard <a href="http://chatlogs.planetrdf.com/swig/2010-08-10.html#T15-16-04">some         state</a> that first you celebrate linked data and its growing         quantity, and then hope that the quality improves. This sentiment holds         if indeed the community moves on to the questions of quality and         relevance. The time for that transition is now. And, oh, by the         way, as long as we are broadening our horizons, let&#8217;s also celebrate         properly characterized structured data no matter what its form. <a href="http://en.wikipedia.org/wiki/Religious_pluralism">Pluralism</a> is part of the <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Tao">tao</a> to the meaning of information.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta1"></a> [1] See, for example, J.A. Hendler, 2008. &#8220;Web 3.0: Chicken Farms on the Semantic         Web,&#8221; <span style="font-style: italic;">Computer</span>, January         2008, pp. 106-108. See <a href="http://www.comp.leeds.ac.uk/webscience/talks/hendler_web_3.pdf">http://www.comp.leeds.ac.uk/webscience/talks/hendler_web_3.pdf</a>.         While I can buy Hendler&#8217;s arguments about commercial tool vendors holding off         major investments until the market is sizable, I think we can also see via listings like <a style="font-weight: bold;" href="../new-version-sweet-tools-sem-web/">Sweet Tools</a> that a lack of tools is not in itself limiting.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta2"></a> [2] An earlier treatment of this subject from a different perspective is M.K. Bergman, 2010. &#8220;<a href="../880/the-bipolar-disorder-of-linked-data/">The Bipolar Disorder of Linked Data</a>,&#8221; <span style="font-style: italic;">AI3:::Adaptive Information</span> blog, April 28, 2010.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta3"></a> [3] So far only prefixes for units up to 10^24 (&#8220;yotta&#8221;) have names; for 10^27, a student campaign on Facebook is proposing that science bodies adopt &#8220;hellabyte&#8221; (from the Northern California slang for &#8220;a whole lot of&#8221;). See <a href="http://scitech.blogs.cnn.com/2010/03/04/hella-proposal-facebook/">http://scitech.blogs.cnn.com/2010/03/04/hella-proposal-facebook/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta4"></a> [4] One of the more popular posts on this blog has been M.K. Bergman, 2009. &#8220;<a href="../471/structs-naive-data-formats-and-the-abox/">‘Structs’: Naïve Data Formats and the ABox</a>,&#8221; <span style="font-style: italic;">AI3:::Adaptive Information</span> blog, January 22, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta5"></a> [5] See, for example, the recent history of the <a href="http://en.wikipedia.org/w/index.php?title=Linked_Data&amp;action=history">linked data</a> entry on Wikipedia, or the assertions by Kingsley Idehen regarding entity attribute values (EAV), as in <a href="http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1611">this blog post</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta6"></a> [6] See further the <a href="http://consuminglinkeddata.org/COLD2010">1st International Workshop on         Consuming Linked Data</a> (COLD 2010), at the <a rel="foaf:homepage" href="http://iswc2010.semanticweb.org/">9th International Semantic Web         Conference</a> (ISWC 2010), November 8, 2010, Shanghai, China.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta7"></a> [7] For example, in the early years of <a href="http://www.ncbi.nlm.nih.gov/genbank/">GenBank</a>, some claimed that         annotations of gene sequences due to things like BLAST analyses may         have had as high as 30% to 70% error rates due to propagation of         initially mislabeled sequences. In part, the whole field of         bioinformatics was formed to deal with issues of data quality and         curation (in addition to analytics).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta8"></a> [8] See, for example: Harry Halpin, 2009. “A Query-Driven Characterization of Linked Data,” paper presented at the <span style="font-style: italic;">Linked Data on the Web (LDOW) 2009 Workshop</span>, April 20, 2009, Madrid, Spain, see <a href="http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf">http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf</a>; Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth, 2010. “Linked Data is Merely More Data,” in Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness, <span style="font-style: italic;">Linked Data Meets Artificial Intelligence, Technical Report SS-10-07</span>, AAAI Press, Menlo Park, California, 2010, pp. 82-86, see <a href="http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf">http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf</a>; among others.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="meta9"></a> [9] Harry Halpin and Patrick J. Hayes, 2010. &#8220;When owl:sameAs isn’t the         Same: An Analysis of Identity Links on the Semantic Web,&#8221; presented at         <span style="font-style: italic;">LDOW 2010</span>, April 27th, 2010,         Raleigh, North Carolina. See <a href="http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf">http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>An Executive Intro to Ontologies</title>
		<link>http://www.mkbergman.com/900/an-executive-intro-to-ontologies/</link>
		<comments>http://www.mkbergman.com/900/an-executive-intro-to-ontologies/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 05:53:14 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Ontology Best Practices]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Ontology]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[vocabularies]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=900</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=An Executive Intro to Ontologies&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Enterprise&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-08-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/900/an-executive-intro-to-ontologies/&amp;rft.language=English"></span>
Ontologies are the structural frameworks for organizing information  on the semantic Web and within semantic enterprises. They provide unique  benefits in discovery, flexible access, and information integration due to their inherent connectedness;  that is, their ability to represent conceptual relationships.  Ontologies can be layered on top of existing information assets, which [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=An Executive Intro to Ontologies&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Ontology Best Practices&amp;rft.subject=Semantic Enterprise&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-08-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/900/an-executive-intro-to-ontologies/&amp;rft.language=English"></span>
<p>Ontologies are the structural frameworks for organizing information  on the semantic Web and within semantic enterprises. They provide unique  benefits in <strong>discovery</strong>, <strong>flexible access</strong>, and <strong>information integration</strong> due to their inherent <span class="double_u">connectedness</span>;  that is, their ability to represent conceptual relationships.  Ontologies can be layered on top of existing information assets, which  means they are an <em><strong>enhancement</strong></em> and not a displacement for  prior investments. And ontologies may be developed and matured  incrementally, which means their adoption may be <em><strong>cost-effective</strong></em> as benefits become evident <a href="#exec1">[1]</a>.</p>
<h3>What Is an Ontology?</h3>
<p><em><strong>Ontology</strong></em> may be one of the more daunting terms for  those exposed for the first time to semantic technologies. Not only is  the word long and without common antecedents, but it is also a term that  has widely divergent use and understanding within the community. It can  be argued that this not-so-little word is one of the barriers to  mainstream understanding of the semantic Web.</p>
<p>The root of the term is the Greek <em>ontos</em>, or <em>being</em> or <em>the nature of things</em>. Literally — and in classical philosophy — ontology was used in relation to the study of the nature of being or the world, <a title="http://en.wikipedia.org/wiki/Ontology" rel="nofollow" href="http://en.wikipedia.org/wiki/Ontology">the nature of existence</a>. <a title="http://tomgruber.org/" rel="nofollow" href="http://tomgruber.org/">Tom Gruber</a>, among others, made the term popular in relation to <a title="http://en.wikipedia.org/wiki/Ontology_%28computer_science%29" rel="nofollow" href="http://en.wikipedia.org/wiki/Ontology_%28computer_science%29">computer science</a> and artificial intelligence <a title="http://tomgruber.org/writing/ontolingua-kaj-1993.htm" rel="nofollow" href="http://tomgruber.org/writing/ontolingua-kaj-1993.htm">about 15 years ago</a> when he defined ontology as a “formal specification of a conceptualization.”</p>
<p>Much like taxonomies or relational database schema, ontologies work to organize information. No matter what the domain or scope, an ontology is a description of a world view. That view might be limited and minuscule, or it might be global and expansive. However, unlike hierarchical views of concepts such as taxonomies, ontologies often have a linked or networked &#8220;graph&#8221; structure. Multiple things can be related to other things, all in a potentially multi-way series of relationships.</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top;"><a href="../wp-content/themes/ai3/images/2010Posts/100809_taxonomy_view.png"><img style="border: 0px solid; width: 270px; height: 270px;" title="Example Taxonomy Structure" src="../wp-content/themes/ai3/images/2010Posts/100809_taxonomy_view.png" alt="Example Taxonomy Structure" /></a></td>
<td style="vertical-align: middle;"><img style="width: 52px; height: 64px;" src="../wp-content/themes/ai3/images/2010Posts/100809_arrow.png" alt="" /></td>
<td style="vertical-align: top;"><a href="../wp-content/themes/ai3/images/2010Posts/100809_ontology_view.png"><img style="border: 0px solid; width: 270px; height: 270px;" title="Example Ontology Structure" src="../wp-content/themes/ai3/images/2010Posts/100809_ontology_view.png" alt="Example Ontology Structure" /></a></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center;" colspan="3"><small><span style="color: #6666cc; font-weight: bold;">A distinguishing characteristic of ontologies compared to conventional hierarchical structures is their degree<br />
of connectedness,  their ability to model coherent, linked relationships</span></small></td>
</tr>
</tbody>
</table>
<p>Ontologies supply the structure for relating information to other  information in the semantic Web or the <a href="http://www.structureddynamics.com/linked_data.html">linked data</a> realm.  Ontologies  thus provide a similar role for the organization of data that is  provided by relational data schema.  Because of this structural role,  ontologies are pivotal to the coherence and interoperability of  interconnected data.</p>
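<p>The contrast between a hierarchy and a graph can be made concrete with a small sketch. The Python below is illustrative only: the terms, the predicate names (<code>kind_of</code>, <code>made_by</code> and so on) and the <code>neighbors</code> helper are all hypothetical, and a real ontology would be expressed in a language such as OWL or RDF Schema rather than in Python sets.</p>
<pre>
```python
# Hypothetical toy data; none of these names come from a real ontology.

# A taxonomy: every term has exactly one parent, so the structure is a tree.
taxonomy_parent = {
    "sedan": "car",
    "pickup": "truck",
    "car": "vehicle",
    "truck": "vehicle",
}

# An ontology: a set of (subject, predicate, object) statements forming a graph.
# A thing may take part in any number of relationships, in any direction.
ontology_triples = {
    ("sedan", "kind_of", "car"),
    ("car", "kind_of", "vehicle"),
    ("sedan", "made_by", "Honda"),
    ("sedan", "has_part", "engine"),
    ("Honda", "competes_with", "Ford"),
}


def neighbors(triples, node):
    """Return every (predicate, other_node) pair that touches `node`."""
    linked = set()
    for subject, predicate, obj in triples:
        if subject == node:
            linked.add((predicate, obj))
        elif obj == node:
            linked.add((f"inverse of {predicate}", subject))
    return linked


if __name__ == "__main__":
    # The taxonomy can only answer "what is the parent?"
    print(taxonomy_parent["sedan"])
    # The graph can be entered at any node and walked along any relationship.
    for link in sorted(neighbors(ontology_triples, "sedan")):
        print(link)
```
</pre>
<p>In this sketch the taxonomy offers a single path upward from any term, while the triple set lets one start from any node and follow whichever relationships happen to touch it.</p>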
<p>When one uses the idea of &#8220;world view&#8221; as synonymous with an ontology, it is not meant to be cosmic, but simply a way to convey how a given domain or problem area can be described. One group might choose to describe and organize, say, automobiles, by color; another might choose body styles such as pick-ups or sedans; or still another might use brands such as Honda and Ford. None of these views is inherently &#8220;right&#8221; (indeed multiples might be combined in a given ontology), but each represents a particular way &#8212; a &#8220;world view&#8221; &#8212; of looking at the domain.</p>
<p>Though there is much latitude in how a given domain might be  described, there are both good ontology practices and bad ones. We offer  some views as to what constitutes good ontology design and practice in  the concluding section.</p>
<h3>What Are Its Benefits?</h3>
<p>A good ontology offers a composite suite of benefits not available to  taxonomies, relational database schema, or other standard ways to  structure information. Among these benefits are:</p>
<ul>
<li> <strong>Coherent navigation</strong> by enabling the movement from concept to concept in the ontology structure</li>
<li> <strong>Flexible entry points</strong> because any specific perspective  in the ontology can be traced and related to all of its associated  concepts; there is no set structure or manner for interacting with the  ontology</li>
<li> <strong>Connections</strong> that highlight related information and aid  and prompt discovery without requiring prior knowledge of the domain or  its terminology</li>
<li> Ability to represent <strong>any form of information</strong>, including  <span style="font-style: italic;">unstructured </span>(say, documents or text), <span style="font-style: italic;">semi-structured</span> (say, XML or Web  pages) and <span style="font-style: italic;">structured</span> (say, conventional databases) data</li>
<li> <strong>Inferencing</strong>, whereby specifying one concept (say, mammals) also tells us about a related concept (say, that mammals are a kind of animal)</li>
<li> <strong>Concept matching</strong>, which means that even though we may describe things somewhat differently, we can still match to the same idea (such as <em>glad</em> or <em>happy</em> both referring to the concept of a pleasant state of mind)</li>
<li> Thus, we can also <strong>integrate external content</strong> by properly matching and mapping these concepts</li>
<li> A framework for <strong>disambiguation</strong> by nature of the matching and analysis of concepts and instances in the ontology graph, and</li>
<li> <strong>Reasoning</strong>, which is the ability to use the coherence  and structure itself to inform questions of relatedness or to answer  questions.</li>
</ul>
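<p>Two of the benefits listed above, inferencing and concept matching, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how a production reasoner works: the <code>KIND_OF</code> and <code>LABEL_TO_CONCEPT</code> tables, the concept identifiers and both function names are hypothetical, and the kind-of chain is assumed to contain no cycles.</p>
<pre>
```python
# Hypothetical toy data for illustration only.
# Each concept points to the one broader concept it is a kind of.
KIND_OF = {
    "dog": "mammal",
    "mammal": "animal",
    "animal": "living thing",
}

# Alternate labels mapped to one shared concept identifier.
LABEL_TO_CONCEPT = {
    "glad": "pleasant-state-of-mind",
    "happy": "pleasant-state-of-mind",
    "sad": "unpleasant-state-of-mind",
}


def ancestors(concept):
    """Inferencing: follow kind-of links upward and collect every broader concept."""
    found = []
    while concept in KIND_OF:  # assumes the chain has no cycles
        concept = KIND_OF[concept]
        found.append(concept)
    return found


def same_concept(label_a, label_b):
    """Concept matching: two labels match when they resolve to the same concept."""
    concept_a = LABEL_TO_CONCEPT.get(label_a.lower())
    concept_b = LABEL_TO_CONCEPT.get(label_b.lower())
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    print(ancestors("dog"))
    print(same_concept("glad", "happy"))
```
</pre>
<p>Nothing states directly that a dog is an animal; the sketch derives it by walking the kind-of links. Likewise, <em>glad</em> and <em>happy</em> match because both resolve to one concept, not because the strings are alike.</p>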
<h3>How Are Ontologies Used?</h3>
<p>The relationship structure underlying an ontology provides an excellent vehicle for <strong>discovery</strong> and <strong>linkages</strong>.   &#8220;Swimming through&#8221; this relationship graph is the basis of the <a href="http://openstructs.org/sites/openstructs.org/sc/demos/portablecontrolapplication/demo/sRelationBrowser/index.html">Concept  Explorer</a> (also known as the Relation Browser) and similar widgets.</p>
<p>The most prevalent use of ontologies at present is in <strong>semantic search</strong>.  Semantic search has benefits over conventional search in terms of being  able to make inferences and matches not available to standard keyword  retrieval.</p>
<p>The relationship structure is also a powerful, more general, and more nuanced way to <strong>organize information</strong>. Concepts can relate to other concepts through a richness of vocabulary. Such predicates might capture subsumption, precedence, part-of relationships (mereology), preferences, or importance along virtually any metric. This richness of expression and relationships can also be built incrementally over time, allowing ontologies to grow and develop in sophistication and use as desired.</p>
<p>The pinnacle application for ontologies, therefore, is as  coherent reference structures whose purpose is to help map and integrate  other structures and information. Given the huge heterogeneity of  information both within and without organizations, the use of ontologies  as <strong>integration frameworks</strong> will likely emerge as their most valuable use.</p>
<h3>What Makes for a Good Ontology?</h3>
<p>Good ontology practice has aspects both in terms of <span class="double_u">scope</span> and in terms of <span class="double_u">construction</span>.</p>
<h4>Scope Considerations</h4>
<p>Here are some scoping and design questions that we believe should be  answered in the positive in order for an ontology to meet good practice  standards:</p>
<ul>
<li> Does the ontology provide <strong>balanced coverage</strong> of the  subject domain? This question gets at the issue of properly scoping and  bounding the subject coverage of the ontology. It also means that the  breadth and depth of the coverage is roughly equivalent across its scope</li>
<li> Does the ontology embed its domain coverage into a proper <strong>context</strong>?  A major strength of ontologies is their potential ability to  interoperate with other ontologies. Re-using existing and well-accepted  vocabularies and including concepts in the subject ontology that aid  such connections is good practice. The ontology should also have  sufficient reference structure for guiding the assignment of what  content “is about”</li>
<li> Are the relationships in the ontology <strong>coherent</strong>? The  essence of coherence is that it is a state of logical, consistent  connections, a logical framework for integrating diverse elements in an  intelligent way. So while context supplies a reference structure,  coherence means that the structure makes sense. Is the hip bone  connected to the thigh bone, or is the skeleton incorrect?</li>
<li> Has the ontology been <strong>well constructed</strong> according to good practice? See next.</li>
</ul>
<p>If these questions can be answered affirmatively, then we would deem the ontology ready for production-grade use.</p>
<p>Fundamental to the whole concept of coherence is the fact that experts and practitioners within domains have been looking at the questions of relationships, structure, language and meaning for decades. Though perhaps today we finally have a broadly useful data and logic model in RDF, the fact remains that massive time and effort have already been expended to codify some of these understandings in various ways and at various levels of completeness and scope. Good practice also means, therefore, that existing structural and vocabulary assets are leveraged as far as possible as springboards for new ontologies.</p>
<p>And, because good ontologies also embrace the <a title="http://en.wikipedia.org/wiki/Open_world_assumption" rel="nofollow" href="http://en.wikipedia.org/wiki/Open_world_assumption">open world approach</a>,  working toward these desired end states can also be incremental. Thus,  in the face of common budget or deadline constraints, it is possible initially to scope domains as smaller or to provide less  coverage in depth or to use a small set of predicates, all the while  still achieving productive use of the ontology. Then, over time, the  scope can be expanded incrementally.</p>
<h4>Construction Considerations</h4>
<p>To achieve their purposes, ontologies must be both human-readable and  machine-processable. Also, because they represent conceptual  structures, they must be built with a certain composition.</p>
<p>Good ontologies therefore are constructed such that they have:</p>
<ul>
<li> Concept <strong>definitions</strong> &#8211; the matching and alignment of  things is done on the basis of concepts (not simply labels) which means  each concept must be defined</li>
<li> A <strong>preferred label</strong> that is used for human readable purposes and in user interfaces</li>
<li> A &#8220;<strong>semset</strong>&#8221; &#8211; which means a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refer to the same concept</li>
<li> Clearly defined <strong>relationships</strong> (also known as properties, attributes, or predicates) for relating two things to one another</li>
<li> All of which is written in a machine-processable language such as <a title="http://en.wikipedia.org/wiki/Web_Ontology_Language" rel="nofollow" href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a> or <a title="http://en.wikipedia.org/wiki/RDF_Schema" rel="nofollow" href="http://en.wikipedia.org/wiki/RDF_Schema">RDF Schema</a> (among others).</li>
</ul>
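<p>The parts listed above can be pictured as a single record per concept. The sketch below is a hypothetical Python stand-in, not a real ontology format: the <code>Concept</code> class, its field names, the <code>ex:</code> identifiers and the sample automobile entry are all assumptions made for illustration. In practice these parts would be written in OWL or RDF Schema, where vocabularies such as SKOS offer comparable properties for preferred and alternate labels.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One ontology concept with the parts listed above (field names are hypothetical)."""
    identifier: str
    definition: str
    pref_label: str
    semset: set = field(default_factory=set)
    relationships: list = field(default_factory=list)  # (predicate, target identifier)

    def matches(self, term):
        """True when `term` is the preferred label or any semset entry, ignoring case."""
        labels = {self.pref_label.lower()} | {label.lower() for label in self.semset}
        return term.lower() in labels


# Illustrative sample entry; the definition and labels are made up for this sketch.
automobile = Concept(
    identifier="ex:Automobile",
    definition="A self-propelled road vehicle designed to carry a small number of passengers.",
    pref_label="Automobile",
    semset={"car", "auto", "motorcar"},
    relationships=[("kind_of", "ex:Vehicle")],
)


def find_concepts(concepts, term):
    """Return the identifiers of every concept that `term` could refer to."""
    return [c.identifier for c in concepts if c.matches(term)]


if __name__ == "__main__":
    print(find_concepts([automobile], "car"))
```
</pre>
<p>The preferred label is what a user interface would display, while the semset lets a search for <em>car</em> or <em>auto</em> land on the same concept. The definition is what a person (or an alignment process) relies on to decide whether two concepts from different ontologies really mean the same thing.</p>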
<p>In the case of <a title="http://www.mkbergman.com/847/ontology-driven-applications-using-adaptive-ontologies/" rel="nofollow" href="../847/ontology-driven-applications-using-adaptive-ontologies/">ontology-driven applications using adaptive ontologies</a>, there are also additional instructions contained in the system (often via administrative ontologies) that tell the system which types of widgets need to be invoked for different data types and attributes. This is different from the standard conceptual schema, but is nonetheless essential to how such applications are designed.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="exec1"></a>[1] This posting was at the request of a couple of <a href="http://structureddynamics.com/">Structured Dynamics</a>&#8217; customers that desired a way to describe ontologies to non-technical management. For a more in-depth treatment, see M.K. Bergman, 2007. &#8220;<a style="font-style: italic;" href="../374/an-intrepid-guide-to-ontologies/">An Intrepid Guide to Ontologies</a>,&#8221; <span style="font-weight: bold;">AI3:::Adaptive Information</span> blog, May 16, 2007.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/900/an-executive-intro-to-ontologies/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch: Structure Paves the Way to the Semantic Web</title>
		<link>http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/</link>
		<comments>http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 05:55:31 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[Eisenhower]]></category>
		<category><![CDATA[IEEE]]></category>
		<category><![CDATA[Interstate highways]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=889</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Structure Paves the Way to the Semantic Web&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-06-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/&amp;rft.language=English"></span>
How Shall We Measure Progress Over the Past Three Years?

For a dozen years, my career has been centered on Internet search,  dynamic content and the deep Web. For the  past few years, I have been somewhat obsessed by two topics.
The first  topic, a conviction really, is that implicit structure needs to be [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Structure Paves the Way to the Semantic Web&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-06-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/&amp;rft.language=English"></span>
<h2>How Shall We Measure Progress Over the Past Three Years?</h2>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday     Brown Bag Lunch" width="158" height="179" /><br />
<a href="http://www.mkbergman.com/wp-content/themes/ai3/images/2007Posts/070405a_colorado-hwy.jpg"><img style="border: 0px solid; float: right; margin-left: 10px;" title="Colorado  Interstate construction - 1970; courtesy National Archives" src="../wp-content/themes/ai3/images/2007Posts/070405a_colorado-hwy.jpg" alt="Colorado  Interstate construction - 1970; courtesy National Archives" width="272" /></a>For a dozen years, my career has been centered on Internet search,  dynamic content and the <a href="http://en.wikipedia.org/wiki/Deep_web">deep Web</a>. For the  past few years, I have been somewhat obsessed by two topics.</p>
<p>The first  topic, a conviction really, is that implicit structure needs to be  extracted from Web content to enable it to be disambiguated, organized,  shared and re-purposed. The second topic, more an open question as a  former academic married to a professor, is what might replace editorial  selections and peer review to establish the authoritativeness of  content. These topics naturally steer one to the <a href="http://en.wikipedia.org/wiki/Semantic_web">semantic Web</a>.</p>
<h3><span style="font-weight: bold">A  Millennial Perspective</span></h3>
<p>The semantic Web, by whatever name it comes to be called, is an  inevitability.  History tells us that as information content grows, so  do the mechanisms for organizing and managing it. Over human history,  innovations such as writing systems, alphabetization, pagination, tables  of contents, indexes, concordances, reference look-ups, classification  systems, tables, figures, and statistics have emerged in parallel with  content growth [<a href="#SWref19">19</a>].</p>
<p>When the Lycos search engine, one of the first profitable Internet  ventures, was publicly released in 1994, it indexed a mere 54,000 pages [<a href="#SWref1">1</a>].  When  Google wowed us with its page-ranking algorithm in 1998, it soon  replaced my then favorite search engine, AltaVista.  Now, tens of  billions of indexed documents later, I often find Google&#8217;s results to be  overwhelming dross &#8212; unfortunately true again for all of the major  search engines.  Faceted browsing, vertical search, and Web 2.0&#8217;s  tagging and folksonomies demonstrate humanity&#8217;s natural penchant to  fight this entropy, efforts that will next continue with the semantic  Web and then mechanisms unforeseen to manage the chaos of burgeoning  content.</p>
<p>An awful lot of hot air has been expelled over the false dichotomy of whether the semantic Web will fail or is on the verge of nirvana. Arguments extend from the epistemological versus ontological (classically defined) to Web 3.0 versus SemWeb or Web services (WS*) versus REST (Representational State Transfer). My RSS feed reader points to at least one such dust-up every week.</p>
<p>Some set the difficulties of resolving semantic heterogeneities as absolutes, leading to an illogical and false rejection of semantic Web objectives. In contrast, some advocates set equally divisive arguments for semantic Web purity by insisting on formal ontologies and description logics. Meanwhile, studied leaks about &#8220;stealth&#8221; semantic Web ventures mean you should grab your wallet while simultaneously shaking your head.</p>
<h3><span style="font-weight: bold">A  Decades-Long Perspective</span></h3>
<p>My mental image of the semantic Web is a road from here to some  achievable destination &#8212; say, Detroit. Parts of the road are well paved;  indeed, portions are already superhighways with controlled on-ramps and  off-ramps. Other portions are two lanes, some with way too many traffic  lights and some with dangerous intersections. A few small portions  remain unpaved gravel and rough going.</p>
<div style="float: right;  margin-left: 10px"><a href="http://www.mkbergman.com/wp-content/themes/ai3/images/2007Posts/070405b_1919wreck_400.jpg"><img style="border: 0px solid; width: 400px;" title="1919 Wreck in Nebraska" src="http://www.mkbergman.com/wp-content/themes/ai3/images/2007Posts/070405b_1919wreck_400.jpg" alt="1919 Wreck in Nebraska" align="middle" /></a>
<p align="center"><small>Wreck in Nebraska during the 1919  Transcontinental Motor Convoy</small></p>
</div>
<p>A lack of perspective makes things appear either too close or too far away. The automobile isn&#8217;t yet a century old as a mass-produced item. It wasn&#8217;t until 1919 that the US Army Transcontinental Motor Convoy made the Army&#8217;s first motor convoy trip across the United States.</p>
<p>The 3,200-mile route roughly followed today&#8217;s Lincoln Highway, US 30, from Washington, D.C. to San Francisco. The convoy took 62 days and 250 recorded accidents to complete the trip (see figure), half on dirt roads at an average speed of 6 miles per hour. A tank officer on that trip later observed Germany&#8217;s autobahns during World War II. When he subsequently became President Dwight D. Eisenhower, he proposed and then signed the Interstate Highway Act.</p>
<p>That was 50 years ago. Today, the US is  crisscrossed with 50,000 miles of interstates, which have completely  remade the nation&#8217;s economy and culture [<a href="#SWref2">2</a>].</p>
<h3><span style="font-weight: bold">Today&#8217;s  Perspective</span></h3>
<p>Like the interstate system in its early years, today&#8217;s semantic Web  lets you link together a complete trip, but the going isn&#8217;t as smooth or  as fast as it could be. Nevertheless, making the trip is doable and  keeps improving day by day, month by month.</p>
<p>My view of what&#8217;s required to smooth the road begins with extracting  structure and meaningful information according to understandable schema  from mostly uncharacterized content. Then we store the now-structured  content as RDF triples that can be further managed and manipulated at  scale. By necessity, the journey embraces tools and requirements that,  individually, might not constitute semantic Web technology as some  strictly define it. These tools and requirements are nonetheless  integral to reaching the destination. We are well into that journey&#8217;s  first leg, what I and others are calling the <span style="font-style: italic">structured Web</span>.</p>
<p>For the past six months or so I have been researching and assembling  as many semantic Web and related tools as I can find [<a href="#SWref3">3</a>].  That  <a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/"><span style="font-style:  italic; font-weight: bold">Sweet Tools</span></a> listing now exceeds  500 tools [<a href="#SWref4">4</a>] (with  its presentation using the nifty lightweight Exhibit publication system  from MIT&#8217;s Simile program [<a href="#SWref5">5</a>]).   I&#8217;ve come to understand the importance of many ancillary tool sets to  the entire semantic Web highway, such as natural language processing and  information extraction. I&#8217;ve also found new categories of pragmatic  tools that embody semantic Web and data mediation processes but don&#8217;t  label themselves as such.</p>
<p>In its entirety, the <a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/"><span style="font-style:  italic; font-weight: bold">Sweet Tools</span></a> listing provides a  pretty good picture of the semantic Web&#8217;s state. It&#8217;s a surprisingly  robust picture &#8212; though with some notable potholes &#8212; and includes  impressive open source options in all categories. Content publishing,  indexing, and retrieval at massive scales are largely solved problems.  We also have the infrastructure, languages, and (yes!) standards for  tying this content together meaningfully at the data and object levels.</p>
<p>I also think a degree of consensus has emerged on RDF as the  canonical data model for semantic information. RDF triple stores are  rapidly improving toward industrial strength, and RESTful designs enable  massive scalability, as terabyte- and petabyte-scale full-text indexes  prove.</p>
<p>Powerful and flexible middleware options, such as those from OpenLink  [<a href="#SWref6">6</a>], can  transform and integrate diverse file formats with a variety of back  ends. The World Wide Web Consortium&#8217;s GRDDL standard [<a href="#SWref7">7</a>] and  related tools, plus various &#8220;RDF-izers&#8221; from Massachusetts Institute of  Technology and elsewhere [<a href="#SWref8">8</a>],  largely provide the conversion infrastructure for getting Web data into  that canonical RDF form. Sure, some of these converters are still  research-grade, but getting them to operational capabilities at scale  now appears trivial.</p>
<p>Things start getting shakier when trying to structure information  into a semantic formalism. Controlled vocabularies and ontologies range  broadly and remain a contentious area. Publishers and authors perhaps  have too many choices: from straight Atom or RSS feeds and feeds with  tags to informal folksonomies and then Outline Processor Markup Language  [<a href="#SWref9">9</a>] or  microformats [<a href="#SWref10">10</a>].  From there, the formalism increases further to include the standard RDF  ontologies such as SIOC (Semantically-Interlinked Online Communities),  SKOS (Simple Knowledge Organization System), DOAP (Description of a  Project), and FOAF (Friend of a Friend) [<a href="#SWref11">11</a>] and  the still greater formalism of OWL&#8217;s various dialects [<a href="#SWref12">12</a>].</p>
<div style="border: 1px solid #820000; background-color: #ffffe5; width: 460px; float: left; margin: 10px 10px 10px 0px; font-style: italic; font-size: 120%;">
<table border="0" cellspacing="4" cellpadding="4">
<tbody>
<tr>
<td style="vertical-align: middle; text-align: center"><em>If we compare the  semantic Web to the US interstate highway system, we&#8217;re still in the  early stages of a journey that will remake our economy and culture.</em></td>
</tr>
<tr>
<td style="text-align: center"><em>Many  potholes on the road to the semantic Web exist.</em></td>
</tr>
<tr>
<td style="text-align: center"><em>One ready  task is to transform existing structure to RDF. Another priority is to  refine tools to extract structure and meaningful information from  uncharacterized content.</em></td>
</tr>
</tbody>
</table>
</div>
<p>Arguing which of these is the theoretical best method is doomed to  failure, except possibly in a bounded enterprise environment. We live in  the real world, where multiple options will always have their advocates  and their applications.</p>
<p>All of us should welcome whatever structure we can add to our information base, no matter where it comes from or how it&#8217;s done. The sooner we can embrace content in any of these formats and convert it into canonical RDF form, the sooner we can move on to needed developments in semantic mediation, some of the roughest road on the journey.</p>
<h3 style="font-weight: bold">Potholes on  the Semantic Highway</h3>
<p>Semantic mediation requires appropriate structured content. Many  potholes on the road to the semantic Web exist because the content lacks  structured markup; others arise because existing structure requires  transformation. We need improved ways to address both problems. We also  need more intuitive means for applying schema to structure. Some have  referred to these issues as &#8220;who pays the tax.&#8221;</p>
<p>Recent experience with social software and collaboration proves that a  portion of the Internet user community is willing to tag and  characterize content. Furthermore, we can readily leverage that  resulting structure, and free riders are welcomed. The real pothole is  the lack of easy &#8212; even fun &#8212; data extractors and &#8220;structurizers.&#8221; But  we&#8217;re tantalizingly close.</p>
<p>Tools such as Solvent and Sifter from MIT&#8217;s Simile program [<a href="#SWref13">13</a>] and  Marmite from Carnegie Mellon University [<a href="#SWref14">14</a>] are  showing the way to match DOM (document object model) inspectors with  automated structure extractors. DBpedia,  the alpha version of Freebase,  and System One now provide large-scale, open Web data sets in RDF [<a href="#SWref15">15</a>],  including all of Wikipedia. Browser extensions such as Zotero [<a href="#SWref16">16</a>] are  showing how to integrate structure management into acceptable user  interfaces, as are services such as Zoominfo [<a href="#SWref17">17</a>]. Yet  we still lack easy means to design the differing structures suitable  for a plenitude of destinations.</p>
<p>Amazingly, a compelling road map for how all these pieces could truly  fit together is also incomplete. How do we actually get from here to  Detroit? Within specific components, architectural understandings are  sometimes OK (although documentation is usually awful for open source  projects, as most of the current tools are). Until our community better  documents that vision, attracting new contributors will be needlessly  slower, thus delaying the benefits of network effects.</p>
<p>So, let&#8217;s create a road map and get on with paving the gaps and  filling the potholes. It&#8217;s not a matter of standards or technology &#8212; we  have those in abundance. Let&#8217;s stop the silly squabbles and commit to  the journey in earnest. The <span style="font-style: italic">structured Web</span>&#8217;s ability to reach <span style="font-style: italic">Hyperland</span> [<a href="#SWref18">18</a>],  Douglas Adams&#8217;s prescient 1990 forecast of the semantic Web, now looks to  be no further away than Detroit.</p>
<div class="boxBrownDotted center_ok" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag    Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday      Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about three years ago on <a href="http://www.mkbergman.com/357/structure-paves-the-way-to-the-semantic-web/">May 3, 2007</a>. The piece was my answer to a request by <a href="http://www.mindswap.org/blog/">Jim Hendler</a> to pen some thoughts on the semantic Web from what I believe he thought might be a pragmatic perspective, one combining Internet business with Web science. The formal piece appeared as a guest editorial in the May/June 2007 issue of <a href="http://www.computer.org/intelligent/">IEEE Intelligent Systems</a>. What appears above is unaltered from my original posting, aside from some minor formatting clean-up (and, sorry to say, some of the projects are now defunct).</div>
<hr style="height: 1px; width: 33%; margin-left: 0px; margin-right: auto;" />
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref1" name="SWref1">[1]</a> Chris  Sherman, &#8220;Happy Birthday, Lycos!,&#8221; <span style="font-style: italic">Search Engine Watch</span>, August 14,  2002.  See <a href="http://searchenginewatch.com/showPage.html?page=2160551">http://searchenginewatch.com/showPage.html?page=2160551</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref2" name="SWref2">[2]</a> David  A. Pfeiffer, &#8220;Ike&#8217;s Interstates at 50: Anniversary of the Highway System  Recalls Eisenhower&#8217;s Role as Catalyst,&#8221; <span style="font-style: italic">Prologue Magazine</span>,  National Archives, Summer 2006, Vol. 38, No. 2. See: <a href="http://www.archives.gov/publications/prologue/2006/summer/interstates.html">http://www.archives.gov/publications/prologue/2006/summer/interstates.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref3" name="SWref3">[3]</a> The  mention of specific tool names is meant to be illustrative and not  necessarily a recommendation.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref4" name="SWref4">[4]</a> <span style="font-weight: bold">Sweet Tools</span> (SemWeb) listing; see <a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/">http://www.mkbergman.com/new-version-sweet-tools-sem-web/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref5" name="SWref5">[5]</a> See <a href="http://simile.mit.edu/exhibit/">http://simile.mit.edu/exhibit/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref6" name="SWref6">[6]</a> OpenLink Software&#8217;s Virtuoso and Data Spaces products; see <a href="http://www.openlinksw.com/">http://www.openlinksw.com/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref7" name="SWref7">[7]</a> W3C&#8217;s  Gleaning Resource Descriptions from Dialects of Languages (GRDDL,  pronounced &#8220;griddle&#8221;).  See <a href="http://www.w3.org/2004/01/rdxh/spec">http://www.w3.org/2004/01/rdxh/spec</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref8" name="SWref8">[8]</a> See <a href="http://simile.mit.edu/wiki/RDFizers">http://simile.mit.edu/wiki/RDFizers</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref9" name="SWref9">[9]</a> Outline  Processor Markup Language (OPML); see <a href="http://www.opml.org/">http://www.opml.org/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref10" name="SWref10">[10]</a> Microformats; see <a href="http://microformats.org/">http://microformats.org/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref11" name="SWref11">[11]</a> <a href="http://en.wikipedia.org/wiki/DOAP">DOAP</a> (<a href="http://en.wikipedia.org/wiki/DOAP">Description of a Project</a>),  <a href="http://en.wikipedia.org/wiki/FOAF">FOAF</a> (<a href="http://en.wikipedia.org/wiki/FOAF">Friend of a Friend</a>), <a href="http://en.wikipedia.org/wiki/SIOC">SIOC</a> (<a href="http://en.wikipedia.org/wiki/SIOC">Semantically-Interlinked  Online Communities</a>) and <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a> (<a href="http://en.wikipedia.org/wiki/SKOS">Simple Knowledge Organizing  System</a>).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref12" name="SWref12">[12]</a> W3C&#8217;s Web Ontology Language (OWL).  See <a href="http://www.w3.org/TR/owl-features/">http://www.w3.org/TR/owl-features/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref13" name="SWref13">[13]</a> Solvent (<a href="http://simile.mit.edu/wiki/Solvent">http://simile.mit.edu/wiki/Solvent</a>)  and Sifter (<a href="http://simile.mit.edu/wiki/Sifter">http://simile.mit.edu/wiki/Sifter</a>)  are from MIT&#8217;s Simile program.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref14" name="SWref14">[14]</a> Marmite (<a href="http://www.cs.cmu.edu/%7Ejasonh/projects/marmite/">http://www.cs.cmu.edu/~jasonh/projects/marmite/</a>)  is from Carnegie Mellon University.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref15" name="SWref15">[15]</a> DBpedia (<a href="http://dbpedia.org/docs/">http://dbpedia.org/docs/</a>) and  Freebase (in alpha, by invitation only at <a href="http://www.freebase.com/">http://www.freebase.com/</a>)  are two of the first large-scale open datasets on the Web; Wikipedia  has also been converted to RDF by System One (<a href="http://labs.systemone.at/wikipedia3">http://labs.systemone.at/wikipedia3</a>).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref16" name="SWref16">[16]</a> Zotero is produced by George Mason University&#8217;s Center for History and  New Media; see <a href="http://www.zotero.org/">http://www.zotero.org</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref17" name="SWref17">[17]</a> ZoomInfo (<a href="http://www.zoominfo.com/">http://www.zoominfo.com/</a>)  provides online structured search of companies and people, plus broader  services to enterprises.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref18" name="SWref18">[18]</a> The  late <a title="Douglas Adams" href="http://www.douglasadams.com/">Douglas  Adams</a>, of <em>Doctor Who </em>and <em>A Hitchhiker&#8217;s Guide to the  Galaxy</em> fame, produced a TV program for BBC2 presaging the Internet  called <a style="font-style: italic" href="http://en.wikipedia.org/wiki/Hyperland">Hyperland</a>.  This 50-min  video can be seen in five parts via YouTube at Part <a href="http://www.youtube.com/watch?v=rOsPKjbMvxY">1 of 5</a>, <a href="http://www.youtube.com/watch?v=ELSZ7pAmvKE">2 of 5</a>, <a href="http://www.youtube.com/watch?v=VF8dm9sK8as">3 of 5</a>, <a href="http://www.youtube.com/watch?v=6dB3_GcFV_0">4 of 5</a> and <a href="http://www.youtube.com/watch?v=b8pvOdMnflI">5 of 5</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref19" name="SWref19">[19]</a> Since I first wrote this piece, I have systematized these developments in my <a href="http://www.mkbergman.com/temp-exhibit/">Timeline of Information History</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Bipolar Disorder of Linked Data</title>
		<link>http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/</link>
		<comments>http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 23:12:38 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[ABox]]></category>
		<category><![CDATA[structured data]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=880</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Bipolar Disorder of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-28&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/&amp;rft.language=English"></span>

An Acceptance of Its Natural Role is the Prozac Substitute
There has been a bit of a manic-depressive character on the Web waves         of late with respect to linked data. On the one         hand, we have seen huzzahs and celebrations [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Bipolar Disorder of Linked Data&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Linked Data&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-28&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/&amp;rft.language=English"></span>
<p><a href="http://commons.wikimedia.org/wiki/File:VanGogh-starry_night_ballance1.jpg"><img style="border: 0px solid; width: 250px; height: 198px; float: left; margin-right: 10px;" title="The Starry Night, from Vincent Van Gogh" src="../wp-content/themes/ai3/images/2010Posts/250-VanGogh-starry_night_ballance1.jpg" alt="The Starry Night, from Vincent Van Gogh" hspace="5" vspace="5" align="left" /></a></p>
<h2>An Acceptance of Its Natural Role is the Prozac Substitute</h2>
<p>There has been a bit of a manic-depressive character on the Web waves of late with respect to <a href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a>. On the one hand, we have seen huzzahs and celebrations from the likes of <a href="http://www.readwriteweb.com/archives/the_state_of_linked_data_in_2010.php"> ReadWriteWeb</a> and <a href="http://www.semanticweb.com/">Semantic Web.com</a> and, just concluded, the Linked Data on the Web (<a href="http://events.linkeddata.org/ldow2010/">LDOW</a>) workshop at <a href="http://www2010.org/www/">WWW2010</a>. This treatment has tended to tout the coming of the linked data era and to seek ideas about possible, cool <a href="http://www.readwriteweb.com/archives/10_ideas_for_web_of_data_apps.php"> linked data apps</a> <a href="#BPD1">[1]</a>. This rise in visibility has been accompanied by much manic and excited discussion on <a href="http://lists.w3.org/Archives/Public/public-lod/">various</a> <a href="http://lists.w3.org/Archives/Public/semantic-web/">mailing</a> <a href="http://sourceforge.net/mailarchive/forum.php?forum_name=dbpedia-discussion"> lists</a>.</p>
<p>On the other hand, we have seen much wringing of hands and gnashing of teeth over why linked data is not being used more and why the broader issue of the semantic Web is not seeing more uptake. This depressive &#8220;<a href="http://lists.w3.org/Archives/Public/semantic-web/2010Mar/0160.html">call to arms</a>&#8221; has sometimes felt like ravings, with blame variously assigned to the poor state of apps and user interfaces, to badly linked data, and to the difficulty of publishing it. Actually using linked data for anything productive (other than single sources like <a href="http://dbpedia.org/About">DBpedia</a>) still appears to be an issue.</p>
<p>Meanwhile, among others, <a href="http://www.openlinksw.com/blog/%7Ekidehen/">Kingsley Idehen</a>,         ubiquitous voice on the Twitter <a href="http://twitter.com/search?q=%23linkeddata">#linkeddata</a> channel,         has been promoting the separation of identity of linked data from the         notion of the semantic Web. He is also trying to <a href="http://www.openlinksw.com/blog/%7Ekidehen/?1624&amp;title=Data%203.0%20%28a%20Manifesto%20for%20Platform%20Agnostic%20Structured%20Data%29%20Update%203"> change the narrative</a> away from the association of linked data with         RDF, instead advocating &#8220;Data 3.0&#8243; and the <a href="http://en.wikipedia.org/wiki/Entity-attribute-value_model">entity-attribute-value</a> (EAV) model understanding of structured data.</p>
<p>As someone less engaged in these topics since my own statements about         linked data over the past couple of years <a href="#BPD2">[2]</a>, I have my own         distanced-yet-still-biased view of what all of this crisis of         confidence is about. I think I have a diagnosis for what may be causing         this <a href="http://en.wikipedia.org/wiki/Bipolar_disorder">bipolar         disorder</a> of linked data <a href="#BPD3">[3]</a>.</p>
<h3>The Semantic Web Boogie Man</h3>
<p>A fairly universal response from enterprise prospects when raising the topic of the semantic Web is, &#8220;That was a big deal about a decade ago, wasn&#8217;t it? It didn&#8217;t seem to go anywhere.&#8221; And, actually, I think both proponents and keen observers agree with this general sentiment. We have seen the original advocate, Tim Berners-Lee, float the <a href="http://en.wikipedia.org/wiki/Giant_Global_Graph">Giant Global Graph</a> balloon, and now <a href="http://blog.ted.com/2010/03/the_year_open_d.php">Linked Data</a>. Others have touted <a href="../462/how-shall-we-call-web-30-instead-mike-please-indulge-us/"> Web 3.0</a> or <a href="http://webofdata.wordpress.com/">Web of Data</a> or, frankly, <a href="http://bnode.org/blog/2008/03/04/semantic-web-aliases">dozens of alternatives</a>. Linked data, which began as a set of techniques for publishing RDF, has emerged as a potential marketing hook and saviour for the tainted original semantic Web term.</p>
<p>And therein, I think, lies the rub and the answer to the bipolar         disorder.</p>
<p>If one looks at the <a href="http://www.w3.org/DesignIssues/LinkedData.html">original principles</a> for putting linked data on the Web or <a href="../846/when-linked-data-rules-fail/">subsequent interpretations</a>, it is clear that linked data (lower case) is merely a set of techniques. Useful techniques, for sure; but really a simple approach to exposing data using the Web with URLs as the naming convention for objects and their relationships. These techniques provide (1) methods to access data on the Web and (2) ways of specifying the relationships that link the data (resources). The first part is mechanistic and not really of further concern here. And, while any predicate can be used to specify a data (resource) relationship, that relationship should also be discoverable with a URL (dereferenceable) to qualify as linked data. Then, to actually be semantically useful, that relationship (predicate) should also have a precise definition and be part of a coherent schema. (Note, this last requirement is actually not part of the &#8220;standard&#8221; principles for linked data, which itself is a <a href="../846/when-linked-data-rules-fail/">problem</a>.)</p>
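<p>To make the naming convention concrete, here is a minimal Python sketch, not drawn from any linked data toolkit. It holds one statement as a subject-predicate-object triple and checks only that each part is named with a URL. The <code>example.org</code> resources and the helper names are hypothetical, and the check says nothing about whether the predicate actually dereferences or carries a precise definition.</p>

```python
from urllib.parse import urlparse

# Hypothetical triple: the subject and object URLs are made up for
# illustration; the predicate is the Dublin Core "creator" element.
TRIPLE = (
    "http://example.org/resource/The_Starry_Night",
    "http://purl.org/dc/elements/1.1/creator",
    "http://example.org/resource/Vincent_van_Gogh",
)


def is_http_url(term):
    """Return True when the term is an absolute http(s) URL."""
    parts = urlparse(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def uses_url_naming(triple):
    """Check that subject, predicate and object are all named with URLs."""
    return all(is_http_url(term) for term in triple)
```

<p>A triple whose predicate is a bare word such as <code>made</code> fails this check, which is the easy part. Whether the relationship is well defined and part of a coherent schema is the harder question, and no syntactic test like this one can answer it.</p>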
<p>When used right, these techniques can be powerful and useful. But poor choices or execution in how relationships are specified often lead to saying little or nothing about semantics. Most linked data uses a woefully small vocabulary of data relationships, with an even smaller set ever used for setting linkages <span style="font-weight: bold; font-style: italic;">across</span> existing linked data sets <a href="#BPD4">[4]</a>. Linked data techniques are a part of the foundation to overall best practices, but not the total foundation. As I have argued for some time, linked data alone does not speak to issues of <a href="../431/umbel-making-linked-data-classy/">context</a> nor <a href="../450/when-is-content-coherent/">coherence</a>.</p>
<p>To speak semantically, linked data is not a synonym for the semantic         Web nor is it the <a style="font-family: monospace;" href="http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf">sameAs</a> the semantic Web. But, many proponents have tried to characterize it as         such. The general tenor is to blow the horns hard anytime some large         data set is &#8220;exposed&#8221; as linked data. (No matter whether the data is         incoherent, lacks a schema, or is even <a href="../846/when-linked-data-rules-fail/">poorly         described and defined</a>.) Heralding such events, followed by no         apparent usefulness to the data, causes confusion to reign supreme and         disappointment to naturally occur.</p>
<p>The semantic Web (or semantic enterprise or semantic government or         similar expressions) is a vision and an ideal. It is also a fairly         complete one that potentially embraces machines and agents working in         the background to serve us and make us more productive. There is an         entire stack of languages and techniques and methods that enable schema         to be described and non-conforming data to be interoperated. Now, of         course this ideal is still a work in progress. Does that make it a         failure?</p>
<p>Well, maybe so, if one sees the semantic Web as marketing or branding.         But, who said we had to present it or understand it as such?</p>
<p>The issue is not one of marketing and branding, but the lack of         benefits. Now, maybe I have it all wrong, but it seems to me that the         argument needs to start with what &#8220;linked data&#8221; and the &#8220;semantic Web&#8221;         can do for me. What I actually call it is secondary. Rejecting the         branding of the semantic Web for linked data or Web 3.0 or any other         somesuch is still dressing the emperor in new clothes.</p>
<h3>A Nicely Progressing Continuum, Thank You!</h3>
<p>For a couple of years now I have tried in various posts to present         linked data in a broader framework of structured and semantic Web data.         I first tried to capture this continuum in a diagram from <a href="../?p=391">July 2007</a>:</p>
<div style="margin: 18px 0px;">
<table class="center_ok" style="text-align: left; width: 622px;" border="0">
<tbody>
<tr>
<td colspan="4"><img style="width: 599px; height: 205px; margin-left: 15px; vertical-align: top;" src="../wp-content/themes/ai3/images/2007Posts/070720_web_transition.jpg" alt="Transition in Web Structure" /></td>
</tr>
<tr>
<td style="border-bottom: 1px solid; font-weight: bold; text-align: center;">Document Web</td>
<td style="border-bottom: 1px solid; font-weight: bold; text-align: center;" colspan="2">Structured Web</td>
<td style="border-bottom: 1px solid; font-weight: bold; text-align: center;">Semantic Web</td>
</tr>
<tr>
<td style="width: 150px;"></td>
<td style="width: 150px;"></td>
<td style="border-bottom: 1px solid; width: 150px; font-weight: bold; text-align: center;">Linked Data</td>
<td style="width: 150px;"></td>
</tr>
<tr>
<td>
<ul>
<li> <small>Document-centric</small></li>
<li> <small>Document resources</small></li>
<li> <small>Unstructured data and semi-structured data</small></li>
<li> <small>HTML<br />
</small></li>
<li> <small>URL-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 1993</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Structured data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>XML, JSON, RDF, etc<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 2003</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Linked data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>RDF, RDF-S<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> 2006<br />
</small></li>
</ul>
</td>
<td>
<ul>
<li> <small>Data-centric</small></li>
<li> <small>Linked data<br />
</small></li>
<li> <small>Semi-structured data and structured data</small></li>
<li> <small>RDF, RDF-S, OWL<br />
</small></li>
<li> <small>URI-centric</small></li>
<li> <small><span style="font-style: italic;">circa</span> ???<br />
</small></li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p>Now, three years later, I think the transitional phase of linked data         is reaching an end. OK, we have figured out one useful way to publish         large datasets staged for possible interoperability. Sure, we have         billions of triples and assertions floating out there. But what are we         to do with them? And, is any of it any good?</p>
<h3>The Reality of a Heterogeneous World</h3>
<p>I think Kingsley is right in one sense to point to EAV and structured         data. We, too, have not met a structured data format we did not         like. There are hundreds of attribute-value pair models of even         more generic nature that also belong to the conversation.</p>
<p>One of my most popular posts on this blog has been, <a style="font-style: italic;" href="../471/structs-naive-data-formats-and-the-abox/"> ‘Structs’: Naïve Data Formats and the ABox</a>, from         January 2009. Today, we have a multitude of popular structured data         formats from XML to JSON and even spreadsheets (CSV). Each form has its         advocates, place and reasons for existence and popularity (or not).         This inherent diversity is a fact and fixture of any discussion of         data. It is a major reason why we developed the <a href="http://openstructs.org/iron">irON</a> (<span style="font-style: italic;">instance record</span> and <span style="font-style: italic;">object notation</span>) non-RDF vocabulary to         provide a bridge from such forms to RDF, which is accessible on the Web         via URIs. irON clearly shows that entities can be usefully described         and consumed in either RDF or non-RDF serialized forms.</p>
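<p>The bridging idea can be sketched in a few lines of Python. This is an illustration of the general approach only: it does not use irON&#8217;s actual notation or vocabulary, and the <code>example.org</code> namespace, the <code>id</code> key and the function name are all assumptions made for the example. Each attribute-value pair in a plain JSON record becomes one triple about a URI-named subject.</p>

```python
import json

BASE = "http://example.org/"  # hypothetical namespace, not an irON one


def record_to_triples(record, id_key="id"):
    """Turn a flat attribute-value record into (subject, predicate, value) triples."""
    subject = BASE + "id/" + str(record[id_key])
    triples = []
    for attribute, value in record.items():
        if attribute == id_key:
            continue  # the identifier names the subject; it is not an attribute
        predicate = BASE + "attr/" + attribute
        triples.append((subject, predicate, value))
    return triples


# A made-up instance record as it might arrive in JSON.
record = json.loads('{"id": "471", "title": "Structs", "year": 2009}')
triples = record_to_triples(record)
```

<p>The same record could just as easily start life as a CSV row or an XML element. Only the parsing step changes, which is why the instance data itself does not need to be authored in RDF.</p>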
<p>Though RDF and linked data are great forms for expressing this structured information, other forms can convey the same meaning as well. Of the billions of linked data triples exposed to date, surely more than 99% are of this instance-level, &#8220;ABox&#8221; type of data <a href="#BPD5">[5]</a>. And, more telling, of all of the structured data that is publicly obtainable on the Web, my wild guess is that less than 0.0000000001% of that is even linked RDF data <a href="#BPD6">[6]</a>.</p>
<p>Neither linked data nor RDF alone will, today or in the near future, play a pivotal or essential role for instance data. The real contribution from RDF and the semantic Web will come from connecting things together, from interoperation and federation and conjoining. This is the province of the TBox and is a role barely touched by linked data. Publishing data as linked data helps tremendously in simplifying ingest and guiding the eventual connections, but making those connections and testing their quality and reliability are steps beyond the linked data ken or purpose.</p>
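<p>A rough way to see the ABox and TBox split is to sort triples by their predicate. The Python sketch below is a deliberate simplification and an assumption on my part: it treats a handful of RDF Schema predicates as schema-level (TBox) and everything else, including class membership via <code>rdf:type</code>, as instance-level (ABox). The <code>example.org</code> data is made up.</p>

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates that relate concepts or properties to one another (schema level).
SCHEMA_PREDICATES = {
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    RDFS + "domain",
    RDFS + "range",
}


def split_boxes(triples):
    """Partition triples into (tbox, abox) using the predicate alone."""
    tbox, abox = [], []
    for triple in triples:
        predicate = triple[1]
        if predicate in SCHEMA_PREDICATES:
            tbox.append(triple)
        else:
            abox.append(triple)
    return tbox, abox


EX = "http://example.org/"  # hypothetical namespace
TRIPLES = [
    (EX + "Painting", RDFS + "subClassOf", EX + "Artwork"),
    (EX + "starry_night", RDF_TYPE, EX + "Painting"),
    (EX + "starry_night", EX + "year", 1889),
]
TBOX, ABOX = split_boxes(TRIPLES)
```

<p>Run over a typical linked data set, a partition like this would be expected to put nearly everything in the ABox list, which is the point of the paragraph above: the connecting, schema-level statements are the scarce part.</p>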
<h3>Promoting Linked Data to its Level of Incompetence</h3>
<p>It seems, then, that we see two different forces and perspectives at         work, each contributing in its own way to today&#8217;s bipolar nature of         linked data.</p>
<p>On the manic side, we see the celebration for the release of each         large, linked data set. This perspective seems to care most about         volumes and numbers, with less interest in how and whether the data is         of quality or useful. This perspective seems to believe &#8220;post the data,         and the public will come.&#8221; This same perspective is also quite         parochial with respect to the unsuitability of non-linked data, be it         microdata, microformats or any of the older junk.</p>
<p>On the depressed side, linked data has been seen as a more palatable         packaging for the disappointments and perceived failures or slow         adoption of the earlier semantic Web phrasing. When this perspective         sees the lack of structure, defensible connections and other quality         problems with linked data as it presently exists, despair and         frustration ensue.</p>
<p>But both of these perspectives very much miss the mark. Linked data will never become the universal technique for publishing structured data, and should not be expected to be such. Numbers are never a substitute for quality. And linked data lacks the standards, scope and investment made in the semantic Web to date. Be patient; don&#8217;t despair; structured data and the growth of semantics and useful metadata are proceeding just fine.</p>
<p>Unrealistic expectations or wrong roles and metrics simply confuse the public. We are fortunate that most potential buyers do not frequent the community&#8217;s various mailing lists. Reduced expectations and an understanding of linked data&#8217;s natural role are perhaps the best way to bring back balance.</p>
<h3>Linked Data&#8217;s Natural Role</h3>
<p>We have consciously moved our communications focus from speaking internally to the community to reaching out to the broader enterprise public. Much education, clarification and dialog is now needed with the buying public. The time has moved past software demos and toys to workable, pragmatic platforms, and the methodologies and documentation necessary to support them. Missives like this one, speaking to the founding community, are (perhaps many will say Hurray!) likely to become even rarer as we continue to focus outward.</p>
<p>As Structured Dynamics has stated many times, we are committed to         linked data, presenting our information as such, and providing better         tools for producing and consuming it. We have made it one of the         <a href="../859/seven-pillars-of-the-open-semantic-enterprise/"> seven foundations</a> to our <a href="http://structureddynamics.com/products.html">technology stack</a> and         <a href="http://mike2.openmethodology.org/wiki/Open_SEAS_Framework">methodology</a>.</p>
<p>But, linked data on its own is inadequate as an interoperability         standard. Many practitioners don&#8217;t publish it right, characterize it         right, or link to it right. That does not negate its benefits, but it         does make it a poor candidate to install on the semantic Web throne.</p>
<p>Linked data based on RDF is perhaps the first citizen amongst all         structured data citizens. It is an expressive and readily consumed         means for publishing and relating structured instance data and one that         can be easily interoperated. It is a natural citizen of the Web.</p>
<p>If we can accept and communicate linked data for these strengths, for         what it naturally is &#8212; a useful set of techniques and best practices         for enabling data that can be easily consumed &#8212; we can rest easy at         night and not go crazy. Otherwise, bring on the Prozac.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD1"></a> [1] Actually, in my opinion, the suggested listing of apps from these         discussions is distinctly unimpressive and not compelling. As argued in         the main body of the post, I think this is because linked data is         really just a technique or best practice, and not a basis alone for         enabling compelling apps. As initial developers of such apps as the         <a href="http://umbel.structureddynamics.com/explorer.php?concept=http%3A%2F%2Fumbel.org%2Fumbel%2Fsc%2FMolecule"> UMBEL concept explorer</a> or <a href="http://dataviewer.zitgist.com/?uri=http%3A//fgiasson.com">Dataviewer</a>,         <a href="http://structureddynamics.com/">Structured Dynamics</a> understands the use of linked data and has a defensible basis to         comment on applications. Our own applications intimately integrate         linked data, but only as one of <a href="../859/seven-pillars-of-the-open-semantic-enterprise/"> seven foundations</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD2"></a> [2] Here are some of my relevant posts over the past year discussing         the role of linked data: <a style="font-style: italic;" href="../802/moving-beyond-linked-data/">Moving Beyond         Linked Data</a> (Sept. 20, 2009); <a style="font-style: italic;" href="../825/fresh-perspectives-on-the-semantic-enterprise/"> Fresh Perspectives on the Semantic Enterprise</a> (Sept. 28, 2009);         <a style="font-style: italic;" href="../837/the-law-of-linked-data/">The Law of         Linked Data</a> (Oct. 11, 2009); <a style="font-style: italic;" href="../846/when-linked-data-rules-fail/">When Linked         Data Rules Fail</a> (Nov. 16, 2009).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD3"></a> [3] The current bipolar discussion reminds me of the &#8220;<a href="http://en.wikipedia.org/wiki/Six_phases_of_a_big_project">Six Phases of a Project</a>,&#8221; a copy of which has been a permanent fixture on my office wall:
<ol>
<li>Enthusiasm</li>
<li>Disillusionment</li>
<li>Panic</li>
<li>Search for the guilty</li>
<li>Punishment of the innocent</li>
<li>Honors &amp; praise for the non-participants.</li>
</ol>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD4"></a> [4] See, for example: Harry Halpin, 2009. &#8220;A Query-Driven Characterization of Linked Data,&#8221; paper presented at the Linked Data on the Web (LDOW) 2009 Workshop, April 20, 2009, Madrid, Spain, see <a href="http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf">http://events.linkeddata.org/ldow2009/papers/ldow2009_paper16.pdf</a>; Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth, 2010. &#8220;Linked Data is Merely More Data,&#8221; in Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness, <span style="font-style: italic;">Linked Data Meets Artificial Intelligence, Technical Report SS-10-07</span>, AAAI Press, Menlo Park, California, 2010, pp. 82-86, see <a href="http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf"> http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf</a>; among others.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD5"></a> [5] Structured Dynamics&#8217; best practices approach makes explicit splits between the “<a href="http://en.wikipedia.org/wiki/Abox">ABox</a>” (for instance data) and “<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>” (for ontology schema) in accordance with our <a title="Permanent Link to Thinking &#8216;Inside the Box&#8217; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/"> working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>, a fundamental underpinning for how we use RDF:
<div class="boxGraySolid">“Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.”</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="BPD6"></a> [6] This topic is deserving of some analysis in its own right, and my         guess is really just that. For example, RSS feeds to mobile devices         alone perhaps account for 2,000 petabytes today; see <a href="http://www.tgdaily.com/hardware-features/49167-8000-petabytes-of-mobile-data-traffic-expected-by-2014"> http://www.tgdaily.com/hardware-features/49167-8000-petabytes-of-mobile-data-traffic-expected-by-2014</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/880/the-bipolar-disorder-of-linked-data/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation</title>
		<link>http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/</link>
		<comments>http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 14:34:31 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[semantic annotation]]></category>
		<category><![CDATA[semantic discovery]]></category>
		<category><![CDATA[semantic heterogeneity]]></category>
		<category><![CDATA[semantic mediation]]></category>
		<category><![CDATA[Sweet Tools]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=875</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/&amp;rft.language=English"></span>

Mediating semantic  heterogeneities requires tools and automation (or semi-automation) at  scale.  But existing tools are still crude and lack across-the-board  integration.  This is one of the next challenges in getting more  widespread acceptance of the semantic Web.
In earlier posts, I described the significant progress in climbing the data federation [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday   Brown Bag Lunch" width="158" height="179" /></p>
<div class="boxGraySolid" style="margin-left: 190px;"><em>Mediating semantic  heterogeneities requires tools and automation (or semi-automation) at  scale.  But existing tools are still crude and lack across-the-board  integration.  This is one of the next challenges in getting more  widespread acceptance of the semantic Web.</em></div>
<p>In earlier posts, I described the significant progress in <a href="http://www.mkbergman.com/?p=229">climbing the data federation  pyramid</a>, today&#8217;s <a href="http://www.mkbergman.com/?p=231">evolution in emphasis to the  semantic Web</a>, and the <a href="http://www.mkbergman.com/?p=232">40 or so sources of semantic  heterogeneity</a>. We now transition to an overview of how one goes  about providing these semantics and resolving these heterogeneities.</p>
<h3>Why the Need for Tools and Automation?</h3>
<p>In an excellent recent overview of semantic Web progress, Paul Warren  points out:<a href="#_xedn1">[1]</a></p>
<blockquote><p><em>Although knowledge workers no doubt believe in the  value of annotating their documents, the pressure to create metadata  isn&#8217;t present. In fact, the pressure of time will work in a counter  direction. Annotation&#8217;s benefits accrue to other workers; the knowledge  creator only benefits if a community of knowledge workers abides by the  same rules. . . . Developing semiautomatic tools for learning ontologies  and extracting metadata is a key research area . . . .Having to move  out of a user&#8217;s typical working environment to &#8216;do knowledge management&#8217;  will act as a disincentive, whether the user is creating or retrieving  knowledge.</em></p></blockquote>
<p>Of course, even assuming that ontologies are created and semantics and metadata are added to content, there still remain the nasty problems of resolving heterogeneities (semantic mediation) and efficiently storing and retrieving the metadata and semantic relationships.</p>
<p>Putting all of this process in place requires infrastructure in the form of tools and automation, along with proper incentives and rewards for users and suppliers to conform to it.</p>
<h3>Areas Requiring Tools and Automation</h3>
<p>In his paper, Warren repeatedly points to the need for  &#8220;semi-automatic&#8221; methods to make the semantic Web a reality. He makes  fully a dozen such references, in addition to multiple references to the  need for &#8220;reasoning algorithms.&#8221; In any case, here are some of the  areas noted by Warren needing &#8220;semi-automatic&#8221; methods:</p>
<ul>
<li>Assign authoritativeness</li>
<li>Learn ontologies</li>
<li>Infer better search requests</li>
<li>Mediate ontologies (semantic resolution)</li>
<li>Support visualization</li>
<li>Assign collaborations</li>
<li>Infer relationships</li>
<li>Extract entities</li>
<li>Create ontologies</li>
<li>Maintain and evolve ontologies</li>
<li>Create taxonomies</li>
<li>Infer trust</li>
<li>Analyze links</li>
<li>etc.</li>
</ul>
<p>In a different vein, SemWebCentral lists these clusters of semantic  Web-related tasks, each of which also requires tools:<a href="#_xedn2">[2]</a></p>
<ul>
<li><em>Create an ontology</em> &#8212; use a text or graphical ontology editor  to create the ontology, which is then validated. The resulting ontology  can then be viewed with a browser before being published</li>
<li><em>Disambiguate data </em>&#8211; generate a mapping between multiple  ontologies to identify where classes and properties are the same</li>
<li><em>Expose a relational database as OWL</em> &#8212; an editor is first  used to create the ontologies that represent the database schema, then  the ontologies are validated, translated to OWL and then the generated  OWL is validated</li>
<li><em>Intelligently query distributed data </em>&#8211; repository and again  able to be queried</li>
<li><em>Manually create data from an ontology</em> &#8212; a user would use an  editor to create new OWL data based on existing ontologies, which is  then validated and browsable</li>
<li><em>Programmatically interact with OWL content</em> &#8212; custom programs  can view, create, and modify OWL content with an API</li>
<li><em>Query non-OWL data</em> &#8212; via an annotation tool, create OWL  metadata from non-OWL content</li>
<li><em>Visualize semantic data</em> &#8212; view semantic data in a custom  visualizer.</li>
</ul>
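<p>As a rough illustration of the &#8220;expose a relational database as OWL&#8221; task above, the sketch below reads a SQLite schema and describes each table as an <code>owl:Class</code> and each column as an <code>owl:DatatypeProperty</code>. It is a simplification under stated assumptions, not the SemWebCentral workflow itself: the <code>ex</code> prefix, the <code>person</code> table and the <code>schema_to_triples</code> helper are hypothetical, and triples are kept as plain tuples with prefixed names rather than serialized and validated as real OWL.</p>

```python
import sqlite3


def schema_to_triples(conn, prefix="ex"):
    """Describe each table as an owl:Class and each column as an owl:DatatypeProperty.

    The "ex" prefix and the table_column naming scheme are illustrative assumptions.
    """
    triples = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        cls = f"{prefix}:{table}"
        triples.append((cls, "rdf:type", "owl:Class"))
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, _type, _notnull, _default, _pk in conn.execute(
            f"PRAGMA table_info({table})"
        ):
            prop = f"{prefix}:{table}_{column}"
            triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
            triples.append((prop, "rdfs:domain", cls))
    return triples


# Hypothetical one-table database used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
triples = schema_to_triples(conn)
for triple in triples:
    print(triple)
```

<p>A real translation would also need to handle keys, foreign-key relationships and datatypes, and would validate the generated OWL as the task description calls for.</p>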
<p>With some ontologies approaching tens to hundreds of thousands to millions of triples, viewing, annotating and reconciling at scale can be daunting tasks, ones that would never be undertaken without useful tools and automation.</p>
<h3>A Workflow Perspective Helps Frame the Challenge</h3>
<p>A 2005 paper by Izza, Vincent and Burlat (among many other excellent  ones) at the first International Conference on Interoperability of  Enterprise Software and Applications (INTEROP-ESA) provides a very  readable overview on the role of semantics and ontologies in enterprise  integration.<a href="#_xedn3">[3]</a> Besides proposing a fairly  compelling unified framework, the authors also present a useful  workflow perspective emphasizing <a href="http://en.wikipedia.org/wiki/Web_service">Web services</a> (WS), also applicable to semantics in general, that helps frame this  challenge:</p>
<p align="center"><img src="http://www.mkbergman.com/wp-content/themes/ai3/images/2006Posts/060608a_SW_Workflow.gif" alt="" width="599" height="541" /></p>
<p align="center"><strong>Generic Semantic Integration Workflow </strong>(adapted from  <a href="#_xedn3">[3]</a>)</p>
<p>For existing data and documents, the workflow begins with information  extraction or annotation of semantics and metadata (#1) in accordance  with a reference ontology. Newly found information via harvesting must  also be integrated; however, external information or services may come  bearing their own ontologies, in which case some form of semantic  mediation is required.</p>
<p>Of course, this is a generic workflow, and depending on the  interoperation task, different flows and steps may be required. Indeed,  the overall workflow can vary by perspective and researcher, with  semantic resolution workflow modeling a prime area of current  investigations. (As one alternative among scores, see for example  Cardoso and Sheth.<a href="#_xedn4">[4]</a>)</p>
<h3>Matching and Mapping Semantic Heterogeneities</h3>
<p>Semantic mediation is a process of <em>matching</em> schemas and <em>mapping</em> attributes and values, often with intermediate transformations (such as  unit or language conversions) also required. The general problem of  schema integration is not new, with one prior reference going back as  early as 1986. <a href="#_xedn5">[5]</a> According to Alon Halevy:<a href="#_xedn6">[6]</a></p>
<blockquote><p><em>As would be expected, people have tried building  semi-automated schema-matching systems by employing a variety of  heuristics. The process of reconciling semantic heterogeneity typically  involves two steps. In the first, called schema matching, we find  correspondences between pairs (or larger sets) of elements of the two  schemas that refer to the same concepts or objects in the real world. In  the second step, we build on these correspondences to create the actual  schema mapping expressions. </em></p></blockquote>
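<p>The two steps Halevy describes can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical schemas: the field names, the hand-written correspondences and the inch-to-centimeter conversion are all invented for the example, and a real system would propose the correspondences with a matcher rather than hard-code them.</p>

```python
# Step 1 (schema matching): correspondences between elements of two
# hypothetical schemas. They are hand-written here for illustration.
correspondences = {
    "height_in": "height_cm",
    "surname": "last_name",
}

# Step 2 (schema mapping): executable expressions built on those
# correspondences, including a unit conversion where the schemas disagree.
transforms = {
    "height_in": lambda inches: round(inches * 2.54, 1),
}


def apply_mapping(record):
    """Translate a source record into the target schema."""
    target = {}
    for source_field, value in record.items():
        if source_field not in correspondences:
            continue  # no known counterpart; left for an analyst to resolve
        convert = transforms.get(source_field, lambda v: v)
        target[correspondences[source_field]] = convert(value)
    return target


mapped = apply_mapping({"surname": "Smith", "height_in": 70, "nickname": "Smitty"})
print(mapped)
```

<p>Keeping the correspondences separate from the transformations mirrors the split between matching and mapping: the first table says which elements mean the same thing, the second says how to convert between them.</p>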
<p>The issues of <em>matching</em> and <em>mapping</em> have been addressed in many tools, notably commercial ones from MetaMatrix,<a href="#_xedn7">[7]</a> and open source and academic projects such as Piazza, <a href="#_xedn8">[8]</a> SIMILE, <a href="#_xedn9">[9]</a> and the <a title="Web Service Execution Environment (WSMX)" href="http://www.wsmx.org/">WSMX</a> (Web service modeling execution environment) protocol from <a title="Digital Enterprise Research Institute" href="http://www.deri.org/">DERI</a>. <a href="#_xedn10">[10]</a> <a href="#_xedn11">[11]</a> A superb description of the challenges in reconciling the vocabularies of different data sources is also found in the thesis by Dr. AnHai Doan, which won the ACM&#8217;s prestigious Doctoral Dissertation Award in 2003.<a href="#_xedn12">[12]</a></p>
<p>What all of these efforts have found is that the mediation process cannot be completely automated. The current state of the art is to reconcile automatically what is largely unambiguous, and then prompt analysts or subject matter experts to decide the questionable matches. These are known as &#8220;semi-automated&#8221; systems, and the user interface, data presentation and workflow become as important as the underlying matching and mapping algorithms. According to the WSMX project, there is always a trade-off between how accurate these mappings are and the degree of automation that can be offered.</p>
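<p>A semi-automated matcher of this kind can be sketched as a simple triage: accept high-confidence matches automatically, queue the questionable ones for an analyst, and leave the rest unmatched. The example below is only a sketch under stated assumptions. It uses plain string similarity from Python&#8217;s standard <code>difflib</code> module as a stand-in for a real matching algorithm, and the two thresholds, the field names and the <code>triage_matches</code> helper are hypothetical.</p>

```python
from difflib import SequenceMatcher

# Both thresholds are illustrative assumptions; real systems tune them per domain.
AUTO_ACCEPT = 0.85
NEEDS_REVIEW = 0.50


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a real matching algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage_matches(source_fields, target_fields):
    """Split best-candidate matches into auto-accepted, analyst-review and unmatched."""
    accepted, review, unmatched = {}, {}, []
    for field in source_fields:
        best = max(target_fields, key=lambda t: similarity(field, t))
        score = similarity(field, best)
        if score >= AUTO_ACCEPT:
            accepted[field] = best      # unambiguous: reconcile automatically
        elif score >= NEEDS_REVIEW:
            review[field] = best        # questionable: prompt an analyst
        else:
            unmatched.append(field)     # no plausible candidate found
    return accepted, review, unmatched


accepted, review, unmatched = triage_matches(
    ["PostalCode", "phone_no", "dob"],
    ["postal_code", "phone_number", "birth_date"],
)
print("accepted:", accepted)
print("review:", review)
print("unmatched:", unmatched)
```

<p>Raising <code>AUTO_ACCEPT</code> sends more pairs to the analyst and lowers the risk of a wrong automatic match, which is the accuracy versus automation trade-off noted above. Note that a purely lexical measure misses true correspondences such as <code>dob</code> and <code>birth_date</code>, which is one reason human review remains part of the loop.</p>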
<h3>Also a Need for Efficient Semantic Data Stores</h3>
<p>Once all of these reconciliations take place there is the (often  undiscussed) need to index, store and retrieve these semantics and their  relationships at scale, particularly for enterprise deployments. This  is a topic I have addressed many times from the standpoint of <a href="http://www.mkbergman.com/?p=227">scalability</a>, <a href="http://www.mkbergman.com/?p=233">more scalability</a>, and  comparisons of <a href="http://www.mkbergman.com/?p=185">database</a> and relational  technologies, but it is also not a new topic in the general community.</p>
<p>As Stonebraker and Hellerstein note in their retrospective covering  35 years of development in databases,<a href="#_xedn13">[13]</a> some of the first post-relational data models  were typically called semantic data models, including those of Smith and  Smith in 1977<a href="#_xedn14">[14]</a> and  Hammer and McLeod in 1981.<a href="#_xedn15">[15]</a> Perhaps what is different now is our ability to address some of the  fundamental issues.</p>
<p>At any rate, this subsection is included here because of the hidden  importance of database foundations. It is therefore a topic often  addressed in this series.</p>
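<p>To make the indexing need concrete, here is a toy in-memory triple store that indexes every statement by subject, predicate and object so that pattern lookups avoid a full scan. It is a sketch only: the <code>TripleIndex</code> class and the <code>ex:</code> identifiers are hypothetical, and nothing here addresses the persistence or scale questions that real enterprise stores must solve.</p>

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory store that indexes each triple by subject, predicate and object."""

    def __init__(self):
        self._triples = set()
        # One index per triple position: subject, predicate, object.
        self._by_position = (defaultdict(set), defaultdict(set), defaultdict(set))

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        for index, term in zip(self._by_position, triple):
            index[term].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = self._triples
        for index, term in zip(self._by_position, (subject, predicate, obj)):
            if term is not None:
                candidates = candidates.intersection(index.get(term, set()))
        return sorted(candidates)


# Hypothetical statements used only to exercise the sketch.
store = TripleIndex()
store.add("ex:Protege", "rdf:type", "ex:OntologyEditor")
store.add("ex:SWOOP", "rdf:type", "ex:OntologyEditor")
store.add("ex:Sesame", "rdf:type", "ex:TripleStore")
editors = store.match(predicate="rdf:type", obj="ex:OntologyEditor")
print(editors)
```

<p>Keeping three indexes triples the memory cost of each statement in exchange for fast lookups on any position, a small-scale version of the storage trade-offs that matter at enterprise scale.</p>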
<h3>A Partial Listing of Semantic Web Tools</h3>
<p>In all of these areas, there is a growing, but still spotty, set of tools for conducting these semantic tasks. SemWebCentral, the open source tools resource center, for example, lists many tools and whether they interact or not with one another (the general answer is often No).<a href="#_xedn16">[16]</a> Protégé also has a fairly long list of plugins, but unfortunately it is not well organized. <a href="#_xedn17">[17]</a></p>
<p>In the table below, I begin to compile a partial listing of semantic Web tools, with more than 50 listed. Though a few are commercial, most are open source. Also, for the open source tools, only the most prominent ones are listed (<a href="http://www.sourceforge.net/">SourceForge</a>, for example, has about 200 projects listed with some relation to the semantic Web, though most are minor or not yet in alpha release).</p>
<table class="center_ok" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr style="border-style: none; width: 40%; background-image: none;">
<td style="background-color: #cccccc; width: 12%;">
<p align="center"><strong>NAME</strong></p>
</td>
<td style="background-color: #cccccc; width: 37%;">
<p align="center"><strong>URL</strong></p>
</td>
<td style="background-color: #cccccc; width: 50%;">
<p align="center"><strong>DESCRIPTION</strong></p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Almo</td>
<td style="width: 37%;" valign="top">http://ontoware.org/projects/almo</td>
<td style="width: 50%;" valign="bottom">An ontology-based workflow  engine in Java</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Altova SemanticWorks</td>
<td style="width: 37%;" valign="top">http://www.altova.com/products_semanticworks.html</td>
<td style="width: 50%;" valign="top">Visual RDF and OWL editor that  auto-generates RDF/XML or nTriples based on visual ontology design</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Bibster</td>
<td style="width: 37%;" valign="top"><a href="http://bibster.semanticweb.org/">http://bibster.semanticweb.org/</a></td>
<td style="width: 50%;" valign="top">A semantics-based bibliographic  peer-to-peer system</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">cwm</td>
<td style="width: 37%;" valign="top">http://www.w3.org/2000/10/swap/doc/cwm.html</td>
<td style="width: 50%;" valign="top">A general purpose data processor  for the semantic Web</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Deep Query Manager</td>
<td style="width: 37%;" valign="top">http://www.brightplanet.com/products/dqm_overview.asp</td>
<td style="width: 50%;" valign="top">Search federator from deep Web  sources</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">DOSE</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/dose</td>
<td style="width: 50%;" valign="top">A distributed platform for semantic  annotation</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">ekoss.org</td>
<td style="width: 37%;" valign="top">http://www.ekoss.org/</td>
<td style="width: 50%;" valign="top">A collaborative knowledge sharing  environment where model developers can submit advertisements</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Endeca</td>
<td style="width: 37%;" valign="top"><span style="text-decoration: underline;"><a href="http://www.endeca.com/">http://www.endeca.com</a></span></td>
<td style="width: 50%;" valign="top">Facet-based content organizer and  search platform</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">FOAM</td>
<td style="width: 37%;" valign="top"><span style="text-decoration: underline;">http://ontoware.org/projects/map</span></td>
<td style="width: 50%;" valign="bottom">Framework for ontology alignment  and mapping</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Gnowsis</td>
<td style="width: 37%;" valign="top">http://www.gnowsis.org/</td>
<td style="width: 50%;" valign="top">A semantic desktop environment</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">GrOWL</td>
<td style="width: 37%;" valign="top">http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html</td>
<td style="width: 50%;" valign="top">Open source graphical ontology  browser and editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">HAWK</td>
<td style="width: 37%;" valign="top">http://swat.cse.lehigh.edu/projects/index.html#hawk</td>
<td style="width: 50%;" valign="top">OWL repository framework and  toolkit</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">HELENOS</td>
<td style="width: 37%;" valign="top">http://ontoware.org/projects/artemis</td>
<td style="width: 50%;" valign="bottom">A knowledge discovery workbench for the semantic Web</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Jambalaya</td>
<td style="width: 37%;" valign="top"><span style="text-decoration: underline;"><a href="http://www.thechiselgroup.org/jambalaya">http://www.thechiselgroup.org/jambalaya</a></span></td>
<td style="width: 50%;" valign="top">Protégé plug-in for visualizing  ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Jastor</td>
<td style="width: 37%;" valign="top"><a href="http://jastor.sourceforge.net/">http://jastor.sourceforge.net/</a></td>
<td style="width: 50%;" valign="bottom">Open source Java code generator  that emits Java Beans from ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Jena</td>
<td style="width: 37%;" valign="top">http://jena.sourceforge.net/</td>
<td style="width: 50%;" valign="top">Open source ontology API written in Java</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">KAON</td>
<td style="width: 37%;" valign="top">http://kaon.semanticweb.org/</td>
<td style="width: 50%;" valign="top">Open source ontology management  infrastructure</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Kazuki</td>
<td style="width: 37%;" valign="top"><a href="http://projects.semwebcentral.org/projects/kazuki/">http://projects.semwebcentral.org/projects/kazuki/</a></td>
<td style="width: 50%;" valign="bottom">Generates a Java API for working with OWL instance data directly from a set of OWL ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Kowari</td>
<td style="width: 37%;" valign="top">http://www.kowari.org/</td>
<td style="width: 50%;" valign="bottom">Open source database for RDF and  OWL</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">LuMriX</td>
<td style="width: 37%;" valign="top">http://www.lumrix.net/xmlsearch.php</td>
<td style="width: 50%;" valign="top">A commercial search engine using  semantic Web technologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">MetaMatrix</td>
<td style="width: 37%;" valign="top">http://www.metamatrix.com/</td>
<td style="width: 50%;" valign="top">Semantic vocabulary mediation and  other tools</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Metatomix</td>
<td style="width: 37%;" valign="top">http://www.metatomix.com/</td>
<td style="width: 50%;" valign="top">Commercial semantic toolkits and  editors</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">MindRaider</td>
<td style="width: 37%;" valign="top">http://mindraider.sourceforge.net/index.html</td>
<td style="width: 50%;" valign="top">Open source semantic Web outline  editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Model Futures OWL Editor</td>
<td style="width: 37%;" valign="top">http://www.modelfutures.com/OwlEditor.html</td>
<td style="width: 50%;" valign="top">Simple OWL tools, featuring UML  (XMI), ErWin, thesaurus and imports</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Net OWL</td>
<td style="width: 37%;" valign="top">http://www.netowl.com/</td>
<td style="width: 50%;" valign="top">Entity extraction engine from SRA  International</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Nokia Semantic Web Server</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/sws-uriqa</td>
<td style="width: 50%;" valign="top">An RDF based knowledge portal for  publishing both authoritative and third party descriptions of URI  denoted resources</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">OntoEdit/OntoStudio</td>
<td style="width: 37%;" valign="top"><a href="http://ontoedit.com/">http://ontoedit.com/</a></td>
<td style="width: 50%;" valign="top">Engineering environment for  ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">OntoMat Annotizer</td>
<td style="width: 37%;" valign="top"><a href="http://annotation.semanticweb.org/ontomat">http://annotation.semanticweb.org/ontomat</a></td>
<td style="width: 50%;" valign="top">Interactive Web page OWL and  semantic annotator tool</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Oyster</td>
<td style="width: 37%;" valign="top">http://ontoware.org/projects/oyster</td>
<td style="width: 50%;" valign="bottom">Peer-to-peer system for storing  and sharing ontology metadata</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Piggy Bank</td>
<td style="width: 37%;" valign="top">http://simile.mit.edu/piggy-bank/</td>
<td style="width: 50%;" valign="top">A Firefox-based semantic Web  browser</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Pike</td>
<td style="width: 37%;" valign="top">http://pike.ida.liu.se/</td>
<td style="width: 50%;" valign="top">A dynamic programming (scripting)  language similar to Java and C for the semantic Web</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">pOWL</td>
<td style="width: 37%;" valign="top">http://powl.sourceforge.net/index.php</td>
<td style="width: 50%;" valign="top">Semantic Web development platform</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Protégé</td>
<td style="width: 37%;" valign="top">http://protege.stanford.edu/</td>
<td style="width: 50%;" valign="top">Open source visual ontology editor  written in Java with many plug-in tools</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">RACER Project</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/racerproject</td>
<td style="width: 50%;" valign="bottom">A collection of projects and tools to be used with the semantic reasoning engine RacerPro</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">RDFReactor</td>
<td style="width: 37%;" valign="top">http://rdfreactor.ontoware.org/</td>
<td style="width: 50%;" valign="bottom">Access RDF from Java using  inferencing</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Redland</td>
<td style="width: 37%;" valign="top">http://librdf.org/</td>
<td style="width: 50%;" valign="top">Open source software libraries  supporting RDF</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">RelationalOWL</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/relational-owl</td>
<td style="width: 50%;" valign="top">Automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OWL</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Semantical</td>
<td style="width: 37%;" valign="top">http://semantical.org/</td>
<td style="width: 50%;" valign="top">Open source semantic Web search  engine</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SemanticWorks</td>
<td style="width: 37%;" valign="top"><a href="http://www.altova.com/products_semanticworks.html">http://www.altova.com/products_semanticworks.html</a></td>
<td style="width: 50%;" valign="top">SemanticWorks RDF/OWL Editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Semantic Mediawiki</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/semediawiki</td>
<td style="width: 50%;" valign="top">Semantic extension to the MediaWiki wiki</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Semantic Net Generator</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/semantag</td>
<td style="width: 50%;" valign="top">Utility for generating topic maps  automatically</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Sesame</td>
<td style="width: 37%;" valign="top">http://www.openrdf.org/</td>
<td style="width: 50%;" valign="top">An open source RDF database with  support for RDF Schema inferencing and querying</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SMART</td>
<td style="width: 37%;" valign="top">http://web.ict.nsc.ru/smart/index.phtml?lang=en</td>
<td style="width: 50%;" valign="top">System for Managing Applications  based on RDF Technology</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SMORE</td>
<td style="width: 37%;" valign="top"><a href="http://www.mindswap.org/2005/SMORE/">http://www.mindswap.org/2005/SMORE/</a></td>
<td style="width: 50%;" valign="top">OWL markup for HTML pages</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SPARQL</td>
<td style="width: 37%;" valign="top">http://www.w3.org/TR/rdf-sparql-query/</td>
<td style="width: 50%;" valign="top">Query language for RDF</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SWCLOS</td>
<td style="width: 37%;" valign="top"><a href="http://iswc2004.semanticweb.org/demos/32/">http://iswc2004.semanticweb.org/demos/32/</a></td>
<td style="width: 50%;" valign="top">A semantic Web processor using Lisp</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Swoogle</td>
<td style="width: 37%;" valign="top"><a href="http://swoogle.umbc.edu/">http://swoogle.umbc.edu/</a></td>
<td style="width: 50%;" valign="top">A semantic Web search engine with  1.5 M resources</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SWOOP</td>
<td style="width: 37%;" valign="top"><a href="http://www.mindswap.org/2004/SWOOP/">http://www.mindswap.org/2004/SWOOP/</a></td>
<td style="width: 50%;" valign="top">A lightweight ontology editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Turtle</td>
<td style="width: 37%;" valign="top">http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/</td>
<td style="width: 50%;" valign="top">Terse RDF &#8220;Triple&#8221; language</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">WSMO Studio</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/wsmostudio</td>
<td style="width: 50%;" valign="top">A semantic Web service editor  compliant with WSMO as a set of Eclipse plug-ins</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">WSMT Toolkit</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/wsmt</td>
<td style="width: 50%;" valign="top">The Web Service Modeling Toolkit  (WSMT) is a collection of tools for use with the Web Service Modeling  Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web  Service Execution Environment (WSMX)</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">WSMX</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/wsmx/</td>
<td style="width: 50%;" valign="top">Execution environment for dynamic  use of semantic Web services</td>
</tr>
</tbody>
</table>
<h3>Tools Still Crude, Integration Not Compelling</h3>
<p>Individually, there are some impressive and capable tools on this list. Generally, however, the interfaces are not intuitive, integration between tools is lacking, and a clear case for why and how standard analysts should embrace them is missing. In the semantic Web, we have yet to see an application of the magnitude of the first Mosaic browser that made HTML and the World Wide Web compelling.</p>
<p>It may well be that a similar &#8220;killer app&#8221; is not forthcoming for the semantic Web. But it is important to remember just how entwined tools are with accelerating the acceptance and growth of new standards and protocols.</p>
<div class="boxBrownDotted" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag   Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday   Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday   brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about four years ago on <a href="http://www.mkbergman.com/241/methods-for-semantic-discovery-annotation-and-mediation/">June 12, 2006</a>. It was the follow-on to <a href="http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/">last week&#8217;s Brown Bag Lunch posting</a>. It is also the first attempt I made at assembling semantic Web and related tools, which has now grown into the 800+ <span style="color: #993300;"><strong><a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/">Sweet Tools</a></strong></span> listing. No changes have been made to the original posting.</div>
<hr size="1" />
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn1">[1]</a> Paul  Warren, &#8220;<a href="http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&amp;pName=dso_level1&amp;path=dsonline/2006/02&amp;file=x1war.xml&amp;xsl=article.xsl&amp;">Knowledge  Management and the Semantic Web: From Scenario to Technology</a>,&#8221; <em>IEEE  Intelligent Systems</em>, vol. 21, no. 1, 2006, pp. 53-59. See <a href="http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&amp;pName=dso_level1&amp;path=dsonline/2006/02&amp;file=x1war.xml&amp;xsl=article.xsl&amp;">http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&amp;pName=dso_level1&amp;path=dsonline/2006/02&amp;file=x1war.xml&amp;xsl=article.xsl&amp;</a></div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn2">[2]</a> See <a href="http://www.semwebcentral.org/index.jsp?page=workflows">http://www.semwebcentral.org/index.jsp?page=workflows</a>. [Link now missing.]</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn3">[3]</a> Said Izza, Lucien  Vincent and Patrick Burlat, &#8220;A Unified Framework for Enterprise  Integration: An Ontology-Driven Service-Oriented Approach,&#8221; pp. 78-89,  in <em>Pre-proceedings of the First International Conference on  Interoperability of Enterprise Software and Applications  (INTEROP-ESA&#8217;2005)</em>, Geneva, Switzerland, February 23 &#8211; 25, 2005, 618  pp. See <a href="http://interop-esa05.unige.ch/INTEROP/Proceedings/Interop-ESAScientific/OneFile/InteropESAproceedings.pdf">http://interop-esa05.unige.ch/INTEROP/Proceedings/Interop-ESAScientific/OneFile/InteropESAproceedings.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn4">[4]</a> Jorge Cardoso and Amit  Sheth, &#8220;Semantic Web Processes: Semantics Enabled Annotation, Discovery,  Composition and Orchestration of Web Scale Processes,&#8221; in the<em> 4th  International Conference on Web Information Systems Engineering (WISE  2003)</em>, December 10-12, 2003, Rome, Italy. See <a href="http://lsdis.cs.uga.edu/lib/presentations/WISE2003-Tutorial.pdf">http://lsdis.cs.uga.edu/lib/presentations/WISE2003-Tutorial.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn5">[5]</a> C. Batini, M.  Lenzerini, and S.B. Navathe, &#8220;A Comparative Analysis of Methodologies  for Database Schema Integration,&#8221; in <em>ACM Computing Surveys</em>,  18(4):323-364, 1986.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn6">[6]</a> Alon Halevy, &#8220;Why Your  Data Won&#8217;t Mix,&#8221; <em>ACM Queue</em> vol. 3, no. 8, October 2005. See <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336">http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn7">[7]</a> Chuck Mosher, &#8220;Semantic  Interoperability: Automatically Resolving Vocabularies,&#8221; presented at the  <em>4th Semantic Interoperability Conference</em>, February 10, 2006. See  <a href="http://colab.cim3.net/file/work/SICoP/2006-02-09/Presentations/CMosher02102006.ppt">http://colab.cim3.net/file/work/SICoP/2006-02-09/Presentations/CMosher02102006.ppt</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn8">[8]</a> Alon Y. Halevy, Zachary  G. Ives, Peter Mork and Igor Tatarinov, &#8220;Piazza: Data Management  Infrastructure for Semantic Web Applications,&#8221; <em>Journal of Web  Semantics,</em> Vol. 1 No. 2, February 2004, pp. 155-175. See <a href="http://www.cis.upenn.edu/~zives/research/piazza-www03.pdf">http://www.cis.upenn.edu/~zives/research/piazza-www03.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn9">[9]</a> Stefano Mazzocchi,  Stephen Garland, Ryan Lee, &#8220;SIMILE: Practical Metadata for the Semantic  Web,&#8221; January 26, 2005. See <a href="http://www.xml.com/pub/a/2005/01/26/simile.html">http://www.xml.com/pub/a/2005/01/26/simile.html</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn10">[10]</a> Adrian Mocan, Ed.,  &#8220;WSMX Data Mediation,&#8221; in <em>WSMX Working Draft, W3C Organization</em>,  11 October 2005. See <a href="http://www.wsmo.org/TR/d13/d13.3/v0.2/20051011">http://www.wsmo.org/TR/d13/d13.3/v0.2/20051011</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn11">[11]</a> J. Madhavan, P. A.  Bernstein, P. Domingos and A. Y. Halevy, &#8220;Representing and Reasoning  About Mappings Between Domain Models,&#8221; in the <em>Eighteenth National  Conference on Artificial Intelligence</em>, pp. 80-86, Edmonton, Alberta,  Canada, July 28-August 1, 2002.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn12">[12]</a> AnHai Doan, Learning  to Map between Structured Representations of Data, Ph.D. Thesis to the  Computer Science &amp; Engineering Department, University of Washington,  2002, 133 pp. See <a href="http://anhai.cs.uiuc.edu/home/thesis/anhai-thesis.pdf">http://anhai.cs.uiuc.edu/home/thesis/anhai-thesis.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn13">[13]</a> Michael Stonebraker  and Joey Hellerstein, &#8220;What Goes Around Comes Around,&#8221; in Joseph M.  Hellerstein and Michael Stonebraker, editors, <em>Readings in Database  Systems, Fourth Edition</em>, pp. 2-41, The MIT Press, Cambridge, MA,  2005. See <a href="http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf">http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn14">[14]</a> John Miles Smith and  Diane C. P. Smith, &#8220;Database Abstractions: Aggregation and  Generalization,&#8221; <em>ACM Transactions on Database Systems</em> 2(2):  105-133, 1977.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn15">[15]</a> Michael Hammer and  Dennis McLeod, &#8220;Database Description with SDM: A Semantic Database  Model,&#8221; <em>ACM Transactions on Database Systems</em> 6(3): 351-386, 1981.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn16">[16]</a> See <a href="http://www.semwebcentral.org/index.jsp?page=home">http://www.semwebcentral.org/index.jsp?page=home</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn17">[17]</a> See <a href="http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType">http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch:  Sources and Classification of Semantic Heterogeneities</title>
		<link>http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/</link>
		<comments>http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 15:22:36 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[semantic heterogeneities]]></category>
		<category><![CDATA[semantic mediation]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=874</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch:  Sources and Classification of Semantic Heterogeneities&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Enterprise&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/&amp;rft.language=English"></span>

Semantic mediation  &#8212; that is, resolving semantic heterogeneities &#8212; must address more than  40 discrete categories of potential mismatches from units of measure,  terminology, language, and many others.  These sources may derive from  structure, domain, data or language.
Earlier postings in this recent series traced the progress in climbing the data [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch:  Sources and Classification of Semantic Heterogeneities&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Enterprise&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday  Brown Bag Lunch" width="158" height="179" /></p>
<div class="boxGraySolid" style="margin-left: 190px;"><em>Semantic mediation  &#8212; that is, resolving semantic heterogeneities &#8212; must address more than  40 discrete categories of potential mismatches from units of measure,  terminology, language, and many others.  These sources may derive from  structure, domain, data or language.</em></div>
<p>Earlier postings in this recent series traced the progress in <a href="http://www.mkbergman.com/?p=229">climbing the data federation  pyramid</a> to the <a href="http://www.mkbergman.com/?p=231">current emphasis on the  semantic Web</a>. In part, this series aims to dispel the  notion that data extensibility can arise simply from using the <a href="http://en.wikipedia.org/wiki/Xml">XML</a> (eXtensible Markup  Language) <em>data representation</em> protocol. As Stonebraker and  Hellerstein correctly observe:</p>
<blockquote><p><em>XML is sometimes marketed as the solution to the  semantic heterogeneity problem . . . .  Nothing could be further from  the truth. Just because two people tag a data element as a salary does  not mean that the two data elements are comparable. One could be salary  after taxes in French francs including a lunch allowance, while the  other could be salary before taxes in US dollars. Furthermore, if you  call them &#8220;rubber gloves&#8221;  and I call them &#8220;latex hand protectors&#8221;, then  XML will be useless in deciding that they are the same concept. Hence,  the role of XML will be limited to providing the vocabulary in which  common schemas can be constructed.</em><a href="#_edn1">[1]</a></p></blockquote>
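<p>To make the salary example in the quotation concrete, here is a minimal Python sketch that normalizes two salary records before comparing them. The <code>Salary</code> fields, the exchange rate and the tax rate are hypothetical values chosen only for illustration; none of them comes from the quoted source. The point is simply that a meaningful comparison depends on metadata that a shared tag name does not carry.</p>
<pre><code class="language-python">from dataclasses import dataclass

# Hypothetical conversion and tax rates, for illustration only.
FRF_PER_USD = 6.0
ASSUMED_TAX_RATE = 0.25


@dataclass
class Salary:
    amount: float
    currency: str     # "USD" or "FRF"
    after_tax: bool   # True if taxes are already deducted from amount
    allowance: float  # extras (such as a lunch allowance) included in amount


def to_gross_usd(s):
    """Normalize a salary to before-tax US dollars, excluding allowances."""
    amount = s.amount - s.allowance
    if s.after_tax:
        # Undo the (assumed flat) tax deduction.
        amount = amount / (1.0 - ASSUMED_TAX_RATE)
    if s.currency == "FRF":
        amount = amount / FRF_PER_USD
    elif s.currency != "USD":
        raise ValueError("unknown currency: " + s.currency)
    return round(amount, 2)


# Both records would be tagged "salary", yet the raw numbers are not comparable.
a = Salary(amount=234000.0, currency="FRF", after_tax=True, allowance=9000.0)
b = Salary(amount=50000.0, currency="USD", after_tax=False, allowance=0.0)
print(a.amount == b.amount)                # raw values differ
print(to_gross_usd(a) == to_gross_usd(b))  # normalized values agree
</code></pre>
<p>Under these assumed rates the two records describe the same gross salary, even though their raw amounts differ by a wide margin. Nothing in a bare <code>salary</code> tag would tell a machine which conversions to apply.</p>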
<p>This series also covers the ontologies and the OWL language (written  in XML) that now give us the means to understand and process these  different domains and &#8220;world views&#8221; by machine. According to Natalya  Noy, one of the principal researchers behind the <a href="http://protege.stanford.edu/plugins/owl/">Protégé</a> development environment for ontologies and knowledge-based systems:</p>
<blockquote><p><em>How are ontologies and the Semantic Web different from  other forms of structured and semi-structured data, from database  schemas to XML? Perhaps one of the main differences lies in their  explicit formalization. If we make more of our assumptions explicit and  able to be processed by machines, automatically or semi-automatically  integrating the data will be easier. Here is another way to look at  this: ontology languages have formal semantics, which makes building  software agents that process them much easier, in the sense that their  behavior is much more predictable (assuming they follow the specified  explicit semantics&#8211;but at least there is something to follow).</em> <a href="#_edn2">[2]</a></p></blockquote>
<p>Again, however, simply because OWL (or similar) languages now give us  the means to represent an ontology, we still have the vexing challenge  of how to resolve the differences between different &#8220;world views,&#8221; even  within the same domain. According to Alon Halevy:</p>
<blockquote><p><em>When independent parties develop database schemas for  the same domain, they will almost always be quite different from each  other. These differences are referred to as semantic heterogeneity,  which also appears in the presence of multiple XML documents, Web  services, and ontologies&#8211;or more broadly, whenever there is more than  one way to structure a body of data. The presence of semi-structured  data exacerbates semantic heterogeneity, because semi-structured schemas  are much more flexible to start with. For multiple data systems to  cooperate with each other, they must understand each other&#8217;s schemas.  Without such understanding, the multitude of data sources amounts to a  digital version of the Tower of Babel.</em> <a href="#_edn3">[3]</a></p></blockquote>
<p>In the sections below, I describe how this heterogeneity arises and classify its many different types. I then describe some broad approaches to overcoming these heterogeneities, though a <a href="http://www.mkbergman.com/?p=241">subsequent post looks at that  topic in more detail</a>.</p>
<h3>Causes and Sources of Semantic Heterogeneity</h3>
<p>There are many potential circumstances where semantic heterogeneity  may arise (partially from Halevy <a href="#_edn3">[3]</a>):</p>
<ul>
<li>Enterprise information integration</li>
<li>Querying and indexing the deep Web (which is a classic data  federation problem in that there are literally tens to hundreds of  thousands of separate Web databases) <a href="#_edn4">[4]</a></li>
<li>Merchant catalog mapping</li>
<li>Schema <em>v.</em> data heterogeneity</li>
<li>Schema heterogeneity and semi-structured data.</li>
</ul>
<p>Naturally, there will always be differences in how differing authors  or sponsors create their own particular &#8220;world view,&#8221; which, if  transmitted in XML or expressed through an ontology language such as OWL,  may also result in differences based on expression or syntax. Indeed,  the ease of conveying these schemas as semi-structured XML, RDF or OWL  is in and of itself a source of potential expression heterogeneities.  There are also other sources in simple schema use and versioning that  can create mismatches <a href="#_edn3">[3]</a>. Thus, semantic mismatches  can be driven by world view, perspective, syntax, structure, versioning  and timing:</p>
<ul>
<li>One schema may express a similar &#8220;world view&#8221; with different syntax,  grammar or structure</li>
<li>One schema may be a new version of the other</li>
<li>Two or more schemas may be evolutions of the same original schema</li>
<li>There may be many sources modeling the same aspects of the  underlying domain (&#8220;horizontal resolution&#8221; such as for competing trade  associations or standards bodies), or</li>
<li>There may be many sources that cover different domains but overlap  at the seams (&#8220;vertical resolution&#8221; such as between pharmaceuticals and  basic medicine).</li>
</ul>
<p>Regardless, the needs for semantic mediation are manifest, as are the  ways in which semantic heterogeneities may arise.</p>
<h3>Classification of Semantic Heterogeneities</h3>
<p>The earliest classification scheme applied to data semantics that I  am aware of is from William Kent nearly 20 years ago.<a href="#_edn5">[5]</a> (If you  know of earlier ones, please send me a note.) Kent&#8217;s approach dealt more  with structural mapping issues (see below) than with differences in meaning,  which he suggested data dictionaries could potentially resolve.</p>
<p>The most comprehensive scheme I have yet encountered is from  Pluempitiwiriyawej and Hammer, &#8220;A Classification Scheme for Semantic and  Schematic Heterogeneities in XML Data Sources.&#8221; <a href="#_edn6">[6]</a> They  classify heterogeneities into three broad classes:</p>
<blockquote>
<ul>
<li><em>Structural </em>conflicts arise when the schema of the sources  representing related or overlapping data exhibit discrepancies.  Structural conflicts can be detected when comparing the underlying DTDs.  The class of structural conflicts includes generalization conflicts,  aggregation conflicts, internal path discrepancy, missing items, element  ordering, constraint and type mismatch, and naming conflicts between  the element types and attribute names.</li>
<li><em>Domain </em>conflicts arise when the semantic of the data sources  that will be integrated exhibit discrepancies. Domain conflicts can be  detected by looking at the information contained in the DTDs and using  knowledge about the underlying data domains. The class of domain  conflicts includes schematic discrepancy, scale or unit, precision, and  data representation conflicts.</li>
<li><em>Data </em>conflicts refer to discrepancies among similar or  related data values across multiple sources. Data conflicts can only be  detected by comparing the underlying DOCs. The class of data conflicts  includes ID-value, missing data, incorrect spelling, and naming  conflicts between the element contents and the attribute values.</li>
</ul>
</blockquote>
<p>Moreover, mismatches or conflicts can occur between set elements (a  &#8220;population&#8221; mismatch) or attributes (a &#8220;description&#8221; mismatch).</p>
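<p>As a rough illustration of the structural class, the following Python sketch matches element names across two hypothetical schemas. It handles case sensitivity, synonyms, a generalization / specialization conflict and a missing item. The synonym table, the generalization map and both schemas are invented for this example and are not drawn from Pluempitiwiriyawej and Hammer; a real mediator would need far richer evidence than name matching.</p>
<pre><code class="language-python"># Hypothetical synonym table and generalization map, for illustration only.
SYNONYMS = {"surname": "last_name", "family_name": "last_name"}
GENERALIZES = {"phone": ["home_phone", "work_phone", "cell_phone"]}


def canonical(name):
    """Resolve case-sensitivity and synonym conflicts for one element name."""
    key = name.strip().lower().replace(" ", "_")
    return SYNONYMS.get(key, key)


def match_elements(schema_a, schema_b):
    """Map each element name in schema_a to its counterpart(s) in schema_b.

    Returns a dict from each name in schema_a to a list of names in
    schema_b; an empty list marks a missing item.
    """
    by_canonical = {}
    for name in schema_b:
        by_canonical.setdefault(canonical(name), []).append(name)

    mapping = {}
    for name in schema_a:
        key = canonical(name)
        targets = list(by_canonical.get(key, []))
        # Generalization / specialization: one general element in schema_a
        # may correspond to several specialized elements in schema_b.
        for special in GENERALIZES.get(key, []):
            targets.extend(by_canonical.get(special, []))
        mapping[name] = targets
    return mapping


schema_a = ["Surname", "Phone", "Email"]
schema_b = ["last_name", "Home Phone", "Work Phone", "Cell Phone"]
result = match_elements(schema_a, schema_b)
for source_name, target_names in result.items():
    print(source_name, target_names)
</code></pre>
<p>Here <code>Surname</code> resolves to a single counterpart, <code>Phone</code> fans out to three specialized elements, and <code>Email</code> has no counterpart at all. Each outcome corresponds to a different line in the structural class.</p>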
<p>The table below builds on Pluempitiwiriyawej and Hammer&#8217;s scheme by  adding a fourth major explicit category, language, leading to about  40 distinct potential sources of semantic heterogeneities:</p>
<div>
<table class="center_ok" style="width: 620px;" border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 127px;" valign="top">
<p align="center"><strong>Class</strong></p>
</td>
<td style="background-color: #cccccc; width: 142px;" valign="top">
<p align="center"><strong>Category</strong></p>
</td>
<td style="background-color: #cccccc; width: 350px;" valign="top">
<p align="center"><strong>Subcategory</strong></p>
</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="15"><strong>STRUCTURAL</strong></td>
<td style="width: 142px;" rowspan="4" align="left">Naming</td>
<td style="width: 350px;">Case Sensitivity</td>
</tr>
<tr>
<td style="width: 350px;">Synonyms</td>
</tr>
<tr>
<td style="width: 350px;">Acronyms</td>
</tr>
<tr>
<td style="width: 350px;">Homonyms</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Generalization / Specialization</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="2">Aggregation</td>
<td style="width: 350px;">Intra-aggregation</td>
</tr>
<tr>
<td style="width: 350px;">Inter-aggregation</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Internal Path Discrepancy</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="4">Missing Item</td>
<td style="width: 350px;">Content Discrepancy</td>
</tr>
<tr>
<td style="width: 350px;">Attribute List Discrepancy</td>
</tr>
<tr>
<td style="width: 350px;">Missing Attribute</td>
</tr>
<tr>
<td style="width: 350px;">Missing Content</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Element Ordering</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Constraint Mismatch</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Type Mismatch</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="8"><strong>DOMAIN</strong></td>
<td style="width: 142px;" rowspan="4">Schematic Discrepancy</td>
<td style="width: 350px;">Element-value to Element-label Mapping</td>
</tr>
<tr>
<td style="width: 350px;">Attribute-value to Element-label Mapping</td>
</tr>
<tr>
<td style="width: 350px;">Element-value to Attribute-label Mapping</td>
</tr>
<tr>
<td style="width: 350px;">Attribute-value to Attribute-label Mapping</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Scale or Units</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Precision</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="2">Data Representation</td>
<td style="width: 350px;">Primitive Data Type</td>
</tr>
<tr>
<td style="width: 350px;">Data Format</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="7"><strong>DATA</strong></td>
<td style="width: 142px;" rowspan="4">Naming</td>
<td style="width: 350px;">Case Sensitivity</td>
</tr>
<tr>
<td style="width: 350px;">Synonyms</td>
</tr>
<tr>
<td style="width: 350px;">Acronyms</td>
</tr>
<tr>
<td style="width: 350px;">Homonyms</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">ID Mismatch or Missing ID</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Missing Data</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Incorrect Spelling</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="8"><strong>LANGUAGE</strong></td>
<td style="width: 142px;" rowspan="4">Encoding</td>
<td style="width: 350px;">Ingest Encoding Mismatch</td>
</tr>
<tr>
<td style="width: 350px;">Ingest Encoding Lacking</td>
</tr>
<tr>
<td style="width: 350px;">Query Encoding Mismatch</td>
</tr>
<tr>
<td style="width: 350px;">Query Encoding Lacking</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="4">Languages</td>
<td style="width: 350px;">Script Mismatches</td>
</tr>
<tr>
<td style="width: 350px;">Parsing / Morphological Analysis Errors (many)</td>
</tr>
<tr>
<td style="width: 350px;">Syntactical Errors (many)</td>
</tr>
<tr>
<td style="width: 350px;">Semantic Errors (many)</td>
</tr>
</tbody>
</table>
</div>
<p>Most of these line items are self-explanatory, but a few may not be:</p>
<ul>
<li><em>Homonyms</em> occur when the same name refers to more than one  concept, such as Name referring to a person <em>v.</em> Name referring to a book</li>
<li>A <em>generalization/specialization</em> mismatch can occur when  single items in one schema are related to multiple items in another  schema, or vice versa. For example, one schema may refer to &#8220;phone&#8221; but  the other schema has multiple elements such as &#8220;home phone,&#8221; &#8220;work  phone&#8221; and &#8220;cell phone&#8221;</li>
<li><em>Intra-aggregation</em> mismatches come when the same population is  divided differently by schema (Census <em>v.</em> Federal regions for states, or  full person names <em>v.</em> first-middle-last, for example),  whereas <em>inter-aggregation</em> mismatches can come from sums or counts  as added values</li>
<li>Internal path discrepancies can arise from different source-target  retrieval paths in two different schemas (for example, hierarchical  structures where the elements are at different levels of remove)</li>
<li>The four sub-types of <em>schematic discrepancy</em> refer to cases where  attribute and element names may be interchanged between schemas</li>
<li>Under languages, <em>encoding</em> mismatches can occur when either  the import or export of data to XML assumes the wrong encoding type.  While XML is based on Unicode, it is important that source retrievals  and issued queries be in the proper encoding of the source. For Web  retrievals this is very important, because only about 4% of all  documents are in Unicode, and <a title="Tutorial:  Internet Languages  and Encodings" href="http://www.mkbergman.com/?p=195">earlier estimates provided by BrightPlanet  suggest there may be on the order of 25,000 language-encoding pairs  presently on the Internet</a></li>
<li>Even should the correct encoding be detected, there are significant  differences in different language sources in <em>parsing</em> (white  space, for example), <em>syntax</em> and <em>semantics </em>that can also  lead to many error types.</li>
</ul>
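<p>The encoding mismatch described above is easy to reproduce. The Python sketch below shows one mismatch that passes silently and one that raises an error, followed by a simple fallback decoder. The <code>decode_with_fallback</code> helper and its candidate list are assumptions made for this example, not a recommended detection method; real encoding detection is considerably harder.</p>
<pre><code class="language-python">def decode_with_fallback(raw, declared, fallbacks=("utf-8", "latin-1")):
    """Try the declared encoding first, then each fallback in order.

    Returns a (text, encoding_used) tuple.
    """
    for encoding in (declared,) + tuple(fallbacks):
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("no candidate encoding could decode the input")


text = "Prot\u00e9g\u00e9"

# Case 1: UTF-8 bytes read as Latin-1. No error is raised, but every
# accented character silently becomes two wrong characters.
garbled = text.encode("utf-8").decode("latin-1")
print(garbled == text)

# Case 2: Latin-1 bytes read as UTF-8. This fails loudly, so a fallback
# decoder can recover by trying another candidate.
legacy = text.encode("latin-1")
recovered, used = decode_with_fallback(legacy, declared="utf-8")
print(recovered == text, used)
</code></pre>
<p>The first case is the more dangerous of the two: because Latin-1 accepts any byte sequence, a wrong assumption yields corrupted text rather than an error, and the corruption then flows into every later parsing and matching step.</p>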
<p>Sheth et al. take a different approach to classifying semantics and  integration. <a href="#_edn7">[7]</a> They  split semantics into three forms: implicit, formal and powerful.  Implicit semantics are what is either largely present or can easily be  extracted; formal semantics, though relatively scarce, occur in the form  of ontologies or other description logics; and powerful (soft)  semantics are fuzzy and not limited to rigid set-based assignments.  Sheth et al.&#8217;s main point is that first-order logic (FOL) or description  logic alone is inadequate to properly capture the needed semantics.</p>
<p>From my viewpoint, Pluempitiwiriyawej and Hammer&#8217;s <a href="#_edn6">[6]</a> classification better lends itself to pragmatic tools and approaches,  though the Sheth et al. approach also helps indicate what can be  processed <em>in situ</em> from input data <em>v.</em> inferred or  probabilistic matches.</p>
<h3>Importance of Reference Standards</h3>
<p>An attractive and compelling vision  &#8212; perhaps even a likely one  &#8212;  is that standard reference ontologies become increasingly prevalent as  time moves on and semantic mediation is seen as more of a mainstream  problem. Certainly, a start on this has been seen with the use of the <a href="http://dublincore.org/">Dublin  Core</a> metadata initiative, and increasingly other associations,  organizations, and major buyers are busy developing &#8220;standardized&#8221; or  reference ontologies.<a href="#_edn8">[8]</a> Indeed, there are now more than 10,000  ontologies available on the Web.<a href="#_edn9">[9]</a> Insofar as these gain  acceptance, semantic mediation can become an effort mostly at the  periphery and not the core.</p>
<p>But such is not the case today. Standards have had only limited success,  and then only in targeted domains where incentives are strong. That acceptance and  benefit threshold has yet to be reached on the Web. Until such time, a  multiplicity of automated methods, semi-automated methods and gazetteers  will all be required to help resolve these potential heterogeneities.</p>
<div class="boxBrownDotted" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag  Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday  Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday  brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about four years ago on <a href="http://www.mkbergman.com/232/sources-and-classification-of-semantic-heterogeneities/">June 6, 2006</a>. No changes have been made to the original posting. Current approaches to dealing with these heterogeneities would be to use &#8220;bridging&#8221; ontologies that map the mismatches.</div>
<hr size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn1" name="_edn1">[1]</a> Michael Stonebraker and Joey Hellerstein, &#8220;What Goes Around Comes  Around,&#8221; in Joseph M. Hellerstein and Michael Stonebraker, editors, <em>Readings  in Database Systems, Fourth Edition</em>, pp. 2-41, The MIT Press,  Cambridge, MA, 2005. See <a href="http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf">http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn2" name="_edn2">[2]</a> Natalya Noy,  &#8220;Order from Chaos,&#8221; <em>ACM Queue</em> vol. 3, no. 8, October 2005. See <a href="http://www.acmqueue.com/modules.php?name=Content&amp;pa=showpage&amp;pid=341&amp;page=1">http://www.acmqueue.com/modules.php?name=Content&amp;pa=showpage&amp;pid=341&amp;page=1</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn3" name="_edn3">[3]</a> Alon  Halevy, &#8220;Why Your Data Won&#8217;t Mix,&#8221; <em>ACM Queue</em> vol. 3, no. 8,  October 2005. See <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336">http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn4" name="_edn4">[4]</a> Michael  K. Bergman, &#8220;The Deep Web: Surfacing Hidden Value,&#8221; <em>BrightPlanet  Corporation White Paper</em>, June 2000. The most recent version of the  study was published by the University of Michigan&#8217;s <em>Journal of  Electronic Publishing</em> in July 2001. See <a href="http://www.press.umich.edu/jep/07-01/bergman.html">http://www.press.umich.edu/jep/07-01/bergman.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn5" name="_edn5">[5]</a> William  Kent, &#8220;The Many Forms of a Single Fact&#8221;, <em>Proceedings of the IEEE  COMPCON</em>, Feb. 27-Mar. 3, 1989, San Francisco. Also HPL-SAL-88-8,  Hewlett-Packard Laboratories, Oct. 21, 1988. [13 pp]. See <a href="http://www.bkent.net/Doc/manyform.htm">http://www.bkent.net/Doc/manyform.htm</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn6" name="_edn6">[6]</a> Charnyote  Pluempitiwiriyawej and Joachim Hammer, &#8220;A Classification Scheme for  Semantic and Schematic Heterogeneities in XML Data Sources,&#8221; <em>Technical  Report TR00-004</em>, University of Florida, Gainesville, FL, 36 pp.,  September 2000. See <a href="ftp://ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf">ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn7" name="_edn7">[7]</a> Amit  Sheth, Cartic Ramakrishnan and Christopher Thomas, &#8220;Semantics for the  Semantic Web: The Implicit, the Formal and the Powerful,&#8221; in <em>Int&#8217;l  Journal on Semantic Web &amp; Information Systems</em>, 1(1), 1-18,  Jan-March 2005. See <a href="http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html">http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn8" name="_edn8">[8]</a> See,  among scores of possible examples, the NIEM (National Information  Exchange Model) agreed to between the US Departments of Justice and  Homeland Security; see <a href="http://www.niem.gov/">http://www.niem.gov/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn9" name="_edn9">[9]</a> <a href="http://www.mkbergman.com/?p=194">OWL Ontologies: When Machine  Readable is Not Good Enough</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Two Contrasting Styles for the Semantic Enterprise</title>
		<link>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/</link>
		<comments>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 15:36:49 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=866</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
Our Own Approach is Adaptive and Incremental
It is gratifying to see the emergence of the term semantic enterprise, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
<h2><img style="border: 0px solid; width: 225px; height: 225px; float: left; margin-right: 10px;" title="Two Faces in Circle, from http://energeticrelations.com/" src="../wp-content/themes/ai3/images/2010Posts/100214_two_faces_in_circle.jpg" alt="Two Faces in Circle, from http://energeticrelations.com/" />Our Own Approach is Adaptive and Incremental</h2>
<p>It is gratifying to see the emergence of the term <span style="font-style: italic;">semantic enterprise</span>, with much increased attention and commentary. But, similar to different styles and patterns in software programming, there is no single way (nor a best one, depending on circumstance) to approach becoming a semantic enterprise.</p>
<p>In this piece I contrast two styles. The more traditional and familiar one is comprehensive, complete and &#8220;engineered&#8221; in its approach. The second, emerging style is more adaptive and incremental. While <a href="http://structureddynamics.com/">Structured Dynamics</a> is a proponent and thought leader for the adaptive style, the use and applicability of either approach is really a function of objectives and circumstances. The choice of approach depends on use case, and should not be a dogmatic one.</p>
<p>Any time a contrast is posed, one should be on guard about         setting up a rhetorical strawman. There may perhaps be a bit of this         flavor in this article; if so, it is unintended. It is probably best to         realize that there is a gradient &#8212; or spectrum &#8212; of possible         approaches between these contrasting styles. The real message is to         understand these differences such that you can comfortably place your         own organization at the right points along this spectrum.</p>
<h3>A Spectrum of Advantages and Differences</h3>
<p>The general idea of semantics in the enterprise precedes the use of the term, having been somewhat captured before by the ideas of <a href="http://en.wikipedia.org/wiki/Enterprise_application_integration">enterprise application integration</a>, <a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration">enterprise information integration</a> and other concepts even related to <a href="http://en.wikipedia.org/wiki/Federated_database_system">data federation</a> and <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a> stretching back to the 1980s. However, as a specific label, we can look back to the first mentions in the late 1990s and more concerted attention beginning from about 2002 onward <a href="#styles1">[1]</a>. As another indicator, since 2005 the Semantic Technology Conference has given specific prominence to the enterprise <a href="#styles2">[2]</a>.</p>
<p>Throughout this period, the emphasis from academic papers, many vendors, and most pundits <a href="#styles3">[3]</a> has been on things like automated reasoning, machine-aided decision making, aspects of artificial intelligence, and so forth. The general tone is often framed as &#8220;revolution&#8221; or &#8220;massive changes&#8221; or something &#8220;entirely new.&#8221; If you are a consultant or software/implementation vendor &#8212; especially where VC money is backing the venture with hopes for big returns and home runs &#8212; it may make cynical sense to sell such large and costly change.</p>
<p>I believe there are circumstances where the <span style="font-style: italic;">Semantic Enterprise</span> writ this large may make sense and be financially justified. But, this kind of &#8220;big change&#8221; view has also seen relatively few visible (or successful) deployments. It has colored what it means to be a semantic enterprise. And, I believe, it has weakened market credibility by perhaps overpromising and underdelivering. The conventional view of what it is to be a semantic enterprise deserves to be balanced.</p>
<p>So, as we shift this understanding of the semantic enterprise toward one that is more nuanced, we can contrast the characteristics of the two opposing styles as follows:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style</td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style</td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>A focus on a more complete, comprehensive coverage of the                 semantics in the domain</li>
<li>More enterprise-wide, less partial or departmental</li>
<li>Greater emphasis on &#8220;<a href="http://en.wikipedia.org/wiki/Closed_world_assumption">closed                 world</a>&#8221; approaches <a href="#styles4">[4]</a>; more akin to relational database                 architecting and schema</li>
<li>Expansion is possible, but effort may be somewhat complex</li>
<li>A general implication is to replace or supplant existing                 information structures with semantic ones</li>
<li>Not necessarily based on semantic Web standards and                 languages <a href="#styles5">[5]</a> (<span style="font-style: italic;">e.g.</span>,                 may include <a href="http://en.wikipedia.org/wiki/Common_logic">Common Logic</a>,                 <a href="http://en.wikipedia.org/wiki/Frame_%28artificial_intelligence%29"> frame logics</a>, etc.)</li>
<li>Richer set of predicates (relations)</li>
<li>Though a distinction is maintained between                 schema and instances, their separation may not be consistently                 (physically) enforced</li>
<li>Often more complicated inferencing and logic tests</li>
<li>More complete enumeration and characterization of items</li>
<li>Much process around semantics agreement across groups</li>
<li>Fairly well-developed implementation tools, including for                 ontology engineering</li>
<li>Implementation times in months to years</li>
<li>Implementation costs akin to traditional large-scale IT                 projects</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>An emphasis on a simpler, incremental, &#8220;learn as you go&#8221;                 approach</li>
<li>Start with single departments or limited vertical apps</li>
<li>Embedded in the &#8220;<a href="http://en.wikipedia.org/wiki/Open_world_assumption">open                 world</a>&#8221; approach <a href="#styles4">[4]</a>, with incorporation of external                 information</li>
<li>Design and approach inherently allows incremental expansion                 and adaptation</li>
<li>A key premise is to build from and leverage existing                 information structures, vocabularies and assets</li>
<li>Fully based on semantic Web standards and languages <a href="#styles5">[5]</a>,                 often including linked data <a href="#styles6">[6]</a></li>
<li>Tends to start simply with hierarchical or related concepts                 (<span style="font-style: italic;">e.g.</span>, SKOS)</li>
<li>Conscious distinction in the structure for                 handling schema separate from instances <a href="#styles7">[7]</a></li>
<li>Inferencing logic based more on concept matching, or                 parent-child or part-of relationships</li>
<li>Degree of item characterization based on current scope</li>
<li>Initial semantic matching can be driven from existing                 assets</li>
<li>Fairly well-developed implementation tools, <span style="font-style: italic; text-decoration: underline;">except</span> for how to engage publics in the development process</li>
<li>Implementation times in weeks to months</li>
<li>Implementation costs driven by available budgets (and thus                 scope)</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Note we have labeled the conventional approach as the &#8220;comprehensive, engineered&#8221; style; its contrast, and the one we align with more closely, is the &#8220;adaptive, incremental&#8221; style.</p>
<p style="margin-left: 30px; margin-right: 30px;">[Others have posited contrasting styles, most often as &#8220;top down&#8221; <span style="font-style: italic;">v.</span> &#8220;bottom up.&#8221; However, in one interpretation of that distinction, &#8220;top down&#8221; means a layer on top of the existing Web <a href="#styles8">[8]</a>. On the other hand, &#8220;top down&#8221; is more often understood in the sense of a &#8220;comprehensive, engineered&#8221; view, consistent with my own understanding <a href="#styles9">[9]</a>. Yet no matter which characterization is used, neither label captures what I feel to be the more important considerations of mindset, logic and premise.]</p>
<p>Though the table above contrasts many points, I think two main distinctions set the adaptive approach apart. First, it firmly embraces the open world assumption (OWA). OWA is key to an incremental, &#8220;learn as you go&#8221; deployment that is also well suited to incorporation of external information. Second, it leverages and builds from existing assets.</p>
<h3>A Spectrum of Applications</h3>
<p>Yet as noted in the opening, which of these approaches makes better         sense depends on circumstance. One aspect of circumstance is available         budget and deployment times for pilots or proofs-of-concept. Another         aspect, of course, is the planned use or application         for the deployment.</p>
<p>These are by no means hard distinctions, but in general we can see         these contrasting approaches applying to the following uses:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more CWA driven)</span></td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more OWA driven)</span></td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>Bounded, &#8220;inward&#8221; applications (high degree of control and                 completeness)</li>
<li>Engineering enterprises</li>
<li>Technical domains and organizations</li>
<li>Aeronautics</li>
<li>Pharmaceuticals</li>
<li>Chemicals</li>
<li>Petroleum</li>
<li>Energy</li>
<li>A/E firms (construction)</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>External facing applications, organizations (customers,                 incorporation of external data)</li>
<li>Faceted Search</li>
<li>Taxonomy updates</li>
<li>Multi-domain master data management (MDM)</li>
<li>Simple (initially) inferencing</li>
<li>Consumer products</li>
<li>Finance</li>
<li>Health care</li>
<li>Knowledge enterprises</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>A critical distinction is the nature of the enterprise itself.         &#8220;External-facing&#8221; enterprises or functions that want or need to         incorporate much external information (say, marketing or competitive intelligence) are advised to look closely at         the adaptive approach. Organizations that have more complete control         over their circumstances should perhaps focus on the conventional         approach.</p>
<h3>Adoption Thresholds and Risks</h3>
<p>In previous writings I have pointed to the manifest benefits that can accrue to the semantic enterprise [see, esp. <a href="#styles10">10</a>]. But we also have witnessed nearly a decade of promotion for semantics in the enterprise, with perhaps a lack of progress in some areas or unmet promises in others. These raise questions and skepticism about the real eventual costs and benefits.</p>
<p>I believe some of this skepticism is inherent in anything new &#8212; the general IT fatigue over whatever the current &#8220;next great thing&#8221; might be. But I also believe that some of this skepticism results from an approach to semantics in the enterprise that is both lengthy and costly to deploy.</p>
<p>The key advantage of the adaptive, incremental approach is that the         whole IT game in the enterprise can change. An open world approach         enables adoption as it proves itself and as budgets allow. Commitments         made under this approach have, in essence, permanent value. Past fears         and concerns about making &#8220;wrong&#8221; bets no longer apply. With learning,         targets can be re-adjusted, structure re-defined and applications         re-focused, all as new discoveries and broadening scope dictate.</p>
<p>This does not make the adaptive approach better than the conventional         one. But, it does make it less risky and, well, more <span style="font-style: italic;">adaptive</span>.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles1"></a>[1] For example, the earliest Google mentions on &#8220;semantic enterprise&#8221;         date to about 1998 or 1999. In 2002, the University of Georgia and Amit         Sheth offered the first known academic course on the Semantic         Enterprise; see <a href="http://lsdis.cs.uga.edu/SemanticEnterprise/">http://lsdis.cs.uga.edu/SemanticEnterprise/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles2"></a>[2] See the conference guide for the <a href="http://www.wilshireconferences.com/webfiles/STC05/Stc05Final.pdf">Semantic         Technology Conference 2005</a>. The sixth one, the <a href="http://www.semantic-conference.com/">2010 Semantic Technology         Conference</a>, is upcoming on June 21-25 in San Francisco.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles3"></a>[3] See, for example, Mitchell Ummell, ed., 2009. “The Rise of         the Semantic Enterprise,” special dedicated edition of the         <span style="font-style: italic;">Cutter IT Journal</span>, Vol. 22(9),         40 pp., September 2009. See <a href="http://www.cutter.com/offers/semanticenterprise.html">http://www.cutter.com/offers/semanticenterprise.html</a> (after filling out contact form). Partially in response to this         conventional view, I wrote <a href="#styles10">[10]</a>. In that article I offered as a working         definition that &#8220;<span style="font-style: italic;">a</span> <span style="font-weight: bold; font-style: italic;">semantic         enterprise</span> <span style="font-style: italic;">is one that adopts         the languages and standards of the</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Semantic_Web">semantic Web</a> <span style="font-style: italic;">. . .</span> <span style="font-style: italic;">and applies them to the issues of information         interoperability, preferably using the best practices of</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a><span style="font-style: italic;">.</span>&#8221; That happens to be Structured Dynamics&#8217;         preferred definition, though as this posting indicates, there is a         spectrum of definitions of the term.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles4"></a>[4] See M.K. Bergman, 2009. &#8220;<a href="../852/the-open-world-assumption-elephant-in-the-room/">The Open World Assumption: Elephant in the Room</a>,&#8221; <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, December 21, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles5"></a>[5] See for example <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a>, <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a>, <a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and <a href="http://en.wikipedia.org/wiki/Semantic_Web#Components">others</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles6"></a>[6] <a href="http://en.wikipedia.org/wiki/Linked_data">Linked data</a> is a set of best practices for publishing and deploying instance and class data using the RDF data model. Two of the best practices are to name the data objects using uniform resource identifiers (URIs), and to expose the data for access via the HTTP protocol. Both of these practices enable the Web to become a distributed database, which also means that Web architectures can be readily employed.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles7"></a>[7] We use a basis in <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a> for defining the roles and splits in schema and instances. As we define it:
<div class="boxGraySolid">“Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.”</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles8"></a>[8] One article that got quite a bit of play a few years back was A. Iskold, 2007. &#8220;<a href="http://www.readwriteweb.com/archives/the_top-down_semantic_web.php">Top Down: A New Approach to the Semantic Web</a>,&#8221; in <em>ReadWrite Web</em>, Sept. 20, 2007. The problem with this terminology is that it offers a completely different sense of &#8220;top down&#8221; from traditional uses. In Iskold&#8217;s argument, his &#8220;top down&#8221; is a layering on top of the existing Web.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles9"></a>[9] The more traditional view of &#8220;top down&#8221; with respect to the semantic Web is in relation to how the system is constructed. This is reflected well in a presentation from the <a href="http://lsdis.cs.uga.edu/SemNSF/SemWebWorkshopAgenda.htm">NSF Workshop on DB &amp; IS Research for Semantic Web and Enterprises</a>, April 3, 2002, entitled &#8220;<a href="http://lsdis.cs.uga.edu/%7Ekashyap/talks/SWWS%20Panel.ppt">The &#8216;Emergent, Semantic Web: Top Down Design or Bottom Up Consensus?</a>&#8221;. Under this view, top down is design and committee-driven; bottom up is more decentralized and based on social processes, which is more akin to Iskold&#8217;s &#8220;top down.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles10"></a>[10] M.K. Bergman, 2009. &#8220;<a href="../825/fresh-perspectives-on-the-semantic-enterprise/">Fresh         Perspectives on the Semantic Enterprise</a>,&#8221; <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, Sept.         28, 2009.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Updates Posted to Sweet Tools, SWEETpedia</title>
		<link>http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/</link>
		<comments>http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 19:15:30 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[information extraction]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[owl]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[Sweet Tools]]></category>
		<category><![CDATA[sweetpedia]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=861</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Updates Posted to Sweet Tools, SWEETpedia&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-25&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/&amp;rft.language=English"></span>

Minor Updates Provided to these Standard AI3 Datasets
If you are like me, you like to clear the decks before the start of major new projects. In Structured Dynamics&#8217; case, we actually have multiple new initiatives getting underway, so the deck clearing has been especially focused this time.
As a result, we have updated Sweet   [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Updates Posted to Sweet Tools, SWEETpedia&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-25&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/&amp;rft.language=English"></span>
<p><img title="Sweet Tools Listing" src="../wp-content/themes/ai3/images/sweetsearchlogo80.png" alt="Sweet Tools Listing" hspace="5" vspace="0" width="89" height="80" align="left" /></p>
<h2>Minor Updates Provided to these Standard AI3 Datasets</h2>
<p>If you are like me, you like to clear the decks before the start of major new projects. In <a href="http://structureddynamics.com">Structured Dynamics</a>&#8217; case, we actually have multiple new initiatives getting underway, so the deck clearing has been especially focused this time.</p>
<p>As a result, we have updated <span style="color: #993300;"><strong><a href="../?page_id=325">Sweet Tools</a></strong></span>, <span style="color: maroon;"><strong>AI3</strong></span>&#8217;s listing of semantic Web and related tools, with the addition of some 30 new tools, updates to others, and deletions of five expired entries. The dataset now lists 835 tools. And, as before, there is also now a new <a href="http://constructscs.com/conStruct/browse/">structured data view via conStruct</a> (pick the <span style="color: #990000; font-weight: bold;">Sweet Tools</span> dataset).</p>
<p>We have also updated <strong><a href="http://www.mkbergman.com/sweetpedia/">SWEETpedia</a></strong>, a listing of 246 research articles that use Wikipedia in one way or another to do semantic Web-related research. Some 20 new papers were added to this update.</p>
<p>Please use the comments section on this post to suggest new tools or new research articles for inclusion in future updates.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/861/updates-posted-to-sweet-tools-sweetpedia/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
