<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI3:::Adaptive Information &#187; Adaptive Information</title>
	<atom:link href="http://www.mkbergman.com/category/adaptive-information/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Wed, 01 Sep 2010 05:10:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Consolidating a Coherent Message with OSF</title>
		<link>http://www.mkbergman.com/894/consolidating-a-coherent-message-with-osf/</link>
		<comments>http://www.mkbergman.com/894/consolidating-a-coherent-message-with-osf/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 06:52:55 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Web-oriented Architecture]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[open semantic framework]]></category>
		<category><![CDATA[semantic components]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=894</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Consolidating a Coherent Message with OSF&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-07-06&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/894/consolidating-a-coherent-message-with-osf/&amp;rft.language=English"></span>

Release of Semantic Components Adds Final Layer, Leads to Streamlined Sites
Yesterday Fred Giasson announced the release of code associated with Structured Dynamics&#8217; open source semantic components (also called sComponents). A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Consolidating a Coherent Message with OSF&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=Web-oriented Architecture&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-07-06&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/894/consolidating-a-coherent-message-with-osf/&amp;rft.language=English"></span>
<p><a href="http://openstructs.org/open-semantic-framework"><img style="border: 0px solid; width: 216px; height: 216px; float: left; margin-right: 10px;" title="Consolidating Under the Open Semantic Framework" src="../wp-content/themes/ai3/images/2010Posts/100706_osf_consolidation.png" alt="Consolidating Under the Open Semantic Framework" /></a></p>
<h2>Release of Semantic Components Adds Final Layer, Leads to Streamlined Sites</h2>
<p>Yesterday <a href="http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/">Fred Giasson announced</a> the release of code associated with <a href="http://structureddynamics.com/">Structured Dynamics</a>&#8217; open source <a style="font-weight: bold; font-style: italic;" href="http://openstructs.org/semantic-components">semantic components</a> (also called <span style="font-weight: bold; font-style: italic;">sComponents</span>). A <span style="font-weight: bold; font-style: italic;">semantic component</span> is an ontology-driven component, or widget, based on <a href="http://en.wikipedia.org/wiki/Adobe_Flex">Flex</a>. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.</p>
<p>Though not all layers are by any means complete, from an architectural standpoint the release of these <span style="font-weight: bold; font-style: italic;">semantic components</span> provides the last and missing layer to complete our <a style="font-weight: bold; font-style: italic;" href="http://openstructs.org/open-semantic-framework">open semantic framework</a>. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.</p>
<h3>The OSF &#8220;Semantic Muffin&#8221;</h3>
<p>We <a href="../891/domain-specific-instantiations-based-on-the-open-semantic-framework/">first announced</a> the <a href="http://openstructs.org/open-semantic-framework"><span style="font-weight: bold; font-style: italic;">open semantic framework</span></a> &#8212; or <span style="font-weight: bold; font-style: italic;">OSF</span> &#8212; a couple of weeks back. Refer to <a href="../891/domain-specific-instantiations-based-on-the-open-semantic-framework/">that original post</a> for more description of the general design <a href="#consol1">[1]</a>. However, we can show this framework with the <span style="font-weight: bold; font-style: italic;">semantic components</span> layer as illustrated by what some have called the &#8220;semantic muffin&#8221;:</p>
<div style="clear: both;"><a href="../wp-content/themes/ai3/images/2010Posts/100706_osf_sc_layer.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 382px;" title="Semantic Component Layer of the Open Semantic Framework" src="../wp-content/themes/ai3/images/2010Posts/100706_osf_sc_layer.png" alt="Incremental Layers of the Open Semantic Framework" width="758" height="482" /></a></p>
<p style="font-style: italic;" align="center"><small>(click for <a href="../wp-content/themes/ai3/images/2010Posts/100706_osf_sc_layer.png"> full size</a>)</small></p>
</div>
<p>The <span style="font-weight: bold; font-style: italic;">OSF</span> stack consists of these layers, moving from existing assets upward through increasing semantics and usability:</p>
<ul>
<li>Existing assets &#8212; any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise</li>
<li>scones / irON &#8212; this layer is for general conversion of non-RDF data and data schema to <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> (via <a href="http://openstructs.org/iron">irON</a> or <a href="http://openstructs.org/resources/rdfizers">RDFizers</a>) or for information extraction of subject concepts or named entities (<a href="http://structureddynamics.com/scones.html">scones</a>)</li>
<li><a href="http://openstructs.org/structwsf">structWSF</a> &#8212; is the pivotal Web services framework layer, and provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the <span style="font-weight: bold; font-style: italic;">OSF</span> stack</li>
<li><a href="http://openstructs.org/semantic-components">Semantic components</a> &#8212; the highlighted layer in the &#8220;semantic muffin&#8221;; in essence, this is the visualization and data interaction layer in the <span style="font-weight: bold; font-style: italic;">OSF</span> stack; see more below</li>
<li>Ontologies &#8212; are the layer containing the structured assets &#8220;driving&#8221; the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave, and</li>
<li><a href="http://openstructs.org/conStruct">conStruct</a> &#8212; is the content management system (CMS) layer based on Drupal and the thinnest layer with respect to <span style="font-weight: bold; font-style: italic;">OSF</span>; this optional layer provides the theming, user rights and permissions, or other functionality drawn from Drupal&#8217;s 6500 third-party modules.</li>
</ul>
<p>Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.</p>
<h3>The Semantic Components Layer</h3>
<p>Current <span style="font-weight: bold; font-style: italic;">semantic components</span>, or widgets, include: filter; tabular templates (similar to <a href="http://en.wikipedia.org/wiki/Help:Infobox">infoboxes</a>); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.</p>
<p>Though <a href="http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/">Fred&#8217;s post</a> goes into more detail &#8212; with subsequent posts to get into the technical nuances of the <span style="font-weight: bold; font-style: italic;">semantic components</span> &#8212; the main idea of these components is shown by the diagram below.</p>
<p>These various <span style="font-weight: bold; font-style: italic;">semantic components</span> get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as <a href="http://en.wikipedia.org/wiki/Sparql">SPARQL</a> queries) to the various <a href="http://openstructs.org/structwsf">structWSF</a> Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.</p>
<p>An internal ontology that embodies the desired behavior and display options (SCO, the <a href="http://openstructs.org/semantic-components/manual/semantic-component-ontology">Semantic Component Ontology</a>) is matched with these types and attributes to generate the formal instructions to the <span style="font-weight: bold; font-style: italic;">semantic components</span>. These instructions are presented via the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:</p>
<div><a href="../wp-content/themes/ai3/images/2010Posts/100706_semantic_component.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 597px;" title="Semantic Components Workflow" src="../wp-content/themes/ai3/images/2010Posts/100706_semantic_component.png" alt="Semantic Components Workflow" width="686" height="682" /></a></p>
<p style="font-style: italic;" align="center"><small>(click for <a href="../wp-content/themes/ai3/images/2010Posts/100706_semantic_component.png"> full size</a>)</small></p>
</div>
<p>New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.</p>
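<p>As a rough illustration of the matching step in this workflow, the Python sketch below pairs the types and attributes found in a results set with display components. It is not the sControl implementation, nor the actual Semantic Component Ontology: the rule table, the attribute names (such as <code>geo:lat</code>), the sample records and the function <code>select_widgets</code> are all hypothetical, made up only to show how a declarative mapping, rather than hard-coded page logic, can decide which widgets appear.</p>

```python
# Hypothetical sketch of ontology-driven widget selection.
# The rules below stand in for what an ontology such as SCO might express;
# none of these names come from the actual semantic components code.

# Each rule: (set of attributes a record must carry, widget to display)
DISPLAY_RULES = [
    ({"geo:lat", "geo:long"}, "map"),
    ({"stat:value", "stat:year"}, "line chart"),
    ({"skos:broader"}, "relationship browser"),
]

# Fallback when no rule applies to a record
DEFAULT_WIDGET = "tabular template"


def select_widgets(results):
    """Return the ordered, de-duplicated widgets to place on the canvas.

    `results` is a list of records, each a dict of attribute name to value,
    standing in for a structured results set from a Web service endpoint.
    """
    widgets = []
    for record in results:
        attributes = set(record)
        matched = [w for required, w in DISPLAY_RULES if required.issubset(attributes)]
        for widget in matched or [DEFAULT_WIDGET]:
            if widget not in widgets:
                widgets.append(widget)
    return widgets


# Made-up sample records, for illustration only
results = [
    {"rdfs:label": "Example City", "geo:lat": 10.0, "geo:long": 20.0},
    {"rdfs:label": "Example Indicator", "stat:value": 100, "stat:year": 2010},
    {"rdfs:label": "Untyped note"},
]

print(select_widgets(results))
```

<p>Under this kind of design, supporting a new display option means adding a rule rather than changing page code, which is the sense in which the components can stay generic across domains.</p>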
<h3>Consolidating and Rationalizing Web Sites and Mailing Lists</h3>
<p><a href="http://openstructs.org/"><img style="border: 0px solid; width: 90px; height: 90px; float: right; margin-left: 10px;" title="OpenStructs and Open Semantic Framework Logo" src="../wp-content/themes/ai3/images/2010Posts/triple_90.png" alt="OpenStructs and Open Semantic Framework Logo" /></a>As the release of the <span style="font-weight: bold; font-style: italic;">semantic components</span> drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the <span style="font-weight: bold; font-style: italic;">open semantic framework</span> enabled us to consolidate and rationalize these resources.</p>
<p>Our first change was to consolidate all <span style="font-weight: bold; font-style: italic;">OSF</span>-related material under the existing <a href="http://openstructs.org/">OpenStructs.org</a> Web site. It already contained the links and background material to structWSF and irON. To that, we added the conStruct and <span style="font-weight: bold; font-style: italic;">OSF</span> material as well. This consolidation also allowed us to retire the previous conStruct Web site, which now re-directs to <a href="http://openstructs.org/">OpenStructs</a>.</p>
<p>We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new <a href="http://groups.google.com/group/open-semantic-framework?hl=en"><span style="font-weight: bold; font-style: italic;">Open Semantic Framework</span> Google group and mailing list</a>. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the <a href="http://groups.google.com/group/open-semantic-framework?hl=en"><span style="font-weight: bold; font-style: italic;">OSF</span> group</a>.</p>
<p>There has also been a reinvigoration of the developers&#8217; community Web site at <a href="http://community.openstructs.org/">http://community.openstructs.org/</a>. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.</p>
<p>Actual code SVN repositories are unchanged. These code repositories may be found at:</p>
<ul>
<li><a href="http://code.google.com/p/structwsf/">structWSF</a></li>
<li><a href="http://drupal.org/project/construct">conStruct</a></li>
<li><a href="http://code.google.com/p/semanticcomponents/">Semantic Components</a></li>
<li><a href="http://code.google.com/p/iron-notation/">irON Parsers</a>.</li>
</ul>
<p>We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="consol1"></a> [1] An alternative view of this layer diagram is shown by the general Structured Dynamics <a href="http://structureddynamics.com/products.html">product stack and architecture</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/894/consolidating-a-coherent-message-with-osf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch: Structure Paves the Way to the Semantic Web</title>
		<link>http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/</link>
		<comments>http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 05:55:31 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[Eisenhower]]></category>
		<category><![CDATA[IEEE]]></category>
		<category><![CDATA[Interstate highways]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=889</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Structure Paves the Way to the Semantic Web&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-06-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/&amp;rft.language=English"></span>
How Shall We Measure Progress Over the Past Three Years?

For a dozen years, my career has been centered on Internet search,  dynamic content and the deep Web. For the  past few years, I have been somewhat obsessed by two topics.
The first  topic, a conviction really, is that implicit structure needs to be [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Structure Paves the Way to the Semantic Web&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-06-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/&amp;rft.language=English"></span>
<h2>How Shall We Measure Progress Over the Past Three Years?</h2>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday Brown Bag Lunch" width="158" height="179" /><br />
<a href="http://www.mkbergman.com/wp-content/themes/ai3/images/2007Posts/070405a_colorado-hwy.jpg"><img style="border: 0px solid; float: right; margin-left: 10px;" title="Colorado  Interstate construction - 1970; courtesy National Archives" src="../wp-content/themes/ai3/images/2007Posts/070405a_colorado-hwy.jpg" alt="Colorado  Interstate construction - 1970; courtesy National Archives" width="272" /></a>For a dozen years, my career has been centered on Internet search,  dynamic content and the <a href="http://en.wikipedia.org/wiki/Deep_web">deep Web</a>. For the  past few years, I have been somewhat obsessed by two topics.</p>
<p>The first  topic, a conviction really, is that implicit structure needs to be  extracted from Web content to enable it to be disambiguated, organized,  shared and re-purposed. The second topic, more an open question as a  former academic married to a professor, is what might replace editorial  selections and peer review to establish the authoritativeness of  content. These topics naturally steer one to the <a href="http://en.wikipedia.org/wiki/Semantic_web">semantic Web</a>.</p>
<h3><span style="font-weight: bold">A  Millennial Perspective</span></h3>
<p>The semantic Web, by whatever name it comes to be called, is an  inevitability.  History tells us that as information content grows, so  do the mechanisms for organizing and managing it. Over human history,  innovations such as writing systems, alphabetization, pagination, tables  of contents, indexes, concordances, reference look-ups, classification  systems, tables, figures, and statistics have emerged in parallel with  content growth [<a href="#SWref19">19</a>].</p>
<p>When the Lycos search engine, one of the first profitable Internet  ventures, was publicly released in 1994, it indexed a mere 54,000 pages [<a href="#SWref1">1</a>].  When  Google wowed us with its page-ranking algorithm in 1998, it soon  replaced my then favorite search engine, AltaVista.  Now, tens of  billions of indexed documents later, I often find Google&#8217;s results to be  overwhelming dross &#8212; unfortunately true again for all of the major  search engines.  Faceted browsing, vertical search, and Web 2.0&#8217;s  tagging and folksonomies demonstrate humanity&#8217;s natural penchant to  fight this entropy, efforts that will next continue with the semantic  Web and then mechanisms unforeseen to manage the chaos of burgeoning  content.</p>
<p>An awful lot of hot air has been expelled over the false dichotomy of  whether the semantic Web will fail or is on the verge of nirvana.  Arguments extend from the epistemological versus ontological  (classically defined) to Web 3.0 versus SemWeb or Web services (WS*)  versus REST (Representational State Transfer). My RSS feed reader points  to at least one such dust up every week.</p>
<p>Some set the difficulties of resolving semantic heterogeneities as absolutes, leading to an illogical and false rejection of semantic Web objectives. In contrast, some advocates set equally divisive arguments for semantic Web purity by insisting on formal ontologies and description logics. Meanwhile, studied leaks about &#8220;stealth&#8221; semantic Web ventures mean you should grab your wallet while simultaneously shaking your head.</p>
<h3><span style="font-weight: bold">A  Decades-Long Perspective</span></h3>
<p>My mental image of the semantic Web is a road from here to some  achievable destination &#8212; say, Detroit. Parts of the road are well paved;  indeed, portions are already superhighways with controlled on-ramps and  off-ramps. Other portions are two lanes, some with way too many traffic  lights and some with dangerous intersections. A few small portions  remain unpaved gravel and rough going.</p>
<div style="float: right;  margin-left: 10px"><a href="http://www.mkbergman.com/wp-content/themes/ai3/images/2007Posts/070405b_1919wreck_400.jpg"><img style="border: 0px solid; width: 400px;" title="1919 Wreck in Nebraska" src="http://www.mkbergman.com/wp-content/themes/ai3/images/2007Posts/070405b_1919wreck_400.jpg" alt="1919 Wreck in Nebraska" align="middle" /></a></p>
<p align="center"><small>Wreck in Nebraska during the 1919  Transcontinental Motor Convoy</small></p>
</div>
<p>A lack of perspective makes things appear either too close or too far  away. The automobile isn&#8217;t yet a century old as a mass-produced item.  It wasn&#8217;t until 1919 that the US Army Transcontinental Motor Convoy made  the first automobile trip across the United States.</p>
<p>The 3,200 mile  route roughly followed today&#8217;s Lincoln Highway, US 30, from Washington,  D.C. to San Francisco. The convoy took 62 days and 250 recorded  accidents to complete the trip (see figure), half on dirt roads at an  average speed of 6 miles per hour. A tank officer on that trip later  observed Germany&#8217;s autobahns during World War II. When he subsequently  became President Dwight D. Eisenhower, he proposed and then signed the  Interstate Highway Act.</p>
<p>That was 50 years ago. Today, the US is  crisscrossed with 50,000 miles of interstates, which have completely  remade the nation&#8217;s economy and culture [<a href="#SWref2">2</a>].</p>
<h3><span style="font-weight: bold">Today&#8217;s  Perspective</span></h3>
<p>Like the interstate system in its early years, today&#8217;s semantic Web  lets you link together a complete trip, but the going isn&#8217;t as smooth or  as fast as it could be. Nevertheless, making the trip is doable and  keeps improving day by day, month by month.</p>
<p>My view of what&#8217;s required to smooth the road begins with extracting  structure and meaningful information according to understandable schema  from mostly uncharacterized content. Then we store the now-structured  content as RDF triples that can be further managed and manipulated at  scale. By necessity, the journey embraces tools and requirements that,  individually, might not constitute semantic Web technology as some  strictly define it. These tools and requirements are nonetheless  integral to reaching the destination. We are well into that journey&#8217;s  first leg, what I and others are calling the <span style="font-style: italic">structured Web</span>.</p>
<p>For the past six months or so I have been researching and assembling  as many semantic Web and related tools as I can find [<a href="#SWref3">3</a>].  That  <a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/"><span style="font-style:  italic; font-weight: bold">Sweet Tools</span></a> listing now exceeds  500 tools [<a href="#SWref4">4</a>] (with  its presentation using the nifty lightweight Exhibit publication system  from MIT&#8217;s Simile program [<a href="#SWref5">5</a>]).   I&#8217;ve come to understand the importance of many ancillary tool sets to  the entire semantic Web highway, such as natural language processing and  information extraction. I&#8217;ve also found new categories of pragmatic  tools that embody semantic Web and data mediation processes but don&#8217;t  label themselves as such.</p>
<p>In its entirety, the <a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/"><span style="font-style:  italic; font-weight: bold">Sweet Tools</span></a> listing provides a  pretty good picture of the semantic Web&#8217;s state. It&#8217;s a surprisingly  robust picture &#8212; though with some notable potholes &#8212; and includes  impressive open source options in all categories. Content publishing,  indexing, and retrieval at massive scales are largely solved problems.  We also have the infrastructure, languages, and (yes!) standards for  tying this content together meaningfully at the data and object levels.</p>
<p>I also think a degree of consensus has emerged on RDF as the  canonical data model for semantic information. RDF triple stores are  rapidly improving toward industrial strength, and RESTful designs enable  massive scalability, as terabyte- and petabyte-scale full-text indexes  prove.</p>
<p>Powerful and flexible middleware options, such as those from OpenLink [<a href="#SWref6">6</a>], can transform and integrate diverse file formats with a variety of back ends. The World Wide Web Consortium&#8217;s GRDDL standard [<a href="#SWref7">7</a>] and related tools, plus various &#8220;RDF-izers&#8221; from Massachusetts Institute of Technology and elsewhere [<a href="#SWref8">8</a>], largely provide the conversion infrastructure for getting Web data into that canonical RDF form. Sure, some of these converters are still research-grade, but getting them to operational capabilities at scale now appears trivial.</p>
<p>Things start getting shakier when trying to structure information into a semantic formalism. Controlled vocabularies and ontologies range broadly and remain a contentious area. Publishers and authors perhaps have too many choices: from straight Atom or RSS feeds and feeds with tags to informal folksonomies and then Outline Processor Markup Language [<a href="#SWref9">9</a>] or microformats [<a href="#SWref10">10</a>]. From there, the formalism increases further to include the standard RDF ontologies such as SIOC (Semantically-Interlinked Online Communities), SKOS (Simple Knowledge Organization System), DOAP (Description of a Project), and FOAF (Friend of a Friend) [<a href="#SWref11">11</a>] and the still greater formalism of OWL&#8217;s various dialects [<a href="#SWref12">12</a>].</p>
<div style="border: 1px solid #820000; background-color: #ffffe5; width: 460px; float: left; margin: 10px 10px 10px 0px; font-style: italic; font-size: 120%;">
<table border="0" cellspacing="4" cellpadding="4">
<tbody>
<tr>
<td style="vertical-align: middle; text-align: center"><em>If we compare the  semantic Web to the US interstate highway system, we&#8217;re still in the  early stages of a journey that will remake our economy and culture.</em></td>
</tr>
<tr>
<td style="text-align: center"><em>Many  potholes on the road to the semantic Web exist.</em></td>
</tr>
<tr>
<td style="text-align: center"><em>One ready  task is to transform existing structure to RDF. Another priority is to  refine tools to extract structure and meaningful information from  uncharacterized content.</em></td>
</tr>
</tbody>
</table>
</div>
<p>Arguing which of these is the theoretical best method is doomed to  failure, except possibly in a bounded enterprise environment. We live in  the real world, where multiple options will always have their advocates  and their applications.</p>
<p>All of us should welcome whatever structure we can add to our information base, no matter where it comes from or how it&#8217;s done. The sooner we can embrace content in any of these formats and convert it into canonical RDF form, the sooner we can move on to needed developments in semantic mediation, some of the roughest road on the journey.</p>
<h3 style="font-weight: bold">Potholes on  the Semantic Highway</h3>
<p>Semantic mediation requires appropriate structured content. Many  potholes on the road to the semantic Web exist because the content lacks  structured markup; others arise because existing structure requires  transformation. We need improved ways to address both problems. We also  need more intuitive means for applying schema to structure. Some have  referred to these issues as &#8220;who pays the tax.&#8221;</p>
<p>Recent experience with social software and collaboration proves that a  portion of the Internet user community is willing to tag and  characterize content. Furthermore, we can readily leverage that  resulting structure, and free riders are welcomed. The real pothole is  the lack of easy &#8212; even fun &#8212; data extractors and &#8220;structurizers.&#8221; But  we&#8217;re tantalizingly close.</p>
<p>Tools such as Solvent and Sifter from MIT&#8217;s Simile program [<a href="#SWref13">13</a>] and  Marmite from Carnegie Mellon University [<a href="#SWref14">14</a>] are  showing the way to match DOM (document object model) inspectors with  automated structure extractors. DBpedia,  the alpha version of Freebase,  and System One now provide large-scale, open Web data sets in RDF [<a href="#SWref15">15</a>],  including all of Wikipedia. Browser extensions such as Zotero [<a href="#SWref16">16</a>] are  showing how to integrate structure management into acceptable user  interfaces, as are services such as Zoominfo [<a href="#SWref17">17</a>]. Yet  we still lack easy means to design the differing structures suitable  for a plenitude of destinations.</p>
<p>Amazingly, a compelling road map for how all these pieces could truly  fit together is also incomplete. How do we actually get from here to  Detroit? Within specific components, architectural understandings are  sometimes OK (although documentation is usually awful for open source  projects, as most of the current tools are). Until our community better  documents that vision, attracting new contributors will be needlessly  slower, thus delaying the benefits of network effects.</p>
<p>So, let&#8217;s create a road map and get on with paving the gaps and  filling the potholes. It&#8217;s not a matter of standards or technology &#8212; we  have those in abundance. Let&#8217;s stop the silly squabbles and commit to  the journey in earnest. The <span style="font-style: italic">structured Web</span>&#8217;s ability to reach <span style="font-style: italic">Hyperland</span> [<a href="#SWref18">18</a>],  Douglas Adam&#8217;s prescient 1990 forecast of the semantic Web, now looks to  be no further away than Detroit.</p>
<div class="boxBrownDotted center_ok" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag    Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday      Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about three years ago on <a href="http://www.mkbergman.com/357/structure-paves-the-way-to-the-semantic-web/">May 3, 2007</a>. The piece was my answer to a request by <a href="http://www.mindswap.org/blog/">Jim Hendler</a> to pen some thoughts on the semantic Web based, I believe, on what he thought might be a pragmatic perspective combining Internet business with Web science. The formal piece appeared as a guest editorial in the May/June 2007 issue of <a href="http://www.computer.org/intelligent/">IEEE Intelligent Systems</a>. What appears above is unaltered from my original posting (aside from some minor formatting clean-up and &#8212; sorry to say &#8212; some of the projects are now defunct).</div>
<hr style="height: 1px; width: 33%; margin-left: 0px; margin-right: auto;" />
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref1" name="SWref1">[1]</a> Chris  Sherman, &#8220;Happy Birthday, Lycos!,&#8221; <span style="font-style: italic">Search Engine Watch</span>, August 14,  2002.  See <a href="http://searchenginewatch.com/showPage.html?page=2160551">http://searchenginewatch.com/showPage.html?page=2160551</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref2" name="SWref2">[2]</a> David  A. Pfeiffer, &#8220;Ike&#8217;s Interstates at 50: Anniversary of the Highway System  Recalls Eisenhower&#8217;s Role as Catalyst,&#8221; <span style="font-style: italic">Prologue Magazine</span>,  National Archives, Summer 2006, Vol. 38, No. 2. See: <a href="http://www.archives.gov/publications/prologue/2006/summer/interstates.html">http://www.archives.gov/publications/prologue/2006/summer/interstates.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref3" name="SWref3">[3]</a> The  mention of specific tool names is meant to be illustrative and not  necessarily a recommendation.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref4" name="SWref4">[4]</a> <span style="font-weight: bold">Sweet Tools</span> (SemWeb) listing; see <a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/">http://www.mkbergman.com/new-version-sweet-tools-sem-web/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref5" name="SWref5">[5]</a> See <a href="http://simile.mit.edu/exhibit/">http://simile.mit.edu/exhibit/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref6" name="SWref6">[6]</a> OpenLink Software&#8217;s Virtuoso and Data Spaces products; see <a href="http://www.openlinksw.com/">http://www.openlinksw.com/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref7" name="SWref7">[7]</a> W3C&#8217;s  Gleaning Resource Descriptions from Dialects of Languages (GRDDL,  pronounced &#8220;griddle&#8221;).  See <a href="http://www.w3.org/2004/01/rdxh/spec">http://www.w3.org/2004/01/rdxh/spec</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref8" name="SWref8">[8]</a> See <a href="http://simile.mit.edu/wiki/RDFizers">http://simile.mit.edu/wiki/RDFizers</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref9" name="SWref9">[9]</a> Outline  Processor Markup Language (OPML); see <a href="http://www.opml.org/">http://www.opml.org/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref10" name="SWref10">[10]</a> Microformats; see <a href="http://microformats.org/">http://microformats.org/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref11" name="SWref11">[11]</a> <a href="http://en.wikipedia.org/wiki/DOAP">DOAP</a> (<a href="http://en.wikipedia.org/wiki/DOAP">Description of a Project</a>), <a href="http://en.wikipedia.org/wiki/FOAF">FOAF</a> (<a href="http://en.wikipedia.org/wiki/FOAF">Friend of a Friend</a>), <a href="http://en.wikipedia.org/wiki/SIOC">SIOC</a> (<a href="http://en.wikipedia.org/wiki/SIOC">Semantically-Interlinked Online Communities</a>) and <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a> (<a href="http://en.wikipedia.org/wiki/SKOS">Simple Knowledge Organization System</a>).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref12" name="SWref12">[12]</a> W3C&#8217;s Web Ontology Language (OWL).  See <a href="http://www.w3.org/TR/owl-features/">http://www.w3.org/TR/owl-features/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref13" name="SWref13">[13]</a> Solvent (<a href="http://simile.mit.edu/wiki/Solvent">http://simile.mit.edu/wiki/Solvent</a>)  and Sifter (<a href="http://simile.mit.edu/wiki/Sifter">http://simile.mit.edu/wiki/Sifter</a>)  are from MIT&#8217;s Simile program.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref14" name="SWref14">[14]</a> Marmite (<a href="http://www.cs.cmu.edu/%7Ejasonh/projects/marmite/">http://www.cs.cmu.edu/~jasonh/projects/marmite/</a>)  is from Carnegie Mellon University.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref15" name="SWref15">[15]</a> DBpedia (<a href="http://dbpedia.org/docs/">http://dbpedia.org/docs/</a>) and  Freebase (in alpha, by invitation only at <a href="http://www.freebase.com/">http://www.freebase.com/</a>)  are two of the first large-scale open datasets on the Web; Wikipedia  has also been converted to RDF by System One (<a href="http://labs.systemone.at/wikipedia3">http://labs.systemone.at/wikipedia3</a>).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref16" name="SWref16">[16]</a> Zotero is produced by George Mason University&#8217;s Center for History and  New Media; see <a href="http://www.zotero.org/">http://www.zotero.org</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref17" name="SWref17">[17]</a> ZoomInfo (<a href="http://www.zoominfo.com/">http://www.zoominfo.com/</a>)  provides online structured search of companies and people, plus broader  services to enterprises.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref18" name="SWref18">[18]</a> The late <a title="Douglas Adams" href="http://www.douglasadams.com/">Douglas Adams</a>, of <em>Doctor Who</em> and <em>The Hitchhiker&#8217;s Guide to the Galaxy</em> fame, produced a TV program for BBC2 presaging the Internet called <a style="font-style: italic" href="http://en.wikipedia.org/wiki/Hyperland">Hyperland</a>. This 50-min video can be seen in five parts via YouTube at Part <a href="http://www.youtube.com/watch?v=rOsPKjbMvxY">1 of 5</a>, <a href="http://www.youtube.com/watch?v=ELSZ7pAmvKE">2 of 5</a>, <a href="http://www.youtube.com/watch?v=VF8dm9sK8as">3 of 5</a>, <a href="http://www.youtube.com/watch?v=6dB3_GcFV_0">4 of 5</a> and <a href="http://www.youtube.com/watch?v=b8pvOdMnflI">5 of 5</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="SWref19" name="SWref19">[19]</a> Since I first wrote this piece, I have systematized these developments in my <a href="http://www.mkbergman.com/temp-exhibit/">Timeline of Information History</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/889/brown-bag-lunch-structure-paves-the-way-to-the-semantic-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch: Historical Origins of the Knowledge Economy</title>
		<link>http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/</link>
		<comments>http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/#comments</comments>
		<pubDate>Fri, 23 Apr 2010 06:48:02 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Adaptive Innovation]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=879</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Historical Origins of the Knowledge Economy&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Adaptive Innovation&amp;rft.subject=Brown Bag Lunch&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/&amp;rft.language=English"></span>
A Reprise AI3 Post from Four Years Ago

In 2002 Joel Mokyr, an economic historian from Northwestern  University, wrote a book that should be read by anyone interested in  knowledge and its role in economic growth.  The Gifts of Athena : Historical Origins of the  Knowledge Economy is a sweeping and comprehensive [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Historical Origins of the Knowledge Economy&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Adaptive Innovation&amp;rft.subject=Brown Bag Lunch&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/&amp;rft.language=English"></span>
<h2>A Reprise AI3 Post from Four Years Ago</h2>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday    Brown Bag Lunch" width="158" height="179" /><a href="http://www.amazon.com/exec/obidos/tg/detail/-/0691120137/ref=olp_product_details/104-6346161-2267928?%5Fencoding=UTF8&amp;v=glance"><img style="border: 0px solid; float: right; margin-left: 10px;" src="http://images.amazon.com/images/P/0691120137.01._SCTZZZZZZZ_.jpg" alt="" width="72" height="110" /></a><br />
In 2002 Joel Mokyr, an economic historian from Northwestern  University, wrote a book that should be read by anyone interested in  knowledge and its role in economic growth.  <a href="http://www.amazon.com/gp/product/0691120137/sr=8-1/qid=1152193714/ref=sr_1_1/104-6346161-2267928?ie=UTF8"><span>The Gifts of Athena : Historical Origins of the  Knowledge Economy</span></a> is a sweeping and comprehensive account of  the period from 1760 (in what Mokyr calls the &#8220;Industrial  Enlightenment&#8221;) through the Industrial Revolution beginning roughly in  1820 and then continuing through the end of the 19th century.</p>
<p>The book  (and related expansions by Mokyr available as <a href="http://faculty.wcas.northwestern.edu/~jmokyr/papers.html">separate PDFs</a> on the  Internet) should be considered as the definitive reference on this topic  to date.  The book contains 40 pages of references to all of the  leading papers and writers on diverse technologies from mining to  manufacturing to health and the household.  The scope of subject  coverage, granted mostly focused on western Europe and America, is truly  impressive.</p>
<p>Mokyr deals with &#8216;useful knowledge,&#8217; a phrase he credits to <a title="Simon Kuznets" href="http://en.wikipedia.org/wiki/Simon_Kuznets">Simon Kuznets</a>. Mokyr argues that the growth of recent centuries was driven by the accumulation of knowledge and the declining costs of access to it. Mokyr helps to break the logjams created by past attempts to identify single factors, such as the growth in science or in certain technologies (the steam engine or electricity, for example), as the key drivers of the massive increases in economic growth that coincided with the era now known as the Industrial Revolution.</p>
<p>Mokyr cracks some of these prior impasses by picking up on ideas first articulated through <a title="Michael Polanyi" href="http://en.wikipedia.org/wiki/Michael_Polanyi">Michael Polanyi</a>&#8217;s &#8220;<a href="http://en.wikipedia.org/wiki/Tacit_knowledge">tacit knowing</a>&#8221; (among other recent philosophers interested in the nature and definition of knowledge). Mokyr&#8217;s own schema posits <span style="font-style: italic">propositional knowledge</span>, which he defines as the science, beliefs or the <span style="font-style: italic">epistemic base</span> of knowledge, and which he labels omega (<span style="font-weight: bold;">Ω</span>), in combination with <span style="font-style: italic">prescriptive knowledge</span>, which comprises the techniques (&#8220;recipes&#8221;), and which he labels lambda (<span style="font-weight: bold;">λ</span>). Mokyr notes that an addition to omega (<span style="font-weight: bold;">Ω</span>) is a <span style="font-style: italic">discovery</span>, while an addition to lambda (<span style="font-weight: bold;">λ</span>) is an <span style="font-style: italic">invention</span>.</p>
<p>One of Mokyr&#8217;s key points is that both knowledge types reinforce one  another and, of course, the Industrial Revolution was a period of  unprecedented growth in such knowledge.  Another key point, easily  overlooked when &#8220;discoveries&#8221; are seemingly more noteworthy, is that <span style="font-style: italic">techniques</span> and <span style="font-style: italic">practical  applications </span>of knowledge can provide a multiplier effect and  are equivalently important.  For example, in addition to his main case  studies of the factory, health and the household, he says:</p>
<blockquote><p>The inventions of writing, paper, and printing not only  greatly reduced access costs but also materially<br />
affected human cognition, including the way people thought about their  environment.</p></blockquote>
<p>Mokyr also correctly notes how the accumulation of knowledge in science and the <span style="font-style:  italic">epistemic base</span> promotes productivity and still-more efficient discovery mechanisms:</p>
<blockquote><p>The range of experimentation possibilities that needs to  be searched over is far larger if the searcher knows nothing about the  natural principles at work.  To paraphrase Pasteur&#8217;s famous aphorism  once more, fortune may sometimes favor unprepared minds, but only for a  short while.  It is in this respect that the width of the epistemic base  makes the big difference.</p></blockquote>
<p>In my own opinion, I think Mokyr starts to get closer to the mark  when he discusses knowledge &#8220;storage&#8221;, access costs and multiplier  effects from basic knowledge-based technologies or techniques.  Like  some other recent writers, he also tries to find analogies with  evolutionary biology.  For example:</p>
<blockquote><p>Much like DNA, useful knowledge does not exist by itself;  it has to be &#8220;carried&#8221; by people or in storage<br />
devices.  Unlike DNA, however, carriers can acquire and shed knowledge  so that the selection process is quite different.  This difference  raises the question of how it is transmitted over time, and whether it  can actually shrink as well as expand.</p></blockquote>
<p>One of the real advantages of this book is to move forward a re-think  of the &#8220;great man&#8221; or &#8220;great event&#8221; approach to history.  There are  indeed complicated forces at work.  I think Mokyr summarizes well this  transition when he states:</p>
<blockquote><p>A century ago, historians of technology felt that  individual inventors were the main actors that brought about<br />
the Industrial Revolution. Such heroic interpretations were discarded in favor of views that emphasized deeper economic and social factors such as institutions, incentives, demand, and factor prices. It seems, however, that the crucial elements were neither brilliant individuals nor the impersonal forces governing the masses, but a small group of at most a few thousand people who formed a creative community based on the exchange of knowledge. Engineers, mechanics, chemists, physicians, and natural philosophers formed circles in which access to knowledge was the primary objective. Paired with the appreciation that such knowledge could be the base of ever-expanding prosperity, these elite networks were indispensable, even if individual members were not. Theories that link education and human capital to technological progress need to stress the importance of these small creative communities jointly with wider phenomena such as literacy rates and universal schooling.</p></blockquote>
<p>There is much to like and to be impressed by in this book and in Mokyr&#8217;s later writings. I have two criticisms. The first is that I found the pseudo-science of his knowledge labels confusing (I kept having to mentally translate the omega symbol), and I disliked the naming distinction between <em>propositional</em> and <em>prescriptive</em>, even though I think the concepts are spot on.</p>
<p>My second criticism, a more major one, is that Mokyr notes, but does  not adequately pursue, &#8220;In the decades after 1815, a veritable explosion  of technical literature took place.  Comprehensive technical compendia  appeared in every industrial field.&#8221;  Statements such as these, and  there are many in the book, hint at perhaps some fundamental drivers.</p>
<p>Mokyr has provided the raw grist for answering his starting question of <span style="font-style: italic; text-decoration: underline;">why</span> such massive economic growth occurred in conjunction with the era of the  Industrial Revolution.  He has made many insights and posited new factors to explain this salutary discontinuity from all prior human  history.  But, in this reviewer&#8217;s opinion, he still leaves the <span style="font-style: italic; text-decoration: underline;">why</span> tantalizingly close but still unanswered. The fixity of information and  growing storehouses because of declining production and access costs  remain too poorly explored.</p>
<div class="boxBrownDotted center_ok" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag    Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday    Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday    brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about four years   ago on <a href="http://www.mkbergman.com/249/historical-origins-of-the-knowledge-economy/">July 6, 2006</a>. It was part of a series of book reviews I was doing at that time getting at the importance of bulk paper production as a key enabler of economic growth.  No changes have been made to the original  posting.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation</title>
		<link>http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/</link>
		<comments>http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 14:34:31 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[semantic annotation]]></category>
		<category><![CDATA[semantic discovery]]></category>
		<category><![CDATA[semantic heterogeneity]]></category>
		<category><![CDATA[semantic mediation]]></category>
		<category><![CDATA[Sweet Tools]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=875</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/&amp;rft.language=English"></span>

Mediating semantic  heterogeneities requires tools and automation (or semi-automation) at  scale.  But existing tools are still crude and lack across-the-board  integration.  This is one of the next challenges in getting more  widespread acceptance of the semantic Web.
In earlier posts, I described the significant progress in climbing the data federation [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday   Brown Bag Lunch" width="158" height="179" /></p>
<div class="boxGraySolid" style="margin-left: 190px;"><em>Mediating semantic  heterogeneities requires tools and automation (or semi-automation) at  scale.  But existing tools are still crude and lack across-the-board  integration.  This is one of the next challenges in getting more  widespread acceptance of the semantic Web.</em></div>
<p>In earlier posts, I described the significant progress in <a href="http://www.mkbergman.com/?p=229">climbing the data federation  pyramid</a>, today&#8217;s <a href="http://www.mkbergman.com/?p=231">evolution in emphasis to the  semantic Web</a>, and the <a href="http://www.mkbergman.com/?p=232">40 or so sources of semantic  heterogeneity</a>. We now transition to an overview of how one goes  about providing these semantics and resolving these heterogeneities.</p>
<h3>Why the Need for Tools and Automation?</h3>
<p>In an excellent recent overview of semantic Web progress, Paul Warren  points out:<a href="#_xedn1">[1]</a></p>
<blockquote><p><em>Although knowledge workers no doubt believe in the  value of annotating their documents, the pressure to create metadata  isn&#8217;t present. In fact, the pressure of time will work in a counter  direction. Annotation&#8217;s benefits accrue to other workers; the knowledge  creator only benefits if a community of knowledge workers abides by the  same rules. . . . Developing semiautomatic tools for learning ontologies  and extracting metadata is a key research area . . . .Having to move  out of a user&#8217;s typical working environment to &#8216;do knowledge management&#8217;  will act as a disincentive, whether the user is creating or retrieving  knowledge.</em></p></blockquote>
<p>Of course, even assuming that ontologies are created and semantics and metadata are added to content, there still remain the nasty problems of resolving heterogeneities (semantic mediation) and efficiently storing and retrieving the metadata and semantic relationships.</p>
<p>Putting all of this in place requires infrastructure in the form of tools and automation, plus proper incentives and rewards for users and suppliers to conform to it.</p>
<h3>Areas Requiring Tools and Automation</h3>
<p>In his paper, Warren repeatedly points to the need for  &#8220;semi-automatic&#8221; methods to make the semantic Web a reality. He makes  fully a dozen such references, in addition to multiple references to the  need for &#8220;reasoning algorithms.&#8221; In any case, here are some of the  areas noted by Warren needing &#8220;semi-automatic&#8221; methods:</p>
<ul>
<li>Assign authoritativeness</li>
<li>Learn ontologies</li>
<li>Infer better search requests</li>
<li>Mediate ontologies (semantic resolution)</li>
<li>Support visualization</li>
<li>Assign collaborations</li>
<li>Infer relationships</li>
<li>Extract entities</li>
<li>Create ontologies</li>
<li>Maintain and evolve ontologies</li>
<li>Create taxonomies</li>
<li>Infer trust</li>
<li>Analyze links</li>
<li>etc.</li>
</ul>
<p>In a different vein, SemWebCentral lists these clusters of semantic  Web-related tasks, each of which also requires tools:<a href="#_xedn2">[2]</a></p>
<ul>
<li><em>Create an ontology</em> &#8212; use a text or graphical ontology editor  to create the ontology, which is then validated. The resulting ontology  can then be viewed with a browser before being published</li>
<li><em>Disambiguate data </em>&#8211; generate a mapping between multiple  ontologies to identify where classes and properties are the same</li>
<li><em>Expose a relational database as OWL</em> &#8212; an editor is first  used to create the ontologies that represent the database schema, then  the ontologies are validated, translated to OWL and then the generated  OWL is validated</li>
<li><em>Intelligently query distributed data </em>&#8211; repository and again  able to be queried</li>
<li><em>Manually create data from an ontology</em> &#8212; a user would use an  editor to create new OWL data based on existing ontologies, which is  then validated and browsable</li>
<li><em>Programmatically interact with OWL content</em> &#8212; custom programs  can view, create, and modify OWL content with an API</li>
<li><em>Query non-OWL data</em> &#8212; via an annotation tool, create OWL  metadata from non-OWL content</li>
<li><em>Visualize semantic data</em> &#8212; view semantic data in a custom  visualizer.</li>
</ul>
<p>With some ontologies approaching tens to hundreds of thousands to millions of triples, viewing, annotating and reconciling at scale can be daunting tasks, the efforts behind which would never be undertaken without useful tools and automation.</p>
<h3>A Workflow Perspective Helps Frame the Challenge</h3>
<p>A 2005 paper by Izza, Vincent and Burlat (among many other excellent  ones) at the first International Conference on Interoperability of  Enterprise Software and Applications (INTEROP-ESA) provides a very  readable overview on the role of semantics and ontologies in enterprise  integration.<a href="#_xedn3">[3]</a> Besides proposing a fairly  compelling unified framework, the authors also present a useful  workflow perspective emphasizing <a href="http://en.wikipedia.org/wiki/Web_service">Web services</a> (WS), also applicable to semantics in general, that helps frame this  challenge:</p>
<p align="center"><img src="http://www.mkbergman.com/wp-content/themes/ai3/images/2006Posts/060608a_SW_Workflow.gif" alt="" width="599" height="541" /></p>
<p align="center"><strong>Generic Semantic Integration Workflow </strong>(adapted from  <a href="#_xedn3">[3]</a>)</p>
<p>For existing data and documents, the workflow begins with information  extraction or annotation of semantics and metadata (#1) in accordance  with a reference ontology. Newly found information via harvesting must  also be integrated; however, external information or services may come  bearing their own ontologies, in which case some form of semantic  mediation is required.</p>
<p>Of course, this is a generic workflow, and depending on the  interoperation task, different flows and steps may be required. Indeed,  the overall workflow can vary by perspective and researcher, with  semantic resolution workflow modeling a prime area of current  investigations. (As one alternative among scores, see for example  Cardoso and Sheth.<a href="#_xedn4">[4]</a>)</p>
<h3>Matching and Mapping Semantic Heterogeneities</h3>
<p>Semantic mediation is a process of <em>matching</em> schemas and <em>mapping</em> attributes and values, often with intermediate transformations (such as  unit or language conversions) also required. The general problem of  schema integration is not new, with one prior reference going back as  early as 1986. <a href="#_xedn5">[5]</a> According to Alon Halevy:<a href="#_xedn6">[6]</a></p>
<blockquote><p><em>As would be expected, people have tried building  semi-automated schema-matching systems by employing a variety of  heuristics. The process of reconciling semantic heterogeneity typically  involves two steps. In the first, called schema matching, we find  correspondences between pairs (or larger sets) of elements of the two  schemas that refer to the same concepts or objects in the real world. In  the second step, we build on these correspondences to create the actual  schema mapping expressions. </em></p></blockquote>
<p>The issues of <em>matching</em> and <em>mapping</em> have been addressed in many tools, notably commercial ones from MetaMatrix,<a href="#_xedn7">[7]</a> and open source and academic projects such as Piazza,<a href="#_xedn8">[8]</a> SIMILE,<a href="#_xedn9">[9]</a> and the <a title="Web Service Execution Environment (WSMX)" href="http://www.wsmx.org/">WSMX</a> (Web service modeling execution environment) protocol from <a title="Digital Enterprise Research Institute" href="http://www.deri.org/">DERI</a>.<a href="#_xedn10">[10]</a> <a href="#_xedn11">[11]</a> A superb description of the challenges in reconciling the vocabularies of different data sources is also found in the thesis by Dr. AnHai Doan, which won the ACM&#8217;s prestigious Doctoral Dissertation Award for 2003.<a href="#_xedn12">[12]</a></p>
<p>What all of these efforts have found is that the mediation process cannot be completely automated. The current state-of-the-art is to reconcile automatically what is largely unambiguous, and then prompt analysts or subject matter experts to decide the questionable matches. These are known as &#8220;semi-automated&#8221; systems, in which the user interface, data presentation and workflow become as important as the underlying matching and mapping algorithms. According to the WSMX project, there is always a trade-off between how accurate these mappings are and the degree of automation that can be offered.</p>
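<p>To make the two-step, semi-automated pattern concrete, here is a minimal Python sketch. It is an illustration only and not the method of any tool cited above: simple string similarity on attribute names stands in for a real matcher, and the function names, the sample attribute names and the 0.9 and 0.6 thresholds are all hypothetical choices for this example.</p>

```python
from difflib import SequenceMatcher


def _normalize(name):
    """Lower-case a name and drop separators so 'Last_Name' ~ 'lastname'."""
    return name.lower().replace("_", "").replace("-", "")


def name_similarity(a, b):
    """Return a 0..1 similarity score between two attribute names."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def match_schemas(source, target, accept=0.9, review=0.6):
    """Step 1, schema matching: pair each source attribute with its best target.

    Scores at or above `accept` are taken automatically, scores at or above
    `review` are queued for an analyst, and the rest are left unmatched.
    Both thresholds are illustrative assumptions, not published values.
    """
    auto, needs_review, unmatched = {}, [], []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= accept:
            auto[s] = best
        elif score >= review:
            needs_review.append((s, best, round(score, 2)))
        else:
            unmatched.append(s)
    return auto, needs_review, unmatched


def apply_mapping(record, mapping, transforms=None):
    """Step 2, schema mapping: rewrite a record using the accepted matches.

    `transforms` optionally holds a per-attribute conversion function,
    for example a unit conversion; values pass through unchanged otherwise.
    """
    transforms = transforms or {}
    return {
        mapping[key]: transforms.get(key, lambda v: v)(value)
        for key, value in record.items()
        if key in mapping
    }


if __name__ == "__main__":
    auto, needs_review, unmatched = match_schemas(
        ["Last_Name", "zip", "height_in"],
        ["lastname", "postal_code", "height_cm"],
    )
    print("accepted automatically:", auto)
    print("queued for an analyst:", needs_review)
    print("left unmatched:", unmatched)
```

<p>In this toy run the unambiguous pair is accepted automatically, the near match is queued for a person to confirm, and the attribute with no plausible partner is left alone. Raising the <code>accept</code> threshold sends more pairs to the analyst, which is the accuracy versus automation trade-off in miniature.</p>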
<h3>Also a Need for Efficient Semantic Data Stores</h3>
<p>Once all of these reconciliations take place there is the (often  undiscussed) need to index, store and retrieve these semantics and their  relationships at scale, particularly for enterprise deployments. This  is a topic I have addressed many times from the standpoint of <a href="http://www.mkbergman.com/?p=227">scalability</a>, <a href="http://www.mkbergman.com/?p=233">more scalability</a>, and  comparisons of <a href="http://www.mkbergman.com/?p=185">database</a> and relational  technologies, but it is also not a new topic in the general community.</p>
<p>As Stonebraker and Hellerstein note in their retrospective covering  35 years of development in databases,<a href="#_xedn13">[13]</a> some of the first post-relational data models  were typically called semantic data models, including those of Smith and  Smith in 1977<a href="#_xedn14">[14]</a> and  Hammer and McLeod in 1981.<a href="#_xedn15">[15]</a> Perhaps what is different now is our ability to address some of the  fundamental issues.</p>
<p>At any rate, this subsection is included here because of the hidden  importance of database foundations. It is therefore a topic often  addressed in this series.</p>
<h3>A Partial Listing of Semantic Web Tools</h3>
<p>In all of these areas, there is a growing, but still spotty, set of tools for conducting these semantic tasks. SemWebCentral, the open source tools resource center, for example, lists many tools and whether or not they interact with one another (the general answer is often No).<a href="#_xedn16">[16]</a> Protégé also has a fairly long list of plugins, but unfortunately it is not well organized.<a href="#_xedn17">[17]</a></p>
<p>In the table below, I begin to compile a partial listing of semantic Web tools, with more than 50 listed. Though a few are commercial, most are open source. Also, for the open source tools, only the most prominent ones are listed (<a href="http://www.sourceforge.net/">Sourceforge</a>, for example, has about 200 projects listed with some relation to the semantic Web, though most are minor or not yet in alpha release).</p>
<table class="center_ok" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr style="border-style: none; width: 40%; background-image: none;">
<td style="background-color: #cccccc; width: 12%;">
<p align="center"><strong>NAME</strong></p>
</td>
<td style="background-color: #cccccc; width: 37%;">
<p align="center"><strong>URL</strong></p>
</td>
<td style="background-color: #cccccc; width: 50%;">
<p align="center"><strong>DESCRIPTION</strong></p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Almo</td>
<td style="width: 37%;" valign="top">http://ontoware.org/projects/almo</td>
<td style="width: 50%;" valign="bottom">An ontology-based workflow  engine in Java</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Altova SemanticWorks</td>
<td style="width: 37%;" valign="top">http://www.altova.com/products_semanticworks.html</td>
<td style="width: 50%;" valign="top">Visual RDF and OWL editor that  auto-generates RDF/XML or nTriples based on visual ontology design</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Bibster</td>
<td style="width: 37%;" valign="top"><a href="http://bibster.semanticweb.org/">http://bibster.semanticweb.org/</a></td>
<td style="width: 50%;" valign="top">A semantics-based bibliographic  peer-to-peer system</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">cwm</td>
<td style="width: 37%;" valign="top">http://www.w3.org/2000/10/swap/doc/cwm.html</td>
<td style="width: 50%;" valign="top">A general purpose data processor  for the semantic Web</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Deep Query Manager</td>
<td style="width: 37%;" valign="top">http://www.brightplanet.com/products/dqm_overview.asp</td>
<td style="width: 50%;" valign="top">Search federator from deep Web  sources</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">DOSE</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/dose</td>
<td style="width: 50%;" valign="top">A distributed platform for semantic  annotation</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">ekoss.org</td>
<td style="width: 37%;" valign="top">http://www.ekoss.org/</td>
<td style="width: 50%;" valign="top">A collaborative knowledge sharing  environment where model developers can submit advertisements</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Endeca</td>
<td style="width: 37%;" valign="top"><span style="text-decoration: underline;"><a href="http://www.endeca.com/">http://www.endeca.com</a></span></td>
<td style="width: 50%;" valign="top">Facet-based content organizer and  search platform</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">FOAM</td>
<td style="width: 37%;" valign="top"><span style="text-decoration: underline;">http://ontoware.org/projects/map</span></td>
<td style="width: 50%;" valign="bottom">Framework for ontology alignment  and mapping</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Gnowsis</td>
<td style="width: 37%;" valign="top">http://www.gnowsis.org/</td>
<td style="width: 50%;" valign="top">A semantic desktop environment</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">GrOWL</td>
<td style="width: 37%;" valign="top">http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html</td>
<td style="width: 50%;" valign="top">Open source graphical ontology  browser and editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">HAWK</td>
<td style="width: 37%;" valign="top">http://swat.cse.lehigh.edu/projects/index.html#hawk</td>
<td style="width: 50%;" valign="top">OWL repository framework and  toolkit</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">HELENOS</td>
<td style="width: 37%;" valign="top">http://ontoware.org/projects/artemis</td>
<td style="width: 50%;" valign="bottom">A knowledge discovery workbench for the semantic Web</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Jambalaya</td>
<td style="width: 37%;" valign="top"><span style="text-decoration: underline;"><a href="http://www.thechiselgroup.org/jambalaya">http://www.thechiselgroup.org/jambalaya</a></span></td>
<td style="width: 50%;" valign="top">Protégé plug-in for visualizing  ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Jastor</td>
<td style="width: 37%;" valign="top"><a href="http://jastor.sourceforge.net/">http://jastor.sourceforge.net/</a></td>
<td style="width: 50%;" valign="bottom">Open source Java code generator  that emits Java Beans from ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Jena</td>
<td style="width: 37%;" valign="top">http://jena.sourceforge.net/</td>
<td style="width: 50%;" valign="top">Open source ontology API written in Java</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">KAON</td>
<td style="width: 37%;" valign="top">http://kaon.semanticweb.org/</td>
<td style="width: 50%;" valign="top">Open source ontology management  infrastructure</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Kazuki</td>
<td style="width: 37%;" valign="top"><a href="http://projects.semwebcentral.org/projects/kazuki/">http://projects.semwebcentral.org/projects/kazuki/</a></td>
<td style="width: 50%;" valign="bottom">Generates a Java API for working with OWL instance data directly from a set of OWL ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Kowari</td>
<td style="width: 37%;" valign="top">http://www.kowari.org/</td>
<td style="width: 50%;" valign="bottom">Open source database for RDF and  OWL</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">LuMriX</td>
<td style="width: 37%;" valign="top">http://www.lumrix.net/xmlsearch.php</td>
<td style="width: 50%;" valign="top">A commercial search engine using  semantic Web technologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">MetaMatrix</td>
<td style="width: 37%;" valign="top">http://www.metamatrix.com/</td>
<td style="width: 50%;" valign="top">Semantic vocabulary mediation and  other tools</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Metatomix</td>
<td style="width: 37%;" valign="top">http://www.metatomix.com/</td>
<td style="width: 50%;" valign="top">Commercial semantic toolkits and  editors</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">MindRaider</td>
<td style="width: 37%;" valign="top">http://mindraider.sourceforge.net/index.html</td>
<td style="width: 50%;" valign="top">Open source semantic Web outline  editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Model Futures OWL Editor</td>
<td style="width: 37%;" valign="top">http://www.modelfutures.com/OwlEditor.html</td>
<td style="width: 50%;" valign="top">Simple OWL tools, featuring UML  (XMI), ErWin, thesaurus and imports</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Net OWL</td>
<td style="width: 37%;" valign="top">http://www.netowl.com/</td>
<td style="width: 50%;" valign="top">Entity extraction engine from SRA  International</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Nokia Semantic Web Server</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/sws-uriqa</td>
<td style="width: 50%;" valign="top">An RDF based knowledge portal for  publishing both authoritative and third party descriptions of URI  denoted resources</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">OntoEdit/OntoStudio</td>
<td style="width: 37%;" valign="top"><a href="http://ontoedit.com/">http://ontoedit.com/</a></td>
<td style="width: 50%;" valign="top">Engineering environment for  ontologies</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">OntoMat Annotizer</td>
<td style="width: 37%;" valign="top"><a href="http://annotation.semanticweb.org/ontomat">http://annotation.semanticweb.org/ontomat</a></td>
<td style="width: 50%;" valign="top">Interactive Web page OWL and  semantic annotator tool</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Oyster</td>
<td style="width: 37%;" valign="top">http://ontoware.org/projects/oyster</td>
<td style="width: 50%;" valign="bottom">Peer-to-peer system for storing  and sharing ontology metadata</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Piggy Bank</td>
<td style="width: 37%;" valign="top">http://simile.mit.edu/piggy-bank/</td>
<td style="width: 50%;" valign="top">A Firefox-based semantic Web  browser</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Pike</td>
<td style="width: 37%;" valign="top">http://pike.ida.liu.se/</td>
<td style="width: 50%;" valign="top">A dynamic programming (scripting)  language similar to Java and C for the semantic Web</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">pOWL</td>
<td style="width: 37%;" valign="top">http://powl.sourceforge.net/index.php</td>
<td style="width: 50%;" valign="top">Semantic Web development platform</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Protégé</td>
<td style="width: 37%;" valign="top">http://protege.stanford.edu/</td>
<td style="width: 50%;" valign="top">Open source visual ontology editor  written in Java with many plug-in tools</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">RACER Project</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/racerproject</td>
<td style="width: 50%;" valign="bottom">A collection of Projects and  Tools to be used with the semantic reasoning engine RacerPro</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">RDFReactor</td>
<td style="width: 37%;" valign="top">http://rdfreactor.ontoware.org/</td>
<td style="width: 50%;" valign="bottom">Access RDF from Java using  inferencing</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Redland</td>
<td style="width: 37%;" valign="top">http://librdf.org/</td>
<td style="width: 50%;" valign="top">Open source software libraries  supporting RDF</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">RelationalOWL</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/relational-owl</td>
<td style="width: 50%;" valign="top">Automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OWL</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Semantical</td>
<td style="width: 37%;" valign="top">http://semantical.org/</td>
<td style="width: 50%;" valign="top">Open source semantic Web search  engine</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SemanticWorks</td>
<td style="width: 37%;" valign="top"><a href="http://www.altova.com/products_semanticworks.html">http://www.altova.com/products_semanticworks.html</a></td>
<td style="width: 50%;" valign="top">SemanticWorks RDF/OWL Editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Semantic Mediawiki</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/semediawiki</td>
<td style="width: 50%;" valign="top">Semantic extension to the MediaWiki wiki</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Semantic Net Generator</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/semantag</td>
<td style="width: 50%;" valign="top">Utility for generating topic maps  automatically</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Sesame</td>
<td style="width: 37%;" valign="top">http://www.openrdf.org/</td>
<td style="width: 50%;" valign="top">An open source RDF database with  support for RDF Schema inferencing and querying</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SMART</td>
<td style="width: 37%;" valign="top">http://web.ict.nsc.ru/smart/index.phtml?lang=en</td>
<td style="width: 50%;" valign="top">System for Managing Applications  based on RDF Technology</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SMORE</td>
<td style="width: 37%;" valign="top"><a href="http://www.mindswap.org/2005/SMORE/">http://www.mindswap.org/2005/SMORE/</a></td>
<td style="width: 50%;" valign="top">OWL markup for HTML pages</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SPARQL</td>
<td style="width: 37%;" valign="top">http://www.w3.org/TR/rdf-sparql-query/</td>
<td style="width: 50%;" valign="top">Query language for RDF</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SWCLOS</td>
<td style="width: 37%;" valign="top"><a href="http://iswc2004.semanticweb.org/demos/32/">http://iswc2004.semanticweb.org/demos/32/</a></td>
<td style="width: 50%;" valign="top">A semantic Web processor using Lisp</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Swoogle</td>
<td style="width: 37%;" valign="top"><a href="http://swoogle.umbc.edu/">http://swoogle.umbc.edu/</a></td>
<td style="width: 50%;" valign="top">A semantic Web search engine with  1.5 M resources</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">SWOOP</td>
<td style="width: 37%;" valign="top"><a href="http://www.mindswap.org/2004/SWOOP/">http://www.mindswap.org/2004/SWOOP/</a></td>
<td style="width: 50%;" valign="top">A lightweight ontology editor</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">Turtle</td>
<td style="width: 37%;" valign="top">http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/</td>
<td style="width: 50%;" valign="top">Terse RDF &#8220;Triple&#8221; language</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">WSMO Studio</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/wsmostudio</td>
<td style="width: 50%;" valign="top">A semantic Web service editor  compliant with WSMO as a set of Eclipse plug-ins</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">WSMT Toolkit</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/wsmt</td>
<td style="width: 50%;" valign="top">The Web Service Modeling Toolkit  (WSMT) is a collection of tools for use with the Web Service Modeling  Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web  Service Execution Environment (WSMX)</td>
</tr>
<tr>
<td style="width: 12%;" valign="top">WSMX</td>
<td style="width: 37%;" valign="top">https://sourceforge.net/projects/wsmx/</td>
<td style="width: 50%;" valign="top">Execution environment for dynamic  use of semantic Web services</td>
</tr>
</tbody>
</table>
<h3>Tools Still Crude, Integration Not Compelling</h3>
<p>Individually, there are some impressive and capable tools on this list. Generally, however, the interfaces are not intuitive, integration between tools is lacking, and there is no clear case for why and how standard analysts should embrace them. In the semantic Web, we have yet to see an application of the magnitude of the first Mosaic browser, which made HTML and the World Wide Web compelling.</p>
<p>A similar &#8220;killer app&#8221; may not be forthcoming for the semantic Web. But it is important to remember just how closely tools are tied to accelerating the acceptance and growth of new standards and protocols.</p>
<div class="boxBrownDotted" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag   Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday   Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday   brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about four years  ago on <a href="http://www.mkbergman.com/241/methods-for-semantic-discovery-annotation-and-mediation/">June  12, 2006</a>. It was the follow-on to <a href="http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/">last week&#8217;s Brown Bag Lunch posting</a>. It is also the first attempt I made at assembling semantic Web and related tools, which has now grown into the 800+ <span style="color: #993300;"><strong><a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/">Sweet Tools</a></strong></span> listing. No changes have been made to the original posting.</div>
<hr size="1" />
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn1">[1]</a> Paul  Warren, &#8220;<a href="http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&amp;pName=dso_level1&amp;path=dsonline/2006/02&amp;file=x1war.xml&amp;xsl=article.xsl&amp;">Knowledge  Management and the Semantic Web: From Scenario to Technology</a>,&#8221; <em>IEEE  Intelligent Systems</em>, vol. 21, no. 1, 2006, pp. 53-59. See <a href="http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&amp;pName=dso_level1&amp;path=dsonline/2006/02&amp;file=x1war.xml&amp;xsl=article.xsl&amp;">http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&amp;pName=dso_level1&amp;path=dsonline/2006/02&amp;file=x1war.xml&amp;xsl=article.xsl&amp;</a></div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn2">[2]</a> See <a href="http://www.semwebcentral.org/index.jsp?page=workflows">http://www.semwebcentral.org/index.jsp?page=workflows</a>. [Link now missing.]</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn3">[3]</a> Said Izza, Lucien  Vincent and Patrick Burlat, &#8220;A Unified Framework for Enterprise  Integration: An Ontology-Driven Service-Oriented Approach,&#8221; pp. 78-89,  in <em>Pre-proceedings of the First International Conference on  Interoperability of Enterprise Software and Applications  (INTEROP-ESA&#8217;2005)</em>, Geneva, Switzerland, February 23 &#8211; 25, 2005, 618  pp. See <a href="http://interop-esa05.unige.ch/INTEROP/Proceedings/Interop-ESAScientific/OneFile/InteropESAproceedings.pdf">http://interop-esa05.unige.ch/INTEROP/Proceedings/Interop-ESAScientific/OneFile/InteropESAproceedings.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn4">[4]</a> Jorge Cardoso and Amit  Sheth, &#8220;Semantic Web Processes: Semantics Enabled Annotation, Discovery,  Composition and Orchestration of Web Scale Processes,&#8221; in the<em> 4th  International Conference on Web Information Systems Engineering (WISE  2003)</em>, December 10-12, 2003, Rome, Italy. See <a href="http://lsdis.cs.uga.edu/lib/presentations/WISE2003-Tutorial.pdf">http://lsdis.cs.uga.edu/lib/presentations/WISE2003-Tutorial.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn5">[5]</a> C. Batini, M. Lenzerini, and S.B. Navathe, &#8220;A Comparative Analysis of Methodologies for Database Schema Integration,&#8221; in <em>ACM Computing Surveys</em>, 18(4):323-364, 1986.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn6">[6]</a> Alon Halevy, &#8220;Why Your  Data Won&#8217;t Mix,&#8221; <em>ACM Queue</em> vol. 3, no. 8, October 2005. See <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336">http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn7">[7]</a> Chuck Mosher, &#8220;Semantic Interoperability: Automatically Resolving Vocabularies,&#8221; presented at the <em>4th Semantic Interoperability Conference</em>, February 10, 2006. See <a href="http://colab.cim3.net/file/work/SICoP/2006-02-09/Presentations/CMosher02102006.ppt">http://colab.cim3.net/file/work/SICoP/2006-02-09/Presentations/CMosher02102006.ppt</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn8">[8]</a> Alon Y. Halevy, Zachary  G. Ives, Peter Mork and Igor Tatarinov, &#8220;Piazza: Data Management  Infrastructure for Semantic Web Applications,&#8221; <em>Journal of Web  Semantics,</em> Vol. 1 No. 2, February 2004, pp. 155-175. See <a href="http://www.cis.upenn.edu/~zives/research/piazza-www03.pdf">http://www.cis.upenn.edu/~zives/research/piazza-www03.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn9">[9]</a> Stefano Mazzocchi,  Stephen Garland, Ryan Lee, &#8220;SIMILE: Practical Metadata for the Semantic  Web,&#8221; January 26, 2005. See <a href="http://www.xml.com/pub/a/2005/01/26/simile.html">http://www.xml.com/pub/a/2005/01/26/simile.html</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn10">[10]</a> Adrian Mocan, Ed.,  &#8220;WSMX Data Mediation,&#8221; in <em>WSMX Working Draft, W3C Organization</em>,  11 October 2005. See <a href="http://www.wsmo.org/TR/d13/d13.3/v0.2/20051011">http://www.wsmo.org/TR/d13/d13.3/v0.2/20051011</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn11">[11]</a> J. Madhavan, P. A. Bernstein, P. Domingos and A. Y. Halevy, &#8220;Representing and Reasoning About Mappings Between Domain Models,&#8221; in the <em>Eighteenth National Conference on Artificial Intelligence</em>, pp. 80-86, Edmonton, Alberta, Canada, July 28-August 01, 2002.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn12">[12]</a> AnHai Doan, Learning  to Map between Structured Representations of Data, Ph.D. Thesis to the  Computer Science &amp; Engineering Department, University of Washington,  2002, 133 pp. See <a href="http://anhai.cs.uiuc.edu/home/thesis/anhai-thesis.pdf">http://anhai.cs.uiuc.edu/home/thesis/anhai-thesis.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn13">[13]</a> Michael Stonebraker  and Joey Hellerstein, &#8220;What Goes Around Comes Around,&#8221; in Joseph M.  Hellerstein and Michael Stonebraker, editors, <em>Readings in Database  Systems, Fourth Edition</em>, pp. 2-41, The MIT Press, Cambridge, MA,  2005. See <a href="http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf">http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn14">[14]</a> John Miles Smith and  Diane C. P. Smith, &#8220;Database Abstractions: Aggregation and  Generalization,&#8221; <em>ACM Transactions on Database Systems</em> 2(2):  105-133, 1977.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn15">[15]</a> Michael Hammer and  Dennis McLeod, &#8220;Database Description with SDM: A Semantic Database  Model,&#8221; <em>ACM Transactions on Database Systems</em> 6(3): 351-386, 1981.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn16">[16]</a> See <a href="http://www.semwebcentral.org/index.jsp?page=home">http://www.semwebcentral.org/index.jsp?page=home</a>.</div>
<div style="margin: 10px  0pt; font-size: 90%;"><a name="_xedn17">[17]</a> See <a href="http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType">http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/875/brown-bag-lunch-methods-for-semantic-discovery-annotation-and-mediation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch:  Sources and Classification of Semantic Heterogeneities</title>
		<link>http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/</link>
		<comments>http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 15:22:36 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[semantic heterogeneities]]></category>
		<category><![CDATA[semantic mediation]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=874</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch:  Sources and Classification of Semantic Heterogeneities&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Enterprise&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/&amp;rft.language=English"></span>

Semantic mediation  &#8212; that is, resolving semantic heterogeneities &#8212; must address more than  40 discrete categories of potential mismatches from units of measure,  terminology, language, and many others.  These sources may derive from  structure, domain, data or language.
Earlier postings in this recent series traced the progress in climbing the data [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch:  Sources and Classification of Semantic Heterogeneities&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Enterprise&amp;rft.subject=Semantic Web&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-04-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday  Brown Bag Lunch" width="158" height="179" /></p>
<div class="boxGraySolid" style="margin-left: 190px;"><em>Semantic mediation  &#8212; that is, resolving semantic heterogeneities &#8212; must address more than  40 discrete categories of potential mismatches from units of measure,  terminology, language, and many others.  These sources may derive from  structure, domain, data or language.</em></div>
<p>Earlier postings in this recent series traced the progress in <a href="http://www.mkbergman.com/?p=229">climbing the data federation pyramid</a> to today&#8217;s <a href="http://www.mkbergman.com/?p=231">current emphasis on the semantic Web</a>. In part, this series aims to dispel the notion that data extensibility can arise simply from using the <a href="http://en.wikipedia.org/wiki/Xml">XML</a> (eXtensible Markup Language) <em>data representation</em> protocol. As Stonebraker and Hellerstein correctly observe:</p>
<blockquote><p><em>XML is sometimes marketed as the solution to the  semantic heterogeneity problem . . . .  Nothing could be further from  the truth. Just because two people tag a data element as a salary does  not mean that the two data elements are comparable. One could be salary  after taxes in French francs including a lunch allowance, while the  other could be salary before taxes in US dollars. Furthermore, if you  call them &#8220;rubber gloves&#8221;  and I call them &#8220;latex hand protectors&#8221;, then  XML will be useless in deciding that they are the same concept. Hence,  the role of XML will be limited to providing the vocabulary in which  common schemas can be constructed.</em><a href="#_edn1">[1]</a></p></blockquote>
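<p>The salary example in the quotation can be sketched in a few lines of Python. This is an illustration only. The class, its fields and the amounts are assumptions made up for the purpose, not part of any standard vocabulary. The point is that the tag alone says nothing about currency or tax basis, so those assumptions have to be stated explicitly before two values can be compared.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Salary:
    """A salary value with the assumptions the bare tag leaves implicit."""
    amount: float
    currency: str              # e.g. "FRF" or "USD"
    after_tax: bool            # True if taxes are already deducted
    includes_allowances: bool  # True if extras such as lunch are included


def comparable(a, b):
    """Two salaries are comparable only if every stated assumption agrees."""
    return (
        a.currency == b.currency
        and a.after_tax == b.after_tax
        and a.includes_allowances == b.includes_allowances
    )


# Hypothetical values: both sources tag the element "salary".
french = Salary(180000.0, "FRF", after_tax=True, includes_allowances=True)
american = Salary(45000.0, "USD", after_tax=False, includes_allowances=False)
```

<p>Here <code>comparable(french, american)</code> is false even though both values carry the same tag. Making them comparable would take conversion rules for each differing assumption, which is exactly the mediation work that XML by itself does not provide.</p>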
<p>This series also covers the ontologies and the OWL language (written  in XML) that now give us the means to understand and process these  different domains and &#8220;world views&#8221; by machine. According to Natalya  Noy, one of the principal researchers behind the <a href="http://protege.stanford.edu/plugins/owl/">Protégé</a> development environment for ontologies and knowledge-based systems:</p>
<blockquote><p><em>How are ontologies and the Semantic Web different from  other forms of structured and semi-structured data, from database  schemas to XML? Perhaps one of the main differences lies in their  explicit formalization. If we make more of our assumptions explicit and  able to be processed by machines, automatically or semi-automatically  integrating the data will be easier. Here is another way to look at  this: ontology languages have formal semantics, which makes building  software agents that process them much easier, in the sense that their  behavior is much more predictable (assuming they follow the specified  explicit semantics&#8211;but at least there is something to follow).</em> <a href="#_edn2">[2]</a></p></blockquote>
<p>Again, however, simply because OWL (or similar) languages now give us  the means to represent an ontology, we still have the vexing challenge  of how to resolve the differences between different &#8220;world views,&#8221; even  within the same domain. According to Alon Halevy:</p>
<blockquote><p><em>When independent parties develop database schemas for  the same domain, they will almost always be quite different from each  other. These differences are referred to as semantic heterogeneity,  which also appears in the presence of multiple XML documents, Web  services, and ontologies&#8211;or more broadly, whenever there is more than  one way to structure a body of data. The presence of semi-structured  data exacerbates semantic heterogeneity, because semi-structured schemas  are much more flexible to start with. For multiple data systems to  cooperate with each other, they must understand each other&#8217;s schemas.  Without such understanding, the multitude of data sources amounts to a  digital version of the Tower of Babel.</em> <a href="#_edn3">[3]</a></p></blockquote>
<p>In the sections below, we describe the sources for how this  heterogeneity arises and classify the many different types of  heterogeneity. I then describe some broad approaches to overcoming these  heterogeneities, though a <a href="http://www.mkbergman.com/?p=241">subsequent post looks at that  topic in more detail</a>.</p>
<h3>Causes and Sources of Semantic Heterogeneity</h3>
<p>There are many potential circumstances where semantic heterogeneity  may arise (partially from Halevy <a href="#_edn3">[3]</a>):</p>
<ul>
<li>Enterprise information integration</li>
<li>Querying and indexing the deep Web (which is a classic data  federation problem in that there are literally tens to hundreds of  thousands of separate Web databases) <a href="#_edn4">[4]</a></li>
<li>Merchant catalog mapping</li>
<li>Schema <em>v.</em> data heterogeneity</li>
<li>Schema heterogeneity and semi-structured data.</li>
</ul>
<p>Naturally, there will always be differences in how differing authors or sponsors create their own particular &#8220;world view,&#8221; which, if transmitted in XML or expressed through an ontology language such as OWL, may also result in differences based on expression or syntax. Indeed, the ease of conveying these schemas as semi-structured XML, RDF or OWL is in and of itself a source of potential expression heterogeneities. There are also other sources in simple schema use and versioning that can create mismatches <a href="#_edn3">[3]</a>. Thus, possible drivers of semantic mismatches include world view, perspective, syntax, structure, versioning and timing:</p>
<ul>
<li>One schema may express a similar &#8220;world view&#8221; with different syntax,  grammar or structure</li>
<li>One schema may be a new version of the other</li>
<li>Two or more schemas may be evolutions of the same original schema</li>
<li>There may be many sources modeling the same aspects of the underlying domain (&#8220;horizontal resolution&#8221; such as for competing trade associations or standards bodies), or</li>
<li>There may be many sources that cover different domains but overlap at the seams (&#8220;vertical resolution&#8221; such as between pharmaceuticals and basic medicine).</li>
</ul>
<p>Regardless, the needs for semantic mediation are manifest, as are the  ways in which semantic heterogeneities may arise.</p>
<h3>Classification of Semantic Heterogeneities</h3>
<p>The first classification scheme applied to data semantics that I am aware of is from William Kent nearly 20 years ago.<a href="#_edn5">[5]</a> (If you know of earlier ones, please send me a note.) Kent&#8217;s approach dealt more with structural mapping issues (see below) than with differences in meaning, which he suggested data dictionaries could potentially solve.</p>
<p>The most comprehensive scheme I have yet encountered is from Pluempitiwiriyawej and Hammer, &#8220;A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources.&#8221; <a href="#_edn6">[6]</a> They classify heterogeneities into three broad classes:</p>
<blockquote>
<ul>
<li><em>Structural </em>conflicts arise when the schema of the sources  representing related or overlapping data exhibit discrepancies.  Structural conflicts can be detected when comparing the underlying DTDs.  The class of structural conflicts includes generalization conflicts,  aggregation conflicts, internal path discrepancy, missing items, element  ordering, constraint and type mismatch, and naming conflicts between  the element types and attribute names.</li>
<li><em>Domain </em>conflicts arise when the semantic of the data sources  that will be integrated exhibit discrepancies. Domain conflicts can be  detected by looking at the information contained in the DTDs and using  knowledge about the underlying data domains. The class of domain  conflicts includes schematic discrepancy, scale or unit, precision, and  data representation conflicts.</li>
<li><em>Data </em>conflicts refer to discrepancies among similar or  related data values across multiple sources. Data conflicts can only be  detected by comparing the underlying DOCs. The class of data conflicts  includes ID-value, missing data, incorrect spelling, and naming  conflicts between the element contents and the attribute values.</li>
</ul>
</blockquote>
<p>Moreover, mismatches or conflicts can occur between set elements (a  &#8220;population&#8221; mismatch) or attributes (a &#8220;description&#8221; mismatch).</p>
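<p>To make the three classes concrete, here is a minimal Python sketch. It is not drawn from Pluempitiwiriyawej and Hammer's report: the two records, their field names and the helper functions are all hypothetical, chosen only to show one structural conflict (a naming difference), one domain conflict (a unit difference) and one data conflict (a misspelled value) side by side.</p>

```python
# Hypothetical records describing the same product in two sources.
source_a = {"ProductName": "Widget", "weight_kg": 2.0, "maker": "Acme Corp"}
source_b = {"productname": "Widget", "weight_lb": 4.41, "maker": "Acme Crop"}

LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def structural_conflicts(a, b):
    """Field names that match only when case is ignored (a naming conflict)."""
    lower_b = {name.lower(): name for name in b}
    return sorted(
        (name, lower_b[name.lower()])
        for name in a
        if name not in b and name.lower() in lower_b
    )


def weights_agree(a, b, tolerance_kg=0.01):
    """Domain conflict: the same quantity in different units.

    Knowing that one field is kilograms and the other pounds is domain
    knowledge; it cannot be read off the field names alone.
    """
    return tolerance_kg >= abs(a["weight_kg"] - b["weight_lb"] * LB_TO_KG)


def data_conflicts(a, b):
    """Fields present in both records whose values differ."""
    return sorted(name for name in a if name in b and a[name] != b[name])


print(structural_conflicts(source_a, source_b))
print(weights_agree(source_a, source_b))
print(data_conflicts(source_a, source_b))
```

<p>In this toy case the structural conflict is found from the field names alone, the domain conflict needs outside knowledge of the units, and the data conflict only appears when the values themselves are compared, which mirrors the DTD <em>v.</em> DOC distinction in the quoted passage.</p>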
<p>The table below builds on Pluempitiwiriyawej and Hammer&#8217;s scheme by adding a fourth major explicit class, language, leading to about 40 distinct potential sources of semantic heterogeneities:</p>
<div>
<table class="center_ok" style="width: 620px;" border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 127px;" valign="top">
<p align="center"><strong>Class</strong></p>
</td>
<td style="background-color: #cccccc; width: 142px;" valign="top">
<p align="center"><strong>Category</strong></p>
</td>
<td style="background-color: #cccccc; width: 350px;" valign="top">
<p align="center"><strong>Subcategory</strong></p>
</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="15"><strong>STRUCTURAL</strong></td>
<td style="width: 142px;" rowspan="4" align="left">Naming</td>
<td style="width: 350px;">Case Sensitivity</td>
</tr>
<tr>
<td style="width: 350px;">Synonyms</td>
</tr>
<tr>
<td style="width: 350px;">Acronyms</td>
</tr>
<tr>
<td style="width: 350px;">Homonyms</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Generalization / Specialization</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="2">Aggregation</td>
<td style="width: 350px;">Intra-aggregation</td>
</tr>
<tr>
<td style="width: 350px;">Inter-aggregation</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Internal Path Discrepancy</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="4">Missing Item</td>
<td style="width: 350px;">Content Discrepancy</td>
</tr>
<tr>
<td style="width: 350px;">Attribute List Discrepancy</td>
</tr>
<tr>
<td style="width: 350px;">Missing Attribute</td>
</tr>
<tr>
<td style="width: 350px;">Missing Content</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Element Ordering</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Constraint Mismatch</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Type Mismatch</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="8"><strong>DOMAIN</strong></td>
<td style="width: 142px;" rowspan="4">Schematic Discrepancy</td>
<td style="width: 350px;">Element-value to Element-label Mapping</td>
</tr>
<tr>
<td style="width: 350px;">Attribute-value to Element-label Mapping</td>
</tr>
<tr>
<td style="width: 350px;">Element-value to Attribute-label Mapping</td>
</tr>
<tr>
<td style="width: 350px;">Attribute-value to Attribute-label Mapping</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Scale or Units</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Precision</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="2">Data Representation</td>
<td style="width: 350px;">Primitive Data Type</td>
</tr>
<tr>
<td style="width: 350px;">Data Format</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="7"><strong>DATA</strong></td>
<td style="width: 142px;" rowspan="4">Naming</td>
<td style="width: 350px;">Case Sensitivity</td>
</tr>
<tr>
<td style="width: 350px;">Synonyms</td>
</tr>
<tr>
<td style="width: 350px;">Acronyms</td>
</tr>
<tr>
<td style="width: 350px;">Homonyms</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">ID Mismatch or Missing ID</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Missing Data</td>
</tr>
<tr>
<td style="width: 492px;" colspan="2">Incorrect Spelling</td>
</tr>
<tr>
<td style="width: 127px;" rowspan="8"><strong>LANGUAGE</strong></td>
<td style="width: 142px;" rowspan="4">Encoding</td>
<td style="width: 350px;">Ingest Encoding Mismatch</td>
</tr>
<tr>
<td style="width: 350px;">Ingest Encoding Lacking</td>
</tr>
<tr>
<td style="width: 350px;">Query Encoding Mismatch</td>
</tr>
<tr>
<td style="width: 350px;">Query Encoding Lacking</td>
</tr>
<tr>
<td style="width: 142px;" rowspan="4">Languages</td>
<td style="width: 350px;">Script Mismatches</td>
</tr>
<tr>
<td style="width: 350px;">Parsing / Morphological Analysis Errors (many)</td>
</tr>
<tr>
<td style="width: 350px;">Syntactical Errors (many)</td>
</tr>
<tr>
<td style="width: 350px;">Semantic Errors (many)</td>
</tr>
</tbody>
</table>
</div>
<p>Most of these line items are self-explanatory, but a few may not be:</p>
<ul>
<li><em>Homonyms</em> occur when the same name refers to more than one concept, such as Name referring to a person v. Name referring to a book</li>
<li>A <em>generalization/specialization</em> mismatch can occur when  single items in one schema are related to multiple items in another  schema, or vice versa. For example, one schema may refer to &#8220;phone&#8221; but  the other schema has multiple elements such as &#8220;home phone,&#8221; &#8220;work  phone&#8221; and &#8220;cell phone&#8221;</li>
<li><em>Intra-aggregation</em> mismatches come when the same population is divided differently by schema (Census <em>v</em>. Federal regions for states, or full person names <em>v.</em> first-middle-last, for example), whereas <em>inter-aggregation</em> mismatches can come from sums or counts as added values</li>
<li>Internal path discrepancies can arise from different source-target retrieval paths in two different schemas (for example, hierarchical structures where the elements sit at different levels of the hierarchy)</li>
<li>The four sub-types of <em>schematic discrepancy</em> refer to cases where attribute and element names may be interchanged between schemas</li>
<li>Under languages, <em>encoding</em> mismatches can occur when either the import or export of data to XML assumes the wrong encoding type. While XML is based on Unicode, it is important that source retrievals and issued queries be in the proper encoding of the source. For Web retrievals this is very important, because only about 4% of all documents are in Unicode, and <a title="Tutorial:  Internet Languages  and Encodings" href="http://www.mkbergman.com/?p=195">BrightPlanet earlier estimated that there may be on the order of 25,000 language-encoding pairs presently on the Internet</a></li>
<li>Even should the correct encoding be detected, there are significant  differences in different language sources in <em>parsing</em> (white  space, for example), <em>syntax</em> and <em>semantics </em>that can also  lead to many error types.</li>
</ul>
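<p>The encoding mismatch described in the list above is easy to reproduce. The short Python sketch below is illustrative only: the sample word and the <code>decode_with_fallback</code> helper are my own assumptions, not part of any tool mentioned in this post. It shows the two ways a wrong encoding assumption tends to surface, either as silent corruption or as a decoding error.</p>

```python
def decode_with_fallback(raw, declared, fallback="latin-1"):
    """Decode raw bytes with the declared encoding, else use a fallback.

    Returns (text, encoding_used). latin-1 maps every byte to a character,
    so the fallback never raises, but it may still be the wrong guess.
    """
    try:
        return raw.decode(declared), declared
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


word = "caf\u00e9"                      # "cafe" with an acute accent on the e
utf8_bytes = word.encode("utf-8")        # two bytes for the accented letter
latin1_bytes = word.encode("latin-1")    # one byte for the accented letter

# Silent corruption: UTF-8 bytes read as latin-1 decode without any error.
garbled = utf8_bytes.decode("latin-1")

# Detectable failure: these latin-1 bytes are not valid UTF-8, so the
# declared encoding is rejected and the fallback is used instead.
recovered, used = decode_with_fallback(latin1_bytes, "utf-8")

print(repr(garbled), repr(recovered), used)
```

<p>The silent case is the more dangerous of the two for retrieval, since a query term and an indexed term that look identical to a person no longer match byte for byte.</p>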
<p>It should be noted that a different take on classifying semantics and integration approaches is taken by Sheth et al. <a href="#_edn7">[7]</a> They split semantics into three forms: implicit, formal and powerful. Implicit semantics are what is either largely present or can easily be extracted; formal semantics, though relatively scarce, occur in the form of ontologies or other description logics; and powerful (soft) semantics are fuzzy and not limited to rigid set-based assignments. Sheth et al.&#8217;s main point is that first-order logic (FOL) or description logic alone is inadequate to properly capture the needed semantics.</p>
<p>From my viewpoint, Pluempitiwiriyawej and Hammer&#8217;s <a href="#_edn6">[6]</a> classification better lends itself to pragmatic tools and approaches, though the Sheth et al. approach also helps indicate what can be processed <em>in situ</em> from input data <em>v.</em> inferred or probabilistic matches.</p>
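<p>The contrast between rigid set-based assignment and graded, inferred matches can be sketched in a few lines of Python. This is only a crude stand-in and not Sheth et al.'s method: it assumes plain character similarity from the standard library's <code>difflib</code> as the score, and the sample strings are hypothetical.</p>

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """Rigid assignment: two labels either are the same or are not."""
    return a == b


def similarity(a, b):
    """Graded assignment: a score from 0.0 to 1.0 based on shared characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


pairs = [
    ("Acme Corp", "Acme Crop"),   # likely the same maker, misspelled
    ("phone", "phone"),           # identical labels
    ("phone", "address"),         # unrelated labels
]

for left, right in pairs:
    print(left, "|", right, "|", exact_match(left, right),
          "|", round(similarity(left, right), 2))
```

<p>A graded score leaves the decision threshold open, which is the point: someone still has to decide how similar is similar enough, and character overlap says nothing about synonyms or homonyms.</p>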
<h3>Importance of Reference Standards</h3>
<p>An attractive and compelling vision  &#8212; perhaps even a likely one  &#8212;  is that standard reference ontologies become increasingly prevalent as  time moves on and semantic mediation is seen as more of a mainstream  problem. Certainly, a start on this has been seen with the use of the <a href="http://dublincore.org/">Dublin  Core</a> metadata initiative, and increasingly other associations,  organizations, and major buyers are busy developing &#8220;standardized&#8221; or  reference ontologies.<a href="#_edn8">[8]</a> Indeed, there are now more than 10,000  ontologies available on the Web.<a href="#_edn9">[9]</a> Insofar as these gain  acceptance, semantic mediation can become an effort mostly at the  periphery and not the core.</p>
<p>But such is not the case today. Standards have had only limited success, and then only in targeted domains where incentives are strong. That acceptance and benefit threshold has yet to be reached on the Web. Until such time, a multiplicity of automated methods, semi-automated methods and gazetteers will all be required to help resolve these potential heterogeneities.</p>
<div class="boxBrownDotted" style="min-height: 80px; max-width: 460px;"><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag  Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday  Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday  brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> about four years ago on <a href="http://www.mkbergman.com/232/sources-and-classification-of-semantic-heterogeneities/">June 6, 2006</a>. No changes have been made to the original posting. Current approaches to dealing with these heterogeneities would be to use &#8220;bridging&#8221; ontologies that map the mismatches.</div>
<hr size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn1" name="_edn1">[1]</a> Michael Stonebraker and Joey Hellerstein, &#8220;What Goes Around Comes  Around,&#8221; in Joseph M. Hellerstein and Michael Stonebraker, editors, <em>Readings  in Database Systems, Fourth Edition</em>, pp. 2-41, The MIT Press,  Cambridge, MA, 2005. See <a href="http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf">http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn2" name="_edn2">[2]</a> Natalya Noy, &#8220;Order from Chaos,&#8221; <em>ACM Queue</em> vol. 3, no. 8, October 2005. See <a href="http://www.acmqueue.com/modules.php?name=Content&amp;pa=showpage&amp;pid=341&amp;page=1">http://www.acmqueue.com/modules.php?name=Content&amp;pa=showpage&amp;pid=341&amp;page=1</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn3" name="_edn3">[3]</a> Alon  Halevy, &#8220;Why Your Data Won&#8217;t Mix,&#8221; <em>ACM Queue</em> vol. 3, no. 8,  October 2005. See <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336">http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=336</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn4" name="_edn4">[4]</a> Michael  K. Bergman, &#8220;The Deep Web: Surfacing Hidden Value,&#8221; <em>BrightPlanet  Corporation White Paper</em>, June 2000. The most recent version of the  study was published by the University of Michigan&#8217;s <em>Journal of  Electronic Publishing</em> in July 2001. See <a href="http://www.press.umich.edu/jep/07-01/bergman.html">http://www.press.umich.edu/jep/07-01/bergman.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn5" name="_edn5">[5]</a> William Kent, &#8220;The Many Forms of a Single Fact,&#8221; <em>Proceedings of the IEEE COMPCON</em>, Feb. 27-Mar. 3, 1989, San Francisco. Also HPL-SAL-88-8, Hewlett-Packard Laboratories, Oct. 21, 1988. [13 pp]. See <a href="http://www.bkent.net/Doc/manyform.htm">http://www.bkent.net/Doc/manyform.htm</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn6" name="_edn6">[6]</a> Charnyote Pluempitiwiriyawej and Joachim Hammer, &#8220;A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources,&#8221; <em>Technical Report TR00-004</em>, University of Florida, Gainesville, FL, 36 pp., September 2000. See <a href="ftp://ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf">ftp://ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn7" name="_edn7">[7]</a> Amit Sheth, Cartic Ramakrishnan and Christopher Thomas, &#8220;Semantics for the Semantic Web: The Implicit, the Formal and the Powerful,&#8221; in <em>Int&#8217;l Journal on Semantic Web &amp; Information Systems</em>, 1(1), 1-18, Jan-March 2005. See <a href="http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html">http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn8" name="_edn8">[8]</a> See,  among scores of possible examples, the NIEM (National Information  Exchange Model) agreed to between the US Departments of Justice and  Homeland Security; see <a href="http://www.niem.gov/">http://www.niem.gov/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a title="_edn9" name="_edn9">[9]</a> <a href="http://www.mkbergman.com/?p=194">OWL Ontologies: When Machine  Readable is Not Good Enough</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/874/brown-bag-lunch-sources-and-classification-of-semantic-heterogeneities/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Brown Bag Lunch: Untapped Assets: The $3 Trillion Value of US Documents</title>
		<link>http://www.mkbergman.com/871/brown-bag-lunch-untapped-assets-the-3-trillion-value-of-us-documents/</link>
		<comments>http://www.mkbergman.com/871/brown-bag-lunch-untapped-assets-the-3-trillion-value-of-us-documents/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 18:43:17 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Brown Bag Lunch]]></category>
		<category><![CDATA[Document Assets]]></category>
		<category><![CDATA[Information Automation]]></category>
		<category><![CDATA[documents]]></category>
		<category><![CDATA[economy]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=871</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Untapped Assets: The $3 Trillion Value of US Documents&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Document Assets&amp;rft.subject=Information Automation&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-03-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/871/brown-bag-lunch-untapped-assets-the-3-trillion-value-of-us-documents/&amp;rft.language=English"></span>

Today, in the advanced knowledge economy of the United States, the information contained within documents represents about a third of total gross domestic product, or an amount of about $3.3 trillion annually.
Yet our understanding of the value of documents and the means to manage them is abysmal. These failures impact enterprises of all sizes from [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Brown Bag Lunch: Untapped Assets: The $3 Trillion Value of US Documents&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Brown Bag Lunch&amp;rft.subject=Document Assets&amp;rft.subject=Information Automation&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-03-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/871/brown-bag-lunch-untapped-assets-the-3-trillion-value-of-us-documents/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_225.jpg" alt="Friday Brown Bag Lunch" width="158" height="179" /></p>
<p>Today, in the advanced knowledge economy of the United States, the information contained within documents represents about a third of total gross domestic product, or an amount of about <span style="text-decoration: underline;">$3.3 trillion</span> annually.</p>
<p>Yet our understanding of the value of documents and the means to manage them is abysmal. These failures impact enterprises of all sizes from the standpoints of revenues, profitability and reputation. Continued national productivity growth — and thus the wealth of all citizens — depends critically on understanding and managing these document values.</p>
<p>As this white paper describes, the lack of a compelling and demonstrable common understanding of the importance of documents is in itself a major factor limiting available productivity benefits. There is an old Chinese saying that roughly translated is “what cannot be measured, cannot be improved.” Many corporate officers may believe this to be the case for document creation and productivity, but, as this paper shows, in fact many of these document issues <span style="text-decoration: underline;">can be measured</span>.</p>
<div class="boxBrownDotted" style="min-height: 80px; max-width: 460px;"><p><img style="width: 64px; height: 73px; float: left; margin-right: 10px;" title="Friday Brown Bag Lunch" src="../wp-content/themes/ai3/images/lunchbag_64.png" alt="Friday Brown Bag Lunch" /> This <a href="../834/announcing-the-sporadic-friday-brown-bag-lunch">Friday brown bag leftover</a> was first placed into the <span style="font-weight: bold; color: #993300;">AI3</span> <a href="../chronological-listing/">refrigerator</a> on <a href="http://www.mkbergman.com/82/untapped-assets-the-3-trillion-value-of-us-enterprise-documents/">July 20, 2005</a>. No changes have been made to the original posting.</p>
<p>I&#8217;d like to thank David Siegel for recently highlighting this post from 5 years ago with nice kudos on his <a href="http://thepowerofpull.com/pull/mike-bergman-semantic-business-intelligence">PowerOfPull blog</a>. That reference is what caused me to dust off the cobwebs from this older piece.</p></div>
<p>To wit, some 25% of the trillions of dollars spent annually on document creation lend themselves to actionable improvements:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 532px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 363px;"><strong>U.S.</strong><strong> FIRMS</strong></td>
<td style="background-color: #cccccc; width: 86px;">
<p align="center"><strong>$ Million</strong></p>
</td>
<td style="background-color: #cccccc; width: 82px;"><strong>%</strong></td>
</tr>
<tr>
<td style="width: 363px;" valign="top">Cost to Create Documents</td>
<td style="width: 86px;">
<p align="right">$3,261,091</p>
</td>
<td style="width: 82px;"></td>
</tr>
<tr>
<td style="width: 363px;"><strong> Benefits</strong></td>
<td style="width: 86px;"></td>
<td style="width: 82px;"></td>
</tr>
<tr>
<td style="width: 363px;" valign="top">Benefits to Finding Missed or Overlooked Documents</td>
<td style="width: 86px;">
<p align="right">$489,164</p>
</td>
<td style="width: 82px;">
<p align="right">63%</p>
</td>
</tr>
<tr>
<td style="width: 363px;">Benefits to Improved Document Access</td>
<td style="width: 86px;">
<p align="right">$81,360</p>
</td>
<td style="width: 82px;">
<p align="right">10%</p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Re-finding Web Documents</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$32,967</p>
</td>
<td style="width: 82px;" valign="bottom">
<p align="right">4%</p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"></td>
<td style="width: 86px;" valign="bottom"></td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Proposal Preparation and Wins</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$6,798</p>
</td>
<td style="width: 82px;" valign="bottom">
<p align="right">1%</p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Paperwork Requirements and Compliance</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$119,868</p>
</td>
<td style="width: 82px;" valign="bottom">
<p align="right">15%</p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Reducing Unauthorized Disclosures</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$51,187</p>
</td>
<td style="width: 82px;" valign="bottom">
<p align="right">7%</p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"></td>
<td style="width: 86px;" valign="bottom"></td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"><strong>Total Annual Benefits</strong></td>
<td style="width: 86px;" valign="bottom">
<p align="right">$781,314</p>
</td>
<td style="width: 82px;" valign="bottom">
<p align="right">100%</p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"></td>
<td style="width: 86px;" valign="bottom"></td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"><strong>PER LARGE FIRM</strong></td>
<td style="width: 86px;" valign="bottom">
<p align="center"><strong>$ Million</strong></p>
</td>
<td style="width: 82px;" valign="bottom">
<p align="center"><strong> </strong></p>
</td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Cost to Create Documents</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$955.6</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"></td>
<td style="width: 86px;" valign="bottom"></td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits to Finding Missed or Overlooked Documents</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$143.3</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits to Improving Document Access</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$23.8</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Re-finding Web Documents</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$9.7</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"></td>
<td style="width: 86px;" valign="bottom"></td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Proposal Preparation and Wins</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$2.0</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Paperwork Requirements and Compliance</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$35.1</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom">Benefits of Reducing Unauthorized Disclosures</td>
<td style="width: 86px;" valign="bottom">
<p align="right">$15.0</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"></td>
<td style="width: 86px;" valign="bottom"></td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 363px;" valign="bottom"><strong>Total Annual Benefits</strong></td>
<td style="width: 86px;" valign="bottom">
<p align="right">$229.0</p>
</td>
<td style="width: 82px;" valign="bottom"></td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 1. Mid-range Estimates for the Annual Value of Documents, U.S. Firms, 2002<a name="_ednref1"></a>[1]</p>
<p>The total benefit from improved document access and use to the U.S. economy is on the order of $800 billion annually, or about 8% of GDP. For the 1,000 largest U.S. firms, benefits from these improvements can approach $250 million annually per firm. About three-quarters of these benefits arise from <strong><em><span style="text-decoration: underline;">not</span></em></strong> re-creating the intellectual capital already invested in prior document creation. About one-quarter of the benefits are due to reduced regulatory non-compliance or paperwork, or better competitiveness in obtaining solicited grants and contracts.</p>
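<p>The percentage column in Table 1 and the three-quarters <em>v.</em> one-quarter split can be checked with a few lines of arithmetic. The Python sketch below uses only the dollar figures printed in the table; the variable names and the grouping of line items into &#8220;not re-creating&#8221; and &#8220;other&#8221; are my own reading of the paragraph above, not something the white paper spells out.</p>

```python
# Benefit line items from Table 1, in $ million (U.S. firms).
benefits_musd = {
    "Finding missed or overlooked documents": 489_164,
    "Improved document access": 81_360,
    "Re-finding Web documents": 32_967,
    "Proposal preparation and wins": 6_798,
    "Paperwork requirements and compliance": 119_868,
    "Reducing unauthorized disclosures": 51_187,
}
# Totals as printed in the table. The six line items as printed sum to
# 781,344, marginally different from the stated total used here.
STATED_TOTAL_MUSD = 781_314
COST_TO_CREATE_MUSD = 3_261_091

# Assumed grouping: the first three items avoid re-creating prior work.
NOT_RECREATING = (
    "Finding missed or overlooked documents",
    "Improved document access",
    "Re-finding Web documents",
)


def share_pct(value, total):
    """Share of total as a whole-number percentage."""
    return round(100 * value / total)


shares = {name: share_pct(v, STATED_TOTAL_MUSD) for name, v in benefits_musd.items()}
not_recreating_pct = sum(shares[name] for name in NOT_RECREATING)
other_pct = sum(shares.values()) - not_recreating_pct
benefit_vs_cost_pct = share_pct(STATED_TOTAL_MUSD, COST_TO_CREATE_MUSD)

for name, pct in shares.items():
    print(f"{pct:>3}%  {name}")
print(not_recreating_pct, other_pct, benefit_vs_cost_pct)
```

<p>Under that grouping the rounded shares come to 77% and 23%, and total benefits are about 24% of the cost to create documents, in line with the &#8220;some 25%&#8221; cited above the table.</p>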
<p>Indeed, even these figures likely severely underestimate the benefits to enterprises from improved leverage of document assets. The best and most successful companies have always been able to take better advantage of their intellectual assets than their competitors. The competitiveness advantage from better document access and use alone may exceed the huge benefits in the table above.</p>
<p>Documents (that is, <em>unstructured</em> and <em>semi-structured</em> data) are now at the point where structured data was 15 years ago. At that time, companies realized that consolidating information from multiple numeric databases would be a key source of competitive advantage. That realization led to the development and growth of the data warehousing or business intelligence markets, now representing about $3.9 billion in annual software sales.</p>
<p>Search and enterprise content management software today only represents a fraction of that amount — perhaps on the order of $500 million annually. But given that intellectual content in documents represents three to four times the amount in numeric structured data, it is clear that document software capabilities are not being well utilized, reaching only a small fraction of their market potential.</p>
<p>The estimates provided in this white paper are drawn from numerous sources and are extremely fragmented, perhaps even inconsistent. One hope in preparing this document was to stimulate more research attention and data gathering around the critical issues of document value to the enterprise and the economy at large.</p>
<p style="font-weight: bold;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767203">EXECUTIVE SUMMARY</a></span></p>
<p style="font-weight: bold;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767204">I. INTRODUCTION</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767205">Documents: The Drivers of a Knowledge Economy</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767206">Documents: The Linchpin of Corporate Intellectual Assets</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767207">Documents: Unknown Value, Huge Implications</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767208">Documents: The Next Generation of Data Warehousing?</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767209">Connecting the Dots: A Pointillistic Approach</a></span></p>
<p style="font-weight: bold;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767210">II. INTERNAL DOCUMENTS</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767211">Number of ‘Valuable’ Documents Produced per Firm</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767212">Total Annual U.S. ‘Costs’ to Create Documents</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767213">‘Cost’ of Creating a ‘Typical’ Document</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767214">‘Cost’ of a Missed or Overlooked Document</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767215">Other Document Total ‘Cost’ Factors and Summary</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767216">Archival Lifetime of ‘Valuable’ Documents</a></span></p>
<p style="font-weight: bold;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767217">III. WEB DOCUMENTS AND SEARCH</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767218">Estimate of Time and Effort Devoted to Document Search</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767219">Effect of Non-persistent Search Efforts</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767220">‘Cost’ of Creating and Maintaining a Document Category Portal</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767221">‘Cost’ of Inaccessible or Hidden Intranet Sites</a></span></p>
<p style="font-weight: bold;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767222">IV. OPPORTUNITIES AND THREATS</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767223">‘Costs’ and Opportunity Costs of Winning Proposals</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767224">‘Costs’ of Regulation and Regulatory Non-compliance</a></span></p>
<p style="font-weight: bold; margin-left: 40px;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767225">‘Cost’ of an Unauthorized Posted Document</a></span></p>
<p style="font-weight: bold;"><span style="font-size: x-small;"><a href="../index.php#_Toc106767226">V. CONCLUSIONS</a></span></p>
<h1><a name="_Toc106767204"></a> I. INTRODUCTION</h1>
<p>How many documents does your organization create each year? What effort does this represent in terms of total staffing costs? What does it cost to create a ‘typical’ document? Of the documents created, how much of the value in them is readily sharable throughout your organization? How long do you need to keep valuable documents and how can you access them? How much existing document content is re-created simply because prior work cannot be found? When prior information is missed, what do these prior investments in documents represent in terms of loss of market share, revenue or reputation? Indeed, what does the term “document” represent in your organization’s context?</p>
<p>If you have difficulty answering these questions, you are not alone. Depending on the survey, from 90% to 97% of enterprises cannot answer these questions — in whole or in part. The purpose of this white paper is to provide the first comprehensive assessment ever of these document values.</p>
<p>Enterprises and the analyst community have historically overlooked the impact of <em>document creation</em> as opposed to <em>document handling</em>. First, document creation is about 2-3 times more important, from an embedded cost standpoint, than document handling. Second, all aspects of document creation, and later access and use, assume a much greater role in the overall economics of enterprises than has been realized previously.</p>
<h2><a name="_Toc106767205"></a>Documents: The Drivers of a Knowledge Economy</h2>
<p>Put your index finger one inch from your nose. That is how close — and unfocused — document importance is to an organization. Documents are the salient reality of a knowledge economy, but like your finger, documents are often too close, ubiquitous and commonplace to appreciate.</p>
<p>How do your employees earn their livings? Writing proposals? Marketing or selling? Evaluating competitors or opportunities? Persuading? Analyzing? Communicating? Teaching? Of course, in some sectors, many make their living from growing things or making things. These are essential jobs (indeed, until the last few decades they were the predominant drivers of economies), but they are now being supplanted in advanced economies by knowledge work. Perhaps up to 35% of all company employees in the U.S. can be classified as knowledge workers.</p>
<p>And knowledge work means documents. The fact is that knowledge is produced and communicated through the written word. When we search, when we write, when we persuade, we may often do so verbally but make it persistent through the written word.</p>
<h2><a name="_Toc106767206"></a>Documents: The Linchpin of Corporate Intellectual Assets</h2>
<p>IBM estimates that corporate data doubles every six to eight months and that 85% of it consists of documents.<a name="_ednref2"></a>[2] At least 10% of an enterprise’s information changes on a monthly basis.<a name="_ednref3"></a>[3] Year-on-year office document growth rates are on the order of 22%.<a name="_ednref4"></a>[4] As later analysis indicates, there are perhaps on the order of 10 billion documents created annually in the U.S. with a mid-range “asset” value of $3.3 trillion per year. Documents are a huge contributor to the United States’ gross domestic product of $10.5 trillion (2002).</p>
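<p>As a rough aid to reading these growth figures, the sketch below converts a doubling time into an annual growth multiple, and a year-on-year growth rate into a doubling time. It is plain compound-growth arithmetic, not a method taken from the cited sources; the function names are illustrative only, and the two figures describe different things (all corporate data versus office documents).</p>

```python
import math


def annual_growth_multiple(doubling_months):
    """Growth multiple over 12 months for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)


def doubling_time_years(annual_growth_rate):
    """Years needed to double at a constant year-on-year growth rate (0.22 means 22%)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


# Figures quoted in the text: doubling every six to eight months, and 22% per year.
for months in (6, 8):
    print(f"doubling every {months} months: x{annual_growth_multiple(months):.2f} per year")
print(f"22% per year: doubles in {doubling_time_years(0.22):.1f} years")
```

<p>Under these assumptions, doubling every six to eight months corresponds to growth of roughly 2.8x to 4x per year, while 22% annual growth corresponds to a doubling time of about three and a half years.</p>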
<p>According to a Coopers &amp; Lybrand study in 1993:<a name="_ednref5"></a>[5]</p>
<ul>
<li>Ninety percent of corporate memory exists on paper</li>
<li>Ninety percent of the papers handled each day are merely shuffled</li>
<li>Professionals spend 5-15 percent of their time reading information, but up to 50 percent looking for it</li>
<li>On average, 19 copies are made of each paper document.</li>
</ul>
<p>A Xerox Corporation study commissioned in 2003 and conducted by IDC surveyed 1000 of the largest European companies and had similar findings:<a name="_ednref6"></a>[6],<a name="_ednref7"></a>[7]</p>
<ul>
<li>On average 45% of an executive’s time was spent dealing with documents</li>
<li>82% believed that documents were crucial to the successful operation of their organizations</li>
<li>A further 70% claimed that poor document processes could impact the operational agility of their organizations</li>
<li>While 83%, 78% and 76% considered faxes, email and electronic files to be documents, respectively, only 48% and 46% categorized web pages and multimedia content as such.</li>
</ul>
<h2><a name="_Toc106767207"></a>Documents: Unknown Value, Huge Implications</h2>
<p>But if defining what constitutes a document is hard, identifying the costs associated with all document activities is almost impossible for many organizations. Ninety to 97 percent of the corporate respondents to the Coopers &amp; Lybrand and Xerox studies, respectively, could not estimate how much they spent on producing documents each year. Almost three quarters of them admitted that the information is unavailable or unknown to them.</p>
<p>An A.T. Kearney study sponsored by Adobe, EDS, Hewlett-Packard, Mayfield and Nokia, published in 2001, estimated that workforce inefficiencies related to content publishing cost organizations globally about $750 billion. The study further estimated that knowledge workers waste between 15% and 25% of their time in non-productive document activities.<a name="_ednref8"></a>[8]</p>
<p><img class="center_ok" style="width: 664px; height: 402px;" src="../wp-content/themes/ai3/images/DocValue/Figure1.gif" alt="Enterprise document use (SPIN)" width="664" height="402" /></p>
<p style="text-align: center;">Figure 1. The Situation of Poor Enterprise Document Use Leads to Real Implications</p>
<p>But the situation is much broader and results in part from the inability to quantify the importance of both <em>internal</em> and <em>external</em> document assets to all aspects of the enterprise’s bottom line. Examples drawn from the main body of this white paper include: early adopters of enterprise content software typically capture less than 1% of the valuable internal documents available; large enterprises are witnessing a proliferation of internal and external Web sites, sometimes numbering in the thousands; use of external content is presently limited to Internet search engines, which produce non-persistent results and capture none of the investment in discovery or results; and “deep” content in searchable databases, which is common to large organizations and represents 90% of external Internet content, is completely untapped.</p>
<p>A USC study reported that typically only 32% of employees in knowledge organizations have access to good information about technical developments relevant to their work, and 79% claim they have inadequate information about what their competitors are doing.<a name="_ednref9"></a>[9]</p>
<p>The enterprise content integration software market is fragmented and confused, with only a few established companies providing partial solutions. Content integration is still a small market with annual revenues of less than $50 million worldwide.<a name="_ednref10"></a>[10] Vendor offerings fail to satisfy customer needs because of a lack of functionality and a lack of scalability to enterprise volumes. Sales in the market remain distinctly lower than those projected by industry analysts, even as the magnitude of “information overload” continues to grow at a dramatic rate.</p>
<h2><a name="_Toc106767208"></a>Documents: The Next Generation of Data Warehousing?</h2>
<p>Documents (that is, <em>unstructured</em> and <em>semi-structured</em> data) are now at the point where structured data was 15 years ago. At that time, companies realized that consolidating information from multiple numeric databases would be a key source of competitive advantage. That realization led to the development and growth of the data warehousing or business intelligence markets, now representing about $3.9 billion in annual software sales.<a name="_ednref11"></a>[11]</p>
<p>Certain categories of businesses have been leaders in content integration, especially those that have recently had mergers and acquisitions activity, those that need to integrate business applications with content, and those for which the reuse of marketing assets across the organization is critical.<sup>10</sup></p>
<p>Stonebraker and Hellerstein have provided an insightful roadmap for how enterprise data integration or “federation” has trended over time: Data warehousing → Enterprise application integration → Enterprise content integration → Enterprise information integration.<a name="_ednref12"></a>[12] There are two threads to this trend. First, there has been a growing recognition of the importance of document (unstructured) content to contribute to actionable information. Second, increasingly unified and integrated means are being applied to all data sources to allow single-access retrievals.</p>
<h2><a name="_Toc106767209"></a>Connecting the Dots: A Pointillistic Approach</h2>
<p>The state of information regarding the value and cost of documents is extremely poor. Lack of defensible and vetted estimates for this information undercuts the ability to properly estimate the intellectual assets tied up in documents or the impacts of overlooked or misused documents.</p>
<p>Only three large document studies — the Coopers &amp; Lybrand, Xerox and A.T. Kearney studies noted above — have been conducted in the past ten years regarding the use and importance of documents within enterprises, and then solely from the standpoint of executive perceptions.</p>
<p>The quantified picture presented in this white paper regarding the costs and benefits of document creation, access and use is a paint-by-the-numbers assemblage of disparate data. The paper draws upon about 80 different data sources, many fragmented. The analysis approach by necessity has needed to conjoin assumptions and data from many diverse sources.</p>
<p>This approach leads to both uncertainty regarding “true” values and likely inaccuracies or mis-estimates in some areas. To make the assessment as consistent as possible, a base year of 2002 was used, the common year reference for most of the available data sources. To bracket uncertainties, most estimates are provided in low, medium and high estimates.</p>
<p>Thus, this study should be viewed as preliminary, but strongly indicative of the value of documents. Further research and data collection will surely refine these estimates. Clearly, though, by any measure, the value of documents to the enterprise is huge and should no longer be overlooked.</p>
<h1><a name="_Toc106767210"></a>II. INTERNAL DOCUMENTS</h1>
<p>Though valuable content resides everywhere, the first challenge to enterprises is getting a handle on their own internal document content.</p>
<h2><a name="_Toc106767211"></a>Number of ‘Valuable’ Documents Produced per Firm</h2>
<p>A recent UC Berkeley study on “How Much Information?” estimated that more than 4 billion pages of <em>internal</em> office documents with <span style="text-decoration: underline;">archival</span> value are generated annually in the U.S. (Note: this is not the total amount created, only those documents deemed worthy of retaining for more than one year.)</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 100%;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 20%;" valign="bottom">
<p align="center"><strong>Firm Size (employees)</strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">1-9</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">10-19</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 10%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">20-99</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">100-499</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">500-999</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">1000-2500</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">2500-9999</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 10%;" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">&gt;10,000</span></strong></p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Firms</td>
<td style="width: 9%;" valign="bottom">
<p align="right">3,716,944</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">616,064</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">518,258</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">85,304</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">8,572</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">5,161</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2,704</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">930</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Employees</td>
<td style="width: 9%;" valign="bottom">
<p align="right">12,328,094</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">8,274,541</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">20,370,447</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">16,410,367</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">5,906,266</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">7,894,226</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">12,519,664</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">31,357,579</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Knowledge Workers</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2,217,093</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,488,099</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">3,663,435</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2,951,251</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,062,187</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,419,703</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2,251,545</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">5,639,368</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Number of Pages  –  Low</td>
<td style="width: 9%;" valign="bottom">
<p align="right">465,842,666</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">312,670,737</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">769,739,697</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">620,099,840</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">223,180,542</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">298,299,744</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">473,081,537</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">1,184,911,325</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Number of Pages  –  High</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,164,606,665</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">781,676,843</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">1,924,349,242</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,550,249,599</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">557,951,355</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">745,749,360</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,182,703,842</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">2,962,278,313</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Number of Docs  –  Low</td>
<td style="width: 9%;" valign="bottom">
<p align="right">46,584,267</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">31,267,074</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">76,973,970</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">62,009,984</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">22,318,054</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">29,829,974</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">47,308,154</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">118,491,133</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Number of Docs  –  High</td>
<td style="width: 9%;" valign="bottom">
<p align="right">116,460,666</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">78,167,684</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">192,434,924</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">155,024,960</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">55,795,135</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">74,574,936</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">118,270,384</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">296,227,831</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Docs/Firm  –  Low</td>
<td style="width: 9%;" valign="bottom">
<p align="right">13</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">51</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">149</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">727</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2,604</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">5,780</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">17,496</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">127,410</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Docs/Firm  –  High</td>
<td style="width: 9%;" valign="bottom">
<p align="right">31</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">127</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">371</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">1,817</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">6,509</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">14,450</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">43,739</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">318,525</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Docs/Firm – 3 yr Low</td>
<td style="width: 9%;" valign="bottom">
<p align="right">38</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">152</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">446</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2,181</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">7,811</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">17,340</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">52,487</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">382,229</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Docs/Firm – 5 yr High</td>
<td style="width: 9%;" valign="bottom">
<p align="right">157</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">634</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">1,857</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">9,087</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">32,545</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">72,249</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">218,695</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">1,592,623</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
<td style="width: 10%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
<td style="width: 10%;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">Content Management Workers</td>
<td style="width: 9%;" valign="bottom">
<p align="right">105,709</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">70,951</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">174,670</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">140,713</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">50,644</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">67,690</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">107,352</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">268,881</p>
</td>
</tr>
<tr>
<td style="width: 20%;" valign="bottom">CMWs/Firm</td>
<td style="width: 9%;" valign="bottom">
<p align="right">0</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">0</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">0</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">6</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">13</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">40</p>
</td>
<td style="width: 10%;" valign="bottom">
<p align="right">289</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 2. Document Projections for U.S. Firms by Size, 2002 Basis</p>
<p align="center"><small>Sources: UC Berkeley<a name="_ednref13"></a>[13], U.S. Commerce Department<a name="_ednref14"></a>[14], U.S. Bureau of Labor Statistics<a name="_ednref15"></a>[15], U.S. Census Bureau<a name="_ednref16"></a>[16]</small></p>
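<p>The derived rows of Table 2 can be checked with a short sketch. The 10 pages per document used here is an assumption inferred from the ratio between the table’s page and document rows, not a figure stated in the text, and the function and variable names are illustrative only. Inputs are copied from the 1-9 employee column.</p>

```python
# Illustrative only: re-derives the document rows of Table 2 for one firm-size class.
# PAGES_PER_DOC is an assumption inferred from the ratio of the table's
# "Number of Pages" rows to its "Number of Docs" rows; the text does not state it.
PAGES_PER_DOC = 10


def derive_rows(firms, pages_low, pages_high, pages_per_doc=PAGES_PER_DOC):
    """Return the derived document counts for one firm-size column."""
    docs_low = pages_low / pages_per_doc
    docs_high = pages_high / pages_per_doc
    return {
        "docs_low": round(docs_low),
        "docs_high": round(docs_high),
        "docs_per_firm_low": round(docs_low / firms),
        "docs_per_firm_high": round(docs_high / firms),
        # Multi-year rows: the annual per-firm figure accumulated over 3 or 5 years.
        "docs_per_firm_3yr_low": round(3 * docs_low / firms),
        "docs_per_firm_5yr_high": round(5 * docs_high / firms),
    }


# Inputs copied from the 1-9 employee column of Table 2.
smallest = derive_rows(firms=3_716_944, pages_low=465_842_666, pages_high=1_164_606_665)
for name, value in smallest.items():
    print(f"{name}: {value:,}")
```

<p>Run on this column, the sketch gives 13 and 31 documents per firm per year, and 38 and 157 for the 3-year low and 5-year high rows, matching the table.</p>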
<p>Table 2 and Table 3 attempt to summarize the scale of this challenge for U.S. firms (for internal enterprise documents <em>only</em>). (See note<a name="_ednref17"></a>[17] for a description of the methodology regarding document scales, note<a name="_ednref18"></a>[18] for estimating the numbers of enterprise knowledge workers, and note<a name="_ednref19"></a>[19] for estimating content workers. A rough multiplier of 3x to 4x can be applied to extrapolate globally.<a name="_ednref20"></a>[20]) Breakouts are provided by size of firm; these include estimates for the number of knowledge and content workers within U.S. firms.</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 323px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 201px;" valign="bottom">
<p align="center"><strong>Category</strong></p>
</td>
<td style="background-color: #cccccc; width: 122px;" valign="bottom">
<p align="center"><strong>Value</strong></p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Firms</td>
<td style="width: 122px;" valign="bottom">
<p align="right">4,953,937</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Employees</td>
<td style="width: 122px;" valign="bottom">
<p align="right">127,273,960</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Knowledge Workers</td>
<td style="width: 122px;" valign="bottom">
<p align="right">20,692,680</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Annual Number of Docs – Low</td>
<td style="width: 122px;" valign="bottom">
<p align="right">9,291,013,320</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Annual Number of Docs – High</td>
<td style="width: 122px;" valign="bottom">
<p align="right">21,739,130,435</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Annual Docs/Firm – Low</td>
<td style="width: 122px;" valign="bottom">
<p align="right">1,875</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Annual Docs/Firm – High</td>
<td style="width: 122px;" valign="bottom">
<p align="right">4,388</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Total Docs/Firm – 3 yr Low</td>
<td style="width: 122px;" valign="bottom">
<p align="right">1,990</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Total Docs/Firm – 5 yr High</td>
<td style="width: 122px;" valign="bottom">
<p align="right">5,601</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom"></td>
<td style="width: 122px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">Content Management Workers</td>
<td style="width: 122px;" valign="bottom">
<p align="right">986,610</p>
</td>
</tr>
<tr>
<td style="width: 201px;" valign="bottom">CMWs/Firm</td>
<td style="width: 122px;" valign="bottom">
<p align="right">0.2</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 3. Total Annual Document Projections for U.S. Firms, 2002 Basis</p>
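<p>The per-firm averages in Table 3 follow from dividing its totals by the number of firms. The sketch below reproduces them under that reading; the dictionary keys and the helper name are illustrative only.</p>

```python
# Illustrative only: checks the per-firm averages in Table 3 against its totals.
TOTALS = {
    "firms": 4_953_937,
    "annual_docs_low": 9_291_013_320,
    "annual_docs_high": 21_739_130_435,
    "content_management_workers": 986_610,
}


def per_firm(total, firms=TOTALS["firms"], digits=0):
    """Average of a national total per firm, rounded to `digits` decimal places."""
    value = round(total / firms, digits)
    return int(value) if digits == 0 else value


annual_docs_per_firm_low = per_firm(TOTALS["annual_docs_low"])
annual_docs_per_firm_high = per_firm(TOTALS["annual_docs_high"])
cmws_per_firm = per_firm(TOTALS["content_management_workers"], digits=1)

print(f"Annual Docs/Firm, low:  {annual_docs_per_firm_low:,}")
print(f"Annual Docs/Firm, high: {annual_docs_per_firm_high:,}")
print(f"CMWs/Firm: {cmws_per_firm}")
```

<p>These divisions yield 1,875 and 4,388 documents per firm per year and 0.2 content management workers per firm, the values shown in the table.</p>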
<p>Table 4 takes this information and breaks out distribution of document production for a ‘typical’ knowledge worker according to major document types. The data from this table is based on analysis of dozens of BrightPlanet customers averaged across about 10 million documents in various repositories.</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 97%;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 12%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 12%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 11%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 1%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 26%;" colspan="3" valign="bottom">
<p align="center"><strong>% Based On</strong></p>
</td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 12%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 12%;" valign="bottom">
<p align="center"><strong>All</strong></p>
</td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom">
<p align="center"><strong>Unique</strong></p>
</td>
<td style="background-color: #cccccc; width: 11%;" valign="bottom">
<p align="center"><strong>MBs</strong></p>
</td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom">
<p align="center"><strong>KB/Page</strong></p>
</td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom">
<p align="center"><strong>Pg/Doc</strong></p>
</td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom">
<p align="center"><strong>Pages</strong></p>
</td>
<td style="background-color: #cccccc; width: 1%;" valign="bottom">
<p align="center"><strong> </strong></p>
</td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom">
<p align="center"><strong>Docs</strong></p>
</td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom">
<p align="center"><strong>MBs</strong></p>
</td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom">
<p align="center"><strong>Pages</strong></p>
</td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 24%;" colspan="2" valign="bottom"><strong>Archival Documents (3 yrs)</strong></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 11%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 1%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 8%;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 9%;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">DOC</td>
<td style="width: 12%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">281</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">59</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">20</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">10.5</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">2,938</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">52%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">36%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">50%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">PDF</td>
<td style="width: 12%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">46</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">28</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">14</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">43.6</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">2,017</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">9%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">17%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">34%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">PPT</td>
<td style="width: 12%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">32</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">26</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">55</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">14.6</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">474</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">6%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">16%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">8%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">XLS</td>
<td style="width: 12%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">178</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">51</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">100</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">2.7</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">484</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">33%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">31%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">8%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom"><strong> Weighted</strong></td>
<td style="width: 12%;" valign="bottom"><strong> </strong></td>
<td style="width: 8%;" valign="bottom">
<p align="right">537</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">164</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">28</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">11.0</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">5,912</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">100%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">100%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">100%</p>
</td>
</tr>
<tr>
<td style="width: 24%;" colspan="2" valign="bottom"><strong>Current Documents (1 yr)</strong></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">DOC</td>
<td style="width: 12%;" valign="bottom">
<p align="right">221</p>
</td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom">
<p align="right">71</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">20</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">5.1</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">1,127</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">49%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">35%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">32%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">PDF</td>
<td style="width: 12%;" valign="bottom">
<p align="right">66</p>
</td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom">
<p align="right">36</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">14</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">24.7</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">1,634</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">15%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">18%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">46%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">PPT</td>
<td style="width: 12%;" valign="bottom">
<p align="right">53</p>
</td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom">
<p align="right">76</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">55</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">12.9</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">687</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">12%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">38%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">20%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">XLS</td>
<td style="width: 12%;" valign="bottom">
<p align="right">108</p>
</td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom">
<p align="right">17</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">100</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">0.6</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">70</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">24%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">8%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">2%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom"><strong> Weighted</strong></td>
<td style="width: 12%;" valign="bottom">
<p align="right">449</p>
</td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom">
<p align="right">199</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">57</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">7.8</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">3,517</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">100%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">100%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">100%</p>
</td>
</tr>
<tr>
<td style="width: 24%;" colspan="2" valign="bottom"><strong>Total per Employee</strong></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 11%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom"></td>
<td style="width: 9%;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">DOC</td>
<td style="width: 21%;" colspan="2" valign="bottom">
<p align="center">502</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">129</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">20</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">8.1</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">4,065</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">51%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">36%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">43%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">PDF</td>
<td style="width: 21%;" colspan="2" valign="bottom">
<p align="center">112</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">64</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">14</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">32.5</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">3,650</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">11%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">18%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">39%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">PPT</td>
<td style="width: 21%;" colspan="2" valign="bottom">
<p align="center">86</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">102</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">55</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">13.5</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">1,161</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">9%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">28%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">12%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom">XLS</td>
<td style="width: 21%;" colspan="2" valign="bottom">
<p align="center">285</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">68</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">100</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">1.9</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">554</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">29%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">19%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">6%</p>
</td>
</tr>
<tr>
<td style="width: 12%;" valign="bottom"><strong> Weighted</strong></td>
<td style="width: 21%;" colspan="2" valign="bottom">
<p align="center">986</p>
</td>
<td style="width: 11%;" valign="bottom">
<p align="right">363</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">39</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">9.6</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">9,430</p>
</td>
<td style="width: 1%;" valign="bottom"></td>
<td style="width: 8%;" valign="bottom">
<p align="right">100%</p>
</td>
<td style="width: 8%;" valign="bottom">
<p align="right">100%</p>
</td>
<td style="width: 9%;" valign="bottom">
<p align="right">100%</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 4. Document Production for a ‘Typical’ Knowledge Worker</p>
<p>Note that word-processed documents account for about 50% of typical production and storage demands. Note also that the documents of highest archival value, those converted to PDF for sharing and deployment, represent about a third to two-fifths of stored documents.</p>
<h2><a name="_Toc106767212"></a>Total Annual U.S. ‘Costs’ to Create Documents</h2>
<p>Based on the information in Tables 2 through 4 above, all updated to a common 2002 basis, we can now estimate the total annual cost in the U.S. of creating all internal enterprise documents. The analysis draws on the UC Berkeley information and the Coopers &amp; Lybrand studies. The “bottom up” cases use the number of annual U.S. documents estimated from Table 2. These results are shown in the table below:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 450px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 144px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 306px;" colspan="3" valign="bottom">
<p align="center"><strong>Annual U.S. Office Documents</strong></p>
</td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 144px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 102px;" valign="bottom">
<p align="center"><strong>Number (M)</strong></p>
</td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom">
<p align="center"><strong>$/Document</strong></p>
</td>
<td style="background-color: #cccccc; width: 96px;" valign="bottom">
<p align="center"><strong>Total $ (B)</strong></p>
</td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom">“Bottom Up” – Low</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,387</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$738.58</p>
</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$1,024</p>
</td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom">“Bottom Up” – High</td>
<td style="width: 102px;" valign="bottom">
<p align="right">7,242</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$141.43</p>
</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$1,024</p>
</td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom">Coopers &amp; Lybrand</td>
<td style="width: 102px;" valign="bottom">
<p align="right">11,975</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$272.33</p>
</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$3,261</p>
</td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom">C&amp;L – UCB</td>
<td style="width: 102px;" valign="bottom">
<p align="right">27,737</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$272.33</p>
</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$7,554</p>
</td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom">C&amp;L – “Bottom Up”</td>
<td style="width: 102px;" valign="bottom">
<p align="right">4,315</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$272.33</p>
</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$1,175</p>
</td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 96px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 144px;" valign="bottom">Average</td>
<td style="width: 102px;" valign="bottom">
<p align="right">10,531</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$384.11</p>
</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$3,253</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 5. Annual U.S. Office Document Cost Estimates<a name="_ednref21"></a>[21]</p>
<p>Each figure in the Average row is the mean of the unique values in its column. The Table 5 analysis suggests there may be on the order of 10 billion documents created annually in the U.S. with a total “asset” value on the order of $3.3 trillion per year.</p>
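<p>As a rough check on the arithmetic in Table 5, the sketch below recomputes each row total (millions of documents times dollars per document, expressed in billions) and the "mean of unique values" used for the Average row. The row labels, function names and rounding choices are illustrative assumptions, not part of the original analysis; the recomputed average total comes out at 3,253.5, which agrees with the $3,253 shown in the table to within rounding.</p>

```python
# Rows of Table 5: (label, documents in millions, dollars per document).
# Labels are shortened here for illustration only.
ROWS = [
    ("Bottom Up - Low", 1387, 738.58),
    ("Bottom Up - High", 7242, 141.43),
    ("Coopers and Lybrand", 11975, 272.33),
    ("CL - UCB", 27737, 272.33),
    ("CL - Bottom Up", 4315, 272.33),
]


def total_billions(millions_of_docs, dollars_per_doc):
    """Millions of documents times $/document gives $ millions; /1000 gives $ billions."""
    return millions_of_docs * dollars_per_doc / 1000.0


def mean_of_unique(values):
    """Average each distinct value once, as the Average row of Table 5 does."""
    unique = sorted(set(values))
    return sum(unique) / len(unique)


# Row totals rounded to whole billions, as displayed in the table.
totals = [round(total_billions(n, c)) for _, n, c in ROWS]

avg_number = mean_of_unique([n for _, n, _ in ROWS])  # millions of documents
avg_cost = mean_of_unique([c for _, _, c in ROWS])    # dollars per document
avg_total = mean_of_unique(totals)                    # $ billions

for (label, _, _), total in zip(ROWS, totals):
    print(f"{label:22s} ${total:,} B")
print(f"Average: {avg_number:,.0f} M documents, ${avg_cost:.2f} each, ${avg_total:,.1f} B")
```

<p>Averaging only the unique values keeps the repeated $272.33 Coopers and Lybrand unit cost, and the repeated $1,024 billion total, from being counted more than once.</p>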
<h2><a name="_Toc106767213"></a>‘Cost’ of Creating a ‘Typical’ Document</h2>
<p>Based on the averages in the table above, a ‘typical’ document may cost on the order of $380 to create.<a name="_ednref22"></a>[22] Of course, a “document” can vary widely in size, complexity and time to create, and therefore its individual cost and value will vary widely. An invoice generated from an automated accounting system could be a single page and produced automatically in the thousands; proposals for very large contracts can cost tens of thousands to millions of dollars to create. For example, here are some other ‘typical’ costs for a variety of documents:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 276px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 150px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 126px;" colspan="2" valign="bottom">
<p align="center"><strong>Ave. Cost</strong></p>
</td>
</tr>
<tr>
<td style="width: 150px;" valign="bottom">‘Typical’ Document</td>
<td style="width: 87px;" valign="bottom">
<p align="right">$384.11</p>
</td>
<td style="width: 39px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 150px;" valign="bottom"></td>
<td style="width: 87px;" valign="bottom"></td>
<td style="width: 39px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 150px;" valign="bottom">Invoice</td>
<td style="width: 87px;" valign="bottom">
<p align="right">$4.43</p>
</td>
<td style="width: 39px;" valign="bottom"><a name="_ednref23"></a>[23]</td>
</tr>
<tr>
<td style="width: 150px;" valign="bottom">Mortgage Application</td>
<td style="width: 87px;" valign="bottom">
<p align="right">$210.00</p>
</td>
<td style="width: 39px;" valign="bottom"><a name="_ednref24"></a>[24]</td>
</tr>
<tr>
<td style="width: 150px;" valign="bottom">‘Typical’ Proposal</td>
<td style="width: 87px;" valign="bottom">
<p align="right">$17,500.00</p>
</td>
<td style="width: 39px;" valign="bottom"><a name="_ednref25"></a>[25]</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 6. ‘Typical’ per Document Creation Costs</p>
<p>Depending on document mix and activities, individual enterprises may want to vary the average document creation costs used in their cost-benefit estimates.</p>
<h2><a name="_Toc106767214"></a>‘Cost’ of a Missed or Overlooked Document</h2>
<p>The Coopers &amp; Lybrand study suggests that 7.5 percent of all documents are lost forever, and that it costs $120 in labor ($150 updated to 2002) to find a misfiled document;<a name="_ednref26"></a>[26] other studies suggest that 5% to 6% of documents are routinely misplaced or misfiled.</p>
<p>In fact, the true extent of this problem is unknown, a point affirmed by the Xerox results:<a name="_ednref27"></a>[27]</p>
<ul>
<li>Almost three quarters of corporate respondents admit that the information is unavailable or unknown to them</li>
<li>95% of the companies are not able to estimate the cost of wasted or unused documents</li>
<li>On average 19% of printed documents were wasted.</li>
</ul>
<h2><a name="_Toc106767215"></a>Other Document Total ‘Cost’ Factors and Summary</h2>
<p>Five independent studies suggest that, on average, organizations spend from 5% to 15% of total company revenue on handling documents.<sup>27,<a name="_ednref28"></a>[28],<a name="_ednref29"></a>[29],<a name="_ednref30"></a>[30],<a name="_ednref31"></a>[31]</sup> These seemingly innocuous percentages can translate into huge bottom-line impacts for U.S. enterprises. For example, the total GDP of the United States was on the order of $10.5 <em>trillion</em> at the end of 2002.<a name="_ednref32"></a>[32] Combining this value with the results of Table 5 and the information in previous sections indicates the importance of document creation and handling for U.S. enterprises:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 472px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 247px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 81px;" valign="bottom">
<p align="center"><strong>Low</strong></p>
</td>
<td style="background-color: #cccccc; width: 72px;" valign="bottom">
<p align="center"><strong>Medium</strong></p>
</td>
<td style="background-color: #cccccc; width: 73px;" valign="bottom">
<p align="center"><strong>High</strong></p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">Total U.S. Gross Domestic Product ($B)</td>
<td style="width: 81px;" valign="bottom">
<p align="right">$10,487</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">$10,487</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$10,487</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom"></td>
<td style="width: 81px;" valign="bottom"></td>
<td style="width: 72px;" valign="bottom"></td>
<td style="width: 73px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">Total Document Handling ($B)</td>
<td style="width: 81px;" valign="bottom">
<p align="right">$524</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">$1,049</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$1,573</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">
<p align="right">% of total GDP:</p>
</td>
<td style="width: 81px;" valign="bottom">
<p align="right">5.0%</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">10.0%</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">15.0%</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom"></td>
<td style="width: 81px;" valign="bottom"></td>
<td style="width: 72px;" valign="bottom"></td>
<td style="width: 73px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">Total Document Creation ($B)</td>
<td style="width: 81px;" valign="bottom">
<p align="right">$1,100</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">$3,261</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$7,554</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">
<p align="right">% of total GDP:</p>
</td>
<td style="width: 81px;" valign="bottom">
<p align="right">10.5%</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">31.1%</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">72.0%</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom"></td>
<td style="width: 81px;" valign="bottom"></td>
<td style="width: 72px;" valign="bottom"></td>
<td style="width: 73px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">Total Document Misfiled ($B)</td>
<td style="width: 81px;" valign="bottom">
<p align="right">$32</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">$81</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$160</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">
<p align="right">% of total GDP:</p>
</td>
<td style="width: 81px;" valign="bottom">
<p align="right">0.3%</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">0.8%</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">1.5%</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom"></td>
<td style="width: 81px;" valign="bottom"></td>
<td style="width: 72px;" valign="bottom"></td>
<td style="width: 73px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">ALL U.S. Document Burdens ($B)</td>
<td style="width: 81px;" valign="bottom">
<p align="right">$1,656</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">$4,390</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$9,287</p>
</td>
</tr>
<tr>
<td style="width: 247px;" valign="bottom">
<p align="right">% of total GDP:</p>
</td>
<td style="width: 81px;" valign="bottom">
<p align="right">15.8%</p>
</td>
<td style="width: 72px;" valign="bottom">
<p align="right">41.9%</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">88.6%</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 7. Range Estimates for Total U.S. Document Burdens in Enterprises, 2002<a name="_ednref33"></a>[33]</p>
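<p>The rows of Table 7 can be re-derived with a few lines of arithmetic. The sketch below assumes, as the table layout suggests, that document handling is taken as 5%, 10% and 15% of GDP and that the total burden is the sum of the handling, creation and misfiled rows; the creation and misfiled figures are copied from the table as given. The variable and function names are hypothetical. The recomputed medium total is $4,391 billion against the $4,390 billion shown, a difference that appears to be rounding.</p>

```python
GDP_2002_B = 10487  # total U.S. GDP in $ billions, from Table 7

# Assumed handling shares of GDP for the low / medium / high cases.
HANDLING_SHARE = {"low": 0.05, "medium": 0.10, "high": 0.15}

# Creation and misfiled costs in $ billions, copied from Table 7.
CREATION_B = {"low": 1100, "medium": 3261, "high": 7554}
MISFILED_B = {"low": 32, "medium": 81, "high": 160}


def burden(case):
    """Return handling cost, total burden and total share of GDP for one case."""
    handling = GDP_2002_B * HANDLING_SHARE[case]
    total = handling + CREATION_B[case] + MISFILED_B[case]
    return {
        "handling_b": round(handling),
        "total_b": round(total),
        "total_pct_gdp": round(100 * total / GDP_2002_B, 1),
    }


for case in ("low", "medium", "high"):
    print(case, burden(case))
```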
<p>A few observations relate to this table. First, enterprises and the analyst community have greatly overlooked the impact of <em>document creation</em> as opposed to <em>document handling</em>. From an embedded cost standpoint, document creation is about 2-3 times more important than document handling. Second, all aspects of document creation assume a much greater role in the overall economics of enterprises than has been realized previously.</p>
<p><strong>It is shocking that documents have received so little management attention, awareness, measurement and direct effort to improve performance.</strong></p>
<h2><a name="_Toc106767216"></a>Archival Lifetime of ‘Valuable’ Documents</h2>
<p>The ‘low’ and ‘high’ estimates for documents in Table 2 and Table 3 assume that 2% and 5%, respectively, of internal documents have archival value. Were these percentages to be higher, the volume of documents requiring integration and access would likewise increase. The 2% value is derived from the UC Berkeley study,<a name="_ednref34"></a>[34] which also refers to an unpublished European study that places archival amounts at 10%. Unfortunately, there is little empirical information to support the degree to which documents deserve to be kept for archival purposes.</p>
<p>Assuming that documents may retain value for three to five years, the largest firms perhaps have as many as 4 million <em>internal</em> documents on average with enterprise-wide value. Firms with fewer employees generally have lower document counts. Archival percentages, however, are a tricky matter, since apparently 85% of all archived documents are accessed.<a name="_ednref35"></a>[35]</p>
<h1><a name="_Toc106767217"></a>III. WEB DOCUMENTS AND SEARCH</h1>
<p>Various estimates by Cowles/Simba,<a name="_ednref36"></a>[36] Veronis, Suhler &amp; Associates,<a name="_ednref37"></a>[37] and Outsell<a name="_ednref38"></a>[38] place the current market for on line business information in the $30 billion to $140 billion range, with significant projected growth. Outsell also indicates that marketing, sales, and product development professionals rely most heavily on information from the Internet for their daily decision making, based on a comparative study of Fortune 500 business professionals’ use of the open Web and fee-based desktop information content services.<a name="_ednref39"></a>[39] Clearly, relevant and targeted content, much of which resides on line, has extreme value to enterprises.</p>
<p>UC Berkeley estimates that about 500 petabytes of new information was published on the Web in 2002,<sup>34</sup> based on original analysis conducted by BrightPlanet.<a name="_ednref40"></a>[40] The compound growth rate in Web documents has been more than 200% annually.<a name="_ednref41"></a>[41] Estimates for deep Web content range from about 6-8 times larger<a name="_ednref42"></a>[42] to 500 times larger<a name="_ednref43"></a><sup>40</sup> than standard “surface web” content. Internet content is overwhelming in size, of highly variable quality, growing at a rapid pace, and often ephemeral.</p>
<h2><a name="_Toc106767218"></a>Estimate of Time and Effort Devoted to Document Search</h2>
<p>According to a recent study by iProspect, about 56 percent of users use search engines every day, based on a population of which more than 70 percent use the Internet more than 10 hours per week. Professionals abandon a current search 38% of the time after inspecting only one results page (the listing of document result URLs), and overall 82% of users attempt another search if relevant results are not found within the first three results pages. Just 13 percent of users said that they use different search engines for different types of searches.<a name="_ednref44"></a>[43] Only 7.5 percent of Internet users said they refined their search with additional keywords in cases where they were unable to achieve satisfactory results.<a name="_ednref45"></a>[44]</p>
<p>The average knowledge worker spends 2.3 hours per day, or about 25% of work time, searching for critical job information.<a name="_ednref46"></a>[45] IDC estimates that an enterprise employing 1,000 knowledge workers wastes well over $6 million per year searching for information that does not exist, failing to find information that does, or recreating information that could have been found but was not.<a name="_ednref47"></a>[46] As that report stated, “It is simply impossible to create knowledge from information that cannot be found or retrieved.”</p>
<p>Vendors and customers often use time savings by knowledge workers as a key rationale for justifying a document or content initiative. This comes about because many studies over the years have noted that white collar employees spend a consistent 20% to 25% of their time seeking information; the premise is that more effective search will save time and drop these percentages. As a sample calculation, each 1% reduction in time devoted to search produces:</p>
<p>$50,000 (base salary) * 1.8 (burden rate) * 1.0% = $900/ employee</p>
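<p>The same calculation can be written as a small function so that an enterprise can substitute its own salary, burden rate and headcount. This is only a sketch of the formula above; the function name and the 1,000-employee headcount are illustrative assumptions.</p>

```python
def value_of_search_time_saved(base_salary, burden_rate, reduction_fraction):
    """Annual value per employee of cutting search time by the given fraction of work time."""
    return base_salary * burden_rate * reduction_fraction


# The sample figures from the text: $50,000 salary, 1.8 burden rate, 1% reduction.
per_employee = value_of_search_time_saved(50_000, 1.8, 0.01)

# Hypothetical headcount, used only to scale the per-employee figure.
employees = 1_000
per_enterprise = per_employee * employees

print(f"Per employee: ${per_employee:,.0f}")
print(f"For {employees:,} employees: ${per_enterprise:,.0f}")
```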
<p>The stable percentage of effort devoted to search over time suggests it is the “satisficing” allocation. (In other words, knowledge workers are willing to devote a quarter of their time to finding relevant information.) Thus, while better discovery tools may lead to better information and better decisions made more productively (a far more important justification in itself), more efficient search may not yield a strict time or labor savings.<a name="_ednref48"></a>[47]</p>
<h2><a name="_Toc106767219"></a>Effect of Non-persistent Search Efforts</h2>
<p>The percentage of Web page visits that are re-visits is estimated at between 58%<a name="_ednref49"></a>[48] and 80%.<a name="_ednref50"></a>[49] While many of these re-visitations occur shortly after the first visit (<em>e.g</em>., during the same session using the back button), a significant number occur after a considerable amount of time has elapsed. Thus, it is not surprising that a survey of problems using the Web found “Not being able to find a page I know is out there,” and “Not being able to return to a page I once visited,” accounted for 17% of the problems reported, and that the most common problem using bookmarks was, “Changed content.”<a name="_ednref51"></a>[50] Depending on the content type, users use either “direct” or “indirect” approaches to re-find previously discovered information:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 335px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 205px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 65px;" valign="bottom">
<p align="center"><strong>Direct</strong></p>
</td>
<td style="background-color: #cccccc; width: 65px;" valign="bottom">
<p align="center"><strong>Indirect</strong></p>
</td>
</tr>
<tr>
<td style="width: 205px;" valign="bottom">Specific Information</td>
<td style="width: 65px;" valign="bottom">
<p align="right">42%</p>
</td>
<td style="width: 65px;" valign="bottom">
<p align="right">58%</p>
</td>
</tr>
<tr>
<td style="width: 205px;" valign="bottom">General Information</td>
<td style="width: 65px;" valign="bottom">
<p align="right">58%</p>
</td>
<td style="width: 65px;" valign="bottom">
<p align="right">43%</p>
</td>
</tr>
<tr>
<td style="width: 205px;" valign="bottom">Specific Documents</td>
<td style="width: 65px;" valign="bottom">
<p align="right">29%</p>
</td>
<td style="width: 65px;" valign="bottom">
<p align="right">71%</p>
</td>
</tr>
<tr>
<td style="width: 205px;" valign="bottom">Web Documents</td>
<td style="width: 65px;" valign="bottom">
<p align="right">77%</p>
</td>
<td style="width: 65px;" valign="bottom">
<p align="right">23%</p>
</td>
</tr>
<tr>
<td style="width: 205px;" valign="bottom">Emails</td>
<td style="width: 65px;" valign="bottom">
<p align="right">9%</p>
</td>
<td style="width: 65px;" valign="bottom">
<p align="right">91%</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 8. General Approaches to Re-finding Previously Discovered Information <a name="_ednref52"></a>[51]</p>
<p>Direct approaches require remembering or explicitly noting the location of the information. Direct approaches include: direct entry; emailing to self; emailing to others; printing out; saving as file; pasting the URL into a document; and posting to a personal Web site.</p>
<p>Indirect approaches include: searching; looking through bookmarks; and recalling from a history file. All of these indirect approaches are supported by modern browsers. Note that re-finding Web pages or documents relies heavily on having a record of a previously visited URL.</p>
<p>As a University of Washington study supported by Microsoft discovered, each of the specific direct and indirect techniques used for re-discovery has significant drawbacks in terms of the functions desired for the recall process:<a name="_ednref53"></a>[52]</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 624px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 132px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 48px;"><strong>Portability</strong></td>
<td style="background-color: #cccccc; width: 47px;"><strong>No. of Access Points</strong></td>
<td style="background-color: #cccccc; width: 47px;"><strong>Persistence</strong></td>
<td style="background-color: #cccccc; width: 47px;"><strong>Preservation</strong></td>
<td style="background-color: #cccccc; width: 47px;"><strong>Currency</strong></td>
<td style="background-color: #cccccc; width: 47px;"><strong>Context</strong></td>
<td style="background-color: #cccccc; width: 55px;"><strong>Reminding</strong></td>
<td style="background-color: #cccccc; width: 54px;"><strong>Ease of Integration</strong></td>
<td style="background-color: #cccccc; width: 48px;"><strong>Communication</strong></td>
<td style="background-color: #cccccc; width: 54px;"><strong>Ease of Maintenance</strong></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 180px;" colspan="2" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">DIRECT APPROACHES</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 55px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 54px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 48px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 54px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Direct Entry</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">?</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Email to Self</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Med</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Email to Others</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Low?</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Print-out</td>
<td style="width: 48px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Med</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Save as File</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Med?</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low?</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Med?</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Med</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Paste URL in Doc</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low?</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">High?</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High?</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Personal Web Site</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">High?</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High?</p>
</td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 180px;" colspan="2" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">INDIRECT APPROACHES</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 47px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 55px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 54px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 48px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 54px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Search</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">?</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">High</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">Bookmark</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Low</p>
</td>
</tr>
<tr>
<td style="width: 132px;" valign="bottom">History</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Med</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">High</p>
</td>
<td style="width: 47px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 55px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">Low?</p>
</td>
<td style="width: 48px;" valign="bottom">
<p align="center">Low</p>
</td>
<td style="width: 54px;" valign="bottom">
<p align="center">?</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 9. Strengths and Weaknesses of Existing Techniques to Re-use Web Information</p>
<p>The general observation is that no single present technique can keep found information persistent and current while also maintaining context. These combined inadequacies mean that previously found information is not easily re-discovered, as the following table shows:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 303px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 238px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 65px;" valign="bottom">
<p align="center"><strong>Percent</strong></p>
</td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom">Information No Longer Available</td>
<td style="width: 65px;" valign="bottom">
<p align="right">37%</p>
</td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom">Re-tracing Path Fails</td>
<td style="width: 65px;" valign="bottom">
<p align="right">14%</p>
</td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom">Time Length Since Last Find</td>
<td style="width: 65px;" valign="bottom">
<p align="right">9%</p>
</td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom">Other Failure Reasons</td>
<td style="width: 65px;" valign="bottom">
<p align="right">9%</p>
</td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom">
<p align="center"><strong>Total Information Lost</strong></p>
</td>
<td style="width: 65px;" valign="bottom">
<p align="right">68%</p>
</td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom"></td>
<td style="width: 65px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 238px;" valign="bottom">Success Finding Lost Information</td>
<td style="width: 65px;" valign="bottom">
<p align="right">32%</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 10. Success in Finding Important Earlier Found Web Information <a name="_ednref54"></a>[53]</p>
<p>This table supports a number of important observations. First, some 37% of previously found information disappears from the Web, consistent with other findings that estimate about 40% of all Web content disappears annually, some of which has historical or archival value.<a name="_ednref55"></a>[54]</p>
<p>Second, and most importantly, nearly 70% of previously found valuable information cannot be rediscovered. More than half of this problem arises because the information is no longer available on the Web, but other reasons relate to the inadequacies of recall techniques for finding previously discovered information.</p>
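<p>The shares quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions, and the percentages are copied from Table 10. Note that the listed failure reasons sum to 69% rather than the reported 68%, presumably because of rounding in the source figures.</p>

```python
# Failure reasons from Table 10, as percentages of previously found information.
failure_reasons = {
    "Information No Longer Available": 37,
    "Re-tracing Path Fails": 14,
    "Time Length Since Last Find": 9,
    "Other Failure Reasons": 9,
}

# Total reported in Table 10 (the listed reasons add up to one point more,
# which is assumed here to be a rounding effect).
total_lost = 68

sum_of_reasons = sum(failure_reasons.values())

# Share of all lost information that is lost because it left the Web.
unavailable_share = failure_reasons["Information No Longer Available"] / total_lost

print(f"Sum of listed reasons: {sum_of_reasons}%")
print(f"Share of losses due to unavailability: {unavailable_share:.0%}")
```

<p>The second figure comes out at roughly 54%, which is the basis for the statement that more than half of the problem is due to information no longer being available.</p>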
<p>These observations can translate into some relatively huge costs on a per employee and per enterprise basis, as the table below shows:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 615px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 173px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 181px;" colspan="2" valign="bottom">
<p align="center"><strong><span style="text-decoration: underline;">Per Knowledge Worker</span></strong></p>
</td>
<td style="background-color: #cccccc; width: 136px;" valign="bottom">
<p align="center"><strong>Per ‘Large’</strong></p>
</td>
<td style="background-color: #cccccc; width: 125px;" valign="bottom">
<p align="center"><strong>All</strong></p>
</td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 173px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 97px;" valign="bottom">
<p align="center"><strong>Per Doc</strong></p>
</td>
<td style="background-color: #cccccc; width: 84px;" valign="bottom">
<p align="center"><strong>All Docs</strong></p>
</td>
<td style="background-color: #cccccc; width: 136px;" valign="bottom">
<p align="center"><strong>Enterprise</strong><strong> ($000)</strong></p>
</td>
<td style="background-color: #cccccc; width: 125px;" valign="bottom">
<p align="center"><strong>Enterprises ($M)</strong></p>
</td>
</tr>
<tr>
<td style="width: 173px;" valign="bottom">Re-finding Documents</td>
<td style="width: 97px;" valign="bottom">
<p align="right">$148.54</p>
</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$585</p>
</td>
<td style="width: 136px;" valign="bottom">
<p align="right">$3,547</p>
</td>
<td style="width: 125px;" valign="bottom">
<p align="right">$12,103</p>
</td>
</tr>
<tr>
<td style="width: 173px;" valign="bottom"></td>
<td style="width: 97px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 136px;" valign="bottom"></td>
<td style="width: 125px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 173px;" valign="bottom">Re-creating Documents</td>
<td style="width: 97px;" valign="bottom">
<p align="right">$384.11</p>
</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$1,008</p>
</td>
<td style="width: 136px;" valign="bottom">
<p align="right">$6,114</p>
</td>
<td style="width: 125px;" valign="bottom">
<p align="right">$20,864</p>
</td>
</tr>
<tr>
<td style="width: 173px;" valign="bottom"></td>
<td style="width: 97px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 136px;" valign="bottom"></td>
<td style="width: 125px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 173px;" valign="bottom">TOTAL</td>
<td style="width: 97px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom">
<p align="right">$1,593</p>
</td>
<td style="width: 136px;" valign="bottom">
<p align="right">$9,661</p>
</td>
<td style="width: 125px;" valign="bottom">
<p align="right">$32,967</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 11. ‘Cost’ of Not Readily Re-finding Valuable Web Information</p>
<p>This analysis assumes that some previously found information of value is re-found (60%), but that some is not re-found and must be re-created (40%).<a name="_ednref56"></a>[55] The ‘large’ enterprise is identical to the definition in Table 2 (which is also nearly equivalent to a Fortune 1000 company).<a name="_ednref57"></a>[56]</p>
<p>The analysis indicates that poor methods to recall previously found and valuable Web documents may cost about $1,600 per knowledge worker per year. This translates into a productivity loss of nearly $10 million for the largest enterprises, or nearly $33 billion across all U.S. industries.</p>
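<p>The totals in Table 11 are simple column sums of the re-finding and re-creating costs. A minimal sketch of that arithmetic follows; the dictionary keys are hypothetical names chosen for illustration, and the figures are copied from the table.</p>

```python
# Figures copied from Table 11.
# Units: per knowledge worker ($), per 'large' enterprise ($000), all enterprises ($M).
refinding = {"per_worker": 585, "large_enterprise_k": 3547, "all_enterprises_m": 12103}
recreating = {"per_worker": 1008, "large_enterprise_k": 6114, "all_enterprises_m": 20864}

# Sum the two cost categories column by column.
totals = {key: refinding[key] + recreating[key] for key in refinding}

# Express the totals in the units used in the text.
print(f"Per knowledge worker: ${totals['per_worker']:,} per year")
print(f"Per large enterprise: ${totals['large_enterprise_k'] / 1000:.1f} million")
print(f"All enterprises: ${totals['all_enterprises_m'] / 1000:.1f} billion")
```

<p>The sums reproduce the TOTAL row of the table ($1,593, $9,661 thousand and $32,967 million), which the text rounds to $1,600, nearly $10 million and nearly $33 billion.</p>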
<p>In relation to the total document costs noted in Table 7 above, these may seem to be comparatively small numbers. However, when viewed in the context of unproductive standard Web search, they indicate important failings in the ability to recall previously found valuable results from searches and their attendant productivity losses.</p>
<h2><a name="_Toc106767220"></a>‘Cost’ of Creating and Maintaining a Document Category Portal</h2>
<p>Users, administrators and industry analysts alike recognize the importance of placing content into logical, intuitive and hierarchically organized categories. About 60% of knowledge workers note that search is a difficult process, made all the more difficult without a logical organization to content.<a name="_ednref58"></a>[57] While technical distinctions exist, these logical structures organized into a hierarchical presentation are most often referred to as “taxonomies,” though other terms such as ontology, subject directory, subject tree, directory structure or classification schema may be used.</p>
<p>Delphi Group’s research with corporate Web sites points to the lack of organized information as the number one problem in the opinion of business professionals. More than three-quarters of the surveyed corporations indicated that a taxonomy or classification system for documents is imperative or somewhat important to their business strategy; more than one-third of firms that classify documents still use manual techniques.<sup>57</sup> Hierarchical arrangements of categorized subjects trigger associations and relationships that are not obvious when simply searching keywords. Other advantages cited for the taxonomic presentation of documents are the greater likelihood of discovery, ease-of-use, overcoming the difficulty of formulating effective search queries, being able to search only within related documents, discovery of relationships among similar terminology and concepts, and user satisfaction.<a name="_ednref59"></a>[58],<a name="_ednref60"></a>[59]</p>
<p>From the user standpoint, knowledge workers want to impose taxonomic order on document chaos, but only if the taxonomy models their domain accurately. They also want software to assist with categorizing, as long as it respects the taxonomy they created. Finally, the results of these category placements should be presented via a portal. Thus, as the common concern across all requirements, the taxonomy takes on tremendous importance for an application’s success.<a name="_ednref61"></a>[60]</p>
<p><img class="center_ok" src="../wp-content/themes/ai3/images/DocValue/Figure2.gif" alt="Large firm documents" width="447" height="295" /></p>
<p style="text-align: center;">Figure 2. Typical Large Firm Documents, Thousands</p>
<p>Enterprises that have adopted directory structures for content management are not yet achieving enterprise-wide relevance, presenting on average 1% of all relevant documents in an organized portal view. These limitations appear to be driven by weaknesses in the technology and high costs associated with conventional approaches:</p>
<ul>
<li><em>Comprehensiveness and Scale </em> –  according to a market report published by Plumtree in 2003, the average document portal contains about 37,000 documents.<a name="_ednref62"></a>[61] This was an increase from a 2002 Plumtree survey that indicated average document counts of 18,000.<a name="_ednref63"></a>[62] However, about 60% of respondents to a Delphi Group survey said they had more than 50,000 internal documents in their portal environment (generally the department level),<sup> 3</sup> and as Table 2 indicates above, most of the largest firms likely have millions or more<em> internal</em> documents deserving of common access and archiving.</li>
<li>The left-hand bar in Figure 2 indicates current averages for documents in existing content portals. The right-hand (yellow and orange) bar indicates potential based on high and low estimates. The ‘Archive’ case (middle bar) shows the same values as provided in Table 2, and represents a conservative view of “archival-likely” documents. The right bar is a more representative view of actual current <em>internal </em>content that enterprises may want to make available to their employees.<a name="_ednref64"></a>[63] Two observations have merit: 1) under current practice, enterprises are at most making 10% of their useful documents available, and more likely slightly over 1%; 2) the documents that are being made available are solely internal, and neglect potentially important external sources that would increase document counts considerably.</li>
<li><em>Implementation Times </em> – though the average time to stand up a new content installation is about 6 months, there is also a 22% risk that deployment time exceeds that and an 8% risk that it takes longer than one year. Furthermore, internal staff necessary for initial stand-up average nearly 14 people (6 of whom are strictly devoted to content development), with the potential for much larger head counts.<a name="_ednref65"></a>[64]</li>
<li><em>Ongoing Maintenance and Staffing Costs </em> – ongoing maintenance and staffing costs typically exceed the initial deployment effort. This trend is perhaps not surprising in that once a valuable content portal has been created there will be demands to expand its scope and coverage. Based on these various factors, Table 12 summarizes set-up, ongoing maintenance and key metrics for today’s conventional approaches versus what BrightPlanet can do (the BrightPlanet document count is based on a ‘typical’ installation; there are no practical scale limits).</li>
</ul>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 568px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 120px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 98px;" valign="bottom">
<p align="center"><strong>DOCUMENT</strong></p>
</td>
<td style="background-color: #cccccc; width: 187px;" colspan="3" valign="bottom">
<p align="center"><strong>INITIAL SET-UP</strong></p>
</td>
<td style="background-color: #cccccc; width: 163px;" colspan="2" valign="bottom">
<p align="center"><strong>MAINTENANCE</strong></p>
</td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 120px;"></td>
<td style="background-color: #cccccc; width: 98px;">
<p align="center"><strong>BASIS</strong></p>
</td>
<td style="background-color: #cccccc; width: 64px;">
<p align="center"><strong>Staff</strong></p>
</td>
<td style="background-color: #cccccc; width: 49px;">
<p align="center"><strong>Mos</strong></p>
</td>
<td style="background-color: #cccccc; width: 73px;">
<p align="center"><strong>$/Doc</strong></p>
</td>
<td style="background-color: #cccccc; width: 73px;">
<p align="center"><strong>Staff</strong></p>
</td>
<td style="background-color: #cccccc; width: 91px;">
<p align="center"><strong>$/Doc</strong></p>
</td>
</tr>
<tr>
<td style="width: 120px;" valign="bottom">Current Practice</td>
<td style="width: 98px;" valign="bottom">
<p align="right">37,000</p>
</td>
<td style="width: 64px;" valign="bottom">
<p align="right">6.2</p>
</td>
<td style="width: 49px;" valign="bottom">
<p align="right">5.4</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$4.861</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">6.4</p>
</td>
<td style="width: 91px;" valign="bottom">
<p align="right">$11.278</p>
</td>
</tr>
<tr>
<td style="width: 120px;" valign="bottom">BrightPlanet</td>
<td style="width: 98px;" valign="bottom">
<p align="right">250,000</p>
</td>
<td style="width: 64px;" valign="bottom">
<p align="right">1.0</p>
</td>
<td style="width: 49px;" valign="bottom">
<p align="right">0.8</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">$0.017</p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="right">0.3</p>
</td>
<td style="width: 91px;" valign="bottom">
<p align="right">$0.078</p>
</td>
</tr>
<tr>
<td style="width: 120px;" valign="bottom"></td>
<td style="width: 98px;" valign="bottom"></td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 49px;" valign="bottom"></td>
<td style="width: 73px;" valign="bottom"></td>
<td style="width: 73px;" valign="bottom"></td>
<td style="width: 91px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 120px;" valign="bottom">BP Advantage</td>
<td style="width: 98px;" valign="bottom">
<p align="center"><strong>6.8 x + up</strong></p>
</td>
<td style="width: 64px;" valign="bottom">
<p align="center"><strong>6.2 x</strong></p>
</td>
<td style="width: 49px;" valign="bottom">
<p align="center"><strong>6.7 x</strong></p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="center"><strong>280.4 x</strong></p>
</td>
<td style="width: 73px;" valign="bottom">
<p align="center"><strong>21.4 x</strong></p>
</td>
<td style="width: 91px;" valign="bottom">
<p align="center"><strong>144.6 x</strong></p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 12. Staff, Time and per Document Costs for Categorized Document Portals</p>
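<p>The ‘BP Advantage’ row in Table 12 is a set of ratios between the two rows above it. The sketch below shows one way to recompute them; the metric names and the <code>advantage</code> helper are assumptions made for illustration. Because the table’s inputs are themselves rounded, some recomputed ratios differ slightly from the printed ones.</p>

```python
# Rows copied from Table 12.
current = {"documents": 37000, "setup_staff": 6.2, "setup_months": 5.4,
           "setup_cost_per_doc": 4.861, "maint_staff": 6.4, "maint_cost_per_doc": 11.278}
brightplanet = {"documents": 250000, "setup_staff": 1.0, "setup_months": 0.8,
                "setup_cost_per_doc": 0.017, "maint_staff": 0.3, "maint_cost_per_doc": 0.078}


def advantage(metric):
    """Return the ratio by which the BrightPlanet figure is better for one metric."""
    if metric == "documents":
        # More documents is better, so divide BrightPlanet by current practice.
        return brightplanet[metric] / current[metric]
    # For staff, months and cost per document, lower is better.
    return current[metric] / brightplanet[metric]


for metric in current:
    print(f"{metric}: {advantage(metric):.1f} x")
```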
<ul>
<li>The content staff level estimates in the table are consistent with anecdotal information and with a survey of 40 installations that found there were on average 14 content development staff managing each enterprise’s content portal.<a name="_ednref66"></a>[65]</li>
</ul>
<p>Though conventional approaches to content integration seem to lead to high per document set-up and maintenance costs, these should be contrasted with standard practice that suggests it may cost on average $25 to $40 per document simply for filing.<sup>29</sup> Indeed, labor costs can account for up to 30% of total document handling costs.<sup>28</sup> Nonetheless, at $5 to $11 per document for content management alone, this could result in no actual cost savings if electronic access does not displace current filing practices. When multiplied across all enterprise documents, these uncertainties can translate into huge swings in costs or benefits for a content portal initiative.</p>
<ul>
<li><em>Software License v</em>.<em> Full Project Costs</em> – according to Charles Phillips of Morgan Stanley, only 30% of the money spent on major software projects goes to the actual purchase of commercially packaged software. Another third goes to internal software development by companies. The remaining 37% goes to third-party consultants.<a name="_ednref67"></a>[66] In evaluating a commitment, internal staff and consulting time should be carefully scrutinized. Efficiencies in initial deployment and ongoing support are the biggest cost drivers.</li>
<li><em>Internal PLUS External Sources</em> – weaknesses in scalability and high implementation costs often lead to a dismissal of the importance of integrating internal plus external content. Few installations address relevant content external to the enterprise essential to achieving its missions. Granted, the increase in scale associated with external content is large, but for some businesses integration with external content may be essential.</li>
</ul>
<p>While other vendors claim fast categorization times, what they fail to mention is the lengthy pre-processing times necessary for generating their categorization metatags. According to Forrester Research, some of these metatagging systems can only process five to 15 documents per hour!<a name="_ednref68"></a>[67]</p>
<h2><a name="_Toc106767221"></a>‘Cost’ of Inaccessible or Hidden Intranet Sites</h2>
<p>In 2003, the portal vendor Plumtree noticed a new trend that it called “Web sprawl,” by which it meant the costly proliferation of Web applications, intranets and extranets.<a name="_ednref69"></a>[68] BEA has taken up this trend as a major thrust to its Web service offerings through an approach it calls “enterprise portal rationalization” (EPR).<a name="_ednref70"></a>[69] According to BEA, its architectural offerings are meant to control the “metastasizing” of corporate Web sites.</p>
<p>How common and to what scale is the proliferation of enterprise Web sites? I have not been able to find any comprehensive studies on this topic, but have been able to find many anecdotal examples. The proliferation, in fact, began as soon as the Internet became popular:</p>
<ul>
<li>As reported in 2000, Intel had more than 1 million URLs on its intranet with more than 100 new Web sites being introduced each month<a name="_ednref71"></a>[70]</li>
<li>In 2002, IBM consolidated over 8,000 intranet sites, 680 ‘major’ sites, 11 million Web pages and 5,600 domain names into what it calls the IBM Dynamic Workplaces, or W3 to employees<a name="_ednref72"></a>[71]</li>
<li>Silicon Graphics’ ‘Silicon Junction’ company-wide portal serves 7,200 employees with 144,000 Web pages consolidated from more than 800 internal Web sites<a name="_ednref73"></a>[72]</li>
<li>Hewlett-Packard Co., for example, has sliced the number of internal Web sites it runs from 4,700 (1,000 for employee training, 3,000 for HR) to 2,600, and it makes them all accessible from one home, @HP <a name="_ednref74"></a>[73]<sup>,<a name="_ednref75"></a>[74]</sup></li>
<li>Avaya Corporation is now consolidating more than 800 internal Web sites globally<a name="_ednref76"></a>[75]</li>
<li>The <em>Wall Street Journal</em> recently reported that AT&amp;T has 10 information architects on staff to maintain its 3,600 intranet sites that contain 1.5 million public Web pages<a name="_ednref77"></a>[76]</li>
<li>The new Department of Homeland Security is faced with the challenge of consolidating more than 3,000 databases inherited from its various constituent agencies.<a name="_ednref78"></a>[77]</li>
</ul>
<p>BrightPlanet’s customers confirm these trends, with indicators of hundreds if not thousands of internal Web sites common in the largest companies. Indeed, it is surprising how many instances there are where corporate IT does not even know the full extent of Web site proliferation. The problem is likely much greater than realized:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 586px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 306px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 84px;" valign="bottom">
<p align="center"><strong>Low</strong></p>
</td>
<td style="background-color: #cccccc; width: 92px;" valign="bottom">
<p align="center"><strong>Med</strong></p>
</td>
<td style="background-color: #cccccc; width: 103px;" valign="bottom">
<p align="center"><strong>High</strong></p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Number of Large Firms</td>
<td style="width: 84px;" valign="bottom">
<p align="right">930</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">1,500</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">3,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Ave. Number of Web Sites per Firm</td>
<td style="width: 84px;" valign="bottom">
<p align="right">100</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">500</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">900</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Ave. Number of Documents per Web Site</td>
<td style="width: 84px;" valign="bottom">
<p align="right">100</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">350</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">1,500</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Large Firm Web Sites</td>
<td style="width: 84px;" valign="bottom">
<p align="right">93,000</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">750,000</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">2,700,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Percentage of Known Web Sites</td>
<td style="width: 84px;" valign="bottom">
<p align="right">85%</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">60%</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">40%</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Percentage of Doc Federation for Known Sites</td>
<td style="width: 84px;" valign="bottom">
<p align="right">50%</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">10%</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">2%</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"><strong><span style="text-decoration: underline;">Site Development &amp; Maintenance</span></strong></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Development Cost per Web Site</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$300</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$1,701</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$9,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Annual Maintenance Cost per Site</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$800</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$3,947</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$21,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Yr 1 Cost per Site</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$1,100</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$5,649</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$30,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Yr 1 per Large Firm Costs ($000)</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$110</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$2,824</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$27,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Yr 1 Large Firm Costs ($M)</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$102</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$4,237</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$81,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"><strong><span style="text-decoration: underline;">‘Cost’ of Unfound Documents</span></strong></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">No. of Unknown Documents per Firm</td>
<td style="width: 84px;" valign="bottom">
<p align="right">5,750</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">80,500</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">820,800</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Number of Large Firm Unknown Docs</td>
<td style="width: 84px;" valign="bottom">
<p align="right">5,347,500</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">120,750,000</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">2,462,400,000</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Cost of Unknown Docs per Web Site</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$6,900</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$23,915</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$350,310</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Cost of Unknown Docs per Firm ($000)</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$690</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$11,958</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$315,279</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Cost of Large Firm Unknown Docs ($M)</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$642</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$17,937</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$945,837</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"><strong><span style="text-decoration: underline;">Summary</span></strong></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Cost per Firm ($000)</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$800</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$14,782</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$342,279</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Total Cost all Large Firms ($M)</td>
<td style="width: 84px;" valign="bottom">
<p align="right">$744</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$22,173</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">$1,026,837</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom"></td>
<td style="width: 84px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 103px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Development as % of Total Costs</td>
<td style="width: 84px;" valign="bottom">
<p align="right">14%</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">19%</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">8%</p>
</td>
</tr>
<tr>
<td style="width: 306px;" valign="bottom">Unfound Documents as % of Total Costs</td>
<td style="width: 84px;" valign="bottom">
<p align="right">86%</p>
</td>
<td style="width: 92px;" valign="bottom">
<p align="right">81%</p>
</td>
<td style="width: 103px;" valign="bottom">
<p align="right">92%</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 13. Development and Unfound Document ‘Costs’ for Large Firms due to Web Sprawl</p>
<p>Table 13 consolidates previous information to estimate what the ‘costs’ of Web sprawl might be to larger firms (analogous to the Fortune 1000). The table presents Low, Medium and High estimates for number of Web sites per firm, known and unknown documents in each, and associated costs for initial site development and first-year maintenance plus the value of unfound information. The Medium category uses the average values from previous tables. The Low and High values bracket these amounts based on distribution of known values and expert judgment.</p>
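<p>As a rough check on how the document counts in Table 13 fit together, the following Python sketch recomputes two of its rows from the table's inputs. The rule used for unknown documents is an assumption: it is inferred only because it reproduces the published Low, Med and High figures, and it is not stated in the text. The names used (<code>SCENARIOS</code>, <code>total_sites</code>, <code>unknown_docs_per_firm</code>) are hypothetical.</p>

```python
# Inputs copied from Table 13; percentages are kept as whole numbers
# so that the integer arithmetic stays exact.
SCENARIOS = {
    "Low":  {"firms": 930,  "sites": 100, "docs_per_site": 100,
             "known_pct": 85, "federated_pct": 50},
    "Med":  {"firms": 1500, "sites": 500, "docs_per_site": 350,
             "known_pct": 60, "federated_pct": 10},
    "High": {"firms": 3000, "sites": 900, "docs_per_site": 1500,
             "known_pct": 40, "federated_pct": 2},
}


def total_sites(s):
    """Total Large Firm Web Sites = number of firms x average sites per firm."""
    return s["firms"] * s["sites"]


def unknown_docs_per_firm(s):
    """ASSUMED rule, inferred from the table rather than stated in the text:
    unknown = docs x (1 - known% x (1 - federated%))."""
    docs = s["sites"] * s["docs_per_site"]
    known_unfederated_per_10000 = s["known_pct"] * (100 - s["federated_pct"])
    return docs * (10000 - known_unfederated_per_10000) // 10000


for name, s in SCENARIOS.items():
    per_firm = unknown_docs_per_firm(s)
    # Scenario, total sites, unknown docs per firm, unknown docs for all firms
    print(name, total_sites(s), per_firm, per_firm * s["firms"])
```

<p>Keeping the percentages as integers avoids floating-point rounding, so the results can be compared digit for digit with the table rows.</p>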
<p>The table indicates as a mid-range estimate that an individual Web site for a large enterprise may cost about $6,000 to set up and maintain in the first year and represents $24,000 in opportunity costs due to unknown or unfound documents. For the average large enterprise across all Web sites, these costs may be $2.8 million and $12.0 million, respectively. Across all large firms, total costs due to Web sprawl may be on the order of $22 billion.</p>
<p>Site development and maintenance costs are not trivial, exceeding $4 billion for all large firms, though they can be significantly reduced (see previous section). The major cost impact, however, comes from the inability to find or federate the information that is available. Unfound documents represent <strong><em><span style="text-decoration: underline;">well in excess of 80%</span></em></strong> of the costs associated with Web sprawl.</p>
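<p>The mid-range roll-up from a single site to all large firms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inputs are the Med column of Table 13, the function name <code>rollup</code> and its keys are hypothetical, and because the table's unshown decimals are not available, results may differ from the published rows by a rounding unit.</p>

```python
# Medium-scenario inputs copied from Table 13 (all amounts in dollars).
FIRMS = 1500
SITES_PER_FIRM = 500
DEV_COST_PER_SITE = 1701        # development cost per Web site
MAINT_COST_PER_SITE = 3947      # first-year maintenance cost per site
UNFOUND_COST_PER_SITE = 23915   # per-site cost of unknown or unfound documents


def rollup(firms, sites, dev, maint, unfound):
    """Roll per-site costs up to the firm and to all large firms."""
    site_yr1 = dev + maint                 # first-year cost per site
    firm_dev = sites * site_yr1            # development + maintenance per firm
    firm_unfound = sites * unfound         # unfound-document cost per firm
    firm_total = firm_dev + firm_unfound
    return {
        "site_yr1": site_yr1,
        "firm_dev_k": firm_dev / 1000,                     # $000 per firm
        "firm_unfound_k": firm_unfound / 1000,             # $000 per firm
        "firm_total_k": firm_total / 1000,                 # $000 per firm
        "all_firms_total_m": firms * firm_total / 1e6,     # $M, all large firms
        "unfound_share": firm_unfound / firm_total,        # fraction of total
    }


result = rollup(FIRMS, SITES_PER_FIRM, DEV_COST_PER_SITE,
                MAINT_COST_PER_SITE, UNFOUND_COST_PER_SITE)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

<p>The last value, the share of total costs attributable to unfound documents, is the ratio behind the percentage rows at the bottom of the table.</p>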
<p>The Web sprawl situation is analogous to other major technology shifts. For example, in the early 1980s, IT grappled mightily with the proliferation of personal computers. Centralized control was impossible in that circumstance because individuals and departments recognized the productivity benefits to be gained by PCs. Only when enterprise-capable vendors of networking technology, such as Novell, were able to offer integration solutions was the corporation able to control and fully exploit the PC’s technology potential.</p>
<p>The proliferation of internal enterprise Web sites is responding to similar drivers: innovation, customer service, or superior methods of product or solutions delivery. Ambitious mid-level managers will continue to exploit these advantages by “cowboy” additions of more corporate Web sites, and that is likely to the good for most enterprises. Gaining control and fully realizing the value of this Web site proliferation, without stymieing innovation, will likely require enabling technology analogous to the networking of PCs.</p>
<h1><a name="_Toc106767222"></a>IV. OPPORTUNITIES AND THREATS</h1>
<p>The previous analysis has focused on more-or-less direct costs and drivers. These impacts are huge and deserve proper consideration. But the inability to access and manage relevant document information has other implications as well, which fall into the categories of lost opportunities, liabilities, or non-compliance. Their bottom-line impacts often far outweigh the direct costs. This section presents only a few of these many opportunities.</p>
<h2><a name="_Toc106767223"></a>‘Costs’ and Opportunity Costs of Winning Proposals</h2>
<p>Competitive proposals are an important revenue factor to hundreds of thousands of businesses. Indeed, contracts and grants from federal, state and local governments accounted for 12.1% of GDP in 2002; the amount competitively awarded equaled about 5.6% of GDP.<a name="_ednref79"></a>[78] Reducing the fully-burdened costs of responding to competitive procurements and improving the rate of winning them can be a huge competitive advantage to business.</p>
<p>Significant proportions of commercial projects and programs are likewise awarded through competitive proposals and bids. However, literature references to these are limited, and the remainder of this section relies on federal sector statistics as a proxy for the overall category.</p>
<p>Though the federal government is making strides in providing central clearinghouses for opportunities, and is also doing much to move to uniform application standards and electronic application submissions, these efforts are still in their nascent stages, and similar efforts at the state and local level are severely lagging. As a result, the magnitude of the proposal opportunity is perhaps largely unknown to many businesses. This lack of appreciation of, and attention to, the cost and success drivers behind winning proposals is a real gap in the competitiveness of many individual businesses.</p>
<p>Table 14 below consolidates information from many government sources to quantify the magnitude of this competitively-awarded grant and contract opportunity with governments.</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 527px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 271px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 102px;" valign="bottom">
<p align="center"><strong>Number of Awards</strong></p>
</td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom">
<p align="center"><strong>Amount ($000)</strong></p>
</td>
<td style="background-color: #cccccc; width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 271px;" valign="bottom"><strong><span style="text-decoration: underline;">Federal Government</span></strong></td>
<td style="background-color: #cccccc; width: 102px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Grants</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,335,813</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$441,037,633</p>
</td>
<td style="width: 46px;" valign="bottom"><a name="_ednref80"></a>[79] <a name="_ednref81"></a>[80]</td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Contract Procurements</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,155,096</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$327,413,076</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Competitively-awarded Grants</td>
<td style="width: 102px;" valign="bottom">
<p align="right">336,091</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$99,234,657</p>
</td>
<td style="width: 46px;" valign="bottom"><a name="_ednref82"></a>[81]</td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Competitively-awarded Procurements</td>
<td style="width: 102px;" valign="bottom">
<p align="right">909,087</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$231,878,136</p>
</td>
<td style="width: 46px;" valign="bottom"><a name="_ednref83"></a>[82]</td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Competitive Opportunities</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,245,179</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$331,112,793</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Ave Competitive Opportunity</td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom">
<p align="right">$266</p>
</td>
<td style="width: 46px;" valign="bottom"><a name="_ednref84"></a>[83]</td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 271px;" valign="bottom"><strong><span style="text-decoration: underline;">State &amp; Local Government</span></strong></td>
<td style="background-color: #cccccc; width: 102px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 46px;" valign="bottom"><a name="_ednref85"></a>[84] <a name="_ednref86"></a>[85]</td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Grants</td>
<td style="width: 102px;" valign="bottom">
<p align="right">757,199</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$190,000,000</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Contract Procurements</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,439,031</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$310,000,000</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Competitively-awarded Grants</td>
<td style="width: 102px;" valign="bottom">
<p align="right">190,512</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$42,750,512</p>
</td>
<td style="width: 46px;" valign="bottom"><a name="_ednref87"></a>[86]</td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Competitively-awarded Procurements</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,132,551</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$219,545,972</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Competitive Opportunities</td>
<td style="width: 102px;" valign="bottom">
<p align="right">1,323,063</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$262,296,485</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Ave Competitive Opportunity</td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom">
<p align="right">$198</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 271px;" valign="bottom"><strong><span style="text-decoration: underline;">Total (no B-to-B)</span></strong></td>
<td style="background-color: #cccccc; width: 102px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Competitively-awarded Grants</td>
<td style="width: 102px;" valign="bottom">
<p align="right">526,603</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$141,985,169</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Competitively-awarded Procurements</td>
<td style="width: 102px;" valign="bottom">
<p align="right">2,041,638</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$451,424,108</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Total Competitive Opportunities</td>
<td style="width: 102px;" valign="bottom">
<p align="right">2,568,241</p>
</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$593,409,277</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 271px;" valign="bottom">Ave Competitive Opportunity</td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom">
<p align="right">$231</p>
</td>
<td style="width: 46px;" valign="bottom"></td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 14. Federal, State &amp; Local Contract and Grant Opportunities, 2002</p>
<p>This analysis suggests there is nearly $600 billion available each year for competitively awarded grants and procurements from all levels of government within the U.S., about 56% of it from the federal sector. The average competitive award is about $270,000 for grants and about $220,000 for contract procurements.</p>
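<p>The totals and averages quoted from Table 14 follow from simple sums and ratios. The sketch below, offered as an illustration rather than the author's method, recomputes them in Python from the competitively-awarded rows; the data layout and the function names (<code>totals</code>, <code>average_award_k</code>, <code>federal_share</code>) are hypothetical.</p>

```python
# (number of awards, amount in $000) copied from the
# competitively-awarded rows of Table 14.
COMPETITIVE = {
    "federal": {
        "grants": (336091, 99234657),
        "procurements": (909087, 231878136),
    },
    "state_local": {
        "grants": (190512, 42750512),
        "procurements": (1132551, 219545972),
    },
}


def totals(kind):
    """Sum awards and amounts ($000) for one kind across government levels."""
    awards = sum(level[kind][0] for level in COMPETITIVE.values())
    amount_k = sum(level[kind][1] for level in COMPETITIVE.values())
    return awards, amount_k


def average_award_k(awards, amount_k):
    """Average award size in $000."""
    return amount_k / awards


def federal_share():
    """Federal fraction of all competitively awarded dollars."""
    federal_k = sum(amount for _, amount in COMPETITIVE["federal"].values())
    all_k = totals("grants")[1] + totals("procurements")[1]
    return federal_k / all_k


for kind in ("grants", "procurements"):
    awards, amount_k = totals(kind)
    print(kind, awards, amount_k, round(average_award_k(awards, amount_k)))
print("federal share:", round(federal_share(), 3))
```

<p>Summing the rows directly can differ from the published totals by one unit, presumably because the table was built from unrounded figures.</p>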
<p>Aside from construction firms (which are excluded in this and prior analyses), there are on the order of 92,500 federal contract-seeking firms today.<a name="_ednref88"></a>[87] In 2003, the top 200 federal contracting firms accounted for nearly $190 billion in contract outlays.<a name="_ednref89"></a>[88] While it is unclear what proportion of these commitments were competitive (81% of total federal commitments) or based on all contract procurements (57% of total federal commitments), it is clear that more than 90,000 firms are competing via a classic power curve for a minor portion of available federal revenues. This power curve is shown in Figure 3 below for the 200 largest federal contractors, which obtain a proportionately high percentage of all contract dollars.</p>
<p><img class="center_ok" src="../wp-content/themes/ai3/images/DocValue/Figure3.gif" alt="Power curve distribution of Federal contractors" width="623" height="331" /></p>
<p style="text-align: center;">Figure 3. Power Curve Distribution of Top 200 Federal Contractors by Revenue, 2002</p>
<p>The combination of these factors enables an estimate of the bottom-line proposal impacts by firm. This information is shown in Table 15 below:</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 648px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 324px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom">
<p align="center"><strong>Number</strong></p>
</td>
<td style="background-color: #cccccc; width: 180px;" colspan="3" valign="bottom">
<p align="center"><strong>Amount ($000)</strong></p>
</td>
<td style="background-color: #cccccc; width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Total Competitive Awards</td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 116px;" colspan="2" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Federal</td>
<td style="width: 108px;" valign="bottom">
<p align="right">1,245,179</p>
</td>
<td style="width: 180px;" colspan="3" valign="bottom">
<p align="center">$331,112,793</p>
</td>
<td style="width: 36px;" valign="bottom"><a name="_ednref90"></a>[89]</td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">State &amp; Local</td>
<td style="width: 108px;" valign="bottom">
<p align="right">1,323,063</p>
</td>
<td style="width: 180px;" colspan="3" valign="bottom">
<p align="center">$262,296,485</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Number of Competing Firms</td>
<td style="width: 108px;" valign="bottom">
<p align="right">120,250</p>
</td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 116px;" colspan="2" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"><a name="_ednref91"></a>[90]</td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Number of Winning Firms</td>
<td style="width: 108px;" valign="bottom">
<p align="right">90,805</p>
</td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 116px;" colspan="2" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Number of Winning Proposals</td>
<td style="width: 108px;" valign="bottom">
<p align="right">2,326,485</p>
</td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 116px;" colspan="2" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Number of Submitted Proposals</td>
<td style="width: 108px;" valign="bottom">
<p align="right">11,211,974</p>
</td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 116px;" colspan="2" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 64px;" valign="bottom"></td>
<td style="width: 116px;" colspan="2" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 324px;" valign="bottom"><strong><span style="text-decoration: underline;">Direct Proposal Preparation Costs</span></strong></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 64px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 116px;" colspan="2" valign="bottom"></td>
<td style="background-color: #cccccc; width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Winning Proposal Preparation</td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 180px;" colspan="3" valign="bottom">
<p align="center">$5,021,357</p>
</td>
<td style="width: 36px;" valign="bottom"><a name="_ednref92"></a>[91]</td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Losing Proposals Preparation</td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 180px;" colspan="3" valign="bottom">
<p align="center">$16,939,516</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">TOTAL Proposal Preparation</td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 180px;" colspan="3" valign="bottom">
<p align="center">$21,960,873</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 90px;" colspan="2" valign="bottom"></td>
<td style="width: 90px;" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom">
<p align="center"><strong>Low</strong></p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="center"><strong>Med</strong></p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="center"><strong>High</strong></p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Improvement in RFP Development</td>
<td style="width: 108px;" valign="bottom">
<p align="right">7.5%</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">15.0%</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">35.0%</p>
</td>
<td style="width: 36px;" valign="bottom"><a name="_ednref93"></a>[92]</td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 90px;" colspan="2" valign="bottom"></td>
<td style="width: 90px;" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 324px;" valign="bottom"><strong><span style="text-decoration: underline;">Proposal Preparation</span></strong></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 90px;" colspan="2" valign="bottom"></td>
<td style="background-color: #cccccc; width: 90px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Benefits – Individual Submitters ($000)</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$14</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">$27</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">$64</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Benefits – All Submitters ($000)</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$1,647,065</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">$3,294,131</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">$7,686,305</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 90px;" colspan="2" valign="bottom"></td>
<td style="width: 90px;" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="background-color: #cccccc; width: 324px;" valign="bottom"><strong><span style="text-decoration: underline;">Proposal Success Benefits</span></strong></td>
<td style="background-color: #cccccc; width: 108px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 90px;" colspan="2" valign="bottom"></td>
<td style="background-color: #cccccc; width: 90px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Increase in Number of Winning Submissions</td>
<td style="width: 108px;" valign="bottom">
<p align="right">6,810</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">13,621</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">31,782</p>
</td>
<td style="width: 36px;" valign="bottom"><a name="_Ref90884783"></a><a name="_ednref94"></a>[93]</td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Increase in Number of Winning Firms</td>
<td style="width: 108px;" valign="bottom">
<p align="right">1,406</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">2,812</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">6,562</p>
</td>
<td style="width: 36px;" valign="bottom"><a name="_ednref95"></a>[94]</td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Benefits – Individual Submitters ($000)</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$1,235</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">$1,235</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">$1,235</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom">Benefits – All Submitters ($000)</td>
<td style="width: 108px;" valign="bottom">
<p align="right">$1,737,101</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">$3,474,203</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">$8,106,473</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"></td>
<td style="width: 108px;" valign="bottom"></td>
<td style="width: 90px;" colspan="2" valign="bottom"></td>
<td style="width: 90px;" valign="bottom"></td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 324px;" valign="bottom"><strong><span style="text-decoration: underline;">Benefits – All Submitters/All Aspects</span></strong></td>
<td style="width: 108px;" valign="bottom">
<p align="right">$3,384,167</p>
</td>
<td style="width: 90px;" colspan="2" valign="bottom">
<p align="right">$6,768,334</p>
</td>
<td style="width: 90px;" valign="bottom">
<p align="right">$15,792,778</p>
</td>
<td style="width: 36px;" valign="bottom"></td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 15. Combined Preparation Costs and Opportunity Costs for Proposals</p>
<p>Across all entities, the annual cost of preparing proposals in response to competitive solicitations from government agencies at all levels is on the order of $22 billion: $5 billion for winning firms and $17 billion for losing firms. Better access to missing information and better information, assuming no change in the underlying ideas or proposal-writing skills, suggests that proposal response costs could be reduced by more than $3 billion annually. Another $3 billion annually is available from winning more competitive awards. Benefits to firms that respond to competitive solicitations average about $1.2 million per competing firm.<a name="_ednref96"></a>[95]</p>
<p>The more significant benefit to individual firms from improved access to “missing” information and better information is an increased likelihood of winning a competitive award. Firms that embrace these practices are estimated to obtain a $1.2 million annual benefit. Given that many firms that have previously been losing awards have relatively low annual revenues, the percentage impact of improved proposal preparation information on the bottom line can be quite striking.</p>
<h2><a name="_Toc106767224"></a>‘Costs’ of Regulation and Regulatory Non-compliance</h2>
<p>A December 2001 small business poll by the National Federation of Independent Business (NFIB) gauged the impacts of the regulatory workload on firms. When asked “is government regulation a very serious, somewhat serious, not too serious, or not at all serious problem for your business,” nearly half, or 43.6 percent, answered “very serious” or “somewhat serious.” The respondents indicated the most serious regulatory problems were at the federal level (49%), state level (35%) or local level (13%) of government. The biggest single regulatory problem cited was extra paperwork, followed by difficulty understanding how to comply with regulations and dollars spent doing so.<a name="_ednref97"></a>[96] A later December 2003 NFIB survey indicates that the average cost per hour of complying with paperwork requirements was $48.72.<a name="_ednref98"></a>[97]</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 624px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 156px;" valign="bottom">
<p align="center"><strong>Type of Regulation</strong></p>
</td>
<td style="background-color: #cccccc; width: 96px;" valign="bottom">
<p align="center"><strong>All Firms</strong></p>
</td>
<td style="background-color: #cccccc; width: 120px;" valign="bottom">
<p align="center"><strong>&lt;20 Employees</strong></p>
</td>
<td style="background-color: #cccccc; width: 132px;" valign="bottom">
<p align="center"><strong>20-499 Employees</strong></p>
</td>
<td style="background-color: #cccccc; width: 120px;" valign="bottom">
<p align="center"><strong>500+ Employees</strong></p>
</td>
</tr>
<tr>
<td style="width: 156px;" valign="bottom">All Federal Regulations</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$5,107</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$7,544</p>
</td>
<td style="width: 132px;" valign="bottom">
<p align="right">$4,671</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$4,827</p>
</td>
</tr>
<tr>
<td style="width: 156px;" valign="bottom"></td>
<td style="width: 96px;" valign="bottom"></td>
<td style="width: 120px;" valign="bottom"></td>
<td style="width: 132px;" valign="bottom"></td>
<td style="width: 120px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 156px;" valign="bottom">Environmental</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$1,312</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$3,600</p>
</td>
<td style="width: 132px;" valign="bottom">
<p align="right">$1,269</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$776</p>
</td>
</tr>
<tr>
<td style="width: 156px;" valign="bottom">Economic</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$2,234</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$1,748</p>
</td>
<td style="width: 132px;" valign="bottom">
<p align="right">$1,782</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$2,688</p>
</td>
</tr>
<tr>
<td style="width: 156px;" valign="bottom">Workplace</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$843</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$897</p>
</td>
<td style="width: 132px;" valign="bottom">
<p align="right">$944</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$755</p>
</td>
</tr>
<tr>
<td style="width: 156px;" valign="bottom">Tax Compliance</td>
<td style="width: 96px;" valign="bottom">
<p align="right">$719</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$1,300</p>
</td>
<td style="width: 132px;" valign="bottom">
<p align="right">$676</p>
</td>
<td style="width: 120px;" valign="bottom">
<p align="right">$608</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 16. Per Employee Costs of Federal Regulation by Firm Size, 2002</p>
<p>According to a 2001 report, “The Impact of Regulatory Costs on Small Firms” by W. Mark Crain and Thomas D. Hopkins, the total costs of Federal regulations were estimated to be $843 billion in 2000, or 8 percent of the U.S. Gross Domestic Product. Of these costs, $497 billion fell on business and $346 billion fell on consumers or other governments. Table 16 shows how those impacts are estimated on a per-employee basis across a range of firm sizes.<a name="_ednref99"></a>[98]</p>
<p>As of September 30, 2002, federal agencies estimated there were about 8.2 billion “burden hours” of paperwork government-wide. Almost 95 percent of those hours stemmed from information collected primarily for the purpose of regulatory compliance.<a name="_ednref100"></a>[99]</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 492px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 192px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 156px;" valign="bottom">
<p align="center"><strong>Burden Hrs (million)</strong></p>
</td>
<td style="background-color: #cccccc; width: 144px;" valign="bottom">
<p align="center"><strong>Labor Costs ($M)</strong></p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom"><strong>Total Government</strong></td>
<td style="width: 156px;" valign="bottom">
<p align="right">8,223.17</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$318,237</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom"><strong>Total Gov (excl. Treasury)</strong></td>
<td style="width: 156px;" valign="bottom">
<p align="right">1,472.74</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$56,995</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom"></td>
<td style="width: 156px;" valign="bottom"></td>
<td style="width: 144px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Treasury</td>
<td style="width: 156px;" valign="bottom">
<p align="right">6,750.43</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$261,242</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Transportation</td>
<td style="width: 156px;" valign="bottom">
<p align="right">244.73</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$9,471</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">HHS</td>
<td style="width: 156px;" valign="bottom">
<p align="right">224.83</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$8,701</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Labor</td>
<td style="width: 156px;" valign="bottom">
<p align="right">189.22</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$7,323</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">EPA</td>
<td style="width: 156px;" valign="bottom">
<p align="right">140.47</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$5,436</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Defense</td>
<td style="width: 156px;" valign="bottom">
<p align="right">92.36</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$3,574</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Agriculture</td>
<td style="width: 156px;" valign="bottom">
<p align="right">88.59</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$3,428</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Justice</td>
<td style="width: 156px;" valign="bottom">
<p align="right">46.60</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$1,803</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Education</td>
<td style="width: 156px;" valign="bottom">
<p align="right">38.44</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$1,488</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">State</td>
<td style="width: 156px;" valign="bottom">
<p align="right">29.23</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$1,131</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">HUD</td>
<td style="width: 156px;" valign="bottom">
<p align="right">21.93</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$849</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Commerce</td>
<td style="width: 156px;" valign="bottom">
<p align="right">11.65</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$451</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Interior</td>
<td style="width: 156px;" valign="bottom">
<p align="right">7.66</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$296</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Energy</td>
<td style="width: 156px;" valign="bottom">
<p align="right">3.76</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$146</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom"></td>
<td style="width: 156px;" valign="bottom"></td>
<td style="width: 144px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">SEC</td>
<td style="width: 156px;" valign="bottom">
<p align="right">136.58</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$5,286</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">FTC</td>
<td style="width: 156px;" valign="bottom">
<p align="right">69.66</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$2,696</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">FCC</td>
<td style="width: 156px;" valign="bottom">
<p align="right">26.80</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$1,037</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">SSA</td>
<td style="width: 156px;" valign="bottom">
<p align="right">24.89</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$963</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">FAR (contracts)</td>
<td style="width: 156px;" valign="bottom">
<p align="right">24.49</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$948</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">FCIC</td>
<td style="width: 156px;" valign="bottom">
<p align="right">9.87</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$382</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">NRC</td>
<td style="width: 156px;" valign="bottom">
<p align="right">8.34</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$323</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">FEMA</td>
<td style="width: 156px;" valign="bottom">
<p align="right">7.77</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$301</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">Veterans Administration</td>
<td style="width: 156px;" valign="bottom">
<p align="right">7.31</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$283</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">NASA</td>
<td style="width: 156px;" valign="bottom">
<p align="right">5.95</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$230</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">NSF</td>
<td style="width: 156px;" valign="bottom">
<p align="right">4.46</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$173</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">FERC</td>
<td style="width: 156px;" valign="bottom">
<p align="right">4.38</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$170</p>
</td>
</tr>
<tr>
<td style="width: 192px;" valign="bottom">SBA</td>
<td style="width: 156px;" valign="bottom">
<p align="right">2.77</p>
</td>
<td style="width: 144px;" valign="bottom">
<p align="right">$107</p>
</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 17. Federal Government Paperwork Burdens, 2002<a name="_ednref101"></a>[100]</p>
<p>A December 2003 NFIB survey indicates that the average cost per hour of complying with paperwork requirements was $48.72.<a name="_ednref102"></a>[101] If this hourly rate is substituted for the labor rate underlying the table above, the total cost burden would be about $400 billion, of which about $71 billion falls outside Treasury and the IRS.</p>
<p>Despite legislation requiring federal paperwork reduction and the embrace of e-government initiatives, paperwork burdens continue to increase. Total burden hours in 2002, for example, increased by 600 million hours, or about 4 percent, from the previous year. The Code of Federal Regulations (CFR) continues to expand despite efforts to curtail further growth. The CFR grew from 71,000 pages in 1975 to 135,000 pages in 1998. Annually, there are more than 4,000 regulatory changes introduced by the federal government. The federal government now has over 8,000 separate information collection requests authorized by OMB.<a name="_ednref103"></a>[102]</p>
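<p>The substitution described above can be sketched in a few lines of Python. This is only an illustrative check, not the method behind the original table: the burden hours and labor costs are copied from Table 17, the $48.72 rate is the NFIB figure quoted in the text, and the function and variable names are hypothetical.</p>

```python
# Burden hours (millions) copied from Table 17.
BURDEN_HOURS_MILLION = {
    "total_government": 8223.17,
    "excluding_treasury": 1472.74,
}

# Labor costs ($ millions) as printed in Table 17.
TABLE_LABOR_COST_MILLION = {
    "total_government": 318237,
    "excluding_treasury": 56995,
}

# Average hourly paperwork compliance cost from the December 2003 NFIB survey.
NFIB_RATE_PER_HOUR = 48.72


def implied_hourly_rate(scope):
    """Hourly rate implied by Table 17: labor cost divided by burden hours."""
    return TABLE_LABOR_COST_MILLION[scope] / BURDEN_HOURS_MILLION[scope]


def cost_in_billions(scope, rate_per_hour=NFIB_RATE_PER_HOUR):
    """Million hours times dollars per hour gives $ millions; divide by 1,000 for $ billions."""
    return BURDEN_HOURS_MILLION[scope] * rate_per_hour / 1000


for scope in BURDEN_HOURS_MILLION:
    print(scope, round(implied_hourly_rate(scope), 2), round(cost_in_billions(scope), 1))
```

<p>Under these assumptions, the labor costs in Table 17 imply a rate of about $38.70 per hour for both rows. Substituting $48.72 gives roughly $400.6 billion government-wide and $71.8 billion excluding Treasury, in line with the rounded figures quoted in the text.</p>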
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 546px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 415px;">
<p align="center"><strong>Federal Source</strong></p>
</td>
<td style="background-color: #cccccc; width: 102px;">
<p align="center"><strong>Fines ($ 000)</strong></p>
</td>
<td style="background-color: #cccccc; width: 29px;"></td>
</tr>
<tr>
<td style="width: 415px;">Internal Revenue Service</td>
<td style="width: 102px;">
<p align="right">$4,119,622</p>
</td>
<td style="width: 29px;">
<p align="center"><a name="_ednref104"></a>[103]</p>
</td>
</tr>
<tr>
<td style="width: 415px;">Corporate Income</td>
<td style="width: 102px;">
<p align="right">$1,120,531</p>
</td>
<td style="width: 29px;"></td>
</tr>
<tr>
<td style="width: 415px;">Employment Taxes</td>
<td style="width: 102px;">
<p align="right">$2,691,021</p>
</td>
<td style="width: 29px;"></td>
</tr>
<tr>
<td style="width: 415px;">Excise Taxes</td>
<td style="width: 102px;">
<p align="right">$200,585</p>
</td>
<td style="width: 29px;"></td>
</tr>
<tr>
<td style="width: 415px;">Other Taxes</td>
<td style="width: 102px;">
<p align="right">$107,486</p>
</td>
<td style="width: 29px;"></td>
</tr>
<tr>
<td style="width: 415px;"></td>
<td style="width: 102px;"></td>
<td style="width: 29px;" valign="bottom"><a name="_ednref105"></a>[104]</td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Agriculture</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$2,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Economic Stabilization</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$9,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Labor &amp; Immigration</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$72,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Commerce &amp; Customs (excl SEC)</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$22,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">SEC</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$101,000</p>
</td>
<td style="width: 29px;" valign="bottom">
<p align="right"><a name="_ednref106"></a>[105]</p>
</td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Narcotics &amp; Alcohol</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$2,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Mine Safety</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$18,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Environmental Protection</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$212,000</p>
</td>
<td style="width: 29px;" valign="bottom">
<p align="right"><a name="_ednref107"></a>[106]</p>
</td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Miscellaneous</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$1,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">Other</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$448,000</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 415px;" valign="bottom">TOTAL</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$5,006,622</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 18. Federal Fines and Penalties to Corporations, 2002</p>
<p>Another source of costs to enterprises is civil penalties and fines for non-compliance with existing regulations, shown by agency for 2002 in the table above. U.S. businesses pay a total of about $5 billion annually in civil penalties for non-compliance with federal regulations, about $1 billion of which is for non-tax matters.</p>
<p>However, these estimates may undercount actual fines and penalties levied by the federal government because of the accounting basis of the OMB source. For example, the Department of Labor (DOL) collected fines and penalties totaling $175 million from employers in fiscal year 2002 for Fair Labor Standards Act (FLSA) violations.<a name="_ednref108"></a>[107] According to a 2002 report, 43 of the government’s top contractors have paid approximately $3.4 billion in fines, penalties, restitution, and settlements since 1990.<a name="_ednref109"></a>[108] And, according to another report, the corporations liable in the top 100 False Claims Act cases have paid more than $12 billion since 1986.<a name="_ednref110"></a>[109] Since there is no central clearinghouse for this information, with both individual agency general counsels and the Department of Justice responsible for actual collections, the figures in Table 18 should be interpreted as estimates.</p>
<p>Table 19 consolidates the information in Tables 16 through 18 to estimate the overall regulatory and paperwork burdens on U.S. businesses, plus estimates of the benefits to be gained from better document access and use.</p>
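<p>The split between tax and non-tax penalties in Table 18 can be checked with a short Python sketch. The figures are copied from the table (in $000); the variable names are hypothetical and chosen only for illustration.</p>

```python
# Internal Revenue Service subtotal from Table 18, in $000, as printed.
IRS_FINES_THOUSANDS = 4_119_622

# Non-tax federal fines and penalties from Table 18, in $000.
NON_TAX_FINES_THOUSANDS = {
    "Agriculture": 2_000,
    "Economic Stabilization": 9_000,
    "Labor and Immigration": 72_000,
    "Commerce and Customs (excl SEC)": 22_000,
    "SEC": 101_000,
    "Narcotics and Alcohol": 2_000,
    "Mine Safety": 18_000,
    "Environmental Protection": 212_000,
    "Miscellaneous": 1_000,
    "Other": 448_000,
}

non_tax_total = sum(NON_TAX_FINES_THOUSANDS.values())
grand_total = IRS_FINES_THOUSANDS + non_tax_total
non_tax_share_percent = 100 * non_tax_total / grand_total

print(f"Non-tax fines: ${non_tax_total:,}K")
print(f"All fines:     ${grand_total:,}K")
print(f"Non-tax share: {non_tax_share_percent:.1f}%")
```

<p>The non-tax rows sum to $887 million, which is the basis for the rounded “$1 billion” non-tax figure, and adding the IRS subtotal reproduces the $5,006,622 thousand total in the table.</p>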
<h2><a name="_Toc106767225"></a>‘Cost’ of an Unauthorized Posted Document</h2>
<p>Unauthorized information disclosures derive mainly from within an organization. The ease of electronic record duplication and dissemination  – particularly through postings on enterprise Web sites  – increases a firm’s vulnerability to this problem. Records mutate and propagate in poorly controlled environments. On average, unauthorized disclosure of confidential information costs Fortune 1000 companies about $15 million per company per year.<a name="_ednref111"></a>[110]</p>
<p>A few privacy laws demonstrate the potential liabilities associated with disclosure of confidential information due to inadvertent mistakes or disgruntled employees. As one example, the Health Insurance Portability and Accountability Act (HIPAA) of 1996 sets security standards protecting the confidentiality and integrity of “individually identifiable health information,” past, present or future. Failure to comply with any of the electronic data, security, or privacy standards can result in civil monetary penalties up to $25,000 per standard per year. Violation of the privacy regulations for commercial or malicious purposes can result in criminal penalties of $50,000 to $250,000 in fines and one to ten years of imprisonment.<a name="_ednref112"></a>[111]</p>
<table style="text-align: left; margin-left: auto; margin-right: auto; width: 641px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="background-color: #cccccc; width: 318px;" valign="bottom"></td>
<td style="background-color: #cccccc; width: 92px;" valign="bottom">
<p align="center"><strong> </strong></p>
</td>
<td style="background-color: #cccccc; width: 202px;" colspan="3" valign="bottom">
<p align="center"><strong>Amount ($000)</strong></p>
</td>
<td style="background-color: #cccccc; width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Total Federal Paperwork Burden (non-tax)</td>
<td style="width: 294px;" colspan="4" valign="bottom">
<p align="center">$56,995,038</p>
</td>
<td style="width: 29px;" valign="bottom"><a name="_ednref113"></a>[112]</td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Total Federal Other Regulatory Burden</td>
<td style="width: 294px;" colspan="4" valign="bottom">
<p align="center">$331,791,551</p>
</td>
<td style="width: 29px;" valign="bottom"><a name="_ednref114"></a>[113]</td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Total Federal Fines and Penalties</td>
<td style="width: 294px;" colspan="4" valign="bottom">
<p align="center">$5,006,622</p>
</td>
<td style="width: 29px;" valign="bottom"><a name="_ednref115"></a>[114]</td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 110px;" colspan="2" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Total State and Local Paperwork Burden (non-tax)</td>
<td style="width: 294px;" colspan="4" valign="bottom">
<p align="center">$32,059,709</p>
</td>
<td style="width: 29px;" valign="bottom"><a name="_ednref116"></a>[115]</td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Total State and Local Other Regulatory Burden</td>
<td style="width: 294px;" colspan="4" valign="bottom">
<p align="center">$186,632,748</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Total State and Local Fines and Penalties</td>
<td style="width: 294px;" colspan="4" valign="bottom">
<p align="center">$2,816,225</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom">
<p align="center"><strong>Low</strong></p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="center"><strong>Med</strong></p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="center"><strong>High</strong></p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Improvements Due to Better Information</td>
<td style="width: 92px;" valign="bottom">
<p align="right">7.5%</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">15.0%</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">35.0%</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"><strong><span style="text-decoration: underline;">Paperwork Burdens </span></strong><span style="text-decoration: underline;">(non-tax)</span></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits per Large Firm</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$1,957</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$3,915</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$9,134</p>
</td>
<td style="width: 29px;" valign="bottom"><a name="_ednref117"></a>[116]</td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits – All Firms</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$6,679,106</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$13,358,212</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$31,169,161</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"><strong><span style="text-decoration: underline;">Other Regulatory Burdens</span></strong></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits per Large Firm</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$11,394</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$22,788</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$53,172</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits – All Firms</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$38,881,822</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$77,763,645</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$181,448,505</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"><strong><span style="text-decoration: underline;">Reductions in Fines and Penalties</span></strong></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits per Large Firm</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$4,212</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$8,424</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$19,655</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits – All Firms</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$14,372,953</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$28,745,905</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$67,073,779</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom"><strong><span style="text-decoration: underline;">TOTAL – All Regulatory Burdens</span></strong></td>
<td style="width: 92px;" valign="bottom"></td>
<td style="width: 100px;" colspan="2" valign="bottom"></td>
<td style="width: 102px;" valign="bottom"></td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits per Large Firm</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$17,563</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$35,126</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$81,962</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
<tr>
<td style="width: 318px;" valign="bottom">Benefits – All Firms</td>
<td style="width: 92px;" valign="bottom">
<p align="right">$59,933,881</p>
</td>
<td style="width: 100px;" colspan="2" valign="bottom">
<p align="right">$119,867,762</p>
</td>
<td style="width: 102px;" valign="bottom">
<p align="right">$279,691,445</p>
</td>
<td style="width: 29px;" valign="bottom"></td>
</tr>
</tbody>
</table>
<p style="text-align: center;">Table 19. Regulatory Burden and Benefits to Firms from Improved Information</p>
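<p>The “Benefits – All Firms” rows for paperwork and other regulatory burdens in Table 19 appear to follow from a simple product: the combined federal plus state and local burden multiplied by the assumed improvement rate. The Python sketch below illustrates that reading. It is an inference from the table values rather than a documented method, and the names used are hypothetical. The fines and penalties rows are not reproduced by this simple product and presumably rest on inputs not shown in the table.</p>

```python
# Annual burdens in $000, as listed in the upper part of Table 19.
BURDENS_THOUSANDS = {
    "paperwork": {"federal": 56_995_038, "state_local": 32_059_709},
    "other_regulatory": {"federal": 331_791_551, "state_local": 186_632_748},
}

# Assumed improvement rates due to better information (Low / Med / High).
IMPROVEMENT_RATES = {"low": 0.075, "med": 0.15, "high": 0.35}


def benefits_all_firms(category):
    """Combined federal plus state/local burden times each improvement rate, in $000."""
    combined = sum(BURDENS_THOUSANDS[category].values())
    return {name: round(combined * rate) for name, rate in IMPROVEMENT_RATES.items()}


for category in BURDENS_THOUSANDS:
    print(category, benefits_all_firms(category))
```

<p>With the table inputs, the paperwork row comes out at $6,679,106, $13,358,212 and $31,169,161 (in $000), and the other regulatory row at $38,881,822, $77,763,645 and $181,448,505, matching the printed values.</p>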
<p>As another example, the Gramm-Leach-Bliley Act (GLBA) of 1999 requires the financial industry to create guidelines for safeguarding customer information. GLBA includes severe civil and criminal penalties for non-compliance: civil penalties of up to $100,000 for each violation, and fines on key officers of up to $10,000 per violation. Violations of the GLBA can also carry hefty sanctions, including termination of FDIC insurance and fines of up to $1,000,000 for an individual or one percent of the total assets of the financial institution.<a name="_ednref118"></a>[117]</p>
<p>Other major areas of unauthorized disclosure liability occur in national security, identity theft, and commerce, tax and Social Security information. Indeed, virtually every state and federal agency related to a company’s business has policies and fines regarding unauthorized disclosures. Monitoring these requirements is thus an imperative for enterprise management to prevent exposure to fines and loss of reputation.</p>
<p>On a less-quantifiable basis, there are also risks to the clarity of the enterprise message to customers, suppliers and partners. Unmanaged Web sprawl is a critical gap for enterprises that need to ensure compliance with privacy and confidentiality regulations and to present a clear, accurate message to stakeholders.</p>
<h1><a name="_Toc106767226"></a>V. CONCLUSIONS</h1>
<p>Prior to the analysis in this white paper, the state of understanding about the value of document assets had been abysmal. While still preliminary and subject to much improvement, this study has nonetheless found:</p>
<ul>
<li>The value of documents  –  in their creation, access and use  –  can indeed be measured</li>
<li>The information contained within U.S. enterprise documents represents about a third of gross domestic product, or an amount of about <em><span style="text-decoration: underline;">$3.3 trillion</span></em> annually</li>
<li>Some 25% of all of these expenditures lend themselves to actionable improvements</li>
<li>There are perhaps on the order of 10 billion documents created annually in the U.S.</li>
<li>Corporate data doubles every six to eight months; 85% of this data is contained in documents</li>
<li>Ninety to 97 percent of enterprises cannot estimate how much they spend on producing documents each year</li>
<li>Document creation is about 2-3 times more important  –  from an embedded cost standpoint  –  than document handling</li>
<li>It costs, on average, $350 to create a ‘typical’ document</li>
<li>The total potential benefit from practical improvements in document access and use to the U.S. economy is on the order of $800 billion annually, or about 8% of GDP</li>
<li>For the 1,000 largest U.S. firms, benefits from these improvements can approach $250 million annually per firm</li>
<li>About three-quarters of these benefits arise from <strong><em><span style="text-decoration: underline;">not</span></em></strong> re-creating the intellectual capital already invested in prior document creation</li>
<li>Another 25% of the benefits are due to reduced regulatory non-compliance or paperwork, or better competitiveness in obtaining solicited contracts and grants</li>
<li>$33 billion is wasted each year in re-finding previously found Web documents</li>
<li>Paperwork and regulatory improvements due to documents can save U.S. enterprises $120 billion each year</li>
<li>Lack of document access due to Web sprawl costs U.S. enterprises $22 billion each year</li>
<li>$8 billion in annual benefits is available due to document improvements for competitive governmental grant and contract solicitations</li>
<li>These figures likely severely underestimate the benefits to enterprises from improved competitiveness, a factor not analyzed in this study</li>
<li>Documents are now where structured data was 15 years ago, at the nascent emergence of the data warehousing market</li>
</ul>
<p style="text-align: left;">As noted throughout, there is a considerable need for additional research and data on document creation, use, costs and benefits. Additional technical endnotes are provided in the PDF version of the full paper.</p>
<p style="text-align: left;">
<hr style="border: 1px solid #cccccc; height: 1px; width: 33%; color: #ffffff;" size="1" noshade="noshade" />
<p style="text-align: left;"><a name="_edn1"></a><span style="font-size: x-small;">[1] All sources and assumptions are fully documented in footnotes in the main body of this white paper; general assumptions used in multiple tables are provided in the Technical Endnotes.</span></p>
<p><span style="font-size: x-small;"><a name="_edn2"></a>[2] As quoted by Armando Garcia, vice president of content management at IBM; see <a href="http://www.contentworld.com/conference/conthur.html">http://www.contentworld.com/conference/conthur.html</a></span></p>
<p><span style="font-size: x-small;"><a name="_edn3"></a> [3] Delphi Group, “Taxonomy &amp; Content Classification Market Milestone Report,” <em>Delphi Group White Paper</em>, 2002. See <a href="http://delphigroup.com/">http://delphigroup.com</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn4"></a> [4] Based on the 1999 to 2001 estimate changes in reference 34, Table 2-6.</span></p>
<p><span style="font-size: x-small;"><a name="_edn5"></a>[5] As initially published in Inc Magazine in 1993. Reference to this document may be found at: <a href="http://www.contingencyplanning.com/PastIssues/marapr2001/6.asp">http://www.contingencyplanning.com/PastIssues/marapr2001/6.asp</a></span></p>
<p><span style="font-size: x-small;"><a name="_edn6"></a>[6] J. Snowdon, <em>Documents </em> – <em> The Lifeblood of Your Business?</em>, October 2003, 12 pp. The white paper may be found at: <a href="http://www.mdy.com/News&amp;Events/Newsletter/IDCDocMgmt.pdf">http://www.mdy.com/News&amp;Events/Newsletter/IDCDocMgmt.pdf</a></span></p>
<p><span style="font-size: x-small;"><a name="_edn7"></a>[7] Xerox Global Services, <em>Documents – An Opportunity for Cost Control and Business Transformation</em>, 28 pp., 2003. The findings may be found at: <a href="http://www.sap.com/solutions/srm/pdf/CCS_Xerox.pdf">http://www.sap.com/solutions/srm/pdf/CCS_Xerox.pdf</a></span></p>
<p><span style="font-size: x-small;"><a name="_edn8"></a>[8] A.T. Kearney, <em>Network Publishing: Creating Value Through Digital Content</em>, A.T. Kearney White Paper, April 2001, 32 pp. See <a href="http://www.adobe.com/aboutadobe/pressroom/pressmaterials/networkpublishing/pdfs/netpubwh.pdf">http://www.adobe.com/aboutadobe/pressroom/pressmaterials/networkpublishing/pdfs/netpubwh.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn9"></a>[9] S.A. Mohrman and D.L. Finegold, <em>Strategies for the Knowledge Economy: From Rhetoric to Reality</em>, University of Southern California study as supported by Korn/Ferry International, January 2000, 43 pp. See <a href="http://www.marshall.usc.edu/ceo/Books/pdf/knowledge_economy.pdf">http://www.marshall.usc.edu/ceo/Books/pdf/knowledge_economy.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn10"></a>[10] C. Moore, <em>The Content Integration Imperative</em>, Forrester Research Trends Report, March 26, 2004, 14 pp.</span></p>
<p><span style="font-size: x-small;"><a name="_edn11"></a>[11] D. Vesset, <em>Worldwide Business Intelligence Forecast and Analysis, 2003-2007</em>, International Data Corporation, June 2003, 18 pp. See <a href="http://www.dwway.com/file/20030708085453_IDC_WW-BIFORECASTANDANALYSIS2003-07_JUN03.pdf">http://www.dwway.com/file/20030708085453_IDC_WW-BIFORECASTANDANALYSIS2003-07_JUN03.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn12"></a>[12] M. Stonebraker and J. Hellerstein, “Content Integration for E-Business,” in <em>ACM SIGMOD Proceedings</em>, Santa Barbara, CA, pp. 552-560, May 2001.</span></p>
<p><span style="font-size: x-small;"><a name="_edn13"></a>[13] P. Lyman and H. Varian, “How Much Information, 2003,” retrieved from <a href="http://www.sims.berkeley.edu/how-much-info-2003">http://www.sims.berkeley.edu/how-much-info-2003</a> on December 1, 2003.</span></p>
<p><span style="font-size: x-small;"><a name="_edn14"></a>[14] U.S. Department of Commerce, Digital Economy 2003, Economic Statistics Administration, U.S. Dept. of Commerce, Washington, D.C., April 2004, 155 pp. See <a href="http://www.esa.doc.gov/DigitalEconomy2003.cfm">http://www.esa.doc.gov/DigitalEconomy2003.cfm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn15"></a>[15] U.S. Department of Labor, “Occupation Employment and Wages, 2002,” Bureau of Labor Statistics. See <a href="http://www.bls.gov/news.release/archives/ocwage_11192003.pdf">http://www.bls.gov/news.release/archives/ocwage_11192003.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn16"></a>[16] U.S. Census Bureau, “Statistics of U.S. Businesses 2001.” See <a href="http://www.census.gov/epcd/susb/2001/us/US--.htm">http://www.census.gov/epcd/susb/2001/us/US--.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn17"></a>[17] Total office document counts were obtained on a page basis from reference 13, which used a value of 2% for the share of documents that deserve to be archived. This formed the ‘low’ case, with the ‘high’ case using a 5% estimate (lower still than the ENST 10% estimate cited in reference 13). Total pages were converted to numbers of documents on an average basis of 8 pp per document; see Technical Endnotes for further discussion.</span></p>
<p><span style="font-size: x-small;"><a name="_edn18"></a>[18] See Technical Endnotes for the derivation of knowledge worker estimates.</span></p>
<p><span style="font-size: x-small;"><a name="_edn19"></a>[19] See Technical Endnotes for the derivation of content worker estimates.</span></p>
<p><span style="font-size: x-small;"><a name="_edn20"></a>[20] Citation sources and assumptions for this analysis are presented in the BrightPlanet white paper, “A Cure to IT Indigestion: Deep Content Federation,” BrightPlanet Corporation White Paper, June 2004, 31 pp.</span></p>
<p><span style="font-size: x-small;"><a name="_edn21"></a>[21] The “bottom up” cases are built from the number of assumed knowledge workers in Table 3. The “low” and “high” variants are based on a 5% archival value or 350 annual documents created per worker, respectively, applied to worker staff costs associated with document creation. The “Coopers &amp; Lybrand” case is a strict updating of that study to 2002. The other two “C&amp;L” cases use the updated per document costs from the C&amp;L study; the first variant uses the annual documents created from the UC Berkeley study without archiving; the second variant uses the average of the “low” and “high” document numbers. See further Technical Endnotes for other key assumptions.</span></p>
<p><span style="font-size: x-small;"><a name="_edn22"></a>[22] The individual values in Table 5 range from about $140 to $740 per document, with the update of the Coopers &amp; Lybrand study being about $270. Separate Delphi analysis by BrightPlanet has shown median values of about $550 per document.</span></p>
<p><span style="font-size: x-small;"><a name="_edn23"></a>[23] See <a href="http://www.eds.com/services_offerings/ibill_openbill_b2b.shtml">http://www.eds.com/services_offerings/ibill_openbill_b2b.shtml</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn24"></a>[24] See <a href="http://www.hsh.com/cfee-sample.html">http://www.hsh.com/cfee-sample.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn25"></a>[25] See <a href="http://www.atp.nist.gov/eao/applicants/section9.htm">http://www.atp.nist.gov/eao/applicants/section9.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn26"></a>[26] As initially published in Inc Magazine in 1993. Reference to this document may be found at: <a href="http://www.contingencyplanning.com/PastIssues/marapr2001/6.asp">http://www.contingencyplanning.com/PastIssues/marapr2001/6.asp</a></span></p>
<p><span style="font-size: x-small;"><a name="_edn27"></a>[27] Xerox Global Services, Documents – An Opportunity for Cost Control and Business Transformation, 28 pp., 2003. The findings may be found at: <a href="http://www.sap.com/solutions/srm/pdf/CCS_Xerox.pdf">http://www.sap.com/solutions/srm/pdf/CCS_Xerox.pdf</a> and J. Snowdon, Documents  –  The Lifeblood of Your Business?, October 2003, 12 pp. The white paper may be found at: <a href="http://www.mdy.com/News&amp;Events/Newsletter/IDCDocMgmt.pdf">http://www.mdy.com/News&amp;Events/Newsletter/IDCDocMgmt.pdf</a></span></p>
<p><span style="font-size: x-small;"><a name="_edn28"></a> [28] Optika Corporation. See <a href="http://www.optika.com/ROI/calculator/ROI_roiresults.cfm">http://www.optika.com/ROI/calculator/ROI_roiresults.cfm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn29"></a>[29] Cap Ventures information, as cited in ZyLAB Technologies B.V., “Know the Cost of Filing Your Paper Documents,” Zylab White Paper, 2001. See <a href="http://www.zylab.com/downloads/whitepapers/PDF/21%20-%20Know%20the%20cost%20of%20filing%20your%20paper%20documents.pdf">http://www.zylab.com/downloads/whitepapers/PDF/21%20-%20Know%20the%20cost%20of%20filing%20your%20paper%20documents.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn30"></a>[30] ALL Associates Group, Inc., EDAM Sector Summary, April 2003, 2 pp.</span></p>
<p><span style="font-size: x-small;"><a name="_edn31"></a>[31] ALL Associates Group, 2002 EDAM Metrics for Major U.S. Companies.</span></p>
<p><span style="font-size: x-small;"><a name="_edn32"></a>[32] By the second Q 2004, this amount was $11.6 trillion. U.S. Federal Reserve Board, Flow of Funds Accounts for the United States, Sept. 16, 2004. See <a href="http://www.federalreserve.gov/releases/Z1/current/accessible/f6.htm">http://www.federalreserve.gov/releases/Z1/current/accessible/f6.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn33"></a>[33] The bases for this table have the following assumptions: 1) the three cases for document handling are based on 5%, 10% and 15% of total enterprise revenues, per the earlier section; 2) the three cases for document creation are based on the ‘C&amp;L Bottom-Up’, ‘Bottom-up  – High,’ and ‘Coopers &amp; Lybrand’ items for the Low, Medium, and High columns, respectively, in Table 5; and 3) the document misfiling case draws on the same basis but using the total document estimates and misfiled percentages of 5%, 7.5% and 9% consistent with the previous discussion section. See further the Technical Endnotes.</span></p>
<p><span style="font-size: x-small;"><a name="_edn34"></a>[34] P. Lyman and H. Varian, “How Much Information, 2003,” retrieved from <a href="http://www.sims.berkeley.edu/how-much-info-2003">http://www.sims.berkeley.edu/how-much-info-2003</a> on December 1, 2003.</span></p>
<p><span style="font-size: x-small;"><a name="_edn35"></a>[35] Cap Ventures information, as cited in ZyLAB Technologies B.V., “Know the Cost of Filing Your Paper Documents,” Zylab White Paper, 2001. See <a href="http://www.zylab.com/downloads/whitepapers/PDF/21%20-%20Know%20the%20cost%20of%20filing%20your%20paper%20documents.pdf">http://www.zylab.com/downloads/whitepapers/PDF/21%20-%20Know%20the%20cost%20of%20filing%20your%20paper%20documents.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn36"></a>[36] As reported in <a href="http://www.hoovers.com/company/archive/detail/0,2049,7_2322,00.html">http://www.hoovers.com/company/archive/detail/0,2049,7_2322,00.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn37"></a>[37] See <a href="http://www.veronissuhler.com/businfo/segment.html">http://www.veronissuhler.com/businfo/segment.html</a>, August 2, 2000.</span></p>
<p><span style="font-size: x-small;"><a name="_edn38"></a>[38] See <a href="http://www.outsellinc.com/docs/pr_release/pr20000602_01.htm">http://www.outsellinc.com/docs/pr_release/pr20000602_01.htm</a>, June 2, 2000.</span></p>
<p><span style="font-size: x-small;"><a name="_edn39"></a>[39] See <a href="http://www.outsellinc.com/docs/pr_release/pr20000629_01.htm">http://www.outsellinc.com/docs/pr_release/pr20000629_01.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn40"></a>[40] M.K. Bergman, “The Deep Web: Surfacing Hidden Value,” BrightPlanet Corporation White Paper, June 2000. The most recent version of the study was published by the University of Michigan’s Journal of Electronic Publishing in July 2001. See <a href="http://www.press.umich.edu/jep/07-01/bergman.html">http://www.press.umich.edu/jep/07-01/bergman.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn41"></a>[41] This analysis assumes there were 1 million documents on the Web as of mid-1994.</span></p>
<p><span style="font-size: x-small;"><a name="_edn42"></a>[42] See, for example, C. Sherman and G. Price, The Invisible Web, Information Today, Inc., Medford, NJ, 2001, 439 pp., and P. Pedley, The Invisible Web: Searching the Hidden Parts of the Internet, Aslib-IMI, London, 2001, 138 pp.</span></p>
<p><span style="font-size: x-small;">[43] iProspect Corporation, iProspect Search Engine User Attitudes, April/May 2004, 28 pp. See <a href="http://www.iprospect.com/premiumPDFs/iProspectSurveyComplete.pdf">http://www.iprospect.com/premiumPDFs/iProspectSurveyComplete.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn45"></a>[44] As reported at <a href="http://www.nua.ie/surveys/index.cgi?f=VS&amp;art_id=905358569&amp;rel=true">http://www.nua.ie/surveys/index.cgi?f=VS&amp;art_id=905358569&amp;rel=true</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn46"></a>[45] Delphi Group, “Taxonomy &amp; Content Classification Market Milestone Report,” Delphi Group White Paper, 2002. See <a href="http://delphigroup.com/">http://delphigroup.com</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn47"></a>[46] C. Sherman and S. Feldman, “The High Cost of Not Finding Information,” International Data Corporation Report #29127, 11 pp., April 2003.</span></p>
<p><span style="font-size: x-small;"><a name="_edn48"></a>[47] M.E.D. Koenig, “Time Saved  –  a Misleading Justification for KM,” KMWorld Magazine, Vol 11, Issue 5, May 2002. See <a href="http://www.kmworld.com/publications/magazine/index.cfm">http://www.kmworld.com/publications/magazine/index.cfm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn49"></a>[48] G. Xu, A. Cockburn and B. McKenzie, Lost on the Web: An Introduction to Web Navigation Research, <a href="http://www.cosc.canterbury.ac.nzq/ACMchapterq/NZCSPGq/papers">http://www.cosc.canterbury.ac.nzq/ACMchapterq/NZCSPGq/papers</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn50"></a>[49] A. Cockburn and B. McKenzie, What Do Web Users Do? An Empirical Analysis of Web Use, 2000. See <a href="http://citeseer.ist.psu.edu/cockburn00what.html">http://citeseer.ist.psu.edu/cockburn00what.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn51"></a>[50] Tenth edition of GVU’s (graphics, visualization and usability) WWW User Survey, May 14, 1999. See <a href="http://www.gvu.gatech.edu/user_surveys/survey-1998-10/tenthreport.html">http://www.gvu.gatech.edu/user_surveys/survey-1998-10/tenthreport.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn52"></a>[51] C. Alvarado, J. Teevan, M. S. Ackerman and D. Karger, “Surviving the Information Explosion: How People Find Their Electronic Information,” AI Memo 2003-06, April 2003, 11 pp., Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory. See <a href="ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-006.pdf">ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-006.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn53"></a>[52] W. Jones, H. Bruce and S. Dumais, “Keeping Found Things Found on the Web,” See <a href="http://washington.edu/KFTF_Web.pdf">http://washington.edu/KFTF_Web.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn54"></a>[53] J. Teevan, “How People Re-find Information When the Web Changes,” AI Memo 2004-014, June 2004, 10 pp., Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory. See <a href="ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-012.pdf">ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-012.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn55"></a>[54] Library of Congress, “Preserving Our Digital Heritage: Plan for the National Digital Information Infrastructure and Preservation Program”, a Report to Congress by the U.S. Library of Congress, 2002, 66 pp. See <a href="http://www.digitalpreservation.gov/ndiipp/">http://www.digitalpreservation.gov/ndiipp/</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn56"></a>[55] Consistent with Table 8; this analysis also assumes the 25% search time commitment by employee and previous values from earlier tables.</span></p>
<p><span style="font-size: x-small;"><a name="_edn57"></a>[56] All subsequent references to ‘Large’ firms are based on the last column in Table 2, namely the 930 U.S. firms with more than 10,000 employees.</span></p>
<p><span style="font-size: x-small;"><a name="_edn58"></a>[57] Delphi Group, “Taxonomy &amp; Content Classification Market Milestone Report,” Delphi Group White Paper, 2002. See <a href="http://delphigroup.com/">http://delphigroup.com</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn59"></a>[58] S. Stearns, “Realize the Value Locked in Your Content Silos Without Breaking the Bank: Automated Classification Tools to Improve Information Discovery,” Inmagic White Paper, version 1.0, 2004. 10 pp. See <a href="http://www.inmagic.com/">http://www.inmagic.com</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn60"></a>[59] P. Sonderegger, “Weave Search into the Browsing Experience,” Forrester Quick Take, Forrester Research, Inc., Feb. 18, 2004. 2 pp.</span></p>
<p><span style="font-size: x-small;"><a name="_edn61"></a> [60] P. Russom, “An Eye for the Needle,” Intelligent Enterprise, January 14, 2002. See <a href="http://www.iemagazine.com/020114/502feat2_1">http://www.iemagazine.com/020114/502feat2_1</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn62"></a>[61] This average was estimated by interpolating figures shown on Figure 8 in reference 68.</span></p>
<p><span style="font-size: x-small;"><a name="_edn63"></a>[62] This average was estimated by interpolating figures shown on the p.14 figure in Plumtree Corporation, “The Corporate Portal Market in 2002,” Plumtree Corp. White Paper, 27 pp. See <a href="http://www.plumtree.com/pdf/Corporate_Portal_Survey_White_Paper_February2002.pdf">http://www.plumtree.com/pdf/Corporate_Portal_Survey_White_Paper_February2002.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn64"></a>[63] The ‘low’ case represents the archival value in the middle bars with the addition that 30% of internal documents generated in the current year have a value to be shared for one year; the ‘high’ case represents the related archival value in the middle bars but with 40% of documents generated in that year having a value to be shared for one year.</span></p>
<p><span style="font-size: x-small;"><a name="_edn65"></a>[64] Analysis based on reference 68, with interpolations from Figure 16.</span></p>
<p><span style="font-size: x-small;"><a name="_edn66"></a>[65] M. Corcoran, “When Worlds Collide: Who Really Owns the Content,” AIIM Conference, New York, NY, March 10, 2004. See <a href="http://show.aiimexpo.com/convdata/aiim2003/brochures/64CorcoranMary.pdf">http://show.aiimexpo.com/convdata/aiim2003/brochures/64CorcoranMary.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn67"></a>[66] C. Phillips, “Stemming the Software Spending Spree,” Optimize Magazine, April 2002, Issue 6. See <a href="http://www.optimizemag.com/article/showArticle.jhtml?articleId=17700698&amp;pgno=1">http://www.optimizemag.com/article/showArticle.jhtml?articleId=17700698&amp;pgno=1</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn68"></a>[67] C. Moore, “The Content Integration Imperative,” Forrester Research, Inc., March 26, 2004, 14 pp.</span></p>
<p><span style="font-size: x-small;"><a name="_edn69"></a> [68] Plumtree Corporation, “The Corporate Portal Market in 2003,” Plumtree Corp. White Paper, 30 pp. See <a href="http://www.plumtree.com/portalmarket2003/default.asp">http://www.plumtree.com/portalmarket2003/default.asp</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn70"></a> [69] BEA Corporation, “Enterprise Portal Rationalization,” BEA Technical White Paper, 23 pp., 2004. See <a href="http://www.bea.com/content/news_events/white_papers/BEA_epr_wp.pdf">http://www.bea.com/content/news_events/white_papers/BEA_epr_wp.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn71"></a>[70] A. Aneja, C. Rowan and B. Brooksby, “Corporate Portal Framework for Transforming Content Chaos on Intranets,” Intel Technology Journal Q1, 2000. See <a href="http://developer.intel.com/technology/itj/q12000/pdf/portal.pdf">http://developer.intel.com/technology/itj/q12000/pdf/portal.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn72"></a> [71] J. Smeaton, “IBM’s Own Intranet: Saving Big Blue Millions,” Intranet Journal, Sept. 25, 2002. See <a href="http://www.intranetjournal.com/articles/200209/ij_09_25_02a.html">http://www.intranetjournal.com/articles/200209/ij_09_25_02a.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn73"></a> [72] See <a href="http://www.wookieweb.com/Intranet/">http://www.wookieweb.com/Intranet/</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn74"></a> [73] D. Voth, “Why Enterprise Portals are the Next Big Thing,” LTI Magazine, October 1, 2002. See <a href="http://www.ltimagazine.com/ltimagazine/article/articleDetail.jsp?id=36877">http://www.ltimagazine.com/ltimagazine/article/articleDetail.jsp?id=36877</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn75"></a> [74] A. Nyberg, “Is Everybody Happy?” CFO Magazine, November 01, 2002. See <a href="http://www.cfo.com/article/1%2C5309%2C8062%2C00.html">http://www.cfo.com/article/1%2C5309%2C8062%2C00.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn76"></a> [75] See <a href="http://www.proudfoot-plc.com/pdf_20004-USPR1002Avayaweb.asp">http://www.proudfoot-plc.com/pdf_20004-USPR1002Avayaweb.asp</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn77"></a> [76] Wall Street Journal, May 4, 2004, p. B1.</span></p>
<p><span style="font-size: x-small;"><a name="_edn78"></a> [77] pers. comm., Jonathon Houk, Director of DHS IIAP Program, November 2003.</span></p>
<p><span style="font-size: x-small;"><a name="_edn79"></a>[78] These figures are based on Table 12 and the GDP figures from reference 32. Note, the analysis in this section also ignores business-to-business opportunities, which are also likely significant.</span></p>
<p><span style="font-size: x-small;"><a name="_edn80"></a>[79] Total grant and procurement amounts are derived from the U.S. Census Bureau, Consolidated Federal Funds Report (CFFR). See <a href="http://harvester.census.gov/cffr/asp/Reports.asp">http://harvester.census.gov/cffr/asp/Reports.asp</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn81"></a>[80] The number of awards and an analysis of which line items are competitively awarded was derived from the U.S. Census Bureau, Federal Assistance Award Data System (FAADS). See <a href="http://www.census.gov/govs/faads/021sumus.htm">http://www.census.gov/govs/faads/021sumus.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn82"></a>[81] Specific categories of grants were analyzed based on the U.S. General Services Administration’s Catalog of Federal Domestic Assistance (CFDA) definitions to determine degree of competitiveness; see <a href="http://12.46.245.173/cfda/cfda.html">http://12.46.245.173/cfda/cfda.html</a>. Figures from the U.S. Department of Health and Human Services, Grant.gov Clearinghouse (see <a href="http://www.grants.gov/">http://www.grants.gov/</a>) suggest that $350 billion in federal grants is available, but many of the specific grant opportunities are geared to state governments or individuals. That is why the figures shown indicate only $100 billion in competitive opportunities available directly to enterprises.</span></p>
<p><span style="font-size: x-small;"><a name="_edn83"></a>[82] U.S. General Services Administration, Federal Procurement Data System  –  NG (FY 2003 data); see <a href="http://www.fpdc.gov/fpdc/FPR2003a.pdf">http://www.fpdc.gov/fpdc/FPR2003a.pdf</a> and <a href="http://www.fpdc.gov/fpdc/FPR2003c.pdf">http://www.fpdc.gov/fpdc/FPR2003c.pdf</a>. These sources are also the reference for the number of actions or successful awards. Due to discrepancies, these amounts were adjusted to conform with the totals in reference 79.</span></p>
<p><span style="font-size: x-small;"><a name="_edn84"></a>[83] Average competitive opportunities are derived by dividing the total award amount by category by the number of awards for that category.</span></p>
<p><span style="font-size: x-small;"><a name="_edn85"></a>[84] See <a href="http://www.gcswin.com/opportunities/opp2.htm">http://www.gcswin.com/opportunities/opp2.htm</a>. This is the only summary reference for state and local information found. Splits between grants and contract procurements were adjusted based on the assumption that contract amounts differed at the non-federal level. Thus, while the split for grant-contract procurements in the federal sector is about 58%-42%, it is assumed to be 38%-62% at the state and local level.</span></p>
<p><span style="font-size: x-small;"><a name="_edn86"></a>[85] There may also be some double counting of state amounts due to transfers from the federal government. For example, in 2002, $360,534 million in direct transfers was made to states and localities from the federal government. U.S. Census Bureau, State and Local Government Finances by Level of Government and by State: 2001  – 02. See <a href="http://www.census.gov/govs/estimate/0200ussl_1.html">http://www.census.gov/govs/estimate/0200ussl_1.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn87"></a>[86] This analysis assumes that individual grant and contract awards are 80% of the amount shown at the federal level.</span></p>
<p><span style="font-size: x-small;"><a name="_edn88"></a>[87] To be listed requires a minimum of $10,000 in federal contracts; see <a href="http://clinton2.nara.gov/WH/EOP/OP/html/aa/aa06.html">http://clinton2.nara.gov/WH/EOP/OP/html/aa/aa06.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn89"></a>[88] See <a href="http://www.govexec.com/features/0804-15/0804-15s1s1.htm">http://www.govexec.com/features/0804-15/0804-15s1s1.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn90"></a>[89] This header information is drawn from Table 12.</span></p>
<p><span style="font-size: x-small;"><a name="_edn91"></a>[90] Number of competing firms is increased from the federal contractor baseline by a factor of 1.30 to account for new state and local government contractors.</span></p>
<p><span style="font-size: x-small;"><a name="_edn92"></a>[91] Winning and losing proposal preparation costs are based on the empirical percentages from NIST (see reference 93), namely 0.85% and 0.59%, respectively, as a percent of total award amounts.</span></p>
<p><span style="font-size: x-small;"><a name="_edn93"></a>[92] The ‘Low’ basis for improvements is based on the finding of missing information discussed in a previous section; the ‘High’ basis reflects the difference between lowest quartile and highest quartile efforts spent on successful proposal preparation (see reference 93). The ‘Med’ basis is an intermediate value between these two.</span></p>
<p><span style="font-size: x-small;"><a name="_edn94"></a>[93] The increase in winning submissions is calculated based on numbers of winning proposals times the RFP improvement factor. In fact, because all things being equal the pool of contract dollars does not change, this amount merely represents a shift of winning awards from existing winners to new winners. In other words, total contracts amounts are a zero-sum game with proposal improvements by previous losers taken from the pool of previous winners.</span></p>
<p><span style="font-size: x-small;"><a name="_edn95"></a>[94] The analysis in Figure 2 indicates there is a power curve distribution of awards. The number of new winning proposals was applied to this curve to estimate the actual number of new firms winning awards; see Figure 2 for the power-curve fitting equation.</span></p>
<p><span style="font-size: x-small;"><a name="_edn96"></a>[95] Of course, better probabilities of winning competitive solicitations are a zero-sum game. New winners displace old winners. The real advantage in this arena is to individual firms that better succeed at securing the existing pool of competitive funds. For individual companies, these benefits can make the difference in profitability, indeed in survival.</span></p>
<p><span style="font-size: x-small;"><a name="_edn97"></a>[96] NFIB, Coping with Regulation, NFIB National Small Business Poll, Vol. 1, Issue 5. See <a href="http://www.nfib.com/object/3105105.html">http://www.nfib.com/object/3105105.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn98"></a>[97] NFIB, Paperwork and Record-keeping, NFIB National Small Business Poll, Vol. 3, Issue 5. See <a href="http://www.nfib.com/object/4131277.html">http://www.nfib.com/object/4131277.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn99"></a>[98] W. M. Crain &amp; T. D. Hopkins, “The Impact of Regulatory Costs on Small Firms”, Report to the Small Business Administration, RFP No. SBAHQ-00-R-0027 (2001). The report’s 2000 year basis was updated to 2002 based on a 4% annual inflation factor.</span></p>
<p><span style="font-size: x-small;"><a name="_edn100"></a>[99] U.S. General Accounting Office, Paperwork Reduction Act: Record Increase in Agencies’ Burden Estimates, testimony of V. S. Rezendes, before the Subcommittee on Energy, Policy, Natural Resources and Regulatory Affairs, Committee on Government Reform, House of Representatives, April 11, 2003. See <a href="http://www.reform.house.gov/UploadedFiles/Testimony_GAO_Revised.pdf">http://www.reform.house.gov/UploadedFiles/Testimony_GAO_Revised.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn101"></a>[100] Office of Management and Budget, Managing Information Collection and Dissemination, Fiscal Year 2003, 198 pp. (Table A1). See <a href="http://www.whitehouse.gov/omb/inforeg/2003_info_coll_dism.pdf">http://www.whitehouse.gov/omb/inforeg/2003_info_coll_dism.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn102"></a>[101] NFIB, Paperwork and Record-keeping, NFIB National Small Business Poll, Vol. 3, Issue 5. See <a href="http://www.nfib.com/object/4131277.html">http://www.nfib.com/object/4131277.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn103"></a>[102] U.S. Small Business Administration, Final Report of the Small Business Paperwork Relief Task Force, June 27, 2003, 64 pp. See <a href="http://www.sbaonline.sba.gov/advo/laws/final_paperwork03.pdf">http://www.sbaonline.sba.gov/advo/laws/final_paperwork03.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn104"></a>[103] IRS, Civil Penalties Assessed and Abated, by Type of Penalty and Type of Tax (Table 26), September 20, 2002. See <a href="http://www.irs.gov/pub/irs-soi/02db26cp.xls">http://www.irs.gov/pub/irs-soi/02db26cp.xls</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn105"></a>[104] Except as footnoted, the figures below are drawn from the OMB Public Budget Tables. Civil penalties for crime victims have been excluded from these figures. See <a href="http://www.whitehouse.gov/omb/budget/fy2005/db.html">http://www.whitehouse.gov/omb/budget/fy2005/db.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn106"></a>[105] Obtained orders in SEC judicial and administrative proceedings requiring securities law violators to disgorge illegal profits of approximately $1.293 billion. Civil penalties ordered in SEC proceedings totaled approximately $101 million. See SEC <a href="http://www.sec.gov/pdf/annrep02/ar02enforce.pdf">http://www.sec.gov/pdf/annrep02/ar02enforce.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn107"></a>[106] T. L. Sansonetti, U.S. Department of Justice, testimony before the House Committee on the Judiciary, Subcommittee on Commercial and Administrative Law, March 9, 2004. See <a href="http://www.house.gov/judiciary/sansonetti030904.htm">http://www.house.gov/judiciary/sansonetti030904.htm</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn108"></a>[107] Argy, Wiltse &amp; Robinson, Business Insights, Summer 2003, 4 pp. See <a href="http://www.awr.com/news_let/Argy%20Summer%202003.pdf">http://www.awr.com/news_let/Argy%20Summer%202003.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn109"></a>[108] Project on Government Oversight, Federal Contractor Misconduct: Failures of the Suspension and Debarment System, revised May 10, 2002. See <a href="http://www.pogo.org/p/contracts/co-020505-contractors.html">http://www.pogo.org/p/contracts/co-020505-contractors.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn110"></a>[109] Corporate Crime Reporter, Top 100 False Claims Act Settlements, December 30, 2003, 64 pp. See <a href="http://www.corporatecrimereporter.com/fraudrep.pdf">http://www.corporatecrimereporter.com/fraudrep.pdf</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn111"></a>[110] According to Alchemia Corporation testimony citing a Price Waterhouse Coopers study, FDA Hearing, Jan. 17, 2002. See http://www.fda.gov/ohrms/dockets/dockets/ 00d1538/00d-1538_mm00023_01_vol7.doc.</span></p>
<p><span style="font-size: x-small;"><a name="_edn112"></a>[111] For example, see <a href="http://www.medschool.ucsf.edu/curriculum/clinical/guide/section2/confidentiality.asp">http://www.medschool.ucsf.edu/curriculum/clinical/guide/section2/confidentiality.asp</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn113"></a>[112] From Table 17.</span></p>
<p><span style="font-size: x-small;"><a name="_edn114"></a>[113] From Table 16 after adjusting by total number of employees for all firms as shown on Table 2, and removal of total burdens as shown in Table 17.</span></p>
<p><span style="font-size: x-small;"><a name="_edn115"></a>[114] From Table 18.</span></p>
<p><span style="font-size: x-small;"><a name="_edn116"></a>[115] All ‘State and Local’ items are based on the ratio of state and local budgets in relation to the federal budget, excluding direct federal transfers, and applied to those factors for the federal sector. This ratio is 0.563. See <a href="http://www.gpoaccess.gov/usbudget/fy01/guide01.html">http://www.gpoaccess.gov/usbudget/fy01/guide01.html</a>.</span></p>
<p><span style="font-size: x-small;"><a name="_edn117"></a>[116] All ‘Large Firm’ estimates are based on the ratio of large firm documents to total firm documents; see Table 2.</span></p>
<p><span style="font-size: x-small;"><a name="_edn118"></a>[117] For example, see <a href="http://www.nfr.com/why/mandates.php#gramm">http://www.nfr.com/why/mandates.php#gramm</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/871/brown-bag-lunch-untapped-assets-the-3-trillion-value-of-us-documents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Citizen DAN, Prise Deux</title>
		<link>http://www.mkbergman.com/869/citizen-dan-prise-deux/</link>
		<comments>http://www.mkbergman.com/869/citizen-dan-prise-deux/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 04:36:03 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Citizen DAN]]></category>
		<category><![CDATA[citizen journalism]]></category>
		<category><![CDATA[community indicators]]></category>
		<category><![CDATA[data appliance]]></category>
		<category><![CDATA[gov 2.0]]></category>
		<category><![CDATA[knc]]></category>
		<category><![CDATA[Knight News Challenge]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[open data]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=869</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Citizen DAN, Prise Deux&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Software Development&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-03-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/869/citizen-dan-prise-deux/&amp;rft.language=English"></span>
Huzzah! for Local Government Open Data, Transparency, Community Indicators         and Citizen Journalism
While the Knight News         Challenge is still working its way through the screening details,         Structured Dynamics&#8216;     [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Citizen DAN, Prise Deux&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Software Development&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-03-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/869/citizen-dan-prise-deux/&amp;rft.language=English"></span>
<h2><img style="float: left; margin-right: 10px; width: 260px; height: 195px;" title="Citizen DAN Logo" src="../wp-content/themes/ai3/images/2009Posts/091214_citizen_dan_logo.png" alt="Citizen DAN Logo" align="left" />Huzzah! for Local Government Open Data, Transparency, Community Indicators         and Citizen Journalism</h2>
<p>While the <a href="http://www.newschallenge.org/">Knight News         Challenge</a> is still working its way through the screening details,         <a href="http://structureddynamics.com/">Structured Dynamics</a>&#8216;    <strong> Citizen DAN</strong> proposal remains in the hunt. Listen to this:</p>
<p>To date, we have         been the  most viewed proposal by far (<span class="double_u">2x</span> more         than the second most viewed!!! <span style="font-style: italic;">Hooray!</span>) and are in the top five of          highest rated (have also been at #1 or #2, depending. <span style="font-style: italic;">Hooray!</span>).         Thanks to all of you for your interest and support.</p>
<p>There is much to recommend this <a href="http://www.newschallenge.org/">KNC</a> approach, not the least of which is its ability to attract some 2,500 proposals seeking a piece of the $5 million in potential 2010 grant awards. Our proposal extends SD’s basic <a href="http://openstructs.org/structwsf">structWSF</a> and <a href="http://constructscs.com/">conStruct</a> Drupal frameworks to provide a <em><strong>d</strong></em>ata <em><strong>a</strong></em>ppliance and <em><strong>n</strong></em>etwork (DAN) to support citizen journalists with data and analysis at the local, community level.</p>
<p>None of our rankings, of course, guarantees anything. But, we also feel         good about how the market is looking at these frameworks. We have         recently been awarded some pretty exciting and related contracts. Any         and all of these initiatives will continue to contribute to the <a href="http://citizen-dan.org/details.html">open         source Citizen DAN vision</a>.</p>
<p>And, what might that vision be? Well, after some weeks away from it, I         read again our  online submission to the Knight News Challenge. I have to say: It         ain&#8217;t too bad! (Plus many supporting goodies and <a href="http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;itemguid=1d00faaf-f1ff-40d8-b88d-8eeced420e36">details</a>.)</p>
<p>So, I repeat below, in their entirety, the KNC questions and our formal responses. This information from our original submittal is unchanged, except to add some live links where they could not be submitted as such before. (BTW, the <big><span style="font-weight: bold;">bold headers</span></big> are the KNC questions.) Eventual winners are slated to be announced around mid-June. We&#8217;re keeping our fingers crossed, but we are pursuing this initiative in any case.</p>
<hr style="width: 20%; height: 1px; text-align: center;" />
<h3>Describe your project:</h3>
<p>Citizen DAN is an open source framework to leverage relevant local data         for citizen journalists. It is a:</p>
<ul>
<li>Appliance for filtering and analyzing data specific to local         community indicators</li>
<li>Means to visualize local data over time or by neighborhood</li>
<li>Meeting place for the public to upload and share local data and         information</li>
<li>Web data portal that can be individually tailored by any local         community</li>
<li>Node in a global network of communities across which to compare         indicators of community well-being.</li>
</ul>
<p>Good decisions and good journalism require good information. Starting         with pre-loaded government data, Citizen DAN provides any citizen the         framework to learn and compare local statistics and data with other         similar communities. This helps to promote the grist for citizen         journalism; it is also a vehicle for discovery and learning across the         community.</p>
<p>Citizen DAN comes pre-packaged with all necessary deployment components         and documentation, including local data from government sources. It         includes facilities for direct upload of additional local data in         formats from spreadsheets to standard databases. Many standard         converters are included with the basic package.</p>
<p>Citizen DAN may be implemented by local governments or by community         advocacy groups. When deployed, using its clear documentation, sponsors         may choose whether or what portions of local data are exposed to the         broader Citizen DAN network. Data exposed on the network is         automatically available to any other network community for comparison         and analysis purposes.</p>
<p>This data appliance and network (DAN) is multi-lingual. It will be         tested in three cities in Canada and the US, showing its multi-lingual         capabilities in English, Spanish and French.</p>
<h3>How will your project improve the way news and information are         delivered to geographic communities?</h3>
<p>With Citizen DAN, anyone with Web access can now get, slice, and dice         information about how their community is doing and how it compares to         other communities. We have learned from Web 2.0 and user-generated         content that once exposed, useful information can be taken and analyzed         in valuable and unanticipated ways.</p>
<p>The trick is to get information that already exists. Citizen journalists of the past may not have known either:</p>
<ol>
<li>Where to find relevant information, or</li>
<li>How to ‘slice-and-dice’ that information to extract         meaningful nuggets.</li>
</ol>
<p>By removing these hurdles, Citizen DAN improves the ways information is         delivered to communities and provides the framework for sifting through         it to extract meaning.</p>
<h3>How is your idea innovative? (new or different from what already         exists)</h3>
<p>Government public data in electronic tabular form or as published         listings or tables in local newspapers has been available for some         time. While meeting strict ‘disclosure’ requirements, this         information has neither been readily analyzable nor actionable.</p>
<p>The meaning of information lies in its interpretation and analysis.</p>
<p>Citizen DAN is innovative because it:</p>
<ol>
<li>Is a platform for accessing and exposing available community data</li>
<li>Provides powerful Web-based tools for drilling down and mining data</li>
<li> <em>Changes the game</em> via public-provided data, and</li>
<li>Is packaged in a Web framework that is available to any local citizen and requires no expertise other than clicking links.</li>
</ol>
<h3>What experience do you or your organization have to successfully         develop this project?</h3>
<p>Structured Dynamics has already developed and released as open-source code <a href="http://openstructs.org/">structWSF</a> and <a href="http://constructscs.com/">conStruct</a>, the basic foundations of this proposal. structWSF provides the network and dataset “backbone” to this proposal; conStruct provides the Drupal portal and Web site framework.</p>
<p>To this foundation we add proven experience and knowledge of datasets         and how to access them, as well as tools and converters for how to         stage them for standard public use. A key expertise of Structured         Dynamics is the conversion of virtually any legacy data format into         interoperable canonical forms.</p>
<p>These are important challenges, which require experience in the         semantics of data and mapping from varied forms into useful and common         frameworks. Structured Dynamics has codified its expertise in these         areas into the software underlying Citizen DAN.</p>
<p>Structured Dynamics’ principals are also multi-lingual, with         language-neutral architectures and code. The company’s principals         are also some of the most prominent bloggers and writers in the         semantic Web. We are acknowledged as attentive to documentation and         communication.</p>
<p>Finally, Structured Dynamics’ principals have more than a decade         of track record in successful data access and mining, and software and         venture development.</p>
<p>To this strong basis, we have preliminary city commitments for         deploying this project in the United States (English and Spanish) and         Canada (French and English).</p>
<h3>What unmet need does your proposal answer?</h3>
<p><a href="http://www.thisweknow.org/">ThisWeKnow</a> offers local Census         data, but no community or publishing aspects. Data sharing is in         <a href="http://www.datasf.org/">DataSF</a> and <a href="http://www.nyc.gov/html/datamine/html/home/home.shtml">DataMine</a> (NYC), but they lack collaboration, community networks and comparisons,         or powerful data visualization or mapping.</p>
<p>Citizen DAN is a turnkey platform for any size community to create,         publish, search, browse, slice-and-dice, visualize or compare         indicators of community well-being. Its use makes the Web more locally         focused. With it, researchers, watchdog groups, reporters, local         officials and interested citizens can now discover hard data for &#8216;new         news&#8217; or fact-check mainstream media.</p>
<h3>What tasks/benchmarks need to be accomplished to develop your project         and by when will you complete them?</h3>
<p>There are two releases with feedback. Each task summary, listing of         task hours (hr) and duration in months (mo), in rough sequence order         with overlaps, is:</p>
<ol>
<li>Dataset Prep/Staging: identify, load and stage baseline datasets;         provide means for aggregating data at different levels; 420 hr; 2.5 mo</li>
<li>Refine Data Input Facility: feature to upload other external data,         incl direct from local sources; XML, spreadsheet, JSON forms; dataset         metadata; 280 hr; 3 mo</li>
<li>Add Data Visualization Component: Flex mapping/data visualization         (charts, graphs) using any slice-and-dice; 390 hr; 3 mo</li>
<li>Make Multi-linguality Changes: English, French, Spanish versions;         220 hr; 2 mo</li>
<li>Refine User Interface: update existing interface in faceted browse;         filter; search; record create, manage and update; imports; exports; and         user access rights; 380 hr; 3 mo</li>
<li>Standard Citizen DAN Ontologies: the coherent schema for the data;         140 hr; 3 mo</li>
<li>Create Central Portal: distribution and promotion site for project;         120 hr; 2 mo</li>
<li>Deploy/Test First Release: release by end of Mo 5 @ 3 test sites;         300 hr; 4 mo</li>
<li>Revise Based on Feedback: bug fixing and 4 mo testing/feedback,         then revision #2; 420 hr</li>
<li>Package/Document: component packaging for easier installs;         increased documentation; 310 hr; 2 mo</li>
<li>Marketing/Awareness: see next question; 240 hr; 12 mo</li>
<li>Project Management: standard PM/interact with test communities,         partners; 220 hr; 12 mo.</li>
</ol>
<p>See attached task details.</p>
<h3>What will you have changed by the end of your project?</h3>
<div style="text-align: center;">
<pre>"Information is the currency of democracy." <em>Thomas Jefferson</em> (n.b.)</pre>
</div>
<p>We intuitively understand that an informed citizenry is a healthy         polity. At the global level and in 250 languages, we see how Wikipedia,         matched with the Internet and inexpensive laptops, is bringing         unforeseen information and enrichment to all. Across the board, we are         seeing the democratization of information.</p>
<p>But very little of this revolution has percolated to the local level.</p>
<p>Only in the past decade or so have we seen free, electronic access to national Census data. We still see local data only published in print or not available at all, limiting awareness and, more importantly, understanding and analysis. Data locked up in municipal computers or available but not expressed via crowdsourcing is as good as non-existent.</p>
<p>Though many citizens at the local level are not numerate, intuition has to tell us that the absence of empirical, local data hurts our ability to understand, reason and debate our local circumstances. Are we doing better or worse than yesterday? Better or worse than our peers? Under what measures does this have meaning about community well-being?</p>
<p>The purpose of the Citizen DAN project is to create an appliance &#8212; in the same sense of refrigerators keeping our food from spoiling &#8212; by which any citizen can crack open and expose relevant data at the local level. Citizen DAN is about enriching our local information and keeping our communities healthy.</p>
<h3>How will you measure progress and ultimately success?</h3>
<p>We will measure the progress of the project by the number of         communities and local organizations that use the Citizen DAN platform         to create and publish community data. Subsidiary measures include the         number of:</p>
<ul>
<li>Individual users across all installations</li>
<li>Users contributing uploaded datasets</li>
<li>Contributed datasets</li>
<li>Contributed applications based on the platform</li>
<li>Interconnected sites in the network</li>
<li>Different Citizen DAN networks</li>
<li>Substantive articles and blog posts on Citizen DAN</li>
<li>Mentions of &#8216;Citizen DAN&#8217; (and local naming or variants, which will         be tracked) in news articles</li>
<li>Contributed blog posts on the central Citizen DAN portal</li>
<li>Software package downloads, and</li>
<li>Google citations and hits on &#8216;Citizen DAN&#8217; (and prominent         variants).</li>
</ul>
<p>These measures, plus active sites with profiles of each, will be         monitored and tracked on the central Citizen DAN portal.</p>
<p>&#8216;Ultimate success&#8217; is related to the general growth in transparent         government at the local level. Growth in Citizen DAN-related measures         on a year-over-year basis or in relation to Gov2.0 would indicate         success.</p>
<h3>Do you see any risk in the development of your project?</h3>
<p>There is no technical risk to this proposal, but there are risks in         scope, awareness and acceptance. Our system has been operational for         one year for relevant use cases; all components have been integrated,         debugged, and put into production.</p>
<p>Scope risks relate to how much data the Citizen DAN platform is loaded         with, and how much functionality is included. We balance the data         question by using common public datasets for baseline data, then add         features for localities to &#8220;crowdsource&#8221; their own supplementary data.         We balance the functionality question by limiting new development to         data visualization/mapping and to upload functions (per above), and         then to refine what already exists.</p>
<p>Awareness risks arise from a crowded attention space. We can overcome         this in two ways. The first is to satisfy users at our test sites. That         will result in good recommendations to help seed a snowball effect. The         second way is to use social media and our existing Web outlets         aggressively. We have been building awareness for our own properties in         steady, inch-by-inch measures. While a notable few Web efforts may go         viral, the process is not predictable. Steady, constant focus is our         preferred recipe.</p>
<p>Acceptance risk is intimately linked with awareness and use. If we can satisfy each Citizen DAN community, then new datasets, new functionality and new awareness will naturally arise. More users and more contributions through the network effect are the best path to broad acceptance.</p>
<h3>What is your marketing plan? How will people learn about what you are         doing?</h3>
<p>Marketing and awareness efforts will include our use of social media,         dedicated Web sites, support from test communities, and outreach to         relevant community Web sites.</p>
<p>Our own blogs are popular in the semantic Web and structured data space         (~3K uniques daily); we have published two posts on Citizen DAN and         will continue to do so with more frequency once the effort gets         underway.</p>
<p>We will create a central portal (<a href="http://citizen-dan.org/">http://citizen-dan.org</a>) based on the         project software (akin to our other project sites). The model for this         apps and deployments clearinghouse is CrimeReports.com. Using social         aspects and crowdsourcing, the site will encourage sharing and best         practices amongst the growing number of Citizen DAN communities.</p>
<p>We will blog and post announcements for key releases and milestones on         relevant external Web sites including various <a href="http://en.wikipedia.org/wiki/E-Government">Gov 2.0</a> sites, <a href="http://www.communityindicators.net/">Community Indicators         Consortium</a>, <a href="http://www.govloop.com/">GovLoop</a>, <a href="http://www.newschallenge.org/">Knight News Challenge</a>, the <a href="http://www.sunlightfoundation.com/">Sunlight Foundation</a>, and so         forth. In addition, we will collate and track individual community         efforts (maintained on the central Citizen DAN site) and make specific         outreach to community data sites (such as <a href="http://www.datasf.org/">DataSF</a> or <a href="http://www.nyc.gov/html/datamine/html/home/home.shtml">DataMine</a> at         NYC.gov). We will use Twitter (#CitizenDAN, etc) and the social         networks of LinkedIn, Facebook, and Meetup to promote Citizen DAN         activity.</p>
<p>We will interact with advocates of citizen journalism, and engage civic         organizations, media, and government officials (esp in our three test         communities) to refine our marketing plan.</p>
<h3>Is this a one-time experiment or do you think it will continue after         the grant?</h3>
<p>Citizen DAN is not an experiment. It is a working framework that gives any locality and its citizenry the means to assemble, share and compare measures of its community well-being with other communities. These indicators, in turn, provide substance and grist for greater advocacy and writing and blogging (&#8220;journalism&#8221;) at the local level.</p>
<p>Granted, there are unknowns: How many localities will adopt the Citizen         DAN appliance? How essential will its data be to local advocacy and         news? How active will each Citizen DAN installation be in attracting         contributions and local data?</p>
<p>We submit that the better way to frame the question is by the degree of adoption, as opposed to whether it will work.</p>
<p>Web-based changes in our society and social interaction are leading to         the democratization of information, access to it, and channels for         expression. Whether ultimately successful in the specific form proposed         herein, Citizen DAN and its open source software and frameworks will         surely be adopted in one form or another &#8212; to one degree or another &#8212;         in the unassailable trend toward local government transparency and         citizen involvement.</p>
<p>In short, Yes: We believe Citizen DAN will continue long after the         grant.</p>
<h3>If it is to be self-sustainable, what is the plan for making that         happen?</h3>
<p>Our plan begins with the nature of Citizen DAN as software and         framework. Sustainability is a question of whether the appliance itself         is useful, and how users choose to leverage it.</p>
<p>Mediawiki, the software behind Wikipedia, is an analog. Mediawiki is an         enabling infrastructure. Some sites using it are not successful; others         wildly so. Success has required the combination of a good appliance         with topicality and good management. The same is true for Citizen DAN.</p>
<p>Our plan thus begins with Citizen DAN as a useful appliance, as free         open source with great documentation and prominent initial use cases.         Our plan continues with our commitment to the local citizen         marketplace.</p>
<p>We are developing Citizen DAN because of current trends. We foresee         many hundreds of communities adopting the system. Most will be able to         do so on their own. Some others may require modifications or         assistance. Our self-interest is to ensure a high level of adoption.</p>
<p>An era of citizen engagement is unfolding at the local level, fueled by Web technologies and growing comfort with crowdsourcing and social networks. Meanwhile, local government constraints and pressures for transparency are unleashing locked-up data. These forces will create new opportunities for data literacy by the public, which will itself bring new understanding and improvements in governance and budgeting. We plan for Citizen DAN and its offspring to be among the catalysts for those changes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/869/citizen-dan-prise-deux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two Contrasting Styles for the Semantic Enterprise</title>
		<link>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/</link>
		<comments>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 15:36:49 +0000</pubDate>
		<dc:creator>Mike Bergman</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Semantic Enterprise]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=866</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
Our Own Approach is Adaptive and Incremental
It is gratifying to see the emergence of the term semantic enterprise, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
<h2><img style="border: 0px solid; width: 225px; height: 225px; float: left; margin-right: 10px;" title="Two Faces in Circle, from http://energeticrelations.com/" src="../wp-content/themes/ai3/images/2010Posts/100214_two_faces_in_circle.jpg" alt="Two Faces in Circle, from http://energeticrelations.com/" />Our Own Approach is Adaptive and Incremental</h2>
<p>It is gratifying to see the emergence of the term <span style="font-style: italic;">semantic enterprise</span>, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single (nor best, depending on         circumstance) way to approach becoming a semantic enterprise.</p>
<p>In this piece I contrast two styles. The more traditional and familiar one is comprehensive, complete and &#8220;engineered&#8221; in its approach. The second, emerging style is more adaptive and incremental. While <a href="http://structureddynamics.com/">Structured Dynamics</a> is a proponent and thought leader for the adaptive style, the use and applicability of either approach is really a function of objectives and circumstances. The choice of approach depends on use case, and should not be a dogmatic one.</p>
<p>Any time a contrast is posed, one should be on guard about         setting up a rhetorical strawman. There may perhaps be a bit of this         flavor in this article; if so, it is unintended. It is probably best to         realize that there is a gradient &#8212; or spectrum &#8212; of possible         approaches between these contrasting styles. The real message is to         understand these differences such that you can comfortably place your         own organization at the right points along this spectrum.</p>
<h3>A Spectrum of Advantages and Differences</h3>
<p>The general idea of semantics in the enterprise precedes the use of the term, having been somewhat captured before by the ideas of <a href="http://en.wikipedia.org/wiki/Enterprise_application_integration">enterprise application integration</a>, <a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration">enterprise information integration</a> and other concepts even related to <a href="http://en.wikipedia.org/wiki/Federated_database_system">data federation</a> and <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a> stretching back to the 1980s. However, as a specific label, we can look back to the first mentions in the late 1990s and more concerted attention beginning from about 2002 or so onward <a href="#styles1">[1]</a>. As another indicator, since 2005 the Semantic Technology Conference has given specific prominence to the enterprise <a href="#styles2">[2]</a>.</p>
<p>Throughout this period, the emphasis in academic papers, and from many vendors and most pundits <a href="#styles3">[3]</a>, has been on things like automated reasoning, machine-aided decision making, aspects of artificial intelligence, and so forth. The general tone is often framed as &#8220;revolution&#8221; or &#8220;massive changes&#8221; or something &#8220;entirely new.&#8221; If you are a consultant or software/implementation vendor &#8212; especially where VC money is backing the venture with hopes for big returns and home runs &#8212; it may make cynical sense to sell such large and costly change.</p>
<p>I believe there are circumstances where the <span style="font-style: italic;">Semantic Enterprise</span> writ this large may make sense and be financially justified. But, this kind of &#8220;big change&#8221; view has also seen relatively few visible (or successful) deployments. It has colored what it means to be a semantic enterprise. And, I believe, it has weakened market credibility by perhaps overpromising and underdelivering. The conventional view of what it is to be a semantic enterprise deserves to be balanced.</p>
<p>So, as we balance this understanding of the semantic enterprise to one that is more nuanced, we can contrast the characteristics of the two opposing styles as follows:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style</td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style</td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>A focus on a more complete, comprehensive coverage of the                 semantics in the domain</li>
<li>More enterprise-wide, less partial or departmental</li>
<li>Greater emphasis on &#8220;<a href="http://en.wikipedia.org/wiki/Closed_world_assumption">closed                 world</a>&#8221; approaches <a href="#styles4">[4]</a>; more akin to relational database                 architecting and schema</li>
<li>Expansion is possible, but effort may be somewhat complex</li>
<li>A general implication is to replace or supplant existing                 information structures with semantic ones</li>
<li>Not necessarily based on semantic Web standards and                 languages <a href="#styles5">[5]</a> (<span style="font-style: italic;">e.g.</span>,                 may include <a href="http://en.wikipedia.org/wiki/Common_logic">Common Logic</a>,                 <a href="http://en.wikipedia.org/wiki/Frame_%28artificial_intelligence%29"> frame logics</a>, etc.)</li>
<li>Richer set of predicates (relations)</li>
<li>Though a distinction is maintained between                 schema and instances, their separation may not be consistently                 (physically) enforced</li>
<li>Often more complicated inferencing and logic tests</li>
<li>More complete enumeration and characterization of items</li>
<li>Much process around semantics agreement across groups</li>
<li>Fairly well-developed implementation tools, including for                 ontology engineering</li>
<li>Implementation times in months to years</li>
<li>Implementation costs akin to traditional large-scale IT                 projects</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>An emphasis on a simpler, incremental, &#8220;learn as you go&#8221;                 approach</li>
<li>Start with single departments or limited vertical apps</li>
<li>Embedded in the &#8220;<a href="http://en.wikipedia.org/wiki/Open_world_assumption">open                 world</a>&#8221; approach <a href="#styles4">[4]</a>, with incorporation of external                 information</li>
<li>Design and approach inherently allows incremental expansion                 and adaptation</li>
<li>A key premise is to build from and leverage existing                 information structures, vocabularies and assets</li>
<li>Fully based on semantic Web standards and languages <a href="#styles5">[5]</a>,                 often including linked data <a href="#styles6">[6]</a></li>
<li>Tends to start simply with hierarchical or related concepts                 (<span style="font-style: italic;">e.g.</span>, SKOS)</li>
<li>Conscious distinction in the structure for                 handling schema separate from instances <a href="#styles7">[7]</a></li>
<li>Inferencing logic based more on concept matching, or                 parent-child or part-of relationships</li>
<li>Degree of item characterization based on current scope</li>
<li>Initial semantic matching can be driven from existing                 assets</li>
<li>Fairly well-developed implementation tools, <span style="font-style: italic; text-decoration: underline;">except</span> for how to engage publics in the development process</li>
<li>Implementation times in weeks to months</li>
<li>Implementation costs driven by available budgets (and thus                 scope)</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Note we have labeled the conventional approach as the &#8220;comprehensive, engineered&#8221; style; its contrast, and the one we align with more closely, is the &#8220;adaptive, incremental&#8221; style.</p>
<p style="margin-left: 30px; margin-right: 30px;">[Others have posited contrasting styles, most often as &#8220;top down&#8221; <span style="font-style: italic;">v.</span> &#8220;bottom up.&#8221; However, in one interpretation of that distinction, &#8220;top down&#8221; means a layer on top of the existing Web <a href="#styles8">[8]</a>. On the other hand, &#8220;top down&#8221; is more often understood in the sense of a &#8220;comprehensive, engineered&#8221; view, consistent with my own understanding <a href="#styles9">[9]</a>. Yet no matter which characterization, neither captures what I feel to be the more important considerations of mindset, logic and premise.]</p>
<p>Though the table above contrasts many points, I think there are two         main distinctions to the adaptive approach. First, it firmly embraces         the open world assumption. OWA is key to an incremental, &#8220;learn as you         go&#8221; deployment that is also well suited to incorporation of external         information. The second main distinction is to leverage and build from         existing assets.</p>
<h3>A Spectrum of Applications</h3>
<p>Yet as noted in the opening, which of these approaches makes better         sense depends on circumstance. One aspect of circumstance is available         budget and deployment times for pilots or proofs-of-concept. Another         aspect, of course, is the planned use or application         for the deployment.</p>
<p>These are by no means hard distinctions, but in general we can see         these contrasting approaches applying to the following uses:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more CWA driven)</span></td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more OWA driven)</span></td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>Bounded, &#8220;inward&#8221; applications (high degree of control and                 completeness)</li>
<li>Engineering enterprises</li>
<li>Technical domains and organizations</li>
<li>Aeronautics</li>
<li>Pharmaceuticals</li>
<li>Chemicals</li>
<li>Petroleum</li>
<li>Energy</li>
<li>A/E firms (construction)</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>External facing applications, organizations (customers,                 incorporation of external data)</li>
<li>Faceted Search</li>
<li>Taxonomy updates</li>
<li>Multi-domain master data management (MDM)</li>
<li>Simple (initially) inferencing</li>
<li>Consumer products</li>
<li>Finance</li>
<li>Health care</li>
<li>Knowledge enterprises</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>A critical distinction is the nature of the enterprise itself.         &#8220;External-facing&#8221; enterprises or functions that want or need to         incorporate much external information (say, marketing or competitive intelligence) are advised to look closely at         the adaptive approach. Organizations that have more complete control         over their circumstances should perhaps focus on the conventional         approach.</p>
<h3>Adoption Thresholds and Risks</h3>
<p>In previous writings I have pointed to the manifest benefits that can accrue to the semantic enterprise [see, esp. <a href="#styles10">10</a>]. But we also have witnessed nearly a decade of promotion for semantics in the enterprise, with perhaps a lack of progress in some areas or unmet promises in others. These raise questions and skepticism about the real eventual costs and benefits.</p>
<p>I believe some of this skepticism is inherent with anything new &#8212; the         general IT fatigue from what the current &#8220;next great thing&#8221; might be.         But I also believe that some of this skepticism results from an         approach to semantics in the enterprise that is both lengthy to deploy         and high cost.</p>
<p>The key advantage of the adaptive, incremental approach is that the         whole IT game in the enterprise can change. An open world approach         enables adoption as it proves itself and as budgets allow. Commitments         made under this approach have, in essence, permanent value. Past fears         and concerns about making &#8220;wrong&#8221; bets no longer apply. With learning,         targets can be re-adjusted, structure re-defined and applications         re-focused, all as new discoveries and broadening scope dictate.</p>
<p>This does not make the adaptive approach better than the conventional         one. But, it does make it less risky and, well, more <span style="font-style: italic;">adaptive</span>.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles1"></a>[1] For example, the earliest Google mentions on &#8220;semantic enterprise&#8221;         date to about 1998 or 1999. In 2002, the University of Georgia and Amit         Sheth offered the first known academic course on the Semantic         Enterprise; see <a href="http://lsdis.cs.uga.edu/SemanticEnterprise/">http://lsdis.cs.uga.edu/SemanticEnterprise/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles2"></a>[2] See the conference guide for the <a href="http://www.wilshireconferences.com/webfiles/STC05/Stc05Final.pdf">Semantic         Technology Conference 2005</a>. The sixth one, the <a href="http://www.semantic-conference.com/">2010 Semantic Technology         Conference</a>, is upcoming on June 21-25 in San Francisco.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles3"></a>[3] See, for example, Mitchell Ummell, ed., 2009. “The Rise of         the Semantic Enterprise,” special dedicated edition of the         <span style="font-style: italic;">Cutter IT Journal</span>, Vol. 22(9),         40 pp., September 2009. See <a href="http://www.cutter.com/offers/semanticenterprise.html">http://www.cutter.com/offers/semanticenterprise.html</a> (after filling out contact form). Partially in response to this         conventional view, I wrote <a href="#styles10">[10]</a>. In that article I offered as a working         definition that &#8220;<span style="font-style: italic;">a</span> <span style="font-weight: bold; font-style: italic;">semantic         enterprise</span> <span style="font-style: italic;">is one that adopts         the languages and standards of the</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Semantic_Web">semantic Web</a> <span style="font-style: italic;">. . .</span> <span style="font-style: italic;">and applies them to the issues of information         interoperability, preferably using the best practices of</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a><span style="font-style: italic;">.</span>&#8221; That happens to be Structured Dynamics&#8217;         preferred definition, though as this posting indicates, there is a         spectrum of definitions of the term.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles4"></a>[4] See M.K. Bergman, 2009. “<a href="../852/the-open-world-assumption-elephant-in-the-room/">The Open World Assumption: Elephant in the Room</a>,” <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, December 21, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles5"></a>[5] See, for example, <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a>, <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a>, <a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and <a href="http://en.wikipedia.org/wiki/Semantic_Web#Components">others</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles6"></a>[6] <a href="http://en.wikipedia.org/wiki/Linked_data">Linked data</a> is a set of best practices for publishing and deploying instance and class data using the RDF data model. Two of the best practices are to name the data objects using uniform resource identifiers (URIs), and to expose the data for access via the HTTP protocol. Both of these practices enable the Web to become a distributed database, which means that Web architectures can also be readily employed.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles7"></a>[7] We use a basis in <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a> for defining the roles and splits in schema and instances. As we define it:
<div class="boxGraySolid">“Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.”</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles8"></a>[8] One article that got quite a bit of play a few years back was A. Iskold, 2007. &#8220;<a href="http://www.readwriteweb.com/archives/the_top-down_semantic_web.php">Top Down: A New Approach to the Semantic Web</a>,&#8221; in <em>ReadWrite Web</em>, Sept. 20, 2007. The problem with this terminology is that it offers a completely different sense of &#8220;top down&#8221; from traditional uses. In Iskold&#8217;s argument, &#8220;top down&#8221; is a layering on top of the existing Web.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles9"></a>[9] The more traditional view of &#8220;top down&#8221; with respect to the semantic Web is in relation to how the system is constructed. This is reflected well in a presentation from the <a href="http://lsdis.cs.uga.edu/SemNSF/SemWebWorkshopAgenda.htm">NSF Workshop on DB &amp; IS Research for Semantic Web and Enterprises</a>, April 3, 2002, entitled &#8220;<a href="http://lsdis.cs.uga.edu/%7Ekashyap/talks/SWWS%20Panel.ppt">The &#8216;Emergent, Semantic Web: Top Down Design or Bottom Up Consensus?</a>&#8221;. Under this view, top down is design and committee-driven; bottom up is more decentralized and based on social processes, which is more akin to Iskold&#8217;s &#8220;top down.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles10"></a>[10] M.K. Bergman, 2009. &#8220;<a href="../825/fresh-perspectives-on-the-semantic-enterprise/">Fresh         Perspectives on the Semantic Enterprise</a>,&#8221; <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, Sept.         28, 2009.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
