<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI3:::Adaptive Information &#187; Adaptive Information</title>
	<atom:link href="http://www.mkbergman.com/category/adaptive-information/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Wed, 10 Mar 2010 05:21:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Citizen DAN, Prise Deux</title>
		<link>http://www.mkbergman.com/869/citizen-dan-prise-deux/</link>
		<comments>http://www.mkbergman.com/869/citizen-dan-prise-deux/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 04:36:03 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Citizen DAN]]></category>
		<category><![CDATA[citizen journalism]]></category>
		<category><![CDATA[community indicators]]></category>
		<category><![CDATA[data appliance]]></category>
		<category><![CDATA[gov 2.0]]></category>
		<category><![CDATA[knc]]></category>
		<category><![CDATA[Knight News Challenge]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[open data]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=869</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Citizen DAN, Prise Deux&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Software Development&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-03-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/869/citizen-dan-prise-deux/&amp;rft.language=English"></span>
Huzzah! for Local Government Open Data, Transparency, Community Indicators         and Citizen Journalism
While the Knight News         Challenge is still working its way through the screening details,         Structured Dynamics&#8216;     [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Citizen DAN, Prise Deux&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Software Development&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-03-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/869/citizen-dan-prise-deux/&amp;rft.language=English"></span>
<h2><img style="float: left; margin-right: 10px; width: 260px; height: 195px;" title="Citizen DAN Logo" src="../wp-content/themes/ai3/images/2009Posts/091214_citizen_dan_logo.png" alt="Citizen DAN Logo" align="left" />Huzzah! for Local Government Open Data, Transparency, Community Indicators         and Citizen Journalism</h2>
<p>While the <a href="http://www.newschallenge.org/">Knight News Challenge</a> is still working its way through the screening details, <a href="http://structureddynamics.com/">Structured Dynamics</a>&#8217; <strong><a href="http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;itemguid=82f0c1fa-57cc-410d-98c5-82a989662657">Citizen DAN</a></strong><a href="http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;itemguid=82f0c1fa-57cc-410d-98c5-82a989662657"> proposal</a> remains in the hunt. Listen to this:</p>
<p>To date, we have         been the <a href="http://generalprop.newschallenge.org/SNC/Main.aspx?&amp;pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9"> most viewed</a> proposal by far (<span class="double_u">2x</span> more         than the second most viewed!!! <span style="font-style: italic;">Hooray!</span>) and are in the top five of         <a href="http://generalprop.newschallenge.org/SNC/GroupSearch.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;sortby=2"> highest rated</a> (have also been at #1 or #2, depending. <span style="font-style: italic;">Hooray!</span>).         Thanks to all of you for your interest and support.</p>
<p>There is much to recommend this <a href="http://www.newschallenge.org/">KNC</a> approach, not the least of which is its ability to attract some 2,500 proposals seeking a piece of the $5 million in potential 2010 grant awards. Our proposal extends SD’s basic <a href="http://openstructs.org/structwsf">structWSF</a> and <a href="http://constructscs.com/">conStruct</a> Drupal frameworks to provide a <em><strong>d</strong></em>ata <em><strong>a</strong></em>ppliance and <em><strong>n</strong></em>etwork (DAN) to support citizen journalists with data and analysis at the local, community level.</p>
<p>None of our rankings, of course, guarantees anything. But, we also feel         good about how the market is looking at these frameworks. We have         recently been awarded some pretty exciting and related contracts. Any         and all of these initiatives will continue to contribute to the open         source Citizen DAN vision.</p>
<p>And, what might that vision be? Well, after some weeks away from it, I         read again our <a href="http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;itemguid=82f0c1fa-57cc-410d-98c5-82a989662657"> online submission</a> to the Knight News Challenge. I have to say: It         ain&#8217;t too bad! (Plus many supporting <a href="http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;itemguid=c552467e-1fe7-4d23-bf46-cb01da699bc9">goodies</a> and <a href="http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=dc3ab619-8eb5-4ac5-ae7b-36b7e98bddc9&amp;itemguid=1d00faaf-f1ff-40d8-b88d-8eeced420e36">details</a>.)</p>
<p>So, I repeat below, in their entirety, the KNC questions and our formal responses. This information from our original submittal is unchanged, except to add some live links where they could not be submitted as such before. (BTW, the <big><span style="font-weight: bold;">bold headers</span></big> are the KNC questions.) Eventual winners are slated to be announced around mid-June. We&#8217;re keeping our fingers crossed, but we are pursuing this initiative in any case.</p>
<hr style="width: 20%; height: 1px; text-align: center;" />
<h3>Describe your project:</h3>
<p>Citizen DAN is an open source framework to leverage relevant local data         for citizen journalists. It is a:</p>
<ul>
<li>Appliance for filtering and analyzing data specific to local         community indicators</li>
<li>Means to visualize local data over time or by neighborhood</li>
<li>Meeting place for the public to upload and share local data and         information</li>
<li>Web data portal that can be individually tailored by any local         community</li>
<li>Node in a global network of communities across which to compare         indicators of community well-being.</li>
</ul>
<p>Good decisions and good journalism require good information. Starting         with pre-loaded government data, Citizen DAN provides any citizen the         framework to learn and compare local statistics and data with other         similar communities. This helps to promote the grist for citizen         journalism; it is also a vehicle for discovery and learning across the         community.</p>
<p>Citizen DAN comes pre-packaged with all necessary deployment components         and documentation, including local data from government sources. It         includes facilities for direct upload of additional local data in         formats from spreadsheets to standard databases. Many standard         converters are included with the basic package.</p>
<p>Citizen DAN may be implemented by local governments or by community advocacy groups. When deploying it with the help of its clear documentation, sponsors may choose whether, and which portions of, their local data are exposed to the broader Citizen DAN network. Data exposed on the network is automatically available to any other network community for comparison and analysis purposes.</p>
<p>This data appliance and network (DAN) is multi-lingual. It will be         tested in three cities in Canada and the US, showing its multi-lingual         capabilities in English, Spanish and French.</p>
<h3>How will your project improve the way news and information are         delivered to geographic communities?</h3>
<p>With Citizen DAN, anyone with Web access can now get, slice, and dice         information about how their community is doing and how it compares to         other communities. We have learned from Web 2.0 and user-generated         content that once exposed, useful information can be taken and analyzed         in valuable and unanticipated ways.</p>
<p>The trick is getting at information that already exists. Citizen journalists of the past may not have known either:</p>
<ol>
<li>Where to find relevant information, or</li>
<li>How to ‘slice-and-dice’ that information to extract         meaningful nuggets.</li>
</ol>
<p>By removing these hurdles, Citizen DAN improves the ways information is         delivered to communities and provides the framework for sifting through         it to extract meaning.</p>
<h3>How is your idea innovative? (new or different from what already         exists)</h3>
<p>Government public data in electronic tabular form or as published         listings or tables in local newspapers has been available for some         time. While meeting strict ‘disclosure’ requirements, this         information has neither been readily analyzable nor actionable.</p>
<p>The meaning of information lies in its interpretation and analysis.</p>
<p>Citizen DAN is innovative because it:</p>
<ol>
<li>Is a platform for accessing and exposing available community data</li>
<li>Provides powerful Web-based tools for drilling down and mining data</li>
<li> <em>Changes the game</em> via public-provided data, and</li>
<li>Is packaged in a Web framework that is available to any local citizen and requires no expertise other than clicking links.</li>
</ol>
<h3>What experience do you or your organization have to successfully         develop this project?</h3>
<p>Structured Dynamics has already developed and released as open-source code <a href="http://openstructs.org/">structWSF</a> and <a href="http://constructscs.com/">conStruct</a>, the basic foundations for this proposal. structWSF provides the network and dataset “backbone” to this proposal; conStruct provides the Drupal portal and Web site framework.</p>
<p>To this foundation we add proven experience and knowledge of datasets         and how to access them, as well as tools and converters for how to         stage them for standard public use. A key expertise of Structured         Dynamics is the conversion of virtually any legacy data format into         interoperable canonical forms.</p>
<p>These are important challenges, which require experience in the         semantics of data and mapping from varied forms into useful and common         frameworks. Structured Dynamics has codified its expertise in these         areas into the software underlying Citizen DAN.</p>
<p>Structured Dynamics’ principals are also multi-lingual, with         language-neutral architectures and code. The company’s principals         are also some of the most prominent bloggers and writers in the         semantic Web. We are acknowledged as attentive to documentation and         communication.</p>
<p>Finally, Structured Dynamics’ principals have more than a decade         of track record in successful data access and mining, and software and         venture development.</p>
<p>To this strong basis, we have preliminary city commitments for         deploying this project in the United States (English and Spanish) and         Canada (French and English).</p>
<h3>What unmet need does your proposal answer?</h3>
<p><a href="http://www.thisweknow.org/">ThisWeKnow</a> offers local Census data, but no community or publishing aspects. Data sharing is available in <a href="http://www.datasf.org/">DataSF</a> and <a href="http://www.nyc.gov/html/datamine/html/home/home.shtml">DataMine</a> (NYC), but they lack collaboration, community networks and comparisons, and powerful data visualization or mapping.</p>
<p>Citizen DAN is a turnkey platform for any size community to create,         publish, search, browse, slice-and-dice, visualize or compare         indicators of community well-being. Its use makes the Web more locally         focused. With it, researchers, watchdog groups, reporters, local         officials and interested citizens can now discover hard data for &#8216;new         news&#8217; or fact-check mainstream media.</p>
<h3>What tasks/benchmarks need to be accomplished to develop your project         and by when will you complete them?</h3>
<p>There are two releases with feedback. The tasks are summarized below in rough sequence order (with overlaps), each with its estimated hours (hr) and duration in months (mo):</p>
<ol>
<li>Dataset Prep/Staging: identify, load and stage baseline datasets;         provide means for aggregating data at different levels; 420 hr; 2.5 mo</li>
<li>Refine Data Input Facility: feature to upload other external data,         incl direct from local sources; XML, spreadsheet, JSON forms; dataset         metadata; 280 hr; 3 mo</li>
<li>Add Data Visualization Component: Flex mapping/data visualization         (charts, graphs) using any slice-and-dice; 390 hr; 3 mo</li>
<li>Make Multi-linguality Changes: English, French, Spanish versions;         220 hr; 2 mo</li>
<li>Refine User Interface: update existing interface in faceted browse;         filter; search; record create, manage and update; imports; exports; and         user access rights; 380 hr; 3 mo</li>
<li>Standard Citizen DAN Ontologies: the coherent schema for the data;         140 hr; 3 mo</li>
<li>Create Central Portal: distribution and promotion site for project;         120 hr; 2 mo</li>
<li>Deploy/Test First Release: release by end of Mo 5 @ 3 test sites;         300 hr; 4 mo</li>
<li>Revise Based on Feedback: bug fixing and 4 mo testing/feedback,         then revision #2; 420 hr</li>
<li>Package/Document: component packaging for easier installs;         increased documentation; 310 hr; 2 mo</li>
<li>Marketing/Awareness: see next question; 240 hr; 12 mo</li>
<li>Project Management: standard PM/interact with test communities,         partners; 220 hr; 12 mo.</li>
</ol>
<p>See attached task details.</p>
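<p>As a quick cross-check on the list above, the task hours can be tallied with a few lines of Python. This is only an illustrative sketch and was not part of the original submission: the names <code>TASKS</code>, <code>total_hours</code> and <code>share_of_total</code> are hypothetical, and the figures are simply transcribed from the list. The "Revise Based on Feedback" task gives no separate duration figure, so its duration is recorded as <code>None</code> rather than guessed.</p>

```python
# Task name, hours, and duration in months, transcribed from the task list.
# "Revise Based on Feedback" states no separate "mo" duration, so it is None.
TASKS = [
    ("Dataset Prep/Staging", 420, 2.5),
    ("Refine Data Input Facility", 280, 3),
    ("Add Data Visualization Component", 390, 3),
    ("Make Multi-linguality Changes", 220, 2),
    ("Refine User Interface", 380, 3),
    ("Standard Citizen DAN Ontologies", 140, 3),
    ("Create Central Portal", 120, 2),
    ("Deploy/Test First Release", 300, 4),
    ("Revise Based on Feedback", 420, None),
    ("Package/Document", 310, 2),
    ("Marketing/Awareness", 240, 12),
    ("Project Management", 220, 12),
]


def total_hours(tasks):
    """Sum the estimated hours across all tasks."""
    return sum(hours for _name, hours, _months in tasks)


def share_of_total(tasks):
    """Map each task name to its fraction of the total estimated hours."""
    total = total_hours(tasks)
    return {name: hours / total for name, hours, _months in tasks}


if __name__ == "__main__":
    print("Total hours:", total_hours(TASKS))  # prints "Total hours: 3440"
    for name, share in share_of_total(TASKS).items():
        print(f"{name}: {share:.1%}")
```

<p>Because the tasks overlap, the durations are deliberately not summed; only the hours add up meaningfully.</p>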
<h3>What will you have changed by the end of your project?</h3>
<div style="text-align: center;">
<pre>"Information is the currency of democracy." <em>Thomas Jefferson</em> (n.b.)</pre>
</div>
<p>We intuitively understand that an informed citizenry is a healthy         polity. At the global level and in 250 languages, we see how Wikipedia,         matched with the Internet and inexpensive laptops, is bringing         unforeseen information and enrichment to all. Across the board, we are         seeing the democratization of information.</p>
<p>But very little of this revolution has percolated to the local level.</p>
<p>Only in the past decade or so have we seen free, electronic access to national Census data. We still see local data published only in print or not available at all, limiting awareness and, more importantly, understanding and analysis. Data locked up in municipal computers, or available but not expressed via crowdsourcing, is as good as non-existent.</p>
<p>Though many citizens at the local level are not numerate, intuition has to tell us that the absence of empirical, local data hurts our ability to understand, reason and debate our local circumstances. Are we doing better or worse than yesterday? Than our peers? Under what measures does this have meaning for community well-being?</p>
<p>The purpose of the Citizen DAN project is to create an appliance &#8212; in the same sense that a refrigerator keeps our food from spoiling &#8212; by which any citizen can crack open and expose relevant data at the local level. Citizen DAN is about enriching our local information and keeping our communities healthy.</p>
<h3>How will you measure progress and ultimately success?</h3>
<p>We will measure the progress of the project by the number of         communities and local organizations that use the Citizen DAN platform         to create and publish community data. Subsidiary measures include the         number of:</p>
<ul>
<li>Individual users across all installations</li>
<li>Users contributing uploaded datasets</li>
<li>Contributed datasets</li>
<li>Contributed applications based on the platform</li>
<li>Interconnected sites in the network</li>
<li>Different Citizen DAN networks</li>
<li>Substantive articles and blog posts on Citizen DAN</li>
<li>Mentions of &#8216;Citizen DAN&#8217; (and local naming or variants, which will         be tracked) in news articles</li>
<li>Contributed blog posts on the central Citizen DAN portal</li>
<li>Software package downloads, and</li>
<li>Google citations and hits on &#8216;Citizen DAN&#8217; (and prominent         variants).</li>
</ul>
<p>These measures, plus active sites with profiles of each, will be         monitored and tracked on the central Citizen DAN portal.</p>
<p>&#8216;Ultimate success&#8217; is related to the general growth in transparent         government at the local level. Growth in Citizen DAN-related measures         on a year-over-year basis or in relation to Gov2.0 would indicate         success.</p>
<h3>Do you see any risk in the development of your project?</h3>
<p>There is no technical risk to this proposal, but there are risks in         scope, awareness and acceptance. Our system has been operational for         one year for relevant use cases; all components have been integrated,         debugged, and put into production.</p>
<p>Scope risks relate to how much data the Citizen DAN platform is loaded         with, and how much functionality is included. We balance the data         question by using common public datasets for baseline data, then add         features for localities to &#8220;crowdsource&#8221; their own supplementary data.         We balance the functionality question by limiting new development to         data visualization/mapping and to upload functions (per above), and         then to refine what already exists.</p>
<p>Awareness risks arise from a crowded attention space. We can overcome         this in two ways. The first is to satisfy users at our test sites. That         will result in good recommendations to help seed a snowball effect. The         second way is to use social media and our existing Web outlets         aggressively. We have been building awareness for our own properties in         steady, inch-by-inch measures. While a notable few Web efforts may go         viral, the process is not predictable. Steady, constant focus is our         preferred recipe.</p>
<p>Acceptance risk is intimately linked with awareness and use. If we can satisfy each Citizen DAN community, then new datasets, new functionality and new awareness will naturally arise. More users and more contributions through the network effect are the best path to broad acceptance.</p>
<h3>What is your marketing plan? How will people learn about what you are         doing?</h3>
<p>Marketing and awareness efforts will include our use of social media,         dedicated Web sites, support from test communities, and outreach to         relevant community Web sites.</p>
<p>Our own blogs are popular in the semantic Web and structured data space         (~3K uniques daily); we have published two posts on Citizen DAN and         will continue to do so with more frequency once the effort gets         underway.</p>
<p>We will create a central portal (<a href="http://citizen-dan.org/">http://citizen-dan.org</a>) based on the         project software (akin to our other project sites). The model for this         apps and deployments clearinghouse is CrimeReports.com. Using social         aspects and crowdsourcing, the site will encourage sharing and best         practices amongst the growing number of Citizen DAN communities.</p>
<p>We will blog and post announcements for key releases and milestones on         relevant external Web sites including various <a href="http://en.wikipedia.org/wiki/E-Government">Gov 2.0</a> sites, <a href="http://www.communityindicators.net/">Community Indicators         Consortium</a>, <a href="http://www.govloop.com/">GovLoop</a>, <a href="http://www.newschallenge.org/">Knight News Challenge</a>, the <a href="http://www.sunlightfoundation.com/">Sunlight Foundation</a>, and so         forth. In addition, we will collate and track individual community         efforts (maintained on the central Citizen DAN site) and make specific         outreach to community data sites (such as <a href="http://www.datasf.org/">DataSF</a> or <a href="http://www.nyc.gov/html/datamine/html/home/home.shtml">DataMine</a> at         NYC.gov). We will use Twitter (#CitizenDAN, etc) and the social         networks of LinkedIn, Facebook, and Meetup to promote Citizen DAN         activity.</p>
<p>We will interact with advocates of citizen journalism, and engage civic         organizations, media, and government officials (esp in our three test         communities) to refine our marketing plan.</p>
<h3>Is this a one-time experiment or do you think it will continue after         the grant?</h3>
<p>Citizen DAN is not an experiment. It is a working framework that gives any locality and its citizenry the means to assemble, share and compare measures of its community well-being with other communities. These indicators, in turn, provide substance and grist for greater advocacy and writing and blogging (&#8220;journalism&#8221;) at the local level.</p>
<p>Granted, there are unknowns: How many localities will adopt the Citizen         DAN appliance? How essential will its data be to local advocacy and         news? How active will each Citizen DAN installation be in attracting         contributions and local data?</p>
<p>We submit that the better way to frame the question is in terms of the degree of adoption, rather than whether it will work.</p>
<p>Web-based changes in our society and social interaction are leading to         the democratization of information, access to it, and channels for         expression. Whether ultimately successful in the specific form proposed         herein, Citizen DAN and its open source software and frameworks will         surely be adopted in one form or another &#8212; to one degree or another &#8212;         in the unassailable trend toward local government transparency and         citizen involvement.</p>
<p>In short, Yes: We believe Citizen DAN will continue long after the         grant.</p>
<h3>If it is to be self-sustainable, what is the plan for making that         happen?</h3>
<p>Our plan begins with the nature of Citizen DAN as software and         framework. Sustainability is a question of whether the appliance itself         is useful, and how users choose to leverage it.</p>
<p>Mediawiki, the software behind Wikipedia, is an analog. Mediawiki is an         enabling infrastructure. Some sites using it are not successful; others         wildly so. Success has required the combination of a good appliance         with topicality and good management. The same is true for Citizen DAN.</p>
<p>Our plan thus begins with Citizen DAN as a useful appliance, as free         open source with great documentation and prominent initial use cases.         Our plan continues with our commitment to the local citizen         marketplace.</p>
<p>We are developing Citizen DAN because of current trends. We foresee         many hundreds of communities adopting the system. Most will be able to         do so on their own. Some others may require modifications or         assistance. Our self-interest is to ensure a high level of adoption.</p>
<p>An era of citizen engagement is unfolding at the local level, fueled by Web technologies and growing comfort with crowdsourcing and social networks. Meanwhile, local government constraints and pressures for transparency are unleashing locked-up data. These forces will create new opportunities for data literacy by the public, which will itself bring new understanding and improvements in governance and budgeting. We plan for Citizen DAN and its offspring to be among the catalysts for those changes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/869/citizen-dan-prise-deux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two Contrasting Styles for the Semantic Enterprise</title>
		<link>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/</link>
		<comments>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 15:36:49 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[semantic enterprise]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=866</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
Our Own Approach is Adaptive and Incremental
It is gratifying to see the emergence of the term semantic enterprise, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Two Contrasting Styles for the Semantic Enterprise&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/&amp;rft.language=English"></span>
<h2><img style="border: 0px solid; width: 225px; height: 225px; float: left; margin-right: 10px;" title="Two Faces in Circle, from http://energeticrelations.com/" src="../wp-content/themes/ai3/images/2010Posts/100214_two_faces_in_circle.jpg" alt="Two Faces in Circle, from http://energeticrelations.com/" />Our Own Approach is Adaptive and Incremental</h2>
<p>It is gratifying to see the emergence of the term <span style="font-style: italic;">semantic enterprise</span>, with much increased         attention and commentary. But, similar to different styles and patterns         in software programming, there is not a single (nor best, depending on         circumstance) way to approach becoming a semantic enterprise.</p>
<p>In this piece I contrast two styles. The more traditional and familiar         one is comprehensive, complete and &#8220;engineered&#8221; in its approach. The         second, and emerging style, is more adaptive and incremental. While         <a href="http://structureddynamics.com/">Structured Dynamics</a> is a         proponent and thought leader for the adaptive style, the use and         applicability of either approach is really a function of objectives and         circumstances. The choice of approach depends on use case, and should not be a dogmatic one.</p>
<p>Any time a contrast is posed, one should be on guard about         setting up a rhetorical strawman. There may perhaps be a bit of this         flavor in this article; if so, it is unintended. It is probably best to         realize that there is a gradient &#8212; or spectrum &#8212; of possible         approaches between these contrasting styles. The real message is to         understand these differences such that you can comfortably place your         own organization at the right points along this spectrum.</p>
<h3>A Spectrum of Advantages and Differences</h3>
<p>The general idea of semantics in the enterprise precedes the use of the term, having been somewhat captured before by the ideas of <a href="http://en.wikipedia.org/wiki/Enterprise_application_integration">enterprise application integration</a>, <a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration">enterprise information integration</a> and other concepts even related to <a href="http://en.wikipedia.org/wiki/Federated_database_system">data federation</a> and <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a> stretching back to the 1980s. However, as a specific label, we can look back to the first mentions in the late 1990s and more concerted attention beginning from about 2002 or so onward <a href="#styles1">[1]</a>. As another indicator, since 2005 the Semantic Technology Conference has given specific prominence to the enterprise <a href="#styles2">[2]</a>.</p>
<p>Throughout this period, the emphasis from academic papers, many vendors, and most pundits <a href="#styles3">[3]</a> has been on things like automated reasoning, machine-aided decision making, aspects of artificial intelligence, and so forth. The general tone is often framed as &#8220;revolution&#8221; or &#8220;massive changes&#8221; or something &#8220;entirely new.&#8221; If you are a consultant or software/implementation vendor &#8212; especially where VC money is backing the venture with hopes for big returns and home runs &#8212; it may make cynical sense to sell such large and costly change.</p>
<p>I believe there are circumstances where the <span style="font-style: italic;">Semantic Enterprise</span> writ this large may make sense and be financially justified. But, this kind of &#8220;big change&#8221; view has also seen relatively few visible (or successful) deployments. It has colored what it means to be a semantic enterprise. And, I believe, it has weakened market credibility by perhaps overpromising and underdelivering. The conventional view of what it is to be a semantic enterprise deserves to be balanced.</p>
<p>So, as we balance this understanding of the semantic enterprise with one that is more nuanced, we can contrast the characteristics of the two opposing styles as follows:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style</td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Characteristics of the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style</td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>A focus on a more complete, comprehensive coverage of the                 semantics in the domain</li>
<li>More enterprise-wide, less partial or departmental</li>
<li>Greater emphasis on &#8220;<a href="http://en.wikipedia.org/wiki/Closed_world_assumption">closed                 world</a>&#8221; approaches <a href="#styles4">[4]</a>; more akin to relational database                 architecting and schema</li>
<li>Expansion is possible, but effort may be somewhat complex</li>
<li>A general implication is to replace or supplant existing                 information structures with semantic ones</li>
<li>Not necessarily based on semantic Web standards and                 languages <a href="#styles5">[5]</a> (<span style="font-style: italic;">e.g.</span>,                 may include <a href="http://en.wikipedia.org/wiki/Common_logic">Common Logic</a>,                 <a href="http://en.wikipedia.org/wiki/Frame_%28artificial_intelligence%29"> frame logics</a>, etc.)</li>
<li>Richer set of predicates (relations)</li>
<li>Though a distinction is maintained between                 schema and instances, their separation may not be consistently                 (physically) enforced</li>
<li>Often more complicated inferencing and logic tests</li>
<li>More complete enumeration and characterization of items</li>
<li>Much process around semantics agreement across groups</li>
<li>Fairly well-developed implementation tools, including for                 ontology engineering</li>
<li>Implementation times in months to years</li>
<li>Implementation costs akin to traditional large-scale IT                 projects</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>An emphasis on a simpler, incremental, &#8220;learn as you go&#8221;                 approach</li>
<li>Start with single departments or limited vertical apps</li>
<li>Embedded in the &#8220;<a href="http://en.wikipedia.org/wiki/Open_world_assumption">open                 world</a>&#8221; approach <a href="#styles4">[4]</a>, with incorporation of external                 information</li>
<li>Design and approach inherently allows incremental expansion                 and adaptation</li>
<li>A key premise is to build from and leverage existing                 information structures, vocabularies and assets</li>
<li>Fully based on semantic Web standards and languages <a href="#styles5">[5]</a>,                 often including linked data <a href="#styles6">[6]</a></li>
<li>Tends to start simply with hierarchical or related concepts                 (<span style="font-style: italic;">e.g.</span>, SKOS)</li>
<li>Conscious distinction in the structure for                 handling schema separate from instances <a href="#styles7">[7]</a></li>
<li>Inferencing logic based more on concept matching, or                 parent-child or part-of relationships</li>
<li>Degree of item characterization based on current scope</li>
<li>Initial semantic matching can be driven from existing                 assets</li>
<li>Fairly well-developed implementation tools, <span style="font-style: italic; text-decoration: underline;">except</span> for how to engage publics in the development process</li>
<li>Implementation times in weeks to months</li>
<li>Implementation costs driven by available budgets (and thus                 scope)</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Note we have labeled the conventional approach as the &#8220;comprehensive, engineered&#8221; style; its contrast, and the one with which we align more closely, is the &#8220;adaptive, incremental&#8221; style.</p>
<p style="margin-left: 30px; margin-right: 30px;">[Others have posited contrasting styles, most often as "top down"         <span style="font-style: italic;">v.</span> "bottom up." However, in         one interpretation of that distinction, "top down" means a layer on top         of the existing Web <a href="#styles8">[8]</a>. On the other hand, &#8220;top down&#8221; is more often         understood in the sense of a &#8220;comprehensive, engineered&#8221; view,         consistent with my own understanding <a href="#styles9">[9]</a>. Yet no matter which  		characterization, neither captures what I feel to be the more         important considerations of mindset, logic and premise.]</p>
<p>Though the table above contrasts many points, I think there are two         main distinctions to the adaptive approach. First, it firmly embraces         the open world assumption. OWA is key to an incremental, &#8220;learn as you         go&#8221; deployment that is also well suited to incorporation of external         information. The second main distinction is to leverage and build from         existing assets.</p>
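<p>To make the first distinction concrete, here is a minimal Python sketch of how the same question is answered under each assumption. The facts, the triple layout and the function names (<code>ask_closed_world</code>, <code>ask_open_world</code>) are hypothetical illustrations and not part of any standard or tool; a real OWA reasoner also handles explicit negation, which this sketch leaves out.</p>

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts, stored as (subject, predicate, object) triples.
FACTS = {
    ("acme", "supplies", "widgets"),
    ("acme", "locatedIn", "ohio"),
}


def ask_closed_world(triple, facts):
    """CWA: anything not asserted is taken to be false."""
    return Answer.TRUE if triple in facts else Answer.FALSE


def ask_open_world(triple, facts):
    """OWA: anything not asserted is merely unknown, not false."""
    return Answer.TRUE if triple in facts else Answer.UNKNOWN


question = ("acme", "supplies", "gears")
closed = ask_closed_world(question, FACTS)
opened = ask_open_world(question, FACTS)

# Learning the fact later (say, from an external source) extends the
# open-world picture; no earlier answer has to be retracted.
FACTS_LATER = FACTS | {question}
later = ask_open_world(question, FACTS_LATER)
```

<p>Under the closed world reading the later fact contradicts the earlier &#8220;false&#8221;; under the open world reading it simply fills in an &#8220;unknown,&#8221; which is why OWA suits a &#8220;learn as you go&#8221; deployment.</p>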
<h3>A Spectrum of Applications</h3>
<p>Yet as noted in the opening, which of these approaches makes better         sense depends on circumstance. One aspect of circumstance is available         budget and deployment times for pilots or proofs-of-concept. Another         aspect, of course, is the planned use or application         for the deployment.</p>
<p>These are by no means hard distinctions, but in general we can see         these contrasting approaches applying to the following uses:</p>
<table class="center_ok" style="text-align: left; width: 600px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="padding: 6px; vertical-align: top; text-align: center; width: 300px; font-weight: bold; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Comprehensive, &#8216;Engineered&#8217;</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more CWA driven)</span></td>
<td style="padding: 6px; vertical-align: top; width: 300px; font-weight: bold; text-align: center; background-color: #ffffcc;">Applications and Uses for the<br />
<span style="font-style: italic;">Adaptive, Incremental</span> Style<br />
<span style="font-weight: normal;">(<span style="font-style: italic;">i.e.</span>, more OWA driven)</span></td>
</tr>
<tr>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>Bounded, &#8220;inward&#8221; applications (high degree of control and                 completeness)</li>
<li>Engineering enterprises</li>
<li>Technical domains and organizations</li>
<li>Aeronautics</li>
<li>Pharmaceuticals</li>
<li>Chemicals</li>
<li>Petroleum</li>
<li>Energy</li>
<li>A/E firms (construction)</li>
</ul>
</td>
<td style="vertical-align: top;">
<ul style="margin-left: 5px;">
<li>External facing applications, organizations (customers,                 incorporation of external data)</li>
<li>Faceted Search</li>
<li>Taxonomy updates</li>
<li>Multi-domain master data management (MDM)</li>
<li>Simple (initially) inferencing</li>
<li>Consumer products</li>
<li>Finance</li>
<li>Health care</li>
<li>Knowledge enterprises</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>A critical distinction is the nature of the enterprise itself.         &#8220;External-facing&#8221; enterprises or functions that want or need to         incorporate much external information (say, marketing or competitive intelligence) are advised to look closely at         the adaptive approach. Organizations that have more complete control         over their circumstances should perhaps focus on the conventional         approach.</p>
<h3>Adoption Thresholds and Risks</h3>
<p>In previous writings I have pointed to the manifest benefits that can accrue to the semantic enterprise [see, esp. <a href="#styles10">10</a>]. But we also have witnessed nearly a decade of promotion for semantics in the enterprise, with perhaps a lack of progress in some areas or unmet promises in others. These raise questions and skepticism about the real eventual costs and benefits.</p>
<p>I believe some of this skepticism is inherent with anything new &#8212; the         general IT fatigue from what the current &#8220;next great thing&#8221; might be.         But I also believe that some of this skepticism results from an         approach to semantics in the enterprise that is both lengthy to deploy         and high cost.</p>
<p>The key advantage of the adaptive, incremental approach is that the         whole IT game in the enterprise can change. An open world approach         enables adoption as it proves itself and as budgets allow. Commitments         made under this approach have, in essence, permanent value. Past fears         and concerns about making &#8220;wrong&#8221; bets no longer apply. With learning,         targets can be re-adjusted, structure re-defined and applications         re-focused, all as new discoveries and broadening scope dictate.</p>
<p>This does not make the adaptive approach better than the conventional         one. But, it does make it less risky and, well, more <span style="font-style: italic;">adaptive</span>.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles1"></a>[1] For example, the earliest Google mentions on &#8220;semantic enterprise&#8221;         date to about 1998 or 1999. In 2002, the University of Georgia and Amit         Sheth offered the first known academic course on the Semantic         Enterprise; see <a href="http://lsdis.cs.uga.edu/SemanticEnterprise/">http://lsdis.cs.uga.edu/SemanticEnterprise/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles2"></a>[2] See the conference guide for the <a href="http://www.wilshireconferences.com/webfiles/STC05/Stc05Final.pdf">Semantic         Technology Conference 2005</a>. The sixth one, the <a href="http://www.semantic-conference.com/">2010 Semantic Technology         Conference</a>, is upcoming on June 21-25 in San Francisco.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles3"></a>[3] See, for example, Mitchell Ummell, ed., 2009. “The Rise of         the Semantic Enterprise,” special dedicated edition of the         <span style="font-style: italic;">Cutter IT Journal</span>, Vol. 22(9),         40 pp., September 2009. See <a href="http://www.cutter.com/offers/semanticenterprise.html">http://www.cutter.com/offers/semanticenterprise.html</a> (after filling out contact form). Partially in response to this         conventional view, I wrote <a href="#styles10">[10]</a>. In that article I offered as a working         definition that &#8220;<span style="font-style: italic;">a</span> <span style="font-weight: bold; font-style: italic;">semantic         enterprise</span> <span style="font-style: italic;">is one that adopts         the languages and standards of the</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Semantic_Web">semantic Web</a> <span style="font-style: italic;">. . .</span> <span style="font-style: italic;">and applies them to the issues of information         interoperability, preferably using the best practices of</span> <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Linked_Data">linked data</a><span style="font-style: italic;">.</span>&#8221; That happens to be Structured Dynamics&#8217;         preferred definition, though as this posting indicates, there is a         spectrum of definitions of the term.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles4"></a>[4] See M.K. Bergman, 2009. “<a href="../852/the-open-world-assumption-elephant-in-the-room/">The Open World Assumption: Elephant in the Room</a>,” <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, December 21, 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles5"></a>[5] See for example <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>,         <a href="http://en.wikipedia.org/wiki/RDF_Schema">RDFS</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a> , <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a> and <a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and <a href="http://en.wikipedia.org/wiki/Semantic_Web#Components">others</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles6"></a>[6] <a href="http://en.wikipedia.org/wiki/Linked_data">Linked data</a> is a set of best practices for publishing and deploying instance and         class data using the RDF data model. Two of the best practices are to         name the data objects using uniform resource identifiers (URIs), and to         expose the data for access via the HTTP protocol. Both of these         practices enable the Web to become a distributed database, which also         means that Web architectures can also be readily employed.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles7"></a>[7] We use a basis in <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a> for defining the roles and splits in schema and instances. As we define it:
<div class="boxGraySolid">“Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.”</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles8"></a>[8] One article that got quite a bit of play a few years back was A. Iskold, 2007. &#8220;<a href="http://www.readwriteweb.com/archives/the_top-down_semantic_web.php">Top Down: A New Approach to the Semantic Web</a>,&#8221; in <em>ReadWrite Web</em>, Sept. 20, 2007. The problem with this terminology is that it offers a completely different sense of &#8220;top down&#8221; from traditional uses. In Iskold&#8217;s argument, his &#8220;top down&#8221; is a layering on top of the existing Web.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles9"></a>[9] The more traditional view of &#8220;top down&#8221; with respect to the semantic Web is in relation to how the system is constructed. This is reflected well in a presentation from the <a href="http://lsdis.cs.uga.edu/SemNSF/SemWebWorkshopAgenda.htm">NSF Workshop on DB &amp; IS Research for Semantic Web and Enterprises</a>, April 3, 2002, entitled &#8220;<a href="http://lsdis.cs.uga.edu/%7Ekashyap/talks/SWWS%20Panel.ppt">The &#8216;Emergent, Semantic Web: Top Down Design or Bottom Up Consensus?</a>&#8221;. Under this view, top down is design and committee-driven; bottom up is more decentralized and based on social processes, which is more akin to Iskold&#8217;s &#8220;top down.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="styles10"></a>[10] M.K. Bergman, 2009. &#8220;<a href="../825/fresh-perspectives-on-the-semantic-enterprise/">Fresh         Perspectives on the Semantic Enterprise</a>,&#8221; <a style="font-weight: bold;" href="http://mkbergman.com/"><span style="font-style: italic;">AI3:::Adaptive Information</span></a> blog, Sept.         28, 2009.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/866/two-contrasting-styles-for-the-semantic-enterprise/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Collaborating on Images</title>
		<link>http://www.mkbergman.com/863/collaborating-on-images/</link>
		<comments>http://www.mkbergman.com/863/collaborating-on-images/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 16:26:30 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Blogs and Blogging]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Site-related]]></category>
		<category><![CDATA[collaboration]]></category>
		<category><![CDATA[emf]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[inkscape]]></category>
		<category><![CDATA[Powerpoint]]></category>
		<category><![CDATA[svg]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=863</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Collaborating on Images&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Blogs and Blogging&amp;rft.subject=Open Source&amp;rft.subject=Site-related&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/863/collaborating-on-images/&amp;rft.language=English"></span>

The Inkscape Process Can Also Aid Image Interchanges with Powerpoint
As we see more collaboration forums emerge, one question that naturally         arises is the joint authoring or editing of images. This is         particularly important as &#8220;official&#8221; slide decks or presentations [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Collaborating on Images&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Blogs and Blogging&amp;rft.subject=Open Source&amp;rft.subject=Site-related&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-02-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/863/collaborating-on-images/&amp;rft.language=English"></span>
<p><a href="http://www.inkscape.org/"><img style="border: 0px solid; width: 200px; height: 194px; float: left; margin-right: 10px;" title="Inkscape Logo" src="../wp-content/themes/ai3/images/2010Posts/100131_inkscape_logo.png" alt="Inkscape Logo" hspace="5" vspace="5" align="left" /></a></p>
<h2>The Inkscape Process Can Also Aid Image Interchanges with Powerpoint</h2>
<p>As we see more collaboration forums emerge, one question that naturally         arises is the joint authoring or editing of images. This is         particularly important as &#8220;official&#8221; slide decks or presentations come         to the fore.</p>
<p>There are perhaps many different ways to skin this cat. In this         article, I describe how to do so using the free, open source <a href="http://en.wikipedia.org/wiki/SVG">SVG</a> editing program, <a href="http://www.inkscape.org/">Inkscape</a>.</p>
<h3>Why Inkscape?</h3>
<p>Like many of you, I have been creating and editing images for years. I         am by no means a graphics artist, but images and diagrams have been         essential for communicating my work.</p>
<p>Until a few years back, I was totally a bitmap man. I used <a href="http://en.wikipedia.org/wiki/Corel_Paint_Shop_Pro">Paint Shop Pro</a> (bought by Corel in 2004 and getting long in the tooth) and did a lot         of copying and pasting.</p>
<p>I switched to Inkscape about two years ago for the following reasons:</p>
<ul>
<li>I wanted re-use of image components via re-sizing and re-coloring,         etc., and vector graphics are far superior to raster images for this         purpose</li>
<li>I wanted a stable, free, usable editor and Inkscape was beginning         to mature nicely (the current version 0.47 is even nicer and more         stable)</li>
<li>Its SVG (<a href="http://en.wikipedia.org/wiki/SVG">scalable vector         graphics</a>) format was a standard adopted by the W3C after initial         development by Adobe</li>
<li>SVG is an easily read and editable XML format</li>
<li>There was a growing source of <a href="http://www.inkscape.org/doc/index.php?lang=en">online         documentation</a></li>
<li>There was a growing repository of <a href="http://www.openclipart.org/">SVG graphics examples</a>, including the         broadscale use within <a href="http://commons.wikimedia.org/wiki/Main_Page">Wikipedia</a> (a good way         to find stuff from this site is with the search &#8220;keywords         site:http://commons.wikimedia.org filetype:svg&#8221; on your favorite search         engine, after substituting your specific keywords).</li>
</ul>
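<p>Because SVG is plain XML, the re-coloring and re-sizing mentioned above can even be scripted. What follows is a minimal sketch using only Python&#8217;s standard library; the tiny drawing, the element ids and the helper names <code>recolor</code> and <code>resize</code> are made up for illustration. Real Inkscape files carry additional namespaces and often put colors in a <code>style</code> attribute, so treat this as a starting point and not a general-purpose tool.</p>

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

# Build a tiny SVG document in memory (a stand-in for a real drawing).
# The viewBox lets the picture scale when width and height change.
svg = ET.Element(f"{{{SVG_NS}}}svg",
                 {"width": "100", "height": "100", "viewBox": "0 0 100 100"})
ET.SubElement(svg, f"{{{SVG_NS}}}rect",
              {"id": "box", "width": "50", "height": "50", "fill": "#ffffcc"})
ET.SubElement(svg, f"{{{SVG_NS}}}circle",
              {"id": "dot", "cx": "75", "cy": "75", "r": "20", "fill": "#ffffcc"})


def recolor(root, old_fill, new_fill):
    """Replace one fill color with another; return how many shapes changed."""
    changed = 0
    for element in root.iter():
        if element.get("fill") == old_fill:
            element.set("fill", new_fill)
            changed += 1
    return changed


def resize(root, factor):
    """Scale the declared width and height of the drawing by a whole factor."""
    for attribute in ("width", "height"):
        root.set(attribute, str(int(root.get(attribute)) * factor))


count = recolor(svg, "#ffffcc", "#993300")
resize(svg, 2)
markup = ET.tostring(svg, encoding="unicode")
```

<p>The resulting <code>markup</code> string is ordinary text that can be written to a <code>.svg</code> file and opened in Inkscape or a browser.</p>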
<h3>How to Collaborate with Inkscape</h3>
<p>Once you have a working image in Inkscape, make sure all collaborators         have a copy of the software. Then:</p>
<ol>
<li>Isolate the picture (sometimes there are multiple images in a         single file) by deleting all extraneous image stuff in the file</li>
<li>From the toolbar, click on the <span style="font-style: italic;">Zoom to fit drawing in window</span> icon         [<img style="width: 16px; height: 16px;" title="Zoom to fit drawing in window" src="../wp-content/themes/ai3/images/2010Posts/zoom_icon.png" alt="Zoom to fit drawing in window" />];         this will resize and put your target image in the full display window</li>
<li>Under <span style="font-style: italic;">File -&gt; Document         Properties &#8230;</span> check <span style="font-style: italic;">Show page         border</span> and <span style="font-style: italic;">Show border         shadow</span>, then <span style="font-style: italic;">Fit page to         selection</span>. This helps size the image properly in the exported         file for sharing or collaboration</li>
<li>Save the file in the *.svg format, and name the file with a date/time stamp and author extension (useful for tracking multiple author edits over time)</li>
<li>If in multiple author mode, make sure it is clear who has current &#8220;ownership&#8221; of the image.</li>
</ol>
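<p>The naming step above can be kept consistent with a small helper. This is only a sketch: the pattern (base name, then timestamp, then author initials) and the function name <code>stamped_name</code> are my assumptions for illustration; any scheme works so long as all collaborators follow the same one.</p>

```python
from datetime import datetime


def stamped_name(base, author, when=None):
    """Build a file name from a base name, a timestamp and author initials.

    The ordering and separators here are an assumed convention.
    """
    when = when or datetime.now()
    return f"{base}_{when:%Y%m%d-%H%M}_{author}.svg"


# Example with a fixed time so the result is reproducible.
name = stamped_name("people", "mkb", datetime(2010, 2, 2, 16, 26))
```

<p>Putting the timestamp ahead of the author means that, for a given base name, an alphabetical file listing is also a chronological one.</p>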
<h3>How to Share with Powerpoint</h3>
<p>Of course, it is more often the case that not all collaborators have a copy of Inkscape, or that the image did not begin in the SVG format.</p>
<p>The image below began as a Windows Powerpoint clip art file, which has then gone through some modifications. Note the bearded guy&#8217;s hand holding the paper is out of register (because I screwed up in earlier editing, but I can also easily fix it because it is a vector image! <img src='http://www.mkbergman.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ). Also note we have the border from Inkscape as suggested above. This file, BTW, is <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/people.png"> people.png</a>, and was created as a PNG after a screen capture from Inkscape:</p>
<div style="margin: 10px; text-align: center;"><img style="border: 0px solid; width: 588px; height: 330px;" title="PNG representation of an SVG" src="http://mkbergman.com/wp-content/themes/ai3/images/2010Posts/people.png" alt="PNG representation of an SVG" /></div>
<p>When beginning in Powerpoint or as clip art, files in the format of Windows metafile (*.wmf) or enhanced metafile (*.emf) work well. (For example, you can download and play with the native Inkscape format of <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/people.svg"> people.svg</a>, or the <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/people.wmf"> people.wmf</a> or <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/people.emf"> people.emf</a> versions of the image above.) If you already have images in a Powerpoint presentation, save in one of these two formats, with (*.emf) preferred. (EMF is generally better for text.)</p>
<p>You can open or load these files directly into Inkscape. Generally,         they will come in as a group of vectors; to edit the pieces, you should         &#8220;ungroup.&#8221;</p>
<p>After editing per the instructions in the previous section, if you need         to re-insert back into Powerpoint, please use the *.emf format (and         make sure you do not save text as paths).</p>
<p>For example, see the following <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text.png"> PNG graphic</a> taken from an Inkscape file (<a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text.svg">figure_text.svg</a>):</p>
<div style="margin: 10px; text-align: center;"><img style="border: 0px solid; width: 416px; height: 294px;" title="PNG representation of an SVG" src="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text.png" alt="PNG representation of an SVG" /></div>
<p>We can save it as an EMF (<a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_textpath.emf">figure_textpath.emf</a>)         to a <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text.ppt"> Powerpoint</a>, with the option of converting text to paths:</p>
<div style="margin: 10px; text-align: center;"><img style="border: 0px solid; width: 378px; height: 262px;" title="Text-to-path EMF" src="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text_emf_text-to-path.png" alt="Text-to-path EMF" /></div>
<p>Or, we can save it as an EMF (<a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text.emf">figure_text.emf</a>)         to a <a href="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text.ppt"> Powerpoint</a>, only this time not converting text to paths and then         &#8220;ungrouping&#8221; once in Powerpoint:</p>
<div style="margin: 10px; text-align: center;"><img style="border: 0px solid; width: 376px; height: 268px;" title="EMF with no text to path" src="http://mkbergman.com/wp-content/themes/ai3/files/2010Posts/figure_text_emf_no-text-path.png" alt="EMF with no text to path" /></div>
<p>Note the latter option, text not as path, is the far superior one. However, also note that borders are added to the figures and vertical text is rotated 90&#176; back to horizontal. Nonetheless, the figure is fully editable, including text. Also, if the original Inkscape figures are constructed with lines of the same color as fills, the border conversion also works well.</p>
<p>Frankly, especially with text, because there can be orientation and         other changes going from Inkscape to Powerpoint, I recommend using         Inkscape and its native SVG for all early modifications and to keep a         canonical copy of your images. Then, prior to completion of the deck,         save as EMF for import into Powerpoint and then clean up. If changes         later need to be made to the graphic, I recommend doing so in Inkscape         and then re-importing.</p>
<h3>Other Alternatives</h3>
<p>I should note there is an option, as well, in Inkscape to convert         raster images to vector ones (use <span style="font-style: italic;">Path -&gt; Trace bitmap &#8230;</span> and invoke the         multiple scans with colors). This is doable, but involves quite a bit         of image copying, manipulation and color separation to achieve workable         results. You may want to see further Inkscape&#8217;s <a href="http://www.inkscape.org/doc/tracing/tutorial-tracing.html">documentation         on tracing</a>, or more fully <a href="http://confluence.concord.org/display/CCTR/Tracing+Color+Raster+Images"> this reference dealing with color</a>.</p>
<p>Of course, there are likely many other ways to approach these issues of         collaboration and sharing. I will leave it to others to suggest and         explain those options.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/863/collaborating-on-images/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Sweet Compendium of Ontology Building Tools</title>
		<link>http://www.mkbergman.com/862/the-sweet-compendium-of-ontology-building-tools/</link>
		<comments>http://www.mkbergman.com/862/the-sweet-compendium-of-ontology-building-tools/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 14:54:04 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[compendium]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[listing]]></category>
		<category><![CDATA[ontology editors]]></category>
		<category><![CDATA[ontology mapping]]></category>
		<category><![CDATA[ontology visualization]]></category>
		<category><![CDATA[vocabulary prompting]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=862</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Sweet Compendium of Ontology Building Tools&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-26&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/862/the-sweet-compendium-of-ontology-building-tools/&amp;rft.language=English"></span>

140 Tools: 20 Must Haves, 70 Possible Usefuls, and 50 Has Beens and Marginals
Well, for another client and another purpose, I was goaded into screening my Sweet Tools listing of semantic Web and -related tools and to assemble stuff from every other nook and cranny I could find. The net result is this enclosed listing [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The Sweet Compendium of Ontology Building Tools&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2010-01-26&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/862/the-sweet-compendium-of-ontology-building-tools/&amp;rft.language=English"></span>
<p><a href="http://www.mkbergman.com/category/ontologies/"><img style="border: 0px solid; width: 200px; height: 200px; float: left;" title="AI3's Ontologies category" src="http://www.cs.berkeley.edu/%7Esequin/GEOM/TILES/LizardTetrus1.JPG" alt="AI3's Ontologies category" /></a></p>
<h2>140 Tools: 20 Must Haves, 70 Possible Usefuls, and 50 Has Beens and Marginals</h2>
<p>Well, for another client and another purpose, I was goaded into screening my <span style="color: #993300;"><strong><a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/">Sweet Tools</a></strong></span> listing of semantic Web and related tools and into assembling material from every other nook and cranny I could find. The net result is the listing below of some 140 or so tools (most of them open source) related to semantic Web ontology building in one way or another.</p>
<p>Ever since I wrote my <em><a href="http://www.mkbergman.com/374/an-intrepid-guide-to-ontologies/">Intrepid Guide to Ontologies</a></em> nearly three years ago (one of the more popular articles on this site, though it is now perhaps a bit long in the tooth), I have been intrigued by how these semantic structures are built and maintained. That interest, in no small measure, is why I continue to maintain the <strong><a href="http://www.mkbergman.com/new-version-sweet-tools-sem-web/">Sweet Tools</a></strong> listing.</p>
<p>As far as I know, the following is the largest and most comprehensive listing of ontology         building tools available. I broadly interpret the classification of &#8216;ontology building&#8217;; I include, for example, vocabulary extraction and prompting tools, as well as ontology         visualization and mapping.</p>
<p>There are some 140 tools, of which perhaps 90 or so are still in active use. (Given the scope, not every tool could be inspected in detail. Some listed as perhaps inactive may not be so, and others not in that category perhaps should be.) Of the entire roster, somewhere on the order of 12 to 20 tools are quite impressive and deserve local installation, test runs, and close inspection.</p>
<p>There are relatively few tools useful to non-specialists (or useful for engaging knowledgeable publics in the ontology-building exercise). There appear to be key gaps across the entire workflow, from domain scoping, initial ontology definition and vocabulary candidates through to longer-term maintenance and revision. For example, spreadsheets would appear to be a possibly useful first step in any workflow process (which is why <a title="http://openstructs.org/iron" href="http://openstructs.org/iron">irON</a> is listed), but the spreadsheet tool <em>per se</em> is not listed herein (nor are text editors).</p>
<p>I surely have missed some tools and likely improperly assigned others. Please drop me an email or comment on this post with any revisions or suggestions.</p>
<h3><span>Some Worth A Closer Look</span></h3>
<p>In my own view, there are some tools that definitely deserve a         closer look. My favorite candidates &#8212; for very different reasons and for very different places in the workflow &#8212; are (in no particular order): <a title="http://apelon-dts.sourceforge.net/index.html" href="http://apelon-dts.sourceforge.net/index.html">Apelon DTS</a>, <a title="http://openstructs.org/iron" href="http://openstructs.org/iron">irON</a>, <a title="http://www.thechiselgroup.org/flexviz" href="http://www.thechiselgroup.org/flexviz">FlexViz</a>, <a title="http://knoodl.com/ui/home.html" href="http://knoodl.com/ui/home.html">Knoodl</a>, <a title="http://protege.stanford.edu/" href="http://protege.stanford.edu/">Protégé</a>, <a title="http://diagramic.com/" href="http://diagramic.com/">diagramic.com</a>, <a title="http://www.boowa.com/" href="http://www.boowa.com/">BooWa</a>,         <a title="http://cmap.ihmc.us/coe" href="http://cmap.ihmc.us/coe">COE</a>, <a title="http://code.google.com/p/ontopia/" href="http://code.google.com/p/ontopia/">ontopia</a>, <a href="http://www.cambridgesemantics.com/products/anzo_for_excel">Anzo</a>, <a title="http://www.punkt.at/3/47/poolparty-thesaurus-server.htm" href="http://www.punkt.at/3/47/poolparty-thesaurus-server.htm">PoolParty</a>,         <a title="http://marinemetadata.org/vine" href="http://marinemetadata.org/vine">Vine</a> (and voc2rdf), <a title="http://code.google.com/p/erca/" href="http://code.google.com/p/erca/">Erca</a>, <a title="http://www.mediavirus.org/graphl/" href="http://www.mediavirus.org/graphl/">Graphl</a>, and <a title="http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html" href="http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html"> GrOWL</a>. Each one of these links is more fully described below. 
Also, all         tools in the <strong>Vocabulary Prompting Tools</strong> category  		(which also includes extraction) are worth reviewing since all or nearly  		all have online demos.</p>
<p>Other tools may also be deserving, depending on use case. Some of the         more specific analysis and conversion tools, for example, are in the         <strong>Miscellaneous</strong> category.</p>
<p>Also, some purists may quibble with why some tools are listed here (such as the inclusion of some tools related to <a href="http://en.wikipedia.org/wiki/Topic_Maps">Topic Maps</a>). Well, my answer is that there are no truly complete solutions, and whatever we can pragmatically do today requires gluing together many disparate parts.</p>
<h3><span>Comprehensive Ontology Tools</span></h3>
<ul>
<li> <a title="http://www.altova.com/products_semanticworks.html" href="http://www.altova.com/products_semanticworks.html">Altova SemanticWorks</a> is a visual RDF and OWL editor that auto-generates RDF/XML or N-Triples based on visual ontology design. No open source version is available</li>
<li> <a title="http://amine-platform.sourceforge.net/" href="http://amine-platform.sourceforge.net/">Amine</a> is a rather           comprehensive, open source platform for the development of           intelligent and multi-agent systems written in Java. As one of its           components, it has an ontology GUI with text- and tree-based editing           modes, with some graph visualization</li>
<li>The <a title="http://apelon-dts.sourceforge.net/index.html" href="http://apelon-dts.sourceforge.net/index.html">Apelon DTS</a> (Distributed Terminology System) is an integrated set of open source components that provides comprehensive terminology services in distributed application environments. DTS supports national and international data standards, which are a necessary foundation for comparable and interoperable health information, as well as local vocabularies. Typical applications for DTS include clinical data entry, administrative review, problem-list and code-set management, guideline creation, decision support and information retrieval. Though not strictly an ontology management system, Apelon DTS has plug-ins that provide visualization of concept graphs and related functionality that make it close to a complete solution</li>
<li> <a title="http://dome.sourceforge.net/" href="http://dome.sourceforge.net/">DOME</a> is a programmable XML editor that is being used in a knowledge extraction role to transform Web pages into RDF; it is available as a set of Eclipse plug-ins. DOME stands for DERI Ontology Management Environment</li>
<li> <a title="http://www.thechiselgroup.org/flexviz" href="http://www.thechiselgroup.org/flexviz">FlexViz</a> is a Flex-based,           Protégé-like client-side ontology creation, management and viewing           tool; very impressive. The code is distributed from <a title="http://sourceforge.net/projects/flexviz/" href="http://sourceforge.net/projects/flexviz/">Sourceforge</a>; there is           a nice <a title="http://keg.cs.uvic.ca/ncbo/flexviz/FlexoViz.html#" href="http://keg.cs.uvic.ca/ncbo/flexviz/FlexoViz.html#">online           demo</a> available; there is a nice <a title="http://webhome.cs.uvic.ca/~seanf/files/demo_submission_flexviz.pdf" href="http://webhome.cs.uvic.ca/%7Eseanf/files/demo_submission_flexviz.pdf">explanatory           paper</a> on the system, and the developer, Chris Callendar, has a           useful <a title="http://flexdevtips.blogspot.com/" href="http://flexdevtips.blogspot.com/">blog</a> with Flex development           tips</li>
<li> <a title="http://knoodl.com/ui/home.html" href="http://knoodl.com/ui/home.html">Knoodl</a> facilitates           community-oriented development of OWL based ontologies and RDF           knowledge bases. It also serves as a semantic technology platform,           offering a Java service-based interface or a SPARQL-based interface           so that communities can build their own semantic applications using           their ontologies and knowledgebases. It is hosted in the Amazon EC2           cloud and is available for free; private versions may also be           obtained. See especially the <a title="http://knoodl.com/ui/site/webcast/intro.jsp" href="http://knoodl.com/ui/site/webcast/intro.jsp">screencast</a> for a           quick introduction</li>
<li>The <a title="http://neon-toolkit.org/wiki/Main_Page" rel="nofollow" href="http://neon-toolkit.org/wiki/Main_Page">NeOn toolkit</a> is a state-of-the-art, open source multi-platform ontology engineering environment, which provides comprehensive support for the ontology engineering life-cycle. The <a title="http://neon-toolkit.org/wiki/NTK_2.3_Release" rel="nofollow" href="http://neon-toolkit.org/wiki/NTK_2.3_Release">v2.3.0 toolkit</a> is based on the Eclipse platform, a leading development environment, and provides an extensive set of <a title="http://neon-toolkit.org/wiki/Neon_Plugins" rel="nofollow" href="http://neon-toolkit.org/wiki/Neon_Plugins">plug-ins</a> covering a variety of ontology engineering activities. You can add these plug-ins or get a current listing from the built-in updating mechanism</li>
<li> <a title="http://code.google.com/p/ontopia/" href="http://code.google.com/p/ontopia/">ontopia</a> is a relatively complete suite of tools for building, maintaining, and deploying Topic Maps-based applications; open source, and written in Java. I could not find online demos, but there are <a title="http://code.google.com/p/ontopia/wiki/Screenshots" href="http://code.google.com/p/ontopia/wiki/Screenshots">screenshots</a> and there is visualization of topic relationships</li>
<li> <a title="http://protege.stanford.edu/" href="http://protege.stanford.edu/">Protégé</a> is a free, open source visual ontology editor and knowledge-base framework. The Protégé platform supports two main ways of modeling ontologies, via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema. There are a large number of third-party plugins that extend the platform&#8217;s functionality
<ul>
<li> <a title="http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType" href="http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType"> Protégé Plugin Library</a> &#8211; frequently consult this page to               review new additions to the Protégé editor; presently there are               dozens of specific plugins, most related to the semantic Web and               most open source</li>
<li> <a title="http://protegewiki.stanford.edu/index.php/Collaborative_Protege" href="http://protegewiki.stanford.edu/index.php/Collaborative_Protege"> Collaborative Protégé</a> is a plug-in extension of the existing Protégé system that supports collaborative ontology editing. In addition to the common ontology editing operations, it enables annotation of both ontology components and ontology changes. It supports the searching and filtering of user annotations, also known as notes, based on different criteria. There is also an <a title="http://smi-protege.stanford.edu/collab-protege/" href="http://smi-protege.stanford.edu/collab-protege/">online demo</a></li>
</ul>
</li>
<li> <a title="http://www.topquadrant.com/products/TB_Composer.html" href="http://www.topquadrant.com/products/TB_Composer.html">TopBraid           Composer</a> is an enterprise-class modeling environment for           developing Semantic Web ontologies and building semantic           applications. Fully compliant with W3C standards, Composer offers           comprehensive support for developing, managing and testing           configurations of knowledge models and their instance knowledge           bases. It is based on the Eclipse IDE. There is a free version (after           registration) for small ontologies.</li>
</ul>
<h4><span>Not Apparently in Active Use</span></h4>
<ul>
<li> <a title="http://www.aktors.org/technologies/adaptiva/" href="http://www.aktors.org/technologies/adaptiva/">Adaptiva</a> is a           user-centred ontology building environment, based on using multiple           strategies to construct an ontology, minimising user input by using           adaptive information extraction</li>
<li> <a title="http://exteca.sourceforge.net/" href="http://exteca.sourceforge.net/">Exteca</a> is an ontology-based           technology written in Java for high-quality knowledge management and           document categorisation, including entity extraction. Though code is           still available, no updates have been provided since 2006. It can be           used in conjunction with search engines</li>
<li> <a title="http://www.alphaworks.ibm.com/tech/semanticstk" href="http://www.alphaworks.ibm.com/tech/semanticstk">IODT</a> is IBM’s toolkit for ontology-driven development. The toolkit includes the EMF Ontology Definition Metamodel (EODM), the EODM workbench, and an OWL Ontology Repository (named Minerva)</li>
<li> <a title="http://kaon.semanticweb.org/" href="http://kaon.semanticweb.org/">KAON</a> is an open-source ontology           management infrastructure targeted for business applications. It           includes a comprehensive tool suite allowing easy ontology creation           and management and provides a framework for building ontology-based           applications. An important focus of KAON is scalable and efficient           reasoning with ontologies</li>
<li> <a title="http://www.ksl.stanford.edu/software/ontolingua/" href="http://www.ksl.stanford.edu/software/ontolingua/">Ontolingua</a> provides a distributed collaborative environment to browse, create, edit, modify, and use ontologies. The server supports over 150 active users, some of whom have provided descriptions of their projects. Provided as an online service; software availability not known.</li>
</ul>
<h3><span>Vocabulary Prompting Tools</span></h3>
<ul>
<li> <a title="http://www.alchemyapi.com/api/keyword/" href="http://www.alchemyapi.com/api/keyword/">AlchemyAPI</a> from           Orchestr8 provides an API based application that uses statistical and           natural language processing methods. Applicable to webpages, text           files and any input text in several languages</li>
<li> <a title="http://www.boowa.com/" href="http://www.boowa.com/">BooWa</a> is a set expander for any language (formerly known as SEAL); developed by RC Wang of Carnegie Mellon</li>
<li><a title="https://adwords.google.com/select/KeywordToolExternal" rel="nofollow" href="https://adwords.google.com/select/KeywordToolExternal">Google Keywords</a> allows you to enter a few descriptive words or phrases or a site URL to generate keyword ideas</li>
<li> <a title="http://labs.google.com/sets" href="http://labs.google.com/sets">Google Sets</a> for automatically           creating sets of items from a few examples</li>
<li> <a title="http://opencalais.com/" href="http://opencalais.com/">Open Calais</a> is a free, limited API web service that automatically attaches semantic metadata to content, based on either entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), or events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’). The metadata results are stored centrally and returned to you as industry-standard RDF constructs accompanied by a Globally Unique Identifier (GUID)</li>
<li><a title="http://www.blogscope.net//tools/phrase.jsp" rel="nofollow" href="http://www.blogscope.net//tools/phrase.jsp">Query-by-document</a> from BlogScope has a nice phrase extraction service, with a choice of ranking methods. Can also be used in a Firefox plug-in (not tested with 3.5+)</li>
<li><a title="http://www.semantichacker.com/api" rel="nofollow" href="http://www.semantichacker.com/api">SemanticHacker</a> (from <a title="http://www.textwise.com/" rel="nofollow" href="http://www.textwise.com/">Textwise</a>) is an API that does a number of different things, including categorization, search, etc. By using &#8216;concept tags&#8217;, the API can be leveraged to generate metadata or tags for content</li>
<li><a title="http://zingosoft.com/tagfinder.htm" rel="nofollow" href="http://zingosoft.com/tagfinder.htm">TagFinder</a> is a Web service that automatically extracts tags from a piece of text. The tags are chosen based on both statistical and linguistic analysis of the original text</li>
<li> <a title="http://tagthe.net/" href="http://tagthe.net/">Tagthe.net</a> has a demo and an API for           automatic tagging of web documents and texts. Tags can be single           words only. The tool also recognizes named entities such as people           names and locations</li>
<li> <a title="http://lcl2.uniroma1.it/termextractor/" href="http://lcl2.uniroma1.it/termextractor/">TermExtractor</a> extracts terminology consensually referred to in a specific application domain. The software takes as input a corpus of domain documents, parses the documents, and extracts a list of “syntactically plausible” terms (e.g. compounds, adjective-nouns, etc.)</li>
<li><a title="http://labs.translated.net/terminology-extraction/" rel="nofollow" href="http://labs.translated.net/terminology-extraction/">TermFinder</a> uses Poisson statistics, Maximum Likelihood Estimation and Inverse Document Frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language; available for English, French and Italian</li>
<li> <a title="http://www.nactem.ac.uk/software/termine/" href="http://www.nactem.ac.uk/software/termine/">TerMine</a> is an online and batch term extractor that emphasizes part of speech (POS) and n-gram (phrase) extraction. TerMine is a terminological management system that integrates C-Value term extraction and AcroMine acronym recognition</li>
<li> <a title="http://pypi.python.org/pypi/topia.termextract/1.1.0" href="http://pypi.python.org/pypi/topia.termextract/1.1.0">Topia term extractor</a> is a part-of-speech and frequency based term extraction tool implemented in Python. Here is a <a title="http://fivefilters.org/term-extraction/" href="http://fivefilters.org/term-extraction/">term extraction demo</a> based on this tool</li>
<li> <a title="http://www.topicalizer.com/" href="http://www.topicalizer.com/">Topicalizer</a> is a service which           automatically analyses a document specified by a URL or a plain text           regarding its word, phrase and text structure. It provides a variety           of useful information on a given text including the following: Word,           sentence and paragraph count, collocations, syllable structure,           lexical density, keywords, readability and a short abstract on what           the given text is about</li>
<li> <a title="http://www.trmkft.hu/en/extract/" rel="nofollow" href="http://www.trmkft.hu/en/extract/">TrMExtractor</a> does glossary extraction on pure text files for either English or Hungarian</li>
<li> <a title="http://www.wikifyer.com/" href="http://www.wikifyer.com/">Wikify!</a> is a system to automatically           &#8220;wikify&#8221; a text by adding Wikipedia-like tags throughout the           document. The system extracts keywords and then disambiguates and           matches them to their corresponding Wikipedia definition</li>
<li> <a title="http://developer.yahoo.com/geo/placemaker/" href="http://developer.yahoo.com/geo/placemaker/">Yahoo! Placemaker</a> is           a freely available geoparsing Web service. It helps developers make           their applications location-aware by identifying places in           unstructured and atomic content – feeds, web pages, news,           status updates – and returning geographic metadata for           geographic indexing and markup</li>
<li><a href="http://developer.yahoo.com/search/content/V1/termExtraction.html">Yahoo! Term Extraction Service</a> is an API to Yahoo&#8217;s term extraction service; the same developer site offers many other APIs and services in a variety of languages and for a variety of tasks, making it a good general resource. The service has been reported to be shut down numerous times, but apparently is kept alive due to popular demand.</li>
</ul>
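<p>Several of the extractors above rank candidate terms by how often they occur in a document relative to a background corpus. The following is a minimal Python sketch of that general idea only. The function names, the tiny background text, and the smoothed frequency-ratio score are illustrative assumptions, not the method used by TermFinder or any other tool listed here:</p>

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_terms(document, background, smoothing=1.0):
    """Rank words by document frequency relative to a background corpus.

    Score = relative frequency in the document divided by the smoothed
    relative frequency in the background (a hypothetical scoring rule,
    not the formula of any tool in the listing).
    """
    doc_counts = Counter(tokenize(document))
    bg_counts = Counter(tokenize(background))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab_size = len(set(doc_counts) | set(bg_counts))

    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_total
        # Additive smoothing so words unseen in the background do not divide by zero.
        bg_rate = (bg_counts[word] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[word] = doc_rate / bg_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up texts for illustration only.
background = "the cat sat on the mat and the dog sat on the log"
document = "the ontology editor maps the ontology classes"
ranked = score_terms(document, background)
print(ranked[:3])
```

<p>Words that are frequent in the document but rare in the background rise to the top, while common function words sink; a real extractor would add part-of-speech filtering and multi-word phrases on top of this.</p>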
<h3><span>Initial Ontology Development</span></h3>
<ul>
<li> <a title="http://cmap.ihmc.us/coe" href="http://cmap.ihmc.us/coe">COE</a> (CmapTools Ontology Editor) is a specialized version of the CmapTools from IHMC. COE, like its CmapTools parent, is based on the idea of concept maps. A concept map is a graph diagram that shows the relationships among concepts. Concepts are connected with labeled arrows, with the relations manifesting in a downward-branching hierarchical structure. COE is an integrated suite of software tools for constructing, sharing and viewing OWL-encoded ontologies based on these constructs</li>
<li> <a title="http://www.conzilla.org/wiki/Overview/Main" href="http://www.conzilla.org/wiki/Overview/Main">Conzilla2</a> is a           second generation concept browser and knowledge management tool with           many purposes. It can be used as a visual designer and manager of RDF           classes and ontologies, since its native storage is in RDF. It also           has an online collaboration server</li>
<li> <a title="http://diagramic.com/" href="http://diagramic.com/">http://diagramic.com/</a> has an online Flex network graph demo, which also has a neat facility for quick entry and visualization of relationships; mostly small scale; pretty cool. There does not appear to be code available anywhere</li>
<li> <a title="http://www.jarrar.info/Dogmamodeler/index.htm" href="http://www.jarrar.info/Dogmamodeler/index.htm">DogmaModeler</a> is a free and open source ontology modeling tool based on ORM. The philosophy of DogmaModeler is to enable non-IT experts to model ontologies with little or no involvement of an ontology engineer; the project is quite old, but the software is still available and it may provide some insight into naive ontology development</li>
<li> <a title="http://code.google.com/p/erca/" href="http://code.google.com/p/erca/">Erca</a> is a framework that eases the use of Formal and Relational Concept Analysis, a neat clustering technique. Though not strictly an ontology tool, Erca could be used in a workflow that imports formal contexts from CSV files, then runs algorithms that compute the concept lattice of those contexts, which can be exported as dot graphs (or in JPG, PNG, EPS and SVG formats). Erca is provided as an Eclipse plug-in</li>
<li> <a title="http://drupal.org/project/graphmind" href="http://drupal.org/project/graphmind">GraphMind</a> is a mindmap editor for Drupal. It has the basic mindmap features and some Drupal-specific enhancements. There is a <a title="http://www.youtube.com/watch?v=5_mVw_j1ukk" href="http://www.youtube.com/watch?v=5_mVw_j1ukk">quick screencast</a> about what GraphMind looks like and what it does. The Flex source is also available from <a title="http://github.com/itarato/GraphMind/tree/master" href="http://github.com/itarato/GraphMind/tree/master">Github</a></li>
<li> <a title="http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html" href="http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html"> GrOWL</a> is a software framework that provides graphical, intuitive browsing and editing of knowledge maps. GrOWL is open source and is used in several projects worldwide. None of the online demos apparently work, but the screenshots look interesting and the code is still available</li>
<li> <a title="http://openstructs.org/iron" href="http://openstructs.org/iron">irON</a> supports the use of spreadsheets, via its notation and specification. Spreadsheets can be used for initial authoring, especially if the irON guidelines are followed. See further this case study of Sweet Tools in a <a title="http://openstructs.org/iron/common-swt-annex" href="http://openstructs.org/iron/common-swt-annex">spreadsheet using irON (commON)</a></li>
<li> <a title="http://www.mondeca.com/index.php/en/intelligent_topic_manager/applications/itm_t3_terminology_thesaurus_taxonomy_metadata_dictionary" href="http://www.mondeca.com/index.php/en/intelligent_topic_manager/applications/itm_t3_terminology_thesaurus_taxonomy_metadata_dictionary"> ITM T3</a> stands for Terminology, Thesaurus, Taxonomy, Metadata dictionary. ITM T3 includes a range of functions for managing enterprise shareable, multilingual, domain-specific taxonomies, thesauri and terminologies in a unified way. It uses XML, SKOS and RDF standards. Commercial; from Mondeca</li>
<li> <a title="http://mindraider.sourceforge.net/index.html" href="http://mindraider.sourceforge.net/index.html">MindRaider</a> is a Semantic Web outliner. It aims to connect the tradition of outline editors with emerging technologies. MindRaider&#8217;s mission is to organize not only the content of your hard drive but also your cognitive base and social relationships in a way that enables quick navigation, concise representation and inferencing</li>
<li> <a title="http://www.cerny-online.com/topincs/" href="http://www.cerny-online.com/topincs/">Topincs</a> is Topic Map authoring software that allows groups to share their knowledge over the web. It makes use of a variety of modern technologies, the most important being Topic Maps, REST and Ajax. It consists of three components: the Wiki, the Editor, and the Server. The server requires AMP; the Editor and Wiki are based on browser plug-ins.</li>
</ul>
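<p>Formal Concept Analysis, the technique that Erca supports, derives a lattice of concepts from a table of objects and their attributes. Here is a naive Python sketch that enumerates the formal concepts of a tiny, made-up context. The data and function names are illustrative assumptions; this brute-force enumeration is not Erca&#8217;s algorithm and only suits very small inputs:</p>

```python
from itertools import combinations

# Hypothetical formal context: each object maps to the attributes it has.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in the set (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared = shared.intersection(context[obj])
    return shared


def objects_having(attributes, context):
    """Objects that possess every attribute in the set."""
    return {obj for obj, attrs in context.items() if attributes.issubset(attrs)}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
# Print concepts from the smallest extent to the largest.
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

<p>Each printed pair is one node of the concept lattice: a set of objects together with exactly the attributes they all share. Ordering the pairs by extent inclusion gives the lattice that a tool could then export as a graph.</p>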
<h3><span>Ontology Editing</span></h3>
<ul>
<li>First, see all of the <strong>Comprehensive Tools</strong> listing above</li>
<li><a href="http://www.cambridgesemantics.com/products/anzo_for_excel">Anzo for Excel</a> includes an (RDFS and OWL-based) ontology editor that can be used directly within Excel. In addition to that, Anzo for Excel includes the capability to automatically generate an ontology from existing spreadsheet data, which is very useful for quick bootstrapping of an ontology.</li>
<li><a title="http://www.hozo.jp/ckc07demo/" href="http://www.hozo.jp/ckc07demo/">Hozo</a> is an ontology visualization           and development tool that brings version control constructs to group           ontology development; limited to a prototype, with no online demo</li>
<li> <a title="http://www.vocman.com/?q=lexauruseditor" href="http://www.vocman.com/?q=lexauruseditor">Lexaurus Editor</a> is for           off-line creation and editing of vocabularies, taxonomies and           thesauri. It supports import and export in Zthes and SKOS XML           formats, and allows hierarchical / poly-hierarchical structures to be           loaded for editing, or even multiple vocabularies to be loaded           simultaneously, so that terms from one taxonomy can be re-used in           another, using drag and drop. Not available in open source</li>
<li> <a title="http://www.modelfutures.com/owl" href="http://www.modelfutures.com/owl">Model Futures OWL Editor</a> combines simple OWL tools with UML (XMI), ErWin and thesaurus imports. The editor is tree-based and has a “navigator” tool for traversing property and class-instance relationships. It can import XMI (the interchange format for UML), Thesaurus Descriptor (BT-NT XML), and EXPRESS XML files. It can export to MS Word.</li>
<li> <a title="http://www.informatik.uni-ulm.de/ki/ontotrack/" href="http://www.informatik.uni-ulm.de/ki/ontotrack/">OntoTrack</a> is a           browsing and editing ontology authoring tool for OWL Lite. It           combines a sophisticated graphical layout with mouse enabled editing           features optimized for efficient navigation and manipulation of large           ontologies</li>
<li> <a title="http://www.co-ode.org/downloads/owlviz/" href="http://www.co-ode.org/downloads/owlviz/">OWLViz</a> is an attractive           visual editor for OWL and is available as a Protégé plug-in</li>
<li> <a title="PoolParty" href="http://poolparty.punkt.at/">PoolParty</a> is a triple store-based thesaurus management environment which uses           SKOS and text extraction for tag recommendations. See further this <a href="http://www.punkt.at/file_upload/root_tmpphptOZk8U.pdf">manual</a>, which describes more fully the system&#8217;s functionality. Also, there is a PoolParty <a href="http://demo.semantic-web.at:8080/SkosServices/zthes">Web service</a> that enables a Zthes thesaurus in XML format to be uploaded and converted to SKOS (via skos:Concepts)</li>
<li> <a title="http://code.google.com/p/skoseditor/" href="http://code.google.com/p/skoseditor/">SKOSEd</a> is a plugin for Protégé 4 that allows you to create and edit thesauri (or similar artefacts) represented in the Simple Knowledge Organisation System (SKOS).</li>
<li> <a title="http://sourceforge.net/projects/tematres/" href="http://sourceforge.net/projects/tematres/">TemaTres</a> is a Web application to manage controlled vocabularies, taxonomies and thesauri. The vocabularies may be exported in Zthes, SKOS, TopicMap, etc.</li>
<li> <a title="http://thmanager.sourceforge.net/" href="http://thmanager.sourceforge.net/">ThManager</a> is a tool for           creating and visualizing SKOS RDF vocabularies. ThManager facilitates           the management of thesauri and other types of controlled           vocabularies, such as taxonomies or classification schemes</li>
<li> <a title="http://vitro.mannlib.cornell.edu/" href="http://vitro.mannlib.cornell.edu/">Vitro</a> is a general-purpose           web-based ontology and instance editor with customizable public           browsing. Vitro is a Java web application that runs in a Tomcat           servlet container. With Vitro, you can: 1) create or load ontologies           in OWL format; 2) edit instances and relationships; 3) build a public           web site to display your data; and 4) search your data with Lucene.           Still in somewhat early phases, with no online demos and with minimal           interfaces.</li>
</ul>
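<p>Several of the editors above exchange thesauri as SKOS, where each term becomes a skos:Concept linked to its broader and narrower terms. Here is a minimal Python sketch that turns a broader-term (BT) table into such statements. The &#8216;ex:&#8217; prefix, the sample terms and the helper names are made-up assumptions, the output omits prefix declarations, and real tools such as PoolParty or TemaTres will differ in detail:</p>

```python
# Hypothetical thesaurus: each term maps to its broader terms (BT).
broader_terms = {
    "mammal": ["animal"],
    "dog": ["mammal", "pet"],
    "animal": [],
    "pet": [],
}


def concept_id(term):
    """Turn a term label into a prefixed name in a made-up 'ex:' namespace."""
    return "ex:" + term.strip().lower().replace(" ", "_")


def skos_triples(broader_terms):
    """Yield (subject, predicate, object) triples for each term.

    skos:narrower is emitted as the inverse of every skos:broader link,
    which also covers poly-hierarchies (a term with several broader terms).
    """
    for term in sorted(broader_terms):
        subject = concept_id(term)
        yield (subject, "rdf:type", "skos:Concept")
        yield (subject, "skos:prefLabel", '"%s"@en' % term)
        for parent in sorted(broader_terms[term]):
            yield (subject, "skos:broader", concept_id(parent))
            yield (concept_id(parent), "skos:narrower", subject)


triples = list(skos_triples(broader_terms))
for s, p, o in triples:
    # Prefixed-name statements only; prefix declarations are left out of this sketch.
    print(s, p, o, ".")
```

<p>Keeping the source table in broader-term form and deriving the narrower links mechanically avoids the two directions drifting out of sync when a vocabulary is edited by hand.</p>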
<h4><span>Not Apparently in Active Use</span></h4>
<ul>
<li> <a title="http://www.ontopia.net/omnigator/models/index.jsp" href="http://www.ontopia.net/omnigator/models/index.jsp">Omnigator</a> is a form-based manipulation tool centered on Topic Maps, though it enables the loading and navigation of any conforming topic map in XTM, HyTM, LTM or RDF formats. There is a free evaluation version.</li>
<li> <a title="http://ontogen.ijs.si/" href="http://ontogen.ijs.si/">OntoGen</a> is a semi-automatic and           data-driven ontology editor focusing on editing of topic ontologies           (a set of topics connected with different types of relations). The           system combines text-mining techniques with an efficient user           interface. It requires .Net.</li>
<li> <a title="http://owlseditor.semwebcentral.org/" href="http://owlseditor.semwebcentral.org/">OWL-S-editor</a> is an editor           for the development of services in OWL-S, with graphical, WSDL and           import/export support</li>
<li> <a title="http://www.aktors.org/technologies/retax/" href="http://www.aktors.org/technologies/retax/">ReTAX+</a> is an aide to           help a taxonomist create a consistent taxonomy and in particular           provides suggestions as to where a new entity could be placed in the           taxonomy whilst retaining the integrity of the revised taxonomy           (c.f., problems in ontology modelling)</li>
<li> <a title="http://www.mindswap.org/2004/SWOOP/" href="http://www.mindswap.org/2004/SWOOP/">SWOOP</a> is a lightweight           ontology editor. (Swoop is no longer under active development at           mindswap. Continuing development can be found on SWOOP&#8217;s Google Code           homepage at <a title="http://code.google.com/p/swoop/" href="http://code.google.com/p/swoop/">http://code.google.com/p/swoop/</a>)</li>
<li> <a title="http://kmi.open.ac.uk/projects/webonto/" href="http://kmi.open.ac.uk/projects/webonto/">WebOnto</a> supports the           browsing, creation and editing of ontologies through coarse grained           and fine grained visualizations and direct manipulation.</li>
</ul>
<h3><span>Ontology Mapping</span></h3>
<ul>
<li> <a title="http://dbs.uni-leipzig.de/Research/coma.html" href="http://dbs.uni-leipzig.de/Research/coma.html">COMA++</a> is a schema and ontology matching tool with a comprehensive infrastructure. Its graphical interface supports a variety of interactions</li>
<li> <a title="http://www.aktors.org/technologies/conceptool/" href="http://www.aktors.org/technologies/conceptool/">ConcepTool</a> is a           system to model, analyse, verify, validate, share, combine, and reuse           domain knowledge bases and ontologies, reasoning about their           implication</li>
<li> <a title="http://www.revelytix.com/matchit.php" href="http://www.revelytix.com/matchit.php">MatchIT</a> automates and facilitates schema matching and semantic mapping between different Web vocabularies. MatchIT runs as a stand-alone or plug-in Eclipse application and can be integrated with popular third party applications. MatchIT uses Adaptive Lexicon&#8482; as an ontology-driven dictionary and thesaurus of English language terminology to quantify and rank the semantic similarity of concepts. It is apparently not available as open source</li>
<li> <a title="http://www.myontology.org/" href="http://www.myontology.org/">myOntology</a> is used to produce the theoretical foundations and deployable technology for the Wiki-based, collaborative and community-driven development and maintenance of ontologies, instance data and mappings</li>
<li> <a title="https://gforge.inria.fr/projects/ola/" href="https://gforge.inria.fr/projects/ola/">OLA/OLA2</a> (OWL-Lite Alignment) matches ontologies written in OWL. It relies on a similarity combining all the knowledge used in entity descriptions. It also deals with one-to-many relationships and circularity in entity descriptions through a fixpoint algorithm</li>
<li> <a title="http://simile.mit.edu/potluck/" href="http://simile.mit.edu/potluck/">Potluck</a> is a Web-based user interface that lets casual users (those without programming skills and data modeling expertise) mash up data themselves. Potluck is novel in its use of drag and drop for merging fields, its integration and extension of the faceted browsing paradigm for focusing on subsets of data to align, and its application of simultaneous editing for cleaning up data syntactically. Potluck also lets the user construct rich visualizations of data in-place as the user aligns and cleans up the data.</li>
<li> <a title="http://www.sis.pitt.edu/~mingmao/om07/" href="http://www.sis.pitt.edu/%7Emingmao/om07/">PRIOR+</a> is a generic and automatic ontology mapping tool, based on propagation theory, information retrieval techniques and an artificial intelligence model. The approach utilizes both linguistic and structural information of ontologies, and measures the profile similarity and structure similarity of different elements of ontologies in a vector space model (VSM).</li>
<li> <a title="http://marinemetadata.org/vine" href="http://marinemetadata.org/vine">Vine</a> is a tool that allows users to perform fast mappings of terms across ontologies. It performs smart searches, can search using regular expressions, requires a minimum number of clicks to perform mappings, can be plugged into an arbitrary mapping framework, is non-intrusive with mappings stored in an external file, has export to text files, and adds metadata to any mapping. See also <a title="http://sourceforge.net/projects/vine/" href="http://sourceforge.net/projects/vine/">http://sourceforge.net/projects/vine/</a>.</li>
</ul>
<h4><span>Not Apparently in Active Use</span></h4>
<ul>
<li> <a title="http://support.infotechsoft.com/integration/ASMOV/index.html" href="http://support.infotechsoft.com/integration/ASMOV/index.html">ASMOV</a> (Automated Semantic Mapping of Ontologies with Validation) is an           automatic ontology matching tool which has been designed in order to           facilitate the integration of heterogeneous systems, using their data           source ontologies</li>
<li> <a title="http://www-ksl-svc.stanford.edu:5915/doc/chimaera/chimaera-docs.html" href="http://www-ksl-svc.stanford.edu:5915/doc/chimaera/chimaera-docs.html"> Chimaera</a> is a software system that supports users in creating and           maintaining distributed ontologies on the web. Two major functions it           supports are merging multiple ontologies together and diagnosing           individual or multiple ontologies</li>
<li> <a title="http://projects.semwebcentral.org/projects/ontologymapping/" href="http://projects.semwebcentral.org/projects/ontologymapping/">CMS</a> (CROSI Mapping System) is a structure matching system that           capitalizes on the rich semantics of the OWL constructs found in           source ontologies and on its modular architecture that allows the           system to consult external linguistic resources</li>
<li> <a title="http://www.aktors.org/technologies/conref/" href="http://www.aktors.org/technologies/conref/">ConRef</a> is a service           discovery system which uses ontology mapping techniques to support           different user vocabularies</li>
<li> <a title="http://sra.itc.it/projects/drago/" href="http://sra.itc.it/projects/drago/">DRAGO</a> reasons across multiple           distributed ontologies interrelated by pairwise semantic mappings,           with a vision of peer-to-peer mapping of many distributed ontologies           on the Web. It is implemented as an extension to an open source           Pellet OWL Reasoner</li>
<li> <a title="http://iws.seu.edu.cn/projects/matching/" href="http://iws.seu.edu.cn/projects/matching/">Falcon-AO</a> (Finding,           aligning and learning ontologies) is an automatic ontology matching           tool that includes the three elementary matchers of String, V-Doc and           GMO. In addition, it integrates a partitioner PBM to cope with           large-scale ontologies</li>
<li> <a title="http://www.aifb.uni-karlsruhe.de/WBS/meh/foam/" href="http://www.aifb.uni-karlsruhe.de/WBS/meh/foam/">FOAM</a> is the           Framework for ontology alignment and mapping. It is based on           heuristics (similarity) of the individual entities (concepts,           relations, and instances)</li>
<li> <a title="http://sourceforge.net/projects/hmafra" href="http://sourceforge.net/projects/hmafra">hMAFRA (Harmonize Mapping           Framework)</a> is a set of tools supporting semantic mapping           definition and data reconciliation between ontologies. The targeted           formats are XSD, RDFS and KAON</li>
<li> <a title="http://www.aktors.org/technologies/ifmap/" href="http://www.aktors.org/technologies/ifmap/">IF-Map</a> is an           Information Flow based ontology mapping method. It is based on the           theoretical grounds of logic of distributed systems and provides an           automated streamlined process for generating mappings between           ontologies of the same domain</li>
<li> <a title="http://ontomappinglab.googlepages.com/oaei2007" href="http://ontomappinglab.googlepages.com/oaei2007">LILY</a> is a system           matching heterogeneous ontologies. LILY extracts a semantic subgraph           for each entity, then it uses both linguistic and structural           information in semantic subgraphs to generate initial alignments. The           system is presently in a demo version only</li>
<li> <a title="http://mafra-toolkit.sourceforge.net/" href="http://mafra-toolkit.sourceforge.net/">MAFRA Toolkit</a> &#8211; the           Ontology MApping FRAmework Toolkit allows users to create semantic           relations between two (source and target) ontologies, and apply such           relations in translating source ontology instances into target           ontology instances</li>
<li> <a title="http://projects.semwebcentral.org/projects/ontoengine/" href="http://projects.semwebcentral.org/projects/ontoengine/">OntoEngine</a> is a step toward allowing agents to communicate even though they use           different formal languages (i.e., different ontologies). It           translates data from a &#8220;source&#8221; ontology to a &#8220;target&#8221;</li>
<li> <a title="http://www.dfki.de/~klusch/owls-mx/" href="http://www.dfki.de/%7Eklusch/owls-mx/">OWLS-MX</a> is a hybrid           semantic Web service matchmaker. OWLS-MX 1.0 utilizes both           description logic reasoning, and token based IR similarity measures.           It applies different filters to retrieve OWL-S services that are most           relevant to a given query</li>
<li> <a title="http://keg.cs.tsinghua.edu.cn/project/RiMOM/" href="http://keg.cs.tsinghua.edu.cn/project/RiMOM/">RiMOM</a> (Risk           Minimization based Ontology Mapping) integrates different alignment           strategies: edit-distance based strategy, vector-similarity based           strategy, path-similarity based strategy, background-knowledge based           strategy, and three similarity-propagation based strategies</li>
<li> <a title="http://sites.wiwiss.fu-berlin.de/suhl/radek/semmf/doc/index.html" href="http://sites.wiwiss.fu-berlin.de/suhl/radek/semmf/doc/index.html">semMF</a> is a flexible framework for calculating semantic similarity between           objects that are represented as arbitrary RDF graphs. The framework           allows taxonomic and non-taxonomic concept matching techniques to be           applied to selected object properties</li>
<li> <a title="http://snoggle.projects.semwebcentral.org/" href="http://snoggle.projects.semwebcentral.org/">Snoggle</a> is a graphical, SWRL-based ontology mapper. Snoggle attempts to solve the ontology mapping problem by providing a graphical user interface (similar to that of Microsoft Visio) to guide the process of ontology vocabulary alignment. In Snoggle, user-defined mappings can be serialized into rules, which are expressed using SWRL</li>
<li> <a title="http://www.seco.tkk.fi/projects/semweb/dist.php" href="http://www.seco.tkk.fi/projects/semweb/dist.php">Terminator</a> is a tool for creating term to ontology resource mappings           (documentation in Finnish).</li>
</ul>
<h3><span>Ontology Visualization/Analysis</span></h3>
<p>Though not all are relevant, see my post from a couple of years back on <a title="http://www.mkbergman.com/414/large-scale-rdf-graph-visualization-tools/" href="../414/large-scale-rdf-graph-visualization-tools/"> large-scale RDF graph software</a>.</p>
<ul>
<li> <a title="http://dml.cs.byu.edu/wiki/index.php/Social_Network_Graphing_Tools" href="http://dml.cs.byu.edu/wiki/index.php/Social_Network_Graphing_Tools">Social           network graphing tools</a> (many covered elsewhere)</li>
<li> <a title="http://cytoscape.org/index.php" href="http://cytoscape.org/index.php">Cytoscape</a> is a bioinformatics           software platform for visualizing molecular interaction networks and           integrating these interactions with gene expression profiles and           other state data; I have also written specifically about <a title="http://www.mkbergman.com/415/cytoscape-hands-down-winner-for-large-scale-graph-visualization/" href="../415/cytoscape-hands-down-winner-for-large-scale-graph-visualization/"> Cytoscape&#8217;s use in UMBEL</a>
<ul>
<li> <a title="http://www.bioinformatics.org/rdfscape/" href="http://www.bioinformatics.org/rdfscape/">RDFScape</a> is a               project that brings Semantic Web &#8220;features&#8221; to the popular               Systems Biology software Cytoscape</li>
<li> <a title="http://med.bioinf.mpi-inf.mpg.de/networkanalyzer/" href="http://med.bioinf.mpi-inf.mpg.de/networkanalyzer/">NetworkAnalyzer</a> performs analysis of biological networks and calculates network               topology parameters including the diameter of a network, the               average number of neighbors, and the number of connected pairs of               nodes. It also computes the distributions of more complex network               parameters such as node degrees, average clustering coefficients,               topological coefficients, and shortest path lengths. It displays               the results in diagrams, which can be saved as images or text               files; used by SD</li>
</ul>
</li>
<li> <a title="http://www.mediavirus.org/graphl/" href="http://www.mediavirus.org/graphl/">Graphl</a> is a tool for           collaborative editing and visualisation of graphs, representing           relationships between resources or concepts of the real world. Graphl           may be thought of as a visual wiki, a place where everybody can           contribute to a shared repository of knowledge</li>
<li> <a title="http://igraph.sourceforge.net/index.html" href="http://igraph.sourceforge.net/index.html">igraph</a> is a free           software package for creating and manipulating undirected and           directed graphs</li>
<li> <a title="http://nwb.slis.indiana.edu/" href="http://nwb.slis.indiana.edu/">Network Workbench</a> is very complex and comprehensive; a Swiss Army knife</li>
<li> <a title="http://networkx.lanl.gov/gallery.html" href="http://networkx.lanl.gov/gallery.html">NetworkX</a> &#8211; Python; very           clean</li>
<li> <a title="http://snap.stanford.edu/index.html" href="http://snap.stanford.edu/index.html">Stanford Network Analysis           Package</a> (SNAP) is a general purpose network analysis and graph           mining library. It is written in C++ and easily scales to massive           networks with hundreds of millions of nodes</li>
<li> <a title="http://socnetv.sourceforge.net/" href="http://socnetv.sourceforge.net/">Social Networks Visualizer</a> (SocNetV) is a flexible and user-friendly tool for the analysis and           visualization of Social Networks. It lets you construct networks           (mathematical graphs) with a few clicks on a virtual canvas or load           networks of various formats (GraphViz, GraphML, Adjacency, Pajek,           UCINET, etc) and modify them to suit your needs. SocNetV also offers           a built-in web crawler, allowing you to automatically create networks           from all links found in a given initial URL</li>
<li> <a title="http://www.tulip-software.org/" href="http://www.tulip-software.org/">Tulip</a> may be incredibly strong
<ul>
<li>quite active (but not much online stuff): <a title="http://sourceforge.net/projects/auber/files/" href="http://sourceforge.net/projects/auber/files/">http://sourceforge.net/projects/auber/files/</a></li>
</ul>
</li>
<li> <a title="http://mark-shepherd.com/blog/springgraph-flex-component/" href="http://mark-shepherd.com/blog/springgraph-flex-component/">Springgraph</a> component for Flex</li>
<li> <a title="http://code.google.com/p/vizierfx/" href="http://code.google.com/p/vizierfx/">VizierFX</a> is a Flex library           for drawing network graphs. The graphs are laid out using GraphViz on           the server side, then passed to VizierFX to perform the rendering.           The library also provides the ability to run ActionScript code in           response to events on the graph, such as mousing over a node or           clicking on it.</li>
</ul>
<h3><span>Miscellaneous Ontology Tools</span></h3>
<ul>
<li> <a title="http://apolda.sourceforge.net/" href="http://apolda.sourceforge.net/">Apolda</a> (Automated Processing of           Ontologies with Lexical Denotations for Annotation) is a plugin           (processing resource) for GATE (<a title="http://gate.ac.uk/" href="http://gate.ac.uk/">http://gate.ac.uk/</a>).           The Apolda processing resource (PR) annotates a document like a           gazetteer, but takes the terms from an (OWL) ontology rather than           from a list</li>
<li> <a title="http://dl-learner.org/Projects/DLLearner" href="http://dl-learner.org/Projects/DLLearner">DL-Learner</a> is a tool for learning complex classes from examples and background knowledge. It extends Inductive Logic Programming to Description Logics and the Semantic Web. DL-Learner now has a flexible component-based design, which allows it to be extended easily with new learning algorithms, learning problems, reasoners, and supported background knowledge sources. A new type of supported knowledge source is the SPARQL endpoint, from which DL-Learner can extract knowledge fragments; this enables learning classes even on large knowledge sources like DBpedia. It also includes an OWL API reasoner interface and a Web service interface.</li>
<li> <a title="http://www.arity.com/?Tab=products&amp;Tab2=lexilink" href="http://www.arity.com/?Tab=products&amp;Tab2=lexilink">LexiLink</a> is a tool for building, curating and managing multiple lexicons and           ontologies in one enterprise-wide Web-based application. The core of           the technology is based on RDF and OWL</li>
<li> <a title="http://www.sourceforge.net/projects/motools" href="http://www.sourceforge.net/projects/motools">mopy</a> is the Music Ontology Python library, designed to provide easy-to-use Python bindings for ontology terms for the creation and manipulation of music ontology data. mopy can handle information from several ontologies, including the Music Ontology, the full FOAF vocabulary, and the timeline and chord ontologies.</li>
<li> <a title="http://obda.inf.unibz.it/protege-plugin/" href="http://obda.inf.unibz.it/protege-plugin/">OBDA</a> (Ontology Based           Data Access) is a plugin for Protégé aimed to be a full-fledged OBDA           ontology and component editor. It provides data source and mapping           editors, as well as querying facilities that, in sum, allow you to           design and test every aspect of an OBDA system. It supports           relational data sources (RDBMS) and GLAV-like mappings. In its           current beta form, it requires Protege 3.3.1, a reasoner implementing           the OBDA extensions to DIG 1.1 (e.g., the DIG server for QuOnto) and           Jena 2.5.5</li>
<li> <a title="http://code.google.com/p/ontocomp/" href="http://code.google.com/p/ontocomp/">OntoComP</a> is a Protégé 4           plugin for completing OWL ontologies. It enables the user to check           whether an OWL ontology contains &#8220;all relevant information&#8221; about the           application domain, and extend the ontology appropriately if this is           not the case</li>
<li><a href="http://owl.cs.manchester.ac.uk/browser/manage/">Ontology Browser</a> is a browser created as part of the CO-ODE (<a title="http://www.co-ode.org/" href="http://www.co-ode.org/">http://www.co-ode.org/</a>) project; rather         simple interface and use</li>
<li> <a title="http://owl.cs.manchester.ac.uk/metrics/" href="http://owl.cs.manchester.ac.uk/metrics/">Ontology Metrics</a> is a           web-based tool that displays statistics about a given ontology,           including the expressivity of the language it is written in</li>
<li> <a title="http://moustaki.org/ontospec/" href="http://moustaki.org/ontospec/">OntoSpec</a> is a SWI-Prolog module,           aiming at automatically generating XHTML specification from           RDF-Schema or OWL ontologies</li>
<li> <a title="http://owlapi.sourceforge.net/" href="http://owlapi.sourceforge.net/">OWL API</a> is a Java interface and           implementation for the W3C Web Ontology Language (OWL), used to           represent Semantic Web ontologies. The API is focused towards OWL           Lite and OWL DL and offers an interface to inference engines and           validation functionality</li>
<li> <a title="http://owl.cs.manchester.ac.uk/modularity/" href="http://owl.cs.manchester.ac.uk/modularity/">OWL Module Extractor</a> is a Web service that extracts a module for a given set of terms from           an ontology. It is based on an implementation of locality-based           modules that is part of the OWL API.</li>
<li> <a title="http://owl.cs.manchester.ac.uk/converter/" href="http://owl.cs.manchester.ac.uk/converter/">OWL Syntax Converter</a> is an online tool for converting ontologies between different           formats, including several OWL syntaxes, RDF/XML, KRSS</li>
<li> <a title="http://www.ifi.unizh.ch/attempto/documentation/OWL_to_ACE/" href="http://www.ifi.unizh.ch/attempto/documentation/OWL_to_ACE/">OWL           Verbalizer</a> is an on-line tool that verbalizes OWL ontologies in           (controlled) English</li>
<li> <a title="http://pellet.owldl.com/ontology-browser/" href="http://pellet.owldl.com/ontology-browser/">OwlSight</a> is an OWL           ontology browser that runs in any modern web browser; it&#8217;s developed           with Google Web Toolkit and uses Gwt-Ext, as well as OWL-API.           OwlSight is the client component and uses Pellet as its OWL reasoner</li>
<li> <a title="http://pellet.owldl.com/pellint" href="http://pellet.owldl.com/pellint">Pellint</a> is an open source lint           tool for Pellet which flags and (optionally) repairs modeling           constructs that are known to cause performance problems. Pellint           recognizes several patterns at both the axiom and ontology level.</li>
<li> <a title="http://protege.stanford.edu/plugins/prompt/prompt.html" href="http://protege.stanford.edu/plugins/prompt/prompt.html">PROMPT</a> is a tab plug-in for Protégé for managing multiple ontologies by comparing versions of the same ontology, moving frames between included and including projects, merging two ontologies into one, or extracting a part of an ontology.</li>
<li> <a title="http://www.co-ode.org/galen/" href="http://www.co-ode.org/galen/">SegmentationApp</a> is a Java           application that segments a given ontology according to the approach           described in &#8220;Web Ontology Segmentation: Analysis, Classification and           Use&#8221; (<a title="http://www.co-ode.org/resources/papers/seidenberg-www2006.pdf" href="http://www.co-ode.org/resources/papers/seidenberg-www2006.pdf">http://www.co-ode.org/resources/papers/seidenberg-www2006.pdf</a>)</li>
<li> <a title="http://seth-scripting.sourceforge.net/" href="http://seth-scripting.sourceforge.net/">SETH</a> is a software           effort to deeply integrate Python with Web Ontology Language (OWL-DL           dialect). The idea is to import ontologies directly into the           programming context so that its classes are usable alongside standard           Python classes</li>
<li> <a title="http://www.heppnetz.de/projects/skos2gentax/" href="http://www.heppnetz.de/projects/skos2gentax/">SKOS2GenTax</a> is an           online tool that converts hierarchical classifications available in           the W3C SKOS (Simple Knowledge Organization Systems) format into           RDF-S or OWL ontologies</li>
<li> <a title="http://forge.morfeo-project.org/wiki_en/index.php/SpecGen" href="http://forge.morfeo-project.org/wiki_en/index.php/SpecGen">SpecGen</a> (v5) is an ontology specification generator tool. It&#8217;s written in           Python using Redland RDF library and licensed under the MIT license</li>
<li> <a title="http://code.google.com/p/text2onto/" href="http://code.google.com/p/text2onto/">Text2Onto</a> is a framework           for ontology learning from textual resources that extends and           re-engineers an earlier framework developed by the same group           (TextToOnto). Text2Onto offers three main features: it represents the           learned knowledge at a metalevel by instantiating the modelling           primitives of a Probabilistic Ontology Model (POM), thus remaining           independent from a specific target language while allowing the           translation of the instantiated primitives</li>
<li> <a title="http://www.semanticweb.gr/TheaOWLLib/" href="http://www.semanticweb.gr/TheaOWLLib/">Thea</a> is a Prolog library for generating and manipulating OWL (Web Ontology Language) content. The Thea OWL parser uses SWI-Prolog&#8217;s Semantic Web library for parsing RDF/XML serialisations of OWL documents into RDF triples and then it builds a representation of the OWL ontology</li>
<li> <a title="http://owl.cs.manchester.ac.uk/repository/" href="http://owl.cs.manchester.ac.uk/repository/">TONES Ontology           Repository</a> is primarily designed to be a central location for           ontologies that might be of use to tools developers for testing           purposes; it is part of the TONES project</li>
<li> <a title="http://www.sandsoft.com/products.html" href="http://www.sandsoft.com/products.html">Visual Ontology Manager</a> (VOM) is a family of tools that enables UML-based visual construction of component-based ontologies for use in collaborative applications and interoperability solutions.</li>
<li> <a title="http://www.alphaworks.ibm.com/tech/wom?open&amp;S_TACT=105AGX59&amp;S_CMP=GR&amp;ca=dgr-lnxwd01awwom" href="http://www.alphaworks.ibm.com/tech/wom?open&amp;S_TACT=105AGX59&amp;S_CMP=GR&amp;ca=dgr-lnxwd01awwom"> Web Ontology Manager</a> is a lightweight, Web-based tool using J2EE           for managing ontologies expressed in Web Ontology Language (OWL). It           enables developers to browse or search the ontologies registered with           the system by class or property names. In addition, they can submit a           new ontology file</li>
<li> <a title="http://drupal.org/project/evoc" href="http://drupal.org/project/evoc">RDF evoc (external vocabulary importer)</a> is a module for Drupal that caches any external RDF vocabulary and provides properties to be mapped to CCK fields, node title and body. This module requires the RDF and the SPARQL modules.</li>
</ul>
<h4><span>Not Apparently in Active Use</span></h4>
<ul>
<li> <a title="http://ontoware.org/projects/almo" href="http://ontoware.org/projects/almo">Almo</a> is an ontology-based           workflow engine in Java supporting the ARTEMIS project; part of the           OntoWare initiative</li>
<li> <a title="http://www.aktors.org/technologies/classakt/" href="http://www.aktors.org/technologies/classakt/">ClassAKT</a> is a text           classification web service for classifying documents according to the           ACM Computing Classification System</li>
<li> <a title="http://www.openrdf.org/" href="http://www.openrdf.org/">Elmo</a> provides a simple API to access           ontology oriented data inside a Sesame RDF repository. The domain           model is simplified into independent concerns that are composed           together for multi-dimensional, inter-operating, or integrated           applications</li>
<li> <a title="http://www.aktors.org/technologies/extrakt/" href="http://www.aktors.org/technologies/extrakt/">ExtrAKT</a> is a tool           for extracting ontologies from Prolog knowledge bases.</li>
<li> <a title="http://www.aktors.org/technologies/f-life/" href="http://www.aktors.org/technologies/f-life/">F-Life</a> is a tool for           analysing and maintaining life-cycle patterns in ontology           development.</li>
<li> <a title="http://www.aktors.org/technologies/foxtrot/" href="http://www.aktors.org/technologies/foxtrot/">Foxtrot</a> is a           recommender system which represents user profiles in ontological           terms, allowing inference, bootstrapping and profile visualization.</li>
<li> <a title="http://projects.semwebcentral.org/projects/hyperdaml/" href="http://projects.semwebcentral.org/projects/hyperdaml/">HyperDAML</a> creates an HTML representation of OWL content to enable hyperlinking           to specific objects, properties, etc.</li>
<li> <a title="http://www.landcglobal.com/pages/linkfactory.php" href="http://www.landcglobal.com/pages/linkfactory.php">LinKFactory</a> is an ontology management tool that provides an effective and user-friendly way to create, maintain and extend extensive multilingual terminology systems and ontologies (English, Spanish, French, etc.). It is designed to build, manage and maintain large, complex, language-independent ontologies.</li>
<li> <a title="http://svn.mumble.net:8080/svn/lsw/trunk" href="http://svn.mumble.net:8080/svn/lsw/trunk">LSW</a> &#8211; the Lisp           semantic Web toolkit enables OWL ontologies to be visualized. It was           written by Alan Ruttenberg</li>
<li> <a title="http://www.seco.tkk.fi/projects/semweb/dist.php" href="http://www.seco.tkk.fi/projects/semweb/dist.php">Ontodella</a> is a           Prolog HTTP server for category projection and semantic linking</li>
<li> <a title="http://kmi.open.ac.uk/projects/akt/ontoweaver/" href="http://kmi.open.ac.uk/projects/akt/ontoweaver/">OntoWeaver</a> is an           ontology-based approach to Web sites, which provides high level           support for web site design and development</li>
<li> <a title="http://phpowllib.sourceforge.net/" href="http://phpowllib.sourceforge.net/">OWLLib</a> is a PHP library for accessing OWL files. OWL is a W3C standard for storing semantic information</li>
<li> <a title="http://powl.sourceforge.net/index.php" href="http://powl.sourceforge.net/index.php">pOWL</a> is a Semantic Web           development platform for ontologies in PHP. pOWL consists of a number           of components, including RAP</li>
<li> <a title="http://projects.semwebcentral.org/projects/rowl/" href="http://projects.semwebcentral.org/projects/rowl/">ROWL</a> is the           Rule Extension of OWL; it is from the Mobile Commerce Lab in the           School of Computer Science at Carnegie Mellon University</li>
<li> <a title="https://sourceforge.net/projects/semantag" href="https://sourceforge.net/projects/semantag">Semantic Net Generator</a> is a utility for generating Topic Maps automatically from different data sources by using rules definitions specified with Jelly XML syntax. This Java library provides Jelly tags to access and modify data sources (also RDF) to create a semantic network</li>
<li> <a title="http://www.mindswap.org/2005/SMORE/" href="http://www.mindswap.org/2005/SMORE/">SMORE</a> is OWL markup for           HTML pages. SMORE integrates the SWOOP ontology browser, providing a           clear and consistent way to find and view Classes and Properties,           complete with search functionality</li>
<li> <a title="http://soboleo.fzi.de:8080/webPortal/" href="http://soboleo.fzi.de:8080/webPortal/">SOBOLEO</a> is a system for           Web-based collaboration to create SKOS taxonomies and ontologies and           to annotate various Web resources using them</li>
<li> <a title="http://sofa.projects.semwebcentral.org/" href="http://sofa.projects.semwebcentral.org/">SOFA</a> is a Java API for           modeling ontologies and Knowledge Bases in ontology and Semantic Web           applications. It provides a simple, abstract and language neutral           ontology object model, inferencing mechanism and representation of           the model with OWL, DAML+OIL and RDFS languages; from java.dev</li>
<li> <a title="http://www.isi.edu/webscripter/" href="http://www.isi.edu/webscripter/">WebScripter</a> is a tool that           enables ordinary users to easily and quickly assemble reports           extracting and fusing information from multiple, heterogeneous           DAMLized Web sources.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/862/the-sweet-compendium-of-ontology-building-tools/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>A Most un-commON Way to Author Datasets</title>
		<link>http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/</link>
		<comments>http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 02:19:54 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[case study]]></category>
		<category><![CDATA[commON]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[CSV]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[structured data]]></category>
		<category><![CDATA[structWSF]]></category>
		<category><![CDATA[Sweet Tools]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=845</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A Most un-commON Way to Author Datasets&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/&amp;rft.language=English"></span>

A Case Study of Turning Spreadsheets into Structured Data Powerhouses
In a former life, I had the nickname of &#8216;Spreadsheet King&#8217; (perhaps         among others that I did not care to hear). I had gotten the nick         because of my aggressive [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A Most un-commON Way to Author Datasets&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-11-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/&amp;rft.language=English"></span>
<p><a href="http://openstructs.org/iron"><img style="border: 0px solid; width: 235px; height: 125px; float: left; margin-right: 10px;" title="irON - instance record and Object Notation" src="../wp-content/themes/ai3/images/iron_logo_235.png" alt="irON - instance record and Object Notation" hspace="5" vspace="5" align="left" /></a></p>
<h2>A Case Study of Turning Spreadsheets into Structured Data Powerhouses</h2>
<p>In a former life, I had the nickname of &#8216;Spreadsheet King&#8217; (perhaps         among others that I did not care to hear). I had gotten the nick         because of my aggressive use of spreadsheets for financial models,         competitors tracking, time series analyses, and the like. However, in         all honesty, I have encountered many others in my career much more         knowledgeable and capable with spreadsheets than I&#8217;ll ever be. So,         maybe I was really more like a minor duke or a court jester than true         nobility.</p>
<p>Yet, pro or amateur, there are perhaps 1 billion spreadsheet users         worldwide <a href="#commON1">[1]</a>, making spreadsheets undoubtedly the most prevalent data         authoring environment in existence. And, despite moans and wails about         how spreadsheets can lead to chaos, spaghetti code, or violations of         internal standards, they are here to stay.</p>
<p>Spreadsheets often begin as simple notetaking environments. With the         addition of new findings and more analysis, some of these worksheets         may evolve to become full-blown datasets. Alternatively, some         spreadsheets start from Day One as intended datasets or modeling         environments. Whatever the case, clearly there is much accumulated         information and data value &#8220;locked up&#8221; in existing spreadsheets.</p>
<p>How to &#8220;unlock&#8221; this value for sharing and collaboration was a major         stimulus to development of the <span style="font-weight: bold;">commON</span> serialization of <span style="font-weight: bold;">irON</span> (<span style="font-style: italic;">instance record</span> and <span style="font-style: italic;">Object Notation</span>) <a href="#commON2">[2]</a>. I recently published         a <a href="http://openstructs.org/iron/common-swt-annex">case study</a> <a href="#commON3">[3]</a> that describes the reasons and benefits of dataset authoring in a         spreadsheet, and provides working examples and code based on         <span style="font-style: italic;">Sweet Tools</span> <a href="#commON4">[4]</a> to aid users         in understanding and using the <span style="font-weight: bold;">commON</span> notation. I summarize portions of         that study herein.</p>
<div class="boxGreenDotted" style="margin: 5px 0pt 5px 10px; width: 240px; float: right; text-align: center;">This is the second article of a two-part series related to the recent       <span style="font-style: italic;">Sweet Tools</span> <a href="../844/sweet-tools-shatters-the-sound-barrier/">update</a>.</div>
<h3>Background on <span style="font-style: italic;">Sweet Tools</span> and         irON</h3>
<p>The dataset that is the focus of this <a href="http://openstructs.org/iron/common-swt-annex">use case</a>,         <a href="../844/sweet-tools-shatters-the-sound-barrier/"><span style="font-style: italic;">Sweet Tools</span></a>, began as an         informal tracking spreadsheet about four years ago. I began it as a way         to learn about available tools in the semantic Web and -related spaces.         I began publishing it and others found it of value so I continued to         develop it.</p>
<p>As it grew over time, however, it gained in structure and size. Eventually, it became a reference dataset that many other people wanted to use and interact with. The current version has well over 800 tools listed, characterized by many structured data attributes such as type, programming language, description and so forth. As it has grown, a formal controlled vocabulary has also evolved to bring consistency to the characterization of many of these attributes.</p>
<p>It was natural for me to maintain this listing as a spreadsheet, which         was also reinforced when I was one of the first to adopt an <a href="../326/converting-sweet-tools-to-an-exhibit/">Exhibit         presentation</a> of the data based on a Google spreadsheet about three         years back. Here is a partial view of this spreadsheet as I maintain it         locally:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_main_screen.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 356px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_main_screen.png" alt="Sweet Tools Main Spreadsheet Screen" width="1279" height="615" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>When we began to develop <span style="font-weight: bold;">irON</span> in earnest as a simple (&#8220;naïve&#8221;) dataset authoring framework, it was clear that a comma-separated value, or <a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> <a href="#commON5">[5]</a>, option should join the other two serializations under consideration, XML and JSON. CSV, though less expressive and capable as a data format than the other serializations, still has an <a href="http://en.wikipedia.org/wiki/Attribute-value_pair">attribute-value pair</a> (also known as key-value pairs and many other variants <a href="#commON6">[6]</a>) orientation. And, via spreadsheets, datasets can be easily authored and inspected, with the spreadsheet also providing a rich functional environment including sorting, formatting, data validation, calculations, macros, etc.</p>
<p>As a dataset very familiar to us as <span style="font-weight: bold;">irON</span>&#8217;s editors, and directly relevant to         the semantic Web, <span style="font-style: italic;">Sweet Tools</span> provided a perfect prototype case study for helping to guide the         development of <span style="font-weight: bold;">irON</span>, and         specifically what came to be known as the <span style="font-weight: bold;">commON</span> serialization for <span style="font-weight: bold;">irON</span>. The <span style="font-style: italic;">Sweet Tools</span> dataset is relatively large         for a speciality source, has many different types and attributes, and         is characterized by text, images, URLs and similar.</p>
<p>The premise was that if <span style="font-style: italic;">Sweet         Tools</span> could be specified and represented in <span style="font-weight: bold;">commON</span> sufficiently to be parsed and         converted to interoperable RDF, then many similar instance-oriented         datasets could likely be so as well. Thus, as we tried and refined         notation and vocabulary, we tested applicability against the CSV         representation of <span style="font-style: italic;">Sweet Tools</span> in addition to other CSV, JSON and XML datasets.</p>
<h3>Dataset Authoring in a Spreadsheet</h3>
<p>A large portion of the <a href="http://openstructs.org/iron/common-swt-annex">case study</a> describes         the many advantages of authoring small datasets within spreadsheets.         The useful thing about the CSV format is that these full functional         capabilities of the spreadsheet are available during authoring or later         updates and modifications, but, when exported, the CSV provides a         relatively clean format for processing and parsing.</p>
<p>So, some of the reasons for small dataset authoring in a spreadsheet         include:</p>
<ul>
<li> <span style="font-style: italic;">Formatting and on-sheet management</span> &#8211; the first usefulness of a spreadsheet comes from being able to format and organize the records. Records can be given background colors to highlight distinctions (new entries, for example); live URL links can be embedded; contents can be wrapped and styled within cells; and the column and row heads can be &#8220;frozen&#8221;, useful when scrolling large workspaces</li>
<li> <span style="font-style: italic;">Named blocks and sorting</span> &#8211;           named blocks are a powerful feature of modern spreadsheets, useful           for data manipulation, printing and internal referencing by formulas           and the like.  Sorting with named blocks is especially important           as an aid to check consistency of terminology, records completeness,           duplicates checks, missing value checks, and the like. Named blocks           can also be used as references in calculations. All of these features           are real time savers, especially when datasets grow large and           consistency of treatment and terminology is important</li>
<li> <span style="font-style: italic;">Multiple sheets and consolidated           access</span> &#8211; <span style="font-weight: bold;">commON</span> modules can be specified on a single worksheet or multiple worksheets           and saved as individual CSV files; because of its size and relative           complexity, the <span style="font-style: italic;">Sweet Tools</span> dataset is maintained on multiple sheets. Multi-worksheet           environments help keep related data and notes consolidated and more           easily managed on local hard drives</li>
<li> <span style="font-style: italic;">Completeness and counts</span> &#8211; the spreadsheet <span style="font-style: italic;">counta</span> function sums counts of cell entries by both column and row, a useful aid to indicate if an attribute or type value is missing or if a record is incomplete. Of course, similar aids and uses can be found among the hundreds of functions embedded within a spreadsheet</li>
<li> <span style="font-style: italic;">Controlled vocabularies and data           entry validation</span> &#8211; quality datasets often hinge on consistency           and uniform values and terminology; the data validation utilities           within spreadsheets can be applied to Boolean, ranges and mins and           maxes, and to controlled vocabulary lists. Here is an example for           <span style="font-style: italic;">Sweet Tools</span>, enforcing           proper tool category assignments from a 50-item pick list:</li>
</ul>
<div style="margin: 10px;"><img class="center_ok" style="border: 0px solid; width: 609px; height: 373px;" title="Controlled Vocabularies and Data Entry Validation" src="http://openstructs.org/sites/openstructs.org/files/images/swt_validation.png" alt="Controlled Vocabularies and Data Entry Validation" width="609" height="373" /></div>
<ul>
<li> <span style="font-style: italic;">Specialized functions and           macros</span> &#8211; <span>all</span> functionality of           spreadsheets may be employed in the development of <span style="font-weight: bold;">commON</span> datasets. Then, once employed,           only the values embedded within the sheets are then exported as CSV.</li>
</ul>
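<p>The completeness counts and controlled vocabulary checks described in the list above can also be repeated on the exported CSV as a sanity check. The following Python sketch is only an illustration: the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">audit</span>, the column names and the sample values are all hypothetical and are not part of <span style="font-weight: bold;">commON</span> or of the <span style="font-style: italic;">Sweet Tools</span> tooling.</p>

```python
import csv
import io

def audit(csv_text, vocab_column, allowed):
    """Count non-empty cells per column and flag out-of-vocabulary values.

    Hypothetical helper: mimics a spreadsheet counta per column plus a
    pick-list validation on one column of a plain CSV export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    filled = {name: 0 for name in reader.fieldnames or []}
    bad_values = []
    # Data rows start on line 2 because line 1 holds the column names.
    for line_no, row in enumerate(reader, start=2):
        for name in filled:
            if (row.get(name) or "").strip():
                filled[name] += 1
        value = (row.get(vocab_column) or "").strip()
        if value and value not in allowed:
            bad_values.append((line_no, value))
    return filled, bad_values

# Hypothetical sample data, not taken from Sweet Tools.
SAMPLE = """name,category,language
Tool A,Parser,Java
Tool B,Editr,
Tool C,,Python
"""

filled, bad_values = audit(SAMPLE, "category", {"Parser", "Editor"})
print(filled)      # non-empty cell count for each column
print(bad_values)  # (line number, value) pairs outside the pick list
```

<p>A column whose count is lower than the number of records points to missing values, and any flagged value points to a typo or a term missing from the controlled vocabulary.</p>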
<h3>Staging <span style="font-style: italic;">Sweet Tools</span> for commON</h3>
<p>The next major section of the <a href="http://openstructs.org/iron/common-swt-annex">case study</a> deals         with the minor conventions that must be followed in order to stage         spreadsheets for <span style="font-weight: bold;">commON</span>. Not         much of the specific <span style="font-weight: bold;">commON</span> vocabulary or notation is discussed below; for details, see <a href="#commON7">[7]</a>.</p>
<p>Because you can create multiple worksheets within a spreadsheet, it is not necessary to modify existing worksheets or tabs. Rather, if you are reluctant to change existing information or cannot do so, merely create parallel duplicate sheets of the source information. These duplicate sheets have as their sole purpose export to <span style="font-weight: bold;">commON</span> CSV. You can maintain your spreadsheet as is while staging for <span style="font-weight: bold;">commON</span>.</p>
<p>To do so, use the simple <span style="font-style: italic;">=</span> formula to create cross-references between the existing source         spreadsheet tab and the target <span style="font-weight: bold;">commON</span> CSV export tab. (You can also do         this for complete, highlighted blocks from source to target sheet.)         Then, by adding the few minor conventions of <span style="font-weight: bold;">commON</span>, you have now created a staged         export tab without modifying your source information in the slightest.</p>
<p>In standard form and for Excel and Open Office, single quotes, double quotes and commas when entered into a spreadsheet cell are automatically &#8216;<a href="http://en.wikipedia.org/wiki/Escape_character">escaped</a>&#8217; when issued as CSV. <span style="font-weight: bold;">commON</span> allows you to specify your own delimiter for lists (the standard is the pipe &#8216;|&#8217; character) and what the parser recognizes as the &#8216;escape&#8217; character (&#8216;\&#8217; is the standard). However, you probably should not change these defaults under most conditions.</p>
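<p>To make the list delimiter and escape character concrete, here is a minimal Python sketch of how a single cell might be split into list items. It assumes that an escape character simply makes the next character literal; the exact behavior of the reference <span style="font-weight: bold;">commON</span> parser may differ, and the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">split_list</span> is hypothetical.</p>

```python
def split_list(cell, delimiter="|", escape="\\"):
    """Split one cell value into list items on unescaped delimiters.

    Assumption: the escape character makes the following character
    literal, so an escaped pipe stays inside the item.
    """
    items, current, escaped = [], [], False
    for ch in cell:
        if escaped:
            current.append(ch)  # keep the escaped character as-is
            escaped = False
        elif ch == escape:
            escaped = True      # the next character is taken literally
        elif ch == delimiter:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items

print(split_list("Java|Python|PHP"))   # ['Java', 'Python', 'PHP']
print(split_list("A\\|B|C"))           # ['A|B', 'C']
```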
<p>The standard <span style="font-weight: bold;">commON</span> parsers and         converters are UTF-8 compatible. If your source content has unusual         encodings, try to target UTF-8 as your canonical spreadsheet output.</p>
<p>In the <a href="http://openstructs.org/iron/iron-specification"><span style="font-weight: bold;">irON</span> specification</a> there are a small number of defined modules or processing sections. In <span style="font-weight: bold;">commON</span>, these modules are denoted by the double-ampersand character sequence (&#8216;<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;</span>&#8217;), and apply to lists of instance records (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span>), dataset specifications and associated metadata describing the dataset (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;dataset</span>), and mappings of attributes and types to existing schema (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;linkage</span>). Similarly, attributes and types are denoted by a single ampersand prefix (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeName</span>).</p>
<p>In <span style="font-weight: bold;">commON</span>, any or all of the         modules can occur within a single CSV file or in multiple files. In any         case, the start of one of these processing modules is signaled by the         module keyword and <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;keyword</span> convention.</p>
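<p>As a rough illustration of the module convention, the Python sketch below groups the rows of a CSV file under the most recent double-ampersand marker. It is not the <span style="font-weight: bold;">structWSF</span> parser; the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">split_modules</span> and the sample rows are hypothetical, and rows appearing before any marker are simply ignored.</p>

```python
import csv
import io

def split_modules(csv_text):
    """Group CSV rows under the most recent '&&keyword' marker row."""
    modules = {}
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank rows
        first = row[0].strip()
        if first.startswith("&&"):
            current = first[2:]            # e.g. 'recordList'
            modules.setdefault(current, [])
        elif current is not None:
            modules[current].append(row)   # row belongs to the open module
    return modules

# Hypothetical sample: two modules in a single CSV file.
SAMPLE = """&&dataset
&id,&description
swt,A listing of semantic Web tools
&&recordList
&id,&description
tool1,An example tool
"""

modules = split_modules(SAMPLE)
for keyword, rows in modules.items():
    print(keyword, len(rows))
```

<p>Because each module is keyed by its keyword, the same function works whether the modules arrive in one file or are read from several files in turn.</p>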
<h4>The RecordList Module</h4>
<p>The first spreadsheet figure above shows a <span style="font-style: italic;">Sweet Tools</span> example for the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> module. The module begins with that keyword, indicating one or more instance records will follow. Note that the first line after the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> keyword is devoted to the listing of attributes and types for the instance records (designated by the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeName</span> convention in the columns of that row).</p>
<p>The <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span> format can also include the <span style="font-style: italic;">stacked</span> style (see similar Dataset example         below) in addition to the single <span style="font-style: italic;">row</span> style shown above.</p>
<p>At any rate, once a worksheet is ready with its instance records         following the straightforward <span style="font-weight: bold;">irON</span> and <span style="font-weight: bold;">commON</span> conventions, it can then be saved as         a CSV file and appropriately named. Here is an example of what this         &#8220;vanilla&#8221; CSV file now looks like when shown again in a spreadsheet:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_spreadsheet_view.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 342px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_spreadsheet_view.png" alt="Spreadsheet View of the CSV File" width="1271" height="587" /></a><span><br />
</span> <span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>Alternatively, you could open this same file in a text editor. Here is         how this exact same instance record view looks in an editor:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_editor_view.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 389px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_csv_editor_view.png" alt="Editor View of the CSV Record File" width="1251" height="657" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>Note that the CSV format separates each column by the comma separator,         with escapes shown for the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;description</span> attribute when it includes a comma-separated clause. Without word wrap,         each record in this format occupies a single row (though, again, for         the <span style="font-style: italic;">stacked</span> style, multiple         entries are allowed on individual rows so long as a new instance record         <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;id</span> is not encountered in the first column).</p>
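<p>For readers who want to experiment, the single <span style="font-style: italic;">row</span> style can be read with a standard CSV library. The Python sketch below is an illustration only: it handles the row style but not the <span style="font-style: italic;">stacked</span> style, the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">parse_record_list</span> is hypothetical, and the sample records are invented rather than taken from <span style="font-style: italic;">Sweet Tools</span>.</p>

```python
import csv
import io

def parse_record_list(csv_text):
    """Parse a row-style &&recordList module into a list of dicts."""
    rows = [r for r in csv.reader(io.StringIO(csv_text))
            if any(cell.strip() for cell in r)]
    if not rows or rows[0][0].strip() != "&&recordList":
        raise ValueError("expected an &&recordList marker on the first row")
    if not rows[1:]:
        raise ValueError("expected an attribute row after the marker")
    # The row after the marker names the attributes, e.g. '&id'.
    header = [name.strip().lstrip("&") for name in rows[1]]
    return [dict(zip(header, row)) for row in rows[2:]]

# Hypothetical records; the quoted description contains a comma.
SAMPLE = """&&recordList
&id,&type,&description
tool1,Editor,"An editor for ontologies, taxonomies and vocabularies"
tool2,Parser,A small RDF parser
"""

records = parse_record_list(SAMPLE)
print(records[0]["description"])
```

<p>The CSV reader takes care of the quoting around the description, so the embedded comma does not split the cell into two columns.</p>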
<h4>The Dataset Module</h4>
<p>The <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;dataset</span> module defines the dataset parameters and provides very flexible         metadata attributes to describe the dataset <a href="#commON8">[8]</a>. Note the dataset         specification is exactly equivalent in form to the instance record         (<span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;recordList</span>)         format, and also allows the single <span style="font-style: italic;">row</span> or <span style="font-style: italic;">stacked</span> styles (see these <a href="http://openstructs.org/iron/iron-specification#mozTocId223991">instance         record examples</a>), with this one being the <span style="font-style: italic;">stacked</span> style:</p>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_dataset.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 105px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_dataset.png" alt="The Dataset Module" width="1579" height="223" /></a><br />
<span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<h4>The Linkage Module</h4>
<p>The <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;&amp;linkage</span> module is used to map the structure of the instance records to some         structural schema, which can also include external ontologies. The         module has a simple, but specific structure.</p>
<p>Either attributes (presented as the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeList</span>)         or types (presented as the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;typeList</span>)         are listed sequentially by row until the listing is exhausted <a href="#commON8">[8]</a>. By         convention, the second column in the listing is the targeted         <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;mapTo</span> value. Absent a prior <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;prefixList</span> value, the <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;mapTo</span> value needs to be a full URL to the corresponding attribute or type in         some external schema:</p>
<div style="margin: 10px;"><img class="center_ok" style="border: 0px solid; width: 537px; height: 595px;" title="The Linkage Module" src="http://openstructs.org/sites/openstructs.org/files/images/swt_linkage.png" alt="The Linkage Module" width="537" height="595" /></div>
<p>Notice in the case of <span style="font-style: italic;">Sweet         Tools</span> that most values are from the actual COSMO mini-ontology         underlying the listing. These need to be listed as well, since absent         the specifications in <span style="font-weight: bold;">commON</span> the system has NO knowledge of linkages and mappings.</p>
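<p>The linkage layout described above (one attribute or type per row, with the target in the second column) can be read into simple lookup tables. This Python sketch assumes that each section opens with a header row such as <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">&amp;attributeList,&amp;mapTo</span> and that names in the first column carry no ampersand; both points are assumptions about the layout, and the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">parse_linkage</span> is hypothetical.</p>

```python
import csv
import io

def parse_linkage(csv_text):
    """Return {'attributeList': {name: uri}, 'typeList': {name: uri}}."""
    mappings = {"attributeList": {}, "typeList": {}}
    section = None
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [c.strip() for c in row]
        if not any(cells) or cells[0].startswith("&&"):
            continue  # blank row or the &&linkage marker itself
        if cells[0].startswith("&"):
            # A header row such as '&attributeList,&mapTo' opens a section.
            name = cells[0][1:]
            section = name if name in mappings else None
            continue
        if section and len(cells) > 1 and cells[1]:
            mappings[section][cells[0]] = cells[1]  # second column is mapTo
    return mappings

# Assumed layout; the two URIs are the ones linked from this article.
SAMPLE = """&&linkage
&attributeList,&mapTo
description,http://purl.org/ontology/iron#description
&typeList,&mapTo
KRBrowser,http://purl.org/ontology/cosmo#KRBrowser
"""

linkage = parse_linkage(SAMPLE)
print(linkage["attributeList"]["description"])
```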
<h4>The Schema (structure) Module</h4>
<p>In its current state of development, <span style="font-weight: bold;">commON</span> does not support a spreadsheet-based         means for specifying the schema structure (lightweight ontology)         governing the datasets <a href="#commON2">[2]</a>. Another <span style="font-weight: bold;">irON</span> serialization, <span style="font-weight: bold;">irJSON</span>, does. Either via this <span style="font-weight: bold;">irJSON</span> specification or via an offline         ontology, a link reference is presently used by <span style="font-weight: bold;">commON</span> (and, therefore, <span style="font-style: italic;">Sweet Tools</span> for this case study) to         establish the governing structure of the input instance record         datasets.</p>
<p>A spreadsheet-based schema structure for <span style="font-weight: bold;">commON</span> has been designed and tested in         prototype form. <span style="font-weight: bold;">commON</span> should         be enhanced with this capability in the near future <a href="#commON8">[8]</a>.</p>
<h4>Saving and Importing</h4>
<p>If the modules are spread across more than one worksheet, then each worksheet must be saved as its own CSV file. In the case of <span style="font-style: italic;">Sweet Tools</span>, as exhibited by its current reference spreadsheet, <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">sweet_tools_20091110.xls</span>, three individual CSV files get saved. These files can be named whatever you would like. However, it is essential that the names be remembered for later referencing.</p>
<p>My own naming convention is to use a format of <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">appname_date_modulename.csv</span> because it sorts well in a file manager accommodating multiple versions         (dates) and keeps related files clustered. The <span style="font-style: italic;">appname</span> in the case of <span style="font-style: italic;">Sweet Tools</span> is generally <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">swt</span>.         The <span style="font-style: italic;">modulename</span> is generally         the <span style="font-style: italic;">dataset</span>, <span style="font-style: italic;">records</span>, or <span style="font-style: italic;">linkage</span> convention. I tend to use the         <span style="font-style: italic;">date</span> specification in the         YYYYMMDD format. Thus, in the case of the <span style="font-style: italic;">records</span> listings for <span style="font-style: italic;">Sweet Tools</span>, its filename could be         something like:  <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">swt_20091110_records.csv</span>.</p>
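<p>If you script your exports, the naming convention is easy to generate. The small Python helper below is merely a convenience sketch; the function name <span style="font-family: Courier New,Courier,monospace; font-weight: bold;">common_filename</span> is hypothetical and nothing in <span style="font-weight: bold;">commON</span> requires this format.</p>

```python
from datetime import date

def common_filename(appname, when, modulename):
    """Build an appname_date_modulename.csv name with a YYYYMMDD date."""
    return f"{appname}_{when:%Y%m%d}_{modulename}.csv"

print(common_filename("swt", date(2009, 11, 10), "records"))
# prints: swt_20091110_records.csv
```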
<p>Once saved, these files are now ready to be imported into a <span style="font-weight: bold;">structWSF</span> <a href="#commON9">[9]</a> instance, which is where the CSV parsing and conversion to interoperable RDF occurs <a href="#commON8">[8]</a>. In this case study, we used the Drupal-based <span style="font-weight: bold;">conStruct SCS</span> system <a href="#commON10">[10]</a>. <span style="font-weight: bold;">conStruct</span> exposes the <span style="font-weight: bold;">structWSF</span> Web services via a user interface and a user permission and access system. The actual case study write-up offers more details about the import process.</p>
<h3>Using the Dataset</h3>
<p>We are now ready to interact with the <span style="font-style: italic;">Sweet Tools</span> structured dataset using         <span style="font-weight: bold;">conStruct</span> (assuming you have a         Drupal installation with the <span style="font-weight: bold;">conStruct</span> modules) <a href="#commON10">[10]</a>.</p>
<h4>Introduction to the App</h4>
<p>The screen capture below shows a couple of aspects of the system:</p>
<ul>
<li>First, the left hand panel (according to how this specific Drupal         install was themed) shows the various tools available to <span style="font-weight: bold;">conStruct</span>.  These include (with links         to their documentation) <a href="http://constructscs.com/documentation/instructions/search">Search</a>,         <a href="http://constructscs.com/documentation/instructions/browse">Browse</a>,         <a href="http://constructscs.com/documentation/instructions/view-record">View         Record</a>, <a href="http://constructscs.com/documentation/instructions/import">Import</a>,         <a href="http://constructscs.com/documentation/instructions/export">Export</a>,         <a href="http://constructscs.com/documentation/instructions/datasets"> Datasets</a>, <a href="http://constructscs.com/documentation/instructions/create-record">Create           Record</a>, <a href="http://constructscs.com/documentation/instructions/update-record">Update           Record</a>, <a href="http://constructscs.com/documentation/instructions/delete-record">Delete           Record</a> and <a href="http://constructscs.com/documentation/instructions/settings">Settings</a><a href="#commON11"> [11]</a>;</li>
<li>The Browse tree in the main part of the screen shows the full mini-ontology that classifies <span style="font-style: italic;">Sweet Tools</span>. Via simple inferencing, clicking on any parent link displays all child projects for that category as well <span style="font-style: italic;">(click to expand)</span>:</li>
</ul>
<div style="margin: 10px; text-align: center;"><a href="http://openstructs.org/sites/openstructs.org/files/images/swt_drupal_browse.png"> <img class="center_ok" style="border: 0px solid; width: 740px; height: 1907px;" title="Click to expand" src="http://openstructs.org/sites/openstructs.org/files/images/swt_drupal_browse.png" alt="conStruct (Drupal) Browse Screen for Sweet Tools" width="1176" height="3031" /></a><span style="font-style: italic; font-size: 90%;">(click to         expand)</span></div>
<p>One of the absolutely cool things about this framework is that all tools, inferencing, user interfaces and data structure are a direct result of the ontology(ies) underlying the system (plus the <span style="font-weight: bold;">irON</span> instance ontology). This means that switching datasets or adding datasets causes the entire system structure to reflect those changes, without lifting a finger!</p>
<h4>Some Sample Uses</h4>
<p>Here are a few sample things you can do with these generic tools driven         by the <em>Sweet Tools</em> dataset:</p>
<ul>
<li> <a href="http://constructscs.com/conStruct/browse/">Browsing the           ontology tree</a> (then, Browse by Kind)</li>
<li>Viewing an <a href="http://constructscs.com/conStruct/view/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Fswt%2Firon&amp;dataset=http%3A%2F%2Fconstructscs.com%2Fwsf%2Fdatasets%2F181%2F"> instance record</a></li>
<li>Viewing a <a href="http://constructscs.com/conStruct/ontology/view/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Fcosmo%23KRBrowser"> Class Type Report</a></li>
<li>Viewing an <a href="http://constructscs.com/conStruct/ontology/view/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Firon%23description"> Attribute Report</a></li>
<li> <a href="http://constructscs.com/conStruct/search/?filter_types_3=http%3A%2F%2Fpurl.org%2Fontology%2Fcosmo%23KRBrowser&amp;filter_attributes_4=http%3A%2F%2Fpurl.org%2Fontology%2Fcosmo%23status&amp;query=new&amp;filter=on"> Searching by facet</a> (check the tabs)</li>
<li>Doing <a href="http://constructscs.com/conStruct/search/">multi-value filtering</a> (make selections from the various tabs)</li>
<li> <a href="http://constructscs.com/conStruct/export/">Exporting           stuff</a> in a variety of formats.</li>
</ul>
<p>Note, if you access this <span style="font-weight: bold;">conStruct</span> instance you will do so as a         <span style="font-style: italic;">demo</span> user. Unfortunately, as such, you may not be able to see all of the write and update tools, which in this case are reserved for curators or admins. Recall that <span style="font-weight: bold;">structWSF</span> has a comprehensive <a href="../497/structwsf-a-framework-for-collaboration-networks/"> user access and permissions layer</a>.</p>
<h4>Exporting in Alternative Formats</h4>
<p>Of course, one of the real advantages of the <span style="font-weight: bold;">irON</span> and <span style="font-weight: bold;">structWSF</span> designs is to enable different         formats to be interchanged and to interoperate. Upon submission, the         <span style="font-weight: bold;">commON</span> format and its datasets         can then be exported in these alternate formats and serializations <a href="#commON8">[8]</a>:</p>
<ul>
<li>commON</li>
<li>irJSON</li>
<li>irXML</li>
<li>N-Triples/CSV</li>
<li>N-Triples/TSV</li>
<li>RDF+N3</li>
<li>RDF+XML</li>
</ul>
<p>As should be obvious, one of the real benefits of the <span style="font-weight: bold;">irON</span> notation &#8212; in addition to easy         dataset authoring &#8212; is the ability to more-or-less treat RDF, CSV, XML         and JSON as interoperable data formats.</p>
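<p>To make the CSV-to-RDF idea concrete, here is a minimal Python sketch that turns plain CSV rows into N-Triples lines. It is not the <span style="font-weight: bold;">commON</span> parser and does not follow the <span style="font-weight: bold;">commON</span> conventions. The <code>BASE</code> namespace, the <code>id</code> column and the function name are assumptions made for illustration, and column names are assumed to be URI-safe.</p>

```python
import csv
import io

# Hypothetical namespace for illustration; not part of irON or structWSF.
BASE = "http://example.org/swt/"


def csv_to_ntriples(csv_text, id_column="id"):
    """Turn each CSV row into N-Triples lines, one per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + row[id_column]
        for attribute, value in row.items():
            # Skip the identifier column itself and any empty cells.
            if attribute == id_column or not value:
                continue
            # Escape backslashes and double quotes for an N-Triples literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(
                '<{}> <{}{}> "{}" .'.format(subject, BASE, attribute, literal)
            )
    return lines


sample = "id,name,status\nprotege,Protege,existing\njena,Jena,\n"
for line in csv_to_ntriples(sample):
    print(line)
```

<p>Each non-empty cell becomes one statement whose subject is the row, whose predicate is the column and whose object is the cell value. That is why a spreadsheet can stand in for RDF in simple cases.</p>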
<h3>The Formal Case Study</h3>
<p>The formal <span style="font-style: italic;">Sweet Tools</span> case       study based on <span style="font-weight: bold;">commON</span>, with       sample download files and PDF, is available from <a style="font-style: italic;" href="http://openstructs.org/iron/common-swt-annex">Annex: A commON Case Study       using Sweet Tools, Supplementary Documentation</a> <a href="#commON3">[3]</a>.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON1" name="commON1"></a> [1] In 2003, <a href="http://www.microsoft.com/presspass/press/2003/oct03/10-13vstoofficelaunchpr.mspx"> Microsoft estimated</a> its worldwide users of the Excel spreadsheet,         which then had about a 90% market share globally, at 400 million.         Others at that time estimated unauthorized use to perhaps double that         amount. There has been significant growth since then, and online         spreadsheets such as Google Docs and Zoho have also grown wildly. This         surely puts spreadsheet users globally into the 1 billion range.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON2" name="commON2"></a> [2] See Frédérick Giasson and         Michael Bergman, eds., <span style="font-style: italic;">Instance         Record and Object Notation (irON) Specification, Specification         Document</span>, version 0.82, 20 October 2009.  See <a href="http://openstructs.org/iron/iron-specification">http://openstructs.org/iron/iron-specification</a>.         Also see the <a href="http://openstructs.org/iron"><span style="font-weight: bold;">irON</span> Web site</a>, Google <a href="http://groups.google.com/group/iron-notation">discussion group</a>,         and <a href="http://code.google.com/p/iron-notation/">code distribution         site</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON3" name="commON3"></a> [3] Michael Bergman, 2009.         <span style="font-style: italic;">Annex: A commON Case Study using         Sweet Tools, Supplementary Documentation</span>, prepared by Structured         Dynamics LLC, November 10, 2009. See <a href="http://openstructs.org/iron/common-swt-annex">http://openstructs.org/iron/common-swt-annex</a>.         It may also be downloaded in PDF <a href="http://openstructs.org/sites/openstructs.org/files/downloads/common-case-study.pdf"> <img style="border: 0px solid; width: 13px; height: 16px;" src="http://openstructs.org/sites/openstructs.org/files/icons/pdfdoc.gif" alt="" /></a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON4" name="commON4"></a> [4] See Michael K. Bergman&#8217;s         <a href="http://mkbergman.com/">AI3:::Adaptive Information</a> blog,         <a href="../new-version-sweet-tools-sem-web/"><span style="font-style: italic;"> Sweet Tools (Sem Web)</span></a>. In addition, the <span style="font-weight: bold;">commON</span> version of <span style="font-style: italic;">Sweet Tools</span> is available at the <a href="http://constructscs.com/conStruct/browse/?browse=true&amp;attribute=all&amp;type=all&amp;dataset=http%3A%2F%2Fconstructscs.com%2Fwsf%2Fdatasets%2F122%2F&amp;page=0"> <span style="font-weight: bold;">conStruct</span> site</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON5" name="commON5"></a> [5] The CSV mime type is defined in         <span style="font-style: italic;">Common Format and MIME Type for         Comma-Separated Values (CSV) Files</span> [<a href="http://www.rfc-editor.org/rfc/rfc4180.txt">RFC 4180</a>]. A useful         overview of the CSV format is provided by <a title="http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm" rel="nofollow" href="http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm">The Comma Separated Value (CSV) File Format</a>. Also, see         that author&#8217;s related CTX reference for a discussion of how schema and         structure can be added to the basic CSV framework; see <a href="http://www.creativyst.com/Doc/Std/ctx/ctx.htm">http://www.creativyst.com/Doc/Std/ctx/ctx.htm</a>,         especially the section on the comma-delimited version (<a href="http://www.creativyst.com/Doc/Std/ctx/ctx.htm#CTC">http://www.creativyst.com/Doc/Std/ctx/ctx.htm#CTC</a>).</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON6" name="commON6"></a> [6] An <a href="http://en.wikipedia.org/wiki/Attribute-value_system">attribute-value         system</a> is a basic knowledge representation framework comprising a         table with columns designating &#8220;attributes&#8221; (also known as <span style="font-style: italic;">properties</span>, <span style="font-style: italic;">predicates</span>, <span style="font-style: italic;">features</span>, <span style="font-style: italic;">parameters</span>, <span style="font-style: italic;">dimensions</span>, <span style="font-style: italic;">characteristics</span> or <span style="font-style: italic;">independent variables</span>) and rows         designating &#8220;objects&#8221; (also known as <span style="font-style: italic;">entities</span>, <span style="font-style: italic;">instances</span>, <span style="font-style: italic;">exemplars</span>, <span style="font-style: italic;">elements</span> or <span style="font-style: italic;">dependent variables</span>). Each table cell         therefore designates the value (also known as <span style="font-style: italic;">state</span>) of a particular attribute of a         particular object. This is the basic table presentation of a         spreadsheet or relational data table.</p>
<p>Attribute-values can also be presented as pairs in the form of an <a href="http://en.wikipedia.org/wiki/Associative_array">associative array</a>, where the first item listed is the attribute, often followed by a separator such as the colon, and then the value. JSON and many simple data struct notations follow this format. This format may also be called <span style="font-style: italic;">attribute-value pairs</span>, <span style="font-style: italic;">key-value pairs</span>, <span style="font-style: italic;">name-value pairs</span>, <span style="font-style: italic;">alists</span> or others. In these cases the &#8220;object&#8221; is implied, or is introduced as the name of the array.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON7" name="commON7"></a> [7] See especially <a style="font-style: italic;" href="http://openstructs.org/iron/iron-specification#mozTocId603499">SUB-PART         3: commON PROFILE</a> in, Frédérick Giasson and Michael Bergman, eds.,         <span style="font-style: italic;">Instance Record and Object Notation         (irON) Specification, Specification Document</span>, version 0.82, 20         October 2009.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON8" name="commON8"></a> [8] As of the date of this case         study, some of the processing steps in the <span style="font-weight: bold;">commON</span> pipeline are manual. For example,         the parser creates an intermediate N3 file that is actually submitted         to the <span style="font-weight: bold;">structWSF</span>. Within a week         or two of publication, these capabilities should be available as a         direct import to a <span style="font-weight: bold;">structWSF</span> instance. However, there is one exception to this:  the         specification for the schema structure. That module has been         prototyped, but will not be released with the first <span style="font-weight: bold;">commON</span> upgrade. That enhancement is likely         a few weeks off from the date of this posting. Please check the         <a href="http://groups.google.com/group/iron-notation"><span style="font-weight: bold;">irON</span></a> or <a style="font-weight: bold;" href="http://groups.google.com/group/structwsf">structWSF</a> discussion groups for announcements.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="commON9" name="commON9"></a> [9] <a style="font-weight: bold;" href="http://openstructs.org/">structWSF</a> is a platform-independent         Web services framework for accessing and exposing structured RDF data,         with generic tools driven by underlying data structures. Its central         perspective is that of the dataset. Access and user rights are granted         around these datasets, making the framework enterprise-ready and         designed for collaboration. Since a <span style="font-weight: bold;">structWSF</span> layer may be placed over         virtually any existing datastore with Web access &#8212; including large         instance record stores in existing relational databases &#8212; it is also a         framework for Web-wide deployments and interoperability.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="commON10"></a>[10] <a style="font-weight: bold;" href="http://constructscs.com/">conStruct SCS</a> is a structured content system built on the Drupal content management framework. <span style="font-weight: bold;">conStruct</span> enables structured data and its controlling vocabularies (ontologies) to drive applications and user interfaces. It is based on RDF and SD&#8217;s <span style="font-weight: bold;">structWSF</span> platform-independent Web services framework [9]. In addition to user access control and management and a general user interface, <span style="font-weight: bold;">conStruct</span> provides Drupal-level CRUD, data display templating, faceted browsing, full-text search, and import and export over structured data stores based on RDF.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="commON11"></a> [11] More Web services are being added to <span style="font-weight: bold;">structWSF</span> on a fairly constant basis, and the existing ones have been through a number of upgrades.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/845/a-most-un-common-way-to-author-datasets/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>irON: Semantic Web for Mere Mortals</title>
		<link>http://www.mkbergman.com/838/iron-semantic-web-for-mere-mortals/</link>
		<comments>http://www.mkbergman.com/838/iron-semantic-web-for-mere-mortals/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 00:12:26 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Bibliographic Knowledge Network]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Semantic Web Tools]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[CSV]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[spreadsheets]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=838</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=irON: Semantic Web for Mere Mortals&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Bibliographic Knowledge Network&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-10-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/838/iron-semantic-web-for-mere-mortals/&amp;rft.language=English"></span>

New Cross-Scripting Frameworks for XML, JSON and Spreadsheets
On behalf of Structured         Dynamics, I am pleased to announce our release into the open source         community of irON — the         instance record and [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=irON: Semantic Web for Mere Mortals&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Bibliographic Knowledge Network&amp;rft.subject=Open Source&amp;rft.subject=Semantic Web Tools&amp;rft.subject=Structured Dynamics&amp;rft.subject=irON&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-10-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/838/iron-semantic-web-for-mere-mortals/&amp;rft.language=English"></span>
<p><a href="http://openstructs.org/iron/iron-specification"><img style="border: 0px solid; width: 235px; height: 125px; float: left; margin-right: 10px;" title="instance record and Object Notation" src="../wp-content/themes/ai3/images/iron_logo_235.png" alt="instance record and Object Notation" hspace="5" vspace="5" align="left" /></a></p>
<h2>New Cross-Scripting Frameworks for XML, JSON and Spreadsheets</h2>
<p>On behalf of <a href="http://structureddynamics.com/">Structured         Dynamics</a>, I am pleased to announce our release into the open source         community of <span style="font-style: italic; font-weight: bold;">irON</span> — the         <span style="font-style: italic;">instance record</span> and         <span style="font-style: italic;">Object Notation</span> — and         its family of frameworks and tools <a href="#ia1">[1]</a>. With <span style="font-weight: bold; font-style: italic;">irON</span>, you can now         author and conduct business solely in the formats and tools most         familiar and comfortable to you, all the while enabling your data to         interact with the semantic Web.</p>
<p><span style="font-weight: bold; font-style: italic;">irON</span> is an         abstract notation and associated vocabulary for specifying RDF triples         and schema in non-RDF forms. Its purpose is to allow users and tools in         non-RDF formats to stage interoperable datasets using RDF. The notation         supports writing RDF and schema in <a href="http://en.wikipedia.org/wiki/JSON">JSON</a> (<span style="font-weight: bold; font-style: italic;">irJSON</span>), <a href="http://en.wikipedia.org/wiki/Xml">XML</a> (<span style="font-weight: bold; font-style: italic;">irXML</span>) and         comma-delimited (<a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>) formats         (<span style="font-weight: bold; font-style: italic;">commON</span>).</p>
<p>The surprising thing about <span style="font-weight: bold; font-style: italic;">irON</span> is that — by         following its simple conventions and vocabulary — you will be         authoring and creating interoperable RDF datasets without doing much         different than your normal practice.</p>
<p>This first specification for the <a href="http://openstructs.org/iron/iron-specification"><span style="font-weight: bold; font-style: italic;">irON</span> notation</a> includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. In this newly published <a href="http://openstructs.org/iron/iron-specification"><span style="font-weight: bold; font-style: italic;">irON</span> specification</a>, profiles and examples are also provided for each of the <span style="font-weight: bold; font-style: italic;">irXML</span>, <span style="font-weight: bold; font-style: italic;">irJSON</span> and <span style="font-weight: bold; font-style: italic;">commON</span> serializations. The <span style="font-weight: bold; font-style: italic;">irON</span> release also includes a number of parsers and converters of the specification into RDF <a href="#ia2">[2]</a>. Data ingested in the <span style="font-weight: bold; font-style: italic;">irON</span> frameworks can also be exported as RDF and staged as <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a>.</p>
<div class="boxRedDotted"><strong>UPDATE</strong>: Fred Giasson <a href="http://fgiasson.com/blog/index.php/2009/10/20/common-and-irjson-php-parsers-released/">announced</a> on his blog today (10/20) the release of the <em><strong>irJSON </strong></em>and <em><strong>commON</strong></em> parsers.</div>
<h3>Background and Rationale</h3>
<p>The objective of <span style="font-weight: bold; font-style: italic;">irON</span> is to make it easy for data owners to author, read and publish data. This means the starting format should be a human-readable, easily writable means for authoring and conveying instance records (that is, instances and their attributes and assigned values) and the datasets that contain them. Among other things, this means that <span style="font-style: italic; font-weight: bold;">irON</span>&#8217;s notation does not use RDF &#8220;<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">triples</a>&#8221;, but rather the native notations of the host serializations.</p>
<p><span style="font-weight: bold; font-style: italic;">irON</span> is         premised on these considerations and observations:</p>
<ul>
<li> <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> (Resource Description Framework) is a powerful canonical data model           for data interoperability <a href="#ia3">[3]</a></li>
<li>However, most existing data is not written in RDF and many authors         and publishers prefer other formats for various reasons</li>
<li>Many formats that are easier to author and read than RDF are         variants of the attribute-value pair construct <a href="#ia4">[4]</a>, which can readily         be expressed as RDF, and</li>
<li>A common abstract notation for converting to RDF would also enable         non-RDF formats to become somewhat interchangeable, thus allowing the         strengths of each to be combined.</li>
</ul>
<p>The <span style="font-weight: bold; font-style: italic;">irON</span> notation and vocabulary are designed to allow the conceptual structure (&#8220;schema&#8221;) of datasets to be described, to facilitate easy description of the instance records that populate those datasets, and to link different structures for different schema to one another. In these ways, more-or-less complete RDF data structures and instances can be described in alternate formats and be made interoperable. <span style="font-weight: bold; font-style: italic;">irON</span> provides a simple and naïve information exchange notation expressive enough to describe almost any data entity.</p>
<p>The notation also provides a framework for extending existing schema. This means that <span style="font-weight: bold; font-style: italic;">irON</span> and its three serializations can represent many existing, common data formats and standards, while also providing a vehicle for extending them. Another intent of the specification is to be sparse in terms of requirements. For instance, the reserved vocabulary is fairly minimal and optional in almost all cases. The <span style="font-weight: bold; font-style: italic;">irON</span> specification supports skeletal submissions.</p>
<h3>irON Concepts and Vocabulary</h3>
<p>The aim of <span style="font-weight: bold; font-style: italic;">irON</span> is to describe instance <span style="font-style: italic;">records</span>. An instance record is simply a means to represent and convey the information (“attributes”) describing a given instance. An instance is the thing at hand, and need not represent an individual; it could, for example, represent the entire holdings or collection of books in a given library. Such instance records are also known as the <em>ABox</em> <a href="#ia5">[5]</a>. The simple design of <span style="font-weight: bold; font-style: italic;">irON</span> is in keeping with the limited roles and work associated with this <span style="font-style: italic;">ABox</span> role.</p>
<p><span style="font-style: italic;">Attributes</span> provide descriptive         characteristics for each instance. Every attribute is matched with a         value, which can range from descriptive text strings to lists or         numeric values. This design is in keeping with simple attribute-value         pairs where, in using the terminology of RDF triples, the         <em>subject</em> is the instance itself, the <em>predicate</em> is the         attribute, and the <em>object</em> is the value. <span style="font-weight: bold; font-style: italic;">irON</span> has a vocabulary         of about 40 reserved attribute terms, though only two are ever         required, with a few others strongly recommended for interoperability         and interface rendering purposes.</p>
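<p>The mapping from attribute-value pairs to triples can be sketched in a few lines of Python. This is only an illustration of the idea, not the <span style="font-weight: bold; font-style: italic;">irON</span> parser: the record layout, the attribute names (<code>name</code>, <code>category</code>) and the function name are hypothetical, and <code>id</code> is simply assumed here to be the key that identifies the instance.</p>

```python
def record_to_triples(record, id_key="id"):
    """Expand one attribute-value record into (subject, predicate, object) triples."""
    subject = record[id_key]  # the instance itself is the subject
    triples = []
    for attribute, value in record.items():
        if attribute == id_key:
            continue
        # A list value yields one triple per item; a scalar yields one triple.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, attribute, item))
    return triples


# Hypothetical record, for illustration only.
record = {
    "id": "tool-1",
    "name": "Example Tool",
    "category": ["parser", "converter"],
}
for triple in record_to_triples(record):
    print(triple)
```

<p>The record never mentions its own subject in each pair; it is implied by the record's identifier, which is what keeps the authoring format compact.</p>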
<p>A <span style="font-style: italic;">dataset</span> is an aggregation of         instance records used to keep a reference between the instance records         and their source (provenance). It is also the container for         transmitting those records and providing any metadata descriptions         desired. A dataset can be split into multiple dataset slices. Each         slice is written to a file serialized in some way. Each slice of a         dataset shares the same <span style="font-family: Courier New,Courier,monospace;">&lt;id&gt;</span> of the         dataset.</p>
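<p>As a rough sketch of how slices relate to their dataset, the Python below regroups records from several slices under the dataset <code>id</code> they share. The dictionary layout (<code>id</code> plus a <code>records</code> list) is an assumption made for this example and is not the layout defined in the specification.</p>

```python
from collections import defaultdict


def merge_slices(slices):
    """Group records from dataset slices under their shared dataset id."""
    datasets = defaultdict(list)
    for dataset_slice in slices:
        datasets[dataset_slice["id"]].extend(dataset_slice["records"])
    return dict(datasets)


# Two hypothetical slices of the same dataset, e.g. read from two files.
slices = [
    {"id": "sweet-tools", "records": [{"id": "tool-1"}]},
    {"id": "sweet-tools", "records": [{"id": "tool-2"}, {"id": "tool-3"}]},
]
merged = merge_slices(slices)
print(len(merged["sweet-tools"]))
```

<p>Because every slice carries the dataset identifier, slices can arrive in any order, or from different files, and still be reassembled.</p>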
<p>Instances can also be assigned to <span style="font-style: italic;">types</span>, which provide the set or classificatory structure for how to relate certain kinds of things (instances) to other kinds of things. The organizational relationships of these types and attributes are described in a <span style="font-style: italic;">schema</span>. <span style="font-weight: bold; font-style: italic;">irON</span> also has conventions and notations for describing the <span style="font-style: italic;">linkage</span> of attributes and types in a given dataset to existing schema. These linkages are often mapped to established ontologies.</p>
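<p>One way to picture a linkage is as a lookup table from local attribute and type names to URIs in an existing vocabulary. The Python sketch below assumes such a table. The table itself, the <code>type</code> attribute name and the <code>example.org</code> class URI are hypothetical; the two Dublin Core term URIs are real but were chosen only for illustration.</p>

```python
# Hypothetical linkage table: local attribute or type name -> URI in an existing vocabulary.
LINKAGE = {
    "name": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
    "Tool": "http://example.org/ontology#Tool",
}


def apply_linkage(triples, linkage):
    """Replace local predicate names, and type values, with linked URIs when mapped."""
    linked = []
    for subject, predicate, obj in triples:
        if predicate == "type":
            obj = linkage.get(obj, obj)
        # Unmapped names pass through unchanged.
        linked.append((subject, linkage.get(predicate, predicate), obj))
    return linked


triples = [
    ("tool-1", "type", "Tool"),
    ("tool-1", "name", "Example Tool"),
    ("tool-1", "homepage", "http://example.org/"),
]
for triple in apply_linkage(triples, LINKAGE):
    print(triple)
```

<p>Keeping the linkage separate from the records means authors can keep using their own short attribute names while the converted data lines up with established ontologies.</p>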
<p>Each of these <span style="font-weight: bold; font-style: italic;">irON</span> concepts of         <span style="font-style: italic;">records</span>, <span style="font-style: italic;">attributes</span>, <span style="font-style: italic;">types</span>, <span style="font-style: italic;">datasets</span>, <span style="font-style: italic;">schema</span> and <span style="font-style: italic;">linkages</span> share similar notations with         keywords signaling to the <span style="font-weight: bold; font-style: italic;">irON</span> parsers and         converters how to interpret incoming files and data. There are also         provisions for metadata, name spaces, and local and global references.</p>
<p>In these ways, <span style="font-weight: bold; font-style: italic;">irON</span> and its three serializations can capture virtually the entire scope and power of RDF as a data model, but with the simpler and more familiar terminology and constructs expected for each serialization.</p>
<h3>The Three Serializations</h3>
<p>For different reasons and for different audiences, the formats of XML,         JSON and CSV (spreadsheets) were chosen as the representative formats         across which to formulate the abstract <span style="font-weight: bold; font-style: italic;">irON</span> notation.</p>
<p><a href="http://en.wikipedia.org/wiki/Xml">XML</a>, or eXtensible         Markup Language, has become the leading data exchange format and syntax         for modern applications. It is frequently adopted by industry groups         for standards and standard exchange formats. There is a rich diversity         of tools that support the language, importantly including capable         parsers and query languages. There is also a serialization of RDF in         XML. As implemented in the <span style="font-weight: bold; font-style: italic;">irON</span> notation, we call         this serialization <span style="font-weight: bold; font-style: italic;">irXML</span>.</p>
<p><a href="http://en.wikipedia.org/wiki/JSON">JSON</a>, the JavaScript         Object Notation, has become very popular as a Web 2.0 data exchange         format and is often the format of choice to drive JavaScript         applications. There is a growing richness of tools that support JSON,         including support from leading Web and general scripting languages such         as JavaScript, Python, Perl, Ruby and PHP. JSON is relatively easy to         read, and is also now growing in popularity with lightweight databases,         such as CouchDB. As implemented in the <span style="font-weight: bold; font-style: italic;">irON</span> notation, we call         this serialization <span style="font-weight: bold; font-style: italic;">irJSON</span>.</p>
<p><a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, or comma-separated values, is a format that has been in existence for decades. It was made famous by Microsoft as a spreadsheet exchange format, which makes CSV very useful since spreadsheets are the most prevalent data authoring environment in existence. CSV is less expressive and capable as a data format than the other <span style="font-weight: bold; font-style: italic;">irON</span> serializations, yet still has an attribute-value pair orientation. And, via spreadsheets, datasets can be easily authored and inspected, while also providing a rich functional environment including sorting, formatting, data validation, calculations, macros, etc. As implemented in the <span style="font-weight: bold; font-style: italic;">irON</span> notation, we call this serialization <span style="font-weight: bold; font-style: italic;">commON</span>.</p>
<p>The following diagram shows how these three formats relate to         <span style="font-weight: bold; font-style: italic;">irON</span> and         then the canonical RDF target data model:</p>
<div><img class="center_ok" style="width: 547px; height: 619px;" title="Data transformations path" src="../wp-content/themes/ai3/images/2009Posts/data_transform_path.png" alt="Data transformations path" width="547" height="619" /></div>
<p>We have used the unique differences amongst XML, JSON and CSV to guide         the embracing abstract notations within <span style="font-weight: bold; font-style: italic;">irON</span>. Note the         round-tripping implications of the framework.</p>
<p>One exciting prospect for the design is how, merely by following the         simple conventions within <span style="font-weight: bold; font-style: italic;">irON</span>, each of these         three data formats — and RDF !! — can be used more-or-less         interchangeably, and can be used to extend existing schema within their         domains.</p>
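<p>The round-tripping point can be illustrated with a small Python sketch that writes one flat record out as JSON, XML and CSV and reads it back. These are plain JSON, XML and CSV, not the actual <span style="font-weight: bold; font-style: italic;">irJSON</span>, <span style="font-weight: bold; font-style: italic;">irXML</span> or <span style="font-weight: bold; font-style: italic;">commON</span> profiles. The record, the element name <code>record</code> and the helper functions are assumptions, and the sketch only handles flat records with non-empty string values.</p>

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_xml(record):
    """Write a flat record as one XML element with a child per attribute."""
    root = ET.Element("record")
    for attribute, value in record.items():
        ET.SubElement(root, attribute).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the child elements back into an attribute-value dictionary."""
    return {child.tag: child.text for child in ET.fromstring(text)}


def to_csv(record):
    """Write a flat record as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


def from_csv(text):
    """Read the first data row back into an attribute-value dictionary."""
    return dict(next(csv.DictReader(io.StringIO(text))))


# Hypothetical record, for illustration only.
record = {"id": "tool-1", "name": "Example Tool", "status": "new"}

# The same attribute-value pairs survive a trip through each format.
assert json.loads(json.dumps(record)) == record
assert from_xml(to_xml(record)) == record
assert from_csv(to_csv(record)) == record
```

<p>The round trip works here because all three formats are carrying the same attribute-value pairs. Richer structures such as nested or typed values would need the kinds of conventions that the notation defines.</p>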
<h3>Links, References and More</h3>
<p>This first release of <span style="font-weight: bold; font-style: italic;">irON</span> is in version 0.8.         Updates and revisions are likely with use. Here are some key links for         <span style="font-weight: bold; font-style: italic;">irON</span>:</p>
<ul>
<li>The <a href="http://openstructs.org/iron/iron-specification"><span style="font-weight: bold; font-style: italic;">irON</span> specification</a>, also available in <a href="http://openstructs.org/sites/openstructs.org/files/downloads/irON_specification_v8.pdf">download</a> as a PDF <a href="http://openstructs.org/sites/openstructs.org/files/downloads/irON_specification_v8.pdf"><img src="http://www.mkbergman.com/wp-content/themes/ai3/images/pdfdoc.gif" alt="" width="13" height="16" /></a></li>
<li>The <span style="font-weight: bold; font-style: italic;">irON</span> <a href="http://code.google.com/p/iron-notation/">code and vocabulary release         site</a>, and</li>
<li>The <a href="http://groups.google.com/group/iron-notation">Google         discussion group</a> for the <span style="font-weight: bold; font-style: italic;">irON</span> notation.</li>
</ul>
<p>Mid-week, the parsers and converters for <span style="font-weight: bold;">structWSF</span> <a href="#ia6">[6]</a> will be released and         announced on Fred Giasson&#8217;s <a href="http://fgiasson.com/blog">blog</a>.</p>
<p>In addition, within the next week we will be publishing a case study of         converting the <a style="color: #820000; font-weight: bold;" href="../new-version-sweet-tools-sem-web/">Sweet         Tools</a> semantic Web and -related tools dataset to <span style="font-weight: bold; font-style: italic;">commON</span>.</p>
<p><span>The <span style="font-weight: bold; font-style: italic;">irON</span> specification and         notation</span> by <a rel="cc:attributionURL" href="http://openstructs.org/iron/iron-specification">Structured Dynamics         LLC</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative Commons         Attribution-Share Alike 3.0</a>. <span style="font-weight: bold; font-style: italic;">irON</span>&#8217;s parsers or         converters are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache License,         Version 2.0</a>.</p>
<h3>Editors&#8217; Notes</h3>
<p><span style="font-weight: bold; font-style: italic;">irON</span> is an important piece in the semantic enterprise puzzle that we are building at <a href="http://structureddynamics.com/">Structured Dynamics</a>. It reflects our belief that knowledge workers should be able to author and create interoperable datasets without having to learn the arcana of RDF. At the same time we also believe that RDF is the appropriate data model for interoperability. <span style="font-weight: bold; font-style: italic;">irON</span> is an expression of our belief that many data formats have appropriate places and uses; there is no need to insist on a single format.</p>
<p>We would like to thank <a href="http://www.stat.berkeley.edu/%7Epitman/">Dr. Jim Pitman</a> for his         advocacy of the importance of human-readable and easily authored         datasets and formats. Via his leadership of the Bibliographic Knowledge         Network (BKN) project and our contractual relationship with it <a href="#ia7">[7]</a>, we         have learned much regarding the BKN&#8217;s own format, BibJSON. Experience         with this format has been a catalytic influence in our own work on         <span style="font-weight: bold; font-style: italic;">irON</span>.</p>
<p style="margin-left: 40px;">— <span style="font-style: italic;">Mike Bergman</span> and         <span style="font-style: italic;">Fred Giasson</span>, editors</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia1" name="ia1"></a> [1] Please <a href="http://structureddynamics.com/products.html">see here</a> for how         <span style="font-weight: bold; font-style: italic;">irON</span> fits         within Structured Dynamics&#8217; vision and family of products.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia2" name="ia2"></a> [2] Presently parsers and converters are         available for the <span style="font-weight: bold; font-style: italic;">irJSON</span> and <span style="font-weight: bold; font-style: italic;">commON</span> serializations,         and will be released this week. We have tentatively spec&#8217;ed the         <span style="font-weight: bold; font-style: italic;">irXML</span> converter, and would welcome working with another party to finalize a         converter. Absent an immediate contribution from a third party,         contractual work will likely result in our completing the <span style="font-weight: bold; font-style: italic;">irXML</span> converter within         the reasonable future.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia3" name="ia3"></a> [3] A pivotal premise of <span style="font-weight: bold; font-style: italic;">irON</span> is the         desirability of using the RDF data model as the canonical basis for         interoperable data. RDF provides a data model capable of representing         any extant data structure and any extant data format. This flexibility         makes RDF a perfect data model for federating across disparate data         sources. For a detailed discussion of RDF, see Michael K. Bergman,         2009. &#8220;Advantages and Myths of RDF,&#8221; in <span style="font-style: italic;">AI3 blog</span>, April 8, 2009. See <a href="../483/advantages-and-myths-of-rdf/">http://www.mkbergman.com/483/advantages-and-myths-of-rdf/</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia4" name="ia4"></a> [4] An <a href="http://en.wikipedia.org/wiki/Attribute-value_system">attribute-value         system</a> is a basic knowledge representation framework comprising a         table with columns designating &#8220;attributes&#8221; (also known as <span style="font-style: italic;">properties</span>, <span style="font-style: italic;">predicates</span>, <span style="font-style: italic;">features</span>, <span style="font-style: italic;">parameters</span>, <span style="font-style: italic;">dimensions</span>, <span style="font-style: italic;">characteristics</span> or <span style="font-style: italic;">independent variables</span>) and rows         designating &#8220;objects&#8221; (also known as <span style="font-style: italic;">entities</span>, <span style="font-style: italic;">instances</span>, <span style="font-style: italic;">exemplars</span>, <span style="font-style: italic;">elements</span> or <span style="font-style: italic;">dependent variables</span>). Each table cell         therefore designates the value (also known as <span style="font-style: italic;">state</span>) of a particular attribute of a         particular object. This is the basic table presentation of a         spreadsheet or relational data table.</p>
<p>Attribute-values can also be presented as pairs in the form of an         <a href="http://en.wikipedia.org/wiki/Associative_array">associative         array</a>, where the first item listed is the attribute, often followed         by a separator such as the colon, and then the value. JSON and many         simple data struct notations follow this format. This format may also         be called <span style="font-style: italic;">attribute-value         pairs</span>, <span style="font-style: italic;">key-value pairs</span>,         <span style="font-style: italic;">name-value pairs</span>, <span style="font-style: italic;">alists</span> or others. In these cases the         &#8220;object&#8221; is implied, or is introduced as the name of the array.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia5" name="ia5"></a> [5] We use the reference to the &#8220;<a href="http://en.wikipedia.org/wiki/Abox">ABox</a>&#8221; and &#8220;<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>&#8221; in accordance with this <a title="Permanent Link to Thinking &#8216;Inside the Box&#8217; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:</p>
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.&#8221;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia6" name="ia6"></a> [6] <a href="http://openstructs.org/">structWSF</a> is a platform-independent Web         services framework for accessing and exposing structured RDF data, with         generic tools driven by underlying data structures. Its central         perspective is that of the dataset. Access and user rights are granted         around these datasets, making the framework enterprise-ready and         designed for collaboration. Since a structWSF layer may be placed over         virtually any existing datastore with Web access &#8212; including large         instance record stores in existing relational databases &#8212; it is also a         framework for Web-wide deployments and interoperability.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ia7" name="ia7"></a> [7] BKN is a project to develop a suite of         tools and services to encourage formation of virtual organizations in         scientific communities of various types. BKN is a project started in         September 2008 with funding by the <a href="http://www.nsf.gov/crssprgm/cdi/">NSF Cyber-enabled Discovery and         Innovation (CDI) Program</a> (Award # 0835851). The major participating         organizations are the <a href="http://www.bibkn.org/drupal/conStruct/datasets/99/resource/bkncentral_AIM"> American Institute of Mathematics (AIM)</a>, <a href="http://www.bibkn.org/drupal/conStruct/datasets/99/resource/bkncentral_Harvard"> Harvard University</a>, <a href="http://www.bibkn.org/drupal/conStruct/datasets/99/resource/bkncentral_Stanford"> Stanford University</a> and the <a href="http://www.bibkn.org/drupal/conStruct/datasets/99/resource/bkncentral_Berkeley"> University of California, Berkeley</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/838/iron-semantic-web-for-mere-mortals/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>&#8216;SuperTypes&#8217; and Logical Segmentation of Instances</title>
		<link>http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/</link>
		<comments>http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 21:23:20 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[Structured Web]]></category>
		<category><![CDATA[UMBEL]]></category>
		<category><![CDATA[cyc]]></category>
		<category><![CDATA[instances]]></category>
		<category><![CDATA[named entities]]></category>
		<category><![CDATA[superTypes]]></category>
		<category><![CDATA[TBox]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=759</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=&#8216;SuperTypes&#8217; and Logical Segmentation of Instances&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/&amp;rft.language=English"></span>
 
The Significant Advantages to a Logically Segmented TBox
The Message Understanding Conferences (MUC)         were initiated in 1987 and financed by DARPA to encourage the         development of new and better methods of information        [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=&#8216;SuperTypes&#8217; and Logical Segmentation of Instances&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Ontologies&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=Structured Web&amp;rft.subject=UMBEL&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-09-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/&amp;rft.language=English"></span>
<p><img style="border: 0px solid; width: 200px; height: 172px; float: left; margin-right: 10px;" title="Segmented" src="../wp-content/themes/ai3/images/2009Posts/090831_segmented.jpg" alt="Segmented" hspace="5" vspace="5" align="left" /> <a href="http://www.umbel.org/"><img style="border: 0px solid; margin-left: 5px; width: 100px; height: 50px; float: right;" src="../wp-content/themes/ai3/images/umbel_logo_100.png" alt="UMBEL (Upper Mapping and Binding Exchange Layer)" /></a></p>
<h2>The Significant Advantages to a Logically Segmented TBox</h2>
<p>The Message Understanding Conferences (<a href="http://en.wikipedia.org/wiki/Message_Understanding_Conference">MUC</a>)         were initiated in 1987 and financed by <a href="http://en.wikipedia.org/wiki/DARPA">DARPA</a> to encourage the         development of new and better methods of <a href="http://en.wikipedia.org/wiki/Information_extraction">information         extraction</a> (IE). It was a seminal series that resulted in basic         measures of retrieval and semantic efficacy, <a href="http://en.wikipedia.org/wiki/Precision_and_recall">recall</a> (R) and         <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision</a> (P)         and the combined <a href="http://en.wikipedia.org/wiki/F-score">F-measure</a>, and other core         terminology and constructs used by IE today.</p>
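<p>As a quick reminder of how these three measures relate, here is a minimal sketch in Python. The counts are made up for illustration, the function name is hypothetical, and the code assumes the denominators are non-zero.</p>

```python
def precision_recall_f(true_pos, false_pos, false_neg, beta=1.0):
    """Return (precision, recall, F-measure) from raw extraction counts.

    Assumes at least one item was extracted and at least one item
    should have been extracted, so neither denominator is zero.
    """
    precision = true_pos / (true_pos + false_pos)  # share of extracted items that are correct
    recall = true_pos / (true_pos + false_neg)     # share of correct items that were extracted
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Hypothetical run: 10 entities extracted, 8 of them correct,
# out of 16 entities actually present in the text.
p, r, f = precision_recall_f(true_pos=8, false_pos=2, false_neg=8)
```

<p>With <code>beta</code> set to 1 the F-measure is the harmonic mean of precision and recall, so a system cannot score well by maximizing one at the expense of the other.</p>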
<p>By the sixth version in the series (MUC-6), in 1995, the task of         recognition of <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">named         entities</a> and <a href="http://en.wikipedia.org/wiki/Coreference">coreference</a> was added.         That initial slate of named entities included the basic building blocks         of <span style="font-style: italic;">person</span> (PER), <span style="font-style: italic;">location</span> (LOC), and <span style="font-style: italic;">organization</span> (ORG); to these were added         the numeric building blocks of <span style="font-style: italic;">time</span>, <span style="font-style: italic;">percentage</span> or <span style="font-style: italic;">quantity</span>. The very terminology of         <span style="font-style: italic;">named entity</span> was coined for         this seminal meeting, as was the idea of inline markup <a href="#st1">[1]</a>.</p>
<h3>What is a &#8216;Nameable Thing&#8217;?</h3>
<p>The intuition surrounding &#8220;named entity&#8221; and nameable &#8220;things&#8221; was that they         were discrete and disjoint. A <span style="font-style: italic;">rock</span> is not a <span style="font-style: italic;">person</span> and is not a <span style="font-style: italic;">chemical</span> or an <span style="font-style: italic;">event</span>. As initially used, all &#8220;named         entities&#8221; were distinct individuals. But, there also emerged the         understanding that some classes of things could also be treated as         more-or-less distinct nameable &#8220;things&#8221;: <span style="font-style: italic;">beetles</span> are not the same as <span style="font-style: italic;">frogs</span> and are not the same as <span style="font-style: italic;">rocks</span>. While some of these &#8220;things&#8221; might         be a true individual with a discrete name, such as <a href="http://en.wikipedia.org/wiki/Kermit_the_Frog">Kermit the Frog</a>, or         <a href="http://en.wikipedia.org/wiki/The_Rock_%28Northwestern_University%29">The         Rock</a> at Northwestern University, most instances of such things are         unnamed.</p>
<p>The &#8220;nameability&#8221; (or logical categorization) of things is perhaps best         kept separate from other epistemological issues of distinguishing         <span style="font-style: italic;">sets</span>, <span style="font-style: italic;">collections</span>, or <span style="font-style: italic;">classes</span> from <span style="font-style: italic;">individuals</span>, <span style="font-style: italic;">members</span> or <span style="font-style: italic;">instances</span>.</p>
<p>In a closed-world system it is easier to enforce clean distinctions.         The <a href="http://en.wikipedia.org/wiki/Cyc">Cyc knowledge base</a>,         for example, the basis for <a href="http://umbel.org/">UMBEL</a> (<span style="font-style: italic;">Upper Mapping and Binding Exchange         Layer</span>),  makes clear the distinction between <span style="font-style: italic;">individuals</span> and <span style="font-style: italic;">collections</span>. In the semantic Web and RDF,         this can become smeared a bit with the favored terminology shifting to         <span style="font-style: italic;">instances</span> and <span style="font-style: italic;">classes</span>, and in pragmatic, real-world         terms we (as humans) readily distinguish John Smith as distinct from         Jane Doe but don&#8217;t generally (unless we&#8217;re entomologists!) make such distinctions for individual         beetles, let alone entire genera or species of beetles.</p>
<p>Under precise conditions, these distinctions are important. The fact         that Cyc, for example, is assiduous in its application of these         distinctions is a major reason for the overall <a href="../450/when-is-content-coherent/">coherence</a> of its knowledge base. But, for most circumstances, we think it is OK         to accept a distinction between &#8220;nameable&#8221; things such as frogs and         beetles, but also to accept that there may be nameable individuals at         times in those groupings such as Kermit that are truly an individual in         that more refined sense.</p>
<p>This digression sets the background for a natural progression from that         first MUC-6 conference. If we could cluster <span style="font-style: italic;">persons</span> or <span style="font-style: italic;">organizations</span>, why not other categories of         distinct and disjoint things such as <span style="font-style: italic;">frogs</span> or <span style="font-style: italic;">beetles</span> or <span style="font-style: italic;">rocks</span>?</p>
<p>From the first six entity categories of MUC-6 we begin to see an expansion to broader coverage. Readers of this blog will recall that I have been a fan for quite some time of the expanded coverage of 64 classes of entities proposed by BBN or the 200 proposed by Sekine <a href="#st2">[2]</a> (as discussed, for example, in the April 2008 <a style="font-style: italic;" href="../432/subject-concepts-and-named-entities/">Subject Concepts and Named Entities</a> article). Again, the intuition was that real things in the real world could be logically categorized into discrete and disjoint categories.</p>
<p>Thus, &#8220;named entities&#8221; inexorably moved to become a categorization         system, where the degree of familiarity and distinction dictated         whether it was the individual (with a unique name, such as <span style="font-style: italic;">Abraham Lincoln</span> or <span style="font-style: italic;">Mt. Rushmore</span>) or groupings such as animal         or plant species and their common names (such as <span style="font-style: italic;">beetle</span> or <span style="font-style: italic;">oak</span>) that was the standard &#8220;handle&#8221; for         assigning a name to the &#8220;nameable thing&#8221;.</p>
<p>While many can argue these individual &lt;&#8211;&gt; grouping distinctions and whether we are talking about true, unique, named individuals or names of convenience, I think (at least for this blog post and discussion) that doing so misses the real, fundamental point.</p>
<p>The real, fundamental point is that some &#8220;things&#8221; (whether <span style="font-style: italic;">individuals</span>, <span style="font-style: italic;">instances</span> or <span style="font-style: italic;">classes</span>) are distinct from other &#8220;things&#8221;. Such disjoint distinctions are a powerful concept that should not be lost sight of by &#8220;<a href="http://en.wikipedia.org/wiki/How_many_angels_can_dance_on_the_head_of_a_pin%3F">angels dancing on the head of a pin</a>&#8221; epistemological arguments. A <span style="font-style: italic;">frog</span> is not a <span style="font-style: italic;">rock</span>, even though neither is an &#8220;individual&#8221;; how can we take advantage of that reality?</p>
<h3>What Works for Entities, Works for Concepts</h3>
<p>Nearly from the outset of our work with UMBEL as a &#8216;TBox&#8217; <a href="#st3">[3]</a> &#8212; that         is, as a set of 20,000 or so common &#8220;subject concepts&#8221; &#8212; the natural         question was what the relation or correspondence was of these concepts         to the underlying &#8220;things&#8221; (entities) that they organized. As we probed         the disjoint categories within the Sekine 200 entity types, for         example, we began to see significant parallels and overlap. Also         gnawing at our sense of order was the rather artificial and arbitrary         class of concepts in UMBEL that we termed &#8220;Abstract Concepts&#8221;.</p>
<p>We <a href="../430/a-re-introduction-to-umbel/">introduced         Abstract Concepts</a> in the first release of UMBEL. When introduced,         we defined &#8220;<em>Abstract concepts</em> [as] representing abstract or         ephemeral notions such as truth, beauty, evil or justice, or [as]         thought constructs useful to organizing or categorizing things but are         not readily seen in the experiential world.&#8221; In pragmatic terms,         Abstract Concepts in UMBEL were often pivotal nodes in the UMBEL         subject graph necessary to maintain a high degree of concept         interconnectivity.</p>
<p>In any world view that attempts to be more-or-less comprehensive, there         is a gradation of concepts from the concrete and observable to the         abstract and ephemeral. The recognition that some of these concepts may         be more abstract, then, was not the issue. The issue was that there was         no definable basis for segregating a concrete Subject Concept from the         more Abstract Concept. Where was the bright line? What was the         actionable distinction?</p>
<p>Off and on we have probed this question for more than a year, and have         looked at what might constitute a more natural and logical ordering and         segmentation within UMBEL. After many tests and detailed analysis, we         are now releasing the first results of our investigations.</p>
<p>For, like nameable entities or things, we can see a logical         segmentation of (mostly) disjoint concepts within the UMBEL TBox. Here         are the summary percentages of these high-level splits:</p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td>Disjoint Concepts</td>
<td style="text-align: right;">90%</td>
</tr>
<tr>
<td>Attributes</td>
<td style="text-align: right;">1%</td>
</tr>
<tr>
<td>Classifications</td>
<td style="text-align: right;">9%</td>
</tr>
<tr>
<td>TOTAL</td>
<td style="text-align: right;">100%</td>
</tr>
</tbody>
</table>
<p>(Because the analysis is still being refined, exact counts and         percentages for the 20,000 concepts in UMBEL are not provided.)</p>
<h3>Why a Logical Segmentation?</h3>
<p>As we dove deeper into these ideas, not only could we see the basis for         a logical segmentation within UMBEL&#8217;s concepts, but manifest benefits         from doing so as well. Remember that UMBEL&#8217;s concept structure performs         two main roles. It:  1) provides a coherent framework for relating and         &#8220;mapping&#8221; other external ontologies; and 2) provides conceptual binding         points for organizing entities and instances <a href="#st4">[4]</a>. Via logical         segmentation, we get benefits for both roles.</p>
<p>Here are some of the broad areas of benefit from a logical UMBEL         segmentation that we have identified:</p>
<ul>
<li>Template-driven &#8212; as we <a href="../492/ontology-best-practices-for-data-driven-applications-part-3/"> discuss elsewhere</a>, <a href="http://structureddynamics.com/">Structured Dynamics</a> also uses its           ontologies to &#8220;drive applications&#8221; and the user interfaces (UI) that           support them. By proper segmentation of UMBEL concepts, we are able           to determine to what &#8220;cluster&#8221; of things (which we call either           <span style="font-style: italic;">dimensions</span> or <span style="font-style: italic;">superTypes</span>; see below) a given thing           belongs. This identification means we can also determine how best to           display information about that &#8220;thing&#8221;. This determination can           include either the attributes or the display templates appropriate           for that thing. For example, location-based things or time-based           things might invoke map or calendar or timeline type displays.           Moreover, because of the logical segmentation of concepts, we can           also use the power of the concept graph to infer more generic display           templates when specific matches are absent</li>
<li>Computational Efficiency &#8212; as the percentages above indicate, once we identify the <span style="font-style: italic;">superType</span> to which a given instance belongs, we can eliminate nearly all remaining UMBEL concepts from consideration. This logical winnowing leads to computational efficiencies at all levels in the system. The fastest computation is the one that is never performed, and when large chunks of data are removed from consideration, many performance advantages accrue</li>
<li>Disambiguation &#8212; via this approach we now can         assess concept matches in addition         to entity matches. This means we can         triangulate between the two assessments to aid disambiguation. Because         of these logical segmentations, we also have multiple &#8220;clusters&#8221; (that         is, either the <span style="font-style: italic;">concept</span>,         <span style="font-style: italic;">type</span>, <span style="font-style: italic;">superType</span> or <span style="font-style: italic;">dimension</span>) upon which to do our         disambiguation evaluations, either between concepts and entities or         within the various concept clusters. We can do so via either multiple         <a href="http://en.wikipedia.org/wiki/Vector_space_model">semantic         vectors</a> (for statistical-based methods) or multiple <a href="http://en.wikipedia.org/wiki/Features_%28pattern_recognition%29">features</a> (for <a href="http://en.wikipedia.org/wiki/Machine_learning">machine         learning</a> methods). In other words, because of logical segmentation,         we have increased the informational power of our concept graph</li>
<li>Structure and Integrity Testing &#8212; the very mindset of looking for         logical segmentation has led to much learning about the UMBEL structure         and OpenCyc upon which it is based. In the process, missing nodes         (concepts), erroneous assignments, and superfluous nodes are all being         discovered. Further, many of these tests can be automated using basic         logical and inference approaches. The net result is a constant         improvement to the scope and completeness of the structure. Lastly,         these same approaches can be applied when mapping external ontologies         to UMBEL, providing similar consistency benefits.</li>
</ul>
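<p>Two of these benefits, winnowing by <span style="font-style: italic;">superType</span> and falling back to a more generic display template, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny concept graph, the <span style="font-style: italic;">superType</span> labels, the template names and the function names are all hypothetical and are not taken from UMBEL or from our code.</p>

```python
# Hypothetical miniature concept graph: each concept maps to its parent.
PARENT = {
    "Frog": "Animal",
    "Beetle": "Animal",
    "Animal": None,
    "City": "Place",
    "Place": None,
}

# Hypothetical superType assignment for each concept.
SUPER_TYPE = {
    "Frog": "Animals",
    "Beetle": "Animals",
    "Animal": "Animals",
    "City": "Places",
    "Place": "Places",
}

# Hypothetical display templates, registered only on some concepts.
TEMPLATE = {"Place": "map"}


def candidate_concepts(super_type):
    """Winnowing: keep only the concepts assigned to one superType."""
    return sorted(c for c, st in SUPER_TYPE.items() if st == super_type)


def display_template(concept, default="generic"):
    """Walk up the concept graph until some ancestor has a template."""
    node = concept
    while node is not None:
        if node in TEMPLATE:
            return TEMPLATE[node]
        node = PARENT.get(node)
    return default


places = candidate_concepts("Places")
city_template = display_template("City")
frog_template = display_template("Frog")
```

<p>Here <code>City</code> has no template of its own, so it inherits the map display from <code>Place</code>, while <code>Frog</code> finds nothing up its branch and gets the default. Once an instance is known to belong to the hypothetical &#8220;Places&#8221; cluster, only two of the five concepts remain as candidates.</p>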
<p>With these benefits in mind, we have undertaken concerted analysis of         UMBEL to discern what this &#8220;logical segmentation&#8221; might be. This         investigation has occurred over three concentrated periods over the         past year. (Intervening priorities or other work prevented         concentrating solely on this task.)</p>
<p>We have now completed our first full iteration of investigation. With this post, and then the subsequent release of UMBEL version 0.80 in the coming weeks, the fruits of this effort should be evident. However, it should also be noted that we are still learning much from this new mindset and approach. Refinement of the UMBEL structure is likely to continue for some time to come.</p>
<h3>UMBEL Analysis</h3>
<p>Most things and concepts about them are based on real, observable, physical things in the real world. Because most of these things cannot occupy both the same moment in time and the same location in physical space, a useful criterion for looking at these things and concepts is <a href="http://en.wikipedia.org/wiki/Disjoint-set_data_structure">disjointedness</a>.</p>
<p>In a broad sense, then, we can split our concepts of the world between         those ideas that are disjoint because they pertain to separable objects         or ideas and those that are cross-cutting or organizational or         classificatory. Attributes, such as color (pink, for example), are         often cross-cutting in that they can be used to describe quite         disparate things. Inherent classification schemes such as academic         fields of study or library catalog systems &#8212; while useful ways to         organize the world &#8212; are not themselves in-and-of the world or         discrete from other ideas. Thus, classificatory or organizational         concepts are inherently not disjoint.</p>
<p>With the criterion of disjointedness in hand, then, we began an         evaluation process of the UMBEL subject concepts. We looked to         organizational schema such as the entity types of Sekine or BBN for         some starting guidance. We also kept in mind that we also wanted our         categories to inform logical clusterings of possible data presentation,         such as media types or locations or time.</p>
<p>For terminology, we adopted the term <span style="font-style: italic;">superType</span> to denote the largest cluster         designation upon which this disjointedness may occur. As a way to test         the basic coherence of these <span style="font-style: italic;">superTypes</span>, we also collected them into         larger groups which we termed <span style="font-style: italic;">dimensions</span>.</p>
<p>Our analysis process began with branch-by-branch testing of the UMBEL         concept graph using automated scripts, attempting to find pivotal nodes         where child instance members were disjoint from other <span style="font-style: italic;">superTypes</span>. This we term the &#8220;top-down&#8221;         method.</p>
<p>This automated analysis was then supplemented with a complete manual         inspection of all unassigned and assigned concepts, with a &#8220;bottom up&#8221;         assignment of concepts or corrections to the automated approach. This         inspection then led to new insights and identification of missing         concepts that needed to be added into UMBEL.</p>
<p>We are still converging between these two methods. Optimally, we should be able to tease out all UMBEL <span style="font-style: italic;">superTypes</span> with relatively few <span style="font-weight: bold;">union</span>, <span style="font-weight: bold;">intersection</span>, or <span style="font-weight: bold;">complement</span> <a href="http://en.wikipedia.org/wiki/Set_theory#Basic_concepts">set operations</a>. In its current form, the analysis is close, but there are still some rough spots.</p>
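<p>As a hedged illustration of what such set operations might look like, here is a small sketch using Python's built-in sets. The concept names and branch contents are invented for the example and do not come from UMBEL itself:</p>

```python
# Hypothetical concept sets; the names are illustrative, not real UMBEL data.
ALL_CONCEPTS = {"Dog", "Rabies", "Oak", "Aspirin", "Bread", "Car"}
animal_branch = {"Dog", "Rabies"}          # raw branch as a script might find it
disease_branch = {"Rabies"}
product_branch = {"Aspirin", "Bread", "Car"}
food_branch = {"Bread"}
drug_branch = {"Aspirin"}

# Relative complement: Animals with Diseases excluded.
animals = animal_branch.difference(disease_branch)

# Union, then complement: Products with the Food and Drug splits removed.
products = product_branch.difference(food_branch.union(drug_branch))

# Intersection: anything left in two superTypes at once is a rough spot.
rough_spots = animals.intersection(products)

# Complement against everything: concepts not yet assigned anywhere.
assigned = animals.union(disease_branch, products, food_branch, drug_branch)
unassigned = ALL_CONCEPTS.difference(assigned)

print(animals, products, rough_spots, unassigned)
```

<p>The fewer operations of this kind needed per <span style="font-style: italic;">superType</span>, the cleaner the underlying structure.</p>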
<p>Nonetheless, this analysis method has led us to identify some 33         <span style="font-style: italic;">superTypes</span> <a href="#st5">[5]</a>, clustered into         9 dimensions. Of these, 29 <span style="font-style: italic;">superTypes</span> and 8 dimensions are mostly         disjoint. The one dimension of Classificatory includes the four         cross-cutting <span style="font-style: italic;">superTypes</span> of         attributes and organizational schema that can apply to any of the 29         disjoint <span style="font-style: italic;">superTypes</span>.</p>
<h4>UMBEL superTypes</h4>
<p>Here is the schema, with the descriptions of each:</p>
<table style="border-collapse: collapse; width: 684px;" border="0" cellspacing="0" cellpadding="8">
<col style="width: 110pt;" width="146"></col>
<col style="width: 125pt;" width="166"></col>
<col style="width: 449pt;" width="599"></col>
<tbody>
<tr style="height: 25.5pt;" height="34">
<td style="height: 25.5pt; width: 110pt; font-weight: bold; background-color: #cccccc; text-align: center;">Dimension</td>
<td style="border-left: medium none; width: 125pt; font-weight: bold; background-color: #cccccc; text-align: center;" width="166">superType</td>
<td style="border-left: medium none; width: 449pt; font-weight: bold; background-color: #cccccc; text-align: center;" width="599">Description/Sub-types</td>
</tr>
<tr style="height: 63.75pt;" height="85">
<td style="border-top: medium none; height: 63.75pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Natural World</td>
<td style="border-top: medium none; font-weight: bold;">Natural Phenomena</td>
<td style="border-top: medium none; width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes               natural phenomena and natural processes such as weather,               weathering, erosion, fires, lightning, earthquakes, tectonics,               etc. Clouds and weather processes are specifically included. Also               includes climate cycles, general natural events (such as               hurricanes) that are not specifically named, and biochemical               processes and pathways.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold;">Natural Substances</td>
<td style="width: 449pt;" width="599">Notable inclusions are minerals, compounds, chemicals, or               physical objects that are not the outcome of purposeful human               effort, but are found naturally occurring. Other natural objects               (such as rock, fossil, etc.) are also found under this               <span style="font-style: italic;">superType</span>.</td>
</tr>
<tr style="height: 102pt;" height="136">
<td style="height: 102pt; font-weight: bold; background-color: #cccccc;" height="136"></td>
<td style="font-weight: bold;">Earthscape</td>
<td style="width: 449pt;" width="599">The Earthscape <span style="font-style: italic;">superType</span> consists mostly of the collection of               cartographic features that occur on the surface of the Earth.               Positive examples include Mountain, Ocean, and Mesa. Artificial               features such as canals are excluded. Most instances of these               features have a fixed location in space.</p>
<p>Underground and underwater features are also explicitly included.</p>
<p>This <span style="font-style: italic;">superType</span> is               explicitly disjoint with Extraterrestrial (see below).</td>
</tr>
<tr style="height: 28.5pt;" height="38">
<td style="height: 28.5pt; font-weight: bold; background-color: #cccccc;" height="38"></td>
<td style="font-weight: bold;">Extraterrestrial</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes               all natural things not specifically terrestrial, including               celestial bodies (planets, asteroids, stars, galaxies, etc., that               can be located within a sky map)</td>
</tr>
<tr style="height: 30pt;" height="40">
<td style="border-top: medium none; height: 30pt; font-weight: bold; background-color: white; vertical-align: top;">Living Things</td>
<td style="border-top: medium none; font-weight: bold;">Prokaryotes</td>
<td style="border-top: medium none; width: 449pt;" width="599">The Prokaryotes include all prokaryotic organisms, including the Monera, Archaebacteria, Bacteria, and blue-green algae. Also included in this <span style="font-style: italic;">superType</span> are viruses and prions.</td>
</tr>
<tr style="height: 28.5pt;" height="38">
<td style="height: 28.5pt; font-weight: bold; background-color: white;" height="38"></td>
<td style="font-weight: bold;">Protists or Fungus</td>
<td style="width: 449pt;" width="599">This is the remaining cluster of eukaryotic organisms, specifically including the fungi and the protists (protozoans and slime molds).</td>
</tr>
<tr style="height: 41.25pt;" height="55">
<td style="height: 41.25pt; font-weight: bold; background-color: white;" height="55"></td>
<td style="font-weight: bold;">Plants</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes all plant types and flora, including flowering plants, algae, non-flowering plants, gymnosperms, cycads, and plant body types. Note that all plant parts are also included.</td>
</tr>
<tr style="height: 63.75pt;" height="85">
<td style="height: 63.75pt; font-weight: bold; background-color: white;" height="85"></td>
<td style="font-weight: bold;">Animals</td>
<td style="width: 449pt;" width="599">This large <span style="font-style: italic;">superType</span> includes all animal types, such as vertebrates, invertebrates, insects, crustaceans, fish, reptiles, amphibians, birds and mammals. Animal body parts are specifically included, as are groupings of such animals. Humans, as an animal, are included (versus as an individual Person). Diseases are specifically excluded.</td>
</tr>
<tr style="height: 56.25pt;" height="75">
<td style="height: 56.25pt; font-weight: bold; background-color: white;" height="75"></td>
<td style="font-weight: bold;">Diseases</td>
<td style="width: 449pt;" width="599">Diseases are atypical or unusual or unhealthy conditions for               (mostly human) living things, generally known as conditions,               disorders, infections, diseases or syndromes. Diseases only               affect living things and sometimes are caused by living things.               This <span style="font-style: italic;">superType</span> also               includes impairments, disease vectors, wounds and injuries, and               poisoning</td>
</tr>
<tr style="height: 63.75pt;" height="85">
<td style="height: 63.75pt; font-weight: bold; background-color: white;" height="85"></td>
<td style="font-weight: bold;">Person Types</td>
<td style="width: 449pt;" width="599">This is the appropriate <span style="font-style: italic;">superType</span> for all named, individual human beings. This <span style="font-style: italic;">superType</span> also includes the assignment of formal, honorific or cultural titles given to specific human individuals. It further includes names given to humans who conduct specific jobs or activities (the latter case is known as an avocation). Examples include steelworker, waitress, lawyer, plumber, artisan. Ethnic groups are specifically included.</td>
</tr>
<tr style="height: 181.5pt;" height="242">
<td style="border-top: medium none; height: 181.5pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Human Activities</td>
<td style="border-top: medium none; font-weight: bold;">Organizations</td>
<td style="border-top: medium none; width: 449pt;" width="599">Organization is a broad <span style="font-style: italic;">superType</span> and includes formal               collections of humans, sometimes by legal means, charter,               agreement or some mode of formal understanding. Examples include               geopolitical entities such as nations, municipalities or               countries; or companies, institutes, governments, universities,               militaries, political parties, game groups, international               organizations, trade associations, etc. All institutions, for               example, are organizations.</p>
<p>Also included are informal collections of humans. Informal or less defined groupings of humans may result from ethnicity or tribes or nationality or from shared interests (such as social networks or mailing lists) or expertise (&#8220;communities of practice&#8221;). This <span style="font-style: italic;">superType</span> also includes the notion of identifiable human groups with set members at any given point in time. Examples include music groups, cast members of a play, directors on a corporate Board, TV show members, gangs, mobs, juries, generations, minorities, etc.</p>
<p>Finally, Organizations contain the concepts of Industries and               Programs and Communities.</td>
</tr>
<tr style="height: 42pt;" height="56">
<td style="height: 42pt; font-weight: bold; background-color: #cccccc;" height="56"></td>
<td style="font-weight: bold;">Finance &amp; Economy</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> pertains               to all things financial and with respect to the economy,               including chartable company performance, stock index entities,               money, local currencies, taxes, incomes, accounts and accounting,               mortgages and property.</td>
</tr>
<tr style="height: 54pt;" height="72">
<td style="height: 54pt; font-weight: bold; background-color: #cccccc;" height="72"></td>
<td style="font-weight: bold;">Culture, Issues, Beliefs</td>
<td style="width: 449pt;" width="599">This category includes concepts related to political systems, laws, rules or cultural mores governing societal or community behavior, or doctrinal, faith or religious bases or entities (such as gods, angels, totems) governing spiritual human matters. Culture, issues, beliefs and various activisms (most -isms) are included.</td>
</tr>
<tr style="height: 53.25pt;" height="71">
<td style="height: 53.25pt; font-weight: bold; background-color: #cccccc;" height="71"></td>
<td style="font-weight: bold;">Activities</td>
<td style="width: 449pt;" width="599">These are ongoing activities that result (mostly) from human               effort, often conducted by organizations to assist other               organizations or individuals (in which case they are known as               services, such as medicine, law, printing, consulting or               teaching) or individual or group efforts for leisure, fun,               sports, games or personal interests (activities)</td>
</tr>
<tr style="height: 51pt;" height="68">
<td style="border-top: medium none; height: 51pt; font-weight: bold; background-color: white; vertical-align: top;">Human Works</td>
<td style="border-top: medium none; font-weight: bold;">Products</td>
<td style="border-top: medium none; width: 449pt;" width="599">This is the largest <span style="font-style: italic;">superType</span> and includes any instance offered for sale or performed as a commercial service. A product is often a physical object made by humans that is not a conceptual work or a facility, such as vehicles, cars, trains, aircraft, spaceships, ships, foods, beverages, clothes, drugs, weapons. Products also include the concept of &#8216;state&#8217; (e.g., on/off).</td>
</tr>
<tr style="height: 25.5pt;" height="34">
<td style="height: 25.5pt; font-weight: bold; background-color: white;" height="34"></td>
<td style="font-weight: bold;">Food or Drink</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is any               edible substance grown, made or harvested by humans. The category               also specifically includes the concept of cuisines</td>
</tr>
<tr>
<td style="height: 12.75pt; font-weight: bold; background-color: white;"></td>
<td style="font-weight: bold;">Drugs</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is any drug, medication or addictive substance.</td>
</tr>
<tr style="height: 143.25pt;" height="191">
<td style="height: 143.25pt; font-weight: bold; background-color: white;" height="191"></td>
<td style="font-weight: bold;">Facilities</td>
<td style="width: 449pt;" width="599">Facilities are physical places or buildings constructed by               humans, such as schools, public institutions, markets, museums,               amusement parks, worship places, stations, airports, ports,               carstops, lines, railroads, roads, waterways, tunnels, bridges,               parks, sport facilities, monuments. All can be geospatially               located.</p>
<p>Facilities also include animal pens and enclosures and general               human &#8220;activity&#8221; areas (golf course, archeology sites, etc.).               Importantly, Facilities include infrastructure systems such as               roadways and physical networks.</p>
<p>Facilities also include the component parts that go into making               them (such as foundations, doors, windows, roofs, etc.)</td>
</tr>
<tr style="height: 39.75pt;" height="53">
<td style="border-top: medium none; height: 39.75pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Information</td>
<td style="border-top: medium none; font-weight: bold;">Chemistry (n.o.c)</td>
<td style="border-top: medium none; width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is a residual category (n.o.c., not otherwise categorized) for chemical bonds, chemical composition groupings, and the like. It covers what is neither a natural substance nor a living-thing (organic) substance.</td>
</tr>
<tr style="height: 27.75pt;" height="37">
<td style="height: 27.75pt; font-weight: bold; background-color: #cccccc;" height="37"></td>
<td style="font-weight: bold;">Audio Info</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> is for               any audio-only human work. Examples include live music               performances, record albums, or radio shows or individual radio               broadcasts</td>
</tr>
<tr style="height: 27.75pt;" height="37">
<td style="height: 27.75pt; font-weight: bold; background-color: #cccccc;" height="37"></td>
<td style="font-weight: bold;">Visual Info</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes any still image or picture or streaming video human work, with or without audio. Examples include graphics, pictures, movies, TV shows, individual shows from a TV show, etc.</td>
</tr>
<tr style="height: 28.5pt;" height="38">
<td style="height: 28.5pt; font-weight: bold; background-color: #cccccc;" height="38"></td>
<td style="font-weight: bold;">Written Info</td>
<td style="width: 449pt;" width="599">This <span style="font-style: italic;">superType</span> includes any general material written by humans, including books, blogs, articles and manuscripts, and more broadly any written information conveyed via text.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold;">Structured Info</td>
<td style="width: 449pt;" width="599">This information <span style="font-style: italic;">superType</span> is for all kinds of               structured information and datasets, including computer programs,               databases, files, Web pages and structured data that can be               presented in tabular form</td>
</tr>
<tr style="height: 127.5pt;" height="170">
<td style="height: 127.5pt; font-weight: bold; background-color: #cccccc;" height="170"></td>
<td style="font-weight: bold;">Notations &amp; References</td>
<td style="width: 449pt;" width="599">Akin to conceptual works, these are codified means of human               expression. Examples range from human languages themselves, to               more domain-specific cases such as chemical symbols, genetic code               (A-G-C-T), protocols, and computer languages, mathematical and               set notations, etc.</p>
<p>Identifiers (numeric or alphanumeric identifiers for objects, often in a highly patterned way, such as phone numbers, URLs, zip and postal codes, SKUs, product codes, etc.), Units (any of the various ways in which measurement, space, volume, weight, speed, intensity, temperature, calories, seismic intensity or other quantitative descriptions of phenomena can be made) and key reference types are also included in this <span style="font-style: italic;">superType</span>.</td>
</tr>
<tr style="height: 16.5pt;" height="22">
<td style="height: 16.5pt; font-weight: bold; background-color: #cccccc;"></td>
<td style="font-weight: bold;">Numbers</td>
<td style="width: 449pt;">This unique <span style="font-style: italic;">superType</span> is               for any abstract representation of numbers and numerics</td>
</tr>
<tr style="height: 27pt;" height="36">
<td style="border-top: medium none; height: 27pt; font-weight: bold; background-color: white; vertical-align: top;">Human Places</td>
<td style="border-top: medium none; font-weight: bold;">Geopolitical</td>
<td style="border-top: medium none; width: 449pt;">Named places that have some informal or formal political               (authorized) component. Important subcollections include Country,               IndependentCountry, State_Geopolitical, City, and Province.</td>
</tr>
<tr style="height: 27pt;" height="36">
<td style="height: 27pt; font-weight: bold; background-color: white;"></td>
<td style="font-weight: bold;">Workplaces, etc.</td>
<td style="width: 449pt;">These are various workplaces and areas of human activities,               ranging from single person workstations to large aggregations of               people (but which are not formal political entities)</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="border-top: medium none; height: 38.25pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Time-related</td>
<td style="border-top: medium none; font-weight: bold;">Events</td>
<td style="border-top: medium none; width: 449pt;">These are nameable occasions, games, sports events, conferences,               natural phenomena, natural disasters, wars, incidents,               anniversaries, holidays, or notable moments or periods in time</td>
</tr>
<tr style="height: 27.75pt;" height="37">
<td style="height: 27.75pt; font-weight: bold; background-color: #cccccc;"></td>
<td style="font-weight: bold;">Time</td>
<td style="width: 449pt;">This <span style="font-style: italic;">superType</span> is for               specific time or date or period (such as eras, or days, weeks,               months type intervals) references in various formats</td>
</tr>
<tr style="height: 51pt;" height="68">
<td style="border-top: medium none; height: 51pt; font-weight: bold; background-color: white; vertical-align: top;">Descriptive</td>
<td style="border-top: medium none; font-weight: bold; background-color: #ffffcc;">Attributes</td>
<td style="border-top: medium none; width: 449pt; background-color: #ffffcc;">This general <span style="font-style: italic;">superType</span> category is for descriptive attributes of all kinds. Think of the             specific attributes in Wikipedia &#8220;infoboxes&#8221; to understand the             purpose and coverage of this <span style="font-style: italic;">superType</span>. It includes colors, shapes,             sizes, or other descriptive characteristics about an object</td>
</tr>
<tr style="height: 51pt;" height="68">
<td style="border-top: medium none; height: 51pt; font-weight: bold; background-color: #cccccc; vertical-align: top;">Classificatory</td>
<td style="border-top: medium none; font-weight: bold; background-color: #ffffcc;">Abstract-level</td>
<td style="border-top: medium none; width: 449pt; background-color: #ffffcc;" width="599">This general <span style="font-style: italic;">superType</span> category is largely composed of former AbstractConcepts, and represents some of the more abstract upper-level nodes for connecting the UMBEL structure together. This <span style="font-style: italic;">superType</span> also includes theories or processes or methods for humans to do stuff or any human technology.</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold; background-color: #ffffcc;">Topics/Categories</td>
<td style="width: 449pt; background-color: #ffffcc;" width="599">This largely subject-oriented <span style="font-style: italic;">superType</span> is a means for using               controlled vocabularies and classification schemes for               characterizing what content &#8220;is about&#8221;. The key constituents of               this category are Types, Classifications, Concepts, Topics, and               controlled vocabularies</td>
</tr>
<tr style="height: 38.25pt;" height="51">
<td style="height: 38.25pt; font-weight: bold; background-color: #cccccc;" height="51"></td>
<td style="font-weight: bold; background-color: #ffffcc;">Markets &amp; Industries</td>
<td style="width: 449pt; background-color: #ffffcc;" width="599">This <span style="font-style: italic;">superType</span> is a               specialized classificatory system for markets and industries. It               could be combined with the <span style="font-style: italic;">superType</span> above, but is kept               separate in order to provide a separate, economy-oriented system.</td>
</tr>
</tbody>
</table>
<p>These may undergo some further refinement prior to release of UMBEL v         0.80, and some of the definitions will be tightened up.</p>
<p>(Note: It should also be mentioned that some of these <span style="font-style: italic;">superTypes</span> lend themselves to further splits and analysis. The Product <span style="font-style: italic;">superType</span>, for example, is ripe for such treatment.)</p>
<h4>Distribution of superTypes</h4>
<p>The following diagram shows the distribution of these 20,000 UMBEL concepts across major areas. By far the largest <span style="font-style: italic;">superType</span> is Products, even with further splits into Food and Drinks and Pharmaceuticals. The next largest categories are the Person, Places and Events <span style="font-style: italic;">superTypes</span>, with Organizations and Animals not far behind:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090831_supertypes_count.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 527px;" title="Click to expand" src="../wp-content/themes/ai3/images/2009Posts/090831_supertypes_count.png" alt="# of superTypes by Category" width="792" height="696" /></a></div>
<p>Even in its generic state, UMBEL provides a very rich vocabulary for         describing things or for tying in more detailed external ontologies.         There are nearly 5,000 concepts across products of all types, for         example.</p>
<h4>Possible Overlaps (non-disjoint) between superTypes</h4>
<p>You may recall that our analysis showed 29 of the <span style="font-style: italic;">superTypes</span> to be &#8220;mostly disjoint.&#8221;          This is because there are some concepts &#8212; say, <span style="font-family: monospace;">MusicPerformingAgent</span> &#8212;         that can apply to either a person or a group (band or orchestra, for         example). Thus, for this concept alone, we have a bit of overlap         between the normally disjoint Person and Organization <span style="font-style: italic;">superTypes</span>.</p>
<p>The following shows the resulting interaction matrix where there may be         some overlap between <span style="font-style: italic;">superTypes</span>:</p>
<div style="margin: 10px 0px;"><a href="../wp-content/themes/ai3/images/2009Posts/090831_UMBELmatrix.png"> <img class="center_ok" style="border: 0px solid; width: 600px; height: 513px;" title="Click to expand" src="../wp-content/themes/ai3/images/2009Posts/090831_UMBELmatrix.png" alt="Instance superTypes Overlap" width="856" height="732" /></a></div>
<p>This kind of interaction diagram is also useful for further analyzing the concept graph structure.</p>
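<p>For readers who want to see how an interaction matrix of this kind could be derived, the following is a minimal sketch under stated assumptions. The memberships are invented (only <span style="font-family: monospace;">MusicPerformingAgent</span> is taken from the discussion above), and <code>overlap_matrix</code> is a hypothetical helper, not part of any UMBEL tooling:</p>

```python
from itertools import combinations

# Hypothetical assignments of concepts to superTypes (illustrative only).
SUPERTYPES = {
    "Person Types": {"Lawyer", "Plumber", "MusicPerformingAgent"},
    "Organizations": {"Company", "University", "MusicPerformingAgent"},
    "Plants": {"Oak", "Fern"},
}

def overlap_matrix(supertypes):
    """Map each unordered pair of superTypes to the concepts they share."""
    matrix = {}
    for a, b in combinations(sorted(supertypes), 2):
        matrix[(a, b)] = supertypes[a].intersection(supertypes[b])
    return matrix

matrix = overlap_matrix(SUPERTYPES)
for pair, shared in matrix.items():
    if shared:
        # Only pairs with a non-empty overlap are worth reporting.
        print(pair, sorted(shared))
```

<p>Empty cells in such a matrix correspond to fully disjoint pairs; non-empty cells point to the concepts that need a closer look.</p>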
<h4>Even Where Overlaps Occur, They are Minor</h4>
<p>Of the 29 &#8220;mostly&#8221; disjoint <span style="font-style: italic;">superTypes</span>, only relatively few show potential interactions, and then only in minor ways. We can illustrate this (drawn to scale) for the interaction between the Product, Food &amp; Drink and Drug (Pharmaceuticals) <span style="font-style: italic;">superTypes</span>, with the fully disjoint Organization <span style="font-style: italic;">superType</span> thrown in for comparison:</p>
<div style="margin: 10px 0px;"><img class="center_ok" style="border: 0px solid; width: 380px; height: 519px;" title="Example superTypes Overlap" src="../wp-content/themes/ai3/images/2009Posts/090831_supertypes_venn.png" alt="Example superTypes Overlap" width="380" height="519" /></div>
<p>Across all 20,000 concepts, then, fully 85% are disjoint from one another (5% is lost due to overlaps between &#8220;mostly&#8221; disjoint <span style="font-style: italic;">superTypes</span>). This is a surprisingly high percentage, with an even better likelihood of delivering the benefits previously noted.</p>
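<p>One way to arrive at a disjointness share like this is to count the concepts that fall into exactly one disjoint <span style="font-style: italic;">superType</span> and divide by the whole vocabulary. The sketch below assumes that reading. Its data are made up and far smaller than UMBEL, and the function name <code>disjoint_share</code> is hypothetical:</p>

```python
from collections import Counter

# Hypothetical memberships; cross-cutting superTypes are left out on purpose.
DISJOINT_SUPERTYPES = {
    "Person Types": {"Lawyer", "Plumber", "MusicPerformingAgent"},
    "Organizations": {"Company", "MusicPerformingAgent"},
    "Plants": {"Oak", "Fern"},
}
# The whole toy vocabulary, including two concepts that are only cross-cutting.
ALL_CONCEPTS = {"Lawyer", "Plumber", "MusicPerformingAgent", "Company",
                "Oak", "Fern", "Pink", "Topic"}

def disjoint_share(supertypes, all_concepts):
    """Fraction of all concepts that sit in exactly one disjoint superType."""
    counts = Counter(c for members in supertypes.values() for c in members)
    exactly_one = [c for c, n in counts.items() if n == 1]
    return len(exactly_one) / len(all_concepts)

share = disjoint_share(DISJOINT_SUPERTYPES, ALL_CONCEPTS)
print(f"{share:.1%}")
```

<p>Concepts that appear in two <span style="font-style: italic;">superTypes</span>, or only in a cross-cutting one, lower the share.</p>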
<h3>Interim Conclusions and Observations</h3>
<p>These are exciting findings that bode well for UMBEL&#8217;s ongoing role and usefulness. Also, the very detailed analysis that has led to these interim findings very much reaffirms the wisdom of basing UMBEL on Cyc. Cyc showed itself to be admirably coherent and remarkably complete. (It also appears that the first versions of UMBEL were extracted well in terms of good coverage.)</p>
<p>This approach now gives us an understandable and defensible basis for logical segmentation of UMBEL. It also provides a much-desired alternative to the earlier Abstract Concepts, which will now be dropped entirely as a schema concept.</p>
<p>One area deserving further attention is in the Attribute <span style="font-style: italic;">superType</span>. We are in the process, for         example, of analyzing attributes across Wikipedia and need to look         through a slightly different lens at this <span style="font-style: italic;">superType</span> <a href="#st6">[6]</a>. This area is further         important in its strong interaction with the <a href="../478/making-linked-data-reasonable-using-description-logics-part-4/"> Instance Record Vocabulary</a> that is accompanying this effort on the         entity side.</p>
<p>Another lesson for us has been to back away from the terminology of named entity, introduced at MUC-6. The expansion of that idea into other &#8220;nameable&#8221; things has caused us to embrace the &#8220;instance&#8221; nomenclature, as evidenced by our emerging IRV.</p>
<p>It is rewarding to prepare this next iteration release of UMBEL with its new mindset of logical segmentation and disjointedness. But &#8212; what is also clear &#8212; there are many treasures left to mine still hidden in the inherent structure of         UMBEL and its Cyc parent.</p>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st1"></a> [1] The original labels were ENAMEX for <span style="font-style: italic;">entity named expression</span> and NUMEX for <span style="font-style: italic;">numeric expression</span>. The markup format specified was also SGML. For an interesting history of this MUC-6 watershed, see Ralph Grishman and Beth Sundheim, 1996. <em><a title="http://acl.ldc.upenn.edu/C/C96/C96-1079.pdf" rel="nofollow" href="http://acl.ldc.upenn.edu/C/C96/C96-1079.pdf">Message Understanding Conference &#8211; 6: A Brief History</a></em>, in <em>Proceedings of the 16th International Conference on Computational Linguistics (COLING),</em> I, Copenhagen, 1996, 466–471.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st2"></a> [2] In a <em>named entity</em>, the word <em>named</em> applies to entities that have &#8220;rigid designators&#8221; as defined by Kripke for the referent. For instance, the automotive company created by Henry Ford in 1903 is referred to as Ford or Ford Motor Company. Rigid designators include proper names as well as certain natural kind terms like biological species and substances.</p>
<p><span style="font-size: x-small;">Sekine’s <a href="http://nlp.cs.nyu.edu/ene/version6_1_0eng.html">extended hierarchy</a> proposed in 2002 is made up of 200 subtypes, with 32 larger clusters         within that. Here is the top level of the Sekine type system:</span></p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td><span style="font-size: x-small;">Name-Other</span></td>
<td><span style="font-size: x-small;">Title</span></td>
<td><span style="font-size: x-small;">Timex</span></td>
<td><span style="font-size: x-small;">Frequency</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Person</span></td>
<td><span style="font-size: x-small;">Unit</span></td>
<td><span style="font-size: x-small;">Periodx</span></td>
<td><span style="font-size: x-small;">Rank</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Organization</span></td>
<td><span style="font-size: x-small;">Vocation</span></td>
<td><span style="font-size: x-small;">Numex-Other</span></td>
<td><span style="font-size: x-small;">Age</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Location</span></td>
<td><span style="font-size: x-small;">Disease</span></td>
<td><span style="font-size: x-small;">Money</span></td>
<td><span style="font-size: x-small;">School Age</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Facility</span></td>
<td><span style="font-size: x-small;">God</span></td>
<td><span style="font-size: x-small;">Stock Index</span></td>
<td><span style="font-size: x-small;">Latitude Longitude</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Product</span></td>
<td><span style="font-size: x-small;">ID Number</span></td>
<td><span style="font-size: x-small;">Point</span></td>
<td><span style="font-size: x-small;">Measurement</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Event</span></td>
<td><span style="font-size: x-small;">Color</span></td>
<td><span style="font-size: x-small;">Percent</span></td>
<td><span style="font-size: x-small;">Countx</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Natural Object</span></td>
<td><span style="font-size: x-small;">Time-Other</span></td>
<td><span style="font-size: x-small;">Multiplication</span></td>
<td><span style="font-size: x-small;">Ordinal Number</span></td>
</tr>
</tbody>
</table>
<p><span style="font-size: x-small;">Though developed separately and for different purposes, the <a href="http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html">BBN categories</a>, also proposed in 2002, consist of 29 types and 64 subtypes. Here are the BBN types (Note: BBN claims 29 types because there are double entries or considerations for the first five entries):</span></p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td><span style="font-size: x-small;">Person</span></td>
<td><span style="font-size: x-small;">Time</span></td>
<td><span style="font-size: x-small;">Animal</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">NORP (adjectival GPEs)</span></td>
<td><span style="font-size: x-small;">Percent</span></td>
<td><span style="font-size: x-small;">Substance</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Facility</span></td>
<td><span style="font-size: x-small;">Money</span></td>
<td><span style="font-size: x-small;">Disease</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Organization</span></td>
<td><span style="font-size: x-small;">Quantity</span></td>
<td><span style="font-size: x-small;">Work of Art</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">GPE (geopolitical places)</span></td>
<td><span style="font-size: x-small;">Ordinal</span></td>
<td><span style="font-size: x-small;">Law</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Location</span></td>
<td><span style="font-size: x-small;">Cardinal</span></td>
<td><span style="font-size: x-small;">Language</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Product</span></td>
<td><span style="font-size: x-small;">Events</span></td>
<td><span style="font-size: x-small;">Contact Info</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Date</span></td>
<td><span style="font-size: x-small;">Plant</span></td>
<td><span style="font-size: x-small;">Game</span></td>
</tr>
</tbody>
</table>
<p><span style="font-size: x-small;">Of course, other entity extraction systems have similar         clusterings and approaches. Though less formal in the sense of a         hierarchy or purported complete entity coverage, here for example is         the listing of entity types within <a href="http://opencalais.com/documentation/calais-web-service-api/api-metadata/entity-index-and-definitions"> Calais</a>:</span></p>
<table style="margin: 10px 0pt 10px 60px;" border="0" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<td><span style="font-size: x-small;">Anniversary</span></td>
<td><span style="font-size: x-small;">FaxNumber</span></td>
<td><span style="font-size: x-small;">NaturalFeature</span></td>
<td><span style="font-size: x-small;">RadioProgram</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">City</span></td>
<td><span style="font-size: x-small;">Holiday</span></td>
<td><span style="font-size: x-small;">OperatingSystem</span></td>
<td><span style="font-size: x-small;">RadioStation</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Company</span></td>
<td><span style="font-size: x-small;">IndustryTerm</span></td>
<td><span style="font-size: x-small;">Organization</span></td>
<td><span style="font-size: x-small;">Region</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Continent</span></td>
<td><span style="font-size: x-small;">MarketIndex</span></td>
<td><span style="font-size: x-small;">Person</span></td>
<td><span style="font-size: x-small;">SportsEvent</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Country</span></td>
<td><span style="font-size: x-small;">MedicalCondition</span></td>
<td><span style="font-size: x-small;">PhoneNumber</span></td>
<td><span style="font-size: x-small;">SportsGame</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Currency</span></td>
<td><span style="font-size: x-small;">Movie</span></td>
<td><span style="font-size: x-small;">Position</span></td>
<td><span style="font-size: x-small;">SportsLeague</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">EmailAddress</span></td>
<td><span style="font-size: x-small;">MusicAlbum</span></td>
<td><span style="font-size: x-small;">Product</span></td>
<td><span style="font-size: x-small;">Technology</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">EntertainmentAwardEvent</span></td>
<td><span style="font-size: x-small;">MusicGroup</span></td>
<td><span style="font-size: x-small;">ProgrammingLanguage</span></td>
<td><span style="font-size: x-small;">TVShow</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Facility</span></td>
<td><span style="font-size: x-small;">NaturalDisaster</span></td>
<td><span style="font-size: x-small;">ProvinceOrState</span></td>
<td><span style="font-size: x-small;">TVStation</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><span style="font-size: x-small;"> </span></td>
<td style="vertical-align: top;"><span style="font-size: x-small;"> </span></td>
<td style="vertical-align: top;"><span style="font-size: x-small;">PublishedMedium</span></td>
<td style="vertical-align: top;"><span style="font-size: x-small;">URL</span></td>
</tr>
</tbody>
</table>
<p><span style="font-size: x-small;">See further the Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">named entity         recognition</a>.</span></div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st3"></a> [3] We use the reference to “<a href="http://en.wikipedia.org/wiki/Tbox">TBox</a>” in accordance with our <a title="Permanent Link to Thinking &#8216;Inside the Box&#8217; with Description Logics" href="../466/thinking-inside-the-box-with-description-logics/">working definition</a> for <a href="http://en.wikipedia.org/wiki/Description_logics">description logics</a>:</p>
<div class="boxGraySolid">&#8220;Description logics and their semantics traditionally split           <span style="font-style: italic;">concepts</span> and their           relationships from the different treatment of <span style="font-style: italic;">instances</span> and their attributes and           roles, expressed as fact assertions. The concept split is known as           the TBox (for <em>terminological</em> knowledge, the basis for           <span style="font-style: italic;">T</span> in <span style="font-style: italic;">TBox</span>) and represents the schema or           taxonomy of the domain at hand. The TBox is the structural and           intensional component of conceptual relationships. The second split           of instances is known as the ABox (for <span style="font-style: italic;">assertions</span>, the basis for <span style="font-style: italic;">A</span> in <span style="font-style: italic;">ABox</span>) and describes the attributes of           instances (and individuals), the roles between instances, and other           assertions about instances regarding their class membership with the           TBox concepts.&#8221;</div>
</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st4"></a> [4] UMBEL also provides a <a href="http://en.wikipedia.org/wiki/SKOS">SKOS</a>-based vocabulary extension         for describing other domains and mappings between classes and         instances. This purpose, however, is outside of the scope of this current         article.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st5"></a> [5] As a reference roadmap, UMBEL was specifically designed <span style="font-weight: bold; font-style: italic;">not</span> to include <a href="http://en.wikipedia.org/wiki/Meronymy">meronymous</a> (part of) relationships (see further this reference). Thus, all &#8220;part of&#8221; type concepts were assigned to the whole <span style="font-style: italic;">superType</span> category of which they are a part. For example, &#8220;animal parts&#8221; are assigned to the <span style="font-style: italic;">superType</span> Animal; &#8220;car parts&#8221; to the <span style="font-style: italic;">superType</span> Product.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a name="st6"></a> [6] For a general discussion of attributes and their relation to         entities, see Satoshi Sekine, 2008. Extended Named Entity Ontology with         Attribute Information, in <span style="font-style: italic;">Proceedings         of the 6th edition of the Language Resources and Evaluation Conference         (LREC 2008)</span>. Marrakech, Morocco. See <a href="http://www.lrec-conf.org/proceedings/lrec2008/pdf/21_paper.pdf">http://www.lrec-conf.org/proceedings/lrec2008/pdf/21_paper.pdf</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/759/supertypes-and-logical-segmentation-of-instances/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Minor Disruptions</title>
		<link>http://www.mkbergman.com/564/minor-disruptions/</link>
		<comments>http://www.mkbergman.com/564/minor-disruptions/#comments</comments>
		<pubDate>Mon, 24 Aug 2009 04:20:33 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Adaptive Information]]></category>
		<category><![CDATA[Information Automation]]></category>
		<category><![CDATA[Site-related]]></category>

		<guid isPermaLink="false">http://www.mkbergman.com/?p=564</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Minor Disruptions&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Information Automation&amp;rft.subject=Site-related&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/564/minor-disruptions/&amp;rft.language=English"></span>

In the Future, All of us May be SysAdmins
OK, well, I just finished moving and upgrading some dozen Web sites and wikis, including this one &#8212; my main blog &#8212; over the weekend, from fixed stuff to the &#8220;clouds&#8221;. Believe you me, there were some pretty massive changes required.
For someone like me who is relatively [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Minor Disruptions&amp;rft.aulast=Bergman&amp;rft.aufirst=Mike&amp;rft.subject=Adaptive Information&amp;rft.subject=Information Automation&amp;rft.subject=Site-related&amp;rft.source=AI3:::Adaptive Information&amp;rft.date=2009-08-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.mkbergman.com/564/minor-disruptions/&amp;rft.language=English"></span>
<p><a href="http://structureddynamics.com/"><img style="border: 0px solid; margin: 5px 10px 0px; width: 204px; height: 236px; float: left;" title="Snake Shedding" src="../wp-content/themes/ai3/images/2009Posts/090823_Snake.jpg" alt="Snake Shedding" /></a></p>
<h2>In the Future, All of us May be SysAdmins</h2>
<p>OK, well, I just finished moving and upgrading some dozen Web sites and wikis, including this one &#8212; my main blog &#8212; over the weekend, from fixed stuff to the &#8220;<a href="http://en.wikipedia.org/wiki/Cloud_computing">clouds</a>&#8221;. Believe you me, there were some pretty massive changes required.</p>
<p>For someone like me who is relatively clueless about such things, the process has been interesting (to say the least).</p>
<p>It seems like our modern era involves either moving digital things or converting digital things. As for moving, we all experience that laptop or hard drive dying, and then the move. (The <em>Death of a Laptop</em> actually happened to my wife this past week.) But moving also means changing providers and venues, which is what caused me to move all of these Web sites.</p>
<h3>Shedding the Snake Skin</h3>
<p>So, the mainstream digital age has existed for what, now, some 40 years? How many data formats have we transitioned through (ASCII, EBCDIC, UTF-8, an immense number)? And, how many systems and environments have we transitioned through?</p>
<p>At the risk of dating myself, when I was in college we still used slide rules; truly the end of an era. Just a year or two later everyone had transitioned to TI or HP calculators, which some wore on their hips the way some wear PDAs and cell phones today.</p>
<p>I won&#8217;t bore everyone with my own transition from my first computer (an <a href="http://en.wikipedia.org/wiki/HP_9100">HP 9100</a> with 4K RAM and program listings on cash register tapes) through many others including a <a href="http://en.wikipedia.org/wiki/DEC_Rainbow">DEC Rainbow</a> PC with <a href="http://en.wikipedia.org/wiki/CP/M">CP/M</a> (a beauty!). For many years, as we moved into the PC era and IBM legitimized the shift, every computer I bought seemed to cost about $3000. Each one was more capable, etc., but they all cost the same.</p>
<p>And, then, about the late 1990s, that changed. In fact, my last capable desktop machine cost way south of $1000.</p>
<p>But, I digress.</p>
<p>The real constant across these decades has been system and data migration. Granted, many of the docs and many of the systems in my own experience from 30 years ago have no relevance today (god, do I miss <a href="http://en.wikipedia.org/wiki/WordPerfect">WordPerfect</a> with its embedded, editable codes!), but a small yet important portion still does.</p>
<p>For these, I need to move both apps and data (with readable formats) for each generational transition.</p>
<p>I know that organizations, like the Library of Congress in its NDIIPP program, need to worry about <a href="http://www.digitalpreservation.gov/">digital preservation</a>, potentially for millennia. These are worthwhile concerns.</p>
<p>But, from my own more prosaic standpoint, I see this issue through my own lens and in my own <em>bas relief</em>. I am constantly moving apps and data, each transition much like a snake shedding its skin.</p>
<p>It makes one wonder about the effort and process by which the entire meaningful cultural history of our species continues to adapt and transition forward.</p>
<h3>Getting Back to Real</h3>
<p>Hmmm. All of us have seen these transitions and the loss of productivity they bring. (Some might argue that the lack of productivity gains from computers until this decade was due to such transitions, though at least now, with the Web, we see a more common migration framework.)</p>
<p>I think we have no choice but to transition to the next latest and greatest as it emerges. Automated means at acceptable cost for doing such transitions will also be attractive.</p>
<p>But the real point, I think, is that such transitions are inevitable. Faster apps: Check! Better apps: Check! Easier data exchange: Check!!</p>
<p>Living with transition thus becomes a clear constant for all of us as we move forward. And, part of that is accepting downtime to screw around moving the keepable old to the potentially useful new.</p>
<p>After this weekend, I&#8217;m now ready for a couple of days off before the real work week begins (yeah, right, keep dreaming).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mkbergman.com/564/minor-disruptions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
