At the beginning of this year Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, again because of some client and internal work, we have researched the space again and updated the listing [1].
All new tools are marked with <New> (new only means newly discovered; some had yet to be discovered in the prior listing). There are now a total of 185 tools in the listing, 31 of which are recently new, and 45 added at various times since the first release. <Newest> reflects updates — most from the developers themselves — since the original publication of this post.
Though all are not relevant, see my post from a couple of years back on large-scale RDF graph software.
Yesterday Fred Giasson announced the release of code associated with Structured Dynamics‘ open source semantics components (also called sComponents). A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.
Though not all layers are by any means complete, from an architectural standpoint the release of these semantic components provides the last and missing layer to complete our open semantic framework. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.
We first announced the open semantic framework — or OSF — a couple of weeks back. Refer to that original post for more description of the general design [1]. However, we can show this framework with the semantic components layer as illustrated by what some have called the “semantic muffin”:
(click for full size)
The OSF stack consists of these layers, moving from existing assets upward through increasing semantics and usability:
Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.
Current semantic components, or widgets, include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.
Though Fred’s post goes into more detail — with subsequent posts to get into the technical nuances of the semantic components — the main idea of these components is shown by the diagram below.
These various semantic components get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.
An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the semantic components. These instructions are presented via the sControl component, that determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:
(click for full size)
New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
As the release of the semantic components drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the open semantic framework enabled us to consolidate and rationalize these resources.
Our first change was to consolidate all OSF-related material under the existing OpenStructs.org Web site. It already contained the links and background material to structWSF and irON. To that, we added the conStruct and OSF material as well. This consolidation also allowed us to retire the previous conStruct Web site as well, which now re-directs to OpenStructs.
We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new Open Semantic Framework Google group and mailing list. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the OSF group.
There has also been a revigoration of the developers’ community Web site at http://community.openstructs.org/. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.
Actual code SVN repositories are unchanged. These code repositories may be found at:
We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!

As I reported about a year ago after my first attendance, I think the Semantic Technology Conference is the best venue going for pragmatic discussion of semantic approaches in the enterprise. I’m really pleased that I will be attending again this year. The conference (#SemTech) will be held at the Hilton Union Square in downtown San Francisco on June 21-25, 2010. Now in its sixth year and the largest of its kind, it is again projected to attract 1500 attendees or so.
I will be presenting two papers this year, covering rather dramatically different topics. Such is the business of a young company like Structured Dynamics that wears many hats!
A really exciting presentation for us is, Sizzle for the Steak: Rich, Visual Interfaces for Ontology-driven Apps, on Wed, June 23 in the 2:00 PM – 3:00 PM session.
A nagging gap in the semantic technology stack is acceptable — better still, compelling — user experiences. After our exile for a couple of years doing essential infrastructure work, we have been unshackled over the past year or so to innovate on user interfaces for semantic technologies.
Our unique approach uses adaptive ontologies to drive rich Internet applications (RIAs) through what we call “semantic components.” This framework is unbelievably flexible and powerful and can seamlessly interact with our structWSF Web services framework and its conStruct Drupal implementations.
We will be showing these rich user interfaces for the first time in this session. We will show concept explorers, “slicer-and-dicers”, dashboards, information extraction and annotation, mapping, data visualization and ontology management. Get your visualization anyway you’d like, and for any slice you’d like!
While we will focus on the sizzle and demos, we will also explain a bit of the technology that is working behind Oz’s curtain.
A more informal, interactive F2F discussion will be, MIKE2.0 for the Semantic Enterprise, on Thurs, June 24 in the 4:45 PM – 5:45 PM slot.
MIKE2.0 (Method for an Integrated Knowledge Environment) is an open source methodology for enterprise information management that is coupled with a collaborative framework for information development. It is oriented around a variety of solution “offerings”, ranging from the comprehensive and the composite to specific practices and technologies. A couple of months back, I gave an overview of MIKE2.0 that was pretty popular.
We have been instrumental in adding a semantic enterprise component to MIKE2.0, with our specific version of it called Open SEAS. In this Face-to-Face session, experts and desirous practitioners will join together to discuss how to effectively leverage this framework. While I will intro and facilitate, expect many other MIKE2.0 aficionados to participate.
This is perhaps a new concept to many, but what is exciting about MIKE2.0 is that it provides a methodology and documentation complement to technology alone. When combined with that technology, all pieces comprise what might be called a total open solution. I personally think it is the next logical step beyond open source.
So, if you have not already made plans, consider adjusting your schedule today. And, contact me in advance (mike at structureddynamics dot com) if you’ll be there. We’d love to chat!

In earlier posts, I described the significant progress in climbing the data federation pyramid, today’s evolution in emphasis to the semantic Web, and the 40 or so sources of semantic heterogeneity. We now transition to an overview of how one goes about providing these semantics and resolving these heterogeneities.
In an excellent recent overview of semantic Web progress, Paul Warren points out:[1]
Although knowledge workers no doubt believe in the value of annotating their documents, the pressure to create metadata isn’t present. In fact, the pressure of time will work in a counter direction. Annotation’s benefits accrue to other workers; the knowledge creator only benefits if a community of knowledge workers abides by the same rules. . . . Developing semiautomatic tools for learning ontologies and extracting metadata is a key research area . . . .Having to move out of a user’s typical working environment to ‘do knowledge management’ will act as a disincentive, whether the user is creating or retrieving knowledge.
Of course, even assuming that ontologies are created and semantics and metadata are added to content, there still remains the nasty problems of resolving heterogeneities (semantic mediation) and efficiently storing and retrieving the metadata and semantic relationships.
Putting all of this process in place requires the infrastructure in the form of tools and automation and proper incentives and rewards for users and suppliers to conform to it.
In his paper, Warren repeatedly points to the need for “semi-automatic” methods to make the semantic Web a reality. He makes fully a dozen such references, in addition to multiple references to the need for “reasoning algorithms.” In any case, here are some of the areas noted by Warren needing “semi-automatic” methods:
In a different vein, SemWebCentral lists these clusters of semantic Web-related tasks, each of which also requires tools:[2]
With some ontologies approaching tens to hundreds of thousands to millions of triples, viewing, annotating and reconciling at scale can be daunting tasks, the efforts behind which would never be taken without useful tools and automation.
A 2005 paper by Izza, Vincent and Burlat (among many other excellent ones) at the first International Conference on Interoperability of Enterprise Software and Applications (INTEROP-ESA) provides a very readable overview on the role of semantics and ontologies in enterprise integration.[3] Besides proposing a fairly compelling unified framework, the authors also present a useful workflow perspective emphasizing Web services (WS), also applicable to semantics in general, that helps frame this challenge:

Generic Semantic Integration Workflow (adapted from [3])
For existing data and documents, the workflow begins with information extraction or annotation of semantics and metadata (#1) in accordance with a reference ontology. Newly found information via harvesting must also be integrated; however, external information or services may come bearing their own ontologies, in which case some form of semantic mediation is required.
Of course, this is a generic workflow, and depending on the interoperation task, different flows and steps may be required. Indeed, the overall workflow can vary by perspective and researcher, with semantic resolution workflow modeling a prime area of current investigations. (As one alternative among scores, see for example Cardoso and Sheth.[4])
Semantic mediation is a process of matching schemas and mapping attributes and values, often with intermediate transformations (such as unit or language conversions) also required. The general problem of schema integration is not new, with one prior reference going back as early as 1986. [5] According to Alon Halevy:[6]
As would be expected, people have tried building semi-automated schema-matching systems by employing a variety of heuristics. The process of reconciling semantic heterogeneity typically involves two steps. In the first, called schema matching, we find correspondences between pairs (or larger sets) of elements of the two schemas that refer to the same concepts or objects in the real world. In the second step, we build on these correspondences to create the actual schema mapping expressions.
The issues of matching and mapping have been addressed in many tools, notably commercial ones from MetaMatrix,[7] and open source and academic projects such as Piazza, [8] SIMILE, [9] and the WSMX (Web service modeling execution environment) protocol from DERI. [10] [11] A superb description of the challenges in reconciling the vocabularies of different data sources is also found in the thesis by Dr. AnHai Doan, which won the 2003 ACM’s Prestigious Doctoral Dissertation Award.[12]
What all of these efforts has found is the inability to completely automate the mediation process. The current state-of-the-art is to reconcile what is largely unambiguous automatically, and then prompt analysts or subject matter experts to decide the questionable matches. These are known as “semi-automated” systems and the user interface and data presentation and workflow become as important as the underlying matching and mapping algorithms. According to the WSMX project, there is always a trade-off between how accurate these mappings are and the degree of automation that can be offered.
Once all of these reconciliations take place there is the (often undiscussed) need to index, store and retrieve these semantics and their relationships at scale, particularly for enterprise deployments. This is a topic I have addressed many times from the standpoint of scalability, more scalability, and comparisons of database and relational technologies, but it is also not a new topic in the general community.
As Stonebraker and Hellerstein note in their retrospective covering 35 years of development in databases,[13] some of the first post-relational data models were typically called semantic data models, including those of Smith and Smith in 1977[14] and Hammer and McLeod in 1981.[15] Perhaps what is different now is our ability to address some of the fundamental issues.
At any rate, this subsection is included here because of the hidden importance of database foundations. It is therefore a topic often addressed in this series.
In all of these areas, there is a growing, but still spotty, set of tools for conducting these semantic tasks. SemWebCentral, the open source tools resource center, for example, lists many tools and whether they interact or not with one another (the general answer is often No).[16] Protégé also has a fairly long list of plugins, but not unfortunately well organized. [17]
In the table below, I begin to compile a partial listing of semantic Web tools, with more than 50 listed. Though a few are commercial, most are open source. Also, for the open source tools, only the most prominent ones are listed (Sourceforge, for example, has about 200 projects listed with some relation to the semantic Web though most of minor or not yet in alpha release).
|
NAME |
URL |
DESCRIPTION |
| Almo | http://ontoware.org/projects/almo | An ontology-based workflow engine in Java |
| Altova SemanticWorks | http://www.altova.com/products_semanticworks.html | Visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design |
| Bibster | http://bibster.semanticweb.org/ | A semantics-based bibliographic peer-to-peer system |
| cwm | http://www.w3.org/2000/10/swap/doc/cwm.html | A general purpose data processor for the semantic Web |
| Deep Query Manager | http://www.brightplanet.com/products/dqm_overview.asp | Search federator from deep Web sources |
| DOSE | https://sourceforge.net/projects/dose | A distributed platform for semantic annotation |
| ekoss.org | http://www.ekoss.org/ | A collaborative knowledge sharing environment where model developers can submit advertisements |
| Endeca | http://www.endeca.com | Facet-based content organizer and search platform |
| FOAM | http://ontoware.org/projects/map | Framework for ontology alignment and mapping |
| Gnowsis | http://www.gnowsis.org/ | A semantic desktop environment |
| GrOWL | http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html | Open source graphical ontology browser and editor |
| HAWK | http://swat.cse.lehigh.edu/projects/index.html#hawk | OWL repository framework and toolkit |
| HELENOS | http://ontoware.org/projects/artemis | A Knowledge discovery workbench for the semantic Web |
| Jambalaya | http://www.thechiselgroup.org/jambalaya | Protégé plug-in for visualizing ontologies |
| Jastor | http://jastor.sourceforge.net/ | Open source Java code generator that emits Java Beans from ontologies |
| Jena | http://jena.sourceforge.net/ | Opensource ontology API written in Java |
| KAON | http://kaon.semanticweb.org/ | Open source ontology management infrastructure |
| Kazuki | http://projects.semwebcentral.org/projects/kazuki/ | Generates a java API for working with OWL instance data directly from a set of OWL ontologies |
| Kowari | http://www.kowari.org/ | Open source database for RDF and OWL |
| LuMriX | http://www.lumrix.net/xmlsearch.php | A commercial search engine using semantic Web technologies |
| MetaMatrix | http://www.metamatrix.com/ | Semantic vocabulary mediation and other tools |
| Metatomix | http://www.metatomix.com/ | Commercial semantic toolkits and editors |
| MindRaider | http://mindraider.sourceforge.net/index.html | Open source semantic Web outline editor |
| Model Futures OWL Editor | http://www.modelfutures.com/OwlEditor.html | Simple OWL tools, featuring UML (XMI), ErWin, thesaurus and imports |
| Net OWL | http://www.netowl.com/ | Entity extraction engine from SRA International |
| Nokia Semantic Web Server | https://sourceforge.net/projects/sws-uriqa | An RDF based knowledge portal for publishing both authoritative and third party descriptions of URI denoted resources |
| OntoEdit/OntoStudio | http://ontoedit.com/ | Engineering environment for ontologies |
| OntoMat Annotizer | http://annotation.semanticweb.org/ontomat | Interactive Web page OWL and semantic annotator tool |
| Oyster | http://ontoware.org/projects/oyster | Peer-to-peer system for storing and sharing ontology metadata |
| Piggy Bank | http://simile.mit.edu/piggy-bank/ | A Firefox-based semantic Web browser |
| Pike | http://pike.ida.liu.se/ | A dynamic programming (scripting) language similar to Java and C for the semantic Web |
| pOWL | http://powl.sourceforge.net/index.php | Semantic Web development platform |
| Protégé | http://protege.stanford.edu/ | Open source visual ontology editor written in Java with many plug-in tools |
| RACER Project | https://sourceforge.net/projects/racerproject | A collection of Projects and Tools to be used with the semantic reasoning engine RacerPro |
| RDFReactor | http://rdfreactor.ontoware.org/ | Access RDF from Java using inferencing |
| Redland | http://librdf.org/ | Open source software libraries supporting RDF |
| RelationalOWL | https://sourceforge.net/projects/relational-owl | Automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OW |
| Semantical | http://semantical.org/ | Open source semantic Web search engine |
| SemanticWorks | http://www.altova.com/products_semanticworks.html | SemanticWorks RDF/OWL Editor |
| Semantic Mediawiki | https://sourceforge.net/projects/semediawiki | Semantic extension to the MediaWiiki wiki |
| Semantic Net Generator | https://sourceforge.net/projects/semantag | Utility for generating topic maps automatically |
| Sesame | http://www.openrdf.org/ | An open source RDF database with support for RDF Schema inferencing and querying |
| SMART | http://web.ict.nsc.ru/smart/index.phtml?lang=en | System for Managing Applications based on RDF Technology |
| SMORE | http://www.mindswap.org/2005/SMORE/ | OWL markup for HTML pages |
| SPARQL | http://www.w3.org/TR/rdf-sparql-query/ | Query language for RDF |
| SWCLOS | http://iswc2004.semanticweb.org/demos/32/ | A semantic Web processor using Lisp |
| Swoogle | http://swoogle.umbc.edu/ | A semantic Web search engine with 1.5 M resources |
| SWOOP | http://www.mindswap.org/2004/SWOOP/ | A lightweight ontology editor |
| Turtle | http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/ | Terse RDF “Triple” language |
| WSMO Studio | https://sourceforge.net/projects/wsmostudio | A semantic Web service editor compliant with WSMO as a set of Eclipse plug-ins |
| WSMT Toolkit | https://sourceforge.net/projects/wsmt | The Web Service Modeling Toolkit (WSMT) is a collection of tools for use with the Web Service Modeling Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web Service Execution Environment (WSMX) |
| WSMX | https://sourceforge.net/projects/wsmx/ | Execution environment for dynamic use of semantic Web services |
Individually, there are some impressive and capable tools on this list. Generally, however, the interfaces are not intuitive, integration between tools is lacking, and why and how standard analysts should embrace them is lacking. In the semantic Web, we have yet to see an application of the magnitude of the first Mosaic browser that made HTML and the World Wide Web compelling.
It is perhaps likely that a similar “killer app” may not be forthcoming for the semantic Web. But it is important to remember just how entwined tools are to accelerating acceptance and growth of new standards and protocols.
This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on June 12, 2006. It was the follow-on to last week’s Brown Bag Lunch posting. It is also the first attempt I made at assembling semantic Web- and -related tools, which has now grown into the 800+ Sweet Tools listing. No changes have been made to the original posting.
Huzzah! for Local Government Open Data, Transparency, Community Indicators and Citizen JournalismWhile the Knight News Challenge is still working its way through the screening details, Structured Dynamics‘ Citizen DAN proposal remains in the hunt. Listen to this:
To date, we have been the most viewed proposal by far (2x more than the second most viewed!!! Hooray!) and are in the top five of highest rated (have also been at #1 or #2, depending. Hooray!). Thanks to all of you for your interest and support.
There is much to recommend this KNC approach, not the least of which being able to attract some 2,500 proposals seeking a piece of the 2010 $5 million potential grant awards. Our proposal extends SD’s basic structWSF and conStruct Drupal frameworks to provide a data appliance and network (DAN) to support citizen journalists with data and analysis at the local, community level.
None of our rankings, of course, guarantees anything. But, we also feel good about how the market is looking at these frameworks. We have recently been awarded some pretty exciting and related contracts. Any and all of these initiatives will continue to contribute to the open source Citizen DAN vision.
And, what might that vision be? Well, after some weeks away from it, I read again our online submission to the Knight News Challenge. I have to say: It ain’t too bad! (Plus many supporting goodies and details.)
So, I repeat in its entirety below, the KNC questions and our formal responses. This information from our original submittal is unchanged, except to add some live links where they could not be submitted as such before. (BTW, the bold headers are the KNC questions.) Eventual winners are slated to be announced around mid-June. We’re keeping our fingers crossed, but we are pursuing this initiative in any case.
Citizen DAN is an open source framework to leverage relevant local data for citizen journalists. It is a:
Good decisions and good journalism require good information. Starting with pre-loaded government data, Citizen DAN provides any citizen the framework to learn and compare local statistics and data with other similar communities. This helps to promote the grist for citizen journalism; it is also a vehicle for discovery and learning across the community.
Citizen DAN comes pre-packaged with all necessary deployment components and documentation, including local data from government sources. It includes facilities for direct upload of additional local data in formats from spreadsheets to standard databases. Many standard converters are included with the basic package.
Citizen DAN may be implemented by local governments or by community advocacy groups. When deployed, using its clear documentation, sponsors may choose whether or what portions of local data are exposed to the broader Citizen DAN network. Data exposed on the network is automatically available to any other network community for comparison and analysis purposes.
This data appliance and network (DAN) is multi-lingual. It will be tested in three cities in Canada and the US, showing its multi-lingual capabilities in English, Spanish and French.
With Citizen DAN, anyone with Web access can now get, slice, and dice information about how their community is doing and how it compares to other communities. We have learned from Web 2.0 and user-generated content that once exposed, useful information can be taken and analyzed in valuable and unanticipated ways.
The trick is to get information that already exists. Citizen journalists of the past may not have either known:
By removing these hurdles, Citizen DAN improves the ways information is delivered to communities and provides the framework for sifting through it to extract meaning.
Government public data in electronic tabular form or as published listings or tables in local newspapers has been available for some time. While meeting strict ‘disclosure’ requirements, this information has neither been readily analyzable nor actionable.
The meaning of information lies in its interpretation and analysis.
Citizen DAN is innovative because it:
Structured Dynamics has already developed and released as open-source code structWSF and conStruct , the basic foundations to this proposal. structWSF provides the network and dataset “backbone” to this proposal; conStruct provides the Drupal portal and Web site framework.
To this foundation we add proven experience and knowledge of datasets and how to access them, as well as tools and converters for how to stage them for standard public use. A key expertise of Structured Dynamics is the conversion of virtually any legacy data format into interoperable canonical forms.
These are important challenges, which require experience in the semantics of data and mapping from varied forms into useful and common frameworks. Structured Dynamics has codified its expertise in these areas into the software underlying Citizen DAN.
Structured Dynamics’ principals are also multi-lingual, with language-neutral architectures and code. The company’s principals are also some of the most prominent bloggers and writers in the semantic Web. We are acknowledged as attentive to documentation and communication.
Finally, Structured Dynamics’ principals have more than a decade of track record in successful data access and mining, and software and venture development.
To this strong basis, we have preliminary city commitments for deploying this project in the United States (English and Spanish) and Canada (French and English).
ThisWeKnow offers local Census data, but no community or publishing aspects. Data sharing is in DataSF and DataMine (NYC), but they lack collaboration, community networks and comparisons, or powerful data visualization or mapping.
Citizen DAN is a turnkey platform for any size community to create, publish, search, browse, slice-and-dice, visualize or compare indicators of community well-being. Its use makes the Web more locally focused. With it, researchers, watchdog groups, reporters, local officials and interested citizens can now discover hard data for ‘new news’ or fact-check mainstream media.
There are two releases with feedback. Each task summary, listing of task hours (hr) and duration in months (mo), in rough sequence order with overlaps, is:
See attached task details.
"Information is the currency of democracy." Thomas Jefferson (n.b.)
We intuitively understand that an informed citizenry is a healthy polity. At the global level and in 250 languages, we see how Wikipedia, matched with the Internet and inexpensive laptops, is bringing unforeseen information and enrichment to all. Across the board, we are seeing the democratization of information.
But very little of this revolution has percolated to the local level.
Only in the past decade or so have we seen free, electronic access to national Census data. We still see local data only published in print or not available at all, limiting both awareness but more importantly understanding and analysis. Data locked up in municipal computers or available but not expressed via crowdsourcing is as good as non-existent.
Though many citizens at the local level are not numeric, intuition has to tell us that the absense of empirical, local data hurts our ability to understand, reason and debate our local circumstances. Are we doing better or worse than yesterday? Than in comparison with our peers? Under what measures does this have meaning about community well being?
The purpose of the Citizen DAN project is to create an appliance — in the same sense of refrigerators keeping our food from spoiling — by which any citizen can crack open and expose relevant data at the local level. Citizen DAN is about enrichening our local information and keeping our communities healthy.
We will measure the progress of the project by the number of communities and local organizations that use the Citizen DAN platform to create and publish community data. Subsidiary measures include the number of:
These measures, plus active sites with profiles of each, will be monitored and tracked on the central Citizen DAN portal.
‘Ultimate success’ is related to the general growth in transparent government at the local level. Growth in Citizen DAN-related measures on a year-over-year basis or in relation to Gov2.0 would indicate success.
There is no technical risk to this proposal, but there are risks in scope, awareness and acceptance. Our system has been operational for one year for relevant use cases; all components have been integrated, debugged, and put into production.
Scope risks relate to how much data the Citizen DAN platform is loaded with, and how much functionality is included. We balance the data question by using common public datasets for baseline data, then add features for localities to “crowdsource” their own supplementary data. We balance the functionality question by limiting new development to data visualization/mapping and to upload functions (per above), and then to refine what already exists.
Awareness risks arise from a crowded attention space. We can overcome this in two ways. The first is to satisfy users at our test sites. That will result in good recommendations to help seed a snowball effect. The second way is to use social media and our existing Web outlets aggressively. We have been building awareness for our own properties in steady, inch-by-inch measures. While a notable few Web efforts may go viral, the process is not predictable. Steady, constant focus is our preferred recipe.
Acceptance risk is intimately linked with awareness and use. If we can satisfy each Citizen DAN community, then new datasets, new functionality and new awareness will naturally arise. More users and more contributions through the network effect are the best way to broad acceptance.
Marketing and awareness efforts will include our use of social media, dedicated Web sites, support from test communities, and outreach to relevant community Web sites.
Our own blogs are popular in the semantic Web and structured data space (~3K uniques daily); we have published two posts on Citizen DAN and will continue to do so with more frequency once the effort gets underway.
We will create a central portal (http://citizen-dan.org) based on the project software (akin to our other project sites). The model for this apps and deployments clearinghouse is CrimeReports.com. Using social aspects and crowdsourcing, the site will encourage sharing and best practices amongst the growing number of Citizen DAN communities.
We will blog and post announcements for key releases and milestones on relevant external Web sites including various Gov 2.0 sites, Community Indicators Consortium, GovLoop, Knight News Challenge, the Sunlight Foundation, and so forth. In addition, we will collate and track individual community efforts (maintained on the central Citizen DAN site) and make specific outreach to community data sites (such as DataSF or DataMine at NYC.gov). We will use Twitter (#CitizenDAN, etc) and the social networks of LinkedIn, Facebook, and Meetup to promote Citizen DAN activity.
We will interact with advocates of citizen journalism, and engage civic organizations, media, and government officials (esp in our three test communities) to refine our marketing plan.
Citizen DAN is not an experiment. It is a working framework that gives any locality and its citizenry the means to assemble, share and compare measures of its community well-being with other communities. These indicators, in turn, provide substance and grist for greater advocacy and writing and blogging (”journalism”) at the local level.
Granted, there are unknowns: How many localities will adopt the Citizen DAN appliance? How essential will its data be to local advocacy and news? How active will each Citizen DAN installation be in attracting contributions and local data?
We submit the better way to frame the question is the degree of adoption, as opposed to will it work.
Web-based changes in our society and social interaction are leading to the democratization of information, access to it, and channels for expression. Whether ultimately successful in the specific form proposed herein, Citizen DAN and its open source software and frameworks will surely be adopted in one form or another — to one degree or another — in the unassailable trend toward local government transparency and citizen involvement.
In short, Yes: We believe Citizen DAN will continue long after the grant.
Our plan begins with the nature of Citizen DAN as software and framework. Sustainability is a question of whether the appliance itself is useful, and how users choose to leverage it.
Mediawiki, the software behind Wikipedia, is an analog. Mediawiki is an enabling infrastructure. Some sites using it are not successful; others wildly so. Success has required the combination of a good appliance with topicality and good management. The same is true for Citizen DAN.
Our plan thus begins with Citizen DAN as a useful appliance, as free open source with great documentation and prominent initial use cases. Our plan continues with our commitment to the local citizen marketplace.
We are developing Citizen DAN because of current trends. We foresee many hundreds of communities adopting the system. Most will be able to do so on their own. Some others may require modifications or assistance. Our self-interest is to ensure a high level of adoption.
An era of citizen engagement is unfolding at the local level, fueled by Web technologies and growing comfort with crowdsourcing and social networks. Meanwhile, local government constraints and pressures for transparency are unleashing locked-up data. These forces will create new opportunities for data literacy by the public, that will itself bring new understanding and improvements in governance and budgeting. We plan on Citizen DAN and its offspring to be one of the catalysts for those changes.
Well, for another client and another purpose, I was goaded into screening my Sweet Tools listing of semantic Web and -related tools and to assemble stuff from every other nook and cranny I could find. The net result is this enclosed listing of some 140 or so tools — most open source — related to semantic Web ontology building in one way or another.
Ever since I wrote my Intrepid Guide to Ontologies nearly three years ago (and one of the more popular articles of this site, though it is now perhaps a bit long in the tooth), I have been intrigued with how these semantic structures are built and maintained. That interest, in no small measure, is why I continue to maintain the Sweet Tools listing.
As far as I know, the following is the largest and most comprehensive listing of ontology building tools available. I broadly interpret the classification of ‘ontology building’; I include, for example, vocabulary extraction and prompting tools, as well as ontology visualization and mapping.
There are some 140 tools, perhaps 90 or so are still in active use. (Given the scope, not every tool could be inspected in detail. Some listed as being perhaps inactive may not be so, and others not in that category perhaps should be.) Of the entire roster of tools, somewhere on the order of 12 to 20 are quite impressive and deserving of local installation, test runs, and close inspection.
There are relatively few tools useful to non-specialists (or useful to engaging knowledgeable publics in the ontology-building exercise). There appear to be key gaps in the entire workflow from domain scoping and initial ontology definition and vocabulary candidates, to longer-term maintenance and revision. For example, spreadsheets would appear to be a possible useful first step in any workflow process (which is why irON is listed), but the spreadsheet tool per se is not listed herein (nor are text editors).
I surely have missed some tools and likely improperly assigned others. Please drop me an email or comment on this post with any revisions or suggestions.
In my own view, there are some tools that definitely deserve a closer look. My favorite candidates — for very different reasons and for very different places in the workflow — are (in no particular order): Apelon DTS, irON, FlexViz, Knoodl, Protégé, diagramic.com, BooWa, COE, ontopia, Anzo, PoolParty, Vine (and voc2rdf), Erca, Graphl, and GrOWL. Each one of these links is more fully described below. Also, all tools in the Vocabulary Prompting Tools category (which also includes extraction) are worth reviewing since all or nearly all have online demos.
Other tools may also be deserving, depending on use case. Some of the more specific analysis and conversion tools, for example, are in the Miscellaneous category.
Also, some purists may quibble with why some tools are listed here (such as inclusion of some stuff related to Topic Maps). Well, my answer to that is there are no real complete solutions, and whatever we can pragmatically do today requires glueing together many disparate parts.
Though all are not relevant, see my post from a couple of years back on large-scale RDF graph software.

If you are like me, you like to clear the decks before the start of major new projects. In Structured Dynamics‘ case, we actually have multiple new initiatives getting underway, so the deck clearing has been especially focused this time.
As a result, we have updated Sweet Tools, AI3’s listing of semantic Web and -related tools, with the addition of some 30 new tools, updates to others, and deletions of five expired entries. The dataset now lists 835 tools. And, as before, there is also now a new structured data view via conStruct (pick the Sweet Tools dataset).
We have also updated SWEETpedia, a listing of 246 research articles that use Wikipedia in one way or another to do semantic-Web related research. Some 20 new papers were added to this update.
Please use the comments section on this post to suggest new tools or new research articles for inclusion in future updates.
Structured Dynamics and its Citizen DAN project has been selected as one of the finalists to proceed with a formal proposal for the 2010 $5 million Knight News Challenge. The proposal extends SD’s basic structWSF and conStruct Drupal frameworks to provide a data appliance and network (DAN) to support citizen journalists with data and analysis at the local, community level.
Thanks to all of you who submitted votes in support of the earlier draft proposal. The News Challenge received 2,489 proposals for the 2010 contest, according to Jose Zamora, journalism program associate at the Knight Foundation. According to the Nieman Journalism Lab, Zamora said 65 percent of proposals came through the closed category and 35 percent were open.
The next-round full proposals are due by January 31. Eventual winners are slated to be announced around mid-June 2010.