
Sweet Tools, AI3’s listing of semantic Web and related tools, now has a total of 810 tools listed, a significant expansion from the last update. After retiring 19 prior tools, this new listing adds 93 tools, or 13%, to the previous version’s 736, for a net gain of 74.
The Sweet Tools dataset is also now showing the way to a couple of exciting innovations: new generic ontology-driven applications for structured data; and, tools for authoring structured data via spreadsheets.
So, here is the summary of major changes in this new listing:
A completely new structured data view of the listing, courtesy of Structured Dynamics’ structWSF and conStruct open source frameworks. This version can be viewed on the conStruct SCS Web site (pick the Sweet Tools dataset). You can compare this server-side presentation and version to the client-side JavaScript version using Exhibit that has been part of this blog for some time.
To see the major Sweet Tools page for this updated listing in its existing format, filter on ‘New’ under New or Existing? to see the recent additions. Alternatively, you can also see this same filtering using the conStruct structured data view by searching on the Status attribute using the value ‘New’; see example here.
Though still formative, the most exciting change with the Sweet Tools listing is this new presentation via conStruct. It is a structured data Web services framework with a UI, all offered as a set of modules to Drupal. To kick the tires with this new system, you may want to look at:
BTW, some helpful documentation pages show how all of these various tools work, such as the one for Browse. (Also, as a demo user, you are not seeing all of the write and update tools; again, see the documentation.)
The essential underlying basis to conStruct is the structWSF Web services framework. Some aspects of this system are still incomplete, and we are working on them: dropdown selections (controlled vocabulary selects), easier template creation, and intuitive template re-use. Nonetheless, these additions will come quickly, and what is here is already a great demonstration of how structured data can drive generic tools and interfaces.
The case study of how this system was constructed from a spreadsheet input using the irON vocabulary is described in an earlier post.
The updated Sweet Tools listing now includes nearly 50 different tools categories. The most prevalent categories are browser tools (RDF, OWL), information extraction, parsers or converters, composite application frameworks and general ontology tools. Each accounts for more than 8% (more than 50 tools) of the total. This breakdown is as follows:
As for the languages these applications are written in, that has stayed pretty steady, too. Java is still the leading language at about 46%, though its share has been trending very slightly downward over the past three years or so. PHP has increased a bit. The current splits are:
Background on prior listings and earlier statistics may be found on these previous posts:
With interim updates periodically over that period.
Structured Dynamics is one of the 320+ submitters (and counting!) to the 2010 $5 million Knight News Challenge. Our proposal is to extend our basic structWSF and conStruct Drupal frameworks to provide a data appliance and network (DAN) to support citizen journalists with data and analysis at the local, community level.
We invite you to look at our application and to provide comments or your rating of the application. The deadline for comments is tomorrow, and we will incorporate any appropriate last-minute suggestions. You can find our submission at:
Citizen DAN Proposal (or, search on ‘citizen dan’)
Please note: you must be signed in via a short submission to vote or comment on the application (or others shown in the listings).
Citizen DAN is meant to be a complete, open source framework for promoting citizen journalism. It is a:
Good decisions and good journalism require good starting information. Citizen DAN is a framework to provide access for any citizen to learn and compare local statistics and data with other similar communities. This helps to promote the grist for citizen journalism, as well as to provide a vehicle for discovery and learning across the community.
Citizen DAN will come pre-packaged with all necessary deployment components and documentation, including local data from government sources. It will include facilities for direct upload of additional local data in formats from spreadsheets to standard databases; many standard converters are included with the basic package.
Citizen DAN may be implemented either by local governments or by community advocacy groups. When deployed, using its clear documentation, sponsors may choose whether or what portions of local data are exposed to the broader Citizen DAN network. Data exposed on the network is automatically available to any other network community for comparison and analysis purposes.
(You may want to see our separate description — structWSF: A Framework for Collaboration Networks — of how this framework can lead to collaboration through widely distributed community nodes.)
The complete data appliance and network (DAN) is multi-lingual. If funded, this project will be tested and deployed in at least two prominent cities: one in Canada (French and English), and one in the United States (English and Spanish).
We think Citizen DAN is an exciting new prospect for local communities to share and use local data. Your support can help make this app available to any community for free.
And, in any case, do check out the other fine submissions to the challenge.
It has been eight months since the last major update to Sweet Tools, AI3’s listing of semantic Web and related tools. With today’s release, there are now a total of 810 tools listed, crashing through the sound barrier of 761 tools. After retiring 19 prior tools, this new listing adds 93 tools, or 13%, to the previous version’s 736, for a net gain of 74.
But simply adding to the tools listing is not the cause of this longer than normal period between updates.
This little Sweet Tools dataset is now showing the way to a couple of exciting innovations: new generic ontology-driven applications for structured data; and, tools for authoring structured data via spreadsheets.
We deal with the former in this post. I will deal with the spreadsheet business in a subsequent post.
So, here is the summary of major changes in this new listing:
A completely new structured data view of the listing, courtesy of Structured Dynamics’ structWSF and conStruct open source frameworks. This version can be viewed on the conStruct SCS Web site (pick the Sweet Tools dataset). You can compare this server-side presentation and version to the client-side JavaScript version using Exhibit that has been part of this blog for some time.
To see the major Sweet Tools page for this updated listing in its existing format, filter on ‘New’ under New or Existing? to see the recent additions. Alternatively, you can also see this same filtering using the conStruct structured data view by searching on the Status attribute using the value ‘New’; see example here.
Though still formative, the most exciting change with the Sweet Tools listing is this new presentation via conStruct. It is a structured data Web services framework with a UI, all offered as a set of modules to Drupal. To kick the tires with this new system, you may want to look at:
BTW, some helpful documentation pages show how all of these various tools work, such as the one for Browse. (Also, as a demo user, you are not seeing all of the write and update tools; again, see the documentation.)
The essential underlying basis to conStruct is the structWSF Web services framework. Some aspects of this system are still incomplete, and we are working on them: dropdown selections (controlled vocabulary selects), easier template creation, and intuitive template re-use. Nonetheless, these additions will come quickly, and what is here is already a great demonstration of how structured data can drive generic tools and interfaces.
As I said: More on this in a later post.
The updated Sweet Tools listing now includes nearly 50 different tools categories. The most prevalent categories are browser tools (RDF, OWL), information extraction, parsers or converters, composite application frameworks and general ontology tools. Each accounts for more than 8% (more than 50 tools) of the total. This breakdown is as follows:
As for the languages these applications are written in, that has stayed pretty steady, too. Java is still the leading language at about 46%, though its share has been trending very slightly downward over the past three years or so. PHP has increased a bit. The current splits are:
Background on prior listings and earlier statistics may be found on these previous posts:
With interim updates periodically over that period.
Note: Because of comments expirations on prior posts, this entry is now the new location for adding a suggested new tool. Simply provide your information in the comments section, and your tool will be included in the next update.
Much has been happening on the Structured Dynamics front of late. Besides welcoming Steve Ardire as a senior advisor to the company, we also have been issuing a steady stream of new products from our semantic Web pipeline.
This new slide show attempts to capture these products and relate them to the various layers in Structured Dynamics’ enterprise product stack:
The show indicates the role of scones, irON, structWSF, UMBEL, conStruct and others and how they leverage existing information assets to enable the semantic enterprise. And, oh, by the way, all of this is done via Web-accessible linked data and our practical technologies.
Enjoy!
On behalf of Structured Dynamics, I am pleased to announce our release into the open source community of irON — the instance record and Object Notation — and its family of frameworks and tools [1]. With irON, you can now author and conduct business solely in the formats and tools most familiar and comfortable to you, all the while enabling your data to interact with the semantic Web.
irON is an abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON).
The surprising thing about irON is that — by following its simple conventions and vocabulary — you will be authoring and creating interoperable RDF datasets without doing much different than your normal practice.
This first specification for the irON notation includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. In this newly published irON specification, profiles and examples are also provided for each of the irXML, irJSON and commON serializations. The irON release also includes a number of parsers and converters of the specification into RDF [2]. Data ingested in the irON frameworks can also be exported as RDF and staged as linked data.
The objective of irON is to make it easy for data owners to author, read and publish data. This means the starting format should be a human readable, easily writable means for authoring and conveying instance records (that is, instances and their attributes and assigned values) and the datasets that contain them. Among other things, this means that irON’s notation does not use RDF “triples”, but rather the native notations of the host serializations.
irON is premised on these considerations and observations:
The irON notation and vocabulary are designed to allow the conceptual structure (“schema”) of datasets to be described, to facilitate easy description of the instance records that populate those datasets, and to link different structures for different schema to one another. In this way, more-or-less complete RDF data structures and instances can be described in alternate formats and be made interoperable. irON provides a simple and naïve information exchange notation expressive enough to describe almost any data entity.
The notation also provides a framework for extending existing schema. This means that irON and its three serializations can represent many existing, common data formats and standards, while also providing a vehicle for extending them. Another intent of the specification is to be sparse in terms of requirements. For instance, the reserved vocabulary is fairly minimal and optional in almost all cases. The irON specification supports skeletal submissions.
The aim of irON is to describe instance records. An instance record is simply a means to represent and convey the information (“attributes”) describing a given instance. An instance is the thing at hand, and need not represent an individual; it could, for example, represent the entire holdings or collection of books in a given library. Such instance records are also known as the ABox [5]. The simple design of irON is in keeping with the limited roles and work associated with this ABox role.
Attributes provide descriptive characteristics for each instance. Every attribute is matched with a value, which can range from descriptive text strings to lists or numeric values. This design is in keeping with simple attribute-value pairs where, in using the terminology of RDF triples, the subject is the instance itself, the predicate is the attribute, and the object is the value. irON has a vocabulary of about 40 reserved attribute terms, though only two are ever required, with a few others strongly recommended for interoperability and interface rendering purposes.
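The mapping from attribute-value pairs to triples described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the irON parser: the `id` key, the function name `record_to_triples` and the sample attribute names are assumptions made for the example, not terms taken from the irON reserved vocabulary.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def record_to_triples(record: Dict[str, Any], id_key: str = "id") -> List[Triple]:
    """Expand one attribute-value record into (subject, predicate, object) triples."""
    subject = record[id_key]  # the instance itself is the subject
    triples: List[Triple] = []
    for attribute, value in record.items():
        if attribute == id_key:
            continue
        # A list value yields one triple per item; a scalar yields a single triple.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, attribute, item))
    return triples


# Hypothetical record; the attribute names are illustrative only.
book = {
    "id": "book-1",
    "title": "Example Title",
    "author": ["A. Writer", "B. Editor"],
    "pages": 120,
}
triples = record_to_triples(book)
```

Each attribute becomes a predicate and each value an object, with the record identifier repeated as the subject of every triple.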
A dataset is an aggregation of instance records used to keep a reference between the instance records and their source (provenance). It is also the container for transmitting those records and providing any metadata descriptions desired. A dataset can be split into multiple dataset slices. Each slice is written to a file serialized in some way. Each slice of a dataset shares the same <id> of the dataset.
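As a rough sketch of how slices might be reassembled, the Python below groups records from several slices under the dataset identifier they share. The dictionary layout (`id` plus a `records` list) and the name `merge_slices` are assumptions for illustration; the actual slice layout is defined by each irON serialization.

```python
from collections import defaultdict
from typing import Any, Dict, List


def merge_slices(slices: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group slice records under the dataset id that each slice carries."""
    merged: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for piece in slices:
        merged[piece["id"]].extend(piece["records"])
    return dict(merged)


# Two hypothetical slices of the same dataset, e.g. read from two files.
slices = [
    {"id": "sweet-tools", "records": [{"id": "tool-1"}, {"id": "tool-2"}]},
    {"id": "sweet-tools", "records": [{"id": "tool-3"}]},
]
datasets = merge_slices(slices)
```

Because every slice repeats the dataset identifier, the records keep their link back to their source no matter which file they arrived in.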
Instances can also be assigned to types, which provide the set or classificatory structure for how to relate certain kinds of things (instances) to other kinds of things. The organizational relationships of these types and attributes are described in a schema. irON also has conventions and notations for describing the linkage of attributes and types in a given dataset to existing schema. These linkages are often mapped to established ontologies.
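One way to picture a linkage is as a lookup table from local attribute names to properties in an existing vocabulary. The sketch below assumes a plain Python dictionary for that table and uses two FOAF property URIs as the targets; the table format and the name `apply_linkage` are hypothetical and do not reflect irON's own linkage notation.

```python
from typing import Any, Dict

# Hypothetical linkage: local attribute name -> property URI in an existing vocabulary.
LINKAGE: Dict[str, str] = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}


def apply_linkage(record: Dict[str, Any], linkage: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the record with mapped attribute names replaced by URIs.

    Attributes without a mapping are kept under their local name.
    """
    return {linkage.get(attribute, attribute): value for attribute, value in record.items()}


person = {"id": "person-1", "name": "Example Person", "nickname": "Ex"}
linked = apply_linkage(person, LINKAGE)
```

Keeping the mapping separate from the records means authors can go on using short, familiar attribute names while the linkage carries the connection to the shared schema.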
Each of these irON concepts of records, attributes, types, datasets, schema and linkages shares similar notations with keywords signaling to the irON parsers and converters how to interpret incoming files and data. There are also provisions for metadata, namespaces, and local and global references.
In this way, irON and its three serializations can capture virtually the entire scope and power of RDF as a data model, but with simpler and familiar terminology and constructs expected for each serialization.
For different reasons and for different audiences, the formats of XML, JSON and CSV (spreadsheets) were chosen as the representative formats across which to formulate the abstract irON notation.
XML, or eXtensible Markup Language, has become the leading data exchange format and syntax for modern applications. It is frequently adopted by industry groups for standards and standard exchange formats. There is a rich diversity of tools that support the language, importantly including capable parsers and query languages. There is also a serialization of RDF in XML. As implemented in the irON notation, we call this serialization irXML.
JSON, the JavaScript Object Notation, has become very popular as a Web 2.0 data exchange format and is often the format of choice to drive JavaScript applications. There is a growing richness of tools that support JSON, including support from leading Web and general scripting languages such as JavaScript, Python, Perl, Ruby and PHP. JSON is relatively easy to read, and is also now growing in popularity with lightweight databases, such as CouchDB. As implemented in the irON notation, we call this serialization irJSON.
CSV, or comma-separated values, is a format that has been in existence for decades. It was made famous by Microsoft as a spreadsheet exchange format, which makes CSV very useful since spreadsheets are the most prevalent data authoring environment in existence. CSV is less expressive and capable as a data format than the other irON serializations, yet still has an attribute-value pair orientation. And, via spreadsheets, datasets can be easily authored and inspected, while also providing a rich functional environment including sorting, formatting, data validation, calculations, macros, etc. As implemented in the irON notation, we call this serialization commON.
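The attribute-value orientation of a spreadsheet can be seen by reading a sheet exported as CSV into one record per row. The sketch below uses only Python's standard `csv` module on a plain header row; it does not use commON's own keywords or conventions, and the column names and sample rows are made up for the example.

```python
import csv
import io
from typing import Dict, List

# Hypothetical spreadsheet export: the header row names the attributes.
SHEET = """id,name,language
tool-1,Example Parser,Java
tool-2,Example Browser,PHP
"""


def rows_to_records(text: str) -> List[Dict[str, str]]:
    """Read CSV text into one attribute-value record per row."""
    reader = csv.DictReader(io.StringIO(text))
    # Drop empty cells so each record carries only attributes that have values.
    return [{key: value for key, value in row.items() if value} for row in reader]


records = rows_to_records(SHEET)
```

Each column heading acts as an attribute and each cell as its value, which is why a spreadsheet can serve as a simple authoring tool for instance records.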
The following diagram shows how these three formats relate to irON and then the canonical RDF target data model:

We have used the unique differences amongst XML, JSON and CSV to guide the embracing abstract notations within irON. Note the round-tripping implications of the framework.
One exciting prospect for the design is how, merely by following the simple conventions within irON, each of these three data formats — and RDF !! — can be used more-or-less interchangeably, and can be used to extend existing schema within their domains.
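A minimal sketch of the round-tripping idea, assuming flat records whose values are all strings: the same records pass through JSON and then CSV using Python's standard library and come back unchanged. The helper names are hypothetical, and real irON converters handle schema, linkages and typed values that this toy example ignores.

```python
import csv
import io
import json
from typing import Dict, List


def records_to_csv(records: List[Dict[str, str]]) -> str:
    """Write flat records as CSV text with a sorted header row."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_records(text: str) -> List[Dict[str, str]]:
    """Read CSV text back into a list of flat records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


records = [
    {"id": "tool-1", "name": "Example Parser"},
    {"id": "tool-2", "name": "Example Browser"},
]
via_json = json.loads(json.dumps(records))          # records -> JSON -> records
via_csv = csv_to_records(records_to_csv(via_json))  # records -> CSV -> records
```

The round trip is lossless here only because every value is a string and every record has the same attributes; CSV flattens numbers and lists to text, which is one reason a shared notation and vocabulary are needed on top of the raw formats.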
This first release of irON is in version 0.8. Updates and revisions are likely with use. Here are some key links for irON:
Mid-week, the parsers and converters for structWSF [6] will be released and announced on Fred Giasson’s blog.
In addition, within the next week we will be publishing a case study of converting the Sweet Tools semantic Web and -related tools dataset to commON.
The irON specification and notation by Structured Dynamics LLC is licensed under a Creative Commons Attribution-Share Alike 3.0 license. irON’s parsers or converters are available under the Apache License, Version 2.0.
irON is an important piece in the semantic enterprise puzzle that we are building at Structured Dynamics. It reflects our belief that knowledge workers should be able to author and create interoperable datasets without having to learn the arcana of RDF. At the same time we also believe that RDF is the appropriate data model for interoperability. irON is an expression of our belief that many data formats have appropriate places and uses; there is no need to insist on a single format.
We would like to thank Dr. Jim Pitman for his advocacy of the importance of human-readable and easily authored datasets and formats. Via his leadership of the Bibliographic Knowledge Network (BKN) project and our contractual relationship with it [7], we have learned much regarding the BKN’s own format, BibJSON. Experience with this format has been a catalytic influence in our own work on irON.
— Mike Bergman and Fred Giasson, editors
Attribute-values can also be presented as pairs in the form of an associative array, where the first item listed is the attribute, often followed by a separator such as the colon, and then the value. JSON and many simple data structure notations follow this format. This format may also be called attribute-value pairs, key-value pairs, name-value pairs, alists or others. In these cases the “object” is implied, or is introduced as the name of the array.