AI3’s SemWeb Tools Survey is Now Largely Completed
It has taken nearly six months, but I believe my survey of existing semantic Web and related tools is now largely complete. New tools will certainly be discovered, and others are constantly being developed (at what I believe is an accelerating pace), but I think the existing backlog has largely been found. Though important gaps remain, today’s picture is one of a surprisingly robust tools environment for promoting semantic Web objectives.
Growth and Tools Characterization
My most recent update of Sweet Tools, also published today, now lists 500 semantic Web and related tools and is in its 8th version. Starting with the W3C’s listing of 70 tools, first referenced in August 2006, I have steadily found new tools and added them to the listing. Until now, the predominant source of growth in these listings has been the discovery of extant tools.
In its earliest versions, my search strategy very much focused on all topics directly related to the “semantic Web.” However, as time went on, I came to understand the importance of many ancillary tool sets to the entire semantic Web pipeline (such as language processing and information extraction) and came to find whole new categories of pragmatic tools that embodied semantic Web and data mediation processes but which did not label themselves as such. This latter category has been an especially rich vein to mine, with notable contributions from the humanities, biology and the physical sciences.
But the pace of discovery is now approaching its asymptote. Though I by no means believe I have comprehensively found all extant tools, I do believe that most new tools in future listings will come more from organic growth and new development than discovery of hidden gems. So, enjoy!
My view of what is required for the semantic Web vision to reach some degree of fruition begins with uncharacterized content, which then proceeds through a processing pipeline ultimately resulting in the storage of RDF triples that can be managed at scale. By necessity, such a soup-to-nuts vision embraces tools and requirements that, individually, might not constitute semantic technology strictly defined, but are nonetheless integral parts of the overall pipeline. By (somewhat arbitrary) category, here is the breakdown of the current listing of 500 tools:
|30|Parser or Converter|
|23|Browser (RDF, OWL or semantic)|
|22|Wiki- or blog-related|
|22|Wrapper (Web data extractor)|
|13|Query Language or Service|
|6|NOT ACTIVE (???)|
|3|Description or Formal Logics|
I find it amusing that the diversity and sources of such tool listings (including, importantly, what properly belongs in the domain and what does not) are themselves an interesting example of the difficulties facing semantic mediation and resolution. Alas, such is the real world.
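To make the pipeline idea in this section a little more concrete, here is a minimal sketch in Python of the path from uncharacterized text to RDF triples. It is only an illustration under stated assumptions: the `http://example.org/` namespace, the `isA` predicate, the sample sentences and the function names are all hypothetical, and the single regular expression is a stand-in for the real language processing and information extraction tools in the listing.

```python
import re

# Hypothetical namespace, for illustration only (not part of Sweet Tools).
BASE = "http://example.org/"


def slug(text):
    """Turn a free-text phrase into a URI-safe local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_")


def extract_triples(text):
    """Toy extraction step: find simple 'X is a Y' statements in raw text."""
    pattern = re.compile(r"([A-Z][\w ]*?) is an? ([\w ]+?)[.;]")
    triples = []
    for subject, obj in pattern.findall(text):
        # 'isA' is an assumed predicate name, not a standard vocabulary term.
        triples.append((BASE + slug(subject), BASE + "isA", BASE + slug(obj)))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI tuples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Uncharacterized content goes in; serialized triples come out.
raw = "Jena is a Java framework. Protege is an ontology editor."
store = extract_triples(raw)
print(to_ntriples(store))
```

Each stage here corresponds loosely to a category of tool: extraction, conversion to RDF, and serialization for storage. A real pipeline would replace the in-memory list with a triple store able to manage the results at scale.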
Java is the Preferred Language
There are also interesting historical trends in the primary development languages used for these tools. Older ones rely on C or C++ or, if they are logic or inference oriented, on the expected languages of Prolog or LISP.
So, here is the listing of Sweet Tools apps by primary development language:
An Alternative Simple Listing