AI3’s SemWeb Tools Survey is Now Largely Completed
It has taken nearly six months, but I believe my survey of existing semantic Web and related tools is now largely complete. While new tools will certainly be discovered and others are constantly being developed (at what I believe is an accelerating pace), I think the existing backlog has largely been found. Though important gaps remain, today’s picture is one of a surprisingly robust tools environment for promoting semantic Web objectives.
Growth and Tools Characterization
My most recent update of Sweet Tools, also published today, now lists 500 semantic Web and related tools and is in its 8th version. Starting with the W3C’s listing of 70 tools, first referenced in August 2006, I have steadily found new tools and added them to the listing. Until now, the predominant source of growth in these listings has been the discovery of extant tools.
In its earliest versions, my search strategy very much focused on all topics directly related to the “semantic Web.” However, as time went on, I came to understand the importance of many ancillary tool sets to the entire semantic Web pipeline (such as language processing and information extraction) and came to find whole new categories of pragmatic tools that embodied semantic Web and data mediation processes but which did not label themselves as such. This latter category has been an especially rich vein to mine, with notable contributions from the humanities, biology and the physical sciences.
But the pace of discovery is now approaching its asymptote. Though I by no means believe I have comprehensively found all extant tools, I do believe that most new tools in future listings will come more from organic growth and new development than discovery of hidden gems. So, enjoy!
My view of what is required for the semantic Web vision to reach some degree of fruition begins with uncharacterized content, which then proceeds through a processing pipeline ultimately resulting in the storage of RDF triples that can be managed at scale. By necessity, such a soup-to-nuts vision embraces tools and requirements that, individually, might not constitute semantic technology strictly defined, but are nonetheless integral parts of the overall pipeline. By (somewhat arbitrary) category, here is the breakdown of the current listing of 500 tools:
No. Tools | Category |
43 | Information Extraction |
32 | Ontology (general) |
30 | Parser or Converter |
29 | Composite App/Framework |
29 | Database/Datastore |
26 | Annotator |
25 | Programming Environment |
23 | Browser (RDF, OWL or semantic) |
23 | Language Processor |
22 | Reasoner/Inference Engine |
22 | Wiki- or blog-related |
22 | Wrapper (Web data extractor) |
20 | RDF (general) |
19 | Search Engine |
15 | Visualization |
13 | Query Language or Service |
11 | Ontology Mapper/Mediator |
9 | Ontology Editor |
8 | Data Language |
8 | Validator |
6 | NOT ACTIVE (???) |
5 | Semantic Desktop |
4 | Harvester |
3 | Description or Formal Logics |
3 | RDF Editor |
2 | RDF Generator |
48 | Miscellaneous |
500 | Total |
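As a quick check on the breakdown above, the counts can be tallied in a few lines of Python. This is only an illustrative sketch: the counts are copied from the table, while the variable and function names (category_counts, share, top_five) are my own and not part of Sweet Tools itself.

```python
# Counts copied from the category table above; all names here are illustrative.
category_counts = {
    "Information Extraction": 43,
    "Ontology (general)": 32,
    "Parser or Converter": 30,
    "Composite App/Framework": 29,
    "Database/Datastore": 29,
    "Annotator": 26,
    "Programming Environment": 25,
    "Browser (RDF, OWL or semantic)": 23,
    "Language Processor": 23,
    "Reasoner/Inference Engine": 22,
    "Wiki- or blog-related": 22,
    "Wrapper (Web data extractor)": 22,
    "RDF (general)": 20,
    "Search Engine": 19,
    "Visualization": 15,
    "Query Language or Service": 13,
    "Ontology Mapper/Mediator": 11,
    "Ontology Editor": 9,
    "Data Language": 8,
    "Validator": 8,
    "NOT ACTIVE (???)": 6,
    "Semantic Desktop": 5,
    "Harvester": 4,
    "Description or Formal Logics": 3,
    "RDF Editor": 3,
    "RDF Generator": 2,
    "Miscellaneous": 48,
}

total = sum(category_counts.values())


def share(category):
    """Return a category's share of all listed tools, as a percentage."""
    return 100.0 * category_counts[category] / total


# Rank the named categories, leaving out the two catch-all buckets.
catch_alls = ("Miscellaneous", "NOT ACTIVE (???)")
named = {k: v for k, v in category_counts.items() if k not in catch_alls}
top_five = sorted(named.items(), key=lambda kv: kv[1], reverse=True)[:5]

print(f"Total tools: {total}")
for name, count in top_five:
    print(f"{name:32s} {count:3d}  {share(name):4.1f}%")
```

The tally confirms that the categories sum to the 500 tools in the listing, and it makes plain how flat the distribution is: even the largest named category, information extraction, accounts for under a tenth of all tools.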
I find it amusing that the diversity and sources of such tool listings (importantly including what is properly in the domain or not) are themselves an interesting example of the difficulties facing semantic mediation and resolution. Alas, such is the real world.
Java is the Preferred Language
There are also interesting historical trends in the primary development languages used for these tools. Older ones rely on C or C++ or, if they are logic or inference oriented, on the expected languages of Prolog or LISP.
One might be tempted to say that Java is the language of the semantic Web, with about 50% of all tools, especially the more developed and prominent ones that embrace a broader spectrum of needs, but I’m not so sure. I’m seeing a trend in the more recently announced tools to use JavaScript or Ruby, and these deserve real attention. And while the P languages (Perl, PHP, Python) are also showing some strength, it is not clear that this reflects anything specific to semantic Web needs rather than standard Web trends in general.
So, here is the listing of Sweet Tools apps by primary development language:
About half of all apps are written in Java. The next most prevalent language is JavaScript, at 13%, which is roughly twice the share of the next leading choices, C/C++ and PHP, at about 6% each. As might be expected, the “major” apps are more likely to be written in Java or the C languages; user interface emphases tend to occur in the P languages; browser extensions or add-ons are generally in JavaScript; and logic applications are often in Lisp or Prolog.
An Alternative Simple Listing
I have also created and will maintain a simple listing of Sweet Tools that shows all 500 tools on a single page, with live links and each tool’s category. I am providing this listing to give users a single access point, and because the Exhibit presentation is based on JavaScript, which virtually no search engine indexes adequately.
Please see: http://dannyayers.com/2007/03/06/sds-one-for-the-industry
and:
http://article.gmane.org/gmane.comp.misc.ontology.protege.owl/19995/match=silico+discovery
and:
http://www.insilicodiscovery.com
and:
http://insilicodiscovery.com/phpbb/index.php – login user name: guest, password: guest..
Thanks,
Ian Goldsmid
In Silico Discovery
Semantic Discovery System