Posted: February 7, 2007

Sweet Tools Listing

This AI3 blog maintains Sweet Tools, the largest available listing of semantic Web and related tools, now numbering about 800. Most are open source.

An update to the Sweet Tools listing of semantic Web and related tools has been posted. Forty-two tools were added, bringing the running total to 420. As always, please provide any corrections or additions here.

Posted by AI3's author, Mike Bergman, on February 7, 2007 at 3:51 pm in Semantic Web Tools.
The URI link reference to this post is: http://www.mkbergman.com/335/sweet-tools-updated-to-420-tools/
Posted: February 5, 2007

Collex is the Next Example in a Line of Innovative Tools from the Humanities

I seem to be on a string of discoveries of new tools from unusual sources (unusual, at least, for me). For some months now I have been attempting to discover the “universe” of semantic Web tools, beginning obviously with efforts that self-label in that category. (See my ongoing Sweet Tools comprehensive listing of semantic Web and related tools.) Then it became clear that many “Web 2.0” tools potentially contribute to this category via tagging, folksonomies, mashups and the like. I’ve also been focused on language processing tools that relate to this category in other ways (a topic for another day). Most recently, however, I have discovered a rich vein of tools in areas that take pragmatic approaches to managing structure and metadata, often with little mention of the semantic Web or Web 2.0. And in that vein, I continue to encounter impressive technology developed within the humanities and library science (see, for example, the recent post on Zotero).

To many of you, the contributions from these disciplines have likely been obvious for years. I admit I’m a slow learner. But I also suspect there is much that goes on in fields outside our normal ken. My own mini-epiphany is that I also need to be looking at the pragmatists within many different communities, some of whom eschew the current semantic Web and Web 2.0 hype yet are doing relevant and directly transferable things within their own orbits. I have written elsewhere about the leadership of physicists and biologists in prior Internet innovations. What has surprised me most recently is the emerging prominence of the humanities (I feel like the Geico caveman saying that).

Collex is the Next in a Fine Legacy

The latest discovery is Collex, a set of tools for COLLecting and EXhibiting information in the humanities. Bethany Nowviskie, a lecturer in media studies at the University of Virginia and a lead designer of the effort, describes it this way in her introduction, COLLEX: Semantic Collections & Exhibits for the Remixable Web:

Collex is a set of tools designed to aid students and scholars working in networked archives and federated repositories of humanities materials: a sophisticated collections and exhibits mechanism for the semantic web. It allows users to collect, annotate, and tag online objects and to repurpose them in illustrated, interlinked essays or exhibits. Collex functions within any modern web browser without recourse to plugins or downloads and is fully networked as a server-side application. By saving information about user activity (the construction of annotated collections and exhibits) as “remixable” metadata, the Collex system writes current practice into the scholarly record and permits knowledge discovery based not only on the characteristics or “facets” of digital objects, but also on the contexts in which they are placed by a community of scholars. Collex builds on the same semantic web technologies that drive MIT’s SIMILE project and it brings folksonomy tagging to trusted, peer-reviewed scholarly archives. Its exhibits-builder is analogous to high-end digital curation tools currently affordable only to large institutions like the Smithsonian. Collex is free, generalizable, and open source and is presently being implemented in a large-scale pilot project under the auspices of NINES.

(BTW, NINES stands for the Networked Infrastructure for Nineteenth-century Electronic Scholarship, a trans-Atlantic federation of scholars.)

The initial efforts that became Collex aimed to establish frameworks and processes within this community, not tools. But the group apparently recognized the importance of leverage and enablers (i.e., tools) and hired Erik Hatcher, a key contributor to the Apache open-source Lucene text-indexing engine and co-author of Lucene in Action, to spearhead development of a genuinely usable tool. Erik proceeded to grab best-of-breed components in areas such as Ruby on Rails and Solr (a faceted enhancement to Lucene that has just graduated from the Apache incubator), and then worked hard on follow-on efforts such as Flare (a display framework) to create the basics of Collex. A sample screenshot of the application is shown below:

The Collex app is still flying under the radar, but it already has enough online functionality to support annotation, faceting, filtering, display and so on. Another interesting aspect of the NINES project (though apparently not a programmatic capability of the Collex software itself) is that it only allows “authoritative” community listings, an absolute essential for scaling the semantic Web.
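Faceted browsing of this kind (which Collex gets from Solr) comes down to counting how many records carry each value of a field, given whatever filters the user has already chosen. The minimal Python sketch below illustrates only that idea; the records, field names and the `facet_counts` helper are hypothetical and are not Collex or Solr code.

```python
from collections import Counter

# Hypothetical records; the fields are illustrative, not actual NINES metadata.
RECORDS = [
    {"id": 1, "genre": "Poetry", "archive": "Archive A"},
    {"id": 2, "genre": "Poetry", "archive": "Archive B"},
    {"id": 3, "genre": "Fiction", "archive": "Archive A"},
    {"id": 4, "genre": "Criticism", "archive": "Archive A"},
]


def facet_counts(records, field, filters=None):
    """Count each value of `field` among the records matching all `filters`.

    `filters` maps a field name to the single value the user has selected.
    """
    filters = filters or {}
    matching = [
        r for r in records
        if all(r.get(name) == value for name, value in filters.items())
    ]
    return Counter(r[field] for r in matching if field in r)


print(facet_counts(RECORDS, "genre"))
print(facet_counts(RECORDS, "genre", {"archive": "Archive A"}))
```

Clicking a facet value simply adds it to `filters`, and the counts for every other facet are then recomputed over the narrower set.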

You can play with the impressive online demo of the Collex faceted browser at the NINES Web site today, though the software is clearly still undergoing intense development. I particularly like its clean design and clear functionality. The other aspect of this software that deserves attention is that it is a server-side option with cross-browser Ajax, requiring no plugins. It works equally well in Safari, Firefox and Internet Explorer on Windows. And, like the Zotero research citation tool, this basic framework could easily lend itself to managing structured information in virtually any other domain.

Collex is one of the projects of Applied Research in Patacriticism, a software development research team located at the University of Virginia and funded through an award to professor Jerome McGann from the Andrew Mellon Foundation. (“Shuuu, sheeee. Impressive. Most impressive.” — Darth Vader)

(BTW, forgive my sexist use of “guys” in this post’s title; I just couldn’t get a sex-neutral title to work as well for me!)

An AI3 Jewels & Doubloons Winner
Posted: January 27, 2007

Zotero is Perhaps the Most Polished Firefox Extension Yet

Zotero, first released in October, is perhaps the best Firefox extension that most users have never heard of. Academic historians and social scientists are the exception: among them, Zotero is becoming quite the rage. It is also percolating into other academic fields, including law, math and science.

Zotero is a complete research citation platform, or what its developer, George Mason University’s Center for History and New Media (CHnM), calls “the next-generation research tool.” Zotero allows academics and researchers to extract, manage, annotate, organize, export and publish citations in a variety of formats and from a variety of sources, all within the Firefox browser and all while the user is interacting directly with the Web.

What it Is

Like all Firefox extensions, Zotero is easy to install. From the Firefox add-on site or the Zotero site itself, a single click downloads the app to your browser. After the prompted restart, the app is active. (Later alerts for version upgrades are similarly automatic, as with any Firefox extension.)

Upon installation, Zotero puts an icon in your status bar and places new options on menus. When you encounter a site that Zotero supports (currently mostly university libraries, but also Amazon and major publication outlets, totaling more than 150; here is a listing of Zotero’s so-called translators), you will see a folder symbol in your address bar telling you Zotero is active. A single click downloads the citations from that site automatically to your local system.

Citations have traditionally been one of the more “semantically challenging” data sets, with rampant variation in style, order, format, presentation, coverage, you name it. The fact that Zotero supports a given source site means that it understands these nuances and is ready to store the information in a single, canonical representation. Once downloaded, this citation representation can be easily managed and organized. More importantly, you can export this internal, standard representation into a multitude of formats (including, most recently, MS Word). In short, like for-fee citation software in the past, Zotero now provides a free and universal mechanism for managing this chaos.
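The “single, canonical representation” is what makes many export formats cheap: each source is parsed once into one internal record, and each output format is then just a function of that record. The Python sketch below assumes a deliberately tiny, hypothetical record (the `Citation` fields are illustrative and are not Zotero's actual schema) and renders it two ways.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    """Hypothetical canonical citation record (not Zotero's real schema)."""
    key: str
    item_type: str                      # e.g. "book" or "article"
    title: str
    authors: List[str] = field(default_factory=list)
    year: Optional[int] = None
    publisher: Optional[str] = None


def to_bibtex(c: Citation) -> str:
    """Render the record as a BibTeX entry, skipping empty fields."""
    fields = [("title", c.title)]
    if c.authors:
        fields.append(("author", " and ".join(c.authors)))
    if c.year is not None:
        fields.append(("year", str(c.year)))
    if c.publisher:
        fields.append(("publisher", c.publisher))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@{c.item_type}{{{c.key},\n{body}\n}}"


def to_plain(c: Citation) -> str:
    """Render a simple author-year string (illustrative, not a real style)."""
    authors = ", ".join(c.authors) if c.authors else "Anonymous"
    year = f" ({c.year})" if c.year is not None else ""
    return f"{authors}{year}. {c.title}."


example = Citation("doe2007", "book", "An Example Title",
                   ["Jane Doe", "John Roe"], 2007, "Example Press")
print(to_bibtex(example))
print(to_plain(example))
```

Adding another export format means writing one more function over `Citation`, without touching any of the site-specific parsers.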

While the address icon acts to download one or more citations (yes, they also work in groups if there are multiple listings on the page!), choosing the Zotero icon itself invokes the full Zotero as an app within the browser, as this screen shot shows:

The left panes provide organization and management and tag support; the middle pane shows the active resources; and the right pane shows the structure associated with the active citation. This is all supported with attractive icons and logical tooltips and organization.

Zotero also offers utilities for creating your own scrapers (“translators”) for new sites not yet in the standard, supported roster. This capability is itself an extension to Zotero, called Scaffold, which also points to the building-block nature of the core app. (Other utilities, such as Solvent from MIT or others sure to come, could either enhance or replace the current Scaffold framework.)

What is Impressive

Though supposedly in “beta,” Zotero already shows a completeness, sophistication and attention to detail not evident in most Firefox extensions. Indeed, this system approaches a complete application in its scope and professionalism. The fact that it can be so easily installed and embedded in the browser itself is worth noting.

Firefox extensions have continuously evolved from single-function wonders to crude apps and now, as Zotero and a handful of other extensions show, complete functional applications. And, like OSes of the past, these extensions also adhere to standards and practices that make them pretty easy to use across applications. Firefox is indeed becoming a full-fledged platform.

Zotero also uses the new SQLite local database function (“mozStorage”) in Firefox 2.x to manage its local data (perhaps one of the first Firefox extensions to do so). This gives the extension a clean and small install footprint, and also opens its data to other standard data utilities.
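mozStorage is Firefox's wrapper around SQLite and is called from extension Javascript. To show the general pattern outside Firefox, here is a minimal Python sketch that uses the standard-library `sqlite3` module instead. The two-table schema and the helper names are hypothetical, not Zotero's actual database layout; the point is simply that everything lives in one ordinary SQLite file that any SQLite tool can open.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single-file store; ":memory:" keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS items (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            year  INTEGER
        );
        CREATE TABLE IF NOT EXISTS tags (
            item_id INTEGER NOT NULL REFERENCES items(id),
            tag     TEXT NOT NULL,
            PRIMARY KEY (item_id, tag)
        );
    """)
    return conn


def add_item(conn, title, year, tags=()):
    """Insert one item and its deduplicated tags; return the new row id."""
    cur = conn.execute("INSERT INTO items (title, year) VALUES (?, ?)",
                       (title, year))
    item_id = cur.lastrowid
    conn.executemany("INSERT INTO tags (item_id, tag) VALUES (?, ?)",
                     [(item_id, t) for t in sorted(set(tags))])
    conn.commit()
    return item_id


def titles_with_tag(conn, tag):
    """Return the titles carrying `tag`, in alphabetical order."""
    rows = conn.execute(
        "SELECT i.title FROM items AS i JOIN tags AS t ON t.item_id = i.id "
        "WHERE t.tag = ? ORDER BY i.title", (tag,))
    return [title for (title,) in rows]


conn = open_store()
add_item(conn, "B Title", 2006, ["history"])
add_item(conn, "A Title", 2007, ["history", "law", "law"])
print(titles_with_tag(conn, "history"))
```

Passing a file path to `open_store` instead of `":memory:"` produces a plain `.sqlite` file on disk, which is the property that makes such data reachable by other utilities.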

What it Implies

So, beyond its own considerable capabilities, Zotero points to some important implications. First, full-bodied apps, building on many piece-parts, can now be expected around the Firefox platform. (Indeed, I earlier noted the emergence of such “Web OS” prospects as Parakey, whose developers also come from earlier Firefox legacies. One of those developers, Joe Hewitt, is also the author of the impressive Firebug extension.)

Second, the openness of Firefox for web-centric computing will, as I’ve stated before, continue to put competitive pressure on Microsoft’s Internet Explorer. This is good for all users at large and will continue to spur innovation.

Third, the pending version 2.0 of Zotero is slated to have a server-side component. What we are potentially seeing, then, are local client-side instantiations in the browser that can then communicate with remote data servers. This opens up a wealth of possibilities in social networking and collaboration.

And, last, and more specific to Zotero itself (but also enabled with Firefox’s native RDF support), we are now seeing a complete app framework for dealing with structured information and tagging on the Web. While clearly Zotero has a direct audience for citation management and research, the same infrastructure and techniques used by the system could become a general semantic Web or data framework for any other structured application.

Hmmm. Now that sounds like an opportunity . . . .

An AI3 Jewels & Doubloons Winner
Posted: January 24, 2007

Structure for the Masses

or

Instant Mashups between WordPress, Google Spreadsheets and Exhibit

The past couple of days have seen a flurry of activity and much excitement around a new “database-free” mashup and publication system called Exhibit. Another in a string of sneaky-cool software from MIT’s Simile program (and written by David Huynh, a pragmatic semantic Web developer of the first order), Exhibit (and the rapid innovations sure to follow) will truly revolutionize Web publishing and the visualization and presentation of structured data. Exhibit is quite simply “structure for the masses.”

What is It?

With just a few simple steps, even the most novice blog author can now embed structured data — such as sortable and filtered table displays, thumbnails, maps, timelines and histograms — in her blog posts and blog pages. Using Exhibit, you can now create rich data visualizations of web pages using only HTML (and optional CSS and Javascript code).

Exhibit requires no traditional database technology, no server-side code, and no need for a web server. Here is a sampling of Exhibit’s current capabilities:

  • No external databases or hassles
  • Data filtering and sorting
  • Simple HTML embeds and calls
  • Automatic and dynamic page layouts and results rendering
  • Completely tailorable (with CSS and Javascript)
  • Direct updates and presentations (mashups) from Google spreadsheets
  • Pre-prepared timelines, map mashups, tabular display options, and Web page formatting
  • Easily embedded in WordPress blogs (see the first tutorial here).

Exhibit is as simple as defining a spreadsheet; after that you have a complete database! And, if you want to get wild and crazy with presentation and display, then that is easy as well!
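To make “as simple as defining a spreadsheet” concrete: a sheet with a header row maps directly onto an Exhibit JSON data file, with each row becoming one item. The Python sketch below assumes the general shape of Exhibit JSON as a top-level `items` array whose objects each carry a `label` (plus an optional `type`); the function name and the default type are hypothetical, and Simile's Babel service is what actually performs this kind of conversion.

```python
import csv
import io
import json


def tsv_to_exhibit_json(tsv_text, item_type="Item"):
    """Convert tab-separated rows (header row first) into Exhibit-style JSON.

    Assumes a "label" column; empty cells are simply omitted from the item.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if "label" not in (reader.fieldnames or []):
        raise ValueError("the header row needs a 'label' column")
    items = []
    for row in reader:
        # Skip overflow cells (key None) and blank cells.
        item = {key: value.strip() for key, value in row.items()
                if key is not None and value and value.strip()}
        item.setdefault("type", item_type)
        items.append(item)
    return json.dumps({"items": items}, indent=2)


# Hypothetical sheet, as it might be copied straight out of a spreadsheet.
sheet = "label\tparty\tstate\nAlice\tGreen\tVA\nBob\tBlue\t\n"
print(tsv_to_exhibit_json(sheet, item_type="Person"))
```

Each column header becomes a property name, which is why the spreadsheet itself effectively serves as the database schema.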

What Are Some Examples?

Though Exhibit has been out for barely a month, there are already some pretty impressive examples:

What Are People Saying?

Granted, we’re only talking about the last 24 hours or so, but interesting people are noticing and commenting on this phenomenon:

  • Ajaxian: “Exhibit is a new project that lets you build rich sorting and filtering data applications in a simple way.”
  • Danny Ayers: “Although the person using these tools doesn’t need to think about gobbledegook like RDF, when they use the tools, they are putting first class data on the web in a Semantic Web-friendly fashion.”
  • David Huynh: “Now, we’re not just talking about the usual suspects of domains like photos and bookmarks. We’re talking about any arbitrary domain of data — Yes, the real world data. The data that people care about. I am hoping that we can create tools that let the average blogger easily publish structured data without ever having to coin URIs or design or pick ontologies — But there it is: this is, in my humble opinion, a beginning of something great.”
  • Kyler: “If you don’t see the value of this, you are a fool.”
  • Derek Kinsman: “Exhibit is an amazing web app … I am beginning to work alongside my WordPress mates in the hopes that we can create some sort of Administration area inside WordPress that connects to the Google accounts. Right inside WP. Also, we’re attempting to create some sort of plugin or downloadable template to which Exhibit can run mostly out of the box.”

What is Coming?

Johan Sundström has created an Instant Google Spreadsheets Exhibit, which lets you turn any Google spreadsheet (with certain formatting requirements) into an “exhibit” just by pasting in its public feed URL with immediate faceted browsing; maps and timelines are forthcoming.

Well, a WordPress plug-in is in the works (to be announced, with Derek helping to take the lead on it). Though incorporation into a blog is easy, it does require the author to have system administration rights and access to the WordPress server. A plug-in could remove those hurdles and make usage still easier.

Exhibit’s very helpful online tutorials are being expanded, particularly with more examples and more templates. For those seriously interested in the technology, definitely monitor the Simile project site.

Activity and expansion of the Babel translation formats continue. You can now convert BibTeX, Excel, Notation 3 (N3), RDF/XML or tab-separated values (TSV) to a choice of Exhibit JSON, N3 or RDF/XML. And, since Exhibit itself stores its data internally as triples, it is tantalizing to think that another Simile project, RDFizers, with its impressive storehouse of RDF converters, may also be more closely tied to Babel. Is it possible that Exhibit JSON may become the lingua franca of small-scale data representation formats?
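The point that Exhibit stores its data internally as triples is easy to picture: every property of every item becomes one (subject, property, value) statement, and list values fan out into several statements. Below is a minimal Python sketch of that flattening over Exhibit-style items. It assumes an item's subject is its `id`, falling back to its `label`; it is an illustration only, not Exhibit's own Javascript.

```python
def items_to_triples(items):
    """Flatten Exhibit-style items into (subject, property, value) triples.

    Assumption for this sketch: an item's subject is its "id" if present,
    otherwise its "label"; list-valued properties yield one triple per value.
    """
    triples = []
    for item in items:
        subject = item.get("id", item["label"])
        for prop, value in item.items():
            if prop == "id":
                continue  # the id is the subject, not a property
            for v in value if isinstance(value, list) else [value]:
                triples.append((subject, prop, v))
    return triples


def objects(triples, subject, prop):
    """Look up every value of `prop` for `subject`."""
    return [o for s, p, o in triples if s == subject and p == prop]


triples = items_to_triples([
    {"label": "Alice", "type": "Person", "knows": ["Bob", "Carol"]},
    {"id": "b1", "label": "Bob", "type": "Person"},
])
print(objects(triples, "Alice", "knows"))
```

Once data is in this uniform shape, writing it out as N3 or RDF/XML is largely a serialization step, which helps explain why moving among those formats and Exhibit JSON is plausible.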

And, within the project team of Huynh and his Ph.D. thesis advisor, David Karger, there are also efforts underway to extend the syntax and functionality of Exhibit. We’ve just seen the expansion to direct Google spreadsheet support, and support for more spreadsheet functionality is desired, including possible string concatenation and numeric operations.

Exhibit itself has been designed with extensibility in mind; its linkage to Timeline is one example. What will be critical in the weeks and months ahead is the development of a developer and user community surrounding Exhibit. There is presently a fairly active mailing list, and I’m sure the MIT folks would welcome serious contributions.

Finally, other aspects of the Simile project itself and related initiatives at MIT have direct and growing ties to Exhibit, both in terms of team members and developers and in terms of philosophy. You may want to check out these additional MIT projects, including Longwell, Piggy Bank, Solvent, Semantic Bank, Welkin, DSpace, Haystack, Dwell, Ajax, Sifter, Relo plugin, Re:Search, Chickenfoot, and LAPIS. This is a program on the move, and it warrants the closest attention.

Expected Growing Pains

There are some known display issues in the Safari and Opera browsers; these are being worked on and should be resolved shortly. There are also some style issues and conflicts when embedding in blogs (easily fixed with CSS modifications). Performance problems are likely when data sets grow into the hundreds or thousands of items, but that exceeds Exhibit’s lightweight objectives anyway. Other problems may emerge as use broadens.

These issues are to be expected and should not deter anyone from playing with the system right away. You’ll be amazed at what you can do, and how quickly, with so little code.

It has been a fun few days. It’s exciting to be able to be a camp follower during one of those seminal moments in Web development. And, so I say to David and colleagues at MIT and the band of merry collaborators on their mailing list: Thanks! This is truly cool.

An AI3 Jewels & Doubloons Winner
Posted: January 22, 2007

MIT’s Exhibit Continues the Simile Project’s Long String of Innovative Tools

I have just come across a new and innovative Web development whose simplicity and elegance have literally taken my breath away! It is Exhibit, from the Simile project at MIT and its lead author David Huynh, whose contributions include the stellar Piggy Bank (semantic Web Firefox extension), Sifter (little known, but an excellent automatic Web data extractor), Babel (data format translator), Timeline (Javascript timeline creator), Ajax (toolset), Solvent (Web data extractor used by Piggy Bank) and Longwell (web-based RDF-powered faceted browser). David is the lead author on the first five tools listed. As a Ph.D. student at MIT, David is truly becoming one of the leading lights in practical semantic Web tool development. Exhibit only reinforces that reputation.

According to its Web site:

Exhibit is a lightweight structured data publishing framework that lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and Javascript code.

It’s like Google Maps and Timeline, but for structured data normally published through database-backed web sites. Exhibit essentially removes the need for a database or a server side web application. Its Javascript-based engine makes it easy for everyone who has a little bit of knowledge of HTML and small data sets to share them with the world and let people easily interact with them. . . .

“No Database, No Web Application” means that you can create your own exhibits using just a text editor. . . It’s quite easy to make exhibits. We even let you copy data straight out of a boring spreadsheet and convert it into an exhibit automatically. . . .

Exhibit consists of a bunch of Javascript files that you include in your web page. At load time, this Javascript code reads in one or more JSON data files that you link from within your web page and constructs a database implemented in Javascript right inside the browser of whoever visits your web page. It then dynamically re-constructs the web page as the visitor sorts and filters through the data. . . .

The advantages of Exhibit are as follows:

  • No traditional database technology involved even though Exhibit-embedding web pages appear as if they are backed by databases. So you don’t have to design any database, configure it, and maintain it. After all, if you only have a few dozens of things to publish rather than thousands, why would you spend so much effort in dealing with database technologies?
  • No server-side code required even though Exhibit-embedding web pages are heavily templated. So, there is no need to learn ASP, PHP, JSP, CGI, Velocity, etc. There is no need to worry which server-side scripting technology your hosting provider supports.
  • No need for web server if you only want to create exhibits and keep them on your own computer for your own use. They work straight from the file system.

We also provide a complementary service called Babel that lets you convert data from various sources, including tab-separated values (copied straight from spreadsheets) and Bibtex files, into formats that Exhibit understands.
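The load-time behavior described in that quote (read the JSON, build a small database right in the browser, then re-render as the visitor sorts and filters) can be sketched in a few lines. The Python below stands in for Exhibit's Javascript; the `filter_and_sort` helper, the sample items and the facet semantics (any checked value within a facet, all facets together) are assumptions chosen for illustration, not Exhibit's documented behavior.

```python
def filter_and_sort(items, selections, sort_by, descending=False):
    """Return the items a visitor would see after choosing facets and a sort.

    `selections` maps a property to the set of values checked in that facet.
    Assumed semantics: an item must match at least one checked value in every
    facet that has any checked values. Items missing `sort_by` go last.
    """
    def matches(item):
        for prop, chosen in selections.items():
            if not chosen:
                continue  # an empty facet does not filter anything
            value = item.get(prop)
            values = value if isinstance(value, list) else [value]
            if not any(v in chosen for v in values):
                return False
        return True

    kept = [item for item in items if matches(item)]
    with_key = [item for item in kept if sort_by in item]
    without_key = [item for item in kept if sort_by not in item]
    with_key.sort(key=lambda item: item[sort_by], reverse=descending)
    return with_key + without_key


# Hypothetical data of the kind a small exhibit might load from JSON.
ITEMS = [
    {"label": "A", "group": "Green", "year": 1801},
    {"label": "B", "group": "Blue", "year": 1797},
    {"label": "C", "group": "Green", "year": 1789},
    {"label": "D", "group": "Red"},
]

visible = filter_and_sort(ITEMS, {"group": {"Green", "Red"}}, "year",
                          descending=True)
print([item["label"] for item in visible])
```

Because the whole data set already sits in memory, every click is just another call like this one; with a few dozen items, the scale the Exhibit authors describe, that is effectively instant.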

The Exhibit Web site offers a growing list of helpful tutorials and some live examples of database-related “exhibits,” one of which is this U.S. Presidents’ example that shows maps, timelines, thumbnails and other nifty displays (see the actual site for the interactive displays):

You can get Exhibit today and embed it in your own Web site (more on this to come!).

To learn more about the background to this project, please see the paper Exhibit: Lightweight Structured Data Publishing, by David Huynh, Robert Miller and David Karger, submitted to WWW 2007.

Gentlemen, on behalf of the community, let me say, “Thanks! Most excellent work!” It’s discoveries like these that make the Internet so worthwhile.