Posted:February 19, 2007

Jewels & Doubloons

We’re So Focused on Plowing Ahead We Often Don’t See The Value Around Us

For some time now I have been wanting to dedicate a specific category on this blog to showcasing tools or notable developments. It is clear that tools compilations for the semantic Web — such as the comprehensive Sweet Tools listing — or new developments have become one focus of this site. But as I thought about this focus, I was not really pleased with the idea of a simple tag of “review” or “showcase” or anything of that sort. The reason such terms did not turn my crank was my own sense that the items that were (and are) capturing my interest were also items of broader value.

Please, don’t get me wrong. One (among many) observations within the past few months has been the amazing diversity, breadth and number of communities, and most importantly, the brilliance and innovation that I was seeing. My general sense in this process of discovery is that I have kind of stumbled blindly into many existing — and sometimes mature — communities that have existed for some time, but for which I was not part of nor privy to their insights and advances. These communities are seemingly endless and extend to topics such as semantic Web and its constituent components, Web 2.0, agile development, Ruby, domain-specific languages, behavior driven development, Ajax, JavaScript frameworks and toolkits, Rails, extractors/wrappers/data mining, REST, why the lucky stiff, you name it.

Announcing ‘Jewels & Doubloons’

As I have told development teams in the past, as you cross the room to your workstation each morning look down and around you. The floor is literally strewn with jewels, pearls and doubloons –tremendous riches based on work that has come before — and all we have to do is take the time to look, bend over, investigate and pocket those riches. It is that metaphor, plus in honor of Fat Tuesday tomorrow, that I name my site awards ‘Jewels & Doubloons.’

Jewels & Doubloons (or J & D for short) may get awarded to individual tools, techniques, programming frameworks, screencasts, seminal papers and even blog entries — in short, anything that deserves bending over, inspecting and taking some time with, and perhaps even adopting. In general, the items so picked will be more obscure (at least to me, though they may be very well known to their specific communities), but what I feel to be of broader cross-community interest. Selection is not based on anything formal.

Why So Many Hidden Riches?

I’ll also talk on occasion as to why these riches of such potential advantage and productivity to the craft of software development may be so poorly known or overlooked by the general community. In fact, while many can easily pick up the mantra of adhering to DRY, perhaps as great of a problem is NIH — reinventing a software wheel due to pride, ignorance, discontent, or simply the desire to create for creation’s sake. Each of these reasons can cause the lack of awareness and thus lack of use of existing high value.

There are better ways and techniques than others to find and evaluate hidden gems. One of the first things any Mardi Gras partygoer realizes is not to reach down with one’s hand to pick up the doubloons and plastic necklaces flung from the krewes’ floats. Ouch! and count the loss of fingers! Real swag aficionados at Mardi Gras learn how to air snatch and foot stomp the manna dropping from heaven. Indeed, with proper technique, one can end up with enough necklaces to look like a neon clown and enough doubloons to trade for free drinks and bar top dances. Proper technique in evaluating Jewels & Doubloons is one way to keep all ten fingers while getting rich the lazy man’s way.
Jewels & Doubloons are designated with either a medium-sized Jewels & Doubloons or small-sized (see below) icon and also tagged as such.

Past Winners

I’ve gone back over the posts on AI3 and have postdated J & D awards for these items:

Posted:February 5, 2007

Collex is the Next Example in a Line of Innovative Tools from the Humanities

I seem to be on a string of discovery of new tools from unusual sources — that is, at least, unusual for me. For some months now I have been attempting to discover the “universe” of semantic Web tools, beginning obviously with efforts that self-label in that category. (See my ongoing Sweet Tools comprehensive listing of semantic Web and related tools.) Then, it was clear that many “Web 2.0″ tools potentially contribute to this category via tagging, folksonomies, mashups and the like. I’ve also been focused on language processing tools that relate to this category in other ways (a topic for another day.) Most recently, however, I have discovered a rich vein of tools in areas that take pragmatic approaches to managing structure and metadata, but often with little mention of the semantic Web or Web 2.0. And in that vein, I continue to encounter impressive technology developed within the humanities and library science (see, for example, the recent post on Zotero).

To many of you, the contributions from these disciplines have likely been obvious for years. I admit I’m a slow learner. But I also suspect there is much that goes on in fields outside our normal ken. My own mini-epiphany is that I also need to be looking at the pragmatists within many different communities — some of whom eschew the current Sem Web and Web 2.0 hype — yet are actually doing relevant and directly transferable things within their own orbits. I have written elsewhere about the leadership of physicists and biologists in prior Internet innovations. I guess the thing that has really surprised me most recently is the emerging prominence of the humanities (I feel like the Geico caveman saying that).

Collex is the Next in A Fine Legacy

The latest discovery is Collex, a set of tools for COLLecting and EXhibiting information in the humanities. According to Bethany Nowviskie, a lecturer in media studies at the University of Virginia, and a lead designer of the effort in her introduction, COLLEX: Semantic Collections & Exhibits for the Remixable Web:

Collex is a set of tools designed to aid students and scholars working in networked archives and federated repositories of humanities materials: a sophisticated collections and exhibits mechanism for the semantic web. It allows users to collect, annotate, and tag online objects and to repurpose them in illustrated, interlinked essays or exhibits. Collex functions within any modern web browser without recourse to plugins or downloads and is fully networked as a server-side application. By saving information about user activity (the construction of annotated collections and exhibits) as “remixable” metadata, the Collex system writes current practice into the scholarly record and permits knowledge discovery based not only on the characteristics or “facets” of digital objects, but also on the contexts in which they are placed by a community of scholars. Collex builds on the same semantic web technologies that drive MIT’s SIMILE project and it brings folksonomy tagging to trusted, peer-reviewed scholarly archives. Its exhibits-builder is analogous to high-end digital curation tools currently affordable only to large institutions like the Smithsonian. Collex is free, generalizable, and open source and is presently being implemented in a large-scale pilot project under the auspices of NINES.

(BTW, NINES stands for the Networked Infrastructure for Nineteenth-century Electronic Scholarship, a trans-Atlantic federation of scholars.)

The initial efforts that became Collex were to establish frameworks and process within this community, not tools. But the group apparently recognized the importance of leverage and enablers (i.e, tools) and hired Erik Hatcher, a key contributor to the Apache open-source Lucene text-indexing engine and co-author of Lucene in Action, to spearhead development of an actually usable tool. Erik proceeded to grab best-of-breed stuff in area such as Ruby and Rails and Solr (a faceted enhancement to Lucene that has just graduated from the Apache incubator), and then to work hard on follow-on efforts such as Flare (a display framework) to create the basics of Collex. A sample screenshot of the application is shown below:

The Collex app is still flying under the radar, but it has sufficient online functionality today to support annotation, faceting, filtering, display, etc. Another interesting aspect of the NINES project (but not apparently a programmatic capability of the Collex software itself) is it only allows “authoritative” community listings, an absolute essential for scaling the semantic Web.

You can play with the impressive online demo of the Collex faceted browser at the NINES Web site today, though clearly the software is still undergoing intense development. I particularly like its clean design and clear functionality. The other aspect of this software that deserves attention is that it is a server-side option with cross-browser Ajax, without requiring any plugins. It works equally within Safari, Firefox and Windows IE. And, like the Zotero research citation tool, this basic framework could easily lend itself to managing structured information in virtually any other domain.

Collex is one of the projects of Applied Research in Patacriticism, a software development research team located at the University of Virginia and funded through an award to professor Jerome McGann from the Andrew Mellon Foundation. (“Shuuu, sheeee. Impressive. Most impressive.” — Darth Vader)

(BTW, forgive my sexist use of “guys” in this post’s title; I just couldn’t get a sex-neutral title to work as well for me!)

Jewels & Doubloons An AI3 Jewels & Doubloon Winner
Posted:January 31, 2007

Adopting Standardized Identification and Metadata

Since it’s becoming all the rage, I decided to go legit and get myself (and my blog) a standard ID and enable the site for easy metadata exchange.

OpenID

OpenID is an open and free framework for users (and their blogs or Web sites) to establish a digital identity. OpenID starts with the concept that anyone can identify themselves on the Internet the same way websites do via a URI and that authentication is essential to prove ownership of the ID.

Sam Ruby provides a clear article on how to set up your own OpenID. One of the listing services is MyOpenID, from which you can get a free ID. With Sam’s instructions, you can also link a Web site or blog to this ID. When done, you can check whether your site is valid; you can do a quick check using my own ID (make sure and enter the full URI!). However, to get the valid ID message as shown below, you will need to have your standard (and secure) OpenID password handy.

unAPI

unAPI addresses a different problem — the need for a uniform, simple method for copying rich digital objects out of any Web application. unAPI is a tiny HTTP API for the few basic operations necessary to copy discrete, identified content from any kind of web application. An interesting unAPI background and rationale paper from the Ariadne online library ezine is “Introducing unAPI”.

To invoke the service, I installed the unAPI Server, a WordPress plug-in from Michael J. Giarlo. The server provides records for each WordPress post and page in the following formats: OAI-Dublin Core (oai_dc), Metadata Object Description Schema (MODS), SRW-Dublin Core (srw_dc), MARCXML, and RSS. The specification makes use of LINK tag auto-discovery and an unendorsed microformat for machine-readable metadata records.

Validating the call is provided by the unAPI organization, with the return for this blog entry shown here (which is error free and only partially reproduced):

The actual unAPI results do not appear anywhere in your page, since the entire intent is to be machine readable. However, to show what can be expected from these five formats, I reproduce what the internal metadata listings for the five different formats are in these links:

MARCXML:

http://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= http://www.mkbergman.com/?p=320&format=marcxml

MODS:

http://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= http://www.mkbergman.com/?p=320&format=mods

OAI-DC:

http://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= http://www.mkbergman.com/?p=320&format=oai_dc

RSS:

http://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= http://www.mkbergman.com/?p=320&format=rss

SRW-DC:

http://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= http://www.mkbergman.com/?p=320&format=srw_dc

unAPI adoption is just now beginning, and some apps like the Firefox Dublin Core Viewer add-on do not yet recognize the formats. But, because the specification is so simple and is being pushed by the library community, I suspect adoption will become very common. I encourage you to join!

Posted:January 27, 2007

Zotero is Perhaps the Most Polished Firefox Extension Yet

Zotero, first released in October, is perhaps the best Firefox extension that most users have never heard of, unless you are an academic historian or social scientist, in which case Zotero is becoming quite the rage. It is also percolating into other academic fields, including law, math and science.

Zotero is a complete research citation platform, or what its developer, George Mason University’s Center for History and New Media (CHnM), calls, “The next-generation research tool.” Zotero allows academics and researchers to extract, manage, annotate, organize, export and publish citations in a variety of formats and from a variety of sources — all within the Firefox browser, and all while obviously the user is interacting directly with the Web.

What it Is

Like all Firefox extensions, Zotero is easy to install. From the Firefox add-on site or the Zotero site itself, a single click downloads the app to your browser. Upon a prompted re-start the app is now active. (Later alerts for any version upgrades are similarly automatic — as for any Firefox extension.)

Upon installation, Zotero puts an icon in your status bar and places new options on menus. When you encounter a site that Zotero supports (currently, mostly university libraries, but also Amazon and major publication outlets as well, totaling more than 150; here is a listing of Zotero’s so-called translators), you will see a folder symbol in your address bar telling you Zotero is active. A single click downloads the citations from that site automatically to your local system.

Citations have traditionally been one of the more “semantically challenging” data sets, with variations in style, order, format, presentation, coverage and you name it rampant. The fact that Zotero supports a given source site means that it understands these nuances and is ready to store the information in a single, canonical representation. Once downloaded, this citation representation can now be easily managed and organized. More importantly, you can now export this internal, standard representation into a multitude of export formats (including, most recently, MS Word). In short, like for-fee citation software in the past, Zotero now provides a free and universal mechanism for managing this chaos.

While the address icon acts to download one or more citations (yes, they also work in groups if there are multiple listings on the page!), choosing the Zotero icon itself invokes the full Zotero as an app within the browser, as this screen shot shows:

The left panes provide organization and management and tag support; the middle pane shows the active resources; and the right pane shows the structure associated with the active citation. This is all supported with attractive icons and logical tooltips and organization.

Zotero also offers utilities for creating your own scrapers (“translators”) for new sites not yet in the standard, supported roster. This capability is itself an extension to Zotero, called Scaffold, that also points to the building block nature of the core app. (Other utilities such as Solvent from MIT or others surely to come could either enhance or replace the current Scaffold framework.)

What is Impressive

Though supposedly in “beta,” Zotero already shows a completeness, sophistication and attention to detail not evident in most Firefox extensions. Indeed, this system approaches a complete application in its scope and professionalism. The fact it can be so easily installed and embedded in the browser itself is worth noting.

Firefox extensions have continuously evolved from single-function wonders to crude apps and now, as Zotero and a handful of other extensions show, complete functional applications. And, like OSes of the past, these extensions also adhere to standards and practices that make them pretty easy to use across applications. Firefox is indeed becoming a full-fledged platform.

This system is also using the new SQLite local database function (“mozStorage”) in Firefox 2.x to manage the local data (perhaps one of the first Firefox extensions to do so). This provides a clean and small install footprint for the extension, as well as opens it up to other standard data utilities.

What it Implies

So, what Zotero is exemplifying — beyond its own considerable capabilities — are some important implications. First, full-bodied apps, building on many piece-parts, can now be expected around the Fireflox platform. (Indeed, I earlier noted the emergence of such “Web OS” prospects as Parakey, whose developers also come from earlier Firefox legacies. One of those developers, Joe Hewitt, is also the author of the impressive Firebug extension.)

Second, the openness of Firefox for web-centric computing will, as I’ve stated before, continue to put competitive pressure on Microsoft’s Internet Explorer. This is good for all users at large and will continue to spur innovation.

Third, the pending version 2.0 of Zotero is slated to have a server-side component. What we are potentially seeing, then, are local client-side instantiations in the browser that can then communicate with remote data servers. This opens up a wealth of possibilities in social networking and collaboration.

And, last, and more specific to Zotero itself (but also enabled with Firefox’s native RDF support), we are now seeing a complete app framework for dealing with structured information and tagging on the Web. While clearly Zotero has a direct audience for citation management and research, the same infrastructure and techniques used by the system could become a general semantic Web or data framework for any other structured application.

Hmmm. Now that sounds like an opportunity . . . .

Jewels & Doubloons An AI3 Jewels & Doubloon Winner
Posted:January 24, 2007

Structure for the Masses

or

Instant Mashups between WordPress, Google Spreadsheets and Exhibit

The past couple of days has seen a flurry of activity and much excitement revolving around a new “database-free” mashup and publication system called Exhibit. Another in a string of sneaky-cool software from MIT’s Simile program (and written by David Huynh, a pragmatic semantic Web developer of the first order), Exhibit (and its sure to follow rapid innovations) will truly revolutionize Web publishing and the visualization and presentation of structured data. Exhibit is quite simply “structure for the masses.”

What is It?

With just a few simple steps, even the most novice blog author can now embed structured data — such as sortable and filtered table displays, thumbnails, maps, timelines and histograms — in her blog posts and blog pages. Using Exhibit, you can now create rich data visualizations of web pages using only HTML (and optional CSS and Javascript code).

Exhibit requires no traditional database technology, no server-side code, and no need for a web server. Here is a sampling of Exhibit‘s current capabilities:

  • No external databases or hassles
  • Data filtering and sorting
  • Simple HTML embeds and calls
  • Automatic and dynamic page layouts and results rendering
  • Completely tailorable (with CSS and Javascript)
  • Direct updates and presentations (mashups) from Google spreadsheets
  • Pre-prepared timelines, map mashups, tabular display options, and Web page formatting
  • Easily embedded in WordPress blogs (see the first tutorial here).

Exhibit is as simple as defining a spreadsheet; after that you have a complete database! And, if you want to get wild and crazy with presentation and display, then that is easy as well!

What Are Some Examples?

Though Exhibit has been released barely one month, already there are some pretty impressive examples:

What Are People Saying?

Granted, we’re only talking about the last 24 hours or so, but interesting people are noticing and commenting on this phenomenon:

  • Ajaxian“Exhibit is a new project that lets you build rich sorting and filtering data applications in a simple way.”
  • Danny Ayers“Although the person using these tools doesn’t need to think about gobbledegook like RDF, when they use the tools, they are putting first class data on the web in a Semantic Web-friendly fashion.”
  • David Huynh“Now, we’re not just talking about the usual suspects of domains like photos and bookmarks. We’re talking about any arbitrary domain of data — Yes, the real world data. The data that people care about. I am hoping that we can create tools that let the average blogger easily publish structured data without ever having to coin URIs or design or pick ontologies — But there it is: this is, in my humble opinion, a beginning of something great.”
  • Kyler “If you don’t see the value of this, you are a fool.”
  • Derek Kinsman“Exhibit is an amazing web app … I am beginning to work alongside my WordPress mates in the hopes that we can create some sort of Administration area inside WordPress that connects to the Google accounts. Right inside WP. Also, we’re attempting to create some sort of plugin or downloadable template to which Exhibit can run mostly out of the box.”

What is Coming?

Johan Sundström has created an Instant Google Spreadsheets Exhibit, which lets you turn any Google spreadsheet (with certain formatting requirements) into an “exhibit” just by pasting in its public feed URL with immediate faceted browsing; maps and timelines are forthcoming.

Well, a WordPress plug-in is in the works (to be announced, with Derek helping to take the lead on it). Though incorporation into a blog is easy, it does require the author to have system administration rights and access to the WordPress server. A plug-in could remove those hurdles and make usage still easier.

Exhibit‘s very helpful online tutorials are being expanded, particularly with more examples and more templates. For those seriously interested in the technology, definitely monitor the Simile project site.

There continues to be activity and expansion of the Babel translation formats. You can now convert BibTeX, Excel, Notation 3 (N3), RDF/XML or tab-separated values (TSV) to a choice of Exhibit JSON , N3 or RDF/XML. And, since Exhibit itself internally stores its data representation as triples, it is tantalizing to think that another Simile project, RDFizers, with its impressive storehose of RDF converters, may also be more closely tied with Babel. Is it possible that Exhibit JSON may become the lingua franca of small-scale data representation formats?

And, within the project team of Huynh and his Ph.D. thesis advisor, David Karger, there are also efforts underway to extend the syntax and functionality of Exhibit. We’ve just seen the expansion to direct Google spreadsheet support, and support for more spreadsheet functionality is desired, including possible string concatenation and numeric operations.

Exhibit itself has been designed with extensibility in mind. Its linkage to Timeline, for example, is one such example. What will be critical in the weeks and months ahead is the development of a developer and user community surrounding Exhibit. There is presently a fairly active mailing list and I’m sure the MIT folks would welcome serious contributions.

Finally, other aspects of the Simile project itself and related intiatives at MIT have direct and growing ties to Exhibit both in terms of team members and developers and in terms of philosophy. You may want to check out these additional MIT projects including Longwell, Piggy Bank, Solvent, Semantic Bank, Welkin, DSpace, Haystack, Dwell, Ajax, Sifter, Relo plugin, Re:Search, Chickenfoot, and LAPIS. This is a program on the move, to which the closest attention is warranted.

Expected Growing Pains

There are some known issues sometimes with display in Safari and Opera browsers; these are being worked on and should be resolved shortly. There are also some style issues and conflicts when embedding in blogs (easily fixed with CSS modifications). There are likely performance problems when data sets get into the hundreds or thousands, but that exceeds Exhibit‘s lightweight objectives anyway. There may be other problems that emerge as use broadens.

These issues are to be expected and should not diminish playing with the system immediately. You’ll be amazed at what you can do, and how rapidly with so little code.

It has been a fun few days. It’s exciting to be able to be a camp follower during one of those seminal moments in Web development. And, so I say to David and colleagues at MIT and the band of merry collaborators on their mailing list: Thanks! This is truly cool.

Jewels & Doubloons An AI3 Jewels & Doubloon Winner