Adopting Standardized Identification and Metadata
Since it’s becoming all the rage, I decided to go legit and get myself (and my blog) a standard ID and enable the site for easy metadata exchange.
OpenID is an open and free framework for users (and their blogs or Web sites) to establish a digital identity. OpenID starts with the concept that anyone can identify themselves on the Internet the same way websites do via a URI and that authentication is essential to prove ownership of the ID.
Sam Ruby provides a clear article on how to set up your own OpenID. One of the listing services is MyOpenID, from which you can get a free ID. With Sam’s instructions, you can also link a Web site or blog to this ID. When done, you can check whether your site is valid; you can do a quick check using my own ID (make sure and enter the full URI!). However, to get the valid ID message as shown below, you will need to have your standard (and secure) OpenID password handy.
unAPI addresses a different problem — the need for a uniform, simple method for copying rich digital objects out of any Web application. unAPI is a tiny HTTP API for the few basic operations necessary to copy discrete, identified content from any kind of web application. An interesting unAPI background and rationale paper from the Ariadne online library ezine is “Introducing unAPI”.
To invoke the service, I installed the unAPI Server, a WordPress plug-in from Michael J. Giarlo. The server provides records for each WordPress post and page in the following formats: OAI-Dublin Core (oai_dc), Metadata Object Description Schema (MODS), SRW-Dublin Core (srw_dc), MARCXML, and RSS. The specification makes use of LINK tag auto-discovery and an unendorsed microformat for machine-readable metadata records.
Validating the call is provided by the unAPI organization, with the return for this blog entry shown here (which is error free and only partially reproduced):
The actual unAPI results do not appear anywhere in your page, since the entire intent is to be machine readable. However, to show what can be expected from these five formats, I reproduce what the internal metadata listings for the five different formats are in these links:
unAPI adoption is just now beginning, and some apps like the Firefox Dublin Core Viewer add-on do not yet recognize the formats. But, because the specification is so simple and is being pushed by the library community, I suspect adoption will become very common. I encourage you to join!
Zotero, first released in October, is perhaps the best Firefox extension that most users have never heard of, unless you are an academic historian or social scientist, in which case Zotero is becoming quite the rage. It is also percolating into other academic fields, including law, math and science.
Zotero is a complete research citation platform, or what its developer, George Mason University’s Center for History and New Media (CHnM), calls, “The next-generation research tool.” Zotero allows academics and researchers to extract, manage, annotate, organize, export and publish citations in a variety of formats and from a variety of sources — all within the Firefox browser, and all while obviously the user is interacting directly with the Web.
What it Is
Like all Firefox extensions, Zotero is easy to install. From the Firefox add-on site or the Zotero site itself, a single click downloads the app to your browser. Upon a prompted re-start the app is now active. (Later alerts for any version upgrades are similarly automatic — as for any Firefox extension.)
Upon installation, Zotero puts an icon in your status bar and places new options on menus. When you encounter a site that Zotero supports (currently, mostly university libraries, but also Amazon and major publication outlets as well, totaling more than 150; here is a listing of Zotero’s so-called translators), you will see a folder symbol in your address bar telling you Zotero is active. A single click downloads the citations from that site automatically to your local system.
Citations have traditionally been one of the more “semantically challenging” data sets, with variations in style, order, format, presentation, coverage and you name it rampant. The fact that Zotero supports a given source site means that it understands these nuances and is ready to store the information in a single, canonical representation. Once downloaded, this citation representation can now be easily managed and organized. More importantly, you can now export this internal, standard representation into a multitude of export formats (including, most recently, MS Word). In short, like for-fee citation software in the past, Zotero now provides a free and universal mechanism for managing this chaos.
While the address icon acts to download one or more citations (yes, they also work in groups if there are multiple listings on the page!), choosing the Zotero icon itself invokes the full Zotero as an app within the browser, as this screen shot shows:
The left panes provide organization and management and tag support; the middle pane shows the active resources; and the right pane shows the structure associated with the active citation. This is all supported with attractive icons and logical tooltips and organization.
Zotero also offers utilities for creating your own scrapers (“translators”) for new sites not yet in the standard, supported roster. This capability is itself an extension to Zotero, called Scaffold, that also points to the building block nature of the core app. (Other utilities such as Solvent from MIT or others surely to come could either enhance or replace the current Scaffold framework.)
What is Impressive
Though supposedly in “beta,” Zotero already shows a completeness, sophistication and attention to detail not evident in most Firefox extensions. Indeed, this system approaches a complete application in its scope and professionalism. The fact it can be so easily installed and embedded in the browser itself is worth noting.
Firefox extensions have continuously evolved from single-function wonders to crude apps and now, as Zotero and a handful of other extensions show, complete functional applications. And, like OSes of the past, these extensions also adhere to standards and practices that make them pretty easy to use across applications. Firefox is indeed becoming a full-fledged platform.
This system is also using the new SQLite local database function (“mozStorage”) in Firefox 2.x to manage the local data (perhaps one of the first Firefox extensions to do so). This provides a clean and small install footprint for the extension, as well as opens it up to other standard data utilities.
What it Implies
So, what Zotero is exemplifying — beyond its own considerable capabilities — are some important implications. First, full-bodied apps, building on many piece-parts, can now be expected around the Fireflox platform. (Indeed, I earlier noted the emergence of such “Web OS” prospects as Parakey, whose developers also come from earlier Firefox legacies. One of those developers, Joe Hewitt, is also the author of the impressive Firebug extension.)
Second, the openness of Firefox for web-centric computing will, as I’ve stated before, continue to put competitive pressure on Microsoft’s Internet Explorer. This is good for all users at large and will continue to spur innovation.
Third, the pending version 2.0 of Zotero is slated to have a server-side component. What we are potentially seeing, then, are local client-side instantiations in the browser that can then communicate with remote data servers. This opens up a wealth of possibilities in social networking and collaboration.
And, last, and more specific to Zotero itself (but also enabled with Firefox’s native RDF support), we are now seeing a complete app framework for dealing with structured information and tagging on the Web. While clearly Zotero has a direct audience for citation management and research, the same infrastructure and techniques used by the system could become a general semantic Web or data framework for any other structured application.
Hmmm. Now that sounds like an opportunity . . . .
|An AI3 Jewels & Doubloon Winner|
The past couple of days has seen a flurry of activity and much excitement revolving around a new “database-free” mashup and publication system called Exhibit. Another in a string of sneaky-cool software from MIT’s Simile program (and written by David Huynh, a pragmatic semantic Web developer of the first order), Exhibit (and its sure to follow rapid innovations) will truly revolutionize Web publishing and the visualization and presentation of structured data. Exhibit is quite simply “structure for the masses.”
What is It?
Exhibit requires no traditional database technology, no server-side code, and no need for a web server. Here is a sampling of Exhibit‘s current capabilities:
Exhibit is as simple as defining a spreadsheet; after that you have a complete database! And, if you want to get wild and crazy with presentation and display, then that is easy as well!
What Are Some Examples?
Though Exhibit has been released barely one month, already there are some pretty impressive examples:
What Are People Saying?
Granted, we’re only talking about the last 24 hours or so, but interesting people are noticing and commenting on this phenomenon:
What is Coming?
Johan Sundström has created an Instant Google Spreadsheets Exhibit, which lets you turn any Google spreadsheet (with certain formatting requirements) into an “exhibit” just by pasting in its public feed URL with immediate faceted browsing; maps and timelines are forthcoming.
Well, a WordPress plug-in is in the works (to be announced, with Derek helping to take the lead on it). Though incorporation into a blog is easy, it does require the author to have system administration rights and access to the WordPress server. A plug-in could remove those hurdles and make usage still easier.
Exhibit‘s very helpful online tutorials are being expanded, particularly with more examples and more templates. For those seriously interested in the technology, definitely monitor the Simile project site.
There continues to be activity and expansion of the Babel translation formats. You can now convert BibTeX, Excel, Notation 3 (N3), RDF/XML or tab-separated values (TSV) to a choice of Exhibit JSON , N3 or RDF/XML. And, since Exhibit itself internally stores its data representation as triples, it is tantalizing to think that another Simile project, RDFizers, with its impressive storehose of RDF converters, may also be more closely tied with Babel. Is it possible that Exhibit JSON may become the lingua franca of small-scale data representation formats?
And, within the project team of Huynh and his Ph.D. thesis advisor, David Karger, there are also efforts underway to extend the syntax and functionality of Exhibit. We’ve just seen the expansion to direct Google spreadsheet support, and support for more spreadsheet functionality is desired, including possible string concatenation and numeric operations.
Exhibit itself has been designed with extensibility in mind. Its linkage to Timeline, for example, is one such example. What will be critical in the weeks and months ahead is the development of a developer and user community surrounding Exhibit. There is presently a fairly active mailing list and I’m sure the MIT folks would welcome serious contributions.
Finally, other aspects of the Simile project itself and related intiatives at MIT have direct and growing ties to Exhibit both in terms of team members and developers and in terms of philosophy. You may want to check out these additional MIT projects including Longwell, Piggy Bank, Solvent, Semantic Bank, Welkin, DSpace, Haystack, Dwell, Ajax, Sifter, Relo plugin, Re:Search, Chickenfoot, and LAPIS. This is a program on the move, to which the closest attention is warranted.
Expected Growing Pains
There are some known issues sometimes with display in Safari and Opera browsers; these are being worked on and should be resolved shortly. There are also some style issues and conflicts when embedding in blogs (easily fixed with CSS modifications). There are likely performance problems when data sets get into the hundreds or thousands, but that exceeds Exhibit‘s lightweight objectives anyway. There may be other problems that emerge as use broadens.
These issues are to be expected and should not diminish playing with the system immediately. You’ll be amazed at what you can do, and how rapidly with so little code.
It has been a fun few days. It’s exciting to be able to be a camp follower during one of those seminal moments in Web development. And, so I say to David and colleagues at MIT and the band of merry collaborators on their mailing list: Thanks! This is truly cool.
|An AI3 Jewels & Doubloon Winner|
My earlier post gushed about the new Exhibit lightweight, structured data publishing system for Web pages from MIT’s Simile project. Because I was so impressed with the project’s examples, I decided to convert my existing 350+ semantic Web tools listing, Sweet Tools, to an online database. I also wanted to maintain the Google spreadsheet listing for others to make new tools suggestions.
Please see the NEW and IMPROVED Sweet Tools here, and now in database format! (And now updated to 378 tools!)
The remainder of this posting describes how I did this, following the online Exhibit tutorials. To my knowledge, this is the first time that an “exhibit” has been embedded within a blog system (WordPress in my example).
Five Easy Steps
The remote data feed from Google spreadsheets is a very nice feature that also removes one further step from the standard (though simple!) Exhibit set-up.
You can used mixed case in your attribute descriptors (but no spaces!) for better label displays (such as capitalization). However, for some reason (I suspect it’s an early bug) you can not use mixed case on the first Google field, which also defaults to “label”.
The remote Exhibit styles should probably be called out separately and better commented. I’m still seeing some squirrelly styles behavior. (For example, the right-margin browse and filter selection box has some overlapping characters.) Though I can inspect the styles with tools like Web Developer, it is tricky to make local changes.
Like Tim Isenheim’s Timeline plug-in for WordPress, it probably makes sense for someone with PHP experience (not me!) to make an Exhibit plug-in as well.
Lastly, the existing Exhibit tutorials are very helpful, but more would also help. (Hint, hint David!) More examples of filtering, lenses, and layout templates would be especially helpful.
However, the most important observation is that Exhibit is a much more useful and flexible presentation format than simple spreadsheets. Though as a first-time experience it took me some trial-and-error to work out the details, it is really very easy and straightforward to add such capabilities to a WordPress blog.
According to its Web site:
“No Database, No Web Application” means that you can create your own exhibits using just a text editor. . . It’s quite easy to make exhibits. We even let you copy data straight out of a boring spreadsheet and convert it into an exhibit automatically. . . .
The advantages of Exhibit are as follows:
- No traditional database technology involved even though Exhibit-embedding web pages appear as if they are backed by databases. So you don’t have to design any database, configure it, and maintain it. After all, if you only have a few dozens of things to publish rather than thousands, why would you spend so much effort in dealing with database technologies?
- No server-side code required even though Exhibit-embedding web pages are heavily templated. So, there is no need to learn ASP, PHP, JSP, CGI, Velocity, etc. There is no need to worry which server-side scripting technology your hosting provider supports.
- No need for web server if you only want to create exhibits and keep them on your own computer for your own use. They work straight from the file system.
We also provide a complementary service called Babel that lets you convert data from various sources, including tab-separated values (copied straight from spreadsheets) and Bibtex files, into formats that Exhibit understands.
The Exhibit Web site offers a growing list of helpful tutorials and some live examples of database-related “exhibits,” one of which is this U.S. Presidents’ example that shows maps, timelines, thumbnails and other nifty displays (see the actual site for the interactive displays):
You can get Exhibit today and embed it in your own Web site (more on this to come!).
To learn more about the background to this project, please see the submitted paper, Exhibit: Lightweight Structured Data Publishing, submitted to WWW, 2007, by David Huynh, Robert Miller, and David Karger.
Gentlemen, on behalf of the community, let me say, “Thanks! Most excellent work!” It’s discoveries like these that make the Internet so worthwhile.