Posted: February 7, 2007

Sweet Tools Listing

This AI3 blog maintains Sweet Tools, the largest available listing of semantic Web and related tools, with about 800 entries. Most are open source. Click here to see the current listing!

An update to the Sweet Tools listing of semantic Web and related tools has been posted. Forty-two tools were added, bringing the running total to 420. As always, please provide any corrections or additions here.

Posted by AI3's author, Mike Bergman Posted on February 7, 2007 at 3:51 pm in Semantic Web Tools | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/335/sweet-tools-updated-to-420-tools/
The URI to trackback this post is: https://www.mkbergman.com/335/sweet-tools-updated-to-420-tools/trackback/
Posted: February 6, 2007

When I first read Roy Fielding’s 2000 Ph.D. thesis a few months back, Architectural Styles and the Design of Network-based Software Architectures, which coined the REpresentational State Transfer (REST) acronym, I was mostly trying to understand what I dimly perceived as one side in an argument between the WS* brand of Web services crowd and a passionate group of “simpler” HTTP-oriented advocates. (Tim Bray’s 2004 piece helps draw the distinction, mostly related to complexity, but also the use of messaging and states with resulting server dependencies.) I found Fielding’s thesis to be very interesting and historically informative about the standards development in the early Web (yes, I read the whole thing), but it was also frankly above my sophistication level and pay grade at the time.

I’ve come to learn much more about these differences and debates in the intervening months. Recently, one of the best sources I’ve discovered that explicates the RESTful approach in a way I find very understandable and approachable is Bill Higgins’ two-part Ajax and REST series on IBM’s developerWorks.

In Part I, Advantages of the Ajax/REST Architectural Style for Immersive Web Applications, Bill talks about the evolution toward “immersive” Web applications and how slipping into server-side architectural dependencies compromises reliability and scalability. He also talks about dynamic content and its particular challenges to cacheability and performance. He notes how highly interactive (“immersive”) applications practically forced a server-side approach during the Web’s early development, but that Ajax now allows a return to stateless, more scalable server-side architectures.

“The fact that Ajax lets you interact with a server without a full refresh puts the option of a stateful client back on the table. This has profound implications for the architectural possibilities for dynamic immersive Web applications: Because application resource and data resource binding is shifted to the client side, these applications can enjoy the best of both worlds — the dynamic, personalized user experience we expect of immersive Web applications and the simple, scalable architecture we expect from RESTful applications.”

And, of course, screen refreshes are also reduced, improving the user’s experience. As someone who was directly responsible for how a very large, individualized app moved in the direction of server-side dominance, I find these observations resonate loudly.
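
To make the architectural point concrete, the following minimal sketch (in TypeScript, using the modern fetch API as a stand-in for the XMLHttpRequest calls of Higgins’ day) shows an Ajax client holding its own application state and pulling down just the resource it needs from a stateless, cacheable endpoint. The /api/items URL and the Item shape are hypothetical illustrations, not anything drawn from Higgins’ articles:

    // Hypothetical RESTful resource; the server keeps no per-client session state.
    interface Item {
      id: number;
      title: string;
      updated: string;
    }

    // Application state lives on the client, not in a server session.
    const itemCache = new Map<number, Item>();

    async function showItem(id: number): Promise<void> {
      let item = itemCache.get(id);
      if (!item) {
        // A plain GET of an addressable resource: no full-page refresh, and the
        // response is cacheable by any intermediary because the URI says it all.
        const response = await fetch(`/api/items/${id}`, {
          headers: { Accept: "application/json" },
        });
        item = (await response.json()) as Item;
        itemCache.set(id, item);
      }
      // Update only the fragment of the page that changed.
      const target = document.getElementById("item-detail");
      if (target) {
        target.textContent = `${item.title} (last updated ${item.updated})`;
      }
    }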

In Part II, Meeting the Challenges of Ajax Software Development, Bill goes on to discuss how to evaluate and, if warranted, adopt Ajax-style development within organizations that have traditionally used server-side Web applications. His logical recommendations for moving forward include starting small, using frameworks and using tools. Perhaps most importantly, he observes that new team members may need to be recruited, since RESTful mindsets, technologies, architectures and paradigms differ from traditional enterprise development.

Thus, both of Higgins’ articles address the questions of network efficiency raised by Fielding in his thesis (p. 32):

“An interesting observation about network-based applications is that the best application performance is obtained by not using the network. This essentially means that the most efficient architectural styles for a network-based application are those that can effectively minimize use of the network when it is possible to do so, through reuse of prior interactions (caching), reduction of the frequency of network interactions in relation to user actions (replicated data and disconnected operation), or by removing the need for some interactions by moving the processing of data closer to the source of the data (mobile code [or Ajax!; comment added]).”

One criticism of Ajax is the need to account for various browser differences and incompatibilities. While these differences are real, they are also narrowing and, in any case, arise even for relatively simple HTML or CSS rendering independent of Ajax. I also believe it can be argued that some of the more prominent Javascript libraries such as Dojo or Prototype are themselves doing much to smooth over browser differences. It seems to me that, since good Web-wide design already calls for attention to cross-browser differences, the increase in scalability and resilience occasioned by RESTful designs warrants moving in that direction.
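
As a concrete illustration of the kind of incompatibility those libraries paper over, older Internet Explorer versions exposed XMLHttpRequest only as an ActiveX object, so portable Ajax code of that era had to branch roughly as sketched below (a generic shim, not taken from any particular library’s source):

    // Rough sketch of the cross-browser shim that libraries such as Dojo or
    // Prototype encapsulate so that application code never has to care.
    function createXhr(): XMLHttpRequest {
      if (typeof XMLHttpRequest !== "undefined") {
        // Firefox, Safari, Opera, IE 7 and later
        return new XMLHttpRequest();
      }
      // IE 5/6 provided the same functionality only through ActiveX.
      return new (window as any).ActiveXObject("Microsoft.XMLHTTP");
    }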

The more fundamental point is whether time to market and development time can co-exist with a start-small, incrementalist approach. Another problem with traditional enterprise development is that the languages and their IDEs and supporting infrastructure have not been designed from the ground up to support RESTful designs and architectures. Ruby on Rails, for example, may be a far superior choice, but it is difficult for existing incumbents with market share to junk their prior investments. One option for existing Java shops could be to carefully re-factor the existing code base and segregate out the Web interaction layer to use JRuby.

As Fielding succinctly states: “Semantics are a by-product of the act of assigning resource identifiers and populating those resources with representations.” Like the mediations of representational states within the Web itself, the interoperability of the semantic Web lends itself to simple conceptual design (though, of course, potentially complex implementation).

(BTW, because of this Higgins series I went back and re-read Fielding’s thesis. I now can certainly appreciate more of the arguments. I suspect the Fielding document is one of those that has insights and a mindset that warrant frequent re-reading in the years to come, esp. Chapters 5 and 6. For example, it has really aided my understanding of the Rails CRUD approach and its mapping to the HTTP and SQL verbs, and of the wider adoption of Javascript v. Java for embedded Web apps.)

Posted by AI3's author, Mike Bergman Posted on February 6, 2007 at 11:41 am in Adaptive Innovation, Software Development | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/333/and-now-you-know-the-rest-of-the-story/
The URI to trackback this post is: https://www.mkbergman.com/333/and-now-you-know-the-rest-of-the-story/trackback/
Posted: February 5, 2007

Collex is the Next Example in a Line of Innovative Tools from the Humanities

I seem to be on a string of discoveries of new tools from unusual sources — that is, at least, unusual for me. For some months now I have been attempting to discover the “universe” of semantic Web tools, beginning obviously with efforts that self-label in that category. (See my ongoing Sweet Tools comprehensive listing of semantic Web and related tools.) Then, it was clear that many “Web 2.0” tools potentially contribute to this category via tagging, folksonomies, mashups and the like. I’ve also been focused on language processing tools that relate to this category in other ways (a topic for another day). Most recently, however, I have discovered a rich vein of tools in areas that take pragmatic approaches to managing structure and metadata, but often with little mention of the semantic Web or Web 2.0. And in that vein, I continue to encounter impressive technology developed within the humanities and library science (see, for example, the recent post on Zotero).

To many of you, the contributions from these disciplines have likely been obvious for years. I admit I’m a slow learner. But I also suspect there is much that goes on in fields outside our normal ken. My own mini-epiphany is that I also need to be looking at the pragmatists within many different communities — some of whom eschew the current Sem Web and Web 2.0 hype — yet are actually doing relevant and directly transferable things within their own orbits. I have written elsewhere about the leadership of physicists and biologists in prior Internet innovations. I guess the thing that has really surprised me most recently is the emerging prominence of the humanities (I feel like the Geico caveman saying that).

Collex is the Next in a Fine Legacy

The latest discovery is Collex, a set of tools for COLLecting and EXhibiting information in the humanities. According to Bethany Nowviskie, a lecturer in media studies at the University of Virginia and a lead designer of the effort, in her introduction, COLLEX: Semantic Collections & Exhibits for the Remixable Web:

Collex is a set of tools designed to aid students and scholars working in networked archives and federated repositories of humanities materials: a sophisticated collections and exhibits mechanism for the semantic web. It allows users to collect, annotate, and tag online objects and to repurpose them in illustrated, interlinked essays or exhibits. Collex functions within any modern web browser without recourse to plugins or downloads and is fully networked as a server-side application. By saving information about user activity (the construction of annotated collections and exhibits) as “remixable” metadata, the Collex system writes current practice into the scholarly record and permits knowledge discovery based not only on the characteristics or “facets” of digital objects, but also on the contexts in which they are placed by a community of scholars. Collex builds on the same semantic web technologies that drive MIT’s SIMILE project and it brings folksonomy tagging to trusted, peer-reviewed scholarly archives. Its exhibits-builder is analogous to high-end digital curation tools currently affordable only to large institutions like the Smithsonian. Collex is free, generalizable, and open source and is presently being implemented in a large-scale pilot project under the auspices of NINES.

(BTW, NINES stands for the Networked Infrastructure for Nineteenth-century Electronic Scholarship, a trans-Atlantic federation of scholars.)

The initial efforts that became Collex were to establish frameworks and process within this community, not tools. But the group apparently recognized the importance of leverage and enablers (i.e., tools) and hired Erik Hatcher, a key contributor to the Apache open-source Lucene text-indexing engine and co-author of Lucene in Action, to spearhead development of an actually usable tool. Erik proceeded to grab best-of-breed stuff in areas such as Ruby and Rails and Solr (a faceted enhancement to Lucene that has just graduated from the Apache incubator), and then to work hard on follow-on efforts such as Flare (a display framework) to create the basics of Collex. A sample screenshot of the application is shown below:

The Collex app is still flying under the radar, but it has sufficient online functionality today to support annotation, faceting, filtering, display, etc. Another interesting aspect of the NINES project (but not apparently a programmatic capability of the Collex software itself) is that it only allows “authoritative” community listings, an absolute essential for scaling the semantic Web.

You can play with the impressive online demo of the Collex faceted browser at the NINES Web site today, though clearly the software is still undergoing intense development. I particularly like its clean design and clear functionality. The other aspect of this software that deserves attention is that it is a server-side option with cross-browser Ajax, without requiring any plugins. It works equally well within Safari, Firefox and Windows IE. And, like the Zotero research citation tool, this basic framework could easily lend itself to managing structured information in virtually any other domain.
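
Because the faceting behind Collex is handled by Solr, it is easy to get a feel for what the browser is asking the server to do behind the scenes. Here is a rough TypeScript sketch of a faceted Solr query; the endpoint, the genre field and the response handling follow standard Solr conventions, but none of it is drawn from the actual NINES installation:

    // Minimal faceted search against a hypothetical Solr instance.
    // facet=true and facet.field are standard Solr query parameters.
    async function facetedSearch(query: string, genre?: string) {
      const params = new URLSearchParams({
        q: query,
        rows: "10",
        facet: "true",
        "facet.field": "genre",
        wt: "json",
      });
      if (genre) {
        // Narrow the result set to a single facet value.
        params.append("fq", `genre:"${genre}"`);
      }
      const response = await fetch(`http://localhost:8983/solr/select?${params}`);
      const results = await response.json();
      return {
        docs: results.response.docs,                          // matching documents
        genreCounts: results.facet_counts.facet_fields.genre, // value/count pairs for the facet
      };
    }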

Collex is one of the projects of Applied Research in Patacriticism, a software development research team located at the University of Virginia and funded through an award to professor Jerome McGann from the Andrew Mellon Foundation. (“Shuuu, sheeee. Impressive. Most impressive.” — Darth Vader)

(BTW, forgive my sexist use of “guys” in this post’s title; I just couldn’t get a sex-neutral title to work as well for me!)

An AI3 Jewels & Doubloons Winner
Posted: February 1, 2007

Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary

or, Why Is It Always So Easy to Eat Our Young?

In the last couple of weeks I have observed two of the more exciting open source initiatives around — the bibliographic and research citation plug-in Zotero from George Mason University's Center for History and New Media (CHnM) and the semantic Web efforts from MIT's Simile program — juggle some really interesting problems that result from academic institutions committing to and promoting open source projects. Both cases have caused me to raise my appreciation for the efforts involved. Both cases have caused me to temper my expectations from the programs. And both cases perhaps offer some lessons and cautions for others considering entering the bazaar.

Early Growing Pains

Zotero, though still quite young as a project, is tackling a pragmatic issue of citation and research management of interest to virtually every academic and researcher. While capable commercial bibliographic and citation software has existed for many years, there remains much pent-up interest in the community regarding features, standards and ease-of-use. This common need, plus the laudable initial efforts of the project itself, have led to quick adoption and community scrutiny.

Such promise generates excitement and early evangelists. One of those evangelists, however, frustrated with the pace and degree of responsiveness from the program, went public with a call to better embrace Raymond’s “bazaar” aspects of open source (see this reference and its comments). That post and its comments, in turn, spurred a response from one of the Zotero team leaders (speaking unofficially) called Cathedrals and Bazaars, also in reference to Raymond’s treatise.

Now, granted, this is not my community and I am a total outsider, but there is, I feel, a dynamic here at work that I believe deserves broader attention. Let me get to that in a moment, after citing another example . . . .

An “Army of Programmers”

As any casual reader of my blog has observed, I have been touting MIT's Simile program for the past couple of weeks (they’ve actually deserved the praise for years, but I just had not been aware of them). As part of my interest and involvement, I took the (for me) unusual steps of joining the project’s mailing list, downloading code, writing to systems, blogging about it, participating on the mailing list, and other atypical efforts. Because of some recent innovations from the program I have discussed elsewhere (in fact more than once), there has been a real spike of interest in the program and general scrutiny and activity from many others besides me.

Like George Mason, MIT is of course an educational research institution committed to training new intellectual leaders and moving their fields forward. Both institutions are doing so with pragmatic and (at times) cutting-edge tools, all offered in the spirit of open source and community involvement. This is not unusual within the academic community but, on the other hand, neither is it common, and most often it is not done well with respect to real, usable software. Both of these programs do it well.

Committing to this involvement is not trivial. There are wikis providing documentation; there are forums at which ideas get spun and new users get answers; there are blogs to inform the (growing, burgeoning) community of future plans and directions, and (oh, by the way), there are real jobs like teaching, committee involvement, thesis writing, and the real aspects of being either a professor or student.

Meanwhile, if your project is successful within the broader community, those very same forums and blogs and wikis become the venue for any Internet user to request information, assistance, criticisms or new features. To that point, within the MIT program, the shorthand over the past days in dealing with requests for new features has been a call to a mythical “army of programmers.” Right on . . . . This software has, in fact, been developed largely on graduate assistant time and salaries (if such) from a very few individuals.

So, yes, with openness and a successful offering comes the old Chinese curse, and not enough time in the day.

People such as me — not a part of the community but looking globally for answers — jump onto these projects with glee and focus: We’re seeing some really cutting edge stuff being developed by the brightest in their fields! Non-participants from within the community see these efforts and may like (perhaps, even envy) what they see, and may also want to help in the spirit of their communities.

If the project is compelling, the result, of course, is attention, scrutiny and demands. Excitement and interest drives the engine. Promise is now evident, and all who have seen it feel a part of the eventual success of the dream — even though not all (especially all of those on the outside whether academics or not) are in a position to make that dream come real. And most do not truly appreciate what it has taken to get the project to its current point.

So, What’s My Take?

There is a real excitement to goosing the tiger. There is also so much effort involved in even getting close to doing so that few (except those on the inside) can appreciate it. The people on the Zotero and Simile projects (and the many like them) know the long hours; are posting answers on forums at 3 am; are patient with the newbies; are struggling with project demands that have nothing at all to do with tenure or getting their degrees; and are juggling a myriad of other demands like families and spouses and other interests.

Yet, that being said, since these fortunate few hold the batons of visible, successful projects at the top of the power curve, I actually don’t truly feel sorry for them. My real point for those of us that constitute the vast majority of outsiders is to be realistic about what we can expect from open source projects run by essentially graduate student labor or, if via paid staff, ones that are likely understaffed and underpaid (though clearly committed) and overworked.

Watching these two exemplary programs has been educational — in the true sense of the word — but also instructive in what it takes to conduct an open source and “open kimono” initiative.

What It Takes to be Successful

I have managed many large-scale, multi-year software projects, but only one in academia and none open source. Any professional software development manager would be amazed at what it now takes to be successful in this arena.

You need, first, of course, to develop the software. That has its standard challenges, but is not so easy within an academic setting since the components themselves need to be best-of-breed, open source, and cutting edge. The project itself may need to be thesis-worthy. Though not imperative, the project choice should be of broad community interest, rightfully anticipating potential job prospects for participating students. Then, the code needs to be magically produced. (I find it truly amazing how many non-coders so cavalierly look to major software projects and apparently dismiss the person-years of effort already embedded in the code.) Finally, and most difficult of all, the resulting software must be posted as open source, with all of the attendant community demands that imposes. In fact, it is this last requirement that is often the hidden, frozen banana.

Obviously, any successful project needs to address a latent need with a sufficient user base. This could be one of thousands of end users or simply dozens of prominent influencers. And, of course, much of open source is tool- or function-based, so that formulation of a successful initiative by no means requires developing an application in the conventional sense. Indeed, successful projects may span from a relatively complete “app” such as Zotero to code libraries or parsers or even standards development. Yet independent of the code base itself, there do seem to be some essential moving parts that need to be in place to be successful:

  • The Software — it begins here. The software itself must address a need and be written in a current language and architecture, all of which may require sophisticated familiarity with contemporary practices, languages and code bases
  • Documentation — arguably the lack of documentation is the biggest cause of failure for open source projects. While this sounds self-evident, in fact its importance is generally overlooked. The bane of a successful open source project is the demand of the community for new features and functionality, all potentially with inadequate project resources. The only way this conundrum can be broken is through active engagement of the community, which directly requires documentation in order for outsiders to understand the code and then be able to make contributions
  • Wikis and Blogs — effective means for engaging the user (and influencer) community is essential. Besides the time needed to set up the infrastructure, these media require daily care and feeding
  • Forums and Mailing Lists — same as for wikis and blogs, though additionally a nearly full-time moderator is required (could multi-task in other areas if demand is low). New users are constantly discovering the project and asking many of the same familiarization questions over and over. An active forum means that many existing participants can feed the discussion, but again, constant care and feeding of documentation and FAQs is important to reduce duplicated responses
  • Code Management and Tracking — assuming a minority of users desires and is qualified to test or modify code, the general management of this process requires policies, plus authorization and version control with enterprise-grade code management tools (such as Jira, Trac and Subversion, among many)
  • Time — of course, all of this takes time, often robbed from other priorities.

And finally, and most importantly, the project participants need:

  • Grace — it is likely not comfortable for many to open themselves up to the involvement and scrutiny of an open source initiative. Outsiders bring many passions, motivations, levels of knowledge and sophistication; and some see the promise or excitement and want to truly become part of the internal community. Juggling these disparate forces takes an awful lot of grace and patience. Anyone who tackles this job has my admiration . . . .

Why I Like These Two Projects

I am an omnivore for finding new, exciting projects. I have specific aims in mind, and they are (generally) different than what is motivating the specific project developers. Of the many hundreds of projects I have investigated, I think these two are among the best, but for different reasons and with different strengths.

The Zotero project, though early in the process, has all of the traits to be an exemplar in terms of documentation, professionalism, openness and active outreach to its communities. I take the criticisms from some in the community (motivated, I believe, by good will) to be a result of possible frustrations regarding pent-up needs and expectations, rather than the project’s poor execution or arrogance. I think posing the discussion as the dialectic of the cathedral v. the bazaar is silly and does the project and its sponsors a disservice. What looks to be going on is simply the real world of the open-source sausage factory in the face of constraints.

As for Simile, we are seeing true innovation being conducted in real time in the open. And while some of these innovations have practical applications today, many are still cutting edge and not fully vetted. With the program’s stated aims of addressing emerging computer science challenges, this tension will remain and is healthy. Criticisms of the Simile efforts as “research programming”, I think, miss the boat. If you want to know what is going to be in your browser or influencing your Internet experience a few years hence, look to Simile (among others).

A Final Thought

Now that my spree of global searching for software ideas is somewhat winding down, I am truly taken with the sea change that open source and its spur to innovation is bringing to us. Costs are declining, barriers to entry are lowering, time to completion is shortening, ideas and innovations are blossoming, interoperability is improving, and our world is changing — and for the better. Zotero and Simile are amongst the young that deserve to be nurtured, not eaten.

Posted by AI3's author, Mike Bergman Posted on February 1, 2007 at 9:15 pm in Open Source, Software Development | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/330/seeking-grace-a-not-so-innocent-bystanders-view-of-academic-open-source/
The URI to trackback this post is: https://www.mkbergman.com/330/seeking-grace-a-not-so-innocent-bystanders-view-of-academic-open-source/trackback/
Posted: January 31, 2007

Adopting Standardized Identification and Metadata

Since it’s becoming all the rage, I decided to go legit and get myself (and my blog) a standard ID and enable the site for easy metadata exchange.

OpenID

OpenID is an open and free framework for users (and their blogs or Web sites) to establish a digital identity. OpenID starts with the concept that anyone can identify themselves on the Internet the same way websites do, via a URI, and that authentication is essential to prove ownership of that ID.

Sam Ruby provides a clear article on how to set up your own OpenID. One of the listing services is MyOpenID, from which you can get a free ID. With Sam’s instructions, you can also link a Web site or blog to this ID. When done, you can check whether your site is valid; you can do a quick check using my own ID (make sure to enter the full URI!). However, to get the valid ID message as shown below, you will need to have your standard (and secure) OpenID password handy.
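
The usual mechanism for tying a blog or Web site to an OpenID is a pair of link tags, openid.server and openid.delegate, placed in the page head; an OpenID consumer fetches your claimed URI and reads those tags to learn where to send you for authentication. Below is a rough TypeScript sketch of that discovery step; the regular-expression parsing is deliberately simplistic and purely illustrative:

    // Fetch a claimed OpenID URI and extract the openid.server / openid.delegate
    // link tags that OpenID 1.x consumers use for discovery. Sketch only -- a
    // real consumer would use an HTML parser and follow redirects.
    async function discoverOpenId(claimedUri: string) {
      const html = await (await fetch(claimedUri)).text();

      const linkHref = (rel: string): string | undefined => {
        // Assumes rel appears before href within the tag; good enough for a demo.
        const pattern = new RegExp(
          `<link[^>]+rel=["']${rel}["'][^>]+href=["']([^"']+)["']`,
          "i"
        );
        return pattern.exec(html)?.[1];
      };

      return {
        server: linkHref("openid\\.server"),     // where authentication happens
        delegate: linkHref("openid\\.delegate"), // the identity being asserted
      };
    }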

unAPI

unAPI addresses a different problem — the need for a uniform, simple method for copying rich digital objects out of any Web application. unAPI is a tiny HTTP API for the few basic operations necessary to copy discrete, identified content from any kind of web application. An interesting unAPI background and rationale paper from the Ariadne online library ezine is “Introducing unAPI”.

To invoke the service, I installed the unAPI Server, a WordPress plug-in from Michael J. Giarlo. The server provides records for each WordPress post and page in the following formats: OAI-Dublin Core (oai_dc), Metadata Object Description Schema (MODS), SRW-Dublin Core (srw_dc), MARCXML, and RSS. The specification makes use of LINK tag auto-discovery and an unendorsed microformat for machine-readable metadata records.

Validation of the call is provided by the unAPI organization, with the return for this blog entry shown here (error free and only partially reproduced):

The actual unAPI results do not appear anywhere in your page, since the entire intent is to be machine readable. However, to show what can be expected, I reproduce the internal metadata listings for the five different formats in these links:

MARCXML:

https://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= https://www.mkbergman.com/?p=320&format=marcxml

MODS:

https://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= https://www.mkbergman.com/?p=320&format=mods

OAI-DC:

https://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= https://www.mkbergman.com/?p=320&format=oai_dc

RSS:

https://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= https://www.mkbergman.com/?p=320&format=rss

SRW-DC:

https://www.mkbergman.com/wp-content/plugins/unapi/server.php?id= https://www.mkbergman.com/?p=320&format=srw_dc
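
All five links above follow unAPI’s single, simple request pattern: the server URL plus an id parameter (the object’s identifier) and a format parameter. A minimal TypeScript sketch of pulling one of those records programmatically (error handling omitted; the server URL simply mirrors the links above):

    // Fetch one unAPI metadata record: <server>?id=<identifier>&format=<format>
    const UNAPI_SERVER =
      "https://www.mkbergman.com/wp-content/plugins/unapi/server.php";

    async function getUnapiRecord(objectId: string, format: string): Promise<string> {
      const url = `${UNAPI_SERVER}?id=${encodeURIComponent(objectId)}&format=${format}`;
      const response = await fetch(url);
      return response.text(); // the XML (or RSS) record, e.g. MODS or MARCXML
    }

    // Example: retrieve the MODS record for this very post.
    getUnapiRecord("https://www.mkbergman.com/?p=320", "mods").then(console.log);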

unAPI adoption is just now beginning, and some apps like the Firefox Dublin Core Viewer add-on do not yet recognize the formats. But, because the specification is so simple and is being pushed by the library community, I suspect adoption will become very common. I encourage you to join!