Posted: March 5, 2008

A Not so Long Wave at Blacklight

Nines

Another Innovative Faceted Browser from UVa and the Humanities

A bit over a year ago I spotlighted Collex, a set of tools for COLLecting and EXhibiting information in the humanities. Collex was developed for the NINES project (which stands for the Networked Infrastructure for Nineteenth-century Electronic Scholarship, a trans-Atlantic federation of scholars). Collex has now spawned Blacklight, a library faceted browser and discovery tool.

Project Blacklight

Blacklight is intended as a general faceted browser with keyword inclusion for use by libraries and digital collections. As with Collex, Blacklight is based on the Lucene/Solr facet-capable full-text engine. The name Blacklight is based on the combination of Solr + UV(a).

Blacklight is being prototyped on UVa’s Digital Collections Repository. It was first shown at the 2007 code4lib meeting, but has recently been unveiled on the Web and released as an open source project. More on this aspect can be found at the Project Blacklight Web site.

Blacklight was developed by Erik Hatcher, the lead developer of Flare and Collex, with help from library staff Bess Sadler, Bethany Nowviskie, Erin Stalberg, and Chris Hoebeke. You can experiment yourself with Blacklight at: http://blacklight.betech.virginia.edu/.

The figure below shows a typical output. Various pre-defined facets, such as media type, source, library held, etc., can be combined with standard keyword searches.

Many others have pursued facets, and the ones in this prototype are not uniquely interesting. What is interesting, however, is the interface design and the relative ease of adding, removing or altering the various facets or queries to drive results:

BlacklightDL

An extension of this effort, BlacklightDL, provides image and other digital media support to the basic browser engine. This instance, drawn from a separate experiment at UVa, shows a basic search of ‘Monticello’ when viewed through the Image Gallery:

Like the main Blacklight browser, flexible facet selections and modification are offered. With the current DL prototype, using similar constructs from Collex, there are also pie chart graphics to show the filtering effects of these various dimensions (in this case, drilling down on ‘Monticello’ by searching for ‘furniture’):

BlacklightDL is also being developed in conjunction with OpenSource Connections (a resource worth reviewing in its own right).

Blacklight has just been released as an open source OPAC (online public access catalog). That means libraries (or anyone else) can use it to allow people to search and browse their collections online. Blacklight uses Solr to index and search, and uses Ruby on Rails for front-end configuration. Currently, Blacklight can index, search, and provide faceted browsing for MARC records and several kinds of XML documents, including TEI, EAD, and GDMS; the code is available for downloading here.
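To make the combination of keyword search and facet narrowing a bit more concrete, here is a minimal sketch in plain Python. It is not Blacklight or Solr code: the records, the field names (`media_type`, `library`) and the `search` and `facet_counts` helpers are all made up for illustration, and a real deployment would hand both jobs to Solr.

```python
from collections import Counter

# Hypothetical catalog records; titles and field values are invented.
RECORDS = [
    {"title": "Monticello furniture inventory", "media_type": "Book", "library": "Library A"},
    {"title": "Views of Monticello", "media_type": "Image", "library": "Library B"},
    {"title": "Monticello dining room chairs", "media_type": "Image", "library": "Library B"},
    {"title": "Letters of Thomas Jefferson", "media_type": "Book", "library": "Library B"},
]


def search(records, keyword=None, facets=None):
    """Return records whose title contains keyword and that match every selected facet."""
    facets = facets or {}
    hits = []
    for record in records:
        # Keyword test: simple case-insensitive substring match on the title.
        if keyword and keyword.lower() not in record["title"].lower():
            continue
        # Facet test: every selected facet value must match exactly.
        if all(record.get(field) == value for field, value in facets.items()):
            hits.append(record)
    return hits


def facet_counts(records, field):
    """Count how many records fall under each value of one facet field."""
    return Counter(record[field] for record in records)


if __name__ == "__main__":
    hits = search(RECORDS, keyword="monticello")
    # Counts like these are what a facet sidebar would display.
    print(facet_counts(hits, "media_type"))
    narrowed = search(RECORDS, keyword="monticello", facets={"media_type": "Image"})
    print([record["title"] for record in narrowed])
```

Adding or removing a facet is just adding or removing a key in the `facets` dictionary, which is roughly the ease of alteration that makes this style of interface attractive.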

Faceted Browsing

There is a rich and relatively long history of faceted browsing in the humanities and library science community. Notable examples range from Flamenco, one of the earliest (dating from 2001 and still active), to MIT’s SIMILE Exhibit, which I have written of numerous times. Another online example is Footnote, a repository of nearly 30 million historical images. It has a nice interface and an especially nifty way of using a faceted timeline. Also see Solr in Libraries from Ryan Eby.

In fact, faceted browsing and search, especially as it adapts to more free-form structure, will likely be one of the important visualization paradigms for the structured Web. (It is probably time for me to do a major review of the area. 🙂 )

The library and digital media and exhibits communities (such as museums) are working hard at the intersection of the Web, search, display and metadata and semantics. For example, we also have recently seen the public release of the Omeka exhibits framework from the same developers as Zotero, one of my favorite Firefox plug-ins. And Talis continues to be a leader in bringing the semantic Web to the library community.

The humanities and library/museum communities have clearly joined the biology community as key innovators of essential infrastructure to the semantic Web. Thanks, community. The rest of us should be giving more than a cursory wave to these developments.

* * *

BTW, I’d very much like to thank Mark Baltzegar for bringing many of these initiatives to my attention.

Posted: March 2, 2008

Glut: Information Underload for Information Overload

Wright’s Book Has Strong Scope, Disappointing Delivery

When I first saw the advance blurb for Glut: Mastering Information through the Ages by Alex Wright I thought, “Wow, here is the book I have been looking for or wanting to write myself.” As the book jacket explains:

Spanning disciplines from evolutionary theory and cultural anthropology to the history of books, libraries and computer science, Wright weaves an intriguing narrative that connects such seemingly far-flung topics as insect colonies, Stone Age jewelry, medieval monasteries, Renaissance encyclopedias, early computer networks, and the World Wide Web. Finally, he pulls these threads together to reach a surprising conclusion, suggesting that the future of the information age may lie deep in our cultural past.

Wham, bang! The PR snaps with promise and scope!

These are themes that have been my passion for decades, and I ordered the book as soon as it was announced. It was therefore with great anticipation that I cracked open the cover as soon as I received it. (BTW, the actual date of posting for this review is much later only because I left this review in draft for some months; itself an indication of how, unfortunately, I lost interest in it. 🙁 ).

Otlet is a Gem

The best aspect of Glut is the attention it brings to Paul Otlet, quite likely one of the most original and overlooked innovators in information science in the 20th century. Frankly, I had only an inkling of who Otlet was prior to this book, and Wright provides a real service by bringing more attention to this forgotten hero.

(I have since gone on to try to learn more about Otlet and his pioneering work in faceted classification, as carried on more notably by S. R. Ranganathan with the Colon classification system, and his ideas behind the creation of the Mundaneum in Brussels in 1910. The Mundaneum and Otlet’s ideas were arguably a forerunner to some aspects of the Internet, Wikipedia and the semantic Web. Unfortunately, the Mundaneum and its 14 million ‘permanent encyclopedia’ items were taken over by German troops in World War II. The facility was ravaged and sank into obscurity, as did the reputation of Otlet, who died in 1944 before the war ended. It was not until Boyd Rayward translated many of Otlet’s seminal works into English in the late 1980s that he was rediscovered.)

Alex Wright’s own Google Tech Talk from Oct. 23, 2007, talks much about Otlet, and is a good summary of some of the other topics in Glut.

Stapled Book Reviews

The real disappointment in Glut is the lack of depth and scholarship. The basic technique seemed to be to find a prominent book on a given topic, summarize it in a popularized tone, sprinkle in a couple of extra references from the source book relied on for that chapter to show a patina of scholarship, and move on to the next chapter. Then, add a few silly appendices to pad the book length.

So, we see, for example, key dependence on relatively few sources for the arguments and points made. Rather than enumerate them here, one approach, if you are interested, is simply to peruse the expanded bibliography on Wright’s Glut Web site. That listing is actually quite a good basis for beginning your own collection.

Books are Different

It seems like today, with blogging and digital content flying everywhere, that a greater standard should be set for creating a book and asking the buying public to actually pay for something. That greater standard should be effort and diligence to research the topic at hand.

I feel like Glut is related to similar efforts where not enough homework was done. For example, see Walter Underwood, who in his review of the Everything is Miscellaneous (not!) book, chastises author David Weinberger on similar grounds. (A conclusion I had also reached after viewing this Weinberger video cast.)

In summary, I give Wright an A for scope and a C or D in execution and depth. I realize that is a pretty harsh review; but it is one occasioned by my substantially unmet high hopes and expectations.

The means by which information and document growth has come to be organized, classified and managed have been major factors in humanity’s progress and skyrocketing wealth. Glut’s skimpy hors d’œuvre merely whet the appetite: the full historical repast has yet to be served.

Posted: February 29, 2008

Upgrades, and a Stroll Down Memory Lane

Resurrecting Old Posts Brings a Smile, and Some Shudders

(Holy Leap Year, Batman!)

I’ve stated many times I hate WordPress upgrades. I know the sponsors have tried to make it easier over time, but upgrades are still painful, fraught with risk and error, and always force me to research and figure out what went wrong.

Why the Upgrade?

I last upgraded to WP v. 2.2.1, and with a real rant to accompany it.

Since then, some of us had been seeing some insidious stuff getting inserted into our RSS feeds, but had not been able to stem it. Then, I was doing my normal morning systems check and saw that my site was completely down, completely blank. Grrrr. Who knows what that specific problem was.

Version 2.3.3 had been announced with a fix for the RSS feed spam problem, so, rather than trying to diagnose and fix my current version, it was time to upgrade. (Grrrr.)

But, then I realized, possibly by doing so, I might also see a fix to a longstanding issue I had had with plug-ins somehow limiting my chronological listing of past posts. (Hooray!) That one had really been sticking in my craw, had caused me to de-activate some plug-ins I thought useful, and had led to only a handful of prior posts appearing.

The Benefits (sort of)

So, the upgrade was made. Sadly, no problems (other than the XML-RPC implementation issue) were solved. And, unfortunately, my chronological listings still only displayed when throttled back to the past 30 or so. (Grrrr.)

Well, s**t. So after what was for me, given some of my more complicated site aspects, a nearly two-hour minor upgrade, the only real benefit I or my readers would see is that the site was no longer blank! This hardly looked like a good deal.

So, assuming the chronology problem fix was not near at hand, I decided to manually add the past entries back to my chronology page. (Actually, this sounds worse than it really is since I have learned some quick tricks for gleaning listings from other sites; I just turned those techniques on my own blog!). While grinding teeth to nubs, I did what everyone who works intimately with software often does: I did the workaround.

So, now all full listings have been restored (though still with some recent postings overlap; Grrrr).

What brought a smile was seeing some posts from a year or two ago that I liked and had completely forgotten; some others brought a shudder. Here are some older personal favorites:

Nonetheless, all 250 or so posts on my site from Day 1 in early 2005 can now be seen again; it has been a while! 🙂

The Problems

Naturally, that was not the end of the saga.

After making the upgrade, I noticed that all category listings and lookups had been wiped off my blog. I could see them in MySQL and the editor still had the listing, but the site itself and the admin panel were blank.

Grrrr. (Try to stay calm and not panic.)

It’s another one of those deals where it is time to search like crazy and hope that someone more knowledgeable than me has encountered the same problem and fixed it. Sure enough, in an obscure reference, I got the glimmer that maybe re-starting MySQL could fix the problem.

Well, it did. But go figure. . . .

Advanced TinyMCE

Thankfully, my Advanced TinyMCE plug-in that gives more editing functions works great for me in WP v. 2.3.3. At least that is a relief!

And so, we end on an anti-Grrrr note. 🙂 Sweet dreams.

Posted: February 20, 2008

So, What Might The Web’s Subject Backbone Look Like?

Here’s a Sneak Peek at Some UMBEL Subject Graphs

We are proceeding apace with the first release of the UMBEL (Upper-level Mapping and Binding Exchange Layer) lightweight subject concept ontology. The internal working version presently has 21,580 subject nodes, though further review will certainly change that number before public release of the first draft.

UMBEL defines “subject concepts” as a distinct subset of the more broadly understood notion of a concept, such as is used in the SKOS RDFS controlled vocabulary, in formal concept analysis, or in the very general concepts common to some upper ontologies. Subject concepts are a special kind of concept: ones that are concrete, subject-related and non-abstract. We further contrast these with named entities, which are the real things or instances in the world that are members of these subject concept classes.

Thus, in UMBEL parlance, there are abstract concepts, subject concepts and named entities.

The “backbone” to UMBEL is its set of these reference (“canonical” if you will) subject concepts. These subject concepts are being derived from the OpenCyc version of the Cyc knowledge base. The resulting 22 K nodes of this subject structure are related via the predicates of subclassof and type; these are the graph’s edges. The graph pictures herein are the first glimpse of this UMBEL backbone structure.

The Deep Dive

We can take the full network graph and do a bit of simulation of diving deep into its structure, as the following figures show.

The Big Graph

So, here is the big graph, with all nodes and edges (blue) displayed. This is just about at the limit of our graphing program, Cytoscape, which we estimate is limited to about 30 K nodes:

The Top 750

Through the manipulation of the topological coefficient, which is a relative measure for the extent to which a node shares neighbors with other nodes, we can zoom in on the Top 750 (actually, 759!) node gateways or hubs. There are other ways to evaluate key nodes in a network, but this one fairly nicely approximates the upper structure or hierarchy within the graph:

The Top 350

By tightening the coefficient further, we can get a view of the Top 350 (actually, the top 336). Were the system live and not a captured jpeg, we could zoom in and read the actual node labels.
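For readers who want to see what the coefficient measures, here is a rough sketch in plain Python. It follows the definition as I understand it from the NetworkAnalyzer documentation: for a node n with k neighbors, take every other node m that shares at least one neighbor with n, count the neighbors shared (plus one if n and m are directly linked), average those counts, and divide by k. The toy edges, the function names and the cutoff range are all hypothetical; the graphs in this post came from Cytoscape, not from this code.

```python
# Hypothetical toy edges, treated as undirected; not taken from UMBEL.
EDGES = [
    ("Car", "Vehicle"),
    ("Truck", "Vehicle"),
    ("Car", "Product"),
    ("Truck", "Product"),
    ("Saab", "Car"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def topological_coefficient(adjacency, node):
    """Average shared-neighbor count with 'partner' nodes, divided by the node's degree."""
    neighbors = adjacency.get(node, set())
    k = len(neighbors)
    if k < 2:
        # Nodes with fewer than two neighbors are assigned 0.
        return 0.0
    # Partners are all other nodes sharing at least one neighbor with this node.
    partners = set()
    for neighbor in neighbors:
        partners |= adjacency[neighbor]
    partners.discard(node)
    if not partners:
        return 0.0
    total = 0
    for other in partners:
        shared = len(neighbors & adjacency[other])
        if other in neighbors:
            shared += 1  # a direct link counts as one more
        total += shared
    return (total / len(partners)) / k


def nodes_in_range(adjacency, low, high):
    """Names of nodes whose coefficient falls within [low, high], sorted."""
    return sorted(
        node for node in adjacency
        if low <= topological_coefficient(adjacency, node) <= high
    )


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    for node in sorted(adjacency):
        print(node, round(topological_coefficient(adjacency, node), 3))
    print(nodes_in_range(adjacency, 0.7, 1.0))
```

Tightening or loosening the `low` and `high` bounds is the toy equivalent of the filtering described above; which range best isolates the hubs of a real graph is something to find by experiment.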

Two Degrees of Separation: Saab Example

The real value from a graph structure, of course, is that now we can make selections based on relationships, neighbors and distances for various reasoning, inference or relatedness purposes. This diagram begins by inputting “saab” as my car concept, and then getting all nodes within two links:

The Saab Neighborhood

Alternatively, for the same “saab” car concept, I asked for all directly related links (in yellow) and did some pruning of car types to make the subgraph more readable and interesting:

This ability to manipulate and navigate this large subject backbone at will should bring immense benefits. And, because of its common sense grounding, the early explorations of this first-glimpse UMBEL structure look very logical and clean.
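The two selections above (everything within two links, and only the directly related nodes) are both instances of a breadth-first walk with a hop limit. Here is a minimal sketch, assuming the subclassof and type edges are treated as undirected for navigation purposes; the toy edges and the `within_hops` helper are invented for illustration and are not part of UMBEL or Cytoscape.

```python
from collections import deque

# Hypothetical toy edges; the real backbone has roughly 22 K nodes.
EDGES = [
    ("Saab", "Automobile"),
    ("Volvo", "Automobile"),
    ("Automobile", "Vehicle"),
    ("Vehicle", "Artifact"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Breadth-first search returning {node: distance} for nodes within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print(within_hops(adjacency, "Saab", 2))  # the "two degrees" selection
    print(within_hops(adjacency, "Saab", 1))  # directly related nodes only
```

The returned distances can also serve as a crude relatedness measure: the fewer the hops between two subject concepts, the more closely related they are likely to be.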

UMBEL Status

Once we complete the next packaging and draft release steps, anyone will be able to play with and manipulate this UMBEL structure at will. The ontology and the tools we are using to manipulate it are all open source.

Our next steps on UMBEL will have us publishing the technical report (TR) of how we screened and vetted the subject concepts from the Cyc knowledge base, using an updated OpenCyc version. That document will hopefully gain some broader review and scrutiny for the canonical listing of subject concepts.

Of course, all of that is merely leading up to the Release 0 of the published ontology. We are working diligently to get that posted as well in the very near future.

A Note on the Graphs

These graphs were built using the super Cytoscape large-graph visualization framework, which I previously reviewed with glowing praise. The subgraph extractions were greatly aided by a fantastic add-in called NetworkAnalyzer from the Max-Planck-Institut für Informatik. I will be writing more about this add-in at a later time, including some guidance for how to use it for meaningful ontology analysis. But, in the meantime, do check this add-in tool out. Mucho cool, and another winner!

Spot On Semantic Web and Linked Data

Danny Ayers Has Penned a Must Read Semantic Web 101

Danny Ayers has just posted perhaps the best and most succinct explanation of the semantic Web and its relation to Linked Data I have seen: Semantic Web…in a Nutshell?

Please, all, I encourage you to read the bottom portion of this short posting carefully and to bookmark it for future reference. It doesn’t get much shorter or sweeter than this.

Nice job, Danny! Now, do you care to take on the update of the layer cake? 🙂

Author: Mike Bergman