
Pandora’s Music Genome Project is Another Example of the Pragmatic Semantic Web
An initiative now gaining buzz — and one that doesn’t once mention the semantic Web — is showing how metadata characterization can bring new discovery and benefits to music listening. Pandora, which bills itself as your personal DJ, has a fantastic online music listening experience organized through a series of random shuffle-play “channels,” including those you can design and share yourself. The system presently contains more than 400,000 songs from more than 20,000 artists, most contemporary.
The Music Genome Project is the basis for the system’s recommendations. It was started by a group of musicians in 2000 with the aim of assembling the most comprehensive database of music analysis yet created. According to its Web site:
Together we set out to capture the essence of music at the most fundamental level. We ended up assembling literally hundreds of musical attributes or “genes” into a very large Music Genome. Taken together these genes capture the unique and magical musical identity of a song – everything from melody, harmony and rhythm, to instrumentation, orchestration, arrangement, lyrics, and of course the rich world of singing and vocal harmony. It’s not about what a band looks like, or what genre they supposedly belong to, or about who buys their records – it’s about what each individual song sounds like.
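One way to picture the "genes" idea is to treat each song as a vector of attribute scores and compare vectors. The sketch below is purely illustrative: the gene names, the song scores, and the choice of cosine similarity are my own assumptions, not anything Pandora has published about how its recommendations actually work.

```python
import math

# Hypothetical "genes": each song is scored 0.0-1.0 on a few attributes.
# The attribute names and scores are invented for illustration only.
GENES = ("blues_influence", "electric_guitar", "vocal_harmony", "tempo")

SONGS = {
    "Song A": (0.9, 0.8, 0.2, 0.4),
    "Song B": (0.8, 0.9, 0.3, 0.5),
    "Song C": (0.1, 0.2, 0.9, 0.8),
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(seed, songs):
    """Rank every other song by its similarity to the seed song."""
    seed_vector = songs[seed]
    ranked = [
        (name, cosine_similarity(seed_vector, vector))
        for name, vector in songs.items()
        if name != seed
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in most_similar("Song A", SONGS):
        print(f"{name}: {score:.3f}")
```

Seeding with "Song A" ranks "Song B" ahead of "Song C" because their scores line up attribute by attribute, which is the "what each individual song sounds like" notion in miniature.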
The basic Pandora service is shown in this screen shot:

I’ve created my own radio “channel” emphasizing Eric Clapton and specifically his blues interests with other blues artists (Clapton + Friends). The more you listen and refine, the more delightfully the shuffle play surprises. Pandora offers both free, persistent channels (with advertising) and (nominal) paid channels without advertising. (Refinement is not strictly necessary, but it overcomes the licensing limitation that prevents one from skipping too many songs in a given hour.)
Here’s the email that Pandora creates to send to friends (plus the specific reference to my Clapton channel) that you are welcome to try out:
Hi,
I wanted to let you know about Pandora, a free internet radio site based on the Music Genome Project. It helps you discover and listen to great music. Here’s the link if you’d like to check it out:
Just tell Pandora the name of your favorite song or artist, and it will create a radio station that plays songs with similar musical attributes.
Here’s a link to my profile page. From this page, you can listen to my stations and check out new music I’ve found on Pandora:
BTW, I’d like to thank a posting by Michael Daconta for turning me on to the Pandora service.
Custom Search Engines (CSEs) Moving Forward at Internet Speed
Since its release a mere two months ago in late October, Google’s custom search engine (CSE) service, built on its Co-op platform, has gone through some impressive refinements and expansions. Clearly, the development team behind this effort is dedicated and capable.
I recently announced the release of my own CSE, SweetSearch, a comprehensive and authoritative search engine for all topics related to the semantic Web and Web 2.0. Like Ethan Zuckerman, who published his experience in creating a CSE for Ghana in late October, I too have had some issues. Ethan’s first post was entitled, “What Google Coop Search Doesn't Do Well,” posted on October 27. Yet, by November 6, the Google Co-op team had responded sufficiently that Ethan was able to post a thankful update, “Google Fixes My Custom Search Problems.” I’m hoping some of my own issues get a similarly quick response.
Fast, Impressive Progress
It is impressive to note the progress and removal of some early issues in the last two months. For example, early limits of 1,000 URLs per CSE have been upped to 5,000 URLs, with wildcard pattern matches improving this limit still further. Initial limits to two languages have now been expanded to most common left-to-right languages (Arabic and Hebrew are still excluded). Many bugs have been fixed. The CSE blog has been a welcome addition, and file upload capabilities are quite stable (though not all eventual features are yet supported). The Google Co-op team actively solicits support and improvement comments (http://www.google.com/support/coop/) and a useful blog has been posted by the development team (http://googlecustomsearch.blogspot.com/).
In just a few short weeks, at least 2,100 new CSEs have been created (found by issuing the advanced search query, ‘site:http://google.com/coop/cse?cx=’ to Google itself, with cx representing the unique ID key for each CSE). This number is likely low since newly created or unreleased CSEs do not appear in the results. This growth clearly shows the pent-up demand for vertical search engines and the desire of users to improve authoritativeness and quality. Over time, Google will certainly reap user-driven benefits from these CSEs in its own general search services.
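Since each CSE is identified by the cx key in its URL, a list of result URLs from that query could be tallied with a few lines of Python. This is only a sketch: the example URLs and ID values are invented, and the function names are my own. It relies solely on the standard library's urllib.parse.

```python
from urllib.parse import urlparse, parse_qs


def extract_cse_id(url):
    """Return the value of the cx query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("cx")
    return values[0] if values else None


def count_unique_cses(urls):
    """Count the distinct cx keys found across a list of result URLs."""
    return len({cse_id for cse_id in map(extract_cse_id, urls) if cse_id})


if __name__ == "__main__":
    # Hypothetical result URLs; the cx values are placeholders, not real CSEs.
    results = [
        "http://google.com/coop/cse?cx=example-id-1",
        "http://google.com/coop/cse?cx=example-id-2",
        "http://google.com/coop/cse?cx=example-id-1",
        "http://google.com/coop/",
    ]
    print(count_unique_cses(results))
```

Duplicates collapse because the keys are collected in a set, so the count reflects distinct engines rather than distinct result pages.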
My Pet Issues
So, in the spirit of continued improvement, I offer below my own observations and pet peeves with how the Google CSE service presently works. I know these points will not fall on deaf ears and perhaps other CSE authors may see some issues of their own importance in this listing.
The best way to get a listing of current CSEs still appears to be using the Google site: query above matched with a topic description, though that approach is not browsable and does not link to CSEs hosted on external sites.
Yet, despite these quibbles, this CSE service is pointing the way to entirely new technology and business models. It is interesting that the Amazon S3 service and Yahoo!’s Developer Network are experimenting with similar Internet and Web service approaches. Let the fun begin!
How sweet it is!
I am pleased to announce the release of the most comprehensive and authoritative search engine yet available on all topics related to the semantic Web and Web 2.0. SweetSearch, as it is named, is a custom search engine (CSE) built using Google’s CSE service. SweetSearch can be found via this permanent link on the AI3 blog site. I welcome suggested site additions or improvements to the service by commenting below.
SweetSearch Statistics
SweetSearch comprises 3,736 unique host sites containing 4,038 expanded search locations (some hosts have multiple searchable components). Besides the most authoritative sites available, these sites include comprehensive access to 227 companies involved in the semantic Web or Web 2.0, more than 3,100 Web 2.0 sites, 53 blogs relating specifically to these topics, 101 non-profit organizations, 219 specific semantic Web and related tools, 21 wikis and other goodies. Search results are also faceted into nine different categories, including papers, references, events, organizations, companies, tools, etc.
Other Semantic Web CSE Sites
SweetSearch is by no means the first Google CSE devoted to the semantic Web and related topics, but it may be the best and largest. Other related custom search engines (with the number of URLs they search) are Web20 (757 sites), the Web 2.0 Search Co-op (310), the University of Maryland, Baltimore County (UMBC) Ebiquity service (65), Elias Torres’ site (160), Andreas Blumauer’s site (20), Web 20 Ireland (67), NextGen WWW (21), and Sr-Ultimate (4), among others that will surely emerge.
General Resources
Besides the general Google CSE site, the development team’s blog and the user forum for a group of practitioners are also good places to learn more about CSEs.
Vik Singh’s AI Research site is also very helpful in related machine learning areas, plus he has written a fantastic tutorial on how to craft a powerful technology portal using the Google CSE service.
Contributors and Comments Welcomed!
I welcome any contributors who desire to add to SweetSearch. See this Google link for general information about this site; please contact me directly at the email address in the masthead if you desire to contribute. For suggested additional sites or other comments or refinements, please comment below. I will monitor these suggestions and make improvements on a frequent basis.
Thanks to a post from NewsForge on Open source search technology goes beyond keywords, I was directed to a description of the Semantic Indexing Project at Middlebury College. Aaron Coburn, the lead developer of the project, says his team is currently documenting its open source search toolkit and finishing up a new desktop search application that should be released later this month. From the project Web site:
The National Institute for Technology in Liberal Education (NITLE) and Middlebury College have been experimenting with algorithms to help unstructured data organize itself into conceptually useful categories without human intervention. Part of our motivation is to find an alternative to spending prohibitive amounts of time and money on marking up course materials, documents, and online collections with metadata by hand. For many of the most common markup standards in use today, such as SCORM or Dublin Core, it can actually take longer to create markup than it did to create the course materials themselves.
The method being applied is a more scalable variant of latent semantic indexing that the team calls contextual network graphing. A PDF paper from the project, Semantic Search of Unstructured Data using Contextual Network Graphs by Maciej Ceglowski, Aaron Coburn and John Cuadrado explains this promising technique in greater detail and notes its debt to a 1981 Ph.D. dissertation by Scott Preece at the University of Illinois describing an almost identical technique under the name spreading activation search.
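To give a rough feel for spreading activation over a term-document graph, here is a much-simplified sketch. It is not the project's implementation: the toy corpus, the decay and threshold values, and the function names are all my own assumptions. Energy starts at a query term, flows to the documents containing it, then onward through shared terms to other documents, fading with each hop.

```python
from collections import defaultdict

# Toy corpus (invented for illustration): document name -> terms it contains.
DOCS = {
    "doc1": ["semantic", "web", "metadata"],
    "doc2": ["metadata", "markup", "dublin"],
    "doc3": ["music", "radio"],
}


def build_graph(docs):
    """Build a bipartite graph linking each document to its terms."""
    graph = defaultdict(set)
    for doc, terms in docs.items():
        for term in terms:
            graph[("doc", doc)].add(("term", term))
            graph[("term", term)].add(("doc", doc))
    return graph


def spread_activation(graph, start, energy=1.0, decay=0.5, threshold=0.01):
    """Push energy outward from start, splitting it evenly among neighbors.

    Each hop multiplies the energy by decay, so propagation stops once the
    per-neighbor share falls below threshold.
    """
    activation = defaultdict(float)
    stack = [(start, energy)]
    while stack:
        node, incoming = stack.pop()
        activation[node] += incoming
        neighbors = graph.get(node, ())
        if not neighbors:
            continue
        outgoing = incoming * decay / len(neighbors)
        if outgoing < threshold:
            continue
        for neighbor in neighbors:
            stack.append((neighbor, outgoing))
    return activation


def search(docs, query_term):
    """Rank documents by the activation they receive from a query term."""
    graph = build_graph(docs)
    activation = spread_activation(graph, ("term", query_term))
    scores = {
        name: score
        for (kind, name), score in activation.items()
        if kind == "doc"
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in search(DOCS, "semantic"):
        print(f"{name}: {score:.3f}")
```

A query for "semantic" ranks doc1 first, but doc2 also receives some activation through the shared term "metadata" even though it never mentions "semantic", while the unrelated doc3 receives none. That indirect match is the appeal of the approach over plain keyword lookup.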
The Semantic Indexing Project is an umbrella effort over a number of subsidiary projects including a blog census, literary analysis tool, refinement of search and clustering algorithms, bioinformatics, use of ontologies, and semantic relationship visualization through a Semantic Explorer, as this example shows:

All of the source code is available for download from the project, published under the terms of the GNU General Public License. The project’s core technology is the Semantic Engine, which is distributed with its C++ code, Perl bindings, and all the necessary code for building the GUI. A new desktop application, called the Standalone Engine, will be available later this month.
This work looks very, very promising as a step forward to bringing automation to semantic Web markup, among related advantages deriving from tagged documents.
The late Douglas Adams, of Doctor Who and The Hitchhiker’s Guide to the Galaxy fame, produced an absolutely fascinating, prescient and entertaining TV program 16 years ago for BBC2 presaging the Internet. Called Hyperland (see also the IMDB write-up), this self-labelled ‘fantasy documentary’ 50-min video from 1990 can now be seen in its entirety from Google video. Mind you, this was well in advance of the World Wide Web (remember the source for ‘www’?) and the browser, though both that name and hypertext are liberally sprinkled throughout the show.
The presentation, written by and starring Adams as the protagonist having a fantasy dream, features Tom, the semantic simulacrum (actually, Tom Baker from Doctor Who), who is the “obsequious, and fully customizable” personal software agent who introduces, anticipates and guides Adams through what in actuality is a semantic Web of interconnected information. Laptops (actually an early Apple), pointing devices, icons and avatars sprinkle this tour de force in an uncanny glimpse into the (now) future.
Sure, some details are gotten wrong and perhaps there is a bit too much emphasis (given today’s realities) on virtual reality, but the vision presented is exactly that promised by the semantic Web and an interconnected global digital library of information and multimedia. Wow! And entertaining and fun to boot!
This is definitely Must See TV!
I’d like to thank Buzzsort for first writing about the availability of this video. Apparently fans and aficionados have been clamoring for some time to see this show again, which has only recently been posted. Indeed, the access to an archived video such as this is a great example of Hyperland coming to reality.
An AI3 Jewels & Doubloon Winner