Posted: January 18, 2007

We Are All Groping for Better and More Authoritative Information

NOTE: Since releasing SweetSearch, a Google custom search engine (CSE) devoted to the semantic Web, I have had numerous offline conversations about why anyone would build a CSE and what its benefits are. Reproduced below is one of the threads from these email conversations.


Hi Mike -

Forgive my ignorance, but I’m trying to understand the CSE concept. I get that building one lets you have Google search specific sites for you, but given the broad scope of a general Google search, aren’t a CSE’s results more limited, since it only looks at the sites you specify? Also, how would a CSE stay current, other than through the manual addition of sites to include?

Thanks in advance.



Thanks for your question.

In my opinion, there are three compelling reasons for a CSE:

  1. The ability to restrict results to sites that you (or others), as an “expert,” consider trusted, AUTHORITATIVE sources;
  2. The ability to “disambiguate” false results. For example, someone interested in golf drivers such as a Big Bertha doesn’t want results about NASCAR drivers, printer drivers, pile drivers, or other ambiguous senses of “driver”; and
  3. The ability to organize or cluster results into facets useful to your chosen community.

So, yes, while a CSE returns fewer results than a standard Google search, it is smaller because it eliminates spurious results that don’t meet the conditions above. Correct and smaller is always better than larger and indiscriminate, isn’t it?
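To make the three reasons concrete, here is a minimal sketch in Python of filtering a raw result list by authority, then by word sense, then grouping it into facets. It is a hypothetical illustration, not Google’s CSE API and not how SweetSearch is built: it assumes results arrive as dicts with `url` and `title` keys, and `TRUSTED_SITES`, `EXCLUDED_TERMS`, `FACETS` and the `.example` domains are made-up placeholders standing in for an expert curator’s choices.

```python
import re
from urllib.parse import urlparse

# Hypothetical expert-curated whitelist (placeholder domains, not a real CSE)
TRUSTED_SITES = {"golfreviews.example.com", "clubfitting.example.org"}
# Hypothetical words signaling an unwanted sense of "driver"
EXCLUDED_TERMS = {"nascar", "printer", "pile"}
# Hypothetical facet label assigned to each trusted site
FACETS = {"golfreviews.example.com": "Reviews",
          "clubfitting.example.org": "Club Fitting"}


def site_of(url):
    """Return the (lowercased) host of a URL, minus any leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def refine(results):
    """Filter raw results to trusted, unambiguous hits grouped by facet."""
    facets = {}
    for result in results:
        site = site_of(result["url"])
        # 1. Keep only sites the curator has marked as authoritative.
        if site not in TRUSTED_SITES:
            continue
        # 2. Drop hits whose titles point to another sense of the query term.
        words = set(re.findall(r"[a-z0-9]+", result["title"].lower()))
        if words & EXCLUDED_TERMS:
            continue
        # 3. Cluster the surviving hits into community-defined facets.
        facets.setdefault(FACETS.get(site, "Other"), []).append(result["title"])
    return facets


results = [
    {"url": "https://www.golfreviews.example.com/big-bertha",
     "title": "Big Bertha driver review"},
    {"url": "https://clubfitting.example.org/drivers",
     "title": "Fitting a golf driver"},
    {"url": "https://racing.example.net/standings",
     "title": "NASCAR driver standings"},
    {"url": "https://golfreviews.example.com/help",
     "title": "Printer driver download"},
]
print(refine(results))
# {'Reviews': ['Big Bertha driver review'], 'Club Fitting': ['Fitting a golf driver']}
```

Filtering happens before grouping, so the facets only ever hold results that passed both the authority and the disambiguation checks: the result set is smaller by elimination, not by accident.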

Thanks, Mike


Yes. Correct and smaller IS better than bigger and more indiscriminate. Thanks. So it seems that a CSE is more a shortcut for looking where you already know to look than a tool for discovery (in places you don’t know).

It would be interesting if there were a way to draw “clusters” from CSE results and then search within those clusters in a general (non-CSE) search, either as a one-off or as a way to fine-tune or update the sites included in a CSE.


Hi Michael -

I hope you had a chance to think about the brainstorm above. My partner and I have a couple more thoughts. In particular, as things stand now, scalability relies on the crowd participating to fine-tune and grow the CSE. Adding automation (like what I described above) to the manual part could produce results greater than either one could achieve alone, don’t you think? Could this be a way of extending and propagating the Semantic Web?



You’re obviously poking at the same issues that got me experimenting with CSEs in the first place. Let me offer some further thoughts.

First, in one of your earlier comments you mentioned that you did not see CSEs as a “discovery” venue. That is likely true for the cognoscenti in a given field, but if a CSE is authoritatively constructed, “outsiders” could certainly use it for discovery. For example, assume a CSE managed by the astrophysics community. If I were interested in black holes, that would be a great place for me to start exploring.

Second, the theme of discovery also requires some STRUCTURE. Sometimes this can be a taxonomic or directory structure: who knew that pulsars were in fact related to black holes? Another kind of structure is the categorization or classification by experts that is the subject of ontologies and controlled vocabularies within the semantic Web.

Third, these things cannot be done manually given the massive scale of the Internet, yet automation without operators at the controls produces poor-quality, ambiguous results. The trick, as I argue in a few different posts on my blog (though I offer no truly compelling techniques there), is to find human-mediated, semi-automated methods that can scale AND produce quality.

(As for automated clustering, two free examples are Clusty and Carrot2; NLP deserves pages of discussion in its own right.)

Fourth, if THAT is done, then multiple such expert- or interest-driven communities can be aggregated to produce a more meaningful, across-the-board information resource.

Frankly, I think all of this is eminently doable and is happening today in disparate, disconnected ways. The tools for doing this are right at hand. Venue and packaging are lacking.

I find CSEs to be one technique, among truly many, that can contribute to this vision (though, truthfully, CSEs also have some early growing pains, which is likely why you personally are monitoring the Google Group). But improvements keep coming, and it is easy to foresee that CSEs, plus MANY other techniques, will soon be able to accomplish a great deal.

Information is now universal. Collaboration is now doable and demanded. Authoritativeness remains a challenge, but things like OpenID, OpenURL, labels and certificates (not to mention the efforts of existing “authorities” such as professional societies) will create the new social structures to replace the publisher hegemony and peer review methods of prior generations. In the end, society WILL figure out how to bring authoritativeness to the chaotic, distributed, undisciplined Web.

Thanks, Mike

Posted by AI3's author, Mike Bergman, on January 18, 2007 at 10:15 pm in Searching