Posted: January 2, 2007

How sweet it is!

I am pleased to announce the release of the most comprehensive and authoritative search engine yet available on all topics related to the semantic Web and Web 2.0. SweetSearch, as it is named, is a custom search engine (CSE) built using Google’s CSE service. SweetSearch can be found via this permanent link on the AI3 blog site. I welcome suggestions for site additions or improvements to the service; please comment below.

SweetSearch Statistics

SweetSearch comprises 3,736 unique host sites containing 4,038 expanded search locations (some hosts have multiple searchable components). Besides the most authoritative sites available, these sites include comprehensive access to 227 companies involved in the semantic Web or Web 2.0, more than 3,100 Web 2.0 sites, 53 blogs relating specifically to these topics, 101 non-profit organizations, 219 specific semantic Web and related tools, 21 wikis and other goodies. Search results are also faceted into nine categories, including papers, references, events, organizations, companies and tools.

Other Semantic Web CSE Sites

SweetSearch is by no means the first Google CSE devoted to the semantic Web and related topics, but it may be the best and largest. Other related custom search engines (with the number of URLs they search) are Web20 (757 sites), the Web 2.0 Search Co-op (310), the University of Maryland, Baltimore County (UMBC) Ebiquity service (65), Elias Torres’ site (160), Andreas Blumauer’s site (20), Web 20 Ireland (67), NextGen WWW (21), and Sr-Ultimate (4), among others that will surely emerge.

General Resources

Besides the general Google CSE site, the development team’s blog and the user forum for a group of practitioners are also good places to learn more about CSEs.

Vik Singh’s AI Research site is also very helpful in related machine learning areas, plus he has written a fantastic tutorial on how to craft a powerful technology portal using the Google CSE service.

Contributors and Comments Welcomed!

I welcome any contributors who desire to add to SweetSearch. See this Google link for general information about this site; please contact me directly at the email address in the masthead if you desire to contribute. For suggested additional sites or other comments or refinements, please comment below. I will monitor these suggestions and make improvements on a frequent basis.

Posted by AI3's author, Mike Bergman Posted on January 2, 2007 at 8:07 pm in Searching, Semantic Web, Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/308/authoritative-sweetsearch-semantic-web-and-web-20-custom-search-engine/
The URI to trackback this post is: http://www.mkbergman.com/308/authoritative-sweetsearch-semantic-web-and-web-20-custom-search-engine/trackback/
Posted: December 20, 2006

In my earlier Pro Blogging Guide (which is beginning to get long in the tooth, though it remains very popular) I documented the process of setting up my own instance of WordPress, the very popular open source blogging software. My memories of that effort were a little painful because of the need to set up a local server and sandbox for testing that site before it went live. In fact, one whole chapter in the Guide was devoted to that topic alone.

Well, I’ve just cleared another hurdle, and that is moving from a company-hosted Web site and server to one that I own and manage on my own. I’m sure I could have made this easier on myself, but, actually, I wanted to learn the ropes and become self-reliant. I’ll be posting more of the specifics of this transfer, but here are the major areas that I needed to understand and embrace:

  • For a multitude of reasons, I decided I wanted complete control of my environment, but at acceptable cost. That led me to decide on going with a virtual private server (VPS, also sometimes called a virtual dedicated server) wherein the user/owner has total “root” control. This was not too dissimilar from the virtual private network experience I had with my previous company. What a VPS means is that you have a footprint and total software installation and configuration control as if you owned the server, all accomplished remotely. Thus, I needed to research providers, services, responsiveness, etc. Per my SOP, I created spreadsheets and weighting matrices to help decide my choices. (A great resource for such discussion is webhostingtalk.com.) My goal was to spend no more than $30 per month; mission accomplished!
  • Then, I also decided I did not want to be hooked into proprietary Microsoft software. I’m looking to low-cost scalability in my new venture, and clearly no one with a clue is using MS for Internet-based ventures. Thus, I needed to decide on a flavor of Linux, figure out a whole new raft of software and utilities geared to remote administration and standard computer management (editors, file managers, transfer utilities, etc.), and then, most importantly, start learning the CLI (command-line interface). In so many ways, it felt like returning to the womb. I’ve been so GUI-ized and window-fied for more than 10 years that I felt like a stroke victim learning to speak and walk again! But wow, I like this stuff and it is cool! In fact, it feels “purer” and “cleaner” (including such excesses as using emacs or vi(m) again!)
  • With these commitments made, specific choices were then necessary to make the decisions actionable.
  • We’re now into the real nitty-gritty of open source, where LAMP comes into play. First, you need the OS (Linux, CentOS 4 in my case). Then, you need the Web server (Apache). Then, because I’m using WordPress, PHP needs to be installed. This has the side benefit of also allowing phpMyAdmin, a useful MySQL management framework. Oh, of course, that also means that a database to support all of this is needed, which again for WP is MySQL. That requires utilities for database transfer, backup and restoration. Unix provides an entirely different way to understand and manage permissions and privileges, also meaning more learning. Then, all of this infrastructure environment needs to be tested and then verified as working with a clean WP install. (One useful guide I found, based on Windows but still applicable, is Jeff Lundborg’s Apache Guide.)
  • However, big problem: my WP database is apparently larger than most, about 150 MB in size! (I like long posts and attachments.) The standard mechanisms for most blogs fail at these scales, including the phpMyAdmin approaches and a WP plug-in called skippy. I was going to have to deal directly with MySQL, so I began learning its CLI syntax. Again, on-line guides for such backup and restore purposes can do the trick.
  • Then, and only then, I began migrating my blog set-up specifics (pretty nicely isolated within WP) and the database. Actually, here is where I had my first pleasant surprise: the CLI utilities for MySQL (which are really the same bash-like stuff that makes the environment so productive) work beautifully!
  • These efforts then needed to be updated with respect to other dependencies within my blog referring to links, or other internal but no longer applicable references, to make the entire new environment appropriately self-contained and integral. This actually requires cruising through the entire blog site, with specific attention to pages over posts, to ascertain integrity.
  • Now, with all of that getting choked down, I also decided to update my static pages and some of the layout and also to upgrade to WP version 2.0.5 (from 2.0.4, very straightforward), and then
  • Finally, the transfer of the domain and name servers to create the new hosting presence (not to mention email accounts, a challenge of a different order not further mentioned here). The domain transfer may take a few days, and longer still if you also need to transfer domain registrars (something which I also needed to do).
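For readers facing the same large-database problem, here is a minimal sketch of the kind of command-line backup and restore described in the list above, wrapped in Python. It is not the exact procedure used for this migration. The database name, user and file name are hypothetical placeholders, and the only client options assumed are the standard `--user`, `--password`, `--quick` and `--result-file` flags of `mysqldump` and `mysql`.

```python
import subprocess


def dump_command(database, user, dump_file):
    """Build the mysqldump argument list that writes `database` to `dump_file`."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",                  # no value given, so the client prompts for it
        "--quick",                     # stream rows instead of buffering whole tables
        f"--result-file={dump_file}",
        database,
    ]


def restore_command(database, user):
    """Build the mysql argument list; the dump itself is fed in on standard input."""
    return ["mysql", f"--user={user}", "--password", database]


def dump(database, user, dump_file):
    """Run the backup on the old server (requires the MySQL client tools)."""
    subprocess.run(dump_command(database, user, dump_file), check=True)


def restore(database, user, dump_file):
    """Replay the dump into an existing, empty database on the new server."""
    with open(dump_file, "rb") as sql:
        subprocess.run(restore_command(database, user), stdin=sql, check=True)
```

Typical use would be `dump("wp_blog", "wp_user", "wp_blog.sql")` on the old host, a file transfer, then `restore("wp_blog", "wp_user", "wp_blog.sql")` on the new one. Because the dump is streamed to and from a plain file, the size limits that trip up browser-based tools do not apply.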

The final point, the domain transfer, actually is a challenge. Internal WP links from your blog require your hosting URLs to be integral. However, if you understand this, and are able to use IP addresses (216.69.xxx.xxx in my blog’s case) during development, you can actually use the delayed transfer time between registrars to your benefit as you work out details. Again, it’s a matter of perspective. A delay in registration transfer actually gives you a free sandbox for getting the bugs worked out!
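One way to picture the clean-up once the domain finally resolves is a simple rewrite of absolute links from the temporary IP-based address to the real host name. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and `192.0.2.10` is a reserved documentation address standing in for the real development IP.

```python
def rebase_links(content, temporary_base, final_base):
    """Point absolute links at the final host instead of the temporary one."""
    # Normalize both bases to end in exactly one slash so paths line up.
    old = temporary_base.rstrip("/") + "/"
    new = final_base.rstrip("/") + "/"
    return content.replace(old, new)


# Hypothetical post fragment written while the site was reachable only by IP.
draft = '<a href="http://192.0.2.10/307/its-moving-day/">Moving day</a>'
live = rebase_links(draft, "http://192.0.2.10", "http://www.mkbergman.com")
```

A plain string replacement like this is reasonable for post and page content. Settings that are stored as serialized strings record their own lengths, so those are better changed through the WP admin screens than by search and replace.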

So, is this stuff painful? Yes, absolutely, if this is a one-off deal. In my case, however, the real pay-off will come (is coming) from using a transfer such as this as a real-world exercise in learning and exposure: Linux, own-hosting, tools, scripting. Seen in that light, this effort has been tremendously humbling and rewarding.

And, so, you are now seeing the fruits of this transfer! I will expand on specific steps in this process in future postings.

Posted by AI3's author, Mike Bergman Posted on December 20, 2006 at 12:46 pm in Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/307/its-moving-day/
The URI to trackback this post is: http://www.mkbergman.com/307/its-moving-day/trackback/
Posted: December 17, 2006

After six fantastic years with BrightPlanet, I am no longer an employee (CTO) of the company nor chairman of the Board. I felt the company should go in one direction; the Board felt otherwise . . . . Such events, while not prosaic, are also not uncommon. I wish the company all possible success. It is now time to move on. . . .

Even though comparatively small, BrightPlanet is being challenged, as are all software companies today, in managing the transition to (still another) brave new world. Think about the major generational shifts of the past 15 years: personal computers, local networking, Internet browsers and thin clients, Internet ubiquity, open source, now Web 2.0. Certainly other items could be listed in that progression, but the general point remains that the pace of software and computing technology development has been furious and relentless.

These challenges are huge, and have resulted in technology shifts literally measured in months, not years. It is for no small reason that today’s buzzwords include agile, productive and efficient. My goodness, as few as eight years ago, choosing to commit to Java for production-scale enterprise development was considered by some risky and radical; today, some may argue that Java is becoming passé and dynamic languages such as Ruby and DSLs such as Rails hold the keys to the future.

Young Turk to Old Fart

I laugh now about that (truly) instantaneous moment when one morphs from being a Young Turk to an Old Fart. (I myself passed that breakpoint long ago.) I remember with pride having the Y.T. moniker when in my teens and twenties. We see it still today. One of the things, however, that has blown my mind in the past 5-6 years is the age of the next successful generation. Look at the ages of Brin, Page, Ross, Cannon-Brookes, Farquhar, Hansson, and many others (please forgive me if your name is not on the list), who are (or will be) hauling down some serious dough at very young ages. Now, as an older guy (‘Old Fart’), I have to ask myself whether I can play in this new game. (I guess the best that I can say in that regard is that the world is not populated entirely with my daughter’s friends, but all of us can learn from this newest generation more efficient and agile ways of doing the old tasks.)

The Horizon Ahead

The horizon ahead is one of those places where I truly think I DO have a clue. When one sees multiple major shifts of stuff over many years, it is not too difficult (though some may miss it) to see some major trends. I don’t have the time (nor inclination nor luck nor skill) to write another In Search of Excellence (Peters and Waterman), the biggest business book of all time, even assuming I could write that simply or hit the lottery. But, a close reading of trends suggests the pending convergence of open source, semantic tagging and mediation, interoperability, agile development, social collaboration, and mechanisms to assign authoritativeness to information. This convergence will be democratic with a small D, disruptive and rapid. Fasten your seat belts . . . .

Trying ‘Web Scientist’ on for Size

As for myself, I am now on my own and not running a company for the first time in 12 years. I am striking out more directly into the semantic Web — directions that have clearly been my passion on this blog over the past few months. Though I have taken up the obligatory consultant shingle (after all, we all must eat) for the time being, I have also taken on the moniker of ‘Web Scientist’ on my new email signature.

As the person who first explicated and coined the term “deep Web”, the person who wrote the Web’s most popular search tutorial in its early years, and the person who helped bring into being many of the automation techniques and bots for accessing dynamic Web content, I feel pretty comfortable with that label. I also especially like that TBL and others have put a marker out there to give the title some legitimacy. (See Creating a Science of the Web by Tim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt and Daniel J. Weitzner in Science 313(11), 11 August 2006.) (See also this recent NYT article.)

I’ll now see how it feels to have the Web scientist label for a while.

For those of you who have been faithful readers since I launched this blog more than a year ago, you know that my abiding passion has been effective information use and management in relation to the Internet. I look forward to further discussions with you on these very same topics in the months ahead.

Posted by AI3's author, Mike Bergman Posted on December 17, 2006 at 10:40 am in Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/296/trying-web-scientist-on-for-size/
The URI to trackback this post is: http://www.mkbergman.com/296/trying-web-scientist-on-for-size/trackback/
Posted: December 12, 2006

As a vehement moderate (or perhaps a non-academic researcher), I very much enjoyed a recent podcast by Tom Morris looking at the intersection of current tagging systems and other more “unstructured” Web data practices with the more “structured” semantic Web end of the spectrum. Tom’s perspective is very realistic and pragmatic about where current trends are heading.

Some of Tom’s pithy quotes are:

“It is not a choice between one single categorization system and no categorization system . . . . We need to build categorization systems that scale . . . . We need to find a way to bridge the gap between simple and really complex stuff . . . . Web standards are slowly making their way into the consciousness of [Web] designers and their clients.”

What is refreshing about Morris’ perspective is that it avoids the polar advocacies and recognizes inexorable trends. The semantic Web is inevitable because it brings value to users (the “demand side”). It is not happening at the pace nor with the perfection that some computer science advocates may like because that vision is overly complicated and academic. It is happening in the incremental ways of tagging and now microformats that are consistent with the simplicity imperatives that have made the Web what it is.

Tools and tipping points are near at hand for when the network effect of better data-enabled Web pages will finally take hold. Yes, there are issues and hurdles, but much of what is now so exciting about current Web developments is at heart the first expressions of these trends.

(I do recommend you skip the first 7 minutes of the podcast where Morris is clearing his throat about his planned podcast series.) To listen to Tom’s podcast, you may click here.

Posted by AI3's author, Mike Bergman Posted on December 12, 2006 at 12:12 pm in Semantic Web | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/306/the-pragmatic-semantic-web/
The URI to trackback this post is: http://www.mkbergman.com/306/the-pragmatic-semantic-web/trackback/
Posted: November 17, 2006

Getting the Words Right

There has been some laudable progress in test-driven development (TDD), leading to what is now being touted as “behaviour-driven development” (note the English spelling). Two key proponents of this approach have been Dave Astels and Dan North, obviously among others, in setting up the BDD organization.

According to Dave’s first posting on this subject more than a year ago:

Maybe 10% of the people I talk to really understand what [TDD is] really about. Maybe only 5%. That sucks. What’s wrong? Well… one thing is that people think it’s about testing. That’s just not the case.

Sure, there are similarities, and you end up with a nice low level regression suite… but those are coincidental or happy side effects. So why have things come to this unhappy state of affairs? Why do so many not get it?

The thing about BDD is that it is not a new discipline or a radical change from earlier initiatives. It begins from the observation that test-driven design deals mostly with behavior and only in a small portion with unit tests. It extends the metaphor from development to engage the sponsor and (as I argue below) the market as well.

One of the things I find most compelling about the BDD approach is its emphasis on what sales people in the SPIN methodology have called “common language” and the domain-driven design people have called “ubiquitous language.” The notion is that all stakeholders in a project (including, importantly, the market, users and sponsors) need to have a common vocabulary that is simple, accurate, accessible, descriptive and consistent. In short, if such a language can be defined and used assiduously, it becomes compelling and memorable. From the standpoint of development, this leads to consistency and clear communications, with the real side benefit of being more productive. From the standpoint of use and acceptance (“sales”), clear language leads to broader and quicker adoption.

Mindset matters. The language we use in our actual code, the language we use to describe our projects internally, the language we use to communicate the wonderful stuff we have created to the outside world, all of this matters. (Three cheers for dynamic languages and domain-specific languages, or DSLs.) In fact, it matters so much that, if we are not taking the market’s viewpoint about what and how to explain this stuff, we are likely producing crap that no one is interested in.

We all reflect the tools and the terminology that we use to work our way in the world. Development, testing (behavioral design), and programming languages should all be in sync with our users’ end goals. What is wrong with users being able to read our code and understand what it is intending to do?
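To make the idea of readable, behavior-first code a little more concrete, here is a minimal sketch in Python using the standard `unittest` module. It is not rSpec and not an example from the BDD site; the `Account` class and every name in it are hypothetical, chosen only so that the class and method names read as sentences a sponsor could follow.

```python
import unittest


class Account:
    """Hypothetical domain object, present only to give the specs a subject."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        # Refuse the request outright rather than letting the balance go negative.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AnAccountHoldingFiftyDollars(unittest.TestCase):
    """Each method name states one expected behavior in plain words."""

    def setUp(self):
        self.account = Account(balance=50)

    def test_should_hold_thirty_dollars_after_withdrawing_twenty(self):
        self.account.withdraw(20)
        self.assertEqual(self.account.balance, 30)

    def test_should_refuse_a_withdrawal_larger_than_its_balance(self):
        with self.assertRaises(ValueError):
            self.account.withdraw(80)
        # The failed request must leave the balance untouched.
        self.assertEqual(self.account.balance, 50)
```

Saved as a module, this runs with `python -m unittest`. The mechanics are ordinary unit testing; what changes is the vocabulary, which describes what the account should do rather than which method is being exercised.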

The BDD Web site does not yet offer any “cookbooks” for how such language is actually developed nor what specific steps need to be followed. (All practitioners would agree this is a hard process that requires focused attention.) But I think the protagonists are on to something very meaningful and real here.

Dave has also offered a nice PDF overview and there is also some great Ruby documentation associated with the rSpec Ruby gem.

Modular code development through agile dynamic languages, well-tested, and designed for clarity and purpose with all stakeholders is good code. I encourage the community to pay close attention and to get involved with BDD.

“Gort! Klaatu barada nikto!”

An AI3 Jewels & Doubloons Winner