Posted: June 3, 2007

To All,

You may encounter poor site performance and other weird behavior for the next day or so. I’m upgrading the site to WordPress 2.2, adding caching to improve site load times, and making other changes. Plug-ins are breaking left and right, and the going is pretty rough.

Pardon the interruption, and thanks for your patience!

Posted by AI3's author, Mike Bergman Posted on June 3, 2007 at 12:23 pm in Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/378/pardon-the-interruption/
The URI to trackback this post is: http://www.mkbergman.com/378/pardon-the-interruption/trackback/
Posted: April 16, 2007

I apologize for the system problems caused by my virtual dedicated server provider over the past two and a half to three hours. They now appear to have been resolved; I will monitor the situation closely.

Posted by AI3's author, Mike Bergman Posted on April 16, 2007 at 9:55 am in Site-related | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/360/system-problems/
The URI to trackback this post is: http://www.mkbergman.com/360/system-problems/trackback/
Posted: April 5, 2007

For some reason I have a really hard time starting a big, new initiative without having a comfortable, identifiable name for “it.” I need a name to give my efforts a handle, a way to mentally visualize the “whatever” it is I am working on and losing sleep over. I can’t count the number of times I have spent (wasted?) hours trying to come up with the right name.

But perhaps needing a name is not all that ridiculous. Most of us name our children close to birth, if not in advance. We are not comfortable thinking of or referring to our precious new addition solely as “you” or “she” or “he” or “it” or, god forbid, the “whatever,” for any prolonged period of time. The same is true of projects or incipient products.

Another thing I’ve never quite understood is how big companies can refer to major, years-long software efforts by internal code names like ‘Malta’ or ‘Jiminy’ (or ‘whatever’) during development, when they know they will eventually release the product under different branding. I mean, after years of familiarity, isn’t Jimmy always Jimmy, and no longer even Jiminy? And if those who know him best think of him as Jimmy, how can they ever identify with or relate to his new product persona, ‘Schlitz’? While customers may first be introduced to Schlitz, isn’t there a disconnect when the time (eventually, ultimately) comes for the parents (that is, the original creators) to be asked about Jimmy?

I myself have never worked in such large shops. Maybe the developers who labored on Jimmy for most of their working lives don’t really mind being told they have actually nurtured ‘Schlitz, the SuperContent Server.’ After all, Jimmy did sound kind of lame, and we all want him to go to graduate school. Schlitz or SuperContent Server does sound more likely to wear a necktie.

For many years I worked in a company that produced “turnkey software systems.” Fine. My only problem with all of that, however, was that it seemed like every time I typed ‘turnkey’ to discuss the system, it came out “turkey.” No matter how aware I was of this possible finger error, I still initially typed “turkey.” And, while my care (yeah, right!) and awareness led me to think I caught all of these dyslexic errors, I admit I’m not really sure, and I could never seem to get all of the sweat off my palms whenever I described the system. Was the name somehow lingering in the background? Was it really a turkey, akin to the three-handed handshake of giving a boy child a name like Trevor, Claire, Audrey, or (even) Jiminy?

So, what’s in a name? Well, for me, it is a friendship and a handle. After all, once a big project begins, we’re going to be with Jimmy for quite some time. We might as well like the name we give it, no matter how sweet. ;)

Posted by AI3's author, Mike Bergman Posted on April 5, 2007 at 11:30 am in Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/358/whats-in-a-name/
The URI to trackback this post is: http://www.mkbergman.com/358/whats-in-a-name/trackback/
Posted: March 25, 2007

Like millions of other Americans, I love college basketball and believe the NCAA tournament is the best of the annual sporting events. This year has been no exception, with some of the best play and closest games in my decades of memories of the event (even though our family alma mater, Duke, lost in the first round). As of this writing, two of the Final Four teams have been decided (Ohio State and UCLA), with the last two slots to be determined later today. Goooood stuff!

Also like millions of Americans, I enter various pools, vying to pick the round-by-round and final winners over the tournament’s six rounds of play. This collective obsession has prompted many to comment on the weeks of lost productivity in the American workforce at this time of year. Bring it on!

I fondly remember some of those prior years when I won office pools and the accompanying few hundred bucks. But, as with so much else, the Internet has changed how mad this March Madness has become. There are more than 2.9 million entries in ESPN’s $10,000 tournament bracket challenge, with other large competitions on Facebook (~1.6 million), CBS Sportsline (~½ million), Sports Illustrated, and literally dozens of other large online services. (Note that these are games of chance that are free to enter; they differ from the money bets wagered in informal office pools.)

According to R.J. Bell at Pregame.com, some 30 million Americans in total are participating in pools of various kinds, wagering about $2.5 billion, both numbers I can easily believe. Your chances of picking a perfect bracket with all correct winners? How about 9,223,372,036,854,775,808 to 1. Or, as Bell puts it, “if every man, woman, and child on the planet randomly filled out 10 million brackets each, the odds would be LESS than 1% that even one would have a perfect bracket.”
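Bell’s figures are easy to check. Here is a minimal Python sketch, assuming each of the 63 games is an independent 50/50 pick and a 2007 world population of roughly 6.6 billion (my assumption, not a figure from Bell):

```python
import math

GAMES = 63  # a 64-team field needs 63 games to crown a champion
WORLD_POPULATION = 6_600_000_000  # assumption: approximate 2007 figure
BRACKETS_PER_PERSON = 10_000_000  # from Bell's quote


def possible_brackets(games: int = GAMES) -> int:
    """Count of distinct brackets when each game has two possible winners."""
    return 2 ** games


def chance_of_any_perfect(population: int, brackets_each: int,
                          games: int = GAMES) -> float:
    """Probability that at least one of population * brackets_each random
    brackets is perfect, i.e. 1 - (1 - p)**n with p = 1 / 2**games."""
    n = population * brackets_each
    p = 1 / possible_brackets(games)
    # log1p/expm1 keep precision when p is vanishingly small
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    print(f"{possible_brackets():,} possible brackets")
    print(f"{chance_of_any_perfect(WORLD_POPULATION, BRACKETS_PER_PERSON):.2%}")
```

Under these assumptions the second line comes out at roughly 0.7%, consistent with Bell’s “LESS than 1%.”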

Well, I’ve actually done pretty well in the ESPN pool to date this year, ranking in what at first sounds like an impressive 99.1st percentile as of this writing. But while scoring in such high percentiles on standardized tests is pretty cool, it is humbling in these tournament pools. More than 25,000 entries still rank ahead of mine, and I have 0% chance of winning anything! (I also assume that my percentile ranking will drop further as the final results come in. BTW, though there are tens of thousands better, my bracket is shown below.)
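The arithmetic behind that humbling result is simple. A small sketch, taking ESPN’s reported 2.9 million entries as the pool size (the true count is somewhat higher, so this is a lower bound):

```python
def entries_ahead(total_entries: int, percentile: float) -> int:
    """Approximate number of entries ranked above a given percentile."""
    return round(total_entries * (100 - percentile) / 100)


if __name__ == "__main__":
    # The top 0.9% of 2.9 million entries is still a small town's worth of brackets
    print(entries_ahead(2_900_000, 99.1))  # prints 26100
```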

Of course, one shouldn’t enter these bracket games with any expectation other than to increase the enjoyment of watching the actual games and having some fun with statistics. For better odds, play your office pool. Or, better still, try starting a company that eventually gets venture backing and becomes profitable. At least there, your chances of winning are much improved, to say, 1,000 to 1.

Posted by AI3's author, Mike Bergman Posted on March 25, 2007 at 12:30 pm in Site-related | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/352/when-the-99th-percentile-is-not-good-enough/
The URI to trackback this post is: http://www.mkbergman.com/352/when-the-99th-percentile-is-not-good-enough/trackback/
Posted: February 7, 2007

How to Process Your Own Large Libraries into Thumbnails

When I decided to upgrade my Sweet Tools listing of semantic Web and related tools, I wanted to add some images to make the presentation more attractive. Many metadata aggregation service providers were also adopting image representations for data (see this Dlib article). Since the focus of my listing is software, I could either install all of the programs and take screenshots (not doable given the numbers involved) or adopt what many others have used as a sort of visual index for content: thumbnails, or, as they are specifically called when applied to Web pages, thumbshots.

Quick Review of Alternatives

Unless you get all of your Web content via feeds or have been living in a cave, you may have recently contracted a form of popup vertigo. Since its introduction just a few months back, the Snap Preview Anywhere (SPA) thumbnail popup has become the eggplant that eats Chicago, with more than half a million sites now reported to be using the service. Because I don’t want this service on my blog (see below), I did not want to go through the effort of signing up for SPA or restricting its use to just this posting (even though the signup appears clean and straightforward). Instead, I reproduce below what one of these Snap link-over popups looks like:

The sheer ubiquity of these popup thumbnails is creating its own backlash (check out this sample rant from UNEASYsilence and its comments), and early promoters such as TechCrunch have now switched to a clickable icon for previews, rather than automatically popping up the image on link hover.

Not only had the novelty of these popups worn off for me, but my actual desired use for Sweet Tools was to present a gallery of images for multiple results simultaneously. So, besides its other issues, the Snap service was not suitable for my purpose.

I had earlier used a Firefox add-on called BetterSearch that places thumbnails on results pages when searching with Google (including international versions), Amazon, MSN Search, Yahoo!, A9, Answers.com, AllTheWeb, Dogpile.com, del.icio.us and Simpy.com. But, as with the Snap service, I found this add-on distracting. I also don’t like that my use was potentially being logged and that promotional messages were inserted on each screen. (Another Firefox extension, GooglePreview, appears less intrusive, but I have not tried it.) As it turns out, both of these services piggyback on a free (for some uses) thumbnail acquisition and serving service from Thumbshots.org.

Since my interest in thumbnails was limited to a bounded roster of sites (not the dynamic results of a search query), I decided to cut out the middleman and try Thumbshots.org directly myself. However, my candidate sites are mostly obscure academic or semantic Web sites not generally in the top rankings, so most of the Sweet Tools Web sites unfortunately had no thumbnails on Thumbshots.org.

Of course, throughout these investigations I always had the option of taking screen captures myself and converting them manually to thumbnails. This is a very straightforward process with standard graphics packages; I had often done so for other purposes using my standard Paint Shop Pro software. But with the number of Sweet Tools entries growing into the hundreds, such a manual approach clearly wouldn’t scale.

Knowing there are literally hundreds of cheap or free graphics and image manipulation programs out there, I thus set out to see if I could find a utility that would provide most, if not all, of the automation required.

My Sweet Tools records don’t change frequently, so I could accept a batch-mode approach. I also wanted to size the thumbnails to whatever displayed best in my Exhibit presentation. And if I was going to adopt a new utility, I decided I might as well seek other screen capture and display flexibility for other purposes. Importantly, I also needed the generated file names to be unique and readable (not just opaque IDs). Finally, as with any tool I ultimately adopt, I wanted quality output and professional design.

Off and on I reviewed options and packages, mostly getting disgusted with the low quality of the dross out there, and appalled at how difficult it is to find such candidates using standard search services. (Whole categories of content, such as products of all types, reviews, real data, market info and statistics, are becoming nearly impossible to find effectively on the Web with current search engines; but those are topics for another day.)

Nonetheless, after much looking and trial runs of perhaps a dozen packages, I finally stumbled across a real gem, WebShot. (Reasons this product was difficult to find included its relatively recent vintage, apparent absence of any promotion, and the mismatch between the product name and Web site name.)

The WebShot Utility

WebShot is a program that allows you to take screenshots and thumbnails of Web pages or whole Web sites. I find its GUI easy to use, and it also comes with a command-line interface for advanced users or high-volume services. WebShot can produce images in the JPG, GIF, PNG, or BMP formats. It was developed in C by Nathan Moinvaziri.

The program is free for use on Windows XP, though PayPal donations are encouraged. Nominal charges apply for other Windows versions and for command-line use. Linux is not supported, and Internet Explorer must be installed.

The graphical UI on Windows XP has a standard tabbed design. You can capture single thumbnails or run batches driven from a text file. Output files can be flexibly sized and saved in any of the above formats. The screen capture itself can be based on required or maximum and minimum browser display dimensions. There are a variety of file naming parameters, and system settings allow WebShot to work in Web-friendly ways. Here’s an example of the Image tab for the GUI:

The command-line version accepts about 20 different parameters.

Depending on settings, you can get a large variety of outputs. The long banner image to the left, for example, is a “complete” Web page dump of my Web site at the time of this posting, showing about 8 consecutive posts (160 x ~2,300 pixels). WebShot automatically stitches together multiple screenshots of the long page, with the resolution in this case set by the input width parameter of 160 pixels.
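WebShot does not document how it stitches long pages, so the following is only a hypothetical illustration of the arithmetic involved: a full-page capture needs several viewport-sized screenshots, and the final image height scales by the ratio of the thumbnail width to the browser width. The 1,024 x 768 browser window is my assumption (it is the setting from my standard thumbnail workflow), not necessarily what produced the banner image.

```python
import math


def long_page_capture(page_height: int, browser_width: int = 1024,
                      viewport_height: int = 768,
                      thumb_width: int = 160) -> tuple[int, int]:
    """Return (screenshots_to_stitch, thumbnail_height) for a full-page capture.

    Hypothetical model: the page is captured one viewport at a time, the
    strips are stacked, and the result is scaled so its width equals
    thumb_width while preserving the aspect ratio.
    """
    strips = math.ceil(page_height / viewport_height)
    scale = thumb_width / browser_width
    return strips, round(page_height * scale)


if __name__ == "__main__":
    # A hypothetical page about 14,720 px tall, rendered 1,024 px wide, shrinks
    # to 160 x 2,300, close to the banner dimensions described above.
    print(long_page_capture(14_720))  # prints (20, 2300)
```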

Another option is this sample “cropped” capture (440 x 257), where I cut the standard screen display to about 50% of its normal vertical (height) dimension:

Then the next example shows what I have chosen as my “standard” thumbnail size (160 x 120); I, not the program, added the image borders:

In batch mode, I set the destination parameter so that each file name included both a readable domain portion (%d) and a hashed portion (%m), since in a few cases I had multiple different Web pages from the same host domain.
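The idea behind this %d%m naming is easy to reproduce outside WebShot. The sketch below is hypothetical: I don’t know which hash WebShot uses for %m, so it substitutes a truncated MD5 of the full URL, strips a leading “www.” to approximate the “core domain,” and adds an underscore separator for readability.

```python
import hashlib
from urllib.parse import urlsplit


def thumbnail_name(url: str, ext: str = "jpg", hash_len: int = 8) -> str:
    """Build a unique but readable file name: readable domain + short URL hash.

    Hypothetical analogue of WebShot's %d%m pattern; truncated MD5 is an
    assumption, used here only as a stable fingerprint, not for security.
    """
    host = urlsplit(url).hostname or "unknown-host"
    if host.startswith("www."):
        host = host[len("www."):]
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()[:hash_len]
    return f"{host}_{digest}.{ext}"


if __name__ == "__main__":
    for u in ("http://www.mkbergman.com/378/pardon-the-interruption/",
              "http://www.mkbergman.com/360/system-problems/"):
        print(thumbnail_name(u))  # same domain prefix, different hash suffix
```

The readable prefix keeps the files sortable by site, while the hash keeps two pages from the same host from overwriting each other.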

As noted, download retries, delays and timeouts are all configurable, so you can be a good Web citizen while still getting acceptable results. With more-or-less standard settings, I completed the 400 thumbnail downloads for the Sweet Tools dataset (without error, I should mention) in just a few minutes.
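WebShot exposes retries, delays and timeouts as settings; the loop below is only a generic sketch of what such politeness settings do in a batch run. The `capture` callable is a hypothetical stand-in for whatever actually renders a page (it is not a WebShot API), and the default values are illustrative.

```python
import time
from typing import Callable, Iterable, Optional

# Hypothetical signature: capture(url, timeout_seconds) -> image bytes,
# raising OSError (which includes TimeoutError) on failure.
Capture = Callable[[str, float], bytes]


def polite_batch(urls: Iterable[str], capture: Capture, retries: int = 2,
                 delay_s: float = 2.0,
                 timeout_s: float = 30.0) -> dict[str, Optional[bytes]]:
    """Capture each URL, retrying failures and pausing between requests."""
    results: dict[str, Optional[bytes]] = {}
    for url in urls:
        image = None
        for _attempt in range(1 + retries):
            try:
                image = capture(url, timeout_s)
                break  # success: stop retrying this URL
            except OSError:
                time.sleep(delay_s)  # back off before the next attempt
        results[url] = image  # None marks a URL that never succeeded
        time.sleep(delay_s)  # spread the load across the batch
    return results
```

Recording failures as `None` rather than aborting makes it easy to re-run just the URLs that failed.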

How I Do Bulk Thumbnails for Sweet Tools

Your use will obviously vary, but I kept notes for myself so that I could easily repeat or update this batch process (in fact, I have already done so a couple of times for incremental updates to Sweet Tools). The general workflow is:

  1. Create a text file with host Web site URLs in spreadsheet order
  2. Run WebShot with these general settings:
    • destination switches of %d%m (core domain, plus hash)
    • image at 160w x 120h (my standard; could be anything as long as the proper aspect ratio is maintained)
    • use of Multiple tab, with a new destination directory for each incremental update
    • browser setting of 1024 x 768 required (the most common screen resolution today); minimum of 800 x 600; highest-quality image
  3. At completion, go to a command window and write out the image file names (images complete in the same order as submitted). (In Windows, this is the dir /o:d > listing.txt command.) Then copy the file names from the resulting text file back into the spreadsheet to establish the record-to-image correspondence
  4. Upload to the appropriate WordPress image directory.
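Step 3 relies on WebShot finishing images in submission order. If you’d rather script the correspondence than paste from a dir listing, here is a minimal Python sketch under the same assumption: one URL per line in the text file from step 1, .jpg output, and file modification times reflecting completion order (what dir /o:d sorts on). The function names and the CSV layout are my own, for illustration.

```python
import csv
from pathlib import Path


def pair_urls_with_images(url_file: Path, image_dir: Path,
                          pattern: str = "*.jpg") -> list[tuple[str, str]]:
    """Match each submitted URL to the image written in the same position."""
    urls = [line.strip() for line in url_file.read_text().splitlines()
            if line.strip()]
    # Oldest first, like `dir /o:d`: completion order == submission order
    images = sorted(image_dir.glob(pattern), key=lambda p: p.stat().st_mtime)
    if len(urls) != len(images):
        raise ValueError(f"{len(urls)} URLs but {len(images)} images")
    return [(url, img.name) for url, img in zip(urls, images)]


def write_mapping(pairs: list[tuple[str, str]], out_csv: Path) -> None:
    """Save the record-to-image table for pasting back into the spreadsheet."""
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "image"])
        writer.writerows(pairs)
```

The length check is a cheap guard: if a capture silently failed, the counts disagree and every later pairing would otherwise be off by one.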

Some Other Tips

Like many such tools, WebShot has insufficient documentation. But with some experimentation, it is in fact quite easy to accomplish a number of management or display options. Some of the ones I discovered are:

  • If harvesting multiple individual Web pages from the same domain, use the domain (%d) and hash (%m) options noted above
  • For complete capture of long Web pages (such as the image of my own Web site to the left), first decide on a desired resolution set via ‘width’ on the Image tab, leave height blank, and leave the browser settings open
  • For partial screen captures without distortion, set the image dimensions to the desired final size, with the height set to the desired partial percentage of the full height; then adjust the browser dimensions to match the image aspect ratio.
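The aspect-ratio rule in the last tip is simple arithmetic. A minimal sketch, assuming a 4:3 full-screen frame and a 1,024-pixel-wide browser (both my assumptions for illustration; the function names are hypothetical):

```python
def partial_height(image_width: int, fraction: float,
                   aspect: tuple[int, int] = (4, 3)) -> int:
    """Height of a partial capture: the full-frame height times the fraction kept."""
    aspect_w, aspect_h = aspect
    return round(image_width * aspect_h / aspect_w * fraction)


def matching_browser_size(image_width: int, image_height: int,
                          browser_width: int = 1024) -> tuple[int, int]:
    """Browser window with the same aspect ratio as the image, so nothing is
    stretched when the capture is scaled down to image_width x image_height."""
    return browser_width, round(browser_width * image_height / image_width)


if __name__ == "__main__":
    h = partial_height(160, 0.5)  # half-height 160-pixel-wide thumbnail
    print(h, matching_browser_size(160, h))  # prints 60 (1024, 384)
```

Fed the full 160 x 120 standard size, the same rule returns the 1,024 x 768 browser setting from my regular workflow.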
An AI3 Jewels & Doubloon Winner