Posted: October 24, 2005

I’ve just finished reading a fascinating 228-page transcript on the topic of peer production and open data architectures. This discussion, the first of the so-called Union Square Sessions, involved more than 40 prominent Web thinkers and practitioners, with a heavy sprinkling of VCs.

That being said, I was disappointed that neither interoperability nor extensibility directly entered into any of the discussions. I suspect this may be due to conjoining the important singular topic of open data architectures with the lens of peer production or social networks. For me, the quote closest to my interests among this disparate group was from Dick Costolo, who stated “the bottom line implication is that an open data architecture will be one that is purely API based and not destination based.”

Nonetheless, this is an interesting start. I’d like to humbly suggest open and extensible data architectures (including, importantly, database engines in addition to extensible exchange formats such as XML) for a future discussion topic.

Here is the link to the Union Square Sessions Transcript.

Posted: October 13, 2005

I just came across a VC blog pondering whether operating in "stealth mode" holds any value for a start-up.  I’ve come to the amused conclusion that all of this, particularly the "stealth" giveaway, is so much marketing hype.  When a start-up claims it is coming out of stealth mode, grab your wallet.

The most interesting and telling example I have of this is Rearden Commerce, which was announced in a breathy InfoWorld cover story in February 2005 about the company and its founder/CEO, Patrick Grady.  The company has an obvious "in" with the magazine; in 2001 InfoWorld also carried a similar piece on Rearden’s predecessor company, Talaris Corporation.

According to a recent Business Week article, Rearden Commerce and its predecessors, reaching back to an earlier company called Gazoo founded in 1999, have raised $67 million in venture capital.  While it is laudable that the founder has reportedly put his own money into the venture, a company with this massive funding and a high-water mark of 80 or so employees hardly qualifies as "stealth."

As early as 2001 with the same technology and business model, this same firm was pushing the "stealth" moniker.  According to an October 2001 press release:

 "The company, under its stealth name Gazoo, was selected by Red Herring magazine as one of its ‘Ten to Watch’ in 2001."  [emphasis added]

Even today, though no longer an active name, Talaris Corporation has close to 115,000 citations on Yahoo!  Notable VCs such as Charter Ventures, Foundation Capital, JAFCo and Empire Capital have backed the firm through its multiple incarnations.

The Holmes Report, a marketing publication, provides some insight into how the earlier Talaris was spun in 2001:

"The goal of the Talaris launch was to gain mindshare among key business and IT trade press and position Talaris as a ‘different kind of start-up’ with a multi-tiered business model, seasoned executive team and tested product offering."

The Holmes Report documents the analyst firms and the leading journals and newspapers to which Talaris made outreach.  Actually, this outreach is pretty impressive; good companies do the same all of the time, and that is to be lauded.  What is to be questioned, however, is how many "stealths" a cat can have.  Methinks this one is one too many.

"Stealth" thus appears to be code for an existing company of some duration that has had disappointing traction and now has new financing, a new name, new positioning, or all of the above.  So, interested in a start-up that just came out of stealth mode?  Let me humbly suggest standard due diligence.

Posted by AI3's author, Mike Bergman Posted on October 13, 2005 at 9:19 am in Software and Venture Capital | Comments (0)
Posted: October 11, 2005

BrightPlanet has announced a major upgrade to its Deep Query Manager (DQM) knowledge-worker document platform.  According to its press release, the new version achieves extreme scalability, broad internationalization and expanded file format support, among other enhancements.  The DQM has added the ability to harvest and process content in up to 140 foreign languages and more than 370 file formats, plus new content export and system administration features.  The company also claims the new distributed architecture scales to hundreds or thousands of users across multiple machines, with the ability to handle incremental growth and expansion.

According to the company:

The Deep Query Manager is a content discovery, harvesting, management and analysis platform used by knowledge workers to collaborate across the enterprise. It can access any document content — inside or outside the enterprise — with strengths in deep content harvesting from more than 70,000 unique searchable databases and automated techniques for the analyst to add new ones at will. The DQM’s differencing engine supports monitoring and tracking, among the product’s other powerful project management, data mining, reporting and analysis capabilities.

Posted: September 12, 2005

On Aug. 22, BEA announced it was acquiring Plumtree Software for $200 million.  By any measure, this is a fire-sale price given Plumtree’s 10 years of operating history, 700 customers (including some large notables such as the US Army, US Navy, Airbus, Ford, Procter & Gamble and Swiss Re), and 21 million reported users.  In addition, Plumtree has fairly significant intellectual property with both .NET and J2EE implementations.  It also has $70 million in cash, lowering the effective acquisition cost still further.

According to IDC, Plumtree was the #5 general portal vendor, behind IBM and BEA itself, among others.  Nonetheless, this acquisition appears to mark the end of the independent general portal vendor.  Corechange (acquired by Open Text for $4.2 million in 2003) and Epicentric (acquired by Vignette for $32 million in 2002) were the previous two independent portal vendors to be absorbed.

Is it open source?  Is it the failure of the general portal model?  Is it ongoing consolidation?  Is it another example of BEA stumbling its way forward?  Is it all of these?

Only time will tell.  My own suspicion, however, is that the document challenge remains sufficiently broad and interconnected that the general portal is merely a gluing framework, and not the most important piece at that. 

Posted: August 22, 2005

Last week I came across a reference from Search Engine Watch, to which I have subscribed for many years and at whose conferences I have spoken, that TOTALLY FRIED me.  It’s related to a topic near and dear to me because I am both its father and its steward: the general topic of the “deep Web.”  I began a public response to that week’s posting but then, after cooling down, simply notified the author, Gary Price, of my attribution concerns.  He graciously and promptly amended his posting with appropriate attribution.  Thanks, Gary, for proper and ethical behavior!

With some of the issues handled privately, I decided that discretion was the better part of valor and I would let the topic alone with respect to some of the other parties in the chain of lack of attribution.  After all, Gary was merely reporting information from a reporter.  The genesis of the issues resided elsewhere.

Then, today, I saw the issue perpetuated still further by the VC backer of Glenbrook Networks, piling onto the previous egregious oversights.  I could sit still no longer.

First, let me say I am not going to get into the question of “invisible Web” versus “deep Web” (the latter being the term that Thane Paulsen and I coined nearly five years ago to describe dynamic content not accessible via standard search engine crawlers).  “Deep Web” has become the term of art, much like kleenex, and if you know what the term means then the topic of this post needs no further intro.
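For readers new to the distinction, here is a minimal, hypothetical sketch (the page markup and form are invented for illustration, not taken from any real site) of why form-backed database content eludes a link-following crawler: the crawler only harvests hyperlinks from fetched HTML, so a page whose content sits behind a query form yields nothing to index, while surfacing that deep content requires actually constructing and issuing a query.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# A hypothetical page fronting a searchable database: no hyperlinks,
# only a query form. Standard crawlers see nothing to follow here.
PAGE = """<html><body>
<form action="/search" method="get">
  <input name="q"><input type="submit">
</form>
</body></html>"""

class LinkExtractor(HTMLParser):
    """Collects href targets the way a simple crawler would."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

crawler = LinkExtractor()
crawler.feed(PAGE)
print(len(crawler.links))  # 0 -- a link-following crawler finds nothing to index

# Surfacing the deep content instead requires submitting a query
# through the form, i.e., constructing a request the page never links to:
query_url = "/search?" + urlencode({"q": "deep web"})
print(query_url)  # /search?q=deep+web
```

The point of the sketch is only the asymmetry: static surface pages expose their content through links, while deep Web content must be reached by issuing directed queries against each searchable source.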

However, I’m going to make a few points below about the misappropriation of the term ‘deep Web’ and the technology around it.  I believe that some may legitimately say, “Tough luck; it is your responsibility to monitor such things, and if they did not credit or acknowledge your rights, that is your own damn fault.”  Actually, I will generally agree with this sentiment.

My real point in this posting, therefore, is not my term versus your term, but the integrity of intellectual property, attribution and “truth” in the dynamic Internet.  If I step back from my own circumstance and disappointment, the real implication, I believe, is that future historians will be terribly hard-pressed to discern past truths from Internet content.  If we think it is difficult to extract traceable DNA from King Tut today, it will be close to impossible to discern the true genesis, progression, linkages and idea flow based on Internet digital information into the future.  But I digress …

Last Week’s Posting

The genesis of this issue began with a posting on Silicon Beat by Matt Marshall, Diving deep into the web: Glenbrook Networks.  Marshall is a reporter for the San Jose Mercury News.  Much was made of the “deep Web” phenomenon and the fact that Glenbrook Networks now had technology to tap into it.  This story was then picked up by the Search Engine Watch blog.  SEW is one of the best and most authoritative sources for search engine related information on the Web.  The blog author was Gary Price.  The SEW blog entry cited two references on deep Web topics, both of which referenced my seminal paper as their own first references.  Neither of these press articles mentioned BrightPlanet.  I notified Gary Price of what I thought was an oversight of attribution, and he properly and graciously added an addendum to the original piece:

PPS: Other companies doing work in mining and providing access to the deep web include long-time player BrightPlanet and Deep Web Technologies whose technology powers the portal.

Using this press, Jeff Clavier, one of the VCs backing Glenbrook Networks, began flogging the coverage on his own blog.  There were assertions in that original piece that deserved countering, but vendors have attempted to misappropriate this “space” and its technology before (see below), and most have fallen by the wayside or gone out of business.  I chose to let the matter go quiet publicly, grind some more enamel off my teeth, and refer the matter to our general counsel for private action.

Today’s Posting

The flogging continued today with a new posting on Jeff Clavier’s site, Glenbrook Networks: Trawling the Deep Web.  This new posting extended the misappropriation further and, as part of an ongoing series obviously planned to push the investment, finally goaded me into a public response.  In part, here is some of what that new post said:

The majority of web pages one can access through search engines were collected by crawling the so-called Static or Surface Web. It is a smaller portion of the Internet reportedly containing between 8 and 20 billion pages (Google vs. Yahoo index sizes). Though this number is already very large, the total number of pages available on the Web is estimated to 500 billion pages. This part of the Internet is often referred to as Deep Web, Dynamic Web, or Invisible Web. All these names reflect some of the features of this gigantic source of information – stored deep down in databases, rendered through DHTML, not accessible to standard crawlers. ….

Because the Deep Web contains a lot of factual information, it can be seen metaphorically as an ocean with a lot of fish. That is why we call the system that navigates the Deep Web a trawler.

Note that the figures used come directly out of our research, and they are frequently used by others without attribution, as is the case here.  The trawler imagery, however, is especially egregious, since it is a direct rip-off of our original papers!  In fact, here are the two trawler images from our original Deep Web: Surfacing Hidden Value paper, first published in 2000, the first representing surface content retrieval:

The next image represents deep Web content retrieval:

The post then goes on to overview some “technology” with fancy names that is very straightforward, has been documented extensively before by BrightPlanet, and is covered by existing patents held by our company.

Misappropriation is Nothing New

Such misappropriations have happened before.  In one instance, complete portions of BrightPlanet’s white paper were plagiarized on the home page of a competitor now out of business.  We have also had competitors name themselves after the deep Web (e.g., Deep Web Technologies), appropriate the name and grab Web addresses (Quigo, now largely abandoned), government agencies make videos (the DOE deep Web search engine), national clubs form (Deep Web Club), and competitors push products and technologies citing our findings and insights (e.g., Grokker from Groxis, or Connotate), all without attribution or mention of BrightPlanet.

Imitation may be the sincerest form of flattery, but enforcement of intellectual property rights depends on the vigilance of the owner.  We understand this, though small company size often makes infringement difficult to discover and police.  Indeed, in initially naming the “deep Web” we wanted it to become the term of art, and by not keeping it proprietary, it largely has.  We have thus welcomed the growth of the concept.  However, we do not welcome blatant infringement on intellectual property and technology by competitors, and we particularly expect VC-backed companies to adhere to ethical standards.  I admonish Glenbrook Networks and its financial backers to provide attribution where attribution is due.  This degree of misappropriation is too great.  Shame, shame ….

BTW, for the record, you can see the most recent update of my and BrightPlanet’s deep Web paper and analysis in the University of Michigan’s Journal of Electronic Publishing, July 2001.  Of course, the definitive information on this topic remains, in spades, at BrightPlanet’s Web site.

Defense via Electrons

Actually, the real sadness here is that perhaps “truth” is only as good as what has been posted last.  Post it last, say it loudest, and the whole world only knows what it sees.  The Internet certainly poses challenges to past institutions, such as peer review and professional publishing, that helped reinforce standards of truth, verification and defensibility.  What standards will emerge on the Web to help affirm authoritativeness?

Certainly, one hopes that the community itself, which has constantly shown it can do so, will find and expose lies, deceit, fraud, and other crimes of the information commons.  This appears to work well in the political arena and perhaps works okay in the academic arena, but how well does it work in the general arena of ideas and intellectual property?  Unfortunately, as this example perhaps shows, not so great at times.  What I fear is that defense can occur only through how many electrons we shower onto the Internet, how broadly we broadcast them, and how frequently we do so.  May the electrons be with you ….

Posted by AI3's author, Mike Bergman Posted on August 22, 2005 at 4:39 pm in Adaptive Information, Deep Web, Software and Venture Capital | Comments (0)