BrightPlanet has announced a major upgrade to its Deep Query Manager knowledge worker document platform. According to its press release, the new version achieves extreme scalability and broad internationalization and file format support, among other enhancements. The DQM has added the ability to harvest and process up to 140 different foreign languages in more than 370 file formats plus new content export and system administration features. The company also claims the new distributed architecture allows scalability into hundreds or thousands of users across multiple machines with the ability to handle incremental growth and expansions.
According to the company:
The Deep Query Manager is a content discovery, harvesting, management and analysis platform used by knowledge workers to collaborate across the enterprise. It can access any document content — inside or outside the enterprise — with strengths in deep content harvesting from more than 70,000 unique searchable databases and automated techniques for the analyst to add new ones at will. The DQM’s differencing engine supports monitoring and tracking, among the product’s other powerful project management, data mining, reporting and analysis capabilities.
On Aug. 22 BEA announced it was acquiring Plumtree Software for $200 million. By any stretch, this is a fire sale price given Plumtree’s 10 years of operating history, 700 customers (including some large notables such as the US Army, US Navy, Airbus, Ford, Procter & Gamble, Swiss Re), and 21 million reported users. In addition, Plumtree has fairly significant intellectual property with .NET and J2EE implementations. It also has $70 million in cash, lowering the acquisition cost still further.
According to IDC, Plumtree was #5 as a general portal vendor, behind IBM and BEA itself, among others. Nonetheless, this acquisition appears to be the end of the independent general portal vendor. Corechange (acquired by Open Text for $4.2 million in 2003) and Epicentric (acquired by Vignette for $32 million in 2002) were the previous two independent portal vendors.
Is it open source? Is it the failure of the general portal model? Is it ongoing consolidation? Is it another example of BEA stumbling its way? Is it all of these?
Only time will tell. My own suspicion, however, is that the document challenge remains sufficiently broad and interconnected that the general portal is merely a gluing framework, and not the most important piece at that.
Last week I came across a reference from Search Engine Watch (to which I have subscribed for many years, and at whose conferences I have spoken) that TOTALLY FRIED me. It’s related to a topic near and dear to me, because I am both its father and its steward: the general topic of the “deep Web.” I began a public response to last week’s posting, but then, after cooling down, simply notified the author, Gary Price, of my attribution concerns. He subsequently and graciously amended his posting with appropriate attribution. Thanks, Gary, for proper and ethical behavior!
With some of the issues handled privately, I decided that discretion was the better part of valor and I would let the topic alone with respect to some of the other parties in the chain of lack of attribution. After all, Gary was merely reporting information from a reporter. The genesis of the issues resided elsewhere.
Then, today, I saw the issue perpetuated still further by the VC backer of Glenbrook Networks, piling onto the previous egregious oversights. I could sit still no longer.
First, let me say, I am not going to get into the question of “invisible Web” versus “deep Web” (the latter being the term which Thane Paulsen and I coined nearly 5 years ago to reflect dynamic content not accessible via standard search engine crawlers). Deep Web has become the term of art, much like Kleenex, and if you know what the term means then the topic of this post needs no further intro.
However, I’m going to make a few points below about the misappropriation of the term ‘deep Web’ and the technology around it. I believe that some may legitimately say, “Tough luck; it is your responsibility to monitor such things, and if they did not credit or acknowledge your rights, that is your own damn fault.” Actually, I will generally agree with this sentiment.
My real point in this posting, therefore, is not my term versus your term, but the integrity of intellectual property, attribution and “truth” in the dynamic Internet. If I step back from my own circumstance and disappointment, the real implication, I believe, is that future historians will be terribly hard-pressed to discern past truths from Internet content. If we think it is difficult to extract traceable DNA from King Tut today, it will be close to impossible to discern the true genesis, progression, linkages and idea flow based on Internet digital information into the future. But I digress …
Last Week’s Posting
The genesis of this issue began with a posting on Silicon Beat by Matt Marshall, Diving deep into the web: Glenbrook Networks. Marshall is a reporter for the San Jose Mercury News. Much was made of the “deep Web” phenomenon and the fact that Glenbrook Networks now had technology to tap into it. This story was then picked up by the Search Engine Watch blog. SEW is one of the best and most authoritative sources for search engine related information on the Web. The blog author was Gary Price. The SEW blog entry cited two references on deep Web topics, both of which referenced my seminal paper as their own first references. Neither of these press articles mentioned BrightPlanet. I notified Gary Price of what I thought was an oversight of attribution, and he properly and graciously added an addendum to the original piece:
Using this press, Jeff Clavier, one of the VCs backing the vendor, Glenbrook Networks, began flogging the press coverage on his own blog site. There were assertions made in that original piece that deserved countering, but there have been vendors that have come and gone in the past (see below) that have attempted to misappropriate this “space” and its technology and have generally fallen by the wayside or gone out of business. I chose to let the matter go quiet publicly, ground some more enamel off my teeth, and referred the matter to our general counsel for private action.
Today’s Posting
The flogging continued today under a new posting on Jeff Clavier’s site, Glenbrook Networks: Trawling the Deep Web. This new posting extended the misappropriation further and, since it is part of an ongoing series obviously planned to push the investment, goaded me to finally make a public response. Here, in part, is what that new post said:
Because the Deep Web contains a lot of factual information, it can be seen metaphorically as an ocean with a lot of fish. That is why we call the system that navigates the Deep Web a trawler.
Note that the figures used come directly out of our research, and are frequently used by others without attribution, as is the case here. However, the trawler imagery is especially egregious, since it is a direct rip-off of our original papers! In fact, here are the two trawler images from our original Deep Web: Surfacing Hidden Value, first published in 2000, the first representing surface content retrieval:

The next image represents deep Web content retrieval:

The post then goes on to overview some “technology” with fancy names that is very straightforward, has been documented extensively before by BrightPlanet, and is covered by existing patents to our company.
Misappropriation is Nothing New
Such misappropriations have happened before. In one instance, a competitor (now out of business) plagiarized complete portions of BrightPlanet’s white paper on its home page. We have also had competitors name themselves after the deep Web (e.g., Deep Web Technologies), appropriate the name and grab Web addresses (Quigo, with http://www.deepweb.com, now largely abandoned), government agencies make videos (the Deep Web DOE deep Web search engine), national clubs form (Deep Web Club), and competitors push products and technologies citing our findings and insights (e.g., Grokker from Groxis or Connotate), in all instances without attribution or mention of BrightPlanet.
Imitation is the sincerest form of flattery and enforcement of intellectual property rights depends on the vigilance of the owner. We understand this, though small company size often means it is difficult to discover and police. Indeed, in the initial naming of the “deep Web” we wanted it to become the term of art. By not keeping it proprietary, it largely has. We have thus welcomed the growth of the concept. However, we do not welcome the blatant infringement on intellectual property and technology by competitors. We particularly expect VC-backed companies to adhere to ethical standards. I admonish Glenbrook Networks and its financial backers to provide attribution where attribution is due. This degree of misappropriation is too great. Shame, shame ….
BTW, for the record, you can see the most recent update of my and BrightPlanet’s deep Web paper and analysis at the University of Michigan’s Journal of Electronic Publishing, July 2001. Of course, there remains the definitive information on this topic in spades at BrightPlanet’s Web site.
Defense via Electrons
Actually, the real sadness here is that perhaps what is “truth” is only as good as what has been posted last. Post it last, say it loudest, and the whole world only knows what it sees. The Internet certainly poses challenges to past institutions such as peer review or professional publishing that helped to reinforce standards of truth, verification and defensibility. What standards will emerge on the Web to help affirm authoritativeness?
Certainly, one hopes that the community itself, which has shown constantly it can do so, will find and expose lies, deceit, fraud, or other crimes of the information commons. This appears to work well in the political arena, perhaps is working okay in the academic arena, but how well is it working in the general arena of ideas and intellectual property? Unfortunately, as perhaps this example shows, maybe not so great at times. The thing that I fear is that defense can only occur by how many electrons we shower onto the Internet, how broadly we broadcast them, and how frequently we do so. May the electrons be with you ….
Reports began surfacing in recent months about rekindled interest by venture capital (VC) firms in open source software companies. The first wave of VC interest, in 1999-2000 or so, resulted in $714 million in venture funding.[1] Most of these open source companies were based in one manner or another around the Linux operating system. Of the reported 71 open source companies that received VC financing at that time, most failed ($150 M in VC financing alone for Linuxcare and TurboLinux), though Red Hat, among some other notables, succeeded quite well.
Matt Asay, the organizer of the Open Source Business Conference (OSBC), among other open source advocacies, was the first to note the renewed interest by VCs in next-generation open source companies. In April of this year, he provided a rough tally showing that about $150 million in VC funding had come into open source companies in 2004. This story was picked up by Gary Rivlin of the New York Times in late April. Using estimates from the VentureOne database, Gary estimated 20 open source companies had received $149 million in VC funding in 2004. This Monday, Aug. 8, the Wall Street Journal updated a retrieval from the Dow Jones VentureOne database suggesting $290 million was invested by VCs in new open source start-ups in 2004.[2]
New and more mature business models, plus the growing acceptance of open source and the need for related services by business, as others and I have documented elsewhere, are fueling this rekindled interest. In fact, this new interest began around 2003, though it is accelerating today.
With Matt Asay’s assistance, I have assembled a listing of about 45 firms that have received more than $425 million in VC financing over the past 18-24 months. The trigger point or date appears to be the last financing round into MySQL of $20 million in 2003. Some of these firms, such as JasperSoft, are already in their third round (Series C) of financing.
The table below lists these firms and financing received since 2003. The companies were broadly clustered as either professional services firms (installation, training, support, services, custom programming, or commercial software add-ons), subscription (hosted applications usually provided under per user fees), or dual license where there is a mix of open source and commercial licenses.
| Subscription | $ (M) | Professional | $ (M) | Dual License | $ (M) |
|---|---|---|---|---|---|
| JotSpot | $5.2 | 5Bridge | $2.7 | Active Endpoints | $2.0 |
| OpenLogic | $4.0 | Aduva | $7.8 | ActiveGrid | $13.0 |
| Simula Labs | $12.5 | Black Duck | $5.0 | Akibia | $8.0 |
| SpikeSource | $15.0 | Cymphonix | $4.0 | Astaro | $12.9 |
| | | Emic Networks | $10.0 | Coridan | $2.5 |
| | | Groundwork IT | $11.5 | db4objects | $1.5 |
| | | JBoss | $10.0 | Forum Systems | $30.5 |
| | | Medsphere | $10.0 | Funambol | $5.0 |
| | | Optaros | $7.0 | Gluecode | $5.0 |
| | | Palamida | $5.0 | Green Plum | $20.0 |
| | | Ping Identity | $13.3 | Jabber | $7.2 |
| | | pingtel | $10.0 | JasperSoft | $23.3 |
| | | Rally Software | $4.5 | Klocwork | $24.0 |
| | | Realm Systems | $8.5 | Laszlo Systems | $18.3 |
| | | Social Text | $0.5 | LignUp | $5.9 |
| | | SourceLabs | $3.5 | MySQL | $19.5 |
| | | Transitive | $24.5 | Scalix | $19.2 |
| | | Univa | $1.0 | Six Apart | $13.0 |
| | | Xen Source | $6.0 | SugarCRM | $7.8 |
| | | Zend Technologies | $6.0 | | |
| SUB-CATEGORY | $36.7 | | $150.8 | | $238.5 |
| TOTAL | | | | | $426.0 |
This information likely has omissions and other errors. Data has been collected from standard venture databases, plus news releases and open reporting. Corrections and updates are welcomed. Though Matt’s assistance is greatly appreciated, any errors are my own.
Dual licensing opportunities have received the largest share of the funding, though recent trends have tended to support the professional services and subscription models. Very few of the most recent wave of financings are a straight Linux play, and then mostly only for large clustered applications. Services around certification and interoperability have been especially attractive to the VC community.
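For readers who want to check the category shares, here is a minimal Python sketch that starts from the three sub-category figures reported in the table. The variable names are mine and purely illustrative; the only data used are the subtotals themselves, so any error in the underlying table carries through.

```python
# Sub-category totals in $ millions, copied from the SUB-CATEGORY row of the table.
subtotals = {
    "Subscription": 36.7,
    "Professional": 150.8,
    "Dual license": 238.5,
}

# Round to one decimal place to avoid floating-point noise in the sum.
total = round(sum(subtotals.values()), 1)

# Share of total funding per business model, as a percentage.
shares = {name: round(100 * amount / total, 1) for name, amount in subtotals.items()}

# The business model with the largest reported funding.
largest = max(subtotals, key=subtotals.get)

for name, pct in shares.items():
    print(f"{name}: ${subtotals[name]}M ({pct}% of ${total}M)")
print(f"Largest category: {largest}")
```

By this tally, dual license accounts for a bit more than half of the reported funding, professional services for roughly a third, and subscription for the remainder.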
Though there is always a high failure rate for VC-backed software companies, the more mature and sophisticated business models surrounding the new crop of open source start-ups suggest some cause for optimism. Clearly, both the market and the vendor community are beginning to discover new roles and new needs surrounding open source use in the enterprise. Open-source based companies appear to be moving into the mainstream from the standpoint of venture capitalists.
[2] Robert A. Guth and Don Clark, “Linux Feels Growing Pains as Users Demand More Features,” Wall Street Journal, p. B1, August 8, 2005.