Posted:November 23, 2005

More Semantic Web in the Enterprise

In earlier posts I have put forward a vision for the semantic Web in the enterprise that has an extensible database supporting semi-structured data at its core with XML mediating multiple ingest feeds, interaction with analytic tools, and sending results to visualization and reporting tools.

This is well and good as far as it goes. However, inevitably, whenever more than one tool or semi-structured dataset is added to a system, it brings with it a different “view” of the world. Formalized and standardized protocols and languages are needed to both: 1) capture these disparate “views” and 2) provide facilities to map them to resolve data and schema federation heterogeneities. These are the roles of RDF and OWL.

Fortunately, there is a very active community with tools and insights for working in RDF and OWL. Stanford and UMBC are perhaps the two leading centers of academic excellence.

If you are not generally familiar with this stuff, I recommend you begin with the recent “Order from Chaos” from Natalya Noy of the Protégé group at Stanford Medical. This piece describes issues like trust, etc., that are likely not as relevant to application of the semantic Web to enterprise intranets as they are to the cowboy nature of the broader Internet. However, much else of this article is of general use to the architect considering enterprise applications.

To keep things simple and to promote interoperability, a critical aspect of any enterprise semantic Web implementation will be providing the “data API” (including extensible XML, and RDF and OWL) standards that govern the rules of how to play in the sandbox. Time spent defining these rules of engagement will pay off in spades in relation to any other approach for multiple ingest, multiple analytic tools and multiple audiences, reports and collaboration.

Another advantage of this approach is the existence of many open source tools for managing such schema (e.g., Protégé) and visualization (literally dozens), among thousands of ontologies and other intellectual property.

Posted:November 15, 2005

Why Are $800 Billion in Document Assets Wasted Annually? IV. The Problem is Too Close for Focus

Today, the value of the information contained within documents created each year in the United States represents about a third of total gross domestic product, or an amount of about $3.3 trillion.[1] Moreover, about $800 billion of these expenditures are wasted and are readily recoverable by businesses, but are not. Up to 80% of all corporate information is contained within documents. Perhaps up to 35% of all company employees in the U.S. can be classified as knowledge workers using and relying on documents. So, given these factors, how could such large potential cost savings from better document use be overlooked?

Previous installments in this series have looked at private v. public information, barriers to collaboration, and the expense of solutions as possible reasons why these potential savings are not realized. This fourth installment looks at a fourth reason; namely, what might be called issues of attention, perception or psychology. Interesting observations in this area come from disciplines as diverse as sales, behavioral psychology, economics and operations research.

The SPIN Rationale

One explanation for this lack of attention can be described by the fact that document problems are still in the area of implicit needs as opposed to explicit needs. In other words, the perception of the problem is still situational but has not yet become concrete in terms of bottom-line impacts.

In Neil Rackham’s SPIN sales terminology (Situation → Problems → Implications → Needs/pay-off),[2] the enterprise document market is still at a “situational” level of understanding. Decisions to buy or implement solutions are largely strategic and limited to early adopters that are the visionaries in their market segments. The inability to express and quantify the implications of not realizing the value of document assets means that ROI analysis cannot justify a deployment and market growth cannot cross the chasm.

The situation begins with the inability to quantify the importance of both internal and external document assets to all aspects of the enterprise’s bottom line. Early adopters of enterprise content software typically capture less than 1% of valuable internal documents available; large enterprises are witnessing the proliferation of internal and external Web sites, sometimes exceeding thousands; use of external content is presently limited to Internet search engines, producing non-persistent results and no capture of the investment in discovery or results; and “deep” content in searchable databases, which is common to large organizations and represents 90% of external Internet content, is completely untapped. Indeed, the issue of poor document use in an organization can be seen in terms of the figure below:

The diagram indicates that these root conditions or situations cause problems in low quality of decisions or low staff productivity. For example, documents or proposals get duplicated without knowledge of prior effort that could be leveraged; opportunities are missed; or outdated or incomplete information is applied to various tasks. These root problems can impact virtually all aspects of the organization’s operations: sales are lost; competitors are overlooked; compliance requirements are missed. These problems can lead to significant bottom-line implications, from revenue and market share to reputation, valuation and even survival.

Thus, in the view of the SPIN model, the lack of attention to the issue of document assets can, in part, be ascribed to the sales or investigatory process. Specific questions have not been posed that move the decision maker from a position of situational awareness to one of explicit bottom-line implications.

There is undoubtedly truth to this observation. Sales of large document solutions to enterprises require a consultative sales approach and significant education of the market. As a first-order circumstance, this implies long sales lead times and the dreaded “educating the market” that most VCs try to avoid.

But there are even larger factors at play than a lack of explicitness regarding document assets.

The Ubiquitous and Obvious Are Often Overlooked

Put your index finger one inch from your nose. That is how close — and unfocused — document importance is to an organization. Documents are the salient reality of a knowledge economy, but like your finger, documents are often too close, ubiquitous and commonplace to appreciate.

The dismissal of the ubiquitous, common or obvious can be seen in a number of areas. In terms of R&D and science, this issue has been termed “mundane science” wherein most academic research topics exclude many of the issues that affect the largest number of people or have the most commonality.[3] In organizational and systems research, such issues have also been the focus of better, more rigorous problem identification and analysis techniques such as the “rational model” or the “theory of constraints” (TOC).[4]

Compounding the issue of the overlooked obvious is the lack of a quantified understanding of the problem. There is an old Chinese saying that roughly translated is “what cannot be measured, cannot be improved.” Many corporate executives surely believe this to be the case for document creation and productivity.

More Specifically: Bounded Awareness

Chugh and Bazerman have recently coined a term “bounded awareness” for the phenomenon of missing easily observed and relevant data.[5] As they explain:

“Bounded awareness is a phenomenon that encompasses a variety of psychological processes, all of which lead to the same error: a failure to see, seek, use, or share important and relevant information that is easily seen, sought, used, or shared.”

The authors note the experiments from Simons[6] that extend Neisser’s 1979 video in which a person in a gorilla costume walks through a basketball game, thumping his chest, and is clearly and comically visible for more than five seconds, but is not generally recalled by observers without prompting.

Chugh and Bazerman classify a number of these phenomena, with two most applicable to the document assets problem:

Inattentional blindness — the failure to notice directly visible information when attention is drawn or focused elsewhere
System neglect — this phenomenon is the tendency to undervalue a broader, pivotal factor relative to subsidiary ones, as in, for example, the effect of campaign finance reform on specific political issues. In the document assets case, the general role of document access and management is neglected as a system in favor of more readily understood specific issues such as search or spell checking. In other words, people tend to value issues that are more clearly seen as end states or outcomes.

Note the relation of these studies by behavioral psychologists to the SPIN terminology of the sales executive. Clearly, perceptual studies by scientists will lead to better understandings of market outreach.

Perceptions of Intractability?

An earlier installment in this series noted the high cost of enterprise content solutions, more generally linked to software that performed poorly and did not scale. In computer science, intractable problems are those that take too long to execute, that may not be computable at all, or that we may not know how to solve (e.g., problems in artificial intelligence). Tractable problems can run in a reasonable amount of time for even very large amounts of input data. Intractable problems require huge amounts of time for even modest input sizes.[7]

At low scales, the efficiency of various computer algorithms is not terribly important because multiple methods can produce acceptable performance times. But at large scales, whether a problem is tractable or not is not fixed: it depends critically on the efficiency of the algorithm applied to the problem. Let’s take for example the issue of searching text items:

Take n to represent the number of keys in a list, and let O represent the order of the number of comparison operations required to find an entry. For a small number of n items, the algorithm used is unimportant, and even a slow sequential search will work well. Sequentially searching the list until the desired match is found is O(n), or linear time. If there are 1000 items in a list, and there is an equal probability of searching for any item in the list, on average it will require about n/2 = 500 comparisons to find the item (assuming all items already are on the list). A binary search works on a sorted list by dividing the list in half after each comparison. This is logarithmic time, O(log n), much faster than linear time. For a 1000 item example it works out to about 10 comparisons. An O(1) operation, such as hashing, is applicable when some algorithm computes the item location and then retrieves it. On large lists it will significantly outperform a binary search, because it makes no comparisons. (It is a little more complicated than that because there may be collisions for the same address computed for different keys.) However, if the location is already known, even the hashing computation is unnecessary. This is what happens with direct addressing (the technique used by BrightPlanet), which will obtain the desired item in a single step.[8]
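To make the comparison counts above concrete, here is a minimal Python sketch. It is an illustration only, not BrightPlanet's implementation: the function names, the 1000-key list of integers, and the use of a Python dict to stand in for hashing and a plain list index to stand in for direct addressing are all assumptions made for the example.

```python
def sequential_search(keys, target):
    """Scan the list front to back; return (index, comparisons made)."""
    comparisons = 0
    for i, key in enumerate(keys):
        comparisons += 1
        if key == target:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_keys, target):
    """Halve the candidate range each step; return (index, probes made).

    Requires the list to be sorted.
    """
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


# Hypothetical data: 1000 sorted integer keys, each equally likely to be sought.
keys = list(range(1000))

# Average cost of a sequential search over every possible target.
avg_sequential = sum(sequential_search(keys, k)[1] for k in keys) / len(keys)

# Worst-case cost of a binary search over every possible target.
worst_binary = max(binary_search(keys, k)[1] for k in keys)

# Hashing: the dict computes a location from the key, no list scan needed.
hash_index = {key: position for position, key in enumerate(keys)}

# Direct addressing: the location is already known, so retrieval is one step.
known_position = 737
item = keys[known_position]

print(avg_sequential, worst_binary, hash_index[737], item)
```

For this 1000-key list the sequential average comes out at roughly n/2 and the binary search never needs more than about ten probes, which matches the figures quoted in the text.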

Poorly performing algorithms at large scales can require processing times for updates that take longer than the period between updates, and, thus, at least for that algorithm, are intractable at those scales.
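The point can be sketched with back-of-the-envelope arithmetic. Everything numeric below is a hypothetical assumption chosen for illustration (a daily update cycle, ten million comparisons per second, one lookup per item per refresh); none of it comes from a measured system.

```python
import math

# Assumed figures, for illustration only.
UPDATE_PERIOD_SECONDS = 24 * 60 * 60     # index is expected to refresh daily
COMPARISONS_PER_SECOND = 10_000_000      # hypothetical machine throughput


def linear_cost(n):
    """Average comparisons per lookup for a sequential search."""
    return n / 2


def logarithmic_cost(n):
    """Approximate comparisons per lookup for a binary search."""
    return math.log2(n)


def refresh_seconds(n_items, cost_per_lookup):
    """Time to perform one lookup for each of n_items during a refresh."""
    return n_items * cost_per_lookup(n_items) / COMPARISONS_PER_SECOND


def is_tractable(n_items, cost_per_lookup):
    """A refresh is workable only if it finishes before the next one is due."""
    return refresh_seconds(n_items, cost_per_lookup) < UPDATE_PERIOD_SECONDS


for n in (10_000, 10_000_000):
    print(n, is_tractable(n, linear_cost), is_tractable(n, logarithmic_cost))
```

Under these assumptions the linear approach is fine at ten thousand items but cannot finish a daily refresh at ten million, while the logarithmic approach finishes comfortably at both scales.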

This is one of the key perceived problems with most document processing software at large scales — their computational inefficiencies do not allow updates to occur for the meaningful document volumes important to larger organizations. Whether or not the specific reasons are known by company managers and IT personnel, it is a widespread understanding — correct for most vendors — within the marketplace.

Since BrightPlanet’s core index work engine is more efficient than other approaches (due, in part, to better sorting mechanisms as noted above, but also due to other factors), current perceived limits of intractability may not apply. However, these advances are still not generally known. Until a broader understanding of more contemporary approaches to document use and management is gained, perceptions of past poor performance will limit market acceptance.

Educating the Market

Thus, factors of awareness, attention and perception are also limiting the embrace of meaningful approaches to improve document access and use and achieve meaningful cost savings. These challenges may mean that the document intelligence and document information automation markets still fall within the category of needing to “educate the market.” Since this category is generally dreaded by most venture capitalists (VCs), that perception is also acting to limit the achievable improvements and cost savings available to this market.

But there is perhaps a very important broader question that remains open here: educating the market through the individual customer (viz. the SPIN sale) vs. educating the market through breaking market-wide bounded awareness. In fact the latter, much as what occurred with data warehousing 15-20 years ago, can create entirely new markets. This latter category should perhaps be of much greater VC interest with its accompanying potential for first-mover advantage.

[1] Michael K. Bergman, “Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents,” BrightPlanet Corporation White Paper, July 2005, 42 pp. All 80 references, 150 citations and calculations are fully documented in the full paper. See http://www.brightplanet.com/technology/whitepapers.asp.

[2] Neil Rackham, SPIN Selling, McGraw Hill, 197 pp., 1988.

[3] Daniel M. Kammen and Michael R. Dove, “The Virtues of Mundane Science,” Environment, Vol. 39 No. 6, July/August 1997. See http://ist-socrates.berkeley.edu/~rael/Mundane_Science.pdf

[4] Victoria Mabin, “Goldratt’s ‘Theory of Constraints’ Thinking Processes: A Systems Methodology linking Soft with Hard,” The 17th International Conference of The System Dynamics Society and the 5th Australian & New Zealand Systems Conference, July 20 – 23, 1999, Wellington, New Zealand, 12 pp. See http://www.systemdynamics.org/conf1999/PAPERS/PARA104.PDF

[5] Dolly Chugh and Max Bazerman, “Bounded Awareness: What You Fail to See Can Hurt You,” Harvard Business School Working Paper #05-037, 35 pp., August 25, 2005 revision. See http://www.people.hbs.edu/mbazerman/Papers/05-037.pdf

[6] See the various demos available at http://viscog.beckman.uiuc.edu/djs_lab/demos.html.

[7] Professor Constance Royden, College of the Holy Cross, course outline for CSCI 150, Tractable and Intractable Problems, Spring 2003. See http://mathcs.holycross.edu/~croyden/csci150spr03/notes/lec33_tractable.html

[8] R. L. Kruse, Data Structures and Program Design, Prentice Hall Press, Englewood Cliffs, New Jersey, 1987.

NOTE: This posting is part of a series looking at why document assets are so poorly utilized within enterprises. The magnitude of this problem was first documented in a BrightPlanet white paper by the author titled, Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents. An open question in that paper was why more than $800 billion per year in the U.S. alone is wasted and available for improvements, but enterprise expenditures to address this problem remain comparatively small and with flat growth in comparison to the rate of document production. This series is investigating the various technology, people, and process reasons for the lack of attention to this problem.

Posted:November 13, 2005

The Myth of Superman

Venture capitalists, when the straw gets short or the proverbial hits the fan, are famous for calling for new managerial blood. After all, we did our due diligence on this company, it is not profitable — perhaps even bleeding excessively — so what went wrong?

Actually, to be fair, perhaps the founding entrepreneurs are having the same thoughts. We wrote the business plan, we beat the odds to even get angel and (“Isn’t that special,” says the Church Lady) VC financing, thus we have had affirmation about our markets, technology, team and other aspects from the “smart” money, so why is it not working? Why aren’t we profitable? What went wrong?

Getting external financing from professional VCs is non-trivial and itself puts a company in the “less-than-0.1% club.” And, of course, getting any financing is hard to do, be it from an angel, your own checking account, your spouse or your friends and family. Forsaking Janie’s college education for a chance on a start-up requires tremendous belief and suspension of disbelief for any early investor.

But, the initial financing hurdle has been met. Some time has passed. Neither profits nor the plan are fulfilling themselves. What do we — obviously the smart ones since we put up the money or had the ideas — do about our belief while return is not being fulfilled?

Time for Superman?

In nearly two decades of mentoring various ventures I’ve observed one possible reaction is to look for Superman. If only the company had the right missing individual in a CEO or senior manager position, then many of the current problems would go away. But as my Mom used to say, nothing is easy. Easy answers can lead to uneasy situations. And, I think, the myth of Superman more often than not fits into such a facile error.

When things go wrong (or, at least, are not going as desired), things are tough for all of those with a stake in success. Is the source of discomfort that money was put up and is now at risk of loss? Is it that individuals were supported but are not yet achieving success? Is it ego that due diligence was made but success is looking tenuous? And, if things are going wrong or progress is disappointing, what is the root cause? Is the market needful or ready? Is the technology or product responsive or ready? Is the business model correct? Are other pieces such as partners, advisors, infrastructure, collateral, or whatever in place?

New people do not need to be hired to pose these questions nor to spend purposeful and thoughtful time addressing them. And, even if new people and skills are deemed critical to supplement the skills presently available, expectations that are set too high or too superhuman are unlikely to be fulfilled, will take too long to meet if they are achievable at all, and will cost too much in focus and precious resources.

The Kryptonite

In fact, pursuing the myth of Superman can actually worsen a current situation for the following reasons:

Supermen Are Rare — there are thousands of new startups formed each year, hundreds of which receive significant venture funding from VCs, angels, or small business R&D efforts or grants. Only a very small percentage achieve high returns and only a small percentage of those can be ascribed to the “superstar” performance of a specific individual. Sure, names are known and the business and trade press love to lionize these individuals. But the statistical occurrence of a clearly superior manager or executive is measured in the tenths of a single percent or less
Supermen Are Not Infallible — even that small minority of individuals that do receive recognition as “superstars” may have achieved that lofty status as much due to luck or circumstance. Serially successful entrepreneurs are rarer still than one-off “superstars.” And, for those few individuals that have shown repeated success, they are more often interested in pursuing their own loves and interests and are not for hire for someone else’s venture
Supermen Are Not Obvious — perhaps because of serendipity and some of the reasons above, “superstars” also defy characterization by sex, background, age, appearance, personality, education or other discernible metric. So, if a Superman is not reliably a Superman in his next engagement, and if there is no way to reliably identify Supermen-in-waiting, then why is so much time spent on finding the unfindable?
Supermen Are Expensive — both in terms of equity and compensation, any individual brought in as a savior will cost the startup plenty. Resources are always most precious and constrained for startups. Perhaps, if the identification of the “superstar” could be reliably assured, then this expense could be justified. But since that reliability is not there, the hiring may only drain limited cash and resources and create resistance by the key founders who don’t receive the Superman rents
They Can Screw Up Dynamics — by the time the Superman option is considered the company has likely already achieved some success, visibility and funding. Founders and key employees, not to mention early financial backers, have worked hard to bring things to their current point. Raising the Superman spectre not only affects the morale of existing players and sends a negative message but, if an individual is then subsequently hired, existing dynamics can be challenged and irreparably harmed. Of course, outside money that controls such decisions may have reached the conclusion that dynamics were already broken and needed fixing, but the likelihood of a new player augmenting and bolstering existing positive interactions is less than the opposite prospect
Finding Superman Diverts Attention — a Superman initiative poses a huge opportunity cost to the limited bandwidth of existing executive and director attention within a startup. Defining the qualifications, collecting the names, conducting the recruiting, interviewing the prospects, and then deciding to offer and negotiating the compensation package are extremely time consuming activities. All time spent on this stuff is time not spent on building the company, its products and pursuing sales
In Fact, Superman May Not Exist — this is actually the most interesting observation. It is seductive, and a statistical error, to look at the instance of a managerial or entrepreneurial success and conclude it is repeatable. After all, haven’t some individuals beat the track, the stock market, or the start-up venture odds? Let me first say there are perhaps the spectacular individuals — say a Warren Buffett — who consistently outperform the normal. But Howard Hughes did the same and still ended up with a Spruce Goose that barely flew and fingernails inches long, and there are compellingly few billionaires for the millions of existing businesses. At the statistically low numbers here, we can safely say that for practical purposes Superman does not exist.

Change the Perspective, Change the Mindset

Raising the Superman option only occurs when a company is in trouble and needs help. The key individuals associated with a startup — Board and management alike — are better advised to concentrate on business model, strategy, execution and maintaining focus than searching for the impossible or (at least) statistically highly unlikely.

When problems arise, look to problem identification and problem-solving approaches before copping out with easy Superman answers.

Efforts should be focused; business models should be clear; execution should be emphasized; resources should be zealously protected and stewarded; questions should be constantly asked; and team efforts and building should be fostered. Patience is not a four-letter word, especially if progress is steady and being accomplished in a cost-effective manner.

Nurture and training of initial founders and staff is important. Financing would not have been initially achieved without some belief in these individuals. Falling short of plan at this stage is, in fact, an expected outcome, not one warranting excoriation.

These positive mindsets are hard to keep when the venture’s performance or sales are not meeting plan. And, of course, some of these instances will warrant abandonment of the venture rather than throwing good money after bad. There are no guarantees. And mistakes get made.

But make the choice. Commit to the venture and improving its prospects through hard work and engagement, or walk away. Superman is a false middle ground.

Don’t Get Me Wrong

Please, don’t get me wrong. Without a doubt some people are better managers, some are better salespeople, some are better intellects, some are better strategists, some are better marketers and some are better networkers than others. Anyone who is superior, committed and a believer in the cause of your venture will likely bring some value. And there are indeed rare individuals and rare circumstances when hiring the right new executive could and should make all of the difference toward success.

The more important point, however, is that startups are more often than not constrained in their team and resources. Be smart about where to spend limited time and focus. Hiring good and even great people is a good focus. Searching for Superman is not. Rather than the impossible combination in a single person, look to a collective team that embodies the needed and valuable traits deemed important for your venture’s success.

Posted:November 6, 2005

Semantic Web on the Desktop

Nova Spivack has announced the imminent release of a semantic Web tool for the desktop, Open Iris:

Following in the footsteps of Douglas Engelbart’s pioneering work, SRI has announced the upcoming open-source (LGPL) release of Open IRIS — an experimental Semantic Web personal information manager that runs on the desktop. IRIS was developed for the DARPA CALO project and makes use of code libraries and ontology components developed at SRI, and my own startup, Radar Networks, as well as other participating research organizations.

IRIS is designed to help users make better sense of their information. It can run on its own, or can be connected to the CALO system which provides advanced machine learning capabilities to it. I am very proud to see IRIS go open source — I think it has potential to become a major platform for learning applications on the desktop.

IRIS is still in its early stages of evolution, and much work will be done this year to add further functionality, improve the GUI and make IRIS even more user-friendly. But already it is perhaps the most sophisticated and comprehensive semantic desktop PIM ever created. If you would like to read more about IRIS, this paper provides a good overview.

Posted:November 4, 2005

The Sick Search Market: Autonomy Buys Verity

For some years now the enterprise search market has been sick and in search of relief. In a frankly shocking development, Autonomy announced today it was acquiring the veritable Verity search company for $500 million. See this Bloomberg story ….

This acquisition amount is itself a signal of how poorly this market has been going. Enterprise search has averaged about $500 million annually in revenues over the past few years with little or slow growth. Autonomy has been the small cousin from England, but has now gobbled up the old dinosaur in Verity.

BTW, if someone mentions synergies or complementarity to you about this acquisition, don’t believe it. This is totally an indication of how poor the entire enterprise search market has become. With unstructured data representing 60-80% of all data available to an enterprise, these valuations are indicative of how piss-poor the enterprise document technology is at present.

Bon voyage! I expect all of these dinosaurs to find their final resting place in the sun. RIP

Author: Mike Bergman