Posted: June 6, 2006

Semantic mediation — that is, resolving semantic heterogeneities — must address more than 40 discrete categories of potential mismatch, ranging across units of measure, terminology, language and many others. These mismatches may derive from structure, domain, data or language.

Earlier postings in this series traced the progress in climbing the data federation pyramid to today’s emphasis on the semantic Web. This series is partly aimed at disabusing the notion that data extensibility can arise simply by using the XML (eXtensible Markup Language) data representation protocol. As Stonebraker and Hellerstein correctly observe:

XML is sometimes marketed as the solution to the semantic heterogeneity problem . . . . Nothing could be further from the truth. Just because two people tag a data element as a salary does not mean that the two data elements are comparable. One could be salary after taxes in French francs including a lunch allowance, while the other could be salary before taxes in US dollars. Furthermore, if you call them “rubber gloves” and I call them “latex hand protectors”, then XML will be useless in deciding that they are the same concept. Hence, the role of XML will be limited to providing the vocabulary in which common schemas can be constructed.[1]
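To make the quoted point concrete, here is a minimal Python sketch (my own illustration, not from the cited chapter; the element names, attribute values and figures are invented). A purely XML-level comparison sees two matching salary tags, yet the semantics attached to each value differ:

```python
# Two records both tag a value as <salary>, but the values are not comparable
# without agreement on currency, tax basis and what the figure includes.
import xml.etree.ElementTree as ET

record_a = ET.fromstring(
    '<employee><salary currency="FRF" basis="net" includes="lunch_allowance">152000</salary></employee>'
)
record_b = ET.fromstring(
    '<employee><salary currency="USD" basis="gross">52000</salary></employee>'
)

salary_a = record_a.find("salary")
salary_b = record_b.find("salary")

print(salary_a.tag == salary_b.tag)        # True -- the tags "agree"
print(salary_a.attrib == salary_b.attrib)  # False -- the meanings do not
```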

This series also covers the ontologies and the OWL language (written in XML) that now give us the means to understand and process these different domains and “world views” by machine. According to Natalya Noy, one of the principal researchers behind the Protégé development environment for ontologies and knowledge-based systems:

How are ontologies and the Semantic Web different from other forms of structured and semi-structured data, from database schemas to XML? Perhaps one of the main differences lies in their explicit formalization. If we make more of our assumptions explicit and able to be processed by machines, automatically or semi-automatically integrating the data will be easier. Here is another way to look at this: ontology languages have formal semantics, which makes building software agents that process them much easier, in the sense that their behavior is much more predictable (assuming they follow the specified explicit semantics–but at least there is something to follow). [2]

Again, however, simply because OWL (or similar) languages now give us the means to represent an ontology, we still have the vexing challenge of how to resolve the differences between different “world views,” even within the same domain. According to Alon Halevy:

When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies–or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas. Without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel. [3]

In the sections below, I describe the sources from which this heterogeneity arises and classify its many different types. I then describe some broad approaches to overcoming these heterogeneities, though a subsequent post looks at that topic in more detail.

Causes and Sources of Semantic Heterogeneity

There are many potential circumstances where semantic heterogeneity may arise (partially from Halevy [3]):

  • Enterprise information integration
  • Querying and indexing the deep Web (which is a classic data federation problem in that there are literally tens to hundreds of thousands of separate Web databases) [4]
  • Merchant catalog mapping
  • Schema v. data heterogeneity
  • Schema heterogeneity and semi-structured data.

Naturally, there will always be differences in how differing authors or sponsors create their own particular “world view,” which, if transmitted in XML or expressed through an ontology language such as OWL, may also result in differences based on expression or syntax. Indeed, the ease of conveying these schemas as semi-structured XML, RDF or OWL is itself a source of potential expression heterogeneities. There are also other sources, in simple schema use and versioning, that can create mismatches [3]. Thus, semantic mismatches can be driven by world view, perspective, syntax, structure, versioning and timing:

  • One schema may express a similar “world view” with different syntax, grammar or structure
  • One schema may be a new version of the other
  • Two or more schemas may be evolutions of the same original schema
  • There may be many sources modeling the same aspects of the underlying domain (“horizontal resolution” such as for competing trade associations or standards bodies), or
  • There may be many sources that cover different domains but overlap at the seams (“vertical resolution” such as between pharmaceuticals and basic medicine).

Regardless, the needs for semantic mediation are manifest, as are the ways in which semantic heterogeneities may arise.

Classification of Semantic Heterogeneities

The earliest classification scheme applied to data semantics that I am aware of is from William Kent nearly 20 years ago.[5] (If you know of earlier ones, please send me a note.) Kent’s approach dealt more with structural mapping issues (see below) than with differences in meaning, for which he pointed to data dictionaries as a potential solution.

The most comprehensive schema I have yet encountered is from Pluempitiwiriyawej and Hammer, “A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources.” [6] They classify heterogeneities into three broad classes:

  • Structural conflicts arise when the schemas of the sources representing related or overlapping data exhibit discrepancies. Structural conflicts can be detected when comparing the underlying DTDs. The class of structural conflicts includes generalization conflicts, aggregation conflicts, internal path discrepancies, missing items, element ordering, constraint and type mismatches, and naming conflicts between element types and attribute names.
  • Domain conflicts arise when the semantics of the data sources to be integrated exhibit discrepancies. Domain conflicts can be detected by looking at the information contained in the DTDs and using knowledge about the underlying data domains. The class of domain conflicts includes schematic discrepancies, scale or unit conflicts, precision conflicts, and data representation conflicts.
  • Data conflicts refer to discrepancies among similar or related data values across multiple sources. Data conflicts can only be detected by comparing the underlying documents. The class of data conflicts includes ID-value conflicts, missing data, incorrect spelling, and naming conflicts between element contents and attribute values.

Moreover, mismatches or conflicts can occur between set elements (a “population” mismatch) or attributes (a “description” mismatch); a toy sketch below illustrates how a couple of these conflict types might surface.
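To ground the classification, here is a toy Python sketch (my own illustration; the element definitions and unit labels are invented) showing how a structural naming conflict and a domain scale-or-units conflict might be flagged when comparing element definitions from two sources:

```python
# Compare two sources' element definitions and flag two kinds of conflict:
# a structural naming conflict (case sensitivity) and a domain units conflict.
schema_a = {"PartNo": {"type": "string"}, "weight": {"type": "float", "unit": "kg"}}
schema_b = {"partno": {"type": "string"}, "Weight": {"type": "float", "unit": "lb"}}

def compare(a, b):
    findings = []
    for name_a, spec_a in a.items():
        for name_b, spec_b in b.items():
            if name_a.lower() != name_b.lower():
                continue  # crudely treat case-insensitive matches as "the same" element
            if name_a != name_b:
                findings.append(("structural: naming (case sensitivity)", name_a, name_b))
            if spec_a.get("unit") and spec_b.get("unit") and spec_a["unit"] != spec_b["unit"]:
                findings.append(("domain: scale or units", spec_a["unit"], spec_b["unit"]))
    return findings

print(compare(schema_a, schema_b))
```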

The table below builds on Pluempitiwiriyawej and Hammer’s schema by adding a fourth major explicit category, language, leading to about 40 distinct potential sources of semantic heterogeneities:

| Class | Category | Subcategory |
|---|---|---|
| STRUCTURAL | Naming | Case Sensitivity |
| | | Synonyms |
| | | Acronyms |
| | | Homonyms |
| | Generalization / Specialization | |
| | Aggregation | Intra-aggregation |
| | | Inter-aggregation |
| | Internal Path Discrepancy | |
| | Missing Item | Content Discrepancy |
| | | Attribute List Discrepancy |
| | | Missing Attribute |
| | | Missing Content |
| | Element Ordering | |
| | Constraint Mismatch | |
| | Type Mismatch | |
| DOMAIN | Schematic Discrepancy | Element-value to Element-label Mapping |
| | | Attribute-value to Element-label Mapping |
| | | Element-value to Attribute-label Mapping |
| | | Attribute-value to Attribute-label Mapping |
| | Scale or Units | |
| | Precision | |
| | Data Representation | Primitive Data Type |
| | | Data Format |
| DATA | Naming | Case Sensitivity |
| | | Synonyms |
| | | Acronyms |
| | | Homonyms |
| | ID Mismatch or Missing ID | |
| | Missing Data | |
| | Incorrect Spelling | |
| LANGUAGE | Encoding | Ingest Encoding Mismatch |
| | | Ingest Encoding Lacking |
| | | Query Encoding Mismatch |
| | | Query Encoding Lacking |
| | Languages | Script Mismatches |
| | | Parsing / Morphological Analysis Errors (many) |
| | | Syntactical Errors (many) |
| | | Semantic Errors (many) |

Most of these line items are self-explanatory, but a few may not be:

  • Homonyms occur when the same name refers to more than one concept, such as Name referring to a person v. Name referring to a book
  • A generalization/specialization mismatch can occur when single items in one schema are related to multiple items in another schema, or vice versa. For example, one schema may refer to “phone” but the other schema has multiple elements such as “home phone,” “work phone” and “cell phone”
  • Intra-aggregation mismatches arise when the same population is divided differently by schema (Census v. Federal regions for states, or full person names v. first-middle-last, for example), whereas inter-aggregation mismatches can arise from sums or counts used as added values
  • Internal path discrepancies can arise from different source-target retrieval paths in two different schemas (for example, hierarchical structures where the elements are different levels of remove)
  • The four sub-types of schematic discrepancy refer to where attribute and element names may be interchanged between schemas
  • Under language, encoding mismatches can occur when either the import or export of data to XML assumes the wrong encoding type. While XML is based on Unicode, it is important that source retrievals and issued queries use the proper encoding of the source. For Web retrievals this is very important, because only about 4% of all documents are in Unicode, and BrightPlanet has earlier estimated that there may be on the order of 25,000 language-encoding pairs presently on the Internet (see the sketch after this list)
  • Even when the correct encoding is detected, different language sources differ significantly in parsing (white space, for example), syntax and semantics, which can also lead to many error types.
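As a concrete illustration of the encoding line items above, here is a minimal Python sketch (my own; the sample text and the candidate encoding list are invented for the example) of an ingest encoding mismatch and a crude fallback strategy:

```python
# The same bytes read under the wrong assumed encoding either fail outright or
# silently corrupt characters, so ingest must detect (or be told) the source encoding.
source_text = "Müller señaló: résumé"          # what the source actually says
raw_bytes = source_text.encode("iso-8859-1")   # the source was published in Latin-1

try:
    decoded = raw_bytes.decode("utf-8")        # naive ingest assumes UTF-8
except UnicodeDecodeError:
    decoded = None
print(decoded)                                 # None -- at least the mismatch was detectable

# A crude fallback: try a prioritized list of candidate encodings.
for candidate in ("utf-8", "iso-8859-1", "cp1252"):
    try:
        print(candidate, "->", raw_bytes.decode(candidate))
        break
    except UnicodeDecodeError:
        continue
```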

It should be noted that Sheth et al. take a different approach to classifying semantics and integration.[7] They split semantics into three forms: implicit, formal and powerful. Implicit semantics are those that are largely present already or can easily be extracted; formal semantics, though relatively scarce, occur in the form of ontologies or other descriptive logics; and powerful (soft) semantics are fuzzy and not limited to rigid set-based assignments. Sheth et al.’s main point is that first-order logic (FOL) or descriptive logic alone is inadequate to properly capture the needed semantics.

From my viewpoint, Pluempitiwiriyawej and Hammer’s [6] classification better lends itself to pragmatic tools and approaches, though the Sheth et al. approach also helps indicate what can be processed in situ from input data v. what must be inferred or matched probabilistically.

Importance of Reference Standards

An attractive and compelling vision — perhaps even a likely one — is that standard reference ontologies will become increasingly prevalent as semantic mediation comes to be seen as a mainstream problem. Certainly, a start has been made with the use of the Dublin Core metadata initiative, and increasingly other associations, organizations, and major buyers are busy developing “standardized” or reference ontologies.[8] Indeed, there are now more than 10,000 ontologies available on the Web.[9] Insofar as these gain acceptance, semantic mediation can become an effort mostly at the periphery and not the core.

But such is not the case today. Standards have had only limited success, and then only in targeted domains where incentives are strong. That acceptance and benefit threshold has yet to be reached on the Web. Until such time, a multiplicity of automated methods, semi-automated methods and gazetteers will all be required to help resolve these potential heterogeneities.


[1] Michael Stonebraker and Joey Hellerstein, “What Goes Around Comes Around,” in Joseph M. Hellerstein and Michael Stonebraker, editors, Readings in Database Systems, Fourth Edition, pp. 2-41, The MIT Press, Cambridge, MA, 2005. See http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf.

[2] Natalya Noy, “Order from Chaos,” ACM Queue vol. 3, no. 8, October 2005. See http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=341&page=1

[3] Alon Halevy, “Why Your Data Won’t Mix,” ACM Queue vol. 3, no. 8, October 2005. See http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=336.

[4] Michael K. Bergman, “The Deep Web: Surfacing Hidden Value,” BrightPlanet Corporation White Paper, June 2000. The most recent version of the study was published by the University of Michigan’s Journal of Electronic Publishing in July 2001. See http://www.press.umich.edu/jep/07-01/bergman.html.

[5] William Kent, “The Many Forms of a Single Fact”, Proceedings of the IEEE COMPCON, Feb. 27-Mar. 3, 1989, San Francisco. Also HPL-SAL-88-8, Hewlett-Packard Laboratories, Oct. 21, 1988. [13 pp]. See http://www.bkent.net/Doc/manyform.htm.

[6] Charnyote Pluempitiwiriyawej and Joachim Hammer, “A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources,” Technical Report TR00-004, University of Florida, Gainesville, FL, 36 pp., September 2000. See ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf.

[7] Amit Sheth, Cartic Ramakrishnan and Christopher Thomas, “Semantics for the Semantic Web: The Implicit, the Formal and the Powerful,” in Int’l Journal on Semantic Web & Information Systems, 1(1), 1-18, Jan-March 2005. See http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html

[8] See, among scores of possible examples, the NIEM (National Information Exchange Model) agreed to between the US Departments of Justice and Homeland Security; see http://www.niem.gov/.

[9] OWL Ontologies: When Machine Readable is Not Good Enough

Posted: June 5, 2006

The previous posting in this series described climbing the data federation pyramid and the progress made in the last decade in overcoming seemingly intractable problems involving hardware, software and networks. Key enablers of that progress were the adoption of Internet protocols and formats (TCP/IP, HTML) and the adoption of the XML data representation standard.

Data Federation Wanes, Semantic Web Waxes

Through the late 1990s the focus of the data federation challenge matured toward overcoming differences in meaning. Today, we know this challenge as the semantic Web or semantic mediation. The intellectual trigger for this shift in emphasis came from Tim Berners-Lee, James Hendler, and Ora Lassila when they described their “grand vision” for the semantic Web in a Scientific American article in 2001. [1] The authors described the semantic Web as follows:

To date, the World Wide Web has developed most rapidly as a medium of documents for people rather than of information that can be manipulated automatically. By augmenting Web pages with data targeted at computers and by adding documents solely for computers, we will transform the Web into the Semantic Web. Computers will find the meaning of semantic data by following hyperlinks to definitions of key terms and rules for reasoning about them logically. The resulting infrastructure will spur the development of automated Web services such as highly functional agents. Ordinary users will compose Semantic Web pages and add new definitions and rules using off-the-shelf software that will assist with semantic markup.

Berners-Lee had already been proselytizing on this topic for a few years, notably in his Weaving the Web book in 1999.[2] But the Scientific American article really popularized the topic.

Researchers in the field came to rely on a diagram that Berners-Lee had also developed to explain the various protocols and challenges underlying semantic Web technologies. This diagram, often affectionately called the “birthday cake,” has gone through many iterations. Here is one of the most widely reproduced versions from a Berners-Lee talk given in 2000: [3]

The Berners-Lee Semantic Web ‘Birthday Cake’

The Layers

Note that this diagram expands on the four top layers of data representation, semantics, pragmatics and trust from the pyramid graphic in my previous climbing the data federation pyramid post. (Also, note that Ian Horrocks et al. have updated this “stack” and looked at it from the basis of current standards, including OWL and inclusion of encryption.[4])

The foundation of the “stack” is Unicode, an industry standard for digital representation of human languages, symbols and scripts, and URIs (uniform resource identifiers), which, like URLs, provide a unique and unambiguous basis for locating resources.

The next layer, as in the data federation pyramid, is XML.

The basic enabler for semantic representation comes from the next layer, RDF + Schema. RDF (Resource Description Framework) is a first-order description logic “triple” representation of subject – predicate – object. The subjects and objects are nouns or “things,” with the subject needing to be described via a URI (optional for the object). The predicate is a verb that describes the relationship between subject and object, and is often expressed in syntax such as isPartOf or hasSex or hasBirthplace. In terms of graph theory, RDF is a directed graph where the subjects and objects are nodes and the predicates are edges. RDF Schema extends the RDF “triple” by adding semantics that relate domains, relationships, subclasses and subproperties. RDF Schema provides very wide interoperability, but it is minimalist and unable to capture a complete semantic logic.
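To make the triple model concrete, here is a small, library-free Python sketch (my own illustration; the URIs and predicate names such as hasBirthplace are invented) that stores statements as (subject, predicate, object) tuples and queries them as a directed graph:

```python
# Each statement is a (subject, predicate, object) tuple; subjects and predicates
# are URIs, while objects may be URIs or literals.
EX = "http://example.org/"

triples = {
    (EX + "MarkTwain", EX + "hasBirthplace", EX + "Florida_Missouri"),
    (EX + "MarkTwain", EX + "isAuthorOf",    EX + "TomSawyer"),
    (EX + "TomSawyer", EX + "hasTitle",      "The Adventures of Tom Sawyer"),  # literal object
}

def objects(subject, predicate):
    """Return all objects linked to a subject by a predicate (a basic graph lookup)."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

print(objects(EX + "MarkTwain", EX + "isAuthorOf"))
```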

The ontology layer provides more “meta” information, such as transitive, unique, unambiguous, cardinality or other properties. Based on RDF, ontology languages provide a means for conveying domain representations or “world views” electronically for machine processing. Today, the standard is OWL (Web Ontology Language), which grew out of the earlier OIL (EU) and DAML (US) incipient standards. (However, any internally consistent syntax and language for descriptive logic can also qualify as an ontology layer.) OWL itself comes in three levels — or sub-languages — of increasing expressiveness. OWL Lite supports classification hierarchies and simple constraints (for example, cardinality values of only 0 or 1). OWL DL is a computationally complete description logic (all statements can be computed and will finish in finite time). OWL Full provides the syntactic freedom of RDF with no computational guarantees. OWL Full may be necessary for a complete representation of an ontological domain, even though it cannot be guaranteed to be internally consistent. Each of these sublanguages is an extension of its simpler predecessor.
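As a rough illustration of what this extra “meta” information buys, here is a toy Python sketch (my own, not an OWL reasoner; the isPartOf facts are invented) of the inference licensed by declaring a property transitive:

```python
# If isPartOf is declared transitive, a reasoner may keep adding implied triples
# until no new ones follow (a transitive closure).
facts = {
    ("Bordeaux", "isPartOf", "France"),
    ("France",   "isPartOf", "EuropeanUnion"),
}

def transitive_closure(triples, predicate):
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(inferred):
            for (c, p2, d) in list(inferred):
                if p1 == p2 == predicate and b == c and (a, predicate, d) not in inferred:
                    inferred.add((a, predicate, d))
                    changed = True
    return inferred

print(transitive_closure(facts, "isPartOf") - facts)
# {('Bordeaux', 'isPartOf', 'EuropeanUnion')}
```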

Of course, the real rub arises when different world views need to be reconciled, or what is known as semantic mediation. In this instance, it is now necessary to invoke reconciliation logic. (Is my “glad” your “happy”? Are my countries expressed as two-letter acronyms and yours spelled out in French, and do yours include native lands in addition to nation-states?)

(In fact, the next posting in this series actually details about 40 different sources of semantic heterogeneity.)

So, even if multiple domain specifications are provided via OWL, federating them requires mediating these heterogeneities, and that requires some form of logic or rule-based expert system. Thus, in terms of standards, we have achieved the representational ways to express semantics, but the logics and rules for resolving them are open and not likely subject to standards. (Indeed, most view the semantic mediation step at best as lending itself to semi-automatic methods.)
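To give a flavor of that reconciliation logic, here is a highly simplified Python sketch (my own; the country-code and synonym lookup tables are invented stand-ins for the gazetteers or mapping rules a real mediator would maintain) that normalizes one source’s vocabulary into another’s:

```python
# Rule-based normalization of a source record into a target vocabulary:
# two-letter country codes become French country names, and "glad" maps to "happy".
COUNTRY_CODE_TO_FR = {"DE": "Allemagne", "US": "États-Unis", "ES": "Espagne"}
SENTIMENT_SYNONYMS = {"glad": "happy", "pleased": "happy"}

def mediate(record):
    out = dict(record)
    if out.get("country") in COUNTRY_CODE_TO_FR:
        out["country"] = COUNTRY_CODE_TO_FR[out["country"]]
    if "mood" in out:
        out["mood"] = SENTIMENT_SYNONYMS.get(out["mood"], out["mood"])
    return out

print(mediate({"country": "DE", "mood": "glad"}))
# {'country': 'Allemagne', 'mood': 'happy'}
```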

Finally, the ‘birthday cake’ shows that even with logics in place to resolve or mediate heterogeneities, the vexing challenge of what information to trust remains, the resolution of which is perhaps aided with digital signatures or certificates.

NOTE: This posting is part of an occasional series looking at a new category that I and BrightPlanet are terming the eXtensible Semantic Data Model (XSDM). Topics in this series cover all information related to extensible data models and engines applicable to documents, metadata, attributes, semi-structured data, or the processing, storing and indexing of XML, RDF, OWL, or SKOS data. A major white paper will be produced at the conclusion of the series.

[1] Tim Berners-Lee, James Hendler, and Ora Lassila, “The Semantic Web,” in Scientific American 284(5): pp 34-43, 2001. See http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2.

[2] Tim Berners-Lee and Mark Fischetti, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor, Harper, San Francisco, 226 pp., 1999.

[3] Tim Berners-Lee, “Semantic Web on XML,” at XML 2000, December 6, Washington, DC. See http://www.w3.org/2000/Talks/1206-xml2k-tbl

[4] Ian Horrocks, Bijan Parsia, Peter Patel-Schneider, and James Hendler, “Semantic Web Architecture: Stack or Two Towers?,” in Francois Fages and Sylvain Soliman, editors, Principles and Practice of Semantic Web Reasoning (PPSWR 2005), No. 3703 in LNCS, pp 37-41, 2005. See http://www.cs.man.ac.uk/~horrocks/Publications/download/2005/HPPH05.pdf.

Posted: June 2, 2006

 

NOTE: This is an update of a 2005 post.

There has been a massive — but little noticed — shift in enterprise software expenditures and software company revenues in the past decade. A "typical" enterprise software vendor could expect to obtain 70% or more of its total revenues from software license fees a decade ago. Today, that percentage is about 35%, with statistically significant trends heading toward below 10% within the decade. These trends have significant implications for the business models necessary for software companies to be successful.

The Trends and Data

The figure below provides software license revenues as a percent of total revenues for about 120 different software companies over the past decade. No matter the sample, there has been a steady — and significantly strong — trend toward declining license revenues.

Software Licensing Trends

The three sources for this figure are:

  • Search – my own values for Autonomy and Convera from SEC filings
  • Top 100 – these are listings compiled by Culpepper & Associates [2]
  • MIT – these values are from the MIT Sloan School of Management, using eight leading companies as referenced by Michael Cusumano [3]

The trend lines indicate continued percentage declines in the importance of software licensing. Based on these ten-year trends, by 2008 conventional software licenses will account for less than 10% of total revenues for all software companies, and less than 20% for the leading enterprise search vendors (Verity [now part of Autonomy], Autonomy, Convera). These trends have very high R2 values. Seven-fold or greater drops from a position of dominance suggest a sea change is taking place in the revenue mix for software companies and the expenditure mix for enterprises.

These trends can vary significantly by software company, as the comparison table below, which I constructed from recent SEC filings, shows:


| Company | License Revenue % |
|---|---|
| Red Hat | 0.0% |
| salesforce.com | 0.0% |
| i2 | 15.0% |
| Compuware | 23.5% |
| PeopleSoft | 23.7% |
| IBM | 24.6% |
| SAP | 31.4% |
| Oracle | 34.9% |
| INDUSTRY AVERAGE | 35.4% |
| Siebel | 36.4% |
| Business Objects | 51.1% |
| Microsoft | 76.5% |
| Adobe | 90.0% |

These values are derived from the most recent SEC filings (10-Ks or 10-Qs). The table shows that companies that can truly "package" shrink-wrapped software maintain the highest percentages of software license revenues; vendors that rely on the subscription model have the lowest percentages, often going to zero. Large, traditional software vendors such as IBM or Oracle are below the industry average for the percentage of revenue derived from software licenses. This trend is remarkable given that these larger vendors obtained 70-80% or more of their total software revenues from license fees a mere decade ago.

The abiding trend appears to be the shift from software to services, but the picture is considerably more complicated than that.

Other Software Licensing Studies

At least two comprehensive studies have been issued in the past year or so regarding software licensing trends. The first, from IDC, involved Delphi interviews of 100 large customers and 100 major software vendors and was conducted with the support of 11 major vendors and the Software and Information Industry Association (SIIA). [1] This study sees subscription licenses taking on increasing importance in vendor revenues.

This study shows that companies today budget 20% for maintenance contracts and 32% for new licenses. IDC projects that maintenance expenditures are likely to increase and license expenditures to decline. With an increased reliance on a subscription model, maintenance in fact increases as a source of revenue to the vendor. IDC projects 34% of revenue to come from subscriptions by 2008. The worldwide software market was about $200 billion in 2003 and will continue to grow, but with a changing mix of revenue sources. Besides subscription models, maintenance fees and consulting and service fees are projected to increase while standard license fees decrease. Vendor drivers for these trends include the long lead times of traditional enterprise software license sales and the need for more predictable revenue streams. Customer drivers are demands for lower overall costs, a better alignment of value, and the requirement for smaller upfront costs.

The second study, from Macrovision, used a questionnaire directed to a larger group of software executives and a similar number of customers. [4] This study, too, was conducted in association with the SIIA. Completed in late 2004, it also sees subscription licenses increasing: vendors reported a trend toward subscription licensing growing to 67% of license revenues, though customers exhibited more reluctance to embrace the subscription model.

Both studies showed maintenance fees to average 20-22% of initial software license fees.

What Changing Business Models are Emerging?

While the trend away from standard software licenses is clear, what that means in terms of winning next-generation business models is less clear. The software industry thus appears to be in flux, with a period of experimentation with alternative business models prevalent. It may be a year or three before it is clear which of these alternatives will emerge as the winning business model.

So, what are these alternatives?

  • Services – many large traditional vendors, including Novell, IBM, HP and Oracle, have seen massive percentage shifts from software licenses to consulting and services revenues.  This trend is linked both with related open source trends and the increasing need to engineer and deploy interoperable systems from multiple software vendors
  • Open Source – after steady trends to Linux in the late 1990’s and dominance of open source for Internet servers, most recently there has been an increase in open source applications and interoperable systems.  The general importance of open source trends is documented in many of the current and pending AI3 blog posts
  • Outsourcing – the outsourcing of many traditional IT and backoffice functions is a well-documented phenomenon, and
  • Subscription – though the earlier buzz for application service providers (ASP) has waned in the past two years, a similar model has emerged under the subscription or Web services monikers.  As noted above, there may be a doubling in importance of this revenue model in relation to traditional software licensing within this decade but customer enthusiasm is questionable.

The heyday for complete, turnkey enterprise software systems and the highwater mark for enterprise software budgets appear to have passed. Both customers and vendors are trying to bring more rationality and predictability into the IT software cost equation. The specific mix and nature of these changing models is still unclear.

Some Venture Implications

The major casualty from these trends is the idea of the enterprise "killer app" and its ability to become a virtual money printing press. The dominance of this myth can cause some significant mis-steps and misunderstandings in putting together a successful venture:

  • Waiting to get packaging and configuration right delays time to market and incurs higher development costs in the absence of supporting revenues. There is a need to get customer exposure and input earlier with less developed solutions.  Shattering the myth of the packaged software printing press for money is important to change attitudes and immediate priorities
  • VC support may be deferable with lower needs for upfront development dollars, and, in any case, venture support should shift from packaged "products" to interoperable and modular technologies 
  • As Eric von Hippel points out in his recent book, Democratizing Innovation [5], early and constant involvement of the customers and the market are keys to innovation and suggest business models that are more experimental and open source, and
  • It appears the days — at least for the foreseeable future — of the "killer app" are over in the enterprise setting. Companies and enterprises are demanding more accountability and justification for expenditures; software vendors are realizing that at the enterprise level "cookie cutter" approaches work relatively infrequently.

It is seductive to think that with the right packaging, the right interface, and the right combination of features and functionality, it then becomes possible to turn the crank on the money printing press. After development and packaging are complete, after all, the cost of the next incremental unit for shipment is close to nil. But enterprises rarely can adopt commodity approaches to unique situations and problems. Customization is the rule and the environment is never the same.

Understanding these secular trends is important for software entrepreneurs and the angels and VCs that may back them.  The common theme returns: choice of business model in response to market conditions is likely more important than technology or innovation.


[1] A.M. Konary, S. Graham, and L.A. Seymour, The Future of Software Licensing: Software Licensing Under Siege, IDC White Paper, International Data Corporation, March 2004, 21 pp. See http://www.idc.com/groups/software_licensing/downloads/4046_rev6_idc_site.pdf (requires registration).

[2] Culpepper & Associates, "Software Revenues Continue to Shift from Licenses to Services," September 10, 2002. See http://www.culpepper.com/eBulletin/2002/SeptermberRatiosArticle.asp

[3] M. Cusumano, "Business Models that Last: Balancing Products and Services in Software and Other Industries," MIT Sloan School of Management Working Paper 197, December 2003, 22 pp. See http://ebusiness.mit.edu/research/papers/197_Cusumano_ProdSrvcsBusMod.pdf

[4] Macrovision Corporation, Key Trends in Software Pricing and Licensing, White Paper for various clients, October 2004, 12 pp.  See http://www.siia.net/software/pubs/SW_Pricing_Licensing_Report.pdf.

[5] E. von Hippel, Democratizing Innovation, MIT Press, Cambridge, MA, 2005, 220 pp.  Electronic version available via Creative Commons license, see http://web.mit.edu/evhippel/www/democ.htm.

Posted: May 31, 2006

An incredibly fascinating visualization tool by Sala on the Aharef blog is called the Website as a Graph. This posting links to the actual entry site, where you can enter a Web address and the system provides a visual analysis of that individual Web page (not an overall view of the site). The color coding applied is as follows (a rough sketch of this tag bucketing appears after the list):

blue: for links (the A tag)
red: for tables (TABLE, TR and TD tags)
green: for the DIV tag
violet: for images (the IMG tag)
yellow: for forms (FORM, INPUT, TEXTAREA, SELECT and OPTION tags)
orange: for linebreaks and blockquotes (BR, P, and BLOCKQUOTE tags)
black: the HTML tag, the root node
gray: all other tags
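For the curious, here is a rough Python sketch (my own, not the applet's actual code; the sample HTML is invented) of the tag bucketing that the color legend above describes:

```python
# Walk a page's start tags and count them by the color buckets listed above.
from collections import Counter
from html.parser import HTMLParser

BUCKETS = {
    "blue (links)":    {"a"},
    "red (tables)":    {"table", "tr", "td"},
    "green (div)":     {"div"},
    "violet (images)": {"img"},
    "yellow (forms)":  {"form", "input", "textarea", "select", "option"},
    "orange (breaks)": {"br", "p", "blockquote"},
}

class TagCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        bucket = next((name for name, tags in BUCKETS.items() if tag in tags), "gray (other)")
        self.counts[bucket] += 1

counter = TagCounter()
counter.feed("<html><body><div><a href='#'>x</a><img src='y.png'></div></body></html>")
print(counter.counts)
```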

Here is the figure that is created based on my blog site:

Here is the figure from the BrightPlanet Web site:

Here is the figure from a new BrightPlanet Web site design, not yet publicly released:

Here is the figure from the CompletePlanet Web site. Note it uses the Deep Query Manager Publisher module:

Here is the figure from BrightPlanet’s Web and graphics design firm, Paulsen Marketing Communications, which is also a founder of the company. PMC uses a Flash design that does not render well with the applet:

And, finally, here is the figure from the QueryHorse equine search portal, built with the DQM Publisher:

These graphs are mostly fun; "websitesasgraphs" is currently the number one tag on the Flickr site, where you can see hundreds of examples. The graphs do indicate whether sites depend on tables or div tags, their use of images, their complexity and the like. But, mostly, they are fun, and perhaps even art.

Posted: May 25, 2006

It is truly amazing — and very commonly overlooked — how much progress has been made in the past decade in overcoming what had then been perceived as close-to-intractable data interoperability and federation issues.

It is easy to forget the recent past.

In various stages and in various ways, computer scientists and forward-looking users have focused on many issues related to how to maximize the resources being poured into computer hardware and software and data collection and analysis over the past two decades. Twenty years ago, some of the buzz words were portability, data warehousing and microcomputers (personal computers). Ten years ago, some of those buzz words were client-server, networking and interoperability. Five years ago, the buzz words had shifted to dot-com and e-commerce and interoperability (now called ‘plug-and-play’). Today, among many, the buzz words could arguably include semantic Web, Web 2.0 and interoperability or mashups.

Of course, the choice of which buzz words to highlight is from the author’s perspective, and other buzz words could be argued as more important. That is not the point. Nor is the point that fads or buzz words come and go.

But changing buzz words and trends can indeed mask real underlying trends and progress. So the real point is this: don’t blink, because some truly amazing progress has been made in overcoming data federation and interoperability barriers in the last 15 to 20 years.

The ‘Data Federation’ Imperative

“Data federation”  — the important recognition that value could be unlocked by connecting information from multiple, separate data stores  — first became a research emphasis within the biology and computer science communities in the 1980s. It also gained visibility as “data warehousing” within enterprises by the early-90s. However, within that period, extreme diversity in physical hardware, operating systems, databases, software and immature networking protocols hampered the sharing of data. It is easy to overlook the massive strides in overcoming these prior obstacles in the past decade.

It is instructive to turn back the clock and think about what issues were preoccupying buyers, users and thinkers in IT twenty years ago. While the PC had come on the scene, with IBM opening the floodgates in 1982, there were mainframes from weird 36-bit Data General systems to DEC PDP minicomputers to the PCs themselves. Even on PCs, there were multiple operating systems, and many then claimed that CP/M was likely to be ascendant, let alone the upstart MS-DOS or the gorilla threat of OS/2 (in development). Hardware differences were all over the map, operating systems were a laundry list two pages long, and nothing worked with anything else. Computing in that era was an Island State.

So, computer scientists or users interested in “data federation” at that time needed to first look to issues at the iron or silicon or OS level. Those problems were pretty daunting, though clever folks behind Ethernet or Novell with PCs were about to show one route around the traffic jam.

Client-server and all of the “N-tier” speak soon followed, and it was sort of an era of progress but still costly and proprietary answers to get things to talk to one another. Yet there was beginning to emerge a rationality, at least at the enterprise level, for how to link resources together from the mainframe to the desktop. Computing in that era was the Nation-state.

But still, it was incredibly difficult to talk with other nations. And that is where the Internet, specifically the Web protocol and the Mozilla (then commercially Netscape) browser came in. Within five years (actually less) from 1994 the Internet took off like a rocket, doubling in size every 3-6 months.

Climbing the ‘Data Federation’ Pyramid

So, the view of the “data federation” challenge, as then articulated in different ways, looked like a huge, imposing pyramid 20 years ago:

Rapid Progress in Climbing the Data Federation Pyramid


Data federation and the resolution of various heterogeneities have many of their intellectual roots in the intersection of biology and computer science. Issues of interoperability and data federation were particularly topical about a decade ago, in papers such as those from Markowitz and Ritter,[1] Benton,[2] and Davidson and Buneman.[3] [4] Interestingly, this very same community was also the most active in positing the importance of (indeed, first defining) “semi-structured” data and innovating various interoperable data transfer protocols, including XML and its various progenitors and siblings.

These issues of data federation and data representation first arose and received serious computer science study in the late 1970s and early 1980s. In the early years of trying to find standards and conventions for representing semi-structured data (though it was not yet called that), the major emphasis was on data transfer protocols. In the financial realm, one standard dating from the late 1970s was electronic data interchange (EDI). In science, there were literally tens of exchange forms proposed with varying degrees of acceptance, notably abstract syntax notation (ASN.1), TeX (a typesetting system created by Donald Knuth) and its variants such as LaTeX, the hierarchical data format (HDF), the common data format (CDF), and the like, as well as commercial formats such as PostScript, PDF (portable document format), and RTF (rich text format).

One of these proposed standards was the “standard generalized markup language” (SGML), first published in 1986. SGML was flexible enough to represent either formatting or data exchange. However, with its flexibility came complexity. Only when two simpler forms arose, namely HTML (HyperText Markup Language) for describing Web pages and (much later) XML (eXtensible Markup Language) for data exchange, did variants of the SGML form emerge as widely used common standards.[5]

The Internet Lops Off the Pyramid

Of course, midway into these data representation efforts came the shift to the Internet Age, blowing away many previous notions and limits. The Internet, with its TCP/IP protocols, and the XML standard for “semi-structured” data transfer and representation, in particular, have been major contributors to overcoming the physical, syntactical and data exchange heterogeneities shown in the data federation pyramid above.

The first recorded mentions of “semi-structured data” occurred in two academic papers from Quass et al.[6] and Tresch et al.[7] in 1995. However, the real popularization of the term “semi-structured data” occurred through the seminal 1997 papers from Abiteboul, “Querying semi-structured data,” [8] and Buneman, “Semistructured data.” [9]

One could thus argue that the emergence of the “semi-structured data” construct arose from the confluence of a number of factors:

  • The emergence of the Web
  • The desire for extremely flexible formats for data exchange between disparate databases (and therefore useful for data federation)
  • The usefulness of expressing structured data in a semi-structured way for the purposes of browsing, and
  • The growth of certain scientific databases, especially in biology (esp., ACeDB), where annotations, attribute extensibility resulting from new discoveries, or a broader mix of structural and text data was desired.[9]

Semi-structured data, like all other data structures, needs to be represented, transferred, stored, manipulated and analyzed, all possibly at scale and with efficiency. It is easy to confuse data representation with data use and manipulation. XML provides an excellent starting basis for representing semi-structured data, but XML says little or nothing about these other challenges in semi-structured data use.

Thus, we see in the pyramid figure above that in rapid-fire order the Internet and the Web quickly overcame:

  • Federation challenges in hardware and OSes and network protocols; namely the entire platform and interconnection base to the pyramid and the heretofore daunting limitations to interoperability
  • A data representation protocol — solved via XML — that was originally designed for extensibility but became a ubiquitous standard for data transfer
  • A shift in attention from the physical to the metaphysical.

Shifting from the Structure to the Meaning

With these nasty issues of data representation and interconnection now behind us, it is not surprising that today’s buzz has shifted to things like the semantic Web, interoperability, Web 2.0, “social” computing, and the like.

Thus, today’s challenge is to resolve differences in meaning, or semantics, between disparate data sources. For example, your ‘glad’ may be someone else’s ‘happy’ and you may organize the world into countries while others organize by regions or cultures.

Resolving semantic heterogeneities is also called semantic mediation or data mediation. Though it displays as a small portion of the pyramid above, resolving semantics is a complicated task and may involve structural conflicts (such as naming, generalization, aggregation), domain conflicts (such as schema or units), data conflicts (such as synonyms or missing values) or language differences (human and electronic encodings). Researchers have identified nearly 40 discrete possible types of semantic heterogeneities (this area is discussed in a later post).

Ontologies provide a means to define and describe these different “world views.” Referentially integral languages such as RDF (Resource Description Framework) and its schema implementation (RDF-S) or the Web ontological description language (OWL) are leading standards among other emerging ones for machine-readable means to communicate the semantics of data. These standards are being embraced by various communities of practice; today, for example, there are more than 15,000 OWL ontologies. Life sciences, physics, pharmaceuticals and the intelligence sector are notable leading communities.

The challenge of semantic mediation at scale thus requires recognition and adherence to the emerging RDF-S and OWL standards, plus an underlying data management foundation that can handle the subject-object-predicate triples basis of RDF.

Yet, as the pyramid shows, despite massive progress in scaling it, challenges remain even beyond the daunting ones in semantics. Matching alternative schemas (or ontologies or “world views”) will require much in the way of new rules and software. And, vexingly, at least in the open Internet environment, there will always be the issue of what data you can trust and with what authority.

These are topics for subsequent posts regarding the semantic Web ‘stack’ and challenges in resolving semantic heterogeneities.


[1] V.M. Markowitz and O. Ritter, “Characterizing Heterogeneous Molecular Biology Database Systems,” in Journal of Computational Biology 2(4): 547-546, 1995.

[2] D. Benton, “Integrated Access to Genomic and Other Bioinformation: An Essential Ingredient of the Drug Discovery Process,” in SAR and QSAR in Environmental Research 8: 121-155, 1998.

[3] S.B. Davidson, C. Overton, and P. Buneman, “Challenges in Integrating Biological Data Sources,” in Journal of Computational Biology 2(4): 557-572, 1995.

[4] S.B. Davidson, G.C. Overton, V. Tannen, and L. Wong, “BioKleisli: A Digital Library for Biomedical Researchers,” in International Journal on Digital Libraries 1: 36-53, 1997.

[5] A common distinction is to call HTML “human readable” while XML is “machine readable” data.

[6] D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman and J. Widom, “Querying Semistructured Heterogeneous Information,” presented at Deductive and Object-Oriented Databases (DOOD ’95), LNCS, No. 1013, pp. 319-344, Springer, 1995.

[7] M. Tresch, N. Palmer, and A. Luniewski, “Type Classification of Semi-structured Data,” in Proceedings of the International Conference on Very Large Data Bases (VLDB), 1995.

[8] Serge Abiteboul, “Querying Semi-structured data,” in International Conference on Data Base Theory (ICDT), pp. 1-18, Delphi, Greece, 1997. See http://dbpubs.stanford.edu:8090/pub/1996-19.

[9] Peter Buneman, “Semistructured Data,” in ACM Symposium on Principles of Database Systems (PODS), pp. 117-121, Tucson, Arizona, May 1997. See http://db.cis.upenn.edu/DL/97/Tutorial-Peter/tutorial-semi-pods.ps.gz.
