Posted:February 13, 2006

Conventional service-oriented architectures (SOAs) have been found to have:

  • Slow and inefficient bindings
  • Completely duplicated information processing across requests because of the lack of caching, and
  • Generally slow performance because of RDBMS storage.

These problems are especially acute at scale.

Frank Cohen recently posted a paper on IBM’s developerWorks, "FastSOA: Accelerate SOA with XML, XQuery, and native XML database technology: The role of a mid-tier SOA cache architecture," that presents some interesting alternatives to these problems.

The specific FastSOA proposal may or may not be your preferred solution if you are working with complex SOA environments at scale. But the general overview of conventional SOA constraints (in the SOAP framework) is very helpful and highly recommended.
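For readers who want a feel for the mid-tier caching idea, here is a minimal Python sketch of a service intermediary that caches XML responses keyed on the request message, so that identical requests are not re-processed against the backing service and RDBMS. This is my own illustration of the general pattern, not FastSOA’s design or code:

    import hashlib
    import time

    class MidTierCache:
        """Cache XML response documents keyed on the SOAP request body."""

        def __init__(self, ttl_seconds=300):
            self.ttl = ttl_seconds
            self.store = {}  # request digest -> (expiry time, response XML)

        def _key(self, request_xml):
            # Hash the request text so that identical requests share a cache key
            return hashlib.sha1(request_xml.encode("utf-8")).hexdigest()

        def handle(self, request_xml, backend_call):
            """Return a cached response, or fall through to the slow backend."""
            key = self._key(request_xml)
            hit = self.store.get(key)
            if hit and hit[0] > time.time():
                return hit[1]                         # answered from the mid-tier
            response_xml = backend_call(request_xml)  # slow SOAP/RDBMS path
            self.store[key] = (time.time() + self.ttl, response_xml)
            return response_xml

In FastSOA the cache is an XQuery-addressable native XML store rather than an in-memory dictionary, but the control flow is the same: intercept the request, answer from the mid tier when possible, and fall through to the slower service only on a miss.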

Posted by AI3's author, Mike Bergman Posted on February 13, 2006 at 9:59 am in Information Automation, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/188/fastsoa-speed-up-conventional-soas/
The URI to trackback this post is: https://www.mkbergman.com/188/fastsoa-speed-up-conventional-soas/trackback/
Posted:February 12, 2006

The enterprise Semantic Web, like all Semantic Web instances, by definition depends on semi-structured data. What has generally been lacking in the move toward a semi-structured data paradigm is the creation of adequate processing engines for the efficient and scalable storage and retrieval of semi-structured data.[1]

While tremendous effort has gone into data representations like XML, when it comes to positing or designing engines for manipulating that data the usual approach is to clone kludgy workarounds onto existing relational DBMSs or text search engines. Neither meets the test. Thus, as the semantic Web and its reliance on semi-structured data look forward, two impediments stand like gatekeepers blocking progress: 1) the lack of efficient processing engines, and 2) the lack of scalable systems and architectures.

Unlike the situations for structured and unstructured data, there is no accepted database engine specific to semi-structured data. Structured data is dominated by RDBMSs, and unstructured data is largely the realm of text or search engines. Some systems attempt to apply relational DBMS approaches from the structured end of the spectrum; other systems attempt to add some structure to standard unstructured search engines (see the figure in my related posting).

Attempts to manage the middle ground of semi-structured data have involved either modifying RDBMSs to be XML-enabled, adding some structure to existing IR systems, or developing new, native XML data systems from scratch. The native XML systems are relatively new and unproven. For a listing of native XML databases, plus generally useful discussion about the use of XML within databases, see Ron Bourret’s Web site.[2]

Semi-structured data models are sometimes called “self-describing” (or schema-less). These data models are often represented as labeled graphs, or sometimes labeled trees with the data stored at the leaves. The schema information is contained in the edge labels of the graph. Semi-structured representations also lend themselves well to data exchange or the integration of heterogeneous data sources.
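A toy example makes the “self-describing” point concrete. In the Python sketch below (purely illustrative, with made-up field names), the schema lives entirely in the edge labels of a nested structure and the data sits at the leaves, so records with different shapes can coexist in the same store:

    # The labels carry the schema, the leaves carry the data,
    # and the two records need not agree on their fields.
    documents = [
        {"person": {"name": "Ada Lovelace",
                    "born": 1815,
                    "interests": ["mathematics", "computing"]}},
        {"person": {"name": "Unknown Author",
                    "note": "no birth year recorded"}},
    ]

    def leaves(node, path=()):
        """Walk a labeled tree, yielding (edge-label path, leaf value) pairs."""
        if isinstance(node, dict):
            for label, child in node.items():
                yield from leaves(child, path + (label,))
        elif isinstance(node, list):
            for child in node:
                yield from leaves(child, path)
        else:
            yield path, node

    for doc in documents:
        for path, value in leaves(doc):
            print("/".join(path), "=", value)

A relational engine would want those fields declared up front in a schema; a semi-structured engine has to discover them from the labels at storage or query time.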

However, all three of these approaches to managing semi-structured XML data (XML-enabled RDBMSs, modified IR text engines, or native XML data systems) have their own strengths and weaknesses, as summarized below:

  • XML-enabled RDBMS. Pros: mature and ubiquitous; offered by every major commercial vendor. Cons: fragile with open XML schemas; poor document-centric retrieval; loses ordering and other document information.
  • Modified IR/text engine. Pros: excellent text retrieval, even at large scale. Cons: poorly suited to storing and managing structured data; in-line tagging requires complete re-indexing when new attributes are added.
  • Native XML database. Pros: purpose-built for the XML data model. Cons: relatively new and unproven; slow retrievals across large document repositories.

Because of their prevalence, XML-enabled RDBMSs are perhaps the most common approach, with all of the major commercial vendors, such as Oracle, IBM and Sybase, offering their own versions. But realize that XML is itself text, much of its information requires text-based retrieval, and open XML schemas with their need to preserve document ordering are very poorly suited to the relational data model. As a result, the RDBMS options are fragile, perform poorly for document-centric retrievals, and lose critical information.
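To see why document ordering is such a nuisance for the relational model, consider the simplified Python sketch below (illustrative only; it is not how any particular vendor shreds XML). Each element becomes a row, and an explicit ordinal column must be carried along just to remember the sequence the document itself gave for free; any query that reassembles the document has to sort on it:

    import xml.etree.ElementTree as ET

    doc = """<article>
               <title>Enterprise Semantic Webs</title>
               <para>First paragraph.</para>
               <para>Second paragraph.</para>
             </article>"""

    def shred(xml_text):
        """Flatten an XML document into (id, parent_id, tag, ordinal, text) rows."""
        rows, next_id = [], [0]

        def walk(elem, parent_id, ordinal):
            next_id[0] += 1
            node_id = next_id[0]
            rows.append((node_id, parent_id, elem.tag, ordinal,
                         (elem.text or "").strip()))
            for i, child in enumerate(elem, start=1):
                walk(child, node_id, i)

        walk(ET.fromstring(xml_text), None, 1)
        return rows

    for row in shred(doc):
        print(row)

    # Reassembling the two <para> elements in their original order now requires
    # sorting on the ordinal column; drop it and the document sequence is lost.

Even this trivial document already requires a join and a sort to recover what a native XML or text system stores directly.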

IR-based text search systems handle text retrieval well, even at scale, but are not at all suited to storing and managing structured data. Further, many of these systems use in-line tagging of structural attributes. While this approach parses well and works seamlessly with existing text token indexing, at scale it suffers the fatal flaw of requiring the complete re-indexing of existing content should new attributes or extensions be desired.
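The re-indexing flaw is easy to see in a toy inverted index. In the hypothetical Python sketch below, structural attributes are folded into the token stream as prefixed terms at index time; because the structure is baked into the postings themselves, exposing a new attribute later means every existing document must be run back through the indexer:

    from collections import defaultdict

    def index_documents(docs, attribute_fields):
        """Build an inverted index with structural attributes in-lined as tokens."""
        postings = defaultdict(set)
        for doc_id, doc in docs.items():
            for word in doc["body"].lower().split():
                postings[word].add(doc_id)
            for field in attribute_fields:
                if field in doc:
                    # The structure is encoded in the token itself, e.g. "author:cohen"
                    postings[field + ":" + doc[field].lower()].add(doc_id)
        return postings

    docs = {
        1: {"body": "Accelerating SOA with XQuery", "author": "Cohen"},
        2: {"body": "Native XML databases at scale", "author": "Bergman"},
    }
    index = index_documents(docs, ["author"])
    print(index["author:cohen"])  # {1}

    # Later we decide "pubyear" should also be queryable. There is no separate
    # attribute store to extend, so the only remedy is a complete rebuild:
    docs[1]["pubyear"] = "2006"
    docs[2]["pubyear"] = "2006"
    index = index_documents(docs, ["author", "pubyear"])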

Finally, all native XML data systems perform poorly at scale. Some of these native systems build from a text-search basis, others from more object or relational approaches. But, in general, queries and other mechanisms are still highly XML document-centric, with very slow retrievals across large document repositories.

As XML and semi-structured data have become ubiquitous, clearly the path is opening in the marketplace for a “third way.” Later postings will look at efforts by new vendors such as Mark Logic to address this opportunity, as well as emerging efforts from BrightPlanet.

NOTE: This posting is part of an occasional series looking at a new category that I and BrightPlanet are terming the eXtensible Semantic Data Model (XSDM). Topics in this series cover all information related to extensible data models and engines applicable to documents, metadata, attributes and semi-structured data, as well as the processing, storage, indexing, and semantic schemas and mappings of XML, RDF, OWL, or SKOS data. A major white paper will be produced at the conclusion of the series. Stay tuned!

[1] Matteo Magnani and Danilo Montesi, “A Unified Approach to Structured, Semistructured and Unstructured Data,” Technical Report UBLCS-2004-9, Department of Computer Science, University of Bologna, 29 pp., May 29, 2004. See http://www.cs.unibo.it/pub/TR/UBLCS/2004/2004-09.pdf.

[2] See http://www.rpbourret.com/xml/XMLDatabaseProds.htm
and http://www.rpbourret.com/xml/XMLAndDatabases.htm.

Posted by AI3's author, Mike Bergman Posted on February 12, 2006 at 12:51 pm in Semantic Web | Comments (4)
The URI link reference to this post is: https://www.mkbergman.com/185/enterprise-semantic-webs-esw-demand-new-database-paradigms/
The URI to trackback this post is: https://www.mkbergman.com/185/enterprise-semantic-webs-esw-demand-new-database-paradigms/trackback/

The W3C has just published an update to "A Survey of RDF/Topic Maps Interoperability Proposals." This note, dated February 10, updates the previous version from one year ago.

It is well and good to embrace standards for semantic content such as RDF or OWL, but without standard mechanisms for expressing schemas it is difficult to actually map and resolve semantic heterogeneities. This introductory survey is useful from the standpoint of topic maps.

Posted by AI3's author, Mike Bergman Posted on February 12, 2006 at 11:52 am in Adaptive Information, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/187/rdf-and-topic-maps-interoperability/
The URI to trackback this post is: https://www.mkbergman.com/187/rdf-and-topic-maps-interoperability/trackback/
Posted:February 1, 2006

IBM has announced it has completed the first step of making the Unstructured Information Management Architecture (UIMA) available to the open source community by publishing the UIMA source code to SourceForge.net. UIMA is an open software framework to aid the creation, development and deployment of technologies for unstructured content. IBM first unveiled UIMA in December 2004. The source code for the IBM reference implementation of UIMA is currently available and can be downloaded from http://uima-framework.sourceforge.net/. In addition, the IBM UIMA SDK, with additional facilities and components, can be downloaded for free from http://www.alphaworks.ibm.com/tech/uima.
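The architectural idea behind UIMA is a pipeline of analysis engines that each add stand-off annotations to a shared analysis structure built over the raw text. The Python sketch below is only a conceptual illustration of that pattern; the actual UIMA SDK is Java, and its CAS and annotator APIs are considerably richer than this:

    import re

    class CAS:
        """A bare-bones stand-in for UIMA's common analysis structure (CAS)."""
        def __init__(self, text):
            self.text = text
            self.annotations = []  # (begin, end, type, covered text)

    def token_annotator(cas):
        for m in re.finditer(r"\w+", cas.text):
            cas.annotations.append((m.start(), m.end(), "Token", m.group()))

    def year_annotator(cas):
        # A downstream engine could also consume the Token annotations added above.
        for m in re.finditer(r"\b(19|20)\d{2}\b", cas.text):
            cas.annotations.append((m.start(), m.end(), "Year", m.group()))

    pipeline = [token_annotator, year_annotator]
    cas = CAS("IBM first unveiled UIMA in December 2004.")
    for engine in pipeline:
        engine(cas)
    print([a for a in cas.annotations if a[2] == "Year"])  # one Year annotation: 2004

Each engine only reads the text and writes annotations, which is what lets independently developed components be chained into a single processing pipeline.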

UIMA has received support from the Defense Advanced Research Projects Agency (DARPA) and is currently in use as part of DARPA’s new human language technology research and development program called GALE (Global Autonomous Language Exploitation). UIMA is also embedded in various IBM products for processing unstructured information.

Posted:January 29, 2006

Peter Rip, the managing director of Leapfrog Ventures, has posted a very thoughtful piece in his EarlyStageVC blog on the ROI dilemma facing traditional VCs. It is titled Traditional Venture Capital Sure Seems Broken – It’s About Time.

Fewer VC dollars are going into early-stage technology startups, and those that do come at higher valuations and in larger amounts. Meanwhile, technology has become democratized, with faster times to market and narrower innovation edges. This poses dilemmas both for VCs and for company founders, as Peter accurately notes:

The traditional venture capital model formula in technology was … a form of arbitrage based on two scarcities — risk capital and understanding of technology. Starting in 1995 the scarcity of both these drivers began to disappear….

The traditional venture capital model has been “fund twenty, pray for two.” Since you could only lose 1X your money, you could make it up with a couple of big hits. But big hits are fewer and much farther between than ever before. To make a modest venture return on a $500M fund, you need to generate $1B. Assuming you own an average of 20% of companies at the exit, you need to create $5B of shareholder value. If the average IPO valuation is $216M (per VentureOne, according to a WSJ article last week), that means you need 23 IPOs in a portfolio of 30 companies. The math worked when venture funds were $200M and exits were $500M. It doesn’t work when the numbers are reversed.

Venture capital is a three parameter problem. Buy Low, Sell High, Sell at the Right Time. Most people seem to have ignored the third parameter. Time-to-exit used to be 4-6 years. Then it collapsed to 2 years in the Bubble. Now it seems pretty much infinite. Divide by Zero and get infinite IRR. Divide by infinity and get -100% IRR. The proof is left to the Investor.

I can only see two venture capital strategies that make sense in this environment. One is to be small, focused, and totally aligned with market realities and founder incentives. … There is no magic to this strategy. It just takes discipline….

You would think that cheap, abundant capital was a great boon for entrepreneurs. It isn’t….Lots of cheap capital, available at high valuations seems great, until you do the exit math. Raise $8M at $12M pre-money and your post-money valuation is $20M. Your investors want to sell for $200M. Raise $2M at $4M pre- and your investors get the same rate of return at $60M. But a $60M exit is 10X more likely than $200M. [But] few VCs will write the $2M check these days, precisely because a $20M return doesn’t move the needle in a $500M fund. That’s why valuations are moving up . . . the need to invest more money  . . . not the intrinsic value of startups….

The other strategy is to treat venture capital as one of many capital markets to search for inefficiencies across the private-public spectrum.
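Peter’s exit arithmetic is worth making concrete. The short Python sketch below simply restates the numbers from his examples above (fund size, average ownership at exit, and pre- versus post-money valuations); none of the figures are mine:

    def ipos_needed(fund_size, target_multiple, avg_ownership, avg_ipo_value):
        """How many average-sized IPOs a fund needs to hit its return target."""
        value_to_create = fund_size * target_multiple / avg_ownership
        return value_to_create / avg_ipo_value

    # A "modest" 2x return on a $500M fund, owning 20% at exit, $216M average IPO:
    print(round(ipos_needed(500e6, 2, 0.20, 216e6)))  # ~23 IPOs

    def investor_multiple(raised, pre_money, exit_value):
        """Investors buy raised/(pre-money + raised) of the company."""
        ownership = raised / (pre_money + raised)
        return ownership * exit_value / raised

    print(round(investor_multiple(8e6, 12e6, 200e6), 1))  # 10.0x: $8M at $12M pre, $200M exit
    print(round(investor_multiple(2e6, 4e6, 60e6), 1))    # 10.0x: $2M at $4M pre, $60M exit

Both rounds return the same 10x to the investors; the $60M path only falls out of favor because a $20M payoff cannot move the needle in a $500M fund.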

Smaller funding rounds align financing more closely with actual development needs and increase cash management discipline. The likelihood of a meaningful ROI also goes up for both entrepreneurs and VCs. Yet one dilemma is that smaller funding still imposes the same overhead costs on management and financiers for deal packaging and due diligence. As Peter notes, this can be an unacceptable cost to larger-fund VCs. And, for entrepreneurs, it can also lead to the need for multiple rounds, dilution by a thousand cuts, and overhead burdens that detract from the real business of building a business.

I think Peter’s insights are very appropriate from the VC perspective.

But, as an entrepreneur, I look at this issue from the other end of the telescope. Yes, smaller (and therefore more numerous) funds make sense, with smaller funding rounds to help capital discipline. But more patience and nurturing of the venture, especially with respect to business models, is even more critical. So long as the success rates for VC-backed ventures remain so abysmally low, the mentality of venture development will remain too much a Vegas “crap shoot” or rely on false “safe bets” on “proven management.”

Entrepreneurs are well advised to forget the question of initial valuations — which will prove meaningless very quickly — and focus instead on the success rates of the VCs. That is the best indicator that critical expertise and insight will be brought forward — likely with some patience and staying power — in addition to the lubricant of capital.

Posted by AI3's author, Mike Bergman Posted on January 29, 2006 at 3:00 pm in Software and Venture Capital | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/181/too-big-for-success-the-vc-funding-dilemma/
The URI to trackback this post is: https://www.mkbergman.com/181/too-big-for-success-the-vc-funding-dilemma/trackback/