AI³
Adaptive Information
Adaptive Innovation
Adaptive Infrastructure
a·dap·tive adj. Showing or having a capacity to make fit for new or special situations; flexible; a successful adjustment.

Date:   February 12, 2006

Like all Semantic Web instances, the enterprise Semantic Web by definition depends on semi-structured data. What has generally been lacking in the move toward a semi-structured data paradigm is adequate processing engines for the efficient and scalable storage and retrieval of semi-structured data.[1]

While tremendous effort has gone into data representations like XML, the approach to designing engines for manipulating that data has been to graft kludgy workarounds onto existing relational DBMSs or text search engines. Neither meets the test. Thus, as the semantic Web and the semi-structured data it relies on look forward, two impediments stand like gatekeepers blocking progress: the lack of 1) efficient processing engines and 2) scalable systems and architectures.

Unlike structured or unstructured data, semi-structured data has no accepted database engine of its own. Structured data is dominated by RDBMSs, and unstructured data is largely the realm of text or search engines. Some systems attempt to apply relational DBMS approaches from the structured end of the spectrum; others attempt to add some structure to standard unstructured search engines (see the figure in my related posting).

Attempts to manage the middle ground of semi-structured data have involved modifying RDBMSs to be XML-enabled, adding some structure to existing IR systems, or developing new, native XML data systems from scratch. The native XML systems are relatively new and unproven. For a listing of native XML databases, plus generally useful discussion about the use of XML within databases, see Ron Bourret’s Web site.[2]

Semi-structured data models are sometimes called “self-describing” (or schema-less). These data models are often represented as labeled graphs, or sometimes labeled trees with the data stored at the leaves. The schema information is contained in the edge labels of the graph. Semi-structured representations also lend themselves well to data exchange or the integration of heterogeneous data sources.
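
To make the edge-label idea concrete, here is a minimal Python sketch. It assumes nested dictionaries stand in for an edge-labeled tree (a tree rather than a general graph): dictionary keys play the role of edge labels, atomic values sit at the leaves, and lists hold repeated children under one label. The records and the helpers `label_paths` and `implicit_schema` are hypothetical illustrations; the point is that the only "schema" is the set of label paths the data happens to contain.

```python
# Sketch only: semi-structured records as edge-labeled trees.
# Keys = edge labels, lists = repeated children, scalars = leaf data.
# There is no separately declared schema anywhere.

record_a = {
    "person": {
        "name": "Ann",
        "email": ["ann@example.org", "a@example.com"],
    }
}
record_b = {
    "person": {
        "name": "Bob",
        "phone": "555-0100",
        "address": {"city": "Iowa City"},
    }
}


def label_paths(node, prefix=()):
    """Yield (label path, leaf value) pairs by walking the tree's edges."""
    if isinstance(node, dict):
        for label, child in node.items():
            yield from label_paths(child, prefix + (label,))
    elif isinstance(node, list):
        # Repeated children share the same incoming edge label.
        for child in node:
            yield from label_paths(child, prefix)
    else:
        yield prefix, node


def implicit_schema(*records):
    """Recover the 'schema' purely from the edge labels in the data."""
    return {"/".join(path) for rec in records for path, _ in label_paths(rec)}


for path in sorted(implicit_schema(record_a, record_b)):
    print(path)
```

The two records have different shapes, yet both are valid; the union of their label paths (`person/address/city`, `person/email`, `person/name`, `person/phone`) is all the structure there is, which is also why such data travels well between heterogeneous sources.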

However, each of these three approaches to managing semi-structured XML data (XML-enabled RDBMSs, modified IR text engines, or native XML data systems) has its own strengths and weaknesses, as shown by the table below:

[Table: pros and cons by approach type (columns: Type, Pros, Cons)]

Because relational databases are so prevalent, XML-enabled RDBMSs are perhaps the most common approach, with all the major commercial vendors, such as Oracle, IBM and Sybase, offering their own versions. But XML is itself text, much of its information requires text-based retrieval, and open XML schemas, with their need to preserve element ordering, are very poorly suited to the relational data model. As a result, RDBMS options are fragile, perform poorly for document-centric retrievals, and lose critical information.
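
One way to see the ordering and open-schema problem is to shred a small XML document into a relational table. The Python sketch below uses a generic "edge table" layout (one row per element), which is one well-known shredding technique from the literature and not necessarily what any vendor ships; the document `DOC`, the function `shred` and the `edge` table are hypothetical. It uses only the standard `sqlite3` and `xml.etree.ElementTree` modules.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document with repeated, ordered elements and an "unexpected" one.
DOC = """<article>
  <title>ESW</title>
  <para>First point.</para>
  <para>Second point.</para>
  <note>An element type no fixed schema anticipated.</note>
</article>"""


def shred(conn, xml_text):
    """Shred XML into a generic edge table: one row per element.

    This sketch keeps only tag names and leading element text; attributes
    and mixed-content tail text are silently dropped.
    """
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "ordinal INTEGER, label TEXT, value TEXT)"
    )
    counter = 0

    def visit(elem, parent_id, ordinal):
        nonlocal counter
        counter += 1
        my_id = counter
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (my_id, parent_id, ordinal, elem.tag, text),
        )
        for i, child in enumerate(elem):
            visit(child, my_id, i)

    visit(ET.fromstring(xml_text), None, 0)


conn = sqlite3.connect(":memory:")
shred(conn, DOC)

# Relational tables are unordered sets: document order survives only because
# it was stored explicitly in the ordinal column and requested via ORDER BY.
children = conn.execute(
    "SELECT c.label, c.value FROM edge p JOIN edge c ON c.parent = p.id "
    "WHERE p.label = 'article' ORDER BY c.ordinal"
).fetchall()
print(children)
```

Any element type fits this table, but only by turning structure into rows: order must be carried by hand, each additional step in a path such as `article/para` costs another self-join on `edge`, and whatever the shredder ignores (here, attributes and tail text) is simply gone.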

IR-based text search systems do well at text retrieval but are not suited at all to storing and managing structured data. Further, many of these systems use in-line tagging of structural attributes. While this approach parses well and works seamlessly with existing text token indexing, at scale it suffers a fatal flaw: adding new attributes or extensions requires completely re-indexing the existing content.
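
The re-indexing flaw follows directly from how in-line tagging works. The toy Python model below assumes that structural attributes are folded into each document's token stream as `field:value` pseudo-words, so they are indexed exactly like ordinary text; real engines differ in detail. `DOCS` and `build_index` are hypothetical names, and the `processed` count simply records how many documents had to be re-tokenized.

```python
from collections import defaultdict

# Hypothetical documents: body text plus structural attributes.
DOCS = {
    1: {"text": "semantic web data engines", "author": "bergman"},
    2: {"text": "relational engines for xml", "author": "bourret", "year": "2004"},
}


def build_index(docs, fields):
    """Build an inverted index with attributes tagged in-line as 'field:value'.

    Returns the index and the number of documents that had to be tokenized.
    """
    index = defaultdict(set)
    processed = 0
    for doc_id, doc in docs.items():
        processed += 1
        tokens = doc["text"].split()
        # In-line tagging: attributes become pseudo-words in the token stream.
        tokens += [f"{f}:{doc[f]}" for f in fields if f in doc]
        for tok in tokens:
            index[tok].add(doc_id)
    return index, processed


index, n = build_index(DOCS, fields=["author"])
print(sorted(index["engines"]), sorted(index["author:bourret"]), n)

# Deciding later to search by 'year' changes every document's token stream,
# so the entire collection must be re-indexed, not just the new attribute.
index, n = build_index(DOCS, fields=["author", "year"])
print(sorted(index["year:2004"]), n)
```

With two documents the rebuild is trivial; with millions, every schema extension means another full pass over the collection.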

Finally, all native XML data systems perform poorly at scale. Some of these native systems build on a text-search foundation, others on object or relational approaches. In general, though, their queries and other mechanisms remain highly XML document-centric, with very slow retrievals across large document repositories.
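
What "document-centric" means for a cross-repository query can be sketched in a few lines of Python: the query is answered one document at a time, parsing and navigating each XML document in turn. This is a simplified model built on the standard `xml.etree.ElementTree` module, not a description of any particular native XML product; `REPO` and `titles_by_author` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: each entry is a separate XML document.
REPO = [
    "<doc><author>ann</author><title>Engines</title></doc>",
    "<doc><author>bob</author><title>Indexing</title></doc>",
    "<doc><author>ann</author><title>Scale</title></doc>",
]


def titles_by_author(repo, author):
    """Answer a cross-document query one document at a time.

    Every document is parsed and navigated whether or not it matches,
    so the work grows linearly with the size of the repository.
    """
    hits, parsed = [], 0
    for xml_text in repo:
        root = ET.fromstring(xml_text)
        parsed += 1
        if root.findtext("author") == author:
            hits.append(root.findtext("title"))
    return hits, parsed


print(titles_by_author(REPO, "ann"))
```

Two of the three documents match, but all three were parsed; at repository scale that per-document cost, rather than the size of the answer, dominates retrieval time.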

As XML and semi-structured data have become ubiquitous, clearly the path is opening in the marketplace for a “third way.” Later postings will look at efforts by new vendors such as Mark Logic to address this opportunity, as well as emerging efforts from BrightPlanet.

NOTE: This posting is part of an occasional series looking at a new category that I and BrightPlanet are terming the eXtensible Semantic Data Model (XSDM). Topics in this series cover all information related to extensible data models and engines applicable to documents, metadata, attributes, semi-structured data, or the processing, storing, indexing or semantic schemas and mappings of XML, RDF, OWL, or SKOS data. A major white paper will be produced at the conclusion of the series. Stay tuned!

[1] Matteo Magnani and Danilo Montesi, “A Unified Approach to Structured, Semistructured and Unstructured Data,” Technical Report UBLCS-2004-9, Department of Computer Science, University of Bologna, 29 pp., May 29, 2004. See http://www.cs.unibo.it/pub/TR/UBLCS/2004/2004-09.pdf.

[2] See http://www.rpbourret.com/xml/XMLDatabaseProds.htm and http://www.rpbourret.com/xml/XMLAndDatabases.htm.

Posted by AI3's author, Mike Bergman

Posted on February 12, 2006 at 12:51 pm in Semantic Web | Comments (4)
The URI link reference to this post is: http://www.mkbergman.com/185/enterprise-semantic-webs-esw-demand-new-database-paradigms/
4 Responses to “Enterprise Semantic Webs (ESW) Demand New Database Paradigms”
  1. AI3 - Adaptive Information::: » Blog Archive » Market Opportunities in the Semantic Web commented on

    [...] Semantics as the next stage in "data federation" and interoperability, and [...]

  2. AI3 - Adaptive Information::: » Blog Archive » Methods for Semantic Discovery, Annotation and Mediation commented on

    [...] Once all of these reconciliations take place there is the (often undiscussed) need to index, store and retrieve these semantics and their relationships at scale, particularly for enterprise deployments. This is a topic I have addressed many times from the standpoint of scalability, more scalability, and comparisons of database and relational technologies, but it is also not a new topic in the general community. [...]

  3. AI3 - Adaptive Information::: » Blog Archive » ‘Semantic Technologies’ in the Enterprise commented on

    [...] I agree totally with the evolutionary, incremental view of semantic Web adoption beginning in the enterprise as an earlier posting argued, with its initial role being to help overcome semantic heterogeneities. I may also begin to work in the phrase ’semantic technologies’ more into my writings. [...]

  4. AI3 - Adaptive Information::: » Blog Archive » The Commoditization of Content Software commented on

    [...] Indeed, I have repeatedly documented these gaps for virtually all large-scale document-centric or federated applications. The root cause — besides rampant poor interface designs — has been in my opinion poorly suited data management foundations. Relational or IR-based systems both perform poorly for different reasons in managing semi-structured data. This problem will not be solved by open source per se (see below), though there are some interesting options emerging from open source that may point to way to new alternatives, as well as incipient designs from BrightPlanet and others. [...]

Copyright © 2004–2013 Michael K. Bergman.   This work is licensed under a Creative Commons License