Date:   May 24, 2006

Katie Portwin, one of the Ingenta developers whose Jena paper stimulated my recent posting on semantic Web scalability, has a recent post, "performance, triplestores, and going round in circles..", that expands on the scalability theme in interesting ways.

In her post, Katie asks rhetorically, "Can industrial scale triplestores be made to perform? Is breaking the 'triple table' model the answer?" She then notes that, in a related XTech paper, the Ingenta team showed that even a simple, bread-and-butter sample query takes 1.5 seconds on a store of 200 million triples. The post also contains interesting links to other speakers' work from the Jena User’s Conference last week, including clever ways to cluster triples in an RDBMS.

I asked Tom Tiahrt, BrightPlanet’s chief scientist and lead developer on our text and semantic engines, to review this post and give me his thoughts.  Here are his comments:

I always like to see this: "re-modelling" or "modelling" instead of "modeling" because I abhor human-induced language entropy. Kudos to Katie Portwin (KP) for that alone.

Kevin Wilkinson (KW) defines a triple store as a three-column table in a relational system. This is unfortunate because a triple-store is not exclusive to RDB systems. It must be provided by any RDF system as part of its logical design, even if the system does not use it for its physical design.

KW's patterns-identification aspect is likely true in many instances, and his 'breaking' the clean RDF format is what DBAs and RDB developers always do to improve performance (denormalizing the database). KP points out the problem with this, viz., that you must maintain a more complex schema, and duplicates raise data retrieval issues (though they are tractable). Moreover, KP writes "The great thing about the triplestore is that we don’t have to bake assumptions about the data into the database – we can have as many whatevers as we like."

The point is that, to achieve acceptable performance, you cannot simply rely on the triple store alone. At the same time, RDF requires triples, and to prevent assumption baking the user should not have to decide how to denormalize the triple-store. In addition, the transitive closure computation is the onerous query that the RDBMS cannot do within a reasonable amount of time.

Here are the parameters of the great problem. Static assumptions about what will happen directly oppose what RDF is supposed to provide. Open-ended dynamic processing cannot perform well enough to solve the problem.

Thanks, Tom.
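
Tom's point about transitive closure can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the partOf predicate, the sample triples and the function name are hypothetical and do not come from Ingenta's or BrightPlanet's systems. It computes the closure the way a store without native recursion would have to, by repeatedly joining newly found pairs back against the edge set until nothing new appears.

```python
from collections import defaultdict


def transitive_closure(triples, predicate):
    """Return (closure, rounds) for chains of `predicate` in `triples`.

    `closure` is the set of all (subject, object) pairs reachable through
    one or more hops; `rounds` counts the join passes that were needed.
    """
    # Index the direct edges by subject so each pass is a lookup join.
    successors = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            successors[s].add(o)

    closure = {(s, o) for s, objs in successors.items() for o in objs}
    frontier = set(closure)
    rounds = 0
    while frontier:
        rounds += 1
        # One pass: extend every newly found pair by one more direct edge.
        new_pairs = {
            (s, o2)
            for (s, o1) in frontier
            for o2 in successors.get(o1, ())
        }
        # Keep only pairs not seen before, so cycles cannot loop forever.
        frontier = new_pairs - closure
        closure |= frontier
    return closure, rounds


# Hypothetical sample data: a three-edge containment chain plus one
# unrelated triple that the closure ignores.
sample_triples = [
    ("paragraph", "partOf", "chapter"),
    ("chapter", "partOf", "book"),
    ("book", "partOf", "series"),
    ("book", "title", "Example"),
]
closure, rounds = transitive_closure(sample_triples, "partOf")
```

Each round stands in for one more self-join pass over the triple table, and the number of rounds grows with the longest chain in the data. That gives a feel for why this kind of query can become onerous on a store with hundreds of millions of triples.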

Katie Portwin also points out that re-modelling is a real problem as well when the system is hosted by an RDBMS, though the triple stores can remain intact.
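
The trade-off between the plain triple table and a denormalized layout can be sketched with SQLite from Python. This is a minimal illustration, not the schema used by Ingenta or Jena: the table names (triples, article), the predicates and the sample rows are all assumptions made up for the example. It shows that the triple table needs a self-join for even a simple lookup, while the denormalized table answers it directly but needs a schema change when a new kind of property appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Generic three-column triple table: no assumptions about the data.
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    -- Denormalized table: bakes in the assumption that articles
    -- have exactly a title and an author.
    CREATE TABLE article (subject TEXT PRIMARY KEY, title TEXT, author TEXT);
    """
)

# Hypothetical sample data, loaded into both layouts.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("art1", "title", "First example"),
        ("art1", "author", "Author A"),
        ("art2", "title", "Second example"),
        ("art2", "author", "Author B"),
    ],
)
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?)",
    [
        ("art1", "First example", "Author A"),
        ("art2", "Second example", "Author B"),
    ],
)

# Triple table: "titles of articles by Author A" needs a self-join.
triple_titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.object
        FROM triples AS a
        JOIN triples AS t ON t.subject = a.subject
        WHERE a.predicate = 'author' AND a.object = ?
          AND t.predicate = 'title'
        ORDER BY t.object
        """,
        ("Author A",),
    )
]

# Denormalized table: the same question is a single-table lookup.
property_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM article WHERE author = ? ORDER BY title",
        ("Author A",),
    )
]

# A new kind of property arrives. The triple table just takes a new row...
conn.execute(
    "INSERT INTO triples VALUES (?, ?, ?)", ("art1", "reviewer", "Reviewer R")
)
# ...while the denormalized table has to be re-modelled first.
conn.execute("ALTER TABLE article ADD COLUMN reviewer TEXT")
conn.execute(
    "UPDATE article SET reviewer = ? WHERE subject = ?", ("Reviewer R", "art1")
)
article_columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]
```

Both queries return the same title, but only the denormalized layout required an ALTER TABLE to accommodate the new reviewer property, which is the re-modelling cost described above in miniature.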

I’ll keep monitoring this topic and post other interesting perspectives on RDF, triple-store and semantic Web scalability as I encounter them.

Posted by AI3's author, Mike Bergman

Posted on May 24, 2006 at 2:09 pm in Semantic Web | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/233/redux-scalability-of-the-semantic-web/
2 Responses to “Redux: Scalability of the Semantic Web”
  1. Henry Story commented on

    I keep wondering if the solution is not going to be something like a Java virtual machine: just-in-time organisation of the database to best match the types of queries that it gets asked.

  2. AI3 - Adaptive Information::: » Blog Archive » Methods for Semantic Discovery, Annotation and Mediation commented on

    [...] Once all of these reconciliations take place there is the (often undiscussed) need to index, store and retrieve these semantics and their relationships at scale, particularly for enterprise deployments. This is a topic I have addressed many times from the standpoint of scalability, more scalability, and comparisons of database and relational technologies, but it is also not a new topic in the general community. [...]

Copyright © 2004–2010 Michael K. Bergman.   This work is licensed under a Creative Commons License