Posted: May 11, 2015

Pulse: Efficient Storage of URIs

AI3 Pulse

Mavlyutov et al. have posted a pre-print [1] of their upcoming paper, to be presented at ESWC at the end of the month, on the most efficient representation of URIs in information systems. All of us who do large-scale work with the semantic Web or linked data should be interested in these findings.

To my knowledge, the paper is the first to explicitly evaluate common data structures for encoding, storing and retrieving URIs at scale. Because URIs are the unique identifiers for resources, millions to billions of them may need to be stored in and retrieved from triple stores or other database backends.

The authors compared a dozen different methods for storing URIs according to the standard needs to index, insert and retrieve URIs, including encoding and decoding, at scale. Memory and operation times were measured. The methods evaluated were specific RDF systems; various hash maps; various hash tables; binary search, B+, ART (adaptive radix), and lexicographic trees; and the HAT-trie.
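
To make the encode/decode task concrete, here is a minimal Python sketch of a URI dictionary backed by a plain hash map. It is only an illustration of the general idea of mapping URIs to compact integer IDs and back; the class name `UriDictionary` and its methods are hypothetical and are not taken from the paper or from any of the systems it evaluates.

```python
class UriDictionary:
    """Hypothetical two-way mapping between URI strings and integer IDs."""

    def __init__(self):
        self._id_by_uri = {}   # hash map used for encoding: URI -> ID
        self._uri_by_id = []   # list used for decoding: ID -> URI

    def encode(self, uri):
        """Return the ID for a URI, inserting the URI if it is new."""
        uri_id = self._id_by_uri.get(uri)
        if uri_id is None:
            uri_id = len(self._uri_by_id)  # next unused ID
            self._id_by_uri[uri] = uri_id
            self._uri_by_id.append(uri)
        return uri_id

    def decode(self, uri_id):
        """Return the URI stored under an ID (raises IndexError if unknown)."""
        return self._uri_by_id[uri_id]


# Example URIs are placeholders chosen for illustration.
d = UriDictionary()
first = d.encode("http://example.org/resource/a")
second = d.encode("http://example.org/resource/b")
print(first, second, d.decode(second))
```

A hash map like this gives fast look-ups but keeps every URI string in full and in no particular order, which is the kind of trade-off in memory and capability that a comparison of this sort is meant to expose.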

Different operational needs may point to different methods. However, the authors conclude that “overall, the HAT-trie appears to be a good compromise taking into account all aspects, i.e., memory consumption, loading time, and look-ups. ART also appears as an appealing structure, since it maintains the data in sorted order, which enables additional operations like range scans and prefix lookups, and since it still remains time and memory efficient.”
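
The point about sorted order can be illustrated with a small Python sketch. It uses a sorted list and the standard `bisect` module as a stand-in for an ordered structure; it is not an ART or HAT-trie implementation, and the function name `prefix_lookup` and the sample URIs are assumptions made for this example.

```python
import bisect


def prefix_lookup(sorted_uris, prefix):
    """Return all URIs in a sorted list that start with the given prefix.

    In lexicographic order every string sharing a prefix sits in one
    contiguous run, so a binary search finds the start of the run and a
    short scan collects it.
    """
    start = bisect.bisect_left(sorted_uris, prefix)
    matches = []
    for uri in sorted_uris[start:]:
        if not uri.startswith(prefix):
            break  # left the contiguous run of matching URIs
        matches.append(uri)
    return matches


# Placeholder data for illustration only.
uris = sorted([
    "http://example.org/person/alice",
    "http://example.org/person/bob",
    "http://example.org/place/paris",
    "http://other.example/thing/1",
])
print(prefix_lookup(uris, "http://example.org/person/"))
```

An unordered hash map cannot answer this query without scanning every key, which is why a structure that keeps URIs sorted can be attractive even when its raw look-up speed is not the best.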

This paper should be a useful reference for any group that needs to manage URIs at scale.


[1] Mavlyutov, Ruslan, Marcin Wylot, and Philippe Cudre-Mauroux. “A Comparison of Data Structures to Manage URIs on the Web of Data,” accepted paper at the 12th ESWC Conference (2015), May 31-June 4, 2015, Portoroz, Slovenia.
