Structured Dynamics’ product and software architecture is oriented to the Web. It emphasizes maximum flexibility, minimum “lock-in” and complete adaptability. This piece describes this architecture and how these aims are being met.
Structured Dynamics is committed to what is known as a Web-oriented architecture (WOA), which can be defined as:
Nick Gall describes WOA as based on the architectural foundations of the Web and characterized by “globally linked, decentralized, and uniform intermediary processing of application state via self-describing messages.”
WOA is a subset of the service-oriented architectural style, wherein discrete functions are packaged into modular and shareable elements (“services”) that are made available in a distributed and loosely coupled manner. WOA uses the representational state transfer (REST) architectural style defined by Roy Fielding in his 2000 doctoral thesis; Fielding is also one of the principal authors of the Hypertext Transfer Protocol (HTTP) specification.
REST provides principles for how resources are defined, used and addressed with simple interfaces, without additional messaging layers such as SOAP or RPC. The principles are couched within the framework of a generalized architectural style and are not limited to the Web, though they are a foundation to it.
REST and WOA stand in contrast to earlier Web service styles that are often known by the WS-* acronym (such as WSDL). WOA has proven itself to be highly scalable and robust for decentralized users since all messages and interactions are self-contained (convey “state”).
Structured Dynamics abstracts its WOA services into simple and compound ones (which are combinations of the simple). All Web services (WS) have uniform interfaces and conventions and share the error codes and standard functions of HTTP. We further extend the WOA definition and scope to include linked data, which is also RESTful. Thus, our WOA also sits atop an RDF (Resource Description Framework) database (“triple store”) and full-text search engine.
These Web services then become the middleware interaction layer for general access and querying (“endpoints”) and for tying in external software (“clients”), portals or content management systems (CMS). This design provides maximum flexibility, extensibility and substitutability of components.
Here is the basic overview of the architecture and its components. Each of its major components is described in turn as keyed by number:
The core of the system is the Web services middleware layer, or WS framework (WSF). Structured Dynamics is preparing this framework [see (1)] as a separately available open source package under the Apache 2 license (soon to be released). It provides all of the components shown as items (2) to (5) in the diagram.
WSF is the abstraction layer that provides the Web service endpoints and services for external use. It also provides the direct hooks into the underlying RDF triple stores and full-text search engines that drive these services. At initial release, these pre-configured hooks will be to the Virtuoso RDF triple store (via ODBC, and later HTTP) and the Solr faceted text search engine (via HTTP). However, the design also allows other systems to be substituted if desired or for other specialized systems to be included (such as an analysis or advanced inference engine).
The controlling Web service in WSF is the Authentication/Registration WS [see (2)]. The initial version uses registered IP addresses as the basis to grant access and privileges to datasets and functional Web services. Later versions may be expanded to include other authentication methods such as OpenID, keys (à la Amazon EC2), foaf+ssl or oauth. A secure channel (HTTPS, SSH) could also be included.
The other core Web services provided with WSF are the CRUD functional services (create – read – update – delete), import and export, browse and search, and a basic templating system [see (3)]. These are viewed as core services for any structured dataset.
In initial release, the import and export formats will likely include TSV, RDF/XML, RDF/N3, RDF/Turtle, XML and possibly JSON.
A simple but elegant system guides access and use rights. First, every Web service is characterized as to whether it supports one or more of the CRUD actions. Second, each user is characterized as to whether they first have access rights to a dataset and, if they do, which of the CRUD permissions they have [see (4, 5)]. We can thus characterize the access and use protocol simply as A + CRUD.
Thereafter, a simple mapping of dataset access and CRUD rights determines whether a user sees a given dataset and what Web services (“tools”) are presented to them and how they might manipulate that data. When expressed in standard user interfaces this leads to a simple contextual display of datasets and tools.
At the Web service layer, these access values are set parametrically. The system, however, is designed to more often be driven by user and group management at the CMS level via a lightweight plug-in or module layer.
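The A + CRUD rule described above can be sketched in a few lines of Python. This is only an illustration, not the WSF implementation: the service names, the dataset names and the rule that a tool is shown only when all of its CRUD actions are granted are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebService:
    name: str
    actions: frozenset  # CRUD letters this (hypothetical) service performs


def visible_services(user_rights, dataset, services):
    """Return the names of the services a user may see for a dataset.

    user_rights maps dataset -> set of CRUD letters. A missing key means
    the user has no access at all (the "A" in A + CRUD).
    """
    if dataset not in user_rights:
        return []
    granted = user_rights[dataset]
    # Assumed rule: show a tool only if every action it needs is granted.
    return [s.name for s in services if s.actions <= granted]


# Hypothetical services and rights, for illustration only.
services = [
    WebService("browse", frozenset("R")),
    WebService("search", frozenset("R")),
    WebService("import", frozenset("C")),
    WebService("update", frozenset("RU")),
]
rights = {"parks": {"R"}, "budgets": {"C", "R", "U"}}
```

A read-only user of the `parks` dataset would be offered only the browse and search tools, while a dataset the user has no access to yields no tools at all.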
Fundamentally, this “data-driven application” works because of its structured data foundation [see (6)]. Structured Dynamics employs an innovative design that exposes all RDF and record aspects to full-text search and is able to present contextual (“faceted”) results at all times in the interface. In addition, the Virtuoso universal server provides RDF triple store and related structured data services.
The actual “driver” for the structured data system is one or more schema (“ontologies”) setting all of these structured data conditions. These ontologies are also managed by the triple store. The definition of these ontologies is specified in such a way with accompanying documentation to enable new scopes or domains to easily drive the system.
As described by the diagram so far, all interactions with the system have been mediated either by Web service APIs or external endpoints, such as SPARQL.
For external clients or any HTTP-accessible system [see (10)], this is sufficient. Programmatically, external clients (software) may readily interact with the WS and obtain results via parametric requests.
However, the framework is also designed to be embedded within existing content management systems (CMSs). For this purpose, an additional layer is provided.
CMS interaction first occurs via specific modules or plug-ins written for that system [see (7)]. These are very lightweight wrappers that conform to the registry and hooks of the host CMS system. The actual modules or plug-ins provided are also geared to the management style of the governing CMS and what it offers [see (8)]. Each module or plug-in wrapper is a packaging decision of how to bound the WSF Web services in a configuration appropriate to the parent CMS.
This design keeps the actual tie-ins to the CMS as a very thin wrapper layer, which can embrace an open source licensing basis consistent with the host CMS. Because all of the underlying functionality has been abstracted in the WSF framework, licensing integrity across all deployment layers is maintained while allowing broad CMS interoperability. The design also allows networks to be established of multiple portals or nodes with different CMSs, perfect for broad-scale collaboration. User choice and flexibility is retained to the max.
In this design, the CMS retains its prominence and visibility (and, indeed, the standard admin and licensing basis). The WSF, specific Web services, and structured data backend remain largely invisible to the user.
This design has manifest benefits, some of which include:
Structured Dynamics has the twin philosophies of using the best tools available yet also to give its customers and clients full choice. For instance, SD believes the Virtuoso system to be the best RDF triple store with superior functionality. Our internal benchmarks affirm its performance. Virtuoso is our standard first recommendation for a performant triple store.
Yet our architecture and design is not dependent on this application, nor indeed any other application. Deployment environments, customer preference, or pre-existing installations sometimes warrant the use of certain tools or applications. Large collaboration networks necessarily spring from diversity. It is thus critical that SD’s designs and architectures be tool-neutral and allow swapping and substitution. This is a major reason for the WOA and other RESTful design aspects of our Web services framework.
SD brings particular strengths in architecture, proper splits and design that separate ABox and TBox functionality, and ontology use and development. All of our designs are meant to be as tool neutral as possible, and we are always seeking the best of class in open source tools for any category.
Over the coming weeks and months Structured Dynamics will be rolling out packages and distribution sites for access to this framework and components built around this philosophy. Stay tuned!
There has been much welcomed visibility for the semantic Web and linked data of late. Many wonder why it has not happened earlier; and some observe progress has still been too slow. But what is often overlooked is the foundational role of RDF — the Resource Description Framework.
From my own perspective focused on the issues of data interoperability and data federation, RDF is the single most important factor in today’s advances. Sure, there have been other models and other formulations, but I think we now see the Goldilocks “just right” combination of expressiveness and simplicity to power the foreseeable future of data interoperability.
So, on this 10th anniversary of the birth of RDF, I’d like to re-visit and update some much dated discussions regarding the advantages of RDF, and more directly address some of the mis-perceptions and myths that have grown up around this most useful framework.
RDF is a data model that is expressed as simple subject-predicate-object “triples”. That sounds fancy, but just substitute verb for predicate and noun for subject and object. In other words: Dick sees Jane; or, the ball is round. It may sound like a kindergarten reader, but it is how data can be easily represented and built up into more complex structures and stories.
A triple is also known as a “statement” and is the basic “fact” or asserted unit of knowledge in RDF. Multiple statements get combined together by matching the subjects or objects as “nodes” to one another (the predicates act as connectors or “edges”). As these node-edge-node triple statements get aggregated, a network structure emerges, known as the RDF graph.
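To make the node-edge-node idea concrete, here is a minimal Python sketch that treats each statement as a plain tuple and indexes the statements into an adjacency structure. The sample statements and the `build_graph` helper are illustrative only; real RDF would use IRIs rather than bare words.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) tuple.
triples = [
    ("Dick", "sees", "Jane"),
    ("Jane", "throws", "ball"),
    ("ball", "is", "round"),
]


def build_graph(statements):
    """Index triples as node -> [(edge, node), ...] adjacency lists."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
```

Because "Jane" is the object of one statement and the subject of another, the two statements join at that node, and the chain Dick, Jane, ball emerges without anyone having declared it.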
The referenced “resources” in RDF triples have unique identifiers, IRIs, that are Web-compatible and Web-scalable. These identifiers can point to precise definitions of predicates or refer to specific concepts or objects, leading to less ambiguity and clearer meaning or semantics.
In my own company’s approach to RDF, basic instance data is simply represented as attribute-value pairs where the subject is the instance itself, the predicate is the attribute, and the object is the value. Such instance records are also known as the ABox. The structural relationships within RDF are defined in ontologies, also known as the TBox, which are basically equivalent to a schema in the relational data realm.
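As a rough sketch of that instance-record view, the following Python turns one attribute-value record into triples. The `ex:jane` identifier and the attribute names are made up for the example.

```python
def record_to_triples(instance_id, record):
    """Turn one attribute-value record into subject-predicate-object triples.

    The instance is the subject, each attribute a predicate, each value an object.
    """
    return [(instance_id, attribute, value) for attribute, value in record.items()]


# A hypothetical instance record (ABox data).
record = {"type": "Person", "name": "Jane", "age": 7}
abox = record_to_triples("ex:jane", record)
```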
RDF triples can be applied equally to all structured, semi-structured and unstructured content. By defining new types and predicates, it is possible to create more expressive vocabularies within RDF. This expressiveness enables RDF to define controlled vocabularies with exact semantics. These features make RDF a powerful data model and language for data federation and interoperability across disparate datasets.
There are many excellent introductions or tutorials to RDF; a recommended sampling is shown in the endnotes.
Well, the answer to the rhetorical question is, all three!
The RDF data model provides an abstract, conceptual framework for defining and using metadata and metadata vocabularies. See: We were able to use all three concepts in a single sentence!
The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources and an RDF model can therefore resemble an entity-relationship diagram. . . . In object-oriented design terminology, resources correspond to objects and properties correspond to instance variables. 
But, actually, because RDF is simultaneously a framework, data model and basis for building more complex vocabularies, it is both simple and complex at the same time.
It is first perhaps best to understand basic RDF as a data model of triples with very few (or unconstrained) semantics. In its base form, it has no range or domain constraints; has no existence or cardinality constraints; and lacks transitive, inverse or symmetrical properties (or predicates). As such, basic RDF has limited reasoning support. It is, however, quite useful in describing static things or basic facts.
In this regard, RDF in its base state is nearly adequate for describing the simple instances and data records of the world, what is called the ABox in description logics.
RDFS (RDF Schema) is the next layer in the RDF stack designed to overcome some of these baseline limitations. RDFS introduces new predicates and classes that bound these semantics. Importantly, RDFS establishes the basic constructs necessary to create new vocabularies, principally through adding the class and subClass declarations and adding domain and range to properties (the RDF term for predicates). Many useful vocabularies have been created with RDFS and it is possible to apply limited reasoning and inference support against them.
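The kind of limited inference RDFS enables can be illustrated with the subclass relationship: if an instance is typed as a class, it is also an instance of every superclass. The Python below is a hand-rolled sketch of that one rule, not an RDFS reasoner; the class names and the dictionary layout are assumptions for the example.

```python
def superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subClassOf links."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical schema: Dog subClassOf Mammal, Mammal subClassOf Animal.
subclass_of = {"Dog": ["Mammal"], "Mammal": ["Animal"]}
instance_types = {"ex:rex": "Dog"}


def inferred_types(instance):
    """Asserted type plus all types implied by the subclass hierarchy."""
    asserted = instance_types[instance]
    return {asserted} | superclasses(asserted, subclass_of)
```

Only the fact that `ex:rex` is a Dog was asserted; that it is also a Mammal and an Animal follows from the schema.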
The next layer in the RDF stack is OWL, the Web Ontology Language. It, too, is based on RDF. The first versions of OWL were themselves layered from OWL Lite to OWL DL to OWL Full. OWL Lite and OWL DL are both decidable through the first-order logic basis of description logics (the basis for the acronym in OWL DL). OWL Full is not decidable, but provides an OWL counterpart to fragmented RDF and RDFS statements that are desirable in the aggregate, with reasoning applied where possible.
OWL provides sufficient expressive richness to be able to describe the relationships and structure of entire world views, or the so-called terminological (TBox) construct in description logics. Thus, we see that the complete structural spectrum of description logics can be satisfied with RDF and its schematic progeny, with a bit of an escape hatch for combining poorly defined or structural pieces via using OWL Full.
However, RDF is NOT a particular serialization. Though XML was the original specified serialization and still is the defined RDF MIME type (application/rdf+xml; other serializations take the form text/turtle or text/n3 or similar), it is not necessary to either write or transmit RDF in the XML syntax.
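To show that the model is independent of any one syntax, here is a small Python sketch that writes in-memory triples out in the line-based N-Triples form instead of XML. It is deliberately minimal: it handles only IRIs and plain string literals, escapes only backslashes, double quotes and newlines, and the `example.org` IRIs are placeholders.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) tuples
    as N-Triples lines. A sketch only: no blank nodes, datatypes or
    language tags."""

    def term(value, is_literal):
        if is_literal:
            escaped = (
                value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            return f'"{escaped}"'
        return f"<{value}>"

    return "\n".join(
        f"{term(s, False)} {term(p, False)} {term(o, literal)} ."
        for s, p, o, literal in triples
    )


# Placeholder IRIs for illustration.
doc = [
    ("http://example.org/jane", "http://example.org/name", "Jane", True),
    ("http://example.org/dick", "http://example.org/sees", "http://example.org/jane", False),
]
print(to_ntriples(doc))
```

The same two statements could equally be written as RDF/XML or Turtle; the triples themselves do not change.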
In any event, depending on its role and application, we can see that RDF is a foundation, in careful expressions based in description logics, that lends itself to a clean expression and separation of concerns. With RDF and RDFS, we have a data model and a basis for vocabularies well suited to instance data (ABox). With RDFS and OWL, we have an extended schema structure and ontologies suitable for describing and modeling the relationships in the world (TBox). Thus, RDF is a framework for modeling all forms of data, for describing that data through vocabularies, and for interoperating that data through shared conceptualizations (ontologies) and schema.
In the context of data interoperability, a critical premise is that a single, canonical data model is highly desirable. Why? Simply because of 2N v. N². That is, a single reference (“canon”) structure means that fewer tool variants and converters need be developed to talk to the myriad of data formats in the wild. With a canonical data model, talking to external sources and formats (N) only requires converters to the canonical form (2N). Without a canonical model, the combinatorial explosion of required format converters becomes N².
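The arithmetic is easy to check. The sketch below counts one-way converters under both approaches; strictly, the pairwise count is N(N-1), which grows as N², so the figures are an order-of-magnitude illustration rather than an exact cost model.

```python
def converters_with_canon(n):
    """One converter into and one out of the canonical form per format."""
    return 2 * n


def converters_pairwise(n):
    """One one-way converter for every ordered pair of distinct formats."""
    return n * (n - 1)


for n in (3, 10, 50):
    print(n, converters_with_canon(n), converters_pairwise(n))
```

At ten formats the canonical approach needs 20 converters against 90 pairwise ones, and the gap widens quickly from there.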
Note, in general, such a canonical data model merely represents the agreed-upon internal representation. It need not affect data transfer formats. Indeed, in many cases, data systems employ quite different internal data models from what is used for data exchange. Many, in fact, have two or three favored flavors of data exchange such as XML, JSON or the like.
In most enterprises and organizations, the relational data model with its supporting RDBMSs is the canonical one. In some notable Web enterprises (Google, for example) the exact details of the internal canonical data model are hidden from view, with APIs and data exchange standards such as GData being the only visible portions to outside consumers.
Generally speaking, a canonical, internal data standard should meet a few criteria:
Other desired characteristics might be for the model and many of its tools to be free and open source, suitable to much analytic work, efficient in storage, and other factors.
Though the relational data model is numerically the most prevalent one in use, it has fallen out of favor for data federation purposes. This loss of favor is due, in part, to the fragile nature of relational schema, which increases maintenance costs for the data and their applications, and incompatibilities in standards and implementation.
Though still comparatively young with a smaller-than-desirable suite of tools and applications support, RDF is perhaps the ideal candidate for the canonical data model. To understand why, let’s now switch our discussion to the advantages of RDF.
It is surprisingly difficult to find a consolidated listing of RDF’s advantages. The W3C, the developer of the specification, first published on this topic in the late 1990s, but it has not been updated for some time. Graham Klyne has a better and more comprehensive presentation, but still one that has not been updated since 2004.
I believe data interoperability to be RDF’s premier advantage, but there are many, many others.
Another advantage that is less understood is that RDF and its progeny can completely switch the development paradigm: data can now drive the application, and not the other way around. Frankly, we are just at the beginning realizations of this phase with such developments as linked data and even whole applications or application languages being written in RDF, but I think time will prove this advantage to be game-changing.
But, there are many perspectives that can help tease out RDF’s advantages. Some of these are discussed below, with the accompanying table attempting to list these ‘Top Sixty’ advantages in a single location.
In its ten-year history, RDF has spawned many related languages and standards. The W3C has been the shepherd for this process, and there are many entry locations on the World Wide Web Consortium’s Web site to begin exploring these options. These standards extend from the RDF, RDFS and OWL vocabularies and languages noted above that give RDF its range of expressiveness, to query languages (e.g., SPARQL), transformation languages (e.g., GRDDL), rule languages (e.g., RIF), and many additional constructs and standards.
The richness of this base of standards is only now being tapped. The combination of these standards and the tools they are spawning is just beginning. And, because it is so easily serialized as XML, a further suite of tools and capabilities such as XPath or XSLT or XForms may be layered onto this base.
Moreover, one is not limited in any way to XML as a serialization. RDF itself has been serialized in a number of formats including RDF/XML, N3, RDFa, Turtle, and N-triples. Also, RDF’s simple subject-predicate-object data model can readily convert human-readable and easily authored instance records (subject) written in the style of attribute-value pairs (predicate-object). As such, RDF is an excellent conversion target for all forms of naïve data structs.
Indeed, it is in data exchange and interoperability that RDF really shines. Via various processors or extractors, RDF can capture and convey the metadata or information in unstructured (say, text), semi-structured (say, HTML documents) or structured sources (say, standard databases). This makes RDF almost a “universal solvent” for representing data structure.
“The semantic Web’s real selling point is URI-based data integration.”
– Harry Halpin 
Because of this universality, there are now more than 100 off-the-shelf ‘RDFizers’ for converting various non-RDF notations (data formats and serializations) to RDF. Because of its diversity of serializations and simple data model, it is also easy to create new converters. Generalized conversion languages such as GRDDL provide framework-specific conversions, such as for microformats.
Once in a common RDF representation, it is easy to incorporate new datasets or new attributes. It is also easy to aggregate disparate data sources as if they came from a single source. This enables meaningful composition of data from different applications regardless of format or serialization.
Simple RDF structures and predicates enable synonyms or aliases to also be easily mapped to the same types or concepts. This kind of semantic matching is a key capability of the semantic Web. It becomes quite easy to say that your glad is my happy, and they indeed talk about the same thing.
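A toy version of the glad/happy mapping might look like the following. The `same_as` dictionary stands in for whatever equivalence predicate a real vocabulary would use, and all identifiers are invented for the example.

```python
# Hypothetical alias table: each synonym points to one preferred term.
same_as = {"glad": "happy", "joyful": "happy"}


def canonical(term):
    """Resolve a term to its preferred form, or leave it unchanged."""
    return same_as.get(term, term)


yours = [("ex:ann", "feels", "glad")]
mine = [("ex:bob", "feels", "happy")]

# After mapping, both sources answer the same question.
merged = [(s, p, canonical(o)) for s, p, o in yours + mine]
who_is_happy = [s for s, p, o in merged if p == "feels" and o == "happy"]
```

Neither source had to change its own wording; the mapping is just more data laid alongside the original statements.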
What this mapping flexibility points to is the immense strengths of RDF in representing diverse schema, the next major advantage.
The single failure of data integration since the inception of information technologies (for more than 30 years, now) has been schema rigidity or schema fragility. That is, once data relationships are set, they remain so and cannot easily be changed in conventional data management systems nor in the applications that use them.
Relational database management (RDBM) systems have not helped this challenge, at all. While tremendously useful for transactions and enabling the addition of more data records (instances, or rows in a relational table schema), they are neither adaptive nor flexible.
Why is this so?
In part, it has to do with the structural “view” of the world. If everything is represented as a flat table of rows and columns, with keys to other flat structures, as soon as that representation changes, the tentacled connections can break. Such has been the fragility of the RDBMS model, and the hard-earned resistance of RDBMS administrators to schema growth or change.
Yet, change is inevitable. And thus, this is the source of frustration with virtually all extant data systems.
RDF has no such limitations. And, for those from a conventional data management perspective, this RDF flexibility can be one of the more unbelievable aspects of this data model.
As we have noted earlier, RDF is well suited and can provide a common framework to represent both instance data and the structures or schema that describe them, from basic data records to entire domains or world views. In fact, whatever schema or structure that characterizes the input data — from simple instance record layouts and attributes to complete vocabularies or ontologies — also embodies domain knowledge. This structure can be used at time of ingest as validity or consistency checks.
As a framework for data interoperability, RDF and its progeny can ingest all relations and terminology, with connections made via flexible predicates that assert the degree and nature of relatedness. There is no need for ingested records or data to be complete, nor to meet any prior agreement as to structure or schema.
Indeed, the very fluidity of RDF and structures based on it is another key strength. Since a basic RDF model can be processed even in the absence of more detailed information, input data and basic inferences can proceed early and logically as a simple fact basis. This strength means that either data or schema may be ingested and then extended in an incremental or partial manner. Partial representations can be incorporated as readily as complete ones, and schema can extend and evolve as new structure is discovered or encountered.
This is revolutionary. RDF provides a data and schema representation framework that can evolve and adapt to what data exists and what structure is known. As new data with new attributes are discovered, or as new relationships are found or realized, these can be added to the existing model without any change whatsoever to the prior existing schema.
This very adaptability is what enables RDF to be viewed as data-driven design. We can deal with a partial and incomplete world; we can learn as we go; we can start small and simple and evolve to more understanding and structure; and we can preserve all structure and investments we have previously made.
And applications based on RDF work the same way: they do not need to process or account for information they don’t know or understand. We can easily query RDF models without being affected whatsoever by unreferenced or untyped data in the basic model.
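A small Python sketch shows why this works in practice. An existing query keeps returning the same answer after new attributes and relationships are added, because it simply never looks at predicates it does not ask for. The store contents and the `names` query are illustrative.

```python
# A tiny triple store held as a set of (subject, predicate, object) tuples.
store = {
    ("ex:jane", "name", "Jane"),
    ("ex:dick", "name", "Dick"),
}


def names(triples):
    """An existing 'application' query: it only knows about the name predicate."""
    return sorted(o for s, p, o in triples if p == "name")


before = names(store)

# Later, a new attribute and a new relationship are discovered and added.
# No schema migration is needed and nothing already stored is altered.
store |= {
    ("ex:jane", "favoriteColor", "red"),
    ("ex:dick", "sees", "ex:jane"),
}

after = names(store)
```

Contrast this with adding a column to a relational table, where dependent queries, views and application code may all need to be revisited.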
By replacing the rigid relational data model with one based on RDF, we gain robustness, flexibility, universality and structural persistence over fragility.
Existing technologies such as SQL and the relational model were devised without the specific requirements of disparate, uncontrolled, large-scale integration. Though the relational model enabled us to build efficient data silos and transaction systems, RDF now enables us to finally federate them.
|‘Top Sixty’ Benefits of RDF|
Despite these differences in fragility and robustness, there are in fact many logical and conceptual affinities between the relational model and the one for RDF. An excellent piece on those relations was written by Andrew Newman a bit over a year ago.
RDF can be modeled relationally as a single table with three columns corresponding to the subject-predicate-object triple. Conversely, a relational table can be modeled in RDF with the subject IRI derived from the primary key or a blank node; the predicate from the column identifier; and the object from the cell value. Because of these affinities, it is also possible to store RDF data models in existing relational databases. (In fact, most RDF “triple stores” are RDBM systems with a tweak, sometimes as “quad stores” where the fourth element of the tuple is the graph.) Moreover, these affinities also mean that RDF stored in this manner can also take advantage of the historical learnings around RDBMS and SQL query optimizations.
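Both directions of this correspondence can be sketched with Python's built-in `sqlite3` module. The table layout, the `ex:person/` subject pattern and the sample row are assumptions for illustration; production triple stores use far more elaborate indexing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Direction 1: RDF modeled relationally as one three-column table.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

# Direction 2: a relational row flattened to triples. The subject is derived
# from the primary key, each column name becomes a predicate and each cell
# value becomes an object.
row = {"id": 42, "name": "Jane", "city": "Iowa City"}
subject = f"ex:person/{row['id']}"  # hypothetical identifier scheme
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [(subject, column, str(value)) for column, value in row.items() if column != "id"],
)

# Ordinary SQL now queries the triples.
names = conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
    (subject, "name"),
).fetchall()
```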
Just as there are many RDFizers as noted above, there are also nice ways to convert relational schema to RDF automatically. OpenLink Software, for example, has its RDF “Views” system that does just that. Given these overall conceptual and logical affinities the W3C is also in the process of graduating an incubator group to an official work group, RDB2RDF, focused on methods and specifications for mapping relational schema to RDF.
What is emerging is one vision whereby existing RDBM systems retain and serve the instance records (ABox), while RDF and its progeny provide the flexible schema scaffolding and structure over them (TBox). Architectures such as this retain prior investments, but also provide a robust migration path for interoperating across disparate data silos in a performant way.
As developers, one of our favorite advantages of RDF is its ability to support data-driven applications. This makes even further sense when combined with a Web-oriented architecture that exposes all tools and data as RESTful Web services.
Two tool foundations are the RDF query language, SPARQL, and inferencing. SPARQL provides a generalized basis for driving reports and templated data displays, as well as standard querying. Utilizing RDF’s simple triple structure, SPARQL can also be used to query a dataset without knowing anything in advance about the data. This provides a very useful discovery mode.
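The discovery mode can be imitated in plain Python with a pattern matcher in which `None` plays the role of a SPARQL variable. This is a stand-in for a real SPARQL engine, and the sample data is invented.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


data = [
    ("ex:jane", "name", "Jane"),
    ("ex:jane", "knows", "ex:dick"),
    ("ex:dick", "name", "Dick"),
]

# Roughly: SELECT DISTINCT ?p WHERE { ?s ?p ?o }
# Discover which predicates exist without knowing anything about the data.
predicates = sorted({p for _, p, _ in match(data)})
```

Having discovered that a `name` predicate exists, a follow-up call such as `match(data, p="name")` narrows to the statements of interest.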
Simple inferencing can be applied to broaden and contextualize search, retrieval and analysis. Inference tables can also be created in advance and layered over existing RDF datastores for speedier use and the automatic invoking of inferencing. More complicated inferencing means that RDF models can also perform as complete conceptual views of the world, or knowledge bases. Quite complicated systems are emerging in such areas as common sense (with OpenCyc) and biological systems, as two examples.
RDF ontologies and controlled vocabularies also have some hidden power, not yet often seen in standard applications: by virtue of their structure and label properties, we can populate context-relevant dropdown lists and auto-complete entries in user interfaces solely from the input data and structure. This ability is completely generalizable solely on the basis of the input ontology(ies).
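As a sketch of the idea, the Python below derives dropdown options and auto-complete suggestions from nothing but subclass and label statements. The tiny ontology and the two helper functions are hypothetical; the predicate names follow the RDFS vocabulary.

```python
# A hypothetical ontology fragment expressed as triples.
ontology = [
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Dog", "rdfs:label", "Dog"),
    ("ex:Cat", "rdfs:label", "Cat"),
    ("ex:Animal", "rdfs:label", "Animal"),
]


def dropdown_options(triples, parent):
    """Labels of the direct subclasses of parent, for a context-relevant list."""
    children = {s for s, p, o in triples if p == "rdfs:subClassOf" and o == parent}
    return sorted(o for s, p, o in triples if p == "rdfs:label" and s in children)


def autocomplete(triples, prefix):
    """Labels that start with what the user has typed so far."""
    prefix = prefix.lower()
    return sorted(
        o for s, p, o in triples if p == "rdfs:label" and o.lower().startswith(prefix)
    )
```

Swap in a different ontology and the same two functions populate a different interface, with no change to the code.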
As the intro noted, when RDF triples get combined, a graph structure emerges. (Actually, it can most formally be described as a directed graph.) A graph structure has many advantages. While we are seeing much starting to emerge in the graph analysis of social networks, we could also fairly argue that we are still at the early stages of plumbing the unique features of graph (“network”) structures.
Graphs are modular and can be both readily combined and broken apart. From a computational standpoint, this can lend itself to parallelized information processing (and, therefore, scalability). With specific reference to RDF it also means that graph extractions are themselves valid RDF models.
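If a graph is held as a set of triples, combining and breaking apart become ordinary set operations, as this Python sketch suggests. It ignores blank nodes, which need extra care when real RDF graphs are merged, and the data is invented.

```python
graph_a = {("ex:jane", "knows", "ex:dick")}
graph_b = {
    ("ex:dick", "owns", "ex:ball"),
    ("ex:jane", "knows", "ex:dick"),  # duplicate statement collapses on merge
}

# Combining two graphs is a set union.
merged = graph_a | graph_b


def extract(graph, node):
    """Pull out every statement touching a node; the result is itself a graph."""
    return {t for t in graph if node in (t[0], t[2])}


about_ball = extract(merged, "ex:ball")
```

The extracted subset is a set of triples like any other, so it can be queried, merged or serialized with exactly the same tools.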
Graph algorithms are a significant field of interest within mathematics, computer science and the social sciences. Via approaches such as network theory or scale-free networks, topics such as relatedness, centrality, importance, influence, “hubs” and “domains”, link analysis, spread, diffusion and other dynamics can be analyzed and modeled.
Graphs also have some unique aspects in search and pattern matching. Besides options like finding paths between two nodes, depth-first search, breadth-first search, or finding shortest paths, emerging graph and pattern-matching approaches may offer entirely new paradigms for search.
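One of the classic options mentioned above, finding a shortest path with breadth-first search, can be run directly over triples. The sketch below follows edges from subject to object only and treats every predicate alike; both choices are simplifications for the example.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Breadth-first search over subject -> object edges.

    Returns the list of nodes on a shortest path, or None if the goal
    cannot be reached from the start.
    """
    neighbors = {}
    for s, _, o in triples:
        neighbors.setdefault(s, []).append(o)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Invented social-network style data.
links = [
    ("ann", "knows", "bob"),
    ("bob", "knows", "cat"),
    ("cat", "knows", "dan"),
    ("ann", "knows", "cat"),
]
```

Here the route from `ann` to `dan` goes through `cat` directly, skipping the longer detour via `bob`.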
Graphs also provide new approaches for visualization and navigation, useful for both seeing relationships and framing information from the local to global contexts. The interconnectedness of the graph allows data to be explored via contextual facets, which is revolutionizing data understanding in a way similar to how the basic hyperlink between documents on the Web changed the contours of our information spaces.
Many would argue (as do I), that graphs are the most “natural” data structure for capturing the relationships of the real world. If so, we should continue to see new algorithms and approaches emerge based on graphs to help us better understand our information. And RDF is a natural data model for such purposes.
Ultimately, data interoperability implies a global context. The design of RDF began from this perspective with the semantic Web.
This perspective is firstly grounded in the open-world assumption: that is, the information at hand is understood to be incomplete and not self-contained. Missing values are to be expected and do not falsify what is there. A corollary assumption is there is always more information that can be added to the system, and the design should not only accommodate, but promote, that fact.
As the lingua franca for the semantic Web, using RDF means that many new data, structures and vocabularies now become available to you. So, not only can RDF work to interoperate your own data, but it can link in useful, external data and schema as well.
Indeed, the concept of linked data now becomes prominent, whereby RDF data with unique IRIs as their universal identifiers are exposed explicitly to aid discovery and interlinking. Whether internal data is exposed in the linked data manner or not, this external data can now be readily incorporated into local contexts. The Linking Open Data movement that is promoting this pattern has become highly successful, with billions of useful RDF statements now available for use and consumption.
The semantic Web and RDF are enabling the scope of data federation to extend beyond organizational boundaries to embrace (soon) virtually all public information. That means that, say, local customer records can now be supplemented with external information about specific customers or products. We are really just at the nascent stages of such data “mesh-ups”, with many unforeseen benefits (and challenges, too, such as privacy, identity and ownership) likely to emerge.
At Web scales, we will see network effects also emerge in areas such as shared vocabularies, shared background knowledge, and collective authoring, annotating and curating. To be sure, the traditional work of trade associations and standards bodies will continue, but likely now in much more operable ways.
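As a rough illustration of supplementing local records with external data, the sketch below treats two sets of triples as graphs and merges them by simple set union. It works only because both sources use the same identifier for the same thing. The example.com and example.org IRIs, the `ex:` predicates and the helper names are all hypothetical.

```python
# Hypothetical local customer data; identifiers are illustrative only.
LOCAL_RECORDS = {
    ("http://example.com/customer/42", "ex:name", "Acme Corp"),
    ("http://example.com/customer/42", "ex:locatedIn",
     "http://example.org/place/Springfield"),
}

# Hypothetical external linked data that uses the same place identifier.
EXTERNAL_DATA = {
    ("http://example.org/place/Springfield", "ex:label", "Springfield"),
    ("http://example.org/place/Springfield", "ex:partOf",
     "http://example.org/place/Exampleland"),
}


def merge(*graphs):
    """Merging graphs of triples is just a set union; nothing is overwritten."""
    return set().union(*graphs)


def describe(graph, subject):
    """Collect every predicate and its values for one subject."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return {p: sorted(values) for p, values in description.items()}


merged = merge(LOCAL_RECORDS, EXTERNAL_DATA)
customer = describe(merged, "http://example.com/customer/42")
place_iri = customer["ex:locatedIn"][0]
# Following the shared identifier reaches the externally supplied statements.
print(describe(merged, place_iri))
```

No schema change was needed on either side; the shared identifier does the joining.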
Throughout the years, a number of myths have grown up around RDF. Some, unfortunately, were based on the legacy of how RDF was first introduced and described. Other myths arise from incomplete understanding of RDF’s multiple roles as a framework, data model, and basis for vocabularies and conceptual descriptions of the world.
The accompanying table lists the “Top Ten” of myths I have found to date. I welcome other pet submissions. Perhaps soon we can get to the point of a clearer understanding of RDF.
|‘Top Ten’ Myths of RDF|
Emergence is the way complex systems arise out of a multiplicity of relatively simple interactions, exhibiting new and unforeseen properties in the process. RDF is an emergent model. It begins as simple “fact” statements of triples that may then be combined and expanded into ever-more complex structures and stories.
As an internal, canonical data model, RDF has advantages over any other approach. We can represent, describe, combine, extend and adapt data and their organizational schema flexibly and at will. We can explore and analyze in ways not easily available with other models.
And, importantly, we can do all of this without the need to change what already exists. We can augment our existing relational data stores, and transfer and represent our current information as we always have.
We can truly call RDF a disruptive data model or framework. But it is disruptive without disturbing what already exists in the slightest. And that is a most remarkable achievement.
The series is highly readable and a real keeper:
The printable versions are the easiest to read since you don’t have to get stuck in JavaWorld’s annoying document-splitting style.
Brian prefers to use Sam Ruby’s term of resource-oriented architecture (ROA), though I have preferred to use Nick Gall’s Web-oriented architecture (WOA) nomenclature. In any case, the series is highly informative and clearly written and is sufficiently general in Parts 1 and 4 and most of Part 3 to be of use to non-Java developers.
What is quite interesting is that Brian also makes the connection between a REST Web service style and linked data in Part 4, as I suggested a few months back. (In fact, for those familiar with REST, I recommend you start with Part 4.) Again, I think RESTful Web services in combination with RDF and linked data point to the winning and performant architecture of the foreseeable future.
Thanks for a great series, Brian!
I recently wrote about WOA (Web-oriented architecture), a term coined by Nick Gall, and how it represented a natural marriage between RESTful Web services and RESTful linked data. There was, of course, a method behind that posting to foreshadow some pending announcements from UMBEL and Zitgist.
Well, those announcements are now at hand, and it is time to disclose some of the method behind our madness.
As Fred Giasson notes in his announcement posting, UMBEL has just released some new Web services with fully RESTful endpoints. We have been working on the design and architecture behind this for some time and, all I can say is, it’s UMBELievable!
As Fred notes, there is further background information on the UMBEL project — which is a lightweight reference structure based on about 20,000 subject concepts and their relationships for placing Web content and data in context with other data — and the API philosophy underlying these new Web services. For that background, please check out those references; that is not my main point here.
We discussed much in coming up with the new design for these UMBEL Web services. Most prominent was taking seriously a RESTful design and grounding all of our decisions in the HTTP 1.1 protocol. Given the shared approaches between RESTful services and linked data, this correspondence felt natural.
What was perhaps most surprising, though, was how complete and well suited HTTP was as a design and architectural basis for these services. Sure, we understood the distinctions of GET and POST and persistent URIs and the need to maintain stateless sessions with idempotent design, but what we did not fully appreciate was how content and serialization negotiation and error and status messages also were natural results of paying close attention to HTTP. For example, here is what the UMBEL Web services design now embraces:
There are likely other services out there that embrace this full extent of RESTful design (though we are not aware of them). What we are finding most exciting, though, is the ease with which we can extend our design into new services and to mesh up data with other existing ones. This idea of scalability and distributed interoperability is truly, truly powerful.
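To give a feel for how content negotiation and status codes fall naturally out of HTTP, here is a minimal sketch of choosing a serialization from a request's Accept header. This is not the UMBEL implementation: the list of supported media types and the function names are assumptions for illustration, and the parsing is deliberately simplified (it honors q-values and `*/*`, but ignores partial wildcards such as `text/*`).

```python
# Hypothetical list of serializations a service might offer, in preference order.
SUPPORTED = ["application/rdf+xml", "application/json", "text/xml"]


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header, supported=SUPPORTED):
    """Return (status_code, media_type); 406 means Not Acceptable."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type == "*/*":
            return 200, supported[0]
        if media_type in supported:
            return 200, media_type
    return 406, None


print(negotiate("application/json;q=0.8, application/rdf+xml"))
print(negotiate("text/html"))
```

The point of the sketch is that the client states what it can consume, and the service answers with either a matching representation or a standard HTTP status, with no extra messaging layer required.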
It is almost like, sure, we knew the words and the principles behind REST and a Web-oriented architecture, but had really not fully taken them to heart. As our mindset now embraces these ideas, we feel like we have now looked clearly into the crystal ball of data and applications. We very much like what we see. WOA is most cool.
For lack of a better phrase, Zitgist has a component internal plan that it calls its ‘Grand Vision’ for moving forward. Though something of a living document, this reference describes how Zitgist is going about its business and development. It does not describe our markets or products (of course, other internal documents do that), but our internal development approaches and architectural principles.
Just as we have seen a natural marriage between RESTful Web services and RESTful linked data, there are other natural fits and synergies. Some involve component design and architecting for pipeline models. Some involve the natural fit of domain-specific languages (DSLs) to common terminology and design, too. Still others involve use of such constructs in both GUIs and command-line interfaces (CLIs), again all built from common language and terminology that non-programmers and subject matter experts alike can readily embrace. Finally, some reflect a preference for Python to wrap legacy apps and to provide a productive scripting environment for DSLs.
If one can step back a bit and realize there are some common threads to the principles behind RESTful Web services and linked data, that very same mindset can be applied to many other architectural and design issues. For us, at Zitgist, these realizations have been like turning on a very bright light. We can see clearly now, and it is pretty UMBELievable. These are indeed exciting times.
BTW, I would like to thank Eric Hoffer for the very clever play on words with the UMBELievable tag line. Thanks, Eric, you rock!
In the longer version, Nick describes WOA as based on the architecture of the Web that he further characterizes as “globally linked, decentralized, and [with] uniform intermediary processing of application state via self-describing messages.”
WOA is a subset of the service-oriented architectural style. He describes SOA as comprising discrete functions that are packaged into modular and shareable elements (“services”) that are made available in a distributed and loosely coupled manner.
Representational state transfer (REST) is an architectural style for distributed hypermedia systems such as the World Wide Web. It was named and defined in Roy Fielding’s 2000 doctoral thesis; Roy is also one of the principal authors of the Hypertext Transfer Protocol (HTTP) specification.
REST provides principles for how resources are defined and used and addressed with simple interfaces without additional messaging layers such as SOAP or RPC. The principles are couched within the framework of a generalized architectural style and are not limited to the Web, though they are a foundation to it.
REST and WOA stand in contrast to earlier Web service styles that are often known by the WS* acronym (such as WSDL, etc.). (Much has been written on RESTful Web services v. “big” WS*-based ones; one of my own postings goes back to an interview with Tim Bray back in November 2006.)
Shortly after Nick coined the WOA acronym, REST luminaries such as Sam Ruby gave the meme some airplay. From an enterprise and client perspective, Dion Hinchliffe in particular has expanded and written extensively on WOA. Besides his own blog, Dion has also discussed WOA several times on his Enterprise Web 2.0 blog for ZDNet.
Largely due to these efforts (and, some would claim, the difficulties associated with earlier WS* Web services), enterprises are paying much greater heed to WOA. It is increasingly being blogged about and highlighted at enterprise conferences.
While exciting, that is not what is most important in my view. What is important is that the natural connection between WOA and linked data is now beginning to be made.
Linked data is a set of best practices for publishing and deploying data on the Web using the RDF data model. The data objects are named using Web uniform resource identifiers (URIs), emphasize data interconnections, and adhere to REST principles.
Most recently, Nick began picking up the theme of linked data on his new Gartner blog. Enterprises now appreciate the value of an emerging service aspect based on HTTP and accessible by URIs. The idea is jelling that enterprises can now process linked data architected in the same manner.
I think the similar perspectives between REST Web services and linked data become a very natural and easily digested concept for enterprise IT architects. This is a receptive audience because it is these same individuals who have experienced first-hand the challenges and failures of past hype and complexity from non-RESTful designs.
It helps immensely, of course, that we can now look at the major Web players such as Google and Amazon and others — not to mention the overall success of the Web itself — to validate the architecture and associated protocols for the Web. The Web is now understood as the largest Machine designed by humans and one that has been operational every second of its existence.
Many of the same internal enterprise arguments that are being made in support of WOA as a service architecture can be applied to linked data as a data framework. For example, look at Dion’s 12 Things You Should Know About REST and WOA and see how most of the points can be readily adapted to linked data.
So, enterprise thought leaders are moving closer to what we now see as the reality and scalability of the Web done right. They are getting close, but there is still one piece missing.
I admit that I have sometimes tended to think of enterprise systems as distinct from the public Web. And, for sure, there are real and important distinctions. But from an architecture and design perspective, enterprises have much to learn from the Web’s success.
With the Web we see the advantages of a simple design, of universal identifiers, of idempotent operations, of simple messaging, of distributed and modular services, of simple interfaces, and, frankly, of openness and decentralization. The core foundations of HTTP and adherence to REST principles have led to a system of such scale and innovation and (growing) ubiquity as to defy belief.
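Since idempotent operations are easy to name and harder to picture, here is a small sketch contrasting a PUT-style replace with a POST-style append against an in-memory store. The `ResourceStore` class, its method names and the URIs are hypothetical stand-ins for a real HTTP service; the status codes are the standard HTTP ones. Idempotence here refers to the effect on stored state, not to the status code returned.

```python
class ResourceStore:
    """Hypothetical in-memory stand-in for resources behind an HTTP service."""

    def __init__(self):
        self._resources = {}
        self._next_id = 1

    def put(self, uri, representation):
        """Create or replace the resource at uri; repeating it changes nothing more."""
        created = uri not in self._resources
        self._resources[uri] = representation
        return 201 if created else 200

    def post(self, collection_uri, representation):
        """Add a new member to a collection; every call creates another resource."""
        uri = f"{collection_uri}/{self._next_id}"
        self._next_id += 1
        self._resources[uri] = representation
        return 201, uri

    def get(self, uri):
        """Safe read: returns (status, representation) without changing state."""
        if uri in self._resources:
            return 200, self._resources[uri]
        return 404, None

    def count(self):
        return len(self._resources)


store = ResourceStore()
store.put("/concepts/automobile", {"label": "Automobile"})
store.put("/concepts/automobile", {"label": "Automobile"})  # retry is harmless
print(store.count())  # still one resource

store.post("/notes", {"text": "hello"})
store.post("/notes", {"text": "hello"})  # retry creates a duplicate
print(store.count())  # now three resources
```

This is why a client or intermediary can safely retry an idempotent request after a timeout, which is part of what lets the Web tolerate unreliable networks at scale.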
So, the first observation is that the Web will be the central computing touchstone and framework for all computing systems for the foreseeable future. There simply is no question that interoperating with the Web is now an enterprise imperative. This truth has been evident for some time.
But the reciprocal truth is that these realities are themselves a direct outcome of the Web’s architecture and basic protocol, HTTP. The false dichotomy of enterprise systems as being distinct from the Web arises from seeing the Web solely as a phenomenon and not as one whose basic success should be giving us lessons in architecture and design.
Thus, we first saw the emergence of Web services as an important enterprise thrust: we wanted to be on the Web. But that was not initially undertaken consistent with Web design (which is REST or WOA), but rather as another “layer” in the historical way of doing enterprise IT. We were not of the Web. As the error of that approach became evident, we began to see the trend toward “true” Web services that are now consonant with the architecture and design of the actual Web.
So, why should these same lessons and principles not apply as well to data? And, of course, they do.
If there is one area in which enterprises have been abject failures for more than 30 years, it is data interoperability. ETL and enterprise buses and all sorts of complex data warehousing and EAI and EIA mumbo jumbo have kept many vendors fat and happy, but few enterprise customers so. On almost every single dimension, these failed systems have violated the basic principles now in force on the Web, such as simplicity and uniform interfaces.
OK, so how many of you have read the HTTP specifications? How many understand them? What do you think the fundamental operational and architectural and design basis of the Web is?
HTTP is often described as a communications protocol, but it really is much more. It represents the operating system of the Web as well as the embodiment of a design philosophy and architecture. Within its specification lies the secret of the Web’s success. REST and WOA quite possibly require nothing more to understand than the HTTP specification.
Of course, the HTTP specification is not the end of the story, just the essential beginning for adaptive design. Other specifications and systems layer upon this foundation. But, the key point is that if you can be cool with HTTP, you are doing it right to be a Web actor. And being a cool Web actor means you will meet many other cool actors and be around for a long, long time to come.
An understanding of HTTP can provide similar insights with respect to data and data interoperability. Indeed, the fancy name of linked data is nothing more than data on the Web done right — that is, according to the HTTP specifications.
Just as packets need their routers to get to their proper location based on resolving the names of a URI to a physical device, data or information on the Web needs similar context. And, one mechanism by which such context can be provided is through some form of logical referencing framework by which information can be routed to its right “neighborhood”.
I am not speaking of routing to physical locations now, but the routing to the logical locations about what information “is about” and what it “means”. On the simple level of language, a dictionary provides such a function by giving us the definition of what a word “means”. Similar coherent and contextual frameworks can be designed for any information requirement and scope.
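One way to picture this kind of logical routing is a tiny reference structure in which each concept points to a broader one. In the sketch below, every concept name, the `BROADER` table and both helper functions are hypothetical; a real reference framework would be far larger and richer. The idea is only that two pieces of information tagged with different concepts can be placed in the same “neighborhood” by walking up to a shared broader concept.

```python
# Hypothetical reference structure: each concept maps to one broader concept.
BROADER = {
    "ex:Sedan": "ex:Automobile",
    "ex:Automobile": "ex:Vehicle",
    "ex:Bicycle": "ex:Vehicle",
    "ex:Invoice": "ex:FinancialDocument",
}


def neighborhood(concept, broader=BROADER):
    """Return the concept followed by its chain of broader concepts."""
    chain = [concept]
    seen = {concept}
    while chain[-1] in broader:
        parent = broader[chain[-1]]
        if parent in seen:  # guard against accidental cycles
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def shared_context(a, b, broader=BROADER):
    """Return the nearest concept common to both chains, or None."""
    chain_b = set(neighborhood(b, broader))
    for concept in neighborhood(a, broader):
        if concept in chain_b:
            return concept
    return None


print(neighborhood("ex:Sedan"))
print(shared_context("ex:Sedan", "ex:Bicycle"))
print(shared_context("ex:Sedan", "ex:Invoice"))
```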
Of course, enterprises have been doing similar things internally for years by adopting common vocabularies and the like. Relational data schema are one such framework even if they are not always codified or understood by their enterprises as such.
Over the past decade or two we have seen trade and industry associations and standards bodies, among others, extend these ideas of common vocabularies and information structures such as taxonomies and metadata to work across enterprises. This investment is meaningful and can be quite easily leveraged.
As Nick notes, XBRL and the efforts surrounding it offer one vocabulary that can help provide this “routing” in the context of financial data and reporting. So, too, can UMBEL as a general reference framework of 20,000 subject concepts. Indeed, our unveiling of the recent LOD constellation points to a growing set of vocabularies and classes available for such contexts. Literally thousands and thousands of such existing structures can be converted to Web-compliant linked data to provide the information routing hubs necessary for global interoperability.
And, so now we come down to that missing piece. Once we add context as the third leg to this framework stool to provide semantic grounding, I think we are now seeing the full formula powerfully emerge for the semantic Web:
SW = WOA + linked data + coherent context
This simple formula becomes a very powerful combination.
Just as older legacy systems can be exposed as Web services, and older Web services can be turned into WOA ones compliant with the Web’s architecture, we can transition our data in similar ways.
The Web has been pointing us to adaptive design for both services and data since its inception. It is time to finally pay attention.