The httpRange-14 issue and its predecessor “identity crisis” debate have been active for more than a decade on the Web [1]. It has been around so long that most acknowledge “fatigue” and it has acquired that rarified status as a permathread. Many want to throw up their hands when they hear of it again and some feel — because of its duration and lack of resolution — that there never will be closure on the question. Yet everyone continues to argue and then everyone wonders why actual consumption of linked data remains so problematic.
Jonathan Rees is to be thanked for refusing to let this sleeping dog lie. This issue is not going to go away so long as its basis and existing prescriptions are, in essence, incoherent. As a member of the W3C’s TAG (Technical Architecture Group), Rees has worked diligently to re-surface and re-frame the discussion. While I don’t agree with some of the specifics and especially with the constrained approach proposed for resolving this question [2], the sleeping dog has indeed been poked and is awake. For that we can thank Jonathan. Maybe now we can get it right and move on.
I don’t agree with how this issue has been re-framed and I don’t agree that responses to it must be constrained to the prescriptive approach specified in the TAG’s call for comments. Yet, that being said, as someone who has been vocal for years about the poor semantics of the semantic Web community, I feel I have an obligation to comment on this official call.
Thus, I am casting my vote behind David Booth’s alternative proposal [3], with one major caveat. I first explain the caveat and then my reasons for supporting Booth’s proposal. I have chosen not to submit a separate alternative in order to not add further to the noise, as Bernard Vatant (and, I’m sure, many, many others) has chosen [4].
I first commented on the absurdity of the ‘information resource’ terminology about five years ago [5]. Going back to Claude Shannon [6] we have come to understand information as entropy (or, more precisely, as differences in energy state). One need not get that theoretical to see that this terminology is confusing. “Information resource” is a term that defies understanding (meaning) or precision. It is also a distinction that leads to a natural counter-distinction, the “non-information resource”, which is also an imprecise absurdity.
What the confusing term is meant to encompass is web-accessible content (“documents”), as opposed to descriptions of (or statements about) things. This distinction then triggers a different understanding of a URI (locator v identifier alone) and different treatments of how to process and interpret that URI. But the term is so vague and easily misinterpreted that all of the guidance behind the machinery to be followed gets muddied, too. Even in the current chapter of the debate, key interlocutors confuse and disagree as to whether a book is an “information resource” or not. If we can’t basically separate the black balls from the white balls, how are we to know what to do with them?
If there must be a distinction, it should be based on the idea of the actual content of a thing — or perhaps more precisely web-accessible content or web-retrievable content — as opposed to the description of a thing. If there is a need to name this class of content things (a position that David Booth prefers, pers. comm.), then let’s use one of these more relevant terms and drop “information resource” (and its associated IR and NIR acronyms) entirely.
The motivation behind the “information resource” terminology also appears to be a desire that somehow a URI alone can convey the name of what a thing is or what it means. I recently tried to blow this notion to smithereens by using Peirce’s discussion of signs [1]. We should understand that naming and meaning may only be provided by the owner of a URI through additional explication, and then through what is understood by the recipient; the string of the URI itself conveys very little (or no) meaning in any semantic sense.
We should ban the notion of “information resource” forever. If the first exposure a potential new publisher or consumer of linked data encounters is “information resource”, we have immediately lost the game. Unresolvable abstractions lead to incomprehension and confusion.
The approach taken by the TAG in requesting new comments on httpRange-14 only compounds this problem. First, the guidance is to not allow any questioning of the “information resource” terminology within the prescribed comment framework [7]. Then, in the suggested framework for response, still further terminology such as “probe URIs”, “URI documentation carrier” or “nominal URI documentation carrier for a URI” is introduced. Aaaaarggghh! This only furthers the labored and artificial terminology common to this particular standards effort.
While Booth’s proposal does not call for an outright rejection of the “information resource” terminology (my one major qualification in supporting it), I like it because it purposefully sidesteps the question of the need to define “information resource” (see his Section 2.7). Booth’s proposal is also explicit in its rejection of implied meaning in URIs and through embrace of the idea of a protocol. Remember, all that is being put forward in any of these proposals is a mechanism for distinguishing between retrievable content obtainable at a given URL and a description of something found at a URI. By racheting down the implied intent, Booth’s proposal is more consistent with the purpose of the guidance and is not guilty of overreach.
One of the real strengths of Booth’s proposal is its rejection of the prescriptive method proposed by the TAG for suggesting an alternative to httpRange-14 [7]. The parsimonious objective should be to be simple, be clear, and be somewhat relaxed in terms of mechanisms and prescriptions. I believe use patterns — negotiated via adoption between publishers and consumers — will tell us over time what the “right” solutions may be.
Amongst the proposals put forward so far, David Booth’s is the most “neutral” with respect to imposed meanings or mechanisms, and is the simplest. Though I quibble in some respects, I offer qualified support for his alternative because it:
I would wholeheartedly support this approach were two things to be added: 1) the complete abandonment of all “information resource” terminology; and 2) an official demotion of the httpRange-14 rule (replacing it with a slash 303 option on equal footing to other options), including a disavowal of the “information resource” terminology. I suspect if the TAG adopts this option, that subsequent scrutiny and input might address these issues and improve its clarity even further.
There are other alternatives submitted, prominently the one by Jeni Tennison with many co-signatories [8]. This one, too, embraces multiple options and cow paths. However, it has the disadvantage of embedding itself into the same flawed terminology and structure as offered by httpRange-14.
Phenomenal Growth in Less than Two YearsToday, for the first time, we passed 400 articles published on the open semantic framework (OSF) TechWiki. The TechWiki content is a baseline “starter kit” of documentation related to these OSF projects and their contexts:
The TechWiki covers all aspects of this open source OSF software stack. Besides the specific components developed and maintained by Structured Dynamics as listed above, the OSF stack combines many leading third-party software packages — such as Drupal for content management, Virtuoso for (RDF) triple storage, Solr for full-text indexing, GATE for natural language processing, the OWL API for ontology management, and others.
The TechWiki is the one-stop resource for how to install, configure, use and maintain these components. The best entry point to the OSF content on the TechWiki is represented by this entry page covering overall workflows in use of the system:
Since our first release of the TechWiki in July 2010, we have been publishing and releasing content steadily. We post a new article about every 1.5 calendar days, or about one per working day. This content is well-organized into (at present) 72 categories and is supported by nearly 500 figures and diagrams. Users are free to download and use this content at will, solely by providing attribution. The content has proven to be a goldmine for local use and modification by our clients, and for training and curriculum development.
The TechWiki represents a part of our commitment that we are successful when our customers no longer need us. As one of our most popular Web sites with fantastic and growing user stats, we invite you to visit and see what it means to provide open source semantic technologies as a total open solution.