Posted: March 27, 2012

Casting My Vote on Revising httpRange-14

The httpRange-14 issue and its predecessor “identity crisis” debate have been active for more than a decade on the Web [1]. It has been around so long that most acknowledge “fatigue” and it has acquired that rarified status as a permathread. Many want to throw up their hands when they hear of it again and some feel — because of its duration and lack of resolution — that there never will be closure on the question. Yet everyone continues to argue and then everyone wonders why actual consumption of linked data remains so problematic.

Jonathan Rees is to be thanked for refusing to let this sleeping dog lie. This issue is not going to go away so long as its basis and existing prescriptions are, in essence, incoherent. As a member of the W3C’s TAG (Technical Architecture Group), Rees has worked diligently to re-surface and re-frame the discussion. While I don’t agree with some of the specifics and especially with the constrained approach proposed for resolving this question [2], the sleeping dog has indeed been poked and is awake. For that we can thank Jonathan. Maybe now we can get it right and move on.

I don’t agree with how this issue has been re-framed and I don’t agree that responses to it must be constrained to the prescriptive approach specified in the TAG’s call for comments. Yet, that being said, as someone who has been vocal for years about the poor semantics of the semantic Web community, I feel I have an obligation to comment on this official call.

Thus, I am casting my vote behind David Booth’s alternative proposal [3], with one major caveat. I first explain the caveat and then my reasons for supporting Booth’s proposal. I have chosen not to submit a separate alternative in order to not add further to the noise, as Bernard Vatant (and, I’m sure, many, many others) has chosen [4].

Bury the Notion of ‘Information Resource’ Once and for All

I first commented on the absurdity of the ‘information resource’ terminology about five years ago [5]. Going back to Claude Shannon [6] we have come to understand information as entropy (or, more precisely, as differences in energy state). One need not get that theoretical to see that this terminology is confusing. “Information resource” is a term that defies understanding (meaning) or precision. It is also a distinction that leads to a natural counter-distinction, the “non-information resource”, which is also an imprecise absurdity.

What the confusing term is meant to encompass is web-accessible content (“documents”), as opposed to descriptions of (or statements about) things. This distinction then triggers a different understanding of a URI (locator v identifier alone) and different treatments of how to process and interpret that URI. But the term is so vague and easily misinterpreted that all of the guidance behind the machinery to be followed gets muddied, too. Even in the current chapter of the debate, key interlocutors confuse and disagree as to whether a book is an “information resource” or not. If we can’t basically separate the black balls from the white balls, how are we to know what to do with them?

If there must be a distinction, it should be based on the idea of the actual content of a thing — or perhaps more precisely web-accessible content or web-retrievable content — as opposed to the description of a thing. If there is a need to name this class of content things (a position that David Booth prefers, pers. comm.), then let’s use one of these more relevant terms and drop “information resource” (and its associated IR and NIR acronyms) entirely.

The motivation behind the “information resource” terminology also appears to be a desire that somehow a URI alone can convey the name of what a thing is or what it means. I recently tried to blow this notion to smithereens by using Peirce’s discussion of signs [1]. We should understand that naming and meaning may only be provided by the owner of a URI through additional explication, and then through what is understood by the recipient; the string of the URI itself conveys very little (or no) meaning in any semantic sense.

We should ban the notion of “information resource” forever. If the first exposure a potential new publisher or consumer of linked data encounters is “information resource”, we have immediately lost the game. Unresolvable abstractions lead to incomprehension and confusion.

The approach taken by the TAG in requesting new comments on httpRange-14 only compounds this problem. First, the guidance is to not allow any questioning of the “information resource” terminology within the prescribed comment framework [7]. Then, in the suggested framework for response, still further terminology such as “probe URIs”, “URI documentation carrier” or “nominal URI documentation carrier for a URI” is introduced. Aaaaarggghh! This only furthers the labored and artificial terminology common to this particular standards effort.

While Booth’s proposal does not call for an outright rejection of the “information resource” terminology (my one major qualification in supporting it), I like it because it purposefully sidesteps the need to define “information resource” (see his Section 2.7). Booth’s proposal is also explicit in its rejection of implied meaning in URIs and in its embrace of the idea of a protocol. Remember, all that is being put forward in any of these proposals is a mechanism for distinguishing between retrievable content obtainable at a given URL and a description of something found at a URI. By ratcheting down the implied intent, Booth’s proposal is more consistent with the purpose of the guidance and is not guilty of overreach.

Keep It Simple

One of the real strengths of Booth’s proposal is its rejection of the prescriptive method proposed by the TAG for suggesting an alternative to httpRange-14 [7]. The parsimonious objective should be to be simple, be clear, and be somewhat relaxed in terms of mechanisms and prescriptions. I believe use patterns — negotiated via adoption between publishers and consumers — will tell us over time what the “right” solutions may be.

Amongst the proposals put forward so far, David Booth’s is the most “neutral” with respect to imposed meanings or mechanisms, and is the simplest. Though I quibble in some respects, I offer qualified support for his alternative because it:

  • Sidesteps the “information resource” definition (though weaker than I would want; see above)
  • Addresses only the specific HTTP and HTTPS cases
  • Avoids the constrained response format suggested by the TAG
  • Explicitly rejects assigning innate meanings to URIs
  • Poses the solution as a protocol (an understanding between publisher and consumer) rather than defining or establishing a meaning via naming
  • Provides multiple “cow paths” by which resource definitions can be conveyed, which gives publishers and consumers choice and offers the best chance for more well-trodden paths to emerge
  • Does not call for an outright repeal of the httpRange-14 rule, but retains it as one of multiple options for URI owners to describe resources
  • Permits the use of an HTTP 200 response with RDF content as a means of conveying a URI definition
  • Retains the use of the hash URI as an option
  • Provides alternatives to those who cannot easily (or at all) use the 303 See Other redirect mechanism, and
  • Simplifies the language and the presentation.

I would wholeheartedly support this approach were two things to be added: 1) the complete abandonment of all “information resource” terminology; and 2) an official demotion of the httpRange-14 rule (replacing it with a slash 303 option on equal footing to other options), including a disavowal of the “information resource” terminology. I suspect that, if the TAG adopts this option, subsequent scrutiny and input might address these issues and improve its clarity even further.
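To make the “multiple paths” idea concrete, here is a minimal Python sketch of how a consumer might decide where to look for a URI’s definition. It is only an illustration of the options listed above (hash URI, slash URI with a 303 redirect, and a 200 response carrying RDF), not an implementation of Booth’s protocol. The `Response` type, the `definition_path` function, the set of RDF media types and the example.org URIs are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urldefrag

# Media types treated as RDF here; an illustrative, non-exhaustive set (assumption).
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
}


@dataclass
class Response:
    """A minimal stand-in for an HTTP response (hypothetical type for this sketch)."""
    status: int
    content_type: str = ""
    location: Optional[str] = None


def definition_path(uri: str, response: Response) -> str:
    """Label which route a consumer would follow to find a URI's definition."""
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the fragment is not sent to the server, so the
        # definition is sought in whatever is retrieved from the base URI.
        return f"hash: look in the document at {base}"
    if response.status == 303 and response.location:
        # Slash URI answered with a 303 See Other redirect to a describing document.
        return f"303: look in the document at {response.location}"
    media_type = response.content_type.split(";")[0].strip().lower()
    if response.status == 200 and media_type in RDF_MEDIA_TYPES:
        # 200 response with RDF content, treated here as carrying the definition.
        return "200-rdf: look in the returned RDF"
    if response.status == 200:
        return "200-other: retrievable content, no machine-readable definition found"
    return "unknown: no definition discovered"
```

For instance, `definition_path("http://example.org/doc#thing", Response(200, "text/html"))` takes the hash route, while the same function given a slash URI and `Response(303, location="http://example.org/doc/thing")` follows the redirect. Note that nothing in the function inspects the URI string for meaning; it only uses the protocol-level signals.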

There are other alternatives submitted, prominently the one by Jeni Tennison with many co-signatories [8]. This one, too, embraces multiple options and cow paths. However, it has the disadvantage of embedding itself into the same flawed terminology and structure as offered by httpRange-14.


[1] For my recent discussion about the history of these issues, see M.K. Bergman, 2012. “Give Me a Sign: What Do Things Mean on the Semantic Web?,” in AI3:::Adaptive Information blog, January 24, 2012; see https://www.mkbergman.com/994/give-me-a-sign-what-do-things-mean-on-the-semantic-web/.
[2] In all fairness, this call was the result of ISSUE-57, which had its own constraints. Not knowing all of the background that led to the httpRange-14 Pandora’s Box being opened again, the benefit of the doubt would be that the form and approach prescribed by the TAG dictated the current approach. In any event, now that the Box is open, all pertinent issues should be addressed and the form of the final resolution should also not be constrained from what makes best sense and is most pragmatic.
[3] David Booth’s alternative proposal is for the “URI Definition and Discovery Protocol” (uddp). The actual submission according to form is found here.
[4] See Bernard Vatant, 2012. “Beyond httpRange-14 Addiction,” the wheel and the hub blog, March 27, 2012. See http://blog.hubjects.com/2012/03/beyond-httprange-14-addiction.html.
[5] M.K. Bergman, 2007. “More Structure, More Terminology and (hopefully) More Clarity,” in AI3:::Adaptive Information blog, July 27, 2007; see https://www.mkbergman.com/391/more-structure-more-terminology-and-hopefully-more-clarity/. Subsequent to that piece, I have written further on semantic Web semantics in “The Semantic Web and Industry Standards” (January 26, 2008), “The Shaky Semantics of the Semantic Web” (March 12, 2008), “Semantic Web Semantics: Arcane, but Important” (April 8, 2008), “When Linked Data Rules Fail” (November 16, 2009), “The Semantic ‘Gap’” (October 24, 2010) and [1].
[6] Claude E. Shannon, 1948. “A Mathematical Theory of Communication,” Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, 1948.

[7] In the “Call for proposals to amend the ‘httpRange-14 resolution’” (February 29, 2012), Jonathan Rees (presumably on behalf of the TAG) stated this as one of the rules of engagement: “9. Kindly avoid arguing in the change proposals over the terminology that is used in the baseline document. Please use the terminology that it uses. If necessary discuss terminology questions on the list as document issues independent of the 303 question.” The specific template form for alternative proposals was also prescribed. In response to interactions on this question on the mailing list, Jonathan stated:

If it were up to me I’d purge “information resource” from the document, since I don’t want to argue about what it means, and strengthen the (a) clause to be about content or instantiation or something. But the document had to reflect the status quo, not things as I would have liked them to be.
I have not submitted this as a change proposal because it doesn’t address ISSUE-57, but it is impossible to address ISSUE-57 with a 200-related change unless this issue is addressed, as you say, head on. This is what I’ve written in my TAG F2F preparation materials.
[8] Jeni Tennison, 2012. “httpRange-14 Change Proposal,” submitted March 25, 2012. See the mailing list notice and actual proposal.

Posted by AI3's author, Mike Bergman Posted on March 27, 2012 at 5:45 pm in Linked Data, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/1002/tortured-terminology-and-problematic-prescriptions/
Posted: March 14, 2012

Phenomenal Growth in Less than Two Years

Today, for the first time, we passed 400 articles published on the open semantic framework (OSF) TechWiki. The TechWiki content is a baseline “starter kit” of documentation related to these OSF projects and their contexts:

  • conStruct – connecting modules to enable structWSF and sComponents to be hosted/embedded in Drupal
  • structWSF – platform-independent suite of more than 20 RESTful Web services, organized for managing structured data datasets
  • Semantic Components – JavaScript or Flex semantic components (widgets) for visualizing and manipulating structured data
  • irON – instance record Object Notation for conveying XML, JSON or spreadsheets (CSV) in RDF-ready form, and
  • Various parsers and standard data exchange formats and schema to facilitate information flow amongst these options.

The TechWiki covers all aspects of this open source OSF software stack. Besides the specific components developed and maintained by Structured Dynamics as listed above, the OSF stack combines many leading third-party software packages — such as Drupal for content management, Virtuoso for (RDF) triple storage, Solr for full-text indexing, GATE for natural language processing, the OWL API for ontology management, and others.

The TechWiki is the one-stop resource for how to install, configure, use and maintain these components. The best entry point to the OSF content on the TechWiki is represented by this entry page covering overall workflows in use of the system:

[Figure: OSF Work Flows]

Since our first release of the TechWiki in July 2010, we have been publishing and releasing content steadily. We post a new article about every 1.5 calendar days, or about one per working day. This content is well-organized into (at present) 72 categories and is supported by nearly 500 figures and diagrams. Users are free to download and use this content at will, solely by providing attribution. The content has proven to be a goldmine for local use and modification by our clients, and for training and curriculum development.

The TechWiki represents a part of our commitment that we are successful when our customers no longer need us. As one of our most popular Web sites with fantastic and growing user stats, we invite you to visit and see what it means to provide open source semantic technologies as a total open solution.

Posted by AI3's author, Mike Bergman Posted on March 14, 2012 at 6:05 pm in Open Semantic Framework, Structured Dynamics | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/1000/techwiki-gets-400th-document/
Posted: February 27, 2012

Ontology-driven Application Meshes Structured Data with Public APIs

Locational information — points of interest/POIs, paths/routes/polylines, or polygons/regions — is common to many physical things in our real world. Because of its pervasiveness, it is important to have flexible and powerful display widgets that can respond to geo-locational data. We have been working for some time to extend our family of semantic components [1] within the open semantic framework (OSF) [2] to encompass just such capabilities. Structured Dynamics is thus pleased to announce that we have now added the sWebMap component, which marries the entire suite of Google Map API capabilities to the structured data management arising from the structWSF Web services framework [3] at the core of OSF.

The sWebMap component is fully in keeping with our design premise of ontology-driven applications, or ODapps [4]. The sWebMap component can itself be embedded in flexible layouts (using Drupal in our examples below) and can be very flexibly themed and configured. We believe sWebMap will rapidly move to the head of the class as the newest member of Structured Dynamics’ open source semantic components.

The absolutely cool thing about sWebMap is that it just works. All one needs to do is relate it to a geo-enabled Search structWSF endpoint, and then all of the structured data with geo-locational attributes and its facets and structure becomes automagically available to the mapping widget. From there you can flexibly map, display, configure, filter and select, keep those selections persistent, and share them with others. As new structured data is added to your system, that data too becomes automatically available.

Key Further Links

Though screen shots in the operation of this component are provided below, here are some further links to learn more:

sWebMap Overview

There is considerable functionality in the sWebMap widget, not all immediately obvious when you first view it.

NOTE: a wide variety of configuration options — icons and colors — matched with the specific data and base tiling maps appropriate to a given installation may produce maps of significantly different aspect from the screenshots presented below. Click on any screenshot to get a full-size view.

Here is sWebMap when it first comes up, using an example for the “Beaumont neighborhood”:

It is possible to set pre-selected items for any map display. That was done in this case, which shows the pre-selected items and region highlighted on the map and in the records listing (lower left below map).

The basic layout of the map has its main search options at the top, followed by the map itself and then two panels underneath:

The left-hand panel underneath the map presents the results listing. The right-hand panel presents the various filter options by which these results are generated. The filter options consist of:

  • Sources – the datasets available to the instance
  • Kinds – the kinds or types of data (owl:Classes or rdf:types) contained within those datasets, and
  • Attributes – the specific attributes and their values for those kinds or sources.

As selections are made in sources or kinds, the subsequent choices narrow.

The layout below shows the key controls available on the sWebMap:

You can go directly to an affiliated page by clicking the upper right icon. This area often shows a help button or other guide. The search box below that enables you to search for any available data in the system. If there is information that can be mapped AND which occurs within the viewport of the current map size, those results will appear as one of three geographic feature types on the map:

  • Markers, which can be configured with differing icons for specific types or kinds of data
  • Polylines, such as highways or bus routes, or
  • Polygons, which enclose specific regions on the map through a series of drawn points in a closed area.

At the map’s right is the standard map control that allows you to scroll the map area or zoom. Like regular Google maps, you can zoom (+ or – keys, or middle wheel on mouse) or navigate (arrow direction keys, or left mouse down and move) the map.

Current records are shown below the map. A specific record may be selected with its checkbox; this keeps it persistent on the map and in the record listing no matter what the active filter conditions may be. (You may also see a little drawing icon [Update record], which presents an attribute report, similar to a Wikipedia ‘infobox’, for the current record.) You can see in this case that the selected record also corresponds to a region (polygon) shape on the map.

sWebMap Views, Layers and Layouts

In the map area itself, it is possible to also get different map views by selecting one of the upper right choices. In this case, we can see a satellite view (or “layer”):

Or, we can choose to see a terrain layer:

Or there may optionally be other layers or views available in this same section.

Another option that appears on the map is the ability to get a street view of the map. That is done by grabbing the person icon at the map left and dragging it to where you are interested within the map viewport. That also causes the street portion to be highlighted, with street view photos displayed (if they exist for that location):

By clicking the person icon again, you then shift into walking view:

Via the mouse, you can now navigate up and down these streets and change perspective to get a visual feel for the area.

Multi-map View

Another option you may invoke is the multi-map view of the sWebMap. In this case, the map viewing area expands to include three sub-maps under the main map area. Each sub-map is color-coded and shown as a rectangle on the main map. (This particular example is displaying assessment parcels for the sample instance.) These rectangles can be moved on the main map, in which case their sub-map displays also move:

You must re-size using the sub-map (which then causes the rectangle size to change on the main map). You may also pan the sub-maps (which then causes the rectangle to move on the main map). The results list at the lower left is determined by which of the three sub-maps is selected (as indicated by the heavier bottom border).

Searching and Filter Selections

There are two ways to get filter selection details for your current map: Show All Records or Search.

NOTE: for all data and attributes as described below, only what is visible on the current map view is shown under counts or records. Counts and records change as you move the map around.

In the first case, we pick the Show All Records option at the bottom of the map view, which then brings up the detailed filter selections in the lower-right panel:

Here are some tips for using the left-hand records listing:

  • If there are more than 10 records, pagination appears at the bottom of the listing
  • Each record is denoted by an icon for the kind of thing it is (bus stops v schools v golf courses, for example)
  • If we mouse over a given record in the listing, its marker icon on the map bounces to show where it resides
  • To the right of each record listing, the checkbox indicates whether you want the record to be maintained persistently. If you check it, the icon on the map changes color, the record is promoted to the top of the list where it becomes sticky and is given an alphabetic sequence. Unchecking this box undoes all of these changes
  • To the right of each record listing is also the view record [View raw attributes for the record] icon; clicking it shows the raw attribute data for that record.

The records that actually appear on this listing are based on the records scope or Search (see below) conditions, as altered by the filter settings on the right-hand listing under the sWebMap. For example, if we now remove the neighborhood record as being persistent and choose Show included records, we get items across the entire map viewport:

Search works in a similar fashion, in that it invokes the filter display with the same left- and right-hand listings under the sWebMap, but now only for those records that meet the search conditions. (The allowable search syntax is that for Lucene.) Here is the result of a search, in this case for “school”:

As shown above, the right-hand panel is split into three sections: Sources (or datasets), Kinds (that is, similar types of things, such as bus stops v schools v golf courses), and Attributes (that is, characteristics for these various types of things). All selection possibilities are supported by auto-select.

Sources and Kinds are selected via checkbox. (The default state when none are checked is to show all.) As more of these items are selected, the records listing in the left-hand panel gets smaller. The counts of available items [as shown by the (XX) number at the end of each item] also change as filters are added or removed via the checkboxes.

Applying filters to Attributes works a little differently. Attributes filters are selected by selecting the magnifier plus [Filter by attribute] icon, which then brings up a filter selection at the top of the listing underneath the Attributes header.

The specific values and their counts (for the current selection population) are then shown; you may pick one or more items. Once done, you may pick another attribute to add to the filter list, and continue the filtering process.
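The filtering behavior described in this section can be sketched in a few lines of Python. This is a simplified illustration, not the sWebMap or structWSF code: the `Record` and `Viewport` types, the `apply_filters` and `facet_counts` functions, and the sample data are all hypothetical. It shows the three ideas from the text: only records inside the current map view are considered, an empty Sources or Kinds selection means “show all”, and the counts are recomputed from whatever records remain.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    source: str   # dataset the record belongs to
    kind: str     # type of thing, e.g. "school"
    lat: float
    lon: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Viewport:
    south: float
    west: float
    north: float
    east: float

    def contains(self, rec: Record) -> bool:
        # Simple bounding-box test; ignores viewports crossing the antimeridian.
        return (self.south <= rec.lat <= self.north
                and self.west <= rec.lon <= self.east)


def apply_filters(records, viewport, sources=(), kinds=(), attribute_values=None):
    """Return the records inside the viewport that pass the active filters."""
    attribute_values = attribute_values or {}
    visible = []
    for rec in records:
        if not viewport.contains(rec):
            continue
        # An empty selection means "show all", matching the default state.
        if sources and rec.source not in sources:
            continue
        if kinds and rec.kind not in kinds:
            continue
        # Every attribute filter must match one of its allowed values.
        if any(rec.attributes.get(attr) not in allowed
               for attr, allowed in attribute_values.items()):
            continue
        visible.append(rec)
    return visible


def facet_counts(records):
    """Count the remaining records per source and per kind."""
    return Counter(r.source for r in records), Counter(r.kind for r in records)


# Hypothetical sample data for illustration only.
records = [
    Record("Elm School", "education", "school", 10.0, 20.0, {"level": "primary"}),
    Record("Oak School", "education", "school", 10.5, 20.5, {"level": "secondary"}),
    Record("Stop 12", "transit", "bus stop", 10.2, 20.1),
    Record("Far Links", "recreation", "golf course", 50.0, 60.0),
]
view = Viewport(south=9.0, west=19.0, north=11.0, east=21.0)
shown = apply_filters(records, view, kinds={"school"})
by_source, by_kind = facet_counts(shown)
```

Moving the map corresponds to calling `apply_filters` again with a new `Viewport`, which is why counts and records change as you pan or zoom.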

Saving and Sharing Your Filters

sWebMaps have a useful way to save and share their active filter selections. At any point as you work with a sWebMap, you can save all of its current settings and configurations — viewport area, filter selections, and persistent records — via some simple steps.

You initiate this functionality by choosing the save button at the upper right of the map panel:

When that option is invoked, it brings up a dialog where you are able to name the current session, and provide whatever explanatory notes you think might be helpful.

NOTE: the naming and access to these saved sessions is local to your own use only, unless you choose to share the session with others; see below.

Once you have a saved session, you will then see a new control at the upper right of your map panel. This control is how you load any of your previously saved sessions:

Further, once you load a session, still further options are presented that enable you to either delete or share that session:

If you choose to share a session, a shortened URI is generated automatically for you:

If you then provide that URI link to another user, that user can then click on that link and see the map in the exact same state — viewport area, filter selections, and persistent records — as you initially saved. If the recipient then saves this session, it will now also be available persistently for his or her local use and changes.

NOTE: two users may interactively work together by sharing, saving and then modifying maps that they share again with their collaborator.
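As a rough picture of what a saved session has to capture, the sketch below bundles the three pieces of state named above (viewport area, filter selections, and persistent records) and hands back a short link for sharing. It is a hypothetical Python illustration only: the `MapSession` and `SessionStore` names, the hash-based token and the example.org base address are assumptions, and say nothing about how sWebMap actually stores sessions or generates its shortened URIs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MapSession:
    name: str
    notes: str = ""
    viewport: dict = field(default_factory=dict)            # e.g. bounds and zoom
    filters: dict = field(default_factory=dict)             # sources, kinds, attributes
    persistent_records: list = field(default_factory=list)  # IDs of checked records

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for the same state.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "MapSession":
        return cls(**json.loads(payload))


class SessionStore:
    """In-memory stand-in for whatever service issues the shortened URIs."""

    def __init__(self, base: str = "http://example.org/s/"):
        self.base = base
        self._saved = {}

    def share(self, session: MapSession) -> str:
        payload = session.to_json()
        # Short token derived from the session content (an assumption of this sketch).
        token = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]
        self._saved[token] = payload
        return self.base + token

    def load(self, short_uri: str) -> MapSession:
        token = short_uri.rsplit("/", 1)[-1]
        return MapSession.from_json(self._saved[token])


# Hypothetical usage: one user shares, another loads the same state.
store = SessionStore()
original = MapSession(
    name="Beaumont schools",
    notes="Schools inside the neighborhood boundary",
    viewport={"south": 9.0, "west": 19.0, "north": 11.0, "east": 21.0, "zoom": 12},
    filters={"kinds": ["school"]},
    persistent_records=["rec-1"],
)
link = store.share(original)
restored = store.load(link)
```

The recipient’s `restored` session is an independent copy, which mirrors the behavior described above: once saved, it is available for his or her own local use and changes.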

[1] A semantic component is a JavaScript or Flex component or widget that takes record descriptions and irXML schema as input, and then outputs interactive visualizations of those records. Depending on the logic described in the input schema and the input record descriptions, the semantic component may behave differently or provide presentation options to users. Each semantic component delivers a very focused set of functionality or visualization. Multiple components may be combined on the same canvas for more complicated displays and controls. At present, there are 12 individual semantic widgets in the available open source suite; see further the sComponent category on the TechWiki. By convention, all of the individual widgets in the semantic component suite are named with an ‘s’ prefix; hence, sWebMap.
[2] The open semantic framework, or OSF, is a combination of a layered architecture and an open-source, modular software stack. The stack combines many leading third-party software packages, such as Drupal for content management, Virtuoso for (RDF) triple storage, Solr for full-text indexing, GATE for tagging and natural language processing, the OWL2 API for ontology management and support, and others. These third-party tools are extended with open source developments from Structured Dynamics including structWSF (a RESTful Web services layer of about a dozen modules for interacting with the underlying data and data engines), conStruct (a series of Drupal modules that tie Drupal to the structWSF Web services layer), semantic components (data display and manipulation widgets, mostly based either in Flash or JavaScript, for working with the semantic data), various parsers and standard data exchange formats and schema to facilitate information flow amongst these options, and an ontologies layer that consists of both domain ontologies that capture the coherent concepts and relationships of the current problem space and of administrative ontologies that govern how the other software layers interact with this structure.
[3] structWSF is a platform-independent Web services framework for accessing and exposing structured RDF (Resource Description Framework) data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies). The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework has a baseline set of more than 20 Web services in CRUD, browse, search, tagging, ontology management, and export and import.
[4] For the most comprehensive discussion of ODapps, see M. K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps’ to see related content.
Posted: February 13, 2012

Shun the Frumious Bandersnatch?

The Web and open source have opened up a whole new world of opportunities and services. We can search the global information storehouse, connect with our friends and make new ones, form new communities, map where stuff is, and organize and display aspects of our lives and interests as never before. These advantages compound into still newer benefits via emergent properties such as social discovery or bookmarking, adding richness to our lives that heretofore had not existed.

And all of these benefits have come for free.

Of course, as our use and sophistication of the Web and open source have grown we have come to understand that the free provision of these services is rarely (ever?) unconditional. For search, our compact is to accept ads in return for results. For social networks, our compact is to give up some privacy and control of our own identities. For open source, our compact is the acceptance of (generally) little or no support and often poor documentation.

We have come to understand this quid pro quo nature of free. Where the providers of these services tend to run into problems is when they change the terms of the compact. Google, for example, might change how its search results are determined or presented or how it displays its ads. Facebook might change its privacy or data capture policies. Or, OpenOffice or MySQL might be acquired by a new provider, Oracle, that changes existing distribution, support or community involvement procedures.

Sometimes changes may fit within the acceptable parameters of the compact. But, if such changes fundamentally alter the understood compact with the user community, users may howl or vote with their feet. Depending on the circumstances, the service provider may relent, users may come to accept the new changes, or users may indeed drop the service.

The Hidden Costs of Dependence

But there is another aspect of the use of free services, the implications of which have been largely unremarked. What happens if a service we have come to depend upon is no longer available?

Abandonment or changes in service may arise from bankruptcy or a firm being acquired by another. My favorite search service of a decade ago, AltaVista, and Delicious are two prominent examples here. Existing services may be dropped by a provider or APIs removed or deprecated. For Google alone, examples include Wave and Gears, Google Labs, and many, many APIs. (The howls around Google Translate actually caused it to be restored.) And existing services may be altered, such as moving from free to fee or having capabilities significantly modified. Ning and Babbel are two examples here. There are literally thousands of examples of Web-based free services that have gone through such changes. Most have not seen widespread use, but have affected their users nonetheless.

There is nothing unique about free services in these regards. Ford was able to cease production of its Edsel and change the form factor of the Thunderbird despite some loyal fans. Sugar Pops morphed into a variety of breakfast cereal brands. Sony Betamax was beaten out by VHS, which then lost out to CDs and now DVDs. My beloved Saabs are heading for the dustbin, or Chinese ownership.

In all of these cases, as consumers we have no guarantees about the permanence of the service or the infrastructure surrounding it. The provider is solely able to make these determinations. It is no different when the service or offering is free. It is the reality of the marketplace that causes such changes.

But, somehow, with free Web services, it is easy to overlook these realities. I offer a couple of personal case studies.

Case Study #1: Site Search

I have earlier described the five different versions of site search that I have gone through for this blog. The thing is, my current option, Relevanssi, is also a free plug-in. What is notable about this example, though, is the multiple attempts and (unanticipated) significant effort to discover, evaluate and then implement alternatives. Unfortunately, I rather suspect my current option may itself — because of the nature of free on the Web — need to be replaced at some time down the road.

Case Study #2: FeedBurner

Part of what caused me to abandon Google Custom Search as one of the above search options was the requirement I serve ads on my blog to use it. So, when I decided to eliminate ads entirely in 2010 I not only gave up this search option, but I also lost some of the better tracking and analytics options also provided for free by Google. Fortunately, I had also adopted FeedBurner early in the life of this blog. It was also becoming increasingly clear that feed subscribers — in addition to direct site visitors — were becoming an essential metric for gauging traffic.

I thus had a replacement means for measuring traffic trends. Google (strange how it keeps showing up!) had purchased FeedBurner in 2007, and had made some nice site and feature improvements, including turning some paid services into free ones. The service was performing quite well, despite FeedBurner’s infamous knack for losing certain feed counts periodically. However, this performance broke last Summer when my site statistics indicated a massive drop in subscribers.

The figure below, courtesy of Feed Compare, shows the daily subscriber statistics for my AI3 blog for the past two years. The spikiness of the curve affirms the infamous statistics gaps of the service. The first part of the curve also shows nice, steady growth of readers, growing to more than 4000 by last Summer. Then, on August 16, there was a massive drop of 85% in my subscriber counts. I monitored this for a couple of days, thinking it was another temporary infamous event, then realized something more serious was afoot:

Drop in Reported Feedburner Subscribers

It was at this point I became active on the Google group for FeedBurner. Many others had noted the same service drop. (The major surmise is that FeedBurner now is having difficulty including Feedfetcher feeds, which is interesting because it is the feed of Google’s own Reader service, and the largest feed aggregation source on the Web.)

Over the ensuing months until last week I posted periodic notices to the official group seeking clarification as to the source of these errors and a fix to the service. In that period, no Google representative ever answered me, nor any of the numerous requests by others. I don’t believe there has been a single entry on any matter by Google staff for nearly the past year.

I made requests and inquiries no fewer than eight times over these months. True, Google had announced it was deprecating the FeedBurner API in May 2011, but, in that announcement, there was no indication that bug fixes or support to their own official group would cease. While it is completely within Google’s purview to do as it pleases, this behavior hardly lends itself to warm feelings by those using the service.

Finally, last week I dropped the FeedBurner stats and installed a replacement WordPress plugin service [1]. It was clear no fixes were forthcoming and I needed to regain an understanding of my actual subscriber base. The counts you now see on this site use this new service; they show the continuation of this site’s historical growth trend.

Is Google Becoming More Frumious?

It is not surprising that in the prior discussions Google figures prominently. It is the largest provider of APIs and free services on the Web. But, even with its continuing services, I am seeing trends that disturb me in terms of what I thought the “compact” was with the company.

I’m not liking recent changes to Google’s bread and butter, search. While they are doing much to incorporate more structure in their results, which I applaud, they are also making ranking, formatting and presentation changes I do not. I am now spending at least as much of my search time on DuckDuckGo, and have been mightily impressed with its cleanliness, quality and lack of ads in results.

I also do not like how all of my current service uses of Google are now being funneled into Google Plus. I am seeing an arrogance that Google knows what is best and wants to direct me to workflows and uses, reminiscent of the arrogance Microsoft came to assume at the height of its market share. How does that variant of Lord Acton’s dictum go? “Market share tends to corrupt, and absolute market share corrupts absolutely.”

We are seeing Google’s shift to monetize extremely popular APIs such as Maps and Translate. My company, Structured Dynamics, has utilized these services heavily for client work in the past. We now must find alternatives or cost the payment for these services into the ongoing economics of our customer installations. Of course, charging for these services is Google’s right, but it does change the equation and causes us to evaluate alternatives.

I fear that Google may be turning into a frumious Bandersnatch. I’m not sure we will shun it, but we certainly are changing our views of the basis by which we engage or not with the company and its services. Once we shift from a basis of free, our expectations as to permanence and support change as well.

Big Boys Don’t Cry

This is not a diatribe against Google nor a woe is us. Us big kids have come to know that there is no such thing as a free lunch. But that message is getting reaffirmed now more strongly in the Web context.

There can be benefits from seeking, installing or adapting to new alternatives with different service profiles when dependent services are abandoned or deprecated. Learning always takes place. Accepting one’s own responsibility for desired services also leads to control and tailoring for specific needs. Early use of free services also educates about what is desired or not, which can lead to better implementation choices if and when direct responsibility is assumed.

But, in some areas, we are seeing services or uses of the Web that we should adopt only with care or even shun. Business opportunities that depend on third-party services or APIs are very risky. Strong reliance on single-provider service ecosystems adds fragility to dependence. One’s own systems should be designed so that they do not depend too strongly on specific API providers and their unique features or parameters.

Free is not forever, and it is conditional. Substitutability is a good design practice to embrace.
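As a minimal sketch of what substitutability can look like in practice, the Python below hides a translation service behind a small interface of one's own, so that a provider can be replaced without touching the calling code. Everything in it is hypothetical: the Translator interface, the two stand-in provider classes and the localize helper are illustrative names only, and no real provider's API is called.

```python
from typing import Protocol


class Translator(Protocol):
    """The only contract our own code is allowed to depend on (hypothetical)."""

    def translate(self, text: str, target_lang: str) -> str: ...


class FreeProvider:
    """Stand-in for a free third-party service; no real API is called."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[free:{target_lang}] {text}"


class PaidProvider:
    """Stand-in for a replacement adopted after the free service changes."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[paid:{target_lang}] {text}"


def localize(labels: list[str], target_lang: str, translator: Translator) -> list[str]:
    """Application code: it knows the interface, never the provider."""
    return [translator.translate(label, target_lang) for label in labels]


labels = ["Search", "Subscribe"]
before = localize(labels, "fr", FreeProvider())
after = localize(labels, "fr", PaidProvider())  # provider swapped in one place
```

The point of the design is that provider-specific features and parameters stay inside the provider classes; if a service is withdrawn or moves from free to fee, only one small class needs to be rewritten.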


[1] I may detail at a later time how this replacement service was set up.

Posted by AI3's author, Mike Bergman Posted on February 13, 2012 at 7:02 pm in Blogs and Blogging, Site-related | Comments (4)
The URI link reference to this post is: https://www.mkbergman.com/996/the-conditional-costs-of-free/
Posted:January 24, 2012

The Triadic of Signs

Coca-Cola, Toucans and Charles Sanders Peirce

The crowning achievement of the semantic Web is the simple use of URIs to identify data. Further, if the URI identifier can resolve to a representation of that data, it now becomes an integral part of the HTTP access protocol of the Web while providing a unique identifier for the data. These innovations provide the basis for distributed data at global scale, all accessible via Web devices such as browsers and smartphones that are now a ubiquitous part of our daily lives.

Yet, despite these profound and simple innovations, the semantic Web’s designers and early practitioners and advocates have been mired in a muddled, metaphysical argument of at least a decade over what these URIs mean, what they reference, and what their actual true identity is. These muddles about naming and identity, it might be argued, are due to computer scientists and programmers trying to grapple with issues more properly the domain of philosophers and linguists. But that would be unfair. For philosophers and linguists themselves have for centuries also grappled with these same conundrums [1].

As I argue in this piece, part of the muddle results from attempting to do too much with URIs while another part results from not doing enough. I am also not trying to directly enter the fray of current standards deliberations. (Despite a decade of controversy, I optimistically believe that the messy process of argument and consensus building will work itself out [2].) What I am trying to do in this piece, however, is to look to one of America’s pre-eminent philosophers and logicians, Charles Sanders Peirce (pronounced “purse”), to inform how these controversies of naming, identity and meaning may be dissected and resolved.

‘Identity Crisis’, httpRange-14, and Issue 57

The Web began as a way to hyperlink between documents, generally Web pages expressed in the HTML markup language. These initial links were called URLs (uniform resource locators), and each pointed to various kinds of electronic resources (documents) that could be accessed and retrieved on the Web. These resources could be documents written in HTML or other encodings (PDFs, other electronic formats), images, streaming media like audio or videos, and the like [3].

All was well and good until the idea of the semantic Web, which postulated that information about the real world — concepts, people and things — could also be referenced and made available for reasoning and discussion on the Web. With this idea, the scope of the Web was massively expanded from electronic resources that could be downloaded and accessed via the Web to now include virtually any topic of human discourse. The rub, of course, was that ideas such as abstract concepts or people or things could not be “dereferenced” nor downloaded from the Web.

One of the first things that needed to change was to define a broader concept of a URI “identifier” above the more limited concept of a URL “locator”, since many of these new things that could be referenced on the Web went beyond electronic resources that could be accessed and viewed [3]. But, since what the referent of the URI now actually might be became uncertain — was it a concept or a Web page that could be viewed or something else? — a number of commentators began to note this uncertainty as the “identity crisis” of the Web [4]. The topic took on much fervor and metaphysical argument, such that by 2003, Sandro Hawke, a staffer of the standards-setting W3C (World Wide Web Consortium), was able to say, “This is an old issue, and people are tired of it” [5].

Yet, for many of the reasons described more fully below, the issue refused to go away. The Technical Architecture Group (TAG) of the W3C took up the issue, under a rubric that came to be known as httpRange-14 [6]. The issue was first raised in March 2002 by Tim Berners-Lee and accepted for TAG deliberations in February 2003, with a resolution then offered in June 2005 [7]. (Refer to the original resolution and other information [6] to understand the nuances of this resolution, since particular commentary on that approach is not the focus of this article.) Suffice it to say here, however, that this resolution posited an entirely new distinction of Web content into “information resources” and “non-information resources”, and also recommended the use of the HTTP 303 redirect code for when agents requesting a URI should be directed to concepts versus viewable documents.
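For readers who think better in code, the rule at the heart of the 2005 resolution can be sketched in a few lines of Python. The three cases paraphrase the TAG resolution's statements about what a client may infer from the response to an HTTP GET; the function name and the return strings are my own illustrative choices, and the final fallback is an assumption, since the resolution does not address other status codes.

```python
def resource_nature(status: int) -> str:
    """What the httpRange-14 resolution lets a client infer from a GET status."""
    if 200 <= status <= 299:
        # A 2xx response: the URI identifies an "information resource".
        return "information resource"
    if status == 303:
        # A 303 (See Other) redirect: the URI could identify any resource,
        # including a concept, person or thing that cannot be downloaded.
        return "could be any resource"
    if 400 <= status <= 499:
        # A 4xx error: the nature of the resource is unknown.
        return "unknown"
    # Assumption of this sketch: other codes are outside the resolution.
    return "not addressed"


print(resource_nature(200))
print(resource_nature(303))
```

Note how little the status code actually settles: the 303 case says only what the resource might be, not what it is.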

This “resolution” has been anything but. Not only can no one clearly distinguish these de novo classes of “information resources” [19], but the whole approach felt arbitrary and kludgy.

Meanwhile, the confusions caused by the “identity crisis” and httpRange-14 continued to perpetuate themselves. In 2006, a major workshop on “Identity, Reference and the Web” (IRW 2006) was held in conjunction with the Web’s major WWW2006 conference in Edinburgh, Scotland, on May 23, 2006 [8]. The various presentations and its summary (by Harry Halpin) are very useful to understand these issues. What was starting to jell at this time was the understanding that the basis of identity and meaning on the Web posed new questions, and ones on which philosophers, logicians and linguists needed to be consulted.

The fiat of the TAG’s 2005 resolution has failed to take hold. Over the ensuing years, various eruptions have occurred on mailing lists and within the TAG itself (now expressed as Issue 57) to revisit these questions and bring the steps moving forward into some coherent new understanding. Though linked data has been premised on best-practice implementation of these resolutions [9], and has been a qualified success, many (myself included) would claim that the extra steps and inefficiencies required from the TAG’s httpRange-14 guidance have been hindrances, not facilitators, of the uptake of linked data (or the semantic Web).

Today, despite the efforts of some to claim the issue closed, it is not. Issue 57 and the periodic bursts from notable semantic Web advocates such as Ian Davis [10], Pat Hayes and Harry Halpin [11], Ed Summers [12], Xiaoshu Wang [13], David Booth [14] and TAG members themselves, such as Larry Masinter [15] and Jonathan Rees [16], point to continued irresolution and discontent within the advocate community. Issue 57 currently remains open. Meanwhile, I think, all of us interested in such matters can express concern that linked data, the semantic Web and interoperable structured data have seen less uptake than any of us had hoped or wanted over the past decade. As I have stated elsewhere, unclear semantics and muddled guidelines help to undercut potential use.

As each of the eruptions over these identity issues has occurred, the competing camps have often been characterized as “talking past one another”; that is, not communicating in such a way as to help resolve to consensus. While it is hardly my position to do so, I try to encapsulate below the various positions and prejudices as I see them in this decade-long debate. I also try to share my own learning that may help inform some common ground. Forgive me if I overly simplify these vexing issues by returning to what I see as some first principles . . . .

What’s in a Name?

Original Coca-Cola bottle

One legacy of the initial document Web is the perception that Web addresses have meaning. We have all heard of the multi-million dollar purchasing of domains [17] and the adjudication that may occur when domains are hijacked from their known brands or trademark owners. This legacy has tended to imbue URIs with a perceived value. It is not by accident, I believe, that many within the semantic Web and linked data communities still refer to “minting” URIs. Some believe that ownership and control over URIs may be equivalent to grabbing up valuable real estate. It is also the case that many believe the “name” given to a URI acts to name the referent to which it refers.

This perception is partially true, partially false, but moreover incomplete in all cases. We can illustrate these points with the global icon, “Coca-Cola”.

As for the naming aspects, let’s dissect what we mean when we use the label “Coca-Cola” (in a URI or otherwise). Perhaps the first thing that comes to mind is “Coca-Cola,” the beverage (which has a description on Wikipedia, among other references). Because of its ubiquity, we may also recognize the image of the Coca-Cola bottle to the left as a symbol for this same beverage. (Though, in the hilarious movie, The Gods Must Be Crazy, Kalahari Bushmen, who had no prior experience of Coca-Cola, took the bottle to be magical with evil powers [18].) Yet even as reference to the beverage, the naming aspects are a bit cloudy since we could also use the fully qualified synonyms of “Coke”, “Coca-cola” (small C), “Classic Coke” and the hundreds of language variants worldwide.

On the other hand, the label “Coca-Cola” could just as easily conjure The Coca-Cola Company itself. Indeed, the company web site is the location pointed to by the URI of http://www.thecoca-colacompany.com/. But, even that URI, which points to the home Web page of the company, does not do justice to conveying an understanding or description of the company. For that, additional URIs may need to be invoked, such as the description at Wikipedia, the company’s own company description page, plus perhaps the company’s similar heritage page.

Of course, even these links and references only begin to scratch the surface of what the company Coca-Cola actually is: headquarters, manufacturing facilities, 140,000 employees, shareholders, management, legal entities, patents and Coke recipe, and the like. Whether in human languages or URIs, in any attempt to signify something via symbols or words (themselves another form of symbol), we risk ambiguity and incompleteness.

URI shorteners also undercut the idea that a URI necessarily “names” something. Using the service bitly, we can shorten the link to the Wikipedia description of the Coke beverage to http://bit.ly/xnbA6 and we can shorten the link to The Coca-Cola Company Web site to http://bit.ly/9ojUpL. I think we can fairly say that neither of these shortened links “names” its referent. The most we can say about a URI is that it points to something. With the vagaries of meaning in human languages, we might also say that URIs refer to something, denote something or identify (but not in the sense of completely define) something.

From this discussion, we can assert with respect to the use of URIs as “names” that:

  1. In all cases, URIs are pointers to a particular referent
  2. In some cases, URIs do act to “name” some things
  3. Yet, even when used as “names,” there can be ambiguity as to what exactly the referent is that is denoted by the name
  4. Resolving what such “names” mean is a matter of context and reference to further information or links, and
  5. Because URIs may act as “names”, it is appropriate to consider social conventions and contracts (e.g., trademarks, brands, legal status) in adjudicating who can own the URI.

In summary, I think we can say that URIs may act as names, but not in all or most cases, and when used as such are often ambiguous. Absolutely associating URIs as names is way too heavy a burden, and incorrect in most cases.

What is a Resource?

The “name” discussion above masks that in some cases we are talking about a readable Web document or image (such as the Wikipedia description of the Coke beverage or its image) versus the “actual” thing in the real world (the Coke beverage itself or even the company). This distinction is what led to the so-called “identity crisis”, for which Ian Davis has used a toucan as his illustrative thing [10].

Keel-billed Toucan

As I note in the conclusion, I like Davis’ approach to the identity conundrum insofar as Web architecture and linked data guidance are concerned. But here my purpose is more subtle: I want to tease apart still further the apparent distinction between an electronic description of something on the Web and the “actual” something. Like Davis, let’s use the toucan.

In our strawman case, we too use a description of the toucan (on Wikipedia) to represent our “information resource” (the accessible, downloadable electronic document). We contrast that with a URI that we intend to denote the actual physical bird (a “non-information resource” in the jumbled jargon of httpRange-14), which we will designate via the URI of http://example.com/toucan.

Despite the tortured (and newly conjured) distinction between “information resource” and “non-information resource”, the first blush reaction is that, sure, there is a difference between an electronic representation that can be accessed and viewed on the Web and its true, “actual” thing. Of course people cannot actually be rendered and downloaded on the Web, but their bios and descriptions and portrait images may. While in the abstract such distinctions appear true and obvious, in the specifics that get presented to experts, there is surprising disagreement as to what is actually an “information resource” versus a “non-information resource” [19]. Moreover, as we inspect the real toucan further, even that distinction is quite ambiguous.

When we inspect what might be a definitive description of “toucan” on Wikipedia, we see that the term more broadly represents the family of Ramphastidae, which contains five genera and forty different species. The picture we are showing to the right is of but one of those forty species, that of the keel-billed toucan (Ramphastos sulfuratus). Viewing the images of the full list of toucan species shows just how divergent these various “physical birds” are from one another. Across all species, average sizes vary by more than a factor of three with great variation in bill sizes, coloration and range. Further, if I assert that the picture to the right is actually that of my pet keel-billed toucan, Pretty Bird, then we can also understand that this representation is for a specific individual bird, and not the physical keel-billed toucan species as a whole.

The point of this diversion is not a lecture on toucans, but an affirmation that distinctions between “resources” occur at multiple levels and dimensions. Just as there are no self-evident criteria as to what constitutes an “information resource”, there is also not a self-evident and fully defining set of criteria as to what is the physical “toucan” bird. The meaning of what we call a “toucan” bird is not embodied in its label or even its name, but in the context and accompanying referential information that place the given referent into a context that can be communicated and understood. A URI points to (“refers to”) something that causes us to conjure up an understanding of that thing, be it a general description of a toucan, a picture of a toucan, an understanding of a species of toucan, or a specific toucan bird. Our understanding or interpretation results from the context and surrounding information accompanying the reference.

In other words, a “resource” may be anything, which is just the way the W3C has defined it. There is not a single dimension which, magically, like “information” and “non-information,” can cleanly and definitively place a referent into some state of absolute understanding. To assert that such magic distinctions exist is a flaw of Cartesian logic, which can only be reconciled by looking to more defensible bases in logic [20].

Peirce and the Logic of Signs

The logic behind these distinctions and nuances leads us to Charles Sanders Peirce (1839 – 1914). Peirce (pronounced “purse”) was an American logician, philosopher and polymath of the first rank. Along with Frege, he is acknowledged as the father of predicate calculus and the notation system that formed the basis of first-order logic. His symbology and approach arguably provide the logical basis for description logics and other aspects underlying the semantic Web building blocks of the RDF data model and, eventually, the OWL language. Peirce is the acknowledged founder of pragmatism, the philosophy of linking practice and theory in a process akin to the scientific method. He was also the first formulator of existential graphs, an essential basis to the whole field now known as model theory. Though often overlooked in the 20th century, Peirce has lately been enjoying a renaissance with his voluminous writings still being deciphered and published.

The core of Peirce’s world view is based in semiotics, the study and logic of signs. In his seminal writing on this, “What Is a Sign?” [21], he wrote that “every intellectual operation involves a triad of symbols” and “all reasoning is an interpretation of signs of some kind”. Peirce had a predilection for expressing his ideas in “threes” throughout his writings.

Semiotics is often split into three branches: 1) syntactics – relations among signs in formal structures; 2) semantics – relations between signs and the things to which they refer; and 3) pragmatics – relations between signs and the effects they have on the people or agents who use them.

Peirce’s logic of signs in fact is a taxonomy of sign relations, in which signs get reified and expanded via still further signs, ultimately leading to communication, understanding and an approximation of “canonical” truth. Peirce saw the scientific method as itself an example of this process.

A given sign is a representation amongst the triad of the sign itself (which Peirce called a representamen, the actual signifying item that stands in a well-defined kind of relation to the two other things), its object and its interpretant. The object is the actual thing itself. The interpretant is how the agent or the perceiver of the sign understands and interprets the sign. Depending on the context and use, a sign (or representamen) may be either an icon (a likeness), an indicator or index (a pointer or physical linkage to the object) or a symbol (understood convention that represents the object, such as a word or other meaningful signifier).

An interpretant in its barest form is a sign’s meaning, implication, or ramification. For a sign to be effective, it must represent an object in such a way that it is understood and used again. This makes the assignment and use of signs a community process of understanding and acceptance [20], as well as a truth-verifying exercise of testing and confirming accepted associations.

John Sowa has done much to help make some of Peirce’s obscure language and terminology more accessible to lay readers [22]. He has expressed Peirce’s basic triad of sign relations as follows, based around the Yojo animist cat figure used by the character Queequeg in Herman Melville’s Moby-Dick:

The Triangle of Meaning

In this figure, object and symbol are the same as the Peirce triad; concept is the interpretant in this case. The use of the word ‘Yojo’ conjures the concept of cat.

This basic triad representation has been used in many contexts, with various replacements or terms at the nodes. Its basic form is known as the Meaning Triangle, as was popularized by Ogden and Richards in 1923 [23].

The key aspect of signs for Peirce, though, is the ongoing process of interpretation and reference to further signs, a process he called semiosis. A sign of an object leads to interpretants, which, as signs, then lead to further interpretants. In the Sowa example below, we show how meaning triangles can be linked to one another, in this case by abstracting that the triangles themselves are concepts of representation; we can abstract the ideas of both concept and symbol:

Representing an Object by a Concept

We can apply this same cascade of interpretation to the idea of the sign (or representamen), which in this case shows that a name can be related to a word symbol, which in itself is a combination of characters in a string called ‘Yojo’:

Representing Signs of Signs of Signs

According to Sowa [22]:

“What is revolutionary about Peirce’s logic is the explicit recognition of multiple universes of discourse, contexts for enclosing statements about them, and metalanguage for talking about the contexts, how they relate to one another, and how they relate to the world and all its events, states, and inhabitants.
“The advantage of Peircean semiotics is that it firmly situates language and logic within the broader study of signs of all types. The highly disciplined patterns of mathematics and logic, important as they may be for science, lie on a continuum with the looser patterns of everyday speech and with the perceptual and motor patterns, which are organized on geometrical principles that are very different from the syntactic patterns of language or logic.”

Catherine Legg [20] notes that the semiotic process is really one of community involvement and consensus. Each understanding of a sign and each subsequent interpretation helps come to a consensus of what a sign means. It is a way of building a shared understanding that aids communication and effective interpretation. In Peirce’s own writings, the process of interpretation can lead to validation and an eventual “canonical” or normative interpretation. The scientific method itself is an extreme form of the semiotic process, leading ultimately to what might be called accepted “truths”.

Peircean Semiotics of URIs

So, how do Peircean semiotics help inform us about the role and use of URIs? Does this logic help provide guidance on the “identity crisis”?

The Peircean taxonomy of signs has three levels with three possible sign roles at each level, leading to a possible 27 combinations of sign representations. However, because not all sign roles are applicable at all levels, Peirce actually postulated only ten distinct sign representations.
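The arithmetic can be checked with a short Python sketch. Two things in it are not from the discussion above and should be read as assumptions: the labels for the three levels (qualisign/sinsign/legisign, icon/index/symbol, rheme/dicent/argument), which are the conventional Peircean terms, and the filtering rule, which is the commonly cited qualification that a level may not be of a higher category than the level before it.

```python
from itertools import product

# The three trichotomies, each listed from first to third category.
# Conventional Peircean labels; they are an addition of this sketch.
SIGN_ITSELF = ("qualisign", "sinsign", "legisign")
RELATION_TO_OBJECT = ("icon", "index", "symbol")
RELATION_TO_INTERPRETANT = ("rheme", "dicent", "argument")

# Every unconstrained pairing of one role per level: 3 x 3 x 3.
all_combinations = list(product(range(3), repeat=3))

# Assumed qualification rule: category of sign >= object >= interpretant.
valid = [(s, o, i) for s, o, i in all_combinations if s >= o >= i]

sign_classes = [
    (RELATION_TO_INTERPRETANT[i], RELATION_TO_OBJECT[o], SIGN_ITSELF[s])
    for s, o, i in valid
]

print(len(all_combinations), len(sign_classes))
for sign_class in sign_classes:
    print(" ".join(sign_class))
```

Under that rule, 27 unconstrained combinations reduce to 10 permitted ones, which matches the count given above.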

Common to all roles, the URI “sign” is best seen as an index: the URI is a pointer to a representation of some form, be it electronic or otherwise. This representation bears a relation to the actual thing that this referent represents, as is true for all triadic sign relationships. However, in some contexts, again in keeping with additional signs interpreting signs in other roles, the URI “sign” may also play the role of a symbolic “name” or even of a signal that the resource can be downloaded or accessed in electronic form. In other words, by virtue of the conventions that we choose to assign to our signs, we can supply additional information that augments our understanding of what the URI is, what it means, and how it is accessed.

Of course, in these regards, a URI is no different than any other sign in the Peircean world view: it must reside in a triadic relationship to its actual object and an interpretation of that object, with further understanding only coming about by the addition of further signs and interpretations.

In shortened form, this means that a URI, acting alone, can at most play the role of a pointer between an object and its referent. A URI alone, without further signs (information), cannot inform us well about names or even what type of resource may be at hand. For these interpretations to be reliable, more information must be layered on, either by accepted convention of the current signs or the addition of still further signs and their interpretations. Since the attempts to deal with the nature of a URI resource by fiat as stipulated by httpRange-14 meet neither the standard of consensus nor that of empirical validity, the attempt cannot by definition become “canonical”. This does not mean that httpRange-14 and its recommended practices cannot help in providing more information and aiding interpretation for what the nature of a resource may be. But it does mean that httpRange-14 acting alone is insufficient to resolve ambiguity.

Moreover, what we see in the general nature of Peirce’s logic of signs is the usefulness of adding more “triads” of representation as the process to increase understanding through further interpretation. Kind of sounds like adding on more RDF triples, does it not?

Global is Neither Indiscriminate Nor Unambiguous

Names, references, identity and meaning are not absolutes. They are not philosophically, and they are not in human language. To expect machine communications to hold to different standards and laws than human communications is naive. To effect machine communications our challenge is not to devise new rules, but to observe and apply the best rules and practices that human communications instruct.

There has been an unstated hope at the heart of the semantic Web enterprise that simply expressing statements in the right way (syntax) and in the right form (RDF) is sufficient to facilitate machine communications. But this hope, too, is naive and silly. Just as we do not accept all human utterances as truth, neither will we accept all machine transmissions as reliable. Some of the information will be posted in error; some will be wrong or ill-fitting to our world view; some will be malicious or intended to deceive. Spam and occasionally lousy search results on the Web tell us that Web documents are subject to these sources of unsuitability; why would the same not be true of data?

Thus, global data access via the semantic Web is not, and can never be, either indiscriminate or unambiguous. We need to understand and come to trust sources and provenance; we need interpretation and context to decide appropriateness and validity; and we need testing and validation to ensure messages as received are indeed correct. Humans need to do these things in their normal courses of interaction and communication; our machine systems will need to do the same.

These confirmations and decisions as to whether the information we receive is actionable or not will come about via still more information. Some of this information may come about via shared conventions. But most will come about because we choose to provide more context and interpretation for the core messages we hope to communicate.

A Go-Forward Approach

Nearly five years ago Hayes and Halpin put forth a proposal to add ex:refersTo and ex:describedBy to the standard RDF vocabulary as a way for authors to provide context and explanation for what constituted a specific RDF resource [11]. In various ways, many of the other individuals cited in this article have come to similar conclusions. The simple redirect suggestions of both Ian Davis [10] and Ed Summers [12] appear particularly helpful.
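As a minimal sketch of what such author-supplied context might look like, the Python below builds two triples that pair a "thing" URI with a document about it. Everything specific here is an assumption for illustration: the namespace URI behind the ex: prefix, the example URIs, the helper names, and the direction in which each predicate is read (thing ex:describedBy document; document ex:refersTo thing). The proposal itself should be consulted for the intended semantics.

```python
# Hypothetical namespace for the "ex:" prefix; the real URI is not given in the text.
EX = "http://example.org/ns#"


def describe(thing_uri, document_uri):
    """Return two triples linking a thing to a document that describes it.

    The reading of each predicate's direction is an assumption made
    for this sketch, not a statement of the proposal's semantics.
    """
    return [
        (thing_uri, EX + "describedBy", document_uri),
        (document_uri, EX + "refersTo", thing_uri),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples in an N-Triples-like line format."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical URIs: one for the thing itself, one for a page about it.
triples = describe(
    "http://example.org/id/eiffel-tower",
    "http://example.org/doc/eiffel-tower",
)
print(to_ntriples(triples))
```

The point of the sketch is only that the extra context travels as ordinary triples alongside the data, rather than being inferred from how a server happens to respond.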

Over time, we will likely need further representations about resources regarding such things as source, provenance, context and other interpretations that would help remove ambiguities as to how the information provided by that resource should be consumed or used. These additional interpretations can mechanically be provided via referenced ontologies or embedded RDFa (or similar). These additional interpretations can also be aided by judicious, limited additions of new predicates to basic language specifications for RDF (such as the Hayes and Halpin suggestions).

In the end, of course, any frameworks that achieve consensus and become widely adopted will be simple to use, easy to understand, and straightforward to deploy. The beauty of best practices in predicates and annotations is that a failure to provide them is easy to test for. Parties that wish to have their data consumed have incentive to provide sufficient information so as to enable interpretation.
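To make the "easy to test" claim concrete, here is a rough sketch of a consumer-side check that reports which expected context predicates a resource fails to supply. The namespace URI, the choice of expected predicates (including the hypothetical ex:source), the function name and the sample data are all assumptions for illustration, not part of any specification.

```python
# Hypothetical namespace and predicate choices, for illustration only.
EX = "http://example.org/ns#"
EXPECTED = (EX + "describedBy", EX + "source")


def missing_context(triples, subject, expected=EXPECTED):
    """Return the expected predicates that `subject` does not carry."""
    present = {p for s, p, _ in triples if s == subject}
    return [p for p in expected if p not in present]


# Sample data: the resource states what describes it but gives no source.
data = [
    ("http://example.org/id/eiffel-tower",
     EX + "describedBy",
     "http://example.org/doc/eiffel-tower"),
]

gaps = missing_context(data, "http://example.org/id/eiffel-tower")
if gaps:
    print("Missing context predicates:", gaps)
```

A consumer could run a check of this kind before deciding whether a source supplies enough context to be trusted, which is the incentive for publishers noted above.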

There is absolutely no reason that these additions cannot co-exist with the current httpRange-14 approach. By adding a few other options and making clear the optional use of httpRange-14, we would be very Peirce-like in our go-forward approach: we would be pragmatic while adding more means to improve our interpretations of what a Web resource is and is meant to be.


[1] Throughout intellectual history, a number of prominent philosophers and logicians have attempted to describe naming, identity and reference of objects and entities. Here are a few that you may likely encounter in various discussions of these topics in reference to the semantic Web; many are noted philosophers of language:

  • Aristotle (384 BC – 322 BC) – founder of formal logic; formulator and proponent of categorization; believed in the innate “universals” of various things in the natural world
  • Rudolf Carnap (1891 – 1970) – proposed a logical syntax that provided a system of concepts, a language, to enable logical analysis via exact formulas; a basis for natural language processing; rejected the idea and use of metaphysics
  • René Descartes (1596 – 1650) – posited a boundary between mind and the world; the meaning of a sign is the intension of its producer, and is private and incorrigible
  • Friedrich Ludwig Gottlob Frege (1848 – 1925) – one of the formulators of first-order logic, though his syntax was not adopted; advocated shared senses, which can be objective and sharable
  • Kurt Gödel (1906 – 1978) – his two incompleteness theorems are some of the most important logic contributions of all time; they establish inherent limitations of all but the most trivial axiomatic systems capable of doing arithmetic, as well as for computer programs
  • David Hume (1711 – 1776) – embraced natural empiricism, but kept the Descartes concept of an “idea”
  • Immanuel Kant (1724 – 1804) – one of the major philosophers in history, argued that experience is purely subjective without first being processed by pure reason; a major influence on Peirce
  • Saul Kripke (1940 – ) – proposed the causal theory of reference and what proper names mean via a “baptism” by the namer
  • Gottfried Wilhelm Leibniz (1646 – 1716) – the classic definition of identity is Leibniz’s Law, which states that if two objects have all of their properties in common, they are identical and so only one object
  • Richard Montague (1930 – 1971) – wrote much on logic and set theory; student of Tarski; pioneered a logical approach to natural language semantics; associated with model theory, model-theoretic semantics
  • Charles Sanders Peirce (1839 – 1914) – see main text
  • Willard Van Orman Quine (1908 – 2000) – noted analytical philosopher, advocated the “radical indeterminacy of translation” (can never really know)
  • Bertrand Russell (1872 – 1970) – proposed the direct theory of reference and what it means to “ground in references”; adopted many Peirce arguments without attribution
  • Ferdinand de Saussure (1857 – 1913) – also proposed an alternative view to Peirce of semiotics, one grounded in sociology and linguistics
  • John Rogers Searle (1932 – ) – argues that consciousness is a real physical process in the brain and is subjective; has argued against strong AI (artificial intelligence)
  • Alfred Tarski (1901 – 1983) – analytic philosopher focused on definitions of models and truth; great admirer of Peirce; associated with model theory, model-theoretic semantics
  • Ludwig Josef Johann Wittgenstein (1889 – 1951) – he disavowed his earlier work, arguing that philosophy needed to be grounded in ordinary language, recognizing that the meaning of words is dependent on context, usage, and grammar.
Also, Umberto Eco has been a noted proponent and popularizer of semiotics.
[2] As any practitioner ultimately notes, standards development is a messy, lengthy and trying process. Not all individuals can handle the messiness and polemics involved. Personally, I prefer to try to write cogent articles on specific issues of interest, and then leave it to others to slug it out in the back rooms of standards making. Where the process works well, standards get created that are accepted and adopted. Where the process does not work well, the standards are not embraced as exhibited by real-world use.
[3] Tim Berners-Lee, 2007. What Do HTTP URIs Identify?
This article does not discuss the other sub-category of URIs, URNs (for names). URNs may refer to any standard naming scheme (such as ISBNs for books) and have no direct bearing on any network access protocol, as do URLs and URIs when they are referenceable. Further, URNs are little used in practice.
[4] Kendall Clark was one of the first to question “resource” and other identity ambiguities, noting the tautology between URI and resource as “anything that has identity.” See Kendall Clark, 2002. “Identity Crisis,” in XML.com, Sept 11 2002; see http://www.xml.com/pub/a/2002/09/11/deviant.html. From the topic map community, one notable contribution was from Steve Pepper and Sylvia Schwab, 2003. “Curing the Web’s Identity Crisis,” found at http://www.ontopia.net/topicmaps/materials/identitycrisis.html.
[5] Sandro Hawke, 2003. Disambiguating RDF Identifiers. W3C, January 2003. See http://www.w3.org/2002/12/rdf-identifiers/.
[6] The issue was framed as what is the proper “range” for HTTP referrals and was also the 14th major TAG issue recorded, hence the name. See further the httpRange-14 Webography.
[7] See W3C, “httpRange-14: What is the range of the HTTP dereference function?”; see http://www.w3.org/2001/tag/issues.html#httpRange-14.
[9] Leo Sauermann and Richard Cyganiak, eds., 2008. Cool URIs for the Semantic Web, W3C Interest Group Note, December 3, 2008. See http://www.w3.org/TR/cooluris/.
[10] Ian Davis, 2010. Is 303 Really Necessary? Blog post, November 2010, accessed 20 January 2012. (See http://blog.iandavis.com/2010/11/04/is-303-really-necessary/.) A considerable thread resulted from this post; see http://markmail.org/thread/mkoc5kxll6bbjbxk.
[11] See first Harry Halpin, 2006. “Identity, Reference and Meaning on the Web,” presented at WWW 2006, May 23, 2006. See http://www.ibiblio.org/hhalpin/irw2006/hhalpin.pdf. This was then followed up with greater elaboration by Patrick J. Hayes and Harry Halpin, 2007. “In Defense of Ambiguity,” http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html.
[12] Ed Summers, 2010. Linking Things and Common Sense, blog post of July 7, 2010. See http://inkdroid.org/journal/2010/07/07/linking-things-and-common-sense/.
[13] Xiaoshu Wang, 2007. URI Identity and Web Architecture Revisited, Word document posted on posterous.com, November 2007. (Former Web documents have been removed.)
[14] David Booth, 2006. “URIs and the Myth of Resource Identity,” see http://dbooth.org/2006/identity/.
[15] See Larry Masinter, 2012. “The ‘tdb’ and ‘duri’ URI Schemes, Based on Dated URIs,” 10th version, IETF Network Working Group Internet-Draft, January 12, 2012. See http://tools.ietf.org/html/draft-masinter-dated-uri-10.
[16] Jonathan Rees has been the scribe and author for many of the background documents related to Issue 57. A recent mailing list entry provides pointers to four relevant documents in this entire discussion. See Jonathan A Rees, 2012. Guide to ISSUE-57 (httpRange-14) document suite, January 21, 2012.
[17] At least twenty domain names, led by insure.com, have sold for more than $2 million each; see this Wikipedia listing.
[18] In the wonderful movie, The Gods Must Be Crazy, Bushmen in the Kalahari Desert one day find an unbroken glass Coke bottle that had been thrown out of an airplane. Initially, this strange artifact seems to be another boon from the gods, and the Bushmen find many uses for it. But unlike anything that they have had before, there is only one bottle to go around. This creates jealousy, envy, anger, hatred, even violence. The protagonist, Xi, decides that the bottle is an evil thing and must be thrown off the edge of the world. The hilarity of the movie comes from that premise and Xi’s encounters with the modern world as he pursues his quest with the magic bottle.

[19] Wang [13] rhetorically asked which of the following things would be categorized as an “information resource”:

  1. A book
  2. A clock
  3. The clock on the wall of my bedroom
  4. A gene
  5. The sequence of a gene
  6. A software
  7. A service
  8. A namespace
  9. An ontology
  10. A language
  11. A number
  12. A concept, such as Dublin Core’s creator.

See the 2007 thread on this issue, mostly by Sean Palmer and Noah Mendelsohn, the latter acknowledging that various experts may only agree on 85% of the items.

[20] See further Catherine Legg, 2010. “Pragmatism on the Semantic Web,” in Bergman, M., Paavola, S., Pietarinen, A.-V., & Rydenfelt, H. eds., Ideas in Action: Proceedings of the Applying Peirce Conference, pp. 173–188. Nordic Studies in Pragmatism 1. Helsinki: Nordic Pragmatism Network. See http://www.nordprag.org/nsp/1/Legg.pdf.
[21] Charles Sanders Peirce, 1894. “What is in a Sign?”, see http://www.iupui.edu/~peirce/ep/ep2/ep2book/ch02/ep2ch2.htm.
[22] The figures in particular are from John F. Sowa, 2000. “Ontology, Metadata, and Semiotics,” presented at ICCS 2000 in Darmstadt, Germany, on August 14, 2000; published in B. Ganter & G. W. Mineau, eds., Conceptual Structures: Logical, Linguistic, and Computational Issues, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81. May be found at http://www.jfsowa.com/ontology/ontometa.htm. Also see John F. Sowa, 2006. “Peirce’s Contributions to the 21st Century,” presented at International Conference on Conceptual Structures, Aalborg, Denmark, July 17, 2006. See http://www.jfsowa.com/pubs/csp21st.pdf.
[23] C.K. Ogden and I. A. Richards, 1923. The Meaning of Meaning, Harcourt, Brace, and World, New York, 8th edition 1946.