Posted:January 24, 2012

Give Me a Sign: What Do Things Mean on the Semantic Web?

Coca-Cola, Toucans and Charles Sanders Peirce

The crowning achievement of the semantc Web is the simple use of URIs to identify data. Further, if the URI identifier can resolve to a representation of that data, it now becomes an integral part of the HTTP access protocol of the Web while providing a unique identifier for the data. These innovations provide the basis for distributed data at global scale, all accessible via Web devices such as browsers and smartphones that are now a ubiquitous part of our daily lives.

Yet, despite these profound and simple innovations, the semantic Web’s designers and early practitioners and advocates have been mired in a muddled, metaphysical argument of at least a decade over what these URIs mean, what they reference, and what their actual true identity is. These muddles about naming and identity, it might be argued, are due to computer scientists and programmers trying to grapple with issues more properly the domain of philosophers and linguists. But that would be unfair. For philosophers and linguists themselves have for centuries also grappled with these same conundrums [1].

As I argue in this piece, part of the muddle results from attempting to do too much with URIs while another part results from not doing enough. I am also not trying to directly enter the fray of current standards deliberations. (Despite a decade of controversy, I optimistically believe that the messy process of argument and consensus building will work itself out [2].) What I am trying to do in this piece, however, is to look to one of America’s pre-eminent philosophers and logicians, Charles Sanders Peirce (pronounced “purse”), to inform how these controversies of naming, identity and meaning may be dissected and resolved.

‘Identity Crisis’, httpRange-14, and Issue 57

The Web began as a way to hyperlink between documents, generally Web pages expressed in the HTML markup language. These initial links were called URLs (uniform resource locators), and each pointed to various kinds of electronic resources (documents) that could be accessed and retrieved on the Web. These resources could be documents written in HTML or other encodings (PDFs, other electronic formats), images, streaming media like audio or videos, and the like [3].

All was well and good until the idea of the semantic Web, which postulated that information about the real world — concepts, people and things — could also be referenced and made available for reasoning and discussion on the Web. With this idea, the scope of the Web was massively expanded from electronic resources that could be downloaded and accessed via the Web to now include virtually any topic of human discourse. The rub, of course, was that ideas such as abstract concepts or people or things could not be “dereferenced” nor downloaded from the Web.

One of the first things that needed to change was to define a broader concept of a URI “identifier” above the more limited concept of a URL “locator”, since many of these new things that could be referenced on the Web went beyond electronic resources that could be accessed and viewed [3]. But, since what the referent of the URI now actually might be became uncertain — was it a concept or a Web page that could be viewed or something else? — a number of commentators began to note this uncertainty as the “identity crisis” of the Web [4]. The topic took on much fervor and metaphysical argument, such that by 2003, Sandro Hawke, a staffer of the standards-setting W3C (World Wide Web Consortium), was able to say, “This is an old issue, and people are tired of it” [5].

Yet, for many of the reasons described more fully below, the issue refused to go away. The Technical Architecture Group (TAG) of the W3C took up the issue, under a rubric that came to be known as httpRange-14 [6]. The issue was first raised in March 2002 by Tim Berners-Lee, accepted for TAG deliberations in February 2003, with then a resolution offered in June 2005 [7]. (Refer to the original resolution and other information [6] to understand the nuances of this resolution, since particular commentary on that approach is not the focus of this article.) Suffice it to say here, however, that this resolution posited an entirely new distinction of Web content into “information resources” and “non-information resources”, and also recommended the use of the HTTP 303 redirect code for when agents requesting a URI should be directed to concepts versus viewable documents.

This “resolution” has been anything but. Not only can no one clearly distinguish these de novo classes of “information resources” [19], but the whole approach felt arbitrary and kludgy.

Meanwhile, the confusions caused by the “identity crisis” and httpRange-14 continued to perpetuate themselves. In 2006, a major workshop on “Identity, Reference and the Web” (IRW 2006) was held in conjunction with the Web’s major WWW2006 conference in Edinburgh, Scotland, on May 23, 2006 [8]. The various presentations and its summary (by Harry Halpin) are very useful to understand these issues. What was starting to jell at this time was the understanding that the basis of identity and meaning on the Web posed new questions, and ones that philosophers, logicians and linguists needed to be consulted to help inform.

The fiat of the TAG’s 2005 resolution has failed to take hold. Over the ensuing years, various eruptions have occurred on mailing lists and within the TAG itself (now expressed as Issue 57) to revisit these questions and bring the steps moving forward into some coherent new understanding. Though linked data has been premised on best-practice implementation of these resolutions [9], and has been a qualified success, many (myself included) would claim that the extra steps and inefficiencies required from the TAG’s httpRange-14 guidance have been hindrances, not facilitators, of the uptake of linked data (or the semantic Web).

Today, despite the efforts of some to claim the issue closed, it is not. Issue 57 and the periodic bursts from notable semantic Web advocates such as Ian Davis [10], Pat Hayes and Harry Halpin [11], Ed Summers [12], Xiaoshu Wang [13], David Booth [14] and TAG members themselves, such as Larry Masinter [15] and Jonathan Rees [16], point to continued irresolution and discontent within the advocate community. Issue 57 currently remains open. Meanwhile, I think, all of us interested in such matters can express concern that linked data, the semantic Web and interoperable structured data have seen less uptake than any of us had hoped or wanted over the past decade. As I have stated elsewhere, unclear semantics and muddled guidelines help to undercut potential use.

As each of the eruptions over these identity issues has occurred, the competing camps have often been characterized as “talking past one another”; that is, not communicating in such a way as to help resolve to consensus. While it is hardly my position to do so, I try to encapsulate below the various positions and prejudices as I see them in this decades-long debate. I also try to share my own learning that may help inform some common ground. Forgive me if I overly simplify these vexing issues by returning to what I see as some first principles . . . .

What’s in a Name?

One legacy of the initial document Web is the perception that Web addresses have meaning. We have all heard of the multi-million dollar purchasing of domains [17] and the adjudication that may occur when domains are hijacked from their known brands or trademark owners. This legacy has tended to imbue URIs with a perceived value. It is not by accident, I believe, that many within the semantic Web and linked data communities still refer to “minting” URIs. Some believe that ownership and control over URIs may be equivalent to grabbing up valuable real estate. It is also the case that many believe the “name” given to a URI acts to name the referent to which it refers.

This perception is partially true, partially false, but moreover incomplete in all cases. We can illustrate these points with the global icon, “Coca-Cola”.

As for the naming aspects, let’s dissect what we mean when we use the label “Coca-Cola” (in a URI or otherwise). Perhaps the first thing that comes to mind is “Coca-Cola,” the beverage (which has a description on Wikipedia, among other references). Because of its ubiquity, we may also recognize the image of the Coca-Cola bottle to the left as a symbol for this same beverage. (Though, in the hilarious movie, The Gods, They Must be Crazy, Kalahari Bushmen, who had no prior experience of Coca-Cola, took the bottle to be magical with evil powers [18].) Yet even as reference to the beverage, the naming aspects are a bit cloudy since we could also use the fully qualified synonyms of “Coke”, “Coca-cola” (small C), “Classic Coke” and the hundreds of language variants worldwide.

On the other hand, the label “Coca-Cola” could just as easily conjure The Coca-Cola Company itself. Indeed, the company web site is the location pointed to by the URI of http://www.thecoca-colacompany.com/. But, even that URI, which points to the home Web page of the company, does not do justice to conveying an understanding or description of the company. For that, additional URIs may need to be invoked, such as the description at Wikipedia, the company’s own company description page, plus perhaps the company’s similar heritage page.

Of course, even these links and references only begin to scratch the surface of what the company Coca-Cola actually is: headquarters, manufacturing facilities, 140,000 employees, shareholders, management, legal entities, patents and Coke recipe, and the like. Whether in human languages or URIs, in any attempt to signify something via symbols or words (themselves another form of symbol), we risk ambiguity and incompleteness.

URI shorteners also undercut the idea that a URI necessarily “names” something. Using the service bitly, we can shorten the link to the Wikipedia description of the Coke beverage to http://bit.ly/xnbA6 and we can shorten the link to The Coca-Cola Company Web site to http://bit.ly/9ojUpL. I think we can fairly say that neither of these shortened links “name” their referents. The most we can say about a URI is that it points to something. With the vagaries of meaning in human languages, we might also say that URIs refer to something, denote something or identify (but not in the sense of completely define) something.

From this discussion, we can assert with respect to the use of URIs as “names” that:

In all cases, URIs are pointers to a particular referent
In some cases, URIs do act to “name” some things
Yet, even when used as “names,” there can be ambiguity as to what exactly the referent is that is denoted by the name
Resolving what such “names” mean is a matter of context and reference to further information or links, and
Because URIs may act as “names”, it is appropriate to consider social conventions and contracts (e.g., trademarks, brands, legal status) in adjudicating who can own the URI.

In summary, I think we can say that URIs may act as names, but not in all or most cases, and when used as such are often ambiguous. Absolutely associating URIs as names is way too heavy a burden, and incorrect in most cases.

What is a Resource?

The “name” discussion above masks that in some cases we are talking about a readable Web document or image (such as the Wikipedia description of the Coke beverage or its image) versus the “actual” thing in the real world (the Coke beverage itself or even the company). This distinction is what led to the so-called “identity crisis”, for which Ian Davis has used a toucan as his illustrative thing [10].

As I note in the conclusion, I like Davis’ approach to the identity conundrum insofar as Web architecture and linked data guidance are concerned. But here my purpose is more subtle: I want to tease apart still further the apparent distinction between an electronic description of something on the Web and the “actual” something. Like Davis, let’s use the toucan.

In our strawman case, we too use a description of the toucan (on Wikipedia) to represent our “information resource” (the accessible, downloadable electronic document). We contrast to that a URI that we mean to convey the actual physical bird (a “non-information resource” in the jumbled jargon of httpRange-14), which we will designate via the URI of http://example.com/toucan.

Despite the tortured (and newly conjured) distinction between “information resource” and “non-information resource”, the first blush reaction is that, sure, there is a difference between an electronic representation that can be accessed and viewed on the Web and its true, “actual” thing. Of course people can not actually be rendered and downloaded on the Web, but their bios and descriptions and portrait images may. While in the abstract such distinctions appear true and obvious, in the specifics that get presented to experts, there is surprising disagreement as to what is actually an “information resource” v. a “non-information resource” [19]. Moreover, as we inspect the real toucan further, even that distinction is quite ambiguous.

When we inspect what might be a definitive description of “toucan” on Wikipedia, we see that the term more broadly represents the family of Ramphastidae, which contains five genera and forty different species. The picture we are showing to the right is but of one of those forty species, that of the keel-billed toucan (Ramphastos sulfuratus). Viewing the images of the full list of toucan species shows just how divergent these various “physical birds” are from one another. Across all species, average sizes vary by more than a factor of three with great variation in bill sizes, coloration and range. Further, if I assert that the picture to the right is actually that of my pet keel-billed toucan, Pretty Bird, then we can also understand that this representation is for a specific individual bird, and not the physical keel-billed toucan species as a whole.

The point of this diversion is not a lecture on toucans, but an affirmation that distinctions between “resources” occur at multiple levels and dimensions. Just as there is no self-evident criteria as to what constitutes an “information resource”, there is also not a self-evident and fully defining set of criteria as to what is the physical “toucan” bird. The meaning of what we call a “toucan” bird is not embodied in its label or even its name, but in the context and accompanying referential information that place the given referent into a context that can be communicated and understood. A URI points to (“refers to”) something that causes us to conjure up an understanding of that thing, be it a general description of a toucan, a picture of a toucan, an understanding of a species of toucan, or a specific toucan bird. Our understanding or interpretation results from the context and surrounding information accompanying the reference.

In other words, a “resource” may be anything, which is just the way the W3C has defined it. There is not a single dimension which, magically, like “information” and “non-information,” can cleanly and definitely place a referent into some state of absolute understanding. To assert that such magic distinctions exist is a flaw of Cartesian logic, which can only be reconciled by looking to more defensible bases in logic [20].

Peirce and the Logic of Signs

The logic behind these distinctions and nuances leads us to Charles Sanders Peirce (1839 – 1914). Peirce (pronounced “purse”) was an American logician, philosopher and polymath of the first rank. Along with Frege, he is acknowledged as the father of predicate calculus and the notation system that formed the basis of first-order logic. His symbology and approach arguably provide the logical basis for description logics and other aspects underlying the semantic Web building blocks of the RDF data model and, eventually, the OWL language. Peirce is the acknowledged founder of pragmatism, the philosophy of linking practice and theory in a process akin to the scientific method. He was also the first formulator of existential graphs, an essential basis to the whole field now known as model theory. Though often overlooked in the 20th century, Peirce has lately been enjoying a renaissance with his voluminous writings still being deciphered and published.

The core of Peirce’s world view is based in semiotics, the study and logic of signs. In his seminal writing on this, “What is in a Sign?” [21], he wrote that “every intellectual operation involves a triad of symbols” and “all reasoning is an interpretation of signs of some kind”. Peirce had a predilection for expressing his ideas in “threes” throughout his writings.

Semiotics is often split into three branches: 1) syntactics – relations among signs in formal structures; 2) semantics – relations between signs and the things to which they refer; and 3) pragmatics – relations between signs and the effects they have on the people or agents who use them.

Peirce’s logic of signs in fact is a taxonomy of sign relations, in which signs get reified and expanded via still further signs, ultimately leading to communication, understanding and an approximation of “canonical” truth. Peirce saw the scientific method as itself an example of this process.

A given sign is a representation amongst the triad of the sign itself (which Peirce called a representamen, the actual signifying item that stands in a well-defined kind of relation to the two other things), its object and its interpretant. The object is the actual thing itself. The interpretant is how the agent or the perceiver of the sign understands and interprets the sign. Depending on the context and use, a sign (or representamen) may be either an icon (a likeness), an indicator or index (a pointer or physical linkage to the object) or a symbol (understood convention that represents the object, such as a word or other meaningful signifier).

An interpretant in its barest form is a sign’s meaning, implication, or ramification. For a sign to be effective, it must represent an object in such a way that it is understood and used again. This makes the assignment and use of signs a community process of understanding and acceptance [20], as well as a truth-verifying exercise of testing and confirming accepted associations.

John Sowa has done much to help make some of Peirce’s obscure language and terminology more accessible to lay readers [22]. He has expressed Peirce’s basic triad of sign relations as follows, based around the Yojo animist cat figure used by the character Queequeg in Herman Melville’s Moby-Dick:

In this figure, object and symbol are the same as the Peirce triad; concept is the interpretant in this case. The use of the word ‘Yojo’ conjures the concept of cat.

This basic triad representation has been used in many contexts, with various replacements or terms at the nodes. Its basic form is known as the Meaning Triangle, as was popularized by Ogden and Richards in 1923 [23].

The key aspect of signs for Peirce, though, is the ongoing process of interpretation and reference to further signs, a process he called semiosis. A sign of an object leads to interpretants, which, as signs, then lead to further interpretants. In the Sowa example below, we show how meaning triangles can be linked to one another, in this case by abstracting that the triangles themselves are concepts of representation; we can abstract the ideas of both concept and symbol:

We can apply this same cascade of interpretation to the idea of the sign (or representamen), which in this case shows that a name can be related to a word symbol, which in itself is a combination of characters in a string called ‘Yojo’:

According to Sowa [22]:

“What is revolutionary about Peirce’s logic is the explicit recognition of multiple universes of discourse, contexts for enclosing statements about them, and metalanguage for talking about the contexts, how they relate to one another, and how they relate to the world and all its events, states, and inhabitants.

“The advantage of Peircean semiotics is that it firmly situates language and logic within the broader study of signs of all types. The highly disciplined patterns of mathematics and logic, important as they may be for science, lie on a continuum with the looser patterns of everyday speech and with the perceptual and motor patterns, which are organized on geometrical principles that are very different from the syntactic patterns of language or logic.”

Catherine Legg [20] notes that the semiotic process is really one of community involvement and consensus. Each understanding of a sign and each subsequent interpretation helps come to a consensus of what a sign means. It is a way of building a shared understanding that aids communication and effective interpretation. In Peirce’s own writings, the process of interpretation can lead to validation and an eventual “canonical” or normative interpretation. The scientific method itself is an extreme form of the semiotic process, leading ultimately to what might be called accepted “truths”.

Peircean Semiotics of URIs

So, how do Peircean semiotics help inform us about the role and use of URIs? Does this logic help provide guidance on the “identity crisis”?

The Peircean taxonomy of signs has three levels with three possible sign roles at each level, leading to a possible 27 combinations of sign representations. However, because not all sign roles are applicable at all levels, Peirce actually postulated only ten distinct sign representations.

Common to all roles, the URI “sign” is best seen as an index: the URI is a pointer to a representation of some form, be it electronic or otherwise. This representation bears a relation to the actual thing that this referent represents, as is true for all triadic sign relationships. However, in some contexts, again in keeping with additional signs interpreting signs in other roles, the URI “sign” may also play the role of a symbolic “name” or even as a signal that the resource can be downloaded or accessed in electronic form. In other words, by virtue of the conventions that we choose to assign to our signs, we can supply additional information that augments our understanding of what the URI is, what it means, and how it is accessed.

Of course, in these regards, a URI is no different than any other sign in the Peircean world view: it must reside in a triadic relationship to its actual object and an interpretation of that object, with further understanding only coming about by the addition of further signs and interpretations.

In shortened form, this means that a URI, acting alone, can at most play the role of a pointer between an object and its referent. A URI alone, without further signs (information), can not inform us well about names or even what type of resource may be at hand. For these interpretations to be reliable, more information must be layered on, either by accepted convention of the current signs or the addition of still further signs and their interpretations. Since the attempts to deal with the nature of a URI resource by fiat as stipulated by httpRange-14 neither meet the standards of consensus nor empirical validity, the attempt can not by definition become “canonical”. This does not mean that httpRange-14 and its recommended practices can not help in providing more information and aiding interpretation for what the nature of a resource may be. But it does mean that httpRange-14 acting alone is insufficient to resolve ambiguity.

Moreover, what we see in the general nature of Peirce’s logic of signs is the usefulness of adding more “triads” of representation as the process to increase understanding through further interpretation. Kind of sounds like adding on more RDF triples, does it not?

Global is Neither Indiscriminate Nor Unambiguous

Names, references, identity and meaning are not absolutes. They are not philosophically, and they are not in human language. To expect machine communications to hold to different standards and laws than human communications is naive. To effect machine communications our challenge is not to devise new rules, but to observe and apply the best rules and practices that human communications instruct.

There has been an unstated hope at the heart of the semantic Web enterprise that simply expressing statements in the right way (syntax) and in the right form (RDF) is sufficient to facilitate machine communications. But this hope, too, is naive and silly. Just as we do not accept all human utterances as truth, neither will we accept all machine transmissions as reliable. Some of the information will be posted in error; some will be wrong or ill-fitting to our world view; some will be malicious or intended to deceive. Spam and occasionally lousy search results on the Web tell us that Web documents are subject to these sources of unsuitability, why is not the same true of data?

Thus, global data access via the semantic Web is not — and can never be — indiscriminate nor unambiguous. We need to understand and come to trust sources and provenance; we need interpretation and context to decide appropriateness and validity; and we need testing and validation to ensure messages as received are indeed correct. Humans need to do these things in their normal courses of interaction and communication; our machine systems will need to do the same.

These confirmations and decisions as to whether the information we receive is actionable or not will come about via still more information. Some of this information may come about via shared conventions. But most will come about because we choose to provide more context and interpretation for the core messages we hope to communicate.

A Go-Forward Approach

Nearly five years ago Hayes and Halpin put forth a proposal to add ex:refersTo and ex:describedBy to the standard RDF vocabulary as a way for authors to provide context and explanation for what constituted a specific RDF resource [11]. In various ways, many of the other individuals cited in this article have come to similar conclusions. The simple redirect suggestions of both Ian Davis [10] and Ed Summers [12] appear particularly helpful.

Over time, we will likely need further representations about resources regarding such things as source, provenance, context and other interpretations that would help remove ambiguities as to how the information provided by that resource should be consumed or used. These additional interpretations can mechanically be provided via referenced ontologies or embedded RDFa (or similar). These additional interpretations can also be aided by judicious, limited additions of new predicates to basic language specifications for RDF (such as the Hayes and Halpin suggestions).

In the end, of course, any frameworks that achieve consensus and become widely adopted will be simple to use, easy to understand, and straightforward to deploy. The beauty of best practices in predicates and annotations is that failures to provide are easy to test. Parties that wish to have their data consumed have incentive to provide sufficient information so as to enable interpretation.

There is absolutely no reason that these additions can not co-exist with the current httpRange-14 approach. By adding a few other options and making clear the optional use of httpRange-14, we would be very Peirce-like in our go-forward approach: We are being both pragmatic while we add more means to improve our interpretations for what a Web resource is and is meant to be.

[1] Throughout intellectual history, a number of prominent philosophers and logicians have attempted to describe naming, identity and reference of objects and entities. Here are a few that you may likely encounter in various discussions of these topics in reference to the semantic Web; many are noted philosophers of language:

Aristotle (384 BC – 322 BC) – founder of formal logic; formulator and proponent of categorization; believed in the innate “universals” of various things in the natural world
Rudolf Carnap (1891 – 1970) – proposed a logical syntax that provided a system of concepts, a language, to enable logical analysis via exactly formula; a basis for natural language processing;rejected the idea and use of metaphysics
René Descartes (1596 – 1650) – posited a boundary between mind and the world; the meaning of a sign is the intension of its producer, and is private and incorrigible
Friedrich Ludwig Gottlob Frege (1848 – 1925) – one of the formulators of first-order logic, though syntax not adopted; advocated shared senses, which can be objective and sharable
Kurt Gödel (1906 – 1978) – his two incompleteness theorems are some of the most important logic contributions of all time; they establish inherent limitations of all but the most trivial axiomatic systems capable of doing arithmetic, as well as for computer programs
David Hume (1711 – 1776) – embraced natural empiricism, but kept the Descartes concept of an “idea”
Immanuel Kant (1724 – 1804) – one of the major philosophers in history, argued that experience is purely subjective without first being processed by pure reason; a major influence on Peirce
Saul Kripke (1940 – ) – proposed the causal theory of reference and what proper names mean via a “baptism” by the namer
Gottfried Wilhelm Leibniz (1646 – 1716) – the classic definition of identity is Leibniz’s Law, which states that if two objects have all of their properties in common, they are identical and so only one object
Richard Montague (1930 – 1971) – wrote much on logic and set theory; student of Tarski; pioneered a logical approach to natural language semantics; associated with model theory, model-theoretic semantics
Charles Sanders Peirce (1839 – 1914) – see main text
Willard Van Orman Quine (1908 – 2000) – noted analytical philosopher, advocated the “radical indeterminancy of translation” (can never really know)
Bertrand Russell (1872 – 1970) – proposed the direct theory of reference and what it means to “ground in references”; adopted many Peirce arguments without attribution
Ferdinand de Saussure (1857 – 1913) – also proposed an alternative view to Peirce of semiotics, one grounded in sociology and linguistics
John Rogers Searle (1932 – ) – argues that consciousness is a real physical process in the brain and is subjective; has argued against strong AI (artificial intelligence)
Alfred Tarski (1901 – 1983) – analytic philosopher focused on definitions of models and truth; great admirer of Peirce; associated with model theory, model-theoretic semantics
Ludwig Josef Johann Wittgenstein (1889 – 1951) – he disavowed his earlier work, arguing that philosophy needed to be grounded in ordinary language, recognzing that the meaning of words is dependent on context, usage, and grammar.

Also, Umberto Eco has been a noted proponent and popularizer of semiotics.

[2] As any practitioner ultimately notes, standards development is a messy, lengthy and trying process. Not all individuals can handle the messiness and polemics involved. Personally, I prefer to try to write cogent articles on specific issues of interest, and then leave it to others to slug it out in the back rooms of standards making. Where the process works well, standards get created that are accepted and adopted. Where the process does not work well, the standards are not embraced as exhibited by real-world use.

[3] Tim Berners-Lee, 2007. What Do HTTP URIs Identify?

This article does not discuss the other sub-category of URIs, URNs (for names). URNs may refer to any standard naming scheme (such as ISBNs for books) and has no direct bearing on any network access protocol, as do URLs and URIs when they are referenceable. Further, URNs are little used in practice.

[4] Kendall Clark was one of the first to question “resource” and other identity ambiguities, noting the tautology between URI and resource as “anything that has identity.” See Kendall Clark, 2002. “Identity Crisis,” in XML.com, Sept 11 2002; see http://www.xml.com/pub/a/2002/09/11/deviant.html. From the topic map community, one notable contribution was from Steve Pepper and Sylvia Schwab, 2003. “Curing the Web’s Identity Crisis,” found at : http://www.ontopia.net/topicmaps/materials/identitycrisis.html.

[5] Sandro Hawke, 2003. Disambiguating RDF Identifiers. W3C, January 2003. See http://www.w3.org/2002/12/rdf-identifiers/.

[6] The issue was framed as what is the proper “range” for HTTP referrals and was also the 14th major TAG issue recorded, hence the name. See further the httpRange-14 Webography .

[7] See W3C, “httpRange-14: What is the range of the HTTP dereference function?”; see http://www.w3.org/2001/tag/issues.html#httpRange-14.

[8] See http://www.ibiblio.org/hhalpin/irw2006/.

[9] Leo Sauermann and Richard Cyganiak, eds., 2008. Cool URIs for the Semantic Web, W3C Interest Group Note, December 3, 2008. See http://www.w3.org/TR/cooluris/.

[10] Ian Davis, 2010. Is 303 Really Necessary? Blog post, November 2010, accessed 20 January 2012. (See http://blog.iandavis.com/2010/11/04/is-303-really-necessary/.) A considerable thread resulted from this post; see http://markmail.org/thread/mkoc5kxll6bbjbxk.

[11] See first Harry Halpin, 2006. “Identity, Reference and Meaning on the Web,” presented at WWW 2006, May 23, 2006. See http://www.ibiblio.org/hhalpin/irw2006/hhalpin.pdf. This was then followed up with greater elaboration by Patrick J. Hayes and Harry Halpin, 2007. “In Defense of Amibiguity,” http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html.

[12] Ed Summers, 2010. Linking Things and Common Sense, blog post of July 7, 2010. See http://inkdroid.org/journal/2010/07/07/linking-things-and-common-sense/.

[13] Xiaoshu Wang, 2007. URI Identity and Web Architecture Revisited, Word document posted on posterous.com, November 2007. (Former Web documents have been removed.)

[14] David Booth, 2006. “URIs and the Myth of Resource Identity,” see http://dbooth.org/2006/identity/.

[15] See Larry Masinter, 2012. “The ‘tdb’ and ‘duri’ URI Schemes, Based on Dated URIs,” 10th version, IETF Network Working Group Internet-Draft,January 12, 2012. See http://tools.ietf.org/html/draft-masinter-dated-uri-10.

[16] Jonathan Rees has been the scribe and author for many of the background documents related to Issue 57. A recent mailing list entry provides pointers to four relevant documents in this entire discussion. See Jonathan A Rees, 2012. Guide to ISSUE-57 (httpRange-14) document suiteJanuary, 21, 2012.

[17] At least twenty domain names, led by insure.com, have sold for more the $2 million each; see this Wikipedia listing.

[18] In the wonderful movie, The Gods, They Must be Crazy, Bushmen in the Kalahari Desert one day find an unbroken glass Coke bottle that had been thrown out of an airplane. Initially, this strange artifact seems to be another boon from the gods, and the Bushmen find many uses for it. But unlike anything that they have had before, there is only one bottle to go around. This creates jealousy, envy, anger, hatred, even violence. The protagonist, Xi, decides that the bottle is an evil thing and must be thrown off of the edge of the world. The hilarity of the movie comes from that premise and Xi’s encounters with the modern world as he pursues his quest with the magic bottle.

[19] Wang [13]rhetorically asked which of the following things would be categorized as an “information resource”:

A book
A clock
The clock on the wall of my bedroom
A gene
The sequence of a gene
A software
A service
A namespace
An ontology
A language
A number
A concept, such as Dublin Core’s creator.

See the 2007 thread on this issue, mostly by Sean Palmer and Noah Mendelsohn, the latter aknowledging that various experts may only agree on 85% of the items.

[20] See further Catherine Legg, 2010. “Pragmaticsm on the Semantic Web,” in Bergman, M., Paavola, S., Pietarinen, A.-V., & Rydenfelt, H. eds., Ideas in Action: Proceedings of the Applying Peirce Conference, pp. 173–188. Nordic Studies in Pragmatism 1. Helsinki: Nordic Pragmatism Network. See http://www.nordprag.org/nsp/1/Legg.pdf.

[21] Charles Sanders Peirce, 1894. “What is in a Sign?”, see http://www.iupui.edu/~peirce/ep/ep2/ep2book/ch02/ep2ch2.htm.

[22] The figures in particular are from John F. Sowa, 2000. “Ontology, Metadata, and Semiotics,” presented at ICCS 2000 in Darmstadt, Germany, on August 14, 2000; published in B. Ganter & G. W. Mineau, eds., Conceptual Structures: Logical, Linguistic, and Computational Issues, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81. May be found at http://www.jfsowa.com/ontology/ontometa.htm. Also see John F. Sowa, 2006. “Peirce’s Contributions to the 21st Century,” presented at International Conference on Conceptual Structures, Aalborg, Denmark, July 17, 2006. See http://www.jfsowa.com/pubs/csp21st.pdf.

[23] C.K. Ogden and I. A. Richards, 1923. The Meaning of Meaning, Harcourt, Brace, and World, New York, 8th edition 1946.

Posted:December 12, 2011

The State of Tooling for Semantic Technologies

Number of Semantic Web Tools Passes 1000 for First Time; Many Other Changes

We have been maintaining Sweet Tools, AI3‘s listing of semantic Web and -related tools, for a bit over five years now. Though we had switched to a structWSF-based framework that allows us to update it on a more regular, incremental schedule [1], like all databases, the listing needs to be reviewed and cleaned up on a periodic basis. We have just completed the most recent cleaning and update. We are also now committing to do so on an annual basis.

Thus, this is the inaugural ‘State of Tooling for Semantic Technologies‘ report, and, boy, is it a humdinger. There have been more changes — and more important changes — in this past year than in all four previous years combined. I think it fair to say that semantic technology tooling is now reaching a mature state, the trends of which likely point to future changes as well.

In this past year more tools have been added, more tools have been dropped (or abandoned), and more tools have taken on a professional, sophisticated nature. Further, for the first time, the number of semantic technology and -related tools has passed 1000. This is remarkable, given that more tools have been abandoned or retired than ever before.

Click here to browse the Sweet Tools listing. There is also a simple listing of URL links and categories only.

We first present our key findings and then overall statistics. We conclude with a discussion of observed trends and implications for the near term.

Key Findings

Some of the key findings from the 2011 State of Tooling for Semantic Technologies are:

As of the date of this article, there are 1010 tools in the Sweet Tools listing, the first it has passed 1000 total tools
A total of 158 new tools have been added to the listing in the last six months, an increase of 17%
75 tools have been abandoned or retired, the most removed at any period over the past five years
A further 6%, or 55 tools, have been updated since the last listing
Though open source has always been an important component of the listing, it now constitutes more than 80% of all listings; with dual licenses, open source availability is about 83%. Online systems contribute another 9%
Key application areas for growth have been in SPARQL, ontology-related areas and linked data
Java continues to dominate as the most important language.

Many of these points are elaborated below.

The Statistical Picture

The updated Sweet Tools listing now includes nearly 50 different tools categories. The most prevalent categories, each with over 6% of the total, are information extraction, general RDF tools, ontology tools, browser tools (RDF, OWL), and parsers or converters. The relative share by category is shown in this diagram (click to expand):

Since the last listing, the fastest growing categories have been SPARQL, linked data, knowledge bases and all things related to ontologies. The relative changes by tools category are shown in this figure:

Though it is true that some of this growth is the result of discovery, based on our own tool needs and investigations, we have also been monitoring this space for some time and serendipity is not a compelling explanation alone. Rather, I think that we are seeing both an increase in practical tools (such as for querying), plus the trends of linked data growth matched with greater sophistication in areas such as ontologies and the OWL language.

The languages these tools are written in have also been pretty constant over the past couple of years, with Java remaining dominant. Java has represented half of all tools in this space, which continues with the most recent tools as well (see below). More than a dozen programming or scripting languages have at least some share of the semantic tooling space (click to expand):

With only 160 new tools it is hard to draw firm trends, but it does appear that some languages (Haskell, XSLT) have fallen out of favor, while popularity has grown for Flash/Flex (from a small base), Python and Prolog (with the growth of logic tools):

Last Year Change in Sweet Tools Languages

PHP will likely continue to see some emphasis because of relations to many content management systems (WordPress, Drupal, etc.), though both Python and Ruby seem to be taking some market share in that area.

New Tools

The newest tools added to the listing show somewhat similar trends. Again, Java is the dominant language, but with much increased use of JavaScript and Python and Prolog:

The higher incidence of Prolog is likely due to the parallel increase in reasoners and inference engines associated with ontology (OWL) tools.

The increase in comprehensive tool suites and use of Eclipse as a development environment would appear to secure Java’s dominance for some time to come.

Trends and Observations

These dry statistics tend to mask the feel one gets when looking at most of the individual tools across the board. Older academic and government-funded project tools are finally getting cleaned out and abandoned. Those tools that remain have tended to get some version upgrades and improved Web sites to accompany them.

The general feel one gets with regard to semantic technology tooling at the close of 2011 has these noticeable trends:

A three-tiered environment – the tools seem to segregate into: 1) a bottom tier of tools (largely) developed by individuals or small groups, now most often found on Google Code or Github; 2) a middle-tier of (largely) government-funded projects, sometimes with multiple developers, often older, but with no apparent driving force for ongoing improvements or commercialization; and 3) a top-tier of more professional and (often) commercially-oriented tools. The latter category is the most noticeable with respect to growth and impact
Professionalism – the tools in the apparent top tiers feel to have more professionalism and better (and more attractive) packaging. This professionalism is especially true for the frameworks and composite applications. But, it also applies to many of the EU-funded projects from Europe, which has always been a huge source of new tool developments
More complete toolsets – similarly, the upper levels of tools are oriented to pragmatic problems and problem-solving, which often means they embody multiple functions and more complete tooling environments. This category actually appears to be the most visible one exhibiting growth
Changing nature of academic releases – yet, even the academic releases seem to be increasing in professionalism and completeness. Though in the lowest tier it is still possible to see cursory or experimental tool releases, newer academic releases (often) seem to be more strategically oriented and parts of broader programmatic emphases. Programs like AKSW from the University of Leipzig or the Freie Universität Berlin or Finland’s Semantic Computing Research Group (SeCo), among many others, tend to be exemplars of this trend
Rise of commercial interests and enterprise adoption – the growing maturity of semantic technologies is also drawing commercial interest, and the incubation of new start-ups by academic and research institutions acts to reinforce the above trends. Promising projects and tools are now much more likely to be spun off as potential ventures, with accompanying better packaging, documentation and business models
Multiple languages and applications – with this growing complexity and sophistication has also come more complicated apps, combining multiple languages and functions. In fact, for some time the Sweet Tools listing has been justifiably criticized by some as overly “simplifying” the space by classifying tools under (largely) single applications or single languages. By the 2012 survey, it will likely be necessary to better classify the tools using multiple assignments
Google code over SourceForge for open source (and an increase in Github, as well) – virtually all projects on SourceForge now feel abandoned or less active. The largest source of open source projects in the semantic technology space is now clearly Google Code. Though of a smaller footprint today, we are also seeing many of the newer open source projects also gravitate to Github. Open source hosting environments are clearly in flux.

I have said this before, and been wrong about it before, but it is hard to see the tooling growth curve continue at its current slope into the future. I think we will see many individual tools spring up on the open source hosting sites like Google and Github, perhaps at relatively the same steady release rate. But, old projects I think will increasingly be abandoned and older projects will not tend to remain available for as long a time. While a relatively few established open source standards, like Solr and Jena, will be the exception, I think we will see shorter shelf lives for most open source tools moving forward. This will lead to a younger tools base than was the case five or more years ago.

I also think we will continue to see the dominance of open source. Proprietary software has increasingly been challenged in the enterprise space. And, especially in semantic technologies, we tend to see many open source tools that are as capable as proprietary ones, and generally more dynamic as well. The emphasis on open data in this environment also tends to favor open source.

Yet, despite the professionalism, sophistication and complexity trends, I do not yet see massive consolidation in the semantic technology space. While we are seeing a rapid maturation of tooling, I don’t think we have yet seen a similar maturation in revenue and business models. While notable semantic technology start-ups like Powerset and Siri have been acquired and are clear successes, these wins still remain much in the minority.

[1] Please use the comments section of this post for suggesting new or overlooked tools. We will incrementally add them to the Sweet Tools listing. Also, please see the About tab of the Sweet Tools results listing for prior releases and statistics.

Posted:December 5, 2011

An Ontologies Architecture for Ontology-driven Apps

Ontology Modularization and Roles within an OSF Instance

For some time now, Structured Dynamics (SD) has been touting the unique advantages of ODapps, or ontology-driven applications [1]. ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies), as supplemented by UI and instruction sets and validations and rules. When these supplements are added to standard ontology functions, we collectively term them adaptive ontologies [2].

To further the discussion around ODapps, today we are publishing two new documents, using the semantic technology foundation of the open semantic framework. OSF is a comprehensive, open source stack of SD and external tools that provides a turnkey environment for enterprises to adopt semantic technologies and approaches. OSF has been designed from the ground up to be an ontology-driven application framework.

The first new document, posted on Fred Giasson’s blog, provides a detailed discussion of the dozen or so roles ontologies can play within an OSF installation. Fred’s document is geared more to specific properties and configurations useful to deploy this framework; that is, the “drivers” in an ODapp setting. The second new document — this one — is more of a broad overview of the modularization and architecture of the constituent ontologies that make up an OSF installation. Both documents have also been posted to SD’s open content TechWiki [3], which now has about 360 technical articles on understanding and implementing an OSF installation, importantly including its ontologies.

OSF Constituent Ontologies

As presently configured, an OSF installation may typically utilize most or all of the following internal ontologies:

The SCO Ontology (Semantic Component Ontology)
The WSF Ontology (Web Service Framework Ontology)
The AGGR Ontology (Aggregation Ontology)
The irON Ontology (Instance Record and Object Notation Ontology)
One or more domain ontologies, to capture the concepts and relationships for the purposes of a given OSF installation, and
Possibly UMBEL (optional) or other upper-level concept ontologies, used for linkages to external systems.

(Note: the internal wiki links to each of these ontologies also provides links to the actual ontology specifications on Github.)

Depending on the specific OSF installation, of course, multiple external ontologies may also be employed. Some of the common external ones used in an OSF installation are described by the external ontologies document on the TechWiki. These external ontologies are important — indeed essential in order to ensure linkage to the external world — but have little to do with internal OSF control structures. That is why the rest of this discussion is focused on internal ontologies only.

The OSF Ontologies Architecture

The actual relationships between these ontologies are shown in the following diagram. Note that the ontologies tend to cluster into two main areas:

Content (or domain) ontologies, which tend to embody more of the traditional ontology functions such as information interoperability. inferencing, reasoning and conceptual and knowledge capture of the applicable domain; and
Administrative ontologies, which govern internal application use and user interface interactions.

This ontology architecture supports the broader open semantic framework:

(click for full size)

The WSF ontology plays a special role in that it sets the overall permission and access rights to the other components and ontologies. The UMBEL ontology (or other upper-level ontologies that might be chosen) is also optional. Such vocabularies are included when interoperability with external applications or knowledge bases is desired.

Summary of OSF Roles

We can further disaggregate these ontology splits with respect to the specific dozen or so ontology roles discussed in Fred’s complementary piece on ontology roles in OSF. These dozen roles are shown by the rows with interactions marked for the various ontologies:

	S C O	A G G R	W S F	i r O N	D o m a i n	U M B E L
Define record descriptions				♦
Inform interface displays				♦	♦	♦
Integrate different data sources				♦	♦	♦
Define component selections	♦			♦	♦	♦
Define component behaviors	♦	♦
Guide template selection				♦	♦	♦
Provide reasoning and inference				♦	♦	♦
Guide content filtering (with and without inference)		♦	♦
Tag concepts in text documents				♦	♦	♦
Help organize and navigate Web portals					♦	♦
Manage datasets and ontologies			♦
Set access permissions and registrations			♦

One of the unique aspects of adaptive ontologies is their added role in informing user interfaces and supporting specific semantic tools. Note, for example, the role of the content ontologies in informing interface displays, as well as their use in tagging concepts (via information extraction). These additional roles are the reason that these ontologies are shown as straddling both content and administrative functions in the first figure.

See Fred’s piece to learn more about these dozen roles.

Interactions Are More Complex than Arrows

Naturally, a simple drawn arrow between ontologies (first figure) or a checkmark on a matrix (table above) can hide important details of how these interactions between ontologies and components actually work. In an earlier article, we discussed how the whole workflow takes place between users and user interface selections affecting the types of data returned by those selections, and then the semantic components (widgets) used to display them. This example interaction is shown by the following animation:

(click for full size)

The blue nodes show the ontology interactions. These, in turn, instruct how the various components (yellow) and code (green) need to operate. These interactions are the essence of an ontology-driven app. The software is expressively designed to respond to specifications in the ontology(ies) used, and the ontologies themselves embrace some additional properties specific to driving those apps.

Possible Future Directions

ODapps are a relatively new paradigm, from which we continue to learn more about uses and potentials. We have wanted to write the first versions of these two new documents for some time, but have held off as we learned and exploited further the latent potentials in this design. As it stands, we see further potentials in this approach, and will therefore be likely adding new ontologies and capabilities to the general system for some time.

Some of the areas that look promising to us include:

A generalized statistical ontology, especially as it can inform data displays in the semantic components
Even more capable widgets in business intelligence (BI) uses, with a concomitant expansion of the vocabulary (predicates and classes) in some of the underlying ontologies
More aggregation and summation functions supported by the AGGR ontology, and
Still further improved permissions and access layers in the WSF ontology.

These potentials arise from the native power of the design basis for ontology-driven apps. Conceptually, the design is simplicity itself. Operationally, the system is extremely flexibile and robust. Strategically, it means that development and specification efforts can now move from coding and programmers to ontologies and the subject matter users who define and depend on them. With these advantages, who can argue with that?

[1] For the most comprehensive discussion of ODapps, see M. K. Bergman, 2011. ” Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps‘ to see related content.

[2] See M.K. Bergman, 2009. “Ontologies as the ‘Engine’ for Data-Driven Applications“, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, but the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.

[3] Slight revisions of these documents have been posted to the TechWiki as Role and Use of Ontologies in OSF and OSF Ontologies Modularization and Architecture, respectively.

Posted:November 15, 2011

UMBEL Services, Part 4: structOntology

Improved Ontology Navigation and Management in Read-only and Editable Forms

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part four discusses structOntology, the online ontology viewing and management tool that is an integral part of the open semantic framework (OSF), the framework that hosts the UMBEL portal.

Ontologies are the central governing structure or “brains” of a semantic installation. As provided by the OSF framework, ontologies are also the basis for instructing user interface labels and how the interface behaves. The Web is about global access, immediacy, flexibility and adaptability. Why can’t our use of ontologies be the same?

Unlike similar tools of the past, structOntology exists on the same installation as the ontology that drives it. It is a backoffice ontology editing and management tool that is part of the conStruct tool suite, accessible via the OSF admin panel. There is no need to go off to a separate application, make changes, re-import, and then test. structOntology allows all of that to occur locally with the instance in which it resides. Also, there are some important functionality differences — especially finding and selecting stuff and search — that sets structOntology apart from existing, conventional tools.

Yet, that being said, structOntology is also not the complete Swiss Army knife for ontology management. It is designed for local and immediate use. Its spectrum of functionality is not as complete as other ontology frameworks (for example, supporting reasoners, consistency testers or plug-ins). So, for immmediate and locally relevant use, structOntology appears to be the appropriate tool. For more detailed ontology work or testing, other frameworks are perhaps more useful. And, in recognition of these roles, structOntology also has robust import and export capabilities that enable these dual local-detailed use scenarios. For these distinctions, see further the structOntology v Protégé? document.

structOntology comes in two versions. First, there is the read-only version, which can be made publicly available, that is a great aid to ontology navigation and discovery. This is the version viewable on the UMBEL portal. Second, there is an editable version, which is only available to administrators via a back office function within an OSF instance. Some screen shots of this version, plus pointers to more documentation about it, are provided below.

OWL API as a First-class Citizen

What enables OSF to treat ontologies as a first-class citizen — viewable and editable from within the applications in which they operate — results from the incorporation of the OWL API as one of the major engines underlying the structWSF Web services framework, the key foundational basis to an OSF installation. As noted in Part 2 of this series, the OWL API is one of the four major engines supporting structWSF:

The OWL API is the same engine used by Protégé 4, which is why both structOntology and Protégé are fully interoperable.

Besides interoperabilty, the use of the OWL API also means that other OWL API-based tools, such as reasoners or mappers, may be linked into the system. This design is in keeping with our normative view of an ontology tooling landscape, which Structured Dynamics keeps pursuing in a steady, incremental manner [2]. Further, because of its sibling engines, the OWL API and OSF are also able to leverage the other engines supporting structWSF, such as Solr for advanced search or efficient indexing in the RDF triplestore. (The advantages go both ways, too, such as for example enabling the OWL API to feed appropriate ontology specifications to the GATE text processing area for uses such as ontology-based information extraction [OBIE]). All of this makes for a most powerful and capable foundation to an OSF instance.

The Read-Only Version (UMBEL)

Since UMBEL is a reference ontology and the UMBEL portal is an access point to those references and specifications, we really don’t want casual users making modifications to the ontology [3]. For this reason, only a read-only version of structOntology is provided on the portal.

Access to the structOntology function occurs via the Ontology link on the UMBEL portal. Upon access, you are presented with the main structOntology interface:

The organization of the structOntology application presents all currently available and active ontologies listed in the left panel; UMBEL, of course, is the one selected here. Since this is a read-only version, only the View button shows up in the right-hand panel. (For the options available in the editable version, see below.)

View Option

Upon invoking the View option, the hierarchical tree for the selected ontology appears on the left; structural and definitions on the right.

You may expand the tree and explore the structure deeper by either clicking on the tree nodes in the left-hand panel or the item links in the right-hand panel. If there are further levels in the tree, you will get the JavaScript ‘working’ icon and then see the tree expanded with the new node information shown to the right.

Also note that your interaction with the structOntology application is recounted via the “breadcrumbs” listing at the upper left of the application. The green arrow icon allows you to expand or collapse various sections in the display.

Tooltips

The tree labels are themselves based on the preferred labels assigned to things. However, if you want to see the actual ontology URI reference, you can do so via the tooltip when mousing over the item:

The tooltip shows the full URI path (unique identifier) of the selected item.

Classes Tab

This example has been based on the Classes tab, which are the reference concepts in the UMBEL context. In read-only mode, the basic information presented is the tree structure, the item description and prefLabel, and super and sub class information in the right-hand panel. (More options are available in the editable version; see below.)

Properties Tab

Properties — that is the relations or predicates between items or nodes — are presnted in a similar manner to that for Classes. The Properties tab has the same basic layout and operations as the Classes tab, including similar right-hand panels:

The Editable Version

The editable version of structOntology shares all of the functionality of the read-only version. Besides adding editing capabilities, the editable version also has other functionality related to general ontology creation and management. There is separate documentation for the editable version; the examples below are from a different instance than UMBEL.

The editable version is accessed via the backoffice admin function within an OSF instance. When invoked, it also has more management options presented in the right-hand panel:

We’ll highlight some of the differences from the read-only version below.

Create New Option

The first notable addition is the ability to create ontologies (as well as to delete, or Remove, them):

The URL (such as http://purl.org/ontology/myont#) becomes the base URI for the new ontology. The new ontology is created with a basic structure, from which you only need fill in your new concepts or classes and relationships:

Basic stubbing is provided for the new ontology to help bootstrap its development (not shown). Once created, this new ontology also now appears on the available local ontologies when first invoking the structOntology application.

View Option

Most screens are quite similar to the read-only version with the obvious change of replacing labels with edit boxes. It is via these edit fields that the ontology becomes editable. This change is quite evident for the View screen:

Searching

Searching can take place on the currently active ontology or all loaded (available) ontologies. Note that selection was made above via the radiobutton under the search box.

Also, depending on settings, searching can also take place on only the preferred label, or on alternative labels or descriptions (in fact, all annotations). (This is part of the settings.)

When entering search terms, the system automatically attempts to complete the matching search phrase. A minimum of three entered characters guides this auto-completion functionality:

When search is initiated, the potential results list also auto-completes for what you have already typed into the search box. Upon selection of one of these items (or completion of the full search phrase), the structOntology system issues a search query to the remote server, which then acts to auto-populate the ontology tree on the left-hand panel. In this case, we have selected ‘communitiy facilities’:

The desired search results then automatically expand the ontology tree. This is really helpful for longer ontologies (the example one shown has about 3000 concepts and about 6000 axioms) and means quicker initial tree loading. Once completed, the (multiple) occurrences of the search item are shown in highlight throughout the tree.

Note this search is not necessarily restricted to the actual node label. Alternative labels and descriptions may also be used to find the search results. This greatly expands the findability of the search function. Here is a great example of matching the OWL API engine to Solr underneath a structWSF instance.

Tab Structure

The editable version of structOntology offers more detail in the right-hand panel when Viewing an item. These sections include:

Annotations
Structural relationships
Instances
Linkage to characteristics, and
Advanced settings.

Each section is editable. All have auto-complete. Each section may also be expanded or collapsed.

General Operations

Each panel has an expand and collapse arrow shown at the upper right of its panel. These causes the panel’s individual entries to either be exposed or hidden. At the right of each entry, new entries can be invoked with the green plus symbol; existing entries can be deleted with the red minus symbol. (See Structural Relationships below.)

In working with each panel, note that each entry also has the search and auto-complete features earlier noted. Drag-and-drop is also contextual into these panels or not, depending on the nature of the item selected in the left-hand panel (tree).

Annotations

Annotations provide the descriptions about the thing at hand and its associated metadata. (These are separately defined under the Properties tab, or as part of the imported ontology specification.) The available annotations are displayed in this panel when expanded:

Entries are simply provided by entering values into the text fields and then Saving.

Structural Relationships

The structural relationships are the means to set parent and child relations between concepts, as well as to instruct disjoint or equivalent class relations. The Structural Relationships panel is the key one for setting the interconnections within the graph structure at the heart of the governing ontology.

Most of the key structural relationships in OWL are provided by this panel. (However, note there are some additional and rarely used structural specifications in OWL. These must be set via a third-party external application. Such potential interactions are made possible via the flexible import and export options with structOntology).

Instances (Individuals)

Another right-hand panel provides the facility to assign individuals to the classes (or concepts) established under the prior two panels. In this case, we are looking at some specific ‘community facilities’ to assign to that concept:

As with the prior panels, a new instance may be added or discarded ones deleted. Individual instances and their characteristics may also be updated or changes.

Linkage to Characteristics

Another aspect to OSF ontologies is the ability to relate concepts to various metadata characteristics or attributes that might describe that concept’s instances. This relationship is done via the dedicated hasCharacteristic property, which is assigned via this right-hand panel:

This option has the specific behavior of allowing one or more properties (characteristics) to be asserted for a given a class (concept).

Advanced Options

Display and widget and other options are set under the Advanced Options panel. One item to note are the widgets that may be assigned for displaying a given information item:

The relationship of widgets (or semantic components) to information items is a deserving topic in its own right. For more information about this topic, see the semantic components category.

Contextual Drag-and-Drop

In edit mode, it is possible to drag items from the left-hand tree panel into the specifications at the right. This is contextual. In this first example, we see an attempt to drop a “class” result (or concept) into the annotation panel, which violates the structure of the system and is therefore not allowed (as shown by the visual red X cues):

However, if we drag and drop from the tree in an allowable structural definition, we get the visual green check as a cue the move is legal:

This functionality and feedback means that only allowable assignments can be dropped into a new structural definition.

Export Option

Another piece of functionality in the editable version is the export option. When invoked, Export brings up the save dialog with the ability to assign an ontology file name:

Upon saving, it stores the currently active ontology in RDF/XML format:

Export is not active in UMBEL do to the large size of the ontology. If you want to obtain it directly, you may do so from the UMBEL ontology CVS.

Import Option

An Import option is available in the editable version. structOntology import supports all OWL API serializations, specifically RDF/XML, N3, Manchester Syntax and Turtle. When import is invoked, a file open dialog is presented that enables you to find the ontology on your local hard drive:

The Import feature has no file extension limitations; make care to pick and assign the proper types for importation.

Via the Import and Export buttons, it is possible to work locally with structOntology while exporting to more capable third-party tools. Then, once use of those tools is complete, Import provides the ability to re-import the updated ontology back into the local collection.

File Options

Finally, as a server-based system accessed via Web services, there are some slightly different concepts necessary to keep in mind when using the editable version of structOntology. These distinctions need to be kept in mind because you might be working with the local version or the one on the main server. These file options are:

Save — saves all modifications on the file, on the server. Then, all modifications will be used if you do a Reload
Unload — removes the currently active ontology from the local instance, but does NOT remove it from the server. It merely acts to remove that ontology for local use in the current session
Remove — a full delete of the ontology, both locally and on the server
Update — recreates the serializations files created from these ontologies, like the .SRZ files used by structWSF and conStruct; the ironXML schema used by the semantic components, etc. The Update option is the most common one when updating an ontology locally, for which you want the persistent version on the remote server to be kept in sync
Reload — reloads the server version. If prior local work had not been updated, then a reload acts as a way to restore the remote instance to the local one without change..

These are all available via buttons under the main right-hand panel in structOntology and are more fully described in the edit version documentation.

Additional Information

Additional information on structOntology may be found in an online video:

structOntology intro video.

This is the fourth of a multi-part series on the newly updated UMBEL services. Other articles in this series are:

[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.

[2] See especially the second figure and the accompanying discussion in this document.

[3] The appropriate pathway for suggested changes to the UMBEL ontology itself is via its official mailing list.

Posted:November 9, 2011

UMBEL Services, Part 3: Concept Browser

The OSF Browser is Now More Configurable

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part three deals with the portal’s navigational tool, the concept or relation browser [2]. It is a favorite component of the open semantic framework (OSF).

Discovery and navigation in a graph structure — as is the basis of ontologies and the UMBEL structure — can be difficult. It is made even more difficult when the number of concepts in the object space is large. With 28,000 concepts, UMBEL is one of the largest public ontologies extant. The relation browser is designed specifically to address these difficulties.

The concept browser in UMBEL is invoked via the main menu option or by clicking on the browser icon [] shown in conjuction with a given concept. Here is an example for the concept ‘tractor’:

Note in this case that the More details … link brings you to a detailed concept view, as was covered in the previous part in this series.

With its extreme configurability and flexibility — see further below — the relation browser can be an essential foundation to an open semantic framework installation. But, the best part about the relation browser is that it is fun to use. Clicking bubbles and dynamically moving through the graph structure is like snorkeling through a massive school of silvery fish.

Origins of the Relation Browser

We have been featuring the relation browser since April 2008 when the first UMBEL sandbox was released:

The relation browser is the innovation of Moritz Stefaner, one of the best data and information visualization gurus around. He continues to innovate in large-scale information spaces, and is a frequent speaker at information visualization conferences. Moritz’s Web site and separate blog are each worth perusing for neat graphics and ideas.

Configurability

Since our first efforts with the browser, we have worked to extend its applicability and configurability. The relation browser can be downloaded separately from our semantic components code distribution site.

The relation browser is configured via an XML specification file. Separate specifications are available for the nodes (classes or concept) and connecting edges (predicates or properties). Here are the current configuration options:

NODE PARAMETERS
label	`label` is the label assigned to a given node; by default, the end of the URI of the type will be used as the label
displayNodeLabel	a Boolean value whether to display or hide a label for a specific node
tooltips	the tooltip to be displayed when mousing over a specific node
textFont	defines the font of the text label on the node; for example: “Verdana”
textColor	defines the color of the text label on the node; value in RGB hex format
textSize	defines the size of the text to display in the node
image	a URL to an image to use to display at the position of the node
shape	a shape of the node to display; available values are “circle”, “block”, “polygon”, “square”, “cross”, “X”, “triangleUp”, “triangleDown”, “triangleLeft”, “triangleRight”, “diamond”
lineWeight	defines the size of the line of the border for the node’s shape
lineColor	defines the color of the line of the border for the node’s shape; value in RGB hex format
fillColor	defines the color to use within the shape for the node; value in RGB hex format
radius	defines the radius of the node. The radius is an invisible boundary where the edges get attached
backgroundScaleFactor	scale factor for the node’s shape background; a scale factor of 1.25 means that it is 125% normal size
textScaleFactor	scale factor of the node’s text label
textOffsetX	X Offset where to start displaying the text within the node’s shape
textOffsetY	Y Offset where to start displaying the text within the node’s shape
textMultilines	multi-lines means that each word of a label is displayed on a single line
textMaxWidth	maximum width of the text; if longer, then it is truncated with an ellipsis (“…”) appended
textMaxHeight	maximum height of the text; if higher, then it is truncated with an ellipsis (“…”) appended
selectedNodeColorOverlay	defines a color to overlay on the center (selected) node of the graph; it is defined by a series of 4 different offsets [alpha, red, green, blue] ranging from -255 to 255 in relation to the base node’s values; can, for example, to make the central node of the graph brighter
overNodeColorOverlay	defines a color to overlay on a moused over node of the graph; it is defined by a series of 4 different offsets [alpha, red, green, blue] ranging from -255 to 255 in relation to the base node’s values; can, for example, to make a moused over node of the graph brighter

EDGE PARAMETERS
displayLabel	the `label` to display over the center of the edge
tooltipLabel	the tooltip to be displayed when mousing over a specific edge
directedArrowHead	defines the type of the arrow for the edge; available values are “none”, “triangle”, “lines”
textFont	defines the font of the text label on the edge
textColor	defined the color of the text label on the edge; value in RGB hex format
textSize	defines the size of the text to display on the edge
image	a URL to an image to use to display over the edge at middle of the two connected nodes
lineWeight	defines the size of the line for the edge connector
lineColor	defines the color of the line for the edge connector; value in RGB hex format

It is also possible to specify a breadcrumb in association with the browser.

Besides these configurations, the API for the relation browser also provides for methods to:

Link Nodes to Objects
Link Nodes to Displays

Via these mechanisms, the relation browser can become a central focal point for any OSF installation. See further the specifications for additional ideas and tips.

Some Other Examples

Here are some other examples of relation browsers you can see across the Web:

This is the third of a multi-part series on the newly updated UMBEL services. Other articles in this series are:

[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.

[2] Various clients and users have named this widget a number of things, including spider, concept explorer, relation browser and concept browser.

Main Links

Search

Coca-Cola, Toucans and Charles Sanders Peirce

‘Identity Crisis’, httpRange-14, and Issue 57

What’s in a Name?

What is a Resource?

Peirce and the Logic of Signs

Peircean Semiotics of URIs

Global is Neither Indiscriminate Nor Unambiguous

A Go-Forward Approach

Number of Semantic Web Tools Passes 1000 for First Time; Many Other Changes

Key Findings

The Statistical Picture

New Tools

Trends and Observations

Ontology Modularization and Roles within an OSF Instance

OSF Constituent Ontologies

The OSF Ontologies Architecture

Summary of OSF Roles

Interactions Are More Complex than Arrows

Possible Future Directions

Improved Ontology Navigation and Management in Read-only and Editable Forms

OWL API as a First-class Citizen

The Read-Only Version (UMBEL)

View Option

Tooltips

Classes Tab

Properties Tab

The Editable Version

Create New Option

View Option

Searching

Tab Structure

General Operations

Annotations

Structural Relationships

Instances (Individuals)

Linkage to Characteristics

Advanced Options

Contextual Drag-and-Drop

Export Option

Import Option

File Options

Additional Information

The OSF Browser is Now More Configurable

Origins of the Relation Browser

Configurability

Some Other Examples