The Time and Technology is Here to Stand Software Engineering on its HeadAs an information society we have become a software society. Software is everywhere, from our phones and our desktops, to our cars, homes and every location in between. The amount of software used worldwide is unknowable; we do not even have agreed measures to quantify its extent or value [1]. We suspect there are at least 1 billion lines of code that have accumulated over time [1,2]. On the order of $875 billion was spent worldwide on software in 2010, of which about half was for packaged software and licenses and the rest for programmer services, consulting and outsourcing [3]. In the U.S. alone, about 2 million people work as programmers or related [4].
It goes without saying that software is a very big deal.
No matter what the metrics, it is expensive to develop and maintain software. This is also true for open source, which has its own costs of ownership [5]. Designing software faster with fewer mistakes and more re-use and robustness have clearly been emphases in computer science and the discipline of programming from its inception.
This attention has caused a myriad of schools and practices to develop over time. Some of the earlier efforts included computer-aided software engineering (CASE) or Grady Booch’s (already cited in [1]) object-oriented design (OOD). Fourth-generation languages (4GLs) and rapid application development (RAD) were popular in the 1980s and 1990s. Most recently, agile software development or extreme programming have grabbed mindshare.
Altogether, there are dozens of software development philosophies, each with its passionate advocates. These express themselves through a variety of software development methodologies that might be characterized or clustered into the prototyping or waterfall or spiral camps.
In all instances, of course, the drivers and motivations are the same: faster development, more re-use, greater robustness, easier maintainability, and lower development costs and total costs of ownership.
For at least the past decade, ontologies and semantic Web-related approaches have also been part of this mix. A good summary of these efforts comes from Michael Uschold in an invited address at FOIS 2008 [6]. In this review, he points to these advantages for ontology-based approaches to software engineering:
These first four items are similar to the benefits argued for other software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:
In making these arguments, Uschold picks up on the “ontology-driven information systems” moniker first put forward by Nicola Guarino in 1998 [7]. The ideas around ODIS have had substantial impact on the semantic Web community, especially in the use of formal ontologies and modeling approaches. The FOIS series of conferences, and most recently the ODiSE series, have been spawned from these ideas. There is also, for example, a fairly rich and developed community working on the integration of UML via ontologies as the drivers or specifiers of software [8].
Yet, as Uschold is careful to point out, the idea of ODIS extends beyond software engineering to encompass all of information systems. My own categorization of how ontologies may contribute to information systems is:
When we look at this list from the standpoint of conventional software or software engineering, we see that #1 shares overlaps with conventional database roles and #2, #3 and #4 with conventional programmer or software engineering responsibilities. The other portions, however, are quite unique to ontology-based approaches.
For decades, issues related to how to develop apps better and faster have been proposed and argued about. We still have the same litany of challenges and issues from expense to re-use and brittleness. And, unfortunately, despite many methodologies du jour, we still see bottlenecks in the enterprise relating to such matters as:
Promises such as self-service reporting touted at the inception of data warehousing two decades ago are still to be realized [12]. Enterprises still require the overhead and layers of IT to write SQL for us and prepare and fix reports. If we stand back a bit, perhaps we can come to see that the real opportunity resides in turning the whole paradigm of software engineering upside down.
Our objective should not be software per se. Software is merely an intermediary artifact to accomplish some given task. Rather than engineering software, the focus should be on how to fulfill those tasks in an optimal manner. How can we keep the idea of producing software from becoming this generation’s new buggy whip example [13]?
For reasons we delve into a bit more below, it perhaps has required a confluence of some new semantic technologies and ontologies to create the opening for a shift in perspective. That shift is one from software as an objective in itself to one of software as merely a generic intermediary in an information task pipeline.
Though this shift may not apply (at least with current technologies) to transactional and process-based software, I submit it may be fundamental to the broad category of knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. These are the real areas where integration and reports and queries and analysis remain frustrating bottlenecks for knowledge workers. And, interestingly, these are also the same areas most amenable to embracing an open world (OWA) mindset [14].
If we stand back and take a systems perspective to the question of fulfilling functional KM tasks, we see that the questions are both broader and narrower than software engineering alone. They are broader because this systems perspective embraces architecture, data, structures and generic designs. The questions are narrower because software — within this broader context — can be now be generalized as artifacts providing the fulfillment of classes of functions.
Ontology-driven applications — or ODapps for short — based on adaptive ontologies are a topic we have been nibbling around and discussing for some time. In our oft-cited seven pillars of the semantic enterprise we devote two pillars specifically (#4 and #3, respectively) to these two components [15]. However, in keeping with the systems perspective relevant to a transition from software engineering to generic apps, we should also note that canonical data models (via RDF) and a Web-oriented architecture are two additional pillars in the vision.
ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies as noted under #1 above), as supplemented by the UI and instruction sets and validations and rules (as noted under #4 and #5 above). The combination of these specifications as provided by both properly constructed domain ontologies and supplementary utility ontologies is what we collectively term adaptive ontologies [16].
ODapps fulfill specific generic tasks, consistent with their bespoke design (#6 above) to respond to adaptive ontologies. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.
ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.
In fact, the widget idea from Web 2.0 is a key precursor to the ODapps design. What we see in Web 2.0 are dedicated single-purpose widgets that perform a display operation (such as Google Maps) based on the properly structured data fed to them (structured geolocational information in the case of GMaps).
In Structured Dynamics‘ early work with RDF-based applications by our predecessor company, Zitgist, we demonstrated how the basic Web 2.0 widget idea could be extended by “triggering” which kind of mashup widget got invoked by virtue of the data type(s) fed to it. The Query Builder presented contextual choices for how to build a SPARQL query via UI based on what prior dropdown list choices were made. The DataViewer displayed results with different widgets (maps, profiles, etc.) depending on which part of a query’s results set was inspected (by responding to differences in data types). These two apps, in our opinion, remain some of the best developed in the semantic Web space, even though development on both ceased nearly four years ago.
This basic extension of data-driven applications — as informed by a bit more structure — naturally evolved into a full ontology-driven design. We discovered that — with some minor best practice additions to conventional ontologies — we could turn ontologies into powerhouses that informed applications through:
Like the earlier Zitgist discoveries, basing the applications on only one or two canonical data models and serializations (RDF and a simple data exchange XML, which Fred Giasson calls structXML) provides the input uniformity to make a library of generic applications tractable. And, embedding the entire framework in a Web-oriented architecture means it can be distributed and deployed anywhere accessible by HTTP.
Booch has maintained for years that in software design abstraction is good, but not if too abstract [1]. ODapps are a balanced abstraction within the framework of canonical architectures, data models and data structures. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures. The KM emphasis can shift from programming and software to logic and terminology [16].
In the sub-sections below, we peel back some portions of this layered design to unveil how some of these major pieces interact.
Again, to cite Booch, the most fundamental software design decision is architecture [1]. In the case of Structured Dynamics and its support for ODapps, its open semantic framework (OSF) is embedded in a Web-oriented architecture (WOA). The OSF itself is a layered design that proceeds from a kernel of existing assets (data and structures) and proceeds through conversion to Web service access, and then ontology organization and management via ODapps [17]. The major layers in the OSF stack are:
Not all of these layers or even their specifics is necessary for an ontology-driven app design [18]. However, the general foundations of generic apps, properly constructed adaptive ontologies, and canonical data models and structures should be preserved in order to operationalize ODapps in other settings.
The power of this design is that by swapping out adaptive ontologies and relevant data, the entire OSF stack as is can be used to deploy multiple instantiations. Potential uses can be as varied as the domain coverage of the domain ontologies that drive this framework.
The OSF semantic framework is a completely open and generic one. The same set of tools and capabilities can be applied to any domain that needs to manage and understand information in its own domain. With the existing ODApps in hand, this includes from unstructured text or documents to conventional structured databases.
What changes from domain to domain are the data structures (the ontologies, schema and entity references) and their instance data (which can also be converted from existing to canonical forms). Here is an illustration of how this generic framework can be leveraged for different deployments. Note that Citizen Dan is a local government example of the OSF framework with relatively complete online demos:
(click for full size)
Structured Dynamics continues to wrinkle this basic design for different clients and different industries. As we round out the starting set of ODapps (see below), the major effort in adapting this generic design to different uses is to tailor the ontologies and “RDFize” existing data assets.
Conversion of existing assets to RDF and canonical forms is not discussed further here. See the irON and scones documentation or the TechWiki for more information on these topics.
The first suite of ODapps occurs at the structWSF Web services layer. structWSF provides a set of generic functions and endpoints to:
Here is a listing of current ODapp functions within structWSF (with links to details for each):
| WSF management Web services |
User-oriented Web services
|
At this level the information access and processing is done largely on the basis of structured results sets. Other visualization and display ODapps are listed in the next subsection.
The visualization and data display and manipulation ODapps are provided via the semantic components layer. Structured Dynamics’s sComponents are Flex-based widgets that conform to a standard, generic design. Other developers using the OSF framework are developing JavaScript versions [19]. Here is the current library (with links to details for each):
| New Components |
Components Extending Flex
|
|
These components can be used in combination with any of the structWSF ODapps, meaning the filtering, searching, browsing, import/export, etc., may be combined as an input or output option with the above.
The next animated figure shows how the basic interaction flow works with these components:
(click for full size)
Using the ODapp structure it is possible to either “drive” queries and results sets selections via direct HTTP request via endpoints (not shown) or via simple dropdown selections on HTML forms or Flex widgets (shown). This design enables the entire system to be driven via simple selections or interactions without the need for any programming or technical expertise.
As the diagram shows, these various sComponents get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.
An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the sComponents. When combined with the results set data, and attribute information in the irON ontology, plus the domain understanding in the domain ontology, a synthetic schema is constructed that instructs what the interface may do next. Here is an example schema:
(click for full size)
These instructions are then presented to the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas.
As new user interactions occur with the resulting displays and components, the iteration cycle is generated anew, again starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
Since self-service reporting has been such a disappointment [12], it is worth noting another aspect from this ODapp design. Every “thing” that can be presented in the interface can have a specific display template associated with it. Absent another definition, for example, any given “thing” will default to its parental type (which, ultimate, is “Thing”, the generic template display for anything without a definition; this generally defaults to a presentation of all attributes for the object).
However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:
| Thing | ||||||||||
| Product | ||||||||||
| Camera | ||||||||||
| Digital Camera | ||||||||||
| SLR Digital Camera | ||||||||||
| Olympus Evolt E520 |
At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.
This design is meant to provide placeholders for any “thing” in any domain, while also providing the latitude to tailor and customize to every “thing” in the domain.
It is critical that generic apps through an ODapp approach also provide the underpinnings for self-service reporting. The ultimate metric is whether consumers of information can create the reports they need without any support or intervention by IT.
The Mission Critical IT reference provided earlier [11] helps point to the potentials of this paradigm in a different way. Mission Critical also shows user interfaces contextually chosen based on prior selections. But they extend that advantage with context-specific analysis and validation through the SWRL rules-base semantic language. This is an exciting extension of the base paradigm that confirms the applicability of this approach to business intelligence and general enterprise analytics.
All of this points to a very exciting era for enterprise and consumer apps moving into the future. We perhaps should no longer talk about “killer apps”; we can shift our focus to the information we have at hand and how we want to structure and analyze it.
Using ontologies to write or specify code or to compete as an alternative to conventional software engineering approaches seems too much like more of the same. The systems basis in which such methodologies such as MDA reside have not fixed the enterprise software challenges of decades-long standing. Rather, a shift to generic applications driven by adaptive ontologies — ODapps — looks to shift the locus from software and programming to data and knowledge structures.
This democratization of IT means that everything in the knowledge management realm can become “self service.” We can create our own analyses; develop our own reports; and package and disseminate what we and our colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we can turn prior decades of software engineering practices on their head.
What Structured Dynamics and a handful of other vendors are showing is by no means yet complete. Our roster of ODapp widgets and templates still needs much filling out. The toolsets available for creating, maintaining, mapping and extending the ontologies underlying these systems are still woefully inadequate [20]. These are important development needs for the near term.
And, of course, none of this means the end of software development either. Process and transactions systems still likely reside outside of this new, emerging paradigm. Creating great and solid generic ODapps still requires software. Further, ODapps and their potential are completely silent on how we create that software and with what languages or methodologies. The era of software engineering is hardly at an end.
What is exceptionally powerful about the prospects in ontology-driven apps is to speed time to understanding and place information manipulation directly in the hands of the knowledge worker. This is a vision of information access and control that has been frustrated for decades. Perhaps, with ontologies and these semantic technologies, that vision is now near at hand.
A Seminal Release by SD and Ontotext; Links to Wikipedia and PROTONStructured Dynamics and Ontotext are pleased to announce — after four years of iterative refinement — the release of version 1.00 of UMBEL (Upper Mapping and Binding Exchange Layer). This version is the first production-grade release of the system. UMBEL’s current implementation is the result of much practical experience.
UMBEL is primarily a reference ontology, which contains 28,000 concepts (classes and relationships) derived from the Cyc knowledge base. The reference concepts of UMBEL are mapped to Wikipedia, DBpedia ontology classes, GeoNames and PROTON.
UMBEL is designed to facilitate the organization, linkage and presentation of heterogeneous datasets and information. It is meant to lower the time, effort and complexity of developing, maintaining and using ontologies, and aligning them to other content.
This release 1.00 builds on the prior five major changes in UMBEL v. 0.80 announced last November. It is open source, provided under the Creative Commons Attribution 3.0 license.
In broad terms, here is what is included in the new version 1.00:
rdf:type UMBEL’s basic vocabulary can also be used for constructing specific domain ontologies that can easily interoperate with other systems. This release sees a number of changes in the UMBEL vocabulary:
correspondsTo predicate has been added for nearly or approximate sameAs mappings (symmetric, transitive, reflexive) hasMapping predicate relatesToXXX predicates have been added to relate external entities or concepts to UMBEL SuperTypes The UMBEL Vocabulary defines three classes:
The UMBEL Vocabulary defines these properties:
correspondsTo isAbout isRelatedTo relatesToXXX (31 variants related to SuperTypes) isLike hasMapping hasCharacteristic isCharacteristicOf. The UMBEL vocabulary also has a significant reliance on SKOS, among other external vocabularies.
Here are links to various downloads, specifications, communities and assistance.
All documentation from the prior v 0.80 has been updated, and some new documentation has been added:
Major updates were made to the specifications and Annex G; Annex H is new. Minor changes were also made to Annexes A and B. All remaining Annexes only had minor header changes. All spec documents with minor or major changes were also versioned, with the earlier archives now date stamped.
All UMBEL files are listed on the Downloads and SVN page on the UMBEL Web site. The reference concept and mapping files may also be obtained from http://code.google.com/p/umbel/source/browse/#svn/trunk.
To learn more about UMBEL or to participate, here are some additional links:
These latest improvements to UMBEL and its mappings have been undertaken by Structured Dynamics and Ontotext. Support has also been provided by the European Union research project RENDER, which aims to develop diversity-aware methods in the ways Web information is selected, ranked, aggregated, presented and used.
This release continues the path to establish a gold standard between UMBEL and Wikipedia to guide other ontological, semantic Web and disambiguation needs. For example, the number of UMBEL reference concepts was expanded by some 36% from 20,512 to 27,917 in order to provide a more balanced superstructure for organizing Wikipedia content. And across all mappings, 60% of all UMBEL reference concepts (or 16,884) are now linked directly to Wikipedia via the new umbel:correspondsTo property. A later post will describe the design and importance of this gold standard in greater detail.
Next releases will expand this linkage and coverage, and bring in other important reference structures such as GeoNames and others. This version of UMBEL will also be incorporated into the next version of FactForge. We will also be re-invigorating the Web vocabulary access and Web services, and adding tagging services based on UMBEL.
We invite other players with an interest in reusable and broadly applicable vocabularies and reference concepts to join with us in these efforts.
Changes Instigated by UMBEL May Benefit OthersIn the semantic Web, arguably SKOS is the right vocabulary for representing simple knowledge structures [1] and OWL 2 is the right language for asserting axioms and ontological relationships. In the early days we chose a reliance on SKOS for the UMBEL reference concept ontology, because of UMBEL’s natural role as a knowledge structure. Most recently we also migrated UMBEL to OWL 2 to gain (among other reasons) the metamodeling advantages of “punning”, which allows us to treat things as either classes or instances depending on modeling needs, context and viewpoint [2].
But — until today — we could not get SKOS and OWL 2 to play together nicely. This meant we could not take full advantage of each language’s respective strengths.
Happily, that gap has now been closed. By action of a relatively minor change by the SKOS Work Group (WG) and the addition of a simple statement, SKOS more fully interoperates with OWL 2.
SKOS was and is purposefully designed to be simple and to stick to its knowledge system scope. The authors of the language avoided many restrictions and kept axioms regarding the language to a minimum. As a result, since its inception, in relation to OWL, the core SKOS has been expressed as OWL Full (that is, undecidable and more free-form in interpretation).
However, also for some time it has been understood that, with some relatively minor changes, the core SKOS language could be modified to be OWL DL (decidable). In fact, since June 2009 there has been a DL version of SKOS, called by the editors the “DL prune” [3]. A useful discussion of the specific axioms that cause the core SKOS to require interpretation as OWL Full is provided by the Semantic Web and Interoperability Group at the University of Patras [4].
Most of the conditions in question relate to the various annotation properties in SKOS, such as skos:prefLabel, skos:altLabel and skos:hiddenLabel. These are core constructs within the use of “semsets” to describe and refer to reference concepts in UMBEL [5]. Among others, a simple explanation for one issue is that these labels within SKOS are defined as sub-properties of rdfs:label, which is not allowable in OWL 2. Where such conflicts exist, they are removed (“pruned”) from the SKOS DL version. There are only a minor few of these, and not (in our view) central to the purpose or usefulness of SKOS.
For UMBEL, and we think other vocabularies, the transition to OWL 2 is important for many reasons, including the “punning” of individuals and classes. Other reasons include better handling of annotation properties, a better emerging set of tools, and the use of inference and reasoning engines.
Thus, to square this circle when using SKOS in OWL 2, it is important to refer to the DL version of SKOS and not SKOS core. However, since there was no version acknowledgment to the core in the SKOS DL namespace, it was not possible to make this reference while retaining general SKOS namespace (skos:XXX) compatibility.
My UMBEL co-editor, Fred Giasson, first brought this issue to the SKOS WG’s attention in November [6]. He further suggested a relatively simple means to resolve the problem. By including a version reference in the SKOS DL version to SKOS core, consistent with the current OWL 2 specification [7], OWL 2-compliant tools will now know how to discern the proper references. This fix looks as follows:
<rdf:Description rdf:about="http://www.w3.org/2004/02/skos/core">
<owl:versionIRI rdf:resource="http://www.w3.org/TR/skos-reference/skos-owl1-dl.rdf"/>
</rdf:Description>
Now, in our ontology (UMBEL), we define the skos namespace (N3 format):
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
and, then, import the SKOS DL version:
<http://umbel.org/umbel> rdf:type owl:Ontology ;
<-- statements --> ;
owl:imports <http://www.w3.org/TR/skos-reference/skos-owl1-dl.rdf> .
Pretty simple and straightforward.
With the expeditious treatment of the SKOS group [8], this proposal was floated and commented upon and then passed and adopted today. The change was also posted as an erratum to the SKOS specification [9]. As Fred reported [10], his testing of the fix with available tools (such as Protégé 4.1 and reasoners) also appears to be working properly (on a 58 MB file with 28 K concepts and all of their annotations and relationships!).
This is a good example of how even simple changes may make major differences. It also shows how even simple changes may take some time and effort in a standards-making environment. But, even though simple, we think this will be highly useful to keeping SKOS a central vocabulary within OWL 2-based systems. Within the context of conceptual and domain vocabularies, such as represented by UMBEL or other ontologies based on the UMBEL vocabulary or similar approaches, we see this as a major win.
Finally, as for UMBEL itself, this change has allowed us to remove duplicate predicates in favor of those from SKOS. It has also caused a delay in our release of UMBEL v. 1.00, planned for today, until February 15 to complete testing and documentation revisions. But we’ll gladly take a change like this in trade for a minor delay any day.
Thanks, SKOS!
Refining UMBEL’s Linking and Mapping Predicates with WikipediaWe are only days away from releasing the first commercial version 1.00 of UMBEL (Upper Mapping and Binding Exchange Layer) [1]. To recap, UMBEL has two purposes, both aimed to promote the interoperability of Web-accessible content. First, it provides a general vocabulary of classes and predicates for describing domain ontologies and external datasets. Second, UMBEL is a coherent framework of 28,000 broad subjects and topics (the “reference concepts”), which can act as binding nodes for mapping relevant content.
This last iteration of development has focused on the real-world test of mapping UMBEL to Wikipedia [2]. The result, to be more fully described upon release, has led to two major changes. It has acted to expand the size of the core UMBEL reference concepts to about 28,000. And it has led to adding to and refining the mapping predicates necessary for UMBEL to fulfill its purpose as a reference structure for external resources. This latter change is the focus of this post.
There is a huge diversity of organizational structure and world views on the Web; the linking and mapping predicates to fulfill this purpose must also capture that diversity. Relations between things on the Web can range from the exact and identity, to the approximate, descriptive and casual [3]. The 16 K direct mappings that have now been made between UMBEL and Wikipedia (resulting in the linkage of more than 2 million Wikipedia pages) provide a real-world test for how to capture this diversity. The need is to find the range of predicates that can reflect and capture quality, accurate mappings. Further, because mappings also can be aided with a variety of techniques from the manual to the automatic, it is important to characterize the specific mapping methods used whenever a linking predicate is assigned. Such qualifications can help to distinguish mapping trustworthiness, plus enable later segregation for the application of improved methods as they may arise.
As a result, the UMBEL Vocabulary now has a pretty well vetted and diverse set of linking and mapping predicates. Guidelines for how these differ, how they are used, and how they are qualified is described next.
Properties for linking and mapping need to differ more than in name or intended use. They must represent differences that affect inferences and reasoners, and can be acted upon by specific utilities via user interfaces and other applications. Furthermore, the diversity of mapping predicates should capture the types of diverse mappings and linkages possible between disparate sources.
Sometimes things are individuals or instances; other times they are classes or groupings of similar things. Sometimes things are of the same kind, but not exactly aligned. Sometimes things are unlike, but related in a common way. (Everything in Britain, for example, is a British “thing” even though they may be as different as trees, dead kings or cathedrals.) Sometimes we want to say something about a thing, such as an animal’s fur color or age, as a way to further characterize it, and so on.
The OWL 2 language and existing semantic Web languages give us some tools and existing vocabulary to capture some of this diversity. How these options, plus new predicates defined for UMBEL’s purposes, compare is shown by this table:
| Property | Relative Strength | Usage | Standard Reasoner? | Inverse Property? | Kind of Thing | Symmetrical? | Transitive? | Reflexive? | |
| It is | It Relates to | ||||||||
owl:equivalentClass |
10 | equivalence | X | N/A | class | class | yes | yes | yes |
owl:sameAs |
9 | identity | X | N/A | individual | individual | yes | yes | yes |
rdfs:subClassOf |
8 | subset | X | class | class | no | yes | yes | |
umbel:correspondsTo |
7 | ~equivalence | + / - | anything | RefConcept | yes | yes | yes | |
skos:narrowerTransitive |
6 | hierarchical | X | skos:Concept | skos:Concept | no | yes | no | |
skos:broaderTransitive |
6 | hierarchical | X | skos:Concept | skos:Concept | no | yes | no | |
rdf:type |
5 | membership | X | anything | class | no | no | no | |
umbel:isAbout |
4 | topical | X | anything | RefConcept | perhaps | not likely | not likely | |
umbel:isLike |
3 | similarity | anything | anything | yes | no | not likely | ||
umbel:relatesToXXX |
2 | relationship | anything | SuperType | no | no | not likely | ||
umbel:isCharacteristicOf |
1 | attribute | X | anything | RefConcept | no | no | no | |
I discuss each of these predicates below. But, first, let’s discuss what is in this table and how to interpret it [4].
The Usage metric is described for each property below.
To further aid the understanding of these properties, we can also group them into equivalence, membership, approximate or descriptive categories.
Equivalent properties are the most powerful available since they entail all possible axioms between the resources.
Equivalent class means that two classes have the same members; each is a sub-class of the other. The classes may differ in terms of annotations defined for each of them, but otherwise they are axiomatically equivalent.
An owl:equivalentClass assertion is the most powerful available because of its ability to ‘Explode the Domain‘ [6]. Because of its entailments, owl:equivalentClass should be used with great care.
The owl:sameAs assertion claims two instances to be an identical individual. This assertion also carries with it strong entailments of symmetry and reflexivity.
owl:sameAs is often misapplied [7]. Because of its entailments, it too should be used with great care. When there are doubts about claiming this strong relationship, UMBEL has the umbel:isLike alternative (see below).
Membership properties assert that an instance is a member of a class.
The rdfs:subClassOf asserts that one class is a subset of another class. This assertion is transitive and reflexive. It is a key means for asserting hierarchical or taxonomic structures in an ontology. This assertion also has strong entailments, particularly in the sense of members having consistent general or more specific relationships to one another.
Care must be exercised that full inclusivity of members occurs when asserting this relationship. When correctly asserted, however, this is one of the most powerful means to establish a reasoning structure in an ontology because of its transitivity.
Both of these predicates work on skos:Concept (recall that umbel:RefConcept is itself a subClassOf a skos:Concept). The predicates state a hierarchical link between the two concepts that indicates one is in some way more general (“broader”) than the other (“narrower”) or vice versa. The particular application of skos:broaderTransitive (or its complement) is used to infer the transitive closure of the hierarchical links, which can then be used to access direct or indirect hierarchical links between concepts.
The transitive relationship means that there may be intervening concepts between the two stated resources, making the relationship an ancestral one, and not necessarily (though it is possible to be so) a direct parent-child one.
The rdf:type assertion assigns instances (individuals) to a class. While the idea is straightforward, it is important to understand the intensional nature of the target class to ensure that the assignment conforms to the intended class scope. When this determination can not be made, one of the more approximate UMBEL predicates (see below) should be used.
For one reason or another, the precise assertions of the equivalent or membership properties above may not be appropriate. For example, we might not know sufficiently an intended class scope, or there might be ambiguity as to the identity of a specific entity (is it Jimmy Johnson the football coach, race car driver, fighter, local plumber or someone else?). Among other options — along a spectrum of relatedness — is the desire to assign a predicate that is meant to represent the same kind of thing, yet without knowing if the relationship is an equivalence (identity, or sameAs), a subset, or merely just a member of relationship. Alternatively, we may recognize that we are dealing with different things, but want to assert a relationship of an uncertain nature.
This section presents the UMBEL alternatives for these different kinds of approximate predicates [4].
The most powerful of these approximate predicates in terms of alignment and entailments is the umbel:correspondsTo property. This predicate is the recommended option if, after looking at the source and target knowledge bases [8], we believe we have found the best equivalent relationship, but do not have the information or assurance to assign one of the relationships above. So, while we are sure we are dealing with the same kind of thing, we may not have full confidence to be able to assign one of these alternatives:
rdfs:subClassOf owl:equivalentClass owl:sameAs superClassOf
Thus, with respect to existing and commonly used predicates, we want an umbrella property that is generally equivalent or so in nature, and if perhaps known precisely might actually encompass one of the above relations, but we don’t have the certainty to choose one of them nor perhaps assert full “sameness”. This is not too dissimilar from the rationale being tested for the x:coref predicate in relation to owl:sameAs from the UMBC Ebiquity group [9,10].
The property umbel:correspondsTo is thus used to assert a close correspondence between an external class, named entity, individual or instance with a Reference Concept class. It asserts this correspondence through the basis of both its subject matter and intended scope.
This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.
In most uses, the most prevalent linking property to be used is the umbel:isAbout assertion. This predicate is useful when tagging external content with metadata for alignment with an UMBEL-based reference ontology. The reciprocal assertion, umbel:isRelatedTo is when an assertion within an UMBEL vocabulary is desired to an external ontology. Its application is where the reference vocabulary itself needs to refer to an external topic or concept.
The umbel:isAbout predicate does not have the same level of confidence or “sameness” as the umbel:correspondsTo property. It may also reflect an assertion that is more like rdf:type, but without the confidence of class membership.
The property umbel:isAbout is thus used to assert the relation between an external named entity, individual or instance with a Reference Concept class. It can be interpreted as providing a topical assertion between an individual and a Reference Concept.
This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.
The property umbel:isLike is used to assert an associative link between similar individuals who may or may not be identical, but are believed to be so. This property is not intended as a general expression of similarity, but rather the likely but uncertain same identity of the two resources being related.
This property may be considered as an alternative to sameAs where there is not a certainty of sameness, and/or when it is desirable to assert a degree of overlap of sameness via the umbel:hasMapping reification predicate. This property can and should be changed if the certainty of the sameness of identity is subsequently determined.
It is appropriate to use this property when there is strong belief the two resources refer to the same individual with the same identity, but that association can not be asserted at the present time with full certitude.
This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.
At a different point along this relatedness spectrum we have unlike things that we would like to relate to one another. It might be an attribute, a characteristic or a functional property about something that we care to describe. Further, by nature of the thing we are relating, we may also be able to describe the kind of thing we are relating. The UMBEL SuperTypes (among many other options) gives us one such means to characterize the thing being related.
UMBEL presently has 31 predicates for these assertions relating to a SuperType [11]. The various properties designated by umbel:relatesToXXX are used to assert a relationship between an external instance (object) and a particular (XXX) SuperType. The assertion of this property does not entail class membership with the asserted SuperType. Rather, the assertion may be based on particular attributes or characteristics of the object at hand. For example, a British person might have an umbel:relatesToXXX asserted relation to the SuperType of the geopolitical entity of Britain, though the actual thing at hand (person) is a member of the Person class SuperType.
This predicate is used for filtering or clustering, often within user interfaces. Multiple umbel:relatesToXXX assertions may be made for the same instance.
Each of the 32 UMBEL SuperTypes has a matching predicate for external topic assignments (relatesToOtherOrganism shares two SuperTypes, leading to 31 different predicates):
| SuperType | Mapping Predicate | Comments |
| NaturalPhenomena | relatesToPhenomenon |
This predicate relates an external entity to the SuperType (ST) shown. It indicates there is a relationship to the ST of a verifiable nature, but which is undetermined as to strength or a full rdf:type relationship |
| NaturalSubstances | relatesToSubstance |
same as above |
| Earthscape | relatesToEarth |
same as above |
| Extraterrestrial | relatesToHeavens |
same as above |
| Prokaryotes | relatesToOtherOrganism |
same as above |
| ProtistsFungus | ||
| Plants | relatesToPlant |
same as above |
| Animals | relatesToAnimal |
same as above |
| Diseases | relatesToDisease |
same as above |
| PersonTypes | relatesToPersonType |
same as above |
| Organizations | relatesToOrganizationType |
same as above |
| FinanceEconomy | relatesToFinanceEconomy |
same as above |
| Society | relatesToSociety |
same as above |
| Activities | relatesToActivity |
same as above |
| Events | relatesToEvent |
same as above |
| Time | relatesToTime |
same as above |
| Products | relatesToProductType |
same as above |
| FoodorDrink | relatesToFoodDrink |
same as above |
| Drugs | relatesToDrug |
same as above |
| Facilities | relatesToFacility |
same as above |
| Geopolitical | relatesToGeoEntity |
same as above |
| Chemistry | relatesToChemistry |
same as above |
| AudioInfo | relatesToAudioMusic |
same as above |
| VisualInfo | relatesToVisualInfo |
same as above |
| WrittenInfo | relatesToWrittenInfo |
same as above |
| StructuredInfo | relatesToStructuredInfo |
same as above |
| NotationsReferences | relatesToNotation |
same as above |
| Numbers | relatesToNumbers |
same as above |
| Attributes | relatesToAttribute |
same as above |
| Abstract | relatesToAbstraction |
same as above |
| TopicsCategories | relatesToTopic |
same as above |
| MarketsIndustries | relatesToMarketIndustry |
same as above |
This property may be reified with the umbel:hasMapping property to describe the “degree” of the assertion.
Descriptive properties are annotation properties.
Two annotation properties are used to describe the attribute characteristics of a RefConcept, namely umbel:hasCharacteristic and its reciprocal, umbel:isCharacteristicOf. These properties are the means by which the external properties to describe things are able to be brought in and used as lookup references (that is, metadata) to external data attributes. As annotation properties, they have weak semantics and are used for accounting as opposed to reasoning purposes.
These properties are designed to be used in external ontologies to characterize, describe, or provide attributes for data records associated with a given RefConcept. It is via this property or its inverse, umbel:hasCharacteristic, that external data characterizations may be incorporated and modeled within a domain ontology based on the UMBEL vocabulary.
The choice of these mapping predicates may be aided with a variety of techniques from the manual to the automatic. It is thus important to characterize the specific mapping methods used whenever a linking predicate is assigned. Following this best practice allows us to distinguish mapping trustworthiness, plus to also enable later segregation for the application of improved methods as they may arise.
UMBEL, for its current mappings and purposes, has adopted the following controlled vocabulary for characterizing the umbel:hasMapping predicate; such listings may be readily modified for other domains and purposes when using the UMBEL vocabulary. This controlled vocabulary is based on instances of the Qualifier class. This class represents a set of descriptions to indicate the method used when applying an approximate mapping predicate (see above):
| Qualifier | Description |
| Manual – Nearly Equivalent | The two mapped concepts are deemed to be nearly an equivalentClass or sameAs relationship, but not 100% so |
| Manual – Similar Sense | The two mapped concepts share much overlap, but are not the exact same sense, such as an action as related to the thing it acts upon |
| Heuristic – ListOf Basis | Type assignment based on Wikipedia ListOf category; not currently used |
| Heuristic – Not Specified | Heuristic mapping method applied; script or technique not otherwise specified |
| External – OpenCyc Mapping | Mapping based on existing OpenCyc assertion |
| External – DBOntology Mapping | Mapping based on existing DBOntology assertion |
| External – GeoNames Mapping | Mapping based on existing GeoNames assertion |
| Automatic – Inspected SV | Mapping based on automatic scoring of concepts using Semantic Vectors, with specific alignment choice based on hand selection |
| Automatic – Inspected S-Match | Mapping based on automatic scoring of concepts using S-Match, with specific alignment choice based on hand selection; not currently used |
| Automatic – Not Specified | Mapping based on automatic scoring of concepts using a script or technique not otherwise specified; not currently used |
Again, as noted, for other domains and other purposes this listing can be modified at will.
Final aspects of these mappings are now undergoing a last round of review. A variety of sources and methods have been applied, to be more fully documented at time of release.
Some of the final specifics and counts may be modified slightly by the time of actual release of UMBEL v 1.00, which should occur in the next week or so. Nonetheless, here are some tentative counts for a select portion of these predicates in the internal draft version:
| Item or Predicate | Count |
| Total UMBEL Reference Concepts | 27,917 |
owl:equivalentClass (external OpenCyc, PROTON, DBpedia) |
28,618 |
umbel:correspondsTo (direct mappings to Wikipedia) |
16,884 |
rdf:type |
876,125 |
umbel:relatesToXXX |
3,059,023 |
| Unique Wikipedia Pages Mapped | 2,130,021 |
All of these assignments have also been hand inspected and vetted.
To date, in various steps and in various phases, the inspection of Wikipedia, its categories, and its match with UMBEL has perhaps incurred more than 5,000 hours (or nearly a three person-year equivalence) of expert domain and semantic technology review [12]. As noted, about 60% (16,884 of 27,917) of UMBEL concepts have now been directly mapped to Wikipedia and inspected for accuracy.
Wikipedia provides the most demanding and complete mapping target available for testing the coverage of UMBEL’s reference concepts and the adequacy of its vocabulary. As a result, we have added to and refined the mapping and linking predicates used in the UMBEL vocabulary, and added a Qualifier class to record the mapping process, as this post overviews. We have added the SuperType class to better organize and disambiguate large knowledge bases [13]. And, in this mapping process, we have expanded UMBEL’s reference concepts by about 33% to improve coverage, while remaining consistent with its origins as a faithful subset of the venerable Cyc knowledge structure [14].
A side benefit that has emerged from these efforts — with a huge potential upside — is the valuable combination of UMBEL and Wikipedia as a “gold standard” for aligning and mapping knowledge bases. Such a standard is critically needed. For example, in reviewing many of the existing Wikipedia mappings claimed as accurate, we found misplacement errors that averaged 15.8% [15]. Having a baseline of vetted mappings will aid future mappings. Moreover, having a complete conceptual infrastructure over Wikipedia will enable new and valuable reasoning and inference services.
The results from the UMBEL v 1.00 mapping are promising and very much useful today, but by no means complete. Future versions will extend the current mappings and continue to refine its accuracy and completeness [16]. What we can say, however, is that a coherent organization and conceptual schema — namely, UMBEL — overlaid on the richness of the instance data and content of Wikipedia, can produce immediate and useful benefits. These benefits apply to semantic search, semantic annotation and tagging, reasoning, discovery, inferencing, organization and comparisons.
umbel:correspondsTo predicate is used to assert close correspondence between UMBEL Reference Concepts and Wikipedia categories or pages, yet without entailing the actual Wikipedia category structure.owl:sameAs may lead to contradictions. However, virtually merging the descriptions in a co-reference engine is fine — both provide information that is useful in disambiguating future references as well as for many other purposes.”
And, Seven Guidelines for this Second of Two Semantic ‘Gaps’I have been writing and speaking of late about next priorities to promote the interoperability of linked data and the semantic Web. In a talk a few weeks back to the Dublin Core (DCMI) annual conference, I summarized these priorities as the need to address two aspects of the semantic “gap”:
I discussed the second aspect in an earlier post [1]. In today’s installment, we now focus on the “gap” relating to reference concepts.
Interoperability comes down to the nature of things and how we describe those things or quite similar things from different sources. Given the robust nature of semantic heterogeneities in diverse sources and datasets on the Web (or anywhere else, for that matter!) [2], how do we bring similar or related things into alignment? And, then, how can we describe the nature or basis of that alignment?
Of course, classifiers since Aristotle and librarians for time immemorial have been putting forward various classification schemes, controlled vocabularies and subject headings. When one wants to find related books, it is convenient to go to a central location where books about the same or related topics are clustered. And, if the book can be categorized in more than one way — as all are — then something like a card catalog is helpful to find additional cross-references. Every domain of human endeavor makes similar attempts to categorize things.
On the Web we have none of the limitations of physical books and physical libraries; locations are virtual and copies can be replicated or split apart endlessly because of the essentially zero cost of another electron. But, we still need to find things and we still want to gather related things together. According to Svenonius, “Organizing information if it means nothing else means bringing all the same information together” [3]. This sentiment and need remains unchanged whether we are talking about books, Web documents, chemical elements or linked data on the Web.
Like words or terms in human language that help us communicate about things, how we organize things on the Web needs to have an understood and definable meaning, hopefully bounded with some degree of precision, that enables us to have some confidence we are really communicating about the same something with one another. However, when applied to the Web and machine communications, the demands for how these definitions and precisions apply need to change. This makes the notion of a Web basis for organization both easier and harder than traditional approaches to classification.
It is easier because everything is virtual: we can apply multiple classification schema and can change those schema at will. We are not locked into historical anomalies like huge subject areas reserved for arcane or now historically less important topics, such as the Boer Wars or phrenology. We need not move physical books around on shelves in order to accommodate new or expanded classification schemes. We can add new branches to our classification of, say, nanotechnology as rapidly as the science advances.
Yet it is harder because we can no longer rely on the understanding of human language as a basis for naming and classifying things. Actually, of course, language has always been ambiguous, but it is manifestly more so when put through the grinder of machine processing and understanding. Machine processing of related information adds the new hurdles of no longer being able to rely on text labels (“names”) alone as the identifier of things and requires we be more explicit about our concept relationships and connections. Fortunately, here, too, much has been done in helping to organize human language through such lexical frameworks as WordNet and similar.
Many groups and individuals have been grappling with these questions of how to organize and describe information to aid interoperability in an Internet context. Among many, let me simply mention two because of the diversity their approaches show.
Bernard Vatant, for one, has with his colleagues been an advocate for some time for the need for what he calls “hubjects.” With an intellectual legacy from the Topic Maps community, the idea of “hubjects” is to have a flat space of reference subjects to which related information can link and refer. Each hubject is the hub of a spoked wheel of representations by which the same subject matter from different contexts may be linked. The idea of the flat space or neutrality in the system is to place the “hubject” identifier (referent) outside of other systems that attempt to organize and provide “meta-frameworks” of knowledge organization. In other words, there are no inherent suggested relationships in the reference “hubject” structure: just a large bin of defined subjects to which external systems may link.
A different and more formalized approach has been put forward by the FRSAD working group [4], dealing with subject authority data. Subject authority data is the type of classificatory information that deals with the subjects of various works, such as their concepts, objects, events, or places. As the group stated, the scope of this effort pertains to the “aboutness” of various conceptual works. The framework for this effort, as with the broader FRBR effort, are new standards and approaches appropriate to classifying electronic bibliographic records.
Besides one of the better summaries and introductions to the general problems of subject classification in general, the FRSAD approach makes its main contribution in clearly distinguishing the idea of something (which it calls a thema, or entity used as the subject of a work) from the name or label of something (which it calls nomen). For many in the logic community, steeped in the Peirce triad of sign-object-interpretant [5], this distinction seems rather obvious and straightforward. But, in library science, labels have been used interchangeably as identifiers, and making this distinction clean is a real contribution. The FRSAD effort does not itself really address how the thema are actually found or organized.
The notion of a reference concept used herein combines elements from both of these approaches. A reference concept is the idea of something, or a thema in the FRSAD sense. It is also a reference hub of sorts, similar to the idea of a “hubject”. But it is also much more and more fully defined.
So, let’s first begin by representing a reference concept in relation to its referers and possible linking predicates as follows:
A referer needs to link appropriately to its reference concept, with some illustrative examples shown on the arrows in the diagram. These links are the predicates, ranging from the exact to the approximate, discussed in the first semantic “gap” posting. (Note: see that earlier post for a longer listing of existing, candidate linking predicates. No further comment is made in this present article as to whether those in that earlier posting or the example ones above are “correct” or not; see the first post for that discussion.)
If properly constructed and used, a reference concept thus becomes a fixed point in an information space. As one or more external sources link to these fixed points, it is then possible to gather similar content together and to begin to organize the information space, in the sense of Svenonius. Further, and this is a key difference from the “hubject” approach, if the reference concept is itself part of a coherent structure, then additional value can be derived from these assignments, such as inference, consistence testing, and alignments. (More on this latter point is discussed below.)
If the right factors are present, it should be possible to relate and interoperate multiple datasets and knowledge representations. If present, these factors can result in a series of fixed reference points to which external information can be linked. In turn, these reference nodes can form constellations to guide the traversal to desired information destinations on the Web.
Let’s look at the seven factors as to what constitutes guidelines for best practices.
By definition, a Web-based reference concept should adhere to linked data principles and should have a URI as its address and identifier. Also, by definition as a “reference”, the vocabulary or ontology in which the concept is a member should be given a permanent and persistent address. Steps should be taken to ensure 24×7 access to the reference concept’s URI, since external sources will be depending on it.
As a general rule, the concepts should also be stated as single nouns and use CamelCase notation (that is, class names should start with a capital letter and not contain any spaces, such as MyNewConcept).
Provide a preferred label annotation property that is used for human readable purposes and in user interfaces. For this purpose, a construct such as the SKOS property of skos:prefLabel works well. Note, this label is not the basis for deciding and making linkages, but it is essential for mouseovers, tooltips, interface labels, and other human use factors.
Give all concepts and properties a definition. The matching and alignment of things is done on the basis of concepts (not simply labels), which means each concept must be defined [6]. Providing clear definitions (along with the coherency of its structure) gives an ontology its semantics. Remember not to confuse the label for a concept with its meaning. For this purpose, a property such as skos:definition works well, though others such as rdfs:comment or dc:description are also commonly used.
The definition is the most critical guideline for setting the concept’s meaning. Adequate text and content also aid semantic alignment or matching tasks.
Include explicit consideration for the idea of a “semset” or “tagset”, which means a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept. The semset construct is similar to the “synsets” in Wordnet, but with a broader use understanding. Included in the semset construct is the single (per language) preferred (human-readable) label for the concept, the prefLabel, an embracing listing of alternative phrase and terms for the concept (including acronyms, synonyms, and matching jargon), the altLabels, and a listing of prominent or common misspellings for the concept or its alternatives, the hiddenLabels.
This tagset is an essential basis for tagging unstructured text documents with reference concepts, and for search not limited to keywords. The tagset, in combination with the definition, is also the basis for feeding many NLP-driven methods for concept or ontology alignment.
The practice of using an identifier separate from label, and language qualified entries for definition, preferred label and tagset (alternative labels) means that multi-lingual versions can be prepared for each concept. Though this is a somewhat complicated best practice in its own right (for example, being attentive to the xml:lang=”en” tag for English), adhering to this practice provides language independence for reference concepts.
Sources such as Wikipedia, with its richness of concepts and multiple language versions, can then be a basis for creation of alternative language versions.
Use of domains and ranges assists testing, helps in disambiguation, and helps in external concept alignments. Domains apply to the subject (the left hand side of a triple); ranges to the object (the right hand side of the triple). Domains and ranges should not be understood as actual constraints, but as axioms to be used by reasoners. In general, domain for a property is the range for its inverse and the range for a property is the domain of its inverse.
When reference concepts, properly constructed as above, are also themselves part of a coherent structure, further benefits may be gained. These benefits include inferencing, consistency testing, discovery and navigation. For example, the sample at right shows that a retrieval for Saab cars can also inform that these are automobiles, a brand of automobile, and a Swedish kind of car.
To gain these advantages, the coherent structure need not be complicated. RDFS and SKOS-based lightweight vocabularies can meet this test. Properly constructed OWL ontologies can also provide these benefits.
When best practices are combined with being part of a coherent structure, we can refer to these structures as reference ontologies or domain ontologies.
In part, these best practices are met to a greater or lesser extent by many current vocabularies. But few provide complete coverage, and across a broad swath of domain needs, major gaps remain. This unfortunate observation applies to upper-level ontologies, reference vocabularies, and domain ontologies alike.
Upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc and BFO (Basic Formal Ontology). While these have a coherency of construction, they are most often incomplete with respect to reference concept construction. With the exception of SUMO and Cyc, domain coverage is also very general.
Our own UMBEL reference ontology [7] is closest to meeting all criteria. The reference concepts are constructed to standard. But coverage is fairly general, and not directly applicable to most domains (though it can help to orient specific vocabularies).
Wikipedia, as accessed via the DBpedia expression, has good persistent URIs, labels, altLabels and proxy definitions (via the first sentences abstract). As a repository of reference concepts, it is extremely rich. But the organizational structure is weak and provides very few of the benefits for coherent structures noted above.
Going back to 1997, DCMI has been involved in putting forward possible vocabularies that may act as “qualifiers” to dc:subject [8]. Such reference vocabularies can extend from the global or comprehensive, such as the Universal Decimal Classification or Library of Congress Subject Headings, to the domain specific such as MeSH in medicine or Agrovoc in agriculture [9]. One or more concepts in such reference vocabularies can be the object of a dc:subject assertion, for example. While these vocabularies are also a rich source of reference concepts, they are not constructed to standards and at most provide hierarchical structures.
In the area of domain vocabularies, we are seeing some good pockets of practice, especially in the biomedical and life sciences arena [10]. Promising initiatives are also underway in library applications [11] and perhaps other areas unknown to the author.
In summary, I take the state of the art to be quite promising. We know what to do, and it is being done in some pockets. What is needed now is to more broadly operationalize these practices and to extend them across more domains. If we can bring attention to and publicize exemplar vocabularies, we can start to realize the benefits of actual data interoperability on the Web.