Posted:July 6, 2010

Consolidating Under the Open Semantic Framework
Release of Semantic Components Adds Final Layer, Leads to Streamlined Sites

Yesterday Fred Giasson announced the release of code associated with Structured Dynamics‘ open source semantics components (also called sComponents).  A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.

Though not all layers are by any means complete, from an architectural standpoint the release of these semantic components provides the last and missing layer to complete our open semantic framework. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.

The OSF “Semantic Muffin”

We first announced the open semantic framework — or OSF — a couple of weeks back. Refer to that original post for more description of the general design [1]. However, we can show this framework with the semantic components layer as illustrated by what some have called the “semantic muffin”:

Incremental Layers of the Open Semantic Framework

(click for full size)

The OSF stack consists of these layers, moving from existing assets upward through increasing semantics and usability:

  • Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
  • scones / irON — this layer is for general conversion of non-RDF data and data schema to RDF (via irON or RDFizers) or for information extraction of subject concepts or named entities (scones)
  • structWSF — is the pivotal Web services framework layer, and provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
  • Semantic components — the highlighted layer in the “semantic muffin”; in essence, this is the visualization and data interaction layer in the OSF stack; see more below
  • Ontologies — are the layer containing the structured assets “driving” the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave, and
  • conStruct — is the content management system (CMS) layer based on Drupal and the thinnest layer with respect to OSF; this optional layer provides the theming, user rights and permissions, or other functionality drawn from Drupal’s 6500 third-party modules.

Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.

The Semantics Components Layer

Current semantic components, or widgets, include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.

Though Fred’s post goes into more detail — with subsequent posts to get into the technical nuances of the semantic components — the main idea of these components is shown by the diagram below.

These various semantic components get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the semantic components. These instructions are presented via the sControl component, that determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:

Semantic Components Workflow

(click for full size)

New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.

Consolidating and Rationalizing Web Sites and Mailing Lists

OpenStructs and Open Semantic Framework LogoAs the release of the semantic components drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the open semantic framework enabled us to consolidate and rationalize these resources.

Our first change was to consolidate all OSF-related material under the existing OpenStructs.org Web site. It already contained the links and background material to structWSF and irON. To that, we added the conStruct and OSF material as well. This consolidation also allowed us to retire the previous conStruct Web site as well, which now re-directs to OpenStructs.

We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new Open Semantic Framework Google group and mailing list. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the OSF group.

There has also been a revigoration of the developers’ community Web site at http://community.openstructs.org/. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.

Actual code SVN repositories are unchanged. These code repositories may be found at:

We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!


[1] An alternative view of this layer diagram is shown by the general Structured Dynamics product stack and architecture.
Posted:June 11, 2010

How Shall We Measure Progress Over the Past Three Years?

Friday     Brown Bag Lunch
Colorado  Interstate construction - 1970; courtesy National ArchivesFor a dozen years, my career has been centered on Internet search, dynamic content and the deep Web. For the past few years, I have been somewhat obsessed by two topics.

The first topic, a conviction really, is that implicit structure needs to be extracted from Web content to enable it to be disambiguated, organized, shared and re-purposed. The second topic, more an open question as a former academic married to a professor, is what might replace editorial selections and peer review to establish the authoritativeness of content. These topics naturally steer one to the semantic Web.

A Millennial Perspective

The semantic Web, by whatever name it comes to be called, is an inevitability. History tells us that as information content grows, so do the mechanisms for organizing and managing it. Over human history, innovations such as writing systems, alphabetization, pagination, tables of contents, indexes, concordances, reference look-ups, classification systems, tables, figures, and statistics have emerged in parallel with content growth [19].

When the Lycos search engine, one of the first profitable Internet ventures, was publicly released in 1994, it indexed a mere 54,000 pages [1]. When Google wowed us with its page-ranking algorithm in 1998, it soon replaced my then favorite search engine, AltaVista. Now, tens of billions of indexed documents later, I often find Google’s results to be overwhelming dross — unfortunately true again for all of the major search engines. Faceted browsing, vertical search, and Web 2.0′s tagging and folksonomies demonstrate humanity’s natural penchant to fight this entropy, efforts that will next continue with the semantic Web and then mechanisms unforeseen to manage the chaos of burgeoning content.

An awful lot of hot air has been expelled over the false dichotomy of whether the semantic Web will fail or is on the verge of nirvana. Arguments extend from the epistemological versus ontological (classically defined) to Web 3.0 versus SemWeb or Web services (WS*) versus REST (Representational State Transfer). My RSS feed reader points to at least one such dust up every week.

Some set the difficulties of resolving semantic heterogeneities as absolutes, leading to an illogical and false rejection of semantic Web objectives. In contrast, some advocates set equally divisive arguments for semantic Web purity by insisting on formal ontologies and descriptive logics. Meanwhile, studied leaks about “stealth” semantic Web ventures mean you should grab your wallet while simultaneously shaking your head.

A Decades-Long Perspective

My mental image of the semantic Web is a road from here to some achievable destination — say, Detroit. Parts of the road are well paved; indeed, portions are already superhighways with controlled on-ramps and off-ramps. Other portions are two lanes, some with way too many traffic lights and some with dangerous intersections. A few small portions remain unpaved gravel and rough going.

1919 Wreck in Nebraska

Wreck in Nebraska during the 1919 Transcontinental Motor Convoy

A lack of perspective makes things appear either too close or too far away. The automobile isn’t yet a century old as a mass-produced item. It wasn’t until 1919 that the US Army Transcontinental Motor Convoy made the first automobile trip across the United States.

The 3,200 mile route roughly followed today’s Lincoln Highway, US 30, from Washington, D.C. to San Francisco. The convoy took 62 days and 250 recorded accidents to complete the trip (see figure), half on dirt roads at an average speed of 6 miles per hour. A tank officer on that trip later observed Germany’s autobahns during World War II. When he subsequently became President Dwight D. Eisenhower, he proposed and then signed the Interstate Highway Act.

That was 50 years ago. Today, the US is crisscrossed with 50,000 miles of interstates, which have completely remade the nation’s economy and culture [2].

Today’s Perspective

Like the interstate system in its early years, today’s semantic Web lets you link together a complete trip, but the going isn’t as smooth or as fast as it could be. Nevertheless, making the trip is doable and keeps improving day by day, month by month.

My view of what’s required to smooth the road begins with extracting structure and meaningful information according to understandable schema from mostly uncharacterized content. Then we store the now-structured content as RDF triples that can be further managed and manipulated at scale. By necessity, the journey embraces tools and requirements that, individually, might not constitute semantic Web technology as some strictly define it. These tools and requirements are nonetheless integral to reaching the destination. We are well into that journey’s first leg, what I and others are calling the structured Web.

For the past six months or so I have been researching and assembling as many semantic Web and related tools as I can find [3]. That Sweet Tools listing now exceeds 500 tools [4] (with its presentation using the nifty lightweight Exhibit publication system from MIT’s Simile program [5]). I’ve come to understand the importance of many ancillary tool sets to the entire semantic Web highway, such as natural language processing and information extraction. I’ve also found new categories of pragmatic tools that embody semantic Web and data mediation processes but don’t label themselves as such.

In its entirety, the Sweet Tools listing provides a pretty good picture of the semantic Web’s state. It’s a surprisingly robust picture — though with some notable potholes — and includes impressive open source options in all categories. Content publishing, indexing, and retrieval at massive scales are largely solved problems. We also have the infrastructure, languages, and (yes!) standards for tying this content together meaningfully at the data and object levels.

I also think a degree of consensus has emerged on RDF as the canonical data model for semantic information. RDF triple stores are rapidly improving toward industrial strength, and RESTful designs enable massive scalability, as terabyte- and petabyte-scale full-text indexes prove.

Powerful and flexible middleware options, such as those from OpenLink [6], can transform and integrate diverse file formats with a variety of back ends. The World Wide Web Consortium’s GRDDL standard [7] and related tools, plus various “RDF-izers” from Massachusetts Institute of Technology and elsewhere [8], largely provide the conversion infrastructure for getting Web data into that canonical RDF form. Sure, some of these converters are still research-grade, but getting them to operational capabilities at scale now appears trivial.

Things start getting shakier when trying to structure information into a semantic formalism. Controlled vocabularies and ontologies range broadly and remain a contentious area. Publishers and authors perhaps have too many choices: from straight Atom or RSS feeds and feeds with tags to informal folksonomies and then Outline Processor Markup Language [9] or microformats [10]. From there, the formalism increases further to include the standard RDF ontologies such as SIOC (Semantically-Interlinked Online Communities), SKOS (Simple Knowledge Organizing System), DOAP (Description of a Project), and FOAF (Friend of a Friend) [11] and the still greater formalism of OWL’s various dialects [12].

If we compare the semantic Web to the US interstate highway system, we’re still in the early stages of a journey that will remake our economy and culture.
Many potholes on the road to the semantic Web exist.
One ready task is to transform existing structure to RDF. Another priority is to refine tools to extract structure and meaningful information from uncharacterized content.

Arguing which of these is the theoretical best method is doomed to failure, except possibly in a bounded enterprise environment. We live in the real world, where multiple options will always have their advocates and their applications.

All of us should welcome whatever structure we can add to our information base, no matter where it comes from or how it’s done. The sooner we can embrace content in any of these formats and convert it into canonical RDF form, we can then move on to needed developments in semantic mediation, some of the roughest road on the journey.

Potholes on the Semantic Highway

Semantic mediation requires appropriate structured content. Many potholes on the road to the semantic Web exist because the content lacks structured markup; others arise because existing structure requires transformation. We need improved ways to address both problems. We also need more intuitive means for applying schema to structure. Some have referred to these issues as “who pays the tax.”

Recent experience with social software and collaboration proves that a portion of the Internet user community is willing to tag and characterize content. Furthermore, we can readily leverage that resulting structure, and free riders are welcomed. The real pothole is the lack of easy — even fun — data extractors and “structurizers.” But we’re tantalizingly close.

Tools such as Solvent and Sifter from MIT’s Simile program [13] and Marmite from Carnegie Mellon University [14] are showing the way to match DOM (document object model) inspectors with automated structure extractors. DBpedia, the alpha version of Freebase, and System One now provide large-scale, open Web data sets in RDF [15], including all of Wikipedia. Browser extensions such as Zotero [16] are showing how to integrate structure management into acceptable user interfaces, as are services such as Zoominfo [17]. Yet we still lack easy means to design the differing structures suitable for a plenitude of destinations.

Amazingly, a compelling road map for how all these pieces could truly fit together is also incomplete. How do we actually get from here to Detroit? Within specific components, architectural understandings are sometimes OK (although documentation is usually awful for open source projects, as most of the current tools are). Until our community better documents that vision, attracting new contributors will be needlessly slower, thus delaying the benefits of network effects.

So, let’s create a road map and get on with paving the gaps and filling the potholes. It’s not a matter of standards or technology — we have those in abundance. Let’s stop the silly squabbles and commit to the journey in earnest. The structured Web‘s ability to reach Hyperland [18], Douglas Adam’s prescient 1990 forecast of the semantic Web, now looks to be no further away than Detroit.

Friday      Brown Bag Lunch This Friday brown bag leftover was first placed into the AI3 refrigerator about three years ago on May 3, 2007.  The piece was my answer to a request by Jim Hendler to pen some thoughts on the semantic Web, based on I believe what he thought might be a pragmatic perspective combining Internet business with Web science. The formal piece appeared as a guest editorial in the May/June 2007 issue of IEEE Intelligent Systems. What appears above is unaltered from my original posting (aside from some minor formatting clean-up and — sorry to say — some of the projects are now defunct).

[1] Chris Sherman, “Happy Birthday, Lycos!,” Search Engine Watch, August 14, 2002. See http://searchenginewatch.com/showPage.html?page=2160551.
[2] David A. Pfeiffer, “Ike’s Interstates at 50: Anniversary of the Highway System Recalls Eisenhower’s Role as Catalyst,” Prologue Magazine, National Archives, Summer 2006, Vol. 38, No. 2. See: http://www.archives.gov/publications/prologue/2006/summer/interstates.html.
[3] The mention of specific tool names is meant to be illustrative and not necessarily a recommendation.
[6] OpenLink Software’s Virtuoso and Data Spaces products; see http://www.openlinksw.com/.
[7] W3C’s Gleaning Resource Descriptions from Dialects of Languages (GRDDL, pronounced “griddle”). See http://www.w3.org/2004/01/rdxh/spec.
[9] Outline Processor Markup Language (OPML); see http://www.opml.org/.
[10] Microformats; see http://microformats.org/.
[12] W3C’s Web Ontology Language (OWL). See http://www.w3.org/TR/owl-features/.
[13] Solvent (http://simile.mit.edu/wiki/Solvent) and Sifter (http://simile.mit.edu/wiki/Sifter) are from MIT’s Simile program.
[14] Marmite (http://www.cs.cmu.edu/~jasonh/projects/marmite/) is from Carnegie Mellon University.
[15] DBpedia (http://dbpedia.org/docs/) and Freebase (in alpha, by invitation only at http://www.freebase.com/) are two of the first large-scale open datasets on the Web; Wikipedia has also been converted to RDF by System One (http://labs.systemone.at/wikipedia3).
[16] Zotero is produced by George Mason University’s Center for History and New Media; see http://www.zotero.org.
[17] ZoomInfo (http://www.zoominfo.com/) provides online structured search of companies and people, plus broader services to enterprises.
[18] The late Douglas Adams, of Doctor Who and A Hitchhiker’s Guide to the Galaxy fame, produced a TV program for BBC2 presaging the Internet called Hyperland. This 50-min video can be seen in five parts via YouTube at Part 1 of 5, 2 of 5, 3 of 5, 4 of 5 and 5 of 5.
[19] Since I first wrote this piece, I have systematized these developments in my Timeline of Information History.
Posted:April 23, 2010

A Reprise AI3 Post from Four Years Ago

Friday    Brown Bag Lunch
In 2002 Joel Mokyr, an economic historian from Northwestern University, wrote a book that should be read by anyone interested in knowledge and its role in economic growth. The Gifts of Athena : Historical Origins of the Knowledge Economy is a sweeping and comprehensive account of the period from 1760 (in what Mokyr calls the “Industrial Enlightenment”) through the Industrial Revolution beginning roughly in 1820 and then continuing through the end of the 19th century.

The book (and related expansions by Mokyr available as separate PDFs on the Internet) should be considered as the definitive reference on this topic to date. The book contains 40 pages of references to all of the leading papers and writers on diverse technologies from mining to manufacturing to health and the household. The scope of subject coverage, granted mostly focused on western Europe and America, is truly impressive.

Mokyr deals with ‘useful knowledge,’ as he acknowledges Simon Kuznets‘ phrase. Mokyr argues that the growth of recent centuries was driven by the accumulation of knowledge and the declining costs of access to it. Mokyr helps to break past logjams that have attempted to link single factors such as the growth in science or the growth in certain technologies (such as the steam engine or electricity) as the key drivers of the massive increases in economic growth that coincided with the era now known as the Industrial Revolution.

Mokyr cracks some of these prior impasses by picking up on ideas first articulated through Michael Polanyi‘s “tacit knowing” (among other recent philosophers interested in the nature and definition of knowledge). Mokyr’s own schema posits propositional knowledge, which he defines as the science, beliefs or the epistemic base of knowledge, which he labels omega (Ω), in combination with prescriptive knowledge, which are the techniques (“recipes”), and which he also labels lambda (λ). Mokyr notes that an addition to omega (Ω) is a discovery, an addition to lambda (λ) is an invention.

One of Mokyr’s key points is that both knowledge types reinforce one another and, of course, the Industrial Revolution was a period of unprecedented growth in such knowledge. Another key point, easily overlooked when “discoveries” are seemingly more noteworthy, is that techniques and practical applications of knowledge can provide a multiplier effect and are equivalently important. For example, in addition to his main case studies of the factory, health and the household, he says:

The inventions of writing, paper, and printing not only greatly reduced access costs but also materially
affected human cognition, including the way people thought about their environment.

Mokyr also correctly notes how the accumulation of knowledge in science and the epistemic base promotes productivity and more still-more efficient discovery mechanisms:

The range of experimentation possibilities that needs to be searched over is far larger if the searcher knows nothing about the natural principles at work. To paraphrase Pasteur’s famous aphorism once more, fortune may sometimes favor unprepared minds, but only for a short while. It is in this respect that the width of the epistemic base makes the big difference.

In my own opinion, I think Mokyr starts to get closer to the mark when he discusses knowledge “storage”, access costs and multiplier effects from basic knowledge-based technologies or techniques. Like some other recent writers, he also tries to find analogies with evolutionary biology. For example:

Much like DNA, useful knowledge does not exist by itself; it has to be “carried” by people or in storage
devices. Unlike DNA, however, carriers can acquire and shed knowledge so that the selection process is quite different. This difference raises the question of how it is transmitted over time, and whether it can actually shrink as well as expand.

One of the real advantages of this book is to move forward a re-think of the “great man” or “great event” approach to history. There are indeed complicated forces at work. I think Mokyr summarizes well this transition when he states:

A century ago, historians of technology felt that individual inventors were the main actors that brought about
the Industrial Revolution. Such heroic interpretations were discarded in favor of views that emphasized deeper economic and social factors such as institutions, incentives, demand, and factor prices. It seems, however, that the crucial elements were neither brilliant individuals nor the impersonal forces governing the masses, but a small group of at most a few thousand peopled who formed a creative community based on the exchange of knowledge. Engineers, mechanics, chemists, physicians, and natural philosophers formed circles in which access to knowledge was the primary objective. Paired with the appreciation that such knowledge could be the base of ever-expanding prosperity, these elite networks were indispensible, even if individual members were not. Theories that link education and human capital of technological progress need to stress the importance of these small creative communities jointly with wider phenomena such as literacy rates and universal schooling.

There is so much to like and to be impressed with this book and even later Mokyr writings. My two criticisms are that, first, I found the pseudo-science of his knowledge labels confusing (I kept having to mentally translate the omega symbol) and I disliked the naming distinctions between propositional and prescriptive, even though I think the concepts are spot on.

My second criticism, a more major one, is that Mokyr notes, but does not adequately pursue, “In the decades after 1815, a veritable explosion of technical literature took place. Comprehensive technical compendia appeared in every industrial field.” Statements such as these, and there are many in the book, hint at perhaps some fundamental drivers.

Mokyr has provided the raw grist for answering his starting question of why such massive economic growth occurred in conjunction with the era of the Industrial Revolution. He has made many insights and posited new factors to explain this salutary discontinuity from all prior human history. But, in this reviewer’s opinion, he still leaves the why tantalizingly close but still unanswered. The fixity of information and growing storehouses because of declining production and access costs remain too poorly explored.

Friday    Brown Bag Lunch This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on July 6, 2006. It was part of a series of book reviews I was doing at that time getting at the importance of bulk paper production as a key enabler of economic growth. No changes have been made to the original posting.

Posted by AI3's author, Mike Bergman Posted on April 23, 2010 at 1:48 am in Adaptive Information, Adaptive Innovation, Brown Bag Lunch | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/
The URI to trackback this post is: http://www.mkbergman.com/879/brown-bag-lunch-historical-origins-of-the-knowledge-economy/trackback/
Posted:April 9, 2010

Friday   Brown Bag Lunch

Mediating semantic heterogeneities requires tools and automation (or semi-automation) at scale. But existing tools are still crude and lack across-the-board integration. This is one of the next challenges in getting more widespread acceptance of the semantic Web.

In earlier posts, I described the significant progress in climbing the data federation pyramid, today’s evolution in emphasis to the semantic Web, and the 40 or so sources of semantic heterogeneity. We now transition to an overview of how one goes about providing these semantics and resolving these heterogeneities.

Why the Need for Tools and Automation?

In an excellent recent overview of semantic Web progress, Paul Warren points out:[1]

Although knowledge workers no doubt believe in the value of annotating their documents, the pressure to create metadata isn’t present. In fact, the pressure of time will work in a counter direction. Annotation’s benefits accrue to other workers; the knowledge creator only benefits if a community of knowledge workers abides by the same rules. . . . Developing semiautomatic tools for learning ontologies and extracting metadata is a key research area . . . .Having to move out of a user’s typical working environment to ‘do knowledge management’ will act as a disincentive, whether the user is creating or retrieving knowledge.

Of course, even assuming that ontologies are created and semantics and metadata are added to content, there still remains the nasty problems of resolving heterogeneities (semantic mediation) and efficiently storing and retrieving the metadata and semantic relationships.

Putting all of this process in place requires the infrastructure in the form of tools and automation and proper incentives and rewards for users and suppliers to conform to it.

Areas Requiring Tools and Automation

In his paper, Warren repeatedly points to the need for “semi-automatic” methods to make the semantic Web a reality. He makes fully a dozen such references, in addition to multiple references to the need for “reasoning algorithms.” In any case, here are some of the areas noted by Warren needing “semi-automatic” methods:

  • Assign authoritativeness
  • Learn ontologies
  • Infer better search requests
  • Mediate ontologies (semantic resolution)
  • Support visualization
  • Assign collaborations
  • Infer relationships
  • Extract entities
  • Create ontologies
  • Maintain and evolve ontologies
  • Create taxonomies
  • Infer trust
  • Analyze links
  • etc.

In a different vein, SemWebCentral lists these clusters of semantic Web-related tasks, each of which also requires tools:[2]

  • Create an ontology — use a text or graphical ontology editor to create the ontology, which is then validated. The resulting ontology can then be viewed with a browser before being published
  • Disambiguate data – generate a mapping between multiple ontologies to identify where classes and properties are the same
  • Expose a relational database as OWL — an editor is first used to create the ontologies that represent the database schema, then the ontologies are validated, translated to OWL and then the generated OWL is validated
  • Intelligently query distributed data – repository and again able to be queried
  • Manually create data from an ontology — a user would use an editor to create new OWL data based on existing ontologies, which is then validated and browsable
  • Programmatically interact with OWL content — custom programs can view, create, and modify OWL content with an API
  • Query non-OWL data — via an annotation tool, create OWL metadata from non-OWL content
  • Visualize semantic data — view semantic data in a custom visualizer.

With some ontologies approaching tens to hundreds of thousands to millions of triples, viewing, annotating and reconciling at scale can be daunting tasks, the efforts behind which would never be taken without useful tools and automation.

A Workflow Perspective Helps Frame the Challenge

A 2005 paper by Izza, Vincent and Burlat (among many other excellent ones) at the first International Conference on Interoperability of Enterprise Software and Applications (INTEROP-ESA) provides a very readable overview on the role of semantics and ontologies in enterprise integration.[3] Besides proposing a fairly compelling unified framework, the authors also present a useful workflow perspective emphasizing Web services (WS), also applicable to semantics in general, that helps frame this challenge:

Generic Semantic Integration Workflow (adapted from [3])

For existing data and documents, the workflow begins with information extraction or annotation of semantics and metadata (#1) in accordance with a reference ontology. Newly found information via harvesting must also be integrated; however, external information or services may come bearing their own ontologies, in which case some form of semantic mediation is required.

Of course, this is a generic workflow, and depending on the interoperation task, different flows and steps may be required. Indeed, the overall workflow can vary by perspective and researcher, with semantic resolution workflow modeling a prime area of current investigations. (As one alternative among scores, see for example Cardoso and Sheth.[4])

Matching and Mapping Semantic Heterogeneities

Semantic mediation is a process of matching schemas and mapping attributes and values, often with intermediate transformations (such as unit or language conversions) also required. The general problem of schema integration is not new, with one prior reference going back as early as 1986. [5] According to Alon Halevy:[6]

As would be expected, people have tried building semi-automated schema-matching systems by employing a variety of heuristics. The process of reconciling semantic heterogeneity typically involves two steps. In the first, called schema matching, we find correspondences between pairs (or larger sets) of elements of the two schemas that refer to the same concepts or objects in the real world. In the second step, we build on these correspondences to create the actual schema mapping expressions.

The issues of matching and mapping have been addressed in many tools, notably commercial ones from MetaMatrix,[7] and open source and academic projects such as Piazza, [8] SIMILE, [9] and the WSMX (Web service modeling execution environment) protocol from DERI. [10] [11] A superb description of the challenges in reconciling the vocabularies of different data sources is also found in the thesis by Dr. AnHai Doan, which won the 2003 ACM’s Prestigious Doctoral Dissertation Award.[12]

What all of these efforts has found is the inability to completely automate the mediation process. The current state-of-the-art is to reconcile what is largely unambiguous automatically, and then prompt analysts or subject matter experts to decide the questionable matches. These are known as “semi-automated” systems and the user interface and data presentation and workflow become as important as the underlying matching and mapping algorithms. According to the WSMX project, there is always a trade-off between how accurate these mappings are and the degree of automation that can be offered.

Also a Need for Efficient Semantic Data Stores

Once all of these reconciliations take place there is the (often undiscussed) need to index, store and retrieve these semantics and their relationships at scale, particularly for enterprise deployments. This is a topic I have addressed many times from the standpoint of scalability, more scalability, and comparisons of database and relational technologies, but it is also not a new topic in the general community.

As Stonebraker and Hellerstein note in their retrospective covering 35 years of development in databases,[13] some of the first post-relational data models were typically called semantic data models, including those of Smith and Smith in 1977[14] and Hammer and McLeod in 1981.[15] Perhaps what is different now is our ability to address some of the fundamental issues.

At any rate, this subsection is included here because of the hidden importance of database foundations. It is therefore a topic often addressed in this series.

A Partial Listing of Semantic Web Tools

In all of these areas, there is a growing, but still spotty, set of tools for conducting these semantic tasks. SemWebCentral, the open source tools resource center, for example, lists many tools and whether they interact or not with one another (the general answer is often No).[16] Protégé also has a fairly long list of plugins, but not unfortunately well organized. [17]

In the table below, I begin to compile a partial listing of semantic Web tools, with more than 50 listed. Though a few are commercial, most are open source. Also, for the open source tools, only the most prominent ones are listed (Sourceforge, for example, has about 200 projects listed with some relation to the semantic Web though most of minor or not yet in alpha release).

NAME

URL

DESCRIPTION

Almo http://ontoware.org/projects/almo An ontology-based workflow engine in Java
Altova SemanticWorks http://www.altova.com/products_semanticworks.html Visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design
Bibster http://bibster.semanticweb.org/ A semantics-based bibliographic peer-to-peer system
cwm http://www.w3.org/2000/10/swap/doc/cwm.html A general purpose data processor for the semantic Web
Deep Query Manager http://www.brightplanet.com/products/dqm_overview.asp Search federator from deep Web sources
DOSE https://sourceforge.net/projects/dose A distributed platform for semantic annotation
ekoss.org http://www.ekoss.org/ A collaborative knowledge sharing environment where model developers can submit advertisements
Endeca http://www.endeca.com Facet-based content organizer and search platform
FOAM http://ontoware.org/projects/map Framework for ontology alignment and mapping
Gnowsis http://www.gnowsis.org/ A semantic desktop environment
GrOWL http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html Open source graphical ontology browser and editor
HAWK http://swat.cse.lehigh.edu/projects/index.html#hawk OWL repository framework and toolkit
HELENOS http://ontoware.org/projects/artemis A Knowledge discovery workbench for the semantic Web
Jambalaya http://www.thechiselgroup.org/jambalaya Protégé plug-in for visualizing ontologies
Jastor http://jastor.sourceforge.net/ Open source Java code generator that emits Java Beans from ontologies
Jena http://jena.sourceforge.net/ Opensource ontology API written in Java
KAON http://kaon.semanticweb.org/ Open source ontology management infrastructure
Kazuki http://projects.semwebcentral.org/projects/kazuki/ Generates a java API for working with OWL instance data directly from a set of OWL ontologies
Kowari http://www.kowari.org/ Open source database for RDF and OWL
LuMriX http://www.lumrix.net/xmlsearch.php A commercial search engine using semantic Web technologies
MetaMatrix http://www.metamatrix.com/ Semantic vocabulary mediation and other tools
Metatomix http://www.metatomix.com/ Commercial semantic toolkits and editors
MindRaider http://mindraider.sourceforge.net/index.html Open source semantic Web outline editor
Model Futures OWL Editor http://www.modelfutures.com/OwlEditor.html Simple OWL tools, featuring UML (XMI), ErWin, thesaurus and imports
Net OWL http://www.netowl.com/ Entity extraction engine from SRA International
Nokia Semantic Web Server https://sourceforge.net/projects/sws-uriqa An RDF based knowledge portal for publishing both authoritative and third party descriptions of URI denoted resources
OntoEdit/OntoStudio http://ontoedit.com/ Engineering environment for ontologies
OntoMat Annotizer http://annotation.semanticweb.org/ontomat Interactive Web page OWL and semantic annotator tool
Oyster http://ontoware.org/projects/oyster Peer-to-peer system for storing and sharing ontology metadata
Piggy Bank http://simile.mit.edu/piggy-bank/ A Firefox-based semantic Web browser
Pike http://pike.ida.liu.se/ A dynamic programming (scripting) language similar to Java and C for the semantic Web
pOWL http://powl.sourceforge.net/index.php Semantic Web development platform
Protégé http://protege.stanford.edu/ Open source visual ontology editor written in Java with many plug-in tools
RACER Project https://sourceforge.net/projects/racerproject A collection of Projects and Tools to be used with the semantic reasoning engine RacerPro
RDFReactor http://rdfreactor.ontoware.org/ Access RDF from Java using inferencing
Redland http://librdf.org/ Open source software libraries supporting RDF
RelationalOWL https://sourceforge.net/projects/relational-owl Automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OW
Semantical http://semantical.org/ Open source semantic Web search engine
SemanticWorks http://www.altova.com/products_semanticworks.html SemanticWorks RDF/OWL Editor
Semantic Mediawiki https://sourceforge.net/projects/semediawiki Semantic extension to the MediaWiiki wiki
Semantic Net Generator https://sourceforge.net/projects/semantag Utility for generating topic maps automatically
Sesame http://www.openrdf.org/ An open source RDF database with support for RDF Schema inferencing and querying
SMART http://web.ict.nsc.ru/smart/index.phtml?lang=en System for Managing Applications based on RDF Technology
SMORE http://www.mindswap.org/2005/SMORE/ OWL markup for HTML pages
SPARQL http://www.w3.org/TR/rdf-sparql-query/ Query language for RDF
SWCLOS http://iswc2004.semanticweb.org/demos/32/ A semantic Web processor using Lisp
Swoogle http://swoogle.umbc.edu/ A semantic Web search engine with 1.5 M resources
SWOOP http://www.mindswap.org/2004/SWOOP/ A lightweight ontology editor
Turtle http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/ Terse RDF “Triple” language
WSMO Studio https://sourceforge.net/projects/wsmostudio A semantic Web service editor compliant with WSMO as a set of Eclipse plug-ins
WSMT Toolkit https://sourceforge.net/projects/wsmt The Web Service Modeling Toolkit (WSMT) is a collection of tools for use with the Web Service Modeling Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web Service Execution Environment (WSMX)
WSMX https://sourceforge.net/projects/wsmx/ Execution environment for dynamic use of semantic Web services

Tools Still Crude, Integration Not Compelling

Individually, there are some impressive and capable tools on this list. Generally, however, the interfaces are not intuitive, integration between tools is lacking, and why and how standard analysts should embrace them is lacking. In the semantic Web, we have yet to see an application of the magnitude of the first Mosaic browser that made HTML and the World Wide Web compelling.

It is perhaps likely that a similar “killer app” may not be forthcoming for the semantic Web. But it is important to remember just how entwined tools are to accelerating acceptance and growth of new standards and protocols.

Friday   Brown Bag Lunch This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on June 12, 2006. It was the follow-on to last week’s Brown Bag Lunch posting. It is also the first attempt I made at assembling semantic Web- and -related tools, which has now grown into the 800+ Sweet Tools listing. No changes have been made to the original posting.

[3] Said Izza, Lucien Vincent and Patrick Burlat, “A Unified Framework for Enterprise Integration: An Ontology-Driven Service-Oriented Approach,” pp. 78-89, in Pre-proceedings of the First International Conference on Interoperability of Enterprise Software and Applications (INTEROP-ESA’2005), Geneva, Switzerland, February 23 – 25, 2005, 618 pp. See http://interop-esa05.unige.ch/INTEROP/Proceedings/Interop-ESAScientific/OneFile/InteropESAproceedings.pdf.
[4] Jorge Cardoso and Amit Sheth, “Semantic Web Processes: Semantics Enabled Annotation, Discovery, Composition and Orchestration of Web Scale Processes,” in the 4th International Conference on Web Information Systems Engineering (WISE 2003), December 10-12, 2003, Rome, Italy. See http://lsdis.cs.uga.edu/lib/presentations/WISE2003-Tutorial.pdf.
[5] C. Batini, M. Lenzerini, and S.B. Navathe, “A Comparative Analysis of Methodologies for Database Schema Integration,” in ACM Computing Survey, 18(4):323-364, 1986.
[6] Alon Halevy, “Why Your Data Won’t Mix,” ACM Queue vol. 3, no. 8, October 2005. See http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=336.
[7] Chuck Moser, Semantic Interoperability: Automatically Resolving Vocabularies, presented at the 4th Semantic Interoperability Conference, February 10, 2006. See http://colab.cim3.net/file/work/SICoP/2006-02-09/Presentations/CMosher02102006.ppt.
[8] Alon Y. Halevy, Zachary G. Ives, Peter Mork and Igor Tatarinov, “Piazza: Data Management Infrastructure for Semantic Web Applications,” Journal of Web Semantics, Vol. 1 No. 2, February 2004, pp. 155-175. See http://www.cis.upenn.edu/~zives/research/piazza-www03.pdf.
[9] Stefano Mazzocchi, Stephen Garland, Ryan Lee, “SIMILE: Practical Metadata for the Semantic Web,” January 26, 2005. See http://www.xml.com/pub/a/2005/01/26/simile.html.
[10] Adrian Mocan, Ed., “WSMX Data Mediation,” in WSMX Working Draft, W3C Organization, 11 October 2005. See http://www.wsmo.org/TR/d13/d13.3/v0.2/20051011.
[11] J.Madhavan , P. A. Bernstein , P. Domingos and A. Y. Halevy, “Representing and Reasoning About Mappings Between Domain Models,” in the Eighteenth National Conference on Artificial Intelligence, pp.80-86, Edmonton, Alberta, Canada, July 28-August 01, 2002.
[12] AnHai Doan, Learning to Map between Structured Representations of Data, Ph.D. Thesis to the Computer Science & Engineering Department, University of Washington, 2002, 133 pp. See http://anhai.cs.uiuc.edu/home/thesis/anhai-thesis.pdf.
[13] Michael Stonebraker and Joey Hellerstein, “What Goes Around Comes Around,” in Joseph M. Hellerstein and Michael Stonebraker, editors, Readings in Database Systems, Fourth Edition, pp. 2-41, The MIT Press, Cambridge, MA, 2005. See http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf.
[14] John Miles Smith and Diane C. P. Smith, “Database Abstractions: Aggregation and Generalization,” ACM Transactions on Database Systems 2(2): 105-133, 1977.
[15] Michael Hammer and Dennis McLeod, “Database Description with SDM: A Semantic Database Model,” ACM Transactions on Database Systems 6(3): 351-386, 1981.
Posted:April 2, 2010

Friday  Brown Bag Lunch

Semantic mediation — that is, resolving semantic heterogeneities — must address more than 40 discrete categories of potential mismatches from units of measure, terminology, language, and many others. These sources may derive from structure, domain, data or language.

Earlier postings in this recent series traced the progress in climbing the data federation pyramid to today’s current emphasis on the semantic Web. Partially this series is aimed at disabusing the notion that data extensibility can arise simply by using the XML (eXtensible Markup Language) data representation protocol. As Stonebraker and Hellerstein correctly observe:

XML is sometimes marketed as the solution to the semantic heterogeneity problem . . . . Nothing could be further from the truth. Just because two people tag a data element as a salary does not mean that the two data elements are comparable. One could be salary after taxes in French francs including a lunch allowance, while the other could be salary before taxes in US dollars. Furthermore, if you call them “rubber gloves” and I call them “latex hand protectors”, then XML will be useless in deciding that they are the same concept. Hence, the role of XML will be limited to providing the vocabulary in which common schemas can be constructed.[1]

This series also covers the ontologies and the OWL language (written in XML) that now give us the means to understand and process these different domains and “world views” by machine. According to Natalya Noy, one of the principal researchers behind the Protégé development environment for ontologies and knowledge-based systems:

How are ontologies and the Semantic Web different from other forms of structured and semi-structured data, from database schemas to XML? Perhaps one of the main differences lies in their explicit formalization. If we make more of our assumptions explicit and able to be processed by machines, automatically or semi-automatically integrating the data will be easier. Here is another way to look at this: ontology languages have formal semantics, which makes building software agents that process them much easier, in the sense that their behavior is much more predictable (assuming they follow the specified explicit semantics–but at least there is something to follow). [2]

Again, however, simply because OWL (or similar) languages now give us the means to represent an ontology, we still have the vexing challenge of how to resolve the differences between different “world views,” even within the same domain. According to Alon Halevy:

When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies–or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas. Without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel. [3]

In the sections below, we describe the sources for how this heterogeneity arises and classify the many different types of heterogeneity. I then describe some broad approaches to overcoming these heterogeneities, though a subsequent post looks at that topic in more detail.

Causes and Sources of Semantic Heterogeneity

There are many potential circumstances where semantic heterogeneity may arise (partially from Halevy [3]):

  • Enterprise information integration
  • Querying and indexing the deep Web (which is a classic data federation problem in that there are literally tens to hundreds of thousands of separate Web databases) [4]
  • Merchant catalog mapping
  • Schema v. data heterogeneity
  • Schema heterogeneity and semi-structured data.

Naturally, there will always be differences in how differing authors or sponsors create their own particular “world view,” which, if transmitted in XML or expressed through an ontology language such as OWL may also result in differences based on expression or syntax. Indeed, the ease of conveying these schema as semi-structured XML, RDF or OWL is in and of itself a source of potential expression heterogeneities. There are also other sources in simple schema use and versioning that can create mismatches [3]. Thus, possible drivers in semantic mismatches can occur from world view, perspective, syntax, structure and versioning and timing:

  • One schema may express a similar “world view” with different syntax, grammar or structure
  • One schema may be a new version of the other
  • Two or more schemas may be evolutions of the same original schema
  • There may be many sources modeling the same aspects of the underlying domain (“horizontal resolution” such as for competing trade associations or standards bodies), or
  • There may be many sources that cover different domains but overlap at the seams (“vertical resolution” such as between pharmaceuticals and basic medicine).

Regardless, the needs for semantic mediation are manifest, as are the ways in which semantic heterogeneities may arise.

Classification of Semantic Heterogeneities

The first known classification scheme applied to data semantics that I am aware of is from William Kent nearly 20 years ago.[5] (If you know of earlier ones, please send me a note.) Kent’s approach dealt more with structural mapping issues (see below) than differences in meaning, which he pointed to data dictionaries as potentially solving.

The most comprehensive schema I have yet encountered is from Pluempitiwiriyawej and Hammer, “A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources.” [6] They classify heterogeneities into three broad classes:

  • Structural conflicts arise when the schema of the sources representing related or overlapping data exhibit discrepancies. Structural conflicts can be detected when comparing the underlying DTDs. The class of structural conflicts includes generalization conflicts, aggregation conflicts, internal path discrepancy, missing items, element ordering, constraint and type mismatch, and naming conflicts between the element types and attribute names.
  • Domain conflicts arise when the semantic of the data sources that will be integrated exhibit discrepancies. Domain conflicts can be detected by looking at the information contained in the DTDs and using knowledge about the underlying data domains. The class of domain conflicts includes schematic discrepancy, scale or unit, precision, and data representation conflicts.
  • Data conflicts refer to discrepancies among similar or related data values across multiple sources. Data conflicts can only be detected by comparing the underlying DOCs. The class of data conflicts includes ID-value, missing data, incorrect spelling, and naming conflicts between the element contents and the attribute values.

Moreover, mismatches or conflicts can occur between set elements (a “population” mismatch) or attributes (a “description” mismatch).

The table below builds on Pluempitiwiriyawej and Hammer’s schema by adding the fourth major explicit category of language, leading to about 40 distinct potential sources of semantic heterogeneities:

Class

Category

Subcategory

STRUCTURAL Naming Case Sensitivity
Synonyms
Acronyms
Homonyms
Generalization / Specialization
Aggregation Intra-aggregation
Inter-aggregation
Internal Path Discrepancy
Missing Item Content Discrepancy
Attribute List Discrepancy
Missing Attribute
Missing Content
Element Ordering
Constraint Mismatch
Type Mismatch
DOMAIN Schematic Discrepancy Element-value to Element-label Mapping
Attribute-value to Element-label Mapping
Element-value to Attribute-label Mapping
Attribute-value to Attribute-label Mapping
Scale or Units
Precision
Data Representation Primitive Data Type
Data Format
DATA Naming Case Sensitivity
Synonyms
Acronyms
Homonyms
ID Mismatch or Missing ID
Missing Data
Incorrect Spelling
LANGUAGE Encoding Ingest Encoding Mismatch
Ingest Encoding Lacking
Query Encoding Mismatch
Query Encoding Lacking
Languages Script Mismatches
Parsing / Morphological Analysis Errors (many)
Syntactical Errors (many)
Semantic Errors (many)

Most of these line items are self-explanatory, but a few may not be:

  • Homonyms refer to the same name referring to more than one concept, such as Name referring to a person v. Name referring to a book
  • A generalization/specialization mismatch can occur when single items in one schema are related to multiple items in another schema, or vice versa. For example, one schema may refer to “phone” but the other schema has multiple elements such as “home phone,” “work phone” and “cell phone”
  • Intra-aggregation mismatches come when the same population is divided differently (Census v. Federal regions for states, or full person names v. first-middle-last, for examples) by schema, whereas inter-aggregation mismatches can come from sums or counts as added values
  • Internal path discrepancies can arise from different source-target retrieval paths in two different schema (for example, hierarchical structures where the elements are different levels of remove)
  • The four sub-types of schematic discrepancy refer to where attribute and element names may be interchanged between schema
  • Under languages, encoding mismatches can occur when either the import or export of data to XML assumes the wrong encoding type. While XML is based on Unicode, it is important that source retrievals and issued queries be in the proper encoding of the source. For Web retrievals this is very important, because only about 4% of all documents are in Unicode, and earlier BrightPlanet provided estimates there may be on the order of 25,000 language-encoding pairs presently on the Internet
  • Even should the correct encoding be detected, there are significant differences in different language sources in parsing (white space, for example), syntax and semantics that can also lead to many error types.

It should be noted that a different take on classifying semantics and integration approaches is taken by Sheth et al. [7] Under their concept, they split semantics into three forms: implicit, formal and powerful. Implicit semantics are what is either largely present or can easily be extracted; formal languages, though relatively scarce, occur in the form of ontologies or other descriptive logics; and powerful (soft) semantics are fuzzy and not limited to rigid set-based assignments. Sheth et al.’s main point is that first-order logic (FOL) or descriptive logic is inadequate alone to properly capture the needed semantics.

From my viewpoint, Pluempitiwiriyawej and Hammer’s [6] classification better lends itself to pragmatic tools and approaches, though the Sheth et al. approach also helps indicate what can be processed in situ from input data v. inferred or probabalistic matches.

Importance of Reference Standards

An attractive and compelling vision  — perhaps even a likely one  — is that standard reference ontologies become increasingly prevalent as time moves on and semantic mediation is seen as more of a mainstream problem. Certainly, a start on this has been seen with the use of the Dublin Core metadata initiative, and increasingly other associations, organizations, and major buyers are busy developing “standardized” or reference ontologies.[8] Indeed, there are now more than 10,000 ontologies available on the Web.[9] Insofar as these gain acceptance, semantic mediation can become an effort mostly at the periphery and not the core.

But, such is not the case today. Standards only have limited success and in targeted domains where incentives are strong. That acceptance and benefit threshold has yet to be reached on the Web. Until such time, a multiplicity of automated methods, semi-automated methods and gazetteers will all be required to help resolve these potential heterogeneities.

Friday  Brown Bag Lunch This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on June 6, 2006. No changes have been made to the original posting. Current approaches to dealing with these heterogeneities would be to use “bridging” ontologies that map the mismatches.

[1] Michael Stonebraker and Joey Hellerstein, “What Goes Around Comes Around,” in Joseph M. Hellerstein and Michael Stonebraker, editors, Readings in Database Systems, Fourth Edition, pp. 2-41, The MIT Press, Cambridge, MA, 2005. See http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf.
[2] Natalya Noy, “Order from Chaos,” ACM Queue vol. 3, no. 8, October 2005 See http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=341&page=1
[3] Alon Halevy, “Why Your Data Won’t Mix,” ACM Queue vol. 3, no. 8, October 2005. See http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=336.
[4] Michael K. Bergman, “The Deep Web: Surfacing Hidden Value,” BrightPlanet Corporation White Paper, June 2000. The most recent version of the study was published by the University of Michigan’s Journal of Electronic Publishing in July 2001. See http://www.press.umich.edu/jep/07-01/bergman.html.
[5] William Kent, “The Many Forms of a Single Fact”, Proceedings of the IEEE COMPCON, Feb. 27-Mar. 3, 1989, San Francisco. Also HPL-SAL-88-8, Hewlett-Packard Laboratories, Oct. 21, 1988. [13 pp]. See http://www.bkent.net/Doc/manyform.htm.
[6] Charnyote Pluempitiwiriyawej and Joachim Hammer, “A Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources,” Technical Report TR00-004, University of Florida, Gainesville, FL, 36 pp., September 2000. See ftp.dbcenter.cise.ufl.edu/Pub/publications/tr00-004.pdf.
[7] Amit Sheth, Cartic Ramakrishnan and Christopher Thomas, “Semantics for the Semantic Web: The Implicit, the Formal and the Powerful,” in Int’l Journal on Semantic Web & Information Systems, 1(1), 1-18, Jan-March 2005. See http://www.informatik.uni-trier.de/~ley/db/journals/ijswis/ijswis1.html
[8] See, among scores of possible examples, the NIEM (National Information Exchange Model) agreed to between the US Departments of Justice and Homeland Security; see http://www.niem.gov/.