Posted: December 16, 2013

The Maturation of Schema.org

Complementary Efforts of the W3C Mirror the Trend

Two and one-half years ago the triumvirate of Google, Bing and Yahoo! (soon to be joined by Yandex, the major Russian search engine) released schema.org. The purpose of schema.org is to give Web site owners and authors a simple means to tag their sites with a vocabulary, designed to be understandable by search engines, that describes common things and activities on the Web. Though informed and led by innovators with impeccable backgrounds in the early semantic Web and knowledge representation [1], the founders of schema.org also understood that the Web is a messy place, often with wrong syntax and usage. Their stated commitment to simplicity and practicality led me to state on the day of release that schema.org was “perhaps the most important event for the structured Web since RDF was released a dozen years ago.”

Just a week ago, version 1.0e of schema.org was released. That event, plus much else in recent months, suggests real maturity and uptake of schema.org. It looks like the promise of schema.org is being fulfilled.

Growth and Impact of the schema

When first released, schema.org provided nearly 300 structured record types that may be used to tag information in Web pages. Through various collaborative processes since then, and with an active discussion group, the vocabulary has about doubled in size. Some key areas of expansion have been describing various actions, adding basic medical terms, expanding products and transactions via linkages to GoodRelations, civic services, and, most recently, accessibility. Many other additions are in progress.

In his keynote address at ISWC 2013 in Sydney on October 23, Ramanathan Guha [1] reported that 15 percent of crawled pages and 5 million sites have some schema.org markup. We can also see that some of the most widely used content management systems on the Web, notably including WordPress, Joomla and Drupal, have or plan to have native schema.org support. These tooling trends are important because, though schema.org is designed for simple manual markup, it does require a bit of attention and skill to get the markup right. Having markup added to pages automatically in the background is the next threshold for even broader adoption.
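To make the idea of background markup concrete, here is a minimal sketch in Python of how a content management system might turn a post record into schema.org markup serialized as JSON-LD. The post record, its field names, and the two function names are assumptions for illustration only; they are not taken from WordPress, Joomla, Drupal, or any other system named above.

```python
import json


def post_to_jsonld(post):
    """Map a hypothetical CMS post record to a schema.org Article description."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": post["title"],
        "datePublished": post["published"],  # expected as an ISO 8601 date string
        "author": {"@type": "Person", "name": post["author"]},
    }


def render_script_tag(data):
    """Wrap a JSON-LD structure in the script element that a page would embed."""
    body = json.dumps(data, indent=2)
    # Escape "</" so that no string value can close the script element early;
    # "\/" is a valid JSON escape for "/", so the data still parses the same.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Illustrative record; the author name is a placeholder.
    example = {
        "title": "The Maturation of Schema.org",
        "published": "2013-12-16",
        "author": "Example Author",
    }
    print(render_script_tag(post_to_jsonld(example)))
```

A template would call something like `render_script_tag` once per page, so authors never touch the markup by hand. The same record could equally be emitted as microdata or RDFa attributes; JSON-LD is used here only because it keeps the sketch short.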

The ability of the schema.org vocabulary to capture essential domain facts as structured data is reflected in the growing list of prominent sites tagging with schema.org. According to Guha, these are some of the prominent sites now using schema.org:

Category Prominent Sites
News Nytimes,,
Movies imdb, rottentomatoes,
Jobs / careers,,
Videos youtube, dailymotion,,
Events,,, eventful
Key Applications,

Examples like Pinterest show how schema.org can also provide a central organizing point for new ventures and applications. There are also key relationships between schema.org and new search initiatives such as Google Now or Google’s Knowledge Graph.

From day one, schema.org was released with a mechanism for other parties to extend its vocabulary. More recently, however, there has been a significant increase in attention to questions of interoperability and the relation to other existing vocabularies. To wit:

  • Prominent knowledge representation experts, such as Peter Patel-Schneider, have become active in suggesting better interoperability and design considerations
  • The root of schema.org is now recognized as owl:Thing
  • Much discussion has occurred about whether and how to integrate or interoperate with SKOS, the Simple Knowledge Organization System vocabulary
  • Provisions have been added to capture concepts such as domain and range
  • Calls have been made to increase the number of examples and the amount of documentation, including enforcing consistency across the vocabulary.
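The domain and range provisions in the list above can be illustrated with a short Python sketch. In schema.org these are expressed with the `domainIncludes` and `rangeIncludes` properties, which list the types a property is expected to be used on and the types its values are expected to have. The tiny vocabulary excerpt below is hand-written for illustration rather than loaded from schema.org, and the helper names are hypothetical.

```python
# Hand-written excerpt for illustration; not loaded from schema.org itself.
SUBCLASS_OF = {
    "Article": "CreativeWork",
    "CreativeWork": "Thing",
    "Person": "Thing",
    "Organization": "Thing",
}

PROPERTIES = {
    "author": {
        "domainIncludes": {"CreativeWork"},
        "rangeIncludes": {"Person", "Organization"},
    },
}


def ancestors(type_name):
    """Return the type followed by all of its supertypes, nearest first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = SUBCLASS_OF.get(type_name)
    return chain


def is_expected(prop, subject_type, value_type):
    """Check a property's use against its declared domain and range."""
    spec = PROPERTIES[prop]
    in_domain = any(t in spec["domainIncludes"] for t in ancestors(subject_type))
    in_range = any(t in spec["rangeIncludes"] for t in ancestors(value_type))
    return in_domain and in_range


if __name__ == "__main__":
    print(is_expected("author", "Article", "Person"))  # an Article is a CreativeWork
    print(is_expected("author", "Person", "Person"))   # Person is outside the domain
```

The word "Includes" matters: these declarations read as expectations about typical usage, not as the strict constraints that `rdfs:domain` and `rdfs:range` would imply, which fits the tolerance for messy markup described earlier.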

To be clear, it was never the intent for schema.org to become a single, governing vocabulary for the Web. Nonetheless, these broader means to enable others to tie in effectively with it are an indicator that schema.org’s sponsors are serious about finding effective common ground.

Aside from certain areas such as recipes or claiming site or blog ownership, it has been unclear whether and how the search engines are actually using schema.org markup. The sponsors have often stated a go-slow attitude, waiting to see whether the marketplace indeed embraces the vocabulary. I’m also sure that the sponsors, as familiar as they are with spam and erroneous markup, have wanted to put in place effective ingest procedures that do not reduce the quality of their search indexes.

Getting Dan Brickley, one of the better-known individuals in RDF and the semantic Web, to act as schema.org’s liaison to the broader community, and beginning to open up about actual usage and uptake of schema.org, are great signs of the sponsors’ commitment to the vocabulary. We should expect to see a much quickened pace and more visibility for schema.org within the search services themselves in the coming months.

W3C’s Complementary Efforts

Meanwhile, back at the ranch, a number of other interesting efforts are occurring within the World Wide Web Consortium (W3C) that are complementary to these trends. As readers of this blog well know, I have argued for some time that RDF makes for a fantastic data model for interoperating disparate content, one that our company Structured Dynamics centrally relies upon, but that RDF is not essential for metadata specification or exchange. Understood serializations based on understood vocabularies (in other words, exactly the design of schema.org) should be sufficient to describe the various types of things and their attributes as may be found on the Web. This idea of structured data in a variety of forms puts control into the hands of content authors. Various markets will determine what makes best sense for them as to how they actually express that structured data.

Last week the W3C announced its retirement of the Semantic Web group, subsuming it instead into the activities of the new W3C Data Activity. The W3C also announced a new group in CSV (comma-separated values) data exchange to go along with recent efforts in JSON-LD (linked data).

These are great trends that reflect a bias toward adoption. Along with the advances taking place with schema.org, the Web now appears to be entering a golden age of structured data.

[1] For example, a Google Fellow instrumental in founding schema.org is Ramanathan V. Guha, with a background extending back to Cyc and through Apple and Netscape to what came to be RDF. Guha was also the lead executive behind Google’s Knowledge Graph, which has some key relations with schema.org.

2 thoughts on “The Maturation of Schema.org”

  1. Schema.org has been on the uptake in recent times, mostly for authorship, recipes and reviews for display purposes. I think there will be greater uptake if search engines provide more information about the potential and future uses of schema.org, best practices to prevent spam activities, and how this fits in with the Knowledge Graph.
