Posted:June 2, 2011

Schema.orgContrary to Some Views, Google and Co.’s Microdata Effort will Also Boost RDF

In my opinion, perhaps the most important event for the structured Web since RDF was released a dozen years ago was today’s joint announcement by the search engine triumvirate of Google, Bing and Yahoo! releasing Schema.org. Schema.org is a vendor specification for nearly 300 mini-schema (or structured record definitions) that can be used to tag information in Web pages. These schema are organized into a clean little hierarchy and cover many of the leading things — from organizations to people to products and creative works — that can be written about and characterized on the Web.

These schema specifications are based on the microdata standard presently under review as part of the pending HTML5 specification. Microdata are set record descriptions of key-value pair attributes that can be embedded into the HTML Web page language. These microdata schema are similar to microformats, but broader in coverage and extensible. Microdata is also simpler than RDFa, another W3C specification that the Schema.org organizers call “. . . extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption.”

Is the Initiative a Slap in RDF’s Face?

Various forums have been alive with howls and questions from many RDF and RDFa advocates that this initiative negates years of effort behind those formats. Yet I and my company, Structured Dynamics, which base our entire technology approach on semantics and RDF, do not see this announcement as a threat or rejection. What gives; what is the difference in perspective?

In our view, RDF and its triple representations in its data model, is the simplest and most expressive means to represent any data or any data relationship. As such, RDF, and its language extensions such as OWL and ontologies, provide a robust and flexible canonical data model for capturing any extant data or schema. No matter what the native form of the source information, we can boil it down to RDF and inter-relate it to any other information. It is for these reasons (and others) we have frequently termed RDF as the universal data solvent.

But, simple records and simple data need not be encumbered with the complexity of RDF. We have long argued for the importance of naive data structs. Many of these are simple key-value pairs where the subject is implied. The so-called little structured data records in Wikipedia, called infoboxes, are of this form. JSON and many simple data formats also have cleaner data formats.

The basic fact that RDF provides a universal data model for any kind of native data does not necessarily translate into its use as the actual data exchange format. Rather, winning data exchange formats are those that can be easily understood, easily expressed and therefore widely used. I think there is a real prospect that microdata, ready for ingest and expression by the Web’s leading search engines, may represent a real sea change in the availability and expression of structured data on the Web.

More structure — not less — is the real fuel that will promote greater adoption of RDF when it comes time to interoperate that data. The RDF community should rejoice that more structure will be coming to the Web from Google et al.’s announcement. We should also soon see an explosion of tools and utilities and services that make it easy to automatically add such structure to Web pages via single clicks. Then, once this structure is available, watch out!

So, while the backers of Schema.org also announced their continued support for microformats and RDFa as they presently exist, I rather suspect today’s announcement represents a denouement for these alternative formats. Though these formats may be creatively destroyed, I think the effect on RDF itself will be a profound and significant boost. I foresee clarity coming to the marketplace regarding RDF’s role:  as a canonical means for expressing data of any form, and not necessarily as a data exchange format.

The Initiative is No Surprise

This initiative, led by Google, should be no surprise. Google is the registered agent for the Schema.org Web site and has been the key proponent of microdata via its support of Ian Hickson in the WhatWG and HTML5 work groups. As I stated a couple of years back, Google has also not hidden its interests in structured data. Practically daily we see more structured data appear in Google search results and it has maintained a very active program in structured data extraction from text and tables for some years.

Google and its search engine partners recognize that search needs are evolving from keyword retrievals to structure, relationships, and filtered, targeted results. Those advances come from structure — as well as the semantic relationships between things that something like the Schema.org begins to represent.

Many within the W3C and elsewhere questioned why Google was pushing microdata when there were competing options such as microformats or RDFa (or even earlier variants). Of course, like Microsoft of a decade earlier, some ascribed Google’s microdata advocacy as arising from commercial interests or clout in advertising alone. Of course Google has an economic interest in the growth and usefulness of the Web. But I do not believe its advocacy to be premised on clout or “my way or the highway.”

Google and the search engine triumvirate understand well — much better than many of the researchers and academics that dominate mailing list discussions — that use and adoption trump elegance and sophistication. When one deconstructs the design of microdata and the nearly 300 schema now released behind it, I think the pragmatic observer can only come to one conclusion: Job well done!

Why This is Exciting

I have been a fervent RDF advocate for nearly a decade and have also been a vocal proponent of the structured Web as a necessary stepping stone to the semantic Web. In fact, here is a repeat of a diagram I have used many times over the past 5 years:

Transition in Web Structure
Document Web Structured Web
Semantic Web
Linked Data
  • Document-centric
  • Document resources
  • Unstructured data and semi-structured data
  • HTML
  • URL-centric
  • circa 1993
  • Data-centric
  • Structured data
  • Semi-structured data and structured data
  • XML, JSON, RDF, etc
  • URI-centric
  • circa 2003
  • Data-centric
  • Linked data
  • Semi-structured data and structured data
  • RDF, RDF-S
  • URI-centric
  • circa 2007
  • Data-centric
  • Linked data
  • Semi-structured data and structured data
  • RDF, RDF-S, OWL
  • URI-centric
  • circa ???

When one looks at the schema of schema that accompany today’s announcement, it is really clear just how encompassing and important these instant standards will become:

DataType

Thing

Intangible

CreativeWork

Event

Organization

LocalBusiness

AnimalShelter
AutomotiveBusiness

ChildCare
DryCleaningOrLaundry
EmergencyService

EmploymentAgency
EntertainmentBusiness

FinancialService

FoodEstablishment

GovernmentOffice

HealthAndBeautyBusiness

HomeAndConstructionBusiness

InternetCafe
Library
LodgingBusiness

MedicalOrganization

ProfessionalService

RadioStation
RealEstateAgent
RecyclingCenter
SelfStorage
ShoppingCenter
SportsActivityLocation

Store

TelevisionStation
TouristInformationCenter
TravelAgency

NGO

SportsTeam

Organization (con’t)

Person
Place

Product

Today’s announcement is the best news I have heard in years regarding the structured Web, RDF, and the semantic Web. This announcement is — I believe — the signal event of the structured Web. With regard to my longstanding diagram above, I can go to bed tonight knowing we have now crossed the threshold into the semantic Web.

Posted by AI3's author, Mike Bergman Posted on June 2, 2011 at 8:57 pm in Adaptive Information, Structured Web | Comments (7)
The URI link reference to this post is: http://www.mkbergman.com/962/structured-web-gets-massive-boost/
The URI to trackback this post is: http://www.mkbergman.com/962/structured-web-gets-massive-boost/trackback/