Posted: November 8, 2009

Must Read: ‘Data Smoke and Mirrors’

Mazzocchi Sounds a Warning to Linked Data Advocates

Stefano Mazzocchi has been a clear thinker for years and an innovative contributor to the community since his early leadership of the Apache Cocoon project. One of his best qualities is that he speaks his mind. Now at Freebase, and previously with MIT’s Simile program, he is one of my dedicated reads via his blog, Stefano’s Linotype.

His post, Data Smoke and Mirrors, stands on its own, and I highly recommend it. He focuses particularly on the conversion of data.gov datasets to “linked data” (my quotes are purposeful). Combined with the recent poor conversion of New York Times datasets to linked data, I think he is the canary sending out a warning about a disturbing trend.

Posting linked data for its own sake — whatever the reasons — risks undercutting the premise.

We have now moved beyond “proof of concept” to the need for actual useful data of trustworthy provenance, with proper mapping and characterization. Recent efforts are a disappointment: no enterprise would or could rely on them.

Listen up, folks.


4 thoughts on “Must Read: ‘Data Smoke and Mirrors’”

  1. I think Stefano is trying too hard to discredit N-triples and not hard enough to use the skills he has learned as an internet user.

    He says, “I’m a human and I can’t figure out what this is from or for, what it means in practice nor have a way to figure out what M01 stands for, or what series_id SMU55225408000000001 means”

    I am an average user and I was able to make sense of the URI very easily. I did a simple Google search for the series ID “SMU55225408000000001” and I got a page from the Bureau of Labor Statistics for State and Area Employment, Hours, and Earnings. http://data.bls.gov/PDQ/servlet/SurveyOutputServlet?series_id=SMU55225408000000001&data_tool=XGtable

    This was very easy. Did I do something wrong?

  2. Michael,

    I agree with you that our initial foray into Linked Open Data was less than perfect, but I am hopeful that you will find that our recently released revisions address many of the community’s concerns. We’ve only been at this for 13 days now, but I assure you we are listening, learning and quickly iterating.

    Recent talk of “Pipe Dreams” and “Smoke and Mirrors” aside, I anticipate a bright future for Linked Open Data.

    All the best,

    Evan Sandhaus
    Semantic Technologist
    New York Times Research + Development
    @kansandhaus
