12 thoughts on “Practical P-P-P-Problems with Linked Data”

  1. Mike,

    First off, I don’t think you are picking on me. You picked a good example, but you misinterpreted the following:

    1. My intent – which is 100% of the time about stimulating discussion (so I was deliberately succinct)

    2. My views about “owl:sameAs” – I have never been, and never will be, an advocate of “owl:sameAs” misuse; I raise concerns about misuse each time I encounter it.

    Clarifications about my LOD mailing list post follow.

    Paul has (or had) a problem with obtaining accurate data from DBpedia or Freebase for his Ookaboo data space. I believe this was specifically about geo points etc.

    My suggestions to Paul:

    1. make a new data space (could be Ookaboo or another);

    2. in the new data space create accurate data (as he sees it);

    3. since you know the Names of the problem Entities in both DBpedia and Freebase, use “owl:sameAs” in the more accurate data space for coreference, which enables the use of “inference context” to conditionally deliver “union expansion” in situations where accuracy requirements are lower;

    4. share your newly curated data with the burgeoning Web of Linked Data by publishing from a data space such that each of these newer and more accurate Entities is endowed with a Resolvable Name (typically HTTP re. RDF based Linked Data) that basically carries your insignia and also forms the underlying basis for Attribution and (if you so choose) monetization re. Linked Data value chain contribution.

    Important point re. “owl:sameAs”, coreference, and inference contexts:
    Virtuoso has always been about conditional entailment of “owl:sameAs” assertions within the context of SPARQL queries. I have never given an “owl:sameAs” demo or made a comment about said subject without “conditional entailments” at the back of my mind 🙂

    I hope I’ve cleared up this matter. Naturally, the lod.openlinksw.com instance of the LOD cloud cache is a nice place to experiment and explore “conditional owl:sameAs entailments” against a massive Linked Data Space. It also amplifies what the real issue is, i.e. conditional entailments, using backward chained reasoning, at massive scale (e.g., 17 Billion Triples).

    Your concerns are well founded, but my recent LOD mailing list comments are not your exemplar; especially as I am advocating the same thing as you which boils down to this: use “owl:sameAs” with caution.

    Also, and more importantly, we still agree on the following: Linked Data isn’t solely about the ABox; there is a much more powerful TBox dimension that most ignore, due to the capability shortfalls of many platforms in the Linked Data Deployment realm.

    Kingsley

  2. Hi Kingsley,

    Thanks for the clarification, and I am glad you took my comments in your normal spirit of constructive dialog and discussion.

    However, I am still perplexed about what appears to me to be a new construct: “conditional entailment”. I’m not familiar with that in any of the specs. And, if Virtuoso is conditioning its entailments, where is that documented? What are the conditions? Does Virtuoso apply different conditions to different predicates? To different named graphs?

    I know that various vendors (including OpenLink) have created their own extensions to SPARQL, which tend to track one another somewhat and which are (hopefully) informing the next spec. Is a similar thing going on with inferencing and entailments?

    Any pointers on this would be helpful.

    Thanks, again, Kingsley.

    Mike

  3. My current thinking is that you should be consistent in the use of sameAs within a single dataset.

    Whether a third party chooses to honor your sameAs declarations depends on whether their current application of the data is similar to yours.

    For example, if you create a database of which movies have comic book characters in them, then all Batman URIs are the same. If you are building a database of the tools and weapons that comic book characters use, then the Batman of each movie and each writer is distinct.

    There is not one truth for all.
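
A minimal sketch of the point made in this comment, assuming a toy in-memory store: the same sameAs declarations are published once, and each consuming application decides whether to honor them. All `ex:` URIs and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical URIs for illustration; none of these come from a real dataset.
SAME_AS = {
    ("ex:batman_1989_film", "ex:batman_character"),
    ("ex:batman_2008_film", "ex:batman_character"),
}

FACTS = {
    ("ex:batman_1989_film", "ex:uses", "ex:batwing"),
    ("ex:batman_2008_film", "ex:uses", "ex:batpod"),
}


def canonical(uri, honor_same_as):
    """Map a URI to its sameAs target only if the application honors sameAs."""
    if honor_same_as:
        for alias, target in SAME_AS:
            if uri == alias:
                return target
    return uri


def tools_by_character(honor_same_as):
    """Group tools by character, merging identities only when asked to."""
    result = {}
    for subject, _predicate, tool in FACTS:
        result.setdefault(canonical(subject, honor_same_as), set()).add(tool)
    return result
```

An application that only cares which characters appear anywhere would call `tools_by_character(True)` and see one Batman; an application that tracks equipment per portrayal would call `tools_by_character(False)` and keep the two apart.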

  4. Mike,

    We use pragmas in our SPARQL extensions to make entailments conditional. Here are the steps:

    1. Put rules in a graph (typically the rules are inherent in an ontology, so an ontology can suffice, e.g. UMBEL)

    2. Associate this ontology graph with a named rule

    3. When issuing a SPARQL query, simply add a pragma that tells Virtuoso to use a named rule.

    When the above is done, you end up with a SPARQL query applied to a working set that’s the product of backward chained reasoning. Thus, the entailments from the backward chained reasoning context become the basis of the eventual SPARQL solution (query result set). Take away the inference rules pragma and there is no reasoning whatsoever. This has been the case since 2007.

    I have never commented about “owl:sameAs” without SPARQL pragmas factoring into my world view.

    Some examples:

    1. http://lists.w3.org/Archives/Public/public-lod/2009May/0067.html — mailing list post about “owl:sameAs” and how we make the entailments conditional via Virtuoso SPARQL pragmas

    2. http://www.mail-archive.com/public-lod@w3.org/msg00870.html — example using UMBEL

    3. http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg01210.html — not “owl:sameAs” specific but showing conditional entailments use re. fixing DBpedia ABox data .

    Kingsley
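
The three steps described in this comment can be modeled roughly as follows. This is a toy sketch in Python, not Virtuoso's implementation or its pragma syntax: the graph names, `register_rule_set`, and the `inference` argument are all hypothetical stand-ins for "rules in a graph", "a named rule", and "a pragma on the query". The only rule handled here is `rdfs:subClassOf`.

```python
# Step 1: rules live in a graph. Graph name -> set of (s, p, o) triples.
GRAPHS = {
    "ex:ontology": {("ex:Dog", "rdfs:subClassOf", "ex:Animal")},
    "ex:data": {("ex:rex", "rdf:type", "ex:Dog")},
}

RULE_SETS = {}  # rule name -> name of the graph holding the rules


def register_rule_set(rule_name, graph_name):
    """Step 2: associate an ontology graph with a named rule."""
    RULE_SETS[rule_name] = graph_name


def instances_of(cls, data_graph, inference=None):
    """Step 3: answer a query, reasoning only if a rule name is supplied.

    Backward chaining here means the request for `cls` is expanded into
    requests for its subclasses at query time; nothing is materialized.
    """
    wanted = {cls}
    if inference is not None:
        rules = GRAPHS[RULE_SETS[inference]]
        frontier = [cls]
        while frontier:
            current = frontier.pop()
            for s, p, o in rules:
                if p == "rdfs:subClassOf" and o == current and s not in wanted:
                    wanted.add(s)
                    frontier.append(s)
    return {s for s, p, o in GRAPHS[data_graph]
            if p == "rdf:type" and o in wanted}


register_rule_set("ex:rules", "ex:ontology")
```

Calling `instances_of("ex:Animal", "ex:data")` performs no reasoning at all, while passing `inference="ex:rules"` opts in to the entailment for that one query, which is the sense of "conditional" used above.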

  5. Hi Kingsley,

    I’m really going to be dense here, because I’m still not sure we are communicating.

    First, of your three examples, only the first one is explicitly about owl:sameAs, and that one merely does a string lookup to assert a sameAs relationship. (Probably itself a questionable practice.) I don’t believe my question about the “conditional entailments” that you raised in relation to sameAs is addressed by any of these three examples.

    Second, in my original post, what I was criticizing was the use of asserting owl:sameAs to two external datasets from the cleaned one (what I called [CCC] and was Paul’s Ookaboo in your original response). The problem I see in this case is the entailment that the third dataset [CCC] is making to *both* [AAA] (DBpedia) and [BBB] (Freebase) viz the owl:sameAs assertion. Via OWL semantics, I believe this to be wrong and a misuse of the predicate as defined in the specifications.

    I think actually that the idea of a “conditional entailment”, properly defined and published so the semantics are clear, is one approach, though again I still have open questions about what you mean by this and where it is defined for this circumstance. Also, maybe some SWRL stuff could accomplish a similar intent. But, what I was really advocating in my article was to define new predicates that actually capture the intended relationship.

    Unless I am really dense here (always a distinct possibility 😉 ), I still believe that your advocacy for owl:sameAs in Paul’s circumstance is wrong.

    Best, Mike
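
To make the concern in this comment concrete: under OWL semantics owl:sameAs is symmetric and transitive, and coreferent terms share all of their properties. The sketch below works that through for the [AAA]/[BBB]/[CCC] case. The term names and latitude values are hypothetical placeholders, and the helper functions are illustrative only.

```python
from itertools import combinations

# Placeholder names following the [AAA]/[BBB]/[CCC] labels used in the comment.
asserted = [("CCC:Paris", "AAA:Paris"), ("CCC:Paris", "BBB:Paris")]

facts = {
    "AAA:Paris": {("geo:lat", "48.0")},     # hypothetical inaccurate value
    "BBB:Paris": {("geo:lat", "49.5")},     # hypothetical inaccurate value
    "CCC:Paris": {("geo:lat", "48.8567")},  # hypothetical curated value
}


def equivalence_classes(pairs):
    """Group terms linked by sameAs, treating it as symmetric and transitive."""
    classes = []
    for a, b in pairs:
        touching = [c for c in classes if a in c or b in c]
        merged = {a, b}.union(*touching)
        classes = [c for c in classes if c not in touching] + [merged]
    return classes


def entailed_same_as(pairs):
    """All unordered sameAs pairs entailed within each equivalence class."""
    return {frozenset(p) for c in equivalence_classes(pairs)
            for p in combinations(sorted(c), 2)}


def merged_facts(term, pairs):
    """Every property value the term acquires from its sameAs class."""
    for c in equivalence_classes(pairs):
        if term in c:
            return set().union(*(facts.get(t, set()) for t in c))
    return facts.get(term, set())
```

With only the two assertions made from [CCC], `entailed_same_as(asserted)` also contains the [AAA]/[BBB] pair, and `merged_facts("CCC:Paris", asserted)` returns all three latitude values, so a reasoner that applies the full semantics unconditionally hands the cleaned entity the very values it was created to replace.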

  6. Mike,

    I don’t know how you arrived at the string lookup conclusion. Why would we be doing that? Pragmas instruct the underlying Virtuoso compiler.

    When the inference context pragma is enabled, we create a working set that includes the union expansion of the owl:sameAs triples, i.e. we de-reference the local objects of owl:sameAs triples and then (if a crawl pragma is used) actually de-reference over HTTP (e.g. the Web of Linked Data, if the URIs lead us in that direction). After all of that, we also use Transitivity to set the direction for the union that’s used to make the working set.

    What do I mean by all of this?

    Virtuoso will not effect an owl:sameAs claim just because it’s asserted in a triple. You have to explicitly tell it to do so via a Pragma.

    Simple example:

    <http://kingsley.idehen.net/dataspace/person/kidehen#this> owl:sameAs <acct:kidehen@id.myopenlink.net> .

    If pragmas for inference context and crawling are enabled, my working set will include a union of the graphs associated with the “S” and the “O” re. the above triple.

    If I disable the crawler pragma, and “S” is local to my data space while “O” is remote, I still get the benefit of coreference, such that the solutions for the following queries are identical:

    DESCRIBE <http://kingsley.idehen.net/dataspace/person/kidehen#this> .

    DESCRIBE <acct:kidehen@id.myopenlink.net> .

    If I turn off all pragmas, and “S” is local, the solutions for the queries above will not be identical, because coreference is not discerned while the inference context is off.

    The good, bad, and ugly re. “owl:sameAs” has always been, and will always be, conditional re. Virtuoso. I can’t speak for other RDF stores.

    Back to my suggestion to Paul:

    You are basically saying that the following is wrong:

    Go make your own data collection that deals with inaccuracies and, if you so choose, publish it to the Web so that others can get high quality data from your data space while using your URIs as a mechanism for conveying your imprint, which keeps you hooked into the data consumption value chain. Also, as you do in real life, reference your sources (so link your URIs to the relevant DBpedia and Freebase URIs).

    Basically you are saying that he can’t have: a FixedParisURI that is owl:sameAs NotSoBadParisURI, BadParisURI. Of course he can, and that’s his call to make.

    My specific response to Paul was a journey towards discovering a platform that performs “owl:sameAs” reasoning conditionally (which he isn’t using, otherwise he would have resolved his dilemma). And therein lies the point of contention (as I see it), since this behavior isn’t the norm. That said, I haven’t seen full union expansion of owl:sameAs as the norm either, since following your nose via a browser through “S” or “O” URIs isn’t what I am talking about, far from it. This is about producing a solution for a SPARQL query with or without union expansion (that includes full or partial transitive closure).

    The key principle in play here re. Virtuoso and its SPARQL pragmas for inference rules is this: give people rope, even enough for them to hang themselves with, but don’t forget to make this reality optional 🙂

    Kingsley

  7. Mike,

    A cleaned-up response, without some of the typos and escaping problems.

    **
    First, of your three examples, only the first one is explicitly about owl:sameAs, and that one merely does a string lookup to assert a sameAs relationship. (Probably itself a questionable practice.) I don’t believe my question about the “conditional entailments” that you raised in relation to sameAs is addressed by any of these three examples.
    **

    I don’t know how you arrived at the string lookup conclusion. Why would we be doing that? Pragmas instruct the underlying Virtuoso compiler.

    When the inference context pragma is enabled, we create a working set that includes the union expansion of the owl:sameAs triples, i.e. we de-reference the local objects of owl:sameAs triples and then (if a crawl pragma is used) actually de-reference over HTTP (e.g. the Web of Linked Data, if the URIs lead us in that direction). After all of that, we also use Transitivity to set the direction for the union that’s used to make the working set.

    What do I mean by all of this?

    Virtuoso will not effect an owl:sameAs claim just because it’s asserted in a triple. You have to explicitly tell it to do so via a Pragma.

    Example using a Triple pattern:

    <http://kingsley.idehen.net/dataspace/person/kidehen#this> owl:sameAs <acct:kidehen@id.myopenlink.net> .

    If pragmas for inference context and crawling are enabled, my working set will include a union of the graphs associated with the “S” and the “O” re. the above triple.

    If I disable the crawler pragma, and “S” is local to my data space while “O” is remote, I still get the benefit of coreference, such that the solutions for the following queries are identical:

    DESCRIBE <http://kingsley.idehen.net/dataspace/person/kidehen#this> .

    DESCRIBE <acct:kidehen@id.myopenlink.net> .

    If I turn off all pragmas, and “S” is local, the queries above will not produce identical solutions (result sets). Why? Because coreference is not discerned when the inference context is disabled, i.e. the pragma is “off” and Virtuoso will do nothing.

    The good, bad, and ugly re. “owl:sameAs” has and will always be conditional re. Virtuoso. I can’t speak for other RDF stores.

    Back to my suggestion to Paul:

    You are basically saying that the following instruction is wrong:

    Go make your own data collection that deals with inaccuracies and, if you so choose, publish it to the Web so that others can get high quality data from your data space while using your URIs as a mechanism for conveying your imprint, which keeps you hooked into the data consumption value chain. Also, as you do in real life, reference your sources (so link your URIs to the relevant DBpedia and Freebase URIs).

    Basically you are saying that he can’t have: a FixedParisURI that is owl:sameAs NotSoBadParisURI, BadParisURI. Of course he can, and that’s his call to make.

    My specific response to Paul was *me* setting him on a journey towards discovering a platform that performs “owl:sameAs” reasoning conditionally (which he isn’t using, otherwise he would have resolved his dilemma). Basically, therein lies the point of contention (as I see it), since this behavior isn’t the norm. That said, I haven’t seen full union expansion of owl:sameAs as the norm either, since following your nose via a browser through “S” or “O” URIs isn’t what I am talking about, far from it. This is about producing a solution for a SPARQL query with or without union expansion (that includes full or partial transitive closure).

    The key principle in play here is this: Virtuoso, with its SPARQL pragmas for inference context, gives people rope, even enough for them to hang themselves with, but doesn’t forget to make this *Virtuoso-delivered* reality optional.

    Kingsley
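
The behavior described in this comment can be sketched as a DESCRIBE that expands over owl:sameAs only when asked to. This is a simplified Python model, not Virtuoso's implementation: the `same_as` flag stands in for the inference context pragma, crawling is not modeled, and the two non-sameAs triples are hypothetical data added so that the difference is visible.

```python
S = "http://kingsley.idehen.net/dataspace/person/kidehen#this"
O = "acct:kidehen@id.myopenlink.net"

# Only the owl:sameAs link comes from the comment; the rest is hypothetical.
TRIPLES = {
    (S, "owl:sameAs", O),
    (S, "foaf:name", "Kingsley"),
    (O, "foaf:accountName", "kidehen"),
}


def coreferents(uri, triples):
    """Symmetric, transitive closure of owl:sameAs starting from `uri`."""
    seen, frontier = {uri}, [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "owl:sameAs" and current in (s, o):
                other = o if s == current else s
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return seen


def describe(uri, triples, same_as=False):
    """Return triples about `uri`; expand over coreferents only if asked."""
    subjects = coreferents(uri, triples) if same_as else {uri}
    return {t for t in triples if t[0] in subjects}
```

With `same_as=True`, `describe(S, TRIPLES, ...)` and `describe(O, TRIPLES, ...)` return the same set; with the flag left off, each returns only the triples stored under its own subject, so the asserted owl:sameAs triple has no effect on the result.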

  8. Mike,

    Here are some links to excerpts from one of our SPARQL tutorials that deals with Virtuoso’s inference functionality (which is driven by our backward-chaining reasoner):

    1. http://virtuoso.openlinksw.com/presentations/SPARQL_Tutorials/SPARQL_Tutorials_Part_2/SPARQL_Tutorials_Part_2.html#(60) : owl:sameAs pragma example; results will vary based on whether the pragma is present

    2. http://virtuoso.openlinksw.com/presentations/SPARQL_Tutorials/SPARQL_Tutorials_Part_2/SPARQL_Tutorials_Part_2.html#(42) : other examples that go beyond owl:sameAs.

    Kingsley
