Posted: September 30, 2014

AI3 Pulse

A few weeks back I reflected on more than a decade of involvement in the semantic Web. I appreciate the many nice comments and compliments received.

In part in reaction to that post and also because it is the retrospective season, Amit Sheth has just posted his own retrospective on the semWeb going back 15 years. Amit has been one of the leaders in the space and (according to his history) the first to obtain a semantic Web patent.

Amit’s wonderful history is more global and informed than my own, and I recommend you check it out. BTW, Amit is asking for comments from those involved in the early years for any corrections or additions.

Posted by AI3's author, Mike Bergman Posted on September 30, 2014 at 9:41 am in Pulse, Semantic Web | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1807/pulse-more-semweb-retrospective/
The URI to trackback this post is: http://www.mkbergman.com/1807/pulse-more-semweb-retrospective/trackback/
Posted: September 29, 2014

AI3 Pulse

For years now I have been a huge fan of Pandora (sorry, apparently not available everywhere in the world due to digital rights issues). Even though Pandora’s Music Genome was not set up as a W3C-compliant ontology, in its use and application it is effectively one. What Pandora shows is that feature selection and characterization trump language and data structure format.

Given that, I have also played the Web scientist in how I select and promote music to meet my musical interests.

Thus, based on my own totally unscientific study, here are a few things I’ve found worth passing on. It would be bold to call them secrets; they are really more just observations:

  • When a new “seed” artist is chosen, the attributes of that artist (up to 450 different attributes from beat to genre to dominant instruments) set the pure characteristics of that channel
  • As similar songs play that meet this profile, when you vote them “up” or “down” you are effectively adding or deleting options for these 450 attributes in your profile criteria
  • Thus, continuous expressions of preference as a channel plays act to “dilute” the purity of the initial seed; these preference expressions lead to a “mixed” seed
  • The more that choices are preferred, the more the signal of your original selection gets diluted.
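
The dilution effect in these observations can be illustrated with a toy model. This is only a sketch under loud assumptions: nothing here reflects how Pandora actually works, and the four-attribute vectors, the `apply_vote` update rule and the `rate` parameter are all invented for illustration. It tracks how similar the channel profile stays to the original seed as votes accumulate.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def apply_vote(profile, song, up, rate=0.2):
    """Hypothetical update rule: nudge the profile toward a song on an
    up vote and away from it on a down vote."""
    sign = 1.0 if up else -1.0
    return [p + sign * rate * (s - p) for p, s in zip(profile, song)]

def dilution_trace(seed, votes, rate=0.2):
    """Return the profile's similarity to the original seed, first before
    any vote and then after each vote in turn."""
    profile = list(seed)
    trace = [cosine(seed, profile)]
    for song, up in votes:
        profile = apply_vote(profile, song, up, rate)
        trace.append(cosine(seed, profile))
    return trace

# Four made-up attributes stand in for the hundreds mentioned above;
# every value here is invented.
seed = [1.0, 0.0, 0.0, 0.0]
votes = [
    ([0.8, 0.6, 0.0, 0.0], True),
    ([0.7, 0.0, 0.7, 0.0], True),
    ([0.6, 0.0, 0.0, 0.8], True),
]
trace = dilution_trace(seed, votes)
print([round(t, 3) for t in trace])
```

In this toy setup each up vote on a song that differs from the seed lowers the similarity a little further, which is the "mixed seed" drift described above.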

The net result is that I now no longer vote any of my songs on any channel as up or down. Rather, I look to the purest “seeds” that capture my mode or genre preference. If my initial selection does not provide this purity, I delete it, and try to find a better true seed.

This approach has led to some awesome channels for me, which I can then combine, depending on mood, into mixed shuffle play with randomized channel selection.

As a couple of examples, here is a channel of lesser-known 60s goldies and a jazz guitar channel. Both are pure, single-seed channels that, at least for me, provide hours of consistent music in that genre. Roll your own!

Posted by AI3's author, Mike Bergman Posted on September 29, 2014 at 10:44 am in Pulse | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1804/pulse-pure-v-mixed-seeds-on-pandora/
The URI to trackback this post is: http://www.mkbergman.com/1804/pulse-pure-v-mixed-seeds-on-pandora/trackback/
Posted: September 25, 2014

AI3 Pulse

We all know about the slow decline and failed take-off of the semantic Web writ large, but what we probably did not know is that big data would gobsmack the concept. Trends do not lie.

Some ten years ago the idea of the semantic Web had about a 4x advantage in Google searches compared to “big data”. Today, big data searches exceed those for the semWeb by 25-fold. The cross-over point occurred in about April 2011:

[Figure: SemWeb v Big Data Google Trends]

Within a year, projections are that the interest level in big data will exceed that in the semantic Web by 35x.

These may not be entirely apples-to-apples cases, but trends don’t often lie. Play with these comparisons yourself on Google Trends.

Posted by AI3's author, Mike Bergman Posted on September 25, 2014 at 1:22 am in Pulse, Semantic Web | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1803/pulse-big-data-smokes-semweb/
The URI to trackback this post is: http://www.mkbergman.com/1803/pulse-big-data-smokes-semweb/trackback/
Posted: September 15, 2014

Big Structure has a Foundation in Reference Structures, But Any Structure Aids Interoperability

Big Structure is built on a foundation of reference structures, with domain structures capturing the domain at hand. These represent the target foundations for mapping schema and transforming data in the wild into an operable, canonical form. Any structure, even the most lightweight of lists and metadata, can contribute to and be mapped into this model, as this wall of structure shows:

[Figure: Foundations to Big Structure]

Described below are some of these structures, in rough descending order of completeness and usefulness, for making data interoperable. Please note that any of these structures might be available as linked data.

Reference Structures

In both semantics and artificial intelligence — and certainly in the realm of data interoperability — there is always the problem of symbol grounding. In the conceptual realm, symbol grounding means that when we use a term or phrase we are referring to the same thing. In the data value realm, symbol grounding means that when we refer to an object or a number, we are referring to the same measure.

UMBEL is the standard reference ontology used by Structured Dynamics. It contains about 28,000 concepts (classes and relationships) derived from the Cyc knowledge base. The reference concepts of UMBEL are mapped to Wikipedia, schema.org (used in Google’s knowledge graph), DBpedia ontology classes, GeoNames and PROTON. Similar reference structures are used to ground the actual data values and attributes.

Other reference structures may be used, so long as they are rather complete in scope and coherent in their relationships. Logical consistency is a key requirement for grounding.

Knowledge Bases

Knowledge bases combine schema with data in a logical manner; well-constructed ones support computations, inference and reasoning. To date, the two primary knowledge bases that we use are Wikipedia and Cyc. However, many specific domain knowledge bases also exist.

Knowledge bases are important sources for symbol grounding. In addition, because of their computability, they may be used with artificial intelligence methods both to extend the knowledge base and to refine the feature estimates used in the AI algorithms.

Domain Ontologies

Domain ontologies, constructed as graphs, are the principal working structures in data interoperability. Though best practices recommend they be grounded in the reference structures, the domain structures are the ones that specifically capture the concepts and data attributes of the target information domain. More effort should be focused at this level in the wall of structure than any other.

Domain structures provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness. Further, these domain structures can be layered on top of existing information assets, which means they are an enhancement and not a displacement for prior investments. And, these domain structures may be matured incrementally, which means their development is cost-effective.

Mappings

Data and schema in the wild need to be mapped and transformed into these canonical structures. What is known as data wrangling is an aspect of these mappings and transformations. Mappings thus become the glue that ties native data to interoperable forms.

Mapping is the critical bridging function in data interoperability. It requires tools and background intelligence to suggest possible correspondences; how well this is done is a key to making the semi-automatic mapping process as efficient as possible. Mapping structures are the result of the final correspondences. Mapping effort is a function of the scope of Big Structure, not the volume of Big Data.
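
To make the idea of suggested correspondences concrete, here is a minimal sketch of the semi-automatic step. It assumes simple string similarity is enough to propose candidates, which real mapping tools would supplement with background knowledge. The function names, the sample field labels and the `threshold` value are hypothetical and are not taken from any Structured Dynamics tooling.

```python
from difflib import SequenceMatcher

def normalize(label):
    """Lower-case a label and treat underscores and hyphens as spaces."""
    return label.replace("_", " ").replace("-", " ").lower().strip()

def suggest_mappings(native_fields, canonical_concepts, threshold=0.6):
    """Propose, for each native field, the most similar canonical concept.

    Fields whose best score falls below the threshold map to None, so a
    person can decide what to do with them. The result is a suggestion
    list for review, not a final mapping.
    """
    suggestions = {}
    for field in native_fields:
        scored = [
            (SequenceMatcher(None, normalize(field), normalize(c)).ratio(), c)
            for c in canonical_concepts
        ]
        score, best = max(scored)
        suggestions[field] = best if score >= threshold else None
    return suggestions

# Invented example: native column names versus canonical concept labels.
native = ["cust_name", "postal_code", "xyz123"]
canonical = ["Customer Name", "Postal Code", "Country"]
for field, concept in suggest_mappings(native, canonical).items():
    print(field, "->", concept)
```

Note that the work here scales with the number of distinct fields and concepts, not with the number of records, which is the point made above about scope versus volume.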

Existing Structures

A broad variety of structures occur in the wild — from database schema and taxonomies to dictionaries and lists — that need to be represented in a common form and then mapped in order to support interoperability. The common representation used by Structured Dynamics is the RDF data model.
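
As a rough illustration of putting an existing structure into a common form, the sketch below turns a small parent/child taxonomy into RDF triples and serializes them as N-Triples. The `rdf:type`, `rdfs:Class`, `rdfs:label` and `rdfs:subClassOf` URIs are the standard W3C ones; the `BASE` namespace, the helper names and the sample taxonomy are assumptions made up for this example, and a real conversion would likely use an RDF library instead of hand-built strings.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/vocab/"  # hypothetical namespace

class Literal(str):
    """Marks a triple object as a plain string literal, not a URI."""

def uri_for(label):
    """Mint a URI for a label (assumes labels need no further escaping)."""
    return BASE + label.replace(" ", "_")

def taxonomy_to_triples(taxonomy):
    """Convert {child_label: parent_label_or_None} into a list of triples."""
    triples = []
    for child, parent in taxonomy.items():
        subject = uri_for(child)
        triples.append((subject, RDF_TYPE, RDFS + "Class"))
        triples.append((subject, RDFS + "label", Literal(child)))
        if parent is not None:
            triples.append((subject, RDFS + "subClassOf", uri_for(parent)))
    return triples

def to_ntriples(triples):
    """Serialize triples as N-Triples lines: URIs in <>, literals in quotes."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"'
        else:
            obj = "<" + o + ">"
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)

# Invented two-level taxonomy.
taxonomy = {"Vehicle": None, "Car": "Vehicle"}
print(to_ntriples(taxonomy_to_triples(taxonomy)))
```

Once lists, taxonomies and schema are all expressed as triples like these, they can be mapped to the domain and reference structures with the same machinery.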

Structure Scripts

Scripting and tooling are essential to help create Big Structure efficiently.

Editor’s Note: We are pleased to share with you in advance some of the text from Structured Dynamics’ new Web site.
Posted: September 9, 2014

Much Clean-up, Consistency Brought to New Version

Structured Dynamics today released version 1.10 of its open source UMBEL (Upper Mapping and Binding Exchange Layer) reference concept ontology. This new release is in preparation for a series of subsequent releases planned over the next few months as additional consistency and functionality are brought to the system.

This is the first UMBEL version to be created and loaded from scratch using a new Clojure scripting framework. Fred Giasson describes these scripts and highlights some of the new functionality in a separate blog post.

Besides this new processing framework, here are the key changes at a high level made in this new version:

  • Reconciliation with the parent OpenCyc concepts
  • Added reference concept definitions where missing
  • Added additional altLabels (to the semsets) in many cases
  • Checked graph integrity for relationships between concepts
  • Reviewed and corrected prefLabels to make them unique (for more usefulness in autocompletion)
  • Checked assignments of all reference concepts to a parent SuperType
  • Reviewed SuperType assignment inconsistencies and removed some disjoint assertions
  • Updated mappings to OpenCyc, GeoNames, schema.org and the DBpedia ontology (see next).

This new UMBEL v. 1.10 is now updated and consistent with OpenCyc (version 4.0, dated October 12, 2012), GeoNames (version 3.1, dated October 29, 2012), schema.org (version 1.9, dated August 19, 2014) and the DBpedia ontology (version 3.9, retrieved August 18, 2014). The resulting mappings now are:

  • 26,091 Reference Concepts in UMBEL Core
  • 1,925 Reference Concepts in UMBEL Geo
  • 27,691 links between OpenCyc and UMBEL (which includes Core & Geo)
  • 754 links between Schema.org classes and UMBEL
  • 688 links between Geonames.org classes and UMBEL
  • 682 links between the DBpedia ontology classes and UMBEL.

These changes have also resulted in some improvements to the umbel.org Web site and its Web services, as more fully described by Fred. The updated ontologies and mappings may be found at the UMBEL GitHub site.

What’s In the Pipeline

As noted, this version update is in preparation for some pending activities, which are now moving toward completion. Subsequent releases (perhaps not in the order shown) will include:

  • Introduction of a new reference Attributes Ontology as a new module extension to UMBEL
  • Completion of the mappings to the English Wikipedia
  • Adding new concepts resulting from these Wikipedia mappings, plus adding concepts to complete 100% mappings to GeoNames, DBpedia and schema.org
  • Additional definition and semset updates to the structure
  • Checks to external mappings for consistency
  • Automated tests for completing the integrity of the UMBEL graph by identifying missing connecting concepts
  • An improved method to extend disjointness assertions across the UMBEL structure
  • Additional Web services to support the above.
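
The planned integrity tests are not described here, so the following is only a guess at what one such check might look like. It treats the concept graph as child-to-parent links and reports two kinds of gaps: parents that are referenced but never defined, and concepts with no path up to the root. The function name, the sample concepts and the root label `Thing` are all hypothetical.

```python
from collections import defaultdict, deque

def find_integrity_gaps(parents, root):
    """Find gaps in a concept graph given as {concept: set_of_parents}.

    Returns (dangling, disconnected):
      dangling      parents that are referenced but not defined
      disconnected  known concepts that cannot be reached from the root
    """
    # Invert the child-to-parent links so the graph can be walked downward.
    children = defaultdict(set)
    for concept, parent_set in parents.items():
        for parent in parent_set:
            children[parent].add(concept)

    known = set(parents) | {root}
    dangling = sorted(set(children) - known)

    # Breadth-first walk from the root over parent-to-child links.
    reachable = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)

    disconnected = sorted(known - reachable)
    return dangling, disconnected

# Invented example: "Mammal" is referenced but never defined, which
# leaves "Dog" cut off from the root; "Orphan" has no parent at all.
parents = {
    "Animal": {"Thing"},
    "Cat": {"Animal"},
    "Dog": {"Mammal"},
    "Orphan": set(),
}
dangling, disconnected = find_integrity_gaps(parents, root="Thing")
print("dangling:", dangling)
print("disconnected:", disconnected)
```

In this example the dangling parent is exactly the kind of missing connecting concept that, once added and linked, would bring the disconnected concept back into the graph.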

Look here and on the UMBEL mailing list for these future announcements.

Posted by AI3's author, Mike Bergman Posted on September 9, 2014 at 10:10 am in UMBEL | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1795/umbel-version-1-10-released/
The URI to trackback this post is: http://www.mkbergman.com/1795/umbel-version-1-10-released/trackback/