Posted: July 16, 2014

[Image: Battle of Niemen, WWI, photo from Wikimedia]

Are We Losing the War? Was it Even the Right One?

Cinephiles will readily recognize Akira Kurosawa’s Rashomon film of 1950. And, in the 1960s, one of the most popular book series was Lawrence Durrell’s The Alexandria Quartet. Both, each in its own way, tried to get at the question of what is truth by telling the same story from the perspective of different protagonists. Whether you saw the movie or read the books, you know the punchline: the truth was very different depending on the point of view and experience — including self-interest and delusion — of each protagonist. All of us recognize this phenomenon from the parable of the blind men and the elephant.

I have been making my living and working full time on the semantic Web and semantic technologies now for a full decade. So has my partner at Structured Dynamics, Fred Giasson. Others have certainly worked longer in this field. The original semantic Web article appeared in Scientific American in 2001 [1], and the foundational Resource Description Framework data model dates from 1999. Fred and I have our own views of what has gone on in the trenches of the semantic Web over this period. We thought a decade was a good point to look back, share what we’ve experienced, and discover where to point our next offensive thrusts.

What Has Gone Well?

The vision of the semantic Web in the Scientific American article painted a picture of globally interconnected data leveraged by agents or bots designed to make our lives easier and more automated. However, by the time I got directly involved, nearly five years after standards first started to be published, Tim Berners-Lee and many leading proponents of RDF were beginning to shift focus to linked data. The agents, automation, and ontologies of the initial vision were being downplayed in favor of effective means to publish and consume data based on RDF. In many ways, linked data resembled a re-branding.

This break had been coming for a while, memorably captured by a 2008 ISWC session led by Peter F. Patel-Schneider [2]. This internal division of viewpoint likely split effort that would have been better spent proselytizing and improving tools. It also diverted energy into internal squabbles. While many others have pointed to the tactical mistake of using an XML serialization for early versions of RDF as a key factor in slowing initial adoption, a factor I agree was at play, my own suspicion is that the philosophical split taking place in the community was the heavier burden.

Whatever the cause, many of the hopes of the heady days of the initial vision have not been realized over the past fifteen years, though there have been notable successes.

The biomedical community has been the shining exemplar for data interoperability across an entire discipline, with earth sciences, ecology and other science-based domains also showing interoperability success [3]. Families of ontologies accompanied by tooling and best practices have characterized many of these efforts. Sadly, though, most other domains have not followed suit, and commercial interoperability is nearly non-existent.

Almost all of the remaining successes have resided in single-institution data integration and knowledge representation initiatives. IBM’s Watson and Apple’s Siri are two amazing capabilities run and managed by single institutions, as is Google’s Knowledge Graph. Also, some individual commercial and government enterprises, willing to pay for support from semantic technology experts, have shown success in data integration using RDF, SKOS and OWL.

We have seen the close kinship of natural language processing, text, and question answering with the semantic Web, as demonstrated by Siri and more recent offshoots. We have seen a trend toward pairing great-performing open source text engines, notably Solr, with RDF and triple stores. Recommendation systems have shown some success. Linked data publishing has also had some notable examples, including the first of the lot, DBpedia, with certain institutional publishers (such as the Library of Congress, Eurostat, The Getty, Europeana, and the OpenGLAM community [galleries, libraries, archives, and museums]) showing leadership and committing significant vocabularies to linked data form.

On the standards front, early experience led to new and better versions of the SPARQL query language (SPARQL 1.1 was a substantial improvement and appears to be one capability that sells triple stores), RDF 1.1 and OWL 2. Certain open source tools have become prominent, including Protégé, Virtuoso (open source) and Jena (among unnamed others, of course). At least in the early part of this history, tool development was rapid and flourishing, though the innovation pace has dropped substantially according to my Sweet Tools tracking database.
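To give a flavor of why SPARQL 1.1 matters, the sketch below uses two of its additions over the original 2008 recommendation, aggregates and property paths. The FOAF/SKOS data it assumes is illustrative only:

    # Count people by top-level topic of interest, walking up a SKOS hierarchy.
    # Aggregates (COUNT, GROUP BY, HAVING) and property paths (skos:broader*)
    # are SPARQL 1.1 features.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    SELECT ?topic (COUNT(DISTINCT ?person) AS ?members)
    WHERE {
      ?person   a foaf:Person ;
                foaf:topic_interest ?interest .
      ?interest skos:broader* ?topic .      # zero-or-more hops up the hierarchy
    }
    GROUP BY ?topic
    HAVING (COUNT(DISTINCT ?person) > 10)
    ORDER BY DESC(?members)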

What Has Disappointed?

My biggest disappointments have been, first, the complete lack of distributed data interoperability, and, second, the inability of commercial enterprises to embrace and adopt semantic technologies on their own. The near absence of discussion about instance records and their attributes helps frame the current maturity of the semantic Web. Namely, it has yet to crack the real nuts of data integration and interoperability across organizations. Again, with the exception of the biomedical community, neither in the linked data realm nor in the broader semantic Web can we point to information based on semantic Web principles being widely shared between systems and organizations.

Some in the linked data community have explicitly acknowledged this. The abstract for the upcoming COLD 2014 workshop, for example, states [4]:

. . . applications that consume Linked Data are not yet widespread. Reasons may include a lack of suitable methods for a number of open problems, including the seamless integration of Linked Data from multiple sources, dynamic discovery of available data and data sources, provenance and information quality assessment, application development environments, and appropriate end user interfaces.

We have written about many issues with linked data, ranging from the use of improper mapping predicates, to the difficulty of publishing, to dereferencing URIs on the Web, which are sparse and not always properly implemented [5]. But ultimately, most linked data is just instance data that can be represented in simpler attribute-value form, as shown in the sketch below. By shunning a knowledge representation language (namely, OWL) at the processing end, we have put too much burden on what are really just instance records. Linked data does not get the balance of labor right. It ignores the reality that data consumers want actionable information over being able to click from data item to data item, with overall quality reduced to the lowest common denominator. If a publisher has the interest and capability to publish quality linked data, great! It should become part of the data ingest pool and the data becomes easy to consume. But to insist on linked data across the board creates unnecessary barriers. Linked data growth has not nearly kept pace with broader structured data growth on the Web [6].
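To illustrate the point, here is what a typical linked data instance record amounts to, expressed in Turtle. The identifiers are in DBpedia style and the values are shown only for illustration; nothing in the record itself requires OWL reasoning to consume:

    @prefix dbr:  <http://dbpedia.org/resource/> .
    @prefix dbo:  <http://dbpedia.org/ontology/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # A single instance record: just attribute-value pairs about one entity.
    dbr:Winnipeg
        a                    dbo:City ;
        rdfs:label           "Winnipeg"@en ;
        dbo:country          dbr:Canada ;
        dbo:populationTotal  663617 .       # value shown for illustration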

At the enterprise level, the semantic technology stack is hard for newcomers to grasp and understand. RDF and OWL awareness and understanding are nearly nil in companies without prior semantic Web experience, or 99.9% of all companies. This is not a failure of the enterprises; it is the failure of us, the advocates and suppliers. While we (Structured Dynamics) have developed and continue to refine the turnkey Open Semantic Framework stack, and have spent more effort than most in documenting and explicating its use, the systems are still too complicated. We combine complicated content management systems as user front-ends to a complicated semantic technology stack that needs to be driven by a complicated (to develop) ontology. And we think we are doing some of the best technology transfer around!

Moreover, while these systems are good at integrating concepts and schema, they are virtually silent on the question of actual data integration. It is shocking to say, but the semantic Web has no vocabularies or tools sufficient to enable data items for the same entity from two different datasets to be combined or reconciled [7]. These issues can be solved within the individual enterprise, but again the system breaks when distributed interoperability is the goal. General Web-based inconsistencies, such as in HTML coding or MIME types, impose further hurdles on distributed interoperability. These are some of the reasons why we see the successes in the context (generally) of single institutions, as opposed to anything that is truly yet Web-wide.

These points, as is often the case with software-oriented technologies, come down to a disappointing state of tooling. Markets drive developer interest, and market share has been disappointing; thus, fewer tools. Tool interest comes from commercial engagements, not generally from grants, which remain the major source of semantic Web funding, particularly in the European Union. Pragmatic tools that solve real problems in user adoption are rarely a sufficient basis for getting a Ph.D.

The weaknesses in tooling extend from basic installation, to configuration, unit and integration tests, data conversion and lifting, and, especially, all things ontology. Weaknesses in ontology tooling include (critically) mapping, consistency and coherency checking, authoring, managing, version control, re-factoring, optimization, and workflows. All of these issues are solvable; they are standard software challenges. But it is hard to conquer markets largely with the wrong army pursuing the wrong objectives in response to the wrong incentives.

Yet, despite the weaknesses in tooling, we believe we have been fairly effective in transferring technology to our clients. It takes more documentation and more training and, often, accompanying tool development or improvement in the workflow areas critical to the project. But clients need to be told this as well. In these still early stages, successful clients are going to have to expend more staff effort. With reasonable commitment, it is demonstrable that an enterprise can take over and manage a large-scale semantic engagement on its own. Still, for semantic technologies to have greater market penetration, it will be necessary to lower those commitments.

How Has the Environment Changed?

Of course, over the period of this history, the environment as a whole has changed markedly. The Web today is almost unrecognizable from the Web of 15 years ago. If one assumes that Web technologies tend to have a five year or so period of turnover, we have gone through at least two to three generations of change on the Web since the initial vision for the semantic Web.

The most systemic changes in this period have been cloud computing and the adoption of the smartphone. These, plus the network of workstations approach to data centers, have radically changed what is desirable in a large-scale, distributed architecture. APIs have become RESTful and database infrastructures have become flatter and more distributed. These architectures and their supporting infrastructure — such as virtual servers, MapReduce variants, and many applications — have in turn opened the door to performant management of large volumes of flat (key-value or graph) data, or big data.

On the Web side, JavaScript, just a few years older than the semantic Web, is now dominant in Web pages and is taking on server-side roles (such as through Node.js). In turn, JSON has grown in popularity as a form of data representation and transfer and is being adopted by the semantic Web (along with efforts to codify CSV). Mobile, too, affects the Web side because of the need for multiple-platform deployments, touchscreen use, and different user interface paradigms and layout designs. The app ecosystem around smartphones has become a huge source of change and innovation.
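JSON-LD, which reached W3C Recommendation status in early 2014, is the clearest sign of that adoption: a document that reads as ordinary JSON to a Web developer, but parses to RDF triples for a semantic Web consumer. A minimal sketch, with illustrative identifiers:

    {
      "@context": {
        "name":     "http://xmlns.com/foaf/0.1/name",
        "homepage": { "@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id" }
      },
      "@id": "http://example.org/people/jane",
      "name": "Jane Doe",
      "homepage": "http://example.org/jane/"
    }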

Extremely germane to the semantic Web — indeed, overall, for artificial intelligence — has been the emergence of knowledge-based AI (KBAI). The marrying of electronic Web knowledge bases — such as Wikipedia or internal ones like the Google search index or its Knowledge Graph — with improvements in machine-learning algorithms is systematically mowing down what used to be called the Grand Challenges of computing. Sensors, too, are now entering the picture, from our phones to our homes and our cars, exposing the higher-order requirement for data integration combined with semantics. NLP kits have improved in terms of accuracy and execution speed; many semantic tasks such as tagging, categorizing or question answering already perform at acceptable levels for most projects.

On the tooling side, nearly all building blocks for what needs to be done next are available in open source, with some platform areas quite functional (including OSF, of course). We have also been successful in finding clients that agree to open source the development work we do for them, since they are benefiting from the open source development that went on before them.

What Did We Set Out to Achieve?

When Structured Dynamics entered the picture, there were already many tools available and core languages had been released. Our view of the world at that time led us to adopt two priorities for what we thought might be a five-year or so plan. We have achieved the objectives we set for ourselves then, though it has taken us a couple of years longer than planned to realize them.

One priority was to develop a reference structure for concepts to serve as a “grounding” basis for relating datasets, vocabularies, schema, taxonomies, or ontologies. We achieved this with our first commercial release (v 1.00) of UMBEL in February 2011. Subsequent to that we have progressed to v 1.05. In the coming months we will see two further major updates that have been under active effort for about eight months.

The other priority was to create a turnkey foundation for a semantic enterprise. This, too, has been achieved, with many more releases. The Open Semantic Framework (OSF) is now in version 3.0, backed by a nearly 500-article training and technical documentation wiki. Support tooling now includes automated installation, testing, and data transfer and synchronization.

Because our corporate objectives were largely achieved it was time to look at lessons learned and set new directions. This article, in part, is a result of that process.

How Did Our Priorities Evolve Over the Decade?

I thought it would be helpful to use the content of this AI3 blog to track how concerns and priorities changed for me and Structured Dynamics over this history. Since I started my blog quite soon after my entry into the semantic Web, the record of my perspectives was conterminous and rather complete.

The fifty articles below trace my evolution in knowledge and skills, as well as a progression from structured data to the semantic Web. These 50 articles represent about 11% of all articles in my chronological archive; they were selected as being the most germane to the question of evolution of the semantic Web.

After the early ramp-up, most of the formative discussion below occurred in the early years. Posts have declined most recently as implementation has taken over. Note that most of the links below have PDFs available from their main pages.

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

The early years of this history were concentrated on gathering background information and getting educated. The release of DBpedia in 2007 showed how knowledge bases would become essential to the semantic Web. We also identified that a lack of shared reference concepts was making it difficult to “ground” different semantic Web datasets or schema to one another. Another key theme was the diversity of native data structures on the Web, but also how all of them could be readily represented in RDF.

By 2008 we began to study the logical underpinnings of the semantic Web as we were coming to understand how it should be practiced. We also began studying Web-oriented architectures as key design guidance going forward. These themes continued into 2009, though now informed by clients and applications, which expanded our understanding of requirements (and, sometimes, shortcomings) in the enterprise marketplace. The suitability of an open world approach to the inherently open nature of knowledge management was cementing our clarity about the role and fit of semantic solutions in the overall information space. The general community shift to linked data was beginning to surface worries.

2010 marked a shift for us to become more of a popularizer of semantic technologies in the enterprise, useful to attract and inform prospects. The central role of ontologies as the guiding structures for OSF (either as codified knowledge structures or as instruction sets for the platform) led to the realization that generic software could be designed for re-use in almost any knowledge domain simply by changing the data and ontologies guiding it. This increased our efforts in ontology tooling and training, now geared more to the knowledge worker. The importance of groundings for aligning schema and data caused us to work hard on UMBEL in 2011 to get it to a commercial release state.

All of these efforts were converging on design thoughts about the nature of information and how it is signified and communicated. The basis of an overall philosophy regarding our work emerged around the teachings of Charles S. Peirce and Claude Shannon. Semantics and groundings were clearly essential to convey accurate messages. Simple forms, so long as they are correct, are always preferred over complex ones because message transmittal is more efficient and less subject to losses (inaccuracies). How these structures could be represented in graphs affirmed the structural correctness of the design approach. The now obvious re-awakening of artificial intelligence helps to put the semantic Web in context: a key subpart, but still a subset, of artificial intelligence. The percentage of formative articles directly related to the semantic Web drops off markedly over these last couple of years, as the emphasis continues to shift to tech transfer.

What Else Did We Learn?

Not all lessons learned warranted an article on their own. So, we have also reflected on what other lessons we learned over this decade. The overall theme is: Simpler is better.

Distributed data interoperability across the Web is a fundamental weakness. There are no magic tricks to integrate data. Data mapping and integration will always require massaging. Each data integration activity needs its own solution. However, it can be greatly helped with ontologies and with better tooling.

In keeping with the lesson of grounding, a reference ontology for attributes is missing. It is needed as a bridge across disparate datasets describing similar entities or using different attributes for the same entities. It is also a means to reduce the pairwise combinatorial issue of integrating multiple datasets. And, whatever is done in the data integration area, an open world approach will be essential given the open-ended nature of knowledge.
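A minimal sketch of what such an attribute grounding could look like, using hypothetical dataset namespaces and a hypothetical reference attribute: each source attribute is declared a sub-property of the reference one, so a query (or reasoner) working against the reference attribute can span both datasets.

    @prefix ref:  <http://example.org/reference/attribute/> .
    @prefix dsA:  <http://example.org/datasetA/schema/> .
    @prefix dsB:  <http://example.org/datasetB/schema/> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # Hypothetical reference attribute acting as the "grounding" bridge
    ref:totalPopulation  a owl:DatatypeProperty ;
        rdfs:label "total population" .

    # Each dataset keeps its native attribute, mapped to the reference one
    dsA:pop2011       rdfs:subPropertyOf ref:totalPopulation .
    dsB:numResidents  rdfs:subPropertyOf ref:totalPopulation .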

There is good design and best practice for distributed architectures. The larger these installations become, the more important it is to use a lightweight, loosely-coupled design. RESTful Web services and their interfaces are key. Simpler services with fewer functions can be designed to complement one another and increase throughput effectiveness.

Functional programming languages align well with the data and schema in knowledge management functions. Ontologies, as structures, also fit well with functional languages. The ability to create DSLs should continue to improve, bringing the knowledge management function directly into the hands of its users, the knowledge workers.

In a broader sense, alluded to above, the semantic Web is but a set of concepts. There are multiple ways to use it. It can be leveraged without requiring “core” semantic Web tools such as triple stores. Solr can act as a semantic store because semantics, NLP and search are naturally married. But the semantic Web, in turn, needs to become re-embedded in artificial intelligence, now backed by knowledge bases, which are themselves creatures of the semantic Web.
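The reason Solr can play this role is that an instance record of the kind sketched earlier flattens naturally into a search-engine document. The field names below are illustrative only, not any particular Solr schema:

    {
      "id":              "http://dbpedia.org/resource/Winnipeg",
      "type":            "http://dbpedia.org/ontology/City",
      "label_en":        "Winnipeg",
      "attr_country":    "http://dbpedia.org/resource/Canada",
      "attr_population": 663617,
      "text":            "Winnipeg Canada city Manitoba ..."
    }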

Design needs to move away from linked data or the semantic Web as the goals. The building blocks are there, though perhaps not yet combined or expressed well. The real improvements now to the overall knowledge function will result from knowledge bases, artificial intelligence, and the semantic Web working together. That is the next frontier.

Overall, we perhaps have been in the wrong war for the wrong reasons. Linked data is certainly not an end and mostly appears to represent work, rather than innovation. The semantic Web is no longer the right war, either, because improvements there will not come so much from arguing semantic languages and paradigms. Learning how to master distributed data integration will teach the semantic Web much, and coupling artificial intelligence with knowledge bases will do much to improve the most labor-intensive stumbling blocks in the knowledge management workflow: mappings and transformations. Further, these same bases will extend the reach into analytical and statistical realms.

The semantic Web has always been an infrastructure play to us. On that basis, it will be hard to ever judge market penetration or dominance. So, maybe in terms of a vision from 15 years ago the growth of the semantic Web has been disappointing. But, for Fred and me, we are finally seeing the landscape clearly and in perspective, even if from a viewpoint that may be different from others’. From our vantage point, we are at the exciting cusp of a new, broader synthesis.

NOTE: This is Part I of a two-part series. Part II will appear shortly.

[1] Tim Berners-Lee, James Hendler, and Ora Lassila, “The Semantic Web,” in Scientific American 284(5): pp 34-43, 2001. See http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2.
[2] For those with a spare 90 minutes or so, you may also want to view the panel session and debate “An OWL 2 Far?” that took place at ISWC ’08 in Karlsruhe, Germany, on October 28, 2008. The panel was chaired by Peter F. Patel-Schneider (Bell Labs, Alcatel-Lucent) with panel members Stefan Decker (DERI Galway), Michel Dumontier (Carleton University), Tim Finin (University of Maryland) and Ian Horrocks (University of Oxford), with much audience participation. See http://videolectures.net/iswc08_panel_schneider_owl/
[3] Open Biomedical Ontologies (OBO) is an effort to create controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO formed part of the resources of the U.S. National Center for Biomedical Ontology (NCBO). As of the date of this article, there were 376 ontologies listed on the NCBO’s BioPortal site. Both OBO and BioPortal provide tools and best practices.
[4] Fifth International Workshop on Consuming Linked Data (COLD 2014), co-located with the 13th International Semantic Web Conference (ISWC) in Riva del Garda, Italy, October 19-20.
[7] See the thread on the W3C semantic web mailing list beginning at http://lists.w3.org/Archives/Public/semantic-web/2014Jul/0129.html.
Posted: April 24, 2014
Open Semantic Framework

Another Expansion in Documentation for the Open Semantic Framework

The Open Semantic Framework is a complete foundation to bring semantic technology capabilities to the enterprise. OSF has applications from enterprise information integration to collaboration networks and open government. It has been under development since 2009, leveraging a set of robust open source engines and connecting Web services and architecture, and is now in its third major version. OSF is fully integrated as a semantic technology extension to the Drupal content management system.

Structured Dynamics, the developer of OSF, with the generous support of SD clients, has been committed to providing excellent documentation and tech transfer support for OSF since its inception. For example, OSF now has a nearly 500-document technical support library, plus many automated means for installing and testing the OSF stack.

Yet, as all of us know, written documentation is not always discovered or read. The paradigm for technology transfer is shifting to online tutorials and screencasts.

In keeping with that trend, SD has committed itself to develop a (hopefully) complete suite of online screencasts and tutorials geared to the nuts-and-bolts of how to install, configure, test, manage and use an OSF installation. Our intent is to aid users to bring semantics into the enterprise without the need for external support or cost.

We call this curriculum of tech transfer screencasts and video tutorials the OSF Academy.

Over the past week we have been releasing the first dozen screencasts in this series. With this foundation, it is now time to make a broader announcement of the OSF Academy.

So, On With it Now

Welcome!

We are on pace to release many dozens of specific screencasts on all use and management aspects of OSF. Please stay tuned over the coming weeks.

You can always see the complete contents of the YouTube channel at the Open Semantic Framework Academy.

Also, as basic grounding, know that the OSF Wiki section on screencasts is another central access point to this content.

The Series Begins

Most of the screencasts are quite specific in certain use aspects of the Open Semantic Framework. However, tutorial #1 is a useful overview of OSF and the series:

The Next Ten Screencasts

SD’s CTO, Fred Giasson, is the key demo jockey for most of the OSF Academy screencasts. Many of these screencasts are technical, and all are specific and focused. Access each screencast by number below. There is also a blog post associated with each screencast that provides useful background information and links.


Where Next?

We have nearly four dozen additional screencasts in our plan to round out introductory material to OSF. Please monitor our OSF channel on YouTube to stay on top of these releases.

Posted: January 27, 2014


Open Government Data
It’s Time to Move Beyond Static Dataset Dumps

It would be an understatement to say that open data has been transforming how government does business. Over the past five years — ranging from national governments such as the United States and the United Kingdom to hundreds of local governments and municipalities and all forms of government in between — a veritable revolution in opening up data to the public has been underway. The open government data (OGD) movement has spawned an entirely new cottage industry in open data advocacy and tools. Literally hundreds of government organizations are committed to open data, supported by an ecosystem of advocacy, technology and consulting groups.

Open data, of course, is not limited to governments. Open data in science and from the Web and for-profit entities are legitimate focal points in their own right. But, because data generated by governments are both sanctioned and developed using taxpayer monies, OGD occupies a special place in the conversation. Now, with experience and practice, we are beginning to see a generational shift in how open data is being handled by governments.

The first generation, still mostly the current practice, was built around the idea of just making the data public and open. This current generation of open data is characterized by the publishing of datasets via catalogs. The datasets are static, unconnected and dumb. Mostly, too, the data within those datasets are poorly described and documented, often lacking standard metadata. What is now exciting, however, is the emergence of what can best be called dynamic open data. What this is and how it offers advantages is the focus of this article.

The 8 Initial Principles of Open Government Data

In October 2007, 30 open government advocates met in Sebastopol, California, to discuss how government could open up electronically stored government data for public use. Up until that point, the federal and state governments had made some data available to the public, usually inconsistently and incompletely, which had whetted the advocates’ appetites for more and better data. The conference, led by Carl Malamud and Tim O’Reilly and funded by a grant from the Sunlight Foundation, resulted in eight principles that, if implemented, would empower the public’s use of government-held data. These principles, no longer online, were summarized by Joshua Tauberer in his Open Government Data book as:

  1. Data Must be Complete
    All public data are made available. Data are electronically stored information or recordings, including but not limited to documents, databases, transcripts, and audio/visual recordings. Public data are data that are not subject to valid privacy, security or privilege limitations, as governed by other statutes.
  2. Data Must be Primary
    Data are published as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
  3. Data Must be Timely
    Data are made available as quickly as necessary to preserve the value of the data.
  4. Data Must be Accessible
    Data are available to the widest range of users for the widest range of purposes.
  5. Data Must be Machine Processable
    Data are reasonably structured to allow automated processing of it.
  6. Access Must be Non-Discriminatory
    Data are available to anyone, with no requirement of registration.
  7. Data Formats Must be Non-Proprietary
    Data are available in a format over which no entity has exclusive control.
  8. Data Must be License-free
    Data are not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed as governed by other statutes.

These basic principles were then updated and re-phrased by the Sunlight Foundation in August 2010 to now number 10 principles, including the use of open standards, making data permanent, and keeping usage costs to an absolute minimum. All of these are laudable points. Each may or may not be provided in a fully open way by any given governmental entity.

This first step in the open data process has led to systems that are oriented to posting and publishing downloadable datasets. Existing open government data platforms, for example, such as Socrata or DKAN, can best be described as catalog systems. Listings of datasets with associated descriptions and metadata are presented. Users or the public may then choose among one or more machine-readable formats to download the entire dataset.

The 5 Added Principles of Dynamic Open Data

Of course, simply throwing data over the fence does not make it useful. Once we can get past the first threshold of making data publicly accessible, we next face the challenge of making that data meaningful and relevant. Since relevance is in the eye of the user, we no longer can think about information solely in terms of static, dumb datasets. We now need to expose the underlying data dynamically, such that users may request and filter and correlate what they need and only what they need.

Thus, there are five principles — or dimensions — by which we need to judge next-generation dynamic open data:

  1. Data Should be Filterable
    Data should be selectable by type (class), attribute or value such that only the data of interest is exposed to the user. This means the data should be structured in some way with facets that can be used dynamically to filter and make those selections.
  2. Data Should be Atomic
    Data should be exposed as individual entities or concepts with their attributes and values. The unit of manipulation thus becomes the datum, rather than the dataset.
  3. Data Should be Connected
    Because we are now collecting by datum and not dataset, connections between relevant things must be made explicit across relevant datasets. Similar things should be retrievable together. To achieve this aim, some schema or data definition framework must be layered over the data and datasets.
  4. Data Should be Expandable
    Since new data and new instances and new datasets will constantly arise, the design of the overall data management system must itself be “open”, enabling expansion of the available datastore at acceptable cost and effort.
  5. Data Should be Documented
    In order for these dynamic selections to be achievable, the data in the system must be fully documented, specifically including the full description and units used for attributes and values and the scope of entities and concepts. Only through such complete documentation can accurate connections and relevant selections per above be made.

There is no set order to the principles above. They are presented in the order shown so as to help remember them through the FACED mnemonic.
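To make the filterable and atomic principles concrete, consider the difference between downloading an entire indicators dataset and issuing a selective query against a dynamic endpoint. A sketch in SPARQL, with a purely hypothetical indicator vocabulary:

    # Return only crime-rate observations for recent years, by neighbourhood,
    # rather than the whole dataset dump (the vocabulary is hypothetical).
    PREFIX ind: <http://example.gov/indicator/>

    SELECT ?neighbourhood ?year ?rate
    WHERE {
      ?obs a                ind:CrimeRateObservation ;
           ind:area         ?neighbourhood ;
           ind:year         ?year ;
           ind:ratePer1000  ?rate .
      FILTER (?year >= 2010)
    }
    ORDER BY ?neighbourhood ?year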

Parallels with Linked Data

Though the principles above do not call out linked data as a requirement, they do share many parallels with the early growth and maturation of linked data. A number of years back Fred Giasson and I commented on When Linked Data Rules Fail. Two of the points made in that article are the absence of suitable data descriptions and the lack of (or wrong) connections in the data.gov and NY Times datasets. I subsequently expanded on these types of problems in Practical P-P-P-Problems with Linked Data.

Official data from governments can avoid many of the provenance issues associated with general linked data, but in other areas there are important parallels. Like any emerging new practice, it takes a while to learn and formalize best practices. It is not surprising that we are seeing open data in government needing to transition from dumb datasets to actionable information. Making data actionable is when government information assets will finally become effective for the broader public.

Also, like linked data, it is likely the platforms built around semantic technologies and knowledge graphs (schema) will also come to the fore. Our own Open Semantic Framework is one such example, but there are a few now emerging in the linked data and semantic technology space. It will be through different practices and these newer platforms that we will see the next generation of open government data truly emerge.

Posted: January 20, 2014

Open Semantic Framework

New OSF Platform Leapfrogs Earlier Releases in Features and Capabilities

After nearly five years of concentrated development — including the past 20 months of quiet, background efforts — Structured Dynamics is proud to announce version 3.0 of its open-source Open Semantic Framework. OSF is a turnkey platform targeted to enterprises to bring interoperability to their information assets, achieved via a layered architecture of semantic technologies. OSF can integrate information from documents to Web pages and standard databases. Its broad functions range from information ingest and tagging to search and data management to publishing.

Until today, the version available for download was OSF version 1.x. While capable as an enterprise platform — indeed, it has been in use by a number of leading global enterprises since development first began — the capability of the platform was spotty and required consulting expertise to configure and set up. SD was hired by Healthdirect Australia (HDA) nearly two years ago to enhance OSF’s capabilities and integrate it more closely with the Drupal open-source content management system, among other modern enterprise requirements. The OSF from those developments — the non-public version 2.0 specific to HDA — has now been generalized for broader public use with today’s public announcement of version 3.0.

A More Complete Enterprise Platform

[Image: HDA's healthinsite portal]

Not unlike many large organizations, HDA had specific enterprise requirements when it began its recent initiative. Included in these were stringent security, broad use of proven open-source applications, governance and workflow procedures, and strict content authoring and management guidelines. These requirements further needed to express themselves via a sequence of deployment and testing environments, all conducted by a multi-vendor support group following agile development practices.

These requirements placed a premium on performance, scalability and interoperability, all subject to repeatable release procedures and scripts. OSF’s initial development as a more-or-less standalone platform needed to accommodate an enterprise-wide management model involving many players, environments and applications. Prior decisions based on OSF alone now needed to consider and bridge modern enterprise development and deployment practices.

Tighter integration with Drupal was one of these requirements (see next section), but other OSF changes necessary to accommodate this environment included:

  • A new security layer — the initial OSF security model was based on IP authentication. Given the sensitivity of the health data managed by HDA, such a simplistic approach was unacceptable. The actual HDA deployment relied on a third-party security application. However, what was learned from that resulted in a key-based access and validation model in the OSF v 3.0 update
  • A new revisioning system — content authoring and governance required multiple checks in the workflow, and requirements to review prior edits and invoke possible rollbacks. The result was to add a completely new revisioning capability to OSF
  • Middleware integration and APIs — in a multi-vendor environment, OSF operates in part as a central repository for all system information, which third parties must more readily and easily be able to access. Thus, besides the security aspects, a much improved programmatic API and a generalized search API were added to the OSF platform
  • New, additional Web services — the requirements above meant that seven new OSF Web services were added to the system, bringing the total number of current Web services to 27
  • New caching layer — because of its Web-service design, information access and mediation occurs via a large number of endpoint queries, many of which are patterned and repeated. To improve overall performance, a new caching layer was added to OSF that significantly improved performance and reduced access burdens on the OSF engines
  • Workflow integration — improved workflow sequences and screens were required to capture workflow and governance demands, and
  • Multilingual support — like most larger organizations, HDA has a diversity of native languages throughout its user base. Though OSF had initially been explicitly designed to support multilinguality, specific procedures and capabilities were put in place to more easily support multiple languages in OSF.

Tighter Integration with Drupal

When Fred Giasson and I first designed and architected the Open Semantic Framework in 2009, we made the conscious decision to loosely couple OSF with the initial user interface and content management system, Drupal. We did so thinking that perhaps other CMS frameworks would be cloned onto OSF over time.

Time has not proven this assumption correct. Client experience and HDA’s interests suggested the wisdom of a tighter coupling to Drupal. This shift arose because of the great flexibility of Drupal with its tens of thousands of add-on modules and its ecosystem of capable developers and designers. Our early decision to keep Drupal at arm’s length was making it more difficult to manage an OSF instance. Existing Drupal developers were not able to employ their Drupal expertise to manage OSF portals.

We pivoted on this error by tightening the coupling to Drupal, which involved a number of discrete activities:

  • Upgrade to Drupal 7 — earlier versions of OSF used Drupal 6. We migrated the code base to Drupal 7. That, plus the other Drupal changes noted below, resulted in re-writing about 80% of the OSF code base related to Drupal
  • Alternative Drupal data storage — Drupal’s own evolution in version 7 (and continuing with version 8) is to abstract its underlying information model around entities and fields, abstractions that are much better aligned with OSF’s RDF data model. As these entity and field changes were exposed in Drupal APIs, it was possible for us to write an entirely new information model underlying Drupal. Drupal administrators using OSF are now able to use OSF solely as the data model underneath Drupal (rather than the more standard MySQL), or any mixture in between. The typical OSF for Drupal design now uses OSF for all content storage, with MySQL reserved for internal Drupal settings (à la MVC)
  • Drupal connectors — certain common or core Drupal modules, such as Fields, Entities, Search, and Views, are either common utilities for Drupal developers or are themselves core bases for third-party modules. Because of their centrality, SD developed a series of “connectors” that enable these modules to be used as is while transparently communicating with and writing to OSF. Thus, Drupal developers can use these familiar capabilities without needing particular OSF knowledge
  • Major updates to Drupal modules — because of the changes above, the existing OSF Drupal modules (called conStruct in the earlier versions) were updated to take advantage of the common terminology and tighter integration
  • Major updates to Drupal widgets — similarly, the standard OSF data and visualization widgets used with Drupal (called Semantic Components in the earlier versions) were also updated to work in this more tightly integrated environment.

Expanded Search Capabilities and Web Services

Some of the extended capabilities in OSF v 3.0 are noted above, including the expanded roster of Web services. However, the OSF Search Web service, which is by far the most used OSF endpoint, received massive improvements in this latest release.

First, OSF Search now uses a new query parser, which provides the capability to change the ranking of search results by boosting how specific query components get scored. Types, attributes, datasets or counts may be used to vary any given search result, including different occurrences on the same page. It is also now possible to add restrictions to the search queries, including restricting results to a specified set of attributes.

This flexibility is highly useful where structured pages contain blocks or sections with patterned search results. This structuring leads to the ability to create generic page templates, wherein search queries and results vary within the layout. An “events” block may score differently than, say, a “related topics” block, all of which in turn can respond to a given context (say, “cancer” versus “automobiles”) for a given page (and its template).

These repeated patterns lend themselves to the use of reusable “search profiles,” which are predefined queries that may include context variables. These profiles, in turn, can be named and placed on page layouts. Existing profiles may be recalled or invoked to become patterns for still further profiles. The flexibility of these search profiles is immense, and the parameters used in constructing them can be quite extensive.
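The sketch below conveys the general shape of such a search profile: a saved, parameterized query with boosts and restrictions whose context variables are filled in at page-render time. The field names are purely illustrative and are not the actual OSF Search API:

    {
      "profile":  "related-topics-block",
      "query":    "{{page_topic}}",
      "restrict_attributes": ["dcterms:subject", "rdfs:label"],
      "boosts": {
        "type":    { "skos:Concept": 2.0 },
        "dataset": { "events": 0.5 }
      },
      "limit": 5
    }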

Thus, OSF version 3.0 includes the new Query Builder module. Via an intuitive selection interface, users may construct search queries of any complexity, and then save and reuse them later as search profiles.

Lastly, registering, configuring and managing OSF instances and datasets into Drupal has never been easier. The new OSF Configure module centralizes all the features and options required for these purposes, which are then managed by a new suite of tools (see next).

Automated Installation and Management Tools

Standard enterprise deployments that proceed from development to production require constant updates and versions, both in application code and content. Keeping track and managing these changes — let alone deploying them quickly and without error — requires separate management capabilities in their own right. The new OSF thus has a number of utilities and command-line tools to aid these requirements:

  • OSF Installer — this tool installs and configures all the pieces required by the OSF stack, then runs the OSF Tests Suites to make sure that all functionality is fully operational on the new server
  • OSF Tests Suites — composed of 746 tests and 4139 assertions, these tests may be run every time an OSF instance is deployed or code is changed. The tests measure all of the input parameters of each endpoint, combinations thereof, mime types, and expected errors returned by each endpoint
  • OSF Ontologies Management Tool — (OMT) is used to manage ontologies, list ontologies, create/import new ones, delete existing ones, or to generate underlying ontological structures
  • OSF Datasets Management Tool — (DMT) is used to manage datasets of a OSF instance, enabling the user to create, delete, update, import and export datasets directly from the command line
  • OSF Permissions Management Tool — (PMT) is used to manage, list, create or delete access permissions groups and users
  • OSF Data Validator Tool — (DVT) is used to perform a series of post-indexation data validation tests and return validation errors if any are found.

Tempered via Enterprise Development and Deployments

The methods and processes by which these advances have been made all occurred within the context of state-of-the-art enterprise IT management. Experience with supporting infrastructure tools (such as Jira, Confluence, Puppet, etc.) and agile development methods are part of the ongoing documentation of OSF (see next). This experience also bolsters Structured Dynamics’ ability to work with other third-party applications at the middleware layer or in support of enterprise deployments.

Comprehensive and Completely Updated Documentation

The Open Semantic Framework has evolved considerably since its conception now five years ago. In its early development, components and pieces were sometimes developed in isolation and then brought into the framework. This jagged development path led to a cacophony of names and terms to characterize portions of the OSF stack. This terminology confusion has made it more difficult than it needed to be to understand the vision of OSF, the layers of its architecture, or the interactions between its components and parts.

In making the substantial efforts to update documentation from OSF version 1.x to the current version 3.0, terminology was made consistent and code references were cleaned up to reflect the simpler OSF branding. This clean up has led to necessary updates across multiple Web sites maintained by Structured Dynamics with some relationship to OSF.

The Web site with the most changes required has been the OSF Wiki. In its prior incarnation, called TechWiki, there were nearly 400 technical articles on OSF. That site has now been completely rewritten and re-organized. Nearly two hundred new articles have been written in support of OSF v 3.0. Terminology related to the older cacophony (see correspondence table here) has (hopefully) been updated and corrected. Most architectural and technical diagrams have been updated. Additional documentation is being posted daily, catching up with the experience of the past twenty months.

Moving Beyond the Established Foundation

Open Semantic Framework

SD is pleased that enterprise sponsors want to continue beyond the Open Semantic Framework’s present solid foundations. While we are not at liberty to discuss specific client initiatives, a number of ongoing developments can be described broadly. First, in terms of the key engines that provide the core of OSF’s data management capabilities, initiatives are underway in the areas of visualization, business analytics and workflow orchestration and management. There are also efforts underway in more automated means for direct ingest of quality Web-based information, both based on linked data and from Web APIs. We are also pleased that efforts to further extend OSF’s tight integration with Drupal are also of interest, even while the integration efforts of the past months have not yet been fully exploited.

To Learn More

To learn more, make sure to check out the re-organized OSF wiki. See specifically the complete OSF overview, the list of all the OSF 3.0 features, and the list of all the new features in OSF 3.0. Also, for a complete soup-to-nuts view of what it takes to put up a new OSF installation, see the Users Guide. Lastly, for a broad overview of OSF, see its reference architecture and the overviews on its dedicated OSF Web site.

As a final note, Structured Dynamics would like to thank its corporate sponsors of the past five years for providing the development funds for OSF, and for agreeing with the open source purposes of the Open Semantic Framework.

Posted: December 17, 2013

Peg Project Shows Usefulness of Self-service Semantic Publishing with OSF

The Peg project has just been moved from beta to public status by its two sponsors, the United Way of Winnipeg and the International Institute for Sustainable Development (IISD). Peg is an innovative Web portal for community indicators of well-being for the city of Winnipeg, Manitoba. Peg helps identify and, on an ongoing basis, track indicators that relate to the economic, environmental, cultural and social well-being of the people of Winnipeg. Here is the main screen:

[Image: main Peg screen]

The Peg website (www.mypeg.ca) is a fully integrated, interactive and dynamic information portal. Peg is a robust knowledge system that includes a graphic user interface to display a range of community well-being and sustainability themes and the indicators used to track progress over time. Structured Dynamics was the lead technical contractor on the project, basing the site on our Open Semantic Framework (OSF).

We originally completed and delivered the site to the sponsors in beta with a single indicator cluster around the concept of poverty. Based on our training and the tools packaged with OSF, the sponsors were then able to gather and load further data in broader indicator areas including Basic Needs, the Built Environment, the Economy, Education & Learning, Governance, Health, Natural Environment and Social Vitality. The Web site design contractor, Tactica Interactive, was also able to extend the baseline visualization tools provided by OSF using the existing APIs and documentation. Here, for example, is a choropleth map created by Tactica for the site:

[Image: Peg example choropleth map]

Datasets may be selected and compared with a variety of charting and mapping and visualization tools at the level of the entire city, neighborhoods or communities. All told, there are 54 datasets now within Peg representing more than 4,000 different entities. The sponsors collected, organized and inputted these data themselves according to specifications and tools provided with OSF.

Peg is a great example of how the basic OSF can be extended and maintained by site users. After initial delivery and training, Structured Dynamics played no role in the completion and publication of the site.

The Peg model shows how the combination of open source software, documentation and training enables any organization to deploy and manage their own semantic publishing system. Congratulations to all associated with Peg for this newest release!
