Posted:June 30, 2014

Open Semantic Framework Structured Dynamics Moves to Integrate Key Initiatives

Structured Dynamics is pleased to announce its new UMBEL Web site and set of Web services.

Our first release of the UMBEL site occurred in 2007 while UMBEL was still under development. That site used its own homegrown HTML. The release was followed in 2008 by the addition of our own Web services. The Web services were well-received, which caused Structured Dynamics to develop the more general structWSF Web services framework (most recently updated as the OSF Web Services). We subsequently migrated the earlier UMBEL Web services to this more general framework, and also migrated to Drupal as the standard content management and Web site component for OSF.

For most reasons, including all client work to date, our OSF framework (Web services + Drupal 7) has been performant and met client site needs. However, the operation of the UMBEL Web services was often problematic after moving to the Drupal (full OSF) version. Unfortunately, we have seen both performance and stability problems, though calculations over a full 28,000 node graph are a challenge in any environment.

Since the UMBEL structure was an order of magnitude larger than our client work to date, we have frankly adopted a posture of occasional monitoring and reboots to keep the UMBEL Web site up. This posture was not limiting use of UMBEL for general browsing purposes, but was limiting its usefulness as a working API.

Because the cobbler’s son is often the last to get shoes, we have let the UMBEL Web site chill to a degree in the background. But, now, with other imperatives underway and some dedicated time to look directly at performance of larger-scale ontologies, we have looked at these items anew. The report card on our current evaluations is contained in a newly released UMBEL Web site with services, which I summarize and provide context for below. What emerges is an interesting story of discovery and growth.

Basis of the New Site

The new UMBEL site and its underlying 28,000 concept graph is consistent with the OSF layered architecture. However, the Web services are now written in Clojure and the Web site framework uses Bootstrap and plain ol’ HTML. These structured and foundational changes have been championed by Fred Giasson, SD’s chief technology officer, who is also putting forth a blog series on Clojure in particular. He also has a current post from a technical basis on these UMBEL site and service changes.

In essence, we have learned two important things about our prior practice with respect to making UMBEL Web services broadly available. First, for UMBEL, we do not need or want our standard configuration of having a Drupal front-end as the interface into OSF. Access to a knowledge graph does not need — and is ill-served — by having a complicated interface stand atop a large-scale concept model. APIs and Web services are the most important interaction points with the UMBEL knowledge graph, not a user-oriented Web site.

Second, in the various phases of our work, we had come to embrace the idea of ontology-driven applications (what we have termed ODapps). The compelling vision behind such structures is to place the emphasis on knowledge structures and data, rather than more software. Once one begins to unpack that vision, it can also become clear that software programming languages themselves that look toward “code as data” might be one way to be consistent with that vision.

Seeking a Sense of Harmony

For years I have been writing about data integration and interoperability and our company has been devoted to the topic. I have written extensively about the importance of RDF and description logics to how we organize and represent data. We were also some of the first to supplement RDF with a faceted text-search engine (Solr) to provide the most responsive query environment across structured to unstructured data. We have also adopted ontologies and the OWL 2 (plus SKOS) languages as standards to both foster and enable interoperability. We have explored native data structs to understand how wild forms of information can be efficiently pipelined into interoperable RDF and text forms.

All of this points to the ideal of the democratization of the information function in the enterprise. In other words, to the idea that how data structures get organized and represented (the ontology side of things) is something that knowledge workers can do themselves, rather than accepting the bottleneck of IT and programmers.

This is well and good except there is a critical “last mile” between data representation and data usefulness. This “last mile” deals with how actual data gets manipulated and then organized and presented (visualized). Query responses, reports, analysis and maps continue to be the choke points between knowledge workers and their IT support. And one need not frame this entirely from an enterprise perspective: these same challenges exist for the individual researcher or the small organization.

So, while one can focus on data and its organization and representation, until we address this “last mile” problem, we still are not likely addressing the largest source of frustration and lost opportunities in the knowledge function.

The reason that simple data struct forms and tools like spreadsheets continue to be popular is that they are empirically the best tools for the “last mile”. Web forms and services are increasingly showing their strengths in this realm.

Once one steps back and looks at the entire cycle from basic datum to actionable knowledge, it is clear that the question of data model is but one portion of the challenge. The remaining challenge is how (now) accessible information can be placed into context and acted upon. Further, if one premise is the democratization of the information function, then the challenge should also be how to provide productive capabilities for the last mile to the knowledge worker. Productivity is enhanced when there are the fewest channels and distortions between signal (problem) and channel (user chains).

Fred, in his investigation of functional languages, clearly saw that bringing the languages of code (programming) into the language of data (knowledge workers as expressed in our RDF world view) was one means to reduce the number and lossiness of the channels between problem (signal) and solution. A world view premised on the efficient representation and interoperability of data must logically support the idea of a coding (instructional or language) base aligned as well to problems. Moreover, since software guides the actual computer operations, a form of the software that supports the nature of the data should also provide a more performant framework for moving forward. In technical terms, this is known as homoiconicity.

Whether one looks to the intellectual foundations of Charles S Peirce or Claude Shannon (both of whom we do), one can see that the idea of signs and information theory means finding both data representations and code that minimize communication losses and promote the accurate transfer of the message. Lossless data transmission is one contributor to that vision, but so too is a functional representation for how the information is to be processed and transformed that aligns most closely with the information at hand.

Ergo, a better model for data is not enough. A better model of how to manipulate that data (that is, software) is also needed that aligns with the idea of coherence and structure in the underlying information. For our purposes, we have chosen Clojure as the functional language basis for these new UMBEL Web services. Not only is it performant, but it aligns well with the creation of domain-specific languages (DSLs) that also promise to democratize the computing function for the knowledge worker.

Bringing the Pieces Together

Fred and I founded Structured Dynamics a bit more than five years ago. But, we had worked together much earlier on UMBEL and Zitgist. For nearly ten years now, we have episodically emphasized a few different initiatives and passions.

One of those passions has been the structure of data and information. It is this perspective that brought us to RDF and data structs (and our irON efforts) at various times. The idea of structure is a basis for our company name, and represents the belief that structure can be brought to unstructured forms (via tagging, for example). Structure is perhaps the most common notion or concept in my own writings for a decade.

Another need has been the idea of making semantic technologies operational. We have been keen researchers of the tools space and algorithms and such since the beginning. We observed early on that many innovative and open source semantic programs existed, but most were the result of EU grants or academic efforts elsewhere. Thousands of tools existed, but very few had either been evaluated or stress-tested. By bringing together the best of class tools and integrating them, we could begin to provide a useful semantic platform for enterprises. This motivation was the genesis for the Open Semantic Framework, and has been the major source of our client support since SD was founded. We have finally created an enterprise-capable platform and have done much to transfer its technology. But, these concepts are difficult, and much remains to be done before semantic technologies are a standard option for enterprises.

Still, in another vein, our first love and interest has been knowledge bases. We first identified the need for UMBEL years ago when we perceived an organizing vocabulary would become an essential glue on the Web. We pursued and studied Wikipedia and how it is informing knowledge bases. Instance data and how it is represented is a passion for how these knowledge bases (KBs) get leveraged going forward.

As a smaller consulting and development boutique, we have needed to be opportunistic about when and where we devoted efforts to these pieces. So, over the months or years, we have at various times devoted ourselves to data models and ontologies (structure), the Open Semantic Framework (platform), or UMBEL or Wikipedia (KBs, knowledge bases). Depending on funding and priorities, any one of these threads did receive episodic attention and focus. But, truth is, each one of these pieces has been developed in (project-level) isolation to the whole. Such piecemeal development was essential until each component achieved an appropriate degree of maturity.

I could say we could foresee some years back that all of these pieces would eventually reinforce and bolster one another. Though there is a small bit of truth in that statement, the way things have actually unfolded is to show, as experience and sophistication have been gained, that there is a synergy that comes in the interplay of these various pieces. The goodness is that Structured Dynamics’ efforts (and of its predecessors) were building inexorably to the possible cross-fertilization of these efforts.

Once this kind of realization takes place — that data, code and semantics move hand-in-hand — it then becomes logical to look at the entire knowledge ecosystem. For example, it is not surprising that artificial intelligence, now in the informed guise of KB-backed systems, has again come to the fore. It is also not surprising that what software and programming languages we bring to bear also directly interact with these concerns. Just as Hadoop and non-relational database systems have become prominent, we should also investigate what kind of programming languages and constructs may best fit into this brave new information world.

What we have seen from that investigation is that functional languages (with their DSL offspring) somehow fit into the overall equation moving forward. SD has moved from a single-focus endeavor to one explicitly looking at integration and interoperability issues. What we had earlier seen as (largely) independent pieces we now see as fitting into a broader equation of related emphases:

Structure + Platform + KBs + Functional Language = Knowledge Worker-based Interoperability

We are seeing artificial intelligence moving in these directions. As a subset of AI, I suspect we will also see  the semantic Web moving in the same direction.

We clearly now have the theory, the data, the understanding of semantics, and languages and data representations that can make these democratic interoperabilities become real. This new UMBEL Web site is the first expression of how these pieces can begin to work together into a compelling, accessible whole.

We welcome you to visit and to take advantage of UMBEL’s fully accessible APIs.

Posted:April 24, 2014
Open Semantic Framework

Another Expansion in Documentation for the Open Semantic Framework

The Open Semantic Framework is a complete foundation to bring semantic technology capabilities to the enterprise. OSF has applications from enterprise information integration to collaboration networks and open government. It has been under development since 2009, leveraging a set of robust open source engines and connecting Web services and architecture, and is now in its third major version. OSF is fully integrated as a semantic technology extension to the Drupal content management system.

Structured Dynamics, the developer of OSF, with the generous support of SD clients, has been committed to provide excellent documentation and tech transfer support to OSF since its inception. For examples, OSF now has a nearly 500 document technical support library, plus many automated means for installing and testing the OSF stack.

Yet, as all of us know, written documentation is not always discovered nor read. The paradigm for technology transfer is shifting to online tutorials and screencasts.

In keeping with that trend, SD has committed itself to develop a (hopefully) complete suite of online screencasts and tutorials geared to the nuts-and-bolts of how to install, configure, test, manage and use an OSF installation. Our intent is to aid users to bring semantics into the enterprise without the need for external support or cost.

We call this curriculum of tech transfer screencasts and video tutorials the OSF Academy.

Over the past week we have been releasing the first dozen screencasts in this series. With this foundation, it is now time to make a broader announcement of the OSF Academy.

So, On With it Now

Welcome!

We are on pace to release many dozens of specific screencasts on all use and management aspects of OSF. Please stay tuned over the coming weeks.

You can always see the complete contents of the YouTube channel at the Open Semantic Framework Academy.

Also, as basic grounding, also know that the OSF Wiki section on screencasts is another central access point to this content.

The Series Begins

Most of the screencasts are quite specific in certain use aspects of the Open Semantic Framework. However, tutorial #1 is a useful overview of OSF and the series:

The Next Ten Screencasts

SD’s CTO, Fred Giasson, is the key demo jockey for most of the OSF Academy screencasts. Many of these screencasts are technical, and all are specific and focused. Access each screencast by number below. There is also a (blog post) associated with each screencast that provides useful background information and links.

Thumbnail

Where Next?

We have nearly four dozen additional screencasts in our plan to round out introductory material to OSF. Please monitor our OSF channel on YouTube to stay on top of these releases.

Posted by AI3's author, Mike Bergman Posted on April 24, 2014 at 1:16 am in Open Semantic Framework, Structured Dynamics, Videos | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1725/osf-academy-inaugurates-with-eleven-screencasts/
The URI to trackback this post is: http://www.mkbergman.com/1725/osf-academy-inaugurates-with-eleven-screencasts/trackback/
Posted:February 4, 2014

Civic Dynamics Logo Some Thoughts on SD’s Gestation of Civic Dynamics

Structured Dynamics (SD) announced yesterday that, in association with its partner Buzzr, it was spinning off a new software company, Civic Dynamics Inc., headquartered in Québec City, Canada. Included in the launch was the introduction of the new company’s Civic Dynamics Platform.  CDP is open-source software and supporting systems to assist municipalities to publish dynamic open government data, and to provide citizens a set of tools for viewing, searching, filtering and analyzing that data.

The announcements of those releases stand on their own. My purpose is not to duplicate them. Rather, now that the efforts needed for the new launch are behind us, I wanted to reflect on why and how such a spin off occurred in the first place. I think these reflections offer some insight into imperatives that face new software ventures, especially those geared to enterprise IT.

A Bit of History

It was just about five years ago that Fred Giasson and I began Structured Dynamics. (This was also after a year working together at Zitigist under the sponsorship of OpenLink Software.) Our mission at SD’s inception was to create a workable platform for bringing semantic technology capabilities to enterprises. Our specific interest was in using semantic technologies and RDF to solve the decades-old challenge of information interoperability in larger organizations. By serendipity, we were able to secure an enterprise client on virtually the first day we started SD. That forced us to grapple immediately with the then current woeful state of semantic technologies for enterprises.

We observed a number of problems at that time. Here is a short list of some of those problems from five years ago, and brief statements of what we initiated to address them:

  • Search — native triple stores at that time were not performant in search, and none captured the full text of documents. Further, semantic search offers unique opportunities in structure and inference. As a result, we were one of the first to adopt Solr for semantic technologies
  • Portal framework — there was a (general) absence of portal front ends that met acceptance in the marketplace. We evaluated and chose Drupal; over time a design choice to have loose coupling with Drupal has transitioned to become more integrated
  • CRUD — basic database management capabilities, such as create, read, update or delete, were not often exposed at the application level. Our choice here was to decouple this access and adopt a distributed design by embracing RESTful web services, endpoints and APIs, all of which were geared to provide a universal abstraction for dealing with all data engines (as colllectively expressed as a “repository”)
  • Architecture — though complete frameworks had been put forward, mostly by academic researchers, most had short lives and all lacked basic enterprise capabilities. We designed an architecture that favored integration and expansion — largely though APIs — while leveraging existing components. We also at this time made a commitment to open-source for all key components of the architecture
  • Stack — there were no complete software (deployment) stacks. Creating one required fragmented piece parts with gaps; and there certainly were not standard deployment or installation abilities. Much of SD’s effort over the past five years has been addressing this gap
  • Access control (security) — virtually all enterprises need to control access to privileged information, and no security existed five years ago for semantic applications. In the early versions of the Open Semantic Framework, the foundation to Civic Dynamics’ CDP platform, we used a simple IP authorization approach based on the interaction of tool (endpoint), dataset and role. Subsequently we have established middleware integrations with third-party security and key-based permission mechanisms when OSF is used standalone
  • Version control — any enterprise content system or repository must also have ways to track revisions and enforce version control. Early semantic technologies completely lacked these considerations. OSF has made progress in integrating with the Drupal revisioning system and in establishing middleware methods for interfacing with third-party version control systems
  • Workflow support — managing enterprise content in general, and more specifically managing the semantic aspects of integration, requires formal workflow and governance procedures. However, historically, and up to and including today, there is zero workflow support in semantic technologies. In fact, there is virtually no discussion of this topic at all. We are only at the beginning stages of incorporating formal workflow methods into OSF. We have development methodologies and best practices, though, and have identified suitable workflow engines to extend the system with formal workflow methods
  • Data ingest — five years ago there was little recognition that data in the wild would not be compliant with W3C standards nor RDF, and as a result demo systems lacked ingest capabilties for legacy information, particularly enterprise database info.  OpenLink Software and its Virtuoso system (one of the core engines in OSF) did, however, recognize this need. The OSF design has very much followed this approach of using “converters” or “RDFizers” for getting all wild data forms into a canonical RDF basis internal to the system
  • Reference vocabularies — the ultimate means of integrating enterprise information to achieve interoperability is premised on semantic approaches and technologies. Yet, apart from some minor vocabularies, there were no suitable vocabularies five years ago in many areas. We have constructed and supported an across-domain reference vocabulary (UMBEL) and a means of representing instance data and records (irON) since then to redress these gaps
  • Tooling — the means to design, manage, and test components of a semantic enterprise stack were nearly totally lacking, since most early semantic efforts were of proofs-of-concept and not production-grade systems. The areas of ontology design and maintenance were (and stilll somewhat are) weak. We have since developed many new tools, some geared at the user level, with administrator and developer tools including test suites, command-line utilities, and automated installers
  • Templates, widgets and visualization — the highly structured nature of RDF data lends itself well to templating records by type, page layouts by type, and widgets by type, which may be further leveraged using inheritance and inference. The recognition of the role of semantic technologies as publishing platforms did not exist five years ago. Our response has been to develop a template inheritance system and semantic data-driven widgets. Our earliest widgets were based on Flash; the libraries are now migrating to JavaScript (d3.js) and HTML5
  • Lack of documentation — lack of documentation is the bane of most open source projects, and early semantic technologies were most often developed for academic theses or as proofs-of-concept. As a result, documentation of use to practitioners and administrators was totally lacking. SD has made a concerted commitment to improved and complete documentation. Our OSF wiki with its nearly 500 technical and metholodogy articles and accompanying images is one expression of that commitment
  • Lack of enterprise rigor — across all of these fronts, early semantic technologies were clearly not designed and developed with enterprise objectives and use cases in mind. SD’s overall commitment has been to rectify this gap.

What we did have five years ago was a growing list of (often) unproven open standards (principally those from the W3C) and a large roster of prototype and research tools [1], most from the academic community. Still, there were some proven engines suitable to a semantic stack (most adopted as core to the Open Semantic Framework), so there were building blocks upon which a complete framework could be based. With the right design and architecture, and appropriate “glue” to tie it all together, it appeared quite feasible to create a working semantic stack suitable for enteprise use. Multi-component, open-source packages — ranging from Alfresco to Talend or Pentaho — were showing the path to such next-generation platforms.

With the development model of an integrated semantic technology stack based on open source components and consistent “glue” in mind, we could then turn our attention to the business model and strategy behind the nascent Open Semantic Framework.

The Business Philosophy

I don’t speak much about my prior ventures because, well, they are in the past. But I have financed ventures via angel funding, venture capital, grants and client revenues. I also have background in ventures ranging across many aspects of enterprise (mostly) and consumer (less so) software.

Our funding prejudice in starting SD was to be self-financed via clients. A customer focus keeps one from getting too abstract or falling in love with innovations for which there may not be a real market. Revenue financing also means that we need not alter business strategy or approach based on a financier’s perspective. Customers call the shots; not the money interests. This funding prejudice has kept us market focused and, as a consequence, profitable since day one.

Our staffing prejudice was to not hire, at least during the framework development phase. Setting the vision for a framework is not a democratic activity, and every hire means less development productivity. To fulfill, we have partnered and employed consultants and sub-contractors, but have not diluted our own efforts in managing employees.  We could stay focused by feeding only our own mouths and our vision.

Such narrow bandwidth also carries other implications. We could not take on too many clients at a given time. We needed to be extremely productive and leveraged, finding opportunities wherever we could to re-purpose prior writings or reusing or generalizing code. We also needed to be quite selective in what projects and what clients we chose. When attempting to make progress on a new platform, it is important to not become simply a contract fulfillment shop. Customers have many options for IT contracting or outsourcing; platform development and growth requires a certain self-selection by clients.

Our standard contract emphasizes that (most) efforts are intended to be open source, and our intellectual property clauses make that explicit. At first we did not know how the market might react to this insistence. For prospects serious enough to commit monies to us, however, we have found a good appreciation that open source leads to lower current project costs because the client is leveraging what has already been developed before. It seems only fair that new developments should also be made available to later customers, as well. Some of our prior clients are now seeing the lower costs and benefits by leveraging intermediate work in upgrading to latest versions and functionality.

Our fulfillment prejudice has been to complete work on time and under budget, document and train the customer in the work, and move on. Though we know they are profitable and a bread-and-butter for most enterprise vendors, we have not sought recurring annuities from our clients in maintenance fees. By keeping our eye squarely on successful tech transfer, we are disciplining ourselves to document as we go, provide tooling and support infrastructure as well as application software, and to find efficiencies in fulfillment. Meanwhile, we are able to progress rapidly on our overall development roadmap without getting bogged down in handholding. We would rather teach the customer how to fish, rather than doing the fishing for them.

Of course, not all enterprises understand or embrace these philosophies. That is fine under our development approach where market understanding and refinements are the drivers of decisions, not maximizing revenues for an increasingly growing staff count. We have been blessed to have new clients arise whenever they are needed, and to be real partners with us in furthering the vision. We have actively rejected some customer prospects because the philosophical fit was not good. We have also actively weaned ourselves from some engagements by insisting on sunsets for our support and encouraging more tech transfer and training.

These prejudices may change as we see the underlying Open Semantic Framework nearing fulfillment of its development vision. But, for an open source platform in a hurry (even considering it has been five years!), we believe these philosophies have served us and our clients well.

An Emphasis on the Open Semantic Framework

The net outcome for the Open Semantic Framework has been to emphasize a generic, enterprise-ready design that can be rapidly embraced and adopted by multiple markets. We have called OSF a platform of ontology-driven applications. ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. ODapps fulfill specific generic tasks. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation, user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

The ODapp vision underlying the design of OSF means we can leverage an architecture of generic tools to respond to virtually any knowledge application or any enterprise domain. The basic idea is shown by this diagram, which we first published about three years ago:

The Open Semantic Framework can Spawn Many Different Domain Instances

(click for full size)

In the five years of development of OSF, now at version 3.x (recently announced), we have had the good fortune to have clients and uses in publishing, tech transfer of R&D, group collaboration, health, automotive, air traffic control, sustainability, community indicators and local government. Demand in the latter two areas has been particularly strong. The strength of that market interest was the source of the dilemma for Structured Dynamics.

Unique Demands of Municipal Markets

The idea of rapid and nimble development of a new platform — especially one expressly designed to be generic across multiple domains — does not readily square with focusing on a specific market segment. This disconnect is particularly true for quite unique markets, as is the case for local governments.

In a past life I spent nearly ten years working for a trade association that represents municipally-owned electric utilities. APPA has members ranging from huge municipalities such as Los Angeles, Toronto, Seattle and San Antonio, to the smallest towns and burgs of the plains of North America. In my former role running the R&D and technical programs for this association, I personally interacted with hundreds of these wide-ranging individual communities.

In the larger communities, the electric utilities were separate departments from the local government per se, and were directed by professional utility managers. But for mid-size and smaller communities, there was often close interaction with all municipal departments.

Though sales lead times are long for all enterprise markets, they are particularly long and (often political) in government. Budgets are perennially tight. Budgets need to be proposed, argued with councils and management, and approved before work can begin. Staff are stretched across multiple functions, so use and maintenance are key factors as are concerns about longer-term support contracts. Portals and Web sites must serve all constituencies and content and tone need to be suitable to taxpayer-supported venues. Yet, because of the number and diversity of communities [2], across the entire market there is surprising innovation and experimentation. Finding better ways to do more with less is a key motivator in the local government market.

Specifically, in our own use of OSF in this market, we also observed some other unique aspects related to open data and Web sites. What constitutes open data and whether and how to make it “open” varies widely by community. Capturing local needs and perspectives often leads to comparatively high costs in theming and customizing the Web sites. The lack of dedicated and trained staff to care and feed a new Web site is always a challenge.

Structured Dynamics, with its generic platform interests and avoidance of staffing, is clearly not the right vehicle to pursue this market. Specific focus on the unique aspects of the local government market is required, plus modifications and specializationis of the platform to address government needs. Possible integration or incorporation of standard local government Web site(s) may also be required. Though we were seeing keen interest from this market, in order to address it properly a different vehicle with different venture imperatives was necessary.

Doing Justice to the Local Government Market

Early on, our good colleague and friend, Steve Ardire, helped point out some gaps in our business development. We saw that three things were missing within Structured Dynamics itself to do the local government and open government data markets justice. First, we needed a dedicated company to focus solely on this market. Second, we needed an executive familiar with the OSF platform and municipal government to head the effort. And, third, we somehow needed a way to overcome the time and costs associated with tailoring the portal for local community needs.

It was actually the last of these things that showed the first solution. We were approached about eighteen months ago by Ed Sussman, the CEO of Buzzr, about possibilities of partnering for the local government market. Buzzr has a one-click solution to theming and customizing individual Web portals, buillt around the Drupal content management system (CMS). Buzzr, a NYC-based company, has impeccable Drupal chops, having been co-founded by one of the leading Drupal shops, Lullabot. Buzzr has proven the applicability of their approach to specific verticals, including retail and education. The fact that Buzzr found us and saw a good fit for the municipal market was a formative discussion. We welcomed Buzzr’s outreach because their approach squarely addresses one of the cost and effort sticking points we were observing.

When Ed first contacted us, the OSF platform was still not sufficiently mature to be a market foundation. We needed more time to refine the platform, as well as to gain more market insight from use and use cases. Fortunately, Ed and Buzzr kept their interest strong while we refined things in the background. By the time we were able to address the other missing items, Buzzr was there to partner with us on the new venture.

Our second requirement was met by hiring Kelly Goldstrand, formerly the project manager for the NOW (Neighbourhoods of Winnipeg) portal, to head up the venture’s business development. NOW is one of the flagship installations of OSF. Her career focus has been on service planning, delivery and evaluation in the area of community health, protection and development. Kelly has significant management experience in local government and clearly understood OSF; her guidance had been pivotal in much of the system’s functionality. Kelly also has a proven track record in mentoring projects through local approvals and training city staff in use and maintenance of new technologies. After early retirement Kelly was ready to consider our opportunity and then graciously agreed to join us.

The last piece of the puzzle was forming the new venture. We had been working with the Civic Dynamics name for some time, and had also played around a bit with logo and Web site. Once the other things fell into place, we incorporated Civic Dynamics, Inc. in Québec (where it is also known as Dynamique Civique), given the strong market interest shown in Canada to date, and began preparing for the formal launch of the venture. We also needed to await the completion of OSF v 3.0.

A Report Card on SD’s Multi-year Plan

It now appears likely that the five-year plan we set for ourselves at the founding of Structured Dynamics may actually take six to seven years to achieve. This time extension derives from the realities of our client work over this time frame. One reality is that client-specific needs have caused us to necessarily divert from our own internal development path. Not all development can contribute to fulfilling a generic platform. Every client has unique needs and circumstances that are not generalizable to others. A second reality is that only through real client engagements can market requirements be truly discerned. Customer-centric development is absolutely essential to keep software grounded.

Meanwhile, Back at Civic Dynamics

We are as curious as the next person to see whether a dedicated spin-off is the right way to handle a specific vertical market. It will also be interesting to see how coordination and support can best be provided between the dynamics duo (Structured and Civics).

Nonetheless, we are excited about finally getting postured to pursue the growing market for open, local government data. We’d like to thank Kelly and Ed and all of our original sponsors for helping to gestate the venture to this point. Now that it has been birthed, we hope to nurture it and get it on its own two feet as soon as possible. Before we know it, and assuming we’ve raised it properly, Civic Dynamics will be celebrating its own life events!


[1] See our Sweet Tools listing of about 1000 semantic technology tools
[2] There is a total of about 24,000 municipal governments across the United States and Canada.
Posted:January 27, 2014


Open Government Data
It’s Time to Move Beyond Static Dataset Dumps

It would be an understatement to say that open data has been transforming how government does business. Over the past five years — ranging from national governments such as the United States and the United Kingdom to hundreds of local governments and municipalities and all forms of government in between — a veritable revolution in opening up data to the public has been underway. The open data in government (OGD) movement has spawned an entirely new cottage industry in open data advocacy and tools. Literally hundreds of government organizations are committed to open data, supported by an ecosystem of advocacy, technology and consulting groups.

Open data, of course, is not limited to governments. Open data in science and from the Web and for-profit entities are legitimate focal points in their own right. But, because data generated by governments are both sanctioned and developed using taxpayer monies, open data in government (OGD) occupies a special place in the conversation. Now, with experience and practice, we are beginning to see a generational shift in how open data is being handled by governments. The first generation, still mostly the current practice, was built around the idea of just making the data public and open. This current generation of open data is characterized by the publishing of datasets via catalogs. The datasets are static, unconnected and dumb. Mostly, too, the data within those datasets are poorly described and documented, often lacking standard metadata. What is now exciting, however, is the emergence of what can best be called dynamic open data. What this is and how it offers advantages is the focus of this article.

The 8 Initial Principals of Open Government Data

In October 2007, 30 open government advocates met in Sebastopol, California to discuss how government could open up electronically-stored government data for public use. Up until that point, the federal and state governments had made some data available to the public, usually inconsistently and incompletely, which had whetted the advocates’ appetites for more and better data. The conference, led by Carl Malamud and Tim O’Reilly and funded by a grant from the Sunlight Foundation, resulted in eight principles that, if implemented, would empower the public’s use of government-held data. These principles, no longer online, were summarized by Joshua Tauberer in his Open Government Data book as:

  1. Data Must be Complete
    All public data are made available. Data are electronically stored information or recordings, including but not limited to documents, databases, transcripts, and audio/visual recordings. Public data are data that are not subject to valid privacy, security or privilege limitations, as governed by other statutes.
  2. Data Must be Primary
    Data are published as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
  3. Data Must be Timely
    Data are made available as quickly as necessary to preserve the value of the data.
  4. Data Must be Accessible
    Data are available to the widest range of users for the widest range of purposes.
  5. Data Must be Machine Processable
    Data are reasonably structured to allow automated processing of it.
  6. Access Must be Non-Discriminatory
    Data are available to anyone, with no requirement of registration.
  7. Data Formats Must be Non-Proprietary
    Data are available in a format over which no entity has exclusive control.
  8. Data Must be License-free
    Data are not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed as governed by other statutes.

These basic principles were then updated and re-phrased by the Sunlight Foundation in August 2010 to now number 10 principles, including the use of open standards, making data permanent, and keeping usage costs to an absolute minimum. All of these are laudable points. Each may or may not be provided in a fully open way by any given governmental entity.

This first step in the open data process has led to systems that are oriented to posting and publishing downloadable datasets. Existing open government data platforms, for example, such as Socrata or DKAN, can best be described as catalog systems. Listings of datasets with associated descriptions and metadata are presented. Users or the public may then chose among one or more machine-readable formats to download the entire dataset.

The 5 Added Principles of Dynamic Open Data

Of course, simply throwing data over the fence does not make it useful. Once we can get past the first threshold of making data publicly accessible, we next face the challenge of making that data meaningful and relevant. Since relevance is in the eye of the user, we no longer can think about information solely in terms of static, dumb datasets. We now need to expose the underlying data dynamically, such that users may request and filter and correlate what they need and only what they need.

Thus, there are five principles — or dimensions — by which we need to judge next-generation dynamic open data:

  1. Data Should be Filterable
    Data should be selectable by type (class), attribute or value such that only the data of interest is exposed to the user. This means the data should be structured in some way with facets that can be used dynamically to filter and make those selections.
  2. Data Should be Atomic
    Data should be exposed as individual entities or concepts with their attributes and values. The unit of manipulation thus becomes the datum, rather than the dataset.
  3. Data Should be Connected
    Because we are now collecting by datum and not dataset, connections between relevant things must be made explicit across relevant datasets. Similar things should be retrievable together. To achieve this aim, some schema or data definition framework must be layered over the data and datasets.
  4. Data Should be Expandable
    Since new data and new instances and new datasets will constantly arise, the design of the overall data management system must itself be “open”, enabling expansion of the available datastore at acceptable cost and effort.
  5. Data Should be Documented
    In order for these dynamic selections to be achievable, the data in the system must be fully documented, specifically including the full description and units used for attributes and values and the scope of entities and concepts. Only through such complete documentation can accurate connections and relevant selections per above be made.

There is no set order to the principles above. They are presented in the order shown so as to help remember them through the FACED mneumonic.

Parallels with Linked Data

Though the principles above do not call out linked data as a requirement, they do share many parallels with the early growth and maturation of linked data. A number of years back Fred Giasson and I commented on When Linked Data Rules Fail. Two of the points made in that article are the absence of suitable data descriptions and lack or wrong connections in data.gov and the NY Times datasets. I subsequently expanded on these types of problems in Practical P-P-P-Problems with Linked Data.

Official data from governments can avoid many of the provenance issues associated with general linked data, but in other areas there are important parallels. Like any emerging new practice, it takes a while to learn and formalize best practices. It is not surprising that we are seeing open data in government needing to transition from dumb datasets to actionable information. Making data actionable is when government information assets will finally become effective for the broader public.

Also, like linked data, it is likely the platforms built around semantic technologies and knowledge graphs (schema) will also come to the fore. Our own Open Semantic Framework is one such example, but there are a few now emerging in the linked data and semantic technology space. It will be through different practices and these newer platforms that we will see the next generation of open government data truly emerge.

Posted by AI3's author, Mike Bergman Posted on January 27, 2014 at 2:55 pm in Open Semantic Framework, Structured Dynamics | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/1707/welcoming-a-new-era-of-dynamic-open-data/
The URI to trackback this post is: http://www.mkbergman.com/1707/welcoming-a-new-era-of-dynamic-open-data/trackback/
Posted:January 20, 2014

Open Semantic Framework New OSF Platform Leapfrogs Earlier Releases in Features and Capabilities

After nearly five years of concentrated development — including the past 20 months of quiet, background efforts — Structured Dynamics is proud to announce version 3.0 of its open-source Open Semantic Framework. OSF is a turnkey platform targeted to enterprises to bring interoperability to their information assets, achieved via a layered architecture of semantic technologies. OSF can integrate information from documents to Web pages and standard databases. Its broad functions range from information ingest and tagging to search and data management to publishing.

Until today, the version available for download was OSF version 1.x. While capable as an enterprise platform — indeed, it has been in use by a number of leading global enterprises since development first began — the capability of the platform was spotty and required consulting expertise to configure and set-up. SD was hired by Healthdirect Australia (HDA) nearly two years ago to enhance OSF’s capabilities and integrate it more closely with the Drupal open-source content management system, among other modern enterprise requirements. The OSF from those developments — the non-public version 2.0 specific to HDA — has now been generalized for broader public use with today’s public announcement of version 3.0.

A More Complete Enterprise Platform

HDA's healthinsite Portal

Not unlike many large organizations, HDA had specific enterprise requirements when it began its recent initiative. Included in these were stringent security, broad use of proven open-source applications, governance and workflow procedures, and strict content authoring and management guidelines. These requirements further needed to express themselves via a sequence of deployment and testing environments, all conducted by a multi-vendor support group following agile development practices.

These requirements placed a premium on performance, scalability and interoperability, all subject to repeatable release procedures and scripts. OSF’s initial development as a more-or-less standalone platform needed to accommodate an enterprise-wide management model involving many players, environments and applications. Prior decisions based on OSF alone now needed to consider and bridge modern enterprise development and deployment practices.

Tighter integration with Drupal was one of these requirements (see next section), but other OSF changes necessary to accommodate this environment included:

  • A new security layer — the initial OSF security model was based on IP authentication. Given the sensitivity of the health data managed by HDA, such a simplistic approach was unacceptable. The actual HDA deployment relied on a third-party security application. However, what was learned from that resulted in a key-based access and validation model in the OSF v 3.0 update
  • A new revisioning system — content authoring and governance required multiple checks in the workflow, and requirements to review prior edits and invoke possible rollbacks. The result was to add a completely new revisioning capability to OSF
  • Middleware integration and APIs — in a multi-vendor environment, OSF operates in part as a central repository for all system information, which third parties must more readily and easily be able to access. Thus, besides the security aspects, a much improved programmatic API and a generalized search API were added to the OSF platform
  • New, additional Web services — the requirements above meant that seven new OSF Web services were added to the system, bringing the total number of current Web services to 27
  • New caching layer — because of its Web-service design, information access and mediation occurs via a large number of endpoint queries, many of which are patterned and repeated. To improve overall performance, a new caching layer was added to OSF that significantly improved performance and reduced access burdens on the OSF engines
  • Workflow integration — improved workflow sequences and screens were required to capture workflow and goverance demands, and
  • Multilingual support — like most larger organizations, HDA has a diversity of native languages throughout its user base. Though OSF had initially been explicitly designed to support multilinguality, specific procedures and capabilities were put in place to more easily support multiple languages in OSF.

Tighter Integration with Drupal

When Fred Giasson and I first designed and architected the Open Semantic Framework in 2009, we made the conscious decision to loosely couple OSF with the initial user interface and content management system, Drupal. We did so thinking that perhaps other CMS frameworks would be cloned onto OSF over time.

Time has not proven this assumption correct. Client experience and HDA’s interests suggested the wisdom of a tighter coupling to Drupal. This shift arose because of the great flexibility of Drupal with its tens of thousands of add-on modules and its ecosystem of capable developers and designers. Our early decision to keep Drupal at arm’s length was making it more difficult to manage an OSF instance. Existing Drupal developers were not able to employ their Drupal expertise to manage OSF portals.

We pivoted on this error by tightening the coupling to Drupal, which involved a number of discrete activities:

  • Upgrade to Drupal 7 — earlier versions of OSF used Drupal 6. We migrated the code base to Drupal 7. That, plus the other Drupal changes noted below, resulted in re-writing about 80% of the OSF code base related to Drupal
  • Alternative Drupal data storage — Drupal’s own evolution in version 7 (and continuing with version 8) is to abstract its underlying information model around entities and fields, abstractions that are much better aligned with OSF’s RDF data model. As these entity and field changes were exposed in Drupal APIs, it was possible for us to write an entirely new information model underlying Drupal. Drupal administrators using OSF are now able to use OSF solely as the data model underneath Drupal (rather than the more standard MySQL)  or any mixed portions in between. The typical OSF for Drupal design now uses OSF for all content storage, with MySQL reserved for internal Drupal settings (à la MVC)
  • Drupal connectors — certain common or core Drupal modules, such as Fields, Entities, Search, and Views, are either common utilities for Drupal developers or are themselves core bases for third-party modules. Because of their centrality, SD developed a series of “connectors” that enable these modules to be used as is while transparently communicating with and writing to OSF. Thus, Drupal developers can use these familiar capabilities without needing particular OSF knowledge
  • Major updates to Drupal modules — because of the changes above, the existing OSF Drupal modules (called conStruct in the earlier versions) were updated to take advantage of the common terminology and tighter integration
  • Major updates to Drupal widgets — similarly, the standard OSF data and visualization widgets used with Drupal (called Semantic Components in the earlier versions) were also updated to work in this more tightly integrated environment.

Expanded Search Capabilities and Web Services

Some of the extended capabilities in OSF v 3.0 are noted above, including the expanded roster of Web services. However, the OSF Search Web service, which is by far the most used OSF endpoint, received massive improvements in this latest release.

First, OSF Search now uses a new query parser, which provides the capability to change the ranking of search results by boosting how specific query components get scored. Types, attributes, datasets or counts may be used to vary any given search result, including different occurrences on the same page. It is also now possible to add restrictions to the search queries, including restricting results to a specified set of attributes.

This flexibility is highly useful wherein certain structured pages contain blocks or sections with patterned search results. This structuring leads to the ability to create generic page templates, wherein search queries and results vary within the layout. An “events” block may score differently than, say, a “related topics” block, all of which in turn can respond to a given context (say, “cancer” versus “automobiles”) for a given page (and its template).

These repeated patterns lend themselves to the use of reusable “search profiles,” which are predefined queries that may include context variables. These profiles, in turn, can be named and placed on page layouts. Existing profiles may be recalled or invoked to become patterns for still further profiles. The flexibility of these search profiles is immense, and the parameters used in constructing them can be quite extensive.

Thus, OSF version 3.0 includes the new Query Builder module. Via an intuitive selection interface, users may construct search queries of any complexity, and then save and reuse them later as search profiles.

Lastly, registering, configuring and managing OSF instances and datasets into Drupal has never been easier. The new OSF Configure module centralizes all the features and options required for these purposes, which are then managed by a new suite of tools (see next).

Automated Installation and Management Tools

Standard enterprise deployments that proceed from development to production require constant updates and versions, both in application code and content. Keeping track and managing these changes — let alone deploying them quickly and without error — requires separate management capabilities in their own right. The new OSF thus has a number of utilities and command-line tools to aid these requirements:

  • OSF Installer — this tool installs and configures all the pieces required by the OSF stack, then runs the OSF Tests Suites to make sure that all functionality is fully operational on the new server
  • OSF Tests Suites — composed of 746 tests and 4139 assertions, these tests may be run every time an OSF instance is deployed or code is changed. The tests measure all of the input parameters of each endpoint, combinations thereof, mime types, and expected errors returned by each endpoint
  • OSF Ontologies Management Tool — (OMT) is used to manage ontologies, list ontologies, create/import new ones, delete existing ones, or to generate underlying ontological structures
  • OSF Datasets Management Tool — (DMT) is used to manage datasets of a OSF instance, enabling the user to create, delete, update, import and export datasets directly from the command line
  • OSF Permissions Management Tool — (PMT) is used to manage, list, create or delete access permissions groups and users
  • OSF Data Validator Tool — (DVT) is used to perform a series of post-indexation data validation tests and return validation errors if any are found.

Tempered via Enterprise Development and Deployments

The methods and processes by which these advances have been made all occurred within the context of state-of-the-art enterprise IT management. Experience with supporting infrastructure tools (such as Jira, Confluence, Puppet, etc.) and agile development methods are part of the ongoing documentation of OSF (see next). This experience also bolsters Structured Dynamics’ ability to work with other third-party applications at the middleware layer or in support of enterprise deployments.

Comprehensive and Completed Updated Documentation

The Open Semantic Framework has evolved considerably since its conception now five years ago. In its early development, components and pieces were sometimes developed in isolation and then brought into the framework. This jagged development path led to a cacophony of names and terms to characterize portions of the OSF stack. This terminology confusion has made it more difficult than it needed to be to understand the vision of OSF, the layers of its architecture, or the interactions between its components and parts.

In making the substantial efforts to update documentation from OSF version 1.x to the current version 3.0, terminology was made consistent and code references were cleaned up to reflect the simpler OSF branding. This clean up has led to necessary updates across multiple Web sites maintained by Structured Dynamics with some relationship to OSF.

The Web site with the most changes required has been the OSF Wiki. In its prior incarnation, called TechWiki, there were nearly 400 technical articles on OSF. That site has now been completely rewritten and re-organized. Nearly two hundred new articles have been written in support of OSF v 3.0. Terminology related to the older cacophony (see correspondance table here) has (hopefully) been updated and corrected. Most architectural and technical diagrams have been updated. Additional documentation is being posted daily, catching up with the experience of the past twenty months.

Moving Beyond the Established Foundation

Open Semantic Framework

SD is pleased that enterprise sponsors want to continue beyond the Open Semantic Framework’s present solid foundations. While we are not at liberty to discuss specific client initiatives, a number of ongoing developments can be described broadly. First, in terms of the key engines that provide the core of OSF’s data management capabilities, initiatives are underway in the areas of visualization, business analytics and workflow orchestration and management. There are also efforts underway in more automated means for direct ingest of quality Web-based information, both based on linked data and from Web APIs. We are also pleased that efforts to further extend OSF’s tight integration with Drupal are also of interest, even while the integration efforts of the past months have not yet been fully exploited.

To Learn More

To learn more, make sure and check out the re-organized OSF wiki. See specifically the complete OSF overview, the list of all the OSF 3.0 features, and the list of all the new features to OSF 3.0. Also, for a complete soup-to-nuts view of what it takes to put up a new OSF installation, see the Users Guide. Lastly, for a broad overview of OSF, see its reference architecture and the overviews on its dedicated OSF Web site.

As a final note, Structured Dynamics would like to thank its corporate sponsors of the past five years for providing the development funds for OSF, and for agreeing with the open source purposes of the Open Semantic Framework.