In the first part of this series, we put forward the argument that incomplete provision of important support factors was limiting the adoption of open source software in the enterprise. We can liken the absence of these factors to having a chair with one or more absent or broken legs.
This second part of the series goes into the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we can term this a total open solution.
These considerations are not simply a matter of idle curiosity. New approaches and new methods are required for enterprises to modernize their IT systems while adding new capabilities and preserving sunk assets. Extending and modernizing existing IT is often not in the self-interests of the original supplying vendors. And enterprises are well aware that IT commitments can extend for decades.
While the benefits and capabilities of open source software become more apparent by the day, rates of open source software adoption lag in enterprises. We have seen entire Internet-based businesses arise and get huge in just a few short years. But it is the rare existing enterprise that has committed to and embraced similar Web-oriented architectures and IT strategies.
The enterprise IT ecosystem is evolving to become an unhealthy one. New software vendors have generally abandoned enterprises as a market. Much more action takes place with consumer apps and Internet plays, often premised on ad-based revenues or buzz and traffic as attractors for acquisition. Existing middle-tier enterprise vendors are themselves being gobbled up and disappearing. I’m sure all observers would agree that IT software and services are increasingly dominated by a shrinking slate of vendors. I suspect most observers — myself included — would argue that enterprise-based IT innovation is also on the wane.
The argument posed in the first part of this series is that such atrophy should not be unexpected. The current state of open source software is not addressing the realities of enterprise IT needs.
And that is where the other legs of the total open solution come in. In their entirety, they amount to a form of capacity building for the enterprise. It is simply not enough to put forward buzzwords matched with open source software packages. Exciting innovations in social networks, collaboration, semantic enterprise, mobile apps, REST, Web-oriented architectures, information extraction, linked data and a hundred others are being validated on the Internet. But until the full spectrum of success and adoption factors gets addressed, enterprises will not embrace these new innovations as central to their business.
As we describe these four legs to the total open solution, we will sometimes point to our Citizen Dan initiative. That is not because of some universal applicability of the system to the enterprise; indeed Citizen Dan is mostly targeted to local communities and municipalities. But Citizen Dan does represent the first instance known to us where each of these total open solution success factors is being explicitly recognized and developed. We think the approach has some transferability to the broader enterprise.
Let’s now discuss these four legs in turn.
Of course, the genesis of this series is grounded in open source software and what it needs to do in order to find broader enterprise acceptance. Clearly that is the first leg amongst the four to be discussed. We also have acknowledged that, generally, best-of-breed open source software is also better documented at the code level, and has documented APIs. We will return to this topic under Leg Four below.
Open source software useful to the enterprise is often a combination of individual open source packages. Some successful vendors of open source to the enterprise in fact began as packagers and documenters of multiple packages. Red Hat for Linux or Alfresco in document management or Pentaho in business intelligence come to mind, as examples.
In the case of Citizen Dan, here are the open source packages presently contained in its offering: Linux (Ubuntu), Apache, MySQL, PHP (these comprising the LAMP stack), Drupal, a variety of third-party Drupal modules, Virtuoso, Solr, ARC2, Smarty, Yahoo UI, TinyMCE, Axiis, Flex, ClearMaps, irON, conStruct, structWSF, and some others. Such combinations of packages are not unusual in open source settings, since new value-add typically comes from extensions to existing systems or unique ways to combine or package them. For example, the installation guide for structWSF alone is quite comprehensive with multiple configuration and test scripts.
Thus, besides direct software, it is also critical that configuration, settings, installation guidance and the like be addressed to enable relatively straightforward set-up. This is an area of frequent weakness. Targeting it directly is a not-so-secret factor for how some vendors have begun to achieve some success with the enterprise market.
All software works on data. While some data is unstructured (such as plain text) and some is semi-structured (such as HTML or Web pages that mix markup with text), the objective of information extraction or natural language processing is to extract the “structure” from such sources. Once extracted, such structure can interoperate on a common footing with the structured data common to standard databases.
Thus, we use “structure” to denote the concepts and their relationships (the “schema” or “ontology”) and the indicators and data (attributes and values) to describe them, and the “entities” (distinct individuals or nameable instances) that populate them. In other words, “structure” refers to all of the schema (concepts + relationships) + data + attributes + indicators + records that make up the information upon which software can operate.
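One way to picture this definition is as a small data model. The sketch below is only illustrative: the class names, the "locatedIn" relationship and the sample municipality data are all hypothetical, and none of it is taken from the MUNI ontology or from Citizen Dan.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A class of things in the schema, e.g. 'Neighborhood' (hypothetical)."""
    name: str


@dataclass
class Relationship:
    """A typed link between two concepts in the schema."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Entity:
    """A nameable instance of a concept, with its attribute and indicator values."""
    name: str
    concept: str
    attributes: dict = field(default_factory=dict)
    indicators: dict = field(default_factory=dict)


@dataclass
class Structure:
    """Schema (concepts + relationships) plus the records that populate it."""
    concepts: list
    relationships: list
    entities: list

    def instances_of(self, concept_name):
        """Return every entity that is an instance of the named concept."""
        return [e for e in self.entities if e.concept == concept_name]


# Made-up sample data, purely for illustration.
structure = Structure(
    concepts=[Concept("Municipality"), Concept("Neighborhood")],
    relationships=[Relationship("Neighborhood", "locatedIn", "Municipality")],
    entities=[
        Entity("Springfield", "Municipality"),
        Entity(
            "Old Town",
            "Neighborhood",
            attributes={"municipality": "Springfield"},
            indicators={"population": 4200},
        ),
    ],
)

neighborhood_names = [e.name for e in structure.instances_of("Neighborhood")]
print(neighborhood_names)
```

The point of the sketch is that the schema and the records live side by side: software operating on this structure can ask questions of the concepts, the relationships and the instances alike.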
Structure exists in many forms and serializations. Generally, software represents its internal information in one or a few canonical storage and manipulation formats, though that same software may also be able to import (ingest) or export its information and data in many different external formats.
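As a rough sketch of that distinction, the example below keeps one canonical in-memory form (a list of records) and converts it to and from two external serializations using only the Python standard library. The record fields and function names are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Canonical internal form: a list of plain dict records (made-up data).
records = [
    {"name": "Old Town", "population": 4200},
    {"name": "Riverside", "population": 3100},
]


def to_json(recs):
    """Export the canonical records as a JSON string."""
    return json.dumps(recs, sort_keys=True)


def from_json(text):
    """Ingest records from a JSON string."""
    return json.loads(text)


def to_csv(recs):
    """Export the canonical records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "population"])
    writer.writeheader()
    writer.writerows(recs)
    return buffer.getvalue()


def from_csv(text):
    """Ingest records from CSV text.

    CSV carries no type information, so population is converted
    back to an integer on the way in.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"name": row["name"], "population": int(row["population"])}
        for row in reader
    ]


# Both external formats round-trip back to the same canonical form.
json_roundtrip = from_json(to_json(records))
csv_roundtrip = from_csv(to_csv(records))
```

Note that the CSV path has to restore the integer type on ingest. Small losses of this kind between serializations are one reason the guidance on structure discussed here matters.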
In our semantic enterprise work, especially with its premise in ontology-driven applications using adaptive ontologies, structure is an absolutely essential construct. But, frankly, no information technology system exists that does not also depend on structure to a greater or lesser extent.
The interplay between software and structure is one source of expertise that vendors guard closely and use to competitive advantage. In years past, proprietary software could partially hide the bases for performance or algorithmic advantages. Expert knowledge and intimate familiarity with these systems were the other basis for keeping these advantages closely held.
It is perhaps not too surprising given this history, then, that the software industry really has very little emphasis or discussion on the interaction between software and structure. But, if software is being brought in as open source, where is the accompanying expertise or guidance for how data structure can be used to gain full advantage? The same acquired knowledge that, say, accompanied the growth of relational databases in such areas as schema development, materialized views or (de)normalization now needs to be made explicit and exposed for all sorts of open source systems.
In the realm of the semantic enterprise we are seeing attempts at this via open source ontologies and greater emphasis on APIs and documentation of same. Citizen Dan, for example, will be first publicly released with an accompanying MUNI ontology as a reference schema and starting point. Descriptions and methods for how to obtain indicator data and relevant attribute and entity information for the domain will also accompany it.
As open source software continues to emphasize semantics and interoperability, exemplar structures and best practices will need to be an essential part of the technology transfer. Just as the “secrets” of much software began to be opened up via open source, so too must the locked-up expertise of experts and practitioners in how to effectively structure data be exposed.
The need for structure explication and guidance is but one unique slice of a much broader need to expose methods and best practices surrounding a given information management initiative. The reason that any open source software might be adopted in the first place is based on the hope for some improved information management process.
Recently I have been touting MIKE2.0, the first open source, replicable and extensible framework for organizing and managing information in the enterprise. MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. It can be applied to any type of information development.
MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
MIKE2.0 and its forthcoming extensions, one of which we have developed for the semantic enterprise and are now extending into the semantic government in the context of Citizen Dan, are exciting because they provide a systematic approach and guidance for how (and for what!) to document new projects and initiatives. What MIKE2.0 represents is the first time that the embedded, proprietary expertise of traditional IT consultants has been exposed for broader use and extension.
The real premise behind any approach like MIKE2.0 or variants is to codify the expertise and knowledge that was previously locked up by experts and practitioners. The framework in MIKE2.0 provides a structure by which knowledge bases of background information can be assembled to accompany an open source project. This structure extends from initial evaluation and design all the way through operation and end of life.
The ‘CIS DocWiki’ that is being developed to accompany Citizen Dan is such an example of a MIKE2.0-informed knowledge base. At present, the CIS DocWiki has more than 300 specific articles useful to community indicator systems for local governments, and a complete deployment and maintenance methodology. By public release, it will likely be 2-3 times that size. All of this will be downloadable and installable as a wiki, and as open source content, ready for branding and modification for any local circumstance. CIS DocWiki is a natural methods and documentation complement to the Citizen Dan software and its MUNI structure. Release is scheduled for summer.
As we will focus on in Part 3 of this series, we are combining a MIKE2.0 organizational approach with a documentation and single-source publication platform to fulfill the method and documentary aspects of projects. It was really through the advantages gained by the combination of these pieces that we began to see the inadequacy of many current open source projects for the enterprise.
This series began in part with a recognition that superior open source projects are often the better documented ones. But, even there, documentation is often restricted to code-level documentation or perhaps APIs.
As the material above suggests, documentation needs to extend well beyond software. We need documentation of structure, methods, best practices, use cases, background information, deployment and management, and changing needs over the lifetime of the system. And, as we have also seen in Part 1, the lifetime of that system might be measured in decades.
Documentation is no equal to paid partners and their expertise. But, documentation can be cheaper, and if that documentation is sufficient, might be a means for changing the equation in how IT projects are solicited, acquired and managed.
Today, enterprises appear to be stuck between two difficult choices: 1) the traditional vendor lock-in approach with high costs and low innovation; or 2) open source with minimal documentation and vendor knowledge and little assurance of support longevity.
These trade-offs look pretty unpalatable.
Documentation alone, even as extended into the other legs of the solution, is not prima facie going to be a deal maker. But, its absence, I submit, is a deal breaker. Just as open source itself has taken some years to build basic comfort in the enterprise, so too a concerted attack on all acceptance factors may be necessary before actual wide adoption occurs.
The ‘CIS DocWiki’ platform noted for Citizen Dan we hope will be an exemplar for this combination of documentation and methodology. It is a single-source publishing platform that allows the entire knowledge base behind a given IT initiative to be used for collaboration, operational, training or collateral purposes. And all of this is based on open source software.
Software vendors need to recognize these documentation factors and build their ventures for success. Yes, writing code and producing software is a lot more fun and rewarding than (yeech) documentation. But, unless our current generation of vendors that is committed to open source and its benefits takes its markets seriously — and thus commits to the serious efforts these markets demand — we will continue to see minimal uptake of open source in the enterprise.
Each of these four legs of a total open solution can interact with and reinforce the other parts. Once one begins to see the problem of open source adoption in the enterprise as a holistic one, a new systems-level perspective emerges.
Enterprises know full well that software is only one means to address an information management problem, and only a first step at that. Traditional vendors to the enterprise also understand this, which is why through their embedded systems and built-up expertise they have been able to perpetuate what often amounts to a monopoly position.
Pressures are building for an earthquake in the IT landscape. Enterprises are on an anvil of global competition and limited resources. Existing IT systems are not up to the task but too expensive and embedded to abandon. Traditional vendors have near monopoly positions and little incentive to innovate. New software vendors don’t have the expertise and gravitas to handle enterprise-scale challenges. Meanwhile, the rest of the globe is leapfrogging embedded systems with agile, Web-based systems.
The true innovation that is occurring is all based around open source, nurtured by the global computing platform of the Internet, and fueled by countless individuals able to compete on downward-spiraling cost bases. But on so many levels, open source as presently constituted, either fails or poses too many risks to the commercial enterprise.
The Internet itself was the basis of a paradigm shift, but I think we are only now seeing its manifestation at the enterprise level. We are also now seeing global reordering and changes of the economic order. How will companies respond? How will their IT systems adapt? And what will new vendors need to do and recognize in order to thrive in this changing environment?
I’m not sure I have found the language or rhetoric to convey what I see coming, and coming soon. I know open source is part of it; I know enterprises need it; and I know what is presently being offered does not meet the test.
As I noted in our first part, the mantra that we use in Structured Dynamics to express this challenge is, “We’re Successful When We’re Not Needed”. I think the essence behind this statement is that premises of dependency or proprietary advantage will not survive the jet streams of change that are blowing away the old order.
Sound like too much hyperbole? Actually, my own gut feeling is that it is not nearly enough.
In any case, windy rhetoric always falls short if there are no actionable next steps. In these first two parts of this series, I have tried to present the ingredients that need to go into the cake. In the third part I try to offer a new, and complementary, open source means for bringing stability to the foundation.
In all cases, though, I think these challenges are permanent ones and do not lend themselves to facile solutions. Four legs, or seven foundations, or twelve steps are all just too simplistic for dealing with the global and complex tsunamis blowing away the old order.
One really does not need to lick a finger to sense the direction of these winds of change. It is coming, and coming hard, and all of it is from the direction of open source. What enterprises do, and what the vendors who want to serve them do, is perhaps less clear. I think open source offers a way out of the box in which enterprise IT is currently stuck. But, at present, I also think that most open source options do not have the necessary legs to stand on.
Structured Dynamics has been engaged in open source software development for some time. Inevitably in each of our engagements we are asked about the viability of open source software, its longevity, and what the business model is behind it. Of course, I appreciate our customers seemingly asking about how we are doing and how successful we are. But I suspect there is more behind this questioning than simply good will for our prospects.
Besides the general fact that most of us know (of hundreds of thousands of open source projects, only a minuscule number get traction), I think there are broader undercurrents in these questions. Even open source with good code documentation is not enough to ensure long-term success.
When open source broke on the scene a decade or so ago, the first enterprise concerns were based around code quality and possible “enterprise-level” risks: security, scalability, and the fact that much open source was itself LAMP-based. As comfort grew about major open source foundations (Linux, MySQL, Apache, and the scripting languages of PHP, Perl and Python, the very building blocks of the LAMP stack), concerns shifted to licensing and the possible “viral” effects of some licenses to compromise existing proprietary systems.
Today, of course, we see hugely successful open source projects in all conceivable venues. Granted, most open source projects get very little traction. Only a few standouts from the hundreds of thousands of open source projects on big venues like SourceForge and Google Code or their smaller brethren are used or known. But, still, in virtually every domain or application area, there are 2-3 standouts that get the lion’s share of attention, downloads and use.
I think it fair to argue that well-documented open source code generally out-competes poorly documented code. In most circumstances, well-documented open source is a contributor to the virtuous circle of community input and effort. Indeed, it is a truism that most open source projects have very few code committers. If there is a big community, it is largely devoted to documentation and assistance to newbies on various forums.
We see some successful open source projects, many paradoxically backed by venture capital, that employ the “package and document” strategy. Here, existing open source pieces are cobbled together as more easily installed comprehensive applications with closer to professional grade documentation and support. Examples like Alfresco or Pentaho come to mind. A related strategy is the “keystone” one where platform players such as Drupal, WordPress, Joomla or the like offer plug-in architectures and established user bases to attract legions of third-party developers.
I think if we stand back and look at this trajectory we can see where it is pointing. And, where it is pointing also helps define what the success factors for open source may be moving forward.
Two decades ago most large software vendors made on average 75% to 80% of their revenues from software licenses and maintenance fees; quite the opposite is true today. The successful vendors have moved into consulting and services. One only needs to look at three of the largest providers of enterprise software of the past two decades (IBM, Oracle and HP) to see evidence of this trend.
How is it that proprietary software with its 15% to 20% or more annual maintenance fees has been so smoothly and profitably replaced with services?
These suppliers are experienced hands in the enterprise and know what any seasoned IT manager knows: the total lifecycle costs of software and IT reside in maintenance, training, uptime and adaptation. Once installed and deployed, these systems assume a life of their own, with actual use lifetimes that can approach two to three decades.
This reality is, in part, behind my standard exhortation about respecting and leveraging existing IT assets, and why Structured Dynamics has such a commitment to semantic technology deployment in the enterprise that is layered onto existing systems. But, this very same truism can also bring insight into the acceptable (or not) factors facing open source.
Great code — even if well documented — is not alone the mousetrap that leads the world to the door. Listen to the enterprise: lifecycle costs and longevity of use are facts.
But what I am saying here is not really all that earthshaking. These truths are available to anyone with some experience. What is possibly galling to enterprises are two smug positions of new market entrants. The first, which is really naïve, is the moral superiority of open source or open data or any such silly artificial distinctions. That might work in the halls of academia, but carries no water with the enterprise. The second, more cynically based, is to wrap one’s business in the patina of open source while engaging in the “wink-wink” knowledge that only the developer of that open source is in a position to offer longer term support.
Enterprises are not stupid and understand this. So, what IT manager or CIO is going to bet their future software infrastructure on a start-up with immature code, generally poor code documentation or APIs, and definitely no clear clue about their business?
Yet, that being said, neither enterprises nor vendors nor software innovators that want to work with them can escape the inexorable force of open source. While it has many guises from cloud computing to social software or software as a service or a hundred other terms, the slow squeeze is happening. Big vendors know this; that is why there has been the rush to services. Start-up vendors see this; that is why most have gone consumer apps and ad-based revenue models. And enterprises know this, which is why most are doing nothing other than treading water because the way out of the squeeze is not apparent.
The purpose of this three-part series is to look at these issues from many angles. What might the absolute pervasiveness of open source mean to traditional IT functions? How can strategic and meaningful change be effected via these new IT realities in the enterprise? And, how can software developers and vendors desirous of engaging in large-scale initiatives with enterprises find meaningful business models?
And, after we answer those questions, we will rest for a day.
But, no, seriously, these are serious questions.
There is no doubt open source is here to stay, yet its maturity demands new thinking and perspectives. Just as enterprises have known that software is only the beginning of decades-long IT commitments and (sometimes) headaches, the purveyors and users of open source should recognize the acceptance factors facing broad enterprise adoption and reliance.
Open source offers the wonderful prospect of avoiding vendor “lock-in”. But, if the full spectrum of software use and adoption is also not so covered, all we have done is to unlock the initial selection and install of the software. Where do we turn for modifications? for updates? for integration with other packages? for ongoing training and maintenance? And, whatever we do, have we done so by making bets on some ephemeral start-up? (We know how IBM will answer that question.)
The first generation of open source has been a substitute for upfront proprietary licenses. After that, support has been a roll of the dice. Sure, broadly accepted open source software provides some solace because of more players and more attention, but how does this square with the prospect of decades of need?
The perverse reality in these questions is that most all early open source vendors are being gobbled up or co-opted by the existing big vendors. The reward of successful market entry is often a great sucking sound to perpetuate existing concentrations of market presence. In the end, how are enterprises benefiting?
Now, on the face of it, I think it neither positive nor negative whether an early open source firm with some initial traction is gobbled up by a big player or not. After all, small fish tend to be eaten by big fish.
But two real questions arise in my mind: One, how does this gobbling fix the current dysfunction of enterprise IT? And, two, what is a poor new open source vendor to do?
The answer to these questions resides in the concerns and anxieties that caused them to be raised in the first place. Enterprises don’t like “lock-in” but like even less seeing stranded investments. For open source to be successful it needs to adopt a strategy that actively extends its traditional basis in open code. It needs to embrace complete documentation, provision of the methods and systems necessary for independent maintenance, and total lifecycle commitments. In short, open source needs to transition from code to systems.
We call this approach the total open solution. It involves — in addition to the software, of course — recipes, methods, and complete documentation useful for full-life deployments. So, vendors, do you want to be an enterprise player with open source? Then, embrace the full spectrum of realities that face the enterprise.
The actual mantra that we use to express this challenge is, “We’re Successful When We’re Not Needed”. This simple mental image helps define gaps and tells us what we need to do moving forward.
The basic premise is that any taint of lock-in or not being attentive to the enterprise customer is a potential point of failure. If we can see and avoid those points and put in place systems or whatever to overcome them, then we have increased comfort in our open source offerings.
Like good open source software, this is ultimately a self-interest position to take. If we can increase comfort in the marketplace that they can adopt and sustain our efforts without us, they will adopt them to a greater degree. And, once adopted, and when extensions or new capabilities are needed, then as initial developers with a complete grasp on the entire lifecycle challenges we become a natural possible hire. Granted, that hiring is by no means guaranteed. In fact, we benefit when there are many able players available.
In the remaining two parts of this series we will discuss all of the components that make up a total open solution and present a collaboration platform for delivering the methods and documentation portions. We’re pretty sure we don’t yet have it fully right. But, we’re also pretty sure we don’t have it wrong.
Ten years ago the message was the end of obscene rents from proprietary enterprise software licenses. Five years ago the message was the arrival and fast maturing of open source. Today, the message is the open world and semantics.
These forces are conspiring to change much within enterprise IT. And, this change will undoubtedly be for the good — for the enterprise. But these forces are not necessarily good news within conventional IT departments and definitely not for traditional vendors unwilling to transform their business models.
I have been beating the tom-tom on this topic for a few months, specifically with regard to the semantic enterprise. But I have been neither alone nor unique. The last two weeks have seen an interesting confluence of reports and commentaries by others that enrich the story of the changing information technology landscape. I’ll be drawing on the observations of Thomas Wailgum (CIO magazine), John Blossom and Andy Mulholland, CTO of Capgemini.
Wailgum describes the “New Normal” and how it might kill IT. He picks up on the viewpoint that treats the recent meltdowns in the financial sector as a seismic force for changes in information technology. While he acknowledges many past challenges to IT from PCs and servers and Y2K and software becoming a commodity, he puts the global recession’s impact on business, the “New Normal,” into an entirely different category.
His basic thesis is that these financial shocks are forcing companies to scrutinize IT as never before, in particular “unfavorable licensing agreements and much-too-much shelfware; ill-conceived purchasing and integration strategies; and questionable software married to entrenched business processes.”
Yet, he also argues that IT and its systems are too ingrained into the core business processes of the enterprise to be allowed to fail. IT systems are now thoroughly intertwined with:
But top management is disappointed and disaffected. IT systems gobble up too many limited resources. They are inflexible. They are old and require still more limited resources to modernize. They are complex. They create and impose delays. And all of these negatives lead to huge losses in opportunity costs. Wailgum notes Gartner, for instance, as saying that by 2012 perhaps 20 percent of businesses will own no IT assets at all in their desire to outsource this headache.
I think this devastating diagnosis is largely correct, though perhaps incomplete in that no mention is made of the flipside: what IT has failed to deliver. I think this flipside is equally damning.
Despite decades of trying, IT still has not broken down the data stovepipes in the enterprise. Rather, they have proliferated like rabbits. And, IT has failed to unlock the data in the 80% of enterprise information contained within documents (unstructured data).
Unfortunately, after largely zeroing in and mostly diagnosing the situation, Wailgum’s remedy comes off sounding like a tired 12-step program. He argues for new mindsets, better communications, getting in touch with customers, being willing to take risks, and being nimble. Well, duh.
So, over the decades of IT failures there have been accompanying decades of criticism, hand-wringing, and hackneyed solutions. Without some more insightful thinking, this analysis can make our understanding of the New Normal look pretty old.
John Blossom picks up on these arguments and looks at the issues from the vendor’s perspective. Blossom characterizes Wailgum’s piece as “outlining the enormous value gap that’s been arising in enterprise information technologies.” And, while clearly new approaches are needed and farming them out may become more prevalent, Blossom cautions this is not necessarily good news for vendors.
As Blossom puts it, “what seems to be happening is that many of the business processes through which these enterprises survived and thrived over the past several decades are shooting blanks. . . . many of the fundamental concepts of IT that have been promoted for the past few decades no longer give businesses operational advantages but they have to keep spending on them anyway.”
As he has been arguing for quite some time, one fundamental change agent has been the Web itself. “The Web has accelerated the flow of information and services that can lead to effective decision-making far more rapidly than enterprise IT managers have been able to accommodate.”
Web search engines and social media tools can begin to replace some of the dedicated expenditures and systems within the enterprise. Moreover, the extent, growth and value of external data and content is readily apparent. Without outreach and accommodation of external data — even if it can solve its own internal data federation challenges — the individual enterprise is at risk of itself becoming a stovepipe.
Prior focuses on strategy and capturing workflows are perhaps being supplanted by the need for operational flexibility and on-the-fly aggregation and rapid service development tools. In an increasingly interconnected and rapidly changing world with massive information growth, being able to control workflows and to depend on central IT platforms may become last decade’s “Old Normal.” Floating on top of these massive forces and riding with their tides is a better survival tactic than digging fixed emplacements in the face of the tsunami.
These factors of Web, open source, agnosticism as to platform or software applications, and the need to mash up innovations from anywhere are not the traditional vendor game. Just as businesses and their IT departments must get leaner, so must vendors lower their expectations of extorting exorbitant rents from their clients. “Fasten your seatbelts, it’s going to be a bumpy night!”
So, Blossom agrees with the Wailgum diagnosis, but also helps us begin to understand parts of the cure. Blossom argues the importance of:
Much, if not all of this, can be provided by open source. But open source is not a sine qua non: commercial products that embrace these approaches can also be compatible components across the stack.
But — even with these components — a full cure still lacks a couple of crucial factors.
These remaining gaps are emphasized in Andy Mulholland’s recent blog post. His post was occasioned by the press announcement that Structured Dynamics (my firm) had donated its Semantic Enterprise Adoption and Solutions, or SEAS, methodology to MIKE2.0. Mulholland was suggesting his audience needed to know about this Method for an Integrated Knowledge Environment because some of the major audit partnerships have decided to get behind MIKE2.0 with its explicit and open source purpose of managing knowledge environments and their data and provenance.
As Mulholland notes, “. . . it’s not just more data, it’s the forms of data, and what the data is used for, all of which add to the complications. . . . Sadly the proliferation of data has mostly been in unstructured data in formats suitable for direct human use.”
So, one remaining factor is thus how to extract meaning from unstructured (text) content. It is here that semantics and various natural language processing (NLP) components come in. Implied in the incorporation of data extracted from unstructured sources is a data model expressly designed for such integration.
Yet, without a fulcrum, the semantic lever can still not move the world. Mulholland insightfully nails this fundamental missing piece — the “most crucial issue” — as the use of the open world assumption.
From an enterprise perspective and in relation to the points of this article, an open world assumption is not merely a different way to look at the world. More fundamentally, it is a different way to do business and a very different way to do IT.
I have summarized these points before, but they deserve reiteration. Open world frameworks provide some incredibly important benefits for knowledge management applications in the enterprise:
The apocryphal quote, “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world,” is attributed to Archimedes. I have also had lawyer friends tell me that the essence of many court cases is found in a single pivotal assertion or statement in the arguments. I think it fair to say that the open world approach plays such a central role in unlocking the adaptive way for IT to move forward.
As Mulholland notes, we have donated our Open SEAS methodology to MIKE2.0 in the hopes of seeing greater adoption and collaboration. This is useful, and all are welcome to review, comment and contribute to the methodology, indeed as is the case for all aspects of MIKE2.0.
But the essential point of this article is that Open SEAS also embraces most — if not all — of the factors necessary to address the New Normal IT function.
Open SEAS is explicitly designed to facilitate becoming an open semantic enterprise. Namely, this means an organization that uses the languages and standards of the semantic Web, including RDF, RDFS, OWL, SPARQL and others to integrate existing information assets, using the best practices of linked data and the open world assumption, and targeting knowledge management applications. It does so based on Web-oriented architectures and approaches and uses ontologies as an “integration layer” across existing assets.
The foundational approaches to the open semantic enterprise do not necessarily mean open data nor open source (though they are suitable for these purposes with many open source tools available). The techniques can equivalently be applied to internal, closed, proprietary data and structures. The techniques can themselves be used as a basis for bringing external information into the enterprise. ‘Open’ is in reference to the critical use of the open world assumption.
These practices do not require replacing current systems and assets; they can be applied equally to public or proprietary information; and they can be tested and deployed incrementally at low risk and cost. The very foundations of the practice encourage a learn-as-you-go approach and active and agile adaptation. While embracing the open semantic enterprise can lead to quite disruptive benefits and changes, it can be accomplished as such with minimal disruption in itself. This is its most compelling aspect.
We believe this offers IT an exciting, incremental and low-risk path for moving forward. All existing assets can be left in place and — in essence — modernized in place. No massive shifts and no massive commitments are required. As benefits and budgets allow, the extent of the semantic interoperability layer may be extended as needed and as affordable.
The open semantic enterprise is not magic nor some panacea. Simply consider it as bringing rationality to what has become a broken IT system. Embracing the open semantic enterprise can help the New Normal be a good and more adaptive normal.
To date, we have been the most viewed proposal by far (2x more than the second most viewed!!! Hooray!) and are in the top five of highest rated (have also been at #1 or #2, depending. Hooray!). Thanks to all of you for your interest and support.
There is much to recommend this KNC approach, not the least of which is its ability to attract some 2,500 proposals seeking a piece of the 2010 $5 million potential grant awards. Our proposal extends SD’s basic structWSF and conStruct Drupal frameworks to provide a data appliance and network (DAN) to support citizen journalists with data and analysis at the local, community level.
None of our rankings, of course, guarantees anything. But, we also feel good about how the market is looking at these frameworks. We have recently been awarded some pretty exciting and related contracts. Any and all of these initiatives will continue to contribute to the open source Citizen DAN vision.
And, what might that vision be? Well, after some weeks away from it, I read again our online submission to the Knight News Challenge. I have to say: It ain’t too bad! (Plus many supporting goodies and details.)
So, I repeat in its entirety below, the KNC questions and our formal responses. This information from our original submittal is unchanged, except to add some live links where they could not be submitted as such before. (BTW, the bold headers are the KNC questions.) Eventual winners are slated to be announced around mid-June. We’re keeping our fingers crossed, but we are pursuing this initiative in any case.
Citizen DAN is an open source framework to leverage relevant local data for citizen journalists. It is a:
Good decisions and good journalism require good information. Starting with pre-loaded government data, Citizen DAN provides any citizen the framework to learn and compare local statistics and data with other similar communities. This helps to promote the grist for citizen journalism; it is also a vehicle for discovery and learning across the community.
Citizen DAN comes pre-packaged with all necessary deployment components and documentation, including local data from government sources. It includes facilities for direct upload of additional local data in formats from spreadsheets to standard databases. Many standard converters are included with the basic package.
Citizen DAN may be implemented by local governments or by community advocacy groups. When deployed, using its clear documentation, sponsors may choose whether or what portions of local data are exposed to the broader Citizen DAN network. Data exposed on the network is automatically available to any other network community for comparison and analysis purposes.
This data appliance and network (DAN) is multi-lingual. It will be tested in three cities in Canada and the US, showing its multi-lingual capabilities in English, Spanish and French.
With Citizen DAN, anyone with Web access can now get, slice, and dice information about how their community is doing and how it compares to other communities. We have learned from Web 2.0 and user-generated content that once exposed, useful information can be taken and analyzed in valuable and unanticipated ways.
The trick is to get information that already exists. Citizen journalists of the past may not have known either:
By removing these hurdles, Citizen DAN improves the ways information is delivered to communities and provides the framework for sifting through it to extract meaning.
Government public data in electronic tabular form or as published listings or tables in local newspapers has been available for some time. While meeting strict ‘disclosure’ requirements, this information has neither been readily analyzable nor actionable.
The meaning of information lies in its interpretation and analysis.
Citizen DAN is innovative because it:
Structured Dynamics has already developed and released as open-source code structWSF and conStruct, the basic foundations to this proposal. structWSF provides the network and dataset “backbone” to this proposal; conStruct provides the Drupal portal and Web site framework.
To this foundation we add proven experience and knowledge of datasets and how to access them, as well as tools and converters for how to stage them for standard public use. A key expertise of Structured Dynamics is the conversion of virtually any legacy data format into interoperable canonical forms.
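As a rough illustration of what converting a legacy tabular format into a canonical form can look like, here is a minimal Python sketch that turns spreadsheet-style CSV rows into subject-predicate-object statements. This is not the structWSF converter code; the function name `rows_to_triples`, the `ex:` prefix, and the sample community figures are all invented for the example.

```python
import csv
import io
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def rows_to_triples(csv_text: str, key_column: str, base: str = "ex:") -> List[Triple]:
    """Convert CSV rows into triples, one per non-empty cell.

    The key column names the subject; every other column becomes a predicate.
    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    triples: List[Triple] = []
    for row in reader:
        subject = base + row[key_column].strip().lower().replace(" ", "_")
        for column, value in row.items():
            # Skip the key itself and any missing values: a gap in the
            # source data simply produces no statement.
            if column == key_column or value is None or value == "":
                continue
            triples.append((subject, base + column, value))
    return triples


# Made-up sample data; note the missing median_income for the first row.
sample = "city,population,median_income\nSpringfield,1000,\nShelbyville,2000,50000\n"
triples = rows_to_triples(sample, key_column="city")
```

Missing cells are dropped rather than filled with placeholders, so a sparse local dataset converts as cleanly as a complete one.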
These are important challenges, which require experience in the semantics of data and mapping from varied forms into useful and common frameworks. Structured Dynamics has codified its expertise in these areas into the software underlying Citizen DAN.
Structured Dynamics’ principals are also multi-lingual, with language-neutral architectures and code. The company’s principals are also some of the most prominent bloggers and writers in the semantic Web. We are acknowledged as attentive to documentation and communication.
Finally, Structured Dynamics’ principals have more than a decade of track record in successful data access and mining, and software and venture development.
To this strong basis, we have preliminary city commitments for deploying this project in the United States (English and Spanish) and Canada (French and English).
ThisWeKnow offers local Census data, but no community or publishing aspects. Data sharing is in DataSF and DataMine (NYC), but they lack collaboration, community networks and comparisons, or powerful data visualization or mapping.
Citizen DAN is a turnkey platform for any size community to create, publish, search, browse, slice-and-dice, visualize or compare indicators of community well-being. Its use makes the Web more locally focused. With it, researchers, watchdog groups, reporters, local officials and interested citizens can now discover hard data for ‘new news’ or fact-check mainstream media.
There are two releases with feedback. Each task summary, listing of task hours (hr) and duration in months (mo), in rough sequence order with overlaps, is:
See attached task details.
"Information is the currency of democracy." Thomas Jefferson (n.b.)
We intuitively understand that an informed citizenry is a healthy polity. At the global level and in 250 languages, we see how Wikipedia, matched with the Internet and inexpensive laptops, is bringing unforeseen information and enrichment to all. Across the board, we are seeing the democratization of information.
But very little of this revolution has percolated to the local level.
Only in the past decade or so have we seen free, electronic access to national Census data. We still see local data only published in print or not available at all, limiting awareness and, more importantly, understanding and analysis. Data locked up in municipal computers or available but not expressed via crowdsourcing is as good as non-existent.
Though many citizens at the local level are not numerate, intuition has to tell us that the absence of empirical, local data hurts our ability to understand, reason and debate our local circumstances. Are we doing better or worse than yesterday? Than in comparison with our peers? Under what measures does this have meaning about community well being?
The purpose of the Citizen DAN project is to create an appliance, in the same sense that refrigerators keep our food from spoiling, by which any citizen can crack open and expose relevant data at the local level. Citizen DAN is about enriching our local information and keeping our communities healthy.
We will measure the progress of the project by the number of communities and local organizations that use the Citizen DAN platform to create and publish community data. Subsidiary measures include the number of:
These measures, plus active sites with profiles of each, will be monitored and tracked on the central Citizen DAN portal.
‘Ultimate success’ is related to the general growth in transparent government at the local level. Growth in Citizen DAN-related measures on a year-over-year basis or in relation to Gov2.0 would indicate success.
There is no technical risk to this proposal, but there are risks in scope, awareness and acceptance. Our system has been operational for one year for relevant use cases; all components have been integrated, debugged, and put into production.
Scope risks relate to how much data the Citizen DAN platform is loaded with, and how much functionality is included. We balance the data question by using common public datasets for baseline data, then add features for localities to “crowdsource” their own supplementary data. We balance the functionality question by limiting new development to data visualization/mapping and to upload functions (per above), and then to refine what already exists.
Awareness risks arise from a crowded attention space. We can overcome this in two ways. The first is to satisfy users at our test sites. That will result in good recommendations to help seed a snowball effect. The second way is to use social media and our existing Web outlets aggressively. We have been building awareness for our own properties in steady, inch-by-inch measures. While a notable few Web efforts may go viral, the process is not predictable. Steady, constant focus is our preferred recipe.
Acceptance risk is intimately linked with awareness and use. If we can satisfy each Citizen DAN community, then new datasets, new functionality and new awareness will naturally arise. More users and more contributions through the network effect are the best way to broad acceptance.
Marketing and awareness efforts will include our use of social media, dedicated Web sites, support from test communities, and outreach to relevant community Web sites.
Our own blogs are popular in the semantic Web and structured data space (~3K uniques daily); we have published two posts on Citizen DAN and will continue to do so with more frequency once the effort gets underway.
We will create a central portal (http://citizen-dan.org) based on the project software (akin to our other project sites). The model for this apps and deployments clearinghouse is CrimeReports.com. Using social aspects and crowdsourcing, the site will encourage sharing and best practices amongst the growing number of Citizen DAN communities.
We will blog and post announcements for key releases and milestones on relevant external Web sites including various Gov 2.0 sites, Community Indicators Consortium, GovLoop, Knight News Challenge, the Sunlight Foundation, and so forth. In addition, we will collate and track individual community efforts (maintained on the central Citizen DAN site) and make specific outreach to community data sites (such as DataSF or DataMine at NYC.gov). We will use Twitter (#CitizenDAN, etc) and the social networks of LinkedIn, Facebook, and Meetup to promote Citizen DAN activity.
We will interact with advocates of citizen journalism, and engage civic organizations, media, and government officials (esp in our three test communities) to refine our marketing plan.
Citizen DAN is not an experiment. It is a working framework that gives any locality and its citizenry the means to assemble, share and compare measures of its community well-being with other communities. These indicators, in turn, provide substance and grist for greater advocacy and writing and blogging (“journalism”) at the local level.
Granted, there are unknowns: How many localities will adopt the Citizen DAN appliance? How essential will its data be to local advocacy and news? How active will each Citizen DAN installation be in attracting contributions and local data?
We submit the better way to frame the question is the degree of adoption, as opposed to will it work.
Web-based changes in our society and social interaction are leading to the democratization of information, access to it, and channels for expression. Whether ultimately successful in the specific form proposed herein, Citizen DAN and its open source software and frameworks will surely be adopted in one form or another — to one degree or another — in the unassailable trend toward local government transparency and citizen involvement.
In short, Yes: We believe Citizen DAN will continue long after the grant.
Our plan begins with the nature of Citizen DAN as software and framework. Sustainability is a question of whether the appliance itself is useful, and how users choose to leverage it.
Mediawiki, the software behind Wikipedia, is an analog. Mediawiki is an enabling infrastructure. Some sites using it are not successful; others wildly so. Success has required the combination of a good appliance with topicality and good management. The same is true for Citizen DAN.
Our plan thus begins with Citizen DAN as a useful appliance, as free open source with great documentation and prominent initial use cases. Our plan continues with our commitment to the local citizen marketplace.
We are developing Citizen DAN because of current trends. We foresee many hundreds of communities adopting the system. Most will be able to do so on their own. Some others may require modifications or assistance. Our self-interest is to ensure a high level of adoption.
An era of citizen engagement is unfolding at the local level, fueled by Web technologies and growing comfort with crowdsourcing and social networks. Meanwhile, local government constraints and pressures for transparency are unleashing locked-up data. These forces will create new opportunities for data literacy by the public, which will itself bring new understanding and improvements in governance and budgeting. We plan for Citizen DAN and its offspring to be one of the catalysts for those changes.
Today, Structured Dynamics is pleased to release Open SEAS, its methodology for Semantic Enterprise Adoption and Solutions. At the same time, we are donating the framework to the open source MIKE2.0 Method for an Integrated Knowledge Environment project.
Open SEAS provides a framework for the enterprise to establish a coherent, consistent and interoperable layer across its information assets. It is compliant with the MIKE2.0 Semantic Enterprise Solution Offering.
Open SEAS has been developed for enterprises desiring to initiate or extend their involvement with semantic technologies. It is inherently incremental, low-cost and low-risk.
Concurrent with this release, Structured Dynamics is also donating the methodology and all of its related intellectual assets to the MIKE2.0 project. Under Creative Commons license and MIKE2.0’s content governance policies, the community’s current 2000+ members are now free to expand and use the Open SEAS methodology in any manner they see fit.
Last week, I began to introduce MIKE2.0 and its methodology to the readers of this blog. MIKE2.0 provides a complete delivery environment and methodology for information management projects in the enterprise. Solutions — from the specific to the composite — are described and packaged with respect to plans, management communications, products (open source and proprietary), activities, benchmarks, and deliverables. Delivery is accomplished over multiple increments, split into five phases from definition and planning to deployment. The assets associated with this framework first are based on templates and guidelines that can be applied to any information management area. The framework allows for multiple projects to be combined and inter-related, all under a common methodology. More information and a good entry point is provided on the What is MIKE2.0? page on the project’s main Web site.
MIKE2.0 presently has some 800 resources across about 40 solution areas. With Structured Dynamics’ donation, there are now about 40 resources related to the semantic enterprise, many of them major, accompanied by many images and figures. This contribution makes the Semantic Enterprise Solution Offering instantly one of the more complete within MIKE2.0. As noted below, this contribution is also just a beginning of our commitment.
The Open SEAS framework is Structured Dynamics’ specific implementation framework for MIKE2.0’s Semantic Enterprise Solution Offering. This section overviews some of Open SEAS’ key facets.
Many enterprise information systems, particularly relational ones, embody a closed world assumption that holds that any statement that is not known to be true is false. This premise works well where there is complete coverage of specific items, such as the enumeration of all customers or all products.
Yet, in most areas of the real (“open”) world there is no guarantee or likelihood of complete coverage. Under an open world assumption the lack of a given assertion or fact does not imply whether that possible assertion is true or false: it simply is not known. An open world assumption is one of the key factors that defines the open Semantic Enterprise Offering and enables it to be deployed incrementally. It is also the basis for enabling linkage to external (often incomplete) datasets.
Fortunately, there is no requirement for enterprises to make some philosophical commitment to either closed- or open-world systems or reasoning. It is perfectly acceptable to combine traditional closed-world relational systems with open-world reasoning. It is also not necessary to make any choices or trade-offs about using public v. private data or combinations thereof. All combinations are acceptable when the basis for integration is an open-world one.
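The difference between the two assumptions can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: facts are plain tuples, the function names `closed_world` and `open_world` and the `ex:` identifiers are hypothetical, and real open-world reasoners (OWL-based ones, for example) do far more than this three-valued lookup.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


def closed_world(asserted: Set[Fact], query: Fact) -> bool:
    """Closed world: any statement not known to be true is treated as false."""
    return query in asserted


def open_world(asserted: Set[Fact], denied: Set[Fact], query: Fact) -> Optional[bool]:
    """Open world: True if asserted, False only if explicitly denied,
    otherwise None, meaning the answer is simply not known."""
    if query in asserted:
        return True
    if query in denied:
        return False
    return None


# Illustrative data only.
asserted = {("ex:acme", "ex:buys", "ex:widgets")}
denied = {("ex:globex", "ex:buys", "ex:widgets")}
query = ("ex:initech", "ex:buys", "ex:widgets")

closed_answer = closed_world(asserted, query)        # treated as false
open_answer = open_world(asserted, denied, query)    # unknown, not false
```

Both functions read the same set of asserted facts, which is the point of the paragraph above: a closed-world system of record and an open-world integration layer can sit over the same data, and only the interpretation of a missing statement differs.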
Open SEAS is grounded in this “open” style. It can be employed in virtually any enterprise circumstance and at any scope, and expanded in a similar way as budget and needs allow.
Open SEAS is based on seven pillars, which themselves inform the basis for the MIKE2.0 Guiding Principles for the Open Semantic Enterprise. These principles cover data model, architecture, deployment practices and approach for how an enterprise can begin and then extend its use of semantics for information interoperability.
Important aspects are linked data or Web-oriented architecture, but it is really the unique combination of open-world approach and the RDF data model and its semantic power that provide the distinctive differences for Open SEAS. An exciting prospect — but still in its early stages of discovery and implementation — is the role of adaptive ontologies to power ontology-driven applications. These prospects, if fully realized, could totally remake how knowledge workers interact and specify the applications that manage their information environment.
Open SEAS also fully embraces the Layered Semantic Enterprise Architecture of MIKE2.0’s Semantic Enterprise Offering. This architecture acts as a subsequent set of functions or middleware with respect to MIKE2.0’s standard SAFE Architecture. Most of the existing SAFE architecture resides in the Existing Assets layer. The specific aspects of Open SEAS reside in the layers above, namely the Access/Conversion, Ontologies and Applications layers.
Stitching together this interoperability layer above existing information and infrastructure assets requires many diverse tools and products, and there still are gaps. The layer figure below shows the semantic enterprise architecture overlaid with some representative open source projects and tools that plug some of those gaps.
Open SEAS also maintains a comprehensive roster of open source and proprietary tools in all aspects of semantic technology, ranging from data storage and converters, to Web services and middleware, and then to ultimate user applications. A database of nearly 1,000 tools in all areas is maintained for potential applicability to the methodology.
The inherently incremental nature of the Open SEAS framework encourages experimentation, affordable deployments, and experience gathering. Because the systems and deployments put into place with this framework are based on the open world approach and use the extensible RDF data model, expansions in scope, sophistication or domain can be incorporated at any time without adverse effects on existing assets or systems or prior Open SEAS deployments.
Quick and (virtually) risk-free increments mean that adopting semantic approaches in the enterprise can be accelerated (or not) based on empirical benefits and available budgets.
The Open SEAS framework is built on a solid foundation, but it is also one that is incomplete. Deployments of semantic technologies and approaches are still quite early in the enterprise, whether measured in numbers, scope or depth. In order for the framework, and the practice of semantic adoption in general, to continue to expand and be relevant in the enterprise, active learning and documentation are essential. One of the reasons for the affiliation of Open SEAS with MIKE2.0 is to leverage these strong roots in methodological learning.
The nature of Open SEAS and its parent Semantic Enterprise Solution Offering touches most offerings within the MIKE2.0 framework. There is much to be done to integrate the semantic enterprise perspective into these other possibilities, plus much that needs to be learned and documented for the offering itself. The concept of the semantic enterprise, after all, is relatively new with few prominent case studies.
As the offering points out, there are some dozens of additional necessary resources that are available and ready to be packaged and moved into the MIKE2.0 framework. These efforts are a priority, and will continue over the coming weeks.
But, more importantly, beyond that, the experience and practitioner base needs to grow. Much is unknown regarding key aspects of the offering:
Despite these questions, emergence is the way complex systems arise out of a multiplicity of relatively simple interactions, exhibiting new and unforeseen properties in the process. RDF is an emergent model. It begins as simple “fact” statements of triples, which may then be combined and expanded into ever-more complex structures and stories. As an internal, canonical data model, RDF has advantages for information federation and development over any other approach. It can represent, describe, combine, extend and adapt data and their organizational schema flexibly and at will. Applications built upon RDF can explore and analyze in ways not easily available with other models.
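To show how simple triple statements combine into larger structures, here is a minimal Python sketch that models a graph as a set of tuples. It is an analogy rather than real RDF tooling: the `ex:` identifiers, the two sample sources, and the `match` helper are invented for illustration, and a production system would use an RDF store and SPARQL instead.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two independent sources describing overlapping things (made-up data).
crm = {
    ("ex:acme", "ex:name", "Acme Corp"),
    ("ex:acme", "ex:region", "ex:midwest"),
}
documents = {
    ("ex:acme", "ex:mentionedIn", "ex:report42"),
    ("ex:report42", "ex:topic", "ex:pricing"),
}

# Federation is a set union: no schema migration, and new predicates
# such as ex:mentionedIn simply appear alongside the existing ones.
graph = crm | documents

about_acme = match(graph, s="ex:acme")
topics = match(graph, p="ex:topic")
```

Because each statement stands alone, adding the second source neither alters nor invalidates anything in the first, which is what allows scope to grow step by step.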
Combined with an open-world approach, new information can be brought in and incorporated to the framework step-by-step. Perhaps the greatest promise in an ongoing transition to become a semantic enterprise is how an inherently incremental and building-block approach might alter prior practices and risks across the entire information management spectrum.
We invite you to join us and to contribute to this effort. I encourage you to join MIKE2.0 if you have not already done so, and check out announcements on this blog for ongoing developments.