At the beginning of this year, Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, prompted again by client and internal work, we have researched the space once more and updated the listing [1].
All new tools are marked with <New> (new only means newly discovered; some existed at the time of the prior listing but had not yet been found). There are now a total of 185 tools in the listing, 31 of which are new in this update, with 45 added at various times since the first release. <Newest> reflects updates (most from the developers themselves) since the original publication of this post.
Though not all are relevant, see also my post from a couple of years back on large-scale RDF graph software.
Today, Structured Dynamics is pleased to make its Citizen Dan application available for public viewing, play and downloading for the first time.
Citizen Dan is a free, open source system available to any community and its citizens to measure and track indicators of local well-being. It can be branded and themed for local needs. It is under active development by Structured Dynamics with support from a number of innovative cities.
Citizen Dan is an exemplar instance of Structured Dynamics’ open semantic framework (OSF), a generalized framework for deploying semantic platforms for any domain. By changing its guiding ontologies and source content and data, what appears for Citizen Dan can be adapted for virtually any subject area.
As configured, the Citizen Dan OSF instance is a:
Citizen Dan’s information sources may include Census data, the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics.
Text and narratives and the concepts and entities they describe are integrally linked into the system via information extraction and tagging. All ingested information, whether structured or text sources, with their semantics, can be exported in multiple formats. A standard organizing schema, also open source and extensible or modifiable by all users, is provided via the optional MUNI ontology (with vocabulary details in development here), being developed expressly for Citizen Dan and its community indicator system purposes.
All of the community information contained within a Citizen Dan instance is available as linked data.
Here are the main components or widgets in this Citizen Dan demo:
The exporter component appears in multiple locations across the appliance, either as a tab option (e.g., in the Filter component) or as a dropdown list to the lower right of many screens. A varied (and growing!) number of export formats is available. When it appears as a dropdown list, the export is limited to the currently active slice. When invoked via a tab, more export selection options are available. See the technical documentation for this component for further details.

A number of other tools are available to admins in the actual appliance, but are not exposed in the demo:
In addition, it is not possible in the demo to save persistent dashboard views or submit stories or documents for tagging, nor to register as a user or view the admin portions of the Drupal instance.
The sample data and content in the demo is for the Iowa City (IA) metropolitan statistical area. This area embraces two counties (Johnson and Washington) and the census tracts and townships that comprise them, and about two dozen cities. Two of the notable cities are Iowa City itself, home of the University of Iowa, and Coralville, where Structured Dynamics, the developer of Citizen Dan and the open semantic framework (OSF), is headquartered.
The text content on this site is drawn from Wikipedia articles dealing with this area. About 30 stories are included.
The data content on the site is drawn from US Census Bureau data. Shape files for the various geographic areas were obtained from here, and the actual datasets by geographic area can be obtained from here.
Citizen Dan is an exemplar instance of Structured Dynamics’ open semantic framework (OSF), a generalized framework for deploying semantic platforms for specific domains.
OSF is a combination of a layered architecture and modular software. Most of the individual open source software products developed by Structured Dynamics and available on the OpenStructs site are components within the open semantic framework. These include:
The software that makes up the Citizen Dan appliance is one of the four legs that provide a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we can term this a total open solution.
For Citizen Dan, the complements to this software are:
In its entirety, the total open solution amounts to a form of capacity building for the enterprise.
Inherent in the design and architecture of Citizen Dan is the potential for each instance (single installation) to act as a node in a distributed network of nodes across the Web. Via the structWSF Web service endpoints and appropriate dataset permissions, it is possible for any city in the Citizen Dan network to share (or not) any or all of its data with other cities.
This collaboration aspect has been “baked into the cake” from Day One. The system also supports differential access, rights and roles by dataset and Web service. Thus, city staffs across multiple communities could share data differently than what is provided to the general public.
Since all data management aspects of each Citizen Dan instance are also oriented around datasets, expansion to a network mode is quite straightforward.
The Citizen Dan appliance is based on the Drupal content management system, which means any community can easily theme or add to the functionality of the system with any of the available 6500 open source modules that extend the basic Drupal functionality.
All other components, including the multiple third-party ones, are also open source.
To install Citizen Dan for your own use, you need to:
(Note: there will also be some more updates in August, including the MUNI release.)
For questions and additional info, please consult the TechWiki or the OpenStructs community site.
Finally, please contact us if you’d like to learn more about the project, investigate funding or sponsorship opportunities, or contribute to development. We’d welcome your involvement!
A few weeks back I completed a three-part introductory series on what Structured Dynamics calls a ‘total open solution’. A total open solution, as we define it, consists of software, structure, methods and documentation. When provided in toto, these components provide all of the necessary parts for an organization to adopt new open source solutions on its own (or with the choice of its own consultants and contractors). A total open solution fulfills SD’s mantra that, “We’re successful when we’re not needed.”
Two of the four legs to this total open solution are provided by documentation and methods. These two parts can be seen as a knowledge base that instructs users on how to select, install, maintain and manage the solution at hand.
Today, SD is releasing publicly for the first time two complementary knowledge bases for these purposes: TechWiki, which is the technical and software documentation complement, in this case based around SD’s Open Semantic Framework and its associated open source software projects; and DocWiki, the process methodology and project management complement that extends this basis, in this case based around the Citizen Dan local community open data appliance.
All of the software supporting these initiatives is open source. And, all of the content in the knowledge bases is freely available under a Creative Commons 3.0 license with attribution.
In setting out the design of these knowledge bases, our mindset was to enable single-point authoring of document content, while promoting easy collaboration and rollback of versions. Thus, the design objectives became:
Assuming these objectives could be met, we then had three other objectives on our wish list:
Our initial investigations looked at conventional content and document management systems, matched with version control systems or SVNs. Somewhat surprisingly, though, we found the Mediawiki platform to fulfill all of our objectives. Mediawiki, as detailed below, has evolved to become a very mature and capable documentation platform.
While most of us know Mediawiki as a kind of organic authoring and content platform — as it is used on Wikipedia and many other leading wikis — we also found it perfect for our specific knowledge base purposes. To our knowledge, no one has yet set up and deployed Mediawiki in the specific pre-packaged knowledge base manner as described herein.
TechWiki is a Mediawiki instance designed to support the collaborative creation of technical knowledge bases. The TechWiki design is specifically geared to produce high-quality, comprehensive technical documentation associated with the OpenStructs open source software. This knowledge base is meant to be the go-to source for any and all documentation for the code, and includes information regarding:
As of today, TechWiki contains 187 articles under 56 categories, with a further 293 images. The knowledge base is growing daily.
DocWiki is a sibling Mediawiki instance that contains all TechWiki material, but has a broader purpose. Its role is to be a complete knowledge base for a given installation of an Open Semantic Framework (in the current case, Citizen Dan). As such, it needs to include much of the technical information in the TechWiki, but also extends that in the following areas:
The methodology portions of the DocWiki are drawn from the broader MIKE2.0 (Method for Integrated Knowledge Environments) approach. I have previously written about this open source methodology championed by Bearing Point and Deloitte.
As of today, DocWiki contains 357 articles and 394 structured tasks in 70 activity areas under 77 categories. Another 115 images support this content. This knowledge base, too, is growing daily.
Both of these knowledge bases are open source and may be exported and installed locally. Users may then revise, modify and extend that pre-packaged information in any way they see fit.
The basic design of these systems is geared to collaboration and embeds what we think are really responsive workflows. These extend from supporting initial idea noodling to full-blown public documentation. The inherent design of the system also supports single-source publishing and book or PDF creation from the material that is there. Here is the basic overview of the design:
Mediawiki provides the standard authoring and collaboration environment. There is a choice of editing methods. As content is created, it is organized in a standard way and stored in the knowledge base. The Mediawiki API supports the export of information in either XHTML or XML, which in turn allows the information to be used in external apps (including other Mediawiki instances) or for various single-source publication purposes. The Collection extension is one means by which PDFs or even entire books (that is, multi-page documents, potentially with chapters, etc.) may be created. Use of a well-designed CSS ensures that outputs can be readily styled and themed for different purposes or audiences.
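To make the export path more concrete, here is a minimal Python sketch that builds two kinds of Mediawiki API request URLs. The first returns pages as export XML, using the `export` and `exportnowrap` parameters of the query module. The second returns a page's rendered HTML, using `action=parse`. The wiki address `https://wiki.example.org/w/api.php` and the page titles are placeholders, not the actual TechWiki or DocWiki endpoints.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_export_url(api_base: str, titles: list[str]) -> str:
    """URL that asks the Mediawiki API for the given pages as export XML."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # multiple titles are separated by "|"
        "export": 1,                 # include the pages in export (XML dump) format
        "exportnowrap": 1,           # return the bare export XML, not a wrapped result
    }
    return f"{api_base}?{urlencode(params)}"


def build_parse_url(api_base: str, title: str) -> str:
    """URL that asks the Mediawiki API for the rendered HTML of one page, as JSON."""
    params = {"action": "parse", "page": title, "prop": "text", "format": "json"}
    return f"{api_base}?{urlencode(params)}"


API = "https://wiki.example.org/w/api.php"  # placeholder wiki address
print(build_export_url(API, ["Main Page", "Installing structWSF"]))
```

Fetching either URL with any HTTP client returns content that another wiki, a publishing tool or an external app can consume.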
As wikis designed from the get-go to be reusable, and then downloaded and installed locally, it is important that we maintain quality and consistency across content. (After download, users are free to do with it as they wish, but it is important the initial database be clean and coherent.) The overall interaction with the content thus occurs via one of three levels: 1) simple reading, which is publicly available without limitation to any visitor, including source inspection and export; 2) editing and authoring, which is limited to approved contributors; and 3) draft authoring and noodling, which is limited to the group in #2 but for which the in-progress content is not publicly viewable. Built-in access rights in the system enable these distinctions.
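The three interaction levels can be modeled as a small access check. This Python sketch is only illustrative, and the `Role` and `PageStatus` names are hypothetical. In an actual Mediawiki deployment, these distinctions come from the wiki's own user groups and permission settings (plus any access-control extensions), not from code like this.

```python
from enum import Enum


class Role(Enum):
    VISITOR = "visitor"          # any member of the public
    CONTRIBUTOR = "contributor"  # an approved author or editor


class PageStatus(Enum):
    PUBLISHED = "published"      # released content
    DRAFT = "draft"              # in-progress noodling, hidden from the public


def can_read(role: Role, status: PageStatus) -> bool:
    # Level 1: published pages (including source view and export) are open to all.
    if status is PageStatus.PUBLISHED:
        return True
    # Level 3: drafts are visible only to approved contributors.
    return role is Role.CONTRIBUTOR


def can_edit(role: Role) -> bool:
    # Levels 2 and 3: only approved contributors may author or edit pages.
    return role is Role.CONTRIBUTOR
```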
Besides meeting all of the objectives noted at the opening of this post, these wikis (knowledge bases) also have these specific features:
Many of these features come from the standard extensions in the TechWiki/DocWiki packages.
The net benefits from this design are easily shared and modified knowledge bases that users and organizations may either contribute to for the broader benefit of the OpenStructs community, or download and install with simple modifications for local use and extension. There is actually no new software in this approach, just proper attention to packaging, design, standardization and workflow.
Via the sharing of extensions, categories and CSS, it is quite easy to have multiple instances or authoring environments in this design. For Structured Dynamics, that begins with our own internal wiki. Many notes are taken and collected there, some of a proprietary nature and the majority neither intended for nor suitable for public release.
Content that has developed to the point of release, however, can be simply tagged using conventions in the workflow. Then, with a single Export command, the relevant content is sent to an XML file. (This document can itself be edited, for example by changing all ‘TechWiki’ references to something like ‘My Content Site’; see further here.)
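The edit-after-export step can be scripted. Below is a minimal sketch that assumes the export is a well-formed XML file. The `rebrand_export` helper is hypothetical and not part of Mediawiki. It rewrites only text content, so element names and attributes are left alone.

```python
import xml.etree.ElementTree as ET


def rebrand_export(xml_text: str, old: str, new: str) -> str:
    """Replace `old` with `new` in all text content of an exported XML document."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # Mediawiki exports declare a default namespace; registering it as the
        # default keeps the output free of generated "ns0:" prefixes.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for elem in root.iter():
        # Only element text and tails change; tag names and attributes stay intact.
        if elem.text:
            elem.text = elem.text.replace(old, new)
        if elem.tail:
            elem.tail = elem.tail.replace(old, new)
    return ET.tostring(root, encoding="unicode")
```

Page titles are text too, so this also renames any title that contains the old string. That matters whenever later imports rely on matching titles.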
Depending on the nature of the content, this exported content may then be imported with a single Import command to either the TechWiki or DocWiki sites. (Note: Import does require admin rights.) A simple migration may also occur from the TechWiki to the DocWiki. Also, of course, initial authoring may begin at any of the sites, with collaborators an explicit feature of the TechWiki or DocWiki versions.
Any DocWiki can also be specifically configured for different domains and instance types. In terms of our current example, we are using Citizen Dan, but that could be any such Open Semantic Framework instance type:
Under this design, then, the workflow suggests that technical content authoring and revision take place within the TechWiki, and process and methodology revision within the DocWiki. Moreover, most DocWikis are likely to be installed locally, such that once installed, their own content would likely morph into local methods and steps.
So long as page titles are kept the same, newer information can be updated on any target wiki at any time. Prior versions are kept in the version history and can be reinstated. Alternatively, if local content is clearly diverging yet updates of the initial source material are still desired, the local content need only be saved under a new title to preserve it from import overwrites.
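The overwrite-and-history behavior described above can be captured in a toy model. This Python sketch is hypothetical and deliberately simplified: it ignores timestamps, authors and the many details of Mediawiki's real import process. It shows only three things. Imports are matched by title, prior revisions remain reinstatable, and content saved under a new title is untouched by later imports.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Toy wiki: each title maps to its list of revisions, oldest first."""
    history: dict[str, list[str]] = field(default_factory=dict)

    def current(self, title: str) -> str | None:
        revisions = self.history.get(title)
        return revisions[-1] if revisions else None

    def save(self, title: str, text: str) -> None:
        # Every save appends a revision, so earlier versions stay recoverable.
        self.history.setdefault(title, []).append(text)

    def import_pages(self, pages: dict[str, str]) -> None:
        # Imported content becomes the newest revision of the page with the
        # same title; pages under any other title are left untouched.
        for title, text in pages.items():
            self.save(title, text)

    def revert(self, title: str) -> None:
        # Reinstate the previous revision by saving it again as the newest one.
        revisions = self.history[title]
        if len(revisions) >= 2:
            self.save(title, revisions[-2])


wiki = Wiki()
wiki.import_pages({"Install": "release 1"})
wiki.save("Install", "local changes")
wiki.save("Install (local)", "local changes")  # preserve local work under a new title
wiki.import_pages({"Install": "release 2"})    # later update from the source wiki
```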
We are really excited by this design and have already seen benefits in our own internal work and documentation. We see, for example, easier management of documentation and content, permanent (canonical) URLs for specific content items, and greater consistency and common language across all projects and documentation. Also, when all documentation is consolidated into one point with a coherent organizational and category structure, documentation gaps and inconsistencies also become apparent and can readily be fixed.
Now, with the release of these systems to the OpenStructs (Open Semantic Framework) and Citizen Dan communities, we hope to see broader contributions and expansion of the content. We encourage you to check on these two sites periodically to see how the content volume continues to grow! And, we welcome all project contributors to join in and help expand these knowledge bases!
We think this general design and approach — especially in relation to a total open solution mindset — has much to recommend it for other open source projects. We think these systems, now that we have designed and worked out the workflows, are amazingly simple to set up and maintain. We welcome other projects to adopt this approach for their own. Let us know if we can be of assistance, and we welcome ideas for improvement!
Yesterday Fred Giasson announced the release of code associated with Structured Dynamics’ open source semantic components (also called sComponents). A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.
Though not all layers are by any means complete, from an architectural standpoint the release of these semantic components provides the last and missing layer to complete our open semantic framework. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.
We first announced the open semantic framework — or OSF — a couple of weeks back. Refer to that original post for more description of the general design [1]. However, we can show this framework with the semantic components layer as illustrated by what some have called the “semantic muffin”:
The OSF stack consists of these layers, moving from existing assets upward through increasing semantics and usability:
Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.
Current semantic components, or widgets, include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.
Though Fred’s post goes into more detail — with subsequent posts to get into the technical nuances of the semantic components — the main idea of these components is shown by the diagram below.
These various semantic components get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. These requests return a structured results set, which includes various types and attributes.
An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the semantic components. These instructions are presented via the sControl component, which determines the widgets (individual components, with multiples possible depending on the inputs) that need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:
New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
As the release of the semantic components drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the open semantic framework enabled us to consolidate and rationalize these resources.
Our first change was to consolidate all OSF-related material under the existing OpenStructs.org Web site. It already contained the links and background material for structWSF and irON. To that, we added the conStruct and OSF material. This consolidation also allowed us to retire the previous conStruct Web site, which now redirects to OpenStructs.
We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new Open Semantic Framework Google group and mailing list. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the OSF group.
There has also been a reinvigoration of the developers’ community Web site at http://community.openstructs.org/. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.
Actual code SVN repositories are unchanged. These code repositories may be found at:
We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!
Structured Dynamics has been in a fervent — and, we believe, fruitful — design phase for the past 18 months. All of the working parts related to how to embrace becoming a semantic enterprise have now been defined and designed. Actual tools and components accompany many of these parts and have been deployed.
Recently, I have been speaking and blogging much about rationale, process, mindset and approach for how to bring semantics into the organization. But, prior to now, we have not spoken much about the overall design behind our approach. Today, as we complete our design phase and introduce our first exemplar instance of it — Citizen Dan [1] — we are finally in a position to describe this overall approach.
We term our approach the open semantic framework, or OSF. The open semantic framework is a combination of a layered architecture and modular software. The open semantic framework represents the software component of the four-component total open solution, recently described in a three-part series. I return to this topic in the conclusion of this post.
Over the past nine months, I have been focusing my writing largely on the semantic enterprise, with more specificity regarding our Open SEAS (Semantic Enterprise Adoption and Solutions) initiative. In bits and pieces, these writings have tended to reflect a number of objectives:
To date, the result of these design objectives is perhaps best captured in my Seven Pillars of the Open Semantic Enterprise posting, as well as our general discussions regarding adaptive ontologies. Yet, still, these writings have been somewhat piecemeal. What this document attempts to do is to place all of these perspectives into a single, coherent whole.
Structured Dynamics has been a strong advocate for layered architectures, with clear APIs between layers as appropriate. But these layers are not “laminates” that completely cover the layer below, nor are they all necessary. Depending on the circumstance, some layers are unneeded or superfluous. Layers may be added incrementally, or not at all.
In this manner, then, the open semantic framework is perhaps more akin to a pearl, than to a laminate or cocoon. Each subsequent layer does not “embed” the layer prior to it, and some layers actually may inter-operate with multiple layers below or above it (this is notably true for the “ontologies” layer, which has interactions up and down the stack).
Nonetheless, we can envision this pearl of the open semantic framework and its layers as follows:
Others have termed this the “semantic muffin” or even “semantic muppet” or “semantic blob”. Whatever (hehe). The real idea is that layers may accrete (as in the growth of a pearl) and occur over time and be uneven. Each layer, though, does have a role to play (though it may not be needed in a given deployment), and does act to augment existing information assets in the transition to a semantic framework. Beginning at the core, each of these layers — with external references as appropriate for more details — is described below.
The open semantic framework is premised on leveraging existing information assets. Sure, once the framework is in place, new information can be brought into it in a more direct, semantic manner. But, the real thrust and benefit of this framework is to provide an incremental pathway for finally inter-operating and federating prior decades of data, structure and information assets.
These information assets may reside inside or outside the enterprise. They may (and DO!) exist in many formats and are described by many schema. They may come from internal transaction systems or warehouses, or may exist externally on the Web or at supplier or partner sites. These information assets may span from conventional databases and relational data systems to XML interchange standards, Web pages and standard internal text or documents. In short, there is NO information asset that is not amenable to inclusion in this framework.
The information transformation layer provides either: 1) extraction of concepts and entities as structured metadata from source text or documents; or 2) conversion of existing data assets to interoperable form. As implemented by Structured Dynamics, the extractions are conducted by either scones (Subject Concept or Named EntitieS) or third-party utilities, and the conversions occur via irON (instance record Object Notation) or third-party “RDFizers”.
Depending on the source, the net result of the transformation is to produce interoperable data and information that can be ingested and used by other layers in the framework.
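As a rough illustration of conversion into interoperable form, this Python sketch turns the rows of a CSV file into N-Triples. It creates one subject per row and one attribute-value assertion per non-empty cell (ABox-level statements, in the terms used below). It is not irON and does not follow any irON serialization. The base URI and the `record/` and `attr/` path segments are hypothetical.

```python
from __future__ import annotations

import csv
import io
from urllib.parse import quote


def rows_to_ntriples(csv_text: str, base: str, id_column: str) -> list[str]:
    """One subject per row; one attribute-value triple per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}record/{quote(row[id_column])}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id becomes the subject; empty cells assert nothing
            predicate = f"<{base}attr/{quote(column)}>"
            # Escape characters that N-Triples does not allow raw inside a literal.
            literal = (value.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


SAMPLE = "id,name,county\nc1,Iowa City,Johnson\nc2,Example Town,\n"
for line in rows_to_ntriples(SAMPLE, "http://example.org/", "id"):
    print(line)
```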
Though not strictly analogous, this layer bears some resemblance to the ETL (extract, transform, load) utilities used in many enterprise information integration applications. Unlike those conventional systems, however, this information transformation layer may also capture and represent some of the source schema.
In all cases, however, these transformations are relatively simple and get parsed against the available structure (the ontologies, schema and entity reference lists) in the system to generate the semantic metadata (tags).
At this point, the extracted structure is generally at the level of instance records, or the ABox, with simple assertions of attribute-value pairs for specific records [2]. Little schema transformation or mapping occurs at this layer (if such is needed, that occurs at the structWSF layer; see next). Actual federation or interoperation occurs at later layers based on the TBox structures [2].
This modular portion of the framework is explicitly designed with APIs to allow third-party tools to be plugged in and substituted.
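The Python sketch below illustrates both ideas: tagging text against an entity reference list, and keeping the extractor behind a pluggable interface. The `Extractor` protocol, the `ListTagger` class and the example URIs are hypothetical. Scones and third-party extractors are far more capable than simple label matching.

```python
from __future__ import annotations

import re
from typing import Protocol


class Extractor(Protocol):
    """Interface any pluggable extraction tool is assumed to satisfy."""

    def extract(self, text: str) -> list[tuple[str, str]]:
        """Return (matched text, entity URI) pairs found in `text`."""
        ...


class ListTagger:
    """Tags text by matching labels from an entity reference list."""

    def __init__(self, reference_list: dict[str, str]) -> None:
        self.reference_list = reference_list
        # Python's regex alternation takes the first alternative that matches,
        # so longer labels go first ("Iowa City" before "Iowa").
        labels = sorted(reference_list, key=len, reverse=True)
        self.pattern = (
            re.compile(r"\b(" + "|".join(map(re.escape, labels)) + r")\b")
            if labels else None
        )

    def extract(self, text: str) -> list[tuple[str, str]]:
        if self.pattern is None:
            return []  # an empty reference list tags nothing
        return [(m.group(1), self.reference_list[m.group(1)])
                for m in self.pattern.finditer(text)]


def tag_document(text: str, extractor: Extractor) -> list[tuple[str, str]]:
    # Callers depend only on the interface, so another tool can be swapped in.
    return extractor.extract(text)
```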
The major workhorse of the open semantic framework is the structWSF (Web services framework) layer. structWSF is the most complicated of the OSF layers and has many supporting software packages and capabilities. The structWSF layer provides the standard, common interface (“canonical”) layer by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack.
structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies; see below).
The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework comes packaged with a baseline set of about twenty Web services covering CRUD, browse, search, and import and export. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and, optionally, a results set document. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML. An internal representation, structXML [3], is used for internal communications across all structWSF Web services and with other layers.
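The request/response pattern can be sketched with Python's standard library. This example assumes a hypothetical search endpoint URL and a `query` parameter. It also assumes the desired serialization is chosen with an HTTP `Accept` header. Consult the structWSF documentation for the actual service URLs, parameters and supported MIME types.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; each structWSF install documents its own service URLs.
SEARCH_ENDPOINT = "http://osf.example.org/ws/search/"

# Assumed mapping from a short format name to the MIME type sent in "Accept".
SERIALIZATIONS = {
    "rdf": "application/rdf+xml",
    "xml": "text/xml",
    "json": "application/json",
}


def make_request(endpoint: str, params: dict[str, str], fmt: str = "rdf") -> Request:
    """Build a form-encoded POST whose Accept header selects the serialization."""
    return Request(
        endpoint,
        data=urlencode(params).encode("utf-8"),
        method="POST",
        headers={
            "Accept": SERIALIZATIONS[fmt],
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the request to `urllib.request.urlopen` would send it. The response's `status` attribute carries the HTTP status, and its body (if any) holds the serialized results document. Note that `urlopen` raises `HTTPError` for error statuses.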
structWSF has a central service that governs access rights and permissions. These rights occur at the level of the dataset, which gives immense flexibility to how data may be accessed, read, modified, created or deleted (or not). Datasets within a given structWSF instance may be accessed directly via API or via SPARQL queries to the instance’s endpoint. Depending on rights and query, results sets may be returned from a given structWSF instance in an infinite variety of ways.
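Dataset-level rights amount to a lookup keyed by requester and dataset. The Python sketch below is hypothetical; the requester names, dataset names and table layout are invented for illustration. It shows how per-dataset CRUD rights let, for example, the public and city staff see and do different things with the same instance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    create: bool = False
    read: bool = False
    update: bool = False
    delete: bool = False


# Hypothetical access table: (requester, dataset) -> CRUD rights.
ACCESS = {
    ("public", "census-tracts"): Permission(read=True),
    ("city-staff", "census-tracts"): Permission(read=True, update=True),
    ("city-staff", "staff-notes"): Permission(create=True, read=True, update=True, delete=True),
}

ACTIONS = ("create", "read", "update", "delete")


def allowed(requester: str, dataset: str, action: str) -> bool:
    """True if `requester` may perform `action` on `dataset`."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    perm = ACCESS.get((requester, dataset), Permission())  # default: no rights
    return getattr(perm, action)
```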
This latter capability is the essential interface for subsequent layers in the open semantic framework stack. Depending on those subsequent components, pre-staged data and results sets may be returned for an essentially limitless variety of purposes.
Each structWSF instance also has a unique Web address that enables one or a multitude of instances to communicate and share with one another. This simple, but elegant, method enables structWSF instances to participate or not in potentially global or restricted local networks and collaboration environments. This is currently the largest untapped potential of structWSF with respect to its existing deployments.
The newest layer in the stack is the semantic components layer. This layer takes results sets (most often generated by a specific query or data slice request) from one or more structWSF instances and then presents that information via a variety of data visualization or data presentation widgets (what we specifically call ‘semantic components’ due to their design [4]). The operation and sensitivity of these display components are themselves driven by a presentation and data analysis (including statistics) ontology.
Current display widgets include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable without modification to any domain.
As presently implemented by Structured Dynamics, this layer consists either of Flex data visualization components or structured data display templates based on Smarty. The inherent design allows for updates to other bases (such as HTML5). The layer may also be swapped out or substituted with third-party capabilities.
The strength and power of this system is governed by its own ontology, the Semantic Component Ontology (SCO) (see next).
This is an extremely flexible layer in the open semantic framework stack. Expect an ongoing series of explanatory blog posts and online resources in the upcoming weeks to explain this innovative capability.
The ontologies layer actually refers to all structured assets driving the system. As such, this layer might be considered the “brain” (though rather simply specified!) of the open semantic framework.
At a true schema or TBox level [2], the ontologies layer represents the concepts and relationships of the domain at hand. This layer also hosts the specific local entities and prominent things (people, places, events, etc.) useful for extracting local and domain-specific relevance. However, those views are also supplemented with some administrative ontologies (two examples are SCO and irON) that guide how the user interfaces or widgets in the system should behave.
The concept level represents the “world view” of the specific instantiation of the open semantic framework at hand. This conceptual (TBox) view provides the structural organization of information, inferencing capabilities, and navigation, faceting and explorer structure. The entity (ABox) view provides tagging for prominent individuals and instances important to the domain at hand, and guides the structure behind data visualizations of attribute or indicator data.
The administrative level uses simple roles and relationships for attributes and indicators to inform the framework as to how and with what widget to display information. For example, a “type” of information that is geographically related can be instructed to use the map component as an option for display. Whether some information is used for totals, comparison purposes, or other specifications useful to data visualization and graphing may also be specified.
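The idea that simple administrative roles, rather than code, decide which widget may display an attribute can be sketched as follows. This is a hypothetical illustration only. The `AttributeSpec` fields and the widget labels are assumptions chosen for the example and are not SCO's actual vocabulary.

```python
from dataclasses import dataclass

# Hypothetical administrative annotations, loosely in the spirit of SCO:
# each attribute is described by simple roles rather than by display code.
@dataclass(frozen=True)
class AttributeSpec:
    name: str
    geographic: bool = False   # values can be placed on a map
    summable: bool = False     # values can be totalled
    comparable: bool = False   # values can be compared across records

def widgets_for(spec: AttributeSpec) -> list[str]:
    """Return the display components an attribute is eligible for."""
    options = ["table"]                 # any attribute can appear in a tabular template
    if spec.geographic:
        options.append("map")
    if spec.summable:
        options.append("pie_chart")     # shares of a total
    if spec.comparable:
        options.append("bar_chart")     # side-by-side comparison
    return options
```

For example, `widgets_for(AttributeSpec("household_income", comparable=True))` offers a table and a bar chart. A generic widget layer needs only these flags, whatever dataset the attribute came from.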
The language and relationships (predicates or properties) of these administrative ontologies are simple and straightforward. It is, for example, relatively easy to define data display functions at the broad dataset and attribute level. Simple determinations drive how result sets and their associated result types may be displayed, no matter what datasets or slices are generated by the queries or requests fed to the system.
The structure in these layers can be replaced by other structures for other instantiations and circumstances. Indeed, all other layers in the open semantic framework can remain relatively fixed while tailoring the instance to new domains solely via this layer. The ontologies layer is what gives any given instantiation of OSF — such as Citizen Dan — its unique focus and scope.
The thinnest layer (that is, least substantial with respect to this framework) is the content management system (CMS) layer. In its current form, the open semantic framework uses the Drupal CMS via our conStruct plug-in modules. The design of the framework, however, has explicitly accommodated the possibility that other CMSs may substitute for this role.
The CMS layer is optional if structWSF endpoints are sufficient or if simple Web pages hosting semantic components are deemed adequate. Very small organizations or deployments may reasonably choose to have no CMS layer at all.
However, for most sites or portals with more than a few active users, it is desirable to have broad flexibility in theming (“skinning”), user rights and permissions, or other functionality. These are the roles of the CMS layer. Drupal, for example, is presently supported by more than 4500 third-party modules covering nearly every conceivable function, from polling to blogs, rating systems and bulletin boards.
For such generalized portals or collaboration environments, it makes sense to adopt and install a flexible CMS, such as Drupal. Much of the user experience and functional environment can be provided through such means.
The open semantic framework is thus designed to reside easily in a CMS while also providing the hooks to take advantage of the generalized user rights and functionality of the CMS. In this manner, the open semantic framework is able to stay focused on its structured data and interoperability purposes, while still gaining the advantages of rich-featured content management systems.
With its inherent open-world orientation [5] and distributed and collaborative potential, the open semantic framework was designed from the outset to be Web-capable and Web-oriented:
A Web-oriented architecture (WOA) has a number of understood requirements, to which the open semantic framework adheres. Specifically, these design considerations support the framework as being part of WOA:
Citizen Dan is our first exemplar instance of this open semantic framework. The details page for the project goes into some of Citizen Dan’s functionality and capabilities.
Citizen Dan is specifically geared to local governments and localities, with an emphasis on community indicator systems (CIS). CIS have become a popular way of measuring and tracking local economic and social well-being; they are closely related to the concept of sustainability and how it is measured in many economic and environmental domains.
However, in the context of this post, what is really interesting about Citizen Dan is that its semantic framework is a completely open and generic one. The same set of tools and capabilities described on its details page can be applied to any domain that needs to manage and understand its own information, ranging from unstructured text or documents to conventional structured databases.
What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists; see above) that are fed to this open semantic framework. By swapping in new structures, what can be called Citizen Dan in one instance can morph to become Curriculum Carla in, say, the education instance or Doctor Doolittle in the veterinary science instance [6].
We can illustrate these multiple instances as follows:
What this figure illustrates is that even a branded expression of the framework, such as Citizen Dan, is merely an instance of that framework. And, actually, when it is expressed in such a packaged manner, we can more accurately describe the standard and bundled suite of generic functions and accompanying structure of Citizen Dan as an instantiation of the open semantic framework:
in·stan·ti·a·tion \in-‘stan(t)-shē-ā-shən\ (noun) [7]
By replacing the structure bases, and by tailoring the function suite appropriate to a given market and use, we can create many instantiations of the open semantic framework for different domains and markets. In this manner, Citizen Dan can be seen as an early exemplar of the framework, but not as a definer and limiter to it.
So far, this discussion has focused solely on considerations of software and architecture. While we see great power in the open semantic framework, and it is highly useful in itself, software alone is inadequate to achieve acceptance and success in the enterprise (as we noted in our most recent posts). The very forces that are compelling enterprises to look at new options are also the ones that pose difficult hurdle rates for the acceptance of open source.
To address this issue, we have developed a four-legged foundation for what we have termed the total open solution. The solution involves software, structure, documentation and methods (or best practices). Each of these legs connects to and relates to the others.
The open semantic framework is clearly the software (and architecture) leg to this foundation. Again, however, what is interesting is that the mere swapping out of the structure can also make the system relatively ready for other domains.
We see these relationships in the following diagram, which also shows that the DocWiki portions of the solution embody the documentation (aside from code-level comments) and methods legs of the foundation:
Differences between domains may also lead to differences as to which components are included or not in that domain’s desired instantiation.
The hugely important point implied by the diagram above, however, is how nearly universal the content and methods in the DocWiki may be across domains. Because the deltas between domains largely result from structure and from which specific functional components are included, most documentation and practices shared with the DocWiki will be applicable across domains. Sure, the use cases and some of the specific terminology may change, but we can also now see a high degree of re-usability of documentation and knowledge base across markets. This realization makes the usefulness and leverage of the DocWiki even higher.
Developing “common language” by which to describe and convey things — especially new things like semantics that also have strong technical aspects — is tough, very tough. We are only now beginning on this process; we look to many in the community and elsewhere to help define informative and evocative terminology.
Per the original design objectives above, Structured Dynamics has approached the challenge of the semantic enterprise in what we think is both a pragmatic and a new way. The insistence on preserving and respecting existing information assets, matched with the opportunities and different mindsets arising from an open-world approach [5], has necessitated thinking through new designs and developing new concepts. Any time such new thinking and concepts occur, new language and new metaphors must accompany them.
While certainly there are components and various software packages that populate and comprise an open semantic framework, the framework is also just as importantly a world view or way to think about information, information development, and its architecture. For example, a pivotal concept is that an open semantic framework is built around generic tools responsive to the information structures fed to them. This realization shifts the locus of emphasis from software development per se to creating, managing and adapting data and information structures. While this democratizes the information development process and is more inclusive of all knowledge workers, it also imposes needs for new toolsets and business processes. We are only at the nascent stages of understanding and learning about these differences.
Similarly, a development approach that is inherently incremental and leverages (rather than replaces or displaces) existing information assets means IT projects need to be considered in a new light. Small projects with more emphasis on tangible and demonstrable benefits will alter budgets, lower risks, and place a need for quicker turnaround. Like the architecture of the open semantic framework itself, projects based on OSF are also more distributed, decentralized and modular.
With such decentralization also comes the need for mechanisms and systems to overcome vendor “lock-in” and proprietary systems. A key thrust in support of what we have called the total open solution and its mixture of documentation and methods to accompany software and structure is specifically targeted at this issue. Tools and means for collaboration and concurrent contributions are another possible answer. Prior software practices in agile development and version control will see extensions to all manner of information development across the enterprise.
We are proud of our design work and proof-testing with clients over the past 18 months. We believe the open semantic framework and its implications to be a fundamental shift in how organizations need to think about their information development, existing information assets, and IT budgets and processes. We know widescale adoption is not yet at hand — enterprises are justifiably conservative when it comes to new thinking. But, given global competition and tight pocketbooks, the open semantic framework is a formulation to which enterprises and governments should pay very close attention.
In the first part of this series, we began with the argument that open source software alone was not sufficient to meet the required acceptance factors in the enterprise. As a guiding way to create the right mindset around these issues, we shared the saying we have adopted at Structured Dynamics: “We’re successful when we are not needed.”
In the second part of this series we described the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we termed this a total open solution.
Now, in this third and concluding part to our series, we introduce the open source documentation and methodology system called ‘DocWiki’. It complements the base open source software, in the process completing the conditions for a total open solution.
Though we call this system ‘DocWiki’, it is not meant to be a brand or particular product description for what Structured Dynamics is offering. Rather, ‘DocWiki’ is merely a placeholder name for a generic, open source system and knowledge base that can be downloaded, installed, branded, modified and extended in whatever way the user sees fit. ‘DocWiki’ is a baseline documentation and methodology “starter kit” that can be dressed up in new clothes or packaged and named in whatever manner best suited to a given deployment.
In describing the major components of this ‘DocWiki’ system we will again use our Citizen Dan initiative [1] as we did in Part 2. This gives us a real use case, though the same approach is applicable to any open source information management initiative by enterprises.
We call the specific version of the ‘DocWiki’ used in the case of Citizen Dan the ‘CIS DocWiki’ (for community indicator systems), specific to the domain and local government focus of Citizen Dan. Similarly, the structured vocabulary and ontology that guides the system is the MUNI ontology. For other information development initiatives, the specific content of these components would be swapped out for ones appropriate to that initiative.
A number of desires and objectives intersected to guide the design of the ‘DocWiki’ system. We wanted:
In first formulating this design, our assumption was the major building blocks would be an open source document management system linked with some form of version control. Though we think such a formulation could work OK, our exposure to the MIKE2.0 methodology actually caused us to re-look at and re-think a wiki-based approach. Ultimately the trump card that decided the design for us was familiarity and ease-of-use.
The resulting architecture of the full ‘DocWiki’ system is shown below:
What is cool about this design is that a single software install (Mediawiki, the Wikipedia software) with a few extensions (some standard ones plus judicious use of Semantic Mediawiki) and a single loadable database are all that is required to transfer and install the ‘DocWiki’ system.
To better describe this system, we will focus on three major interconnecting pieces in this architectural diagram: the knowledge base; the vocabulary and structure (ontology); and the authoring and publishing system (wiki).
The pre-loaded content for the ‘DocWiki’ system comes from its knowledge base. This is provided as a text-exported MySQL database that can be modified en masse before loading (such as substituting ‘YourName’ for ‘DocWiki’). The exemplar upon which this knowledge base is modeled is the MIKE2.0 framework.
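An en masse change such as rebranding can be done on the text export before it is loaded. Here is a minimal Python sketch, assuming a plain-text SQL dump in which the branding token never spans a line break (typical of one-INSERT-per-line dumps). The function name `rebrand_dump` and the file names in the usage note are hypothetical. A real dump may also store page lengths or compressed revision text that a blind substitution will not account for, so treat this as illustrative rather than as a supported loading procedure.

```python
import io
from typing import TextIO

def rebrand_dump(src: TextIO, dst: TextIO, old: str = "DocWiki", new: str = "YourName") -> int:
    """Copy a text SQL dump line by line, replacing every occurrence of `old`.

    Returns the number of replacements made. Assumes `old` never spans a
    line break; streaming keeps memory use flat for large dumps.
    """
    count = 0
    for line in src:
        count += line.count(old)          # same non-overlapping matches replace() uses
        dst.write(line.replace(old, new))
    return count

if __name__ == "__main__":
    # Small in-memory demo; a real run would open the exported .sql files instead.
    demo_src = io.StringIO("INSERT INTO page VALUES ('DocWiki_Main_Page');\n")
    demo_dst = io.StringIO()
    print(rebrand_dump(demo_src, demo_dst), demo_dst.getvalue())
```

Against files, the same call would look like `with open("docwiki.sql", encoding="utf-8") as src, open("yourname.sql", "w", encoding="utf-8") as dst: rebrand_dump(src, dst)`.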
MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
MIKE2.0 has a generalized methodology and set of templates applicable to initiatives, the phases, activities and tasks to undertake them, and supporting assets. Supporting assets can range from glossaries and definition of terms and concepts to very specific technical documents or background material. The entire system is logical and applies a consistent design and organizational structure and categories.
For our purposes, we wanted a complete, turnkey content knowledge base. This meant that we needed to accommodate all forms of project management and guidance, ranging from specific “how-to” and technical discussions to the entire suite of background and supporting material. The scope of this knowledge content is defined as what a new person assigned a lead or implementation responsibility would need to read or master.
As a destination site MIKE2.0 is quite broad: it embraces the ability to model virtually any information management initiative. This makes MIKE2.0 an invaluable source of structure and methodology guidance, but also results in it being quite limited in the specific how-tos associated with any given initiative. I have earlier spoken about the structure of MIKE2.0 and in particular its applicability to the semantic enterprise.
The strength of MIKE2.0, however, is that its structure can be grabbed and quickly applied to form an organizational and structural basis for filling out the knowledge base for any specific information development initiative. And, that is exactly what we did with the ‘CIS DocWiki.’
MIKE2.0 hosts and maintains its project-related structure in Mediawiki (with some extensions). Combined with its templates, this provides a rapid-start baseline for beginning to tailor and flesh out the specific details for a given information management initiative. Thus, after copying broad aspects of the MIKE2.0 system into the incipient ‘DocWiki’, it was relatively straightforward to let the existing structure and templates of MIKE2.0 guide next steps.
As of today’s date, the ‘CIS DocWiki’ contains about 300 substantive articles, a complete activity and tasking structure, and various re-usable templates based on Semantic Mediawiki for structured and consistent access and retrieval. New tasks and structure can be readily added to the system. Existing structure or content can be deleted or marked as archive for non-display. We are still gathering all requisite content pieces, and anticipate by first public release that the baseline knowledge base will include 2x to 3x the scale of its current content.
For new ‘CIS DocWiki’ (or Citizen Dan-based) deployments, this means the knowledge base can be completely modified and extended for local circumstances. The set-up of the Mediawiki instance is separate from the loading or modification of the knowledge base, which means the look-and-feel of the entire system, not to mention user rights and permissions, can also be readily tailored for local requirements.
The core content of the ‘CIS DocWiki’ and its basis in a set structure and methodology (derived from MIKE2.0) means that the knowledge base is also adaptable for other broader information development areas, especially in the semantic enterprise or semantic government arenas. Thus, while Structured Dynamics is first releasing the ‘CIS DocWiki’ in the context of Citizen Dan and semantic government, we also are developing a parallel instance for the Open SEAS approach to the semantic enterprise.
The approach taken here is somewhat different than the standard wiki use. As experts, we are basically sole authoring (with contributions from selected collaborators and our clients) the starting basis of the knowledge base. Unlike many wikis, this enables us to be quite consistent in content, style, and organization. Such an approach allows us to present a coherent and complete starting content and methodology foundation. However, once delivered and installed for a given deployment, its users are then free to extend and change this knowledge foundation in the standard wiki manner. Whether those subsequent extensions are free-form or more tightly controlled and managed is the choice of the new deployment’s administrators.
Strictly speaking, the vocabularies and structures (including, of course, ontologies) that drive our semantic government or semantic enterprise offerings are also part of the knowledge base. And, in fact, many of these aspects, especially related to the actual operating of the instances, are included as part of the standard knowledge base.
However, the applicable domain ontology itself is separately maintained. Descriptions of how to use and modify such ontologies are part of the general ‘DocWiki’ knowledge base, but the ontology is not. This arm’s length-separation is done to acknowledge that the ontology has independent use and value apart from the knowledge base or the software (Citizen Dan, in this case) that is the focus of it.
In the Citizen Dan instance, this structure is the MUNI ontology. MUNI is a general local government domain ontology that can find use in a broad array of circumstances, using or not Citizen Dan. Thus, like other ontologies developed and maintained by Structured Dynamics, such as BIBO (the Bibliographic Ontology), the ontology itself and its documentation, discussion forums and use cases are maintained separately.
The first release of MUNI is still under development and will be released this summer.
The software framework that hosts and manages all of this content is the Mediawiki software, originally developed for Wikipedia. This framework is supported by a number of standard extensions packaged with the ‘DocWiki’ distribution. One of the more notable extensions is Semantic Mediawiki. Mediawiki also is the wiki framework underlying MIKE2.0, so content sharing between the systems is straightforward.
The first use of the ‘DocWiki’ is to add new content to the knowledge base and to modify or extend what is provided in the baseline. For straight authoring, ‘DocWiki’ offers the standard wikitext basis for content entry and editing, as well as the WikED enhanced editor and the FCKEditor WYSIWYG rich-text editor. Each of these may be turned on or off at will.
All of the baseline content is fully organized and categorized via a standard structure. Pre-existing templates aid in entering new content in specific areas consistently or in providing standard administrative ways of tagging content for completeness or need for editorial attention. Tasks and concepts, in particular, follow set ways of entry and description. These set templates, some forms-based and some derived from Semantic Mediawiki, are also tied into automatic internal scripts for listing and organizing various items. So long as new material is entered properly, it will be reflected in various stats and listings. Unlike sole reliance on Semantic Mediawiki, the ‘DocWiki’ approach is a mix of standard wiki categories and semantic types. Both are used for effective organization of the knowledge base.
Besides the knowledge base of domain content and “how-to”, the system also comes pre-packaged with extensive wiki “how-to” and best-practices guidance for using the system effectively and consistently. Of course, a given deployment may or may not enforce all of these practices. A poorly administered instance, for example, could degenerate fairly quickly and lose the native structure and organization of the baseline system.
As with standard wikis, a history of prior page revisions gives the system rollback and version control. Mediawiki also has a pretty good user access and permissions framework, covering access, reading, editing and uploads.
Besides the standard and required extensions, ‘DocWiki’ also comes packaged with the necessary settings and configuration files to operate “out-of-the-box” in its designed baseline mode. Of course, these settings, too, can be changed and modified by site administrators, and ‘DocWiki’ also includes guidance on how to do that.
A little known but highly useful part of the Mediawiki API allows direct export of XHTML content [2]. Then, with minor XSLT conversion templates, it is possible to strip out wiki-specific conventions (such as the editing of individual sections) or to create straight XML versions. When this is combined with the use of internal ‘DocWiki’ CSS style sheets that impose some clean and semantic style identifiers, a common canonical output basis for content is possible.
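The post does this clean-up step with XSLT. As a stand-in, here is a minimal Python sketch that uses the standard library's ElementTree instead. It assumes the rendered page has already been fetched (for example through the MediaWiki API's `action=parse` module) and is well-formed XHTML. It also assumes that section-edit links carry a class containing `editsection` (`editsection` or `mw-editsection`, depending on the MediaWiki version). The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def _remove_keep_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove `child` but keep any text that followed it (its tail)."""
    tail = child.tail or ""
    if tail:
        siblings = list(parent)
        idx = siblings.index(child)
        if idx > 0:
            prev = siblings[idx - 1]
            prev.tail = (prev.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
    parent.remove(child)

def strip_edit_sections(xhtml: str, marker: str = "editsection") -> str:
    """Drop every element whose class list mentions `marker` (e.g. section edit links)."""
    root = ET.fromstring(xhtml)
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            classes = child.get("class", "").split()
            if any(marker in c for c in classes):
                _remove_keep_tail(parent, child)
    return ET.tostring(root, encoding="unicode")
```

The output is still XHTML, so the same function could feed either the CSS-themed Web output or further conversion to plain XML.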
From that point, a given deployment may use its own CSS styles to theme output content. Output Web pages (XHTML) or XML files then can be processed using existing and accurate utilities to produce PDF or *.doc documents. Then, with systems such as OpenOffice, an even wider variety of document formats can be produced. These facilities mean that the ‘DocWiki’ can also act as a single-source publishing environment.
In its initial release, re-purposing ‘DocWiki’ content into other presentations (for example, combining sections from multiple pages into a new document as opposed to re-using existing pages as is) will require creating new wiki pages and then cutting-and-pasting the desired content. However, it should also be noted that both DocBook and DITA have been applied to Mediawiki installations [3]. It should be possible to enable a more flexible re-purposing framework for ‘DocWiki’ moving into the future.
The ‘CIS DocWiki’ is meant to accompany the first release of Citizen Dan, likely by the end of summer. The MUNI ontology will also be released roughly at the same time. At release, the ‘CIS DocWiki’ is anticipated to have on the order of 500-800 baseline content and “how to” articles.
Depending on time availability and other commitments, Structured Dynamics will also be using this information to build a semantic government composite offering to MIKE2.0. We will be contributing this new offering for free, similar to what we have done earlier for a semantic enterprise offering.
Subsequent to those events, we will then be modifying the ‘CIS DocWiki’ for the semantic enterprise domain. Much of the necessary content will have already been assembled for the ‘CIS DocWiki’.
Paradoxically, while developing such knowledge bases and systems such as ‘DocWiki’ appears to be extra work, from our standpoint as developers it is useful and efficient. Structured Dynamics already researches and assembles much material and tries to “document as it goes.” Having the ‘DocWiki’ framework not only provides a consistent and coherent way to organize that information, but it also helps to point out essential gaps in our offerings.
The ‘DocWiki’ delivers the methods, documentation and portions of the structure to a total open solution. The ‘DocWiki’ is the primary means — along with software development and accompanying code-level and API documentation, of course — for us to fulfill our mantra that “We’re successful when we are not needed.” As we pointed out in Part 1 of this series, we really think such an attitude is ultimately a self-interested one. The better we can address the acceptance factors in the enterprise for our offerings, the more opportunities we will gain.
We would like to think that other enlightened open source software developers, especially those in the semantic space but certainly not limited to them, will see the wisdom of this four-legged foundation to total open solutions. Up until now, pragmatic guidance for what it takes to create a complete open source offering to businesses and enterprises has been lacking.
The tools, methods, and workflows all exist for making total open solutions real today. All of the pieces are themselves open source. There are many useful guides for best practices across the pipeline. It is just that — prior to this — no one apparently took the time to assemble and articulate them. We think this three-part series and some of the “how to” guidance in the ‘DocWiki’ system can help fix this oversight.
Ultimately, with wider adoption by developers, goaded in part by demands of the marketplace for them, we would hope that additional innovations and ideas may be forthcoming to improve the industry’s ability to offer total open source solutions. Adding just a small bit of attentive effort to how we organize and package what we know is but a small price to pay for greater acceptance and success.
In the first part of this series, we put forward the argument that incomplete provision of important support factors was limiting the adoption of open source software in the enterprise. We can liken the absence of these factors to having a chair with one or more absent or broken legs.
This second part of the series goes into the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we can term this a total open solution.
These considerations are not simply a matter of idle curiosity. New approaches and new methods are required for enterprises to modernize their IT systems while adding new capabilities and preserving sunk assets. Extending and modernizing existing IT is often not in the self-interests of the original supplying vendors. And enterprises are well aware that IT commitments can extend for decades.
While the benefits and capabilities of open source software become more apparent by the day, rates of open source software adoption lag in enterprises. We have seen entire Internet-based businesses arise and get huge in just a few short years. But it is the rare existing enterprise that has committed to and embraced similar Web-oriented architectures and IT strategies [1].
The enterprise IT ecosystem is evolving to become an unhealthy one. New software vendors have generally abandoned enterprises as a market. Much more action takes place with consumer apps and Internet plays, often premised on ad-based revenues or buzz and traffic as attractors for acquisition. Existing middle-tier enterprise vendors are themselves being gobbled up and disappearing. I’m sure all observers would agree that IT software and services are increasingly dominated by a shrinking slate of vendors. I suspect most observers — myself included — would argue that enterprise-based IT innovation is also on the wane.
The argument posed in the first part of this series is that such atrophy should not be unexpected. The current state of open source software is not addressing the realities of enterprise IT needs.
And that is where the other legs of the total open solution come in. In their entirety, they amount to a form of capacity building for the enterprise [2]. It is simply not enough to put forward buzzwords matched with open source software packages. Exciting innovations in social networks, collaboration, semantic enterprise, mobile apps, REST, Web-oriented architectures, information extraction, linked data and a hundred others are being validated on the Internet. But until the full spectrum of success and adoption factors gets addressed, enterprises will not embrace these new innovations as central to their business.
As we describe these four legs to the total open solution, we will sometimes point to our Citizen Dan initiative [3]. That is not because of some universal applicability of the system to the enterprise; indeed Citizen Dan is mostly targeted to local communities and municipalities. But, Citizen Dan does represent the first instance known to us where each of these total open solution success factors is being explicitly recognized and developed. We think the approach has some transferability to the broader enterprise.
Let’s now discuss these four legs in turn.
Of course, the genesis of this series is grounded in open source software and what it needs to do in order to find broader enterprise acceptance. Clearly that is the first leg amongst the four to be discussed. We also have acknowledged that, generally, best-of-breed open source software is also better documented at the code level, and has documented APIs. We will return to this topic under Leg Four below.
Open source software useful to the enterprise is often a combination of individual open source packages. Some successful vendors of open source to the enterprise in fact began as packagers and documenters of multiple packages. Red Hat for Linux, Alfresco in document management and Pentaho in business intelligence come to mind as examples.
In the case of Citizen Dan, here are the open source packages presently contained in its offering: Linux (Ubuntu), Apache, MySQL, PHP (these comprising the LAMP stack), Drupal, a variety of third-party Drupal modules, Virtuoso, Solr, ARC2, Smarty, Yahoo UI, TinyMCE, Axiis, Flex, ClearMaps, irON, conStruct, structWSF, and some others. Such combinations of packages are not unusual in open source settings, since new value-add typically comes from extensions to existing systems or unique ways to combine or package them. For example, the installation guide for structWSF alone is quite comprehensive with multiple configuration and test scripts.
Thus, besides direct software, it is also critical that configuration, settings, installation guidance and the like be addressed to enable relatively straightforward set-up. This is an area of frequent weakness. Targeting it directly is a not-so-secret factor for how some vendors have begun to achieve some success with the enterprise market.
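One small, concrete piece of such installation guidance is a preflight check that reports missing prerequisites before set-up begins. Below is a minimal Python sketch. The command list is a hypothetical example for a LAMP-style stack, not the actual prerequisite list of Citizen Dan or structWSF. The `which` parameter is injectable so that the check can be tested without a real system.

```python
import shutil
from typing import Callable, Optional

# Hypothetical prerequisite list for a LAMP-style stack; adjust per deployment.
REQUIRED_COMMANDS = ["apache2", "mysql", "php"]

def preflight(commands: list[str],
              which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the prerequisites that cannot be found on PATH (empty list = ready)."""
    return [cmd for cmd in commands if which(cmd) is None]

if __name__ == "__main__":
    missing = preflight(REQUIRED_COMMANDS)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

A fuller installer would go on to check versions and configuration values. The design point is simply that set-up failures surface early and with a clear message.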
All software works on data. While some data is unstructured (such as plain text) and some is semi-structured (such as HTML or Web pages that mix markup with text), the objective of information extraction or natural language processing is to extract the “structure” from such sources. Once extracted, such structure can interoperate on a common footing with the structured data common to standard databases.
Thus, we use “structure” to denote the concepts and their relationships (the “schema” or “ontology”) and the indicators and data (attributes and values) to describe them, and the “entities” (distinct individuals or nameable instances) that populate them. In other words, “structure” refers to all of the schema (concepts + relationships) + data + attributes + indicators + records that make up the information upon which software can operate.
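To make that definition a little more concrete, here is a minimal Python sketch of how the parts of “structure” named above (concepts, relationships, entities, and their attributes or indicator values) might be held together and then flattened into subject-predicate-object triples, the common form used by RDF-based tools. The `Structure` class, its fields, and the example names are hypothetical illustrations, not part of Citizen Dan, structWSF or the MUNI ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """Hypothetical container for the parts of "structure" described above."""
    concepts: set = field(default_factory=set)        # schema: concept names
    relationships: set = field(default_factory=set)   # schema: (concept, relation, concept)
    entities: dict = field(default_factory=dict)      # entity -> concept it instantiates
    attributes: dict = field(default_factory=dict)    # entity -> {attribute: value}

    def add_entity(self, entity: str, concept: str, **attrs) -> None:
        """Register an entity as an instance of a known concept, with attribute values."""
        if concept not in self.concepts:
            raise ValueError(f"unknown concept: {concept}")
        self.entities[entity] = concept
        self.attributes.setdefault(entity, {}).update(attrs)

    def triples(self) -> set:
        """Flatten schema, entities and attributes into (subject, predicate, object) triples."""
        out = set(self.relationships)
        for entity, concept in self.entities.items():
            out.add((entity, "type", concept))
            for attr, value in self.attributes.get(entity, {}).items():
                out.add((entity, attr, value))
        return out


if __name__ == "__main__":
    s = Structure(concepts={"City", "Neighborhood"},
                  relationships={("Neighborhood", "partOf", "City")})
    # Placeholder names and values for illustration only; not real data.
    s.add_entity("ExampleCity", "City", label="Example City")
    s.add_entity("Northside", "Neighborhood", indicatorValue=0.5)
    for t in sorted(s.triples(), key=str):
        print(t)
```

The point of the flattening step is that schema, records and attribute values all end up in one uniform shape, which is what lets extracted structure sit alongside conventional database data.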
Structure exists in many forms and serializations. Generally, software represents its internal information in one or a few canonical storage and manipulation formats, though that same software may also be able to import (ingest) or export its information and data in many different external formats.
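The split between one canonical internal form and many external serializations can be sketched in a few lines of Python. In this illustration (an assumption for clarity, not how any particular product works), tabular CSV input is ingested into entity-attribute-value triples, and the same triples are then exported either as JSON records or as a simple tab-separated line format. The function names and the toy line format are hypothetical; the line format is not N-Triples or any other standard serialization.

```python
import csv
import io
import json


def ingest_csv(text: str, id_column: str) -> list:
    """Import: turn each CSV row into (entity, attribute, value) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        entity = row[id_column]
        for attr, value in row.items():
            if attr != id_column:
                triples.append((entity, attr, value))
    return triples


def export_json(triples: list) -> str:
    """Export: regroup the triples into one JSON record per entity."""
    records = {}
    for entity, attr, value in triples:
        records.setdefault(entity, {})[attr] = value
    return json.dumps(records, sort_keys=True)


def export_lines(triples: list) -> str:
    """Export: one tab-separated triple per line (a toy format, not N-Triples)."""
    return "\n".join("\t".join(str(part) for part in t) for t in sorted(triples))


if __name__ == "__main__":
    # Placeholder input, for illustration only.
    data = "id,name,ward\nn1,Northside,3\nn2,Southside,5\n"
    canonical = ingest_csv(data, "id")
    print(export_json(canonical))
    print(export_lines(canonical))
```

Adding a new input or output format then means writing one more converter to or from the canonical triples, rather than one converter for every pair of formats.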
In our semantic enterprise work, especially with its premise in ontology-driven applications using adaptive ontologies, structure is an absolutely essential construct. But, frankly, no information technology system exists that does not also depend on structure to a greater or lesser extent.
The interplay between software and structure is one source of expertise that vendors guard closely and use to competitive advantage. In years past, proprietary software could partially hide the bases for performance or algorithmic advantages. Expert knowledge and intimate familiarity with these systems were the other basis for keeping these advantages closely held.
It is perhaps not too surprising, then, given this history, that the software industry places very little emphasis on, or discussion of, the interaction between software and structure. But, if software is being brought in as open source, where is the accompanying expertise or guidance for how data structure can be used to gain full advantage? The same acquired knowledge that, say, accompanied the growth of relational databases in such areas as schema development, materialized views or (de)normalization now needs to be made explicit and exposed for all sorts of open source systems.
In the realm of the semantic enterprise we are seeing attempts at this via open source ontologies and greater emphasis on APIs and documentation of same. Citizen Dan, for example, will be first publicly released with an accompanying MUNI ontology as a reference schema and starting point. Descriptions and methods for how to obtain indicator data and relevant attribute and entity information for the domain will also accompany it.
As open source software continues to emphasize semantics and interoperability, exemplar structures and best practices will need to be an essential part of the technology transfer. Just as the “secrets” of much software began to be opened up via open source, so too must the locked-up expertise of experts and practitioners in how to effectively structure data be exposed.
The need for structure explication and guidance is but one unique slice of a much broader need to expose methods and best practices surrounding a given information management initiative. The reason that any open source software might be adopted in the first place is based on the hope for some improved information management process.
Recently I have been touting MIKE2.0, the first open source, replicable and extensible framework for organizing and managing information in the enterprise. MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. It can be applied to any type of information development.
MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
MIKE2.0 and its forthcoming extensions, one of which we have developed for the semantic enterprise and are now extending into the semantic government in the context of Citizen Dan, are exciting because they provide a systematic approach and guidance for how (and for what!) to document new projects and initiatives. What MIKE2.0 represents is the first time that the embedded, proprietary expertise of traditional IT consultants has been exposed for broader use and extension.
The real premise behind any approach like MIKE2.0 or variants is to codify the expertise and knowledge that was previously locked up by experts and practitioners. The framework in MIKE2.0 provides a structure by which knowledge bases of background information can be assembled to accompany an open source project. This structure extends from initial evaluation and design all the way through operation and end of life.
The ‘CIS DocWiki’ that is being developed to accompany Citizen Dan is such an example of a MIKE2.0-informed knowledge base. At present, the CIS DocWiki has more than 300 specific articles useful to community indicator systems for local governments, and a complete deployment and maintenance methodology. By public release, it will likely be 2-3 times that size. All of this will be downloadable and installable as a wiki, and as open source content, ready for branding and modification for any local circumstance. CIS DocWiki is a natural methods and documentation complement to the Citizen Dan software and its MUNI structure. Release is scheduled for summer.
As we will discuss in Part 3 of this series, we are combining a MIKE2.0 organizational approach with a documentation and single-source publication platform to fulfill the method and documentary aspects of projects. It was really through the advantages gained by the combination of these pieces that we began to see the inadequacy of many current open source projects for the enterprise.
This series began in part with a recognition that superior open source projects are often the better documented ones. But, even there, documentation is often restricted to code-level documentation or perhaps APIs.
As the material above suggests, documentation needs to extend well beyond software. We need documentation of structure, methods, best practices, use cases, background information, deployment and management, and changing needs over the lifetime of the system. And, as we have also seen in Part 1, the lifetime of that system might be measured in decades.
Documentation is no equal to paid partners and their expertise. But, documentation can be cheaper, and if that documentation is sufficient, might be a means for changing the equation in how IT projects are solicited, acquired and managed.
Today, enterprises appear to be stuck between two difficult choices: 1) the traditional vendor lock-in approach with high costs and low innovation; or 2) open source with minimal documentation and vendor knowledge and little assurance of support longevity.
These trade-offs look pretty unpalatable.
Documentation alone, even as extended into the other legs of the solution, is not prima facie going to be a deal maker. But, its absence, I submit, is a deal breaker. Just as open source itself has taken some years to build basic comfort in the enterprise, so too a concerted attack on all acceptance factors may be necessary before actual wide adoption occurs.
We hope the ‘CIS DocWiki’ platform noted for Citizen Dan will be an exemplar for this combination of documentation and methodology. It is a single-source publishing platform that allows the entire knowledge base behind a given IT initiative to be used for collaboration, operational, training or collateral purposes. And all of this is based on open source software.
Software vendors need to recognize these documentation factors and build their ventures for success. Yes, writing code and producing software is a lot more fun and rewarding than (yeech) documentation. But, unless our current generation of vendors that is committed to open source and its benefits takes its markets seriously — and thus commits to the serious efforts these markets demand — we will continue to see minimal uptake of open source in the enterprise.
Each of these four legs of a total open solution can interact with and reinforce the other parts. Once one begins to see the problem of open source adoption in the enterprise as a holistic one, a new systems-level perspective emerges.
Enterprises know full well that software is only one means to address an information management problem, and only a first step at that. Traditional vendors to the enterprise also understand this, which is why through their embedded systems and built-up expertise they have been able to perpetuate what often amounts to a monopoly position.
Pressures are building for an earthquake in the IT landscape. Enterprises are on an anvil of global competition and limited resources. Existing IT systems are not up to the task, yet are too expensive and embedded to abandon. Traditional vendors have near monopoly positions and little incentive to innovate. New software vendors don’t have the expertise and gravitas to handle enterprise-scale challenges. Meanwhile, the rest of the globe is leapfrogging embedded systems with agile, Web-based systems.
The true innovation that is occurring is all based around open source, nurtured by the global computing platform of the Internet, and fueled by countless individuals able to compete on downward-spiraling cost bases. But on so many levels, open source as presently constituted either fails or poses too many risks to the commercial enterprise.
The Internet itself was the basis of a paradigm shift, but I think we are only now seeing its manifestation at the enterprise level. We are also now seeing global reordering and changes of the economic order. How will companies respond? How will their IT systems adapt? And what will new vendors need to do and recognize in order to thrive in this changing environment?
I’m not sure I have found the language or rhetoric to convey what I see coming, and coming soon. I know open source is part of it; I know enterprises need it; and I know what is presently being offered does not meet the test.
As I noted in our first part, the mantra that we use in Structured Dynamics to express this challenge is, “We’re Successful When We’re Not Needed”. I think the essence behind this statement is that premises of dependency or proprietary advantage will not survive the jet streams of change that are blowing away the old order.
Sound like too much hyperbole? Actually, my own gut feeling is that it is not nearly enough.
In any case, windy rhetoric always falls short if there are no actionable next steps. In these first two parts of this series, I have tried to present the ingredients that need to go into the cake. In the third part I try to offer a new, and complementary, open source means for bringing stability to the foundation.
In all cases, though, I think these challenges are permanent ones and do not lend themselves to facile solutions. Four legs, or seven foundations, or twelve steps are all just too simplistic for dealing with the global and complex tsunamis blowing away the old order.
One really does not need to lick a finger to sense the direction of these winds of change. It is coming, and coming hard, and all of it is from the direction of open source. What enterprises do, and what the vendors who want to serve them do, is perhaps less clear. I think open source offers a way out of the box in which enterprise IT is currently stuck. But, at present, I also think that most open source options do not have the necessary legs to stand on.

Structured Dynamics has been engaged in open source software development for some time. Inevitably in each of our engagements we are asked about the viability of open source software, its longevity, and what the business model is behind it. Of course, I appreciate our customers seemingly asking about how we are doing and how successful we are. But I suspect there is more behind this questioning than simply good will for our prospects.
Besides the general fact that most of us know (of hundreds of thousands of open source projects, only a minuscule number get traction), I think there are broader undercurrents in these questions. Being open source, even with good code documentation, is not enough to ensure long-term success.
When open source broke on the scene a decade or so ago [1], the first enterprise concerns were based around code quality and possible “enterprise-level” risks: security, scalability, and the fact that much open source was itself LAMP-based. As comfort grew with the major open source foundations (Linux, MySQL, Apache, and the scripting languages PHP, Perl and Python; that is, the very building blocks of the LAMP stack), concerns shifted to licensing and the possible “viral” effects of some licenses to compromise existing proprietary systems.
Today, of course, we see hugely successful open source projects in all conceivable venues. Granted, most open source projects get very little traction. Only a few standouts from the hundreds of thousands of open source projects on big venues like SourceForge and Google Code or their smaller brethren are used or known. But, still, in virtually every domain or application area, there are 2-3 standouts that get the lion’s share of attention, downloads and use.
I think it fair to argue that well-documented open source code generally out-competes poorly documented code. In most circumstances, well-documented open source is a contributor to the virtuous circle of community input and effort. Indeed, it is a truism that most open source projects have very few code committers. If there is a big community, it is largely devoted to documentation and assistance to newbies on various forums.
We see some successful open source projects, many paradoxically backed by venture capital, that employ the “package and document” strategy. Here, existing open source pieces are cobbled together as more easily installed comprehensive applications with closer to professional grade documentation and support. Examples like Alfresco or Pentaho come to mind. A related strategy is the “keystone” one where platform players such as Drupal, WordPress, Joomla or the like offer plug-in architectures and established user bases to attract legions of third-party developers [2].
I think if we stand back and look at this trajectory we can see where it is pointing. And, where it is pointing also helps define what the success factors for open source may be moving forward.
Two decades ago most large software vendors made on average 75% to 80% of their revenues from software licenses and maintenance fees; quite the opposite is true today [3]. The successful vendors have moved into consulting and services. One need only look to three of the largest providers of enterprise software of the past two decades (IBM, Oracle and HP) to see evidence of this trend.
How is it that proprietary software with its 15% to 20% or more annual maintenance fees has been so smoothly and profitably replaced with services?
These suppliers are experienced hands in the enterprise and know what any seasoned IT manager knows: the total lifecycle costs of software and IT reside in maintenance, training, uptime and adaptation. Once installed and deployed, these systems assume a life of their own, with actual use lifetimes that can approach two to three decades.
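The lifecycle point can be checked with back-of-envelope arithmetic. Assuming (our simplification) a flat maintenance fee of 15% to 20% of the license price per year, no inflation or discounting, and use lifetimes of two to three decades, cumulative maintenance alone comes to 3 to 6 times the original license (15% × 20 years = 3; 20% × 30 years = 6). The hypothetical helper functions below compute the total-spend multiple and maintenance's share of it.

```python
def lifecycle_multiple(annual_rate: float, years: int) -> float:
    """Total spend (license + maintenance) as a multiple of the license fee.

    Assumes a flat annual maintenance fee of `annual_rate` times the license
    price, paid every year of use, with no inflation or discounting.
    """
    return 1.0 + annual_rate * years


def maintenance_share(annual_rate: float, years: int) -> float:
    """Fraction of total spend that goes to maintenance under the same assumptions."""
    total = lifecycle_multiple(annual_rate, years)
    return (total - 1.0) / total


if __name__ == "__main__":
    for rate in (0.15, 0.20):
        for years in (20, 30):
            print(f"{rate:.0%} over {years} years: "
                  f"{lifecycle_multiple(rate, years):.1f}x license, "
                  f"{maintenance_share(rate, years):.0%} maintenance")
```

Under these assumptions the initial license is only about one-seventh to one-quarter of total spend on the software, before counting training, uptime and adaptation.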
This reality is, in part, behind my standard exhortation about respecting and leveraging existing IT assets, and why Structured Dynamics has such a commitment to semantic technology deployment in the enterprise that is layered onto existing systems. But, this very same truism can also bring insight into the factors that make open source acceptable (or not) to the enterprise.
Great code, even if well documented, is not by itself the better mousetrap that has the world beating a path to your door. Listen to the enterprise: lifecycle costs and longevity of use are facts.
But what I am saying here is not really all that earthshaking. These truths are available to anyone with some experience. What is possibly galling to enterprises are two smug positions of new market entrants. The first, which is really naïve, is the moral superiority of open source or open data or any other such silly, artificial distinction. That might work in the halls of academia, but carries no water with the enterprise. The second, more cynically based, is to wrap one’s business in the patina of open source while engaging in the “wink-wink” knowledge that only the developer of that open source is in a position to offer longer term support.
Enterprises are not stupid and understand this. So, what IT manager or CIO is going to bet their future software infrastructure on a start-up with immature code, generally poor code documentation or APIs, and definitely no clear clue about their business?
Yet, that being said, neither enterprises nor vendors nor software innovators that want to work with them can escape the inexorable force of open source. While it has many guises from cloud computing to social software or software as a service or a hundred other terms, the slow squeeze is happening. Big vendors know this; that is why there has been the rush to services. Start-up vendors see this; that is why most have turned to consumer apps and ad-based revenue models. And enterprises know this, which is why most are doing nothing other than treading water because the way out of the squeeze is not apparent.
The purpose of this three-part series is to look at these issues from many angles. What might the absolute pervasiveness of open source mean to traditional IT functions? How can strategic and meaningful change be effected via these new IT realities in the enterprise? And, how can software developers and vendors desirous of engaging in large-scale initiatives with enterprises find meaningful business models?

And, after we answer those questions, we will rest for a day.
But, no, seriously, these are serious questions.
There is no doubt open source is here to stay, yet its maturity demands new thinking and perspectives. Just as enterprises have known that software is only the beginning of decades-long IT commitments and (sometimes) headaches, the purveyors and users of open source should recognize the acceptance factors facing broad enterprise adoption and reliance.
Open source offers the wonderful prospect of avoiding vendor “lock-in”. But, if the full spectrum of software use and adoption is not covered as well, all we have done is to unlock the initial selection and installation of the software. Where do we turn for modifications? for updates? for integration with other packages? for ongoing training and maintenance? And, whatever we do, have we done so by making bets on some ephemeral start-up? (We know how IBM will answer that question.)
The first generation of open source has been a substitute for upfront proprietary licenses. After that, support has been a roll of the dice. Sure, broadly accepted open source software provides some solace because of more players and more attention, but how does this square with the prospect of decades of need?
The perverse reality in these questions is that almost all early open source vendors are being gobbled up or co-opted by the existing big vendors. The reward of successful market entry is often a great sucking sound to perpetuate existing concentrations of market presence. In the end, how are enterprises benefiting?
Now, on the face of it, I think it neither positive nor negative whether an early open source firm with some initial traction is gobbled up by a big player or not. After all, small fish tend to be eaten by big fish.
But two real questions arise in my mind: One, how does this gobbling fix the current dysfunction of enterprise IT? And, two, what is a poor new open source vendor to do?
The answer to these questions resides in the concerns and anxieties that caused them to be raised in the first place. Enterprises don’t like “lock-in” but like even less seeing stranded investments. For open source to be successful it needs to adopt a strategy that actively extends its traditional basis in open code. It needs to embrace complete documentation, provision of the methods and systems necessary for independent maintenance, and total lifecycle commitments. In short, open source needs to transition from code to systems.
We call this approach the total open solution. It involves — in addition to the software, of course — recipes, methods, and complete documentation useful for full-life deployments. So, vendors, do you want to be an enterprise player with open source? Then, embrace the full spectrum of realities that face the enterprise.
The actual mantra that we use to express this challenge is, “We’re Successful When We’re Not Needed”. This simple mental image helps define gaps and tells us what we need to do moving forward.
The basic premise is that any taint of lock-in or not being attentive to the enterprise customer is a potential point of failure. If we can see and avoid those points and put in place systems or whatever to overcome them, then we have increased comfort in our open source offerings.
Like good open source software, this is ultimately a self-interested position to take. If we can increase the marketplace’s comfort that customers can adopt and sustain our efforts without us, they will adopt them to a greater degree. And, once adopted, and when extensions or new capabilities are needed, then as initial developers with a complete grasp on the entire lifecycle challenges we become a natural possible hire. Granted, that hiring is by no means guaranteed. In fact, we benefit when there are many able players available.
In the remaining two parts of this series we will discuss all of the components that make up a total open solution and present a collaboration platform for delivering the methods and documentation portions. We’re pretty sure we don’t yet have it fully right. But, we’re also pretty sure we don’t have it wrong.