A few weeks back I completed a three-part introductory series to what Structured Dynamics calls a ‘total open solution‘. A total open solution as we defined it is comprised of software, structure, methods and documentation. When provided in toto, these components provide all of the necessary parts for an organization to adopt new open source solutions on its own (or with the choice of its own consultants and contractors). A total open solution fulfills SD’s mantra that, “We’re successful when we’re not needed.”
Two of the four legs to this total open solution are provided by documentation and methods. These two parts can be seen as a knowledge base that instructs users on how to select, install, maintain and manage the solution at hand.
Today, SD is releasing publicly for the first time two complementary knowledge bases for these purposes: TechWiki, which is the technical and software documentation complement, in this case based around SD’s Open Semantic Framework and its associated open source software projects; and DocWiki, the process methodology and project management complement that extends this basis, in this case based around the Citizen Dan local community open data appliance.
All of the software supporting these initiatives is open source. And, all of the content in the knowledge bases is freely available under a Creative Commons 3.0 license with attribution.
In setting out the design of these knowledge bases, our mindset was to enable single-point authoring of document content, while promoting easy collaboration and rollback of versions. Thus, the design objectives became:
Assuming these objectives could be met, we then had three other objectives on our wish list:
Our initial investigations looked at conventional content and document management systems, matched with version control systems or SVNs. Somewhat surprisingly, though, we found the Mediawiki platform to fulfill all of our objectives. Mediawiki, as detailed below, has evolved to become a very mature and capable documentation platform.
While most of us know Mediawiki as a kind of organic authoring and content platform — as it is used on Wikipedia and many other leading wikis — we also found it perfect for our specific knowledge base purposes. To our knowledge, no one has yet set up and deployed Mediawiki in the specific pre-packaged knowledge base manner as described herein.
TechWiki is a Mediawiki instance designed to support the collaborative creation of technical knowledge bases. The TechWiki design is specifically geared to produce high-quality, comprehensive technical documentation associated with the OpenStructs open source software. This knowledge base is meant to be the go-to source for any and all documentation for the codes, and includes information regarding:
As of today, TechWiki contains 187 articles under 56 categories, with a further 293 images. The knowledge base is growing daily.
DocWiki is a sibling Mediawiki instance that contains all TechWiki material, but has a broader purpose. Its role is to be a complete knowledge base for a given installation of an Open Semantic Framework (in the current case, Citizen Dan). As such, it needs to include much of the technical information in the TechWiki, but also extends that in the following areas:
The methodology portions of the DocWiki are drawn from the broader MIKE2.0 (Method for Integrated Knowledge Environments) approach. I have previously written about this open source methodology championed by Bearing Point and Deloitte.
As of today, DocWiki contains 357 articles and 394 structured tasks in 70 activity areas under 77 categories. Another 115 images support this content. This knowledge base, too, is growing daily.
Both of these knowledge bases are open source and may be exported and installed locally. Then, users may revise and modify and extend that pre-packaged information in any way they see fit.
The basic design of these systems is geared to collaboration and embeds what we think are really responsive work flows. These extend from supporting initial idea noodling to full-blown public documentation. The inherent design of the system also supports single-source publishing and book or PDF creation from the material that is there. Here is the basic overview of the design:
Mediawiki provides the standard authoring and collaboration environment. There are a choice of editing methods. As content is created, it is organized in a standard way and stored in the knowledge base. The Mediawiki API supports the export of information in either XHTML or XML, which in turn allows the information to be used in external apps (including other Mediawiki instances) or for various single-source publication purposes. The Collection extension is one means by which PDFs or even entire books (that is, multi-page documents with potentially chapters, etc.) may be created. Use of a well-designed CSS ensures that outputs can be readily styled and themed for different purposes or audiences.
As wikis designed from the get-go to be reusable, and then downloaded and installed locally, it is important that we maintain quality and consistency across content. (After download, users are free to do with it as they wish, but it is important the initial database be clean and coherent.) The overall interaction with the content thus occurs via one of three levels: 1) simple reading, which is publicly available without limitation to any visitor, including source inspection and export; 2) editing and authoring, which is limited to approved contributors; and 3) draft authoring and noodling, which is limited to the group in #2 but for which the in-progress content is not publicly viewable. Built-in access rights in the system enable these distinctions.
Besides meeting all of the objectives noted at the opening of this post, these wikis (knowledge bases) also have these specific features:
Many of these features come from the standard extensions in the TechWiki/DocWiki packages.
The net benefits from this design are easily shared and modified knowledge bases that users and organizations may either contribute to for the broader benefit of the OpenStructs community, or download and install with simple modifications for local use and extension. There is actually no new software in this approach, just proper attention to packaging, design, standardization and workflow.
Via the sharing of extensions, categories and CSS, it is quite easy to have multiple instances or authoring environments in this design. For Structured Dynamics, that begins with our own internal wiki. Many notes are taken and collected there, some of a proprietary nature and the majority not intended or suitable for seeing public release.
Content that has developed to the point of release, however, can be simply tagged using conventions in the workflow. Then, with a single Export command, the relevant content is then sent to an XML file. (This document can itself be edited, such as for example changing all ‘TechWiki’ references to something like ‘My Content Site’; see further here.)
Depending on the nature of the content, this exported content may then be imported with a single Import command to either the TechWiki or DocWiki sites. (Note: Import does require admin rights.) A simple migration may also occur from the TechWiki to the DocWiki. Also, of course, initial authoring may begin at any of the sites, with collaborators an explicit feature of the TechWiki or DocWiki versions.
Any DocWiki can also be specifically configured for different domains and instance types. In terms of our current example, we are using Citizen Dan, but that could be any such Open Semantic Framework instance type:
Under this design, then, the workflow suggests that technical content authoring and revision take place within the TechWiki, process and methodology revision in the DocWiki. Moreover, most DocWikis are likely to be installed locally, such that once installed, their own content would likely morph into local methods and steps.
So long as page titles are kept the same, newer information can be updated on any target wiki at any time. Prior versions are kept in the version history and can be reinstated. Alternatively, if local content is clearly diverging yet updates of initial source material is still desired, the local content need only be saved under a new title to preserve it from import overwrites.
We are really excited by this design and have already seen benefits in our own internal work and documentation. We see, for example, easier management of documentation and content, permanent (canonical) URLs for specific content items, and greater consistency and common language across all projects and documentation. Also, when all documentation is consolidated into one point with a coherent organizational and category structure, documentation gaps and inconsistencies also become apparent and can readily be fixed.
Now, with the release of these systems to the OpenStructs (Open Semantic Framework) and Citizen Dan communities, we hope to see broader contributions and expansion of the content. We encourage you to check on these two sites periodically to see how the content volume continues to grow! And, we welcome all project contributors to join in and help expand these knowledge bases!
We think this general design and approach — especially in relation to a total open solution mindset — has much to recommend it for other open source projects. We think these systems, now that we have designed and worked out the workflows, are amazingly simple to set up and maintain. We welcome other projects to adopt this approach for their own. Let us know if we can be of assistance, and we welcome ideas for improvement!
OK. So, you’re looking at your garage … or your bedroom closet … or your office and its files. They are a mess, and you can’t find anything and you can’t stuff anything more into the nooks, cubbies, crannies or cabinets. What do you do?
Well, when you finally get fed up and have a rainy day or some other excuse, you tackle the mess. Maybe you grab a big mug of coffee to prepare for the pending battle. Maybe you strip down to comfort clothes. Then, if you’re like me, you begin to organize stuff into piles. Labeled piles and throwaway piles and any other piles that can provide a means to start bringing order to the chaos.
In the semantic Web world, there is a phrase coined by Jim Hendler that captures this approach: A little semantics goes a long way . A little semantics, just like your labeled piles, helps to bring order to information chaos.
Mind you, this is not fancy or expensive stuff. In the case of my office, it is colored sheets of paper labeled with Magic Markers as “Taxes” or “Internal” or “Blog Posts” or whatever. Then, I begin sifting and distributing. In the case of the semantic world, these are classifying things into like categories and simply relating them to other categories with simple relationships, such as “is Part Of” or “is Narrower Than”.
Of course, I could have approached my mess in a different way. I could have hired an efficiency expert to come in, interview me and all of my employees and colleagues, gotten a written analysis and report, and then committed to a multi-week project to completely store and place every single last piece of paper in my office or organize every rake and set of abandoned golf clubs in my garage. When done, I would have shelled out much money and I suspect still not have been able to find anything.
Sort of sounds like the traditional way IT does its business, doesn’t it? To clean up their information messes, enterprises need to find a better strategy.
I’m not too long from having returned from the SemTech conference, which overall was quite an excellent show. But despite its emphasis on semantic technologies and their usefulness to businesses and enterprises, I found one critical theme unspoken: the ability of semantic approaches to change how enterprise IT actually does business. New ways have got to be found to clean up the many and growing information piles emerging all around us.
IT is — and has been — going through a fundamental set of changes for decades. In the last decade, these changes have led to lowered relative spending, a shift in spending priorities toward services, less innovation, and less productivity. Some data and observations by researchers and analysts document these trends.
The following chart, using US Bureau of Economic Analysis data , shows the clear 50-year trend in declining hardware costs for enterprises, mostly resulting from the observation known as Moore’s Law. These massive hardware cost reductions (logarithmic scale) have also resulted in lower prices for IT as a whole. In 2008, for example, total relative IT prices were about two-thirds what they were a mere decade earlier:
In contrast, relative prices for software and services have remained remarkably flat over this entire period, including for the past decade. This is somewhat surprising given the emergence of packaged software and more recently open source. However, relative percentage expenditures for custom software and software developed in-house have also remained strong over the past decade .
The mid- to late-1990s represented the high-water mark on many bases for enterprise IT, expenditures and vendors. Roughly in 1997 or so, the number of public enterprise software vendors peaked as did venture funding  and relative expenditures for IT in relation to GDP. There was a major uptick in relation to preparing for Y2K and a major downtick due to the dot-com bubble, and then of course the past two years or so have seen a global economic downturn. But, as the figure below shows (red), the long-term trend tends to suggest a relative plateau for IT expenditures in relation to GDP somewhat around 2000:
Yet, like the first chart, software seems to be bucking this trend (blue lines above). Though perhaps the rate of growth in expenditures for software is slowing a bit, it is still on a growth upslope, especially in relation to overall IT expenditures. The next chart, in fact, specifically compares software expenditures to total IT expenditures. Software expenditures are some 40% higher in relation to total IT than they were a mere decade ago:
The mix of these software expenditures is also changing in major ways while stagnating in others.
The changing aspect is coming about from the shift of expenditures from license and maintenance fees to services. A number of software vendors began to see revenues from services overcome that from licensing in the 1990s. By the early 2000s, this was true for the enterprise software sector as a whole . Today, service revenues account for 70% or so of aggregate sector revenues. Combined with the emergence of open source and other alternatives such as software as a service (SaaS), I think it fair to say that the era of proprietary software with exceedingly high margins from monopoly rents is over .
The stagnating aspect occurs in how the software expenditures are applied. According to Gartner, in the US, more than 70% of IT expenditures are devoted to simply running existing systems, with only about 11% of budgets devoted to innovation; other parts of the world spend nearly double on innovation and much lower for operations . This relative lack of support for innovation and high percentages for running existing systems has held true for about a decade. Meanwhile, IT’s contribution to US productivity has been declining since 2001 .
Last year, PricewaterhouseCoopers published a major report with the provocative title, “Why Isn’t IT Spending Creating More Value?” . The 42-page report covered many of the aspects above. Among other factors, the PWC authors speculated that:
I suppose one could add to this litany other factors such as the growth and emergence of the Internet, sector consolidations through mergers and acquisitions, the rise of open source and alternatives such as SaaS, etc.
But which of these are causes? Which are symptoms? And which might only be consequences or coincident?
To be sure, all recognize the explosion of digital data and information, with sources and formats springing up faster than Whack-a-Mole. It is such an evident and ubiquitous phenomenon that pointing to it as a cause appears on the face of it quite obvious. Also obvious is that these new sources carry with them a diversity of systems and tools. While not categorically stated as such, it appears that PWC fingers the difficulties of “cobbling” these systems together as the root cause for low productivity and thus the IT cost crisis.
I agree totally that these are symptoms of what we see in IT’s current circumstance. I would even say these factors are a proximate cause to these ills. But I disagree they are the root cause. To discover that root, I believe, we must look deeper to mindset and assumptions.
There are some phenomena that are so obvious that they are easily missed. Not seeing your fingertip six inches between your eyes is one of these. We aren’t used to focusing on things so near at hand.
So, let’s look for a moment at the closed world assumption (CWA), a key underpinning to most standard relational data systems and enterprise schema and logics. CWA is the logic assumption that what is not currently known to be true, is false. If CWA is not directly familiar to you that is understandable; it is an implied assumption of these systems and logics. As such, it is not often inspected directly and therefore not often questioned .
With regard to standard IT systems, the closed world assumption has two important aspects:
On the face of them, these assumptions seem tame enough. And, indeed, there are some enterprise data systems that absolutely rely on them for efficient processing and completion times, such as most transaction systems. CWA is absolutely the appropriate design for such applications.
However, for knowledge management or representation applications — that is, applications which involve combining or using heterogeneous data or information from multiple data sources, which are exactly the same sources requiring information “cobbling” noted above by PWC — there are two very critical implications of the closed-world assumption (CWA):
The net effect, which I have argued before, most notably in a major piece about the open world assumption , is that typical projects with a knowledge management aspect have become costly, take very long to complete, often fail, and require much planning and coordination. These facts have been true for three decades as enterprises have attempted to extract knowledge from their electronic information using closed world approaches based on relational systems. And, as recognized by PWC, these problems are only getting worse with growth in diversity and scope of systems.
The implications of closed world v. open world approaches are absolutely at the root of the causes leading to declining productivity, low innovation, significant failures and increasing costs — all exacerbated with more data and more systems — now characterizing traditional enterprise IT. Moreover, it is not a problem for open world systems to link to and incorporate closed world approaches. With open world, there is no need for Hobson’s choices. Unfortunately, such is not true when one begins with a closed world premise.
As best as I can tell, Alon Halevy was the first to use the phrase “pay as you go” in 2006 to describe the incremental aspect of the open world approach in relation to the semantic Web . The “pay as you go” phrase had been applied earlier to data management and storage and had also been used to describe phone calling plans.
Incremental concepts and “agility” have been popular topics for the past five to ten years in IT, most often related to software development. And, while “incremental” sounds good in relation to enterprise projects, especially of a knowledge management or information integration/federation nature, the actual methodologies put forward were anything but incremental in their conceptual underpinnings.
Unfortunately, the “pay as you go” phrase has (and still is) largely confined to incremental, open world approaches involving the semantic Web. How this approach might apply and benefit enterprises has yet to be articulated. Nonetheless, I like the phrase, and I think it evokes the right mindset. In fact, I think with linked data and many other aspects of the current semantic Web we are seeing such approaches come to fruition. Inch-by-inch, brick-by-brick, data on the Web is getting exposed and interlinked. “Pay as you go” is incremental, and that is good.
Yet the idea of “pay as you benefit” is more purposeful, able to be planned and implemented, and founded on standard enterprise cost-benefit principles. I think it is a better (and more nuanced) expression of the “pay as you go” mindset in an enterprise setting. What it means is you can start small and be incomplete. You can target any domain or department or scope that is most useful and illustrative for your organization. You can deploy your first stand-ups as proofs-of-concept or sandboxes. And, you can build on each prior step with each subsequent one.
One of the reasons we (Structured Dynamics) embraced the MIKE2.0 methodology  was its inherent incremental character. (Government deployments often call them “spirals”.) In general, the five phases of MIKE2.0 can be represented as follows:
It is specifically during the fifth phase, testing and improvement, that quantitative and qualitative benefits from the current increment are calculated and documented. This evolving methodology is where the enterprise can assess the results of its prior investment and scope and budget for the next one. These can be quick, rapid increments, or more involved ones, depending on the schedule, prior results and risk profile of the enterprise (or department) at that time.
Much is made of “incremental” or “agile” deployments within enterprises, but the nature of the traditional data system (and its closed world assumption) can act to undermine these laudable steps. The inherent nature of an open world approach, matched with methodologies and best practices, can work wonderfully with KM-related projects.
We see in our current IT circumstances a number of embedded practices and assumptions. We have been assuming control and completeness — the closed world opposite to the open world approach. We have thus embraced and promoted “global” or enterprise-wide solutions: be they desktop operating systems or browsers or expensive enterprise-level proprietary software solutions. This scope leads to immense hurdle rates and risks: we better get our choices right up front, because if we don’t, the department or enterprise are at risk. We have an inward focus about our own resources, our own networks, our own systems. Meanwhile, when we look outward, we wonder how all of these new Web companies can grow and expand so rapidly in comparison to us.
Clearly, we are seeing shifts to more services than products, more open source, more outsourcing, and more software as a service. Yet, because of the legacy of decades-long commitments from prior IT investment and the failures of many hyped “solutions” such as ERP or BI or data warehousing or a dozen others, we also see a decline and a reluctance for IT to embrace new and transforming approaches. Our prior choices were practically tantamount to “betting the enterprise.” What if our new approaches fail as so many of their predecessors did? In a demanding, competitive environment can we afford to make such wrong choices again with such immense implications?
Yet, now that information technology is a given, it only seems natural that its role becomes an integral part of the enterprise, and not a special function. Like procurement, IT has matured to become a support function. Businesses should not succeed or fail based on the types of pencils and paper stock they use; so should they not depend on the software support choices that IT makes. Enterprises are now past the need to get “computerized”; they are thoroughly so. But our understanding of IT’s role and position has not evolved with its own success.
The first whiffs of these challenges to IT’s initial hegemony came from the departmental introduction of PCs and local networks in the early 1980s. It has continued with desktop software, spreadsheets and Web portals and sites. Large, mature companies awoke in horror in the last decade to discover they had hundreds — sometimes thousands — of Web sites and content dissemination points over which IT had little or no control. Such is the nature of entropy, and it is a fact for any organization of any size.
So, now, with strategies such as “pay as you benefit,” there is no longer an excuse not to innovate. There is not a justification to put off testing and discovering benefits that the open world and semantic approaches can bring to your organization. There is now a basis to make the case and set the affordable budgets within desirable timelines for becoming a semantic enterprise.
Mindsets and expectations do require some adjustment. For example, not everything will be known or modeled in early phases. But, is that also not true in any “real” real world? We’re not talking high-throughput transaction systems here, but beginning to pull together and link the information that is important to your organization strategically.
Remember the intro statement that “a little semantics goes a long way”? Well, that truth — and it is true — when combined with incremental deployment firmly tied to demonstrable results, promises quite simply a different way to do business. Never before have enterprises had working and winnable approaches such as this to test and innovate and learn and discover. Jump on in; the water is clear and warm.
And, oh, as to that mess in your closet or garage? Well, if you adhere to CWA, you will need to define a place for everything to go before you can start cleaning things up. I say: forget those false hurdles. If you’d really want to make a dent in the mess, grab a broom and start cleaning.
In the first part to this series, we began with the argument that open source software alone was not sufficient to meet the required acceptance factors in the enterprise. As a guiding way to create the right mindset around these issues we shared the saying that we have adopted at Structured Dynamics that, “We’re successful when we are not needed.”
In the second part of this series we described the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we termed this a total open solution.
Now, in this third and concluding part to our series, we introduce the open source documentation and methodology system called ‘DocWiki’. It complements the base open source software, in the process completing the conditions for a total open solution.
Though we call this system ‘DocWiki’, it is not meant to be a brand or particular product description for what Structured Dynamics is offering. Rather, ‘DocWiki’ is merely a placeholder name for a generic, open source system and knowledge base that can be downloaded, installed, branded, modified and extended in whatever way the user sees fit. ‘DocWiki’ is a baseline documentation and methodology “starter kit” that can be dressed up in new clothes or packaged and named in whatever manner best suited to a given deployment.
In describing the major components of this ‘DocWiki’ system we will again use our Citizen Dan initiative  as we did in Part 2. This gives us a real use case, though the same approach is applicable to any open source information management initiative by enterprises.
We call the specific version of the ‘DocWiki’ used in the case of Citizen Dan the ‘CIS DocWiki‘ (for community indicator systems), specific to the domain and local government focus of Citizen Dan. Similarly, the structured vocabulary and ontology that guides the system is the MUNI ontology. For other information development initiatives, the specific content of these components would be swapped out for ones appropriate to that initiative.
A number of desires and objectives intersected to guide the design of the ‘DocWiki’ system. We wanted:
In first formulating this design, our assumption was the major building blocks would be an open source document management system linked with some form of version control. Though we think such a formulation could work OK, our exposure to the MIKE2.0 methodology actually caused us to re-look at and re-think a wiki-based approach. Ultimately the trump card that decided the design for us was familiarity and ease-of-use.
The resulting architecture of the full ‘DocWiki’ system is shown below:
What is cool about this design is that a single software download install with a few extensions (Mediawiki, the Wikipedia software, plus some standard extensions and judicious use of Semantic Mediawiki) and a single loadable database are all that is required to transfer and install the ‘DocWiki’ system.
To better describe this system, we will focus on three major interconnecting pieces in this architectural diagram: the knowledge base; the vocabulary and structure (ontology); and the authoring and publishing system (wiki).
The pre-loaded content for the ‘DocWiki’ system comes from its knowledge base. This is provided as a text-exported MySQL database that can be modified en masse before loading (such as substituting ‘YourName’ for ‘DocWiki’). The exemplar upon which this knowledge base is modeled is the MIKE2.0 framework.
MIKE2.0 (Method for an Integrated Knowledge Environment ) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
MIKE2.0 has a generalized methodology and set of templates applicable to initiatives, the phases, activities and tasks to undertake them, and supporting assets. Supporting assets can range from glossaries and definition of terms and concepts to very specific technical documents or background material. The entire system is logical and applies a consistent design and organizational structure and categories.
For our purposes, we wanted a complete, turnkey content knowledge base. This meant that we needed to accommodate all forms of project management and guidance, ranging from specific “how-to” and technical discussions to the entire suite of background and supporting material. The scope of this knowledge content is defined as what a new person assigned a lead or implementation responsibility would need to read or master.
As a destination site MIKE2.0 is quite broad: it embraces the ability to model virtually any information management initiative. This makes MIKE2.0 an invaluable source of structure and methodology guidance, but also results in it being quite limited in the specific how-tos associated with any given initiative. I have earlier spoken about the structure of MIKE2.0 and in particular its applicability to the semantic enterprise.
The strength of MIKE2.0, however, is that its structure can be grabbed and quickly applied to form an organizational and structural basis for filling out the knowledge base for any specific information development initiative. And, that is exactly what we did with the ‘CIS DocWiki.’
MIKE2.0 hosts and maintains its project-related structure in Mediawiki (with some extensions). Combined with its templates, this provides a rapid-start baseline for beginning to tailor and flesh out the specific details for a given information management initiative. Thus, after copying broad aspects of the MIKE2.0 system into the incipient ‘DocWiki’, it was relatively straightforward to let the existing structure and templates of MIKE2.0 guide next steps.
As of today’s date, the ‘CIS DocWiki’ contains about 300 substantive articles, a complete activity and tasking structure, and various re-usable templates based on Semantic Mediawiki for structured and consistent access and retrieval. New tasks and structure can be readily added to the system. Existing structure or content can be deleted or marked as archive for non-display. We are still gathering all requisite content pieces, and anticipate by first public release that the baseline knowledge base will include 2x to 3x the scale of its current content.
For new ‘CIS DocWiki’ (or Citizen Dan-based) deployments, this means the knowledge base can be completely modified and extended for local circumstances. The set-up of the Mediawiki instance is separate from the loading or modification of the knowledge base, which means the look-and-feel of the entire system, not to mention user rights and permissions, can also be readily tailored for local requirements.
The core content of the ‘CIS DocWiki’ and its basis in a set structure and methodology (derived from MIKE2.0) means that the knowledge base is also adaptable for other broader information development areas, especially in the semantic enterprise or semantic government arenas. Thus, while Structured Dynamics is first releasing the ‘CIS DocWiki’ in the context of Citizen Dan and semantic government, we also are developing a parallel instance for the Open SEAS approach to the semantic enterprise.
The approach taken here is somewhat different than the standard wiki use. As experts, we are basically sole authoring (with contributions from selected collaborators and our clients) the starting basis of the knowledge base. Unlike many wikis, this enables us to be quite consistent in content, style, and organization. Such an approach allows us to present a coherent and complete starting content and methodology foundation. However, once delivered and installed for a given deployment, its users are then free to extend and change this knowledge foundation in the standard wiki manner. Whether those subsequent extensions are free-form or more tightly controlled and managed is the choice of the new deployment’s administrators.
Strictly speaking, the vocabularies and structures (including, of course, ontologies) that drive our semantic government or semantic enterprise offerings are also part of the knowledge base. And, in fact, many of these aspects, especially related to the actual operating of the instances, are included as part of the standard knowledge base.
However, the applicable domain ontology itself is separately maintained. Descriptions of how to use and modify such ontologies are part of the general ‘DocWiki’ knowledge base, but the ontology is not. This arm’s length-separation is done to acknowledge that the ontology has independent use and value apart from the knowledge base or the software (Citizen Dan, in this case) that is the focus of it.
In the Citizen Dan instance, this structure is the MUNI ontology. MUNI is a general local government domain ontology that can find use in a broad array of circumstances, using or not Citizen Dan. Thus, like other ontologies developed and maintained by Structured Dynamics, such as BIBO (the Bibliographic Ontology), the ontology itself and its documentation, discussion forums and use cases are maintained separately.
The first release of MUNI is still under development and will be released this summer.
The software framework that hosts and manages all of this content is the Mediawiki software, originally developed for Wikipedia. This framework is supported by a number of standard extensions packaged with the ‘DocWiki’ distribution. One of the more notable extensions is Semantic Mediawiki. Mediawiki also is the wiki framework underlying MIKE2.0, so content sharing between the systems is straightforward.
The first use of the ‘DocWiki’ is to add new content to the knowledge base and to modify or extend what is provided in the baseline. For straight authoring, ‘DocWiki’ offers the standard wikitext basis for content entry and editing, as well as the WikED enhanced editor and the FCKEditor WYSIWYG rich-text editor. Each of these may be turned on or off at will.
All of the baseline content is fully organized and categorized via a standard structure. Pre-existing templates aid in entering new content in specific areas consistently or in providing standard administrative ways of tagging content for completeness or need for editorial attention. Tasks and concepts, in particular, follow set ways of entry and description. These set templates, some forms-based and some derived from Semantic Mediawiki, are also tied into automatic internal scripts for listing and organizing various items. So long as new material is entered properly, it will be reflected in various stats and listings. Unlike sole reliance on Semantic Mediawiki, the ‘DocWiki’ approach is a mix of standard wiki categories and semantic types. Both are used for effective organization of the knowledge base.
Besides the knowledge base of domain content and “how-to”, the system also comes pre-packaged with many wiki “how-to” and best practices guidance for using the system effectively and consistently. Of course, a given deployment may or may not enforce all of these practices. A poorly administered instance, for example, could degenerate fairly quickly and lose the native structure and organization of the baseline system.
As with standard wikis, there is a history of prior page revisions that gives the system rollback and version control. Mediawiki has a pretty good user access and permissions framework ranging from access, reading, editing and to uploads.
Besides the standard and required extensions, ‘DocWiki’ also comes packaged with the necessary settings and configuration files to operate “out-of-the-box” in its designed baseline mode. Of course, these settings, too, can be changed and modified by site administrators, and ‘DocWiki’ also includes guidance on how to do that.
A little known but highly useful part of the Mediawiki API allows direct export of XHTML content . Then, with minor XSLT conversion templates, it is possible to strip out wiki-specific conventions (such as the editing of individual sections) or to create straight XML versions. When this is combined with the use of internal ‘DocWiki’ CSS style sheets that impose some clean and semantic style identifiers, a common canonical output basis for content is possible.
From that point, a given deployment may use its own CSS styles to theme output content. Output Web pages (XHTML) or XML files then can be processed using existing and accurate utilities to produce PDF or *.doc documents. Then, with systems such as OpenOffice, an even wider variety of document formats can be produced. These facilities mean that the ‘DocWiki’ can also act as a single-source publishing environment.
In its initial release, re-purposing ‘DocWiki’ content into other presentations (for example, combining sections from multiple pages into a new document as opposed to re-using existing pages as is) will require creating new wiki pages and then cutting-and-pasting the desired content. However, it should also be noted that both DocBook and DITA have been applied to Mediawiki installations . It should be possible to enable a more flexible re-purposing framework for ‘DocWiki’ moving into the future.
The ‘CIS DocWiki’ is meant to accompany the first release of Citizen Dan, likely by the end of summer. The MUNI ontology will also be released roughly at the same time. At release, the ‘CIS DocWiki’ is anticipated to have on the order of 500-800 baseline content and “how to” articles.
Depending on time availability and other commitments, Structured Dynamics will also be using this information to build a semantic government composite offering to MIKE2.0. We will be contributing this new offering for free, similar to what we have done earlier for a semantic enterprise offering.
Subsequent to those events, we will then be modifying the ‘CIS DocWiki’ for the semantic enterprise domain. Much of the necessary content will have already been assembled for the ‘CIS DocWiki’.
Paradoxically, while developing such knowledge bases and systems such as ‘DocWiki’ appears to be extra work, from our standpoint as developers it is useful and efficient. Structured Dynamics already researches and assembles much material and tries to “document as it goes.” Having the ‘DocWiki’ framework not only provides a consistent and coherent way to organize that information, but it also helps to point out essential gaps in our offerings.
The ‘DocWiki’ delivers the methods, documentation and portions of the structure to a total open solution. The ‘DocWiki’ is the primary means — along with software development and accompanying code-level and API documentation, of course — for us to fulfill our mantra that “We’re successful when we are not needed.” As we pointed out in Part 1 of this series, we really think such an attitude is ultimately a self-interested one. The better we can address the acceptance factors in the enterprise for our offerings, the more opportunities we will gain.
We would like to think that other enlightened open source software developers, especially those in the semantic space but certainly not limited to them, will see the wisdom of this four-legged foundation to total open solutions. Up until now, pragmatic guidance for what it takes to create a complete open source offering to businesses and enterprises has been lacking.
The tools, methods, and workflows all exist for making total open solutions real today. All of the pieces are themselves open source. There are many useful guides for best practices across the pipeline. It is just that — prior to this — no one apparently took the time to assemble and articulate them. We think this three-part series and some of the “how to” guidance in the ‘DocWiki’ system can help fix this oversight.
Ultimately, with wider adoption by developers, goaded in part by demands of the marketplace for them, we would hope that additional innovations and ideas may be forthcoming to improve the industry’s ability to offer total open source solutions. Adding just a small bit of attentive effort to how we organize and package what we know is but a small price to pay for greater acceptance and success.
In the first part to this series, we put forward the argument that incomplete provision of important support factors was limiting the adoption of open source software in the enterprise. We can liken the absence of these factors to having a chair with one or more absent or broken legs.
This second part of the series goes into the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we can term this a total open solution.
These considerations are not simply a matter of idle curiosity. New approaches and new methods are required for enterprises to modernize their IT systems while adding new capabilities and preserving sunk assets. Extending and modernizing existing IT is often not in the self-interests of the original supplying vendors. And enterprises are well aware that IT commitments can extend for decades.
While the benefits and capabilities of open source software become apparent by the day, rates of open source software adoption lag in enterprises. We have seen entire Internet-based businesses arise and get huge in just a few short years. But it is the rare existing enterprise that has committed to and embraced similar Web-oriented architectures and IT strategies .
The enterprise IT ecosystem is evolving to become an unhealthy one. New software vendors have generally abandoned enterprises as a market. Much more action takes place with consumer apps and Internet plays, often premised on ad-based revenues or buzz and traffic as attractors for acquisition. Existing middle-tier enterprise vendors are themselves being gobbled up and disappearing. I’m sure all observers would agree that IT software and services are increasingly dominated by a shrinking slate of vendors. I suspect most observers — myself included — would argue that enterprise-based IT innovation is also on the wane.
The argument posed in the first part of this series is that such atrophy should not be unexpected. The current state of open source software is not addressing the realities of enterprise IT needs.
And that is where the other legs of the total open solution come in. In their entirety, they amount to a form of capacity building for the enterprise . It is not simply enough to put forward buzzwords matched with open source software packages. Exciting innovations in social networks, collaboration, semantic enterprise, mobile apps, REST, Web-oriented architectures, information extraction, linked data and a hundred others are being validated on the Internet. But until the full spectrum of success and adoption factors gets addressed, enterprises will not embrace these new innovations as central to their business.
As we describe these four legs to the total open solution, we will sometimes point to our Citizen Dan initiative . That is not because of some universal applicability of the system to the enterprise; indeed Citizen Dan is mostly targeted to local communities and municipalities. But, Citizen Dan does represent the first instance known to us where each of these total open solution success factors is being explicitly recognized and developed. We think the approach has some transferability to the broader enterprise.
Let’s now discuss these four legs in turn.
Of course, the genesis of this series is grounded in open source software and what it needs to do in order to find broader enterprise acceptance. Clearly that is the first leg amongst the four to be discussed. We also have acknowledged that, generally, best-of-breed open source software is also better documented at the code level, and has documented APIs. We will return to this topic under Leg Four below.
Open source software useful to the enterprise is often a combination of individual open source packages. Some successful vendors of open source to the enterprise in fact began as packagers and documenters of multiple packages. Red Hat for Linux or Alfresco in document management or Pentaho in business intelligence come to mind, as examples.
In the case of Citizen Dan, here are the open source packages presently contained in its offering: Linux (Ubuntu), Apache, MySQL, PHP (these comprising the LAMP stack), Drupal, a variety of third-party Drupal modules, Virtuoso, Solr, ARC2, Smarty, Yahoo UI, TinyMCE, Axiis, Flex, ClearMaps, irON, conStruct, structWSF, and some others. Such combinations of packages are not unusual in open source settings, since new value-add typically comes from extensions to existing systems or unique ways to combine or package them. For example, the installation guide for structWSF alone is quite comprehensive with multiple configuration and test scripts.
Thus, besides direct software, it is also critical that configuration, settings, installation guidance and the like be addressed to enable relatively straightforward set-up. This is an area of frequent weakness. Targeting it directly is a not-so-secret factor for how some vendors have begun to achieve some success with the enterprise market.
All software works on data. While some data is unstructured (such as plain text) and some is semi-structured (such as HTML or Web pages that mixes markup with text), the objective of information extraction or natural language processing is to extract the “structure” from such sources. Once extracted, such structure can interoperate on a common footing with the structured data common to standard databases.
Thus, we use “structure” to denote the concepts and their relationships (the “schema” or “ontology”) and the indicators and data (attributes and values) to describe them, and the “entities” (distinct individuals or nameable instances) that populate them. In other words, “structure” refers to all of the schema (concepts + relationships) + data + attributes + indicators + records that make up the information upon which software can operate.
Structure exists in many forms and serializations. Generally, software represents its internal information in one or a few canonical storage and manipulation formats, though that same software may also be able to import (ingest) or export its information and data in many different external formats.
In our semantic enterprise work, especially with its premise in ontology-driven applications using adaptive ontologies, structure is an absolutely essential construct. But, frankly, no information technology system exists that does not also depend on structure to a more or less greater extent.
The interplay between software and structure is one source of expertise that vendors guard closely and use to competitive advantage. In years past, proprietary software could partially hide the bases for performance or algorithmic advantages. Expert knowledge and intimate familiarity with these systems was the other bases to keep these advantages closely held.
It is perhaps not too surprising given this history, then, that the software industry really has very little emphasis or discussion on the interaction between software and structure. But, if software is being brought in as open source, where is the accompanying expertise or guidance for how data structure can be used to gain full advantage? The same acquired knowledge that, say, accompanied the growth of relational databases in such areas as schema development, materialized views or (de)normalization now needs to be made explicit and exposed for all sorts of open source systems.
In the realm of the semantic enterprise we are seeing attempts at this via open source ontologies and greater emphasis on APIs and documentation of same. Citizen Dan, for example, will be first publicly released with an accompanying MUNI ontology as a reference schema and starting point. Descriptions and methods for how to obtain indicator data and relevant attribute and entity information for the domain will also accompany it.
As open source software continues to emphasize semantics and interoperability, exemplar structures and best practices will need to be an essential part of the technology transfer. Just as the “secrets” of much software began to be opened up via open source, so too must the locked-up expertise of experts and practitioners in how to effectively structure data be exposed.
The need for structure explication and guidance is but one unique slice of a much broader need to expose methods and best practices surrounding a given information management initiative. The reason that any open source software might be adopted in the first place is based on the hope for some improved information management process.
Recently I have been touting MIKE2.0, the first open source, replicable and extensible framework for organizing and managing information in the enterprise. MIKE2.0 (Method for an Integrated Knowledge Environment ) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. It can be applied to any type of information development.
MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
MIKE2.0 and its forthcoming extensions, one of which we have developed for the semantic enterprise and are now extending into the semantic government in the context of Citizen Dan, are exciting because they provide a systematic approach and guidance for how (and for what!) to document new projects and initiatives. What MIKE2.0 represents is the first time that the embedded, proprietary expertise of traditional IT consultants has been exposed for broader use and extension.
The real premise behind any approach like MIKE2.0 or variants is to codify the expertise and knowledge that was previously locked up by experts and practitioners. The framework in MIKE2.0 provides a structure by which knowledge bases of background information can be assembled to accompany an open source project. This structure extends from initial evaluation and design all the way through operation and end of life.
The ‘CIS DocWiki’ that is being developed to accompany Citizen Dan is such an example of a MIKE2.0-informed knowledge base. At present, the CIS DocWiki has more than 300 specific articles useful to community indicator systems for local governments, and a complete deployment and maintenance methodology. By public release, it will likely be 2-3 times that size. All of this will be downloadable and installable as a wiki, and as open source content, ready for branding and modification for any local circumstance. CIS DocWiki is a natural methods and documentation complement to the Citizen Dan software and its MUNI structure. Release is scheduled for summer.
As we will focus on in Part 3 of this series, we are combining a MIKE2.0 organizational approach with a documentation and single-source publication platform to fulfill the method and documentary aspects of projects. It was really through the advantages gained by the combination of these pieces that we began to see the inadequacy of many current open source projects for the enterprise.
This series began in part with a recognition that superior open source projects are often the better documented ones. But, even there, documentation is often restricted to code-level documentation or perhaps APIs.
As the material above suggests, documentation needs to extend well beyond software. We need documentation of structure, methods, best practices, use cases, background information, deployment and management, and changing needs over the lifetime of the system. And, as we have also seen in Part 1, the lifetime of that system might be measured in decades.
Documentation is no equal to paid partners and their expertise. But, documentation can be cheaper, and if that documentation is sufficient, might be a means for changing the equation in how IT projects are solicited, acquired and managed.
Today, enterprises appear to be stuck between two difficult choices: 1) the traditional vendor lock-in approach with high costs and low innovation; or 2) open source with minimal documentation and vendor knowledge and little assurance of support longevity.
These trade-offs look pretty unpalatable.
Documentation alone, even as extended into the other legs of the solution, is not prima facie going to be a deal maker. But, its absence, I submit, is a deal breaker. Just as open source itself has taken some years to build basic comfort in the enterprise, so too a concerted attack on all acceptance factors may be necessary before actual wide adoption occurs.
The ‘CIS DocWiki’ platform noted for Citizen Dan we hope will be an exemplar for this combination of documentation and methodology. It is a single-source publishing platform that allows the entire knowledge base behind a given IT initiative to be used for collaboration, operational, training or collateral purposes. And all of this is based on open source software.
Software vendors need to recognize these documentation factors and build their ventures for success. Yes, writing code and producing software is a lot more fun and rewarding than (yeech) documentation. But, unless our current generation of vendors that is committed to open source and its benefits takes its markets seriously — and thus commits to the serious efforts these markets demand — we will continue to see minimal uptake of open source in the enterprise.
Each of these four legs of a total open solution can interact with and reinforce the other parts. Once one begins to see the problem of open source adoption in the enterprise as a holistic one, a new systems-level perspective emerges.
Enterprises know full well that software is only one means to address an information management problem, and only a first step at that. Traditional vendors to the enterprise also understand this, which is why through their embedded systems and built-up expertise they have been able to perpetuate what often amounts to a monopoly position.
Pressures are building for a earthquake in the IT landscape. Enterprises are on an anvil of global competition and limited resources. Existing IT systems are not up to the task but too expensive and embedded to abandon. Traditional vendors have near monopoly positions and little incentive to innovate. New software vendors don’t have the expertise and gravitas to handle enterprise-scale challenges. Meanwhile, the rest of the globe is leapfrogging embedded systems with agile, Web-based systems.
The true innovation that is occurring is all based around open source, nurtured by the global computing platform of the Internet, and fueled by countless individuals able to compete on downward-spiraling cost bases. But on so many levels, open source as presently constituted, either fails or poses too many risks to the commercial enterprise.
The Internet itself was the basis of a paradigm shift, but I think we are only now seeing its manifestation at the enterprise level. We are also now seeing global reordering and changes of the economic order. How will companies respond? How will their IT systems adapt? And what will new vendors need to do and recognize in order to thrive in this changing environment?
I’m not sure I have found the language or rhetoric to convey what I see coming, and coming soon. I know open source is part of it; I know enterprises need it; and I know what is presently being offered does not meet the test.
As I noted in our first part, the mantra that we use in Structured Dynamics to express this challenge is, “We’re Successful When We’re Not Needed“. I think the essence behind this statement is that premises of dependency or proprietary advantage will not survive the jet streams of change that are blowing away the old order.
Sound like too much hyperbole? Actually, my own gut feeling is that it is not nearly enough.
In any case, windy rhetoric always falls short if there is not some actionable next steps. In these first two parts of this series, I have tried to present the ingredients that need to go into the cake. In the third part I try to offer a new, and complementary, open source means for bringing stability to the foundation.
In all cases, though, I think these challenges are permanent ones and do not lend themselves to facile solutions. Four legs, or seven foundations, or twelve steps are all just too simplistic for dealing with the global and complex tsunamis blowing away the old order.
One really does not need to lick a finger to sense the direction of these winds of change. It is coming, and coming hard, and all of it is from the direction of open source. What enterprises do, and what the vendors who want to serve them do, is perhaps less clear. I think open source offers a way out of the box in which enterprise IT is currently stuck. But, at present, I also think that most open source options do not have the necessary legs to stand on.
As I reported about a year ago after my first attendance, I think the Semantic Technology Conference is the best venue going for pragmatic discussion of semantic approaches in the enterprise. I’m really pleased that I will be attending again this year. The conference (#SemTech) will be held at the Hilton Union Square in downtown San Francisco on June 21-25, 2010. Now in its sixth year and the largest of its kind, it is again projected to attract 1500 attendees or so.
A really exciting presentation for us is, Sizzle for the Steak: Rich, Visual Interfaces for Ontology-driven Apps, on Wed, June 23 in the 2:00 PM – 3:00 PM session.
A nagging gap in the semantic technology stack is acceptable — better still, compelling — user experiences. After our exile for a couple of years doing essential infrastructure work, we have been unshackled over the past year or so to innovate on user interfaces for semantic technologies.
Our unique approach uses adaptive ontologies to drive rich Internet applications (RIAs) through what we call “semantic components.” This framework is unbelievably flexible and powerful and can seamlessly interact with our structWSF Web services framework and its conStruct Drupal implementations.
We will be showing these rich user interfaces for the first time in this session. We will show concept explorers, “slicer-and-dicers”, dashboards, information extraction and annotation, mapping, data visualization and ontology management. Get your visualization anyway you’d like, and for any slice you’d like!
While we will focus on the sizzle and demos, we will also explain a bit of the technology that is working behind Oz’s curtain.
A more informal, interactive F2F discussion will be, MIKE2.0 for the Semantic Enterprise, on Thurs, June 24 in the 4:45 PM – 5:45 PM slot.
MIKE2.0 (Method for an Integrated Knowledge Environment) is an open source methodology for enterprise information management that is coupled with a collaborative framework for information development. It is oriented around a variety of solution “offerings”, ranging from the comprehensive and the composite to specific practices and technologies. A couple of months back, I gave an overview of MIKE2.0 that was pretty popular.
We have been instrumental in adding a semantic enterprise component to MIKE2.0, with our specific version of it called Open SEAS. In this Face-to-Face session, experts and desirous practitioners will join together to discuss how to effectively leverage this framework. While I will intro and facilitate, expect many other MIKE2.0 aficionados to participate.
This is perhaps a new concept to many, but what is exciting about MIKE2.0 is that it provides a methodology and documentation complement to technology alone. When combined with that technology, all pieces comprise what might be called a total open solution. I personally think it is the next logical step beyond open source.
So, if you have not already made plans, consider adjusting your schedule today. And, contact me in advance (mike at structureddynamics dot com) if you’ll be there. We’d love to chat!