Posted: June 11, 2010

How Shall We Measure Progress Over the Past Three Years?

Colorado Interstate construction, 1970; courtesy National Archives

For a dozen years, my career has been centered on Internet search, dynamic content and the deep Web. For the past few years, I have been somewhat obsessed by two topics.

The first topic, a conviction really, is that implicit structure needs to be extracted from Web content to enable it to be disambiguated, organized, shared and re-purposed. The second topic, more an open question as a former academic married to a professor, is what might replace editorial selections and peer review to establish the authoritativeness of content. These topics naturally steer one to the semantic Web.

A Millennial Perspective

The semantic Web, by whatever name it comes to be called, is an inevitability. History tells us that as information content grows, so do the mechanisms for organizing and managing it. Over human history, innovations such as writing systems, alphabetization, pagination, tables of contents, indexes, concordances, reference look-ups, classification systems, tables, figures, and statistics have emerged in parallel with content growth [19].

When the Lycos search engine, one of the first profitable Internet ventures, was publicly released in 1994, it indexed a mere 54,000 pages [1]. When Google wowed us with its page-ranking algorithm in 1998, it soon replaced my then favorite search engine, AltaVista. Now, tens of billions of indexed documents later, I often find Google’s results to be overwhelming dross, which is unfortunately true again for all of the major search engines. Faceted browsing, vertical search, and Web 2.0’s tagging and folksonomies demonstrate humanity’s natural penchant to fight this entropy, efforts that will next continue with the semantic Web and then with mechanisms yet unforeseen to manage the chaos of burgeoning content.

An awful lot of hot air has been expelled over the false dichotomy of whether the semantic Web will fail or is on the verge of nirvana. Arguments extend from the epistemological versus ontological (classically defined) to Web 3.0 versus SemWeb or Web services (WS*) versus REST (Representational State Transfer). My RSS feed reader points to at least one such dust up every week.

Some set the difficulties of resolving semantic heterogeneities as absolutes, leading to an illogical and false rejection of semantic Web objectives. In contrast, some advocates set equally divisive arguments for semantic Web purity by insisting on formal ontologies and description logics. Meanwhile, studied leaks about “stealth” semantic Web ventures mean you should grab your wallet while simultaneously shaking your head.

A Decades-Long Perspective

My mental image of the semantic Web is a road from here to some achievable destination — say, Detroit. Parts of the road are well paved; indeed, portions are already superhighways with controlled on-ramps and off-ramps. Other portions are two lanes, some with way too many traffic lights and some with dangerous intersections. A few small portions remain unpaved gravel and rough going.

Wreck in Nebraska during the 1919 Transcontinental Motor Convoy

A lack of perspective makes things appear either too close or too far away. The automobile isn’t yet a century old as a mass-produced item. It wasn’t until 1919 that the US Army Transcontinental Motor Convoy made the first automobile trip across the United States.

The 3,200-mile route roughly followed today’s Lincoln Highway, US 30, from Washington, D.C. to San Francisco. The convoy took 62 days and 250 recorded accidents to complete the trip (see figure), half on dirt roads at an average speed of 6 miles per hour. A tank officer on that trip later observed Germany’s autobahns during World War II. When he subsequently became President Dwight D. Eisenhower, he proposed and then signed the Interstate Highway Act.

That was 50 years ago. Today, the US is crisscrossed with 50,000 miles of interstates, which have completely remade the nation’s economy and culture [2].

Today’s Perspective

Like the interstate system in its early years, today’s semantic Web lets you link together a complete trip, but the going isn’t as smooth or as fast as it could be. Nevertheless, making the trip is doable and keeps improving day by day, month by month.

My view of what’s required to smooth the road begins with extracting structure and meaningful information according to understandable schema from mostly uncharacterized content. Then we store the now-structured content as RDF triples that can be further managed and manipulated at scale. By necessity, the journey embraces tools and requirements that, individually, might not constitute semantic Web technology as some strictly define it. These tools and requirements are nonetheless integral to reaching the destination. We are well into that journey’s first leg, what I and others are calling the structured Web.

For the past six months or so I have been researching and assembling as many semantic Web and related tools as I can find [3]. That Sweet Tools listing now exceeds 500 tools [4] (with its presentation using the nifty lightweight Exhibit publication system from MIT’s Simile program [5]). I’ve come to understand the importance of many ancillary tool sets to the entire semantic Web highway, such as natural language processing and information extraction. I’ve also found new categories of pragmatic tools that embody semantic Web and data mediation processes but don’t label themselves as such.

In its entirety, the Sweet Tools listing provides a pretty good picture of the semantic Web’s state. It’s a surprisingly robust picture — though with some notable potholes — and includes impressive open source options in all categories. Content publishing, indexing, and retrieval at massive scales are largely solved problems. We also have the infrastructure, languages, and (yes!) standards for tying this content together meaningfully at the data and object levels.

I also think a degree of consensus has emerged on RDF as the canonical data model for semantic information. RDF triple stores are rapidly improving toward industrial strength, and RESTful designs enable massive scalability, as terabyte- and petabyte-scale full-text indexes prove.

Powerful and flexible middleware options, such as those from OpenLink [6], can transform and integrate diverse file formats with a variety of back ends. The World Wide Web Consortium’s GRDDL standard [7] and related tools, plus various “RDF-izers” from Massachusetts Institute of Technology and elsewhere [8], largely provide the conversion infrastructure for getting Web data into that canonical RDF form. Sure, some of these converters are still research-grade, but getting them to operational capabilities at scale now appears trivial.
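To make the target of these converters concrete, the following is a minimal Python sketch of what “canonical RDF form” can look like for a single feed item. It is not GRDDL or any of the RDF-izers named above; the subject URI and the sample record are hypothetical, and only the two Dublin Core property URIs are real vocabulary terms. It emits N-Triples, one of the simplest RDF serializations.

```python
# Minimal sketch: serialize one record as N-Triples.
# The record and its subject URI are made up for illustration.

DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject, statements):
    """Render (predicate, value, is_uri) statements about one subject."""
    lines = []
    for predicate, value, is_uri in statements:
        # URIs go in angle brackets; everything else is a quoted literal.
        obj = f"<{value}>" if is_uri else f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


record = [
    (DC_TITLE, "Potholes on the Semantic Highway", False),
    (DC_CREATOR, "http://example.org/people/author", True),
]
print(to_ntriples("http://example.org/posts/1", record))
```

Each output line is one subject-predicate-object statement, which is the unit a triple store loads and indexes.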

Things start getting shakier when trying to structure information into a semantic formalism. Controlled vocabularies and ontologies range broadly and remain a contentious area. Publishers and authors perhaps have too many choices: from straight Atom or RSS feeds and feeds with tags to informal folksonomies and then Outline Processor Markup Language [9] or microformats [10]. From there, the formalism increases further to include the standard RDF ontologies such as SIOC (Semantically-Interlinked Online Communities), SKOS (Simple Knowledge Organizing System), DOAP (Description of a Project), and FOAF (Friend of a Friend) [11] and the still greater formalism of OWL’s various dialects [12].

Arguing which of these is the theoretical best method is doomed to failure, except possibly in a bounded enterprise environment. We live in the real world, where multiple options will always have their advocates and their applications.

All of us should welcome whatever structure we can add to our information base, no matter where it comes from or how it’s done. The sooner we can embrace content in any of these formats and convert it into canonical RDF form, the sooner we can move on to needed developments in semantic mediation, some of the roughest road on the journey.

Potholes on the Semantic Highway

Semantic mediation requires appropriate structured content. Many potholes on the road to the semantic Web exist because the content lacks structured markup; others arise because existing structure requires transformation. We need improved ways to address both problems. We also need more intuitive means for applying schema to structure. Some have referred to these issues as “who pays the tax.”

Recent experience with social software and collaboration proves that a portion of the Internet user community is willing to tag and characterize content. Furthermore, we can readily leverage that resulting structure, and free riders are welcomed. The real pothole is the lack of easy — even fun — data extractors and “structurizers.” But we’re tantalizingly close.

Tools such as Solvent and Sifter from MIT’s Simile program [13] and Marmite from Carnegie Mellon University [14] are showing the way to match DOM (document object model) inspectors with automated structure extractors. DBpedia, the alpha version of Freebase, and System One now provide large-scale, open Web data sets in RDF [15], including all of Wikipedia. Browser extensions such as Zotero [16] are showing how to integrate structure management into acceptable user interfaces, as are services such as ZoomInfo [17]. Yet we still lack easy means to design the differing structures suitable for a plenitude of destinations.

Amazingly, a compelling road map for how all these pieces could truly fit together is also incomplete. How do we actually get from here to Detroit? Within specific components, architectural understandings are sometimes OK (although documentation is usually awful for open source projects, as most of the current tools are). Until our community better documents that vision, attracting new contributors will be needlessly slower, thus delaying the benefits of network effects.

So, let’s create a road map and get on with paving the gaps and filling the potholes. It’s not a matter of standards or technology; we have those in abundance. Let’s stop the silly squabbles and commit to the journey in earnest. The structured Web’s ability to reach Hyperland [18], Douglas Adams’s prescient 1990 forecast of the semantic Web, now looks to be no further away than Detroit.

This Friday brown bag leftover was first placed into the AI3 refrigerator about three years ago, on May 3, 2007. The piece was my answer to a request by Jim Hendler to pen some thoughts on the semantic Web, based, I believe, on what he thought might be a pragmatic perspective combining Internet business with Web science. The formal piece appeared as a guest editorial in the May/June 2007 issue of IEEE Intelligent Systems. What appears above is unaltered from my original posting, aside from some minor formatting clean-up (and, sorry to say, some of the projects are now defunct).

[1] Chris Sherman, “Happy Birthday, Lycos!,” Search Engine Watch, August 14, 2002. See http://searchenginewatch.com/showPage.html?page=2160551.
[2] David A. Pfeiffer, “Ike’s Interstates at 50: Anniversary of the Highway System Recalls Eisenhower’s Role as Catalyst,” Prologue Magazine, National Archives, Summer 2006, Vol. 38, No. 2. See: http://www.archives.gov/publications/prologue/2006/summer/interstates.html.
[3] The mention of specific tool names is meant to be illustrative and not necessarily a recommendation.
[6] OpenLink Software’s Virtuoso and Data Spaces products; see http://www.openlinksw.com/.
[7] W3C’s Gleaning Resource Descriptions from Dialects of Languages (GRDDL, pronounced “griddle”). See http://www.w3.org/2004/01/rdxh/spec.
[9] Outline Processor Markup Language (OPML); see http://www.opml.org/.
[10] Microformats; see http://microformats.org/.
[12] W3C’s Web Ontology Language (OWL). See http://www.w3.org/TR/owl-features/.
[13] Solvent (http://simile.mit.edu/wiki/Solvent) and Sifter (http://simile.mit.edu/wiki/Sifter) are from MIT’s Simile program.
[14] Marmite (http://www.cs.cmu.edu/~jasonh/projects/marmite/) is from Carnegie Mellon University.
[15] DBpedia (http://dbpedia.org/docs/) and Freebase (in alpha, by invitation only at http://www.freebase.com/) are two of the first large-scale open datasets on the Web; Wikipedia has also been converted to RDF by System One (http://labs.systemone.at/wikipedia3).
[16] Zotero is produced by George Mason University’s Center for History and New Media; see http://www.zotero.org.
[17] ZoomInfo (http://www.zoominfo.com/) provides online structured search of companies and people, plus broader services to enterprises.
[18] The late Douglas Adams, of Doctor Who and The Hitchhiker’s Guide to the Galaxy fame, produced a TV program for BBC2 presaging the Internet called Hyperland. This 50-minute video can be seen in five parts on YouTube.
[19] Since I first wrote this piece, I have systematized these developments in my Timeline of Information History.
Posted: May 31, 2010

Total Open Solution

Introducing the Open Source ‘DocWiki’ System

In the first part of this series, we began with the argument that open source software alone was not sufficient to meet the required acceptance factors in the enterprise. To set the right mindset around these issues, we shared the saying we have adopted at Structured Dynamics: “We’re successful when we are not needed.”

In the second part of this series we described the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we termed this a total open solution.

Now, in this third and concluding part to our series, we introduce the open source documentation and methodology system called ‘DocWiki’. It complements the base open source software, in the process completing the conditions for a total open solution.

Though we call this system ‘DocWiki’, it is not meant to be a brand or particular product description for what Structured Dynamics is offering. Rather, ‘DocWiki’ is merely a placeholder name for a generic, open source system and knowledge base that can be downloaded, installed, branded, modified and extended in whatever way the user sees fit. ‘DocWiki’ is a baseline documentation and methodology “starter kit” that can be dressed up in new clothes or packaged and named in whatever manner is best suited to a given deployment.

In describing the major components of this ‘DocWiki’ system we will again use our Citizen Dan initiative [1] as we did in Part 2. This gives us a real use case, though the same approach is applicable to any open source information management initiative by enterprises.

We call the specific version of the ‘DocWiki’ used in the case of Citizen Dan the ‘CIS DocWiki’ (for community indicator systems), specific to the domain and local government focus of Citizen Dan. Similarly, the structured vocabulary and ontology that guides the system is the MUNI ontology. For other information development initiatives, the specific content of these components would be swapped out for ones appropriate to that initiative.

Overview of the ‘DocWiki’ System

A number of desires and objectives intersected to guide the design of the ‘DocWiki’ system. We wanted:

  • A consolidated knowledge base with complete, turnkey implementation content
  • A collaborative document authoring system with authoring tools comfortable to most knowledge workers
  • A version control system to enable rollbacks and restoration of prior official versions
  • A system that would enable and facilitate the collection and import of relevant content; in our own case, that included widely distributed internal content in many forms and locations plus relevant external content (such as defined items in Wikipedia)
  • A document management framework that would allow existing content to be mixed, combined and re-purposed for different uses, from training to marketing collateral
  • A single source publishing system that would allow content to be published as paper documents, PDFs, Web pages and the like
  • A system that could be easily themed, skinned and branded, tailored for any given deployment or circumstance, and
  • A system built entirely from open source components and with content that had no restrictions on use or re-use.

In first formulating this design, our assumption was the major building blocks would be an open source document management system linked with some form of version control. Though we think such a formulation could work OK, our exposure to the MIKE2.0 methodology actually caused us to re-look at and re-think a wiki-based approach. Ultimately the trump card that decided the design for us was familiarity and ease-of-use.

The resulting architecture of the full ‘DocWiki’ system is shown below:

The Full DocWiki System

What is cool about this design is that a single software download and install with a few extensions (Mediawiki, the Wikipedia software, plus some standard extensions and judicious use of Semantic Mediawiki) and a single loadable database are all that is required to transfer and install the ‘DocWiki’ system.

To better describe this system, we will focus on three major interconnecting pieces in this architectural diagram: the knowledge base; the vocabulary and structure (ontology); and the authoring and publishing system (wiki).

The ‘DocWiki’ Knowledge Base

The pre-loaded content for the ‘DocWiki’ system comes from its knowledge base. This is provided as a text-exported MySQL database that can be modified en masse before loading (such as substituting ‘YourName’ for ‘DocWiki’). The exemplar upon which this knowledge base is modeled is the MIKE2.0 framework.
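The en masse substitution mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dump text and its table name are hypothetical, and a real export should be checked for places where a blind rename is unsafe (for example identifiers or length-prefixed serialized values).

```python
import re


def rebrand_dump(sql_text, old_name, new_name):
    """Replace whole-word occurrences of old_name in a SQL text dump.

    Returns the modified text and the number of replacements made.
    """
    pattern = re.compile(r"\b" + re.escape(old_name) + r"\b")
    return pattern.subn(new_name, sql_text)


# Hypothetical one-line excerpt standing in for the exported database.
sample_dump = (
    "INSERT INTO page_text VALUES "
    "(1, 'Welcome to DocWiki. The CIS DocWiki is a starter kit.');"
)

new_dump, replaced = rebrand_dump(sample_dump, "DocWiki", "YourName")
print(replaced)
print(new_dump)
```

In practice the same function would be run over the whole exported file before it is loaded into MySQL, and the replacement count gives a quick sanity check on how much content was touched.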

MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.

MIKE2.0 has a generalized methodology and set of templates applicable to initiatives, the phases, activities and tasks to undertake them, and supporting assets. Supporting assets can range from glossaries and definition of terms and concepts to very specific technical documents or background material. The entire system is logical and applies a consistent design and organizational structure and categories.

For our purposes, we wanted a complete, turnkey content knowledge base. This meant that we needed to accommodate all forms of project management and guidance, ranging from specific “how-to” and technical discussions to the entire suite of background and supporting material. The scope of this knowledge content is defined as what a new person assigned a lead or implementation responsibility would need to read or master.

As a destination site MIKE2.0 is quite broad: it embraces the ability to model virtually any information management initiative. This makes MIKE2.0 an invaluable source of structure and methodology guidance, but also results in it being quite limited in the specific how-tos associated with any given initiative. I have earlier spoken about the structure of MIKE2.0 and in particular its applicability to the semantic enterprise.

The strength of MIKE2.0, however, is that its structure can be grabbed and quickly applied to form an organizational and structural basis for filling out the knowledge base for any specific information development initiative. And, that is exactly what we did with the ‘CIS DocWiki.’

MIKE2.0 hosts and maintains its project-related structure in Mediawiki (with some extensions). Combined with its templates, this provides a rapid-start baseline for beginning to tailor and flesh out the specific details for a given information management initiative. Thus, after copying broad aspects of the MIKE2.0 system into the incipient ‘DocWiki’, it was relatively straightforward to let the existing structure and templates of MIKE2.0 guide next steps.

As of today’s date, the ‘CIS DocWiki’ contains about 300 substantive articles, a complete activity and tasking structure, and various re-usable templates based on Semantic Mediawiki for structured and consistent access and retrieval. New tasks and structure can be readily added to the system. Existing structure or content can be deleted or marked as archive for non-display. We are still gathering all requisite content pieces, and anticipate by first public release that the baseline knowledge base will include 2x to 3x the scale of its current content.

For new ‘CIS DocWiki’ (or Citizen Dan-based) deployments, this means the knowledge base can be completely modified and extended for local circumstances. The set-up of the Mediawiki instance is separate from the loading or modification of the knowledge base, which means the look-and-feel of the entire system, not to mention user rights and permissions, can also be readily tailored for local requirements.

The core content of the ‘CIS DocWiki’ and its basis in a set structure and methodology (derived from MIKE2.0) means that the knowledge base is also adaptable for other broader information development areas, especially in the semantic enterprise or semantic government arenas. Thus, while Structured Dynamics is first releasing the ‘CIS DocWiki’ in the context of Citizen Dan and semantic government, we also are developing a parallel instance for the Open SEAS approach to the semantic enterprise.

The approach taken here differs somewhat from standard wiki use. As experts, we are basically the sole authors (with contributions from selected collaborators and our clients) of the starting basis of the knowledge base. Unlike many wikis, this enables us to be quite consistent in content, style, and organization. Such an approach allows us to present a coherent and complete starting content and methodology foundation. However, once delivered and installed for a given deployment, its users are then free to extend and change this knowledge foundation in the standard wiki manner. Whether those subsequent extensions are free-form or more tightly controlled and managed is the choice of the new deployment’s administrators.

The Supporting MUNI Structure

Strictly speaking, the vocabularies and structures (including, of course, ontologies) that drive our semantic government or semantic enterprise offerings are also part of the knowledge base.  And, in fact, many of these aspects, especially related to the actual operating of the instances, are included as part of the standard knowledge base.

However, the applicable domain ontology itself is separately maintained. Descriptions of how to use and modify such ontologies are part of the general ‘DocWiki’ knowledge base, but the ontology is not. This arm’s length-separation is done to acknowledge that the ontology has independent use and value apart from the knowledge base or the software (Citizen Dan, in this case) that is the focus of it.

In the Citizen Dan instance, this structure is the MUNI ontology. MUNI is a general local government domain ontology that can find use in a broad array of circumstances, with or without Citizen Dan. Thus, like other ontologies developed and maintained by Structured Dynamics, such as BIBO (the Bibliographic Ontology), the ontology itself and its documentation, discussion forums and use cases are maintained separately.

The first release of MUNI is still under development and will be released this summer.

The Wiki/Publication Portion of ‘DocWiki’

The software framework that hosts and manages all of this content is the Mediawiki software, originally developed for Wikipedia. This framework is supported by a number of standard extensions packaged with the ‘DocWiki’ distribution. One of the more notable extensions is Semantic Mediawiki. Mediawiki also is the wiki framework underlying MIKE2.0, so content sharing between the systems is straightforward.

The Collaborative Wiki Portion

The first use of the ‘DocWiki’ is to add new content to the knowledge base and to modify or extend what is provided in the baseline. For straight authoring, ‘DocWiki’ offers the standard wikitext basis for content entry and editing, as well as the WikED enhanced editor and the FCKEditor WYSIWYG rich-text editor. Each of these may be turned on or off at will.

All of the baseline content is fully organized and categorized via a standard structure. Pre-existing templates aid in entering new content in specific areas consistently or in providing standard administrative ways of tagging content for completeness or need for editorial attention. Tasks and concepts, in particular, follow set ways of entry and description. These set templates, some forms-based and some derived from Semantic Mediawiki, are also tied into automatic internal scripts for listing and organizing various items. So long as new material is entered properly, it will be reflected in various stats and listings. Unlike sole reliance on Semantic Mediawiki, the ‘DocWiki’ approach is a mix of standard wiki categories and semantic types. Both are used for effective organization of the knowledge base.
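As a rough illustration of that mix, the Python sketch below pulls both kinds of markers out of a page's wikitext. The `[[Category:...]]` and `[[property::value]]` forms are the standard Mediawiki and Semantic Mediawiki inline syntaxes; the page text, category names and property names are hypothetical and are not taken from the actual ‘DocWiki’ templates. The regular expressions are deliberately simple and ignore nested markup.

```python
import re

# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
# [[Property name::value]] or [[Property name::value|label]]
PROPERTY_RE = re.compile(r"\[\[([^\]:|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


def summarize_page(wikitext):
    """Return (categories, properties) found in one page of wikitext."""
    categories = [name.strip() for name in CATEGORY_RE.findall(wikitext)]
    properties = {}
    for name, value in PROPERTY_RE.findall(wikitext):
        properties.setdefault(name.strip(), []).append(value.strip())
    return categories, properties


# Hypothetical task page.
page = """
'''Load indicator data''' is a task in the [[Part of activity::Data preparation]] activity.
Status: [[Has status::Draft]]
[[Category:Tasks]]
[[Category:Needs review]]
"""

categories, properties = summarize_page(page)
print(categories)
print(properties)
```

A listing script could use the categories for broad grouping and the property values for finer filtering, such as showing only tasks still marked as drafts.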

Besides the knowledge base of domain content and “how-to”, the system also comes pre-packaged with many wiki “how-to” and best practices guidance for using the system effectively and consistently. Of course, a given deployment may or may not enforce all of these practices. A poorly administered instance, for example, could degenerate fairly quickly and lose the native structure and organization of the baseline system.

As with standard wikis, there is a history of prior page revisions that gives the system rollback and version control. Mediawiki also has a pretty good user access and permissions framework, covering access, reading, editing and uploads.

Besides the standard and required extensions, ‘DocWiki’ also comes packaged with the necessary settings and configuration files to operate “out-of-the-box” in its designed baseline mode. Of course, these settings, too, can be changed and modified by site administrators, and ‘DocWiki’ also includes guidance on how to do that.

The Publication Portion

A little known but highly useful part of the Mediawiki API allows direct export of XHTML content [2]. Then, with minor XSLT conversion templates, it is possible to strip out wiki-specific conventions (such as the editing of individual sections) or to create straight XML versions. When this is combined with the use of internal ‘DocWiki’ CSS style sheets that impose some clean and semantic style identifiers, a common canonical output basis for content is possible.
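The export-and-clean step can be sketched as follows. The sketch uses Python's standard library rather than XSLT, so it is a stand-in for the conversion templates described above, not a copy of them. The wiki base URL, the sample markup and the `editsection` class name are assumptions for illustration; the class used for section edit links varies by Mediawiki version.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_render_url(index_url, title):
    """Build a URL asking Mediawiki for a page body via action=render."""
    query = urlencode({"title": title.replace(" ", "_"), "action": "render"})
    return f"{index_url}?{query}"


def strip_by_class(root, class_name):
    """Remove every element carrying class_name, keeping trailing text."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if class_name in (child.get("class") or "").split():
                # Re-attach the text that followed the removed element.
                tail = child.tail or ""
                index = list(parent).index(child)
                if index == 0:
                    parent.text = (parent.text or "") + tail
                else:
                    previous = parent[index - 1]
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical fragment standing in for rendered page output.
SAMPLE = (
    "<div><h2>"
    '<span class="editsection">[<a href="#">edit</a>]</span> '
    '<span class="mw-headline">Overview</span>'
    "</h2><p>Body text.</p></div>"
)

root = ET.fromstring(SAMPLE)
count = strip_by_class(root, "editsection")
clean = ET.tostring(root, encoding="unicode")
print(build_render_url("http://wiki.example.org/index.php", "Main Page"))
print(count, clean)
```

The cleaned markup keeps headings and body text but drops the wiki editing controls, which is the state a deployment's own style sheets and document converters would then work from.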

From that point, a given deployment may use its own CSS styles to theme output content. Output Web pages (XHTML) or XML files then can be processed using existing and accurate utilities to produce PDF or *.doc documents. Then, with systems such as OpenOffice, an even wider variety of document formats can be produced. These facilities mean that the ‘DocWiki’ can also act as a single-source publishing environment.

In its initial release, re-purposing ‘DocWiki’ content into other presentations (for example, combining sections from multiple pages into a new document as opposed to re-using existing pages as is) will require creating new wiki pages and then cutting-and-pasting the desired content. However, it should also be noted that both DocBook and DITA have been applied to Mediawiki installations [3]. It should be possible to enable a more flexible re-purposing framework for ‘DocWiki’ moving into the future.

When Available

The ‘CIS DocWiki’ is meant to accompany the first release of Citizen Dan, likely by the end of summer. The MUNI ontology will also be released roughly at the same time. At release, the ‘CIS DocWiki’ is anticipated to have on the order of 500-800 baseline content and “how to” articles.

Depending on time availability and other commitments, Structured Dynamics will also be using this information to build a semantic government composite offering to MIKE2.0. We will be contributing this new offering for free, similar to what we have done earlier for a semantic enterprise offering.

Subsequent to those events, we will then be modifying the ‘CIS DocWiki’ for the semantic enterprise domain. Much of the necessary content will have already been assembled for the ‘CIS DocWiki’.

Conclusions and Applicability

Paradoxically, while developing such knowledge bases and systems such as ‘DocWiki’ appears to be extra work, from our standpoint as developers it is useful and efficient. Structured Dynamics already researches and assembles much material and tries to “document as it goes.” Having the ‘DocWiki’ framework not only provides a consistent and coherent way to organize that information, but it also helps to point out essential gaps in our offerings.

The ‘DocWiki’ delivers the methods, documentation and portions of the structure to a total open solution. The ‘DocWiki’ is the primary means — along with software development and accompanying code-level and API documentation, of course — for us to fulfill our mantra that “We’re successful when we are not needed.” As we pointed out in Part 1 of this series, we really think such an attitude is ultimately a self-interested one. The better we can address the acceptance factors in the enterprise for our offerings, the more opportunities we will gain.

We would like to think that other enlightened open source software developers, especially those in the semantic space but certainly not limited to them, will see the wisdom of this four-legged foundation to total open solutions. Up until now, pragmatic guidance for what it takes to create a complete open source offering to businesses and enterprises has been lacking.

The tools, methods, and workflows all exist for making total open solutions real today. All of the pieces are themselves open source. There are many useful guides for best practices across the pipeline. It is just that — prior to this — no one apparently took the time to assemble and articulate them. We think this three-part series and some of the “how to” guidance in the ‘DocWiki’ system can help fix this oversight.

Ultimately, with wider adoption by developers, goaded in part by demands of the marketplace for them, we would hope that additional innovations and ideas may be forthcoming to improve the industry’s ability to offer total open source solutions. Adding just a small bit of attentive effort to how we organize and package what we know is but a small price to pay for greater acceptance and success.


[1] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.
[2] Clean XHTML can be generated directly from the Mediawiki API. This can be done directly via URL with the action=render command. See for example: http://www.mediawiki.org/wiki/API:Parsing_wikitext.
[3] For example, there are a number of paths to migrate from HTML or XHTML to DocBook; see http://wiki.docbook.org/topic/Html2DocBook. But, there is a specific project that also goes directly from Mediawiki; see http://code.google.com/p/gwtwiki/wiki/Mediawiki2Docbook.
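
As a minimal sketch of note [2], the two request URLs can be built as follows. It assumes a standard MediaWiki installation: action=render is a parameter of the wiki's index.php entry point, while the page linked in the note describes the api.php counterpart, action=parse. The host wiki.example.org and the function names are hypothetical and are not taken from the DocWiki itself.

```python
from urllib.parse import urlencode


def render_url(index_php: str, title: str) -> str:
    """URL asking index.php for the bare rendered HTML of a page."""
    return index_php + "?" + urlencode({"title": title, "action": "render"})


def parse_api_url(api_php: str, title: str) -> str:
    """URL asking api.php to parse a page and return the result as JSON."""
    query = {"action": "parse", "page": title, "format": "json"}
    return api_php + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical wiki location, used for illustration only.
    print(render_url("https://wiki.example.org/index.php", "Main_Page"))
    print(parse_api_url("https://wiki.example.org/api.php", "Main_Page"))
```

Either URL can then be fetched with any HTTP client and the returned markup handed to a DocBook conversion path such as those in note [3].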

Posted by AI3's author, Mike Bergman Posted on May 31, 2010 at 10:08 pm in Adaptive Innovation, MIKE2.0, Open Source | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/884/listening-to-the-enterprise-total-open-solutions-part-3/
The URI to trackback this post is: http://www.mkbergman.com/884/listening-to-the-enterprise-total-open-solutions-part-3/trackback/
Posted:May 25, 2010

Broken Chair sculpture, Geneva
The Four Legs to a Stable Open Source Solution

In the first part of this series, we put forward the argument that incomplete provision of important support factors was limiting the adoption of open source software in the enterprise. We can liken the absence of these factors to having a chair with one or more absent or broken legs.

This second part of the series goes into the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we can term this a total open solution.

These considerations are not simply a matter of idle curiosity. New approaches and new methods are required for enterprises to modernize their IT systems while adding new capabilities and preserving sunk assets. Extending and modernizing existing IT is often not in the self-interests of the original supplying vendors. And enterprises are well aware that IT commitments can extend for decades.

While the benefits and capabilities of open source software become apparent by the day, rates of open source software adoption lag in enterprises. We have seen entire Internet-based businesses arise and get huge in just a few short years. But it is the rare existing enterprise that has committed to and embraced similar Web-oriented architectures and IT strategies [1].

The enterprise IT ecosystem is evolving to become an unhealthy one. New software vendors have generally abandoned enterprises as a market. Much more action takes place with consumer apps and Internet plays, often premised on ad-based revenues or buzz and traffic as attractors for acquisition. Existing middle-tier enterprise vendors are themselves being gobbled up and disappearing. I’m sure all observers would agree that IT software and services are increasingly dominated by a shrinking slate of vendors. I suspect most observers, myself included, would argue that enterprise-based IT innovation is also on the wane.

The argument posed in the first part of this series is that such atrophy should not be unexpected. The current state of open source software is not addressing the realities of enterprise IT needs.

And that is where the other legs of the total open solution come in. In their entirety, they amount to a form of capacity building for the enterprise [2]. It is simply not enough to put forward buzzwords matched with open source software packages. Exciting innovations in social networks, collaboration, semantic enterprise, mobile apps, REST, Web-oriented architectures, information extraction, linked data and a hundred others are being validated on the Internet. But until the full spectrum of success and adoption factors gets addressed, enterprises will not embrace these new innovations as central to their business.

Citizen Dan Community Indicators System

As we describe these four legs to the total open solution, we will sometimes point to our Citizen Dan initiative [3]. That is not because of some universal applicability of the system to the enterprise; indeed Citizen Dan is mostly targeted to local communities and municipalities. But, Citizen Dan does represent the first instance known to us where each of these total open solution success factors is being explicitly recognized and developed. We think the approach has some transferability to the broader enterprise.

Let’s now discuss these four legs in turn.

The Software Leg to a Total Open Solution

Leg One: Software

Of course, the genesis of this series is grounded in open source software and what it needs to do in order to find broader enterprise acceptance. Clearly that is the first leg amongst the four to be discussed. We also have acknowledged that, generally, best-of-breed open source software is also better documented at the code level, and has documented APIs. We will return to this topic under Leg Four below.

Open source software useful to the enterprise is often a combination of individual open source packages. Some successful vendors of open source to the enterprise in fact began as packagers and documenters of multiple packages. Red Hat for Linux or Alfresco in document management or Pentaho in business intelligence come to mind, as examples.

In the case of Citizen Dan, here are the open source packages presently contained in its offering: Linux (Ubuntu), Apache, MySQL, PHP (these comprising the LAMP stack), Drupal, a variety of third-party Drupal modules, Virtuoso, Solr, ARC2, Smarty, Yahoo UI, TinyMCE, Axiis, Flex, ClearMaps, irON, conStruct, structWSF, and some others. Such combinations of packages are not unusual in open source settings, since new value-add typically comes from extensions to existing systems or unique ways to combine or package them. For example, the installation guide for structWSF alone is quite comprehensive with multiple configuration and test scripts.

Thus, besides direct software, it is also critical that configuration, settings, installation guidance and the like be addressed to enable relatively straightforward set-up. This is an area of frequent weakness. Targeting it directly is a not-so-secret factor for how some vendors have begun to achieve some success with the enterprise market.

The Structure Leg to a Total Open Solution

Leg Two: Structure

All software works on data. While some data is unstructured (such as plain text) and some is semi-structured (such as HTML or Web pages that mix markup with text), the objective of information extraction or natural language processing is to extract the “structure” from such sources. Once extracted, such structure can interoperate on a common footing with the structured data common to standard databases.

Thus, we use “structure” to denote the concepts and their relationships (the “schema” or “ontology”) and the indicators and data (attributes and values) to describe them, and the “entities” (distinct individuals or nameable instances) that populate them. In other words, “structure” refers to all of the schema (concepts + relationships) + data + attributes + indicators + records that make up the information upon which software can operate.
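
To make that definition concrete, here is a minimal sketch of “structure” as a small data model: a schema (concepts plus relationships) together with the entities and attribute values that populate it. All class names, the isPartOf predicate, and the sample values are hypothetical illustrations; they are not drawn from the MUNI ontology or from any Citizen Dan code.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A class of things in the schema (ontology)."""
    name: str


@dataclass
class Relationship:
    """A named link between two concepts in the schema."""
    subject: str
    predicate: str
    object: str


@dataclass
class Entity:
    """A nameable instance of a concept, with attribute/value data."""
    name: str
    concept: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Structure:
    """Schema (concepts + relationships) plus the records that populate it."""
    concepts: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    entities: list = field(default_factory=list)

    def entities_of(self, concept_name: str) -> list:
        """Return every entity that is an instance of the named concept."""
        return [e for e in self.entities if e.concept == concept_name]


def build_example() -> Structure:
    """Assemble a tiny, made-up structure for a community indicator system."""
    s = Structure()
    s.concepts += [Concept("Municipality"), Concept("Neighborhood")]
    s.relationships.append(
        Relationship("Neighborhood", "isPartOf", "Municipality")
    )
    # The population figure is invented purely for illustration.
    s.entities.append(Entity("Riverside", "Neighborhood", {"population": 4200}))
    return s
```

In this sketch the concepts and relationships play the role of the schema, while the entities carry the attributes and indicator values; software operates on the two together.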

Structure exists in many forms and serializations. Generally, software represents its internal information in one or a few canonical storage and manipulation formats, though that same software may also be able to import (ingest) or export its information and data in many different external formats.

In our semantic enterprise work, especially with its premise in ontology-driven applications using adaptive ontologies, structure is an absolutely essential construct. But, frankly, no information technology system exists that does not also depend on structure to a greater or lesser extent.

The interplay between software and structure is one source of expertise that vendors guard closely and use to competitive advantage. In years past, proprietary software could partially hide the bases for performance or algorithmic advantages. Expert knowledge and intimate familiarity with these systems was the other basis for keeping these advantages closely held.

It is perhaps not too surprising given this history, then, that the software industry really has very little emphasis or discussion on the interaction between software and structure. But, if software is being brought in as open source, where is the accompanying expertise or guidance for how data structure can be used to gain full advantage? The same acquired knowledge that, say, accompanied the growth of relational databases in such areas as schema development, materialized views or (de)normalization now needs to be made explicit and exposed for all sorts of open source systems.

In the realm of the semantic enterprise we are seeing attempts at this via open source ontologies and greater emphasis on APIs and documentation of same. Citizen Dan, for example, will be first publicly released with an accompanying MUNI ontology as a reference schema and starting point. Descriptions and methods for how to obtain indicator data and relevant attribute and entity information for the domain will also accompany it.

As open source software continues to emphasize semantics and interoperability, exemplar structures and best practices will need to be an essential part of the technology transfer. Just as the “secrets” of much software began to be opened up via open source, so too must the locked-up expertise of experts and practitioners in how to effectively structure data be exposed.

The Methods Leg to a Total Open Solution

Leg Three: Methods

The need for structure explication and guidance is but one unique slice of a much broader need to expose methods and best practices surrounding a given information management initiative. The reason that any open source software might be adopted in the first place is based on the hope for some improved information management process.

Recently I have been touting MIKE2.0, the first open source, replicable and extensible framework for organizing and managing information in the enterprise. MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. It can be applied to any type of information development.

MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.

MIKE2.0 and its forthcoming extensions, one of which we have developed for the semantic enterprise and are now extending into the semantic government in the context of Citizen Dan, are exciting because they provide a systematic approach and guidance for how (and for what!) to document new projects and initiatives. What MIKE2.0 represents is the first time that the embedded, proprietary expertise of traditional IT consultants has been exposed for broader use and extension.

The real premise behind any approach like MIKE2.0 or variants is to codify the expertise and knowledge that was previously locked up by experts and practitioners. The framework in MIKE2.0 provides a structure by which knowledge bases of background information can be assembled to accompany an open source project. This structure extends from initial evaluation and design all the way through operation and end of life.

The ‘CIS DocWiki’ that is being developed to accompany Citizen Dan is such an example of a MIKE2.0-informed knowledge base. At present, the CIS DocWiki has more than 300 specific articles useful to community indicator systems for local governments, and a complete deployment and maintenance methodology. By public release, it will likely be 2-3 times that size. All of this will be downloadable and installable as a wiki, and as open source content, ready for branding and modification for any local circumstance. CIS DocWiki is a natural methods and documentation complement to the Citizen Dan software and its MUNI structure. Release is scheduled for summer.

As we will focus on in Part 3 of this series, we are combining a MIKE2.0 organizational approach with a documentation and single-source publication platform to fulfill the method and documentary aspects of projects. It was really through the advantages gained by the combination of these pieces that we began to see the inadequacy of many current open source projects for the enterprise.

The Documentation Leg to a Total Open Solution

Leg Four: Documentation

This series began in part with a recognition that superior open source projects are often the better documented ones. But, even there, documentation is often restricted to code-level documentation or perhaps APIs.

As the material above suggests, documentation needs to extend well beyond software. We need documentation of structure, methods, best practices, use cases, background information, deployment and management, and changing needs over the lifetime of the system. And, as we have also seen in Part 1, the lifetime of that system might be measured in decades.

Documentation is no equal to paid partners and their expertise. But, documentation can be cheaper, and if that documentation is sufficient, might be a means for changing the equation in how IT projects are solicited, acquired and managed.

Today, enterprises appear to be stuck between two difficult choices: 1) the traditional vendor lock-in approach with high costs and low innovation; or 2) open source with minimal documentation and vendor knowledge and little assurance of support longevity.

These trade-offs look pretty unpalatable.

Documentation alone, even as extended into the other legs of the solution, is not prima facie going to be a deal maker. But, its absence, I submit, is a deal breaker. Just as open source itself has taken some years to build basic comfort in the enterprise, so too a concerted attack on all acceptance factors may be necessary before actual wide adoption occurs.

We hope the ‘CIS DocWiki’ platform noted for Citizen Dan will be an exemplar for this combination of documentation and methodology. It is a single-source publishing platform that allows the entire knowledge base behind a given IT initiative to be used for collaboration, operational, training or collateral purposes. And all of this is based on open source software.

Software vendors need to recognize these documentation factors and build their ventures for success. Yes, writing code and producing software is a lot more fun and rewarding than (yeech) documentation. But, unless our current generation of vendors that is committed to open source and its benefits takes its markets seriously — and thus commits to the serious efforts these markets demand — we will continue to see minimal uptake of open source in the enterprise.

An Interacting Whole Greater than the Sum of its Parts

Each of these four legs of a total open solution can interact with and reinforce the other parts. Once one begins to see the problem of open source adoption in the enterprise as a holistic one, a new systems-level perspective emerges.

Total Open Solution

Enterprises know full well that software is only one means to address an information management problem, and only a first step at that. Traditional vendors to the enterprise also understand this, which is why through their embedded systems and built-up expertise they have been able to perpetuate what often amounts to a monopoly position.

Pressures are building for an earthquake in the IT landscape. Enterprises are on an anvil of global competition and limited resources. Existing IT systems are not up to the task but too expensive and embedded to abandon. Traditional vendors have near monopoly positions and little incentive to innovate. New software vendors don’t have the expertise and gravitas to handle enterprise-scale challenges. Meanwhile, the rest of the globe is leapfrogging embedded systems with agile, Web-based systems.

The true innovation that is occurring is all based around open source, nurtured by the global computing platform of the Internet, and fueled by countless individuals able to compete on downward-spiraling cost bases. But on so many levels, open source, as presently constituted, either fails or poses too many risks to the commercial enterprise.

The Internet itself was the basis of a paradigm shift, but I think we are only now seeing its manifestation at the enterprise level. We are also now seeing global reordering and changes of the economic order. How will companies respond? How will their IT systems adapt? And what will new vendors need to do and recognize in order to thrive in this changing environment?

I’m not sure I have found the language or rhetoric to convey what I see coming, and coming soon. I know open source is part of it; I know enterprises need it; and I know what is presently being offered does not meet the test.

As I noted in our first part, the mantra that we use in Structured Dynamics to express this challenge is, “We’re Successful When We’re Not Needed”. I think the essence behind this statement is that premises of dependency or proprietary advantage will not survive the jet streams of change that are blowing away the old order.

Sound like too much hyperbole? Actually, my own gut feeling is that it is not nearly enough.

In any case, windy rhetoric always falls short if there are not some actionable next steps. In these first two parts of this series, I have tried to present the ingredients that need to go into the cake. In the third part I try to offer a new, and complementary, open source means for bringing stability to the foundation.

In all cases, though, I think these challenges are permanent ones and do not lend themselves to facile solutions. Four legs, or seven foundations, or twelve steps are all just too simplistic for dealing with the global and complex tsunamis blowing away the old order.

One really does not need to lick a finger to sense the direction of these winds of change. It is coming, and coming hard, and all of it is from the direction of open source. What enterprises do, and what the vendors who want to serve them do, is perhaps less clear. I think open source offers a way out of the box in which enterprise IT is currently stuck. But, at present, I also think that most open source options do not have the necessary legs to stand on.


[1] One notable exception to this is the consumer-facing aspects of some businesses, such as automobiles or personal care or fashion products. These businesses are leading the way into some of the “build your own” or “design your own” uses of modern Web technology.
[2] In the 1970s the major term for this approach was “technology transfer.”
[3] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.

Posted by AI3's author, Mike Bergman Posted on May 25, 2010 at 9:24 am in Adaptive Innovation, MIKE2.0, Open Source | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/883/listening-to-the-enterprise-total-open-solutions-part-2/
The URI to trackback this post is: http://www.mkbergman.com/883/listening-to-the-enterprise-total-open-solutions-part-2/trackback/
Posted:May 20, 2010

Structured Dynamics LLC Logo
Growth Demanded a Professional Upgrade

Structured Dynamics today updated its image with a new logo and new color schemes on its Web pages and collateral. Other upgrades to various SD product logos and other adjustments are also being made.

Fred Giasson and I formed the company rapidly back in November 2008. We had other fish to fry, namely starting work with customers coming out of the gate, and (literally) grabbed a toss-off logo that had been lying in the drawer to start the company. That worked well in the early days, but we increasingly felt our image looked tired and distinctly “non-dynamic.”

So we commissioned a competition a few weeks back and left the next steps to the professionals. The winning design is shown above. We had many good options to choose from, and we will be working with some of the other finalist designers for some of our other product designs. The first in that series is the Citizen Dan logo.

So, with growth and presence it feels good to now have a professional look as well. We’re proud that we continue to be able to fully self-fund the company and look to walk arm-in-arm with this logo for quite some time to come!

Posted by AI3's author, Mike Bergman Posted on May 20, 2010 at 3:03 pm in Structured Dynamics | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/886/sd-gets-new-logo-look/
The URI to trackback this post is: http://www.mkbergman.com/886/sd-gets-new-logo-look/trackback/
Posted:May 12, 2010

2009-10 Campaign from http://www.pasadenasymphony-pops.org/
“We’re Successful When We’re Not Needed”

Structured Dynamics has been engaged in open source software development for some time. Inevitably in each of our engagements we are asked about the viability of open source software, its longevity, and what the business model is behind it. Of course, I appreciate our customers seemingly asking about how we are doing and how successful we are. But I suspect there is more behind this questioning than simply good will for our prospects.

Besides the general fact that most of us know (of hundreds of thousands of open source projects, only a minuscule number get traction), I think there are broader undercurrents in these questions. Even with open source, and even with good code documentation, that is not enough to ensure long-term success.

When open source broke on the scene a decade or so ago [1], the first enterprise concerns were based around code quality and possible “enterprise-level” risks: security, scalability, and the fact that much open source was itself LAMP-based. As comfort grew about major open source foundations (Linux, MySQL, Apache, and the scripting languages of PHP, Perl and Python; that is, the very building blocks of the LAMP stack), concerns shifted to licensing and the possible “viral” effects of some licenses to compromise existing proprietary systems.

Today, of course, we see hugely successful open source projects in all conceivable venues. Granted, most open source projects get very little traction. Only a few standouts from the hundreds of thousands of open source projects on big venues like SourceForge and Google Code or their smaller brethren are used or known. But, still, in virtually every domain or application area, there are 2-3 standouts that get the lion’s share of attention, downloads and use.

Conventional Open Source

I think it fair to argue that well-documented open source code generally out-competes poorly documented code. In most circumstances, well-documented open source is a contributor to the virtuous circle of community input and effort. Indeed, it is a truism that most open source projects have very few code committers. If there is a big community, it is largely devoted to documentation and assistance to newbies on various forums.

We see some successful open source projects, many paradoxically backed by venture capital, that employ the “package and document” strategy. Here, existing open source pieces are cobbled together as more easily installed comprehensive applications with closer to professional grade documentation and support. Examples like Alfresco or Pentaho come to mind. A related strategy is the “keystone” one where platform players such as Drupal, WordPress, Joomla or the like offer plug-in architectures and established user bases to attract legions of third-party developers [2].

OK, So What Has This to Do with the Enterprise?

I think if we stand back and look at this trajectory we can see where it is pointing. And, where it is pointing also helps define what the success factors for open source may be moving forward.

Two decades ago most large software vendors made on average 75% to 80% of their revenues from software licenses and maintenance fees; quite the opposite is true today [3]. The successful vendors have moved into consulting and services. One only needs to look to three of the largest providers of enterprise software of the past two decades (IBM, Oracle and HP) to see evidence of this trend.

How is it that proprietary software with its 15% to 20% or more annual maintenance fees has been so smoothly and profitably replaced with services?

These suppliers are experienced hands in the enterprise and know what any seasoned IT manager knows: the total lifecycle costs of software and IT reside in maintenance, training, uptime and adaptation. Once installed and deployed, these systems assume a life of their own, with actual use lifetimes that can approach two to three decades.

This reality is, in part, behind my standard exhortation about respecting and leveraging existing IT assets, and why Structured Dynamics has such a commitment to semantic technology deployment in the enterprise that is layered onto existing systems. But, this very same truism can also bring insight into the acceptable (or not) factors facing open source.

Great code — even if well documented — is not alone the mousetrap that leads the world to the door. Listen to the enterprise: lifecycle costs and longevity of use are facts.

But what I am saying here is not really all that earthshaking. These truths are available to anyone with some experience. What is possibly galling to enterprises are two smug positions of new market entrants. The first, which is really naïve, is the moral superiority of open source or open data or any such silly artificial distinctions. That might work in the halls of academia, but carries no water with the enterprise. The second, more cynically based, is to wrap one’s business in the patina of open source while engaging in the “wink-wink” knowledge that only the developer of that open source is in a position to offer longer term support.

Enterprises are not stupid and understand this. So, what IT manager or CIO is going to bet their future software infrastructure on a start-up with immature code, generally poor code documentation or APIs, and definitely no clear clue about their business?

The Slow Squeeze

Yet, that being said, neither enterprises nor vendors nor software innovators that want to work with them can escape the inexorable force of open source. While it has many guises from cloud computing to social software or software as a service or a hundred other terms, the slow squeeze is happening. Big vendors know this; that is why there has been the rush to services. Start-up vendors see this; that is why most have gone to consumer apps and ad-based revenue models. And enterprises know this, which is why most are doing nothing other than treading water because the way out of the squeeze is not apparent.

The purpose of this three-part series is to look at these issues from many angles. What might the absolute pervasiveness of open source mean to traditional IT functions? How can strategic and meaningful change be effected via these new IT realities in the enterprise? And, how can software developers and vendors desirous of engaging in large-scale initiatives with enterprises find meaningful business models?

Lead-in to the Series: a Total Open Solution

And, after we answer those questions, we will rest for a day.

But, no, seriously, these are serious questions.

There is no doubt open source is here to stay, yet its maturity demands new thinking and perspectives. Just as enterprises have known that software is only the beginning of decades-long IT commitments and (sometimes) headaches, the purveyors and users of open source should recognize the acceptance factors facing broad enterprise adoption and reliance.

Open source offers the wonderful prospect of avoiding vendor “lock-in”. But, if the full spectrum of software use and adoption is also not so covered, all we have done is to unlock the initial selection and install of the software. Where do we turn for modifications? for updates? for integration with other packages? for ongoing training and maintenance? And, whatever we do, have we done so by making bets on some ephemeral start-up? (We know how IBM will answer that question.)

The first generation of open source has been a substitute for upfront proprietary licenses. After that, support has been a roll of the dice. Sure, broadly accepted open source software provides some solace because of more players and more attention, but how does this square with the prospect of decades of need?

The perverse reality in these questions is that most all early open source vendors are being gobbled up or co-opted by the existing big vendors. The reward of successful market entry is often a great sucking sound to perpetuate existing concentrations of market presence. In the end, how are enterprises benefiting?

Now, on the face of it, I think it neither positive nor negative whether an early open source firm with some initial traction is gobbled up by a big player or not. After all, small fish tend to be eaten by big fish.

But two real questions arise in my mind: One, how does this gobbling fix the current dysfunction of enterprise IT? And, two, what is a poor new open source vendor to do?

The answer to these questions resides in the concerns and anxieties that caused them to be raised in the first place. Enterprises don’t like “lock-in” but like even less seeing stranded investments. For open source to be successful it needs to adopt a strategy that actively extends its traditional basis in open code. It needs to embrace complete documentation, provision of the methods and systems necessary for independent maintenance, and total lifecycle commitments. In short, open source needs to transition from code to systems.

We call this approach the total open solution. It involves — in addition to the software, of course — recipes, methods, and complete documentation useful for full-life deployments. So, vendors, do you want to be an enterprise player with open source? Then, embrace the full spectrum of realities that face the enterprise.

“We’re Successful When We’re Not Needed”

The actual mantra that we use to express this challenge is, “We’re Successful When We’re Not Needed”. This simple mental image helps define gaps and tells us what we need to do moving forward.

The basic premise is that any taint of lock-in or not being attentive to the enterprise customer is a potential point of failure. If we can see and avoid those points and put in place systems or whatever to overcome them, then we have increased comfort in our open source offerings.

Like good open source software, this is ultimately a self-interested position to take. If we can increase comfort in the marketplace that they can adopt and sustain our efforts without us, they will adopt them to a greater degree. And, once adopted, and when extensions or new capabilities are needed, then as initial developers with a complete grasp of the entire lifecycle challenges we become a natural possible hire. Granted, that hiring is by no means guaranteed. In fact, we benefit when there are many able players available.

In the remaining two parts of this series we will discuss all of the components that make up a total open solution and present a collaboration platform for delivering the methods and documentation portions. We’re pretty sure we don’t yet have it fully right. But, we’re also pretty sure we don’t have it wrong.


[1] Of course, stalwart open source applications such as Linux and MySQL and even the open source movement extend back about twenty years. But, it was only about a decade ago that real traction and visibility in the enterprise began.
[2] BTW, with regard to the latter, I think it notable that no semantic technology player has played or attracted third parties to any notable extent. That is possibly a topic for a later blog post!
[3] I first wrote about this five years ago (and updated it a year later), with analysis of many public vendors. See M.K. Bergman, Redux: Enterprise Software Licensing on Life Support, June 2, 2006.