Posted: April 25, 2011

Advances in How to Transfer Semantic Technologies to Enterprise Users

For some time, our mantra at Structured Dynamics has been, “We’re successful when we are not needed.” [1]

In support of this vision, we have been key developers of an entire stack of semantic technologies useful to the enterprise, the open semantic framework (OSF); we have formulated and contributed significant open source deployment guidance to the MIKE2.0 methodology for semantic technologies in the enterprise called Open SEAS; we have developed useful structured data standards and ontologies; and we have made massive numbers of free how-to documents and images available for download on our TechWiki. Today, we add further to these contributions with our workflows guidance. All of these pieces contribute to what we call the total open solution.

Prior documentation has described the overall architecture or layered approach of the open semantic framework (OSF). Those documents are useful, but they do not convey a practical understanding of how the pieces fit together or how an OSF instance is developed and maintained.

This new summary overviews a series of seven different workflows for various aspects of developing and maintaining an OSF (based on Drupal) [2]. In addition, each workflow section also cross-references other key documentation on the TechWiki, as well as points to possible tools that might be used for conducting each specific workflow.


Seven different workflows are described, as shown in the diagram below. Each of the workflows is color-coded and related to the other workflows. The basic interaction with an OSF instance tends to occur from left-to-right in the diagram, though the individual parts are not absolutely sequential. As each of the seven specific workflows is described below, it is keyed by the same color-coded portion of the overall workflow.

[Figure: OSF Workflow]

Each of the component workflows is itself described as a series of inter-relating activities or tasks.

Installation Workflow

Installation is mostly a one-time effort and proceeds on a more-or-less sequential basis. As the various components of the stack are installed, they are then configured and tested for proper installation.

The installation guide is the governing document for this process, with quite detailed scripts and configuration tests to follow. The blue bubbles in the diagram represent the major open source software components of Virtuoso (RDF triple store), Solr (full-text search) and Drupal (content management system).

[Figure: Install Workflow]

Another portion of this workflow is to set up the tools for the backoffice access and management, such as PuTTY and WinSCP (among others).

The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Configure & Presentation Workflow

One of the most significant efforts in the overall OSF process is the configuration and theming of the host portal, generally based on Drupal.

The three major clusters of effort in this workflow are the design of the portal, including a determination of its intended functionality; the setting of the content structure (stubbing of the site map) for the portal; and determining user groups and access rights. Each of these, in turn, is dependent on one or more plug-in modules to the Drupal system.

Some of these modules are part of the conStruct series of OSF modules, and others are evaluated and drawn from the more than 8000 third-party plug-in modules to Drupal.

[Figure: Configure Workflow]

The Design aspect involves picking and then modifying a theme for the portal. The starting point may be one of the existing open source Drupal themes or one of those more specifically recommended for OSF. In either case, it will likely be necessary to make some minor layout modifications to the PHP code and some CSS (styling) changes. Theming (skinning) of the various semantic component widgets (see below) also occurs as part of this workflow.

The Content Structure aspect involves defining and then stubbing out placeholders for eventual content. Think of this step as creating a site map structure for the OSF site, including major Drupal definitions for blocks, Views and menus. Some of the entity types are derived from the named entity dictionaries used by a given project.

More complicated user assignments and groups are best handled through a module such as Drupal’s Organic Groups. In any event, determining user groups (such as anonymous, admins, curators, editors, etc.) is a necessary early step, though these groups may be changed or modified over time.
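To make the early role decision concrete, here is a minimal Python sketch of a role-to-permission plan and the check it implies. The role names follow the examples above; the permission strings and the `can` helper are hypothetical and are not Drupal's own permission names or API, since Drupal stores and enforces roles and permissions itself. This only illustrates the planning step.

```python
# Illustrative only: role names follow the examples in the text, but the
# permission strings are hypothetical, not Drupal's exact permission names.
PERMISSIONS = {
    "anonymous": {"view content"},
    "editor": {"view content", "create content", "edit own content"},
    "curator": {"view content", "create content", "edit any content", "import datasets"},
    "admin": {"view content", "create content", "edit any content",
              "import datasets", "administer users"},
}


def can(roles, permission, plan=PERMISSIONS):
    """Return True if any of the user's roles grants the requested permission."""
    return any(permission in plan.get(role, set()) for role in roles)


# A user may hold several roles; an unknown role grants nothing.
print(can(["anonymous"], "create content"))            # False
print(can(["anonymous", "editor"], "create content"))  # True
print(can(["curator"], "administer users"))            # False
```

Writing the plan down this way before configuring Drupal makes it easier to spot roles that overlap or permissions that no role holds.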

For site functionality, modules must be evaluated and chosen to add to the core system. Some of these steps and their configuration settings are provided in the guidelines for setting up Drupal document.

None of the initial decisions “lock in” eventual design and functionality. These may be modified at any time moving forward.

The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Structured Data Workflow

Of course, a key aspect of any OSF instance is the access and management of structured data.

There are basically two paths for getting structured data into the system. The first, generally involving smaller datasets, is the manual conversion of the source data to one of the pre-configured OSF import formats of RDF, JSON, XML or CSV. These are based on the irON notation; a good case study for using spreadsheets is also available.

The second path (bottom branch) is the conversion of internal structured data, often from a relational data store. Various converters and templates are available for these transformations. One excellent tool is FME from Safe Software, which is used in the example shown with a spatial data infrastructure (SDI) data store, though a very large number of options exist for extract, transform and load (ETL).

In the latter case, procedures for polling for updates, triggering notice of updates, and only extracting the deltas for the specific information changed can help reduce network traffic and upload/conversion/indexing times.
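A minimal Python sketch of the delta-extraction idea, assuming each source record has a stable id and can be serialized to JSON. Only records whose content hash differs from the last sync are flagged for re-conversion and re-indexing. The function names and sample data are hypothetical and are not part of OSF or FME.

```python
import hashlib
import json


def fingerprint(record):
    """Stable SHA-256 hash of a record; sort_keys makes field order irrelevant."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_deltas(previous, current):
    """Compare the fingerprints saved at the last sync with a fresh snapshot.

    previous: dict of record id -> fingerprint from the last sync
    current:  dict of record id -> record (a dict) read from the source store
    Returns (added, changed, deleted, fingerprints_to_save).
    """
    now = {rid: fingerprint(rec) for rid, rec in current.items()}
    added = sorted(rid for rid in now if rid not in previous)
    changed = sorted(rid for rid in now if rid in previous and previous[rid] != now[rid])
    deleted = sorted(rid for rid in previous if rid not in now)
    return added, changed, deleted, now


# First sync: everything is new.
saved = {}
snapshot1 = {"r1": {"name": "Well A", "depth_m": 120},
             "r2": {"name": "Well B", "depth_m": 85}}
added, changed, deleted, saved = compute_deltas(saved, snapshot1)
print(added, changed, deleted)   # ['r1', 'r2'] [] []

# Second sync: r1 edited, r2 removed, r3 added; only these need re-conversion.
snapshot2 = {"r1": {"name": "Well A", "depth_m": 125},
             "r3": {"name": "Well C", "depth_m": 40}}
added, changed, deleted, saved = compute_deltas(saved, snapshot2)
print(added, changed, deleted)   # ['r3'] ['r1'] ['r2']
```

Only the fingerprints need to be kept between runs. Where the source store supports change timestamps or triggers, those may be cheaper than hashing every record.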

[Figure: Structured Data Workflow]
The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Content Workflow

The structured data from the prior workflow process is then matched with the remaining necessary content for the site. This content may be of any form and media (since all are supported by various Drupal modules), but, in general, the major emphasis is on text content.

Existing text content may be imported to the portal, or new content can be added via various WYSIWYG graphical editors for use within Drupal. (The excellent WYSIWYG Drupal module provides an access point to a variety of off-the-shelf, free WYSIWYG editors; we generally use TinyMCE, but multiple editors can be installed simultaneously.)

The intent of this workflow component is to complete content entry for the stubs created earlier during the configuration phase. It is also the component used for ongoing content additions to the site.

[Figure: Content Workflow]

The scones tagger tags content based on the concepts in the domain ontology (see below) and the named entities (as contained in “dictionaries”) used by a given project. Once tagged, this content can then be related to the other structured data in the system.
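To show the general idea of dictionary-based tagging, here is a simple gazetteer-style sketch in Python. It is not the scones algorithm; it only matches concept labels and named-entity names from small hypothetical dictionaries against text, case-insensitively and on word boundaries, preferring longer matches over shorter overlapping ones.

```python
import re

# Hypothetical dictionaries; in practice these would come from the domain
# ontology (concept labels) and a project's named-entity dictionaries.
CONCEPTS = {"aquifer": "onto:Aquifer", "groundwater": "onto:Groundwater"}
ENTITIES = {"red river": "ent:RedRiver", "manitoba": "ent:Manitoba"}


def tag_text(text, concepts, entities):
    """Return sorted (start, end, surface, id, kind) tuples for dictionary matches."""
    lexicon = [(form, ident, "concept") for form, ident in concepts.items()]
    lexicon += [(form, ident, "entity") for form, ident in entities.items()]
    # Longer surface forms first, so "red river" wins over a shorter "river".
    lexicon.sort(key=lambda item: len(item[0]), reverse=True)
    taken = [False] * len(text)
    tags = []
    for form, ident, kind in lexicon:
        pattern = re.compile(r"\b" + re.escape(form) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match already accepted
            taken[m.start():m.end()] = [True] * (m.end() - m.start())
            tags.append((m.start(), m.end(), m.group(0), ident, kind))
    return sorted(tags)


for tag in tag_text("Groundwater in the Red River aquifer of Manitoba.", CONCEPTS, ENTITIES):
    print(tag)
```

Real ontology-based information extraction adds tokenization, disambiguation and alias handling; this sketch only shows how tags tie text spans back to ontology and dictionary identifiers.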

Once all of this various content is entered into the system, it is then available for access and manipulation by the various conStruct modules (see figure above) and semantic component widgets (see below).

The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Ontologies Workflow

Though the flowchart below appears rather complicated, there are really only three tasks that most OSF administrators need to worry about with respect to ontologies:

  1. Adding a concept to the domain ontology (a class) and setting its relationships to other concepts
  2. Adding a dataset attribute (data characteristic) for various dataset records, or
  3. Adding or changing an annotation for either of these things, such as the labels or descriptions of the thing.

In practice, of course, editing, modifying or deleting existing information is also important, but these activities, and their user interfaces, are simpler variants of the basic add (“create”) functions.
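The three basic ontology tasks can be sketched as edits to a set of RDF-style triples. This is a minimal Python illustration, not the OSF interface or the OWLAPI: prefixed names such as `ex:Aquifer` are hypothetical stand-ins for full IRIs, and labels are stored as plain strings rather than typed RDF literals.

```python
# Hypothetical "ex:" names; prefixed strings stand in for full IRIs.
class Ontology:
    """Tiny triple store illustrating the three basic ontology edits."""

    def __init__(self):
        self.triples = set()  # (subject, predicate, object) tuples

    def add_concept(self, cls, parent=None):
        """Task 1: declare a class and relate it to an existing concept."""
        self.triples.add((cls, "rdf:type", "owl:Class"))
        if parent is not None:
            self.triples.add((cls, "rdfs:subClassOf", parent))

    def add_attribute(self, prop, domain, datatype):
        """Task 2: declare a dataset attribute as a datatype property."""
        self.triples.add((prop, "rdf:type", "owl:DatatypeProperty"))
        self.triples.add((prop, "rdfs:domain", domain))
        self.triples.add((prop, "rdfs:range", datatype))

    def set_annotation(self, thing, annotation, value):
        """Task 3: add or change an annotation such as a label or description."""
        # Drop any earlier value so the same call covers both "add" and "change".
        self.triples = {t for t in self.triples
                        if not (t[0] == thing and t[1] == annotation)}
        self.triples.add((thing, annotation, value))


onto = Ontology()
onto.add_concept("ex:Aquifer", parent="ex:WaterBody")
onto.add_attribute("ex:depthInMeters", "ex:Aquifer", "xsd:decimal")
onto.set_annotation("ex:Aquifer", "rdfs:label", "Aquifer")
onto.set_annotation("ex:Aquifer", "rdfs:label", "Aquifer (groundwater)")
for triple in sorted(onto.triples):
    print(triple)
```

Treating "change" as "remove the old value, then add" is one reason editing can be seen as a variant of the basic create function.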

OSF provides three clean user interfaces to these three basic activities [3].

These basic activities may be applied to the three major governing ontologies in any OSF installation:

  • The domain ontology, which captures the conceptual description of the instance’s domain space
  • The semantic components ontology (SCO), which sets what widgets may display what kinds of data, and
  • irON for the instance record attributes and metadata (annotations).

All of the OSF ontology tools work off of the OWLAPI as the intermediary access point. The ontologies themselves are indexed as structured data (RDF with Virtuoso) or full text (Solr) for various search, retrieval and reasoning activities.

[Figure: Ontologies Workflow]

Because of the central use of the OWLAPI, it is also possible to use the Protégé editor/IDE environment against the ontologies, which also provides reasoners and consistency checking.

The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Filter & Select Workflow

The filter and select activities are driven by user interaction, with no additional admin tools required. This workflow is actually the culmination of all of the previous sequences in that it exposes the structured data to users, enables them to slice-and-dice it, and then to view it with a choice of relevant widgets (semantic components).

For example, see this animation:

[Animation: Filtering and Selection Workflow]

Considerably more detail and explanation is available for these semantic components.
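The slice-and-dice step of the filter and select workflow can be sketched as equality and range filters over instance records, plus facet counts that show the user which next slices are available. The records, attribute names and helper functions below are hypothetical; this illustrates the interaction only, not how OSF's semantic components are implemented.

```python
from collections import Counter

# Hypothetical instance records, as they might come back from a search.
RECORDS = [
    {"id": "r1", "type": "School", "city": "Winnipeg", "enrollment": 420},
    {"id": "r2", "type": "Library", "city": "Winnipeg", "enrollment": None},
    {"id": "r3", "type": "School", "city": "Brandon", "enrollment": 310},
    {"id": "r4", "type": "School", "city": "Winnipeg", "enrollment": 150},
]


def filter_records(records, equals=None, ranges=None):
    """Keep records matching every equality filter and every (low, high) range."""
    equals = equals or {}
    ranges = ranges or {}
    result = []
    for rec in records:
        if any(rec.get(key) != value for key, value in equals.items()):
            continue
        if all(rec.get(key) is not None and low <= rec[key] <= high
               for key, (low, high) in ranges.items()):
            result.append(rec)
    return result


def facet_counts(records, attribute):
    """Count the remaining values of an attribute, to offer the next slice."""
    return Counter(rec[attribute] for rec in records if rec.get(attribute) is not None)


schools = filter_records(RECORDS, equals={"type": "School"})
print(dict(facet_counts(schools, "city")))
big_winnipeg = filter_records(schools, equals={"city": "Winnipeg"},
                              ranges={"enrollment": (200, 1000)})
print([rec["id"] for rec in big_winnipeg])
```

Each filter narrows the record set, and the facet counts on what remains drive the choices offered for the next slice.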

The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Maintenance Workflow

The ongoing maintenance of an OSF instance is mostly a standard Drupal activity. Major activities that may occur include moderating comments; rotating or adding new content; managing users; and continued documentation of the site for internal tech transfer and training. If the portal embraces other aspects of community engagement (social media), these need to be handled as part of this workflow as well.

All aspects of the site and its constituent data may be changed or added to at any time.

[Figure: Maintenance Workflow]
The tools associated with this workflow sequence are described in the TechWiki desktop tools document.

Moving from Here

[Figure: Total Open Solution]

When first introduced in our three-part series, we noted the interlocking pieces that constituted the total open solution of the open semantic framework (OSF) (see right). We also made the point, unfortunately still true today, that the relative maturity and completeness of all of these components still does not allow us to fully achieve “We’re successful when we are not needed.”

As a small firm that is committed to self-funding via revenues, Structured Dynamics is only able to add to its stable of open source software, develop methodologies and provide documentation based on our client support. Yet, despite our small size, the support of our superb clients has enabled us to aggressively and rapidly add to all four components of this total open solution. This newest series of ongoing workflow documents (plus some very significant expansions and refinements of the OSF code base) is merely the latest example of this dynamic.

Through judicious picking of clients (and vice versa), and our insistence that new work and documentation be open sourced because it itself has benefitted from prior open source, we and our client partners have been making steady progress toward this vision of enterprises being able to adopt and install semantic solutions on their own. Inch-by-inch we are getting there.

The status of our vision today is that we are still needed in most cases to help formulate the implementation plan and then guide the initial set-up and configuration of the OSF. This support typically includes ontology development, data conversion and overall component integration. While it is true that some parties have embraced the OSF code and documentation and are implementing solutions on their own, doing so still requires considerable commitment, knowledge and skill in semantic technologies.

The great news about today’s status is that, after initial set-up and configuration, we are now able to transfer the technology to the client and walk away. Tools, documentation, procedures and workflows are now adequate for the client to extend and maintain their OSF instance on their own. This progress also includes a certification process and program for transferring the technology to client staff and assessing their proficiency in using it.

We have been completely open about our plans and our status. In our commitment to our vision of success, much work is still needed on the initial install and configure steps and on the entire area of ontology creation, extension and mapping [4]. We are working hard to bridge these gaps. We welcome additional partners that share with us the vision of complete, turnkey frameworks — including all aspects of total open solutions. Inch-by-inch we are approaching the realization of a vision that will fundamentally change how every enterprise can leverage its existing information assets to deliver competitive advantage and greater value for all stakeholders. You are welcome aboard!

[1] This has been the thematic message on Structured Dynamics’ Web site for at least two years. The basic idea is to look at open source semantic technologies from the perspective of the enterprise customer, and then to deliver all necessary pieces to enable that customer to install, deploy and maintain the OSF stack on its own. The sentiment has infused our overall approach to technology development, documentation, technology transfer and attention to methodologies.
[2] The first version of this article appeared as Workflow Perspectives on OSF on the OpenStructs TechWiki on April 19, 2011.
[3] The current release of OSF does not yet have these components included; they will be released to the open source SVNs by early summer.
[4] The best summary of the vision for where ontology development needs to head is provided by the Normative Landscape of Ontology Tools article on the TechWiki; see especially the second figure in that document.
Posted: April 22, 2011

This blog and all of Structured Dynamics’ Web sites got caught in the Amazon cloud snafu of the past two days. After being down for more than 24 hours, we have been able to restore all services.

Sorry for the inconvenience, and thanks for your patience!

Posted by AI3's author, Mike Bergman, on April 22, 2011 at 1:37 pm in Site-related
Posted: April 4, 2011

Self-service Information Management for Knowledge Workers

Though I have alluded to it numerous times in my past writings [1], I think one of the most pervasive and important benefits from semantic technologies in the enterprise will come from the democratization of information. These benefits will arise mostly from a fundamental change in how we manage and consume information. A new “system” of semantic technologies is now largely available that can put the collection, assembly, organization, analysis and presentation of information directly in the hands of those who need it most — the consumers of information.

The idea of “democratizing information” has been around for a couple of decades, and has accelerated since the rise of the Internet. Most commonly, the idea is associated with developments and notions in such areas as citizen journalism, crowdsourcing, the wisdom of the crowd, social bookmarking (or collaborative tagging), and the democratic (small “d”) access to publishing via new channels such as blogs, microblogs (e.g., Twitter) and wikis. To be sure, these kinds of democratic information will benefit (and already are benefiting) from the use and application of semantics.

But the trend I’m focusing on here is much different and quite new. It is the idea that enterprise knowledge workers can now take ownership and control of their knowledge management functions. In the process, prior bottlenecks due to IT can be relieved and massive new benefits can open up to the enterprise.

Decades-long Mismatches Between KM and IT

“Enterprise systems are doing it wrong. And not just a little bit, either. Orders of magnitude wrong. Billions and billions of dollars worth of wrong. Hang-our-heads-in-shame wrong. It’s time to stop the madness.”
– Tim Bray [2]

It is no secret that IT has not served the enterprise knowledge management function well for decades. Transaction systems and database systems geared to fast indexing and access to individual data items have not proved well suited to information or knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. Information management is a somewhat broader category, and adds such functions as document management, data management, enterprise content management, enterprise or controlled vocabularies, systems analysis, information standards and information assets management to the basic functions of KM. Since the purpose of this piece is not to get into the epistemological differences between information and knowledge, I use these terms more-or-less interchangeably herein.

Knowledge and information management is very big business. Given the breadth and differences in defining the KM and IM markets, let’s take as a proxy the business intelligence (BI) market, one of KM’s most important elements. Various estimates from IDC, Gartner and others place the current value of BI software sales somewhere in the range of $9 billion to $11 billion annually [3]. Further, BI ranked number five on the list of the top 10 technology priorities for chief information officers (CIOs) in 2011. And this pertains to the structured component of information alone.

Yet, at the same time, BI-related projects continue to have high failure rates, often cited as 65% or higher [4]. These failure rates are consistent with those of KM projects in general [5]. These failures are merely one expression of a constant litany of issues and concerns regarding the enterprise KM function:

Conventional KM Problem Area Comments
Inflexible Reports
  • reports are rarely “self-service”
  • new requests need to be placed in queue
  • 90% of stored report templates are never used
  • unlimited “slicing and dicing” not available
Inflexible Analysis
  • analysis is rarely “self-service”
  • new requests need to be placed in queue
  • many requests not accepted due to schema rigidities, cascading changes needed
  • analysis options are “pre-canned”, inflexible
Schema Bottlenecks
  • brittleness of relational data model and typical star schema
  • crossing across schema or databases difficult
  • load and re-indexing cycles can limit access, impose expensive back-end requirements
  • often cannot accommodate new data or structures
ETL Bottlenecks
  • getting data into the system needs to be placed in queue
  • new external data requires extract, transform and load (ETL) routines to be written
  • schedule and update cycles can be a mismatch to access needs
Reliance on Intermediaries
  • all problems above work through intermediaries
  • disconnect between those with the need (and the decision-makers) and those who implement the solutions
  • inherent issues in communicating requirements to implementers
  • related time delays to implementation exacerbate the communication of requirements
Specialized Expertise Required
  • expertise and skill sets needed to implement solutions different from those of the knowledge consumer
  • inherent issues in communicating requirements to implementers
  • high costs for attracting necessary expertise
  • expertise is inherently an overhead function
Slow Response Time
  • all problems above lead to delays, slow response
  • timely communications, analysis, decisions suffer
  • delays mean knowledge management is not an active “contact sport”, becomes mired and unresponsive
  • some needs are just not requested because of these problems
Dependence on External Apps
  • new apps need to be identified, procured
  • design and configuration of apps requires external expertise, programming skills
  • multiple sourcing of apps leads to frequent incompatibilities, high costs for integration, poor interoperability
Unmet Needs
  • many KM needs are simply not requested
  • by the time responses are forthcoming, needs and imperatives have moved on
  • communications, analysis and decisions become hassles
  • the “contact sport” of active discovery and learning is unmet
High Opportunity Costs
  • many KM insights are simply not discovered
  • delays and frustration add to costs, friction and inefficiencies
  • no way to know the opportunity costs of what is not learned, but they are surely high
High Failure Rates
  • the net impact of all of the problems above is to lead to high failure rates (~60% to 70%) and unacceptable costs
  • reliance on IT for KM has utterly and totally failed

The seeming contradiction between continued growth and expenditures for information management coupled with continued high failure rates and disappointments is really an expression of the centrality of information to the modern enterprise. The funding and growth of the IT function is itself an expression of this centrality and perceived importance. These have been abiding trends in our transition to information or knowledge economies.

Bray [2] places the fault for wasted initiatives within the culture of IT. I believe there is some truth to this — variably, of course, depending on the specific enterprise. But the real culprit, I believe, has been the past need to “intermediate” a layer of software and IT expertise between knowledge workers and their source information. A progression of tasks has been necessary — conducted over decades with advances and learning — to get paper information into electronic form, get those forms to be understood and operate in some common ways, and then to develop tools, architectures and frameworks to make sense of it. Yet, as more tasks with required specialized skills have been added to this layer, the actual gulf between worker and information has increased. For example, enterprises still require the overhead and layers of IT to write SQL to get information out and then to prepare and fix reports.

On average, IT now consumes about 4% of all enterprise expenditures and employs about 6% of enterprise workers [6]. IT has become a very thick intermediary layer, indeed! Yet, because of the advances and learning that have occurred in growing and nurturing this layer, we also now have the basis to begin to “disintermediate” the IT layer. Many, if not all, of the challenges noted in the table above can be improved by doing so.

Early Attempts at Self-service and Semantics

One current buzzword in business intelligence is “self service”. The term refers to giving knowledge workers the tools and systems to create reports or do analysis on their own, without needing to work through (or be frustrated by) the IT layer. Self-service software was first postulated in the 1990s as a way for information consumers and authors (typically subject-matter experts) to automate some of their knowledge management tasks. Today, it is most commonly applied to self-service reporting or self-service analytics within the BI realm.

As a general proposition, self-service BI has been more myth than reality [7]. Forrester surveys, for example, indicate that IT still develops most BI applications: in 2009, 70% of survey respondents said that IT develops the enterprise’s reports and dashboards [8]. However, that figure is no longer 100%, as it was just a decade earlier, and some open source offerings, such as BIRT, have had notable success addressing a wide range of reporting needs within a typical application, ranging from operational or enterprise reporting to multi-dimensional online analytical processing (OLAP).

James Kobielus [8] is particularly bullish on applying Web 2.0 “mashup” applications to knowledge worker purposes. Under this approach, Web-based applications are used and accessed directly by knowledge workers for charting and mapping purposes using Ajax or Flash widgets, such as Google Maps. The conventional BI and KM vendors have begun to move more aggressively into this area. Some notable new entrants, such as Tableau, Factual or GoodData, are also showing the way to more direct access, more flexible reporting and analysis widgets, and cleaner service or platform designs.

These initiatives reside at the display or reporting level. Another group, including James Kobielus, Neil Raden and Seth Earley, has addressed how to get disparate information to talk together using ontologies. They refer to “semanticizing” traditional practices such as master data management (MDM), “ontologizing” taxonomies, or adding Web 2.0 mashups to business intelligence. While these thoughts are moving in the right direction, and will bring incremental benefits, they still fall far short of the potential at hand.

Self-service Information Management

So far, in the KM realm, the application of semantics has tended to be limited to information extraction (tagging) of text documents and first attempts at using ontologies. The tagging component is essential to enable the 80% of information presently in textual documents to become first-class citizens within business intelligence or knowledge management. The ontology efforts to date appear to be more like thin veneers over traditional taxonomies. Rather than hierarchical structures, we now see graph-oriented ones, but still intended to fulfill the same tasks of enterprise metadata and vocabulary lookups.

The ontology efforts especially are just nibbling around the edges of what can be done with semantic technologies. Rather than looking upon ontologies as just another dictionary (though that role is true), if we re-orient our thinking to make ontologies central to the KM function, a wealth of new opportunities and benefits arises.

A bit more than a year ago, we formulated the Seven Pillars of the Open Semantic Enterprise, which included ontologies among its central components. In that article [9], we noted the particular applicability of semantic technologies to the information and knowledge management functions within enterprises. We asserted the benefits for embracing the open semantic enterprise as providing the organization greater insights with lower risk, lower cost, faster deployment, and more agile responsiveness. Since that time we have been deploying such systems and documenting those benefits.

Integral to the seven pillars are those aspects that lead to the democratization of information for the knowledge worker, what combined might be called “self-service information management”. As the figure to the right shows, three of the seven pillars are essential building blocks to this capability, two pillars are further foundations to it, with the remaining two pillars only tangentially important.

What the combination of these pieces means is a fundamental change in how knowledge work is done. Through this approach, we can largely disintermediate IT from the knowledge function, bring knowledge management directly into the hands of those who need it in real time, and fundamentally alter how knowledge management apps are designed and deployed. Best of all, these benefits come through incremental evolution and retain the use and value of existing information assets.

Building Block #1: Adaptive Ontologies

Rather than peripheral lookup structures or thin veneers, ontologies play the central role in the design of self-service information management. We use the plural on purpose here: what is deployed is actually a library of complementary and modular ontologies that play a variety of roles. Combined, we call these libraries with their representative functions adaptive ontologies.

This library contains the expected and conventional domain ontologies. These represent the actual knowledge space for the domain at hand, and may comprise multiple different ontologies representing different domain or knowledge spaces. These standard semantic Web ontologies may range from the small and simple to the large and complex, and may perform the roles of defining relationships among concepts, integrating instance data, orienting to other knowledge and domains, or mapping to other schema.

From a best practices standpoint [10], we take special care in constructing these domain ontologies such that we provide labels and cues for user interfaces. Some of the user interface considerations that can be driven by adaptive ontologies include: attribute labels and tooltips; navigation and browsing structures and trees; menu structures; auto-completion of entered data; contextual dropdown list choices; spell checkers; online help systems; etc. We also include a variety of synonyms and aliases (the combination of which we call semsets) for referring to concepts and instances in multiple ways and for aiding information extraction and tagging functions. (In addition to organizing and helping to interoperate contributing information, these domain ontologies are also used for what is called ontology-based information extraction (OBIE) via our scones [11] system.)
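A minimal Python sketch of how a concept's preferred label, definition and semset might drive two of the user-interface cues listed above: auto-completion and tooltips. The concept entries and field names (`prefLabel`, `definition`, `semset`) are assumptions made for illustration, not the actual structure of OSF ontologies.

```python
# Hypothetical concept entries: preferred label, definition (used as a
# tooltip) and a semset of synonyms and aliases.
CONCEPT_ENTRIES = {
    "onto:Automobile": {
        "prefLabel": "Automobile",
        "definition": "A wheeled motor vehicle used for transport.",
        "semset": ["car", "auto", "motorcar"],
    },
    "onto:AutoRepairShop": {
        "prefLabel": "Auto repair shop",
        "definition": "A business that services and repairs vehicles.",
        "semset": ["garage", "mechanic"],
    },
}


def autocomplete(prefix, concepts):
    """Suggest concepts whose preferred label or any semset alias starts with prefix."""
    prefix = prefix.lower()
    hits = []
    for cid, entry in concepts.items():
        forms = [entry["prefLabel"]] + entry["semset"]
        if any(form.lower().startswith(prefix) for form in forms):
            hits.append((entry["prefLabel"], cid))  # always display the preferred label
    return sorted(hits)


def tooltip(cid, concepts):
    """Build a tooltip from the definition and the semset aliases."""
    entry = concepts[cid]
    aliases = ", ".join(entry["semset"])
    return f"{entry['prefLabel']}: {entry['definition']} (also: {aliases})"


print(autocomplete("car", CONCEPT_ENTRIES))
print(tooltip("onto:Automobile", CONCEPT_ENTRIES))
```

Typing an alias such as "car" still surfaces the preferred label, so users can find a concept by whatever name they happen to use.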

In addition, the library of adaptive ontologies includes some administrative ontologies that guide how instance data can be imported and inter-related (via the Instance Record Object Notation, or irON); what information types drive what widgets (via the Semantic Component Ontology, or SCO); data mapping vocabularies (UMBEL Vocabulary); how to characterize datasets; and other potential specialty functionality.

A forthcoming article will describe the composition and modularity typically found in a library of these adaptive ontologies.

In combination, these adaptive ontologies are, in effect, the “brains” of the self-service system. The best aspect of these ontologies is that they can be understood, created and maintained by knowledge workers. They constitute the only specification (other than theming, if desired) necessary to create self-service knowledge management environments.

Building Block #2: Ontology-driven Apps

The pieces of the puzzle that implement the instruction sets within these adaptive ontologies are the ontology-driven apps, or ODapps. A recent article describes these structures in some detail [12].

ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in the adaptive ontologies. ODapps fulfill specific generic tasks, consistent with their dedicated design to respond to adaptive ontologies. For example, current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.

The generic functionality in these ODapps includes things like filtering, setting value ranges, choosing the specific display view, and optionally invoking various display templates (akin to the infoboxes on Wikipedia). By virtue of the data and the ontologies submitted to it, the ODapp signals to the user or consumer what displays, views, filters or slices-and-dices are available. Fed different data and different ontologies, the ODapp would signal the user differently.
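As a rough illustration of this division of labor, here is a minimal sketch in which the app itself hard-codes nothing about any domain; it only reads a greatly simplified, hypothetical ontology object that maps classes to display views and properties to filter types. The names `Ontology`, `ODapp`, `views_by_class` and `filter_by_property` are invented for this example and are not part of the OSF code base; real adaptive ontologies are OWL/RDF structures, not Python dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    """Hypothetical, greatly simplified stand-in for an adaptive ontology."""
    views_by_class: dict[str, list[str]]   # class -> display views that suit it
    filter_by_property: dict[str, str]     # property -> "range" or "facet" filter


@dataclass
class ODapp:
    """Generic app: holds no domain knowledge, only reads the ontology fed to it."""
    ontology: Ontology

    def available_views(self, record_type: str) -> list[str]:
        # Any record can be shown as a plain list; the ontology may add more views.
        return ["list"] + self.ontology.views_by_class.get(record_type, [])

    def available_filters(self, records: list[dict]) -> dict[str, str]:
        # Offer a filter only where the data has the property AND the ontology
        # says how that property may be filtered.
        present = {prop for record in records for prop in record if prop != "type"}
        return {prop: kind for prop, kind in self.ontology.filter_by_property.items()
                if prop in present}


if __name__ == "__main__":
    cities = [{"type": "City", "population": 120000, "country": "CA"}]
    geo = Ontology({"City": ["map"]}, {"population": "range", "country": "facet"})
    org = Ontology({"Person": ["network"]}, {"employer": "facet"})
    # Same app code and same data, but a different ontology yields different options.
    print(ODapp(geo).available_views("City"), ODapp(geo).available_filters(cities))
    print(ODapp(org).available_views("City"), ODapp(org).available_filters(cities))
```

With the first ontology the app offers a map view plus range and facet filters; with the second it offers only the default list and no filters, even though not a line of application code changed.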

Because of their generic design, driven by the ontologies, only a relatively small number of ODapps needs to be created. Once these apps are created with appropriate generic functionality, application development is essentially over. It is through additions and changes to the adaptive ontologies, done by knowledge workers themselves, that new capability and structure get exposed through these ontology-driven apps. This innovation shifts the locus from software and programming to data and knowledge structures.

This democratization of IT means that everything in the knowledge management realm can become self service. Users and consumers can create their own analyses; develop their own reports; and package and disseminate what they and their colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we turn prior software engineering practice on its head.

Building Block #3: Open World Assumption

Integral to this design is the embrace of the open world assumption [13]. Though not a specific artifact, as are adaptive ontologies or ODapps, the open-world approach is the logical underpinning that allows consumers or knowledge workers to add new information to the system as it is discovered or scoped. This nuance may sound esoteric, but traditional KM systems have a very different underpinning that leads to some nasty implications.

Because most KM systems are based on relational database systems, they embody a closed-world design. This works well for transaction systems or environments where the information domain is known and bounded, but it does not suit knowledge and changing information. Moreover, the schemas that govern closed-world designs are brittle and hard to change and manage. This brittleness has put KM squarely in the bailiwick of IT and has often led to delays and frustrations. Re-architecting or adding new schema views to an existing closed-world system can be fiendishly difficult.

This difficulty is a major reason why IT resists casual or constant changes to underlying data schemas. Unfortunately, the result is schemas that are rarely extended and therefore generally unresponsive to changing and growing knowledge. As an environment for knowledge management, the relational data system and the closed-world approach are lousy foundations.
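The practical difference shows up in how a system answers a question about a fact it has never been told. The sketch below contrasts the two semantics over a toy set of subject-predicate-object facts. It is a deliberate simplification (real OWL reasoners infer far more than set membership), and the function names and the explicit `denied` set are assumptions made purely for illustration.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


Fact = tuple[str, str, str]   # (subject, predicate, object)


def ask_closed_world(facts: set[Fact], query: Fact) -> Truth:
    # Closed world (typical of relational systems): unrecorded means false.
    return Truth.TRUE if query in facts else Truth.FALSE


def ask_open_world(facts: set[Fact], denied: set[Fact], query: Fact) -> Truth:
    # Open world: unrecorded means only "not known yet".
    if query in facts:
        return Truth.TRUE
    if query in denied:          # falsity must be stated explicitly
        return Truth.FALSE
    return Truth.UNKNOWN


if __name__ == "__main__":
    facts = {("acme", "suppliesTo", "globex")}
    query = ("acme", "suppliesTo", "initech")
    print(ask_closed_world(facts, query))       # Truth.FALSE
    print(ask_open_world(facts, set(), query))  # Truth.UNKNOWN
```

If it later turns out that acme does supply initech, the open-world system simply gains a fact and no earlier answer is contradicted, whereas the closed-world system had already reported that relationship as false.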

Other Building Blocks

As the self-service information management diagram above shows, RDF and Web services are two further important foundations. RDF (Resource Description Framework) is the canonical data model in which all input information is represented. This means that the ODapp tools and the adaptive ontologies can work off a single model of knowledge representation. The Web service and architecture component is also helpful because it allows Web 2.0 technologies to be brought to bear and supports distributed sources and users for the KM system. This provides scalability and distributed applicability, including on smartphones.
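To make the idea of a single canonical model concrete, here is a minimal sketch that uses plain Python tuples as subject-predicate-object triples instead of a real RDF library. A structured record and a piece of tagged text both end up in the same graph, and one lookup function serves both. The helper names and predicates (`from_row`, `from_tags`, `mentions`, `city`) are hypothetical, and real RDF would use full URIs for subjects and predicates.

```python
from typing import Iterable, Iterator, Optional

# A triple is (subject, predicate, object); plain strings stand in for URIs.
Triple = tuple[str, str, str]


def from_row(row: dict[str, str], subject_key: str = "id") -> Iterator[Triple]:
    """Structured input (e.g., a spreadsheet row): one triple per non-key column."""
    subject = row[subject_key]
    for column, value in row.items():
        if column != subject_key:
            yield (subject, column, value)


def from_tags(doc_id: str, entities: Iterable[str]) -> Iterator[Triple]:
    """Unstructured text, once tagged: one 'mentions' triple per recognized entity."""
    for entity in entities:
        yield (doc_id, "mentions", entity)


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Basic triple-pattern lookup; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


if __name__ == "__main__":
    graph = set(from_row({"id": "acme", "type": "Company", "city": "Toronto"}))
    graph |= set(from_tags("memo-42", ["acme", "globex"]))
    # The same lookup works over the structured record and the tagged text alike.
    print(match(graph, s="acme"))  # [('acme', 'city', 'Toronto'), ('acme', 'type', 'Company')]
    print(match(graph, o="acme"))  # [('memo-42', 'mentions', 'acme')]
```

Because every source is reduced to the same shape, neither the query function nor any tool built on it needs to know where a given statement came from.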

The other two pillars of the open semantic enterprise — the layered approach and linked data — are also helpful, but not necessarily integral to the KM and self-service perspectives presented herein.

Benefits from Self-service Information Management

The benefits and flexibilities of self-service information management extend from top to bottom: from creating data and content to publishing and deploying it. Here is a listing of potential self-service capabilities, compared with the current conventional approach that depends on IT:

Information Activity: Conventional Approach (IT) vs. Self-service Information Management

  Conventional approach (IT):
  • structured data only
  • not generally available directly to the knowledge worker
  Self-service information management:
  • can create own datasets
  • can extract and transform own datasets
  • can tag and integrate non-structured (text + document) information
  • able to handle unstructured, semi-structured and structured data alike

  Conventional approach (IT):
  • not generally provided
  Self-service information management:
  • completely open, flexible
  • can define own annotation fields, annotation schema (approaches)

  Conventional approach (IT):
  • pre-canned functions
  • structure pre-defined
  • slow performance
  Self-service information management:
  • all structural dimensions can be filtered
  • all values and ranges thereof can be filtered
  • multiple analysis display widgets selectable depending on the type of input data
  • real-time configuration
  • fast (nearly instantaneous) performance
  • provision of (nearly) real-time analytics
  • additional capabilities in inferencing and reasoning
  • modeling and understanding of complex graph and relationship structures (e.g., social networks)

  Conventional approach (IT):
  • pre-canned templates or report writers
  • structure pre-defined
  Self-service information management:
  • user-definable templates
  • templates automatically assignable by type of thing being reported
  • embeddable in Web pages, alternate presentation media
  • styling and theming flexibility

  Conventional approach (IT):
  • very little done through IT
  Self-service information management:
  • variety of visualization widgets available (e.g., maps, charts, graphs, networks)
  • large-scale systems views possible
  • visual interactions (a la Web 2.0) possible

  Conventional approach (IT):
  • very little done through IT
  • collaboration, if done, is via separate social media
  Self-service information management:
  • completely open
  • variable access and permission rights by user or group
  • built into the entire infrastructure

  Conventional approach (IT):
  • not directly done by knowledge worker
  • user input, if done, via problem tickets with delays
  Self-service information management:
  • can be integrated into the business process or workflow
  • “soft” validations and ratings/rankings can also be included
  • consistency checking
  • satisfiability checking

  Conventional approach (IT):
  • limited to pre-canned reports
  Self-service information management:
  • any report or analysis is available for publishing
  • documents and images and widget displays are available for publishing
  • multiple export formats means information, slices thereof, or analysis results thereof can be embedded and integrated into multiple presentation media

  Conventional approach (IT):
  • none directly by the knowledge worker
  Self-service information management:
  • any report or analysis is available for re-purposing
  • documents and images and widget displays are available for re-purposing
  • canonical internal representations (RDF and XHTML) means available information can be deployed for a variety of purposes (Web pages, reports, documents, slide shows, etc.)

New Functionality
  Conventional approach (IT):
  • none known, if not already listed
  Self-service information management:
  • semantic querying
  • data visualization
  • text mining and tagging
  • categorization
  • graph mining
  • logic checking

Developing Apps
  Conventional approach (IT):
  • none via the official systems by the knowledge worker
  • if done, via guerrilla apps
  Self-service information management:
  • only generic apps needed
  • many fewer and more flexible apps push issue into the background

  Conventional approach (IT):
  • not available to most systems
  • if available, limited number of pre-canned options
  Self-service information management:
  • any report or analysis is available for dashboarding
  • any widget is available for dashboarding
  • complete structure (typing, values, sources) available for filtering, “slicing and dicing”
  • all dashboard objects on a given canvas are linked, interoperate (selections in one widget reflected in other widgets)
  • dashboards may be made persistent for re-use, springboarding new dashboards (as templates)
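The linked-dashboard capability listed above (selections in one widget reflected in other widgets) is, at heart, a shared-selection observer pattern. The sketch below shows one way it could work; `Canvas` and `Widget` are hypothetical classes written for illustration and are not the actual OSF semantic components.

```python
from typing import Callable


class Canvas:
    """Hypothetical dashboard canvas holding one selection shared by all widgets."""

    def __init__(self) -> None:
        self.selection: set[str] = set()
        self._listeners: list[Callable[[set[str]], None]] = []

    def register(self, on_change: Callable[[set[str]], None]) -> None:
        self._listeners.append(on_change)

    def select(self, record_ids: set[str]) -> None:
        # A selection made in any one widget is broadcast to every widget.
        self.selection = set(record_ids)
        for notify in self._listeners:
            notify(self.selection)


class Widget:
    """Hypothetical display widget (map, chart, table...) bound to a canvas."""

    def __init__(self, name: str, canvas: Canvas, record_ids: set[str]) -> None:
        self.name = name
        self.canvas = canvas
        self.record_ids = set(record_ids)
        self.visible = set(record_ids)
        canvas.register(self.refresh)

    def click(self, record_id: str) -> None:
        # User interaction in this widget updates the shared selection.
        self.canvas.select({record_id})

    def refresh(self, selection: set[str]) -> None:
        # Empty selection shows everything; otherwise show only selected records.
        self.visible = set(self.record_ids) if not selection else selection & self.record_ids


if __name__ == "__main__":
    canvas = Canvas()
    ids = {"acme", "globex", "initech"}
    map_view = Widget("map", canvas, ids)
    bar_chart = Widget("chart", canvas, ids)
    map_view.click("acme")
    print(sorted(bar_chart.visible))  # ['acme']
```

Because widgets talk only to the canvas and never to each other, any widget can be added to or removed from a dashboard without changing the others, which is in keeping with the generic, ontology-driven design described earlier.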

The fact that information from any source (internal or external) and in any format (unstructured, semi-structured or structured) can be brought together with semantic technologies is a qualitative boost over existing KM approaches. Further, all information is exposed in simple text formats, which means it can be readily manipulated and managed with easy-to-understand tools and applications. Reliance by semantic technologies on open standards and languages also leads to greater use and availability of open source systems.
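As one concrete instance of such a simple text format, RDF's standard N-Triples serialization writes every statement as a single line of text. The sketch below is a minimal writer under stated assumptions: subjects and predicates are already full IRIs, any object beginning with http:// or https:// is treated as an IRI (a simplification), everything else becomes a plain string literal, and the example.org namespace is only a placeholder.

```python
# Placeholder namespace for the example; real data would use its own URIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nt_term(term: str) -> str:
    """Render an object: IRIs in angle brackets, everything else as a plain literal."""
    # Simplification: treat any http(s) string as an IRI.
    if term.startswith(("http://", "https://")):
        return f"<{term}>"
    # N-Triples requires escaping backslash, double quote, LF and CR in literals.
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """One line per triple: <subject> <predicate> object ."""
    return "".join(f"<{s}> <{p}> {nt_term(o)} .\n" for s, p, o in triples)


if __name__ == "__main__":
    text = to_ntriples([
        (EX + "acme", RDF_TYPE, EX + "Company"),
        (EX + "acme", EX + "city", "Toronto"),
    ])
    print(text, end="")
    # Because the result is just lines of text, ordinary tools apply to it:
    print([line for line in text.splitlines() if "city" in line])
```

Line-oriented output like this can be diffed, grepped, concatenated or loaded by any RDF-aware tool, which is much of what makes simple text formats easy to manage.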

In short, self-service information management approaches should be cheaper, faster, more responsive and more capable than current approaches.

Great Progress, with Ontology Management the Next Challenge

Given these perspectives, hearing someone tout data-driven applications or advocate ontologies merely for metadata matching sounds positively Neanderthal. Semantic technologies, ontology-driven apps and self-service information management systems promise much more. The prospect at hand is to remake the entire knowledge management function, in the process putting all aspects, from creating to distributing knowledge products, directly into the hands of the user. This is truly the democratization of information!

The absolutely fantastic news is none of this is theoretical or in the future. All pieces are presently proven, working and in hand. This is a practical vision, ready today.

Granted, like any new innovation, especially one that is infrastructural and systems-oriented, there are some weak or less-developed parts. These current gaps and needs include:

  • Though tools exist to create, edit, manage, update, delete, map and validate ontologies, their state could be greatly improved [14]. Because ontologies are the central drivers for ODapps, these tasks need to be simplified and geared more to the knowledge worker than to professional ontologists (see diagram to right for some of the needed functions). Some of these developments are underway, with more desired.
  • A relatively complete starting set of about 20 ODapps widgets is presently available. However, more are needed, including for different deployment environments. BI analysis remains one weak area, as does an Ajax-based library.
  • The number of infobox templates is small, and better (WYSIWYG or graphical) creation and management utilities would be most useful.
  • User permission and authorization protocols exist, but are IP-based at present and could be beneficially expanded for different environments and use cases.

Yet, in the grand scheme of things, these gaps are relatively insignificant. The path and general architecture and design for moving forward are now clear.

Self-service information management via appropriately designed semantic technologies is now a reality. It promises to fulfill a vision of information access and control that has been frustrated for decades. We think these are exciting developments for the enterprise — and for the individual knowledge hound. We welcome your inquiries and invite you to join our open OSF group to contribute your ideas.

[1] Including going all the way back to my description of purpose for this blog back in 2005; see the AI3 Blogasbörd where I state, “One of my central arguments [in this blog] is that an inexorable trend through history has been the ‘democratization’ of information.”
[2] Tim Bray, 2010. “Doing it Wrong,” on his blog, January 5, 2010. The extensive comments are also worth a read.
[3] According to Marketwire quoting IDC, “Preliminary market sizing suggests that the business intelligence tools software market grew 2.6% in 2009 to reach $8.1 billion. Given the current market assumptions regarding the global economy and demand drivers in the BI tools software market, IDC forecasts this market to grow at a compound annual growth rate of 6.9% through 2014 to $11.3 billion.” CBR, citing Gartner, indicates the worldwide BI software market will grow 9.7 percent, reaching US$10.8 billion in 2011. Gartner also said BI platforms would continue to be one of the fastest growing software markets. For a very good background on BI, see Rochelle Shaw, 2011. “What is Business Intelligence,” posted in Database Trends and Applications, January 7, 2011.
[4] According to an article by Antone Gonsalves, “Poor Use Of Data Integration Tools Can Waste $500,000 Annually: Gartner” (April 27, 2009), which reports on a recent Gartner report, large Global 2000 companies using several data integration tools with overlapping features can reduce costs by more than $500,000 annually by eliminating redundant software and leveraging a shared services model. In a further report by Roman Stanek, “Business Intelligence Projects are Famous for Low Success Rates, High Costs and Time Overruns” (April 25, 2009), Gartner discusses a dirty little secret in the world of data integration: the data integration technology in place is based on generations of data integration technology being layered in the enterprise over the years. Thus, technology that was purchased to solve data integration problems, and reduce costs, is actually making the data integration problem more complex and no longer cost efficient.
[5] For example, see Roger Sessions, 2009. “Cost of IT Failure,” September 28, 2009. This analysis suggests failure rates of 65% with a total estimated worldwide cost of $6.2 trillion in 2009. Commenters have raised questions as to what constitutes failure and have questioned some of the analysis assumptions. Nonetheless, even allowing for over-estimates, the scale of the numbers is alarming. See also Jorge Dominguez, 2009. “The CHAOS Report 2009 on IT Project Failure,” June 16, 2009, which indicates combined failure and challenge rates for IT projects have ranged from 65% to 84% over the period 1994 to 2009; also see Dan Galorath, 2008. “Software Project Failure Costs Billions; Better Estimation & Planning Can Help,” June 7, 2008. In this report, Galorath compares and combines many of the available IT failure studies and summarizes that 3 of 5 IT projects do not do what they were supposed to for the expected costs, with 49% showing budget overruns, 47% showing higher than expected maintenance costs, and 41% failing to deliver expected business value; the anecdotal failure rate for IT projects has for years been claimed as 80%, with business intelligence and data warehousing particularly failure-prone areas. In 2001, a study by Mark N. Frolick and Keith Lindsey, “Critical Factors for Data Warehouse Failures,” for the Data Warehousing Institute noted that conventional wisdom puts the failure rate of data warehousing projects at 70 to 80 percent, with a then-recent study in the insurance industry finding a 90-percent failure rate. This report is useful for combining many historical studies.
[7] Wayne W. Eckerson, 2007. “The Myth of Self-Service Business Intelligence,” in TDWI Online, October 18, 2007; see “Business Intelligence projects are famous for low success rates, high costs and time overruns. The economics of BI are visibly broken, and have been for years. Yet BI remains the #1 technology priority according to Gartner.”
[8] See James G. Kobielus, 2009. “Mighty Mashups: Do-It-Yourself Business Intelligence For The New Economy,” July 23, 2009. In this report, Kobielus, the lead author of a Forrester study (August 2008, Global BI And Data Management Online Survey) that surveyed 82 IT decision-makers, noted that just over 70% responded that IT develops their reports and dashboards. About 57% responded that power users did such development. Only 18.3% reported that BI development is done by end users with limited BI skills.
[9] M.K. Bergman, 2010. “Seven Pillars of the Open Semantic Enterprise,” in AI3:::Adaptive Information blog, January 12, 2010.
[10] There is a series of ongoing ontology best practices articles.
[11] The scones (Subject Concept Or Named EntitieS) tagger provides information extraction of domain-specific subject concepts and entities from unstructured text. It also provides disambiguation of this information based on the context of the source information.
[12] M.K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” in AI3:::Adaptive Information blog, March 7, 2011.
[13] M.K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” in AI3:::Adaptive Information blog, December 21, 2009. The open world assumption (OWA) generally asserts that the absence of a given assertion or fact does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[14] M.K. Bergman, 2010. “A New Landscape in Ontology Development Tools,” in AI3:::Adaptive Information blog, Sept. 7, 2010.