In support of this vision, we have been key developers of an entire stack of semantic technologies useful to the enterprise, the open semantic framework (OSF); we have formulated and contributed significant open source deployment guidance to the MIKE2.0 methodology for semantic technologies in the enterprise called Open SEAS; we have developed useful structured data standards and ontologies; and we have made massive numbers of free how-to documents and images available for download on our TechWiki. Today, we add further to these contributions with our workflows guidance. All of these pieces contribute to what we call the total open solution.
Prior documentation has described the overall architecture or layered approach of the open semantic framework (OSF). Those documents are useful, but lack a practical understanding of how the pieces fit together or how an OSF instance is developed and maintained.
This new summary overviews a series of seven different workflows for various aspects of developing and maintaining an OSF (based on Drupal) . In addition, each workflow section also cross-references other key documentation on the TechWiki, as well as points to possible tools that might be used for conducting each specific workflow.
Seven different workflows are described, as shown in the diagram below. Each of the workflows is color-coded and related to the other workflows. The basic interaction with an OSF instance tends to occur from left-to-right in the diagram, though the individual parts are not absolutely sequential. As each of the seven specific workflows is described below, it is keyed by the same color-coded portion of the overall workflow.
Each of the component workflows is itself described as a series of inter-relating activities or tasks.
Installation is mostly a one-time effort and proceeds in a more-or-less sequential basis. As various components of the stack are installed, they are then configured and tested for proper installation.
The installation guide is the governing document for this process, with quite detailed scripts and configuration tests to follow. The blue bubbles in the diagram represent the major open source software components of Virtuoso (RDF triple store), Solr (full-text search) and Drupal (content management system).
Another portion of this workflow is to set up the tools for the backoffice access and management, such as PuTTY and WinSCP (among others).
One of the most significant efforts in the overall OSF process is the configuration and theming of the host portal, generally based on Drupal.
The three major clusters of effort in this workflow are the design of the portal, including a determination of its intended functionality; the setting of the content structure (stubbing of the site map) for the portal; and determining user groups and access rights. Each of these, in turn, is dependent on one or more plug-in modules to the Drupal system.
Some of these modules are part of the conStruct series of OSF modules, and others are evaluated and drawn from the more than 8000 third-party plug-in modules to Drupal.
The Design aspect involves picking and then modifying a theme for the portal. These may start as one of the open source existing Drupal themes, as well as those more specifically recommended for OSF. If so, it will likely be necessary to do some minor layout modifications on the PHP code and some CSS (styling) changes. Theming (skinning) of the various semantic component widgets (see below) also occurs as part of this workflow.
The Content Structure aspect involves defining and then stubbing out placeholders for eventual content. Think of this step as creating a site map structure for the OSF site, including major Drupal definitions for blocks, Views and menus. Some of the entity types are derived from the named entity dictionaries used by a given project.
More complicated User assignments and groups are best handled through a module such as Drupal’s Organic Groups. In any event, determination of user groups (such as anonymous, admins, curators, editors, etc.) is a necessary early determination, though these may be changed or modified over time.
For site functionality, Modules must be evaluated and chosen to add to the core system. Some of these steps and their configuration settings are provided in the guidelines for setting up Drupal document.
None of the initial decisions “lock in” eventual design and functionality. These may be modified at any time moving forward.
Of course, a key aspect of any OSF instance is the access and management of structured data.
There are basically two paths for getting structured data into the system. The first, involving (generally) smaller datasets is the manual conversion of the source data to one of the pre-configured OSF import formats of RDF, JSON, XML or CSV. These are based on the irON notation; a good case study for using spreadsheets is also available.
The second path (bottom branch) is the conversion of internal structured data, often from a relational data store. Various converters and templates are available for these transformations. One excellent tool is FME from Safe Software (representing the example shown utilizing a spatial data infrastructure (SDI) data store), though a very large number of options exist for extract, transform and load.
In the latter case, procedures for polling for updates, triggering notice of updates, and only extracting the deltas for the specific information changed can help reduce network traffic and upload/conversion/indexing times.
The structured data from the prior workflow process is then matched with the remaining necessary content for the site. This content may be of any form and media (since all are supported by various Drupal modules), but, in general, the major emphasis is on text content.
Existing text content may be imported to the portal or new content can be added via various WYSIWYG graphical editors for use within Drupal. (The excellent WYWIWYG Drupal module provides an access point to a variety of off-the-shelf, free WYSIWYG editors; we generally use TinyMCE but multiples can also be installed simultaneously).
The intent of this workflow component is to complete content entry for the stubs earlier created during the configuration phase. It is also the component used for ongoing content additions to the site.
Content that is tagged by the scones tagger is done so based on the concepts in the domain ontology (see below) and the named entities (as contained in “dictionaries”) used by a given project. Once tagged, this information can also now be related to the other structured data in the system.
Once all of this various content is entered into the system, it is then available for access and manipulation by the various conStruct modules (see figure above) and semantic component widgets (see below).
Though the next flowchart below appears rather complicated, there are really only three tasks that most OSF administrators need worry about with respect to ontologies:
In actuality, of course, editing, modifying or deleting existing information is also important, but they are easier subsets of activities and user interfaces to the basic add (“create”) functions.
The OSF interface provides three clean user interfaces to these three basic activities .
These basic activities may be applied to the three major governing ontologies in any OSF installation:
All of the OSF ontology tools work off of the OWLAPI as the intermediary access point. The ontologies themselves are indexed as structured data (RDF with Virtuoso) or full text (Solr) for various search, retrieval and reasoning activities.
Because of the central use of the OWLAPI, it is also possible to use the Protégé editor/IDE environment against the ontologies, which also provides reasoners and consistency checking.
The filter and select activities are driven by user interaction, with no additional admin tools required. This workflow is actually the culmination of all of the previous sequences in that it exposes the structured data to users, enables them to slice-and-dice it, and then to view it with a choice of relevant widgets (semantic components).
For example, see this animation:
Considerable more detail and explanation is available for these semantic components.
The ongoing maintenance of an OSF instance is mostly a standard Drupal activity. Major activities that may occur include moderating comments; rotating or adding new content; managing users; and continued documentation of the site for internal tech transfer and training. If the portal embraces other aspects of community engagement (social media), these need to be handled as part of this workflow as well.
All aspects of the site and its constituent data may be changed, or added to at any time.
When first introduced in our three-part series, we noted the interlocking pieces that constituted the total open solution of the open semantic framework (OSF) (see right). We also made the point — unfortunately still true today — that the relative maturity and completeness of all of these components still does not allow us to achieve fully, “We’re successful when we are not needed.”
As a small firm that is committed to self-funding via revenues, Structured Dynamics is only able to add to its stable of open source software and to develop methodologies and provide documentation based on our client support. Yet, despite our smallness, our superb client support has enabled us to aggressively and rapidly add to all four components of this total open solution. This newest series of ongoing workflow documents (plus some very significant expansions and refinements of the OSF code base) is merely the latest example of this dynamic.
Through judicious picking of clients (and vice versa), and our insistence that new work and documentation be open sourced because it itself has benefitted from prior open source, we and our client partners have been making steady progress to this vision of enterprises being able to adopt and install semantic solutions on their own. Inch-by-inch we are getting there.
The status of our vision today is that we are still needed in most cases to help formulate the implementation plan and then guide the initial set-up and configuration of the OSF. This support typically includes ontology development, data conversion and overall component integration. While it is true that some parties have embraced the OSF code and documentation and are implementing solutions on their own, this still requires considerable commitment and knowledge and skills in semantic technologies.
The great news about today’s status is that — after initial set-up and configuration — we are now able to transfer the technology to the client and walk away. Tools, documentation, procedures and workflows are now adequate for the client to extend and maintain their OSF instance on their own. This great news includes a certification process and program for transferring the technology to client staff and assessing their proficiency in using it.
We have been completely open about our plans and our status. In our commitment to our vision of success, much work is still needed on the initial install and configure steps and on the entire area of ontology creation, extension and mapping . We are working hard to bridge these gaps. We welcome additional partners that share with us the vision of complete, turnkey frameworks — including all aspects of total open solutions. Inch-by-inch we are approaching the realization of a vision that will fundamentally change how every enterprise can leverage its existing information assets to deliver competitive advantage and greater value for all stakeholders. You are welcome aboard!
This blog and all of Structured Dynamics‘ Web sites got caught in the Amazon cloud snafu of the past two days. After being down for more than 24 hrs, we have been able to restore all services.
Sorry for the inconvenience, and thanks for your patience!
Though I have alluded to it numerous times in my past writings , I think one of the most pervasive and important benefits from semantic technologies in the enterprise will come from the democratization of information. These benefits will arise mostly from a fundamental change in how we manage and consume information. A new “system” of semantic technologies is now largely available that can put the collection, assembly, organization, analysis and presentation of information directly in the hands of those who need it most — the consumers of information.
The idea of “democratizing information” has been around for a couple of decades, and has accelerated in incidence since the dominance of the Internet. Most commonly, the idea is associated with developments and notions in such areas as citizen journalism, crowdsourcing, the wisdom of the crowd, social bookmarking (or collaborative tagging), and the democratic (small “d”) access to publishing via new channels such as blogs, microblogs (e.g., Twitter) and wikis. To be sure, these kinds of democratic information will (and are) benefiting from the use and application of semantics.
But the trend I’m focusing on here is much different and quite new. It is the idea that enterprise knowledge workers can now take ownership and control of their knowledge management functions. In the process, prior bottlenecks due to IT can be relieved and massive new benefits can open up to the enterprise.
It is no secret that IT has not served the enterprise knowledge management function well for decades. Transaction systems and database systems geared to fast indexing and access to datum have not proved well suited to information or knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. Information management is a bit broader category, and adds such functions as document management, data management, enterprise content management, enterprise or controlled vocabularies, systems analysis, information standards and information assets management to the basic functions of KM. Since the purpose of this piece is not to get into the epistemological differences between information and knowledge, I use these terms more-or-less interchangeably herein.
Knowledge and information management is very big business. Given the breadth and differences in defining the KM and IM markets, let’s take as a proxy the business intelligence (BI) market, one of KM’s most important elements. Various estimates from IDC, Gartner and others place the current value of BI software sales somewhere in the range of $9 billion to $11 billion annually . Further, BI ranked number five on the list of the top 10 technology priorities for chief information officers (CIOs) in 2011. And this pertains to the structured component of information alone.
Yet, at the same time, BI-related projects continue to have high failure rates, often cited as in the 65% to higher range . These failure rates are consistent with KM projects in general . These failures are merely one expression of a constant litany of issues and concerns regarding the enterprise KM function:
|Conventional KM Problem Area||Comments|
|Reliance on Intermediaries||
|Specialized Expertise Required||
|Slow Response Time||
|Dependence on External Apps||
|High Opportunity Costs||
|High Failure Rates||
The seeming contradiction between continued growth and expenditures for information management coupled with continued high failure rates and disappointments is really an expression of the centrality of information to the modern enterprise. The funding and growth of the IT function is itself an expression of this centrality and perceived importance. These have been abiding trends in our transition to information or knowledge economies.
Bray  places the fault for wasted initiatives within the culture of IT. I believe there is some truth to this — variably, of course, depending on the specific enterprise. But the real culprit, I believe, has been the past need to “intermediate” a layer of software and IT expertise between knowledge workers and their source information. A progression of tasks has been necessary — conducted over decades with advances and learning — to get paper information into electronic form, get those forms to be understood and operate in some common ways, and then to develop tools, architectures and frameworks to make sense of it. Yet, as more tasks with required specialized skills have been added to this layer, the actual gulf between worker and information has increased. For example, enterprises still require the overhead and layers of IT to write SQL to get information out and then to prepare and fix reports.
On average, IT now consumes about 4% of all enterprise expenditures and employs about 6% of enterprise workers . IT has become a very thick intermediary layer, indeed! Yet, because of the advances and learning that has occurred in growing and nurturing this layer, we also now have the basis to begin to “disintermediate” the IT layer. Many, if not all, of the challenges noted in the table above can be improved by doing so.
One current buzzword in business intelligence is “self service”. By this term is meant giving knowledge workers the tools and systems for creating reports or doing analysis on their own without needing to work through (or be frustrated by) the IT layer. Self-service software was first postulated in the 1990s as a way for information consumers and authors (typically subject-matter experts) to automate some of their knowledge management tasks. Today, it is most commonly applied to self-service reporting or self-service analytics within the BI realm.
As a general proposition, self-service BI has been more myth than reality . Forrester surveys, for example, indicate that IT still develops most BI applications. Of survey respondents in 2009, 70% responded that IT develops the enterprise’s reports and dashboards . However, that figure is not 100%, as it was just a decade earlier, and there is also notable success to some open source providers such as BIRT that address a wide range of reporting needs within a typical application, ranging from operational or enterprise reporting to multi-dimensional online analytical processing (OLAP).
James Kobelius  is particularly bullish on the application of Web 2.0 “mashup” applications to knowledge worker purposes. Under this approach, Web-based applications are used and accessed directly by knowledge workers for charting and mapping purposes using Ajax or Flash widgets, such as Google Maps. The conventional BI and KM vendors have begun to more more aggressively into this area. Some notable new entrants — such as Tableau, Factual or Good Data — are also showing the way to more direct access, more flexible reporting and analysis widgets, and cleaner service or platform designs.
These initiatives reside at the display or reporting level. There is another group, including James Kobelius, Neil Raden or Seth Earley, that have addressed how to get disparate information to talk together using ontologies. They refer to “semanticizing” such traditional practices such as master data management (MDM), “ontologizing” taxonomies, or adding Web 2.0 mashups to business intelligence. While these thoughts are moving in the right direction, and will bring incremental benefits, they still are far short of the potentials at hand.
So far, in the KM realm, the application of semantics has tended to be limited to information extraction (tagging) of text documents and first attempts at using ontologies. The tagging component is essential to enable the 80% of information presently in textual documents to become first-class citizens within business intelligence or knowledge management. The ontology efforts to date appear to be more like thin veneers over traditional taxonomies. Rather than hierarchical structures, we now see graph-oriented ones, but still intended to fulfill the same tasks of enterprise metadata and vocabulary lookups.
The ontology efforts especially are just nibbling around the edges of what can be done with semantic technologies. Rather than looking upon ontologies as just another dictionary (though that role is true), if we re-orient our thinking to make ontologies central to the KM function, a wealth of new opportunities and benefits arises.
A bit more than a year ago, we formulated the Seven Pillars of the Open Semantic Enterprise, which included ontologies and related as some of the central components. In that article , we noted the particular applicability of semantic technologies to the information and knowledge management functions within enterprises. We asserted the benefits for embracing the open semantic enterprise as providing the organization greater insights with lower risk, lower cost, faster deployment, and more agile responsiveness. Since that time we have been deploying such systems and documenting those benefits.
Integral to the seven pillars are those aspects that lead to the democratization of information for the knowledge worker, what combined might be called “self-service information management”. As the figure to the right shows, three of the seven pillars are essential building blocks to this capability, two pillars are further foundations to it, with the remaining two pillars only tangentially important.
What the combination of these pieces means is a fundamental change in how knowledge work is done. Through this approach, we can largely disintermediate IT from the knowledge function, can bring knowledge management directly into the hands of those who need it in real time, and fundamentally alter how knowledge management apps are designed and deployed. The best thing is these benefits are an incremental evolution, and retain the use and value of existing information assets.
Rather than peripheral lookup structures or thin veneers, ontologies play the central role in the design of self-service information management. We use the plural on purpose here: what is deployed is actually a library of complementary and modular ontologies that play a variety of roles. Combined, we call these libraries with their representative functions adaptive ontologies.
This library contains the expected and conventional domain ontologies. These represent the actual knowledge space for the domain at hand, and may be comprised of multiple different ontologies representing different domain or knowledge spaces. These standard semantic Web ontologies may range from the small and simple to the large and complex, and may perform the roles of defining relationships among concepts, integrating instance data, orienting to other knowledge and domains, or mapping to other schema.
From a best practices standpoint , we take special care in constructing these domain ontologies such that we provide labels and cues for user interfaces. Some of the user interface considerations that can be driven by adaptive ontologies include: attribute labels and tooltips; navigation and browsing structures and trees; menu structures; auto-completion of entered data; contextual dropdown list choices; spell checkers; online help systems; etc. We also include a variety of synonyms and aliases (the combination of which we call semsets) for referring to concepts and instances in multiple ways and for aiding information extraction and tagging functions. (In addition to organizing and helping to interoperate contributing information, these domain ontologies are also used for what is called ontology-based information extraction (OBIE) via our scones  system.)
In addition the library of adaptive ontologies includes some administrative ontologies that guide how instance data can be imported and inter-related (via the Instance Record Object Notation, or irON); what information types drive what widgets (via the Semantic Component Ontology, or SCO); data mapping vocabularies (UMBEL Vocabulary); how to characterize datasets; and other potential specialty functionality.
A forthcoming article will describe the composition and modularity typically found in a library of these adaptive ontologies.
In combination, these adaptive ontologies are, in effect, the “brains” of the self-service system. The best aspect of these ontologies is that they can be understood, created and maintained by knowledge workers. They constitute the only specification (other than theming, if desired) necessary to create self-service knowledge management environments.
The piece of the puzzle that implements the instruction sets within these adaptive ontologies are the ontology-driven apps, or ODapps. A recent article describes these structures in some detail .
ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in the adaptive ontologies. ODapps fulfill specific generic tasks, consistent with their dedicated design to respond to adaptive ontologies. For example, current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.
ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.
Generic functionality included in these ODapps are things like filtering, setting value ranges, choosing the specific display view, and invoking or not various display templates (akin to the infoboxes on Wikipedia). By nature of the data and the ontologies submitted to them, the ODapp signals to the user or consumer what displays, views, filters or slices-and-dices might be available to them. Fed different data and different ontologies, the ODapp would signal the user differently.
Because of their generic design, driven by the ontologies, only a relatively small number of ODapps needs to be created. Once created with appropriate generic functionality, application development is essentially over. It is through the additions and changes to the adaptive ontologies — done by knowledge workers themselves — that new capability and structure gets exposed through these ontology-driven apps. This innovation shifts the locus from software and programming to data and knowledge structures.
This democratization of IT means that everything in the knowledge management realm can become self service. Users and consumers can create their own analyses; develop their own reports; and package and disseminate what they and their colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we turn prior software engineering practice on its head.
Integral to this design is the embrace of the open world assumption . Though not a specific artifact, as are adaptive ontologies or ODapps, the open-world approach is the logical underpinning that allows consumers or knowledge workers to add new information to the system as it is discovered or scoped. This nuance may sound esoteric, but traditional KM systems have a very different underpinning that leads to some nasty implications.
Because the predominant share of KM systems are based on relational database systems, they embody a closed-world design. This works well for transaction systems or environments where the information domain is known and bounded, but does not apply to knowledge and changing information. Moreover, the schema that govern closed-world designs are brittle and hard to change and manage. It is this fact that has put KM squarely in the bailiwick of IT and has often led to delays and frustrations. Re-architecting or adding new schema views to an existing closed-world system can be fiendishly difficult.
This difficulty is a major reason why IT resists casual or constant changes to underlying data schema. Unfortunately, this makes these brittle schema difficult to extend and therefore generally unresponsive to changing and growing knowledge. As an environment for knowledge management, the relational data system and the closed-world approach are lousy foundations.
As the self-service information management diagram above shows, RDF and Web services are two further important foundations. RDF (Resource Description Framework) is the canonical data model upon which all input information is represented. This means that the ODapp tools and the adaptive ontologies can work off a single model of knowledge representation. The Web service and architecture component is also helpful in that it allows Web 2.0 technologies to be brought to bear and allows distributed sources and users for the KM system. This provides scalability and distributed applicability, including on smartphones.
The other two pillars of the open semantic enterprise — the layered approach and linked data — are also helpful, but not necessarily integral to the KM and self-service perspectives presented herein.
The benefits and flexibilities from self-service information management extend from top to bottom; from creating data and content to publishing and deploying it. Here is a listing of available potentials for self-service, drawing comparison to the current conventional approach dependent on IT:
|Information Activity||Conventional Approach (IT)||Self-service Information Management|
The fact that any source — internal or external — or format — unstructured, semi-structured and structured — can be brought together with semantic technologies is a qualitative boost over existing KM approaches. Further, all information is exposed in simple text formats, which means it can be readily manipulated and managed with easy to understand tools and applications. Reliance on open standards and languages by semantic technologies also leads to greater use and availability of open source systems.
In short, self-service information management approaches should be cheaper, faster, more responsive and more capable than current approaches.
Given these perspectives, hearing someone tout data-driven applications or advocate ontologies merely for metadata matching sounds positively Neanderthal. The prospects we have with semantic technologies, ontology-driven apps, and self-service information management systems mean so much more. The prospect at hand is to remake the entire knowledge management function, in the process bringing all aspects from creating and distributing knowledge products into the direct hands of the user. This is truly the democratization of information!
The absolutely fantastic news is none of this is theoretical or in the future. All pieces are presently proven, working and in hand. This is a practical vision, ready today.
Granted, like any new innovation, especially one that is infrastructural and systems-oriented, there are some weak or less-developed parts. These current gaps and needs include:
Yet, in the grand scheme of things, these gaps are relatively insignificant. The path and general architecture and design for moving forward are now clear.
Self-service information management via appropriately designed semantic technologies is now a reality. It promises to fulfill a vision of information access and control that has been frustrated for decades. We think these are exciting developments for the enterprise — and for the individual knowledge hound. We welcome your inquiries and invite you to join our open OSF group to contribute your ideas.