Posted:April 4, 2011

Democratizing Information with Semantics

Self-service Information Management for Knowledge Workers

Though I have alluded to it numerous times in my past writings [1], I think one of the most pervasive and important benefits from semantic technologies in the enterprise will come from the democratization of information. These benefits will arise mostly from a fundamental change in how we manage and consume information. A new “system” of semantic technologies is now largely available that can put the collection, assembly, organization, analysis and presentation of information directly in the hands of those who need it most — the consumers of information.

The idea of “democratizing information” has been around for a couple of decades, and has accelerated in incidence since the dominance of the Internet. Most commonly, the idea is associated with developments and notions in such areas as citizen journalism, crowdsourcing, the wisdom of the crowd, social bookmarking (or collaborative tagging), and the democratic (small “d”) access to publishing via new channels such as blogs, microblogs (e.g., Twitter) and wikis. To be sure, these kinds of democratic information will (and are) benefiting from the use and application of semantics.

But the trend I’m focusing on here is much different and quite new. It is the idea that enterprise knowledge workers can now take ownership and control of their knowledge management functions. In the process, prior bottlenecks due to IT can be relieved and massive new benefits can open up to the enterprise.

Decades-long Mismatches Between KM and IT

“Enterprise systems are doing it wrong. And not just a little bit, either. Orders of magnitude wrong. Billions and billions of dollars worth of wrong. Hang-our-heads-in-shame wrong. It’s time to stop the madness.”
– Tim Bray [2]

It is no secret that IT has not served the enterprise knowledge management function well for decades. Transaction systems and database systems geared to fast indexing and access to datum have not proved well suited to information or knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. Information management is a bit broader category, and adds such functions as document management, data management, enterprise content management, enterprise or controlled vocabularies, systems analysis, information standards and information assets management to the basic functions of KM. Since the purpose of this piece is not to get into the epistemological differences between information and knowledge, I use these terms more-or-less interchangeably herein.

Knowledge and information management is very big business. Given the breadth and differences in defining the KM and IM markets, let’s take as a proxy the business intelligence (BI) market, one of KM’s most important elements. Various estimates from IDC, Gartner and others place the current value of BI software sales somewhere in the range of $9 billion to $11 billion annually [3]. Further, BI ranked number five on the list of the top 10 technology priorities for chief information officers (CIOs) in 2011. And this pertains to the structured component of information alone.

Yet, at the same time, BI-related projects continue to have high failure rates, often cited as in the 65% to higher range [4]. These failure rates are consistent with KM projects in general [5]. These failures are merely one expression of a constant litany of issues and concerns regarding the enterprise KM function:

Conventional KM Problem Area	Comments
Inflexible Reports	reports are rarely “self-service” new requests need to be placed in queue 90% of stored report templates are never used unlimited “slicing and dicing” not available
Inflexible Analysis	analysis is rarely “self-service” new requests need to be placed in queue many requests not accepted due to schema rigidities, cascading changes needed analysis options are “pre-canned”, inflexible
Schema Bottlenecks	brittleness of relational data model and typical star schema crossing across schema or databases difficult load and re-indexing cycles can limit access, impose expensive back-end requirements can not (often) accommodate new data, structures
ETL Bottlenecks	getting data into the system needs to be placed in queue new external data requires extract, transform and load (ETL) routines to be written schedule and update cycles can be a mismatch to access needs
Reliance on Intermediaries	all problems above work through intermediaries disconnect between those with need and decision-makers and those who implement the solutions inherent issues in communicating requirements to implementers related time delays to implementation exacerbate the communication of requirements
Specialized Expertise Required	expertise and skill sets needed to implement solutions different from those of the knowledge consumer inherent issues in communicating requirements to implementers high costs for attracting necessary expertise expertise is inherently an overhead function
Slow Response Time	all problems above lead to delays, slow response timely communications, analysis, decisions suffer delays mean knowledge management is not an active “contact sport”, becomes mired and unresponsive some needs are just not requested because of these problems
Dependence on External Apps	new apps need to be identified, procured design and configuration of apps requires external expertise, programming skills multiple sourcing of apps leads to frequent incompatibilities, high costs for integration, poor interoperability
Unmet Needs	many KM needs are simply not requested by the time responses are forthcoming, needs and imperatives have moved on communications, analysis and decisions become hassles the “contact sport” of active discovery and learning is unmet
High Opportunity Costs	many KM insights are simply not discovered delays and frustration adds to costs, friction, inefficiencies no way to know the opportunity costs of what is not learned — but, surely is high
High Failure Rates	the net impact of all of the problems above is to lead to high failure rates (~60% to 70%) and unacceptable costs reliance on IT for KM has utterly and totally failed

The seeming contradiction between continued growth and expenditures for information management coupled with continued high failure rates and disappointments is really an expression of the centrality of information to the modern enterprise. The funding and growth of the IT function is itself an expression of this centrality and perceived importance. These have been abiding trends in our transition to information or knowledge economies.

Bray [2] places the fault for wasted initiatives within the culture of IT. I believe there is some truth to this — variably, of course, depending on the specific enterprise. But the real culprit, I believe, has been the past need to “intermediate” a layer of software and IT expertise between knowledge workers and their source information. A progression of tasks has been necessary — conducted over decades with advances and learning — to get paper information into electronic form, get those forms to be understood and operate in some common ways, and then to develop tools, architectures and frameworks to make sense of it. Yet, as more tasks with required specialized skills have been added to this layer, the actual gulf between worker and information has increased. For example, enterprises still require the overhead and layers of IT to write SQL to get information out and then to prepare and fix reports.

On average, IT now consumes about 4% of all enterprise expenditures and employs about 6% of enterprise workers [6]. IT has become a very thick intermediary layer, indeed! Yet, because of the advances and learning that has occurred in growing and nurturing this layer, we also now have the basis to begin to “disintermediate” the IT layer. Many, if not all, of the challenges noted in the table above can be improved by doing so.

Early Attempts at Self-service and Semantics

One current buzzword in business intelligence is “self service”. By this term is meant giving knowledge workers the tools and systems for creating reports or doing analysis on their own without needing to work through (or be frustrated by) the IT layer. Self-service software was first postulated in the 1990s as a way for information consumers and authors (typically subject-matter experts) to automate some of their knowledge management tasks. Today, it is most commonly applied to self-service reporting or self-service analytics within the BI realm.

As a general proposition, self-service BI has been more myth than reality [7]. Forrester surveys, for example, indicate that IT still develops most BI applications. Of survey respondents in 2009, 70% responded that IT develops the enterprise’s reports and dashboards [8]. However, that figure is not 100%, as it was just a decade earlier, and there is also notable success to some open source providers such as BIRT that address a wide range of reporting needs within a typical application, ranging from operational or enterprise reporting to multi-dimensional online analytical processing (OLAP).

James Kobelius [8] is particularly bullish on the application of Web 2.0 “mashup” applications to knowledge worker purposes. Under this approach, Web-based applications are used and accessed directly by knowledge workers for charting and mapping purposes using Ajax or Flash widgets, such as Google Maps. The conventional BI and KM vendors have begun to more more aggressively into this area. Some notable new entrants — such as Tableau, Factual or Good Data — are also showing the way to more direct access, more flexible reporting and analysis widgets, and cleaner service or platform designs.

These initiatives reside at the display or reporting level. There is another group, including James Kobelius, Neil Raden or Seth Earley, that have addressed how to get disparate information to talk together using ontologies. They refer to “semanticizing” such traditional practices such as master data management (MDM), “ontologizing” taxonomies, or adding Web 2.0 mashups to business intelligence. While these thoughts are moving in the right direction, and will bring incremental benefits, they still are far short of the potentials at hand.

Self-service Information Management

(click to expand)

So far, in the KM realm, the application of semantics has tended to be limited to information extraction (tagging) of text documents and first attempts at using ontologies. The tagging component is essential to enable the 80% of information presently in textual documents to become first-class citizens within business intelligence or knowledge management. The ontology efforts to date appear to be more like thin veneers over traditional taxonomies. Rather than hierarchical structures, we now see graph-oriented ones, but still intended to fulfill the same tasks of enterprise metadata and vocabulary lookups.

The ontology efforts especially are just nibbling around the edges of what can be done with semantic technologies. Rather than looking upon ontologies as just another dictionary (though that role is true), if we re-orient our thinking to make ontologies central to the KM function, a wealth of new opportunities and benefits arises.

A bit more than a year ago, we formulated the Seven Pillars of the Open Semantic Enterprise, which included ontologies and related as some of the central components. In that article [9], we noted the particular applicability of semantic technologies to the information and knowledge management functions within enterprises. We asserted the benefits for embracing the open semantic enterprise as providing the organization greater insights with lower risk, lower cost, faster deployment, and more agile responsiveness. Since that time we have been deploying such systems and documenting those benefits.

Integral to the seven pillars are those aspects that lead to the democratization of information for the knowledge worker, what combined might be called “self-service information management”. As the figure to the right shows, three of the seven pillars are essential building blocks to this capability, two pillars are further foundations to it, with the remaining two pillars only tangentially important.

What the combination of these pieces means is a fundamental change in how knowledge work is done. Through this approach, we can largely disintermediate IT from the knowledge function, can bring knowledge management directly into the hands of those who need it in real time, and fundamentally alter how knowledge management apps are designed and deployed. The best thing is these benefits are an incremental evolution, and retain the use and value of existing information assets.

Building Block #1: Adaptive Ontologies

Rather than peripheral lookup structures or thin veneers, ontologies play the central role in the design of self-service information management. We use the plural on purpose here: what is deployed is actually a library of complementary and modular ontologies that play a variety of roles. Combined, we call these libraries with their representative functions adaptive ontologies.

This library contains the expected and conventional domain ontologies. These represent the actual knowledge space for the domain at hand, and may be comprised of multiple different ontologies representing different domain or knowledge spaces. These standard semantic Web ontologies may range from the small and simple to the large and complex, and may perform the roles of defining relationships among concepts, integrating instance data, orienting to other knowledge and domains, or mapping to other schema.

From a best practices standpoint [10], we take special care in constructing these domain ontologies such that we provide labels and cues for user interfaces. Some of the user interface considerations that can be driven by adaptive ontologies include: attribute labels and tooltips; navigation and browsing structures and trees; menu structures; auto-completion of entered data; contextual dropdown list choices; spell checkers; online help systems; etc. We also include a variety of synonyms and aliases (the combination of which we call semsets) for referring to concepts and instances in multiple ways and for aiding information extraction and tagging functions. (In addition to organizing and helping to interoperate contributing information, these domain ontologies are also used for what is called ontology-based information extraction (OBIE) via our scones [11] system.)

In addition the library of adaptive ontologies includes some administrative ontologies that guide how instance data can be imported and inter-related (via the Instance Record Object Notation, or irON); what information types drive what widgets (via the Semantic Component Ontology, or SCO); data mapping vocabularies (UMBEL Vocabulary); how to characterize datasets; and other potential specialty functionality.

A forthcoming article will describe the composition and modularity typically found in a library of these adaptive ontologies.

In combination, these adaptive ontologies are, in effect, the “brains” of the self-service system. The best aspect of these ontologies is that they can be understood, created and maintained by knowledge workers. They constitute the only specification (other than theming, if desired) necessary to create self-service knowledge management environments.

Building Block #2: Ontology-driven Apps

The piece of the puzzle that implements the instruction sets within these adaptive ontologies are the ontology-driven apps, or ODapps. A recent article describes these structures in some detail [12].

ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in the adaptive ontologies. ODapps fulfill specific generic tasks, consistent with their dedicated design to respond to adaptive ontologies. For example, current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.

Generic functionality included in these ODapps are things like filtering, setting value ranges, choosing the specific display view, and invoking or not various display templates (akin to the infoboxes on Wikipedia). By nature of the data and the ontologies submitted to them, the ODapp signals to the user or consumer what displays, views, filters or slices-and-dices might be available to them. Fed different data and different ontologies, the ODapp would signal the user differently.

Because of their generic design, driven by the ontologies, only a relatively small number of ODapps needs to be created. Once created with appropriate generic functionality, application development is essentially over. It is through the additions and changes to the adaptive ontologies — done by knowledge workers themselves — that new capability and structure gets exposed through these ontology-driven apps. This innovation shifts the locus from software and programming to data and knowledge structures.

This democratization of IT means that everything in the knowledge management realm can become self service. Users and consumers can create their own analyses; develop their own reports; and package and disseminate what they and their colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we turn prior software engineering practice on its head.

Building Block #3: Open World Assumption

Integral to this design is the embrace of the open world assumption [13]. Though not a specific artifact, as are adaptive ontologies or ODapps, the open-world approach is the logical underpinning that allows consumers or knowledge workers to add new information to the system as it is discovered or scoped. This nuance may sound esoteric, but traditional KM systems have a very different underpinning that leads to some nasty implications.

Because the predominant share of KM systems are based on relational database systems, they embody a closed-world design. This works well for transaction systems or environments where the information domain is known and bounded, but does not apply to knowledge and changing information. Moreover, the schema that govern closed-world designs are brittle and hard to change and manage. It is this fact that has put KM squarely in the bailiwick of IT and has often led to delays and frustrations. Re-architecting or adding new schema views to an existing closed-world system can be fiendishly difficult.

This difficulty is a major reason why IT resists casual or constant changes to underlying data schema. Unfortunately, this makes these brittle schema difficult to extend and therefore generally unresponsive to changing and growing knowledge. As an environment for knowledge management, the relational data system and the closed-world approach are lousy foundations.

Other Building Blocks

As the self-service information management diagram above shows, RDF and Web services are two further important foundations. RDF (Resource Description Framework) is the canonical data model upon which all input information is represented. This means that the ODapp tools and the adaptive ontologies can work off a single model of knowledge representation. The Web service and architecture component is also helpful in that it allows Web 2.0 technologies to be brought to bear and allows distributed sources and users for the KM system. This provides scalability and distributed applicability, including on smartphones.

The other two pillars of the open semantic enterprise — the layered approach and linked data — are also helpful, but not necessarily integral to the KM and self-service perspectives presented herein.

Benefits from Self-service Information Management

The benefits and flexibilities from self-service information management extend from top to bottom; from creating data and content to publishing and deploying it. Here is a listing of available potentials for self-service, drawing comparison to the current conventional approach dependent on IT:

Information Activity	Conventional Approach (IT)	Self-service Information Management
Creating	structured data only not generally available directly to the knowledge worker	can create own datasets can extract and transform own datasets can tag and integrate non-structured (text + document) information able to handle unstructured, semi-structured and structured data alike
Annotating	not generally provided	completely open, flexible can define own annotation fields, annotation schema (approaches)
Analyzing	pre-canned functions structure pre-defined slow performance	all structural dimensions can be filtered all values and ranges thereof can be filtered multiple analysis display widgets selectable depending on the type of input data real-time configuration fast (nearly instantaneous) performance provision of (nearly) real-time analytics additional capabilities in inferencing and reasoning modeling and understanding of complex graph and relationships structures (e.g., social networks)
Reporting	pre-canned templates or report writers structure pre-defined	user-definable templates templates automatically assignable by types of thing being reported embeddable in Web pages, alternate presentation media styling and theming flexibility
Visualizing	very little done through IT	variety of visualization widgets available (e.g., maps, charts, graphs, networks) large-scale systems views possible visual interactions (a la Web 2.0) possible
Collaborating	very little done through IT collaboration, if done, is via separate social media	completely open variable access and permission rights by user or group built-in to the entire infrastructure
Validating	not directly done by knowledge worker user input, if done, via problem tickets with delays	can be integrated into the business process or workflow “soft” validations and ratings/rankings can also be included consistency checking satisfiability checking
Publishing	limited to pre-canned reports	any report or analysis is available for publishing documents and images and widget displays are available for publishing multiple export formats means information, slices thereof, or analysis results thereof can be embedded and integrated into multiple presentation media
Re-purposing	none directly by the knowledge worker	any report or analysis is available for re-purposing documents and images and widget displays are available for re-purposing canonical internal representations (RDF and XHTML) means available information can be deployed for a variety of purposes (Web pages, reports, documents, slide shows, etc.)
New Functionality	none known, if not already listed	semantic querying data visualization text mining and tagging categorization graph mining logic checking
Developing Apps	none via the official systems by the knowledge worker if done, via guerrilla apps	only generic apps needed many fewer and more flexible apps push issue into the background
Dashboarding	not available to most systems if available, limited number of pre-canned options	any report or analysis is available for dashboarding any widget is available for dashboarding complete structure (typing, values, sources) available for filtering, “slicing and dicing” all dashboard objects on a given canvas are linked, interoperate (selections in one widget reflected in other widgets) dashboards may be made persistent for re-use, springboarding new dashboards (as templates)

The fact that any source — internal or external — or format — unstructured, semi-structured and structured — can be brought together with semantic technologies is a qualitative boost over existing KM approaches. Further, all information is exposed in simple text formats, which means it can be readily manipulated and managed with easy to understand tools and applications. Reliance on open standards and languages by semantic technologies also leads to greater use and availability of open source systems.

In short, self-service information management approaches should be cheaper, faster, more responsive and more capable than current approaches.

Great Progress, with Ontology Management the Next Challenge

(click to expand)

Given these perspectives, hearing someone tout data-driven applications or advocate ontologies merely for metadata matching sounds positively Neanderthal. The prospects we have with semantic technologies, ontology-driven apps, and self-service information management systems mean so much more. The prospect at hand is to remake the entire knowledge management function, in the process bringing all aspects from creating and distributing knowledge products into the direct hands of the user. This is truly the democratization of information!

The absolutely fantastic news is none of this is theoretical or in the future. All pieces are presently proven, working and in hand. This is a practical vision, ready today.

Granted, like any new innovation, especially one that is infrastructural and systems-oriented, there are some weak or less-developed parts. These current gaps and needs include:

Though tools exist, the state of ontology create, edit, manage, update, delete, map and validate tools could be greatly improved [14]. As the central drivers for ODapps, a simplification of tasks geared more to the knowledge worker, and not professional ontologists, is needed (see diagram to right for some of the needed functions). Some of these developments are underway, with more desired
A relatively complete starting set of about 20 ODapps widgets is presently available. However, more are needed and for different deployment environments. BI analysis remains one weak area, as is an Ajax-based library
The number of infobox templates is small, and better (WYSIWYG or graphical) create and manage utilities would be most useful, and
User permission and authorization protocols exist, but are IP-based at present and could be beneficially expanded for different environments and use cases.

Yet, in the grand scheme of things, these gaps are relatively insignificant. The path and general architecture and design for moving forward are now clear.

Self-service information management via appropriately designed semantic technologies is now a reality. It promises to fulfill a vision of information access and control that has been frustrated for decades. We think these are exciting developments for the enterprise — and for the individual knowledge hound. We welcome your inquiries and invite you to join our open OSF group to contribute your ideas.

[1] Including going all the way back to my description of purpose for this blog back in 2005; see the AI3 Blogasbörd where I state, “One of my central arguments [in this blog] is that an inexorable trend through history has been the ‘democratization’ of information.”

[2] Tim Bray, 2010. “Doing it Wrong,” on his blog, January 5, 2010. The extensive comments are also worth a read.

[3] According to Marketwire quoting IDC, “Preliminary market sizing suggests that the business intelligence tools software market grew 2.6% in 2009 to reach $8.1 billion. Given the current market assumptions regarding the global economy and demand drivers in the BI tools software market, IDC forecasts this market to grow at a compound annual growth rate of 6.9% through 2014 to $11.3 billion.” CBR, citing Gartner, indicates the worldwide BI software market will grow 9.7 percent, reaching US$10.8 billion in 2011. Gartner also said BI platforms would continue to be one of the fastest growing software markets. For a very good background on BI, see Rochelle Shaw, 2011. “What is Business Intelligence,” posted in Database Trends and Applications, January 7, 2011.

[4] According to this article, by Antone Gonsalves, Poor Use Of Data Integration Tools Can Waste $500,000 Annually: Gartner (April 27, 2009), which reports on a recent Gartner Report, large global 2000 companies, using several data integration tools with overlapping features, can reduce costs by more than $500,000 annually by eliminating redundant software and leveraging a shared services model. In a further report by Roman Stanek, Business Intelligence Projects are Famous for Low Success Rates, High Costs and Time Overruns (April 25, 2009), Gartner is talking about a dirty little secret in the world of data integration, the fact that the data integration technology in place is based on generations of data integration technology being layered in the enterprise over the years. Thus, technology that was purchased to solve data integration problems, and reduce costs, is actually making the data integration problem more complex and no longer cost efficient.

[5] For example, see Roger Sessions, 2009. Cost of IT Failure, September 28, 2009. This analysis suggests failure rates of 65% with a total estimated worldwide cost of $6.2 trillion in 2009. Commenters have raised questions as to what constitutes failure and have questioned some of the analysis assumptions. Nonetheless, even with over-estimates, the scale of the numbers is alarming; see Jorge Dominguez, 2009. The CHAOS Report 2009 on IT Project Failure, June 16, 2009, which indicates combined failure and challenge rates for IT projects have ranged from 65% to 84% over the period 1994 to 2009; see http://www.education.state.pa.us/portal/server.pt/gateway/PTARGS_0_2_690719_0_0_18/CHAOS%20Summary%202009.pdf. Also see Dan Galorath, 2008. Software Project Failure Costs Billions; Better Estimation & Planning Can Help, June 7, 2008. In this report, Galorath compares and combines many of the available IT failure studies and summarizes that 3 of 5 IT projects do not do what they were supposed to for the expected costs, with 49% showing budget overruns, 47% showing higher than expected maintenance costs, and 41% failing to deliver expected business value; the anecdotal failure rate for years for IT projects has been claimed as 80%, with business intelligence and data warehousing particularly failure-prone areas; in 2001, a study by Mark N. Frolick and Keith Lindsey, Critical Factors for Data Warehouse Failures, for the Data Warehousing Institute noted conventional wisdom says the failure rate of data warehousing projects is 70 to 80 percent, with a then-recent study in the insurance industry found a 90-percent failure rate. This report is useful for combining many historical studies.

[6] As taken from the Gartner IT Metrics: IT Spending and Staffing Report, 2010; see http://www.slideshare.net/dellenterprise/it-spending-and-staffing-report-2010.

[7] Wayne W. Eckerson, 2007. “The Myth of Self-Service Business Intelligence,” in TDWI Online, October 18, 2007; see http://tdwi.org/articles/2007/10/18/the-myth-of-selfservice-bi.aspx. “Business Intelligence projects are famous for low success rates, high costs and time overruns. The economics of BI are visibly broken, and have been for years. Yet BI remains the #1 technology priority according to Gartner.”

[8] See James G. Kobielus, 2009. Mighty Mashups: Do-It-Yourself Business Intelligence For The New Economy, July 23, 2009, see http://www.corda.com/pdfs/mighty–mashups-article.pdf. In this report, Kobelius, the lead author from a Forrester study (August 2008, Global BI And Data Management Online Survey) that surveyed 82 IT decision-makers, noted that just over 70% responded that IT develops their reports and dashboards. About 57% responded that power users did such development. Only 18.3% reported that BI development is done by end users with limited BI skills. .

[9] M.K. Bergman, 2010. “Seven Pillars of the Open Semantic Enterprise,” in AI3:::Adaptive Information blog, January 12, 2010; see https://www.mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise/.

[10] There are a series of ongoing ontology best practices articles; see https://www.mkbergman.com/category/ontology-best-practices/.

[11] The scones (Subject Concept Or Named EntitieS) tagger provides information extraction of domain-specific subject concepts and entities from unstructured text. It also provides disambiguation of this information based on the context of the source information. See further http://techwiki.openstructs.org/index.php/Category:Scones.

[12] M.K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” in AI3:::Adaptive Information blog, March 7, 2011; see https://www.mkbergman.com/948/ontology-driven-apps-using-generic-applications/.

[13] M.K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” in AI3:::Adaptive Information blog, December 21, 2009; see https://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.

[14] M.K. Bergman, 2010. “A New Landscape in Ontology Development Tools,” in AI3:::Adaptive Information blog, Sept. 7, 2010; see https://www.mkbergman.com/909/a-new-landscape-in-ontology-development-tools/.

Posted:March 21, 2011

Noteworthy Icon Libraries for Projects and Web Mapping

An Overview of Freely Available, Comprehensive Icon Sets

structWFS It is not unusual when designing up a new project that it is important to find a consistent set of icons for user interface or mapping purposes. Full libraries or icon sets can be important because mixing and matching icons from multiple sources often conveys a bit of chaos or unprofessionalism. Structured Dynamics monitors freely available icons for these purposes and provides listings to its clients so that they may tailor and choose their own looks-and feel. The material below is the reference listing of about 20 comprehensive sets of open source icons that may be used for the open semantic framework (OSF) or sWebMap interfaces. Links to other listings are also provided. These references are kept up-to-date on the OSF TechWiki.

General Icons

Here are some consistent families of general user interface icons. While there are thousands of free icons available from many venues (check out via search engines), there are fewer that have sufficient diversity and scope to encompass most user interface needs. Since it is noticeably jarring to mix icon styles in the same interface (or, at least to do so indiscriminately), it is important to have a consistent design image. Here are the candidate choices we have found. Some are provided in either multiple size formats or in vector (generally, SVG), formats:

The Silk icons from famfamfam is a set of over 700 16-by-16 pixel icons in PNG format (144 of which are also available as GIF mini-icons, see below). This is the standard open source set used as the basis for Pastel (see below), which is used in the various conStruct tools. There are also other free icons from this site

Tango is an icon library that contains a basic set of icons for the most common usage. They come in 16×16 and 22×22 sizes, and some are scalable (vector). There are also a variety of extensions for specific purposes

Pastel SVG is an icon set based on the Silk icons noted above from FamFamFam.com. Pastel uses the same style, but comes in the sizes of 16, 24, 32, 48, 64, 72, 96, 128 or 256 pixels square; a sampling is shown below

Pastel is the standard icon set chosen for conStruct tools.

The Fugue icons by Yusuke Kamiyamane is the largest set available, and contains 3000 individual icons in 16×16 PNG format. Here is a sampling:

Alternatively, there is a smaller set of 400 icons called Diagona also available from the same designer

Nuvola is a set of 600 icons in either PNG or SVG format from David Vignoni (Icon King). The PNG come in standard sizes of 16, 22, 32, 48, 64 or 128 pixels square. Here is a sampling:

Vignoni also has an alternative set of icons with a similar feel called Oxygen.

The Crystal set of more than 1300 icons is organized into six different sizes, and is divided into the categories of actions, apps, files systems, devices and mime types. Here is a sampling:

Other Sources

According to the Open Icon Library, which has a nice gallery (but which also mixes sources), here are some other key sources of open source icons not already listed above:

Echo icon themes
nuoveXT2
Open Clip Art Library
xfce icons.

See also the icon sets used within Wikipedia itself. Lastly, and perhaps most usefully, peruse the 750+ icon sets on Icon Finder.

Map Icons

With the emergence of Web 2.0 and locational services, particularly the open API and “thumbtack” aspect of Google Maps, a new category of map markers for web mapping has emerged. This category is still new enough that an accepted terminology has not yet developed. Among other terms, here are some of the ways that these locational markers on maps have been described:

Places of interest (POIs)
Points of interest (POIs)
Pins
Pushpins
Placemarks
Thumbtacks
Markers
Location markers
Map pointers.

Here are some of the consolidated sources of open source markers now available:

This is a sampling of 120 markers or so available within the Google MyMaps API (see further this link with shadows and this full listing). All have matching shadows useful for conveying a 3D feeling:

There are also about 250 standard icons provided within the Google Earth set. You can see those listed here. Also, to see the available icon libraries in Google maps (plus some others), see this link

Map Icons Collection is a set of more than 1000 free icons to use as placemarks for POI locations on maps (originally designed for the Google Maps API). Most of these icon markers are square in aspect with a pointer, and are organized by color-coordinated categories such as numbers, cinemas, hotels, banks, etc. Here is a sampling:

The Maki icon set consists of more than 100 black and white 15×15 map markers
This listing provides three different colors in the Google Map “teardrop” style for all letters and 99 numbers
Geosilk is an extension of the standard Silk icon set noted above. It is more applicable to UI icons relating to map functions than to map markers per se
Green Map contains a set of about 170 monochrome (can be colored differently) POI markers, with an orientation to nature or ecological categories. There are also local extensions
Map Pins provides 22 alternative map pins and flags:

50 monochrome POI and map marker symbols from the US National Park Service (NPS):

There is a similar (and complementary in design) set of 50 monochrome pedestrian and transportation symbols from AIGA in cooperation with the US Department of Transportation

Dynamic Markers

Some markers can be created dynamically with the Google Map API. Here are some background articles and links:

http://mapki.com/wiki/Icon_Image_Sets
http://gmaps-utility-library.googlecode.com/svn/trunk/mapiconmaker/release/docs/
Very useful is the explanation of dynamic markers within the Google Map API

Other Listings

Various other listings, many with icons but perhaps not organized into the same uniform sets, include:

http://wiki.openstreetmap.org/wiki/Map_Icons
http://www.poi-factory.com/gps-icons
http://www.gpsdrive.de/development/map-icons/overview.en.shtml
http://commons.wikimedia.org/wiki/Category:Map_pointers
Map pin search on Icon Finder. Some specific options on Icon Finder are:
- http://www.iconfinder.com/search/?q=iconset%3Aiconslandgps
- http://www.iconfinder.com/search/?q=iconset%3Agis.

Posted:March 18, 2011

Brown Bag Lunch: ‘Structs’: Naïve Data Formats and the ABox

Writing and Sharing Data Can be Lightened Up

Ever since I first started to learn in earnest about ontology, something has been gnawing at me. The term seemed to be (shall I say?) an obtuse one whose obscurity was not the result of subtle precision or technicality, but rather one of fuzziness. As I introduced my Intrepid Guide to Ontology two years ago, I noted:

The root of the [ontology] term is the Greek ontos, or being or the nature of things. Literally and in classical philosophy, ontology was used in relation to the study of the nature of being or the world, the nature of existence. Tom Gruber, among others, made the term popular in relation to computer science and artificial intelligence about 15 years ago when he defined ontology as a “formal specification of a conceptualization.”

Simple Data Structs Since then, I have continued to find ontology one of the hardest concepts to communicate to clients and quite a muddled mess even as used by practitioners. I have come to the conclusion that this problem is not because I have failed to grasp some ephemeral nuance, but because the term as used in practice is indeed fuzzy and imprecise.

What Isn’t an Ontology?

Even two years ago, I noted more than 40 different types of information structure that have at one time or another been labelled as an example of an “ontology”:

Since then, I could add even more terms to this list.

Lack of precision as to what ontology means has meant that it has been sloppily defined. As I have harped upon many times regarding semantic Web terminology, this is a sad state of affairs for the semWeb endeavor that has meaning at the core of its purpose.

I’m pretty sure that the original intent in embracing the concept of ontology within the realm of knowledge representation was not to see this term so broadly misused or mis-applied. I suspect, as well, that if we could sharpen up our understanding and remove some of the fuzziness that we could improve communications with the lay public across many levels of the semWeb enterprise.

The Useful Distinction of the TBox and ABox

Recently, I have been looking to the semantic Web’s roots in description logics. One of my writings, Thinking ‘Inside the Box’ with Description Logics, looked at the conceptual distinctions between the so-called ‘TBox‘ and ‘ABox‘. That is, a knowledge base is a logical schema of roles and concepts and the relationships between them (the TBox), which is populated by the actual data (instances) asserting memberships and attributes (“facts”) (the ABox).

By analogy, in a conventional relational database system, the database or logical schema would correspond to the TBox; the actual data records or tables would correspond to the ABox. Often, the term ontology is used to cover both ABox and TBox statements (which, I argue, only makes the understanding of the ‘ontology’ concept more difficult).

My recent writing, Back to the Future with Description Logics, discussed at some length the advantages of keeping the TBox and ABox separate. This current article now expands on those thoughts, particularly with respect to the definition and understanding of ontology.

The starting point for this new mindset is to return to the ideas of data records or data tables v. the logical schema that is prevalent in relational databases.

So Many Structs, So Little Time

The last time I took a census, about a year ago, there were more than 100 converters of various record and data structure types to RDF [2]. These converters — also sometimes known as translators or ‘RDFizers’ — generally take some input data records with varying formats or serializations and convert them to a form of RDF serialization (such as RDF/XML or N3), often with some ontology matching or characterizations. That last census listed these converters:

RDF
- Serialization formats:
- - RDF/XML
  - N3
  - Turtle
- Automatically recognized ontologies:
- - SIOC
  - SKOS
  - FOAF
  - AtomOWL
  - Annotea
  - Music Ontology
  - Bibliographic Ontology
  - EXIF
  - vCard
  - Others
(X)HTML pages
HTML header metadata
- Dublin Core
Embedded microformats
- eRDF
- RDFa
- hCard
- hCalendar
- XFN
- xFolk
Syndication Formats:
- RSS 2.0
- Atom
- OPML
- OCS
- XBEL (for bookmarks)
GRDDL [1]

REST-style Web service APIs:
- Google Base
- Flickr
- Del.icio.us
- Ning
- Amazon
- eBay
- Freebase
- Facebook
- raw HTTP
- Etc.
Files (multitude of file formats and MIME types, including):
- MS Office
- OpenOffice
- Open Document Format
- images
- audio
- video
- Etc.
Web services:
- BPEL
- WSDL
- XBRL
- XBEL
Data exchange formats
- iCalendar
- vCard
Virtuoso VADs
OpenLink license files
Third party metadata extraction frameworks:
- Aperture
- Spotlight
- SIMILE RDFizers

Note that MIT’s SIMILE RDFizers also recognizes these formats:

JPEG → RDF
MARC/MODS → RDF
OAI-PMH → RDF
OCW → RDF
EMail → RDF
BibTEX → RDF
POM → RDF
DEB → RDF
CRW → RDF
Flat → RDF
Weather → RDF
Java → RDF
Javadoc → RDF
Jira → RDF
Subversion → RDF
Random → RDF

There is a growing list of third-party RDFizers as well:

LDIF → RDF
iCal → RDF
Palm → RDF
Outlook → RDF
RFC822 → RDF
Garmin → RDF
Fink → RDF
D2RQ
D2RMAP
XLS → RDF
CSV → RDF
RDF123
XSD → OWL
XML → RDF
MPEG-7/CS → OWL
XMP → RDF
BitTorrent → RDF
RPM → RDF
BibTeX → RDF

This wealth of formats shows the robustness of the RDF data model to capture structure and data relationships from virtually any input form. This is what makes RDF so exciting as a canonical target for getting data to interoperate.

Let’s Make this Elementary, Dr. Watson

However — and this is crucial — standard users for decades have preferred simple, text-based and human readable formats for writing and transferring their structured data.

These various forms, sometimes well specified with APIs and sometimes almost ad hoc as in spreadsheet listings, are what we call ‘structs‘. Structs can all be displayed as text and have, at minimum, explicit or inferrable key-value pairs to convey data relationships and attributes, with data types and values often noted by various white space, delimiter or other text conventions.

There is no doubt that the vast majority of extant data is found in such formats, including the results of data or information extraction from unstructured text. Indeed, even HTML and many markup languages with their angle bracket-delimited fields fall into this category.

There have literally been hundreds of various formats proposed over decades for conveying lightweight data structures. Most have been proprietary or limited to specific domains or users. Some, such as fielded text, structured text, simple declarative language (SDL), or more recently YAML or its simpler cousin JSON, have become more widely adopted and supported by formal specifications, tools or APIs. JSON, especially, is a preferred form for Web 2.0 applications.

Some, like microformats or this example BibTeX record below (with some non-standard extensions), rely less on syntax conventions and may use reserved keywords (such as AUTHOR or TITLE as shown) to signal the key type for the key-value pair:

ID_LOCAL arXiv:0711.3808
AUTHOR <a href="#Schramm_O">Oded Schramm</a>
BIBTYPE ARTICLE
ID arXiv:0711.3808
JOURNAL Electron. Res. Announc. Math. Sci.
PAGES 17--23
SUBJECTS geom
TITLE Hyperfinite graph limits
URL http://www.aimsciences.org/journals/doIpChk.jsp?paperID=3117&mode=full
URL http://www.aimsciences.org/journals/displayPapers0.jsp?comments=&pubID=221&journID=14&pubString_num=Volume:
15, 2008 Journal Issue
VOLUME 15
YEAR 2008

Some of these simple formats have been more successful than others, though none have achieved market dominance. There also appear to be few universal principles that have emerged as to syntax or format. Nonetheless, any of these various struct forms are easy for casual readers to understand and easy for domain experts to write.

For modeling and interoperability purposes, many of these forms are patently inadequate. That is why many of these simpler forms might be called “naïve”: they achieve their immediate purpose of simple relationships and communication, but require understood or explicit context in order to be meaningfully (semantically) related to other forms or data.

Yet, if we have learned nothing else with the phenomenal success of the Web it is this: simplicity trumps elegance or expressivity.

RDF and the Skinny ABox

The RDF (Resource Description Framework) data model is expressed as simple subject–predicate–object “triple” statements. That sounds fancy, but just substitute verb for predicate and noun for subject and object. In other words: Dick sees Jane; or, the ball is round. It may sound like a kindergartner reader, but it is how data can be easily represented and built up into more complex structures and stories.

RDF triples can be applied equally to all structured, semi-structured and unstructured content. RDF is clearly a most capable data model that — through its ability to be extended with further concepts and relationships (vocabulary) — can create elegant and logical structures to represent comprehensive domains and knowledge bases. Finding such a model has been a quest in my professional life; I believe we finally have a winner to facilitate data interoperability using RDF.

But RDF has not achieved the market acceptance that its suitability as a data representation model might suggest. I think there are three reasons for this:

First, RDF was first presented and “sold” as an XML serialization. This failing has been well understood for some time. This unfortunate early linkage of RDF caused confusion between data model and the XML syntax. The rather simple and incremental building blocks of triple RDF statements when presented in the nested XML syntax led to lengthy and hard-to-read specifications (for easier reading and use, see either the N3 or Turtle syntaxes)
Second, triples by definition are 50% more complicated than a key-value pair. While the basic RDF statement might be simple like a Dick-and-Jane reader, as a data specification format it is still more complex than my personal attributes of sex:Male and hair:Red and born:California. Those three “facts” can not be said nearly so quickly in RDF. And, if we also adhere to linked data, each one of these items requires a URI unique identifier to boot! It is important not to ignore the desire for simple and human readable data-specification formats
Third, as this entry began and as we will conclude, RDF and its fuzzy relationship to ontology has led to over-specification of what needs to be included in the data record. What could simply be a record specification of an object and its attributes presented as simple key-value pairs has become burdened with “ontology” and “conceptual” relationships.

Canonical forms embody all of the specification that the canon guiding them requires. What we may have failed to see in embracing RDF, however, is that getting useful data into the system need not carry all of this burden.

Lightening Up and Shifting Work to the TBox

Click for full-size

So, what does all of this have to do with my starting diatribe about the term ontology?

Whether a single database or the federation across all information known to human kind, we have data records (structs of instances) and a logical schema (ontology of concepts and relationships) by which we try to relate this information. This is a natural and meaningful split: structure and relationships v. the instances that populate that structure.

Stated this way, particularly for anyone with a relational database background, the split between schema and data is clear and obvious. Yet, the RDF, semantic Web and linked data communities have done an abysmal job of recognizing this fundamental separation of concerns.

We create “ontologies” that mix instances and schema. We insist on simple data record conversions that are burdened with relationship specifications as well. We tout a “linked data” infrastructure that is based solely on the same identity of instances without respect or attention to structure or conceptual relationships. We dismiss communities that work to express their data with useful local structures. We insist on standards and practices up and down the data staging and preparation chain that turns off the general market and makes us seem arrogant and dismissive. Frankly, in so many ways, we just don’t get it [3].

What has struck me personally over the past few months as these realizations have unfolded has been how much our own mindsets and language may be trapping us.

Does existing structured data need to be expressed as RDF in order to be useful and integrated?
Exposing linked and instance data is great, but to what end; what are the conceptual or structural schema?
Why is our standards process so inward looking and parochial (often petty)? What purpose or who does this serve?

At least for this diatribe, my essential conclusion is that we need to shift the burden of the schema and conceptual relations and (yes) world views to the TBox. We need to skinny down the ABox and make it a warm and welcoming environment by which any structured data (including the most naïve) can join.

So, ultimately, the bottom line is this: the burden of the semantic Web rests on us, not the providers of structured data.

It is time to streamline the ABox to smooth data contributions, assume as publishers the responsibility for the TBox, and keep those concerns separate. As for instance-related stuff, I now intend to refer to them as structs governed by a controlled vocabulary (at most). I intend to reserve ontology as a means to describe a given world view, a TBox, the schema and its relations of the domain at hand. And, frankly, this definition of ontology brings it back in balance with its roots in ontos and the nature of the world.

It’s a good time to lighten up!

This Friday brown bag leftover was first placed into the AI3 refrigerator on January 22, 2009, and is one of the more popular historical posts of this blog. This reprise is unchanged since its original posting, though we have continued to make progress on constructs such as irON to capture this idea. Microdata in HTML5 is also an important contribution, to which we will devote some attention in the near future.

[1] GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a W3C markup format for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT GRDDL accomodates a wide variety of dialects (see one listing) and can be combined with arbitrary transformation mechanisms (though currently mostly based on XSLTs).

[2] Also see the listing of “dynamic” RDFizers at http://esw.w3.org/topic/DynamicRDFizers.

[3] I don’t mean to imply that there are not those in the community interested in lightweight data structures or their conversion, just that they have been more of a minority to date. For example, the 5th Workshop on Scripting and Development for the Semantic Web is coming up this summer in conjunction with the 6th European Semantic Web Conference in Crete, Greece; this year’s organizers are Gunnar Aastrand Grimnes (DFKI Knowledge Management Lab), Chris Bizer (Freie U niversität Berlin) and Sören Auer (U niversität Leipzig). As other examples focusing on JSON, there are a couple of efforts to define representation conventions from Talis and GBV for serializing RDF; Jim Ley, Kanzaki Masahide and Dave Beckett (likely among others) have written simple and straightforward RDF and Turtle parsers and converters; there was a floated idea for an RDF version of JSON called RDFON that has now evolved into the TURF approach; and JDIL (JSON data integration layer) instructs how to add namespaces to JSON to enable encoding RDF. Still further examples are Beckett’s Triplr and Auer’s ASKW Triplify lightweight conversion services involving many different formats. These are all laudable efforts with good relevance to a lighter ABox approach, I think.

Posted:March 7, 2011

Ontology-Driven Apps Using Generic Applications

The Time and Technology is Here to Stand Software Engineering on its Head

Download PDF

As an information society we have become a software society. Software is everywhere, from our phones and our desktops, to our cars, homes and every location in between. The amount of software used worldwide is unknowable; we do not even have agreed measures to quantify its extent or value [1]. We suspect there are at least 1 billion lines of code that have accumulated over time [1,2]. On the order of $875 billion was spent worldwide on software in 2010, of which about half was for packaged software and licenses and the rest for programmer services, consulting and outsourcing [3]. In the U.S. alone, about 2 million people work as programmers or related [4].

It goes without saying that software is a very big deal.

No matter what the metrics, it is expensive to develop and maintain software. This is also true for open source, which has its own costs of ownership [5]. Designing software faster with fewer mistakes and more re-use and robustness have clearly been emphases in computer science and the discipline of programming from its inception.

This attention has caused a myriad of schools and practices to develop over time. Some of the earlier efforts included computer-aided software engineering (CASE) or Grady Booch’s (already cited in [1]) object-oriented design (OOD). Fourth-generation languages (4GLs) and rapid application development (RAD) were popular in the 1980s and 1990s. Most recently, agile software development or extreme programming have grabbed mindshare.

Altogether, there are dozens of software development philosophies, each with its passionate advocates. These express themselves through a variety of software development methodologies that might be characterized or clustered into the prototyping or waterfall or spiral camps.

In all instances, of course, the drivers and motivations are the same: faster development, more re-use, greater robustness, easier maintainability, and lower development costs and total costs of ownership.

The Ontology Perspective in this Mix

For at least the past decade, ontologies and semantic Web-related approaches have also been part of this mix. A good summary of these efforts comes from Michael Uschold in an invited address at FOIS 2008 [6]. In this review, he points to these advantages for ontology-based approaches to software engineering:

Re-use — abstract/general notions can be used to instantiate more concrete/specific notions, allowing more reuse
Reduced development times — producing software artifacts that are closer to how we think, combined with reuse and automation that enables applications to be developed more quickly
Increased reliability — formal constructs with automation reduces human error
Decreased maintenance costs — increased reliability and the use of automation to convert models to executable code reduces errors. A formal link between the models and the code makes software easier to comprehend and thus maintain.

These first four items are similar to the benefits argued for other software engineering methodologies, though with some unique twists due to the semantic basis. However, Uschold also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:

Reduced conceptual gap — application developers can interact with the tools in a way that is closer to their thinking
Facilitate automation — formal structures are amenable to automated reasoning, reducing the load on the human, and
Agility/flexibility — ontology-driven information systems are more flexible, because you can much more easily and reliably make changes in the model than in code.

In making these arguments, Uschold picks up on the “ontology-driven information systems” moniker first put forward by Nicola Guarino in 1998 [7]. The ideas around ODIS have had substantial impact on the semantic Web community, especially in the use of formal ontologies and modeling approaches. The FOIS series of conferences, and most recently the ODiSE series, have been spawned from these ideas. There is also, for example, a fairly rich and developed community working on the integration of UML via ontologies as the drivers or specifiers of software [8].

Yet, as Uschold is careful to point out, the idea of ODIS extends beyond software engineering to encompass all of information systems. My own categorization of how ontologies may contribute to information systems is:

Domain modeling — this category includes the domain knowledge representations and reasoning and inference bases that are the traditional understanding of ontologies in the semantic space. The structural aspects are akin to a database schema definition; the unique aspects of ontologies reside in their logic foundations and graph structures, which offer more power in inferencing, reasoning and graph analysis than conventional approaches
Model-driven architectures (MDA) — like UML, these are platform-independent specifications that provide the functional and dataflow definitions of “models” executed by the system. These are the natural progeny of earlier CASE approaches, for example. Such systems also potentially allow graphical or visual means for building or hooking together components as a substitute to direct coding
Program specifications and excecutables — though fairly experimental at present, these approaches use the languages of RDF, OWL or direct use of logic languages to create the equivalent of executable software programs. A couple of experimental systems include Fhat and Neno, for example, point to possible future directions in this area [9]
Runtime or utility components — proper construction of ontologies can be a source for labels and prompts within user interfaces and other runtime uses. Because of the ontology basis, these contributions may also be contextual [10]
Automated agents — based on context, user choices and the governing ontologies, new instruction sets can be generated via what some term automated agents or “robots” to instruct subsequent steps in the software, including potentially analysis or validation. Mission Critical IT [11] is apparently the most advanced in this area; we discuss their ODASE approach more below
Bespoke drivers of generic applications — through using and combining a number of the aspects above, in its totality this approach is a very different paradigm, as we describe below.

When we look at this list from the standpoint of conventional software or software engineering, we see that #1 shares overlaps with conventional database roles and #2, #3 and #4 with conventional programmer or software engineering responsibilities. The other portions, however, are quite unique to ontology-based approaches.

But Is Software Engineering Even the Right Focus?

For decades, issues related to how to develop apps better and faster have been proposed and argued about. We still have the same litany of challenges and issues from expense to re-use and brittleness. And, unfortunately, despite many methodologies du jour, we still see bottlenecks in the enterprise relating to such matters as:

Software is merely an intermediary artifact to accomplish some given tasks. Rather than “engineering” software, the focus should be on how to fulfill those tasks in an optimal manner — and that demands a systems approach.

data access
queries
data transformations
data integration or federation
reports
other data presentations
business analysis, and
targeted, specialty functionality.

Promises such as self-service reporting touted at the inception of data warehousing two decades ago are still to be realized [12]. Enterprises still require the overhead and layers of IT to write SQL for us and prepare and fix reports. If we stand back a bit, perhaps we can come to see that the real opportunity resides in turning the whole paradigm of software engineering upside down.

Our objective should not be software per se. Software is merely an intermediary artifact to accomplish some given task. Rather than engineering software, the focus should be on how to fulfill those tasks in an optimal manner. How can we keep the idea of producing software from becoming this generation’s new buggy whip example [13]?

For reasons we delve into a bit more below, it perhaps has required a confluence of some new semantic technologies and ontologies to create the opening for a shift in perspective. That shift is one from software as an objective in itself to one of software as merely a generic intermediary in an information task pipeline.

Though this shift may not apply (at least with current technologies) to transactional and process-based software, I submit it may be fundamental to the broad category of knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. These are the real areas where integration and reports and queries and analysis remain frustrating bottlenecks for knowledge workers. And, interestingly, these are also the same areas most amenable to embracing an open world (OWA) mindset [14].

If we stand back and take a systems perspective to the question of fulfilling functional KM tasks, we see that the questions are both broader and narrower than software engineering alone. They are broader because this systems perspective embraces architecture, data, structures and generic designs. The questions are narrower because software — within this broader context — can be now be generalized as artifacts providing the fulfillment of classes of functions.

ODapps: The Ontology-Driven Application Approach

Ontology-driven applications — or ODapps for short — based on adaptive ontologies are a topic we have been nibbling around and discussing for some time. In our oft-cited seven pillars of the semantic enterprise we devote two pillars specifically (#4 and #3, respectively) to these two components [15]. However, in keeping with the systems perspective relevant to a transition from software engineering to generic apps, we should also note that canonical data models (via RDF) and a Web-oriented architecture are two additional pillars in the vision.

ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies as noted under #1 above), as supplemented by the UI and instruction sets and validations and rules (as noted under #4 and #5 above). The combination of these specifications as provided by both properly constructed domain ontologies and supplementary utility ontologies is what we collectively term adaptive ontologies [16].

ODapps fulfill specific generic tasks, consistent with their bespoke design (#6 above) to respond to adaptive ontologies. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

In fact, the widget idea from Web 2.0 is a key precursor to the ODapps design. What we see in Web 2.0 are dedicated single-purpose widgets that perform a display operation (such as Google Maps) based on the properly structured data fed to them (structured geolocational information in the case of GMaps).

In Structured Dynamics‘ early work with RDF-based applications by our predecessor company, Zitgist, we demonstrated how the basic Web 2.0 widget idea could be extended by “triggering” which kind of mashup widget got invoked by virtue of the data type(s) fed to it. The Query Builder presented contextual choices for how to build a SPARQL query via UI based on what prior dropdown list choices were made. The DataViewer displayed results with different widgets (maps, profiles, etc.) depending on which part of a query’s results set was inspected (by responding to differences in data types). These two apps, in our opinion, remain some of the best developed in the semantic Web space, even though development on both ceased nearly four years ago.

This basic extension of data-driven applications — as informed by a bit more structure — naturally evolved into a full ontology-driven design. We discovered that — with some minor best practice additions to conventional ontologies — we could turn ontologies into powerhouses that informed applications through:

An understanding of the kind of things under consideration, including their inference chains
The types of data in results sets, and how that informs the nature of the widget(s) (maps, calendars, timelines, charts, tabular reports, images, stories, media, etc.) appropriate to display and manipulate that information, and
UI and utility functions such as interface labels, mouseovers, auto-suggests, spelling suggestions, synonym matches, etc.

Like the earlier Zitgist discoveries, basing the applications on only one or two canonical data models and serializations (RDF and a simple data exchange XML, which Fred Giasson calls structXML) provides the input uniformity to make a library of generic applications tractable. And, embedding the entire framework in a Web-oriented architecture means it can be distributed and deployed anywhere accessible by HTTP.

Booch has maintained for years that in software design abstraction is good, but not if too abstract [1]. ODapps are a balanced abstraction within the framework of canonical architectures, data models and data structures. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures. The KM emphasis can shift from programming and software to logic and terminology [16].

In the sub-sections below, we peel back some portions of this layered design to unveil how some of these major pieces interact.

Built Upon an Ontology- and Web-based Architecture

Again, to cite Booch, the most fundamental software design decision is architecture [1]. In the case of Structured Dynamics and its support for ODapps, its open semantic framework (OSF) is embedded in a Web-oriented architecture (WOA). The OSF itself is a layered design that proceeds from a kernel of existing assets (data and structures) and proceeds through conversion to Web service access, and then ontology organization and management via ODapps [17]. The major layers in the OSF stack are:

Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
scones / irON – the conversion layer, in part consisting of information extraction of subject concepts or named entities (scones) or the instance record Object Notation for conveying XML, JSON or spreadsheets (CSV) in RDF-ready form (via irON or RDFizers)
structWSF – a platform-independent suite of more than 20 RESTful Web services, organized for managing structured data datasets; it provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
Ontologies — are the layer containing the structured assets “driving” the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave
conStruct – connecting modules to enable structWSF and sComponents to be hosted/embedded in Drupal, and
sComponents – (mostly) Flex semantic components (widgets) for visualizing and manipulating structured data.

Not all of these layers or even their specifics is necessary for an ontology-driven app design [18]. However, the general foundations of generic apps, properly constructed adaptive ontologies, and canonical data models and structures should be preserved in order to operationalize ODapps in other settings.

OSF is the Basis for Domain-specific Instantiations

The power of this design is that by swapping out adaptive ontologies and relevant data, the entire OSF stack as is can be used to deploy multiple instantiations. Potential uses can be as varied as the domain coverage of the domain ontologies that drive this framework.

The OSF semantic framework is a completely open and generic one. The same set of tools and capabilities can be applied to any domain that needs to manage and understand information in its own domain. With the existing ODApps in hand, this includes from unstructured text or documents to conventional structured databases.

What changes from domain to domain are the data structures (the ontologies, schema and entity references) and their instance data (which can also be converted from existing to canonical forms). Here is an illustration of how this generic framework can be leveraged for different deployments. Note that Citizen Dan is a local government example of the OSF framework with relatively complete online demos:

The Open Semantic Framework can Spawn Many Different Domain Instances

(click for full size)

Structured Dynamics continues to wrinkle this basic design for different clients and different industries. As we round out the starting set of ODapps (see below), the major effort in adapting this generic design to different uses is to tailor the ontologies and “RDFize” existing data assets.

Lower Layers

Conversion of existing assets to RDF and canonical forms is not discussed further here. See the irON and scones documentation or the TechWiki for more information on these topics.

The structWSF Web Services Layer

The first suite of ODapps occurs at the structWSF Web services layer. structWSF provides a set of generic functions and endpoints to:

Import or export datasets
Create, update, delete (CRUD) or otherwise manage data records
Search records with full-text and faceted search
Browse or view existing records or record sets, based on simple to possible complex selection or filtering criteria, or
Process results sets through workflows of various natures, involving specialized analysis, information extraction or other functions.

Here is a listing of current ODapp functions within structWSF (with links to details for each):

WSF management Web services	User-oriented Web services
Auth: Validator Auth: Lister Auth Registrar: Access Auth Registrar: WS Ontology: Create Dataset: Create Dataset: Read Dataset: Update Dataset: Delete	CRUD: Create CRUD: Read CRUD: Update CRUD: Delete Browse Search Scones SPARQL Import Export

At this level the information access and processing is done largely on the basis of structured results sets. Other visualization and display ODapps are listed in the next subsection.

The Semantics Components Layer

The visualization and data display and manipulation ODapps are provided via the semantic components layer. Structured Dynamics’s sComponents are Flex-based widgets that conform to a standard, generic design. Other developers using the OSF framework are developing JavaScript versions [19]. Here is the current library (with links to details for each):

New Components	Components Extending Flex
Portable Control Application sBarChart SCO Ontology sControl sDashboard sGenericBox sLinearChart sMap sPieChart sRelationBrowser sStory scones: Story Tagging sWebMap (in development)	sHBox sImage sText

These components can be used in combination with any of the structWSF ODapps, meaning the filtering, searching, browsing, import/export, etc., may be combined as an input or output option with the above.

The next animated figure shows how the basic interaction flow works with these components:

(click for full size)

Using the ODapp structure it is possible to either “drive” queries and results sets selections via direct HTTP request via endpoints (not shown) or via simple dropdown selections on HTML forms or Flex widgets (shown). This design enables the entire system to be driven via simple selections or interactions without the need for any programming or technical expertise.

As the diagram shows, these various sComponents get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the sComponents. When combined with the results set data, and attribute information in the irON ontology, plus the domain understanding in the domain ontology, a synthetic schema is constructed that instructs what the interface may do next. Here is an example schema:

(click for full size)

These instructions are then presented to the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas.

As new user interactions occur with the resulting displays and components, the iteration cycle is generated anew, again starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.

Self-service Reporting

Since self-service reporting has been such a disappointment [12], it is worth noting another aspect from this ODapp design. Every “thing” that can be presented in the interface can have a specific display template associated with it. Absent another definition, for example, any given “thing” will default to its parental type (which, ultimate, is “Thing”, the generic template display for anything without a definition; this generally defaults to a presentation of all attributes for the object).

However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:

Thing

	Product

		Camera

			Digital Camera

				SLR Digital Camera

					Olympus Evolt E520

At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes.

This design is meant to provide placeholders for any “thing” in any domain, while also providing the latitude to tailor and customize to every “thing” in the domain.

It is critical that generic apps through an ODapp approach also provide the underpinnings for self-service reporting. The ultimate metric is whether consumers of information can create the reports they need without any support or intervention by IT.

Adaptive Analysis

The Mission Critical IT reference provided earlier [11] helps point to the potentials of this paradigm in a different way. Mission Critical also shows user interfaces contextually chosen based on prior selections. But they extend that advantage with context-specific analysis and validation through the SWRL rules-base semantic language. This is an exciting extension of the base paradigm that confirms the applicability of this approach to business intelligence and general enterprise analytics.

Standing Software Engineering on its Head

All of this points to a very exciting era for enterprise and consumer apps moving into the future. We perhaps should no longer talk about “killer apps”; we can shift our focus to the information we have at hand and how we want to structure and analyze it.

Using ontologies to write or specify code or to compete as an alternative to conventional software engineering approaches seems too much like more of the same. The systems basis in which such methodologies such as MDA reside have not fixed the enterprise software challenges of decades-long standing. Rather, a shift to generic applications driven by adaptive ontologies — ODapps — looks to shift the locus from software and programming to data and knowledge structures.

This democratization of IT means that everything in the knowledge management realm can become “self service.” We can create our own analyses; develop our own reports; and package and disseminate what we and our colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we can turn prior decades of software engineering practices on their head.

What Structured Dynamics and a handful of other vendors are showing is by no means yet complete. Our roster of ODapp widgets and templates still needs much filling out. The toolsets available for creating, maintaining, mapping and extending the ontologies underlying these systems are still woefully inadequate [20]. These are important development needs for the near term.

And, of course, none of this means the end of software development either. Process and transactions systems still likely reside outside of this new, emerging paradigm. Creating great and solid generic ODapps still requires software. Further, ODapps and their potential are completely silent on how we create that software and with what languages or methodologies. The era of software engineering is hardly at an end.

What is exceptionally powerful about the prospects in ontology-driven apps is to speed time to understanding and place information manipulation directly in the hands of the knowledge worker. This is a vision of information access and control that has been frustrated for decades. Perhaps, with ontologies and these semantic technologies, that vision is now near at hand.

[1] This estimate is from Grady Booch, 2005. “The Complexity of Programming Models,” see http://www.cs.nott.ac.uk/~nem/complexity.pdf. He comments on the weakness of software lines of code as a meaningful measure. At the time in 2005, he estimated perhaps 800 billion lines of code has accumulated, which given growth and vagaries of such guesstimates I have updated to the 1 billion number noted.

[2] For a wildly different estimate, that has been criticized somewhat, see Blackduck Software, 2009. “Estimating the Development Cost of Open Source Software,” at http://www.blackducksoftware.com/development-cost-of-open-source. According to Blackduck’s research there are over 200,000 OSS projects on the Internet representing more than 4.9 billion lines of available code from 4,000 sites that the company monitors. Blackduck estimates that reproducing this OSS would cost $387 billion for “typical” SLOC estimating bases. While Blackduck is likely in the best place of any organization to track open source given their business model, others have criticized the estimates because only a portion (fewer than 10%, consistent with my own research) of open source projects are active, and many active projects also share significant code bases. Nonetheless, there is still a huge disparity between the 1 billion SLOC estimate in [1] and this estimate of 5 billion for open source alone. This disparity is an indicator of the measurement challenges.

[3] See IMAP, 2010. Computing & Internet Software Global Report — 2010, 40 pp, see http://imap.com/imap/media/resources/HighTechReport_WEB_89B4E29C01817.pdf. The relative splits they show for software packages and licenses, IT consulting or outsourcing are 48%, 29% and 23%, respectively, of the total shown. Note however, that Gartner estimates are as high as 2x these amounts, again showing the uncertainty of measuring software; see, for example, http://www.gartner.com/it/page.jsp?id=1209913.

[4] For this and related measures, see Business Software Alliance, 2009. Software Industry Facts and Figures, see http://www.bsa.org/country/Public%20Policy/~/media/Files/Policy/Security/General/sw_factsfigures.ashx.

[5] Simply conduct a Web search on ‘”open source” “cost of ownership”‘ to see the many studies in this area. Depending on advocacy, estimates may be as high as proprietary software to a lower, but still substantial percentage. In no cases are open source understood to be fully “free” once maintenance, upgrades, modifications, and site adaptations are considered.

[6] Michael Uschold, 2008. “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, pp 3-20; see http://mba.eci.ufmg.br/downloads/recol/FormalOntologyinInformationSystems2008.pdf.

[7] Nicola Guarino, 1998. “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. Amsterdam, IOS Press, pp. 3-15; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.1776&rep=rep1&type=pdf.

[8] See Phil Tetlow et al., eds., 2006. Ontology Driven Architectures and Potential Uses of the Semantic Web in Software Engineering, a W3C Editor’s Draft on Best Practices, February 11, 2006; see http://www.w3.org/2001/sw/BestPractices/SE/ODA/. UML class diagrams have close resemblance to certain ontology structures. This effort was part of a formal collaboration between W3C and the Object Management Group (OMG), which resulted among other things in the production of the Ontology Definition Metamodel (ODM). In the OMG’s model-driven architecture (MDA) initiative, models are used not only for design and maintenance purposes, but as a basis for generating executable artifacts for downstream use. The MDA approach grew out of much of the standards work conducted in the 1990s in the Unified Modeling Language (UML).

[9] Neno is a semantic network programming language and Fhat is a virtual machine that works off of it. These two projects have been largely abandoned. A related project is Ripple, a relational, stack-based dataflow language by Joshua Shinavier, which is episodically updated.

[10] Holger Knublauch of TopQuadrant has made the point that ontologies can also have runtime uses as well: “In contrast to conventional Model-Driven Architecture known from object-oriented systems, semantic applications use their data models not only at design time, but also as runtime components. The rich declarative semantics of ontological data models can be exploited to drive user interfaces and to control an application’s behavior.” See H. Knublauch, 2007. “From Ontology Design to Deployment: Semantic Application Development with TopBraid,” presented at the 2007 Semantic Technology Conference, San Jose, CA; see http://www.semantic-conference.com/2007/sessions/l5.html.

[11] Mission Critical IT describes its ODASE platform (Ontology Driven Architecture for Software Engineering) as a set of tools to facilitate the creation of working applications from a semantic business model (an ontology), using the open standards OWL, SWRL and RDF. The ODASE code generators (a.k.a “robots”) generate an API based on the business terminology defined by the OWL+SWRL+RDF business model, which the ODASE platform then uses to execute the rules and reasoning as contextual choices are made by the user. Among other links, the company has an impressive online demo that shows a consumer telecommunications purchase example; there is also a video explaining the rules basis of the ODASE framework.

[12] See Wayne W. Eckerson, 2007. “The Myth of Self-Service Business Intelligence,” in TDWI Online, October 18, 2007; see http://tdwi.org/articles/2007/10/18/the-myth-of-selfservice-bi.aspx.

[13] The buggy whip industry as a major economic entity ceased to exist with the introduction of the automobile, and is cited in economics and marketing as an example of an industry ceasing to exist because its market niche, and the need for its product, disappears. Not recognizing what industry or business purpose is being served is an oft-cited cause for obsolescence. Thus, software engineering is a practice that serves the creation of software, which itself is only a means to a functional end.

[14] See M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.

[15] See M.K. Bergman, 2010. “Seven Pillars of the Open Semantic Enterprise“, AI3:::Adaptive Information blog, January 12, 2010.

[16] See M.K. Bergman, 2009. “Ontologies as the ‘Engine’ for Data-Driven Applications“, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, but the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.

[17] Some 250 pp of complete technical documentation for these projects is provided on the Structured Dynamics’ open source OpenStructs TechWiki.

[18] For more discussion of semantic components, see F. Giasson, 2010. “Semantic Components,” in his blog, July 5, 2010. For more discussion of the layered OSF design, see M.K. Bergman, 2010. “Domain-specific Instantiations based on the Open Semantic Framework“, AI3:::Adaptive Information blog, June 17, 2010.

[19] To find these groups and follow the open source OSF developments, see xxx. So long as the basic design comports with the foundations herein, sComponents may be developed in any rich Internet application (RIA) environment.

[20] Ontology development, management and mapping is the emerging imperative in the semantic technology space. For some thoughts on how Structured Dynamics is approaching this question, see a Normative Landscape of Ontology Tools on the TechWiki.

Main Links

Search

Self-service Information Management for Knowledge Workers

Decades-long Mismatches Between KM and IT

Early Attempts at Self-service and Semantics

Self-service Information Management

Building Block #1: Adaptive Ontologies

Building Block #2: Ontology-driven Apps

Building Block #3: Open World Assumption

Other Building Blocks

Benefits from Self-service Information Management

Great Progress, with Ontology Management the Next Challenge

An Overview of Freely Available, Comprehensive Icon Sets

General Icons

Other Sources

Map Icons

Dynamic Markers

Other Listings

Writing and Sharing Data Can be Lightened Up

What Isn’t an Ontology?

The Useful Distinction of the TBox and ABox

So Many Structs, So Little Time

Let’s Make this Elementary, Dr. Watson

RDF and the Skinny ABox

Lightening Up and Shifting Work to the TBox

The Time and Technology is Here to Stand Software Engineering on its Head

The Ontology Perspective in this Mix

But Is Software Engineering Even the Right Focus?

ODapps: The Ontology-Driven Application Approach

Built Upon an Ontology- and Web-based Architecture

OSF is the Basis for Domain-specific Instantiations

Lower Layers

The structWSF Web Services Layer

The Semantics Components Layer

Self-service Reporting

Adaptive Analysis

Standing Software Engineering on its Head