# Ontology-Driven Apps Using Generic Applications

## The Time and Technology Are Here to Stand Software Engineering on its Head

As an information society we have become a software society. Software is everywhere, from our phones and our desktops, to our cars, homes and every location in between. The amount of software used worldwide is unknowable; we do not even have agreed measures to quantify its extent or value [1]. We suspect there are at least 1 trillion lines of code that have accumulated over time [1,2]. On the order of $875 billion was spent worldwide on software in 2010, of which about half was for packaged software and licenses and the rest for programmer services, consulting and outsourcing [3]. In the U.S. alone, about 2 million people work as programmers or in related occupations [4].

It goes without saying that software is a very big deal. No matter what the metrics, software is expensive to develop and maintain. This is also true for open source, which has its own costs of ownership [5]. Designing software faster, with fewer mistakes, more re-use and greater robustness, has clearly been an emphasis in computer science and the discipline of programming from its inception.

This attention has caused a myriad of schools and practices to develop over time. Some of the earlier efforts included computer-aided software engineering (CASE) or Grady Booch's (already cited in [1]) object-oriented design (OOD). Fourth-generation languages (4GLs) and rapid application development (RAD) were popular in the 1980s and 1990s. Most recently, agile software development and extreme programming have grabbed mindshare. Altogether, there are dozens of software development philosophies, each with its passionate advocates. These express themselves through a variety of software development methodologies that might be characterized or clustered into the prototyping, waterfall or spiral camps. In all instances, of course, the drivers and motivations are the same: faster development, more re-use, greater robustness, easier maintainability, and lower development costs and total costs of ownership.

### The Ontology Perspective in this Mix

For at least the past decade, ontologies and semantic Web-related approaches have also been part of this mix. A good summary of these efforts comes from Michael Uschold in an invited address at FOIS 2008 [6]. In this review, he points to these advantages for ontology-based approaches to software engineering:

• Re-use — abstract/general notions can be used to instantiate more concrete/specific notions, allowing more re-use
• Reduced development times — producing software artifacts that are closer to how we think, combined with re-use and automation, enables applications to be developed more quickly
• Increased reliability — formal constructs with automation reduce human error
• Decreased maintenance costs — increased reliability and the use of automation to convert models to executable code reduce errors; a formal link between the models and the code also makes software easier to comprehend and thus maintain.

These first four items are similar to the benefits argued for other software engineering methodologies, though with some unique twists due to the semantic basis.
However, Uschold also goes on to suggest benefits for ontology-based approaches not claimed by other methodologies:

• Reduced conceptual gap — application developers can interact with the tools in a way that is closer to their thinking
• Facilitated automation — formal structures are amenable to automated reasoning, reducing the load on the human, and
• Agility/flexibility — ontology-driven information systems are more flexible, because you can much more easily and reliably make changes in the model than in code.

In making these arguments, Uschold picks up on the "ontology-driven information systems" (ODIS) moniker first put forward by Nicola Guarino in 1998 [7]. The ideas around ODIS have had substantial impact on the semantic Web community, especially in the use of formal ontologies and modeling approaches. The FOIS series of conferences, and most recently the ODiSE series, have been spawned from these ideas. There is also, for example, a fairly rich and developed community working on the integration of UML via ontologies as the drivers or specifiers of software [8].

Yet, as Uschold is careful to point out, the idea of ODIS extends beyond software engineering to encompass all of information systems. My own categorization of how ontologies may contribute to information systems is:

1. Domain modeling — this category includes the domain knowledge representations and reasoning and inference bases that are the traditional understanding of ontologies in the semantic space. The structural aspects are akin to a database schema definition; the unique aspects of ontologies reside in their logic foundations and graph structures, which offer more power in inferencing, reasoning and graph analysis than conventional approaches
2. Model-driven architectures (MDA) — like UML, these are platform-independent specifications that provide the functional and dataflow definitions of "models" executed by the system. These are the natural progeny of earlier CASE approaches, for example. Such systems also potentially allow graphical or visual means for building or hooking together components as a substitute for direct coding
3. Program specifications and executables — though fairly experimental at present, these approaches use the languages of RDF, OWL or direct use of logic languages to create the equivalent of executable software programs. A couple of experimental systems, Fhat and Neno, for example, point to possible future directions in this area [9]
4. Runtime or utility components — proper construction of ontologies can be a source for labels and prompts within user interfaces and other runtime uses. Because of the ontology basis, these contributions may also be contextual [10]
5. Automated agents — based on context, user choices and the governing ontologies, new instruction sets can be generated via what some term automated agents or "robots" to instruct subsequent steps in the software, including potentially analysis or validation. Mission Critical IT [11] is apparently the most advanced in this area; we discuss their ODASE approach more below
6. Bespoke drivers of generic applications — through using and combining a number of the aspects above, in its totality this approach is a very different paradigm, as we describe below.

When we look at this list from the standpoint of conventional software or software engineering, we see that #1 overlaps with conventional database roles, and #2, #3 and #4 with conventional programmer or software engineering responsibilities.
The other portions, however, are quite unique to ontology-based approaches.

### But Is Software Engineering Even the Right Focus?

For decades, issues related to how to develop apps better and faster have been proposed and argued about. We still have the same litany of challenges and issues, from expense to re-use and brittleness. And, unfortunately, despite many methodologies du jour, we still see bottlenecks in the enterprise relating to such matters as:

• data access
• queries
• data transformations
• data integration or federation
• reports
• other data presentations
• business analysis, and
• targeted, specialty functionality.

Promises such as self-service reporting, touted at the inception of data warehousing two decades ago, are still to be realized [12]. Enterprises still require the overhead and layers of IT to write SQL for us and to prepare and fix reports.

If we stand back a bit, perhaps we can come to see that the real opportunity resides in turning the whole paradigm of software engineering upside down. Our objective should not be software per se. Software is merely an intermediary artifact to accomplish some given task. Rather than engineering software, the focus should be on how to fulfill those tasks in an optimal manner — and that demands a systems approach. How can we keep the idea of producing software from becoming this generation's new buggy whip example [13]?

For reasons we delve into a bit more below, it perhaps has required a confluence of some new semantic technologies and ontologies to create the opening for a shift in perspective. That shift is one from software as an objective in itself to software as merely a generic intermediary in an information task pipeline. Though this shift may not apply (at least with current technologies) to transactional and process-based software, I submit it may be fundamental to the broad category of knowledge management. KM includes such applications as business intelligence, data warehousing, data integration and federation, enterprise information integration and management, competitive intelligence, knowledge representation, and so forth. These are the real areas where integration and reports and queries and analysis remain frustrating bottlenecks for knowledge workers. And, interestingly, these are also the same areas most amenable to embracing an open world (OWA) mindset [14].

If we stand back and take a systems perspective to the question of fulfilling functional KM tasks, we see that the questions are both broader and narrower than software engineering alone. They are broader because this systems perspective embraces architecture, data, structures and generic designs. The questions are narrower because software — within this broader context — can now be generalized as artifacts providing the fulfillment of classes of functions.

### ODapps: The Ontology-Driven Application Approach

Ontology-driven applications — or ODapps for short — based on adaptive ontologies are a topic we have been nibbling around and discussing for some time. In our oft-cited seven pillars of the semantic enterprise we devote two pillars specifically (#4 and #3, respectively) to these two components [15].
However, in keeping with the systems perspective relevant to a transition from software engineering to generic apps, we should also note that canonical data models (via RDF) and a Web-oriented architecture are two additional pillars in the vision.

ODapps are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies, as noted under #1 above), as supplemented by the UI and instruction sets and validations and rules (as noted under #4 and #5 above). The combination of these specifications as provided by both properly constructed domain ontologies and supplementary utility ontologies is what we collectively term adaptive ontologies [16].

ODapps fulfill specific generic tasks, consistent with their bespoke design (#6 above) to respond to adaptive ontologies. Examples of current ontology-driven apps include imports and exports in various formats, dataset creation and management, data record creation and management, reporting, browsing, searching, data visualization and manipulation (through libraries of what we call semantic components), user access rights and permissions, and similar. These applications provide their specific functionality in response to the specifications in the ontologies fed to them.

ODapps are designed more similarly to widgets or API-based frameworks than to the dedicated software of the past, though the dedicated functionality (e.g., graphing, reporting, etc.) is obviously quite similar. The major change in these ontology-driven apps is to accommodate a relatively common abstraction layer that responds to the structure and conventions of the guiding ontologies. The major advantage is that single generic applications can supply shared functionality based on any properly constructed adaptive ontology.

In fact, the widget idea from Web 2.0 is a key precursor to the ODapps design. What we see in Web 2.0 are dedicated single-purpose widgets that perform a display operation (such as Google Maps) based on the properly structured data fed to them (structured geolocational information in the case of GMaps). In early work with RDF-based applications at Structured Dynamics' predecessor company, Zitgist, we demonstrated how the basic Web 2.0 widget idea could be extended by "triggering" which kind of mashup widget got invoked by virtue of the data type(s) fed to it. The Query Builder presented contextual choices in the UI for how to build a SPARQL query, based on prior dropdown list choices. The DataViewer displayed results with different widgets (maps, profiles, etc.) depending on which part of a query's results set was inspected (by responding to differences in data types). These two apps, in our opinion, remain some of the best developed in the semantic Web space, even though development on both ceased nearly four years ago. This basic extension of data-driven applications — as informed by a bit more structure — naturally evolved into a full ontology-driven design.
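To make the triggering idea concrete, here is a minimal, hypothetical Python sketch of dispatching result records to display widgets by their data types. The record types and the hard-coded mapping are invented for illustration; in an ODapp the mapping would be read from the guiding adaptive ontology rather than being written into code.

```python
# Toy results set, as might be returned by a search or browse endpoint.
# The "type" values are illustrative CURIEs, not an actual vocabulary.
RESULTS = [
    {"uri": "ex:winnipeg", "type": "geo:Feature", "label": "Winnipeg"},
    {"uri": "ex:story42", "type": "ex:Story", "label": "A neighbourhood story"},
    {"uri": "ex:ind7", "type": "ex:Indicator", "label": "Median household income"},
]

# Type-to-widget mapping; in the real design this comes from the ontology layer.
WIDGET_FOR_TYPE = {
    "geo:Feature": "sMap",
    "ex:Story": "sStory",
    "ex:Indicator": "sBarChart",
}

def dispatch(results, default_widget="sGenericBox"):
    """Group result records by the widget that should display them."""
    plan = {}
    for record in results:
        widget = WIDGET_FOR_TYPE.get(record["type"], default_widget)
        plan.setdefault(widget, []).append(record)
    return plan

for widget, records in dispatch(RESULTS).items():
    print(widget, "->", [r["label"] for r in records])
```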
We discovered that — with some minor best practice additions to conventional ontologies — we could turn ontologies into powerhouses that informed applications through:

• An understanding of the kind of things under consideration, including their inference chains
• The types of data in results sets, and how that informs the nature of the widget(s) — maps, calendars, timelines, charts, tabular reports, images, stories, media, etc. — appropriate to display and manipulate that information, and
• UI and utility functions such as interface labels, mouseovers, auto-suggests, spelling suggestions, synonym matches, etc.

Like the earlier Zitgist discoveries, basing the applications on only one or two canonical data models and serializations (RDF and a simple data exchange XML, which Fred Giasson calls structXML) provides the input uniformity to make a library of generic applications tractable. And, embedding the entire framework in a Web-oriented architecture means it can be distributed and deployed anywhere accessible by HTTP.

Booch has maintained for years that in software design abstraction is good, but not if too abstract [1]. ODapps are a balanced abstraction within the framework of canonical architectures, data models and data structures. This design thus limits software brittleness and maximizes software re-use. Moreover, it shifts the locus of effort from software development and maintenance to the creation and modification of knowledge structures. The KM emphasis can shift from programming and software to logic and terminology [16].

In the sub-sections below, we peel back some portions of this layered design to unveil how some of these major pieces interact.

#### Built Upon an Ontology- and Web-based Architecture

Again, to cite Booch, the most fundamental software design decision is architecture [1]. In the case of Structured Dynamics and its support for ODapps, its open semantic framework (OSF) is embedded in a Web-oriented architecture (WOA). The OSF itself is a layered design that proceeds from a kernel of existing assets (data and structures), through conversion and Web service access, to ontology organization and management via ODapps [17]. The major layers in the OSF stack are:

• Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
• scones / irON — the conversion layer, in part consisting of information extraction of subject concepts or named entities (scones) or the instance record Object Notation (irON) for conveying XML, JSON or spreadsheets (CSV) in RDF-ready form (via irON or RDFizers)
• structWSF — a platform-independent suite of more than 20 RESTful Web services, organized for managing structured datasets; it provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
• Ontologies — the layer containing the structured assets "driving" the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave
• conStruct — connecting modules that enable structWSF and sComponents to be hosted/embedded in Drupal, and
• sComponents — (mostly) Flex semantic components (widgets) for visualizing and manipulating structured data.

Not all of these layers, or even their specifics, are necessary for an ontology-driven app design [18].
However, the general foundations of generic apps, properly constructed adaptive ontologies, and canonical data models and structures should be preserved in order to operationalize ODapps in other settings.

#### OSF is the Basis for Domain-specific Instantiations

The power of this design is that, by swapping out adaptive ontologies and relevant data, the entire OSF stack as is can be used to deploy multiple instantiations. Potential uses can be as varied as the domain coverage of the domain ontologies that drive this framework.

The OSF semantic framework is a completely open and generic one. The same set of tools and capabilities can be applied to any domain that needs to manage and understand information in its own domain. With the existing ODapps in hand, this spans from unstructured text or documents to conventional structured databases. What changes from domain to domain are the data structures (the ontologies, schema and entity references) and their instance data (which can also be converted from existing to canonical forms).

Here is an illustration of how this generic framework can be leveraged for different deployments. Note that Citizen Dan is a local government example of the OSF framework with relatively complete online demos:

(Figure: domain-specific deployments of the generic OSF framework)

Structured Dynamics continues to wrinkle this basic design for different clients and different industries. As we round out the starting set of ODapps (see below), the major effort in adapting this generic design to different uses is to tailor the ontologies and "RDFize" existing data assets.

#### Lower Layers

Conversion of existing assets to RDF and canonical forms is not discussed further here. See the irON and scones documentation or the TechWiki for more information on these topics.

#### The structWSF Web Services Layer

The first suite of ODapps occurs at the structWSF Web services layer. structWSF provides a set of generic functions and endpoints to:

• Import or export datasets
• Create, update, delete (CRUD) or otherwise manage data records
• Search records with full-text and faceted search
• Browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria, or
• Process results sets through workflows of various natures, involving specialized analysis, information extraction or other functions.

The current ODapp functions within structWSF (with links to details for each) fall into two groups:

• WSF management Web services
• User-oriented Web services

At this level the information access and processing is done largely on the basis of structured results sets. Other visualization and display ODapps are listed in the next subsection.

#### The Semantic Components Layer

The visualization and data display and manipulation ODapps are provided via the semantic components layer. Structured Dynamics' sComponents are Flex-based widgets that conform to a standard, generic design. Other developers using the OSF framework are developing JavaScript versions [19].

The current library (with links to details for each) spans new components and components extending Flex:

• Portable Control Application
• SCO Ontology
• sBarChart
• sControl
• sDashboard
• sGenericBox
• sLinearChart
• sMap
• sPieChart
• sRelationBrowser
• sStory
• scones: Story Tagging
• sWebMap (in development)

These components can be used in combination with any of the structWSF ODapps, meaning the filtering, searching, browsing, import/export, etc., may be combined as an input or output option with the above.
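As a rough illustration of how the structWSF functions listed above might be invoked from a client or component, here is a hedged Python sketch using the requests package. The host, path and parameter names are invented for this example only; the actual endpoints, arguments and return formats are defined in the structWSF documentation.

```python
# Hypothetical sketch of calling a structWSF-style RESTful search endpoint.
import requests

BASE = "http://example.org/ws"          # hypothetical structWSF instance

def search(query, dataset, page_size=20):
    """Full-text search against one dataset, asking for an RDF/XML results set."""
    response = requests.get(
        f"{BASE}/search/",               # hypothetical endpoint path
        params={"q": query, "dataset": dataset, "items": page_size},
        headers={"Accept": "application/rdf+xml"},
    )
    response.raise_for_status()
    return response.text                 # structured results set, parsed downstream

if __name__ == "__main__":
    print(search("median income", "http://example.org/datasets/demo/")[:500])
```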
The next animated figure shows how the basic interaction flow works with these components:

(Figure: the basic interaction flow between sComponents and structWSF endpoints)

Using the ODapp structure it is possible to "drive" queries and results set selections either via direct HTTP requests to endpoints (not shown) or via simple dropdown selections on HTML forms or Flex widgets (shown). This design enables the entire system to be driven via simple selections or interactions without the need for any programming or technical expertise.

As the diagram shows, these various sComponents get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the sComponents. When combined with the results set data, attribute information in the irON ontology, and the domain understanding in the domain ontology, a synthetic schema is constructed that instructs what the interface may do next. Here is an example schema:

(Figure: an example synthetic schema generated for the sComponents)

These instructions are then presented to the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. As new user interactions occur with the resulting displays and components, the iteration cycle begins anew, again starting a new cycle of queries and results sets. Importantly, as these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.

#### Self-service Reporting

Since self-service reporting has been such a disappointment [12], it is worth noting another aspect of this ODapp design. Every "thing" that can be presented in the interface can have a specific display template associated with it. Absent another definition, any given "thing" will default to its parental type (which, ultimately, is "Thing", the generic template display for anything without a definition; this generally defaults to a presentation of all attributes for the object). However, if more specific templates occur in the inference path, they will be preferentially used. Here is a sample of such a path:

Thing → Product → Camera → Digital Camera → SLR Digital Camera → Olympus Evolt E520

At the ultimate level of a particular model of Olympus camera, its display template might be exactly tailored to its specifications and attributes. This design is meant to provide placeholders for any "thing" in any domain, while also providing the latitude to tailor and customize every "thing" in the domain.

It is critical that generic apps through an ODapp approach also provide the underpinnings for self-service reporting. The ultimate metric is whether consumers of information can create the reports they need without any support or intervention by IT.
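A minimal Python sketch of the template fallback just described might look as follows. The class hierarchy follows the Olympus camera example above, while the template registry is a toy stand-in for what the domain ontology and template library would supply in a real deployment.

```python
# Walk a record's type up its superclass chain until a display template is
# found, ending at the generic "Thing" template (show all attributes).

SUPERCLASS = {                      # child -> parent (rdfs:subClassOf)
    "Olympus Evolt E520": "SLR Digital Camera",
    "SLR Digital Camera": "Digital Camera",
    "Digital Camera": "Camera",
    "Camera": "Product",
    "Product": "Thing",
}

TEMPLATES = {                        # only some levels define a specific template
    "Digital Camera": "digital-camera-template",
    "Thing": "generic-attribute-listing",   # default: present all attributes
}

def resolve_template(type_name):
    """Return the most specific template available on the inference path."""
    current = type_name
    while current is not None:
        if current in TEMPLATES:
            return TEMPLATES[current]
        current = SUPERCLASS.get(current)
    return TEMPLATES["Thing"]

print(resolve_template("Olympus Evolt E520"))   # -> digital-camera-template
print(resolve_template("Product"))              # -> generic-attribute-listing
```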
#### Adaptive Analysis

The Mission Critical IT reference provided earlier [11] points to the potential of this paradigm in a different way. Mission Critical also shows user interfaces contextually chosen based on prior selections. But they extend that advantage with context-specific analysis and validation through the SWRL rules-based semantic language. This is an exciting extension of the base paradigm that confirms the applicability of this approach to business intelligence and general enterprise analytics.

### Standing Software Engineering on its Head

All of this points to a very exciting era for enterprise and consumer apps moving into the future. We perhaps should no longer talk about "killer apps"; we can shift our focus to the information we have at hand and how we want to structure and analyze it.

Using ontologies to write or specify code, or to compete as an alternative to conventional software engineering approaches, seems too much like more of the same. The systems basis in which methodologies such as MDA reside has not fixed the enterprise software challenges of decades-long standing. Rather, a shift to generic applications driven by adaptive ontologies — ODapps — looks to shift the locus from software and programming to data and knowledge structures.

This democratization of IT means that everything in the knowledge management realm can become "self service." We can create our own analyses; develop our own reports; and package and disseminate what we and our colleagues need, when they need it. Through ontology-driven apps and adaptive ontologies, we can turn prior decades of software engineering practices on their head.

What Structured Dynamics and a handful of other vendors are showing is by no means yet complete. Our roster of ODapp widgets and templates still needs much filling out. The toolsets available for creating, maintaining, mapping and extending the ontologies underlying these systems are still woefully inadequate [20]. These are important development needs for the near term.

And, of course, none of this means the end of software development either. Process and transaction systems likely still reside outside of this new, emerging paradigm. Creating great and solid generic ODapps still requires software. Further, ODapps and their potential are completely silent on how we create that software and with what languages or methodologies. The era of software engineering is hardly at an end.

What is exceptionally powerful about the prospects in ontology-driven apps is to speed time to understanding and place information manipulation directly in the hands of the knowledge worker. This is a vision of information access and control that has been frustrated for decades. Perhaps, with ontologies and these semantic technologies, that vision is now near at hand.

[1] This estimate is from Grady Booch, 2005. "The Complexity of Programming Models," see http://www.cs.nott.ac.uk/~nem/complexity.pdf. He comments on the weakness of software lines of code as a meaningful measure. At the time in 2005, he estimated perhaps 800 billion lines of code had accumulated, which given growth and the vagaries of such guesstimates I have updated to the 1 trillion number noted.

[2] For a wildly different estimate, which has been criticized somewhat, see Blackduck Software, 2009. "Estimating the Development Cost of Open Source Software," at http://www.blackducksoftware.com/development-cost-of-open-source. According to Blackduck's research there are over 200,000 OSS projects on the Internet representing more than 4.9 billion lines of available code from 4,000 sites that the company monitors. Blackduck estimates that reproducing this OSS would cost $387 billion for "typical" SLOC estimating bases.
While Blackduck is likely in the best place of any organization to track open source given their business model, others have criticized the estimates because only a portion (fewer than 10%, consistent with my own research) of open source projects are active, and many active projects also share significant code bases. Nonetheless, there is still a huge disparity between the 1 trillion SLOC estimate in [1] and this estimate of 5 billion for open source alone. This disparity is an indicator of the measurement challenges.
[3] See IMAP, 2010. Computing & Internet Software Global Report — 2010, 40 pp, see http://imap.com/imap/media/resources/HighTechReport_WEB_89B4E29C01817.pdf. The relative splits they show for software packages and licenses, IT consulting or outsourcing are 48%, 29% and 23%, respectively, of the total shown. Note however, that Gartner estimates are as high as 2x these amounts, again showing the uncertainty of measuring software; see, for example, http://www.gartner.com/it/page.jsp?id=1209913.
[4] For this and related measures, see Business Software Alliance, 2009. Software Industry Facts and Figures, see http://www.bsa.org/country/Public%20Policy/~/media/Files/Policy/Security/General/sw_factsfigures.ashx.
[5] Simply conduct a Web search on '"open source" "cost of ownership"' to see the many studies in this area. Depending on the advocacy of the source, estimates range from costs as high as those for proprietary software down to a lower, but still substantial, percentage. In no case is open source understood to be fully "free" once maintenance, upgrades, modifications and site adaptations are considered.
[6] Michael Uschold, 2008. “Ontology-Driven Information Systems: Past, Present and Future,” in Proceedings of the Fifth International Conference on Formal Ontology in Information Systems (FOIS 2008), Carola Eschenbach and Michael Grüninger, eds., IOS Press, Amsterdam, Netherlands, pp 3-20; see http://mba.eci.ufmg.br/downloads/recol/FormalOntologyinInformationSystems2008.pdf.
[7] Nicola Guarino, 1998. “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. Amsterdam, IOS Press, pp. 3-15; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.1776&rep=rep1&type=pdf.
[8] See Phil Tetlow et al., eds., 2006. Ontology Driven Architectures and Potential Uses of the Semantic Web in Software Engineering, a W3C Editor’s Draft on Best Practices, February 11, 2006; see http://www.w3.org/2001/sw/BestPractices/SE/ODA/. UML class diagrams have close resemblance to certain ontology structures. This effort was part of a formal collaboration between W3C and the Object Management Group (OMG), which resulted among other things in the production of the Ontology Definition Metamodel (ODM). In the OMG’s model-driven architecture (MDA) initiative, models are used not only for design and maintenance purposes, but as a basis for generating executable artifacts for downstream use. The MDA approach grew out of much of the standards work conducted in the 1990s in the Unified Modeling Language (UML).
[9] Neno is a semantic network programming language and Fhat is a virtual machine that works off of it. These two projects have been largely abandoned. A related project is Ripple, a relational, stack-based dataflow language by Joshua Shinavier, which is episodically updated.
[10] Holger Knublauch of TopQuadrant has made the point that ontologies can also have runtime uses as well: “In contrast to conventional Model-Driven Architecture known from object-oriented systems, semantic applications use their data models not only at design time, but also as runtime components. The rich declarative semantics of ontological data models can be exploited to drive user interfaces and to control an application’s behavior.” See H. Knublauch, 2007. “From Ontology Design to Deployment: Semantic Application Development with TopBraid,” presented at the 2007 Semantic Technology Conference, San Jose, CA; see http://www.semantic-conference.com/2007/sessions/l5.html.
[11] Mission Critical IT describes its ODASE platform (Ontology Driven Architecture for Software Engineering) as a set of tools to facilitate the creation of working applications from a semantic business model (an ontology), using the open standards OWL, SWRL and RDF. The ODASE code generators (a.k.a. "robots") generate an API based on the business terminology defined by the OWL+SWRL+RDF business model, which the ODASE platform then uses to execute the rules and reasoning as contextual choices are made by the user. Among other links, the company has an impressive online demo that shows a consumer telecommunications purchase example; there is also a video explaining the rules basis of the ODASE framework.
[12] See Wayne W. Eckerson, 2007. “The Myth of Self-Service Business Intelligence,” in TDWI Online, October 18, 2007; see http://tdwi.org/articles/2007/10/18/the-myth-of-selfservice-bi.aspx.
[13] The buggy whip industry as a major economic entity ceased to exist with the introduction of the automobile, and is cited in economics and marketing as an example of an industry ceasing to exist because its market niche, and the need for its product, disappears. Not recognizing what industry or business purpose is being served is an oft-cited cause for obsolescence. Thus, software engineering is a practice that serves the creation of software, which itself is only a means to a functional end.
[14] See M. K. Bergman, 2009. "The Open World Assumption: Elephant in the Room," AI3:::Adaptive Information blog, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[15] See M.K. Bergman, 2010. Seven Pillars of the Open Semantic Enterprise, AI3:::Adaptive Information blog, January 12, 2010.
[16] See M.K. Bergman, 2009. Ontologies as the ‘Engine’ for Data-Driven Applications, AI3:::Adaptive Information blog, June 10, 2009, for the first presentation of these topics, but the specific term adaptive ontology was not yet used. That term was first introduced in “Confronting Misconceptions with Adaptive Ontologies” (August 17, 2009). The dedicated treatment of these topics and their interplay was provided in M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies”, AI3:::Adaptive Information blog, November 23, 2009. The relation of these topics to enterprise software was first presented in M.K. Bergman, 2009. “Fresh Perspectives on the Semantic Enterprise”, AI3:::Adaptive Information blog, September 28, 2009.
[17] Some 250 pp of complete technical documentation for these projects is provided on the Structured Dynamics’ open source OpenStructs TechWiki.
[18] For more discussion of semantic components, see F. Giasson, 2010. “Semantic Components,” in his blog, July 5, 2010. For more discussion of the layered OSF design, see M.K. Bergman, 2010. Domain-specific Instantiations based on the Open Semantic Framework, AI3:::Adaptive Information blog, June 17, 2010.
[19] To find these groups and follow the open source OSF developments, see xxx. So long as the basic design comports with the foundations herein, sComponents may be developed in any rich Internet application (RIA) environment.
[20] Ontology development, management and mapping is the emerging imperative in the semantic technology space. For some thoughts on how Structured Dynamics is approaching this question, see a Normative Landscape of Ontology Tools on the TechWiki.

# Peg Community Indicator System Unveiled to Enthusiasm

This past Friday the Peg project was unveiled for the first time to an enthusiastic welcome at the Winnipeg Poverty Reduction Partnership Forum. A beta version of its website (www.mypeg.ca) was also launched. Peg is an innovative Web portal for community indicators of well-being for the city of Winnipeg, Manitoba. First conceived in 2002 and much refined since, Peg has now been shared with the public thanks to its strong consortium of members from the local community and recent backing.

Since early this year, Structured Dynamics has been the lead technical contractor on the project. But Peg is about people and involvement, not technology. Peg is an effort of community and perspectives and information and stories, all designed to coalesce around how to make Winnipeg a better community moving forward. So, while the technology underlying the site is innovative (yes, we're proud), more so is the effort and vision of the community making it happen. Though just a beta release, the current site and the commitment behind it point to some exciting future developments.

Here is the main screen for Peg (clicking on any of the screen captures below will take you directly to the relevant part of the site):

### A Community Perspective Backed by Dynamic Functionality

Winnipeg’s community indicator system (CIS) is organized around themes, cross-cutting issues that bridge across themes, and indicators and supporting data to track and measure the city’s well-being. Peg’s major themes, agreed upon after extensive community consultation, are: basic needs; health; education & learning; social vitality; governance; built environment; economy; and natural environment. In this first beta release, the emphasis has been on the cross-cutting issue of poverty and some of the indicators to track it.

The perspective being brought to bear on these questions of well-being is comprehensive and embracing. Data and demographics and quantitative indicators of well-being are matched with stories and narratives from affected parties, videos, and a variety of display and visualization options. Much of the supporting data is organized by the 236 neighborhoods in Winnipeg, or broader community areas, with comparative baselines to city, province and nation. The information is both hard and soft, and presented in engaging, exciting and dynamic ways. Using the best of current social media, Peg is meant to be a virtual meeting place and town hall for the public to share and engage one another.

This beta is but a first expression of Peg’s longer-term vision, yet already has the backbone to take on these labors. A concept explorer allows the public to explore and navigate through the entire information space; much information is mapped and presented in locational relevance; narratives and stories and videos are linked contextually to topics and issues; and many, many dashboards can be created and displayed for showing trends and comparing neighborhoods, and letting the data speak visually:

The current beta is but a start. The Peg project, in continued consultation with stakeholders, will be developing further indicators for each of its eight major themes, providing information about past and current trends, and expanding into additional cross-cutting issues. Daily, the site will see an increase in richness and relevance.

### Project Participants

Peg has been spearheaded by the United Way of Winnipeg and the International Institute for Sustainable Development (IISD), also based in Winnipeg, with the partnership of the Province of Manitoba, the City of Winnipeg, Health in Common, and a cross section of community interests and members across the city. Peg is a non-profit effort, and is embarking on a new three-year work plan to oversee further funding and expansion.

Peg is governed by a Steering Committee with budgetary and strategic responsibilities. Peg also works with an Engagement Group — a broad-based group of Winnipeggers — that serves as a testing ground for ideas, direction and policy. The site provides credits for the various entities involved and responsible for the effort.

IISD has provided overall project management for the current effort. As personal thanks, we’d especially like to recognize Connie Walker, Laszlo Pinter, Christa Rust and Charles Thrift. Tactica, also of Winnipeg, has been the lead graphics and site designer for Peg. SD has worked closely with them to ensure a smooth launch, and they’ve done a great job. Thanks to all!

### Now, This is Semantics Done Right

Of course, for more on the project, go directly to the Peg site or those of its other major participants and contributors. But, in our role as implementers of the behind-the-scenes wizardry powering the site, we would be remiss if we did not mention a couple of technical items.

As lead technical developer, SD was responsible for all data access, management, development and visualization software for the site. The site was developed in Drupal, with Virtuoso as the RDF data store and Solr for faceted site search. As part of its Open Semantic Framework, based on the Citizen Dan local government appliance, SD contributed and extended major open source software for Peg. These contributions included the structWSF Web services framework, conStruct modules for linking the system into Drupal, and the Flex-based semantic components, including the explorer, map, story viewer, browse/search, dashboard, workbench and back-office widgets. We also developed the adaptive ontology driving the entire site, based on the Peg framework vocabulary already hashed out by the community participants.

During the course of the project we developed an entirely new workbench capability for creating new, persistent dashboards. We extended the sRelationBrowser semantic component with complete and flexible theming and styling; virtually all aspects of nodes, edges and behavior have now been exposed for tailoring, including fonts, colors and use of images. We enhanced the irON format to make it easier for project participants to submit spreadsheet datasets to the site for new indicator data. We will be migrating these advances to our existing open source software over the coming weeks. Check Fred Giasson’s blog for release details; he has also begun a series on the technology details.

But, in my opinion, what is most remarkable about all of this is that these bloody details are completely hidden from the user. Though real geeks can get RDF and linked data via export options, standard users simply interact with and experience the site. No triples are shoved in their face, no technology screams out for attention, and ne'er a URI is to be found. The thing simply works, all the while being flexible, contextual, attractive and fun.

And that, folks, I submit, is semantics done right!

# Domain-specific Instantiations Based on the Open Semantic Framework

## Structured Dynamics Completes Design Phase; Citizen Dan First Exemplar

Structured Dynamics has been in a fervent — and, we believe, fruitful — design phase for the past 18 months. All of the working parts related to how to embrace becoming a semantic enterprise have now been defined and designed. Actual tools and components accompany many of these parts and have been deployed.

Recently, I have been speaking and blogging much about rationale, process, mindset and approach for how to bring semantics into the organization. But, prior to now, we have not spoken much about the overall design behind our approach. Today, as we complete our design phase and introduce our first exemplar instance of it — Citizen Dan [1] — we are finally in a position to describe this overall approach.

We term our approach the open semantic framework, or OSF. The open semantic framework is a combination of a layered architecture and modular software. It represents the software component of the four-component total open solution recently described in a three-part series; I return to this topic in the conclusion of this post.

### Revisiting Design Objectives

Over the past nine months, I have been focusing my writing largely on the semantic enterprise, with more specificity regarding our Open SEAS (Semantic Enterprise Adoption and Solutions) initiative. In bits and pieces, these writings have tended to reflect a number of objectives:

• Leverage existing information assets (data + structure) as much as possible
• Develop incrementally, and validate and justify as you go
• Emphasize, where possible, open standards and open software
• Employ Web-oriented architectures
• Adopt an open-world approach that acknowledges that information is most often incomplete; the approach is a key enabler for incremental deployments
• Use URIs as object identifiers, and use linked data where practical
• Embrace any data format found in the wild, but use RDF as the ultimate integration data model
• Design architectures and APIs that avoid “lock-in” and support multiple tools options across the stack
• Provide systems and capabilities that put all information sources — text, media, semi-structured and conventional databases — on an equal footing
• Promote designs that bring the ability to create useful results into the hands of users and decisionmakers; relegate IT to a support role.

To date, the result of these design objectives is perhaps best captured in my Seven Pillars of the Open Semantic Enterprise posting, as well as our general discussions regarding adaptive ontologies. Yet, still, these writings have been somewhat piecemeal. What this document attempts to do is to place all of these perspectives into a single, coherent whole.

### The Incremental Layers of the Open Semantic Framework

Structured Dynamics has been a strong advocate for layered architectures, with clear APIs between layers as appropriate. But these layers are not "laminates" that completely cover the layer below, nor are they all necessary. Depending on the circumstance, some layers are superfluous. Layers may be added incrementally, or not at all.

In this manner, then, the open semantic framework is perhaps more akin to a pearl than to a laminate or cocoon. Each subsequent layer does not "embed" the layer prior to it, and some layers actually may inter-operate with multiple layers below or above it (this is notably true for the "ontologies" layer, which has interactions up and down the stack).

Nonetheless, we can envision this pearl of the open semantic framework and its layers as follows:

(Figure: the incremental layers of the open semantic framework)

Others have termed this the “semantic muffin” or even “semantic muppet” or “semantic blob”. Whatever (hehe). The real idea is that layers may accrete (as in the growth of a pearl) and occur over time and be uneven. Each layer, though, does have a role to play (though it may not be needed in a given deployment), and does act to augment existing information assets in the transition to a semantic framework. Beginning at the core, each of these layers — with external references as appropriate for more details — is described below.

#### Existing Assets Layer

The open semantic framework is premised on leveraging existing information assets. Sure, once the framework is in place, new information can be brought into it in a more direct, semantic manner. But, the real thrust and benefit of this framework is to provide an incremental pathway for finally inter-operating and federating prior decades of data, structure and information assets.

These information assets may reside inside or outside the enterprise. They may (and DO!) exist in many formats and are described by many schema. They may come from internal transaction systems or warehouses, or may exist externally on the Web or at supplier or partner sites. These information assets may span from conventional databases and relational data systems to XML interchange standards, Web pages and standard internal text or documents. In short, there is NO information asset that is not amenable to inclusion in this framework.

#### The Information Transformation (scones/irON) Layer

The information transformation layer provides either: 1) extraction of concepts and entities as structured metadata from source text or documents; or 2) conversion of existing data assets to interoperable form. As implemented by Structured Dynamics, the extractions are conducted by either scones (Subject Concept or Named EntitieS) or third-party utilities, and the conversions occur via irON (instance record Object Notation) or third-party "RDFizers".

Depending on the source, the net result of the transformation is to produce interoperable data and information that can be ingested and used by other layers in the framework.

Though not strictly analogous, this layer bears some resemblance to the ETL (extract, transform, load) utilities used in many enterprise information integration applications. Unlike those conventional systems, this information transformation layer also may capture and represent some of the source schema.

In all cases, however, these transformations are relatively simple and get parsed against the available structure (the ontologies, schema and entity reference lists) in the system to generate the semantic metadata (tags).

At this point, the extracted structure is generally at the level of instance records, or the ABox, with simple assertions of attribute-value pairs for specific records [2]. Little schema transformation or mapping occurs at this layer (if such is needed, that occurs at the structWSF layer; see next). Actual federation or interoperation occurs at later layers based on the TBox structures [2].
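As a simple illustration of this kind of ABox output, here is a hedged Python sketch (assuming the rdflib package) that turns one flat instance record of attribute-value pairs into RDF assertions. The record, vocabulary URIs and values are invented placeholders, not irON or the actual Structured Dynamics vocabularies.

```python
# Convert one flat instance record (attribute-value pairs) into simple ABox triples.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/record/")      # placeholder record namespace
ATTR = Namespace("http://example.org/attribute/") # placeholder attribute namespace

record = {                                        # e.g., one row from a spreadsheet
    "id": "record-001",
    "type": "Neighbourhood",
    "label": "Example Neighbourhood",
    "population": "12345",
}

g = Graph()
subject = EX[record["id"]]
g.add((subject, RDF.type, ATTR[record["type"]]))           # type assertion
for key in ("label", "population"):
    g.add((subject, ATTR[key], Literal(record[key])))      # attribute-value pairs

print(g.serialize(format="turtle"))
```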

This modular portion of the framework is explicitly designed with APIs to allow third-party tools to be plugged in and substituted.

#### The structWSF Layer

The major workhorse of the open semantic framework is the structWSF (Web services framework) layer. structWSF is the most complicated of the OSF layers and has many supporting software packages and capabilities. The structWSF layer provides the standard, common interface (“canonical”) layer by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack.

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies; see below).

The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework comes packaged with a baseline set of about twenty Web services for CRUD, browse, search, import and export. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and, optionally, a document of results. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML. An internal representation, structXML [3], is used for internal communications across all structWSF Web services and with other layers.
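For illustration, here is a hedged Python sketch (assuming the SPARQLWrapper package) of querying such an instance's SPARQL endpoint directly. The endpoint URL and the query are placeholders; what is actually returned also depends on the dataset permissions described below.

```python
# Query a (hypothetical) structWSF instance's SPARQL endpoint for labeled records.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/ws/sparql/")   # placeholder endpoint
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?record ?label
    WHERE { ?record rdfs:label ?label }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for binding in endpoint.query().convert()["results"]["bindings"]:
    print(binding["record"]["value"], "-", binding["label"]["value"])
```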

structWSF has a central service that governs access rights and permissions. These rights occur at the level of the dataset, which gives immense flexibility to how data may be accessed, read, modified, created or deleted (or not). Datasets within a given structWSF instance may be accessed directly via API or via SPARQL queries to the instance’s endpoint. Depending on rights and query, results sets may be returned from a given structWSF instance in an infinite variety of ways.

This latter capability is the essential interface for subsequent layers in the open semantic framework stack. Depending on those subsequent components, pre-staged data and results sets may be returned for an essentially limitless variety of purposes.

Each structWSF instance also has a unique Web address that enables one or a multitude of instances to communicate and share with one another. This simple, but elegant, method enables structWSF instances to participate or not in potentially global or restricted local networks and collaboration environments. This is currently the largest untapped potential of structWSF with respect to its existing deployments.

#### The Semantic Components Layer

The newest layer in the stack is the semantic components layer. This layer takes results sets — most often generated by a specific query or data slice request — from one or more structWSF instances and then presents that information via a variety of data visualization or data presentation widgets (what we specifically call ‘semantic components‘ due to their design [4]). The operation and sensitivity of these display components are themselves driven by a presentation and data analysis (including statistics) ontology.

Current display widgets include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable without modification to any domain.

As presently implemented by Structured Dynamics, this layer consists either of Flex data visualization components or structured data display templates based on Smarty. The inherent design allows for updates to other bases (such as HTML5). The layer may also be swapped out or substituted with third-party capabilities.

The strength and power of this system is governed by its own ontology, the Semantic Component Ontology (SCO) (see next).

This is an extremely flexible layer in the open semantic framework stack. Expect an ongoing series of explanatory blog posts and online resources in the upcoming weeks to explain this innovative capability.

#### The Ontologies Layer

The ontologies layer actually refers to all structured assets driving the system. As such, this layer might be considered the “brain” (though rather simply specified!) of the open semantic framework.

At a true schema or TBox level [2], the ontologies layer represents the concepts and relationships of the domain at hand. This layer also hosts the specific local entities and prominent things (people, places, events, etc.) useful for extracting local and domain-specific relevance. However, those views are also supplemented with some administrative ontologies (two examples are SCO and irON) that guide how the user interfaces or widgets in the system should behave.

The concept level represents the “world view” of the specific instantiation of the open semantic framework at hand. This conceptual (TBox) view provides the structural organization of information, inferencing capabilities, and navigation, faceting and explorer structure. The entity (ABox) view provides tagging for prominent individuals and instances important to the domain at hand, and guides the structure behind data visualizations of attribute or indicator data.
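As a small illustration of how the TBox structure supports navigation and faceting, the following Python sketch (assuming rdflib) retrieves the instances of a broad concept by walking rdfs:subClassOf transitively. The miniature schema is illustrative only, not an actual OSF ontology.

```python
# Asking for a broad concept also retrieves instances of its subclasses.
from rdflib import Graph, URIRef
from rdflib.namespace import RDF, RDFS

DATA = """
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:School      rdfs:subClassOf ex:Facility .
ex:Library     rdfs:subClassOf ex:Facility .
ex:HighSchool  rdfs:subClassOf ex:School .

ex:central-hs  a ex:HighSchool .
ex:main-branch a ex:Library .
"""

g = Graph()
g.parse(data=DATA, format="turtle")

def instances_of(concept):
    """All instances of a concept, including instances of its subclasses."""
    # transitive_subjects includes the concept itself in the closure
    classes = set(g.transitive_subjects(RDFS.subClassOf, concept))
    return {s for cls in classes for s in g.subjects(RDF.type, cls)}

print(instances_of(URIRef("http://example.org/Facility")))
```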

The administrative level uses simple roles and relationships for attributes and indicators to inform the framework as to how and with what widget to display information. For example, a “type” of information that is geographically related can be instructed to use the map component as an option for display. Whether some information is used for totals, comparison purposes, or other specifications useful to data visualization and graphing may also be specified.
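A hedged sketch of that administrative-level idea, in Python with rdflib: an attribute annotated as map-displayable is routed to the map component. The annotation property used here is invented for illustration; the actual vocabulary for such display instructions is defined by the Semantic Component Ontology (SCO).

```python
# Look up which display component an attribute should use, per a small
# administrative ontology fragment (placeholder vocabulary).
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/admin/")

ADMIN = """
@prefix ex: <http://example.org/admin/> .

ex:neighbourhoodBoundary ex:displayWith ex:sMap .
ex:medianIncome          ex:displayWith ex:sBarChart .
"""

g = Graph()
g.parse(data=ADMIN, format="turtle")

def widget_for(attribute, default=EX.sGenericBox):
    """Return the display component bound to an attribute, or a generic default."""
    return g.value(EX[attribute], EX.displayWith) or default

print(widget_for("neighbourhoodBoundary"))   # -> http://example.org/admin/sMap
print(widget_for("totalPopulation"))         # -> default generic box
```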

The language and relationships (predicates or properties) of these administrative ontologies are simple and straightforward. It is, for example, relatively easy to define data display functions at the broad dataset and attributes level. Simple determinations drive how results sets and their associated results types may be displayed, no matter what datasets or slices may be generated as a result of the queries or requests fed to the system.

The structure in these layers can be replaced by other structures for other instantiations and circumstances. Indeed, all other layers in the open semantic framework can remain relatively fixed while tailoring the instance to new domains solely via this layer. The ontologies layer is what gives any given instantiation of OSF — such as Citizen Dan — its unique focus and scope.

#### The Content Management System (conStruct) Layer

The thinnest layer (that is, least substantial with respect to this framework) is the content management system (CMS) layer. In its current form, the open semantic framework uses the Drupal CMS via our conStruct plug-in modules. The design of the framework, however, has explicitly accommodated the possibility that other CMSs may substitute for this role.

The CMS layer is optional if structWSF endpoints are sufficient or if simple Web pages hosting semantic components are deemed as adequate. Very small organizations or deployments may reasonably choose to have no CMS layer at all.

However, for most sites or portals with more than a few active users, it is desirable to have broad flexibility in theming ("skinning"), user rights and permissions, or other functionality. These are the roles of the CMS layer. Drupal, for example, is presently supported by more than 4,500 third-party modules covering every conceivable function, from polling to blogs, rating systems and bulletin boards.

For such generalized portals or collaboration environments, it makes sense to adopt and install a flexible CMS system, such as Drupal. Much of the user experience and functional environment can be provided through such means.

The open semantic framework is thus designed to reside easily in a CMS while also providing the hooks to take advantage of the generalized user rights and functionality of the CMS. In this manner, the open semantic framework is able to stay focused on its structured data and interoperability purposes, while still gaining the advantages of rich-featured content management systems.

#### The OSF is a Web-oriented Architecture

With its inherent open-world orientation [5] and distributed and collaborative potential, the open semantic framework was designed from the outset to be Web-capable and Web-oriented:

[Figure: the open semantic framework as a Web-oriented architecture]

A Web-oriented architecture (WOA) has a number of understood requirements, to which the open semantic framework adheres. Specifically, these design considerations support the framework as being part of WOA:

• Data and objects are all identified with Web addresses (URIs)
• Data is generally exposed (and universally available) as linked data
• SPARQL endpoints and APIs are generally RESTful in design (see the query sketch following this list)
• The overall architecture is modular, with inherent decentralized and distributed aspects
• All display and visualization aspects are cross-browser ready and capable.
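As a small illustration of these points, the sketch below issues a standard SPARQL query over plain HTTP against a hypothetical endpoint URL; the address and query are illustrative only, not actual OSF defaults. The point is that no vendor-specific client library is needed, which is what the Web-oriented, RESTful design buys us.

```python
import requests  # third-party HTTP library

# Hypothetical endpoint; any standards-compliant SPARQL endpoint exposed
# by the framework would answer the same kind of request.
ENDPOINT = "http://example.org/ws/sparql/"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label
WHERE { ?thing rdfs:label ?label }
LIMIT 10
"""

# A plain HTTP GET with content negotiation returns SPARQL JSON results.
response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

for binding in response.json()["results"]["bindings"]:
    print(binding["thing"]["value"], "-", binding["label"]["value"])
```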

### OSF is the Basis for Domain-specific Instantiations

Citizen Dan is our first exemplar instance of this open semantic framework. The details page for the project goes into some of Citizen Dan’s functionality and capabilities.

Citizen Dan is specifically geared to local governments and localities, with an emphasis on community indicator systems (CIS). CIS have become a popular way of measuring and tracking local economic and social well-being; they are closely related to sustainability and its measurement as practiced in many economic and environmental domains.

However, in the context of this post, what is really interesting about Citizen Dan is that its semantic framework is a completely open and generic one. The same set of tools and capabilities described on its details page can be applied to any domain that needs to manage and understand its own information, from unstructured text and documents to conventional structured databases.

What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists; see above) that are fed to this open semantic framework. By swapping out new structures, what is called Citizen Dan in one instance can morph to become Curriculum Carla in, say, an education instance, or Doctor Doolittle in a veterinary science instance [6].
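A rough sketch of what this swapping means in practice follows. The file names, bundle fields and loader function are hypothetical, but they convey how the same generic code path can be fed different structure bundles to yield Citizen Dan in one instance and Curriculum Carla in another.

```python
from dataclasses import dataclass

@dataclass
class StructureBundle:
    domain_ontology: str      # TBox: concepts and relationships
    entity_lists: list[str]   # ABox reference lists of prominent things
    admin_ontology: str       # display/widget annotations (SCO-like)

CITIZEN_DAN = StructureBundle(
    domain_ontology="ontologies/community_indicators.owl",
    entity_lists=["entities/neighborhoods.n3", "entities/agencies.n3"],
    admin_ontology="ontologies/display_annotations.n3",
)

CURRICULUM_CARLA = StructureBundle(
    domain_ontology="ontologies/curriculum.owl",
    entity_lists=["entities/schools.n3", "entities/courses.n3"],
    admin_ontology="ontologies/display_annotations.n3",
)

def launch_generic_framework(structure: StructureBundle) -> None:
    """The same generic code path serves every domain; only the
    structure bundle fed to it differs."""
    print(f"Loading domain ontology: {structure.domain_ontology}")
    for ref in structure.entity_lists:
        print(f"Indexing entity reference list: {ref}")
    print(f"Applying display annotations: {structure.admin_ontology}")

launch_generic_framework(CITIZEN_DAN)        # the Citizen Dan instantiation
launch_generic_framework(CURRICULUM_CARLA)   # same code, education domain
```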

We can illustrate these multiple instances as follows:

[Figure: multiple domain instantiations of the open semantic framework]

What this figure illustrates is that even a branded expression of the framework — such as Citizen Dan — is merely an instance of that framework. And, when expressed in such a packaged manner, we can more accurately call the standard, bundled suite of generic functions and accompanying structure of Citizen Dan an instantiation of the open semantic framework:

in·stan·ti·ate \in-ˈstan(t)-shē-āt\ (transitive verb) is to:

1. (transitive) to represent an abstract concept by a concrete instance
2. (transitive, object-oriented computing) to create an object (an instance) of a specific class

in·stan·ti·a·tion \in-stan(t)-shē-ā-shən\ (noun) [7]

By replacing the structure bases, and by tailoring the function suite appropriate to a given market and use, we can create many instantiations of the open semantic framework for different domains and markets. In this manner, Citizen Dan can be seen as an early exemplar of the framework, but not as a definer and limiter to it.

### OSF is the Software Leg to a ‘Total Open Solution’

So far, this discussion has focused solely on considerations of software and architecture. While we see the power of the open semantic framework, and it is highly useful in itself, that alone is inadequate to achieve acceptance and success in the enterprise (as we noted in our most recent posts). The very forces that are compelling enterprises to look at new options are also the same ones that pose difficult hurdle rates for acceptance of open source.

To address this issue, we have developed a four-legged foundation to what we have termed the total open solution. The solution involves software, structure, documentation and methods (or best practices). Each of these legs connects and relates to the others.

The open semantic framework is clearly the software (and architecture) leg to this foundation. Again, however, what is interesting is that the mere swapping out of the structure can also make the system relatively ready for other domains.

We see these relationships in the following diagram, which also shows that the DocWiki portions of the solution embody the documentation (aside from code-level comments) and methods legs of the foundation:

[Figure: the total open solution, showing the software, structure, documentation and methods legs and the role of the DocWiki]

Differences between domains may also lead to differences as to which components are included or not in that domain’s desired instantiation.

The hugely important point implied by the diagram above, however, is how nearly universal the content and methods in the DocWiki may be to other domains. Because the deltas between domains largely result from structure and from which specific functional components are included or not, it becomes clear that most documentation and practices shared via the DocWiki will be applicable across domains. Sure, the use cases and some of the specific terminology may change, but we can now see a high degree of re-usability of documentation and knowledge base across markets. This realization makes the usefulness and leverage of the DocWiki even higher.

### A Common Language and Framework for Moving Forward

Developing “common language” by which to describe and convey things — especially new things like semantics that also have strong technical aspects — is tough, very tough. We are only now beginning on this process; we look to many in the community and elsewhere to help define informative and evocative terminology.

Per the original design objectives above, Structured Dynamics has approached the challenge of the semantic enterprise in what we think is both a pragmatic and a new way. The insistence on preserving and respecting existing information assets, matched with the opportunities and different mindsets arising from an open-world approach [5], has necessitated thinking through new designs and developing new concepts. Any time such new thinking and concepts occur, new language and new metaphors must accompany them.

While certainly there are components and various software packages that populate and comprise an open semantic framework, the framework is also just as importantly a world view or way to think about information, information development, and its architecture. For example, a pivotal concept is that an open semantic framework is built around generic tools responsive to the information structures fed to them. This realization shifts the locus of emphasis from software development per se to creating, managing and adapting data and information structures. While this democratizes the information development process and is more inclusive of all knowledge workers, it also imposes needs for new toolsets and business processes. We are only at the nascent stages of understanding and learning about these differences.

Similarly, a development approach that is inherently incremental and leverages (rather than replaces or displaces) existing information assets means IT projects need to be considered in a new light. Smaller projects with more emphasis on tangible and demonstrable benefits will alter budgets, lower risks, and demand quicker turnaround. Like the architecture of the open semantic framework itself, projects based on OSF are also more distributed, decentralized and modular.

With such decentralization also comes the need for mechanisms and systems to overcome vendor “lock-in” and proprietary systems. A key thrust of what we have called the total open solution, with its mixture of documentation and methods to accompany software and structure, is specifically targeted at this issue. Tools and means for collaboration and concurrent contributions are another possible answer. Prior software practices in agile development and version control will see extensions to all manner of information development across the enterprise.

We are proud of our design work and proof-testing with clients over the past 18 months. We believe the open semantic framework and its implications to be a fundamental shift in how organizations need to think about their information development, existing information assets, and IT budgets and processes. We know widescale adoption is not yet at hand — enterprises are justifiably conservative when it comes to new thinking. But, given global competition and tight pocketbooks, the open semantic framework is a formulation to which enterprises and governments should pay very close attention.

[1] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.

[2] Structured Dynamics’ best practices approach makes explicit splits between the “ABox” (for instance data) and “TBox” (for ontology schema) in accordance with our working definition for description logics, a fundamental underpinning for how we use RDF:

“Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
[3] A subsequent post will document this rather straightforward XML schema.
[4] Contact Structured Dynamics for an early sneak peek. The Citizen Dan application will be publicly released as an online sandbox and demo by the end of summer 2010.
[5] See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[6] Of course, things are not always so simple as this. The CMS layer gives the open semantic framework the ready ability to change themes and layouts (“skins”), not to mention the breadth and specifics of what ancillary site functionality might be provided. Moreover, the modular basis of the open semantic framework also means that entire clusters of functionality might be dropped from a given instantiation (or added to it!) without violating or negating this framework.
[7] Dictionary references are from Merriam-Webster and Wiktionary.

# Listening to the Enterprise: Total Open Solutions, Part 1

## “We’re Successful When We’re Not Needed”

Structured Dynamics has been engaged in open source software development for some time. Inevitably in each of our engagements we are asked about the viability of open source software, its longevity, and what the business model is behind it. Of course, I appreciate our customers seemingly asking about how we are doing and how successful we are. But I suspect there is more behind this questioning than simply good will for our prospects.

Besides the general fact that most of us know — that of hundreds of thousands of open source projects only a minuscule number get traction — I think there are broader undercurrents in these questions. Even open source with good code documentation is not enough to ensure long-term success.

When open source broke on the scene a decade or so ago [1], the first enterprise concerns centered on code quality and possible “enterprise-level” risks: security, scalability, and the fact that much open source was itself LAMP-based. As comfort grew with the major open source foundations — Linux, MySQL, Apache, and the scripting languages of PHP, Perl and Python (that is, the very building blocks of the LAMP stack) — concerns shifted to licensing and the possible “viral” effects of some licenses to compromise existing proprietary systems.

Today, of course, we see hugely successful open source projects in all conceivable venues. Granted, most open source projects get very little traction. Only a few standouts from the hundreds of thousands of open source projects on big venues like SourceForge and Google Code or their smaller brethren are used or known. But, still, in virtually every domain or application area, there are 2-3 standouts that get the lion’s share of attention, downloads and use.

I think it fair to argue that well-documented open source code generally out-competes poorly documented code. In most circumstances, well-documented open source is a contributor to the virtuous circle of community input and effort. Indeed, it is a truism that most open source projects have very few code committers. If there is a big community, it is largely devoted to documentation and assistance to newbies on various forums.

We see some successful open source projects, many paradoxically backed by venture capital, that employ the “package and document” strategy. Here, existing open source pieces are cobbled together as more easily installed comprehensive applications with closer to professional grade documentation and support. Examples like Alfresco or Pentaho come to mind. A related strategy is the “keystone” one where platform players such as Drupal, WordPress, Joomla or the like offer plug-in architectures and established user bases to attract legions of third-party developers [2].

### OK, So What Has This to Do with the Enterprise?

I think if we stand back and look at this trajectory we can see where it is pointing. And, where it is pointing also helps define what the success factors for open source may be moving forward.

Two decades ago most large software vendors made on average 75% to 80% of their revenues from software licenses and maintenance fees; quite the opposite is true today [3]. The successful vendors have moved into consulting and services. One need only look to three of the largest providers of enterprise software of the past two decades — IBM, Oracle and HP — to see evidence of this trend.

How is it that proprietary software with its 15% to 20% or more annual maintenance fees has been so smoothly and profitably replaced with services?

These suppliers are experienced hands in the enterprise and know what any seasoned IT manager knows: the total lifecycle costs of software and IT reside in maintenance, training, uptime and adaptation. Once installed and deployed, these systems assume a life of their own, with actual use lifetimes that can approach two to three decades.

This reality is, in part, behind my standard exhortation about respecting and leveraging existing IT assets, and why Structured Dynamics has such a commitment to semantic technology deployment in the enterprise that is layered onto existing systems. But, this very same truism can also bring insight into the acceptable (or not) factors facing open source.

Great code — even if well documented — is not alone the mousetrap that leads the world to the door. Listen to the enterprise: lifecycle costs and longevity of use are facts.

But what I am saying here is not really all that earthshaking. These truths are available to anyone with some experience. What is possibly galling to enterprises are two smug positions of new market entrants. The first, which is really naïve, is the claimed moral superiority of open source or open data or any such silly artificial distinctions. That might work in the halls of academia, but it carries no weight with the enterprise. The second, more cynically based, is to wrap one’s business in the patina of open source while engaging in the “wink-wink” knowledge that only the developer of that open source is in a position to offer longer term support.

Enterprises are not stupid and understand this. So, what IT manager or CIO is going to bet their future software infrastructure on a start-up with immature code, generally poor code documentation or APIs, and definitely no clear clue about their business?

### The Slow Squeeze

Yet, that being said, neither enterprises nor vendors nor software innovators that want to work with them can escape the inexorable force of open source. While it has many guises, from cloud computing to social software or software as a service or a hundred other terms, the slow squeeze is happening. Big vendors know this; that is why there has been the rush to services. Start-up vendors see this; that is why most have turned to consumer apps and ad-based revenue models. And enterprises know this, which is why most are doing nothing other than treading water, because the way out of the squeeze is not apparent.

The purpose of this three-part series is to look at these issues from many angles. What might the absolute pervasiveness of open source mean to traditional IT functions? How can strategic and meaningful change be effected via these new IT realities in the enterprise? And, how can software developers and vendors desirous of engaging in large-scale initiatives with enterprises find meaningful business models?

### Lead-in to the Series: a Total Open Solution

And, after we answer those questions, we will rest for a day.

But, no, seriously, these are serious questions.

There is no doubt open source is here to stay, yet its maturity demands new thinking and perspectives. Just as enterprises have known that software is only the beginning of decades-long IT commitments and (sometimes) headaches, the purveyors and users of open source should recognize the acceptance factors facing broad enterprise adoption and reliance.

Open source offers the wonderful prospect of avoiding vendor “lock-in”. But, if the full spectrum of software use and adoption is not similarly covered, all we have done is unlock the initial selection and installation of the software. Where do we turn for modifications? for updates? for integration with other packages? for ongoing training and maintenance? And, whatever we do, have we done so by making bets on some ephemeral start-up? (We know how IBM will answer that question.)

The first generation of open source has been a substitute for upfront proprietary licenses. After that, support has been a roll of the dice. Sure, broadly accepted open source software provides some solace because of more players and more attention, but how does this square with the prospect of decades of need?

The perverse reality in these questions is that most all early open source vendors are being gobbled up or co-opted by the existing big vendors. The reward of successful market entry is often a great sucking sound to perpetuate existing concentrations of market presence. In the end, how are enterprises benefiting?

Now, on the face of it, I think it neither positive nor negative whether an early open source firm with some initial traction is gobbled up by a big player or not. After all, small fish tend to be eaten by big fish.

But two real questions arise in my mind: One, how does this gobbling fix the current dysfunction of enterprise IT? And, two, what is a poor new open source vendor to do?

The answer to these questions resides in the concerns and anxieties that caused them to be raised in the first place. Enterprises don’t like “lock-in” but like even less seeing stranded investments. For open source to be successful it needs to adopt a strategy that actively extends its traditional basis in open code. It needs to embrace complete documentation, provision of the methods and systems necessary for independent maintenance, and total lifecycle commitments. In short, open source needs to transition from code to systems.

We call this approach the total open solution. It involves — in addition to the software, of course — recipes, methods, and complete documentation useful for full-life deployments. So, vendors, do you want to be an enterprise player with open source? Then, embrace the full spectrum of realities that face the enterprise.

### “We’re Successful When We’re Not Needed”

The actual mantra that we use to express this challenge is, “We’re Successful When We’re Not Needed”. This simple mental image helps define gaps and tells us what we need to do moving forward.

The basic premise is that any taint of lock-in or not being attentive to the enterprise customer is a potential point of failure. If we can see and avoid those points and put in place systems or whatever to overcome them, then we have increased comfort in our open source offerings.

Like good open source software, this is ultimately a self-interested position to take. If we can increase the marketplace’s comfort that it can adopt and sustain our efforts without us, it will adopt them to a greater degree. And, once they are adopted, and when extensions or new capabilities are needed, then as the initial developers with a complete grasp of the entire lifecycle challenges we become a natural candidate to hire. Granted, that hiring is by no means guaranteed. In fact, we benefit when there are many able players available.

In the remaining two parts of this series we will discuss all of the components that make up a total open solution and present a collaboration platform for delivering the methods and documentation portions. We’re pretty sure we don’t yet have it fully right. But, we’re also pretty sure we don’t have it wrong.

[1] Of course, stalwart open source applications such as Linux and MySQL and even the open source movement extend back about twenty years. But, it was only about a decade ago that real traction and visibility in the enterprise began.
[2] BTW, with regard to the latter, I think it notable that no semantic technology player has played or attracted third parties to any notable extent. That is possibly a topic for a later blog post!
[3] I first wrote about this five years ago (and updated it a year later), with analysis of many public vendors. See M.K. Bergman, Redux: Enterprise Software Licensing on Life Support, June 2, 2006.

# Citizen DAN, Prise Deux

## Huzzah! for Local Government Open Data, Transparency, Community Indicators and Citizen Journalism

While the Knight News Challenge is still working its way through the screening details, Structured Dynamics’ Citizen DAN proposal remains in the hunt. Listen to this:

To date, we have been the most viewed proposal by far (2x more than the second most viewed!!! Hooray!) and are among the top five highest rated (we have also been at #1 or #2 at times. Hooray!). Thanks to all of you for your interest and support.

There is much to recommend this KNC approach, not the least of which is its ability to attract some 2,500 proposals seeking a piece of the potential $5 million in 2010 grant awards. Our proposal extends SD’s basic structWSF and conStruct Drupal frameworks to provide a data appliance and network (DAN) to support citizen journalists with data and analysis at the local, community level.

None of our rankings, of course, guarantees anything. But, we also feel good about how the market is looking at these frameworks. We have recently been awarded some pretty exciting and related contracts. Any and all of these initiatives will continue to contribute to the open source Citizen DAN vision.

And, what might that vision be? Well, after some weeks away from it, I read again our online submission to the Knight News Challenge. I have to say: It ain’t too bad! (Plus many supporting goodies and details.)

So, I repeat in its entirety below, the KNC questions and our formal responses. This information from our original submittal is unchanged, except to add some live links where they could not be submitted as such before. (BTW, the bold headers are the KNC questions.) Eventual winners are slated to be announced around mid-June. We’re keeping our fingers crossed, but we are pursuing this initiative in any case.

Citizen DAN is an open source framework to leverage relevant local data for citizen journalists. It is a:

• Appliance for filtering and analyzing data specific to local community indicators
• Means to visualize local data over time or by neighborhood
• Meeting place for the public to upload and share local data and information
• Web data portal that can be individually tailored by any local community
• Node in a global network of communities across which to compare indicators of community well-being.

Good decisions and good journalism require good information. Starting with pre-loaded government data, Citizen DAN provides any citizen the framework to learn and compare local statistics and data with other similar communities. This helps to promote the grist for citizen journalism; it is also a vehicle for discovery and learning across the community.

Citizen DAN comes pre-packaged with all necessary deployment components and documentation, including local data from government sources. It includes facilities for direct upload of additional local data in formats from spreadsheets to standard databases. Many standard converters are included with the basic package.

Citizen DAN may be implemented by local governments or by community advocacy groups. When deployed, using its clear documentation, sponsors may choose whether or what portions of local data are exposed to the broader Citizen DAN network. Data exposed on the network is automatically available to any other network community for comparison and analysis purposes.

This data appliance and network (DAN) is multi-lingual. It will be tested in three cities in Canada and the US, showing its multi-lingual capabilities in English, Spanish and French.

### How will your project improve the way news and information are delivered to geographic communities?

With Citizen DAN, anyone with Web access can now get, slice, and dice information about how their community is doing and how it compares to other communities. We have learned from Web 2.0 and user-generated content that once exposed, useful information can be taken and analyzed in valuable and unanticipated ways.

The trick is to get at information that already exists. Citizen journalists of the past may not have known either:

1. Where to find relevant information, or
2. How to ‘slice-and-dice’ that information to extract meaningful nuggets.

By removing these hurdles, Citizen DAN improves the ways information is delivered to communities and provides the framework for sifting through it to extract meaning.

### How is your idea innovative? (new or different from what already exists)

Government public data in electronic tabular form or as published listings or tables in local newspapers has been available for some time. While meeting strict ‘disclosure’ requirements, this information has neither been readily analyzable nor actionable.

The meaning of information lies in its interpretation and analysis.

Citizen DAN is innovative because it:

1. Is a platform for accessing and exposing available community data
2. Provides powerful Web-based tools for drilling down and mining data
3. Changes the game via public-provided data, and
4. Packages Citizen DAN in a Web framework that is available to any local citizen and requires no expertise other than clicking links.

### What experience do you or your organization have to successfully develop this project?

Structured Dynamics has already developed and released as open source code structWSF and conStruct, the basic foundations of this proposal. structWSF provides the network and dataset “backbone” to this proposal; conStruct provides the Drupal portal and Web site framework.

To this foundation we add proven experience and knowledge of datasets and how to access them, as well as tools and converters for how to stage them for standard public use. A key expertise of Structured Dynamics is the conversion of virtually any legacy data format into interoperable canonical forms.

These are important challenges, which require experience in the semantics of data and mapping from varied forms into useful and common frameworks. Structured Dynamics has codified its expertise in these areas into the software underlying Citizen DAN.

Structured Dynamics’ principals are also multi-lingual, with language-neutral architectures and code. The company’s principals are also some of the most prominent bloggers and writers in the semantic Web. We are acknowledged as attentive to documentation and communication.

Finally, Structured Dynamics’ principals have more than a decade of track record in successful data access and mining, and software and venture development.

To this strong basis, we have preliminary city commitments for deploying this project in the United States (English and Spanish) and Canada (French and English).

ThisWeKnow offers local Census data, but no community or publishing aspects. Data sharing exists in DataSF and DataMine (NYC), but those sites lack collaboration, community networks and comparisons, or powerful data visualization or mapping.

Citizen DAN is a turnkey platform for any size community to create, publish, search, browse, slice-and-dice, visualize or compare indicators of community well-being. Its use makes the Web more locally focused. With it, researchers, watchdog groups, reporters, local officials and interested citizens can now discover hard data for ‘new news’ or fact-check mainstream media.

### What tasks/benchmarks need to be accomplished to develop your project and by when will you complete them?

There are two releases with feedback. The task summaries, with estimated hours (hr) and durations in months (mo), in rough sequence order with overlaps, are:

1. Dataset Prep/Staging: identify, load and stage baseline datasets; provide means for aggregating data at different levels; 420 hr; 2.5 mo
2. Refine Data Input Facility: feature to upload other external data, incl direct from local sources; XML, spreadsheet, JSON forms; dataset metadata; 280 hr; 3 mo
3. Add Data Visualization Component: Flex mapping/data visualization (charts, graphs) using any slice-and-dice; 390 hr; 3 mo
4. Make Multi-linguality Changes: English, French, Spanish versions; 220 hr; 2 mo
5. Refine User Interface: update existing interface in faceted browse; filter; search; record create, manage and update; imports; exports; and user access rights; 380 hr; 3 mo
6. Standard Citizen DAN Ontologies: the coherent schema for the data; 140 hr; 3 mo
7. Create Central Portal: distribution and promotion site for project; 120 hr; 2 mo
8. Deploy/Test First Release: release by end of Mo 5 @ 3 test sites; 300 hr; 4 mo
9. Revise Based on Feedback: bug fixing and 4 mo testing/feedback, then revision #2; 420 hr
10. Package/Document: component packaging for easier installs; increased documentation; 310 hr; 2 mo
11. Marketing/Awareness: see next question; 240 hr; 12 mo
12. Project Management: standard PM/interact with test communities, partners; 220 hr; 12 mo.

### What will you have changed by the end of your project?

"Information is the currency of democracy." Thomas Jefferson (n.b.)

We intuitively understand that an informed citizenry is a healthy polity. At the global level and in 250 languages, we see how Wikipedia, matched with the Internet and inexpensive laptops, is bringing unforeseen information and enrichment to all. Across the board, we are seeing the democratization of information.

But very little of this revolution has percolated to the local level.

Only in the past decade or so have we seen free, electronic access to national Census data. We still see local data only published in print or not available at all, limiting awareness and, more importantly, understanding and analysis. Data locked up in municipal computers, or available but not expressed via crowdsourcing, is as good as non-existent.

Though many citizens at the local level are not numerically inclined, intuition has to tell us that the absence of empirical, local data hurts our ability to understand, reason and debate our local circumstances. Are we doing better or worse than yesterday? Than in comparison with our peers? Under what measures does this have meaning for community well-being?

The purpose of the Citizen DAN project is to create an appliance — in the same sense that refrigerators keep our food from spoiling — by which any citizen can crack open and expose relevant data at the local level. Citizen DAN is about enriching our local information and keeping our communities healthy.

### How will you measure progress and ultimately success?

We will measure the progress of the project by the number of communities and local organizations that use the Citizen DAN platform to create and publish community data. Subsidiary measures include the number of:

• Individual users across all installations
• Contributed datasets
• Contributed applications based on the platform
• Interconnected sites in the network
• Different Citizen DAN networks
• Substantive articles and blog posts on Citizen DAN
• Mentions of ‘Citizen DAN’ (and local naming or variants, which will be tracked) in news articles
• Contributed blog posts on the central Citizen DAN portal
• Google citations and hits on ‘Citizen DAN’ (and prominent variants).

These measures, plus active sites with profiles of each, will be monitored and tracked on the central Citizen DAN portal.

‘Ultimate success’ is related to the general growth in transparent government at the local level. Growth in Citizen DAN-related measures on a year-over-year basis or in relation to Gov2.0 would indicate success.

### Do you see any risk in the development of your project?

There is no technical risk to this proposal, but there are risks in scope, awareness and acceptance. Our system has been operational for one year for relevant use cases; all components have been integrated, debugged, and put into production.

Scope risks relate to how much data the Citizen DAN platform is loaded with, and how much functionality is included. We balance the data question by using common public datasets for baseline data, then add features for localities to “crowdsource” their own supplementary data. We balance the functionality question by limiting new development to data visualization/mapping and to upload functions (per above), and then to refine what already exists.

Awareness risks arise from a crowded attention space. We can overcome this in two ways. The first is to satisfy users at our test sites. That will result in good recommendations to help seed a snowball effect. The second way is to use social media and our existing Web outlets aggressively. We have been building awareness for our own properties in steady, inch-by-inch measures. While a notable few Web efforts may go viral, the process is not predictable. Steady, constant focus is our preferred recipe.

Acceptance risk is intimately linked with awareness and use. If we can satisfy each Citizen DAN community, then new datasets, new functionality and new awareness will naturally arise. More users and more contributions through the network effect are the best way to broad acceptance.

### What is your marketing plan? How will people learn about what you are doing?

Marketing and awareness efforts will include our use of social media, dedicated Web sites, support from test communities, and outreach to relevant community Web sites.

Our own blogs are popular in the semantic Web and structured data space (~3K uniques daily); we have published two posts on Citizen DAN and will continue to do so with more frequency once the effort gets underway.

We will create a central portal (http://citizen-dan.org) based on the project software (akin to our other project sites). The model for this apps and deployments clearinghouse is CrimeReports.com. Using social aspects and crowdsourcing, the site will encourage sharing and best practices amongst the growing number of Citizen DAN communities.

We will blog and post announcements for key releases and milestones on relevant external Web sites including various Gov 2.0 sites, Community Indicators Consortium, GovLoop, Knight News Challenge, the Sunlight Foundation, and so forth. In addition, we will collate and track individual community efforts (maintained on the central Citizen DAN site) and make specific outreach to community data sites (such as DataSF or DataMine at NYC.gov). We will use Twitter (#CitizenDAN, etc) and the social networks of LinkedIn, Facebook, and Meetup to promote Citizen DAN activity.

We will interact with advocates of citizen journalism, and engage civic organizations, media, and government officials (esp in our three test communities) to refine our marketing plan.

### Is this a one-time experiment or do you think it will continue after the grant?

Citizen DAN is not an experiment. It is a working framework that gives any locality and its citizenry the means to assemble, share and compare measures of its community well-being with other communities. These indicators, in turn, provide substance and grist for greater advocacy and writing and blogging (“journalism”) at the local level.

Granted, there are unknowns: How many localities will adopt the Citizen DAN appliance? How essential will its data be to local advocacy and news? How active will each Citizen DAN installation be in attracting contributions and local data?

We submit that the better way to frame the question is the degree of adoption, as opposed to whether it will work.

Web-based changes in our society and social interaction are leading to the democratization of information, access to it, and channels for expression. Whether ultimately successful in the specific form proposed herein, Citizen DAN and its open source software and frameworks will surely be adopted in one form or another — to one degree or another — in the unassailable trend toward local government transparency and citizen involvement.

In short, Yes: We believe Citizen DAN will continue long after the grant.

### If it is to be self-sustainable, what is the plan for making that happen?

Our plan begins with the nature of Citizen DAN as software and framework. Sustainability is a question of whether the appliance itself is useful, and how users choose to leverage it.

MediaWiki, the software behind Wikipedia, is an analog. MediaWiki is an enabling infrastructure. Some sites using it are not successful; others are wildly so. Success has required the combination of a good appliance with topicality and good management. The same is true for Citizen DAN.

Our plan thus begins with Citizen DAN as a useful appliance, as free open source with great documentation and prominent initial use cases. Our plan continues with our commitment to the local citizen marketplace.

We are developing Citizen DAN because of current trends. We foresee many hundreds of communities adopting the system. Most will be able to do so on their own. Some others may require modifications or assistance. Our self-interest is to ensure a high level of adoption.

An era of citizen engagement is unfolding at the local level, fueled by Web technologies and growing comfort with crowdsourcing and social networks. Meanwhile, local government constraints and pressures for transparency are unleashing locked-up data. These forces will create new opportunities for data literacy by the public, which will itself bring new understanding and improvements in governance and budgeting. We plan on Citizen DAN and its offspring being one of the catalysts for those changes.