Evolution
AI³
Adaptive Information
Adaptive Innovation
Adaptive Infrastructure
a·dap·tive adj. Showing or having a capacity to make fit for new or special situations; flexible; a successful adjustment.

Date:   November 8, 2010

Innovative Winnipeg Project Powered by SD Technology

This past Friday the Peg project was unveiled for the first time to an enthusiastic welcome at the Winnipeg Poverty Reduction Partnership Forum. A beta version of its website (www.mypeg.ca) was also launched. Peg is an innovative Web portal for community indicators of well-being for the city of Winnipeg, Manitoba. First conceived in 2002, with much subsequent refinement, its strong consortium of members from the local community and recent backing have now allowed it to be shared with the public.

Since early this year, Structured Dynamics has been the lead technical contractor on the project. But Peg is about people and involvement, not technology. Peg is an effort of community and perspectives and information and stories, all designed to coalesce around how to make Winnipeg a better community moving forward. So, while the technology underlying the site is innovative (yes, we’re proud ;) ), more so is the effort and vision of the community making it happen. Though just a beta release, the current site and the commitment behind it point to some exciting future developments.

Here is the main screen for Peg (clicking on any of the screen captures below will take you directly to the relevant part of the site):

[Screen capture: Peg Main Page]

A Community Perspective Backed by Dynamic Functionality

Winnipeg’s community indicator system (CIS) is organized around themes, cross-cutting issues that bridge across themes, and indicators and supporting data to track and measure the city’s well-being. Peg’s major themes, agreed upon after extensive community consultation, are: basic needs; health; education & learning; social vitality; governance; built environment; economy; and natural environment. In this first beta release, the emphasis has been on the cross-cutting issue of poverty and some of the indicators to track it.

The perspective being brought to bear on these questions of well-being is comprehensive and embracing. Data and demographics and quantitative indicators of well-being are matched with stories and narratives from affected parties, videos, and a variety of display and visualization options. Much of the supporting data is organized by the 236 neighborhoods in Winnipeg, or broader community areas, with comparative baselines to city, province and nation. The information is both hard and soft, and presented in engaging, exciting and dynamic ways. Using the best of current social media, Peg is meant to be a virtual meeting place and town hall for the public to share and engage one another.

This beta is but a first expression of Peg’s longer-term vision, yet already has the backbone to take on these labors. A concept explorer allows the public to explore and navigate through the entire information space; much information is mapped and presented in locational relevance; narratives and stories and videos are linked contextually to topics and issues; and many, many dashboards can be created and displayed for showing trends and comparing neighborhoods, and letting the data speak visually:

[Screen captures: Peg Explorer, Peg Map Tab, Peg Stories Tab, Peg Dashboard Tab]

The current beta is but a start. The Peg project, in continued consultation with stakeholders, will be developing further indicators for each of its eight major themes, providing information about past and current trends, and expanding into additional cross-cutting issues. Daily, the site will see an increase in richness and relevance.

Project Participants

Peg has been spearheaded by the United Way of Winnipeg and the International Institute for Sustainable Development (IISD), also based in Winnipeg, with the partnership of the Province of Manitoba, the City of Winnipeg, Health in Common, and a cross section of community interests and members across the city. Peg is a non-profit effort, and is embarking on a new three-year work plan to oversee further funding and expansion.

Peg is governed by a Steering Committee with budgetary and strategic responsibilities. Peg also works with an Engagement Group — a broad-based group of Winnipeggers — that serves as a testing ground for ideas, direction and policy. The site provides credits for the various entities involved and responsible for the effort.

IISD has provided overall project management for the current effort. As personal thanks, we’d especially like to recognize Connie Walker, Laszlo Pinter, Christa Rust and Charles Thrift. Tactica, also of Winnipeg, has been the lead graphics and site designer for Peg. SD has worked closely with them to ensure a smooth launch, and they’ve done a great job. Thanks to all!

Now, This is Semantics Done Right

Of course, for more on the project, go directly to the Peg site or those of its other major participants and contributors. But, in our role as implementers of the behind-the-scenes wizardry powering the site, we would be remiss if we did not mention a couple of technical items.

As lead technical developer, SD was responsible for all data access, management, development and visualization software for the site. The site was developed in Drupal, with Virtuoso as the RDF data store and Solr for faceted site search. As part of its Open Semantic Framework, based on the Citizen Dan local government appliance, SD contributed and extended major open source software for Peg. These contributions included the structWSF Web services framework, conStruct modules for linking the system into Drupal, and the Flex-based semantic Components including the explorer, map, story viewer, browse/search, dashboard, workbench and back office widgets. We also developed the adaptive ontology driving the entire site, based on the Peg framework vocabulary already hashed out by the community participants.

During the course of the project we developed an entirely new workbench capability for creating new, persistent dashboards. We extended the sRelationBrowser semantic component with complete and flexible theming and styling; virtually all aspects of nodes, edges and behavior have now been exposed for tailoring, including fonts, colors and use of images. We enhanced the irON format to make it easier for project participants to submit spreadsheet datasets to the site for new indicator data. We will be migrating these advances to our existing open source software over the coming weeks. Check Fred Giasson’s blog for release details; he has also begun a series on the technology details.

But, in my opinion, what is most remarkable about all of this is that these bloody details are completely hidden from the user. Though real geeks can get RDF and linked data via export options, standard users simply interact with and experience the site. No triples are shoved in their face, no technology screams out for attention, and ne’er any URIs are to be found. The thing simply works, all the while being flexible, contextual, attractive and fun.

And that, folks, I submit, is semantics done right!

Posted by AI3's author, Mike Bergman

Posted on November 8, 2010 at 12:07 am in Adaptive Innovation, Citizen Dan, Semantic Enterprise, Software Development, Structured Dynamics | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/928/peg-community-indicator-system-unveiled/
The URI to trackback this post is: http://www.mkbergman.com/928/peg-community-indicator-system-unveiled/trackback/
Date:   September 13, 2010

A Single-stop Assembly of Ontology Tips and Pointers

As we conclude this recent series on ontology tools and building [1], one item stands clear: the relative lack of guidance on how one actually builds and maintains these beasties. While there is much of a theoretic basis in the literature and on the Web, and much of methodologies and algorithms, there is surprisingly little on how one actually goes about creating an ontology.

An earlier posting pointed to the now classic Ontology Development 101 article as a good starting point [2]. Another really excellent starting point is the Protégé 4 user manual [3]. Though it is obviously geared to the Protégé tool and its interface, it also is an instructive tutorial on general ontology (OWL) topics and constructs. I highly recommend printing it out and reading it in full.

If you do nothing else, you should download, print and study in full the Protégé 4 user manual [3].

Learning by Example

Another way to learn more about ontology construction is to inspect some existing ontologies. Though one may use a variety of specialty search engines and Google to find ontologies [4], there are actually three curated services that are more useful and which I recommend.

The best, by far, is the repository created by the University of Manchester for the now-completed TONES project [5]. TONES has access to some 200+ vetted ontologies, plus a search and filtering facility that helps much in finding specific OWL constructs. It is a bit difficult to filter by OWL 2-compliant only ontologies (except for OWL 2 EL), but other than that, the access and use of the repository is very helpful. Another useful aspect is that the system is driven by the OWL API, a central feature that we recommended in the prior tools landscape posting. From a learning standpoint this site is helpful because you can filter by vocabulary.

An older, but similar, repository is OntoSelect. It is difficult to gauge how current this site is, but it nonetheless provides useful and filtered access to OWL ontologies as well.

These sources provide access to complete ontologies. Another way to learn about ontology construction is from a bottom-up perspective. In this regard, the Ontology Design Patterns (ODP) wiki is the definitive source [6]. This is certainly a more advanced resource, since its premise begins from the standpoint of modeling issues and patterns to address them, but the site is also backed by an active community and curated by leading academics. Besides ontology building patterns, ODP also has a listing of exemplary ontologies (though without the structural search and selection features of the sources above). ODP is not likely the first place to turn to and does not give “big picture” guidance, but it also should be a bookmarked reference once you begin real ontology development.

It is useful to start with fully constructed ontologies to begin to appreciate the scope involved with them. But, of course, how one gets to a full ontology is the real purpose of this post. For that purpose, let’s now turn our attention to general and then more specific best practices.

Sources of Best Practices

As noted above, guidance and best practices for constructing and maintaining ontologies are relatively scarce. Still, some sources do offer guidance.

To my knowledge, the most empirical listing of best practices comes from Simperl and Tempich [7]. In that 2006 paper they examined 34 ontology building efforts and commented on cost, effectiveness and methodology needs. It provides an organized listing of observed best practices, though much is also oriented to methodology. I think the items are still relevant, though they are now four to five years old. The paper also contains a good reference list.

Various collective ontology efforts also provide listings of principles or such, which also can be a source for general guidance. The OBO (The Open Biological and Biomedical Ontologies) effort, for example, provides a listing of principles to which its constituent ontologies should adhere [8]. As guidance to what it considers an exemplary ontology, the ODP effort also has a useful organized listing of criteria or guidance.

One common guidance is to re-use existing ontologies and vocabularies as much as possible. This is a major emphasis of the OBO effort [9]. The NeOn methodology also suggests guidelines for building individual ontologies by re-use and re-engineering of other domain ontologies or knowledge resources [10]. Brian Sletten (among a slate of emerging projects) has also pointed to the use of the Simple Knowledge Organization System (SKOS) as a staging vocabulary to represent concept schema like thesauri, taxonomies, controlled vocabularies, and subject headers [11].

The Protégé manual [3] is also a source of good tips, especially with regard to naming conventions and the use of the editor. Lastly, the major source for the best practices below comes from Structured Dynamics’ own internal documentation, now permanently archived. We are pleased to now consolidate this information in one place and to make it public.

The best practices herein are presented as single bullet points. Not all are required and some may be changed depending on your own preferences. In all cases, however, these best practices are offered from Structured Dynamics’ perspective regarding the use and role of adaptive ontologies [12]. To our knowledge, this perspective is a unique combination of objectives and practices, though many of the individual practices are recommended by others.

General Best Practices

General best practices refer to how the ontology is scoped, designed and constructed. Note the governing perspective in this series has been on lightweight, domain ontologies.

Scope and Content

  • Provide balanced coverage of the subject domain. The breadth and depth of the coverage in the ontology should be roughly equivalent across its scope
  • Reuse structure and vocabularies as much as possible. This best practice refers to leveraging non-ontological content such as existing relational database schema, taxonomies, controlled vocabularies, MDM directories, industry specifications, and spreadsheets and informal lists. Practitioners within domains have been looking at the questions of relationships, structure, language and meaning for decades. Effort has already been expended to codify many of these understandings. Good practice therefore leverages these existing structural and vocabulary assets (of any nature), and relies on known design patterns
  • Embed the domain coverage into a proper context. A major strength of ontologies is their potential ability to interoperate with other ontologies. Re-using existing and well-accepted vocabularies and including concepts in the subject ontology that aid such connections is good practice. The ontology should also have sufficient reference structure for guiding the assignment of what content “is about”
  • Define clear predicates (also known as properties, relationships, attributes, edges or slots), including a precise definition. Then, when relating two things to one another, use care in actually assigning these properties. Initially, assignments should start with a logical taxonomic or categorization structure and expand from there into more nuanced predicates
  • Ensure the relationships in the ontology are coherent. The essence of coherence is that it is a state of logical, consistent connections, a logical framework for integrating diverse elements in an intelligent way. So while context supplies a reference structure, coherence means that the structure makes sense. Is the hip bone connected to the thigh bone, or is the skeleton askew? Testing (see below) is a major aspect for meeting this best practice
  • Map to external ontologies to increase the likelihood of sharing and interoperability. In Structured Dynamics’ case, we also attempt to map at minimum to the UMBEL subject reference structure for this purpose [13]
  • Rely upon a set of core ontologies for external re-use purposes; Structured Dynamics tends to rely on a set of primary and secondary standard ontologies [14]. The corollary to this best practice is: don’t link indiscriminately.

Structure and Design

  • Begin with a lightweight, domain ontology [15]. Ontologies built for the pragmatic purposes of setting context and aiding disparate data to interoperate tend to be lightweight with only a few predicates, such as isAbout, narrowerThan or broaderThan. But, if done properly, these lighter weight ontologies with more limited objectives can be surprisingly powerful in discovering connections and relationships. Moreover, they are a logical and doable intermediate step on the path to more demanding semantic analysis. Because we have this perspective, we also tend to rely heavily on the SKOS vocabulary for many of our ontology constructs [16]
  • Try to structurally split domain concepts from instance records. Concepts represent the nodes within the structure of the ontology (also known as classes, subject concepts or the TBox). Instances represent the data that populates that structure (also known as named entities, individuals or the ABox) [17]. Trying to keep the ABox and TBox separate enables easier maintenance, better understandability of the ontology, and better scalability and incorporation of new data repositories
  • Treat many concepts via “punning” as both classes and instances (that is, as either sets or members, depending on context). The “punning” technique enables “metamodeling,” such as treating something via its IRI as a set of members (such as Mammal being a set of all mammals) or as an instance (such as Endangered Mammal) when it is the object of a different contextual assertion. Use of “metamodeling” is often helpful to describe the overall conceptual structure of a domain. See endnote [18] for more discussion on this topic
  • Build ontologies incrementally. Because good ontologies embrace the open world approach [19], working toward these desired end states can also be incremental. Thus, in the face of common budget or deadline constraints, it is possible initially to scope domains as smaller or to provide less coverage in depth or to use a small set of predicates, all the while still achieving productive use of the ontology. Then, over time, the scope can be expanded incrementally. Much value can be realized by starting small, being simple, and emphasizing the pragmatic. It is OK to make those connections that are doable and defensible today, while delaying until later the full scope of semantic complexities associated with complete data alignment
  • Build modular ontologies that split your domain and problem space into logical clusters. Good ontology design, especially for larger projects, warrants a degree of modularity. An architecture of multiple ontologies often works together to isolate different work tasks so as to aid better ontology management. Also, try to use a core set of primitives to build up more complex parts. This is a kind of reuse within the same ontology, as opposed to reusing external ontologies and patterns. The corollary to this is: the same concepts are not created independently multiple times in different places in the ontology. Adhering to both of these practices tends to make ontology development akin to object-oriented programming
  • Assign domains and ranges to your properties. Domains apply to the subject (the left hand side of a triple); ranges to the object (the right hand side of the triple). Domains and ranges should not be understood as actual constraints, but as axioms to be used by reasoners. In general, domain for a property is the range for its inverse and the range for a property is the domain of its inverse. Use of domains and ranges will assist testing (see below) and help ensure the coherency of your ontology
  • Assign property restrictions, but do so sparingly and judiciously [20]. Use of property restrictions will assist testing (see below) and help ensure the coherency of your ontology
  • Use disjoint classes to separate classes from one another where the logic makes sense and dictates (if not explicitly stated, they are assumed to overlap)
  • Write the ontology in a machine-processable language such as OWL or RDF Schema (among others), and
  • Aggressively use annotation properties (see next) to promote the usefulness and human readability of the ontology.
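
The point above about domains and ranges being axioms rather than constraints can be illustrated with a minimal Python sketch. This is a toy illustration, not how any particular reasoner is implemented: the `ex:` names are hypothetical, triples are plain tuples, and each property is assumed to have at most one domain and one range. A triple whose subject is not yet typed is not rejected; instead, a new type is inferred for it.

```python
# Minimal sketch: rdfs:domain and rdfs:range treated as inference axioms.
# All ex: names are hypothetical; each property is assumed to have at most
# one domain and one range to keep the example short.

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """Return the rdf:type triples entailed by domain and range axioms."""
    domains = {s: o for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            # The subject of the triple is inferred to be in the domain class.
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            # The object of the triple is inferred to be in the range class.
            inferred.add((o, RDF_TYPE, ranges[p]))
    # Only report facts that were not already asserted.
    return inferred - set(triples)


triples = [
    ("ex:hasChild", RDFS_DOMAIN, "ex:Parent"),
    ("ex:hasChild", RDFS_RANGE, "ex:Person"),
    ("ex:Father", "ex:hasChild", "ex:Janie"),
]

new_facts = infer_types(triples)
for fact in sorted(new_facts):
    print(fact)
```

Nothing here stops someone from asserting `ex:hasChild` on an unexpected subject; the subject simply acquires the domain class. That is why disjoint classes and reasoner-based testing are needed to surface actual mistakes.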

Naming and Vocabulary Best Practices

  • Name all concepts as single nouns. Use CamelCase notation for these classes (that is, class names should start with a capital letter and not contain any spaces, such as MyNewConcept)
  • Name all properties as verb senses (so that triples may be actually read); e.g., hasProperty. Try to use mixedCase notation for naming these predicates (that is, begin with lower case but still capitalize thereafter and don’t use spaces)
  • Try to use common and descriptive prefixes and suffixes for related properties or classes (while they are just labels and their names have no inherent semantic meaning, it is still a useful way for humans to cluster and understand your vocabularies). For example, properties about languages or tools might contain suffixes such as ‘Language’ or ‘Tool’ for all related properties
  • Provide inverse properties where it makes sense, and adjust the verb senses in the predicates to accommodate. For example, <Father> <hasChild> <Janie> would be expressed inversely as <Janie> <isChildOf> <Father>
  • Give all concepts and properties a definition. The matching and alignment of things is done on the basis of concepts (not simply labels) which means each concept must be defined [21]. Providing clear definitions (along with the coherency of its structure) gives an ontology its semantics. Remember not to confuse the label for a concept with its meaning. (This approach also aids multi-linguality). In its own ontologies, Structured Dynamics uses the property of skos:definition, though others such as rdfs:comment or dc:description are also commonly used
  • Provide a preferred label annotation property that is used for human readable purposes and in user interfaces. For this purpose, Structured Dynamics uses the property of skos:prefLabel
  • Include a class “SemSet”, which means a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept. The umbel:SemSet construct enables a listing of individual members to be generated that provides the matching set for tagging and information extraction tasks. (As such, also include the prefLabel in the SemSet for proper lookup and tagging purposes.) The SemSet construct is similar to the “synsets” in Wordnet, but with a broader use understanding. This construct is an integral part of Structured Dynamics’ approach to using ontologies for information extraction and tagging of unstructured text
  • Try to assign logical and short names to namespaces used for your vocabularies, such as foaf:XXX, umbel:XXX or skos:XXX, with a maximum of five letters preferred
  • Enable multi-lingual capabilities in all definitions and labels. This is a rather complicated best practice in its own right. For the time being, it means being attentive to the xml:lang="en" (for English, in this case) property for all annotation properties
  • (If you disagree with these naming conventions, use your own, but in any event, be consistent!!).
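
The mechanical parts of these naming conventions can be checked automatically. What follows is a minimal sketch in Python, assuming names are written as prefix:LocalName; the function name `check_name` and the `ex:` and `ontology:` prefixes are hypothetical. A regular expression can only test case and spacing, so whether a class is really a single noun or a property really a verb sense still needs a human reader.

```python
import re

# CamelCase for classes (e.g. MyNewConcept), mixedCase for properties
# (e.g. hasProperty), and short lower-case namespace prefixes (five
# characters or fewer, following the preference stated above).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")
PREFIX = re.compile(r"^[a-z][a-z0-9]{0,4}$")


def check_name(qname, kind):
    """Return a list of convention violations for a name such as 'umbel:SemSet'.

    kind is either "class" or "property".
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        return [f"{qname}: missing namespace prefix"]
    problems = []
    if not PREFIX.match(prefix):
        problems.append(f"{qname}: prefix '{prefix}' is not a short lower-case name")
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    if not pattern.match(local):
        expected = "CamelCase" if kind == "class" else "mixedCase"
        problems.append(f"{qname}: local name '{local}' is not {expected}")
    return problems


for name, kind in [
    ("umbel:SemSet", "class"),
    ("skos:prefLabel", "property"),
    ("ex:my_concept", "class"),
    ("ontology:hasChild", "property"),
]:
    print(name, check_name(name, kind) or "ok")
```

A check like this could run whenever the ontology is edited, which helps with the final bullet above: whatever conventions you choose, apply them consistently.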

Documentation Best Practices

  • Like good software programs, a properly constructed and commented ontology is the first requirement of best practice documentation
  • The entire ontology vocabulary should be documented via a dedicated system that allows finding, selecting and editing of any and all ontology terms and their properties
  • The methodologies should be documented for ontology construction and maintenance, including naming, selection, completeness and other criteria. Documents such as this one and others in this series provide examples of important supplementary documentation regarding methodology and practice
  • Provide and support a complete TechWiki-like documentation system covering use cases, best practices, evaluation and testing metrics, tools installation and use, and all aspects of the ontology lifecycle [22]
  • Develop a complete graph of the ontology and make it available via graph visualization tools to aid understanding of the ontology in its complete aspect [23], and
  • Other ample diagrams and flowcharts should also be prepared and made available for knowledge workers’ use. UML diagrams, for example, might be included here, but general workflows and concept relationships should be explicated in any case through visual means. Such diagrams are much easier to understand and follow than the actual ontology specification.

Organizational and Collaborative Best Practices

  • Collaboration is an implementation best practice [24]
  • Re-use of already agreed-upon structures and vocabularies respects prior investments and needs to be emphasized
  • Improved processes for consensus making, including tools support, must be found to enable work groups to identify and decide upon terminology, definitions, alternative labels (SemSets), and relations between concepts. These processes need not be at the formal ontology level, but at the level of the concept graph underlying the ontology [24].

Testing Best Practices

  • Test new concepts, aided by proper domain, range and property restrictions, by invoking reasoners so that inconsistencies can be detected [25]
  • Test new properties, aided by invoking reasoners, which will identify inconsistencies [25]
  • Test via external class assignments, by linking to classes in external ontologies, which acts to ‘explode the domain’ [26]
  • Use external knowledge bases and ontologies, such as Cyc or UMBEL [27], to conduct coherency testing for the basic structure and relationships in the ontology
  • Evolve the ontology specification to include necessary and sufficient conditions [25] to aid more complete reasoner testing for consistency and coherence.
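
To give a feel for what a reasoner-based consistency test catches, here is a deliberately tiny Python sketch of one such check: finding individuals that have been typed into two classes declared disjoint. It stands in for a real reasoner only as an illustration, since it ignores subclass hierarchies, property restrictions and everything else a reasoner handles. The function name and all `ex:` names are hypothetical.

```python
def find_disjoint_violations(type_assertions, disjoint_pairs):
    """Return (individual, class_a, class_b) for each individual that is
    asserted to belong to both members of a disjoint class pair."""
    classes_of = {}
    for individual, cls in type_assertions:
        classes_of.setdefault(individual, set()).add(cls)

    violations = []
    for individual, classes in sorted(classes_of.items()):
        for class_a, class_b in disjoint_pairs:
            if class_a in classes and class_b in classes:
                violations.append((individual, class_a, class_b))
    return violations


# Hypothetical test data: ex:Harry is typed as both an eagle and a mammal.
type_assertions = [
    ("ex:Harry", "ex:Eagle"),
    ("ex:Harry", "ex:Mammal"),
    ("ex:Janie", "ex:Person"),
]
disjoint_pairs = [("ex:Eagle", "ex:Mammal")]

violations = find_disjoint_violations(type_assertions, disjoint_pairs)
for individual, class_a, class_b in violations:
    print(f"{individual} cannot be both {class_a} and {class_b}")
```

Without the disjointness declaration, the double typing of `ex:Harry` would pass silently, which is the practical argument for declaring disjoint classes where the logic warrants it.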

Best Practices for Adaptive Ontologies

In the case of ontology-driven applications using adaptive ontologies [28], there are also additional instructions contained in the system (via administrative ontologies) that tell the system which types of widgets need to be invoked for different data types and attributes. This is different from the standard conceptual schema, but is nonetheless essential to how such applications are designed.

  • Use the structWSF middleware layer [29] as the abstract access point to:
    • Create, update, delete or otherwise manage data records
    • Browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria, or
    • Take one of these result sets and progress it through various workflows involving specialized analysis, applications, or visualization.
  • Supplement the domain ontology with a semantic component ontology for the purposes of guiding data widget display and visualization [30], and
  • Supplement the domain ontology with the irON (instance record Object Notation) for dataset exchange and interoperability [31].

The administrative ontologies supporting these applications are managed differently than the standard domain ontologies that are the focus of most of the best practices above. Nonetheless, some of the domain ontology best practices work in tandem with them, and the combination of the two is what we call adaptive ontologies.
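
As a rough illustration of the idea that administrative instructions tell the system which widgets to invoke for which data types, consider the Python sketch below. Everything in it is an assumption made for illustration: in an actual deployment this mapping would live in an ontology rather than a Python dictionary, and the data types, attribute names and widget assignments shown here are hypothetical.

```python
# Hypothetical stand-in for an administrative ontology: a mapping from
# attribute data types to the display widget that should handle them.
WIDGET_FOR_TYPE = {
    "geo:point": "map",
    "xsd:decimal": "dashboard",
    "xsd:string": "story viewer",
}
DEFAULT_WIDGET = "browse/search"


def widgets_for_record(record_types):
    """Return the widgets to invoke for a record, in first-seen order.

    record_types maps each attribute of the record to its data type.
    Unknown data types fall back to DEFAULT_WIDGET.
    """
    widgets = []
    for attribute, datatype in record_types.items():
        widget = WIDGET_FOR_TYPE.get(datatype, DEFAULT_WIDGET)
        if widget not in widgets:
            widgets.append(widget)
    return widgets


# Hypothetical record for a neighborhood indicator.
record_types = {
    "ex:location": "geo:point",
    "ex:povertyRate": "xsd:decimal",
    "ex:sourcePage": "xsd:anyURI",
}

selected = widgets_for_record(record_types)
print(selected)
```

The design point is that the application code contains no knowledge of particular datasets. Changing which widget handles which kind of data means editing the mapping, not the application.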


[1] This posting is part of a current series on ontology development and tools, now permanently archived and updated on the OpenStructs TechWiki. The series began with An Executive Intro to Ontologies, then continued with an update of the prior Ontology Tools listing, which now contains 185 tools. It progressed to a survey of ontology development methodologies. That led to a presentation of a new, Lightweight, Domain Ontologies Development Methodology. That piece was then expanded to address A New Landscape in Ontology Development Tools. This portion completes the series.
[2] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[3] Matthew Horridge et al., 2009. A Practical Guide to Building OWL Ontologies Using Protégé 4 and CO-ODE Tools, manual prepared by the University of Manchester, March 13, 2009. 108 pp. See http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/resources/ProtegeOWLTutorialP4_v1_2.pdf.
[4] Specialty search engines for ontologies include Swoogle, FalconS, Watson, Sindice and SWSE. In addition, one can use a general search engine such as Google with a search query such as <topic> owl:equivalentClass filetype:owl. Note the filetype might also include RDF or a variant such as N3, and other language-specific constructs of interest can also be substituted for the owl:equivalentClass.
[5] The TONES Ontology Repository is primarily designed to be a central location for ontologies that might be of use to tools developers for testing purposes. It has a nice browse facility, as well as filtering by OWL vocabulary. The system contains about 220 ontologies and is powered by the OWL API.
[6] OntologyDesignPatterns.org is a semantic Web portal dedicated to ontology design patterns (ODPs). The portal was started under the NeOn project, which still partly supports its development.
[7] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. “Ontology Engineering: A Reality Check,” in Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics ODBASE2006, 2006. See http://ontocom.ag-nbi.de/docs/odbase2006.pdf .
[8] See http://obofoundry.org/wiki/index.php/OBO_Foundry_Principles.
[9] Barry Smith et al., 2007. “The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration,” in Nature Biotechnology 25: 1251 – 1255, published online 7 November 2007; see http://www.nature.com/nbt/journal/v25/n11/pdf/nbt1346.pdf.
[10] See the NeOn networked ontologies project at http://www.neon-project.org/. The four-year project began in 2006 and its first open source toolkit was released by the end of 2007. OWL features were added in 2008-09. NeOn has since been completed, though its toolkit and plug-ins can still be downloaded as open source.
[11] Brian Sletten, 2008. “Applying SKOS Concept Schemes,” on the DevX Web site, July 22, 2008; see http://www.devx.com/semantic/Article/38629.
[12] M. K. Bergman, 2009. “Confronting Misconceptions with Adaptive Ontologies,” AI3:::Adaptive Information blog, Aug. 17, 2009.
[13] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.
[14] Core ontologies are Dublin Core, DC Terms, Event, FOAF, GeoNames, SKOS, Timeline, and UMBEL. The criteria considered in nominating an existing ontology to “core” status are that it should be general; highly used; universal; backed by broad committee or community support; well done and documented; and easily understood. Though less universal, there are also a number of secondary ontologies, namely BIBO, DOAP, and SIOC.
[15] See Fausto Giunchiglia, Maurizio Marchese and Ilya Zaihrayeu, 2006. “Encoding Classifications into Lightweight Ontologies,” see http://www.science.unitn.it/~marchese/pdf/encoding%20classifications%20into%20lightweight%20ontologies_JoDS8.pdf. Also, M. K. Bergman, 2010. “A New Methodology for Building Lightweight, Domain Ontologies,” AI3:::Adaptive Information blog, Sept. 1, 2010.
[16] Alistair Miles and Sean Bechhofer, eds., 2009. SKOS Simple Knowledge Organization System Reference, W3C Recommendation, 18 August 2009. See http://www.w3.org/TR/skos-reference/. Some of the common SKOS predicates used in our ontologies include skos:definition, skos:prefLabel, skos:altLabel, skos:broaderTransitive, skos:narrowerTransitive.
[17] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) to which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure. These distinctions have their grounding in description logics.
[18] In the domain ontologies that are the focus here, we often want to treat our concepts as both classes and instances of a class. This is known as “metamodeling” or “metaclassing” and is enabled by “punning” in OWL 2. For example, here is a case cited on the OWL 2 wiki entry on “punning”:
People sometimes want to have metaclasses. Imagine you want to model information about the animal kingdom. Hence, you introduce a class a:Eagle, and then you introduce instances of a:Eagle such as a:Harry.

(1) a:Eagle rdf:type owl:Class
(2) a:Harry rdf:type a:Eagle

Assume now that you want to say that “eagles are an endangered species”. You could do this by treating a:Eagle as an instance of a metaconcept a:Species, and then stating additionally that a:Eagle is an instance of a:EndangeredSpecies. Hence, you would like to say this:

(3) a:Eagle rdf:type a:Species
(4) a:Eagle rdf:type a:EndangeredSpecies.

This example comes from Boris Motik, 2005. “On the Properties of Metamodeling in OWL,” paper presented at ISWC 2005, Galway, Ireland, 2005.

“Punning” was introduced in OWL 2 and enables the same IRI to be used as a name for both a class and an individual. However, the direct model-theoretic semantics of OWL 2 DL accommodates this by understanding the class Father and the individual Father as two different views on the same IRI, i.e., they are interpreted semantically as if they were distinct. The technique listed in the main body triggers this treatment in an OWL 2-compliant editor. See further Pascal Hitzler et al., eds., 2009. OWL 2 Web Ontology Language Primer, a W3C Recommendation, 27 October 2009; see http://www.w3.org/TR/owl2-primer/.
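
The four statements above can be sketched in a few lines of Python. This is an illustration only: the triples are plain tuples rather than anything handled by a real OWL toolkit, and the helper names (`types_of`, `instances_of`, `is_punned`) are hypothetical. It shows only that one name appears in both a class role and an individual role; it does not model the OWL 2 DL semantics, under which the two views are interpreted as distinct.

```python
# Illustrative only: triples held as plain tuples, not a real OWL toolkit.
RDF_TYPE = "rdf:type"

triples = {
    ("a:Eagle", RDF_TYPE, "owl:Class"),            # (1) Eagle used as a class
    ("a:Harry", RDF_TYPE, "a:Eagle"),              # (2) Harry is a member of that class
    ("a:Eagle", RDF_TYPE, "a:Species"),            # (3) Eagle used as an individual
    ("a:Eagle", RDF_TYPE, "a:EndangeredSpecies"),  # (4) Eagle used as an individual
}


def types_of(name):
    """Classes that `name` is asserted to be an instance of (individual view)."""
    return {o for s, p, o in triples if s == name and p == RDF_TYPE}


def instances_of(name):
    """Individuals asserted to be members of `name` (class view)."""
    return {s for s, p, o in triples if o == name and p == RDF_TYPE}


def is_punned(name):
    """True when the same name has members and is itself a member of some class."""
    return bool(instances_of(name)) and bool(types_of(name) - {"owl:Class"})


if __name__ == "__main__":
    for name in ("a:Eagle", "a:Harry"):
        print(name, "punned:", is_punned(name))
```

Here `a:Eagle` has both a member (`a:Harry`) and types of its own (`a:Species`, `a:EndangeredSpecies`), which is the dual use that punning permits.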

[19] There is a role and place for closed world assumption (CWA) ontologies, though Structured Dynamics does not engage in them.

CWA is the traditional perspective of relational database systems within enterprises. The premise of CWA is that which is not known to be true is presumed to be false; or, any statement not known to be true is false. Another way of saying this is that everything is prohibited until it is permitted. CWA works well in bounded systems such as known product listings or known customer rosters, and is one reason why it is favored for transaction-oriented systems where completeness and performance are essential. In an ontology sense, CWA works best for bounded engineering environments such as aeronautics or petroleum engineering. Closed world ontologies also tend to be much more complicated with many nuanced predicates, and can be quite expensive to build.

The open world assumption (OWA), on the other hand, is premised on the idea that the absence of a given assertion or fact does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity, and everything is permitted until it is prohibited. As a result, open world works better in knowledge environments that incorporate external information, such as business intelligence, data warehousing, data integration and federation, and knowledge management.

See further, M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, Dec. 21, 2009.
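
The difference between the two assumptions can be seen in how each answers a question about a fact that was never recorded. The following is a minimal sketch, assuming a hypothetical fact base of simple triples; the names `known_facts`, `ask_closed_world` and `ask_open_world` are made up for illustration and do not come from any particular system.

```python
from typing import Optional

# Hypothetical fact base: only positive assertions are stored.
known_facts = {
    ("Acme", "isCustomerOf", "WidgetCo"),
}


def ask_closed_world(fact) -> bool:
    """CWA: anything not known to be true is presumed false."""
    return fact in known_facts


def ask_open_world(fact, known_false=frozenset()) -> Optional[bool]:
    """OWA: a missing fact is unknown (None) unless it was explicitly negated."""
    if fact in known_facts:
        return True
    if fact in known_false:
        return False
    return None


# A fact that nobody has asserted or denied.
missing = ("Globex", "isCustomerOf", "WidgetCo")

if __name__ == "__main__":
    print("closed world:", ask_closed_world(missing))
    print("open world:  ", ask_open_world(missing))
```

Under the closed world reading the unrecorded customer is simply not a customer; under the open world reading the question stays open until more information arrives.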

[20] See [3] for a good description of property restrictions in Section 4 and Appendix A.
[21] As another commentary on the importance of definitions, see http://ontologyblog.blogspot.com/2010/09/physician-decries-lack-of-definitions.html.
[22] The technical wiki (TechWiki) is the central repository for all documentation related to OpenStructs projects. TechWiki is the location for users and interested parties to learn about these projects and their applications, and for developers to author and write about their use and best practices. Both the TechWiki’s content and its software and organizational structure may be downloaded for free for setting up similar local technical documentation.
[23] See M. K. Bergman, 2008. “Large-scale RDF Graph Visualization Tools,” AI3:::Adaptive Information blog, Jan. 28, 2008; and M. K. Bergman, 2008. “Cytoscape: Hands-down Winner for Large-scale Graph Visualization,” AI3:::Adaptive Information blog, Jan. 28, 2008.
[24] The central role of ontologies is to describe a “worldview” and in specific organizations this means a shared understanding of the concepts, relations and terminology to describe the participants’ common domain. In turn, these shared understandings establish the semantics for how to effect communication and understanding within the population of domain users. All of this means that finding ways to identify and agree upon shared vocabularies and understandings is central to the task of modeling (creating an ontology) for the domain.

Sometimes this perception of shared views is too strictly interpreted as needing to have one and only one understanding of concepts and language. Far from it. One of the strengths of ontologies and language modeling within them is that multiple terms for the same concept or slight differences in understandings about nearly similar concepts can be accommodated. It is perfectly OK to have differences in terminology and concept understandings so long as those differences are also captured and explicated within the ontology. The recommendations herein that all concepts and terminology be defined, that SemSets be used to capture alternative ways to name concepts, and that concepts often be treated as both classes and instances are some of the best practices that reflect this approach.

So, while consensus building and collaboration methods are at the heart of effective ontology building, those methods need not also strive for an imposition of language and concepts by fiat. In fact, trying to do so undercuts the ability of the collaborative process to lead to greater shared understandings.

[25] See [3] for a good description of various testing and consistency checks in Sections 4.9 to 4.14.
[26] See Frédérick Giasson, 2008. “Exploding the Domain,” from his blog, April 20, 2008. ‘Exploding the domain’ means what happens when internal ontology concepts are linked to related ones on the external Web, which helps to bring in more information and context about the concept. It is also a way to test the coherence of the original concept.
[27] Already vetted knowledge bases can be a good reference testbed for testing the coherence of concepts in a new domain ontology. If the domain ontology describes concepts quite differently than standard practice (Wikipedia, Cyc and UMBEL are good for testing this), or if relationships between concepts are greatly at variance (Cyc and UMBEL are good for this), then there are likely coherency problems. In other domains other reference knowledge bases, more specific to the domain, can be used in similar ways.
[28] Structured Dynamics’ ontology-driven apps are generic applications, the operations of which are guided by the instructions and nature of the underlying data that feeds them. For example, in the case of a standard structured data display (say, a simple table like a Wikipedia infobox), such generic design includes templates tailored to various instance types (say, locational information presenting on a map versus people information warranting an image and vital statistics). Alternatively, in the generic design for a data visualization application using Adobe Flash, the information output of the results set contains certain formats and attributes, keyed by an administrative ontology linked by data type to a domain ontology’s results sets.

These ontology-driven apps, then, are informed by structured results sets that are output in a form suitable to various intended applications. This output form can include a variety of serializations, formats or metadata. This flexibility of output is tailored to and responsive to particular generic applications; it is what makes our ontologies “adaptive”. Using this structure, it is possible to “drive” queries and results sets selections either via direct HTTP request or via simple dropdown selections on HTML forms. Similarly, it is possible with a single parameter change to drive either a visualization app or a structured table template from the equivalent query request. Ontology-driven apps through this ontology and architecture design thus provide two profound benefits. First, the entire system can be driven via simple selections or interactions without the need for any programming or technical expertise. And, second, simple additions of new and minor output converters can work to power entirely new applications available to the system.

[29] The structWSF Web services framework is generally RESTful middleware that provides a bridge between existing content and structure and content management systems and available indexing engines and RDF data stores. structWSF is a platform-independent means for distributed collaboration via an innovative dataset access paradigm. It has about twenty embedded Web services. See http://openstructs.org/structwsf.
[30] A semantic component is a Flex component that takes record descriptions and schema as input, and then outputs some (possibly interactive) visualizations of that record. Depending on the logic described in the input schema and the input record descriptions, the semantic component will behave differently to optimize its presentation to the users. About a dozen semantic components are available from the Semantic Component (Flex) Library. The Semantic Component Ontology is the governing structure for these schema.
[31] irON (instance record and Object Notation) is an abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON). The notation specification includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. Profiles and examples and code parsers and converters are also provided for the irXML, irJSON and commON serializations.

Posted by AI3's author, Mike Bergman

Posted on September 13, 2010 at 1:18 am in irON, Ontologies, Ontology Best Practices, Semantic Enterprise, Semantic Web, UMBEL | Comments (3)
The URI link reference to this post is: http://www.mkbergman.com/911/a-reference-guide-to-ontology-best-practices/
The URI to trackback this post is: http://www.mkbergman.com/911/a-reference-guide-to-ontology-best-practices/trackback/
Date:   August 9, 2010

Ontologies are the structural frameworks for organizing information on the semantic Web and within semantic enterprises. They provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness; that is, their ability to represent conceptual relationships. Ontologies can be layered on top of existing information assets, which means they are an enhancement and not a displacement for prior investments. And ontologies may be developed and matured incrementally, which means their adoption may be cost-effective as benefits become evident [1].

What Is an Ontology?

Ontology may be one of the more daunting terms for those exposed for the first time to semantic technologies. Not only is the word long and without common antecedents, but it is also a term that has widely divergent use and understanding within the community. It can be argued that this not-so-little word is one of the barriers to mainstream understanding of the semantic Web.

The root of the term is the Greek ontos, or being or the nature of things. Literally — and in classical philosophy — ontology was used in relation to the study of the nature of being or the world, the nature of existence. Tom Gruber, among others, made the term popular in relation to computer science and artificial intelligence about 15 years ago when he defined ontology as a “formal specification of a conceptualization.”

Much like taxonomies or relational database schema, ontologies work to organize information. No matter what the domain or scope, an ontology is a description of a world view. That view might be limited and minuscule, or it might be global and expansive. However, unlike those alternative hierarchical views of concepts such as taxonomies, ontologies often have a linked or networked “graph” structure. Multiple things can be related to other things, all in a potentially multi-way series of relationships.

[Figures: Example Taxonomy Structure; Example Ontology Structure]
A distinguishing characteristic of ontologies compared to conventional hierarchical structures is their degree of connectedness, their ability to model coherent, linked relationships

Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Ontologies thus provide a similar role for the organization of data that is provided by relational data schema. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data.
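
To make the structural contrast concrete, here is a minimal sketch in Python. The automobile concepts, the relation names (`subClassOf`, `hasBrand`, `hasColor`, `manufactures`) and the `neighbors` helper are all hypothetical, chosen only for illustration. A taxonomy fits in a simple child-to-parent mapping, while an ontology needs typed links that can run between any two concepts.

```python
# Taxonomy: each concept has exactly one parent, so a dict is enough.
taxonomy_parent = {
    "sedan": "automobile",
    "pick-up": "automobile",
    "automobile": "vehicle",
}

# Ontology: typed edges, and any concept may link to any other.
ontology_edges = [
    ("sedan", "subClassOf", "automobile"),
    ("pick-up", "subClassOf", "automobile"),
    ("automobile", "subClassOf", "vehicle"),
    ("automobile", "hasBrand", "Honda"),
    ("automobile", "hasBrand", "Ford"),
    ("automobile", "hasColor", "red"),
    ("Honda", "manufactures", "sedan"),
]


def neighbors(concept):
    """Every concept directly linked to `concept`, with relation and direction."""
    outgoing = [(p, o, "out") for s, p, o in ontology_edges if s == concept]
    incoming = [(p, s, "in") for s, p, o in ontology_edges if o == concept]
    return outgoing + incoming


if __name__ == "__main__":
    print("taxonomy parent of sedan:", taxonomy_parent["sedan"])
    for relation, other, direction in neighbors("sedan"):
        print("ontology link:", relation, other, direction)
```

In the taxonomy, "sedan" can only be reached by walking down from "vehicle"; in the graph it can also be reached from "Honda", which is the kind of flexible entry point the hierarchy alone cannot offer.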

When one uses the idea of “world view” as synonymous with an ontology, it is not meant to be cosmic, but simply a way to convey how a given domain or problem area can be described. One group might choose to describe and organize, say, automobiles, by color; another might choose body styles such as pick-ups or sedans; or still another might use brands such as Honda and Ford. None of these views is inherently “right” (indeed multiples might be combined in a given ontology), but each represents a particular way, a “world view,” of looking at the domain.

Though there is much latitude in how a given domain might be described, there are both good ontology practices and bad ones. We offer some views as to what constitutes good ontology design and practice in the concluding section.

What Are Its Benefits?

A good ontology offers a composite suite of benefits not available to taxonomies, relational database schema, or other standard ways to structure information. Among these benefits are:

  • Coherent navigation by enabling the movement from concept to concept in the ontology structure
  • Flexible entry points because any specific perspective in the ontology can be traced and related to all of its associated concepts; there is no set structure or manner for interacting with the ontology
  • Connections that highlight related information and aid and prompt discovery without requiring prior knowledge of the domain or its terminology
  • Ability to represent any form of information, including unstructured (say, documents or text), semi-structured (say, XML or Web pages) and structured (say, conventional databases) data
  • Inferencing, whereby specifying one concept (say, mammals) tells us that we are also referring to a related concept (say, that mammals are a kind of animal)
  • Concept matching, which means that even though we may describe things somewhat differently, we can still match to the same idea (such as glad or happy both referring to the concept of a pleasant state of mind)
  • As a result, we can also integrate external content by proper matching and mapping of these concepts
  • A framework for disambiguation by nature of the matching and analysis of concepts and instances in the ontology graph, and
  • Reasoning, which is the ability to use the coherence and structure itself to inform questions of relatedness or to answer questions.
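
The inferencing benefit in the list above can be sketched as a walk up a chain of "is a kind of" assertions. This is a minimal illustration, assuming a hypothetical set of (child, parent) pairs; a production system would use a reasoner over OWL or RDF Schema rather than this hand-rolled loop.

```python
# Hypothetical subsumption assertions, stored as (child, parent) pairs.
sub_class_of = {
    ("dog", "mammal"),
    ("mammal", "animal"),
    ("animal", "living thing"),
}


def superclasses(concept):
    """Every concept that `concept` is, directly or transitively, a kind of."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in sub_class_of:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


if __name__ == "__main__":
    # Only "dog is a mammal" was asserted, yet "dog is an animal" follows.
    print(sorted(superclasses("dog")))
```

Nothing in the data says a dog is an animal; that statement is inferred from the two assertions that connect them.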

How Are Ontologies Used?

The relationship structure underlying an ontology provides an excellent vehicle for discovery and linkages. “Swimming through” this relationship graph is the basis of the Concept Explorer (also known as the Relation Browser) and similar widgets.

The most prevalent use of ontologies at present is in semantic search. Semantic search has benefits over conventional search in terms of being able to make inferences and matches not available to standard keyword retrieval.

The relationship structure also is a powerful and more general and more nuanced way to organize information. Concepts can relate to other concepts through a richness of vocabulary. Such predicates might capture subsumption, precedence, part-of relationships (mereology), preferences, or importance along virtually any metric. This richness of expression and relationships can also be built incrementally over time, allowing ontologies to grow and develop in sophistication and use as desired.

The pinnacle application for ontologies, therefore, is as coherent reference structures whose purpose is to help map and integrate other structures and information. Given the huge heterogeneity of information both within and without organizations, the use of ontologies as integration frameworks will likely emerge as their most valuable use.

What Makes for a Good Ontology?

Good ontology practice has aspects both in terms of scope and in terms of construction.

Scope Considerations

Here are some scoping and design questions that we believe should be answered in the positive in order for an ontology to meet good practice standards:

  • Does the ontology provide balanced coverage of the subject domain? This question gets at the issue of properly scoping and bounding the subject coverage of the ontology. It also means that the breadth and depth of the coverage are roughly equivalent across its scope
  • Does the ontology embed its domain coverage into a proper context? A major strength of ontologies is their potential ability to interoperate with other ontologies. Re-using existing and well-accepted vocabularies and including concepts in the subject ontology that aid such connections is good practice. The ontology should also have sufficient reference structure for guiding the assignment of what content “is about”
  • Are the relationships in the ontology coherent? The essence of coherence is that it is a state of logical, consistent connections, a logical framework for integrating diverse elements in an intelligent way. So while context supplies a reference structure, coherence means that the structure makes sense. Is the hip bone connected to the thigh bone, or is the skeleton incorrect?
  • Has the ontology been well constructed according to good practice? See next.

If these questions can be answered affirmatively, then we would deem the ontology ready for production-grade use.

Fundamental to the whole concept of coherence is the fact that experts and practitioners within domains have been looking at the questions of relationships, structure, language and meaning for decades. Though perhaps we now finally have a broadly useful data and logic model in RDF, the fact remains that massive time and effort have already been expended to codify some of these understandings in various ways and at various levels of completeness and scope. Good practice also means, therefore, that ontologies should take maximum leverage from existing structural and vocabulary assets as their springboard.

And, because good ontologies also embrace the open world approach, working toward these desired end states can also be incremental. Thus, in the face of common budget or deadline constraints, it is possible initially to scope domains as smaller or to provide less coverage in depth or to use a small set of predicates, all the while still achieving productive use of the ontology. Then, over time, the scope can be expanded incrementally.

Construction Considerations

To achieve their purposes, ontologies must be both human-readable and machine-processable. Also, because they represent conceptual structures, they must be built with a certain composition.

Good ontologies therefore are constructed such that they have:

  • Concept definitions – the matching and alignment of things is done on the basis of concepts (not simply labels) which means each concept must be defined
  • A preferred label that is used for human readable purposes and in user interfaces
  • A “semset” – which means a series of alternate labels and terms to describe the concept. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept
  • Clearly defined relationships (also known as properties, attributes, or predicates) for relating two things to one another
  • All of which is written in a machine-processable language such as OWL or RDF Schema (among others).
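
The construction elements listed above can be pictured as a small record per concept. The sketch below is an assumption-laden illustration: the `Concept` class, its field names, the `ex:Happiness` identifier and the `match` helper are all hypothetical, and a real ontology would express the same things in a language such as OWL or RDF Schema.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str            # identifier for the concept
    pref_label: str     # preferred label shown in user interfaces
    definition: str     # what the concept means, independent of its labels
    semset: set = field(default_factory=set)  # alternate labels, jargon, synonyms

    def labels(self):
        """All labels for this concept, lower-cased for matching."""
        return {self.pref_label.lower()} | {s.lower() for s in self.semset}


# Hypothetical example concept.
concepts = [
    Concept(
        uri="ex:Happiness",
        pref_label="happiness",
        definition="A pleasant state of mind.",
        semset={"happy", "glad", "joyful"},
    ),
]


def match(term):
    """URIs of concepts whose preferred label or semset contains `term`."""
    needle = term.strip().lower()
    return [c.uri for c in concepts if needle in c.labels()]


if __name__ == "__main__":
    print(match("Glad"), match("happy"), match("sad"))
```

Because "glad" and "happy" both sit in the same semset, they resolve to the same concept even though neither is the preferred label.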

In the case of ontology-driven applications using adaptive ontologies, there are also additional instructions contained in the system (often via administrative ontologies) that tell the system which types of widgets need to be invoked for different data types and attributes. This is different than the standard conceptual schema, but is nonetheless essential to how such applications are designed.
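
One way to picture these additional instructions is as a lookup from data type to display widget. The sketch below is hypothetical throughout: the type names, widget names and `widgets_for` function are invented for illustration and are not the vocabulary of any actual administrative ontology.

```python
# Hypothetical administrative mapping: declared data type -> widget to invoke.
WIDGET_FOR_TYPE = {
    "geo:Point": "map",
    "foaf:Person": "profile_card",
    "xsd:decimal": "bar_chart",
}
DEFAULT_WIDGET = "table"  # fallback when no specific widget is registered


def widgets_for(record):
    """Choose a widget for each attribute of a record from its declared type."""
    return {
        attribute: WIDGET_FOR_TYPE.get(data_type, DEFAULT_WIDGET)
        for attribute, data_type in record.items()
    }


if __name__ == "__main__":
    # Attribute name -> declared data type for one hypothetical record.
    record = {"location": "geo:Point", "income": "xsd:decimal", "notes": "xsd:string"}
    print(widgets_for(record))
```

The application code stays generic; changing which widget appears for a given kind of data means editing the mapping, not the program.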


[1] This posting was at the request of a couple of Structured Dynamics’ customers that desired a way to describe ontologies to non-technical management. For a more in-depth treatment, see M.K. Bergman, 2007. “An Intrepid Guide to Ontologies,” AI3:::Adaptive Information blog, May 16, 2007.

Posted by AI3's author, Mike Bergman

Posted on August 9, 2010 at 12:53 am in Ontologies, Ontology Best Practices, Semantic Enterprise, Semantic Web | Comments (7)
The URI link reference to this post is: http://www.mkbergman.com/900/an-executive-intro-to-ontologies/
The URI to trackback this post is: http://www.mkbergman.com/900/an-executive-intro-to-ontologies/trackback/
Date:   July 15, 2010

Cisco Video is a Good Starting Intro for Management

Like the seminal linked data publication by PricewaterhouseCoopers of about a year ago (see PWC Dedicates Quarterly Technology Forecast to Linked Data, May 29, 2009), a video released by Cisco yesterday is another signal of the emergence of the semantic enterprise.

The Cisco tech brief on The Semantic Enterprise is a quite accessible — but a bit eerie — seven-minute introduction.  The video was prepared by Cisco’s Internet Business Solutions Group (IBSG), with Shaun Kirby, its Director of Innovations Architectures, as the narrator:

YouTube: http://www.youtube.com/watch?v=3lUzs2I8BKI

Well, as for being eerie, when the video first came up, I thought I was looking at an advanced, next generation avatar, perhaps a reincarnation of Douglas Adams’ Hyperland. Maybe this semantic stuff was closer at hand than we thought!

But, as it turned out, that first blush was only a reaction to how the video was shot. As it gets rolling, the Cisco video is extremely well done and informative. It is a great intro for sharing with management when contemplating your own moves to becoming a semantic enterprise.

I suggest you first view — and then bookmark — this one.

Posted by AI3's author, Mike Bergman

Posted on July 15, 2010 at 3:06 pm in Information Automation, Semantic Enterprise | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/897/another-milestone-in-semantic-enterprise-awareness/
The URI to trackback this post is: http://www.mkbergman.com/897/another-milestone-in-semantic-enterprise-awareness/trackback/
Date:   July 12, 2010

Benefits from an Incremental Approach

Using Incremental, Low-risk Semantic and Open World Approaches

OK. So, you’re looking at your garage … or your bedroom closet … or your office and its files. They are a mess, and you can’t find anything and you can’t stuff anything more into the nooks, cubbies, crannies or cabinets. What do you do?

Well, when you finally get fed up and have a rainy day or some other excuse, you tackle the mess. Maybe you grab a big mug of coffee to prepare for the pending battle. Maybe you strip down to comfort clothes. Then, if you’re like me, you begin to organize stuff into piles. Labeled piles and throwaway piles and any other piles that can provide a means to start bringing order to the chaos.

In the semantic Web world, there is a phrase coined by Jim Hendler that captures this approach: A little semantics goes a long way [1]. A little semantics, just like your labeled piles, helps to bring order to information chaos.

Mind you, this is not fancy or expensive stuff. In the case of my office, it is colored sheets of paper labeled with Magic Markers as “Taxes” or “Internal” or “Blog Posts” or whatever. Then, I begin sifting and distributing. In the case of the semantic world, these are classifying things into like categories and simply relating them to other categories with simple relationships, such as “is Part Of” or “is Narrower Than”.

Of course, I could have approached my mess in a different way. I could have hired an efficiency expert to come in, interview me and all of my employees and colleagues, gotten a written analysis and report, and then committed to a multi-week project to completely store and place every single last piece of paper in my office or organize every rake and set of abandoned golf clubs in my garage. When done, I would have shelled out much money and I suspect still not have been able to find anything.

Sort of sounds like the traditional way IT does its business, doesn’t it? To clean up their information messes, enterprises need to find a better strategy.

I returned not long ago from the SemTech conference, which overall was quite an excellent show. But despite its emphasis on semantic technologies and their usefulness to businesses and enterprises, I found one critical theme unspoken: the ability of semantic approaches to change how enterprise IT actually does business. New ways have got to be found to clean up the many and growing information piles emerging all around us.

The Changing Nature of IT

IT is — and has been — going through a fundamental set of changes for decades. In the last decade, these changes have led to lowered relative spending, a shift in spending priorities toward services, less innovation, and less productivity. Some data and observations by researchers and analysts document these trends.

The following chart, using US Bureau of Economic Analysis data [2], shows the clear 50-year trend in declining hardware costs for enterprises, mostly resulting from the observation known as Moore’s Law. These massive hardware cost reductions (logarithmic scale) have also resulted in lower prices for IT as a whole. In 2008, for example, total relative IT prices were about two-thirds what they were a mere decade earlier:

US IT Prices in Relation to Each Other, 1960 - 2008

Source: M.K. Bergman and Bureau of Economic Analysis [2] (click for full size)

In contrast, relative prices for software and services have remained remarkably flat over this entire period, including for the past decade. This is somewhat surprising given the emergence of packaged software and more recently open source. However, relative percentage expenditures for custom software and software developed in-house have also remained strong over the past decade [3].

The mid- to late-1990s represented the high-water mark on many bases for enterprise IT, expenditures and vendors. Roughly in 1997 or so, the number of public enterprise software vendors peaked as did venture funding [4] and relative expenditures for IT in relation to GDP. There was a major uptick in relation to preparing for Y2K and a major downtick due to the dot-com bubble, and then of course the past two years or so have seen a global economic downturn. But, as the figure below shows (red), the long-term trend tends to suggest a relative plateau for IT expenditures in relation to GDP somewhat around 2000:

IT and Software Expenditures in Relation to GDP, 1960 - 2008

Source: M.K. Bergman and Bureau of Economic Analysis [2] (click for full size)

Yet, like the first chart, software seems to be bucking this trend (blue lines above). Though perhaps the rate of growth in expenditures for software is slowing a bit, it is still on a growth upslope, especially in relation to overall IT expenditures. The next chart, in fact, specifically compares software expenditures to total IT expenditures. Software expenditures are some 40% higher in relation to total IT than they were a mere decade ago:

US Software Expenditures in Relation to Total IT, 1960 - 2008

Source: M.K. Bergman and Bureau of Economic Analysis [2] (click for full size)

The mix of these software expenditures is also changing in major ways while stagnating in others.

The changing aspect is coming about from the shift of expenditures from license and maintenance fees to services. A number of software vendors began to see revenues from services overcome that from licensing in the 1990s. By the early 2000s, this was true for the enterprise software sector as a whole [4]. Today, service revenues account for 70% or so of aggregate sector revenues. Combined with the emergence of open source and other alternatives such as software as a service (SaaS), I think it fair to say that the era of proprietary software with exceedingly high margins from monopoly rents is over [5].

The stagnating aspect occurs in how the software expenditures are applied. According to Gartner, in the US, more than 70% of IT expenditures are devoted to simply running existing systems, with only about 11% of budgets devoted to innovation; other parts of the world spend nearly double on innovation and much less on operations [6]. This relative lack of support for innovation and high percentages for running existing systems has held true for about a decade. Meanwhile, IT’s contribution to US productivity has been declining since 2001 [7].

What is the Cause for IT’s Ills?

Last year, PricewaterhouseCoopers published a major report with the provocative title, “Why Isn’t IT Spending Creating More Value?” [7]. The 42-page report covered many of the aspects above. Among other factors, the PWC authors speculated that:

As consumption of IT increases and as technologies change and advance, businesses have been left to cobble together disparate software and hardware systems and tools. The end result? Unchecked IT spending, unneeded complexity, redundant systems, underutilized hardware and data centers, the need for expensive IT security, and, inevitably, diminishing returns from IT. In short, low levels of IT productivity create conditions for an IT cost crisis. [7]

I suppose one could add to this litany other factors such as the growth and emergence of the Internet, sector consolidations through mergers and acquisitions, the rise of open source and alternatives such as SaaS, etc.

But which of these are causes? Which are symptoms? And which might only be consequences or coincident?

To be sure, all recognize the explosion of digital data and information, with sources and formats springing up faster than Whack-a-Mole. It is such an evident and ubiquitous phenomenon that pointing to it as a cause appears on the face of it quite obvious. Also obvious is that these new sources carry with them a diversity of systems and tools. While not categorically stated as such, it appears that PWC fingers the difficulties of “cobbling” these systems together as the root cause for low productivity and thus the IT cost crisis.

I agree totally that these are symptoms of what we see in IT’s current circumstance. I would even say these factors are a proximate cause of these ills. But I disagree that they are the root cause. To discover that root, I believe, we must look deeper to mindset and assumptions.

Closed World Mindset as the Root Cause

There are some phenomena that are so obvious that they are easily missed. Not seeing your fingertip held six inches in front of your eyes is one of these. We aren’t used to focusing on things so near at hand.

So, let’s look for a moment at the closed world assumption (CWA), a key underpinning to most standard relational data systems and enterprise schema and logics. CWA is the logic assumption that what is not currently known to be true is false. If CWA is not directly familiar to you, that is understandable; it is an implied assumption of these systems and logics. As such, it is not often inspected directly and therefore not often questioned [8].

With regard to standard IT systems, the closed world assumption has two important aspects:

  1. The assumption is that the information domain at hand is complete [9], and
  2. The related negation as failure, which assumes every predicate to be false that cannot be proved to be true.

On the face of them, these assumptions seem tame enough. And, indeed, there are some enterprise data systems that absolutely rely on them for efficient processing and completion times, such as most transaction systems. CWA is absolutely the appropriate design for such applications.
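To make the negation-as-failure point concrete, here is a minimal sketch in Python. The fact set and the function names `ask_closed_world` and `ask_open_world` are hypothetical, invented only for illustration; they are not drawn from any particular database or reasoner. The same query gets a definite "false" under CWA, while an open world reading simply reports that the answer is not known.

```python
# Hypothetical facts, stored as (subject, predicate, object) triples.
facts = {
    ("alice", "works_in", "finance"),
    ("bob", "works_in", "engineering"),
}


def ask_closed_world(fact):
    """CWA: anything that cannot be shown to be true is taken to be false."""
    return fact in facts


def ask_open_world(fact):
    """Open world: a missing fact is unknown (None), not false."""
    return True if fact in facts else None


# Nothing has been recorded about carol.
query = ("carol", "works_in", "finance")

closed_answer = ask_closed_world(query)  # definite "no"
open_answer = ask_open_world(query)      # "not known"
print(closed_answer, open_answer)
```

The closed world answer is only safe if the fact set really is complete, which is the first assumption in the list above.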

However, for knowledge management or representation applications — that is, applications which involve combining or using heterogeneous data or information from multiple data sources, which are exactly the same sources requiring information “cobbling” noted above by PWC — there are two very critical implications of the closed-world assumption (CWA):

  1. Efforts or projects cannot be undertaken incrementally; if done in pieces, each piece must be complete and consistent, which is expensive to scope and do, and
  2. To be consistent and explicit, the predicates (properties or relationships) must also be complex to model the “reality” of the system, which is also expensive to scope and do [10].

The net effect, which I have argued before, most notably in a major piece about the open world assumption [11], is that typical projects with a knowledge management aspect have become costly, take very long to complete, often fail, and require much planning and coordination. These facts have been true for three decades as enterprises have attempted to extract knowledge from their electronic information using closed world approaches based on relational systems. And, as recognized by PWC, these problems are only getting worse with growth in diversity and scope of systems.

The implications of closed world v. open world approaches are absolutely at the root of the causes leading to declining productivity, low innovation, significant failures and increasing costs — all exacerbated with more data and more systems — now characterizing traditional enterprise IT. Moreover, it is not a problem for open world systems to link to and incorporate closed world approaches. With open world, there is no need for Hobson’s choices. Unfortunately, such is not true when one begins with a closed world premise.

Incremental is Good: Pay as You Go

As best as I can tell, Alon Halevy was the first to use the phrase “pay as you go” in 2006 to describe the incremental aspect of the open world approach in relation to the semantic Web [12]. The “pay as you go” phrase had been applied earlier to data management and storage and had also been used to describe phone calling plans.

Incremental concepts and “agility” have been popular topics for the past five to ten years in IT, most often related to software development. And, while “incremental” sounds good in relation to enterprise projects, especially of a knowledge management or information integration/federation nature, the actual methodologies put forward were anything but incremental in their conceptual underpinnings.

Unfortunately, the “pay as you go” phrase has been (and still is) largely confined to incremental, open world approaches involving the semantic Web. How this approach might apply and benefit enterprises has yet to be articulated. Nonetheless, I like the phrase, and I think it evokes the right mindset. In fact, I think with linked data and many other aspects of the current semantic Web we are seeing such approaches come to fruition. Inch-by-inch, brick-by-brick, data on the Web is getting exposed and interlinked. “Pay as you go” is incremental, and that is good.

Purposeful is Better: Pay as You Benefit

Yet the idea of “pay as you benefit” is more purposeful, able to be planned and implemented, and founded on standard enterprise cost-benefit principles. I think it is a better (and more nuanced) expression of the “pay as you go” mindset in an enterprise setting. What it means is you can start small and be incomplete. You can target any domain or department or scope that is most useful and illustrative for your organization. You can deploy your first stand-ups as proofs-of-concept or sandboxes. And, you can build on each prior step with each subsequent one.
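As a rough illustration of starting small and building on each prior step, the sketch below grows a tiny triple store in two increments. The `OpenGraph` class, the `acme` record and the predicate names are all hypothetical, chosen only for this example. The second increment introduces a predicate the first one never anticipated, and nothing loaded earlier has to be remodeled.

```python
from collections import defaultdict


class OpenGraph:
    """A toy, schema-free store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, triples):
        # New triples are simply merged in; duplicates collapse.
        self.triples |= set(triples)

    def describe(self, subject):
        # Collect everything currently known about one subject.
        known = defaultdict(set)
        for s, p, o in self.triples:
            if s == subject:
                known[p].add(o)
        return dict(known)


graph = OpenGraph()

# Increment 1: a single department's view of a customer.
graph.add([("acme", "type", "customer")])

# Increment 2: a later source adds a predicate not seen before.
graph.add([("acme", "region", "midwest"), ("acme", "type", "customer")])

print(graph.describe("acme"))
```

Each increment leaves the store usable on its own, so the benefit of one step can be assessed before the next is scoped.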

One of the reasons we (Structured Dynamics) embraced the MIKE2.0 methodology [13] was its inherently incremental character. (Government deployments often call the increments “spirals”.) In general, the five phases of MIKE2.0 can be represented as follows:

Five Phases of MIKE2.0

It is specifically during the fifth phase, testing and improvement, that quantitative and qualitative benefits from the current increment are calculated and documented. This evolving methodology is where the enterprise can assess the results of its prior investment and scope and budget for the next one. These can be quick, rapid increments, or more involved ones, depending on the schedule, prior results and risk profile of the enterprise (or department) at that time.

Much is made of “incremental” or “agile” deployments within enterprises, but the nature of the traditional data system (and its closed world assumption) can act to undermine these laudable steps. The inherent nature of an open world approach, matched with methodologies and best practices, can work wonderfully with KM-related projects.

Quite Simply a Different Way to Do Business

We see in our current IT circumstances a number of embedded practices and assumptions. We have been assuming control and completeness, the closed world opposite to the open world approach. We have thus embraced and promoted “global” or enterprise-wide solutions: be they desktop operating systems or browsers or expensive enterprise-level proprietary software solutions. This scope leads to immense hurdle rates and risks: we had better get our choices right up front, because if we don’t, the department or enterprise is at risk. We have an inward focus about our own resources, our own networks, our own systems. Meanwhile, when we look outward, we wonder how all of these new Web companies can grow and expand so rapidly in comparison to us.

Clearly, we are seeing shifts to more services than products, more open source, more outsourcing, and more software as a service. Yet, because of the legacy of decades-long commitments from prior IT investment and the failures of many hyped “solutions” such as ERP or BI or data warehousing or a dozen others, we also see a decline and a reluctance for IT to embrace new and transforming approaches. Our prior choices were practically tantamount to “betting the enterprise.” What if our new approaches fail as so many of their predecessors did? In a demanding, competitive environment can we afford to make such wrong choices again with such immense implications?

Yet, now that information technology is a given, it only seems natural that its role becomes an integral part of the enterprise, and not a special function. Like procurement, IT has matured to become a support function. Businesses should not succeed or fail based on the types of pencils and paper stock they use; nor should they depend on the software support choices that IT makes. Enterprises are now past the need to get “computerized”; they are thoroughly so. But our understanding of IT’s role and position has not evolved with its own success.

The first whiffs of these challenges to IT’s initial hegemony came from the departmental introduction of PCs and local networks in the early 1980s. It has continued with desktop software, spreadsheets and Web portals and sites. Large, mature companies awoke in horror in the last decade to discover they had hundreds — sometimes thousands — of Web sites and content dissemination points over which IT had little or no control. Such is the nature of entropy, and it is a fact for any organization of any size.

So, now, with strategies such as “pay as you benefit,” there is no longer an excuse not to innovate. There is not a justification to put off testing and discovering benefits that the open world and semantic approaches can bring to your organization. There is now a basis to make the case and set the affordable budgets within desirable timelines for becoming a semantic enterprise.

Mindsets and expectations do require some adjustment. For example, not everything will be known or modeled in early phases. But, is that also not true in any “real” real world? We’re not talking high-throughput transaction systems here, but beginning to pull together and link the information that is important to your organization strategically.

Remember the intro statement that “a little semantics goes a long way”? Well, that truth — and it is true — when combined with incremental deployment firmly tied to demonstrable results, promises quite simply a different way to do business. Never before have enterprises had working and winnable approaches such as this to test and innovate and learn and discover. Jump on in; the water is clear and warm.

And, oh, as to that mess in your closet or garage? Well, if you adhere to CWA, you will need to define a place for everything to go before you can start cleaning things up. I say: forget those false hurdles. If you really want to make a dent in the mess, grab a broom and start cleaning.


[1] Jim Hendler, “a little semantics goes a long way.” See http://www.cs.rpi.edu/~hendler/LittleSemanticsWeb.html.
[2] All starting data is for the United States only and comes from the U.S. Bureau of Economic Analysis, U.S. Department of Commerce. The data tables were downloaded from the BEA Web site at http://www.bea.gov/national/nipaweb/SelectTable.asp. GDP data is from Section 1; enterprise private investment data from Section 5. For reasons as described in the text, all relative BEA numbers were re-adjusted from a 2005 baseline to 1997 based on absolute figures. Software figures and expenditures include packaged software, custom software and software developed in-house, but excludes software bundled or included within hardware.
[3] Data not shown; see the “Software Investment and Prices, by Type” data on the BEA Web page http://www.bea.gov/national/info_comm_tech.htm.
[4] Michael A. Cusumano, 2008. “The Changing Software Business: Moving from Products to Services,” Massachusetts Institute of Technology, in Computer, Vol 41 (1): 20-27, January 2008. See http://www.iae.univ-lille1.fr/SitesProjets/bmcommunity/Research/cusumano.pdf. This shift has occurred despite the recognition that potential gross margins from software packages can exceed 90% due to zero costs of reproduction. As Cusumano notes in a rule, “99 percent of zero is zero: The great profit opportunity from software products becomes theoretical and not practical” if not sold. Also, another interesting observation made by Cusumano is that in the shift to services vendors with both low percentages and high percentages of services, or what he calls the “sweet spots”, show higher contributions to profitability than vendors in the middle. He posits that low percentage vendors are getting mostly profitable maintenance fees, while those above 60% in services show profitability due to learning more replicable and systematic processes and approaches for service delivery.
[5] While we may occasionally see some vendors successfully buck this trend, I suspect these will only occur for established vendors with established platform advantages or for isolated applications where the innovating vendors have a significant first-mover advantage.
[6] Gartner calls the innovation category “transform”; see Gartner, Incorporated, 2009. “IT Software and Services, 2007-2010,” see http://www.slideshare.net/rsink/gartner-report-it-spending-2010. Also, see Jed Rubin and Howard Rubin, 2006. “Worldwide IT Benchmark Service New Trends & Findings for 2007: Strategic Performance Management and Measurement,” from Gartner Consulting Worldwide IT Benchmark Service; see http://www.gartner.com/teleconferences/attributes/attr_161183_115.pdf.
[7] PricewaterhouseCoopers, 2009. “Why Isn’t IT Spending Creating More Value?”, see http://www.pwc.com/en_US/us/increasing-it-effectiveness/assets/it_spending_creating_value.pdf.
[8] Relational database systems did not begin with an understanding of CWA, but rather with Edgar Codd’s 12 rules; the understanding of CWA in relation to them was formulated later by Raymond Reiter. Reiter first described the basis of CWA in 1978, and then provided an axiomatization of relational databases and their deductive generalizations and basis in CWA in 1984; see http://prism.cs.umd.edu/papers/Min02:reiter_memoriam/memoriam-tplp.pdf.
[9] Relational database systems also assume unique names for objects, which, while not perhaps the best design for federated systems, can be overcome in other ways.
[10] For semantics-related projects there is a corollary problem to the use of CWA which is the need for upfront agreement on what all predicates “mean”, which is difficult if not impossible in reality when different perspectives are the explicit purpose for the integration.
[11] See M. K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[12] This was also the first instance (I believe) of Alon coining the “dataspace” term. First use of the “pay as you go” phrase was in Alon Halevy, Michael Franklin, and David Maier, 2006. “Principles of Dataspace Systems,” in Proceedings of ACM Symposium on Principles of Database Systems, pp: 1-9. See also the slides accompanying that talk, Alon Halevy, 2006. “Principles of Dataspace Systems (PODS),” June 26, 2006; see http://www.cs.washington.edu/homes/alon/files/pods06-keynote.ppt. More explicitly the next year, see Jayant Madhavan, Shirley Cohen, Xin (Luna) Dong, Alon Y. Halevy, Shawn R. Jeffery, David Ko, and Cong Yu, 2007. “Web-scale Data Integration: You Can Afford to Pay as You Go,” in 3rd Conf. on Innovative Data Systems Research (CIDR), pp 342-350, see http://research.yahoo.com/files/paygo.pdf. The term has been picked up by many others, notably Rada Chirkova, Dongfeng Chen, Fereidoon Sadri and Timo J. Salo, 2007. “Pay-As-You-Go Information Integration: The Semantic Model Approach,” see ftp://ftp.csc.ncsu.edu/pub/tech/2007/TR-2007-30.pdf; and most recently papers by Gerhard Weikum on RDF-3X; see http://domino.mpi-inf.mpg.de/internet/reports.nsf/c125634c000710cec125613300585c64/70e8f906d8090e6bc125757f00448ec9!OpenDocument&ExpandSection=-1.
[13] See M.K. Bergman, 2010. “MIKE2.0: Open Source Information Development in the Enterprise,” AI3 Blog posting, February 23, 2010; and M.K. Bergman, 2010. “Open SEAS: A Framework to Transition to a Semantic Enterprise,” AI3 Blog posting, March 1, 2010.

Posted by AI3's author, Mike Bergman

Posted on July 12, 2010 at 10:57 pm in Adaptive Innovation, MIKE2.0, Semantic Enterprise
The URI link reference to this post is: http://www.mkbergman.com/896/pay-as-you-benefit-a-new-enterprise-it-strategy/
Copyright © 2004–2013 Michael K. Bergman.   This work is licensed under a Creative Commons License