Evolution
AI³
Adaptive Information
Adaptive Innovation
Adaptive Infrastructure
a·dap·tive adj. Showing or having a capacity to make fit for new or special situations; flexible; a successful adjustment.

Date:   June 10, 2009

Structured Dynamics LLC

Ontology Best Practices for Data-driven Applications: Part 3

In my Intrepid Guide to Ontologies from a couple of years back, I noted that “Ontology is one of the more daunting terms for those exposed for the first time to the semantic Web.” And, for sure, if one starts to peruse the current discussions ranging from the Ontolog Forum to major academic symposia (not meaning to single anyone out), it is clear that the idea of developing “ontologies” is often freighted with much weight, hot air, and (by implication) cost.

This is a shame on two counts. Firstly, that weight is unnecessary and not often warranted. Secondly, the whole pragmatic idea of what an ontology is and what it can do has often gotten lost in the shuffle.

To be sure, there have been massive standards efforts and EU-funded mega-projects devoted to ontologies. There are certainly cases where coordination of specific domains such as petroleum or integration with a complicated supplier base such as in airline manufacture warrant these massive, complicated ontology development efforts.

But, from my vantage, these extremes overshadow the vast majority of more prosaic, pragmatic applications of ontologies. Remember, ontologies are merely a means of describing a conceptual view of the world [1]. If one defines that “world” within focused and appropriate scope, it is surprising (we believe) how much mileage can be extracted from these suckers.

As we see a breakthrough of interest in semantic Web and linked data principles applied to the enterprise, wonderfully described in the recent seminal PricewaterhouseCoopers quarterly Technology Forecast (which, notably, puts a prominent focus on ontologies), I think it is time to train all guns on prior bad assumptions and bad anecdotes. To wit:

  • Ontology development need not be a comprehensive, self-contained definition of a “big picture.” Ontologies can be focused, limited, and grow and change as needed
  • Ontology development need not be expensive. Whoever is selling six-figure ontology development to businesses ought to be taken out and shot. Start small and focused; frankly, a simple spreadsheet taxonomy or quick conversion of existing XML or metadata or vocabulary standards is A-OK to get started
  • Ontology development is not massive and static: rather, it is small and flexible and incremental as more is brought in and more is learned
  • Ontology development is not some imperative for conceptual “truth”; rather, it is a very adaptable means for stating, testing and refining stuff
  • Ontology development is certainly no massive relational schema; by its nature it is malleable with nary a whiff of “lock-in”, and
  • Most importantly, ontology development is a way of “driving” applications and user interfaces and reports.

In fact, it is the last point that no one is discussing today, but it is the most important of the lot: Ontologies, properly crafted, can be the ‘engines’ for data-driven applications.

It is this latter point that is a true paradigm shift and one of the most exciting prospects of ontologies.

Manifest Uses

Ontologies, for sure, are a formal representation of conceptual relations, a “world view.” But, that world view need not carry with it the freight of trying to describe all human knowledge. It can (and should) be restricted to an understandable scope (domain) and purpose. In that vein, what does such a “world view” need?

Let’s first talk about scope. We don’t need a “global ontology” that is accepted by everybody on Earth. What we need are focused ontology(ies) for describing things within a given problem space (whose data may reside in a single dataset or aggregate of datasets). We need to communicate how this system describes the things within its domain and how it understands the concepts and attributes associated with its problem space and data. This communication is published as the ontology. Rather than a global, comprehensive schema, we simply need these well constructed bricks, one by one.

Then, the ontology itself needs to be understandable and manageable. Ontologies should be readable by machines, but too many see ontologies solely through the lens of machines. I believe that to be a mistake. While ontologies must be designed for machine ingest, I believe their real purpose is to serve humans. How do we label things? How do we describe and define things? How do we find things? How do we organize things? How can these understandings be brought before us in the software that we use?

These types of questions lead us to the pragmatic and pull us back from the abstract. If we keep foremost the simple idea that ontologies are merely structures for how to organize (schema) and describe (vocabularies) our problem space at hand, then we can actually get on with cutting the bull and getting real stuff done.

Let’s take as an example our structWSF Web services framework that I will be announcing and demoing for the first time at SemTech 2009 next week. We developed a simple and flexible ontology to describe what a “Web services framework” should be. Then, we developed and implemented the software to make it happen. This means that an ontology development task can be seen as a specification task, too.

Pragmatic Applications of Ontologies

So, OK, what do these exhortations mean? Without respect to any particular scope or domain, let me then list below some important functional areas to which ontologies — properly and pragmatically designed — can contribute.

Conceptual Relationships

The traditional lens for viewing ontologies is as a means to express conceptual relationships. We agree.

However, ontologies need not have large and nuanced predicate (relationship) vocabularies in order to be useful. Relatively simple but powerful structures with hierarchical or part-of relationships can be very effectively employed for inferencing or faceted searching. From a pragmatic standpoint, let’s first agree on what “things” (nouns) there are in our domains, then let’s worry about how they relate (verbs).

The idea here has long been known as successive approximation: Let’s first get ourselves into the right country, then right province, then right city, then right neighborhood, then right house, and then right room. Only then should we worry about the condition of the paint or the age of the floors.

Endless harangues about “true” conceptual relations are a hindrance, not a help, to this perspective. It is much better (and faster, cheaper and more pragmatic) to put forward simple but coherent relationships than to worry about what all of that “really” means. From a business perspective, isn’t being able to utilize the assertion that the hip bone is connected to the thigh bone more important than having to await a full explication of all of the muscles and ligaments and tendons that might comprise them?

Once such simple relationship structures are embraced, then amazing inferencing power comes to the fore. If one searches on thigh bone, inferencing can also bring forward the hip because of its relationships to the leg.
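The inference described above can be sketched in a few lines. This is a minimal illustration, not a reasoner: the `PART_OF` table and the function names are hypothetical, and the anatomy is only as precise as the hip-and-thigh example in the text.

```python
# Hypothetical part-of assertions (part -> whole); illustrative only.
PART_OF = {
    "thigh bone": "leg",
    "hip bone": "leg",
    "leg": "body",
}


def ancestors(term, part_of=PART_OF):
    """Walk part-of links upward and return every enclosing whole, nearest first."""
    found = []
    while term in part_of:
        term = part_of[term]
        found.append(term)
    return found


def related(term, part_of=PART_OF):
    """Return the other terms that share the same immediate whole as `term`."""
    parent = part_of.get(term)
    if parent is None:
        return set()
    return {t for t, p in part_of.items() if p == parent and t != term}


def expand_query(term):
    """Expand a search term with its siblings and its enclosing wholes."""
    return {term} | related(term) | set(ancestors(term))
```

Searching on "thigh bone" with `expand_query` also brings forward "hip bone", because both are asserted as parts of "leg". Nothing about muscles, ligaments or tendons had to be modeled first.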

Integrating Instance Data

OK, so now at least we have a coherent scaffolding of concepts and their straightforward relationships. That is, is concept A a “bigger” one (class or super set) than concept B? (Other simple relations could be substituted.) If so, we now see a bit of an organizing “world view” begin to emerge.

So, now we begin to bring in external data. But that data and its schema describe themselves differently. In one realm it is “foo”; in the other, “bar”.

While this different terminology for the same “thing” or related things may not be known at the outset, it is discoverable. And, when discovered, it is quite easy to associate the idea or concept of “foo” in one dataset to “bar” in another. In this manner, through learning and accretion, we are able to associate more and more similar things to one another.

We did not need to begin with some global, cosmic view to begin relating this data to one another. We only needed the right framework and structure that allowed this association to evolve as the learning occurred.

And, oh, by the way: this very same process is akin to documenting the organization’s institutional memory.
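As a rough sketch of how such associations can accrete over time, the class below groups terms from different datasets as equivalences are discovered. The `TermMap` name and the `(dataset, term)` tuples are assumptions made for illustration, and union-find is simply one convenient data structure for the job, not a method taken from this post.

```python
class TermMap:
    """Groups terms from different datasets that turn out to denote the same thing."""

    def __init__(self):
        self._parent = {}

    def _find(self, term):
        """Return the representative term of the group that `term` belongs to."""
        self._parent.setdefault(term, term)
        while self._parent[term] != term:
            # Path halving keeps lookups short as the map grows.
            self._parent[term] = self._parent[self._parent[term]]
            term = self._parent[term]
        return term

    def same_as(self, a, b):
        """Record a newly discovered equivalence between two terms."""
        self._parent[self._find(a)] = self._find(b)

    def equivalent(self, a, b):
        """True if the two terms have been linked, directly or through other terms."""
        return self._find(a) == self._find(b)
```

Calling `same_as(("A", "foo"), ("B", "bar"))` today and `same_as(("B", "bar"), ("C", "baz"))` next month leaves "foo" and "baz" associated as well, with no global view required at the start.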

Orienting to Other Knowledge and Domains

Being able to relate and “classify” or “organize” some things to other things also means that we are now beginning to create a roadmap for how “stuff” in a broad sense relates to other “stuff.” For example, if I develop a detailed understanding of the hip bone, I can now bring that body of knowledge into the context above to relate this new information to the thigh and to the leg.

Frankly, at this juncture, while the precise nature of the relationship may ultimately be important, it is helpful merely to know that Domain A (hip) is somehow related to Domain B (thigh). Think of the issue more like trying to get into the right map vicinity on the globe, and not whether individual streets intersect.

Again, the mindset here is one of letting ontologies and their concepts get related knowledge bases into the same ballpark. Whether we are trying to match Little League ball players with Major League ballplayers is beside the point: accept that both are playing baseball, then decide the importance and specifics of the relationship in a later step.

Again, “ballpark” is more helpful than no connection whatsoever. Silly statements about “ontological commitments” really mean nothing. Ontologies, like any other tool, can play different roles at different times. When helping to get like-related things into the same ballpark, ontologies are easy and quite effective.

(As an aside, it is useful to note here that our efforts with the UMBEL upper-level subject ontology are solely premised on this “roadmap” purpose. In and of itself, UMBEL is not a very complicated explication of the world. But it does provide a comprehensive set of 20,000 subject concepts for orienting quite disparate datasets and information to one another. This very same approach could be replicated and then applied to the granularity of individual domains, kind of like zooming in on Google Maps, to provide similar benefits at smaller scale for domain-specific roadmaps. In fact, that is a common approach we apply in our own client ontologies, which we then also make sure we tie into UMBEL for global orientation.)

Mapping to Other Schema

OK, so with this foundation now built, we can next raise the bar a bit further. Once one begins to express these “world views” formally as an ontology, even with reduced ambitions as presented above, one still ends up with a formal specification of that conceptualization. And, that means, we now also have a basis with standard languages for mapping two disparate or separately developed ontologies to one another.

This is powerful. Through such mapping, we end up, in the memorable phrase of my colleague, Fred Giasson, “exploding the domain.”

Moreover, we also have found a means for stitching together datasets with disparate schema to one another. Voilà: We now have met the Holy Grail of data interoperability.

In my opinion, this is the money shot from all of this effort. But, again, if we set the deployment threshold to the unrealistic levels that some ontology pundits suggest, this payoff is unlikely to happen. We are not trying to state absolute, universal truth about anything nor to be unrealistically comprehensive. All we are trying to do is make defensible assertions that one portion of a world view is similar or related to a portion of a different world view.

Now, does that sound that scary? No, of course not. It is merely a reasonable and pragmatic means for relating two structures together.

A key aspect to this mapping ability is to enrich the description of our concepts with what we call “semsets.” Semsets are a listing of related terms and phrases that provide synonyms, aliases, jargon and related context for alternative ways to describe or bound the concept at hand. This terminological “grist” is the basis for relatively straightforward natural language processing techniques to suggest matches between concepts in different ontologies (which might also be combined with other ontology components such as preferred labels, descriptions or structural placements in the schema).

Like many of the points above, these semsets can be built incrementally and over time as new jargon and terminology is discovered.
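To make the semset idea concrete, here is a small sketch that suggests candidate matches between two ontologies by term overlap. It uses plain Jaccard similarity as a stand-in for the natural language processing mentioned above. The sample semsets, the function names and the 0.25 threshold are all assumptions for illustration.

```python
def normalize(terms):
    """Lower-case and trim terms so trivial spelling variants compare equal."""
    return {t.strip().lower() for t in terms}


def semset_overlap(a, b):
    """Jaccard similarity: shared terms divided by all distinct terms."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_matches(ours, theirs, threshold=0.25):
    """Return (our_concept, their_concept, score) tuples, best score first."""
    suggestions = []
    for our_concept, our_terms in ours.items():
        for their_concept, their_terms in theirs.items():
            score = semset_overlap(our_terms, their_terms)
            if score >= threshold:
                suggestions.append((our_concept, their_concept, score))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical semsets from two separately developed ontologies.
OURS = {"Automobile": ["automobile", "car", "auto", "motorcar"]}
THEIRS = {
    "Car": ["Car", "automobile", "passenger vehicle"],
    "Truck": ["truck", "lorry"],
}
```

With these samples, `suggest_matches(OURS, THEIRS)` proposes only the Automobile and Car pairing. The output is a suggestion for a curator to confirm, and each newly discovered alias added to a semset improves later suggestions.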

Linked Data, with Federated and Comprehensive Data

These techniques of mapping datasets and their ontology structures can be leveraged still further with the proper application of linked data practices. Via linked data, we place our data into Web-accessible (HTTP) networks and give them Web-scalable identifiers (URIs). This means we can now integrate and interoperate with much external public Web data and break down our own internal data silos.

Our instance records can be fleshed out with supplementary sources to provide more comprehensive attributes and characterizations. Uniformity of treatment and coverage is promoted. Data interoperability is finally at hand.

A key best practice to this, of course, needs to be the recognition that not all data or information is public and not all users have the same roles or should have the same access to different sets of data. Thus, to embrace global mechanisms for data interoperability, there must also be local methods for enforcing access, privacy and confidentiality.

Properly designed ontologies can fulfill this requirement, as well. By organizing information into datasets and setting profiles for access and CRUD (create – read – update – delete) rights, an effective environment for data sharing and federation is established.
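One way to picture dataset-level access profiles is a lookup from (role, dataset) to a set of CRUD rights. The sketch below is only that, a picture: the role names, dataset names and the `allowed` function are hypothetical and do not describe any particular framework.

```python
from enum import Flag


class Right(Flag):
    """CRUD rights as combinable bit flags."""
    CREATE = 1
    READ = 2
    UPDATE = 4
    DELETE = 8


# Hypothetical access profiles: (role, dataset) -> granted rights.
PROFILES = {
    ("analyst", "public-catalog"): Right.READ,
    ("curator", "public-catalog"): Right.CREATE | Right.READ | Right.UPDATE,
    ("curator", "hr-records"): Right.READ,
}


def allowed(role, dataset, right):
    """True only if every requested right is granted; unknown pairs get nothing."""
    granted = PROFILES.get((role, dataset), Right(0))
    return (granted & right) == right
```

Denying by default for any (role, dataset) pair that has no profile is a deliberate choice here: data becomes shareable only where someone has explicitly said so.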

Context- and Instance-sensitive Data Display

To this point, we have taken an almost exclusively data- or schema-centric view of ontologies. But, because ontologies are structures, pure and simple, their structural nature can be exploited in other ways. It is here, frankly, that the potential of ontologies is discussed far less than in the more “conceptual” areas noted above.

The first of these new areas is in instance-sensitive data display. Each instance record is associated with an instance type in a governing ontology. Detecting this type means that context-sensitive display templates can then be invoked.

Detecting that something refers to a city, for example, can invoke a template providing a map, population figures, area size, city governance method and the like. In contrast, detecting an instance as a camera might invoke an entirely different display template focusing on product features or price or store and purchasing locations. Such instance-type displays are common; they are known as “infoboxes” within Wikipedia articles, as one example.

But this power of data display templates can be generalized further. What if we detect our instance represents a camera but do not have a display template specific to cameras? Well, the ontology and simple inferencing can tell us that cameras are a form of digital or optical products, which more generally are part of a product concept, which more generally is a form of a human-made artifact, or similar.

By tracing this inferencing chain from the specific to the more general we can “fall back” until a somewhat OK display template is discovered, even in the absence of the better and more specific one. Then, if we find we are trying to display information on cameras frequently, we only need take one of the more general, parent templates and specifically modify it for the desired camera attributes. We also keep presentation separate from data so that the styling and presentation mode of these templates is also freely modifiable.

This parallel set of display structures to the domain ontology provides a highly reusable and leveraged data presentation framework. For 30 years organizations have struggled with report generators and all sorts of complicated systems for responsive reporting and data display. When driven by ontologies, this challenge is greatly simplified.
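The template fall-back described above can be sketched as a walk up the class hierarchy. The class names, template file names and the `pick_template` function below are hypothetical, chosen to mirror the camera and city examples in the text.

```python
# Hypothetical fragment of a domain ontology: class -> parent class.
SUBCLASS_OF = {
    "Camera": "OpticalProduct",
    "OpticalProduct": "Product",
    "Product": "Artifact",
    "City": "PopulatedPlace",
}

# Display templates registered so far; note there is none for Camera.
TEMPLATES = {
    "Product": "product.html",
    "City": "city.html",
}

DEFAULT_TEMPLATE = "generic.html"


def pick_template(instance_type):
    """Return the template of the nearest ancestor class that has one."""
    current = instance_type
    seen = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in seen:
        if current in TEMPLATES:
            return TEMPLATES[current]
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return DEFAULT_TEMPLATE
```

Here `pick_template("Camera")` falls back to the Product template. Registering a dedicated camera template later is a one-line change to `TEMPLATES`, with no change to the lookup code.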

Driving User Interfaces

The careful reader of the above will note that our ontologies now have a number of interesting characteristics, all of which can be leveraged within the user interface. For example, we have:

  • Human-readable labels for our “things”
  • Alternative labels in our semsets that can characterize those same “things”
  • A readable description of each “thing”
  • An organized and logical schema for how each “thing” relates to other things.

This very information, when indexed in a supplementary full-text search engine with faceting capabilities (such as the Solr engine we use), can be leveraged in the user interface for these types of desired UI capabilities:

  • Attribute labels and tooltips
  • Navigation and browsing structures and trees
  • Menu structures
  • Auto-completion of entered data
  • Contextual dropdown list choices
  • Spell checkers
  • Online help systems
  • Etc.

This is absolutely mindblowing power!

We can now design generic tools that do patterned functions. Then, based on the data at hand and the ontologies that describe them, we can now see completely modified and tailored interfaces. And all of this is done without modifying a single line of application code!

Applications in this brave new world now consist of assembling a proper suite of generic tools, and then spending the bulk of our time on describing and characterizing our data via ontologies and refining templates for displaying or reporting the types of specific instances within our current problem space.
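As a toy illustration of an ontology driving the interface, the sketch below derives auto-completion and tooltips from concept labels, semset aliases and descriptions. A simple in-memory scan stands in for the full-text engine mentioned above. The concept records, field names and functions are assumptions for illustration only.

```python
# Hypothetical concept records, as they might be exported from an ontology.
CONCEPTS = {
    "camera": {
        "pref_label": "Camera",
        "alt_labels": ["digital camera", "DSLR"],
        "description": "A device for capturing photographs.",
    },
    "city": {
        "pref_label": "City",
        "alt_labels": ["municipality", "town"],
        "description": "A large, permanent human settlement.",
    },
}


def autocomplete(prefix, concepts=CONCEPTS):
    """Suggest preferred labels whose preferred or alternative label starts with prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    hits = set()
    for concept in concepts.values():
        labels = [concept["pref_label"], *concept["alt_labels"]]
        if any(label.lower().startswith(prefix) for label in labels):
            hits.add(concept["pref_label"])
    return sorted(hits)


def tooltip(concept_id, concepts=CONCEPTS):
    """Use the concept's readable description as tooltip text."""
    return concepts[concept_id]["description"]
```

Typing "mun" suggests "City" because "municipality" is in that concept's semset. Adding a new alias or concept changes what the interface offers without touching either function.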

Conclusions

All of the points made above are doable and being done today. Properly designed ontologies can readily deliver all of the aspects noted above. Later parts in this ongoing series will address many of those aspects in greater detail.

Ontologies are not magic. Properly done — an important emphasis — ontologies are the pivot point for faster and more adaptable ways of doing business. A simple, pragmatic mindset can help.

Our perspective is that ontologies are really the “flour that gets baked into the cake”. While viewable and definable as their own structures, properly constructed ontologies actually should exist everywhere within applications and contribute everywhere to applications. This is what we mean by “data-driven applications.”

To be sure, we are suggesting a paradigm shift from 30 years of IT frustrations: schema no longer must be fragile; reports no longer must be costly and delayed; and data can finally be made interoperable.

We will continue to give you our best thinking on these topics over the coming weeks and how they might be important to you.

Sound too good to be true? Read the material above again. And, then, we welcome getting your call.

This post is part of an occasional AI3 series on ontology best practices.

[1] As used in knowledge representation or information science, ‘ontology’ is most often defined using Tom Gruber’s “explicit specification of a conceptualization.” See Thomas R. Gruber, 1993. “A Translation Approach to Portable Ontology Specifications,” in Knowledge Acquisition 5(2): 199-220; see http://tomgruber.org/writing/ontolingua-kaj-1993.pdf.

Posted by AI3's author, Mike Bergman

Posted on June 10, 2009 at 2:36 pm in Ontologies, Ontology Best Practices, Semantic Web, Structured Dynamics, Structured Web, UMBEL | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/
The URI to trackback this post is: http://www.mkbergman.com/492/ontology-best-practices-for-data-driven-applications-part-3/trackback/
Date:   May 29, 2009


Major Report Signals the Emergence of Linked Data into the Enterprise

PricewaterhouseCoopers (PWC) has just published a major 58-page report on linked data in the enterprise. The report features insightful interviews with many industry practitioners as well as PWC’s own in-depth and thorough research. I think this report is a most significant event: it represents the first mainstream recognition of the potential importance of linked data and semantic Web technologies to the business of data interoperability within the enterprise.

This entire issue is uniformly excellent and well-timed. PWC has done a superb job of assembling the right topics and players. The report has three feature articles interspersed with four in-depth interviews. The target audience is the enterprise CIO with much useful explanation and background. Applications discussed range from standard business intelligence to energy and medicine.

The emphasis on the linked data aspect is a strong one. PWC puts twin emphases on ontologies and the enterprise perspective (naturally). This is a refreshing new perspective for the linked data community, which at times could be accused a bit of being myopic with regard to: 1) open data only; 2) instance records (RDF and no OWL, with little discussion of domain or concept ontologies); and 3) sometimes a disdain for the business perspective (as opposed to the academic).

PWC has done a great job of getting beyond some of the community's own prejudices in order to couch this in CIO and enterprise terms. This signals to me the transition from the lab to the marketplace, with all of its consequent challenges and advantages.

In short: Bravo! This is a very good piece and will, I think, put PWC ahead of the curve for some time to come.

I was very pleased to have the opportunity to review earlier drafts of this major report. After one of the report’s editors, Alan Morrison, a manager in PWC’s Center for Technology and Innovation (CTI), had read a couple of my recent papers on Shaky Semantics and the Advantages and Myths of RDF (the latter is cited in the piece), I had a chance to have a fruitful dialog with him. He kindly solicited my comments and incorporated some suggestions.

The report also lists 14 semantic technology vendors and service providers. I’m pleased to note that PWC included our small Structured Dynamics firm as part of its listing. Other vendors listed include Cambridge Semantics, Collibra, Metatomix, Microsoft, OpenLink Software, Oracle, Semantic Discovery Systems, Talis, Thomson Reuters, TopQuadrant and Zepheira, with the selected service providers of Radar Networks and AdaptiveBlue.

This report is easy — but important — reading. I personally enjoyed the insights of Frank Chum of Chevron, a new name for me. I encourage all in the field to read and study the entire report closely. I think this report will be an important milestone for the semantic Web in the enterprise for quite some time to come.

After a brief sign-up, the 58-page report is available for free download.

Posted by AI3's author, Mike Bergman

Posted on May 29, 2009 at 10:04 am in Linked Data, Ontologies, Semantic Web, Structured Dynamics | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/490/pwc-dedicates-quarterly-technology-forecast-to-linked-data/
The URI to trackback this post is: http://www.mkbergman.com/490/pwc-dedicates-quarterly-technology-forecast-to-linked-data/trackback/
Date:   May 17, 2009


Ontology Best Practices for Data-driven Applications: Part 2

It is perhaps not surprising that the first substantive post in this occasional series on ontology best practices for data-driven applications begins with the importance of keeping an ABox and TBox split. Structured Dynamics has been beating the tom-tom for quite a while on this topic. We reiterate and expand on this position in this post.

The Relation to Description Logics

Description logics (DL) are one of the key underpinnings to the semantic Web. DL are a logic semantics for knowledge representation (KR) systems based on first-order predicate logic (FOL). They are a kind of logical metalanguage that can help describe and determine (with various logic tests) the consistency, decidability and inferencing power of a given KR language. The semantic Web ontology languages, OWL Lite and OWL DL (which stands for description logics), are based on DL and were themselves outgrowths of earlier DL languages.

Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. It is this construct for which Structured Dynamics generally reserves the term “ontology”.

The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (or individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts. Both the TBox and ABox are consistent with set-theoretic principles.

Natural and Logical Work Splits

TBox and ABox logic operations differ and their purposes differ. TBox operations are based more on inferencing and tracing or verifying class memberships in the hierarchy (that is, the structural placement or relation of objects in the structure). ABox operations are more rule-based and govern fact checking, instance checking, consistency checking, and the like. ABox reasoning is generally more complex and at a larger scale than that for the TBox.

Early semantic Web systems tended to be very diligent about maintaining these ‘box’ distinctions of purpose, logic and treatment. One might argue, as Structured Dynamics does, that the usefulness and basis for these splits has been lost somewhat more recently.

Particularly as we now see linked data become more prevalent, these same questions of scale and actual interoperability are posing real pragmatic challenges. To help aid this thinking, we have re-assembled, re-articulated and in some cases added to earlier discussions of the purposes of the TBox and ABox:

TBox
  • Definitions of the concepts and properties (relationships) of the controlled vocabulary
  • Declarations of concept axioms or roles
  • Inferencing of relationships, be they transitive, symmetric, functional or inverse to another property
  • Equivalence testing as to whether two classes or properties are equivalent to one another
  • Subsumption, which is checking whether one concept is more general than another
  • Satisfiability, which is the problem of checking whether a concept has been defined (is not an empty concept)
  • Classification, which places a new concept in the proper place in a taxonomic hierarchy of concepts
  • Logical implication, which is whether a generic relationship is a logical consequence of the declarations in the TBox
  • Infer property assertions implicit through the transitive property

TBox <-> ABox
  • Entailments, which are whether other propositions are implied by the stated condition
  • Instance checking, which verifies whether a given individual is an instance of (belongs to) a specified concept
  • Knowledge base consistency, which is to verify whether all concepts admit at least one individual
  • Realization, which is to find the most specific concept for an individual object
  • Retrieval, which is to find the individuals that are instances of a given concept
  • Identity relations, which is to determine the equivalence or relatedness of instances in different datasets
  • Disambiguation, which is resolving references to the proper instance

ABox
  • Membership assertions, either as concepts or as roles
  • Attributes assertions
  • Linkages assertions that capture the above but also assert the external sources for these assignments
  • Consistency checking of instances
  • Satisfiability checks, which are that the conditions of instance membership are met

As the table shows, the TBox is where the reasoning work occurs, the ABox is where assertions and data integrity occur, and knowledge base work in the middle (among other aspects) requires both. We can reflect these work splits via the following diagram:

[Figure: TBox- and ABox-level work]

This figure maps the work activities noted in the table, with particular emphasis on the possible and specialized work activities at the interstices between the TBox and ABox.
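To make the work split tangible, the sketch below keeps a tiny TBox and ABox in separate structures. Subsumption touches only the TBox, while instance checking and retrieval need both. The concept names, instance identifiers and attribute values are invented for illustration, and the functions are a simplification of what a description logic reasoner does.

```python
# Hypothetical TBox: concept -> direct super-concept.
TBOX = {"Camera": "Product", "Product": "Artifact", "City": "Place"}

# Hypothetical ABox: individual -> asserted type plus attribute-value pairs.
ABOX = {
    "camera-1": {"type": "Camera", "price": 250},
    "city-1": {"type": "City", "population": 50000},
}


def subsumes(general, specific, tbox=TBOX):
    """TBox only: is `general` the same as, or an ancestor of, `specific`?"""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = tbox.get(current)
    return False


def is_instance(individual, concept, abox=ABOX, tbox=TBOX):
    """Instance checking: combines an ABox assertion with TBox subsumption."""
    return subsumes(concept, abox[individual]["type"], tbox)


def retrieve(concept, abox=ABOX, tbox=TBOX):
    """Retrieval: every individual that is an instance of the given concept."""
    return sorted(i for i in abox if is_instance(i, concept, abox, tbox))
```

Because the two structures are separate, a different `TBOX` (another world view) can be passed to `retrieve` without altering a single instance record.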

The Split Should Feel Natural

Whether a single database or the federation across many, we have data records (structs of instances) and a logical schema (ontology of concepts and relationships) by which we try to relate this information. This is a natural and meaningful split: structure and relationships v. the instances that populate that structure.

Stated this way, particularly for anyone with a relational database background, the split between schema and data is clear and obvious. While the relational data community has not always maintained this split, and the RDF, semantic Web and linked data communities have not often done so either, this split makes eminent sense as a way to maintain a desirable separation of concerns.

The importance of description logics, besides their role as a logical underpinning to the semantic Web enterprise, is their ability to provide a perspective and framework for making these natural splits. Moreover, with some updated thinking, we can also establish a natural framework for guiding architecture and design. It is quite OK to also look to the interaction and triangulation of the ABox and TBox, as well as to specialized work that is not constrained to either.

For example, identity evaluation and disambiguation really come down to the questions of whether we are talking about the same or different things across multiple data sources. By analyzing these questions as separate components, we also gain the advantage of enabling different methodologies or algorithms to be determined or swapped out as better methods become available. A low-fidelity service, for example, could be applied for quick or free uses, with more rigorous methods reserved for paid or batch mode analysis. Similarly, maintaining full-text search as a separate component means the work can be done by optimized search engines with built-in faceting (such as the excellent open-source Solr application).

These distinctions feel obvious and natural. They arise from a sound grounding in the split of the ABox and the TBox.
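The idea of swapping methods in and out can be sketched by treating identity evaluation as a pluggable function. Both matchers below are crude stand-ins invented for illustration (the record fields, function names and 0.85 threshold are assumptions); the point is only that the caller does not change when the method does.

```python
from difflib import SequenceMatcher
from typing import Callable

# Any function taking two records and returning True or False can be plugged in.
Matcher = Callable[[dict, dict], bool]


def quick_match(a, b):
    """Low-fidelity method: compare normalized names only."""
    return a["name"].strip().lower() == b["name"].strip().lower()


def careful_match(a, b, threshold=0.85):
    """More rigorous stand-in: types must agree and names must be very similar."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return a.get("type") == b.get("type") and ratio >= threshold


def same_entity(a, b, matcher: Matcher = quick_match):
    """Decide whether two records refer to the same thing, using the chosen method."""
    return matcher(a, b)
```

A record named "Paris" typed as a city and one named " paris " typed as a person are the same entity under `quick_match` but not under `careful_match`. That is the kind of trade-off between a quick or free service and a more rigorous one that a separable component allows.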

The Re-cap of Key Reasons to Maintain the TBox – ABox Split

So, to conclude this part in this occasional series, here are some of the key reasons to maintain a relative split between instances (the ABox) and the conceptual relationships that describe a world view for interpreting them (the TBox):

  • We are able to handle instance data simply. The nature of instance “things” is comparatively constant and can be captured with easily understandable attribute-value pairs
  • We can re-use these instance records in varied and multiple world views (the TBox). World views can be refined or approached from different perspectives without affecting instance data in the slightest
  • We can approach data architectural decisions from the standpoints of the work to be done, leaving open special analysis or tasks like disambiguation or full-text search
  • Ontologies (as defined by SD and focused on the TBox) are kept simpler and easier to understand. Inter-dataset relationships are asserted and testable in largely separate constructs, rather than admixed throughout
  • Relatedly, we are thus able to use ontologies to focus on the issues of mappings and conceptual relationships
  • Instance records can often be kept in situ, especially useful when incorporating the massive amounts of data in existing relational databases
  • Instance evaluations can be done separately from conceptual evaluations, which can help through triangulation in such tasks as disambiguation or entity identification
  • It is easier to convert simple data structs to the instance record structure, aiding interoperability (a subject for a later part in this series)
  • We provide a framework that is amenable to swapping in and out different analysis methods, and
  • It is easier for broader input when the task is adding and refining attributes rather than internally consistent conceptual relationships.
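
To make the first two reasons concrete, here is a small sketch under stated assumptions: the instance records, the two world views and every concept name in them are invented for illustration, and plain Python dictionaries stand in for what would normally be RDF and OWL.

```python
# ABox: instance records as plain attribute-value pairs (all values made up).
abox = [
    {"id": "rec1", "type": "Camera", "name": "Model X", "price": 499},
    {"id": "rec2", "type": "Novel", "name": "Some Title", "price": 12},
]

# TBox: two alternative world views over the same records. Each maps a
# concept to its broader concept; neither touches the instance data.
retail_view = {"Camera": "Electronics", "Novel": "Books",
               "Electronics": "Product", "Books": "Product"}
library_view = {"Camera": "Equipment", "Novel": "Fiction",
                "Equipment": "Holding", "Fiction": "Holding"}


def ancestors(concept, tbox):
    """Walk the broader-concept links in a TBox and collect what is found."""
    found = []
    while concept in tbox:
        concept = tbox[concept]
        found.append(concept)
    return found


def instances_of(concept, abox, tbox):
    """Return ids of records whose type is the concept or falls under it."""
    return [record["id"] for record in abox
            if record["type"] == concept
            or concept in ancestors(record["type"], tbox)]
```

The same two records answer `instances_of("Product", abox, retail_view)` and `instances_of("Fiction", abox, library_view)`; only the world view changes, never the records.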

Here is a final best practice suggestion when these ABox and TBox splits are maintained: Make sure as curators that new attributes added at the instance level are also added with their conceptual relationships at the TBox level. In this way, the knowledge base can be kept integral while we simultaneously foster a framework that eases the broadest scope of contributions.
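
One way a curator might check this, sketched here with hypothetical names and data, is to compare the attributes actually used in instance records against those declared at the TBox level and report any that are missing.

```python
def undeclared_attributes(instance_records, tbox_attributes):
    """Return attribute names used in instance records but absent from the TBox.

    `instance_records` is a list of attribute-value dictionaries and
    `tbox_attributes` is any collection of declared attribute names; both
    structures are assumptions made for this sketch.
    """
    used = set()
    for record in instance_records:
        used.update(record.keys())
    return sorted(used - set(tbox_attributes))


# Made-up example: 'colour' was added to a record but never declared.
records = [{"name": "A", "price": 1}, {"name": "B", "colour": "red"}]
declared = {"name": "label of the thing", "price": "monetary value"}
missing = undeclared_attributes(records, declared)
```

Here `missing` holds the single attribute `colour`, which is the cue for a curator to add its conceptual relationships to the TBox.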

This post is part of an occasional AI3 series on ontology best practices.

Posted by AI3's author, Mike Bergman

Posted on May 17, 2009 at 7:49 pm in Ontologies, Ontology Best Practices, Semantic Web, Structured Dynamics, Structured Web | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/489/ontology-best-practices-for-data-driven-applications-part-2/
Date:   May 12, 2009

Structured Dynamics LLC

Ontology Best Practices for Data-driven Applications: Part 1

Structured Dynamics is plowing virgin ground in how linked, structured data — powered by the flexible RDF data model — can establish new approaches useful to the enterprise. These approaches range from how applications are architected, to how data is shared and interoperated, and to how we even design and deploy applications and the data themselves.

At the core of this mindset is the concept of ‘data-driven apps‘, with their underlying structure based on ontologies. Over the coming weeks, I will be posting a series of best practices for how these ontologies can be designed, constructed and employed, and how they can shift the paradigm from static and inflexible applications to ones that are driven by the underlying data.

As the introduction to this occasional series, it is useful to define our terms and viewpoints. The two key concepts are:

  • Data-driven applications: this concept means the use of generic tools, applications and services that shape themselves and expose capabilities based on the structure of their underlying data. Generic means reusable. Unlike inflexible report writers or static tools of the past, these applications present functionality and are contextual based on the structure of the underlying data they serve. The data-driven aspect results from proper construction of the ontologies that describe this underlying data
  • Ontologies: ontologies have been something of a teeth-grinding concept for a couple of years, having been appropriated from their historical meaning of the nature of being (“ontos”) in philosophy to describe “shared conceptualizations” in computer science and knowledge engineering [1]. For its purposes, Structured Dynamics more precisely defines ontologies as the relationships of the concepts and domains embodied in the underlying things or instances described by the data. Under this approach, ontologies based on RDF become a structural representation of the data relationships in graph form. But, in addition, we also define ontologies to mean the proper description of these concepts, so as to supply the context, synonyms and aliases, and labels that aid human use and understanding.
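
As a rough sketch of what "data-driven" can mean in practice, the generic function below builds browsing facets from whatever attributes the records happen to contain, and takes its display labels from a separate lookup that stands in for an ontology. The function name, the records and the labels are all assumptions for illustration, not part of any Structured Dynamics tool.

```python
from collections import Counter, defaultdict


def build_facets(records, labels):
    """Derive facet counts for every attribute found in the records.

    `labels` maps attribute names to human-readable labels (here a stand-in
    for the descriptions an ontology would supply); attributes without an
    entry fall back to their raw name.
    """
    facets = defaultdict(Counter)
    for record in records:
        for attribute, value in record.items():
            facets[labels.get(attribute, attribute)][value] += 1
    return {label: dict(counts) for label, counts in facets.items()}


# Made-up data: nothing in build_facets knows about cameras or brands.
records = [
    {"type": "Camera", "brand": "Acme"},
    {"type": "Camera", "brand": "Other"},
    {"type": "Lens"},
]
facets = build_facets(records, {"type": "Kind of thing"})
```

Feeding the same function a different dataset yields a different set of facets with no code changes, which is the sense in which the application is shaped by its data.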

We therefore put a fairly high threshold of construction and design on our ontologies. These imperatives provide the rationale for this series.

One complementary aspect of our design is the importance of converting data in any form or serialization to the canonical RDF data model, which the ontologies then use to define and describe the data structure. Though crucial, this aspect is not discussed further in this series.

Now, of course, when someone (me) has the chutzpah to posit “best practices” it should also be clear to what end. Ontologies may be used for many things. Others may aim for completeness of domain capture, a wealth of predicates, reasoning or inference. In our sense, we define “best practices” within our focus of data interoperability and data-driven apps. Your own mileage may vary.

In no particular order and with likely new topics to emerge, here is the current listing of what some of the other parts in this occasional series will contain:

  • Intro (concepts)
  • ABox – TBox split
  • Architecting (modularizing) ontologies into categories (e.g., UI/display of information; domains/instances; admin/internal apps)
  • Definition of a standard instance record vocabulary (ABox)
  • Role of an instance record vocabulary for universal struct ingest
  • Selection of core external ontologies and re-use
  • A deeper exploration of the data-driven application
  • Initial ontology building and techniques
  • Specific UI items suitable to be driven by ontologies (a listing of 20 or so items)
  • Techniques for mapping to external ontologies
  • Dataset interoperability and the myth that OWL is only useful for real-time reasoning, and
  • OWL mapping predicates, importance of class mappings, and OWL 2.

The idea throughout this series is to document best practices as encountered. We certainly do not claim completeness on these matters, but we also assert that good upfront design can deliver many free backend benefits.

If there is a particular topic missing from above that you would like us to discuss, please fire away! In any event, we will be giving you our best thinking on these topics over the coming weeks and how they might be important to you.


[1] Michael K. Bergman, 2007. An Intrepid Guide to Ontologies, May 16, 2007. See http://www.mkbergman.com/?p=374.

Posted by AI3's author, Mike Bergman

Posted on May 12, 2009 at 10:16 am in Ontologies, Ontology Best Practices, Semantic Web, Structured Dynamics, Structured Web | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/488/ontology-best-practices-for-data-driven-applications-part-1/
Date:   April 21, 2009

SearchMonkey

SearchMonkey’s Recommended Vocabularies a Useful Resource

I am pleased to report that UMBEL is now included as one of the recommended vocabularies for the Yahoo! SearchMonkey service. Using SearchMonkey, developers and site owners can use structured data to enhance the value of standard Yahoo! search results and customize their presentation, including through “infobars“. SearchMonkey is integral to a concerted effort by Yahoo! to embrace structured data, RDF and the semantic Web.

SearchMonkey was first announced in February 2008 with a beta release in April and then public release in May with 28 supported vocabularies. Then, last October, an additional set of common, external vocabularies was recommended for the system including DBpedia, Freebase, GoodRelations and SIOC. At the same time, some further internal Yahoo! vocabularies and standard Web languages (e.g., OWL, XHTML) were also added.

This is the first vocabulary update since then. Besides UMBEL, the AB Meta and Semantic Tags vocabularies have also been added to this latest revision. (There have also been a few deprecations over time.)

A recommended vocabulary means that its namespace prefix is recognized by SearchMonkey. The namespaces for the recommended vocabularies are reserved. Though site owners may customize and add new SearchMonkey structure, any such additions must be explicitly defined in specific DataRSS feeds.

Structured data may be included in Yahoo! search results from these sources:

  • Yahoo! Index — the core Yahoo! search data with limited structure such as the page’s title, summary, file size, MIME type, etc. This structure is only provided by Yahoo!
  • Semantic Web Data — including microformats and RDF data embedded in the host page
  • Data Feed — A feed of Yahoo! native DataRSS provided by a third party site
  • Custom Data Service — Any data extracted from an (X)HTML page or web service and represented within SearchMonkey as DataRSS.

As a recommended vocabulary, UMBEL namespace references can now be embedded and recognized (and then presented) in Yahoo! search results.

The Current Vocabulary Set

Here are the 34 current vocabularies (plus five deprecated) recognized by the system:

Prefix | Name | Namespace
abmeta | AB Meta | http://www.abmeta.org/ns#
action | SearchMonkey Actions | http://search.yahoo.com/searchmonkey/action/
assert | SearchMonkey Assertions (deprecated) | http://search.yahoo.com/searchmonkey/assert/
cc | Creative Commons | http://creativecommons.org/ns#
commerce | SearchMonkey Commerce | http://search.yahoo.com/searchmonkey/commerce/
context | SearchMonkey Context (deprecated) | http://search.yahoo.com/searchmonkey/context/
country | SearchMonkey Country Datatypes | http://search.yahoo.com/searchmonkey-datatype/country/
currency | SearchMonkey Currency Datatypes | http://search.yahoo.com/searchmonkey-datatype/currency/
dbpedia | DBpedia | http://dbpedia.org/resource/
dc | Dublin Core | http://purl.org/dc/terms/
fb | Freebase | http://rdf.freebase.com/ns/
feed | SearchMonkey Feed | http://search.yahoo.com/searchmonkey/feed/
finance | SearchMonkey Finance | http://search.yahoo.com/searchmonkey/finance/
foaf | FOAF | http://xmlns.com/foaf/0.1/
geo | GeoRSS | http://www.georss.org/georss#
gr | GoodRelations | http://purl.org/goodrelations/v1#
job | SearchMonkey Jobs | http://search.yahoo.com/searchmonkey/job/
media | SearchMonkey Media | http://search.yahoo.com/searchmonkey/media/
news | SearchMonkey News | http://search.yahoo.com/searchmonkey/news/
owl | OWL ontology language | http://www.w3.org/2002/07/owl#
page | SearchMonkey Page (deprecated) | http://search.yahoo.com/searchmonkey/page/
product | SearchMonkey Product | http://search.yahoo.com/searchmonkey/product/
rdf | RDF | http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs | RDF Schema | http://www.w3.org/2000/01/rdf-schema#
reference | SearchMonkey Reference | http://search.yahoo.com/searchmonkey/reference/
rel | SearchMonkey Relations (deprecated) | http://search.yahoo.com/searchmonkey-relation/
resume | SearchMonkey Resume | http://search.yahoo.com/searchmonkey/resume/
review | Review | http://purl.org/stuff/rev#
sioc | SIOC | http://rdfs.org/sioc/ns#
social | SearchMonkey Social | http://search.yahoo.com/searchmonkey/social/
stag | Semantic Tags | http://semantictagging.org/ns#
tagspace | SearchMonkey Tagspace (deprecated) | http://search.yahoo.com/searchmonkey/tagspace/
umbel | UMBEL | http://umbel.org/umbel/sc/
use | SearchMonkey Use Datatypes | http://search.yahoo.com/searchmonkey-datatype/use/
vcal | VCalendar | http://www.w3.org/2002/12/cal/icaltzd#
vcard | VCard | http://www.w3.org/2006/vcard/ns#
xfn | XFN | http://gmpg.org/xfn/11#
xhtml | XHTML | http://www.w3.org/1999/xhtml/vocab#
xsd | XML Schema Datatypes | http://www.w3.org/2001/XMLSchema#

In addition, there are a number of standard datatypes recognized by SearchMonkey, mostly a superset of XSD (XML Schema datatypes).
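
To illustrate what a recognized namespace prefix buys you, here is a small sketch of expanding a prefixed name into a full URI using a few rows from the listing above. This is the general prefix-expansion idea only, not SearchMonkey's own API; the `expand` function and the `PREFIXES` name are hypothetical.

```python
# A few prefix-to-namespace pairs copied from the vocabulary listing.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "umbel": "http://umbel.org/umbel/sc/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI.

    Raises ValueError when there is no colon and KeyError when the prefix
    is not in the table.
    """
    prefix, separator, local = prefixed_name.partition(":")
    if not separator:
        raise ValueError(f"not a prefixed name: {prefixed_name!r}")
    return prefixes[prefix] + local


full_uri = expand("foaf:name")
```

An unrecognized prefix raises `KeyError`, which mirrors the point that only reserved, recommended prefixes resolve without being explicitly defined elsewhere.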

What is emerging from this Yahoo! initiative is a very useful set of structured data definitions and vocabularies. These same resources can be great starting points for non-SearchMonkey applications as well.

For More Information

There is quite a bit of online material now available for SearchMonkey, with new expansions and revisions also accompanying this most recent release. As some starting points, I recommend:

Posted by AI3's author, Mike Bergman

Posted on April 21, 2009 at 5:18 pm in Adaptive Information, Searching, Semantic Web, Structured Web, UMBEL | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/485/umbel-now-included-in-searchmonkey/
Copyright © 2004–2013 Michael K. Bergman.   This work is licensed under a Creative Commons License