Posted: June 17, 2009

Structured Dynamics Announces Drupal and Web Services Frameworks at SemTech 2009 Conference

After six months of dedicated effort, we are pleased to announce two new products: conStruct, which is a set of modules for bringing structured (RDF) content capabilities to Drupal and structWSF, the platform-independent Web services framework that underlies it.

There has been some promising effort to expose RDF data from Drupal for some time, and expressing internal data within Drupal as RDFa is being implemented by others as part of the upcoming version 7 of Drupal. These are exciting prospects that we wholeheartedly applaud. In fact, they will also be great sources for our products noted below.

However, our innovation looks through the other end of the telescope: Our new conStruct structured content system (SCS) enables external structured data to actually ‘drive the application’. We think Drupal is the perfect host to demonstrate this new paradigm of ‘data-driven apps’.

The conStruct Drupal module makes the connections between existing Drupal capabilities and the structWSF Web services framework. structWSF provides a standard suite of Web services, an innovative means to access and manage datasets, and the hooks to underlying structured data stores and full-text search engines.

Combined with the existing efforts to expose RDF from Drupal, we think these two new products now promise a two-way highway for structured data through Drupal.

structWSF

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies).

The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The initial structWSF framework comes packaged with a baseline set of about a dozen Web services covering CRUD, browse, search, export and import.

All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and a document of resultsets (if the query result is not null). Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML.

In initial release, structWSF has direct interfaces to the Virtuoso RDF triple store (via ODBC, and later HTTP) and the Solr faceted, full-text search engine (via HTTP). However, structWSF has been designed to be fully platform-independent. The framework is open source (Apache 2 license) and designed for extensibility.

conStruct SCS

conStruct SCS is a structured content system that extends the basic Drupal content management framework. conStruct enables structured data and its controlling vocabularies (ontologies) to drive applications and user interfaces.

Users and groups can flexibly access and manage any or all datasets exposed by the system depending on roles and permissions. Report and presentation templates are easily defined, styled or modified based on the underlying datasets and structure. Collaboration networks can readily be established across multiple installations and non-Drupal endpoints. Powerful linked data integration can be included to embrace data anywhere on the Web.

conStruct provides Drupal-level CRUD (create – read – update – delete), data display templating, faceted browsing, full-text search, and import and export over structured data stores based on RDF. Depending on roles and permissions, a given user may or may not see specific datasets or tools within the Drupal interface. Search and browse results are similarly sequestered depending on access rights. There is a core conStruct project on Drupal, with the additional optional modules of structDisplay and structOntology coming soon thereafter.

Two Accompanying Web Sites

In addition to the products themselves, two different Web sites accompany our announcement, both based on Drupal.

OpenStructs.org is dedicated to the platform-independent offerings. All OpenStructs tools are premised on the canonical RDF (Resource Description Framework) data model. OpenStructs tools either convert existing data structures to RDF, extract structure from content as RDF, or manage and manipulate RDF. All OpenStructs tools and approaches are compliant with existing open standards from the W3C. The intent is to achieve maximum data and software interoperability.

The main software distribution from OpenStructs is structWSF. Over time, OpenStructs is also meant to be the distribution point for user-generated “structs” in data display templating, and data extractors and converters, along with further Web services compliant with the structWSF framework.

conStruct SCS is a knowledge site dedicated to conStruct and provides demos and sandboxes for the system. It accompanies the actual project sites on Drupal itself.

Unveiled at SemTech 2009

We unveiled and demoed the two products yesterday at the 2009 Semantic Technology Conference in San Jose, California. I did so during my talk on, "BKN: Building Communities through Knowledge, and Knowledge Through Communities." SemTech 2009 is a premier semantic Web event, which has been steadily growing and now exceeds 1000 attendees.

structWSF has been under development by Structured Dynamics for some time. Its linkage and incorporation within the Drupal system has more recently been supported by the Bibliographic Knowledge Network.

BKN is a major, two-year, NSF-funded project jointly sponsored by the University of California, Berkeley, Harvard University, Stanford University, and the American Institute of Mathematics, with broad private sector and community support. BKN is developing a suite of tools and infrastructure for citations and bibliographies within the mathematics and statistics domain based on semantic technologies for professionals, students or researchers to form new communities.

An alpha version of structWSF will be released for download from the OpenStructs (http://openstructs.org) Web site on June 30. The conStruct system will be released at the same time under the GPL license. See its home site at http://constructscs.com or within the Drupal module system (http://drupal.org/project/construct).

Some Early Observations

Structured Dynamics has as its mission to assist enterprises and non-profit organizations and projects to adopt Web-accessible and interoperable data. These are our first product offerings geared to address this mission.

The basic premise is that the data itself becomes the application. Via structured, linked data and a combination of products and Web services, information in any form and from any source can now be integrated and made interoperable. Linked data is based on open standards to interconnect any form of relevant information on the Web — on demand and in context.

One of the most exciting aspects of the overall architecture behind these two products is their suitability to support distributed collaboration, across diverse and definable datasets, all supported by sensitivity to role-based data and tools (Web services) access.

We’ll be speaking much more on these topics now that we have this foundation in place. We also, of course, have much to learn about the deployment and use case potentials of these frameworks.

These two products signal our (SD’s) commitment to open source. We hope some of you also see the promise in these frameworks to provide an adaptive infrastructure to linked and structured data. We welcome your participation!

Posted: June 10, 2009

Ontology Best Practices for Data-driven Applications: Part 3

In my Intrepid Guide to Ontologies from a couple of years back, I noted that “Ontology is one of the more daunting terms for those exposed for the first time to the semantic Web.” And, for sure, if one starts to peruse the current discussions ranging from the Ontolog Forum to major academic symposia (not meaning to single anyone out), it is clear that the idea of developing “ontologies” is often freighted with much weight, hot air, and (by implication) cost.

This is a shame: firstly, because it is unnecessary and not often true; and, secondly, because the whole pragmatic idea of what an ontology is and what it can do has often gotten lost in the shuffle.

To be sure, there have been massive standards efforts and EU-funded mega-projects devoted to ontologies. There are certainly cases where coordination of specific domains such as petroleum or integration with a complicated supplier base such as in airline manufacture warrant these massive, complicated ontology development efforts.

But, from my vantage, these extremes overshadow the vast majority of more prosaic, pragmatic applications of ontologies. Remember, ontologies are merely a means of describing a conceptual view of the world [1]. If one defines that “world” within focused and appropriate scope, it is surprising (we believe) how much mileage can be extracted from these suckers.

As we see a breakthrough of interest in semantic Web and linked data principles applied to the enterprise, as wonderfully described in the recent seminal PricewaterhouseCoopers quarterly Technology Forecast, also notably with a prominent focus on ontologies, I think it is time to direct all guns on prior bad assumptions and bad anecdotes. To wit:

  • Ontology development need not be a comprehensive, self-contained definition of a “big picture.” Ontologies can be focused, limited, and grow and change as needed
  • Ontology development need not be expensive. Whoever is selling six-figure ontology development to businesses ought to be taken out and shot. Start small and focused; frankly, a simple spreadsheet taxonomy or quick conversion of existing XML or metadata or vocabulary standards is A-OK to get started
  • Ontology development is not massive and static: rather, it is small and flexible and incremental as more is brought in and more is learned
  • Ontology development is not some imperative for conceptual “truth”; rather, it is a very adaptable means for stating, testing and refining stuff
  • Ontology development is certainly no massive relational schema; by its nature it is malleable with nary a whiff of “lock-in”, and
  • Most importantly, ontology development is a way of “driving” applications and user interfaces and reports.

In fact, it is the last point that no one is discussing today, but it is the most important of the lot: Ontologies, properly crafted, can be the ‘engines’ for data-driven applications.

It is this latter point that is a true paradigm shift and one of the most exciting prospects of ontologies.

Manifest Uses

Ontologies, for sure, are a formal representation of conceptual relations, a “world view.” But, that world view need not carry with it the freight of trying to describe all human knowledge. It can (and should) be restricted to an understandable scope (domain) and purpose. In that vein, what does such a “world view” need?

Let’s first talk about scope. We don’t need a “global ontology” that is accepted by everybody on Earth. What we need are focused ontology(ies) for describing things within a given problem space (whose data may reside in a single dataset or aggregate of datasets). We need to communicate how this system describes the things within its domain and how it understands the concepts and attributes associated with its problem space and data. This communication is published as the ontology. Rather than a global, comprehensive schema, we simply need these well constructed bricks, one by one.

Then, the ontology itself needs to be understandable and manageable. Ontologies should be readable by machines, but too many see ontologies solely through the lens of machines. I believe that to be a mistake. While importantly needing to be designed for machine ingest, I believe the real purpose of ontologies is for humans. How do we label things? How do we describe and define things? How do we find things? How do we organize things? How can these understandings be brought before us in the software that we use?

These types of questions lead us to the pragmatic and pull us back from the abstract. If we keep foremost the simple idea that ontologies are merely structures for how to organize (schema) and describe (vocabularies) our problem space at hand, then we can actually get on with cutting the bull and getting real stuff done.

Let’s take as an example our structWSF Web services framework that I will be announcing and demoing for the first time at SemTech 2009 next week. We developed a simple and flexible ontology to describe what a “Web services framework” should be. Then, we developed and implemented the software to make it happen. This means that an ontology development task can be seen as a specification task, too.

Pragmatic Applications of Ontologies

So, OK, what do these exhortations mean? Without respect to any particular scope or domain, let me then list below some important functional areas to which ontologies — properly and pragmatically designed — can contribute.

Conceptual Relationships

The traditional lens for viewing ontologies is as a means to express conceptual relationships. We agree.

However, ontologies need not have large and nuanced predicate (relationship) vocabularies in order to be useful. Relatively simple but powerful structures with hierarchical or part-of relationships can be very effectively employed for inferencing or faceted searching. From a pragmatic standpoint, let’s first agree on what “things” (nouns) there are in our domains, then let’s worry about how they relate (verbs).

The idea here has long been known as successive approximation: Let’s first get ourselves into the right country, then right province, then right city, then right neighborhood, then right house, and then right room. Only then should we worry about the condition of the paint or the age of the floors.

Endless harangues about “true” conceptual relations are a hindrance, not a help, to this perspective. It is much better (and faster, cheaper and more pragmatic) to put forward simple but coherent relationships than to worry about what all of that “really” means. From a business perspective, isn’t being able to utilize the assertion that the hip bone is connected to the thigh bone more important than having to await a full explication of all of the muscles and ligaments and tendons that might comprise them?

Once such simple relationship structures are embraced, then amazing inferencing power comes to the fore. If one searches on thigh bone, inferencing can also bring forward the hip because of its relationships to the leg.
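To make the hip and thigh example concrete, here is a minimal sketch in Python of how a simple part-of structure might expand a search. The concept names, the PART_OF table, and the functions ancestors and expand_search are all hypothetical and invented for illustration; they are not taken from any particular ontology or reasoner.

```python
# Hypothetical part-of facts; the concept names are illustrative only.
PART_OF = {
    "thigh bone": "leg",
    "hip bone": "leg",
    "leg": "body",
    "skull": "head",
    "head": "body",
}


def ancestors(concept):
    """Walk the part-of chain upward and return every containing concept."""
    chain = []
    while concept in PART_OF:
        concept = PART_OF[concept]
        chain.append(concept)
    return chain


def expand_search(concept):
    """Return the concept plus the other parts of its immediate parent."""
    parent = PART_OF.get(concept)
    if parent is None:
        return [concept]
    siblings = sorted(
        c for c, p in PART_OF.items() if p == parent and c != concept
    )
    return [concept] + siblings


print(ancestors("thigh bone"))      # ['leg', 'body']
print(expand_search("thigh bone"))  # ['thigh bone', 'hip bone']
```

Even with a single relationship type, a search on one concept can surface its neighbors; richer predicates can be layered on later if they prove useful.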

Integrating Instance Data

OK, so now at least we have a coherent scaffolding of concepts and their straightforward relationships. That is, is concept A a “bigger” one (class or super set) than concept B? (Other simple relations could be substituted.) If so, we now see a bit of an organizing “world view” begin to emerge.

So, now we begin to bring in external data. But that data and its schema describe themselves differently. In one realm it is “foo”; in the other, “bar”.

While this different terminology for the same “thing” or related things may not be known at the outset, it is discoverable. And, when discovered, it is quite easy to associate the idea or concept of “foo” in one dataset to “bar” in another. In this manner, through learning and accretion, we are able to associate more and more similar things to one another.

We did not need to begin with some global, cosmic view to begin relating this data to one another. We only needed the right framework and structure that allowed this association to evolve as the learning occurred.
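One way to picture this accretion is a small registry that records each discovered equivalence as it is learned. The sketch below uses a standard union-find (disjoint set) structure in Python; the class name ConceptLinks and the dataset and term names are assumptions made up for the example, not part of any product described here.

```python
class ConceptLinks:
    """Incrementally records that terms from different datasets
    refer to the same concept (a plain union-find structure)."""

    def __init__(self):
        self._parent = {}

    def _find(self, term):
        # Register unseen terms as their own representative.
        self._parent.setdefault(term, term)
        while self._parent[term] != term:
            # Path halving keeps later lookups short.
            self._parent[term] = self._parent[self._parent[term]]
            term = self._parent[term]
        return term

    def link(self, a, b):
        """Record a newly discovered equivalence between two terms."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if the two terms have been linked, directly or indirectly."""
        return self._find(a) == self._find(b)


links = ConceptLinks()
# Terms are (dataset, local name) pairs; all names are hypothetical.
links.link(("dataset_a", "foo"), ("dataset_b", "bar"))
links.link(("dataset_b", "bar"), ("dataset_c", "baz"))

print(links.same(("dataset_a", "foo"), ("dataset_c", "baz")))  # True
print(links.same(("dataset_a", "foo"), ("dataset_c", "qux")))  # False
```

Nothing here requires knowing all of the datasets in advance: each new link simply merges two groups of terms already known to be alike.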

And, oh, by the way: this very same process is akin to documenting the organization’s institutional memory.

Orienting to Other Knowledge and Domains

Being able to relate and “classify” or “organize” some things to other things also means that we are now beginning to create a roadmap for how “stuff” in a broad sense relates to other “stuff.” For example, if I develop a detailed understanding of the hip bone, I can now bring that body of knowledge into the context above to relate this new information to the thigh and to the leg.

Frankly, at this juncture, it is helpful merely to know that Domain A (hip) is somehow related to Domain B (thigh); the precise nature of that relationship, while perhaps ultimately important, can wait. Think of the issue more like trying to get into the right map vicinity on the globe, and not whether individual streets intersect.

Again, the mindset here is one of letting ontologies and their concepts get related knowledge bases into the same ballpark. Whether we are trying to match Little League ball players with Major League ballplayers is beside the point: accept that both are playing baseball, then decide the importance and specifics of the relationship in a later step.

Again, “ballpark” is more helpful than no connection whatsoever. Silly statements about “ontological commitments” really mean nothing. Ontologies, like any other tool, can play different roles at different times. When helping to get like-related things into the same ballpark, ontologies are easy and quite effective.

(As an aside, it is useful to note here that our efforts with the UMBEL upper-level subject ontology are solely premised on this “roadmap” purpose. In and of itself, UMBEL is not a very complicated explication of the world. But it does provide a comprehensive set of 20,000 subject concepts for orienting quite disparate datasets and information to one another. This very same approach could be replicated and then applied to the granularity of individual domains, kind of like zooming in on Google Maps, to provide similar benefits at smaller scale for domain-specific roadmaps. In fact, that is a common approach we apply in our own client ontologies, which we then also make sure we tie into UMBEL for global orientation.)

Mapping to Other Schema

OK, so with this foundation now built, we can next raise the bar a bit further. Once one begins to express these “world views” formally as an ontology, even with reduced ambitions as presented above, one still ends up with a formal specification of that conceptualization. And, that means, we now also have a basis with standard languages for mapping two disparate or separately developed ontologies to one another.

This is powerful. Through such mapping, we end up, in the memorable phrase of my colleague, Fred Giasson, “exploding the domain.”

Moreover, we also have found a means for stitching together datasets with disparate schema to one another. Voilà: We now have met the Holy Grail of data interoperability.

In my opinion, this is the money shot from all of this effort. But, again, if we set the deployment threshold to the unrealistic levels that some ontology pundits suggest, this payoff is unlikely to happen. We are not trying to state absolute, universal truth about anything nor to be unrealistically comprehensive. All we are trying to do is make defensible assertions that one portion of a world view is similar or related to a portion of a different world view.

Now, does that sound that scary? No, of course not. It is merely a reasonable and pragmatic means for relating two structures together.

A key aspect to this mapping ability is to enrich the description of our concepts with what we call “semsets.” Semsets are a listing of related terms and phrases that provide synonyms, aliases, jargon and related context for alternative ways to describe or bound the concept at hand. This terminological “grist” is the basis for relatively straightforward natural language processing techniques to suggest matches between concepts in different ontologies (which might also be combined with other ontology components such as preferred labels, descriptions or structural placements in the schema).

Like many of the points above, these semsets can be built incrementally and over time as new jargon and terminology is discovered.
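As a rough illustration of how semsets might feed match suggestions, the Python sketch below scores candidate concept pairs by the overlap of their term lists. It uses simple Jaccard overlap as a stand-in for whatever natural language processing technique is actually applied; the ontologies, terms, threshold, and function names are all assumptions for the example.

```python
def normalize(term):
    """Lower-case and trim a term so trivial variants compare equal."""
    return term.strip().lower()


def semset_overlap(semset_a, semset_b):
    """Jaccard overlap between two semsets, from 0.0 to 1.0."""
    a = {normalize(t) for t in semset_a}
    b = {normalize(t) for t in semset_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_matches(ontology_a, ontology_b, threshold=0.2):
    """Suggest concept pairs whose semsets overlap at or above the threshold."""
    suggestions = []
    for concept_a, terms_a in ontology_a.items():
        for concept_b, terms_b in ontology_b.items():
            score = semset_overlap(terms_a, terms_b)
            if score >= threshold:
                suggestions.append((concept_a, concept_b, round(score, 2)))
    # Strongest suggestions first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical semsets from two separately developed ontologies.
ontology_a = {"Automobile": ["car", "auto", "motor vehicle"]}
ontology_b = {
    "Car": ["Car", "automobile", "motor vehicle"],
    "Camera": ["camera", "digital camera"],
}

print(suggest_matches(ontology_a, ontology_b))
# [('Automobile', 'Car', 0.5)]
```

The output is only a list of suggestions for a person to review; adding a newly discovered synonym to a semset changes the scores without any change to the matching code.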

Linked Data, with Federated and Comprehensive Data

These techniques of mapping datasets and their ontology structures can be leveraged still further with the proper application of linked data practices. Via linked data, we place our data into Web-accessible (HTTP) networks and give them Web-scalable identifiers (URIs). This means we can now integrate and interoperate with much external public Web data and break down our own internal data silos.

Our instance records can be fleshed out with supplementary sources to provide more comprehensive attributes and characterizations. Uniformity of treatment and coverage is promoted. Data interoperability is finally at hand.

A key best practice to this, of course, needs to be the recognition that not all data or information is public and not all users have the same roles or should have the same access to different sets of data. Thus, to embrace global mechanisms for data interoperability, there must also be local methods for enforcing access, privacy and confidentiality.

Properly designed ontologies can fulfill this requirement, as well. By organizing information into datasets and setting profiles for access and CRUD (create – read – update – delete) rights, an effective environment for data sharing and federation is established.
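A minimal sketch of such access profiles follows, in Python. The dataset names, roles, and the PERMISSIONS table are hypothetical; a real deployment would hold these profiles in its own data store rather than in a literal dictionary.

```python
# Hypothetical access profiles: (dataset, role) -> allowed CRUD actions.
PERMISSIONS = {
    ("public_citations", "anonymous"): {"read"},
    ("public_citations", "editor"): {"create", "read", "update"},
    ("internal_notes", "editor"): {"read"},
    ("internal_notes", "admin"): {"create", "read", "update", "delete"},
}


def can(role, action, dataset):
    """True if the role may perform the action on the dataset."""
    return action in PERMISSIONS.get((dataset, role), set())


def visible_datasets(role):
    """Datasets the role may at least read, in alphabetical order."""
    return sorted(
        dataset
        for (dataset, r), actions in PERMISSIONS.items()
        if r == role and "read" in actions
    )


print(can("anonymous", "read", "internal_notes"))  # False
print(can("editor", "update", "public_citations"))  # True
print(visible_datasets("editor"))  # ['internal_notes', 'public_citations']
```

Because the check is made per dataset, the same lookup can filter search results, browse lists, and the tools offered to a given user.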

Context- and Instance-sensitive Data Display

To this point, we have taken almost an exclusively data- or schema-centric view of ontologies. But, as structures, pure and simple, their structural nature can be exploited in other ways. It is here, frankly, that less is spoken of the potential for ontologies than in the more “conceptual” areas noted above.

The first of these new areas is in instance-sensitive data display. Each instance record is associated with an instance type in a governing ontology. Detecting this type means that context-sensitive display templates can then be invoked.

Detecting that something refers to a city, for example, can invoke a template providing a map, population figures, area size, city governance method and the like. In contrast, detecting an instance as a camera might invoke an entirely different display template focusing on product features or price or store and purchasing locations. Such instance-type displays are common; they are known as “infoboxes” within Wikipedia articles, as one example.

But this power of data display templates can be generalized further. What if we detect our instance represents a camera but do not have a display template specific to cameras? Well, the ontology and simple inferencing can tell us that cameras are a form of digital or optical products, which more generally are part of a product concept, which more generally is a form of a human-made artifact, or similar.

By tracing this inferencing chain from the specific to the more general we can “fall back” until a somewhat OK display template is discovered, even in the absence of the better and more specific one. Then, if we find we are trying to display information on cameras frequently, we only need take one of the more general, parent templates and specifically modify it for the desired camera attributes. We also keep presentation separate from data so that the styling and presentation mode of these templates is also freely modifiable.
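The fallback idea can be sketched in a few lines of Python. The class hierarchy, the template names, and the function find_template are assumptions invented for this example and do not describe any specific templating system.

```python
# Hypothetical subclass chain, from specific to general.
SUBCLASS_OF = {
    "Camera": "OpticalProduct",
    "OpticalProduct": "Product",
    "Product": "Artifact",
    "City": "Place",
}

# Only some types have a display template of their own.
TEMPLATES = {
    "Product": "product_template",
    "City": "city_template",
}


def find_template(instance_type, default="generic_template"):
    """Return the template for the type, or for its nearest ancestor."""
    current = instance_type
    seen = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in seen:
        if current in TEMPLATES:
            return TEMPLATES[current]
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return default


print(find_template("Camera"))  # product_template
print(find_template("City"))    # city_template
print(find_template("Place"))   # generic_template
```

Adding a camera-specific display later is a one-line change to TEMPLATES; the lookup code stays as it is.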

This parallel set of display structures to the domain ontology provides a highly reusable and leveraged data presentation framework. For 30 years organizations have struggled with report generators and all sorts of complicated systems for responsive reporting and data display. When driven by ontologies, this challenge is greatly simplified.

Driving User Interfaces

The careful reader of the above will note that our ontologies now have a number of interesting characteristics, all of which can be leveraged within the user interface. For example, we have:

  • Human-readable labels for our “things”
  • Alternative labels in our semsets that can characterize those same “things”
  • A readable description of each “thing”
  • An organized and logical schema for how each “thing” relates to other things.

This very information, when indexed in a supplementary full-text search engine with faceting capabilities (such as the Solr engine we use), can be leveraged in the user interface for these types of desired UI capabilities:

  • Attribute labels and tooltips
  • Navigation and browsing structures and trees
  • Menu structures
  • Auto-completion of entered data
  • Contextual dropdown list choices
  • Spell checkers
  • Online help systems
  • Etc.

This is absolutely mindblowing power!

We can now design generic tools that do patterned functions. Then, based on the data at hand and the ontologies that describe them, we can now see completely modified and tailored interfaces. And all of this is done without modifying a single line of application code!
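As one small illustration, auto-completion can be driven entirely by the labels and alternative labels already present in the ontology. The sketch below uses plain Python prefix matching in place of a full-text engine such as Solr; the concepts, labels, and function names are hypothetical.

```python
# Hypothetical concepts with a preferred label and semset-style alternatives.
CONCEPTS = {
    "Camera": {"label": "Camera", "alt_labels": ["digital camera", "cam"]},
    "City": {"label": "City", "alt_labels": ["town", "municipality"]},
    "Car": {"label": "Car", "alt_labels": ["automobile", "auto"]},
}


def build_index(concepts):
    """Flatten every label and alternative label into (text, concept) pairs."""
    index = []
    for concept_id, info in concepts.items():
        for text in [info["label"], *info["alt_labels"]]:
            index.append((text.lower(), concept_id))
    return sorted(index)


def autocomplete(prefix, index):
    """Return the concepts with any label starting with the typed prefix."""
    prefix = prefix.lower()
    matches = {cid for text, cid in index if text.startswith(prefix)}
    return sorted(matches)


index = build_index(CONCEPTS)
print(autocomplete("ca", index))  # ['Camera', 'Car']
print(autocomplete("au", index))  # ['Car']
print(autocomplete("to", index))  # ['City']
```

The generic tool here is the pair of functions; swapping in a different ontology changes what the user sees without touching them.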

Applications in this brave new world now consist of assembling a proper suite of generic tools, and then spending the bulk of our time on describing and characterizing our data via ontologies and refining templates for displaying or reporting the types of specific instances within our current problem space.

Conclusions

All of the points made above are doable and being done today. Properly designed ontologies can readily deliver all of the aspects noted above. Later parts in this ongoing series will address many of those aspects in greater detail.

Ontologies are not magic. Properly done — an important emphasis — ontologies are the pivot point for faster and more adaptable ways of doing business. A simple, pragmatic mindset can help.

Our perspective is that ontologies are really the “flour that gets baked into the cake”. While viewable and definable as their own structures, properly constructed ontologies actually should exist everywhere within applications and contribute everywhere to applications. This is what we mean by “data-driven applications.”

To be sure, we are suggesting a paradigm shift from 30 years of IT frustrations: schema no longer must be fragile; reports no longer must be costly and delayed; and data can finally be made interoperable.

We will continue to give you our best thinking on these topics over the coming weeks and how they might be important to you.

Sound too good to be true? Read the material above again. And, then, we welcome getting your call.

This post is part of an occasional AI3 series on ontology best practices.

[1] As used in knowledge representation or information science, ‘ontology’ is most often defined using Tom Gruber’s “explicit specification of a conceptualization.” See Thomas R. Gruber, 1993. “A Translation Approach to Portable Ontology Specifications,” in Knowledge Acquisition 5(2): 199-220; see http://tomgruber.org/writing/ontolingua-kaj-1993.pdf.
Posted: June 7, 2009

Once a Pioneering ‘Telecommuter’, I Have Now Worked from Home for Two Decades

We had a party this week to celebrate my daughter and her boyfriend’s career move to Seattle. It was a great time with many reminiscences of our life here in Iowa City over the past decade. Then it struck me: almost to a day I had now been working from a home office for 20 years!

Wow. I had not really been paying attention. That realization in turn brought back its own memories, and caused me to reflect back on my two decades of working from home.

Why and the Enabler

I left my position as director of energy research at the American Public Power Association in early June 1989. We were just a month away from my son’s birth and had decided we did not want to raise our children in Washington, DC. The District at that time was totally dysfunctional and had earned the moniker of “Murder Capital of America.” While we loved our home in Barnaby Woods (Chevy Chase) DC and our neighbors, we wanted a smaller and safer community with more connectedness.

My wife, at that time a post doc at the National Institutes of Health in Bethesda, also was committed to her profession and career. The Washington area was unlikely to be an immediate prospect for her to find a permanent position. Indeed, our generation was just coming to grips with the new challenges of two-professional families: There need to be career choices and flexibility between the partners. Some professions, like lawyering or doctoring or sales or computer programming, have much locational flexibility. Others, such as bench scientists in biology, as for my wife, less so. In academia, position openings occur at their own place and time.

I had also been climbing my way in a corporate and office environment for more than a dozen years and was ready for my own career change. As a professional, I had never been my own boss and wanted to see how I could fare in the consulting and entrepreneurial worlds. By doing so, I could also bring flexibility to my wife’s locational options for her career.

At that time, without a doubt, the enabling technology for my new career shift was the fax machine. The ability to interact with clients with documents in more-or-less real time was pivotal. My first fax machine was a Sanyo thermal model (can’t recall the model now; it is long gone and they have since gotten out of that business). I recall buying cartons of thermal fax rolls frequently and the copies that faded in the file cabinet drawers.

Of course, the phone and plane were also pivotal. In the early years my monthly phone bills were astronomical and I flew around 100,000 miles per year. But, it was the fax that really enabled me to cut the locational knot. But how strange: from thousands upon thousands of faxed pages in the early years to only a few per year today! The Web, of course, has really proven to be the true enabler over the past decade.

Early Lessons

Prior to shifting to a home office it took me about 45 min to commute or bike to APPA. If taking public transportation, I had to walk to the local bus stop, transfer at the Metro subway station, and then walk the remaining distance. While this gave me time to read most of the morning Washington Post, I knew that by eliminating this commute I could save 90 min a day for new productivity.

What first surprised me, though, was the fact that I also no longer needed to keep my office computer and home computer synchronized. Since, like most ambitious professionals, I also worked some in the evenings, I had overlooked the time it took to keep files and documents synchronized. I was saving about another 30 min per day in digital transfers between home and office. With this new choice to work from home, I was saving 2 hrs per day!

I only had a home office in DC for a short period before we moved to Montana. Since Montana, I have had only two further offices (homes).

My early experience in DC suggested I wanted a more dedicated office space, so we did so for the home we designed in Montana. I was also able to design reserved office space again for here in Iowa City. (The DC home office and the interim one in South Dakota prior to Iowa were converted bedrooms, definitely not recommended!)

Planning office space in advance means you can tailor the space to your work habits. For me, I want lots of natural light, a view from the windows, and lots of desk and whiteboard space. I also needed room for office equipment (copiers in the early years, fax, printers and the like) and file cabinets. When in Montana, I designed and had built my own office furniture suite that makes me feel I’m commanding the bridge of the Starship Enterprise (see pictures of my current office).

Teaching myself and the kids that office time and office space were fairly sacrosanct was important, too. Sure, it was helpful to be around for the kids for boo-boos and emergencies and dedicated kid’s time, and to be able to be there for home repairs and the like, but for the most part I tried to treat my office as a separate space and to have the kids do so, too. Frankly, since my family has grown up with no other experience than Dad working from home, it has always felt natural and been a matter of course.

A real key is to be able to shut the office door and return to normal home life in the rest of the house. And, of course, the need for the opposite is also true. It is probably the case that I spend more time in my home office than most regular professionals do in their organization’s office, but my career has always been my passion anyway and not what I consider to be a job.

In the early years I was fairly unusual, I think, for working from home. I certainly gave many local talks and was frequently invited by service organizations to speak over lunch on the experience of “telecommuting”. Today, working from home is no longer unusual and the Internet technology and support to do so makes it a breeze.

Mike's Home Office

I have been able to run both consulting and software development companies from my home office over the years. I have seen the gamut of meetings, ranging from sessions with developers before massive whiteboards in my home basement to running and coordinating 20-person companies in their own office space with investors and Boards.

With a willingness to travel, it seems like all organizational possibilities are now open to the home worker. For quality of life and other reasons, the fact that today many larger knowledge organizations offer remote office centers and commuting flexibility speaks volumes to how far “telecommuting” has really come.

How much difference two decades can make!

Today’s Thoughts

I personally could never return to a standard office setting. For me, the home office with its flexibility and productivity and ability to find contemplative time simply cannot be beat.

I really welcome what is happening in online meeting software and other Web apps that are reducing the need for travel and face-to-face meetings. For while the technology and culture have improved markedly to support working from home over the past two decades, the pain and hassle of travel have only worsened.

I have transitioned from a million-miler frequent flyer to a rooted house plant. I try to choose my travel venues carefully and when I do travel I try to do so for longer periods to absorb the shocks. It is perhaps a too frequent refrain, but it is just a damn shame how getting a meal, being treated with pleasure and courtesy, having some legroom, and getting a drink are air travel amenities of a now bygone era.

There are now many, many more of us (you) who work from home and it really is no longer a topic of conversation. A quick search tells me perhaps 5 million or more US workers predominantly work from home, with some 15% of all workers doing so on occasion. In the predominant professional and business services, financial activities, and education and health services, this percentage can reach as high as 30% of workers now doing paid work from home to one degree or another.

Of course, personality, job requirements, and physical space may not make working from home a good choice for you. But if you have not tried it and it sounds interesting, by all means: Try it!

For twenty years, it has been a great choice for me and for my family. This is indeed a nice 20th anniversary to celebrate!

Posted by AI3's author, Mike Bergman Posted on June 7, 2009 at 5:00 pm in Adaptive Innovation, Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/491/celebrating-20-years-of-working-from-home/
Posted:May 29, 2009

Zotero Bibliographic Plug-in

Major Report Signals the Emergence of Linked Data into the Enterprise

PricewaterhouseCoopers (PWC) has just published a major 58-pp report on linked data in the enterprise. The report features insightful interviews with many industry practitioners as well as PWC’s own in-depth and thorough research. I think this report is a most significant event: it represents the first mainstream recognition of the potential importance of linked data and semantic Web technologies to the business of data interoperability within the enterprise.

This entire issue is uniformly excellent and well-timed. PWC has done a superb job of assembling the right topics and players. The report has three feature articles interspersed with four in-depth interviews. The target audience is the enterprise CIO with much useful explanation and background. Applications discussed range from standard business intelligence to energy and medicine.

The emphasis on the linked data aspect is a strong one. PWC puts twin emphases on ontologies and the enterprise perspective (naturally). This is a refreshing new perspective for the linked data community, which at times could be accused a bit of being myopic with regard to: 1) open data only; 2) instance records (RDF and no OWL, with little discussion of domain or concept ontologies); and 3) sometimes a disdain for the business perspective (as opposed to the academic).

PWC has done a great job of getting beyond some of the community's own prejudices in order to couch this in CIO and enterprise terms. This signals to me the transition from the lab to the marketplace, with all of its consequent challenges and advantages.

In short: Bravo! This is a very good piece and will, I think, put PWC ahead of the curve for some time to come.

I was very pleased to have the opportunity to review earlier drafts of this major report. After reading a couple of my recent papers on Shaky Semantics and the Advantages and Myths of RDF, with the latter cited in the piece, I had a chance to have a fruitful dialog with one of the report’s editors, Alan Morrison, who is a manager in PWC’s Center for Technology and Innovation (CTI). He kindly solicited my comments and incorporated some suggestions.

The report also lists 14 semantic technology vendors and service providers. I’m pleased to note that PWC included our small Structured Dynamics firm as part of its listing. Other vendors listed include Cambridge Semantics, Collibra, Metatomix, Microsoft, OpenLink Software, Oracle, Semantic Discovery Systems, Talis, Thomson Reuters, TopQuadrant and Zepheira, with the selected service providers of Radar Networks and AdaptiveBlue.

This report is easy — but important — reading. I personally enjoyed the insights of Frank Chum of Chevron, a new name for me. I encourage all in the field to read and study the entire report closely. I think this report will be an important milestone for the semantic Web in the enterprise for quite some time to come.

After a brief sign-up, the 58-pp report is available for free download.

Posted by AI3's author, Mike Bergman Posted on May 29, 2009 at 10:04 am in Linked Data, Ontologies, Semantic Web, Structured Dynamics | Comments (2)
The URI link reference to this post is: https://www.mkbergman.com/490/pwc-dedicates-quarterly-technology-forecast-to-linked-data/
Posted:May 17, 2009

Structured Dynamics LLC

Ontology Best Practices for Data-driven Applications: Part 2

It is perhaps not surprising that the first substantive post in this occasional series on ontology best practices for data-driven applications begins with the importance of keeping an ABox and TBox split. Structured Dynamics has been beating the tomtom for quite a while on this topic. We reiterate and expand on this position in this post.

The Relation to Description Logics

Description logics (DL) are one of the key underpinnings to the semantic Web. DL are a logic semantics for knowledge representation (KR) systems based on first-order predicate logic (FOL). They are a kind of logical metalanguage that can help describe and determine (with various logic tests) the consistency, decidability and inferencing power of a given KR language. The semantic Web ontology languages, OWL Lite and OWL DL (which stands for description logics), are based on DL and were themselves outgrowths of earlier DL languages.

Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. It is this construct for which Structured Dynamics generally reserves the term “ontology”.

The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (or individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts. Both the TBox and ABox are consistent with set-theoretic principles.
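To make the split concrete, here is a minimal sketch in plain Python, assuming a toy knowledge base held in dictionaries. Every concept, property, and individual name below is invented for illustration, and real systems would express the same thing in RDF and OWL rather than in Python structures.

```python
# TBox (hypothetical names): concepts, their hierarchy, and the declared properties.
TBOX = {
    "subclass_of": {"Dog": "Mammal", "Person": "Mammal", "Mammal": "Animal"},
    "properties": {"name", "owner"},  # the controlled vocabulary of properties
}

# ABox (hypothetical names): assertions about individuals only.
ABOX = {
    "membership": {"rex": "Dog", "ann": "Person"},  # individual -> concept
    "attributes": {"rex": {"name": "Rex"}},         # attribute-value pairs
    "roles": [("rex", "owner", "ann")],             # role between two individuals
}


def tbox_concepts(tbox):
    """Return every concept mentioned anywhere in the TBox hierarchy."""
    hierarchy = tbox["subclass_of"]
    return set(hierarchy) | set(hierarchy.values())


# Class membership is the point where the ABox refers back to TBox concepts.
unknown = set(ABOX["membership"].values()) - tbox_concepts(TBOX)
print(unknown)
```

Here `unknown` comes back empty because every concept used in a membership assertion is defined in the TBox. The two structures can otherwise be stored, edited, and reasoned over separately.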

Natural and Logical Work Splits

TBox and ABox logic operations differ and their purposes differ. TBox operations are based more on inferencing and tracing or verifying class memberships in the hierarchy (that is, the structural placement or relation of objects in the structure). ABox operations are more rule-based and govern fact checking, instance checking, consistency checking, and the like. ABox reasoning is generally more complex and at a larger scale than that for the TBox.

Early semantic Web systems tended to be very diligent about maintaining these ‘box’ distinctions of purpose, logic and treatment. One might argue, as Structured Dynamics does, that the usefulness and basis for these splits has been lost somewhat more recently.

Particularly as we now see linked data become more prevalent, these same questions of scale and actual interoperability are posing real pragmatic challenges. To help aid this thinking, we have re-assembled, re-articulated and in some cases added to earlier discussions of the purposes of the TBox and ABox:

TBox
  • Definitions of the concepts and properties (relationships) of the controlled vocabulary
  • Declarations of concept axioms or roles
  • Inferencing of relationships, be they transitive, symmetric, functional or inverse to another property
  • Equivalence testing as to whether two classes or properties are equivalent to one another
  • Subsumption, which is checking whether one concept is more general than another
  • Satisfiability, which is the problem of checking whether a concept has been defined (is not an empty concept)
  • Classification, which places a new concept in the proper place in a taxonomic hierarchy of concepts
  • Logical implication, which is whether a generic relationship is a logical consequence of the declarations in the TBox
  • Infer property assertions implicit through the transitive property

TBox <-> ABox
  • Entailments, which are whether other propositions are implied by the stated condition
  • Instance checking, which verifies whether a given individual is an instance of (belongs to) a specified concept
  • Knowledge base consistency, which is to verify whether all concepts admit at least one individual
  • Realization, which is to find the most specific concept for an individual object
  • Retrieval, which is to find the individuals that are instances of a given concept
  • Identity relations, which is to determine the equivalence or relatedness of instances in different datasets
  • Disambiguation, which is resolving references to the proper instance

ABox
  • Membership assertions, either as concepts or as roles
  • Attributes assertions
  • Linkages assertions that capture the above but also assert the external sources for these assignments
  • Consistency checking of instances
  • Satisfiability checks, which are that the conditions of instance membership are met

As the table shows, the TBox is where the reasoning work occurs, the ABox is where assertions and data integrity occur, and knowledge base work in the middle (among other aspects) requires both. We can reflect these work splits via the following diagram:

TBox- and ABox-level work

This figure maps the work activities noted in the table, with particular emphasis on the possible and specialized work activities at the interstices between the TBox and ABox.
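A few of these work activities can be sketched in a handful of lines. The following is an illustrative toy only, assuming a single-parent concept hierarchy and invented names. It shows subsumption as a TBox-only operation, with instance checking and retrieval needing both the TBox and the ABox.

```python
# TBox (hypothetical): child concept -> parent concept.
SUBCLASS_OF = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal"}

# ABox (hypothetical): individual -> asserted concept.
INSTANCE_OF = {"rex": "Dog", "tom": "Cat", "nemo": "Animal"}


def ancestors(concept):
    """Return the concept followed by every more general concept above it."""
    chain = [concept]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def subsumes(general, specific):
    """TBox only: is `general` the same as, or more general than, `specific`?"""
    return general in ancestors(specific)


def is_instance(individual, concept):
    """TBox + ABox: instance checking via the asserted concept and the hierarchy."""
    return subsumes(concept, INSTANCE_OF[individual])


def retrieve(concept):
    """TBox + ABox: find all individuals that are instances of `concept`."""
    return sorted(i for i in INSTANCE_OF if is_instance(i, concept))


print(subsumes("Animal", "Dog"))     # True
print(is_instance("rex", "Mammal"))  # True
print(retrieve("Mammal"))            # ['rex', 'tom']
```

Note that `subsumes` never touches `INSTANCE_OF`, while the two functions in the middle column of the table cannot be written without both structures.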

The Split Should Feel Natural

Whether a single database or the federation across many, we have data records (structs of instances) and a logical schema (ontology of concepts and relationships) by which we try to relate this information. This is a natural and meaningful split: structure and relationships v. the instances that populate that structure.

Stated this way, particularly for anyone with a relational database background, the split between schema and data is clear and obvious. While the relational data community has not always maintained this split, and the RDF, semantic Web and linked data communities have not often done so either, this split makes eminent sense as a way to maintain a desirable separation of concerns.

The importance of description logics, besides their role as a logical underpinning to the semantic Web enterprise, is their ability to provide a perspective and framework for making these natural splits. Moreover, with some updated thinking, we can also establish a natural framework for guiding architecture and design. It is quite OK to also look to the interaction and triangulation of the ABox and TBox, as well as to specialized work that is not constrained to either.

For example, identity evaluation and disambiguation really come down to the questions of whether we are talking about the same or different things across multiple data sources. By analyzing these questions as separate components, we also gain the advantage of enabling different methodologies or algorithms to be determined or swapped out as better methods become available. A low-fidelity service, for example, could be applied for quick or free uses, with more rigorous methods reserved for paid or batch mode analysis. Similarly, maintaining full-text search as a separate component means the work can be done by optimized search engines with built-in faceting (such as the excellent open-source Solr application).
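The idea of swappable methods can be sketched as follows. This is not how any particular service does it; the two matchers, the record layout, and the 0.8 threshold are all assumptions made up for the example, with only the standard library's `difflib` used for fuzzy matching.

```python
from difflib import SequenceMatcher


def quick_match(a, b):
    """Low-fidelity check: case-insensitive exact match on the label only."""
    return 1.0 if a["label"].lower() == b["label"].lower() else 0.0


def careful_match(a, b):
    """More rigorous check: fuzzy label similarity averaged with attribute agreement."""
    label_score = SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()
    shared = set(a["attrs"]) & set(b["attrs"])
    if not shared:
        return label_score
    agree = sum(a["attrs"][k] == b["attrs"][k] for k in shared) / len(shared)
    return (label_score + agree) / 2


def same_thing(a, b, matcher=quick_match, threshold=0.8):
    """Identity evaluation with a pluggable matcher (threshold is an assumption)."""
    return matcher(a, b) >= threshold


# Two hypothetical records for the same organization from different datasets.
rec_a = {"label": "Structured Dynamics", "attrs": {"type": "company"}}
rec_b = {"label": "Structured Dynamics LLC", "attrs": {"type": "company"}}

print(same_thing(rec_a, rec_b))                         # quick method
print(same_thing(rec_a, rec_b, matcher=careful_match))  # more rigorous method
```

The quick method misses the match because the labels differ, while the more careful method finds it. Because the matcher is passed in as a parameter, a better method can replace either one later without changing the calling code.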

These distinctions feel obvious and natural. They arise from a sound grounding in the split of the ABox and the TBox.

The Re-cap of Key Reasons to Maintain the TBox – ABox Split

So, to conclude this part in this occasional series, here are some of the key reasons to maintain a relative split between instances (the ABox) and the conceptual relationships that describe a world view for interpreting them (the TBox):

  • We are able to handle instance data simply. The nature of instance “things” is comparatively constant and can be captured with easily understandable attribute-value pairs
  • We can re-use these instance records in varied and multiple world views (the TBox). World views can be refined or approached from different perspectives without affecting instance data in the slightest
  • We can approach data architectural decisions from the standpoints of the work to be done, leaving open special analysis or tasks like disambiguation or full-text search
  • Ontologies (as defined by SD and focused on the TBox) are kept simpler and easier to understand. Inter-dataset relationships are asserted and testable in largely separate constructs, rather than admixed throughout
  • Relatedly, we are thus able to use ontologies to focus on the issues of mappings and conceptual relationships
  • Instance records can often be kept in situ, especially useful when incorporating the massive amounts of data in existing relational databases
  • Instance evaluations can be done separately from conceptual evaluations, which can help through triangulation in such tasks as disambiguation or entity identification
  • It is easier to convert simple data structs to the instance record structure, aiding interoperability (a subject for a later part in this series)
  • We provide a framework that is amenable to swapping in and out different analysis methods, and
  • It is easier for broader input when the task is adding and refining attributes rather than internally consistent conceptual relationships.
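The second reason in the list above, re-using the same instance records under different world views, can be sketched as follows. The records and both hierarchies are invented for illustration, and each view is assumed to be a simple child-to-parent mapping.

```python
# ABox (hypothetical): instance records as simple attribute-value pairs.
RECORDS = {
    "rex": {"kind": "Dog", "weight_kg": 30},
    "tom": {"kind": "Cat", "weight_kg": 4},
}

# Two hypothetical TBoxes (world views) over the same records: child -> parent.
BIOLOGY_VIEW = {"Dog": "Canine", "Cat": "Feline", "Canine": "Mammal", "Feline": "Mammal"}
HOUSEHOLD_VIEW = {"Dog": "OutdoorPet", "Cat": "IndoorPet", "OutdoorPet": "Pet", "IndoorPet": "Pet"}


def classify(record, view):
    """Return the record's kind plus every broader concept the given view assigns to it."""
    chain = [record["kind"]]
    while chain[-1] in view:
        chain.append(view[chain[-1]])
    return chain


print(classify(RECORDS["rex"], BIOLOGY_VIEW))    # ['Dog', 'Canine', 'Mammal']
print(classify(RECORDS["rex"], HOUSEHOLD_VIEW))  # ['Dog', 'OutdoorPet', 'Pet']
```

Switching from one view to the other changes how the record is interpreted, but `RECORDS` itself is never modified.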

Here is a final best practice suggestion when these ABox and TBox splits are maintained: Make sure as curators that new attributes added at the instance level are also added with their conceptual relationships at the TBox level. In this way, the knowledge base can be kept integral while we simultaneously foster a framework that eases the broadest scope of contributions.
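A curator could automate part of this check. The sketch below is one possible approach under simple assumptions: the TBox vocabulary is a set of declared property names, instance records are dictionaries, and all names shown are hypothetical.

```python
def undeclared_attributes(tbox_properties, abox_records):
    """Return attribute names used in instance records but not declared in the TBox."""
    used = {attr for record in abox_records.values() for attr in record}
    return sorted(used - set(tbox_properties))


# Hypothetical TBox vocabulary and ABox records, invented for this example.
declared = {"name", "owner"}
records = {
    "rex": {"name": "Rex", "owner": "ann"},
    "tom": {"name": "Tom", "favorite_toy": "ball"},  # newly contributed attribute
}

print(undeclared_attributes(declared, records))  # ['favorite_toy']
```

Any name this reports is an attribute that contributors have added at the instance level and that still needs its conceptual relationships defined in the TBox.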

This post is part of an occasional AI3 series on ontology best practices.