Posted:November 7, 2011

UMBEL Services, Part 2: Full-text, Faceted Search

OSF Integration with Solr Provides Superior Search

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part focuses on the search function within the UMBEL portal based on the native engines used by the open semantic framework (OSF).

Search uses the integration of RDF and inferencing with full-text, faceted search using Solr. This has been Structured Dynamics’ standard search function for some time, as Fred Giasson initially described in April 2009. It is a very effective way for finding new and related concepts within the UMBEL structure.

Solr, as the Web service-enabled option for its parent Lucene, has most recently become a not uncommon adjunct to semantic technologies, for the very same reasons as outlined herein. However, in 2008, when we first embraced the option, it was not common at all. To my knowledge, within the semantic technology community, only the SWSE (semantic Web search engine) project was using Lucene when we began to adopt it.

The reasons for embracing Solr (or Lucene) are these:

Full-text search with a flexible search syntax
Ability to add facets (which is very powerful when combined with the structure of RDF)
High performance
Extensions for locational and time searches and many additional options, and
Open source.

Prior to the adoption of add-ons like Solr, RDF-only search suffered from a number of limitations, especially in the lack of a searchable correspondence of labels in relation to the object URIs used in the RDF model (see some of the limitations of standard RDF search).

Because of its advantages, Solr became the first additional main engine underneath our structWSF Web services framework, complementing the central RDF triple store (Virtuosoin most cases). We have subsequently added other main engines, as well, with a current total of four, which other parts in this UMBEL series will later discuss:

Being a main engine underneath structWSF means that datasets are fully indexed and cross-correlated with the capabilities of the other engines at time of ingest. Ingest most commonly occurs when datasets are ingested by the standard import tool; but, it might also be part of the system’s large dataset import scripts or synchronization routines.

The Search Function and Syntax

The standard UMBEL search box is found at the upper left of most site pages. When searching, you may choose these operators or syntax to add to your keywords, for example:

park OR city — provides the most results
park AND city — both terms must be present; fewer results
park city (no quotes) — both terms must be present, and within 5 words of one another; still fewer results, or
“park city” — exact phrase in quotes, with the fewest results.

(At present, Booleean operators apply to full-content search, and not filtered search.)

Upon searching, using the default of searching title, alternative labels (synonyms) and description (“TAD”), the standard search results page is displayed:

This page provides the further filtering options of searching by only title, or all content (including the linkings for each concept to its super classes and sub classes, which may produce a quite inclusive results set). These filter options are helpful in being able to sift through the 28,000 concepts within UMBEL.

The results listing provides the UMBEL concept names, their description, their alternative labels and a link [] to view them in the relation browser (to be discussed in more detail in the next part of this series). A simple pagination utility enables the results to be paged.

structWSF Basis and Options

This UMBEL search uses the structWSF Search Web service. It is what ties into the Solr engine to perform the full text searches on the structured data indexed on a structWSF instance. A search query can be as simple as querying the data store for a single keyword, or to query it using a series of complex filters.

Not all of these query syntax or filtering options are active on the UMBEL instance given the simple concept structure of the UMBEL ontology. Turning these options on or off is a relatively straightforward matter of altering some configuration files and ensuring the right parameters are included in the queries issued by the application to the structWSF search endpoint.

Developers communicate with the Search Web service using the HTTP POST method. You may request one of the following mime types: (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json. The content returned by the Web service is serialized using the mime type requested and the data returned depends on the parameters selected.

A. Optional Available Operators

Optionally, the structWSF Search function may be configured to support these operators and conventions. All operators, by the way, must be entered as ALL CAPS:

AND, which is the default operator if more than one key word is entered
OR, which needs to be specifically entered
NOT
Phrases, which are denoted by double quotes as this “search phrase”; single quotes are not accepted
Wildcard searches on single characters (?) and multiple characters (*), which can be placed anywhere except the beginning of the query term
Field searches, whereby the field name is used in the query followed by a colon
Nesting, which allows complicated Boolean expressions to be formed (so long as parentheses are balanced), and many more exotic options.

See further the Lucene search engine syntax specification.

B. Optional Available Filters

Each search query can be applied to all, or a subset of, datasets accessible by the requester. Each Search query can be filtered by these different filtering criteria:

Type of the record(s) being requested
Source dataset(s) for the the record
Presence of an attribute describing the record(s)
A specific value, for a specific attribute describing the record(s)
A distance from a lat/long coordinate (for geo-enabled structWSF instance)
A range of lat/long coordinates (for geo-enabled structWSF instance)

These filtering options allow subset searches to occur, as the example above for title and TAD in UMBEL shows. However, these filters can also be combined into more complete and structured selection options as well. For example, this same search utility applied to Structured Dynamics’ Citizen Dan local government sandbox shows how these additional filters may be applied:

Clicking on a given “kind” name causes the results display to be restricted to results only for that kind of class.
If so selected, the Filter by Dataset tab is also restricted to the datasets that contain results with that class.
Once selected, the filter remains in place. To remove it, click on the Remove filter icon [] to restore the “kinds” back to the original listing for this search.

See the example. Such filtering capabilities present all of the “kinds” (actually, classes that have similar members) that are contained within the structure of the individual results that comprise the search results. The number of records (results) returned for each class may also be shown in parentheses.

Single Result (Concept) View

Clicking on an individual instance result in the UMBEL search results view (see above) provides the single result View for that specific UMBEL concept:

This view now provides a detailed description of the UMBEL concept and its structure and relationships. I briefly describe each item denoted by a checkmark.

The concept title and link to the relation browser [] are provided, followed by the actual concept URI identifier. Then the listing shows the alternative labels (synonyms, jargon and acronyms) provided for that concept followed by its (often detailed) description.

The structured information for that concept appears below that material. First shown is the UMBEL SuperType [2] to which the concept belongs, and then its external (non-UMBEL ontology) and internal (UMBEL) super classes and subclasses. There is also the facility to retrieve named individuals (instances) for that concept (see next).

Named Individual Listing

Choosing the ‘Get Entities from Sources’ button may provide example instances for that concept, as is shown below for the ‘Artist’ concept:

This linkage is relatively new for UMBEL (see the version 1.00 release write-up) and is still being expanded. At present, these linkages are limited to only a subset of UMBEL concepts and only linkages to Wikipedia. This aspect of the system is under active development, with more sources and more linked concepts to be released in the future.

This is the second of a multi-part series on the newly updated UMBEL services. Other articles in this series are:

[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.

[2] SuperTypes are a collection of (mostly) similar Reference Concepts. Most of the SuperType classes have been designed to be (mostly) disjoint from the other SuperType classes. SuperTypes thus provide a higher-level of clustering and organization of Reference Concepts for use in user interfaces and for reasoning purposes. Every Reference Concept in UMBEL is assigned to a SuperType; a few are assigned to more than one where disjointedness conditions are not absolute. Each of the 32 UMBEL SuperTypes has a matching predicate for relating to external topics. See further the discusison of SuperTypes in the UMBEL specification.

Posted:November 3, 2011

And, Now, We Pause for a Brief Commercial Message . . .

Mixing Business and Pleasure

I never talk politics here, and rarely speak of sports or family or personal matters. But I’m making an exception today.

Since we lived in Montana a couple of decades back, skiing and the mountains have been a central theme in my family. Both of my kids learned to ski at Lost Trail before they even turned three. Today, both are impressive skiers. (I’m a different story, but that is immaterial. 😉 )

We have skied many places across the Western US, all enjoyable and all remarkable. But, our favorite amongst them has been Winter Park, CO (more specifically, Mary Jane — no Jane, no pain). We have been going there for nearly a decade. The slopes and the beauty are, of course, arguments in themselves. But also what makes Winter Park special is that it offers the best deal on earth for lift tickets (with an annual pass) and has a local clientele that is laid back and into substance and not flash.

As our kids have grown and taken on lives of their own, we have come to treasure those chances when all of us can be together. Sking — but summer activities, too — are great ways to make that happen.

So, it is with immeasurable pleasure that we closed the sale today on a new second home in Winter Park. It is absolutely perfect for all things outdoors. And, since we still have regular lives and work, we will be offering our new place for rental for those many weeks we can not enjoy it ourselves. If mountains and beauty and nature are in your calling, let us know. We have a fantastic place to rent to you in one of the most spectacular places on earth.

Posted:October 24, 2011

UMBEL Services, Part 1: Overview

New Portal Update Leverages the Open Semantic Framework

UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. When we first released UMBEL in mid-2008 it was accompanied by a number of Web services and a SPARQL endpoint, and general APIs. In fact, these were the first Web services developed for release by Structured Dynamics. They were the prototypes for what later became the structWSF Web services framework, which incorporated many lessons learned and better practices.

By the time that the structWSF framework had evolved with many additions to comprise the Open Semantic Framework (OSF), those original UMBEL Web services had become quite dated. Thus, upon the last major update to UMBEL to version 1.0 back in February of this year, we removed these dated services.

Like what I earlier mentioned about the cobbler’s children being the last to get new shoes, it has taken us a bit to upgrade the UMBEL services. However, I am pleased to announce we have now completed the transition of UMBEL’s earlier services to use the OSF framework, and specifically the structWSF platform-independent services. As a result, there are both upgraded existing services and some exciting new ones. We will now be using UMBEL as one of our showcases for these expanding OSF features. We will be elaborating upon these features throughout this series, some parts of which will appear on Fred Giasson’s blog.

In this first part, we provide a broad overview of the new UMBEL OSF implementation. We also begin to foretell some of the parts to come that will describe some of these features in more detail.

The Overall Portal

The new UMBEL portal is a fairly classic example of an OSF installation. The content management system hosting the system is Drupal, supplemented with a standard set of third-party modules and our own conStruct semantic technology modules. The theme is a stripped-down modification of the popular Pixture Reloaded theme:

Like other vocabulary sites, the UMBEL portal contains specifications and links to community resources and downloads. It also has some specialty links not shown on typical standards sites.

Much Better Vocabulary Access and Management

The site now most prominently features our structOntology editing and maintenance tool. Built on the OWL API, the same as Protégé 4, structOntology provides the advantage of enabling edits and management of ontologies directly within the applications in which they are used. This is far superior to needing to fire up an external ontology manager and then to re-import the changed ontology. structOntology also has an arguably simpler interface and operation than other ontology management alternatives:

For the UMBEL site, the standard view of using structOntology is read-only. In a subsequent part we will also discuss structOntology’s full editing and maintenance mode.

Improved Discovery and Navigation

Like all standard OSF installations, there are two superior means for discovery and navigation of the information space: search and the relation browser.

Search uses the integration of RDF and inferencing with full-text, faceted search using Solr. This has been Structured Dynamics’ standard search function for some time, as Fred initially described in April 2009. It is a very effective way for finding new and related concepts within the UMBEL structure.

The relation browser is what is used for casual navigation and discovery. Any concept found via search or other means within the system can have the browser invoked by clicking on its browser icon []. When done, the standard relation browser appears:

The relation browser is highly configurable, as shown by some of our exemplar installations. Note in this case that the More details … link brings you to a detailed concept view, such as this example:

These various tools provide great means for discovery and navigation within the 28,000 concepts in the UMBEL reference space.

Newly Released Web Services and SPARQL Endpoints

We are also now providing updated endpoints for Ontology: Read, Search, Crud: Read, SPARQL and Scones. These will be described with access and query examples in a later part.

Some Cool New Sandboxes

We will also be discussing our OBIE (ontology-based information extraction) and entity tagger, scones, and export and ontology edit and management functions in subsequent posts.

Looking Ahead to Remaining Parts

We anticipate eight or nine more parts in this series explaining most of these options in greater detail. We hope to post a couple per week or so over the coming month. We will conclude with a discussion of next pending UMBEL releases.

This is the first of a multi-part series on the newly updated UMBEL services. Other articles in this series so far are:

[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.

Posted:October 17, 2011

Fred’s Hair is on Fire

Today’s Post is a Testimony to the Value of Vacations

My partner, Fred Giasson, today posted the second part of his series on open source. Since returning from a well-earned vacation a few weeks back — after more than three years without a break — Fred has been writing and developing up a storm. As someone said to me last week, “Fred’s on fire!” I could not agree more.

I think Fred’s post speaks for itself as to why and how Structured Dynamics has made a conscious choice to embrace open source. The major reason he puts forth — to bootstrap the company without the need for external investment — is unusual in itself. But one thing he is silent about is why this is a compelling reason. I’ll comment on that.

Fred and I have both worked for others dependent on their capital for our ventures (a few more times in my case). Capital is great for expansion and operations, but it can be deadly when visions requiring patience are in play. Structured Dynamics is only now a bit more than halfway through its five-year plan. While semantics technologies are exciting with a world of upside potential, they have also been incubated in academic labs with (as yet) a general lack of practical deployment. The promise is there, but often the delivery and maturation have been lacking. We are committed to play a visible role in correcting that.

The approach Fred outlines was not perhaps easily available to new startups a decade ago. But now, with open source and the Internet, costs of entry and ongoing development have dropped markedly. Yet, surprisingly, the idea of financing a startup via revenues is still not talked about sufficiently — let alone often used as an actual basis for building a company.

I’ve been fortunate to be able to partner with a young, world-class technologist whose maturity exceeds that of individuals many years his senior. He understands that in order to achieve important visions that the stewardship of those ideas can not be left to venture capitalists committed solely or mostly to gaming terms or near-term returns. We’re placing our bets on the paying customer and our own judgment.

So, it is great to see Fred continue his phenomenal development productivity since he returned from Hawaii. The benefit of his vacation is that we are also now getting his insights on his blog again.

Posted:October 11, 2011

The Cobbler’s Shoes

The Need to Enforce Periodic Checkups on Web Properties

Face it, we all get busy and begin to overlook our own needs while we work for others on our jobs. The parable of the cobbler’s children going without shoes says it all. It means that the shoemaker spends so much time looking after his customers’ needs that he neglects the needs of his own children.

We see the same phenomena in relation to our own personal assets, home repairs and cleaning, and a myriad of chores and background requirements. One way we can overcome these neglects is by scheduling annual or periodic checkups or activities. Spring cleaning is one such effort, as is annual asset portfolio re-balancing or doctor’s appointments or 10,000 mile vehicle servicing.

One of the cobbler’s chores for Structured Dynamics is the periodic care and feeding of our various Web sites. This has actually proven to be a non-trivial exercise, as our properties have grown to exceed 1400 static Web pages across some 30 diverse Web addresses and properties. As our client and code base expands, this exercise is increasingly demanding.

Taking advantage of a small break in the action, we have just completed another one of these reviews and revisions. Interestingly, as I was going through the various sites, I saw that date stamps for prior revisions tended to all occur in the September and October time frame. Last September, for example, SD went through a major redesign and new logo. Apparently, without consciously realizing it, we have been doing our own Web attic cleaning in the Fall.

Thus, as a way to formalize this process for us internally, I thought I’d briefly outline the Web site changes that we have cobbled together for this year. I suspect we’ll be doing another spiffing come Fall 2012.

Rationalizing the Properties

It is kind of frightening to realize that we have allowed our Web properties to grow to about 30 individual sites. This accretion happens gradually: a new initiative or capability arises that seems to warrant its own Web site. Yet each site carries with it a need to develop and maintain, as well as to explain its role and use in the Structured Dynamics information space.

Exclusive of internal development sites or ones dedicated to specific customers, here is the roster of existing SD properties that we have needed to rationalize:

Structured Dynamics
OpenStructs.org
structWSF – moved to OpenStructs
conStructSCS – moved to OpenStructs
OpenStructs community site
Open Semantic Framework (OSF) Google community site
Earlier conStruct, irON and structWSF Google community sites – moved to OSF Google
community site
structWSF – Google Code SVN
conStruct – Drupal CVS
Semantic Components – Google Code SVN
irON Parsers – Google Code SVN
Citizen Dan sandbox
Vanilla OSF – client access only
UMBEL – undergoing major update
UMBEL wiki – had been private; now being transitioned into main UMBEL site
UMBEL Google community site
UMBEL Web services – old version being replaced with new functionality, being moved to main UMBEL site
UMBEL vocabulary site – replaced with new UMBEL functionality
Bibliographic Ontology (BIBO)
BIBO Google community site
MUNI Ontology
MUNI vocabulary site – to be replaced with new MUNI functionality
MUNI Ontology Google community site – to be rationalized shortly
Music Ontology – hosting only
TechWiki
DocWiki – locus of updates has shifted to TechWiki
Mike Bergman’s AI3 blog
Fred Giasson’s blog

Note that all properties with strike outs have now either been retired or consolidated with other properties. We have reduced the property count by 10, or by a third. Additional consolidations will be forthcoming.

Providing a Consistent Entry to the Various Properties

With the growth of our various Web properties and the diversity of the initiatives behind them, Fred and I have grown increasingly frustrated that our site visitors lacked a consistent way to access and understand these projects. Across all properties, Structured Dynamics has about 6,000 daily visitors or RSS tracking feeds.

Providing a consistent context of what these properties mean and their relation to one another is further compounded by the sheer size of our properties. Excluding dynamically generated pages (such as from search, demonstration of our semantic components, or use of the relation browser), we have on the order of 1400 static Web pages across all properties and blogs. Users may enter our information space via any of these entry points.

The answer to how to provide a consistent context on any Web page throughout our properties resides in the nifty JavaScript popup Fred recently described for his own blog. What we realized is that we could adapt this widget to provide a single overview of SD’s resources, and then add that widget to all of our properties such that it appears as a small tab at the bottom (sometimes side) of all property pages.

Then, when the tab SD Resource tab is clicked, the following popup appears:

So, whenever you are on one of our properties, look for the tab (generally) at the lower right corner of every Web page. That will take you to the common entry point across Structured Dynamics’ Web properties.

Updating the Properties

In this process we also went through some of our existing sites and made content, narrative and navigation changes consistent with this rationalization and consistent entry point. These updates were not nearly as extensive as the full re-designs from one year ago.

New Shoe Designs

With a constant stream of new initiatives and new understandings, it will remain a challenge for us to describe our various products and services. An even greater challenge will be to provide coherent descriptions of how all of these initiatives fit together consistent with our overall vision. One attempt at that is our new Overview page. Meanwhile, of course, we will occasionally be offering new Web goodies and sites as developments warrant. These will need to get integrated into this picture as well.

We think we have taken an itty-bitty step to improving this process with the SD Resources tab widget. Nonetheless, I’m sure that we will continue to craft new shoes to try to find ones that are still yet more comfortable and attractive. Thing is, we may have to wait another year before we get around to it again.

Main Links

Search

Author: Mike Bergman