Posted:November 7, 2011

UMBEL Vocabulary and Reference Concept OntologyOSF Integration with Solr Provides Superior Search

This continues our series on the new UMBEL portal. UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. This part focuses on the search function within the UMBEL portal based on the native engines used by the open semantic framework (OSF).

Search uses the integration of RDF and inferencing with full-text, faceted search using Solr. This has been Structured Dynamics’ standard search function for some time, as Fred Giasson initially described in April 2009. It is a very effective way for finding new and related concepts within the UMBEL structure.

Solr, as the Web service-enabled option for its parent Lucene, has most recently become a not uncommon adjunct to semantic technologies, for the very same reasons as outlined herein. However, in 2008, when we first embraced the option, it was not common at all. To my knowledge, within the semantic technology community, only the SWSE (semantic Web search engine) project was using Lucene when we began to adopt it.

The reasons for embracing Solr (or Lucene) are these:

  • Full-text search with a flexible search syntax
  • Ability to add facets (which is very powerful when combined with the structure of RDF)
  • High performance
  • Extensions for locational and time searches and many additional options, and
  • Open source.

Prior to the adoption of add-ons like Solr, RDF-only search suffered from a number of limitations, especially in the lack of a searchable correspondence of labels in relation to the object URIs used in the RDF model (see some of the limitations of standard RDF search).

Because of its advantages, Solr became the first additional main engine underneath our structWSF Web services framework, complementing the central RDF triple store (Virtuosoin most cases). We have subsequently added other main engines, as well, with a current total of four, which other parts in this UMBEL series will later discuss:

Being a main engine underneath structWSF means that datasets are fully indexed and cross-correlated with the capabilities of the other engines at time of ingest. Ingest most commonly occurs when datasets are ingested by the standard import tool; but, it might also be part of the system’s large dataset import scripts or synchronization routines.

The Search Function and Syntax

The standard UMBEL search box is found at the upper left of most site pages. When searching, you may choose these operators or syntax to add to your keywords, for example:

  • park OR city — provides the most results
  • park AND city — both terms must be present; fewer results
  • park city (no quotes) — both terms must be present, and within 5 words of one another; still fewer results, or
  • “park city” — exact phrase in quotes, with the fewest results.

(At present, Booleean operators apply to full-content search, and not filtered search.)

Upon searching, using the default of searching title, alternative labels (synonyms) and description (“TAD”), the standard search results page is displayed:

This page provides the further filtering options of searching by only title, or all content (including the linkings for each concept to its super classes and sub classes, which may produce a quite inclusive results set). These filter options are helpful in being able to sift through the 28,000 concepts within UMBEL.

The results listing provides the UMBEL concept names, their description, their alternative labels and a link [] to view them in the relation browser (to be discussed in more detail in the next part of this series). A simple pagination utility enables the results to be paged.

structWSF Basis and Options

This UMBEL search uses the structWSF Search Web service. It is what ties into the Solr engine to perform the full text searches on the structured data indexed on a structWSF instance. A search query can be as simple as querying the data store for a single keyword, or to query it using a series of complex filters.

Not all of these query syntax or filtering options are active on the UMBEL instance given the simple concept structure of the UMBEL ontology. Turning these options on or off is a relatively straightforward matter of altering some configuration files and ensuring the right parameters are included in the queries issued by the application to the structWSF search endpoint.

Developers communicate with the Search Web service using the HTTP POST method. You may request one of the following mime types: (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json. The content returned by the Web service is serialized using the mime type requested and the data returned depends on the parameters selected.

A. Optional Available Operators

Optionally, the structWSF Search function may be configured to support these operators and conventions. All operators, by the way, must be entered as ALL CAPS:

  • AND, which is the default operator if more than one key word is entered
  • OR, which needs to be specifically entered
  • NOT
  • Phrases, which are denoted by double quotes as this “search phrase”; single quotes are not accepted
  • Wildcard searches on single characters (?) and multiple characters (*), which can be placed anywhere except the beginning of the query term
  • Field searches, whereby the field name is used in the query followed by a colon
  • Nesting, which allows complicated Boolean expressions to be formed (so long as parentheses are balanced), and many more exotic options.

See further the Lucene search engine syntax specification.

B. Optional Available Filters

Each search query can be applied to all, or a subset of, datasets accessible by the requester. Each Search query can be filtered by these different filtering criteria:

  1. Type of the record(s) being requested
  2. Source dataset(s) for the the record
  3. Presence of an attribute describing the record(s)
  4. A specific value, for a specific attribute describing the record(s)
  5. A distance from a lat/long coordinate (for geo-enabled structWSF instance)
  6. A range of lat/long coordinates (for geo-enabled structWSF instance)

These filtering options allow subset searches to occur, as the example above for title and TAD in UMBEL shows. However, these filters can also be combined into more complete and structured selection options as well. For example, this same search utility applied to Structured Dynamics’ Citizen Dan local government sandbox shows how these additional filters may be applied:

  • Clicking on a given “kind” name causes the results display to be restricted to results only for that kind of class.
  • If so selected, the Filter by Dataset tab is also restricted to the datasets that contain results with that class.
  • Once selected, the filter remains in place. To remove it, click on the Remove filter icon [] to restore the “kinds” back to the original listing for this search.

See the example. Such filtering capabilities present all of the “kinds” (actually, classes that have similar members) that are contained within the structure of the individual results that comprise the search results. The number of records (results) returned for each class may also be shown in parentheses.

Single Result (Concept) View

Clicking on an individual instance result in the UMBEL search results view (see above) provides the single result View for that specific UMBEL concept:

This view now provides a detailed description of the UMBEL concept and its structure and relationships. I briefly describe each item denoted by a checkmark.

The concept title and link to the relation browser [] are provided, followed by the actual concept URI identifier. Then the listing shows the alternative labels (synonyms, jargon and acronyms) provided for that concept followed by its (often detailed) description.

The structured information for that concept appears below that material. First shown is the UMBEL SuperType [2] to which the concept belongs, and then its external (non-UMBEL ontology) and internal (UMBEL) super classes and subclasses. There is also the facility to retrieve named individuals (instances) for that concept (see next).

Named Individual Listing

Choosing the ‘Get Entities from Sources’ button may provide example instances for that concept, as is shown below for the ‘Artist’ concept:

Retrieving Named Individuals

This linkage is relatively new for UMBEL (see the version 1.00 release write-up) and is still being expanded. At present, these linkages are limited to only a subset of UMBEL concepts and only linkages to Wikipedia. This aspect of the system is under active development, with more sources and more linked concepts to be released in the future.

UMBEL small logo

This is the second of a multi-part series on the newly updated UMBEL services. Other articles in this series are:


[1] See further the general Wikipedia description of UMBEL or its specification on the official UMBEL Web site.
[2] SuperTypes are a collection of (mostly) similar Reference Concepts. Most of the SuperType classes have been designed to be (mostly) disjoint from the other SuperType classes. SuperTypes thus provide a higher-level of clustering and organization of Reference Concepts for use in user interfaces and for reasoning purposes. Every Reference Concept in UMBEL is assigned to a SuperType; a few are assigned to more than one where disjointedness conditions are not absolute. Each of the 32 UMBEL SuperTypes has a matching predicate for relating to external topics. See further the discusison of SuperTypes in the UMBEL specification.

Posted by AI3's author, Mike Bergman Posted on November 7, 2011 at 8:28 am in Open Semantic Framework, UMBEL | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/986/umbel-services-part-2-full-text-faceted-search/
The URI to trackback this post is: http://www.mkbergman.com/986/umbel-services-part-2-full-text-faceted-search/trackback/