Posted: January 20, 2014

New OSF Platform Leapfrogs Earlier Releases in Features and Capabilities

After nearly five years of concentrated development — including the past 20 months of quiet, background efforts — Structured Dynamics is proud to announce version 3.0 of its open-source Open Semantic Framework. OSF is a turnkey platform targeted to enterprises to bring interoperability to their information assets, achieved via a layered architecture of semantic technologies. OSF can integrate information from documents to Web pages and standard databases. Its broad functions range from information ingest and tagging to search and data management to publishing.

Until today, the version available for download was OSF version 1.x. While capable as an enterprise platform — indeed, it has been in use by a number of leading global enterprises since development first began — the platform’s coverage was uneven and it required consulting expertise to configure and set up. SD was hired by Healthdirect Australia (HDA) nearly two years ago to enhance OSF’s capabilities and integrate it more closely with the Drupal open-source content management system, among other modern enterprise requirements. The OSF from those developments — the non-public version 2.0 specific to HDA — has now been generalized for broader public use with today’s public announcement of version 3.0.

A More Complete Enterprise Platform

HDA's healthinsite Portal

Not unlike many large organizations, HDA had specific enterprise requirements when it began its recent initiative. Included in these were stringent security, broad use of proven open-source applications, governance and workflow procedures, and strict content authoring and management guidelines. These requirements further needed to express themselves via a sequence of deployment and testing environments, all conducted by a multi-vendor support group following agile development practices.

These requirements placed a premium on performance, scalability and interoperability, all subject to repeatable release procedures and scripts. OSF’s initial development as a more-or-less standalone platform needed to accommodate an enterprise-wide management model involving many players, environments and applications. Prior decisions based on OSF alone now needed to consider and bridge modern enterprise development and deployment practices.

Tighter integration with Drupal was one of these requirements (see next section), but other OSF changes necessary to accommodate this environment included:

  • A new security layer — the initial OSF security model was based on IP authentication. Given the sensitivity of the health data managed by HDA, such a simplistic approach was unacceptable. The actual HDA deployment relied on a third-party security application. However, what was learned from that resulted in a key-based access and validation model in the OSF v 3.0 update (see the sketch following this list)
  • A new revisioning system — content authoring and governance required multiple checks in the workflow, and requirements to review prior edits and invoke possible rollbacks. The result was to add a completely new revisioning capability to OSF
  • Middleware integration and APIs — in a multi-vendor environment, OSF operates in part as a central repository for all system information, which third parties must be able to access readily and easily. Thus, besides the security aspects, a much improved programmatic API and a generalized search API were added to the OSF platform
  • New, additional Web services — the requirements above meant that seven new OSF Web services were added to the system, bringing the total number of current Web services to 27
  • New caching layer — because of its Web-service design, information access and mediation occurs via a large number of endpoint queries, many of which are patterned and repeated. To improve overall performance, a new caching layer was added to OSF that significantly improved performance and reduced access burdens on the OSF engines
  • Workflow integration — improved workflow sequences and screens were required to capture workflow and governance demands, and
  • Multilingual support — like most larger organizations, HDA has a diversity of native languages throughout its user base. Though OSF had initially been explicitly designed to support multilinguality, specific procedures and capabilities were put in place to more easily support multiple languages in OSF.
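To make the key-based access and validation model mentioned above more concrete, here is a minimal sketch of a request-signing routine in Java. It is an illustration only: the actual OSF v3.0 recipe (which request fields are hashed, the header names, and the encodings) is defined by the OSF documentation, and the choices below are assumptions.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;

public class OsfRequestSigner {

    // Hypothetical signing scheme: hash the HTTP method, endpoint URL and a
    // timestamp with a shared secret key. OSF's real recipe may differ.
    public static String sign(String secretKey, String httpMethod,
                              String endpointUrl, String timestamp) throws Exception {
        String payload = httpMethod + "\n" + endpointUrl + "\n" + timestamp;
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
        byte[] digest = mac.doFinal(payload.getBytes("UTF-8"));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        // The server validates by recomputing the same signature with its
        // copy of the key and rejecting any mismatch.
        String signature = sign("my-secret-key", "GET",
                "http://example.org/ws/search/", "2014-01-20T12:00:00Z");
        System.out.println("Authorization: OSF-Key " + signature);
    }
}
```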

Tighter Integration with Drupal

When Fred Giasson and I first designed and architected the Open Semantic Framework in 2009, we made the conscious decision to loosely couple OSF with the initial user interface and content management system, Drupal. We did so thinking that perhaps other CMS frameworks would be cloned onto OSF over time.

Time has not proven this assumption correct. Client experience and HDA’s interests suggested the wisdom of a tighter coupling to Drupal. This shift arose because of the great flexibility of Drupal with its tens of thousands of add-on modules and its ecosystem of capable developers and designers. Our early decision to keep Drupal at arm’s length was making it more difficult to manage an OSF instance. Existing Drupal developers were not able to employ their Drupal expertise to manage OSF portals.

We pivoted on this error by tightening the coupling to Drupal, which involved a number of discrete activities:

  • Upgrade to Drupal 7 — earlier versions of OSF used Drupal 6. We migrated the code base to Drupal 7. That, plus the other Drupal changes noted below, resulted in re-writing about 80% of the OSF code base related to Drupal
  • Alternative Drupal data storage — Drupal’s own evolution in version 7 (and continuing with version 8) is to abstract its underlying information model around entities and fields, abstractions that are much better aligned with OSF’s RDF data model. As these entity and field changes were exposed in Drupal APIs, it was possible for us to write an entirely new information model underlying Drupal. Drupal administrators using OSF are now able to use OSF solely as the data model underneath Drupal (rather than the more standard MySQL) or any mix in between. The typical OSF for Drupal design now uses OSF for all content storage, with MySQL reserved for internal Drupal settings (à la MVC)
  • Drupal connectors — certain common or core Drupal modules, such as Fields, Entities, Search, and Views, are either common utilities for Drupal developers or are themselves core bases for third-party modules. Because of their centrality, SD developed a series of “connectors” that enable these modules to be used as is while transparently communicating with and writing to OSF. Thus, Drupal developers can use these familiar capabilities without needing particular OSF knowledge
  • Major updates to Drupal modules — because of the changes above, the existing OSF Drupal modules (called conStruct in the earlier versions) were updated to take advantage of the common terminology and tighter integration
  • Major updates to Drupal widgets — similarly, the standard OSF data and visualization widgets used with Drupal (called Semantic Components in the earlier versions) were also updated to work in this more tightly integrated environment.

Expanded Search Capabilities and Web Services

Some of the extended capabilities in OSF v 3.0 are noted above, including the expanded roster of Web services. However, the OSF Search Web service, which is by far the most used OSF endpoint, received massive improvements in this latest release.

First, OSF Search now uses a new query parser, which provides the capability to change the ranking of search results by boosting how specific query components get scored. Types, attributes, datasets or counts may be used to vary any given search result, including different occurrences on the same page. It is also now possible to add restrictions to the search queries, including restricting results to a specified set of attributes.
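As an illustration of how such a boosted and restricted query might be issued programmatically, here is a sketch against a local OSF Search endpoint. The endpoint path and the parameter names (q, types_boost, attributes, items) are placeholders for this example; consult the OSF Search endpoint documentation for the actual wire format.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class OsfSearchQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical query: search for "cancer", boost the Event type in
        // scoring, and restrict the returned attributes to two fields.
        String params = "q=" + URLEncoder.encode("cancer", "UTF-8")
                + "&types_boost=" + URLEncoder.encode("bibo:Event^2.0", "UTF-8")
                + "&attributes=" + URLEncoder.encode("dcterms:title;dcterms:description", "UTF-8")
                + "&items=10";
        URL url = new URL("http://localhost/ws/search/?" + params);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw resultset from the endpoint
            }
        }
    }
}
```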

This flexibility is highly useful where certain structured pages contain blocks or sections with patterned search results. This structuring leads to the ability to create generic page templates, wherein search queries and results vary within the layout. An “events” block may score differently than, say, a “related topics” block, all of which in turn can respond to a given context (say, “cancer” versus “automobiles”) for a given page (and its template).

These repeated patterns lend themselves to the use of reusable “search profiles,” which are predefined queries that may include context variables. These profiles, in turn, can be named and placed on page layouts. Existing profiles may be recalled or invoked to become patterns for still further profiles. The flexibility of these search profiles is immense, and the parameters used in constructing them can be quite extensive.

Thus, OSF version 3.0 includes the new Query Builder module. Via an intuitive selection interface, users may construct search queries of any complexity, and then save and reuse them later as search profiles.

Lastly, registering, configuring and managing OSF instances and datasets in Drupal has never been easier. The new OSF Configure module centralizes all the features and options required for these purposes, which are then managed by a new suite of tools (see next).

Automated Installation and Management Tools

Standard enterprise deployments that proceed from development to production require constant updates and versions, both in application code and content. Keeping track of and managing these changes — let alone deploying them quickly and without error — requires separate management capabilities in their own right. The new OSF thus has a number of utilities and command-line tools to aid these requirements:

  • OSF Installer — this tool installs and configures all the pieces required by the OSF stack, then runs the OSF Tests Suites to make sure that all functionality is fully operational on the new server
  • OSF Tests Suites — composed of 746 tests and 4139 assertions, these tests may be run every time an OSF instance is deployed or code is changed. The tests measure all of the input parameters of each endpoint, combinations thereof, mime types, and expected errors returned by each endpoint
  • OSF Ontologies Management Tool — (OMT) is used to manage ontologies: list them, create/import new ones, delete existing ones, or generate underlying ontological structures
  • OSF Datasets Management Tool — (DMT) is used to manage datasets of an OSF instance, enabling the user to create, delete, update, import and export datasets directly from the command line
  • OSF Permissions Management Tool — (PMT) is used to manage, list, create or delete access permissions groups and users
  • OSF Data Validator Tool — (DVT) is used to perform a series of post-indexation data validation tests and return validation errors if any are found.

Tempered via Enterprise Development and Deployments

The methods and processes by which these advances have been made all occurred within the context of state-of-the-art enterprise IT management. Experience with supporting infrastructure tools (such as Jira, Confluence, Puppet, etc.) and agile development methods are part of the ongoing documentation of OSF (see next). This experience also bolsters Structured Dynamics’ ability to work with other third-party applications at the middleware layer or in support of enterprise deployments.

Comprehensive and Completely Updated Documentation

The Open Semantic Framework has evolved considerably since its conception now five years ago. In its early development, components and pieces were sometimes developed in isolation and then brought into the framework. This jagged development path led to a cacophony of names and terms to characterize portions of the OSF stack. This terminology confusion has made it more difficult than it needed to be to understand the vision of OSF, the layers of its architecture, or the interactions between its components and parts.

In making the substantial efforts to update documentation from OSF version 1.x to the current version 3.0, terminology was made consistent and code references were cleaned up to reflect the simpler OSF branding. This clean up has led to necessary updates across multiple Web sites maintained by Structured Dynamics with some relationship to OSF.

The Web site with the most changes required has been the OSF Wiki. In its prior incarnation, called TechWiki, there were nearly 400 technical articles on OSF. That site has now been completely rewritten and re-organized. Nearly two hundred new articles have been written in support of OSF v 3.0. Terminology related to the older cacophony (see correspondence table here) has (hopefully) been updated and corrected. Most architectural and technical diagrams have been updated. Additional documentation is being posted daily, catching up with the experience of the past twenty months.

Moving Beyond the Established Foundation

Open Semantic Framework

SD is pleased that enterprise sponsors want to continue beyond the Open Semantic Framework’s present solid foundations. While we are not at liberty to discuss specific client initiatives, a number of ongoing developments can be described broadly. First, in terms of the key engines that provide the core of OSF’s data management capabilities, initiatives are underway in the areas of visualization, business analytics and workflow orchestration and management. There are also efforts underway in more automated means for direct ingest of quality Web-based information, both based on linked data and from Web APIs. We are also pleased that there is interest in further extending OSF’s tight integration with Drupal, even while the integration efforts of the past months have not yet been fully exploited.

To Learn More

To learn more, make sure to check out the re-organized OSF wiki. See specifically the complete OSF overview, the list of all the OSF 3.0 features, and the list of all the new features to OSF 3.0. Also, for a complete soup-to-nuts view of what it takes to put up a new OSF installation, see the Users Guide. Lastly, for a broad overview of OSF, see its reference architecture and the overviews on its dedicated OSF Web site.

As a final note, Structured Dynamics would like to thank its corporate sponsors of the past five years for providing the development funds for OSF, and for agreeing with the open source purposes of the Open Semantic Framework.

Posted: February 27, 2012

Ontology-driven Application Meshes Structured Data with Public APIs

Locational information — points of interest/POIs, paths/routes/polylines, or polygons/regions — is common to many physical things in our real world. Because of its pervasiveness, it is important to have flexible and powerful display widgets that can respond to geo-locational data. We have been working for some time to extend our family of semantic components [1] within the open semantic framework (OSF) [2] to encompass just such capabilities. Structured Dynamics is thus pleased to announce that we have now added the sWebMap component, which marries the entire suite of Google Map API capabilities to the structured data management arising from the structWSF Web services framework [3] at the core of OSF.

The sWebMap component is fully in keeping with our design premise of ontology-driven applications, or ODapps [4]. The sWebMap component can itself be embedded in flexible layouts — using Drupal in our examples below — and can be very flexibly themed and configured. We believe sWebMap will rapidly move to the head of the class as the newest member of Structured Dynamics’ open source semantic components.

The absolutely cool thing about sWebMap is it just works. All one needs to do is relate it to a geo-enabled Search structWSF endpoint, and then all of the structured data with geo-locational attributes and its facets and structure becomes automagically available to the mapping widget. From there you can flexibly map, display, configure, filter and select results, keep those selections persistent, and share them with others. As new structured data is added to your system, that data too becomes automatically available.
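Conceptually, each map move or zoom simply re-issues a search restricted to the current viewport. The short sketch below shows the idea of such a spatially bounded query; the bbox parameter name and its corner encoding are assumptions for illustration, not the documented structWSF format.

```java
import java.net.URL;
import java.net.URLEncoder;

public class ViewportQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical bounding box for the current map viewport, given as
        // south-west and north-east corners (lat,long pairs).
        String bbox = "53.47,-113.60;53.55,-113.40";
        String params = "q=*"
                + "&bbox=" + URLEncoder.encode(bbox, "UTF-8")
                + "&items=25";
        URL url = new URL("http://example.org/ws/search/?" + params);
        System.out.println(url); // the widget would fetch and render this resultset
    }
}
```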

Key Further Links

Though screen shots in the operation of this component are provided below, here are some further links to learn more:

sWebMap Overview

There is considerable functionality in the sWebMap widget, not all immediately obvious when you first view it.

NOTE: a wide variety of configuration options — icons and colors — matched with the specific data and base tiling maps appropriate to a given installation may produce maps of significantly different aspect from the screenshots presented below. Click on any screenshot to get a full-size view.

Here is how sWebMap first comes up, using the “Beaumont neighborhood” as an example:

It is possible to set pre-selected items for any map display. That was done in this case, which shows the pre-selected items and region highlighted on the map and in the records listing (lower left below map).

The basic layout of the map has its main search options at the top, followed by the map itself and then two panels underneath:

The left-hand panel underneath the map presents the results listing. The right-hand panel presents the various filter options by which these results are generated. The filter options consist of:

  • Sources – the datasets available to the instance
  • Kinds – the kinds or types of data (owl:Classes or rdf:types) contained within those datasets, and
  • Attributes – the specific attributes and their values for those kinds or sources.

As selections are made in sources or kinds, the subsequent choices narrow.

The layout below shows the key controls available on the sWebMap:

You can go directly to an affiliated page by clicking the upper right icon. This area often shows a help button or other guide. The search box below that enables you to search for any available data in the system. If there is information that can be mapped AND which occurs within the viewport of the current map size, those results will appear as one of three geographic feature types on the map:

  • Markers, which can be configured with differing icons for specific types or kinds of data
  • Polylines, such as highways or bus routes, or
  • Polygons, which enclose specific regions on the map through a series of drawn points in a closed area.

At the map’s right is the standard map control that allows you to scroll the map area or zoom. Like regular Google maps, you can zoom (+ or – keys, or middle wheel on mouse) or navigate (arrow direction keys, or left mouse down and move) the map.

Current records are shown below the map. Specific records may be selected with their checkboxes; this keeps them persistent on the map and in the record listing no matter what the active filter conditions may be. (You may also see a little drawing icon [Update record], which presents an attribute report — similar to a Wikipedia ‘infobox’ — for the current record). You can see in this case that the selected record also corresponds to a region (polygon) shape on the map.

sWebMap Views, Layers and Layouts

In the map area itself, it is possible to also get different map views by selecting one of the upper right choices. In this case, we can see a satellite view (or “layer”):

Or, we can choose to see a terrain layer:

Or there may optionally be other layers or views available in this same section.

Another option that appears on the map is the ability to get a street view of the map. That is done by grabbing the person icon at the map left and dragging it to where you are interested within the map viewport. That also causes the street portion to be highlighted, with street view photos displayed (if they exist for that location):

By clicking the person icon again, you then shift into walking view:

Via the mouse, you can now navigate up and down these streets and change perspective to get a visual feel for the area.

Multi-map View

Another option you may invoke is the multi-map view of the sWebMap. In this case, the map viewing area expands to include three sub-maps under the main map area. Each sub-map is color-coded and shown as a rectangle on the main map. (This particular example is displaying assessment parcels for the sample instance.) These rectangles can be moved on the main map, in which case their sub-map displays also move:

Re-sizing is done via the sub-map (which then causes the rectangle size to change on the main map). You may also pan the sub-maps (which then causes the rectangle to move on the main map). The results list at the lower left is determined by which of the three sub-maps is selected (as indicated by the heavier bottom border).

Searching and Filter Selections

There are two ways to get filter selection details for your current map: Show All Records or Search.

NOTE: for all data and attributes as described below, only what is visible on the current map view is shown under counts or records. Counts and records change as you move the map around.

In the first case, we pick the Show All Records option at the bottom of the map view, which then brings up the detailed filter selections in the lower-right panel:

Here are some tips for using the left-hand records listing:

  • If there are more than 10 records, pagination appears at the bottom of the listing
  • Each record is denoted by an icon for the kind of thing it is (bus stops v schools v golf courses, for example)
  • If we mouse over a given record in the listing, its marker icon on the map bounces to show where it resides
  • To the right of each record listing, the checkbox indicates whether you want the record to be maintained persistently. If you check it, the icon on the map changes color, the record is promoted to the top of the list where it becomes sticky and is given an alphabetic sequence. Unchecking this box undoes all of these changes
  • To the right of each record listing is also the view record [View raw attributes for the record] icon; clicking it shows the raw attribute data for that record.

The records that actually appear on this listing are based on the records scope or Search (see below) conditions, as altered by the filter settings on the right-hand listing under the sWebMap. For example, if we remove the neighborhood record as persistent and Show included records, we now get items across the entire map viewport:

Search works in a similar fashion, in that it invokes the filter display with the same left- and right-hand listings appearing under the sWebMap, only now solely for those records that meet the search conditions. (The allowable search syntax is that for Lucene.) Here is the result of a search, in this case for “school”:

As shown above, the right-hand panel is split into three sections: Sources (or datasets), Kinds (that is, similar types of things, such as bus stops v schools v golf courses), and Attributes (that is, characteristics for these various types of things). All selection possibilities are supported by auto-select.

Sources and Kinds are selected via checkbox. (The default state when none are checked is to show all.) As more of these items are selected, the records listing in the left-hand panel gets smaller. Also, the counts of available items [as shown by the (XX) number at the end of each item] are also changed as filters are added or subtracted by adding or removing checkboxes.

Applying filters to Attributes works a little differently. Attribute filters are applied by selecting the magnifier plus [Filter by attribute] icon, which then brings up a filter selection at the top of the listing underneath the Attributes header.

The specific values and their counts (for the current selection population) are then shown; you may pick one or more items. Once done, you may pick another attribute to add to the filter list, and continue the filtering process.

Saving and Sharing Your Filters

sWebMaps have a useful way to save and share their active filter selections. At any point as you work with a sWebMap, you can save all of its current settings and configurations — viewport area, filter selections, and persistent records — via some simple steps.

You initiate this functionality by choosing the save button at the upper right of the map panel:

When that option is invoked, it brings up a dialog where you are able to name the current session, and provide whatever explanatory notes you think might be helpful.

NOTE: the naming and access to these saved sessions is local to your own use only, unless you choose to share the session with others; see below.

Once you have a saved session, you will then see a new control at the upper right of your map panel. This control is how you load any of your previously saved sessions:

Further, once you load a session, still further options are presented that enable you to either delete or share that session:

If you choose to share a session, a shortened URI is generated automatically for you:

If you then provide that URI link to another user, that user can then click on that link and see the map in the exact same state — viewport area, filter selections, and persistent records — as you initially saved. If the recipient then saves this session, it will now also be available persistently for his or her local use and changes.

NOTE: two users may interactively work together by sharing, saving and then modifying maps that they share again with their collaborator.

[1] A semantic component is a JavaScript or Flex component or widget that takes record descriptions and irXML schema as input, and then outputs interactive visualizations of those records. Depending on the logic described in the input schema and the input record descriptions, the semantic component may behave differently or provide presentation options to users. Each semantic component delivers a very focused set of functionality or visualization. Multiple components may be combined on the same canvas for more complicated displays and controls. At present, there are 12 individual semantic widgets in the available open source suite; see further the sComponent category on the TechWiki. By convention, all of the individual widgets in the semantic component suite are named with an ‘s’ prefix; hence, sWebMap.
[2] The open semantic framework, or OSF, is a combination of a layered architecture and an open-source, modular software stack. The stack combines many leading third-party software packages — such as Drupal for content management, Virtuoso for (RDF) triple storage, Solr for full-text indexing, GATE for tagging and natural language processing, the OWL2 API for ontology management and support, and others. These third-party tools are extended with open source developments from Structured Dynamics, including structWSF (a RESTful Web services layer of about a dozen modules for interacting with the underlying data and data engines), conStruct (a series of Drupal modules that tie Drupal to the structWSF Web services layer), semantic components (data display and manipulation widgets, mostly based either in Flash or JavaScript, for working with the semantic data), various parsers and standard data exchange formats and schema to facilitate information flow amongst these options, and an ontologies layer that consists of both domain ontologies that capture the coherent concepts and relationships of the current problem space and administrative ontologies that govern how the other software layers interact with this structure.
[3] structWSF is a platform-independent Web services framework for accessing and exposing structured RDF (Resource Description Framework) data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies). The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework has a baseline set of more than 20 Web services in CRUD, browse, search, tagging, ontology management, and export and import.
[4] For the most comprehensive discussion of ODapps, see M. K. Bergman, 2011. “Ontology-Driven Apps Using Generic Applications,” posted on the AI3:::Adaptive Information blog, March 7, 2011. You may also search on that blog for ‘ODapps’ to see related content.
Posted: December 12, 2011

Number of Semantic Web Tools Passes 1000 for First Time; Many Other Changes

We have been maintaining Sweet Tools, AI3‘s listing of semantic Web and -related tools, for a bit over five years now. Though we had switched to a structWSF-based framework that allows us to update it on a more regular, incremental schedule [1], like all databases, the listing needs to be reviewed and cleaned up on a periodic basis. We have just completed the most recent cleaning and update. We are also now committing to do so on an annual basis.

Thus, this is the inaugural ‘State of Tooling for Semantic Technologies‘ report, and, boy, is it a humdinger. There have been more changes — and more important changes — in this past year than in all four previous years combined. I think it fair to say that semantic technology tooling is now reaching a mature state, the trends of which likely point to future changes as well.

In this past year more tools have been added, more tools have been dropped (or abandoned), and more tools have taken on a professional, sophisticated nature. Further, for the first time, the number of semantic technology and -related tools has passed 1000. This is remarkable, given that more tools have been abandoned or retired than ever before.

Click here to browse the Sweet Tools listing. There is also a simple listing of URL links and categories only.

We first present our key findings and then overall statistics. We conclude with a discussion of observed trends and implications for the near term.

Key Findings

Some of the key findings from the 2011 State of Tooling for Semantic Technologies are:

  • As of the date of this article, there are 1010 tools in the Sweet Tools listing, the first time it has passed 1000 total tools
  • A total of 158 new tools have been added to the listing in the last six months, an increase of 17%
  • 75 tools have been abandoned or retired, the most removed in any period over the past five years
  • A further 6%, or 55 tools, have been updated since the last listing
  • Though open source has always been an important component of the listing, it now constitutes more than 80% of all listings; with dual licenses, open source availability is about 83%. Online systems contribute another 9%
  • Key application areas for growth have been in SPARQL, ontology-related areas and linked data
  • Java continues to dominate as the most important language.

Many of these points are elaborated below.

The Statistical Picture

The updated Sweet Tools listing now includes nearly 50 different tools categories. The most prevalent categories, each with over 6% of the total, are information extraction, general RDF tools, ontology tools, browser tools (RDF, OWL), and parsers or converters. The relative share by category is shown in this diagram (click to expand):

Since the last listing, the fastest growing categories have been SPARQL, linked data, knowledge bases and all things related to ontologies. The relative changes by tools category are shown in this figure:

Though it is true that some of this growth is the result of discovery, based on our own tool needs and investigations, we have also been monitoring this space for some time and serendipity is not a compelling explanation alone. Rather, I think that we are seeing both an increase in practical tools (such as for querying), plus the trends of linked data growth matched with greater sophistication in areas such as ontologies and the OWL language.

The languages these tools are written in have also been pretty constant over the past couple of years, with Java remaining dominant. Java has represented half of all tools in this space, which continues with the most recent tools as well (see below). More than a dozen programming or scripting languages have at least some share of the semantic tooling space (click to expand):

Sweet Tools Languages

With only about 160 new tools it is hard to draw firm trends, but it does appear that some languages (Haskell, XSLT) have fallen out of favor, while popularity has grown for Flash/Flex (from a small base), Python and Prolog (with the growth of logic tools):

PHP will likely continue to see some emphasis because of relations to many content management systems (WordPress, Drupal, etc.), though both Python and Ruby seem to be taking some market share in that area.

New Tools

The newest tools added to the listing show somewhat similar trends. Again, Java is the dominant language, but with much increased use of JavaScript, Python and Prolog:

Sweet Tools Languages

The higher incidence of Prolog is likely due to the parallel increase in reasoners and inference engines associated with ontology (OWL) tools.

The increase in comprehensive tool suites and use of Eclipse as a development environment would appear to secure Java’s dominance for some time to come.

Trends and Observations

These dry statistics tend to mask the feel one gets when looking at most of the individual tools across the board. Older academic and government-funded project tools are finally getting cleaned out and abandoned. Those tools that remain have tended to get some version upgrades and improved Web sites to accompany them.

The general feel one gets with regard to semantic technology tooling at the close of 2011 has these noticeable trends:

  • A three-tiered environment – the tools seem to segregate into: 1) a bottom tier of tools (largely) developed by individuals or small groups, now most often found on Google Code or Github; 2) a middle-tier of (largely) government-funded projects, sometimes with multiple developers, often older, but with no apparent driving force for ongoing improvements or commercialization; and 3) a top-tier of more professional and (often) commercially-oriented tools. The latter category is the most noticeable with respect to growth and impact
  • Professionalism – the tools in the apparent top tiers seem to have more professionalism and better (and more attractive) packaging. This professionalism is especially true for the frameworks and composite applications. But, it also applies to many of the EU-funded projects from Europe, which has always been a huge source of new tool developments
  • More complete toolsets – similarly, the upper levels of tools are oriented to pragmatic problems and problem-solving, which often means they embody multiple functions and more complete tooling environments. This category actually appears to be the most visible one exhibiting growth
  • Changing nature of academic releases – yet, even the academic releases seem to be increasing in professionalism and completeness. Though in the lowest tier it is still possible to see cursory or experimental tool releases, newer academic releases (often) seem to be more strategically oriented and parts of broader programmatic emphases. Programs like AKSW from the University of Leipzig or the Freie Universität Berlin or Finland’s Semantic Computing Research Group (SeCo), among many others, tend to be exemplars of this trend
  • Rise of commercial interests and enterprise adoption – the growing maturity of semantic technologies is also drawing commercial interest, and the incubation of new start-ups by academic and research institutions acts to reinforce the above trends. Promising projects and tools are now much more likely to be spun off as potential ventures, with accompanying better packaging, documentation and business models
  • Multiple languages and applications – with this growing complexity and sophistication has also come more complicated apps, combining multiple languages and functions. In fact, for some time the Sweet Tools listing has been justifiably criticized by some as overly “simplifying” the space by classifying tools under (largely) single applications or single languages. By the 2012 survey, it will likely be necessary to better classify the tools using multiple assignments
  • Google Code over SourceForge for open source (and an increase in Github, as well) – virtually all projects on SourceForge now feel abandoned or less active. The largest source of open source projects in the semantic technology space is now clearly Google Code. Though of a smaller footprint today, many of the newer open source projects are also gravitating to Github. Open source hosting environments are clearly in flux.

I have said this before, and been wrong about it before, but it is hard to see the tooling growth curve continue at its current slope into the future. I think we will see many individual tools spring up on the open source hosting sites like Google and Github, perhaps at relatively the same steady release rate. But, I think older projects will increasingly be abandoned and will not tend to remain available for as long a time. While a relatively few established open source standards, like Solr and Jena, will be the exception, I think we will see shorter shelf lives for most open source tools moving forward. This will lead to a younger tools base than was the case five or more years ago.

I also think we will continue to see the dominance of open source. Proprietary software has increasingly been challenged in the enterprise space. And, especially in semantic technologies, we tend to see many open source tools that are as capable as proprietary ones, and generally more dynamic as well. The emphasis on open data in this environment also tends to favor open source.

Yet, despite the professionalism, sophistication and complexity trends, I do not yet see massive consolidation in the semantic technology space. While we are seeing a rapid maturation of tooling, I don’t think we have yet seen a similar maturation in revenue and business models. While notable semantic technology start-ups like Powerset and Siri have been acquired and are clear successes, these wins still remain much in the minority.


[1] Please use the comments section of this post for suggesting new or overlooked tools. We will incrementally add them to the Sweet Tools listing. Also, please see the About tab of the Sweet Tools results listing for prior releases and statistics.

Posted: September 26, 2011

Documenting the Emerging Ecosystem Around OWL 2

We have been touting the importance of OWL 2 as the language of choice for federating and reasoning over RDF and ontologies. An absolutely essential enabler of the OWL 2 language is version 3 of the OWL API (actually, version 3.2.4 at the time of this writing), a Java-based framework for accessing and managing the language. Protégé 4, the most popular open source ontology editor and integrated development environment (IDE), for example, is built around the OWL API.
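For readers new to the OWL API, the following minimal Java snippet shows its basic entry points for loading and inspecting an ontology (the ontology IRI is a placeholder); nearly all of the tools listed below build on these same interfaces.

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

public class OwlApiBasics {
    public static void main(String[] args) throws Exception {
        // The manager is the central factory and registry for ontologies.
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // Load from any IRI (file or Web); the serialization is auto-detected.
        OWLOntology ontology = manager.loadOntologyFromOntologyDocument(
                IRI.create("http://example.org/ontologies/my-ontology.owl"));
        System.out.println("Loaded: " + ontology.getOntologyID());
        System.out.println("Axioms: " + ontology.getAxiomCount());
        System.out.println("Classes: " + ontology.getClassesInSignature().size());
    }
}
```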

As we laid out a bit more than a year ago, now codified on our TechWiki as the Normative Landscape of Ontology Tools (especially the second figure), we see the OWL API as the essential pivot point for all forms of ontology tools moving forward.

We have attempted to assemble a definitive and comprehensive list of all known tools presently based around version 3 of the OWL API. (We have surely missed some and welcome comments to this post that identify missing ones; we promise to add them and keep tracking them.) Herein is a listing of the 30 or so known OWL API-based tools:

  • Protégé 4 is a free, open source ontology editor and knowledge-base framework based on OWL 2 and centered on the OWL API
  • CEL, FaCT++, HermiT, Pellet, and Racer Pro reasoners provide OWL API wrappers and are also available as reasoner plugins to Protégé 4
  • There is also a FaCT++ port to Java that also implements the OWLReasoner interface and is available as a plugin for Protégé 4.1; it is at version 0.9 with user feedback welcomed
  • structOntology is an open source ontology editor and manager supporting Structured Dynamics’ conStruct implementation of the Open Semantic Framework (OSF) in Drupal; more information is provided here
  • TrOWL is a Tractable reasoning infrastructure for OWL 2. TrOWL supports both standard TBox and ABox reasoning, as well as conjunctive query answering
  • SKOSEd is a SKOS editor for Protégé; just recently made compatible with Protégé 4.1
  • Populus is a semantic spreadsheet framework using RightField and OPPL for creating OWL ontologies
  • Bubastis is a tool for detecting asserted logical differences between two ontologies, such as between versions. A stand alone version of the tool is also available for download from the EFO tools page. Bubastis is powered by the OWL API
  • Tab2OWL is a downloadable Java tool for importing classes into an already existing OWL file. The script uses the OWL API to read in a tab-delimited file of class details and create OWL classes from these rows, adding them to an existing ontology
  • S-Match is a semantic matching framework, which provides several semantic matching algorithms and facilities for developing new ones. Currently S-Match contains implementations of the original S-Match semantic matching algorithm, as well as the minimal semantic matching and structure-preserving semantic matching algorithms
  • The Alignment API is an API and implementation for expressing and sharing ontology alignments. It uses an RDF format for expressing alignments in a uniform way. Its four main interfaces (Alignment, Cell, Relation and Evaluator) provide these services: storing, finding, and sharing alignments; piping alignment algorithms (improving an existing alignment); manipulating (thresholding and hardening); generating processing output; and comparing alignments
  • The OWLlink API is a Java interface and implementation of the OWLlink protocol on top of the Java-based OWL API. The OWLlink API enables OWL API-based applications to access remote reasoners (so-called OWLlink servers), and it turns any OWL API aware reasoner into an OWLlink server
  • OPPL2 (ontology pre-processing language) is an abstract formalism that allows for manipulating ontologies written in OWL. It is 100% based on the Manchester OWL Syntax; it is both a query language based on OWL (logical) axioms and variables and a scripting language that allows the addition/removal of OWL (logical) axioms. It is available as a Protégé 4.1 plug-in
  • OPPL Patterns is also available as a Protégé 4.1 plug-in
  • Posh (Prolog OWL Shell) is a command line utility that wraps the Thea OWL library to allow for advanced querying and processing of ontologies, combining the power of Prolog and OWL reasoning
  • POPL (Prolog Ontology Processing Language) allows you to write expressive ontology rewrite rules in a high-level declarative fashion using a syntax similar to Manchester syntax
  • OWLTools (aka OWL2LS – OWL2 Life Sciences) is a convenience Java API on top of the OWL API. Code is available here
  • LexOWL is a plug-in for Protégé 4. In order to add more powerful functionality (e.g., inferencing, editing) to the existing infrastructure and align LexGrid more closely with various Semantic Web technologies, the LexOWL plugin for Protégé 4 provides a way for representing the ontologies modeled within the LexGrid environment in OWL. A source for downloading this tool has not been found
  • Apero is a Protégé plug-in that provides an ontology debugging tool based on the use of anti-patterns; see http://www.emcl-study.eu/fileadmin/master_theses/thesis_tahwil.pdf
  • DReW is a prototype DL reasoner over LDL+ ontologies and a prototype reasoner for dl-programs over LDL+ ontologies under well-founded semantics. It is not well developed or documented; it can be downloaded here
  • The LingInfo, LexOnto, LexInfo and LMF ontologies are available from the project website, as well as a corresponding Java API with an implementation for the commonly used OWL API
  • Thea2 is a Prolog library that provides complete support for querying and processing OWL 2 ontologies directly from within Prolog programs. Thea2 also offers additional capabilities including a bridge to the Java OWL API and translation of ontologies to Description Logic programs
  • GLOW is a visualization for OWL ontologies, based on Hierarchical Edge Bundles. Hierarchical Edge Bundles is a new visually attractive technique for displaying adjacency relations in hierarchical data, such as concept structures formed by ‘subclass-of’ and ‘type-of’ relations. The displayed adjacency relations can be selected from an ontology using a set of common configurations, allowing for intuitive discovery of information. It is a visualization library based on OWL API, as well as a plug-in for Protégé
  • ROWLKit is a simple GUI to reason and query over ontologies written in the OWL 2 QL profile of OWL
  • OBDA Plugin (Ontology-based data access) is an add-on for the Protégé ontology editor aimed at transforming Protégé into a fully fledged OBDA model editor. It provides data source and mapping editors, as well as querying facilities that, in conjunction with an OBDA-enabled reasoner, allow you to design and test every aspect of an OBDA system
  • OntoCAT provides high level abstraction for interacting with ontology resources including local ontology files in standard OWL and OBO formats (via OWL API)
  • SemaRule Navigator is an Eclipse-based toolkit of multiple semWeb tools, built around the OWL API, organized into a pipeline-like system (appears quite complicated)
  • OWLDB (alias Mnemosyne) is a storage system based on object-relational mappings utilising the OWL-API for the W3C Web Ontology Language OWL
  • Finally, for a periodically updated list of “official” extensions, see https://owlapi.svn.sourceforge.net/svnroot/owlapi/v3/branches/owlextensions/.

Addendum

Ignazio Palmisano also graciously suggested these additional sources:

which also further leads to this additional listing:

It is not clear if all of these offer OWL 2 support, let alone work with the current OWL API.

Posted: August 8, 2011

Gephi Network Visualization + Analysis Pushes Aside Cytoscape

Though I never intended it, some posts of mine from a few years back dealing with 26 tools for large-scale graph visualization have been some of the most popular on this site. Indeed, my recommendation for Cytoscape for viewing large-scale graphs ranks within the top 5 posts all time on this site.

When that analysis was done in January 2008 my company was in the midst of needing to process the large UMBEL vocabulary, which now consists of 28,000 concepts. Like anything else, need drives research and demand, and after reviewing many graphing programs, we chose Cytoscape, then provided some ongoing guidelines in its use for semantic Web purposes. We have continued to use it productively in the intervening years.

Like for any tool, one reviews and picks the best at the time of need. Most recently, however, with growing customer usage of large ontologies and the development of our own structOntology editing and managing framework, we have begun to butt up against the limitations of large-scale graph and network analysis. With this post, we announce our new favorite tool for semantic Web network and graph analysis — Gephi — and explain its use and showcase a current example.

The Cytoscape Baseline and Limitations

Three and one-half years ago when I first wrote about Cytoscape, it was at version 2.5. Today, it is at version 2.8, and many aspects have seen improvement (including its Web site). However, in other respects, development has slowed. For example, version 3.x was first discussed more than three years ago; it is still not available today.

Though the system is open source, Cytoscape has also largely been developed with external grant funds. Like other similarly funded projects, once and when grant funds slow, development slows as well. While there has clearly been an active community behind Cytoscape, it is beginning to feel tired and a bit long in the tooth. From a semantic Web standpoint, some of the limitations of the current Cytoscape include:

  • Difficult conversion of existing ontologies — Cytoscape requires creating a CSV input; there was an earlier RDFscape plug-in that held great promise to bridge the software into the RDF and semantic Web sphere, but it has not remained active
  • Network analysis — one of the early and valuable generalized network analysis plug-ins was NetworkAnalyzer; however, that component has not seen active development in three years, and dynamic new generalized modules suitable for social network analysis (SNA) and small-world networks have not been apparent
  • Slow performance and too-frequent crashes — Cytoscape has always had a quirky interface and frequent crashes; later versions are a bit more stable, but usability remains a challenge
  • Largely supported by the biomedical community — from the beginning, Cytoscape was a project of the biomedical community. Most plug-ins still pertain to that space. Because of support for OBO (Open Biomedical and Biological Ontologies) formats and a lack of uptake by the broader semantic Web community, RDF- and OWL-based development has been keenly lacking
  • Aside from PDFs, poor ability to output large graphs in a viewable manner
  • Limited layout support — and poor performance for many of those included with the standard package.

Undoubtedly, were we doing semantic technologies in the biomedical space, we might well develop our own plug-ins and contribute to the Cytoscape project to help overcome some of these limitations. But, because I am a tools geek (see my Sweet Tools listing with nearly 1000 semantic Web and -related tools), I decided to check out the current state of large-scale visualization tools and see if any had made progress on some of our outstanding objectives.

Choosing Gephi and Using It

There are three classes of graph tools in the semantic technology space:

  1. Ontology navigation and discovery, to which the Relation Browser and RelFinder are notable examples
  2. Ontology structure visualization (and sometimes editing), such as the GraphViz (OWLViz) or OntoGraf tools used in Protégé (or the nice FlexViz, again used by the OBO community), and
  3. Large-scale graph visualization in order to gain a complete picture and macro relationships in the ontology.

One could argue that the first two categories have received the most current development attention. But, I would also argue that the third class is one of the most critical: to understand where one is in a large knowledge space, much better larger-scale visualization and navigation tools are needed. Unfortunately, this third category is also the one that appears to be receiving the least development attention. (To be sure, large-scale graphs pose computational and performance challenges.)

In the nearly four years since my last major survey of 26 tools in this category, the new entrants appear quite limited. I’ve surely overlooked some, but the most notable are Gruff, NAViGaTOR, NetworkX and Gephi [1]. Gruff actually appears to belong most in Category #2; I could find no examples of graphs on the scale of thousands of nodes. NAViGaTOR is biomedical only. NetworkX has no direct semantic graph importing and — while apparently some RDF libraries can be used for manipulating imports — alternative workflows were too complex for me to tackle for initial evaluation. This leaves Gephi as the only potential new candidate.

From a clean Web site to well-designed intro tutorials, first impressions of Gephi are strongly positive. The real proof, of course, was getting it to perform against my real use case tests. For that, I used a “big” ontology for a current client that captures about 3000 different concepts and their relationships and more than 100 properties. What I recount here — from first installing the program and plug-ins and then setting up, analyzing, defining display parameters, and then publishing the results — took me less than a day from a totally cold start. The Gephi program and environment is surprisingly easy to learn, aided by some great tutorials and online info (see concluding section).

The critical enabler for being able to use Gephi for this source and for my purposes is the SemanticWebImport plug-in, recently developed by Fabien Gandon and his team at Inria as part of the Edelweiss project [2]. Once the plug-in is installed, you need only open up the SemanticWebImport tab, give it the URL of your source ontology, and pick the Start button (middle panel):

SemWeb Plug-in for Gephi

Note the SemanticWebImport tool also has the ability (middle panel) to issue queries to a SPARQL endpoint, the results of which return a results graph (partial) from the source ontology. (This feature is not further discussed herein.) This ontology load and display capability worked without error for the five or six OWL 2 ontologies I initially tested against the system.

Once loaded, an ontology (graph) can be manipulated with a conventional IDE-like interface of tabs and views. In the right-hand panels above we are selecting various network analysis routines to run, in this case Average Degrees. Once one or more of these analysis options is run, we can use the results to then cluster or visualize the graph; the upper left panel shows highlighting the Modularity Class, which is how I did the community (clustering) analysis of our big test ontology. (When run, you can also assign different colors to the cluster families.) I also did some filtering of extraneous nodes and properties at this stage and also instructed the system via the ranking analysis to show nodes with more link connections as larger than those nodes with fewer links.

At this juncture, you can also set the scale for varying such display options as linear or some power function. You can also select different graph layout options (lower left panel). There are many layout plug-in options for Gephi. The layout plugin called OpenOrd, for instance, is reported to be able to scale to millions of nodes.

At this point I played extensively with the combination of filters, analysis, clusters, partitions and rankings (as may be separately applied to nodes and edges) to: 1) begin to understand the gross structure and characteristics of the big graph; and 2) refine the ultimate look I wanted my published graph to have.

In our example, I ultimately chose the standard Yifan Hu layout in order to get the communities (clusters) to aggregate close to one another on the graph. I then applied the Parallel Force Atlas layout to organize the nodes and make the spacings more uniform. The parallel aspect of this force-based layout allows these intense calculations to run faster. The result of these two layouts in sequence is then what was used for the results displays.
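For those who prefer to script such a layout sequence headlessly, the Gephi Toolkit exposes the same machinery in Java. The sketch below follows the pattern of the Toolkit's own demos; the file name is a placeholder, and exact class names and signatures should be verified against the Toolkit version in use.

```java
import java.io.File;
import org.gephi.graph.api.GraphController;
import org.gephi.graph.api.GraphModel;
import org.gephi.io.importer.api.Container;
import org.gephi.io.importer.api.ImportController;
import org.gephi.io.processor.plugin.DefaultProcessor;
import org.gephi.layout.plugin.force.StepDisplacement;
import org.gephi.layout.plugin.force.yifanHu.YifanHuLayout;
import org.gephi.project.api.ProjectController;
import org.gephi.project.api.Workspace;
import org.openide.util.Lookup;

public class HeadlessLayout {
    public static void main(String[] args) throws Exception {
        // Boot a project and workspace (required before any import).
        ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);
        pc.newProject();
        Workspace workspace = pc.getCurrentWorkspace();

        // Import a graph file (e.g., a GEXF export of the ontology).
        ImportController importer = Lookup.getDefault().lookup(ImportController.class);
        Container container = importer.importFile(new File("ontology.gexf"));
        importer.process(container, new DefaultProcessor(), workspace);

        // Run the Yifan Hu force-directed layout for a fixed number of passes.
        GraphModel model = Lookup.getDefault().lookup(GraphController.class).getModel();
        YifanHuLayout layout = new YifanHuLayout(null, new StepDisplacement(1f));
        layout.setGraphModel(model);
        layout.resetPropertiesValues();
        layout.initAlgo();
        for (int i = 0; i < 100 && layout.canAlgo(); i++) {
            layout.goAlgo();
        }
        layout.endAlgo();
    }
}
```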

Upon completion of this analysis, I was ready to publish the graph. One of the best aspects of Gephi is its flexibility and control over outputs. Via the main Preview tab, I was able to do my final configurations for the published graph:

Publication Options for Gephi

The graph results from the earlier-worked out filters and clusters and colors are shown in the right-hand Preview pane. On the left-hand side, many aspects of the final display are set, such as labels on or off, font sizes, colors, etc. It is worth looking at the figure above in full size to see some of the options available.

Standard output options include either SVG (vector image) or PDFs, as shown at the lower left, with output size scaling via slider bar. Also, it is possible to do standard saves under a variety of file formats or to do targeted exports.

One really excellent publication option is to create a dynamically zoomable display using the Seadragon technology via a separate Seadragon Web Export plug-in. (However, because of cross-site scripting limitations due to security concerns, I only use that option for specific sites. See next section for the Zoom It option — based on Seadragon — to workaround that limitation.)

Outputs Speak for Themselves

I am very pleased with the advances in display and analysis provided by Gephi. Using the Zoom It alternative [3] to embedded Seadragon, we can see our big ontology example with:

  • All 3000 nodes labeled, with connections shown (though you must zoom to see) and
  • When zooming (use scroll wheel or + icon) or panning (via mouse down moves), wait a couple of seconds to get the clearest image refresh:

Note: at standard resolution, if this graph were to be rendered in actual size, it would be larger than 7 feet by 7 feet square at full zoom!!!

To compare output options, you may also:

Still, Some Improvements Would be Welcomed

It is notable that Gephi still only versions itself as an “alpha”. There is already a robust user community with promise for much more technology to come.

As an alpha, Gephi is remarkably stable and well-developed. Though clearly useful as is, I measure the state of Gephi against my complete list of desired functionality, with these items still missing:

  • Real-time and interactive navigation — the ability to move through the graph interactively and to issue queries and discover relationships
  • Huge node numbers — perhaps the OpenOrd plug-in somewhat addresses this need. We will be testing Gephi against UMBEL, which is an order of magnitude larger than our test big ontology
  • More node and edge control — Cytoscape still retains the advantage in the degree to which nodes and edges can be graphically styled
  • Full round-tripping — being able to use Gephi in an edit mode would be fantastic; the edit functionality is fairly straightforward, but the ability to round-trip in appropriate formats (OWL, RDF or otherwise) may be the greater sticking point.

Ultimately, of course, as I explained in an earlier presentation on a Normative Landscape for Ontology Tools, we would like to see a full-blown graphical program tie in directly with the OWL API. Some initial attempts toward that have been made with the non-Gephi GLOW visualization approach, but it is still in very early phases with ongoing commitments unknown. Optimally, it would be great to see a Gephi plug-in that ties directly to the OWL API.

In any event, while perhaps Cytoscape development has stalled a bit for semantic technology purposes, Gephi and its SemanticWebImport plug-in have come roaring into the lead. This is a fine toolset that promises usefulness for many years to come.

Some Further Gephi Links

To learn more about Gephi, also see the:

Also, for future developments across the graph visualization spectrum, check out the Wikipedia general visualization tools listing on a periodic basis.


[1] The R open source math and statistics package is very rich, with apparently some graph visualization capabilities, such as the dedicated network analysis and visualization project statnet. rrdf may also provide an interesting path for RDF imports. R and its family of tools may indeed be quite promising, but the commitment necessary to R appears quite daunting. Longer-term, R may represent a more powerful upgrade path for our general toolsets. Neo4j is also a rising star in graph databases, with its own visualization components. However, since we did not want to convert our underlying data stores, we also did not test this option.
[2] Erwan Demairy is the lead developer and committer for SemanticWebImport. The first version was released in mid-April 2011.
[3] For presentations like this blog post, the Seadragon JavaScript enforces some security restrictions against cross-site scripting. To overcome that, the option I followed was to:
  • Use Gephi’s SVG export option
  • Open the SVG in Inkscape
  • Expand the size of the diagram as needed (with locked dimensions to prevent distortion)
  • Save As a PNG
  • Go to Zoom It and submit the image file
  • Choose the embed function, and
  • Embed the link provided, which is what is shown above.
(Though Zoom.it also accepts SVG files directly, I found performance to be spotty, with many graphical elements dropped in the final rendering.)
