Posted: June 30, 2014

Open Semantic Framework: Structured Dynamics Moves to Integrate Key Initiatives

Structured Dynamics is pleased to announce its new UMBEL Web site and set of Web services.

Our first release of the UMBEL site occurred in 2007 while UMBEL was still under development. That site used its own homegrown HTML. The release was followed in 2008 by the addition of our own Web services. The Web services were well-received, which caused Structured Dynamics to develop the more general structWSF Web services framework (most recently updated as the OSF Web Services). We subsequently migrated the earlier UMBEL Web services to this more general framework, and also migrated to Drupal as the standard content management and Web site component for OSF.

For most purposes, including all client work to date, our OSF framework (Web services + Drupal 7) has been performant and has met client site needs. However, the operation of the UMBEL Web services was often problematic after moving to the Drupal (full OSF) version. Unfortunately, we have seen both performance and stability problems, though calculations over a full 28,000-node graph are a challenge in any environment.

Since the UMBEL structure was an order of magnitude larger than our client work to date, we frankly adopted a posture of occasional monitoring and reboots to keep the UMBEL Web site up. This posture did not limit the use of UMBEL for general browsing, but it did limit its usefulness as a working API.

Because the cobbler’s son is often the last to get shoes, we have let the UMBEL Web site chill to a degree in the background. But, now, with other imperatives underway and some dedicated time to look directly at performance of larger-scale ontologies, we have looked at these items anew. The report card on our current evaluations is contained in a newly released UMBEL Web site with services, which I summarize and provide context for below. What emerges is an interesting story of discovery and growth.

Basis of the New Site

The new UMBEL site and its underlying 28,000-concept graph are consistent with the OSF layered architecture. However, the Web services are now written in Clojure and the Web site framework uses Bootstrap and plain ol’ HTML. These structural and foundational changes have been championed by Fred Giasson, SD’s chief technology officer, who is also writing a blog series on Clojure. He also has a current post covering the technical basis of these UMBEL site and service changes.

In essence, we have learned two important things about our prior practice with respect to making UMBEL Web services broadly available. First, for UMBEL, we do not need or want our standard configuration of having a Drupal front-end as the interface into OSF. Access to a knowledge graph does not need — and is ill-served by — a complicated interface standing atop a large-scale concept model. APIs and Web services are the most important interaction points with the UMBEL knowledge graph, not a user-oriented Web site.
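To make that API orientation concrete, here is a minimal sketch of how a client might call a concept-lookup Web service over HTTP. The endpoint path and parameters are hypothetical placeholders for illustration, not UMBEL's documented interface.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical concept-lookup endpoint; the actual UMBEL Web services
# define their own documented paths and parameters.
BASE_URL = "http://umbel.org/ws/concept"

def get_concept(label):
    """Fetch a concept description from a (hypothetical) UMBEL endpoint."""
    req = Request(f"{BASE_URL}?label={quote(label)}",
                  headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# e.g., concept = get_concept("Mammal")
```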

Second, in the various phases of our work, we had come to embrace the idea of ontology-driven applications (what we have termed ODapps). The compelling vision behind such structures is to place the emphasis on knowledge structures and data, rather than more software. Once one begins to unpack that vision, it also becomes clear that programming languages that treat “code as data” might be one way to stay consistent with that vision.

Seeking a Sense of Harmony

For years I have been writing about data integration and interoperability, and our company has been devoted to the topic. I have written extensively about the importance of RDF and description logics to how we organize and represent data. We were also some of the first to supplement RDF with a faceted text-search engine (Solr) to provide the most responsive query environment across structured and unstructured data. We have also adopted ontologies and the OWL 2 (plus SKOS) languages as standards to both foster and enable interoperability. We have explored native data structs to understand how wild forms of information can be efficiently pipelined into interoperable RDF and text forms.

All of this points to the ideal of the democratization of the information function in the enterprise. In other words, to the idea that how data structures get organized and represented (the ontology side of things) is something that knowledge workers can do themselves, rather than accepting the bottleneck of IT and programmers.

This is well and good except there is a critical “last mile” between data representation and data usefulness. This “last mile” deals with how actual data gets manipulated and then organized and presented (visualized). Query responses, reports, analysis and maps continue to be the choke points between knowledge workers and their IT support. And one need not frame this entirely from an enterprise perspective: these same challenges exist for the individual researcher or the small organization.

So, while one can focus on data and its organization and representation, until we address this “last mile” problem, we still are not likely addressing the largest source of frustration and lost opportunities in the knowledge function.

The reason that simple data struct forms and tools like spreadsheets continue to be popular is that they are empirically the best tools for the “last mile”. Web forms and services are increasingly showing their strengths in this realm.

Once one steps back and looks at the entire cycle from basic datum to actionable knowledge, it is clear that the question of data model is but one portion of the challenge. The remaining challenge is how (now) accessible information can be placed into context and acted upon. Further, if one premise is the democratization of the information function, then the challenge should also be how to provide productive capabilities for the last mile to the knowledge worker. Productivity is enhanced when there are the fewest channels and distortions between the signal (the problem) and the user acting on it.

Fred, in his investigation of functional languages, clearly saw that bringing the languages of code (programming) into the language of data (knowledge workers as expressed in our RDF world view) was one means to reduce the number and lossiness of the channels between problem (signal) and solution. A world view premised on the efficient representation and interoperability of data must logically support the idea of a coding (instructional or language) base aligned as well to problems. Moreover, since software guides the actual computer operations, a form of the software that supports the nature of the data should also provide a more performant framework for moving forward. In technical terms, this is known as homoiconicity.

Whether one looks to the intellectual foundations of Charles S. Peirce or Claude Shannon (both of whom we do), one can see that the idea of signs and information theory means finding both data representations and code that minimize communication losses and promote the accurate transfer of the message. Lossless data transmission is one contributor to that vision, but so too is a functional representation for how the information is to be processed and transformed that aligns most closely with the information at hand.

Ergo, a better model for data is not enough. A better model of how to manipulate that data (that is, software) is also needed that aligns with the idea of coherence and structure in the underlying information. For our purposes, we have chosen Clojure as the functional language basis for these new UMBEL Web services. Not only is it performant, but it aligns well with the creation of domain-specific languages (DSLs) that also promise to democratize the computing function for the knowledge worker.
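To give a flavor of the “code as data” idea without diving into Clojure itself, here is a toy sketch (in Python, and emphatically not SD’s actual code) in which a query is expressed as a plain data structure that a small interpreter walks. That is the essence of the data-driven, domain-specific languages mentioned above.

```python
# Toy illustration of "code as data": the query is just nested data that
# a small interpreter evaluates. Records and field names are invented.

records = [
    {"type": "Mammal", "label": "lion", "habitat": "savanna"},
    {"type": "Mammal", "label": "dolphin", "habitat": "ocean"},
    {"type": "Bird", "label": "penguin", "habitat": "ocean"},
]

# The "program" is data: a filter expression any tool (or person) can build.
query = ("and",
         ("eq", "type", "Mammal"),
         ("eq", "habitat", "ocean"))

def evaluate(expr, record):
    """Interpret a query expression against a single record."""
    op = expr[0]
    if op == "eq":
        _, field, value = expr
        return record.get(field) == value
    if op == "and":
        return all(evaluate(sub, record) for sub in expr[1:])
    if op == "or":
        return any(evaluate(sub, record) for sub in expr[1:])
    raise ValueError(f"unknown operator: {op}")

matches = [r["label"] for r in records if evaluate(query, r)]
# matches == ["dolphin"]
```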

Bringing the Pieces Together

Fred and I founded Structured Dynamics a bit more than five years ago. But, we had worked together much earlier on UMBEL and Zitgist. For nearly ten years now, we have episodically emphasized a few different initiatives and passions.

One of those passions has been the structure of data and information. It is this perspective that brought us to RDF and data structs (and our irON efforts) at various times. The idea of structure is a basis for our company name, and represents the belief that structure can be brought to unstructured forms (via tagging, for example). Structure is perhaps the most common notion or concept in my own writings for a decade.

Another passion has been making semantic technologies operational. We have been keen researchers of the tools space and algorithms and such since the beginning. We observed early on that many innovative and open source semantic programs existed, but most were the result of EU grants or academic efforts elsewhere. Thousands of tools existed, but very few had either been evaluated or stress-tested. By bringing together the best-of-class tools and integrating them, we could begin to provide a useful semantic platform for enterprises. This motivation was the genesis for the Open Semantic Framework, and has been the major source of our client support since SD was founded. We have finally created an enterprise-capable platform and have done much to transfer its technology. But, these concepts are difficult, and much remains to be done before semantic technologies are a standard option for enterprises.

Still another vein, and perhaps our first love, has been knowledge bases. We first identified the need for UMBEL years ago when we perceived that an organizing vocabulary would become an essential glue on the Web. We pursued and studied Wikipedia and how it is informing knowledge bases. Instance data, and how it is represented, is central to how these knowledge bases (KBs) get leveraged going forward.

As a smaller consulting and development boutique, we have needed to be opportunistic about when and where we devoted efforts to these pieces. So, over the months and years, we have at various times devoted ourselves to data models and ontologies (structure), the Open Semantic Framework (platform), or UMBEL and Wikipedia (KBs, knowledge bases). Depending on funding and priorities, any one of these threads received episodic attention and focus. But, truth is, each of these pieces has been developed in (project-level) isolation from the whole. Such piecemeal development was essential until each component achieved an appropriate degree of maturity.

I could say we foresaw some years back that all of these pieces would eventually reinforce and bolster one another. Though there is a small bit of truth in that statement, the way things have actually unfolded is to show, as experience and sophistication have been gained, that there is a synergy that comes in the interplay of these various pieces. The goodness is that Structured Dynamics’ efforts (and those of its predecessors) were building inexorably toward the cross-fertilization of these efforts.

Once this kind of realization takes place — that data, code and semantics move hand-in-hand — it then becomes logical to look at the entire knowledge ecosystem. For example, it is not surprising that artificial intelligence, now in the informed guise of KB-backed systems, has again come to the fore. It is also not surprising that what software and programming languages we bring to bear also directly interact with these concerns. Just as Hadoop and non-relational database systems have become prominent, we should also investigate what kind of programming languages and constructs may best fit into this brave new information world.

What we have seen from that investigation is that functional languages (with their DSL offspring) somehow fit into the overall equation moving forward. SD has moved from a single-focus endeavor to one explicitly looking at integration and interoperability issues. What we had earlier seen as (largely) independent pieces we now see as fitting into a broader equation of related emphases:

Structure + Platform + KBs + Functional Language = Knowledge Worker-based Interoperability

We are seeing artificial intelligence moving in these directions. As a subset of AI, I suspect we will also see  the semantic Web moving in the same direction.

We clearly now have the theory, the data, the understanding of semantics, and languages and data representations that can make these democratic interoperabilities become real. This new UMBEL Web site is the first expression of how these pieces can begin to work together into a compelling, accessible whole.

We welcome you to visit and to take advantage of UMBEL’s fully accessible APIs.

Posted: January 20, 2014

Open Semantic Framework: New OSF Platform Leapfrogs Earlier Releases in Features and Capabilities

After nearly five years of concentrated development — including the past 20 months of quiet, background efforts — Structured Dynamics is proud to announce version 3.0 of its open-source Open Semantic Framework. OSF is a turnkey platform targeted to enterprises to bring interoperability to their information assets, achieved via a layered architecture of semantic technologies. OSF can integrate information from documents to Web pages and standard databases. Its broad functions range from information ingest and tagging to search and data management to publishing.

Until today, the version available for download was OSF version 1.x. While capable as an enterprise platform — indeed, it has been in use by a number of leading global enterprises since development first began — its feature coverage was spotty and it required consulting expertise to configure and set up. SD was hired by Healthdirect Australia (HDA) nearly two years ago to enhance OSF’s capabilities and integrate it more closely with the Drupal open-source content management system, among other modern enterprise requirements. The OSF from those developments — the non-public version 2.0 specific to HDA — has now been generalized for broader public use with today’s public announcement of version 3.0.

A More Complete Enterprise Platform

HDA's healthinsite Portal

Not unlike many large organizations, HDA had specific enterprise requirements when it began its recent initiative. Included in these were stringent security, broad use of proven open-source applications, governance and workflow procedures, and strict content authoring and management guidelines. These requirements further needed to express themselves via a sequence of deployment and testing environments, all conducted by a multi-vendor support group following agile development practices.

These requirements placed a premium on performance, scalability and interoperability, all subject to repeatable release procedures and scripts. OSF’s initial development as a more-or-less standalone platform needed to accommodate an enterprise-wide management model involving many players, environments and applications. Prior decisions based on OSF alone now needed to consider and bridge modern enterprise development and deployment practices.

Tighter integration with Drupal was one of these requirements (see next section), but other OSF changes necessary to accommodate this environment included:

  • A new security layer — the initial OSF security model was based on IP authentication. Given the sensitivity of the health data managed by HDA, such a simplistic approach was unacceptable. The actual HDA deployment relied on a third-party security application. However, what was learned from that resulted in a key-based access and validation model in the OSF v 3.0 update (a generic sketch of this style of key-based validation appears after this list)
  • A new revisioning system — content authoring and governance required multiple checks in the workflow, and requirements to review prior edits and invoke possible rollbacks. The result was to add a completely new revisioning capability to OSF
  • Middleware integration and APIs — in a multi-vendor environment, OSF operates in part as a central repository for all system information, which third parties must more readily and easily be able to access. Thus, besides the security aspects, a much improved programmatic API and a generalized search API were added to the OSF platform
  • New, additional Web services — the requirements above meant that seven new OSF Web services were added to the system, bringing the total number of current Web services to 27
  • New caching layer — because of its Web-service design, information access and mediation occurs via a large number of endpoint queries, many of which are patterned and repeated. To improve overall performance, a new caching layer was added to OSF that significantly improved performance and reduced access burdens on the OSF engines
  • Workflow integration — improved workflow sequences and screens were required to capture workflow and governance demands, and
  • Multilingual support — like most larger organizations, HDA has a diversity of native languages throughout its user base. Though OSF had initially been explicitly designed to support multilinguality, specific procedures and capabilities were put in place to more easily support multiple languages in OSF.
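Purely as an illustration of the style of key-based access mentioned in the security bullet above, the following sketch shows how a shared secret can be used to sign and validate API requests. The header names and signing recipe are generic assumptions, not the actual OSF v3.0 protocol.

```python
import hashlib
import hmac
import time

# Generic sketch of key-based request validation. Header names and the
# signing recipe are assumptions for illustration, not OSF's actual scheme.

def sign_request(secret_key, method, path, timestamp):
    """Build a request signature from a shared secret."""
    message = f"{method}\n{path}\n{timestamp}".encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()

def validate_request(headers, secret_lookup, method, path, max_skew=300):
    """Server-side check: recompute the signature and compare."""
    api_key = headers["X-Api-Key"]
    timestamp = int(headers["X-Timestamp"])
    if abs(time.time() - timestamp) > max_skew:
        return False  # stale request; guards against replay
    expected = sign_request(secret_lookup(api_key), method, path, timestamp)
    return hmac.compare_digest(expected, headers["X-Signature"])
```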

Tighter Integration with Drupal

When Fred Giasson and I first designed and architected the Open Semantic Framework in 2009, we made the conscious decision to loosely couple OSF with the initial user interface and content management system, Drupal. We did so thinking that perhaps other CMS frameworks would be cloned onto OSF over time.

Time has not proven this assumption correct. Client experience and HDA’s interests suggested the wisdom of a tighter coupling to Drupal. This shift arose because of the great flexibility of Drupal with its tens of thousands of add-on modules and its ecosystem of capable developers and designers. Our early decision to keep Drupal at arm’s length was making it more difficult to manage an OSF instance. Existing Drupal developers were not able to employ their Drupal expertise to manage OSF portals.

We pivoted on this error by tightening the coupling to Drupal, which involved a number of discrete activities:

  • Upgrade to Drupal 7 — earlier versions of OSF used Drupal 6. We migrated the code base to Drupal 7. That, plus the other Drupal changes noted below, resulted in re-writing about 80% of the OSF code base related to Drupal
  • Alternative Drupal data storage — Drupal’s own evolution in version 7 (and continuing with version 8) is to abstract its underlying information model around entities and fields, abstractions that are much better aligned with OSF’s RDF data model. As these entity and field changes were exposed in Drupal APIs, it was possible for us to write an entirely new information model underlying Drupal. Drupal administrators using OSF are now able to use OSF solely as the data model underneath Drupal (rather than the more standard MySQL)  or any mixed portions in between. The typical OSF for Drupal design now uses OSF for all content storage, with MySQL reserved for internal Drupal settings (à la MVC)
  • Drupal connectors — certain common or core Drupal modules, such as Fields, Entities, Search, and Views, are either common utilities for Drupal developers or are themselves core bases for third-party modules. Because of their centrality, SD developed a series of “connectors” that enable these modules to be used as is while transparently communicating with and writing to OSF. Thus, Drupal developers can use these familiar capabilities without needing particular OSF knowledge
  • Major updates to Drupal modules — because of the changes above, the existing OSF Drupal modules (called conStruct in the earlier versions) were updated to take advantage of the common terminology and tighter integration
  • Major updates to Drupal widgets — similarly, the standard OSF data and visualization widgets used with Drupal (called Semantic Components in the earlier versions) were also updated to work in this more tightly integrated environment.

Expanded Search Capabilities and Web Services

Some of the extended capabilities in OSF v 3.0 are noted above, including the expanded roster of Web services. However, the OSF Search Web service, which is by far the most used OSF endpoint, received massive improvements in this latest release.

First, OSF Search now uses a new query parser, which provides the capability to change the ranking of search results by boosting how specific query components get scored. Types, attributes, datasets or counts may be used to vary any given search result, including different occurrences on the same page. It is also now possible to add restrictions to the search queries, including restricting results to a specified set of attributes.

This flexibility is highly useful where certain structured pages contain blocks or sections with patterned search results. This structuring leads to the ability to create generic page templates, wherein search queries and results vary within the layout. An “events” block may score differently than, say, a “related topics” block, all of which in turn can respond to a given context (say, “cancer” versus “automobiles”) for a given page (and its template).

These repeated patterns lend themselves to the use of reusable “search profiles,” which are predefined queries that may include context variables. These profiles, in turn, can be named and placed on page layouts. Existing profiles may be recalled or invoked to become patterns for still further profiles. The flexibility of these search profiles is immense, and the parameters used in constructing them can be quite extensive.
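As a rough illustration of the search-profile idea, the sketch below treats a profile as a named, reusable query description that a page template can combine with a context term. The field names and boost syntax are invented for illustration and are not the OSF Search endpoint's actual parameters.

```python
# Rough sketch of reusable "search profiles": named, parameterized query
# descriptions. Field names and boost values are illustrative only.

SEARCH_PROFILES = {
    "events-block": {
        "types": ["Event"],
        "boosts": {"type:Event": 3.0, "attribute:startDate": 1.5},
        "restrict_attributes": ["title", "startDate", "location"],
        "page_size": 5,
    },
    "related-topics-block": {
        "types": ["Topic"],
        "boosts": {"type:Topic": 2.0},
        "restrict_attributes": ["title", "description"],
        "page_size": 10,
    },
}

def build_query(profile_name, context_term):
    """Combine a saved profile with a page-specific context term."""
    profile = dict(SEARCH_PROFILES[profile_name])
    profile["q"] = context_term  # e.g. "cancer" vs. "automobiles"
    return profile

# The same page template can render different blocks from the same context:
events_query = build_query("events-block", "cancer")
topics_query = build_query("related-topics-block", "cancer")
```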

Thus, OSF version 3.0 includes the new Query Builder module. Via an intuitive selection interface, users may construct search queries of any complexity, and then save and reuse them later as search profiles.

Lastly, registering, configuring and managing OSF instances and datasets into Drupal has never been easier. The new OSF Configure module centralizes all the features and options required for these purposes, which are then managed by a new suite of tools (see next).

Automated Installation and Management Tools

Standard enterprise deployments that proceed from development to production require constant updates and versions, both in application code and content. Keeping track and managing these changes — let alone deploying them quickly and without error — requires separate management capabilities in their own right. The new OSF thus has a number of utilities and command-line tools to aid these requirements:

  • OSF Installer — this tool installs and configures all the pieces required by the OSF stack, then runs the OSF Tests Suites to make sure that all functionality is fully operational on the new server
  • OSF Tests Suites — composed of 746 tests and 4139 assertions, these tests may be run every time an OSF instance is deployed or code is changed. The tests measure all of the input parameters of each endpoint, combinations thereof, mime types, and expected errors returned by each endpoint
  • OSF Ontologies Management Tool — (OMT) is used to manage ontologies, list ontologies, create/import new ones, delete existing ones, or to generate underlying ontological structures
  • OSF Datasets Management Tool — (DMT) is used to manage datasets of an OSF instance, enabling the user to create, delete, update, import and export datasets directly from the command line
  • OSF Permissions Management Tool — (PMT) is used to manage, list, create or delete access permissions groups and users
  • OSF Data Validator Tool — (DVT) is used to perform a series of post-indexation data validation tests and return validation errors if any are found.

Tempered via Enterprise Development and Deployments

The methods and processes by which these advances have been made all occurred within the context of state-of-the-art enterprise IT management. Experience with supporting infrastructure tools (such as Jira, Confluence, Puppet, etc.) and agile development methods are part of the ongoing documentation of OSF (see next). This experience also bolsters Structured Dynamics’ ability to work with other third-party applications at the middleware layer or in support of enterprise deployments.

Comprehensive and Completely Updated Documentation

The Open Semantic Framework has evolved considerably since its conception now five years ago. In its early development, components and pieces were sometimes developed in isolation and then brought into the framework. This jagged development path led to a cacophony of names and terms to characterize portions of the OSF stack. This terminology confusion has made it more difficult than it needed to be to understand the vision of OSF, the layers of its architecture, or the interactions between its components and parts.

In making the substantial efforts to update documentation from OSF version 1.x to the current version 3.0, terminology was made consistent and code references were cleaned up to reflect the simpler OSF branding. This clean up has led to necessary updates across multiple Web sites maintained by Structured Dynamics with some relationship to OSF.

The Web site with the most changes required has been the OSF Wiki. In its prior incarnation, called TechWiki, there were nearly 400 technical articles on OSF. That site has now been completely rewritten and re-organized. Nearly two hundred new articles have been written in support of OSF v 3.0. Terminology related to the older cacophony (see correspondence table here) has (hopefully) been updated and corrected. Most architectural and technical diagrams have been updated. Additional documentation is being posted daily, catching up with the experience of the past twenty months.

Moving Beyond the Established Foundation

Open Semantic Framework

SD is pleased that enterprise sponsors want to continue beyond the Open Semantic Framework’s present solid foundations. While we are not at liberty to discuss specific client initiatives, a number of ongoing developments can be described broadly. First, in terms of the key engines that provide the core of OSF’s data management capabilities, initiatives are underway in the areas of visualization, business analytics and workflow orchestration and management. There are also efforts underway in more automated means for direct ingest of quality Web-based information, both based on linked data and from Web APIs. We are also pleased that there is interest in further extending OSF’s tight integration with Drupal, even while the integration efforts of the past months have not yet been fully exploited.

To Learn More

To learn more, make sure to check out the re-organized OSF wiki. See specifically the complete OSF overview, the list of all the OSF 3.0 features, and the list of all the new features in OSF 3.0. Also, for a complete soup-to-nuts view of what it takes to put up a new OSF installation, see the Users Guide. Lastly, for a broad overview of OSF, see its reference architecture and the overviews on its dedicated OSF Web site.

As a final note, Structured Dynamics would like to thank its corporate sponsors of the past five years for providing the development funds for OSF, and for agreeing with the open source purposes of the Open Semantic Framework.

Posted: May 21, 2013

Neighbourhoods of Winnipeg - NOW: First and Largest Local Government Site to Exclusively Embrace Semantic Technologies

The City of Winnipeg, the capital and largest city of Manitoba, Canada, just released its “NOW” portal celebrating its 236 diverse and historic neighborhoods. The NOW portal is one of the largest releases of open data by a local government to date, with some 57 varied datasets now available, ranging from local neighborhood amenities such as pools and recreation centers to detailed real estate and economic development information. Nearly one-half million individual Web pages comprise the site, driven exclusively by semantic technologies. Nearly 10 million RDF triples underlie the site.

In announcing the site, Winnipeg Mayor Sam Katz said, “We want to attract new investment to the city and, at the same time, ensure that Winnipeg remains healthy and viable for existing businesses to thrive and grow.” He added, “The new web portal, Neighbourhoods of Winnipeg—or NOW—is one way that we are making it easy to do business within the City of Winnipeg.”

NOW provides a single point of access for information such as location of schools and libraries, Census and demographic information, historical data and mapping information. A new Economic Development feature included in the portal was developed in partnership with Economic Development Winnipeg Inc. (EDW) and Winnipeg REALTORS®.

Our company, Structured Dynamics, was the lead contractor for the effort. An intro to the technical details powering the Winnipeg site is provided in the complementary blog post by SD’s chief technologist, Fred Giasson. These intro announcements by SD will later be followed by more detailed discussions of relevant NOW portal topics in the coming weeks.

Background and Formal Release

But the NOW story is really one of municipal innovation and a demonstration of what a staff of city employees can accomplish when given the right tools and frameworks. SD’s real pleasure over the past two years of development and data conversion for this site has been our role as consultants and advisors as the City itself converted the data and worked the tools. The City of Winnipeg NOW (Neighbourhoods of Winnipeg) site is testament to the ability of semantic technologies to be learned and effectively used and deployed by subject matter professionals from any venue.

In announcing the site on May 13, Mayor Sam Katz also released a short four-minute introductory video about the site:

What we find most exciting about this site is how our open source Open Semantic Framework can be adapted to cutting-edge municipal open data and community-oriented portals. Without any semantic technology background at the start of the project, the City has demonstrated its ability to manage, master and then tailor the OSF framework to its specific purposes.

Key Emphases

As its name implies, the entire thrust of the Winnipeg portal is on its varied and historical neighborhoods. The NOW portal itself is divided into seven major site sections with 2,245 static pages and a further 425,000 record-oriented pages. The number of dynamic pages that may be generated from the site given various filtering or slicing-and-dicing choices is essentially infinite.

Neighborhoods

The fulcrum around which all data is organized on the NOW portal is the 236 neighborhoods within the City of Winnipeg, organized into 14 community areas, 15 political wards, and 23 neighborhood clusters. These neighborhood references link to thousands of City of Winnipeg and external sites, and also have many descriptive pages of their own.

Some 57 different datasets contribute information to the site; some were authored specifically for the NOW portal, while others were migrated from legacy City databases. Coverage ranges from parks, schools, recreational and sports facilities, and zoning, to libraries, bus routes, police stations, day care facilities, community gardens and more. More than 1,400 attributes characterize this data, all of which may be used for filtering or slicing the data.

Property and Economic Development

A key aspect of the site is its real estate, assessment and zoning information. Every address and parcel in the city — a count nearing 190,000 in the current portal — may be looked up and related to its local and neighborhood amenities. Up to three areas of the City may be mapped and compared to one another, which is seen as a useful tool for screening economic development potential.

Census Data

All of the neighborhoods and neighborhood clusters may be investigated and compared for Census data in two time periods (2001 and 2006). Types of Census information include population, education, labor and work, transportation, languages, income, minorities and immigration, religion, marital status, and other family and household measures.

Any and all neighborhoods may be compared to one another on any or all of these measures, with results available in chart, table or export form.

Images and History

Images and history pages are provided for each Winnipeg neighborhood.

Mapping

Throughout, there are rich mapping options that can be sliced and displayed on any of these dimensions of locality or type of information or attribute.

More to Come!

The basic dataset authoring framework will enable City staff (and, perhaps, external parties or citizens) to add additional datasets to the portal over time.

Key Functionality and Statistics

The NOW site is rich in functionality and display and visualization options. Some of this functionality includes the following:

NOW Ontology Graph

NOW Graph Structure

NOW is entirely an ontology-driven site, with both domain and administrative ontologies guiding all aspects of search, retrieval and organization. There are 12 domain ontologies governing the site, two of which are specific to NOW (the NOW ontology and a Canadian Census ontology). Ten external ontologies (such as FOAF and GeoNames) are also used.

The NOW ontology, shown to the left, has more than 2500 subject concepts within it covering all aspects of municipal governance and specific Winnipeg factors.

Relation Browser

All of the 2500 linked concepts in the NOW ontology graph can be interactively explored and navigated via the relation browser. The central “bubble” also presents related, linked information such as images, Census data, descriptive material and the like. As adjacent “bubbles” are clicked, the user can navigate or “swim through” the NOW graph.

NOW Relation Browser

NOW Web Maps

Web Map

Nearly all of the information on the NOW site — or about 420,000 records — contains geolocational information of one form or another. There are about 200,000 points of interest records, another 200,000 area or polygon records, and about 7,000 paths and routes such as bus routes in the system.

All 190,000 property addresses in Winnipeg may be looked up and mapped.

Virtually all of the 57 datasets in the system may be filtered by category or type or attribute. This information can be filtered or searched using about 1400 different facets, singly or in combination with one another.

Various map perspectives are provided from facilities (schools, parks, etc.) to economic development and history, transportation routes and bus stops, and property, real estate and zoning records.

Templates

Depending on the type of object at hand, one of more than 50 templates may be invoked to govern the display of its record information. These templates are selected contextually from the ontology and present different layouts of map, image, record attribute or other information, all keyed by the governing type.

Each template is thus geared to present relevant information for the type of object at hand, in a layout specific to that object.

Objects lacking their own specific templates default to the display type of their parent or grandparent objects such that no object type lacks a display format.
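A sketch of that fallback logic, with invented type and template names: walk up the type hierarchy until an ancestor with a registered template is found.

```python
# Sketch of template fallback: walk up the type hierarchy until some
# ancestor type has a registered template. Names are illustrative only.

SUPERCLASS = {           # child type -> parent type
    "ElementarySchool": "School",
    "School": "Building",
    "Church": "Building",
    "Building": "Thing",
}

TEMPLATES = {             # types with their own display templates
    "School": "school-template",
    "Building": "building-template",
    "Thing": "generic-template",
}

def template_for(obj_type):
    """Return the most specific template available for a type."""
    t = obj_type
    while t is not None:
        if t in TEMPLATES:
            return TEMPLATES[t]
        t = SUPERCLASS.get(t)
    return TEMPLATES["Thing"]

assert template_for("ElementarySchool") == "school-template"
assert template_for("Church") == "building-template"
```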

Multiple templates are displayed on search pages, depending upon the diversity of object types returned by the given search.

Example of a NOW Record Template

Example of a NOW Census Chart

Graph Statistics

The NOW site provides a rich set of Census statistics by neighborhood or community area for comparison purposes. The nearly half million data points may be compared between neighborhoods (make sure to pick more than one) in graph form (shown) or in tabular form (not shown).

Census information spans from demographics and income to health, schooling and other measures of community well-being.

Like all other displays, the selected results can also be exported as open data (see below).

Image Gallery

The NOW portal presently has about 2,700 images on site, organized by object type, by neighborhood, and by historical theme. These images are contextually available in multiple locations throughout the site.

The History topic section also matches these images to historical neighborhood narratives.

Example of a NOW Image Gallery

Example conStruct Tool: structOntology

conStruct Tools

A series of twenty or so back office tools are available to City of Winnipeg staff to grow, manage and otherwise maintain the portal. Some of these tools are exposed in read-only form to the general public (see Geeky Tools next).

The example at left is the structOntology tool for managing the various ontologies on the site.

Geeky Tools

As a means to show what happens behind the scenes, the Geeky Tools section presents a number of the back office tools in read-only form. These are also good ways to see the semantic technologies in action.

The Geeky Tools section provides access to Search, Browse, Ontology, and Export (see next) tools.

NOW's Geeky Tools

The NOW Export Function

Open Data Exports

On virtually any display or after any filter selection, there is an “export” button that allows the active data to be exported in a variety of formats. Under Geeky Tools it is also possible to export whole datasets or slices of them. Some of the key formats include:

Some of these are serializations that are not standard ones for RDF, but follow a notation that retains the unique RDF aspects.

Some Early Lessons

Though the technical aspects of the NOW site have been ready for quite some time, with limited staff and budget it took City staff some time to convert all of its starting datasets and to learn how to develop and manage the site on its own. As a result, some of the design decisions made a couple of years back now appear a bit dated.

For example, the host content management system is Drupal 6, though Drupal 8 is getting close to its own release. Similarly, some of the display widgets are based on Flash, which Adobe announced last year it will continue to maintain, but will no longer develop. In the two years since design decisions were originally made, the importance of mobile apps and smartphones and tablets has also grown tremendously in importance.

These kinds of upgrades are a constant in the technology world, and apply to NOW as well. Fortunately, the underlying basis of the entire portal in its data and stack were architected to enable eventual upgrades.

Another key aspect of the site will be the degree to which external parties contribute additional data. It would be nice, for example, to see the site incorporate events announcements and non-City information on commercial and non-profit services and facilities.

Conclusion

Structured Dynamics is proud of the suitability of our OSF technology stack and is impressed with all the data that is being exposed. Our informal surveys suggest this is the largest open data portal by a major city worldwide to be released to date. It is certainly the first to be powered exclusively by semantic technologies.

Yet, despite those impressive claims, we submit that the real achievement of this project is something different. The fact that this entire portal is fully maintained and operated by the City’s own internal IT staff is a game changer. The IT staff of the City of Winnipeg had no prior internal semantic Web knowledge, nor any knowledge of RDF, OWL or any other underlying technologies used by the portal. What they had was a vision of their project and what they wanted. They placed significant faith in, and made a commitment to master, the OSF technology stack and the underlying semantic Web concepts and principles to make their vision a reality. Much of SD’s 430+ documents on the OSF TechWiki are a result of this collaborative technology transfer between us and the City.

We are truly grateful that the City of Winnipeg has taken open source and open data quite seriously. In our partnership with them they have been extremely supportive of what we have done to progress the technology, release it as open source, and then document our lessons and experiences for other parties to utilize on the TechWiki. The City of Winnipeg truly shows enlightened government at its best. Thank you, especially to our project managers, Kelly Goldstrand and Don Conolly.

Structured Dynamics has long stated its philosophy as, “We are successful when we are no longer needed.” We’re extremely pleased and proud that the NOW portal and the City of Winnipeg show this objective is within realistic reach.

Posted: February 18, 2013

The Semantic Enterprise: Part 6 in the Enterprise-scale Semantic Systems Series

The fulcrum by which semantic technologies work within the enterprise is the dataset. A dataset refers to a named grouping of records, best designed as similar in record types and intended access rights (though technically a dataset is any named grouping of records).

Datasets play a central role in the organization of information in Structured Dynamics‘ (SD) open semantic framework (OSF). Datasets are one of the three major access dimensions to the OSF (the other two being users/groups and tools/endpoints). In combination, these three dimensions — datasets, users/groups and tools/endpoints — can also result in a powerful set of profiles that govern overall access to content.

Specific security aspects of the semantic enterprise stack are discussed in another part of this Enterprise-scale Semantic Systems (ESSS) series, but the interplay of those aspects with datasets is fundamental. As such, how datasets are bounded and organized (and, then, named) is a critical management consideration for enterprises that adopt a semantic technology stack based on an architecture like OSF. This role of datasets, how to organize them, how to manage them, and also some best practices for how to use them, are the focus of this part in our series.

Access Dimensions to the OSF

To briefly recall the architectural discussion in this series, SD’s semantic technology stack involves a Web services layer (structWSF) used to access specific functional endpoints, all via HTTP queries [1]. Some of these endpoints access complete applications in such areas as tagging, imports/exports, search and the like. Other endpoints individually provide (or not) access to CRUD (create-read-update-delete) rights to interact with either individual records, full datasets, or the ontologies that are the “schema” overlying this information. The net result, at present, is more than 20 individual Web service endpoints to query and interact with the system:

Web Service               Create   Read   Update   Delete
Auth Registrar: Access      X       X
Auth Registrar: WS          X       X
Auth: Lister                        X
Auth: Validator                     X
Ontology: Create            X
Ontology: Read                      X
Ontology: Update                            X
Ontology: Delete                                     X
Dataset: Create             X
Dataset: Read                       X
Dataset: Update                             X
Dataset: Delete                                      X
CRUD: Create                X
CRUD: Read                          X
CRUD: Update                                X
CRUD: Delete                                         X
Search                              X
SPARQL                              X
Tracker: Create             X
Scones

This structWSF Web services layer has a three-dimensional design that is used to govern access:

  1. Users (or Groups or Roles)
  2. Tools, and
  3. Datasets.

A “user” may extend from an individual to an entire class or group of users, say, unregistered visitors to a given portal. Tools refer to each of the structWSF endpoints, each with its own URI.

What this means is that a given user may be granted access or not — and various rights or not from reading to the creation or deletion of information — in relation to specific datasets. Stated another way, it is in the nexus of user type and dataset that access control is established for the semantic system.

In an enterprise context, a given individual (“user”) may have different access rights depending on circumstance. A worker in a department may be able to see and do different things for departmental information than for enterprise information. A manager may be able to view budget information that is not readable by support personnel. A visitor to a different Web site or portal may see different information than visitors to other Web sites. Supervisors might be able to see and modify salary data for certain employees that is not viewable by others.

The user role or persona thus becomes the access identifier to the system. What information and what tools they might use in relation to that information is defined in relation to the datasets for which they have access.

Some Access Scenarios

So, let’s say, a given enterprise has two major information stores, #1 and #2, and also has some domain (or departmental or other such boundary) information for X, Y and Z, some of which is local (perhaps for the local branch) and the rest global (for that line of business). Further, let’s also suppose that those same departments also have sensitive, internal information related to either internal matters (such as salaries) or support matters (such as qualified vendors). This basic scenario is laid out in A of the diagram below:

Now, depending on circumstance, different individuals (most often assigned to different access groups, but that is not required) need to have different access to this information. In one case, a general user with access to mostly public material exists for domain B; another for domain C. Still another case: a supervisor or internal staff member may have responsibilities in the Y domain; that could be case D.

Any of the same variations above could result in a different use case; A through D are merely illustrative.

Profiles to Overcome the Combinatorial Problem

It is fairly easy to see that the combination of datasets x tools x roles can lead to many access permutations. With, say, the current 20 some-odd tools in the OSF with five different roles and just ten different datasets, we already have about 1,000 permutations. As portals and dataset numbers grow, this combinatorial explosion gets even worse. Of course, not all combinations of datasets, tools and roles make sense. In fact, only a relatively few number of patterns likely covers 95% or more of all likely access options.

Because access rights are highly patterned, these theoretical combinations can in fact be boiled down to a small number of practical templates — called profiles — to which a newly registered dataset can be assigned. (Of course, the enterprise could also tweak any of the standard profiles to meet any of the combinatorial options for a specific, unusual individual, such as for a tax auditor.)  Experience, in fact, shows the number of actual profiles to be surprisingly small.

For instance, consider these possible profile patterns:

  • Profile: Public (standard) — this profile is for a dataset intended for broad public access
  • Profile: Registered — this profile is for datasets that are limited to registered users of a portal (possibly as a way to prevent spam or to encourage membership or participation)
  • Profile: Curated — this profile is where a specific group or groups (which themselves can be flexibly determined and assigned) has curation rights for the dataset, or
  • Profile: Internal — this profile is for internal (private) datasets where only a specific group or groups may access or modify. In some instances, an internal dataset might be the profile type while the dataset is under development, with the profile shifting to a broader access category once completed.

Profiles may, of course, be applied to any permutation.

This profile concept can now be expanded to incorporate user type. Four categories of users can illustrate this dimension:

  • O = Owner (the original registrar of the dataset; often possibly the “owner” or “admin” of the portal, but not necessarily so)
  • G = Group member (a registered user who is a member of a specific group)
  • R = Registered user (an authorized portal user with a Drupal login and password)
  • P = Public (anonymous user)

Further, of course, with a multitude of groups, there are potentially many more than four categories (“roles”) of users as well.

A Sample Profile Matrix

To illustrate how we can collapse this combinatorial space into something more manageable, let’s look at how one of the profile cases noted above — the Public profile — can be expressed as a pattern or template. In this example, the Public profile means that owners and some groups may curate the data, but everyone can see and access the data. Also note that export is a special case, which could warrant a sub-profile.

We also need to relate this Public profile to a specific dataset. For this dataset, we can characterize our “possible” assignments as described above as to whether a specific user category (O, G, R and P as noted above) has available a given function (open dot), gets permission rights to that function by virtue of the assigned profile (solid dot), or whether that function may also be limited to a specific group or groups (half-filled dot) or not.

Thus, we can now see this example profile matrix for the Public profile for an example dataset with respect to the available structWSF Web services:

Note, of course, that these options and categories and assignments are purely arbitrary for our illustrative discussion. Actual needs and circumstances may vary wildly from this example.

Matrices such as this seem complex, but that is why profiles can collapse and simplify the potential assignments into a manageable number of discrete options. If the pre-packaged profiles need to be tweaked or adjusted for a particular circumstance, provisions through the CMS enable all assignments to be accessed in individual detail. Via this design, knowledge and collaboration networks can be deployed that support an unlimited number of configurations and options, all in a scalable, Web-accessible manner. The data that is accessed is automatically expressed as linked data. This same framework can be layered over in situ existing data assets to provide data federation and interoperable functionality, all responsive to standard enterprise concerns regarding data access, rights and permissions.
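A minimal sketch of how such profiles collapse the permission space is shown below; the profile contents and dataset names are invented for illustration and do not reproduce the actual matrix.

```python
# Minimal sketch of profile-based access resolution. The permissions
# listed here are illustrative, not the actual OSF profile matrix.

PROFILES = {
    "public": {
        "owner":      {"create", "read", "update", "delete"},
        "group":      {"create", "read", "update"},
        "registered": {"read"},
        "public":     {"read"},
    },
    "internal": {
        "owner":      {"create", "read", "update", "delete"},
        "group":      {"read", "update"},
        "registered": set(),
        "public":     set(),
    },
}

# Each dataset is registered once against a profile, rather than
# enumerating every user x tool x dataset combination.
DATASETS = {
    "neighborhood-amenities": "public",
    "salary-data": "internal",
}

def can(user_category, action, dataset):
    """Check whether a user category may perform an action on a dataset."""
    profile = PROFILES[DATASETS[dataset]]
    return action in profile.get(user_category, set())

assert can("public", "read", "neighborhood-amenities")
assert not can("registered", "read", "salary-data")
```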

Best Practices

Datasets are clearly one of the fundamental dimensions for organizing content within this OSF semantic enterprise design. Some of the best practices in bounding these structures:

  • Domain – what is the applicable scope or business purpose of this information? It is best to think of this question with regard to access, which is, after all, the most pragmatic way to think of it
  • Source – does the data vary by publisher or source location? For example, provenance or download location or format may be an important distinguishing factor in release or access, and may have copyright or royalty implications
  • When created – does the data have periodic update or creation times? For example, it may be important to distinguish between preliminary data and final data or to segregate data because of workflow or processing considerations
  • Access rights – are there any differences in how users may see or act upon the data? For example, privileged budget information may be put in a different dataset from public financial information
  • Type – does the data vary by class or kind? For example, records about schools might be desirable to keep different from records about churches, though at a different level both may be considered buildings, or
  • Attributes – are there differences in fields or attributes that describe the data? For example, a portion of records may have complete attribute descriptions, while the majority only contain a few descriptive fields.

Any of these differences may warrant creating a separate dataset. There are no limits to the number of datasets that may be managed by a given OSF instance.

Once such boundaries are set, attention should turn to common attributes or metadata. Still further, datasets and their records (as with all decision or information artifacts in an enterprise) go through natural work stages or progressions. Even the lowliest written document needs to be drafted, reviewed, characterized, approved, and then possibly revised. Whatever such workflow steps may be, including versioning, they may warrant consideration as belonging to a different dataset.

Lastly, whatever the operational mode devised, finding naming conventions to reflect these variations is essential to manage the dataset files. Which goes to show: datasets are meaningful information artifacts in and of themselves.
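For illustration only, one hypothetical naming convention might encode those bounding dimensions directly in the dataset name:

```python
from datetime import date

def dataset_name(domain, source, access, created=None, version=1):
    """Build a dataset name that encodes its bounding dimensions.
    Purely a hypothetical convention for illustration."""
    created = created or date.today()
    return f"{domain}--{source}--{access}--{created:%Y%m%d}--v{version}"

# e.g. "parks--city-gis--public--20130218--v1"
print(dataset_name("parks", "city-gis", "public", date(2013, 2, 18)))
```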

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

[1] Or programmatically via the structWSF API.
Posted: February 11, 2013

The Semantic Enterprise: Part 5 in the Enterprise-scale Semantic Systems Series

We become such captives to our language and what we think we are saying. Many basic words or concepts, such as “search,” seem to fit into that mould. A weird thing with “search” is that twenty years ago the term and its prominence were quite different. Today, “search” is ubiquitous and for many its embodiment is Google such that “to google” is often the shorthand for “to search”.

When we actually do “search”, we submit a query. The same extension that has inveigled its tendrils into search has also caused the idea of a query to become synonymous with standard text (Google) search.

But, there’s more, much more, both within the meaning and the execution of “to search”.

Enterprises, familiar with structured query language (SQL), have understood for quite some time that queries and search were more than text searches to search engines. Semantic technologies have their own structured query approach, SPARQL. Enterprises know the value of search, from discovery to research and navigation. And, they also intuitively know that they waste much time and don’t often get what they want from search. U.S. businesses alone could derive $33 billion in annual benefits from being better able to re-find previously discovered Internet information, an amount equal to $10 million per firm for the 1,000 largest businesses [1]. And those benefits, of course, are only for Internet searches. There are much larger costs arising from unnecessary duplicated effort because of weaknesses in internal search [1].

The thing that’s great about semantic search — done right — is that it combines conventional text search with structured search, adds more goodies, and basically overcomes most current search limitations.

Many Kinds of Search

The Webster definition of “search” is to “look into or over carefully or thoroughly in an effort to find or discover something.”

There are two telling aspects to this definition. One, search may be either casual or careful, from “looking” into something to being “thorough”. Second, search may have as its purpose finding or discovery. Finding, again, implies purpose or research. Discovery can range from serendipity to broadening one’s understanding or horizons given a starting topic.

Prior to relational systems, network databases represented the state of the art. One of E.F. Codd‘s stated reasons for developing the relational approach and its accompanying SQL query language was to shift the orientation of databases from links and relationships (the network approach) to query and focused search [2]. By virtue of the technology design put forward, relational databases shifted the premise to structured information and direct search queries. Yet, as noted, this only represents the purposeful end of the search spectrum; navigation and discovery became secondary.

Text search and (text) search engines then came to the fore, offering a still-different model of indexing and search. Each term became a basis for document retrieval, leading to term-based means of scoring (the famous Salton TF/IDF statistical model), but with no actual understanding of the semantic structure or meaning of the document. Other term-based retrieval bases, such as latent semantic indexing, were put forward, but these were based on the statistical relationships between terms in documents, and not the actual meaning of the text or natural language within the documents.

What we see in the early evolution of “search” is kind of a fragmented mess. Structured search swung from navigation to purposeful queries. Text search showed itself to be term-based and reliant on Boolean logic. Each approach and information store thus had its own way to represent or index the data and a different kind of search function to access it. Web search, with its renewal of links and relationships, further shifted the locus back to the network model.

State-of-the-art semantic search, as practiced by Structured Dynamics, has found a way to combine these various underlying retrieval engines with the descriptive power of the graph and semantic technologies to provide a universal search mechanism across all types of information stores. We describe this basis more fully below, but what is important to emphasize at the outset is that this approach fundamentally addresses all aspects of search within the enterprise. As a compelling rationale for trying and then adopting semantic technologies, semantic search is the primary first interest for most enterprises.

Unique Advantages to Semantic Search

The first advantage of semantic search is that all content within the organization can be combined and searched at once. Structured stuff . . . documents . . . image metadata . . . databases . . . can now all be characterized and put on an equivalent search footing. As we just discussed regarding text as a first-class citizen, this power of indexing all content types is the real dynamo underneath semantic search.

Being able to search all available content is powerful on its own. But being able to add the dimension of relationships between things means that the semantic graph takes information exploration to a totally new level.

The simplest way to understand semantic search is to de-construct the basic RDF triple down to its fundamentals. This first tells us that the RDF data model is able to represent any thing, that is, an object or idea. And, we can represent that object in virtually any way that any viewer would care to describe it, in any language. Do we want it to be big, small? blue, green? meaningful, silly? smart, stupid? The data model allows this and more. We can capture how diverse users describe the same thing in diverse ways.
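
To make that concrete, here is a minimal sketch in Python using the rdflib library; the namespace, identifiers and property names are all hypothetical, purely to illustrate how one thing can be described in diverse ways.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/kb/")   # hypothetical namespace

g = Graph()
thing = EX.Widget42                        # hypothetical "thing" being described

# The same thing, described by diverse viewers in diverse ways and languages
g.add((thing, RDF.type, EX.Product))
g.add((thing, RDFS.label, Literal("widget", lang="en")))
g.add((thing, RDFS.label, Literal("gadget", lang="fr")))
g.add((thing, EX.size, Literal("small")))
g.add((thing, EX.color, Literal("blue")))
g.add((thing, EX.assessment, Literal("meaningful")))

print(g.serialize(format="turtle"))        # every description is just another triple
```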

But, now that I have my world populated with things and descriptions of them, how do they connect? What are the relationships between these things? It is the linkages — the connections, the relationships — between things that give us context, the basis for classifying, and as a result, the means to ascertain the similarity or adjacency of those things. These sorts of adjacencies then enable us to understand the “pattern” of the thing, which is ultimately the real basis for organizing our world.
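
A small follow-on sketch (again Python + rdflib, with hypothetical names) shows how those linkages can then be queried for adjacency: things related by virtue of connecting to the same node.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/kb/")   # hypothetical namespace
g = Graph()

# Relationships between things give us context and adjacency
g.add((EX.Widget42, EX.partOf, EX.Assembly7))
g.add((EX.Widget43, EX.partOf, EX.Assembly7))
g.add((EX.Assembly7, EX.madeBy, EX.AcmeCorp))

q = """
PREFIX ex: <http://example.org/kb/>
SELECT ?other WHERE {
  ex:Widget42 ex:partOf ?assembly .
  ?other      ex:partOf ?assembly .
  FILTER (?other != ex:Widget42)
}
"""
for row in g.query(q):
    print(row.other)   # Widget43: adjacent by virtue of the shared assembly
```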

The rich brew of things (“nouns”) and the connections between them (“verbs”) starts to give our computers a basis for describing the world more akin to our real language. It is not perfect, and even if it were, it would still suffer from the communication challenges that occur between all of us as humans. Language itself is another codified way of transmitting messages, which will always suffer some degree of loss [3]. But in this observation we can also glean a truth: humans interacting with their computer servants will be more effective the more “human” their interfaces are. And this truth can also give us some insight into what search must do.

First, we are interested in classifying and organizing things. The idea of “facets”, the arrangement of search results into categories based on indexed terms, is not a new one in search. In conventional approaches, “facets” are treated as a kind of dimension, one that is purposefully organized, sometimes hierarchically. In Web interfaces, facets most often appear as a listing in a left-hand column from which one or more of these dimensions might be selected, sometimes with a count of potential results after each facet, or with checkboxes by which multiple facets might be combined. In essence, these facets act as structural or classificatory “filters” for the content at hand. This becomes still more powerful when combined with basic keyword search.

In semantic search, facets may be derived not only from what types of things exist in the search space, but also from what kinds of attributes (or properties) connect them. And this all comes for free. Unlike conventional faceting, no one needs to decide in advance which “dimensions” are important. With semantic search, the very basis of describing the domain at hand creates an organization of all things in the space. As a result, this combination of entities and properties leads to what could be called “global faceting”. The structure of how the domain is described is the sole basis required to gain — and make universal to the information space — these facets.
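
As an illustration of what “comes for free,” the following sketch (Python + rdflib; the data file name is hypothetical) derives candidate facets directly from the data itself: every class in use becomes a facet dimension with instance counts, and every property in use can act as a further filter.

```python
from rdflib import Graph

g = Graph()
g.parse("domain-data.ttl", format="turtle")   # hypothetical export of the domain graph

# Facet dimensions fall out of the types actually present in the data...
types_q = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
"""
for row in g.query(types_q):
    print(row.type, row.n)

# ...and out of the properties actually used to describe things
props_q = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
for row in g.query(props_q):
    print(row.p)
```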

Whoa! How did that happen? All we did was describe our information space, but now we have all of this rich structure. This is when the first important enterprise realization sets in: how we describe the information in our domain is the driving, critical factor. Semantic search is but easy pickings from this baseline. What is totally cool about the nature of semantic search is that its slicing-and-dicing would put a sushi restaurant to shame. Every property represents a different pathway; and every entry (node) is an entry point.

Second, because we have based all of this on an underlying logic model in description logics, we gain a huge Archimedes’ lever for our information space. We do not need to state all of the relationships and organizations in our information space; we can infer them from the assertions already made. Two parents have a child? That child has a sibling? Then we can infer that the sibling has the same parents. The “facts” that one might assume about a given domain can grow by 10x or more when inference is included.
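
A toy illustration of the family example, again in Python with rdflib: here a SPARQL CONSTRUCT query plays the role of a single forward-chaining rule. All names are hypothetical, and in a production setting the triple store or a reasoner would handle this inferencing.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/family/")   # hypothetical vocabulary
g = Graph()
g.add((EX.Alice, EX.hasParent, EX.Pat))
g.add((EX.Alice, EX.hasParent, EX.Sam))
g.add((EX.Alice, EX.hasSibling, EX.Bob))

# Rule: a sibling of a child shares that child's parents
rule = """
PREFIX ex: <http://example.org/family/>
CONSTRUCT { ?sib ex:hasParent ?parent }
WHERE {
  ?child ex:hasParent  ?parent .
  ?child ex:hasSibling ?sib .
}
"""
for s, p, o in g.query(rule):
    print(s, p, o)   # Bob hasParent Pat ; Bob hasParent Sam (inferred, not asserted)
```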

Now we can begin to see where the benefits and return from semantic search become evident. Semantic search also enables a qualitatively different content enrichment: we can use these broad understandings of our content to do better targeting, tagging, highlighting or relating concepts to one another. It is also noteworthy that semantic search is the foundation for semantic publishing. We will discuss this topic in a later part of this series.

SD’s Approach: RDF Triple Store + Solr + OWL API

In recognition of the primacy of search, we at Structured Dynamics were one of the first in the semantic Web community to add Solr (based on Lucene) full-text indexing to the structured search of an RDF triple store [4]. We later added the OWL API to gain even more power in our structured queries [5]. These three components give us the best of unstructured and structured search, and enable us to handle all kinds of search with additional flexibility at scale. Since we combined RDF and Solr first historically, let’s discuss that combination first.

We first adopted Solr because traditional text search of RDF triple stores is not sufficiently performant and makes it difficult to retrieve logical (user) labels in place of the URIs used in semantic technologies. While RDF and its graph model provide manifest benefits (see below), text search is a relatively mature technology and Solr provided commercial-grade features and performance in an open source option.

In our design, the triple store is the data orchestrator. The RDF data model and its triple store are used to populate the Solr schema index. The structural specifications (schema) in the triple store guide the development of facets and dynamic fields within Solr. These fields and facets in Solr give us the ability to gain Solr advantages such as aggregates, autocompletion, spell checkers and the like. We also are able to capture the full text if the item is a document, enabling standard text search to be combined with the structural aspects orchestrated from the RDF. On the RDF side, we can leverage the schema of the underlying ontologies to also do inferencing (via forward chaining). This combination gives us an optimal search platform to do full-text search, aggregates and filtering.
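
The following is a minimal sketch of that orchestration pattern, assuming Python, rdflib, the requests library, a local Solr core named “kb” and hypothetical field names; it illustrates the pattern rather than SD’s actual code.

```python
import requests
from rdflib import Graph

SOLR_UPDATE = "http://localhost:8983/solr/kb/update"   # hypothetical Solr core

g = Graph()
g.parse("domain-data.ttl", format="turtle")            # hypothetical export of the graph

# The RDF data model decides what gets indexed: here, typed resources with labels
q = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label ?type WHERE { ?s a ?type ; rdfs:label ?label . }
"""
docs = [
    {"id": str(row.s), "label_t": str(row.label), "type_s": str(row.type)}
    for row in g.query(q)
]

# Push the documents into Solr's standard JSON update handler and commit
requests.post(SOLR_UPDATE, json=docs, params={"commit": "true"}).raise_for_status()
```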

Since our initial adoption of Solr, and with Solr’s own continued growth, we have been able to (more-or-less) seamlessly embrace geo-locational search, time-based search, the use of multiple search profiles, and ranking and scoring approaches (using Solr’s powerful extended DisMax (edismax) parser), among other advantages. We now have nearly five years of experience with the RDF + Solr combination. We continue to discover new functionality and power in this combination. We are extremely pleased with this choice.
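
As a small example of what those capabilities look like in practice, here is a sketch of an edismax query with per-field boosts, a structural filter and a facet, issued against Solr’s standard select handler (the core and field names are hypothetical).

```python
import requests

params = {
    "defType": "edismax",                        # Solr's extended DisMax query parser
    "q": "pressure valve",
    "qf": "label_t^5 description_t^2 body_t",    # boost label matches over body text
    "fq": "type_s:Product",                      # structural filter derived from the ontology
    "facet": "true",
    "facet.field": "type_s",
    "rows": 10,
}
resp = requests.get("http://localhost:8983/solr/kb/select", params=params)
resp.raise_for_status()
print(resp.json()["response"]["numFound"])
```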

On the structured data side, RDF and its graph model have many inherent advantages, as earlier described. One of those advantages is the graph structure itself:

[Figure: example taxonomy structure vs. example ontology structure. A distinguishing characteristic of ontologies compared to conventional hierarchical structures is their degree of connectedness, their ability to model coherent, linked relationships.]

Another advantage over conventional structured search (SQL) with relational databases is performance. For example, as Rik Van Bruggen recently explained [6], RDBMS searches that need to obtain information from more than one table require a “join”: the indexes of all applicable tables must be scanned recursively to find all the data elements fitting the query criteria. Conversely, in a graph database, the index need only be accessed once to find the starting point in the graph, after which the relationships are “walked” to traverse from one applicable data element to the next. The need for complete scans is what makes “joins” computationally expensive. Graph queries are fast because index lookups are hugely reduced.

Queries that experienced DBAs with relational databases would never attempt because of the excessive need for joins are trivial in a graph search.
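
The difference is easy to see with a SPARQL 1.1 property path, which walks an arbitrary number of hops from a starting node in a single query; the SQL equivalent would require a recursive set of joins. In this sketch the concept graph file and starting concept are hypothetical, while skos:broader is the standard SKOS property.

```python
from rdflib import Graph

g = Graph()
g.parse("concept-graph.ttl", format="turtle")   # hypothetical export of a concept graph

q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?ancestor WHERE {
  <http://example.org/kb/SomeConcept> skos:broader+ ?ancestor .
}
"""
for row in g.query(q):
    print(row.ancestor)   # every broader concept, however many hops away
```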

Various graph databases provide canned means for traversing or doing graph-based operations. And that brings us to the second addition to the RDF triple store: inclusion of the OWL API. While it is true that our standard triple store, Virtuoso, has support for simple inferencing and forward chaining, the fact that our semantic technologies are based on OWL 2 means that we can bring more power to bear with an ontology-specific API, including reasoners. The OWL API allows all or portions of the ontology specification to be manipulated separately, with a variety of serializations. Changes made to the ontology can also be tested for validity. Most leading reasoners can interact directly with the API, as can Protégé 4 and various rules engines. Additionally, other existing APIs, notably the Alignment API with its own mapping tools and links to other tools such as S-Match, can interact with the OWL API.
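
The OWL API itself is a Java library; as a rough Python analogue of the same idea, this minimal sketch uses the owlready2 library to load an ontology and run a reasoner over it. The ontology IRI is hypothetical, and this is an illustration rather than SD’s stack.

```python
from owlready2 import get_ontology, sync_reasoner

onto = get_ontology("http://example.org/ontologies/domain.owl").load()  # hypothetical IRI

with onto:
    sync_reasoner()          # run the bundled reasoner; inferred facts are added to the world

for cls in onto.classes():   # inspect the (possibly expanded) class hierarchy
    print(cls, list(cls.subclasses()))
```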

Thus, besides the advantages of RDF and graph-based search, we can now reason over and manipulate the ontologies themselves to bring even more search power to the system. Because of the existing integrations between the triple store and Solr, these same retrieval options can also be used to inform Solr query retrievals.

Shaking Hands with the Enterprise

On the face of it, a search infrastructure based on three components — triple store + Solr + OWL API — appears more complex than a single solution. But enterprises already have search provided in many different guises involving text or SQL-based queries. Structured Dynamics now has nearly five years of experience with this combined search configuration. Each deployment results in better installation and deployment procedures, including scripting and testing automation. The fact that there are three components to the search stack is not really the challenge for enterprise adoption.

This combined approach to search really poses two classes of challenges to the enterprise. The first, and more fundamental, is the new mindset that semantic search requires. Facets need to be understood and widely adopted; graphs and graph traversals are quite new concepts; tagging must be fully incorporated so that text becomes a first-class citizen alongside structured search; and the pivotal role of ontologies in driving the whole structural understanding of the domain, and all the various ways to describe it, means a shift in thinking from dedicated applications for specific purposes to generic ontology-driven applications. These new mindsets require concerted knowledge transfer and training. Many of the new implementers are now the subject matter experts and content editors within the enterprise, rather than developers. Dedicated effort is also necessary — and needs to be continually applied — to enable ontologies to properly and adaptively capture the enterprise’s understanding of its applicable domain.

These are people-oriented aspects that require documentation, training materials, tools and work processes. These topics, actually among the most critical to our own services, are discussed in later parts of this ESSS series.

The second challenge is the greater variability and diversity of the “dials and knobs” now available to the enterprise to govern how these search capabilities actually work. The ranking of search results can now embrace many fields and attributes, many different types of content, and potentially different contexts. Weights (or “boosts” in Solr terms) can be applied to every single field involved in a search. Fields may be included or excluded in searches, thereby acting as filters. Different processors or parsers may be applied to handle such things as text case (upper or lower), stemming to deal with plurals and variants, spelling variants such as between British and American English, whether or not to invoke synonyms, handling multiple languages, and the like.
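
One hypothetical way to make such settings explicit is to capture each “search profile” as plain configuration that is then translated into query parameters. Nothing below is an actual SD or Solr configuration; it is simply a sketch of the idea, with made-up profile and field names.

```python
# Hypothetical search profiles: each captures a different set of "dials and knobs"
PROFILES = {
    "research": {   # recall-oriented: broad matching across many fields
        "qf": "label_t^3 tags_t^2 body_t",
        "use_synonyms": True,    # would govern the analysis chain, not a query parameter
    },
    "lookup": {     # precision-oriented: find a specific, known item
        "qf": "label_t^10 identifier_s^20",
        "use_synonyms": False,
    },
}

def to_solr_params(profile_name: str, query: str) -> dict:
    """Translate a named profile plus a user query into edismax parameters."""
    profile = PROFILES[profile_name]
    return {"defType": "edismax", "q": query, "qf": profile["qf"]}

print(to_solr_params("lookup", "invoice 2014-117"))
```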

This level of control means that purposeful means and frameworks must be put in place that enable responsible managers in the enterprise to decide such settings. Understanding of these “dials and knobs” must therefore also be transferred to the enterprise. Easily used interfaces for changing and selecting options, and for comparing the results of those changes, must then be embedded in tools and transferred as well. (This latter area is quite exciting and one area of innovation SD will be reporting on in the near future.)

The Productivity Benefits

There are actually many public Web sites doing fantastic and admirable jobs of bringing broad, complicated, structured search to users, all without much, if any, semantic technology in the back end. Some quick examples that come to mind are Trulia in real estate, Fidelity in financial products and Amazon in general retail. One difficulty that semantic search has in comparison to these alternatives is that a first-blush inspection of Web sites may not show many large differences.

The real advantages of semantic search come from its productivity and flexibility. Semantic search frameworks are easier to construct, easier to extend, easier to modify and cheaper to build. They are also inherently robust. Adding entirely new domains of scope — say, moving from a department level to the entire enterprise or accommodating a new acquisition — can be implemented in a fraction of the time without the need for rework.

It will be necessary to document the use case experience of early adopting enterprises to quantify these productivity and flexibility benefits. From Structured Dynamics’ experience, however, these advantages are in the range of one to two orders of magnitude in reduced deployment and maintenance costs compared to RDBMS-based approaches.

The Tie-in with Semantic Publication

Another hot topic of late has been “semantic publishing,” which is of keen interest to media and content-intensive sites on the Web. What is interesting about semantic publishing, however, is that it is completely founded on semantic search. All of the presentation or publishing of content in the interface (or in an exported form) is the result of search. Remember, due to Structured Dynamics’ semantic technology design with its structWSF interfaces, all interaction with the underlying engines and system occurs via queries.

We will be talking much about semantic publishing toward the conclusion of this series. We will cover content enrichment, new kinds of products such as topic pages and semantic forms and widgets, and the fact that semantic publishing is available almost for “free” when your stack is based on semantic technologies with semantic search, SD-style.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

[1] M.K. Bergman, 2004. “Untapped Assets: The $3 Trillion Value of U.S. Enterprise Documents,” BrightPlanet Corporation White Paper, December 2004, 41 pp. Published on this blog at http://www.mkbergman.com/82/untapped-assets-the-3-trillion-value-of-us-enterprise-documents/.
[2] See, for instance, the Wikipedia entry on the historical development of databases.
[3] M.K. Bergman, 2012. “What is Structure?,” AI3:::Adaptive Information blog, May 28, 2012.
[4] F. Giasson, 2009. “RDF Aggregates and Full Text Search on Steroids with Solr,” Fred Giasson’s blog, April 9, 2009.
[5] M.K. Bergman, 2010. “A New Landscape in Ontology Development Tools,” AI3:::Adaptive Information blog, September 7, 2010.
[6] See, for example, Rik Van Bruggen, 2013. “Demining the ‘Join Bomb’ with Graph Queries,” Neo4J blog, January 28, 2013.