Previous installments in this series have listed existing ontology tools, overviewed development methodologies, and proposed a new approach to building lightweight domain ontologies. For that approach to succeed, a new generation of ontology development tools is needed. This post describes the landscape in which this new generation of tools is emerging.
Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data.
We are now concluding the first decade of ontology development tools, especially those geared to the semantic Web and its associated languages of RDFS and OWL. Last year also saw the release of OWL 2, a major update to the language, with its shift to more expressiveness and a variety of profiles. The next generation of ontology tools must now shift as well.
The current imperative is to shift away from ontology engineering by a priesthood to pragmatic daily use and maintenance by domain practitioners. Market growth demands simpler, task-focused tools with intuitive interfaces. For this change to occur, the general tools architecture needs to shift its center of gravity from IDEs and comprehensive toolkits to APIs and Web services. Not surprisingly, this same shift is what has been occurring across all areas of software.
In the previous installment of this series, we presented a new methodological approach to ontology development, geared to lightweight domain ontologies. One aspect of that design was to separate the operational workflow into two pathways:
The ontology build methodology concentrated on the upper half of this diagram (blue, with yellow lead-ins and outcomes), with the various steps overviewed in that installment:
The methodology captured in this diagram embraces many different emphases from current practice: re-use of existing structure and information assets; a conscious split between instance data (ABox) and the conceptual structure (TBox); incremental design; coherency and other integrity testing; and explicit feedback for scope extension and growth. The methodology also embraces some complementary utility ontologies that also reflect the design of ontology-driven apps.
These are notable changes in emphasis. But the most important change is to the tools landscape needed to implement this methodology. That landscape must shift toward pragmatic daily use and maintenance by domain practitioners, which requires simpler and more task-oriented tools. And that change in tooling demands a still more fundamental shift in tools architecture and design.
In many places throughout this series I use the term “inadequate” to describe the current state of ontology development tools. This characterization is not a criticism of first-generation tools per se. Rather, it reflects their inability to fulfill the realities of the new tooling landscape argued in this series. The fact remains that many of these initial-generation tools are quite remarkable and will continue to play central roles (mostly for the professional ontologist or developer) moving forward.
At the risk of overlooking some important players, let’s trace the (partial) legacy of some of the more pivotal tools in today’s environment.
A decade ago the ontology standards languages were still in flux and the tools basis was similarly immature. Frame logic, description logics, common logic and many others were competing at that time for primacy and visibility. Most ontology tools of that era, such as Protégé, OntoEdit, or OilEd, were based on F-logic or on OWL’s predecessor, DAML+OIL. But the OWL language was under development by the W3C, and in anticipation of its formal release the tools environment was also evolving to meet it. Swoop, for example, was one of the first dedicated OWL browsers. A Protégé plug-in for OWL was also developed by Holger Knublauch. In parallel, the OWL group at the University of Manchester introduced the OWL API.
With the formal release of OWL 1.0 in 2004, ontology tools continued to migrate to the language. Protégé, up through the version 3.x series, became a popular open source system with many visualization and OWL-related plug-ins. Knublauch joined TopQuadrant and brought his OWL experience to TopBraid Composer, which shifted to the Eclipse IDE platform and leveraged the Jena API [9,11]. In Europe, the NeOn (Networked Ontologies) project started in 2006 and by 2008 had an Eclipse-based OWL platform using the OWL API, with key language processing capabilities through GATE.
Most recently, Protégé and NeOn in open source, and TopBraid Composer on the commercial side, have likely had the largest market share among the comprehensive ontology toolkits. So far, with the release of OWL 2 in late 2009, only Protégé in version 4 and the TwoUse Toolkit have fully embraced all aspects of the new specification, doing so by linking intimately with the new OWL API (whose version 3.x has full OWL 2 support). However, most leading reasoners now support OWL 2, and products such as TopBraid Composer and Ontotext’s OWLIM support OWL 2 RL as well.
The evolution of Protégé to version 4 (OWL 2) was led by the University of Manchester via its CO-ODE project, now ended, which has also been the source of most existing Protégé 4 plug-ins. (Because of the switch to OWL 2 and the OWL API, most earlier plug-ins are incompatible with Protégé 4.) Manchester has also been a leading force in the development of OWL 2 and the alternative Manchester syntax.
Though only recently stabilized with the formalization of OWL 2, Protégé 4 and its linkage to the new OWL API provide a very powerful combination. With Protégé, the system has a familiar ontology editing framework and a mechanism for plug-in migration and growth. With the OWL API, there is now a common API for leading reasoners (Pellet, HermiT, FaCT++, RacerPro, etc.), a solid ontology management and annotation framework, and validators for the various OWL 2 profiles (RL, EL and QL). The system is widely embraced by the biology community, probably the most active scientific field in ontologies. However, plug-in support lags the diversity of prior versions of Protégé and there does not appear to be the energy and community standing behind it as in prior years.
These leading frameworks and toolkits have opted to be “ontology engineering” environments. Via plug-ins and complicated interfaces (tabs or Eclipse-style panes) the intent has apparently been to provide “all capabilities in one box.” The tools have been IDE-centric.
Unfortunately, one must be a combination of ontologist, developer, programmer and IDE expert in order to use the tools effectively. And, as incremental capabilities get added to the systems, these also inherit the same complexity and style of the host environment. It is simply not possible to make complex environments and conventions simple.
Curiously, the existence and use of APIs have also not been adequately leveraged. The usefulness of an API is that subsets of information can be extracted and worked on in very clear and simple ways, and then roundtripped without loss. An API allows a tailored subset abstraction of the underlying data model. In contrast, IDEs such as Protégé or Eclipse, when they play a similar role, force all interfaces to share their built-in complexity.
With these thoughts in mind, then, we set out to architect a tools suite and work flow that could truly take advantage of a central API. We further wanted to isolate the pieces into distributable Web services in keeping with our standard structWSF Web services framework design.
This approach also allows us to split out simpler, focused tools that domain users and practitioners can use, while still enabling the existing professional toolsets and IDEs to interoperate in the environment.
The resulting tools landscape is shown in the diagram below. This diagram takes the same methodology flow from Figure 1 (blue and yellow boxes) and stretches it out in a more linear fashion. The various tools (brown) and APIs (orange) are then embedded in relation to that methodology:
This diagram is worth expanding to full size and studying in some detail. Aspects of this diagram that deserve more discussion are presented in the sections below.
As noted in the preceding methodology installment, the working ontology is the central object being managed and extended for a given deployment. Because that ontology will evolve and grow over time, it is important the complete ontology specification itself be managed by some form of version control system (green). This is the one independent tool in the landscape.
Access to and from the working ontology is mediated by the OWL API. The API allows all or portions of the ontology specification to be manipulated separately, with a variety of serializations. Changes made to the ontology can also be tested for validity. Most leading reasoners can interact directly with the API. Protégé 4 also interacts directly with the API, as can various rules engines. Additionally, other existing APIs, notably the Alignment API, with its own mapping tools and links to other tools such as S-Match, can interact with the OWL API. It is reasonable to expect more interoperating APIs to emerge over time.
The OWL API is the best current choice because of its native capabilities and because Jena does not yet support OWL 2. However, because of the basic design with structWSF (see next), it is also possible to swap in different APIs at a later time should developments warrant.
In short, having the API play the central management role in the system means that any and all tools can be designed to interact effectively with the working ontology(ies) without any loss in information due to roundtripping.
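To make this concrete, here is a minimal sketch of the API-centric pattern, assuming the OWL API 3.x is on the classpath (the file names are illustrative only): load the working ontology, check it against an OWL 2 profile, and round-trip it to another serialization, all without an IDE in sight:

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.profiles.OWL2RLProfile;
import org.semanticweb.owlapi.profiles.OWLProfileReport;

public class WorkingOntologyCheck {
    public static void main(String[] args) throws Exception {
        // Load the working ontology from a local file (name is hypothetical)
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLOntology ontology = manager.loadOntologyFromOntologyDocument(
                new File("working-ontology.owl"));

        // Validate against the OWL 2 RL profile before accepting changes
        OWLProfileReport report = new OWL2RLProfile().checkOntology(ontology);
        System.out.println("In OWL 2 RL: " + report.isInProfile());

        // Round-trip: write the same axioms back out in a different serialization
        manager.saveOntology(ontology, new RDFXMLOntologyFormat(),
                IRI.create(new File("working-ontology.rdf").toURI()));
    }
}
```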
Use of the structWSF layer also means that tools and functionality can be distributed anywhere on the Web. Specialized server-side functions can be supported as well as dedicated specialty hardware. Text indexing or disambiguation services can fit within this design.
The ultimate value of piggybacking on the structWSF framework is that all other extant services also become available. Thus, a wealth of converters, data managers, and semantic components (or display widgets) can be invoked depending on the needs of the specific tool.
The objective, of course, of this design is to promote more and simpler tools useful to domain users. Some of these are shown under the Use & Maintain box in the diagram above; others are listed by category in the table below.
The RESTful interface and parameter calls of the structWSF layer further simplify the ontology management and annotation abstractions arising from the OWL API. The number of simple tools available to users under this design is virtually limitless. These tools are also fast to develop and test.
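As a flavor of how thin such a tool can be, the sketch below retrieves part of a working ontology with a single HTTP GET. The endpoint path and parameter names are hypothetical placeholders in the structWSF style, not the actual structWSF signatures, and the host is a stand-in:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class OntologyReadClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical structWSF-style endpoint and parameters, for illustration only
        String endpoint = "http://example.org/ws/ontology/read/";
        String query = "ontology=" + URLEncoder.encode("http://example.org/ontologies/domain.owl", "UTF-8")
                + "&mode=" + URLEncoder.encode("getClasses", "UTF-8");

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint + "?" + query).openConnection();
        conn.setRequestProperty("Accept", "application/rdf+xml"); // content negotiation selects the serialization

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw RDF payload; a real tool would parse and render this
            }
        }
    }
}
```

The point is that the tool itself carries no ontology-management logic; everything hard is delegated to the endpoint, which in turn delegates to the OWL API.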
This landscape is not yet a full reality. It is a vision of adaptive and simpler tools, working with a common API, and accessible via platform-independent Web services. It also preserves many of the existing tools and IDEs familiar to present ontology engineers.
However, pieces of this landscape do presently exist and more are on the way. The next section briefly overviews some of the major application areas where these tools might contribute.
If one inspects the earlier listing of 185 ontology tools, it is clear that there is great diversity of tools in both scope and function across the entire ontology development stack. It is also clear that nearly all of those 185 tools do not communicate with one another. That is a tremendous waste.
Via shared APIs and some degree of consistent design it should be possible to migrate these capabilities into a more-or-less interoperating whole. We have thus tried to categorize some important tool types and exemplar tools from that listing to show the potential that exists. (Please note that the Example Tools are links to the tools and categories from the earlier 185 tools listing.)
This correlation of types and example tools is not meant to be exhaustive nor a recommendation of specific tools. But, this tabulation is illustrative of the potential that exists to both simplify and extend tool support across the entire ontology development workflow:
| Tool Type | Comments | Example Tools |
| --- | --- | --- |
| OWL API | The OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL 2 profiles of RL, QL and EL | OWL API |
| Web Services Layer | This layer provides a common access layer and set of protocols for almost all tools. It depends critically on linkage and communication with the OWL API | structWSF |
| Ontology Editor (IDE) | There are a variety of options in this area. Generally, more complete environments (that is, IDEs) based on OWL and with links to the OWL API are preferred. Less complete editor options are listed under other categories. Note that only Protégé 4 incorporates the OWL API | NeOn toolkit, Protégé 4, TopBraid Composer |
| Scripts | In all pragmatic cases the migration of existing structure and vocabulary assets to an ontology framework requires some form of scripting. These may be off-the-shelf resources, but more often are specific to the use case at hand. Typical scripting languages include the standard ones (Perl, Python, PHP, Ruby, XSLT, etc.) and often involve some form of parsing or regex | variety; specific to use case |
| Converters | Converters are more-or-less pre-packaged scripts for migrating one serialization or data format to another one. As the scripts above continue to be developed, this roster of off-the-shelf starting points can increase. Today, there are perhaps close to 200 converters useful for ontology purposes | irON, ReDeFer, SKOS2GenTax; also see RDFizers |
| Vocabulary Prompter | Domain ontologies are ultimately about meaning, and for that purpose there is much need for definitions, synonyms, hyponyms, and related language assets. Vocabulary prompters take input documents or structures and help identify additional vocabulary useful for characterizing semantic meaning | see the TechWiki’s vocab prompting tools; ROC |
| Spreadsheet | Spreadsheets can be important initial development environments for users without explicit ontology engineering backgrounds. The biggest issue with spreadsheets is that what is specified in them is more general or simplistic compared to what is contained in an actual ontology. Attempts to have spreadsheets capture all of this sophistication are often less than satisfactory. One way to effectively “round trip” with spreadsheets (and many related simple tools) is to adhere to the OWL API | Anzo, RDF123, irON (commON), Excel, Open Office |
| Editor (general) | Ontology editing spans from simple structures useful to non-ontologists to those (like the IDEs or toolkits) that capture all aspects of the ontology. Further, some of these editors are strictly textual or (literally) editors; others span or attempt to enable visual editing. Visual editing (see below) can ultimately extend to the ontology graph itself | see the TechWiki’s ontology editing tools |
| Alignment API | The Alignment API is an API and implementation for expressing and sharing ontology alignments. A set of correspondences between entities (e.g., classes, objects, properties) in two ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way, with the goal of sharing available alignments on the Web. The format is expressed in RDF | Alignment API |
| Mapper | A variety of tools, algorithms and techniques are available for matching or mapping concepts between two different ontologies. In general, no single method has shown itself individually superior. The better approaches use voting methods based on multiple comparisons | see the TechWiki’s ontology mapping tools |
| Ontology Browser | Ontology browsers enable the navigation or exploration of the ontology — generally in visual form — but without allowing explicit editing of the structure | Relation Browser, Ontology Browser, OwlSight, FlexViz |
| Vocabulary Manager | Vocabulary managers provide a central facility for viewing, selecting, accessing and managing all aspects of the vocabulary in an ontology (that is, to the level of all classes and properties). This tool category is poorly represented at present. Ultimately, vocabulary managers should also be one (if not the main) access point for vocabulary editing | PoolParty, TermWiki, UMBEL Web service |
| Vocabulary Editor | Vocabulary editors provide (generally simple) interfaces for the editing and updating of vocabulary terms, classes and properties in an ontology | Neologism, TemaTres, ThManager, Vocab Editor |
| Structure Editor | A structure editor is a specific form of ontology editor, geared to the subsumption (taxonomic) organization of a largely hierarchical structure. Editors of this form tend to use tree controls or spreadsheets with indented organization to show parent and child relationships | PoolParty, irON (commON) |
| Graph Analysis | Ontologies form graph structures, which are amenable to many specific network and graph analysis algorithms, including relatedness, shortest path, grouped structures, communities and the like | SNAP, igraph, Network Workbench, NetworkX, Ontology Metrics |
| Graph API | Graph visualization with associated tools is best enabled by working from a common API. This allows for expansion and re-use of other capabilities. Preferably, this graph API would also have direct interaction with the OWL API, but none exists at the moment | under investigation |
| Graph Visualizer | Graph visualizers enable the ontology to be rendered in graph form and presentation, often with multiple layout options. These systems also enable export to PDF or graphics formats for display or printing. The better tools in this category can handle large graphs, can have their displays easily configured, and are performant | see the TechWiki’s ontology visualization tools |
| Visual Editor | An ontology visual editor enables the direct manipulation of the graph in a visual mode. This capability includes adding and moving nodes, changing linkages between nodes, and other ontology specification. Very few tools exist in this category at present | COE, TwoUse Toolkit |
| Coherence Tester | Testing for coherence checks whether the ontology structure is properly constructed and has logical interconnections. The testing either involves inference and logic testing (including entailments) based on the structure as provided; comparisons with already vetted logical structures and knowledge bases (e.g., Cyc, Wikipedia); or both | Cyc, OWLIM, FactForge |
| Gap Tester | Related to coherence testing, gap testing is the identification of key missing pieces or intermediary nodes in the ontology graph. Such gaps tend to arise when external specification of the ontology is made without reference to connecting information | requires use of a reference external ontology; see above |
| Documenter | Ontology documentation is not limited to the technical specifications of the structure, but also includes best practices, how-to and use guides, and the like. Automated generation of structure documentation is also highly desirable | TechWiki, SpecGen, OWLDoc |
| Tagger | Once constructed, ontologies (and their accompanying named entity dictionaries) can be very powerful resources for aiding tagging and information extraction utilities. Like vocabulary prompting, there is a broad spectrum of potential tools and uses in the tagging category | GATE (OBIE); many other options |
| Exporter | Exports need to range from full-blown OWL representations to the simpler export of data and constructs. Multiple serialization options and the ability to support the input requirements of third-party tools are also important | OWL Syntax Converter, OWL Verbalizer; many various options |
The beauty of this approach is that most of the tools listed are open source and potentially amenable to the minor modifications necessary to conform with this proposed landscape.
Contrasting the normative tools landscape above with the existing listing of ontology tools points out some key gaps or areas deserving more development attention. Some of these are:
Finally, the effort and focus behind Protégé appear to be slowing somewhat. The future has clearly shifted to OWL 2 with Protégé 4. Yet, aside from the admirable CO-ODE project (now ended), tools and plug-in support seem to have slowed. Many of the fine plug-ins for Protégé 3.x do not appear to be under active development as upgrades to Protégé 4. While Protégé’s future (and that of similar IDEs) seems assured, its prominence possibly will (and should) be replaced by a simpler kit of tools useful to users and practitioners.
For the past few months we at Structured Dynamics have seen ontology design and management as the pending technical priorities within the semantic technology space. Now that the market no longer looks at “ontology” as a four-letter word, it is imperative to simplify the development and use of ontologies. The first generation of tools leading up to this point has been helpful for understanding the semantic space; changes are now necessary to expand it.
In our first generation we have begun to understand the types and nature of the tools needed. But our focus on IDEs and comprehensive toolsets betrays a developer’s or technologist’s perspective. We now need to shift focus and look at tool needs from the standpoint of users and the actual use of ontologies. Many players, toolmakers and innovators will need to contribute to build this market for semantic technologies and approaches.
Fortunately, replacing an IDE focus with one based around APIs and Web services should be a fairly smooth and natural transition. If we truly desire to be market makers, we need to stand back and place ourselves into the shoes of the domain practitioners, the subject matter experts. We need to shield actual users from all of the silly technical details and complexity. And, then, let’s focus — task-by-task — on discrete items of management and use of ontologies. Growth of the semantic technology space depends on expanding our practitioner base.
For its part, Structured Dynamics is presently seeking new projects and sponsors with a commitment to these aims. Like our prior development of structWSF and semantic components, we will be looking to make simpler ontology tools a priority in the coming months. Please let me know if you want to partner with us toward this commitment.
At the beginning of this year Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, because of some client and internal work, we have researched the space again and updated the listing.
All new tools are marked with <New> (new only means newly discovered; some had yet to be discovered in the prior listing). There are now a total of 185 tools in the listing, 31 of which are recently new, and 45 added at various times since the first release. <Newest> reflects updates — most from the developers themselves — since the original publication of this post.
Though not all are relevant, see my post from a couple of years back on large-scale RDF graph software.
Yesterday Fred Giasson announced the release of code associated with Structured Dynamics’ open source semantic components (also called sComponents). A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.
Though not all layers are by any means complete, from an architectural standpoint the release of these semantic components provides the last and missing layer to complete our open semantic framework. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.
We first announced the open semantic framework — or OSF — a couple of weeks back. Refer to that original post for more description of the general design. However, we can show this framework with the semantic components layer as illustrated by what some have called the “semantic muffin”:
The OSF stack consists of these layers, moving from existing assets upward through increasing semantics and usability:
Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.
Current semantic components, or widgets, include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.
Though Fred’s post goes into more detail — with subsequent posts to get into the technical nuances of the semantic components — the main idea of these components is shown by the diagram below.
These various semantic components get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.
An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the semantic components. These instructions are presented via the sControl component, which determines which widgets (individual components, with multiples possible depending on the inputs) need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:
New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
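The essential dispatch step just described can be suggested in a few lines of code. This is a deliberately simplified, hypothetical sketch: in the real system the type-to-component knowledge is expressed as SCO ontology assertions and the widgets are Flex components, not entries in a hard-coded Java map:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComponentDispatcher {
    // Hypothetical stand-in for what SCO actually encodes as ontology axioms:
    // which record types a given display component knows how to render
    private final Map<String, String> typeToWidget = new HashMap<String, String>();

    public ComponentDispatcher() {
        typeToWidget.put("geo:SpatialThing", "sMap");
        typeToWidget.put("sco:TimeSeries", "sLinearChart");
        typeToWidget.put("owl:Thing", "sTable"); // generic fallback template
    }

    // Given the types present in a structured results set, choose the widgets
    // (multiple components are possible, one per matched type)
    public List<String> selectWidgets(List<String> resultTypes) {
        List<String> widgets = new ArrayList<String>();
        for (String type : resultTypes) {
            String widget = typeToWidget.get(type);
            widgets.add(widget != null ? widget : typeToWidget.get("owl:Thing"));
        }
        return widgets;
    }

    public static void main(String[] args) {
        ComponentDispatcher dispatcher = new ComponentDispatcher();
        // A results set carrying a spatial type and a time series
        System.out.println(dispatcher.selectWidgets(
                Arrays.asList("geo:SpatialThing", "sco:TimeSeries"))); // [sMap, sLinearChart]
    }
}
```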
As the release of the semantic components drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the open semantic framework enabled us to consolidate and rationalize these resources.
Our first change was to consolidate all OSF-related material under the existing OpenStructs.org Web site. It already contained the links and background material for structWSF and irON. To that, we added the conStruct and OSF material. This consolidation also allowed us to retire the previous conStruct Web site, which now re-directs to OpenStructs.
We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new Open Semantic Framework Google group and mailing list. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the OSF group.
There has also been a reinvigoration of the developers’ community Web site at http://community.openstructs.org/. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.
Actual code SVN repositories are unchanged. These code repositories may be found at:
We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!
As I reported about a year ago after my first attendance, I think the Semantic Technology Conference is the best venue going for pragmatic discussion of semantic approaches in the enterprise. I’m really pleased that I will be attending again this year. The conference (#SemTech) will be held at the Hilton Union Square in downtown San Francisco on June 21-25, 2010. Now in its sixth year and the largest of its kind, it is again projected to attract 1500 attendees or so.
A really exciting presentation for us is Sizzle for the Steak: Rich, Visual Interfaces for Ontology-driven Apps, on Wed, June 23 in the 2:00 PM – 3:00 PM session.
A nagging gap in the semantic technology stack is acceptable — better still, compelling — user experiences. After our exile for a couple of years doing essential infrastructure work, we have been unshackled over the past year or so to innovate on user interfaces for semantic technologies.
Our unique approach uses adaptive ontologies to drive rich Internet applications (RIAs) through what we call “semantic components.” This framework is unbelievably flexible and powerful and can seamlessly interact with our structWSF Web services framework and its conStruct Drupal implementations.
We will be showing these rich user interfaces for the first time in this session. We will show concept explorers, “slicer-and-dicers”, dashboards, information extraction and annotation, mapping, data visualization and ontology management. Get your visualization any way you’d like, and for any slice you’d like!
While we will focus on the sizzle and demos, we will also explain a bit of the technology that is working behind Oz’s curtain.
A more informal, interactive F2F discussion will be MIKE2.0 for the Semantic Enterprise, on Thurs, June 24 in the 4:45 PM – 5:45 PM slot.
MIKE2.0 (Method for an Integrated Knowledge Environment) is an open source methodology for enterprise information management that is coupled with a collaborative framework for information development. It is oriented around a variety of solution “offerings”, ranging from the comprehensive and the composite to specific practices and technologies. A couple of months back, I gave an overview of MIKE2.0 that was pretty popular.
We have been instrumental in adding a semantic enterprise component to MIKE2.0, with our specific version of it called Open SEAS. In this Face-to-Face session, experts and desirous practitioners will join together to discuss how to effectively leverage this framework. While I will intro and facilitate, expect many other MIKE2.0 aficionados to participate.
This is perhaps a new concept to many, but what is exciting about MIKE2.0 is that it provides a methodology and documentation complement to technology alone. When combined with that technology, all pieces comprise what might be called a total open solution. I personally think it is the next logical step beyond open source.
So, if you have not already made plans, consider adjusting your schedule today. And, contact me in advance (mike at structureddynamics dot com) if you’ll be there. We’d love to chat!
In earlier posts, I described the significant progress in climbing the data federation pyramid, today’s evolution in emphasis to the semantic Web, and the 40 or so sources of semantic heterogeneity. We now transition to an overview of how one goes about providing these semantics and resolving these heterogeneities.
In an excellent recent overview of semantic Web progress, Paul Warren points out:
Although knowledge workers no doubt believe in the value of annotating their documents, the pressure to create metadata isn’t present. In fact, the pressure of time will work in a counter direction. Annotation’s benefits accrue to other workers; the knowledge creator only benefits if a community of knowledge workers abides by the same rules. . . . Developing semiautomatic tools for learning ontologies and extracting metadata is a key research area . . . . Having to move out of a user’s typical working environment to ‘do knowledge management’ will act as a disincentive, whether the user is creating or retrieving knowledge.
Of course, even assuming that ontologies are created and semantics and metadata are added to content, there still remain the nasty problems of resolving heterogeneities (semantic mediation) and of efficiently storing and retrieving the metadata and semantic relationships.
Putting all of this in place requires infrastructure in the form of tools and automation, plus proper incentives and rewards for users and suppliers to conform to it.
In his paper, Warren repeatedly points to the need for “semi-automatic” methods to make the semantic Web a reality. He makes fully a dozen such references, in addition to multiple references to the need for “reasoning algorithms.” In any case, here are some of the areas noted by Warren needing “semi-automatic” methods:
In a different vein, SemWebCentral lists these clusters of semantic Web-related tasks, each of which also requires tools:
With some ontologies approaching tens to hundreds of thousands to millions of triples, viewing, annotating and reconciling at scale can be daunting tasks, ones that would never be undertaken without useful tools and automation.
A 2005 paper by Izza, Vincent and Burlat (among many other excellent ones) at the first International Conference on Interoperability of Enterprise Software and Applications (INTEROP-ESA) provides a very readable overview on the role of semantics and ontologies in enterprise integration. Besides proposing a fairly compelling unified framework, the authors also present a useful workflow perspective emphasizing Web services (WS), also applicable to semantics in general, that helps frame this challenge:
Generic Semantic Integration Workflow (adapted from Izza et al.)
For existing data and documents, the workflow begins with information extraction or annotation of semantics and metadata (#1) in accordance with a reference ontology. Newly found information via harvesting must also be integrated; however, external information or services may come bearing their own ontologies, in which case some form of semantic mediation is required.
Of course, this is a generic workflow, and depending on the interoperation task, different flows and steps may be required. Indeed, the overall workflow can vary by perspective and researcher, with semantic resolution workflow modeling a prime area of current investigations. (As one alternative among scores, see for example Cardoso and Sheth.)
Semantic mediation is a process of matching schemas and mapping attributes and values, often with intermediate transformations (such as unit or language conversions) also required. The general problem of schema integration is not new, with one prior reference going back as early as 1986. According to Alon Halevy:
As would be expected, people have tried building semi-automated schema-matching systems by employing a variety of heuristics. The process of reconciling semantic heterogeneity typically involves two steps. In the first, called schema matching, we find correspondences between pairs (or larger sets) of elements of the two schemas that refer to the same concepts or objects in the real world. In the second step, we build on these correspondences to create the actual schema mapping expressions.
The issues of matching and mapping have been addressed in many tools, notably commercial ones from MetaMatrix, and open source and academic projects such as Piazza, SIMILE, and the WSMX (Web service modeling execution environment) protocol from DERI. A superb description of the challenges in reconciling the vocabularies of different data sources is also found in the thesis by Dr. AnHai Doan, which won the prestigious 2003 ACM Doctoral Dissertation Award.
What all of these efforts have found is the inability to completely automate the mediation process. The current state of the art is to reconcile what is largely unambiguous automatically, and then prompt analysts or subject matter experts to decide the questionable matches. These are known as “semi-automated” systems, and the user interface, data presentation and workflow become as important as the underlying matching and mapping algorithms. According to the WSMX project, there is always a trade-off between how accurate these mappings are and the degree of automation that can be offered.
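A toy illustration of that semi-automated split follows. The character-trigram similarity measure and the thresholds are illustrative assumptions, far simpler than the heuristics the cited systems actually employ; the point is the three-way routing of candidate matches into auto-accepted, analyst-reviewed, and rejected bins:

```java
import java.util.HashSet;
import java.util.Set;

public class NaiveSchemaMatcher {

    // Character-trigram Jaccard similarity: a deliberately naive stand-in
    // for the heuristic matchers described above
    static double similarity(String a, String b) {
        Set<String> ta = trigrams(a.toLowerCase());
        Set<String> tb = trigrams(b.toLowerCase());
        Set<String> union = new HashSet<String>(ta);
        union.addAll(tb);
        Set<String> inter = new HashSet<String>(ta);
        inter.retainAll(tb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    static Set<String> trigrams(String s) {
        Set<String> grams = new HashSet<String>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        String[][] pairs = { {"title", "title"}, {"authorName", "author_name"}, {"price", "publisher"} };
        for (String[] pair : pairs) {
            double score = similarity(pair[0], pair[1]);
            // Accept the obvious, defer the questionable to a subject matter expert
            String verdict = score > 0.8 ? "auto-accept"
                           : score > 0.3 ? "queue for analyst review" : "reject";
            System.out.printf("%s ~ %s : %.2f -> %s%n", pair[0], pair[1], score, verdict);
        }
    }
}
```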
Once all of these reconciliations take place there is the (often undiscussed) need to index, store and retrieve these semantics and their relationships at scale, particularly for enterprise deployments. This is a topic I have addressed many times from the standpoint of scalability, more scalability, and comparisons of database and relational technologies, but it is also not a new topic in the general community.
As Stonebraker and Hellerstein note in their retrospective covering 35 years of development in databases, some of the first post-relational data models were typically called semantic data models, including those of Smith and Smith in 1977 and Hammer and McLeod in 1981. Perhaps what is different now is our ability to address some of the fundamental issues.
At any rate, this subsection is included here because of the hidden importance of database foundations. It is therefore a topic often addressed in this series.
In all of these areas, there is a growing, but still spotty, set of tools for conducting these semantic tasks. SemWebCentral, the open source tools resource center, for example, lists many tools and whether they interact with one another (the general answer is often no). Protégé also has a fairly long list of plug-ins, though unfortunately not well organized.
In the table below, I begin to compile a partial listing of semantic Web tools, with more than 50 listed. Though a few are commercial, most are open source. Also, for the open source tools, only the most prominent ones are listed (SourceForge, for example, has about 200 projects listed with some relation to the semantic Web, though most are minor or not yet in alpha release).
| Tool | URL | Description |
| --- | --- | --- |
| Almo | http://ontoware.org/projects/almo | An ontology-based workflow engine in Java |
| Altova SemanticWorks | http://www.altova.com/products_semanticworks.html | Visual RDF and OWL editor that auto-generates RDF/XML or N-Triples based on visual ontology design |
| Bibster | http://bibster.semanticweb.org/ | A semantics-based bibliographic peer-to-peer system |
| cwm | http://www.w3.org/2000/10/swap/doc/cwm.html | A general purpose data processor for the semantic Web |
| Deep Query Manager | http://www.brightplanet.com/products/dqm_overview.asp | Search federator from deep Web sources |
| DOSE | https://sourceforge.net/projects/dose | A distributed platform for semantic annotation |
| ekoss.org | http://www.ekoss.org/ | A collaborative knowledge sharing environment where model developers can submit advertisements |
| Endeca | http://www.endeca.com | Facet-based content organizer and search platform |
| FOAM | http://ontoware.org/projects/map | Framework for ontology alignment and mapping |
| Gnowsis | http://www.gnowsis.org/ | A semantic desktop environment |
| GrOWL | http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html | Open source graphical ontology browser and editor |
| HAWK | http://swat.cse.lehigh.edu/projects/index.html#hawk | OWL repository framework and toolkit |
| HELENOS | http://ontoware.org/projects/artemis | A knowledge discovery workbench for the semantic Web |
| Jambalaya | http://www.thechiselgroup.org/jambalaya | Protégé plug-in for visualizing ontologies |
| Jastor | http://jastor.sourceforge.net/ | Open source Java code generator that emits Java Beans from ontologies |
| Jena | http://jena.sourceforge.net/ | Open source ontology API written in Java |
| KAON | http://kaon.semanticweb.org/ | Open source ontology management infrastructure |
| Kazuki | http://projects.semwebcentral.org/projects/kazuki/ | Generates a Java API for working with OWL instance data directly from a set of OWL ontologies |
| Kowari | http://www.kowari.org/ | Open source database for RDF and OWL |
| LuMriX | http://www.lumrix.net/xmlsearch.php | A commercial search engine using semantic Web technologies |
| MetaMatrix | http://www.metamatrix.com/ | Semantic vocabulary mediation and other tools |
| Metatomix | http://www.metatomix.com/ | Commercial semantic toolkits and editors |
| MindRaider | http://mindraider.sourceforge.net/index.html | Open source semantic Web outline editor |
| Model Futures OWL Editor | http://www.modelfutures.com/OwlEditor.html | Simple OWL tools, featuring UML (XMI), ErWin, thesaurus and imports |
| Net OWL | http://www.netowl.com/ | Entity extraction engine from SRA International |
| Nokia Semantic Web Server | https://sourceforge.net/projects/sws-uriqa | An RDF-based knowledge portal for publishing both authoritative and third party descriptions of URI denoted resources |
| OntoEdit/OntoStudio | http://ontoedit.com/ | Engineering environment for ontologies |
| OntoMat Annotizer | http://annotation.semanticweb.org/ontomat | Interactive Web page OWL and semantic annotator tool |
| Oyster | http://ontoware.org/projects/oyster | Peer-to-peer system for storing and sharing ontology metadata |
| Piggy Bank | http://simile.mit.edu/piggy-bank/ | A Firefox-based semantic Web browser |
| Pike | http://pike.ida.liu.se/ | A dynamic programming (scripting) language similar to Java and C for the semantic Web |
| pOWL | http://powl.sourceforge.net/index.php | Semantic Web development platform |
| Protégé | http://protege.stanford.edu/ | Open source visual ontology editor written in Java with many plug-in tools |
| RACER Project | https://sourceforge.net/projects/racerproject | A collection of projects and tools to be used with the semantic reasoning engine RacerPro |
| RDFReactor | http://rdfreactor.ontoware.org/ | Access RDF from Java using inferencing |
| Redland | http://librdf.org/ | Open source software libraries supporting RDF |
| RelationalOWL | https://sourceforge.net/projects/relational-owl | Automatically extracts the semantics of virtually any relational database and transforms this information into RDF/OWL |
| Semantical | http://semantical.org/ | Open source semantic Web search engine |
| SemanticWorks | http://www.altova.com/products_semanticworks.html | SemanticWorks RDF/OWL editor |
| Semantic Mediawiki | https://sourceforge.net/projects/semediawiki | Semantic extension to the MediaWiki wiki |
| Semantic Net Generator | https://sourceforge.net/projects/semantag | Utility for generating topic maps automatically |
| Sesame | http://www.openrdf.org/ | An open source RDF database with support for RDF Schema inferencing and querying |
| SMART | http://web.ict.nsc.ru/smart/index.phtml?lang=en | System for Managing Applications based on RDF Technology |
| SMORE | http://www.mindswap.org/2005/SMORE/ | OWL markup for HTML pages |
| SPARQL | http://www.w3.org/TR/rdf-sparql-query/ | Query language for RDF |
| SWCLOS | http://iswc2004.semanticweb.org/demos/32/ | A semantic Web processor using Lisp |
| Swoogle | http://swoogle.umbc.edu/ | A semantic Web search engine with 1.5 M resources |
| SWOOP | http://www.mindswap.org/2004/SWOOP/ | A lightweight ontology editor |
| Turtle | http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/ | Terse RDF “Triple” language |
| WSMO Studio | https://sourceforge.net/projects/wsmostudio | A semantic Web service editor compliant with WSMO, as a set of Eclipse plug-ins |
| WSMT Toolkit | https://sourceforge.net/projects/wsmt | The Web Service Modeling Toolkit (WSMT) is a collection of tools for use with the Web Service Modeling Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web Service Execution Environment (WSMX) |
| WSMX | https://sourceforge.net/projects/wsmx/ | Execution environment for dynamic use of semantic Web services |
Individually, there are some impressive and capable tools on this list. Generally, however, the interfaces are not intuitive, integration between tools is lacking, and the case for why and how typical analysts should embrace them has yet to be made. In the semantic Web, we have yet to see an application of the magnitude of the first Mosaic browser, which made HTML and the World Wide Web compelling.
It is perhaps likely that a similar “killer app” will not be forthcoming for the semantic Web. But it is important to remember just how entwined tools are with accelerating the acceptance and growth of new standards and protocols.