Sweet Tools, AI3‘s listing of semantic Web and -related tools, has just been released with its 17th update. The listing now contains more than 900 tools, about a 10% increase over the last version. Significantly the listing is also now presented via its own semantic tool, the structSearch sComponent, which is one of the growing parts to Structured Dynamics‘ open semantic framework (OSF).
So, we invite you to go ahead and try out this new Flex/Flash version with its improved search and filtering! We’re pretty sure you’ll like it.
Sweet Tools now lists 907 919 tools, an increase of 72 84 (or 8.6 10.1%) over the prior version of 835 tools. The most notable trend is the continued increase in capabilities and professionalism of (some of) the new tools.
This new release of Sweet Tools — available for direct play and shown in the screenshot to the right — is the first to be presented via Structured Dynamics’ Flex-based semantic component technology. The system has greatly improved search and filtering capabilities; it also shares the superior dataset management and import/export capabilities of its structWSF brethren.
As a result, moving forward, Sweet Tools updates will now be added on a more regular basis, reducing the big burps that past releases have tended to follow. We will also see much expanded functionality over time as other pieces of the structWSF and sComponents stack get integrated and showcased using this dataset.
This release is the first in WordPress, and shows the broad capabilities of the OSF stack to be embedded in a variety of CMS or standalone systems. We have provided some updates on Structured Dynamics’ OSF TechWiki for how to modify, embed and customize these components with various Flex development frameworks (see one, two or three), such as Flash Builder or FlashDevelop.
However, this release does mark the retirement of the very fine Exhibit version of Sweet Tools (an archive version will be kept available until it gets too long in the tooth). I was one of the first to install a commercial Exhibit system, and the first to do so on WordPress, as I described in an article more than four years ago.
Exhibit has worked great and without a hitch, and through a couple of upgrades. It still has (I think) a superior faceting system and sorting capabiities to what we presently offer with our own sComponent alternative. However, the Exhibit version is really a display technology alone, and offers no search, access control or underlying data management capabilities (such as CRUD), all of which are integral to our current system. It is also not grounded in RDF or semantic technologies, though it does have good structural genes. And, Sweet Tools has about reached the limits of the size of datasets Exhibit can handle efficiently.
Exhibit has set a high bar for usability and lightweight design. As we move in a different direction, I’d like again to publicly thank David Huynh, Exhibit’s developer, and the MIT Simile program for when he was there, for putting forward one of the seminal structured data tools of the past five years.
The updated Sweet Tools listing now includes nearly 50 different tools categories. The most prevalent categories are browser tools (RDF, OWL), information extraction, ontology tools, parsers or converters, and general RDF tools. The relative share by category is shown in this diagram (click to expand):
Since the last listing, the fastest growing categories have been utilities (general and RDF) and visualization. Linked data listings have also grown by 200%, but are still a relatively small percentage of the total.
These values should be taken with a couple of grains of salt. First, not all of these additions are organic or new releases. Some are the result of our own tools efforts and investigations, which can often surface prior overlooked tools. Also, even with this large number of application categories, many tools defy characterization, and can reside in multiple categories at once or are even pointing to new ones. So, the splits are illustrative, but not defining.
General language percentages have been keeping pretty constant over the past couple of years. Java remains the leading language with nearly half of all applications, a percentage it has kept steady for four years. PHP continues to grow in popularity, and actually increased the largest percentage amount of any language over this past census. The current language splits are shown in the next diagram (click to expand):
C/C++ and C# have really not grown at all over the past year. Again, however, for the reasons noted, these trends should be interpreted with care.
Tools development is hard and the open source nature of today’s development tends to require a certain critical mass of developer interest and commitment. There are some notable tools that have much use and focus and are clearly professional and industrial grade. Yet, unfortunately, too many of the tools on the Sweet Tools listing are either proofs-of-concept, academic demos, or largely abandoned because of lack of interest by the original developer, the community or the market as a whole.
There is a common statement within the community about how important it is for developers to “eat their own dogfood.” On the face of it, this makes some sense since it conveys a commitment to use and test applications as they are developed.
But looked at more closely, this sentiment carries with it a troublesome reflection of the state of (many) tools within the semantic Web: too much kibble that is neither attractive nor tasty. It is probably time to keep the dogfood in the closet and focus on well-cooked and attractive fare.
We at Structured Dynamics are not trying to hold ourselves up as exemplars or the best chefs of tasty food. We do, however, have a commitment to produce fare that is well prepared and professional. Let’s stop with the dogfood and get on with serving nutritious and balanced fare to the marketplace.
Today, the headlines and buzz for information technologies centers on smartphones, social networks, cloud computing, tablets and everything Internet. Very little is now discussed about IT in the enterprise. This declining trend began about 15 years ago, and has been accelerating over time. Letting the air out of the enterprise IT balloon has some profound reasons and implications. It also has some lessons and guidance related to semantic approaches and technologies and their adoption by enterprises.
One can probably clock the start of enterprise information technology (IT) to the first use of mainframe computers in the early 1950s , or sixty years ago. The earliest mainframes were huge and expensive machines that required their own specially air-conditioned rooms because of the heat they generated. The first use of “information technology” as a term occurred in a Harvard Business Review article from 1958 .
Until the late 1960s computers were usually supplied under lease, and were not purchased . Service and all software were generally bundled into the lease amount without separate charge and with source code provided. Then, in 1969, IBM led an industry change by starting to charge separately for (mainframe) software and services, and ceasing to supply source code . At about the same time integrated circuits enabled computer sizes to be reduced, with the minicomputers such as from DEC causing a marked expansion in number of potential customers. Enterprise apps became a huge business, with software licensing and maintenance fees achieving a peak of 70% of IT vendor total revenues by the mid-1990s . However, since that peak, enterprise software as a portion of vendor revenues has been steadily eroding.
One of the earliest enterprise applications was in transaction systems and their underlying database management software. The relational database management system (RDBMS) was initially developed at IBM. Oracle, based on early work for the CIA in the late 1970s and its innovation to write in the C programming language, was able to port the RDBMS to multiple operating systems. These efforts, along with those of other notable vendors (most of which like Informix no longer exist), led to the RDBMS becoming more or less the de facto standard for data management within the enterprise by the 1980s. Today Oracle is the largest supplier of RDBMS software globally, and other earlier database system designs such as network databases or object databases fell out of favor .
In 1975, the Altair 8800 was introduced to electronics hobbyists as the first microcomputer, followed then by Apple II and the IBM PC in 1981, among others. Rapidly a slew of new applications became available to the individual, including spreadsheets, small databases, graphics programs and word processors. These apps were a boon to individual productivity and the IBM PC in particular brought credibility and acceptance within the enterprise (along with the growth of Microsoft). Novell and local area networks also pointed the way to a more distributed computing future. By the late 1980s virtually every knowledge worker within enterprises had some degree of computer literacy.
The apogee for enterprise software and apps occurred in the 1990s, with whole classes of new applications (most denoted by three-letter acronyms) such as enterprise resource planning (ERP), business intelligence (BI), customer relationship management (CRM), enterprise information systems (EIS) and the like coming to the fore. These systems also began as proprietary software, which resulted in the “stovepiping” or creating of information silos. In reaction and with great market acceptance, vendors such as SAP arose to provide comprehensive, enterprise-wide solutions, though often at high cost and with significant failure rates.
More significantly, the 1990s also saw the innovation of the World Wide Web with its basis in hypertext links on the Internet. Greatly facilitated by the Mosaic Web browser, the basis of the commercial Netscape browser, and the HTML markup language and HTTP transport protocol, millions began experiencing the benefit of creating Web pages and interconnecting. By the mid-1990s, enterprises were on the Web in force, bringing with them larger content volumes, dynamic databases and enterprise portals. The ability for anyone to become a publisher led to a focus and attention on the new medium that led to still further innovations in e-commerce and online advertising. New languages and uses of Web pages and applications emerged, creating a convergence of design, media, content and interactivity. Venture capital and new startups with valuations independent of revenues led to a frenzy of hype and eventually the dot com crash of 2000.
The growth companies of the past 15 years have not had the traditional focus on enterprises, but on the use and development of the Web. From search (Google) to social interactions (Facebook) to media and video (Flickr, YouTube) and to information (Wikipedia), the engines of growth have shifted away from the enterprise.
Meanwhile, the challenges of data integration and interoperability that were such a keen focus going back to initial enterprise computerization remain. Now, however, these challenges are even greater, as we see images, documents (unstructured data) and Web pages, markup and metadata (semi-structured data) become first-class information citizens. What was a challenge in integrating structured data in the 1980s and 1990s via data warehousing, has now become positively daunting for the enterprise with respect to scale and scope.
The paradox is that as these enterprise needs increased, the attractiveness of the enterprise from an IT perspective has greatly decreased. It is these factors we discuss below, with an eye to how Web architecture, design and opportunities may offer a new path through the maze of enterprise information interoperability.
Since 1995 the Gartner Group has been producing its annual Hype Cycle . The clientele for this research is the enterprise, so Gartner’s presentation of what’s hot and what’s hype and what is being adopted is a good proxy for the IT state of affairs in enterprises. These graphs are reproduced below since 2006 (click to expand). Note how many of the items shown are not very specific to the enterprise:
References to architectures and content processing and related topics were somewhat prevalent in 2006, but have disappeared most recently. In comparison to the innovations noted under the History discussion, it appears that the items on Gartner’s radar are more related to consumer applications and uses. We no longer see whole new categories of enterprise-related apps or enterprise architectures.
The kinds of innovations that are being discussed as important to enterprises in the coming year [7,8] tend to mostly leverage existing innovations in other areas or to wrinkle existing approaches. One report from Constellation Research, for example, lists the five core disruptive technologies of social, mobile, cloud, analytics and unified communications . Only analytics could be described as enterprise focused or driven.
And, even in analytics, the kinds of things being promoted are self-service reporting or analysis . In essence, these opportunities represent the application of Web 2.0 techniques to bring reporting or analysis directly to the analyst. Though important and long overdue, such innovations are more derivative than fundamental.
Master data management (MDM) is another touted area. But, to read analyst’s predictions in these areas, it feels like one has stepped into a time warp of technologies and options from a decade ago. When has XML felt like an innovation?
Of course, there is a whole industry of analysts that makes their living prognosticating to enterprises about what to expect from information technologies and how to adopt and embrace them. The general observations — across the board — seem to center on items such as smartphones and mobile, moving to the cloud for software or platforms (SaaS, PaaS), and collaboration and social networks. As I note below, there is nothing inherently wrong or unexciting per se about these trends. But, what does appear true is that the locus of innovation has shifted from the enterprise to consumers or the Internet.
The shift in innovation away from the enterprise has been structural, not cyclical. That means that very fundamental forces are at work to cause this change in innovation focus. It does not mean that innovation has permanently shifted away from the enterprise (organizations), but that some form of countervailing structural changes would need to occur to see a return to the IT focus on the enterprise from prior decades.
I think we can point to seven structural reasons for this shift, many of which interact with one another. While all of them are bringing benefits (some yet to be foreseen) to the enterprise, and therefore are to be lauded, they are not strictly geared to address specific enterprise challenges.
As pundits say, “The Internet changes everything” . For the reasons noted under the history above, the most important cause for the shift in innovation away from the enterprise has been the Internet.
One aspect that is quite interesting is the use of Internet-based technologies to provide “outsourced” enterprise applications hosted on Web servers. Such “cloud computing” leverages the technologies and protocols inherent to the Internet. It shifts hosting, maintenance and upgrade responsibilities for conventional apps to remote providers. Initially, of course, this simply shifts locus and responsibility from in-house to a virtual party. But, it is also the case that such changes will also promote more subtle shifts in collaboration and interaction possibilities. There is also the fact that quick upgrades of underlying infrastructure and application software can also occur.
The implications for existing enterprise IT staff, traditional providers, and licensing and maintenance approaches are profound. The Internet and cloud computing will perhaps have a greater effect on governance, staffing and management than application functionality per se.
The captivating IT-related innovations at present are mobile (smartphones) and their apps, tablets and e-book readers, Internet TV and video, and social networks of a variety of stripes. Somewhat like the phenomenon of when personal computers first appeared, many of these consumer innovations have applicability to the enterprise, though only as a side effect.
It is perhaps instructive to look back at the adoption of PCs in the enterprise to understand the possible effect of these new consumer innovations. Central IT was never able to control and manage the proliferation of personal computers, and only began to understand years later what benefits and new governance challenges they brought. Enterprise leaders will understand how to embrace and extend today’s new consumer technologies for the enterprise’s benefits; laggards will resist to no avail.
The ubiquity of computing will be enormously impactful on the enterprise. The understanding of what makes sense to do on a mobile basis with a small screen and what belongs on the desk or in the office is merely a glimmer in the current conversation. However, in the end, like most of the other innovations noted in this analysis, the enterprise will largely be a reactive player to these innovations. Yes, the implications will be profound, but their inherent basis are not grounded in unique enterprise challenges. Nonetheless, adapting to them and changing business practice will be critical to asserting enterprise leadership.
Ten years ago open source was largely dismissed in the enterprise. About five years ago VCs and others began funding new commercial open source ventures, even while there were still rear guard arguments from enterprises resisting open source. Meanwhile, as the figure to the right shows, open source projects were growing exponentially .
The shift to open source in the enterprise, still ongoing, has been rapid. Within 5 years, more than 50% of enterprise software will be open source  . According to an article in Fortune magazine last year , a Forrester Research survey found that 48% of enterprise respondents were using open source operating systems, and 57% were using open source code. A similar Accenture survey of 300 large public and private companies found that half are committed to open source software, with 38% saying they would begin using open-source software for “mission-critical” applications over the next 12 months.
There are likely many reasons for this shift, including the Internet itself and its basis in open source. Many of the most successful companies of the past 15 years including Amazon, Google, Facebook, and virtually any large Web site has shown excellent performance and scalability building their IT infrastructure around open source foundations. Most of the large, existing enterprise IT vendors, notably including IBM, Oracle, Nokia, Intel, Sun (prior to Oracle), Citrix, Novell (just acquired by Attachmate) and SAP have bought open source providers or have visible support for open source initiatives. Even two of the most vocal proprietary source proponents of the past — HP and Microsoft — have begun to make moves toward open source.
The age of proprietary software based on proprietary standards is dead. The monopoly rents formerly associated with unique, proprietary platforms and large-scale enterprise apps are over. Even where software remains proprietary, it is embracing open standards for data interchange and APIs. Traditional enterprise apps such as content management, business intelligence and ETL, among all others, are being penetrated by commercial open source offerings (as examples, Alfresco, Pentaho and Talend, respectively). The shift to services and new business models appears to be an inexorable force.
Declining profit margins, matched with the relatively high cost of marketing and sales to enterprises, means attention and focus have been shifting away from the enterprise. And with these shifts in focus has come a reduction in enterprise-focused innovation.
It is not unusual to find deployed systems within enterprises as old as thirty years . So long as they work reasonably well, systems once installed — along with their data — tend to remain in operation until their platforms or functionality become totally obsolete. This leads to rather lengthy turnover cycles, and slow development cycles.
Slow cycles in themselves slow innovation. But slow development cycles are also a disincentive to attract the most capable developers. When development tends to focus on maintenance and scripts and more routines of the same nature, the best developers tend to migrate elsewhere (see next).
Another aspect of slow development cycles is the imperative for new enterprise IT to relate to and accommodate legacy systems — again, including legacy data. This consideration is the source of one of the negative implications of a shift away from innovation in the enterprise: the orphaning of existing information assets.
Arguably the emphasis on consumer and Internet technologies means that is where the best developers gravitate. Developing apps for smartphones or working at one of the cool Internet companies or joining a passionate community of open source developers is now attracting the best developers. Open source and Web-based systems also lead to faster development cycles. The very best developers are often the founders of the next generation startups and Web and software companies .
While, of course, huge numbers of computer programmers and IT specialists are hired by enterprises each year, the motivations tend to be higher pay, better benefits and more job security. The nature of the work and the bureaucracy and routine of many IT functions require such compensation. And, because of the other shifts noted elsewhere, even the software startups that are able to attract the most innovative developers no longer tend to develop for enterprise purposes.
Computer science students have been declining in industrialized countries for some time and that is the category of slowest growth in IT . Meanwhile, existing IT personnel often have expertise in older legacy systems or have been focused on bug fixes and more prosaic tasks like report writing. Narrow job descriptions and work activities also keep many existing IT personnel from getting exposed to or learning about new trends or innovations, such as the semantic Web.
Declining numbers of new talent, plus declining interest by that talent, combined with (often) narrow and legacy expertise of existing talent, creates a disappointing storm of energy and innovation to address enterprise IT challenges. Enterprises have it within their power to create more exciting career opportunities to overcome these limitations, but unfortunately IT management often also appears challenged to get on top of these structural forces.
Open source and Internet-based systems have reduced the capital necessary for a new startup by an order of magnitude or so over the past decade. It is now quite possible to get a new startup up and running for tens to hundreds of thousands of dollars, as opposed to the millions of years past. This is leading to more startups, more startups per innovator, and quicker startup and abandonment cycles. Ideas can be tried quickly and more easily thrown away .
These dynamics are acting to accelerate overall development cycles and to cause a shift in funding structures and funding amounts by VCs and angels. The kind of market and sales development typical for many enterprise sales does not fit well within these dynamics and is a countervailing force for more capital when all trends point the other way.
In short, all of this is saying that money goes to where the returns are, and returns are not of the same basis as decades past in the enterprise sector. Again, this means a hollowing out of innovation for enterprises.
As an earlier reference noted , software revenues as a percent of IT vendor revenues peaked in about the mid-1990s. As profitability for these entities began to decline, so did the overall attractiveness of the sector.
As the next chart shows, coincident with the peak in profitability was the onset of a consolidation trend in the enterprise IT vendor sector . The chart below shows that three of the largest IT vendors today — Oracle, IBM and HP — began an acquisition spree in the mid-1990s that has continued until just recently, as many of the existing major players have already been acquired:
Notable acquisitions over this period include: Oracle — PeopleSoft, Siebel Systems, MySQL, Hyperion, BEA and Sun; HP — EDS, 3Com, VeriFone, Compaq, Palm and Mercury Interactive; IBM — Lotus, Rational, Informix, Ascential, FileNet, Cognos and SPSS. Published acquisition costs exceeded $130 billion, mostly for the larger deals. But terms for 75% of the 262 transactions were not disclosed . The total value of these consolidations likely approaches $200 billion to $300 billion.
Clearly, the market is now favoring large players with large service components. This consolidation trend does belie one early criticism of open source v proprietary software: proprietary software is likely to be better supported. In theory this might be true, but vanishing suppliers does not bode well for support either. Over time, we may likely see successful open source projects showing greater longevity than many IT vendors.
This discussion is not a boo-hoo because the heyday of enterprise IT innovation is past. Much of that innovation was expensive, often failed to achieve successful adoption, and promoted walled gardens and silos. As someone who ran companies directly involved in enterprise software sales, I personally do not miss the meetings, the travel, the suits and the 18-month sales cycles.
The enterprise has gained much from outside innovation in the past, from the personal computer to LANs and browsers and the Internet. To be sure, what we are now seeing with mobile phones has more computing power than the original Space Shuttle , and continued mashup and social engagement innovations will have unforeseen and manifest benefits for enterprises. I think this is unalloyed goodness.
We can also see innovations based on the Internet such as the semantic Web and its languages and standards to promote interoperability. Breaking these barriers is critically needed by enterprises of the future. Data models such as RDF  and open world mindsets that better accommodate uncertainty and breadth of information  can only be seen as positive. The leverage that will come from these non-enterprise innovations may in the end prove to be as important as the enterprise-specific innovations of the past.
Yet a shift to Internet and consumer IT innovation leaves some implications. These concerns have to do with the unique demands and needs of enterprises. One negative implication is that a diminishing supplier base may not lead to actual deployments that are enterprise-ready or -responsive.
The first concern relates to quality and operational integrity. There is an immense gulf between ISO 9000 or Six Sigma and, for example, the “good enough” of standard search results on the Web. Consumer apps do not impose the same thresholds for quality as demanded by paying bosses or paying customers. This is not a value judgment; simply a reality. I see it reflected in the quality of tools and code for many new innovations today on the Web.
Proofs-of-concept and “cool” demos work well for academic theses or basic intros to new concepts. The 20% that gets you 80% goes a long way to point the way to new innovation; but the 80% to get to the last 20% is where enterprises bet their money. Unfortunately, in too many instances, that gap is not being filled. The last 20% is hard work, often boring, and certainly not as exciting as the next Big Thing. And, as the trends above try to explicate, there are also diminishing rewards for living in that territory.
A similar and second concern pervades data interoperability. Data interoperability has been the central challenge of enterprise IT for at least three decades. As soon as we were able to interconnect systems and bridge differences in operating systems and data schema, the Holy Grail has been breaking information barriers and silos. The initial attempts with proprietary data warehouses or enterprise-wide ERP systems were wrongly trying to apply closed solutions to inherently open problems. But, now, finally when we have the open approaches and standards in hand for bridging these gaps, the attractiveness of doing so for the enterprise seems to have vanished.
For example, we see demos, tools and algorithms being published all over the place that show promising advances or improvements in the semantic Web or linked data (among other areas; see ). Some of these automated techniques sound wonderful, but real systems require the hard slog of review and manual approval. Quality matters. If Technique A, say, shows an improvement over Technique B of 5%, that is worth touting. But even at 98% percent accuracy, we will still find 20,000 errors in a population of 1 million items. Such errors will simply not work in having trains run on time, seats be available on airplanes, or inventory get to their required destinations.
What can work from the standpoint of linkage or interoperability on the Web according to consumer standards will simply not fly for many enterprises. But, where are the rewards for tackling that hard slog?
Another concern is security and differential access. Open Web systems, bless their hearts, do not impose the same access and need to know restrictions as information systems within enterprises. If we are to adopt Web-based approaches to the next-generation enterprise — a position we strongly advocate — then we are also going to need to figure out how to marry these two world views. Again, there appears to be an effort-reward mismatch here.
These observations are not meant to be a polemic, but a statement of more-or-less current circumstances. Since its widescale adoption, the major challenge — and opportunity — of enterprise IT has been how to leverage the value within the enterprise’s existing digital information assets. That challenge is augmented today with the availability of literally a whole world of external digital knowledge. Yet, the energy and emphasis for innovation to address these challenges has seemingly shifted to consumers and away from the enterprise.
Economics abhors a vacuum. I think two responses may be likely to this circumstance. The first is that new vendors will emerge to address these gaps, but with different cost structures and business models. I’d like to think my own firm, Structured Dynamics, is one of these entities. How we are addressing this opportunity and differences in our business model we will discuss at a later time. In any case, any such new player will need to take account of some of the structural changes noted above.
Another response can come from enterprises themselves, using and working the same forces of change noted earlier. Via collaboration and open source, enterprises can band together to contribute resources, expertise and people to develop open source infrastructures and standards to address the challenges of interoperability. We already see exemplars of such responses in somewhat related areas via initiatives such as Eclipse, Apache, W3C, OASIS and others. By leveraging the same tools of collaboration and open data and systems and the Internet, enterprises can band together and ensure their own self-interests are being addressed.
One advantage of this open, collaborative approach is that it is consistent with the current innovation trends in IT. But the real advantage is that it works and is needed. Without it, it is unclear how the enterprise IT challenge — especially in data interoperability — will be met.
Ontotext is the developer of OWLIM, a highly scalable semantic database engine, and KIM, a popular semantic annotation and search platform. Its FactForge and LinkedLifeData services provide the largest curated and interoperable linked data platforms over which inferencing and reasoning may be applied. Some of Ontotext’s major clients include AstraZeneca, BBC and Korea Telecom. Major professional services include its own technologies, plus text mining and semantic annotation. Ontotext has notable and longstanding technical partnerships, such as with the GATE team and many of the other leading technologies and companies in the semantic Web space. We are very pleased to join forces with them.
Our partnership was formed to address some of the key semantic ‘gaps’ in the semantic Web. The partnership will focus on development of the next generation of the UMBEL and PROTON ontologies, as well as tools and applications based on them.
Thanks to the efforts of the W3C (World Wide Web Consortium), we now have the techniques, languages and standards to deliver the “web” portion of the semantic Web. But, the practical “semantics” for actually effecting the semantic Web have heretofore been lacking. Early experience with linked data has exposed many poor practices. The lack of approximate linking predicates and reference concepts undercuts our ability to achieve meaningful semantic interoperability.
In forming our partnership, Ontotext and SD will shine attention on this semantics “gap”. We will also be aggressively seeking additional partners and players to join with us on this challenge. My recent outreach to DCMI (the Dublin Core Metadata Initiative) is one example of this commitment; we will be talking with others in the coming weeks.
Linked data and the prospects of the semantic Web are at a critical juncture. While we have seen much growth in the release of linked data, we are still not seeing much uptake (other than some curated pockets). Linkages between datasets are still disappointingly low, and quality of linkages is an issue. The time has come to stop simply shoveling more triples over the fence.
The combination of UMBEL and PROTON offers a powerful blend to address these weaknesses. Our partnership will first provide a logical mapping and consolidated framework based on the two core ontologies. These will be made available as standard ontologies and via open source semantic annotation tools.
UMBEL (Upper Mapping and Binding Exchange Layer) is both a vocabulary for building domain ontologies and a framework of more than 20,000 reference concepts. The UMBEL reference ontology is used to tag information and map existing schema in order to help link content and promote interoperability. UMBEL’s reference concepts and structure are a direct subset extraction of the Cyc knowledge base.
The PROTON ontology (PROTo ONtology) is a basic upper-level ontology that contains about 300 classes and 100 properties, providing coverage of the general concepts necessary for a wide range of tasks, including semantic annotation, indexing, and retrieval of documents. It is domain independent with coverage suitable to encompass any domain or named entity.
This consolidated framework will then be applied to organize and provide a coherent categorization of the Wikipedia online encyclopedia. One expression of this result will be a new version of Ontotext’s FactForge, already the largest and best performing reasoning engine leveraging linked data. This new version will allow easy access to the most central Linking Open Data (LOD) datasets such as DBpedia, Freebase, and Geonames, through the vocabularies of UMBEL and PROTON. Additional applications in linked data mining and general tagging of standard Web content are also contemplated by the partnership.
Ontotext’s proven reasoning technologies and ability to host extremely large knowledge bases with great performance are tremendous boons to the next iteration of UMBEL. We have been seeking large-scale coherency testing of UMBEL for some time and Ontotext is the perfect answer.
Ontotext’s CEO, Atanas Kiryakov, indicated their interest in UMBEL stemmed from what they saw as some stumbling blocks with linked data while developing FactForge. “The growth and maturation of linked data will require credible ways to orient and annotate the data,” said Kiryakov. “UMBEL is the right scope of comprehensiveness and size to use as one foundation for this,” he said. Ontotext is also the original developer and current maintainer of PROTON, which will also contribute in this role.
The efforts of the partnership will first be seen with release of UMBEL v. 0.80 in the next couple of weeks. This update revises many aspects of the ontology based on two years of applied experience and updates it to OWL 2. Then, this basis will be used for broader mappings and linkages to Wikipedia. Those next mappings are earmarked for UMBEL version 1.00, slated for release by the end of the year. All of these planned efforts will be released as open source.
Among other intended uses, PROTON, UMBEL and FactForge form a layered reference data structure that will be used for data integration within the European Union research project RENDER. The large-scale RENDER project aims to integrate diverse methods in the ways Web information is selected, ranked, aggregated, presented and used.
Beyond that, further relationships and partnerships are being actively sought with players serious about interoperable, high-quality data on the semantic Web. We welcome inquiries or outreach.
Previous installments in this series have listed existing ontology tools, overviewed development methodologies, and proposed a new approach to building lightweight, domain ontologies . For the latter to be successful, a new generation in ontology development tools is needed. This post provides an explication of the landscape under which this new generation of tools is occurring.
Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data.
We are now concluding the first decade of ontology development tools, especially those geared to the semantic Web and its associated languages of RDFS and OWL. Last year we also saw the release of the major update to the OWL 2 language, with its shift to more expressiveness and a variety of profiles. The upcoming next generation of ontology tools now must also shift.
The current imperative is to shift away from ontology engineering by a priesthood to pragmatic daily use and maintenance by domain practitioners. Market growth demands simpler, task-focused tools with intuitive interfaces. For this change to occur, the general tools architecture needs to shift its center of gravity from IDEs and comprehensive toolkits to APIs and Web services. Not surprisingly, this same shift is what has been occurring across all areas of software.
In the previous installment of this series, we presented a new methodological approach to ontology development, geared to lightweight, domain ontologies. One aspect of that design was to separate the operational workflow into two pathways:
The ontology build methodology concentrated on the upper half of this diagram (blue, with yellow lead-ins and outcomes) with the various steps overviewed in that installment :
The methodology captured in this diagram embraces many different emphases from current practice: re-use of existing structure and information assets; conscious split between instance data (ABox) and the conceptual structure (TBox) ; incremental design; coherency and other integrity testing; and explicit feedback for scope extension and growth. The methodology also embraces some complementary utility ontologies that also reflect the design of ontology-driven apps .
These are notable changes in emphasis. But they are not the most important one. The most important change is the tools landscape to implement this methodology. This landscape needs to shift to pragmatic daily use and maintenance by domain practitioners. That requires simpler and more task-oriented tools. And that change in tooling needs a still more fundamental shift in tools architecture and design.
In many places throughout this series I use the term “inadequate” to describe the current state of ontology development tools. This characterization is not a criticism of first-generation tools per se. Rather, it is a reflection of their inadequacy to fulfill the realities of the new tooling landscape argued in this series. The fact remains, as initial generation tools, that many of the existing tools are quite remarkable and will play central roles (mostly for the professional ontologist or developer) moving forward.
At the risk of overlooking some important players, let’s trace the (partial) legacy of some of the more pivotal tools in today’s environment.
As early as a decade ago the ontology standards languages were still in flux and the tools basis was similarly immature. Frame logic, description logics, common logic and many others were competing at that time for primacy and visibility. Most ontology tools at that time such as Protégé , OntoEdit , or OilEd  were based on F-logic or the predecessor to OWL, DAML+Oil. But the OWL language was under development by the W3C and in anticipation of its formal release the tools environment was also evolving to meet it. Swoop , for example, was one of the first dedicated OWL browsers. A Protégé plug-in for OWL was also developed by Holger Knublauch . In parallel, the OWL group at the University of Manchester also introduced the OWL API .
With the formal release of OWL 1.0 in 2004, ontology tools continued to migrate to the language. Protégé, up through the version 3x series, became a popular open source system with many visualization and OWL-related plug-ins. Knublauch joined TopQuadrant and brought his OWL experience to TopBraid Composer, which shifted to the Eclipse IDE platform and leveraged the Jena API [9,11]. In Europe, the NeON (Networked Ontologies) project started in 2006 and by 2008 had an Eclipse-based OWL platform using the OWL API with key language processing capabilities through GATE .
Most recently, Protégé and NeON in open source, and TopBraid Composer on the commercial side, have likely had the largest market share of the comprehensive ontology toolkits. So far, with the release of OWL 2 in late 2009, only Protégé in version 4 and the TwoUse Toolkit have yet fully embraced all aspects of the new specification, doing so by intimately linking with the new OWL API (version 3x has full OWL 2 support) . However, most leading reasoners now support OWL 2 and products such as TopBraid Composer and Ontotext’s OWLIM support OWL 2 RL as well .
The evolution of Protégé to version 4 (OWL 2) was led by the University of Manchester via its CO-ODE project , now ended, which has also been a source for most existing Protégé 4 plug-ins. (Because of the switch to OWL 2 and the OWL API most earlier plug-ins are incompatible with Protégé 4.) Manchester has also been a leading force in the development of OWL 2 and the alternative Manchester syntax.
Though only recently stable because of the formalization of OWL 2, Protégé 4 and its linkage to the new OWL API provides for a very powerful combination. With Protégé, the system has a familiar ontology editing framework and a mechanism for plug-in migration and growth. With the OWL API, there is now a common API for leading reasoners (Pellet, HermiT, FaCT++, RacerPro, etc.), a solid ontology management and annotation framework, and validators for various OWL 2 profiles (RL, EL and QL). The system is widely embraced by the biology community, probably the most active scientific field in ontologies. However, plug-in support lags the diversity of prior versions of Protégé and there does not appear to be the energy and community standing behind it as in prior years.
These leading frameworks and toolkits have opted to be “ontology engineering” environments. Via plug-ins and complicated interfaces (tabs or Eclipse-style panes) the intent has apparently been to provide “all capabilities in one box.” The tools have been IDE-centric.
Unfortunately, one must be a combination of ontologist, developer, programmer and IDE expert in order use the tools effectively. And, as incremental capabilities get added to the systems, these also inherit the same complexity and style of the host environment. It is simply not possible to make complex environments and conventions simple.
Curiously, the existence or use of APIs have also not been adequately leveraged. The usefulness of an API means that subsets of information can be extracted and worked on in very clear and simple ways. This information can then be roundtripped without loss. An API allows a tailored subset abstraction of the underlying data model. In contrast, IDEs, such as Protégé or Eclipse, when they play a similar role, force all interfaces to share their built-in complexity.
With these thoughts in mind, then, we set out to architect a tools suite and work flow that could truly take advantage of a central API. We further wanted to isolate the pieces into distributable Web services in keeping with our standard structWSF Web services framework design.
This approach also allows us to split out simpler, focused tools that domain users and practitioners can use. And, we can do all of this while also enabling the existing professional toolsets and IDEs to also interoperate in the environment.
The resulting tools landscape is shown in the diagram below. This diagram takes the same methodology flow from Figure 1 (blue and yellow boxes) and stretches them out in a more linear fashion. Then, we embed the various tools (brown) and APIs (orange) in relation to that methodology:
This diagram is worth expanding to full size and studying in some detail. Aspects of this diagram that deserve more discussion are presented in the sections below.
As noted in the preceding methodology installment, the working ontology is the central object being managed and extended for a given deployment. Because that ontology will evolve and grow over time, it is important the complete ontology specification itself be managed by some form of version control system (green) . This is the one independent tool in the landscape.
Access to and from the working ontology is mediated by the OWL API . The API allows all or portions of the ontology specification to be manipulated separately, with a variety of serializations. Changes made to the ontology can also be tested for validity. Most leading reasoners can interact directly with the API. Protégé 4 also interacts directly with the API, as can various rules engines . Additionally, other existing APIs, notably the Alignment API with its own mapping tools and links to other tools such as S-Match can interact with the OWL API. It is reasonable to expect more APIs to emerge over time that also interoperate .
The OWL API is the best current choice because of its native capabilities and because Jena does not yet support OWL 2 . However, because of the basic design with structWSF (see next), it is also possible to swap out with different APIs at a later time should developments warrant.
In short, having the API play the central management role in the system means that any and all tools can be designed to interact effectively with the working ontology(ies) without any loss in information due to roundtripping.
Use of the structWSF layer also means that tools and functionality can be distributed anywhere on the Web. Specialized server-side functions can be supported as well as dedicated specialty hardware. Text indexing or disambiguation services can fit within this design.
The ultimate value of piggybacking on the structWSF framework is that all other extant services also become available. Thus, a wealth of converters, data managers, and semantic components (or display widgets) can be invoked depending on the needs of the specific tool.
The objective, of course, of this design is to promote more and simpler tools useful to domain users. Some of these are shown under the Use & Maintain box in the diagram above; others are listed by category in the table below.
The RESTful interface and parameter calls of the structWSF layer further simplify the ontology management and annotation abstractions arising from the OWL API. The number of simple tools available to users under this design is virtually limitless. These tools are also fast to develop and test.
This landscape is not yet a full reality. It is a vision of adaptive and simpler tools, working with a common API, and accessible via platform-independent Web services. It also preserves many of the existing tools and IDEs familiar to present ontology engineers.
However, pieces of this landscape do presently exist and more are on the way. The next section briefly overviews some of the major application areas where these tools might contribute.
If one inspects the earlier listing of 185 ontology tools it is clear that there is a diversity of tools both in terms of scope and function across the entire ontology development stack. It is also clear that nearly all of those 185 tools listed do not communicate with one another. That is a tremendous waste.
Via shared APIs and some degree of consistent design it should be possible to migrate these capabilities into a more-or-less interoperating whole. We have thus tried to categorize some important tool types and exemplar tools from that listing to show the potential that exists. (Please note that the Example Tools are links to the tools and categories from the earlier 185 tools listing.)
This correlation of types and example tools is not meant to be exhaustive nor a recommendation of specific tools. But, this tabulation is illustrative of the potential that exists to both simplify and extend tool support across the entire ontology development workflow:
|Tool Type||Comments||Example Tools|
|OWL API||OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API provides links to inferencers, managers, annotators, and validators for the OWL2 profiles of RL, QL, EL||OWL API|
|Web Services Layer||This layer provides a common access layer and set of protocols for almost all tools. It depends critically on linkage and communication with the OWL API||structWSF|
|Ontology Editor (IDE)||There are a variety of options in this area. Generally, more complete environments (that is, IDEs) based on OWL and with links to the OWL API are preferred. Less complete editor options are listed under other categories. Note that only Protégé 4 incorporates the OWL API||NeOn toolkit,
|Scripts||In all pragmatic cases the migration of existing structure and vocabulary assets to an ontology framework requires some form of scripting. These may be off the shelf resources, but more often are specific to the use case at hand. Typical scripting languages include the standard ones (Perl, Python, PHP, Ruby, XSLT, etc.) and often involve some form of parsing or regex||variety; specific to use case|
|Converters||Converters are more-or-less pre-packaged scripts for migrating one serialization or data format to another one. As the scripts above continue to be developed, this roster of off-she-shelf starting points can increase. Today, there are perhaps close to 200 converters useful to ontology purposes||irON, ReDeFer, SKOS2GenTax; also see RDFizers|
|Vocabulary Prompter||Domain ontologies are ultimately about meaning, and for that purpose there is much need for definitions, synonyms, hyponyms, and related language assets. Vocabulary prompters take input documents or structures and help identify additional vocabulary useful for characterizing semantic meaning||see the TechWiki’s vocab prompting tools; ROC|
|Spreadsheet||Spreadsheets can be important initial development environments for users without explicit ontology engineering backgrounds. The biggest issue with spreadsheets is that what is specified in them is more general or simplistic compared to what is contained in an actual ontology. Attempts to have spreadsheets capture all of this sophistication are often less than satisfactory. One way to effective “round trip” with spreadsheets (and many related simple tools) is to adhere to an OWL API||Anzo, RDF123, irON (commON), Excel, Open Office|
|Editor (general)||Ontology editing spans from simple structures useful to non-ontologists to those (like the IDEs or toolkits) that capture all aspects of the ontology. Further, some of these editors are strictly textual or (literally) editors; others span or attempt to enable visual editing. Visual editing (see below) can ultimately extend to the ontology graph itself||see the TechWiki’s ontology editing tools|
|Alignment API||The Alignment API is an API and implementation for expressing and sharing ontology alignments. The correspondences between entities (e.g., classes, objects, properties) in ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way. The goal of this format is to be able to share on the web the available alignments. The format is expressed in RDF||Alignment API|
|Mapper||A variety of tools, algorithms and techniques are available for matching or mapping concepts between two different ontologies. In general, no single method has shown itself individually superior. The better approaches use voting methods based on multiple comparisons||see the TechWiki’s ontology mapping tools|
|Ontology Browser||Ontology browsers enable the navigation or exploration of the ontology — generally in visual form — but without allowing explicit editing of the structure||Relation Browser, Ontology Browser, OwlSight, FlexViz|
|Vocabulary Manager||Vocabulary managers provide a central facility for viewing, selecting, accessing and managing all aspects of the vocabulary in an ontology (that is, to the level of all classes and properties). This tool category is poorly represented at present. Ultimately, vocabulary managers should also be one (if not the main) access point to vocabulary editing||PoolParty, TermWiki, UMBEL Web service|
|Vocabulary Editor||Vocabulary editors provide (generally simple) interfaces for the editing and updating of vocabulary terms, classes and properties in an ontology||Neologism, TemaTres, ThManager, Vocab Editor|
|Structure Editor||A structure editor is a specific form of an ontology editor, geared to the subsumption (taxonomic) organization of a largely hierarchical structure. Editors of this form tend to use tree controls or spreadsheets with indented organization to show parent and child relationships||PoolParty, irON (commON)|
|Graph Analysis||Ontologies form graph structures, which are amenable to many specific network and graph analysis algorithms, included relatedness, shortest path, grouped structures, communities and the like||SNAP, igraph, Network Workbench, NetworkX, Ontology Metrics|
|Graph API||Graph visualization with associated tools is best enabled by working from a common API. This allows for expansion and re-use of other capabilities. Preferably, this graph API would also have direct interaction with the OWL API, but none exist at the moment||under investigation|
|Graph Visualizer||Graph visualizers enable the ontology to be rendered in graph form and presentation, often with multiple layout options. The systems also enable export to PDF or graphics formats for display or printing. The better tools in this category can handle large graphs, can have their displays easily configured, and are performant||see the TechWiki’s ontology visualization tools|
|Visual Editor||An ontology visual editor enables the direct manipulation of the graph in a visual mode. This capability includes adding and moving nodes, changing linkages between nodes, and other ontology specification. Very few tools exist in this category at present||COE, TwoUse Toolkit|
|Coherence Tester||Testing for coherence involves whether the ontology structure is properly constructed and has logical interconnections. The testing either involves inference and logic testing (including entailments) based on the structure as provided; comparisons with already vetted logical structures and knowledge bases (e.g., Cyc, Wikipedia); or both||Cyc, OWLim, FactForge|
|Gap Tester||Related to coherence testing, gap testing is the identification of key missing pieces or intermediary nodes in the ontology graph. This tends to happen when external specification of the ontology is made without reference to connecting information||requires use of a reference external ontology; see above|
|Documenter||Ontology documentation is not limited to the technical specifications of the structure, but also includes best practices, how-to and use guides, and the like. Automated generation of structure documentation is also highly desirable||TechWiki, SpecGen, OWLDoc|
|Tagger||Once constructed, ontologies (and their accompanying named entity dictionaries) can be very powerful resources for aiding tagging and information extraction utilities. Like vocabulary prompting, there is a broad spectrum of potential tools and uses in the tagging category||GATE (OBIE); many other options|
|Exporter||Exports need to range from full-blown OWL representations to the simpler export of data and constructs. Multiple serialization options and the ability to support the input requirements of third-party tools is also important||OWL Syntax Converter, OWL Verbalizer; many various options|
The beauty of this approach is that most of the tools listed are open source and potentially amenable to the minor modifications necessary to conform with this proposed landscape.
Contrasting the normative tools landscape above with the existing listing of ontology tools points out some key gaps or areas deserving more development attention. Some of these are:
Finally, it does appear that the effort and focus behind Protégé seems to be slowing somewhat. The future has clearly shifted to OWL 2 with Protégé 4. Yet, besides the admirable CO-ODE project (now ended), tools and plug-in support seems to have slowed. Many of the admirable plug-ins for Protégé 3x do not appear to be under active development as upgrades to Protégé 4. While Protégé’s future (and similar IDEs) seems assured, its prominence possibly will (and should) be replaced by a simpler kit of tools useful to users and practitioners.
For the past few months we at Structured Dynamics have seen ontology design and management as the pending technical priorities within the semantic technology space. Now that the market no longer looks at “ontology” as a four-letter word, it is imperative to simplify the development and use of ontologies. The first generation of tools leading up to this point have been helpful to understand the semantic space; changes are now necessary to expand it.
In our first generation we have begun to understand the types and nature of needed tools. But our focus on IDEs and comprehensive toolsets belies a developer’s or technologist’s perspective. We need to now shift focus and look at tool needs from the standpoint of users and actual use of ontologies. Many players and many toolmakers and innovators will need to contribute to build this market for semantic technologies and approaches.
Fortunately, replacing an IDE focus with one based around APIs and Web services should be a fairly smooth and natural transition. If we truly desire to be market makers, we need to stand back and place ourselves into the shoes of the domain practitioners, the subject matter experts. We need to shield actual users from all of the silly technical details and complexity. And, then, let’s focus — task-by-task — on discrete items of management and use of ontologies. Growth of the semantic technology space depends on expanding our practitioner base.
For its part, Structured Dynamics is presently seeking new projects and sponsors with a commitment to these aims. Like our prior development of structWSF and semantic components, we will be looking to make simpler ontology tools a priority in the coming months. Please let me know if you want to partner with us toward this commitment.
At the beginning of this year Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, again because of some client and internal work, we have researched the space again and updated the listing .
All new tools are marked with <New> (new only means newly discovered; some had yet to be discovered in the prior listing). There are now a total of 185 tools in the listing, 31 of which are recently new, and 45 added at various times since the first release. <Newest> reflects updates — most from the developers themselves — since the original publication of this post.
Though all are not relevant, see my post from a couple of years back on large-scale RDF graph software.