Just in Time for Christmas: Vista in the Crosshairs

Or, Give your computer the bird.
Computers are frustrating. Creating documents, finding files, sharing information — why do everyday things still seem so tedious and counterintuitive?
Dave Kushner interviews Blake Ross and gets a preview of his new Parakey venture in the November issue of IEEE Spectrum. Ross, a 20-year-old wunderkind and one of the driving forces behind the Firefox browser, has teamed with Joe Hewitt of Firefox and Firebug fame to create an absolutely disruptive new approach to computing. Quoting from Kushner’s article:
Just as with Firefox, Ross began this project by asking himself one simple question: What's bad about today's software? The answer . . . resided in the gap between the desktop and the Web. . . . The problem, according to Ross, is there's no simple, cohesive tool to help people store and share their creations online. Currently, the steps involved depend on the medium. If you want to upload photos, for example, you have to dump your images into one folder, then transfer them to an image-sharing site such as Flickr. The process for moving videos to YouTube or a similar site is completely different. If you want to make a personal Web page within an online community, you have to join a social network, say, MySpace or Friendster. If you intend to rant about politics or movies, you launch a blog and link up to it from your other pages. The mess of the Web, in other words, leaves you trapped in one big tangle of actions, service providers, and applications. Ross's answer is . . . Parakey, "a Web operating system that can do everything an OS can do." Translation: it makes it really easy to store your stuff and share it with the world. Most or all of Parakey will be open source, under a license similar to Firefox's.
Thus, Parakey aims to bridge the divide between desktop operating systems and the Internet, using the browser as the common user interface. Parakey will let users easily host their own Web sites from their desktops. Even though Parakey works within the browser (all leading browsers are to be supported), it actually runs on the local computer. This enables developers to do many things not possible on a traditional Web site. Using easily assigned “keys,” the desktop owner can make content of their choosing (documents, photos or other files) “public” to the distribution lists associated with those keys. Remote users are issued cookies so that their access to the local resources is seamless and without friction.
Similar to the models of the Firefox plugin or Web services, the basic Parakey platform can be easily extended. Ross and Hewitt have created a programming language, JUL (for ‘Just another User interface Language’), likely similar to Mozilla’s XUL, for developers to write these components and extensions. Though the launch date for Parakey is being kept under wraps, all signals point to a launch before January. The pre-launch company site allows interested parties to enter their email address to receive formal notification of the launch.
It is rather remarkable that this article appeared on the same day (yesterday) as John Milan’s post, Elephants and Evolution – How the Landscape is Changing for Google, Microsoft, Mozilla and Adobe, on Richard MacManus’s Read/WriteWeb blog. In that post, Milan casts Mozilla as another of the gorillas (elephants) in the room and Adobe’s Apollo project as another “under the radar” approach to desktop/Internet browser convergence.
All of this seems rather ironic as the world (Redmond) awaits the release of the long-delayed Windows update, Vista. Even the mighty do indeed live in interesting times.
Matt Asay, of OSBC and Alfresco, makes a very telling point in a recent post: one strength of open source (if done right) is its suitability for interoperability and extensibility. As Matt states:
. . . let me give him/you an idea of what we’re already doing in this space. It’s not a question of what we might do, but what we’re already doing. You can get Alfresco integrated with Asterisk (VoiceRD from Novacoast) and SugarCRM (CRM) today. (And since our 1.4 Business Process Management release, we already have BPM in spades.)
Now extend this. Add some JasperSoft or Pentaho for Business Intelligence (perhaps reporting capabilities). Some DimDim for web conferencing. Some Zimbra or Scalix for email/collaboration. Want to scale this out on a grid? Get yourself some 3Tera. Etc. The great thing about all of this is that we don’t have to do all of it ourselves. In many instances, enterprises are already extending Alfresco (or these other projects) to meet these and other needs. Hence, when a large pharmaceutical/medical devices company wanted wiki functionality in Alfresco, it didn’t ask us. It just built it in.
One could certainly make the argument that first-generation open source like Linux was adopted for cost, risk and code-access purposes, and that second-generation open source like JBoss or Red Hat was adopted because of completeness and support across a broader portion of the stack. But I think what we are now seeing in third-generation open source efforts like Alfresco or LogicBlaze is the enterprise-scale integration and interoperability of components.
Open source combined with open standards avoids vendor lock-in and points the way to a very, very different application and deployment paradigm: identifying, evaluating and gluing components together, rather than baking the cake from scratch each time.
I have been assembling a listing of semantic Web-related software applications and tools for some time. My first partial listing had about 50 sources. I recently noted the W3C’s semantic Web wiki listing of about 70 sources. I then came across the EU’s AKT (Advanced Knowledge Technologies) project, which has compiled about 75 tools. Protégé also has a fairly long list of plugins, though unfortunately it is not well organized. Complicating matters further were the natural language processing tools at the Natural Language Software Registry, another fantastic resource, particularly in the annotation and information extraction arena.
Semantic Web tool sets span from comprehensive engineering environments to specific converters and editors and the like. The entire workflow extends from getting the initial content, annotating or tagging it according to existing or built ontologies, reconciling heterogeneities, and then storing and managing the RDF or OWL with subsequent querying and inferencing.
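To make the store, infer and query end of that workflow concrete, here is a toy in-memory triple store in Python. It is a minimal sketch under stated assumptions, not the design or API of any tool listed below. The `TripleStore` class, its `match` and `infer_types` methods, and the `ex:` names are all hypothetical. Prefixed strings stand in for full URIs, and only one RDFS rule (propagating `rdf:type` up `rdfs:subClassOf`) is implemented.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object); prefixed names like "ex:doc1" stand in for URIs.
Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical toy in-memory RDF-like store; not the API of any listed tool."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard, loosely like a variable in a SPARQL triple pattern.
        # Iterating over a sorted copy keeps the result order deterministic.
        for t in sorted(self.triples):
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t

    def infer_types(self) -> None:
        # Forward-chain one RDFS rule until nothing new is added:
        # if ?x rdf:type ?A and ?A rdfs:subClassOf ?B, then ?x rdf:type ?B.
        changed = True
        while changed:
            changed = False
            for x, _, a in list(self.match(p="rdf:type")):
                for _, _, b in list(self.match(s=a, p="rdfs:subClassOf")):
                    if (x, "rdf:type", b) not in self.triples:
                        self.triples.add((x, "rdf:type", b))
                        changed = True


# Annotate content, load a tiny ontology, infer, then query.
store = TripleStore()
store.add("ex:doc1", "rdf:type", "ex:Report")             # annotation of a document
store.add("ex:Report", "rdfs:subClassOf", "ex:Document")  # ontology statements
store.add("ex:Document", "rdfs:subClassOf", "ex:Resource")
store.infer_types()
docs = [s for s, _, _ in store.match(p="rdf:type", o="ex:Document")]
print(docs)  # ['ex:doc1']
```

The query finds `ex:doc1` as an `ex:Document` even though it was only annotated as an `ex:Report`. That kind of ontology-driven inference is what separates the RDF stores and reasoners in the table from plain keyword indexes. Production tools would use SPARQL and index the triples rather than scanning them.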
There are certainly more tools extant, and I chose to exclude some marginal ones (SourceForge, for example, has more than 200 semantic Web-related projects, but the vast majority appear moribund, with no actual software to download).
Thus, listed below is today’s most comprehensive list of 175 semantic Web software tools and applications. I am now further characterizing these offline as open source v. proprietary and categorizing them according to the SW-related workflow. I may post those expansions later.
I also welcome tool suggestions. I think the ESW tools listing is the best ongoing venue for such a compilation, but so far I do not like what I am seeing: vendors using hype to characterize their tools rather than the more dispassionate descriptions of practitioners.
| Name | Description |
|---|---|
| 3store | A core C library that uses MySQL to store its raw RDF data and caches, forming an important part of the infrastructure required to support a range of knowledgeable services |
| 4Suite 4RDF | The 4Suite 4RDF is an open-source platform for XML and RDF processing implemented in Python with C extensions |
| ActiveRDF | ActiveRDF is a library for accessing RDF data from Ruby programs. It can be used as data layer in Ruby-on-Rails. You can address RDF resources, classes, properties, etc. programmatically, without queries |
| Adaptiva | A user-centred ontology building environment, based on using multiple strategies to construct an ontology, minimising user input by using adaptive information extraction |
| Aduna Metadata Server | The Aduna Metadata Server automatically extracts metadata from information sources, like a file server, an intranet or public web sites. The Aduna Metadata Server is a powerful and scalable store for metadata |
| AeroText | Entity extraction engine from Lockheed Martin |
| AJAX Client for SPARQL | AJAX Client for SPARQL is a simple AJAX client that can be used for running SELECT queries against a service and then integrating them with client-side Javascript code |
| AKT Research Map | A competence map for members of the AKT project |
| AKT-Bus | An open, lightweight, Web standards-based communication infrastructure to support interoperability among knowledge services. |
| AllegroGraph | Franz Inc’s AllegroGraph is a system to load, store and query RDF data. It includes a SPARQL interface and RDFS reasoning. It has a Java and a Prolog interface |
| Alembic | The Alembic Workbench project from Mitre has as its goal the creation of a natural language engineering environment for the development of tagged corpora |
| Almo | An ontology-based workflow engine in Java |
| Altova SemanticWorks | Visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design |
| Amilcare | An adaptive information extraction tool designed to support document annotation for the Semantic Web. |
| ANNIE – Open Source Information Extraction | An open-source robust information extraction system |
| Aperture | Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems |
| Applications of FCA in AKT | Formal Concept Analysis (FCA) is used in a variety of application scenarios in AKT in order to perform concept-based domain analysis and automatically deduce a taxonomy lattice of that domain. |
| Aqua | AQUA is a system that answers questions written in English. It combines several technologies: Natural Language Processing, Logic, Information Retrieval and Ontologies. |
| ARC | ARC is a lightweight, SPARQL-enabled RDF system for mainstream Web projects. It is written in PHP and has been optimized for shared Web environments |
| Armadillo | Exploits the redundancies apparent in the Internet, combining many information sources to perform document annotation with minimal human intervention. |
| ArtEquAkt | A system that automatically extracts information about artists from the web, populates an ontology, then uses the knowledge to generate personalised biographies. |
| Automatic Support for Enterprise Modelling and Workflow | Knowledge management using multi-modelling techniques and how modelling activities may be assisted with automation based on formal methods. |
| BBN OWL Validator | BBN OWL Validator |
| Bibster | A semantics-based bibliographic peer-to-peer system |
| Bossam | Bossam, a rule-based OWL reasoner (free, well-documented, closed-source) |
| Brahms | Brahms is a fast main-memory RDF/S storage, capable of storing, accessing and querying large ontologies. It is implemented as a set of C++ classes |
| BuddySpace | Instant messaging with custom map visualizations, semantics of presence (beyond ‘offline’/‘online’/‘away’ status) and value-added web services (group alerts, bots, inferences via personal profiles) |
| Callisto | The Callisto annotation tool was developed to support linguistic annotation of textual sources for any Unicode-supported language with annotation support from jATLAS |
| CASD | A tool for producing system architecture diagrams from service and data descriptions. |
| Cerebra Server | A technology platform that is used by enterprises to build model-driven applications and highly adaptive information integration infrastructure; company recently bought by webMethods |
| COCKATOO | A knowledge acquisition tool which can be used to produce a set of cases for use with a Case-Based Reasoning system. |
| COHSE – Conceptual Open Hypermedia Services Environment | COHSE researches methods to improve significantly the quality, consistency and breadth of linking of WWW documents at retrieval and authoring time. |
| CS AKTiveSpace | CS AKTiveSpace is a smart browser interface for a Semantic Web application that provides ontologically motivated information about the UK computer science research community. |
| ClassAKT | A text classification web service for classifying documents according to the ACM Computing Classification System. |
| Compendium | Compendium is a semantic, visual hypertext tool for supporting collaborative domain modelling and real time meeting capture |
| ConRef | A service discovery system which uses ontology mapping techniques to support different user vocabularies |
| ConcepTool | A system to model, analyse, verify, validate, share, combine, and reuse domain knowledge bases and ontologies, reasoning about their implications. |
| Corese | Corese stands for Conceptual Resource Search Engine. It is an RDF engine based on Conceptual Graphs (CG) and written in Java. It enables the processing of RDF Schema and RDF statements within the CG formalism, provides a rule engine and a query engine accepting the SPARQL syntax |
| cwm | The Closed World Machine (CWM) is a data manipulator, rules processor and query system mostly using the Notation 3 textual RDF syntax. It also has incomplete OWL Full support and SPARQL access. It is written in Python |
| Cypher | Cypher generates RDF and SeRQL representations of natural language statements and phrases |
| D2R Server | D2R Server turns relational databases into SPARQL endpoints, based on Jena’s Joseki |
| D3E – Digital Document Discourse Environment | D3E enables the easy conversion of websites or structured documents into interactive discussion sites |
| Deep Query Manager | Search federator from deep Web sources |
| DOME | A programmable XML editor which is being used in a knowledge extraction role to transform Web pages into RDF, and available as Eclipse plug-ins. DOME stands for DERI Ontology Management Environment |
| DOSE | A distributed platform for semantic annotation |
| Drive | Drive is an RDF parser written in C# for the .NET platform |
| ekoss.org | A collaborative knowledge sharing environment where model developers can submit advertisements |
| Ellogon | Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, based on the earlier TIPSTER approach |
| Endeca | Facet-based content organizer and search platform |
| Eprep | An add-on for the Eprints document archive which uses text extraction to automatically create the bibliographic metadata needed for the submission of a new document. |
| eServices | The e-Services framework provides advanced scholarly services (in particular visualisations) using distributed metadata. |
| Euler | Euler is an inference engine supporting logic based proofs. It is a backward-chaining reasoner enhanced with Euler path detection. It has implementations in Java, C#, Python, Javascript and Prolog. Via N3 it is interoperable with W3C Cwm |
| ExtrAKT | ExtrAKT is a tool for extracting ontologies from Prolog knowledge bases. |
| F-Life | F-Life is a tool for analysing and maintaining life-cycle patterns in ontology development. |
| FaCT++ | FaCT++ is an OWL DL Reasoner implemented in C++ |
| Fastr | Fastr is a parser for term and variant recognition. Fastr takes as input a corpus and a list of terms and outputs the indexed corpus in which terms and variants are recognized |
| Floodsim | A prototype system which demonstrates the benefits of applying semantically rich service descriptions (expressed using Semantic Web technologies) to Web Services. |
| FOAF-o-matic | Online FOAF generator |
| FOAM | Framework for ontology alignment and mapping |
| Foxtrot | Foxtrot is a recommender system which represents user profiles in ontological terms, allowing inference, bootstrapping and profile visualization. |
| FreeLing | FreeLing is an open source language analysis tool suite. The FreeLing package consists of a library providing language analysis services (such as morphological analysis, date recognition, PoS tagging, etc.). The current version (1.2) of the package provides tokenizing, sentence splitting, morphological analysis, NE detection, date/number/currency recognition, PoS tagging, and chart-based shallow parsing |
| GATE – General Architecture for Text Engineering | GATE is a stable, robust, and scalable open-source infrastructure which allows users to build and customise language processing components, while it handles mundane tasks like data storage, format analysis and data visualisation. |
| Gnowsis | A semantic desktop environment |
| GrOWL | Open source graphical ontology browser and editor |
| HAWK | OWL repository framework and toolkit |
| Heart of Gold | Heart of Gold is a middleware for the integration of deep and shallow natural language processing components. It provides a uniform and flexible infrastructure for building applications that use Robust Minimal Recursion Semantics (RMRS) and/or general XML standoff annotation produced by NLP components |
| HELENOS | A Knowledge discovery workbench for the semantic Web |
| I-X Process Panels | The I-X tool suite supports principled collaborations of human and computer agents in the creation or modification of some product. |
| Identify Knowledge Base | Identify-Knowledge-Base is a topic identification tool for knowledge bases |
| IF-Map | IF-Map is an Information Flow based ontology mapping method. It is based on the theoretical grounds of logic of distributed systems and provides an automated streamlined process for generating mappings between ontologies of the same domain. |
| ILP for Information Extraction | To overcome the knowledge acquisition bottleneck, we apply Inductive Logic Programming techniques to learn Information Extraction rules. |
| Internet Reasoning Service | The Internet Reasoning Service provides a number of tools that support the publication, location, composition and execution of heterogeneous web services, specified using semantic web technology |
| IODT | IBM’s toolkit for ontology-driven development |
| IsaViz | IsaViz is a visual authoring tool for browsing and authoring RDF models represented as graphs. Developed by Emmanuel Pietriga of W3C and Xerox Research Centre Europe. |
| Jambalaya | Protégé plug-in for visualizing ontologies |
| Jastor | Open source Java code generator that emits Java Beans from ontologies |
| Javascript RDF/Turtle parser | Javascript RDF/Turtle parser, can be used with Jibbering |
| Jena | Jena is a Java framework to construct Semantic Web Applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL and includes a rule-based inference engine. It can also be used as an RDF database via its Joseki layer. See the Jena discussion list for more information |
| Jibbering | Jibbering, a simple javascript RDF Parser and query thingy |
| Joseki | Jena’s Joseki layer offers an RDF Triple Store facility with SPARQL interface (see also the entry on Jena) |
| JRDF | JRDF Java RDF Binding is an attempt to create a standard set of APIs and base implementations to RDF using Java. Includes a SPARQL GUI. |
| KAON | Open source ontology management infrastructure |
| KAON2 | KAON2 is an infrastructure for managing OWL-DL, SWRL, and F-Logic ontologies. It is capable of manipulating OWL-DL ontologies; queries can be formulated using SPARQL |
| Kazuki | Generates a Java API for working with OWL instance data directly from a set of OWL ontologies |
| KIM Platform | KIM, from Ontotext, is a software platform for the semantic annotation of text, automatic ontology population, indexing and retrieval, and information extraction |
| KnoZilla | |
| Knowledge Broker | The knowledge broker addresses the problem of knowledge service location in distributed environments. |
| Kowari | Open source database for RDF and OWL |
| KRAFT – I-X TIE | Supports collaboration among members of a virtual organisation by integrating workflow and communication technology with constraint solving. |
| LingPipe | LingPipe is a suite of Java tools designed to perform linguistic analysis on natural language data. LingPipe’s flexibility and included source make it appropriate for research use. Version 1.0 tools include a statistical named-entity detector, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine |
| LinguaStream | LinguaStream is an integrated experimentation environment (IEE) targeted to researchers in Natural Language Processing. LinguaStream allows processing streams to be assembled visually, picking individual components in a “palette” (the standard set contains about fifty components, and is easily extensible using a Java API, a macro-component system, and templates). Some components are specifically targeted to NLP, while others solve various issues related to document engineering (especially to XML processing). Other components are to be used in order to perform computations on the annotations produced by the analysers, to visualise annotated documents, to generate charts, etc. |
| LinKFactory | Language & Computing’s LinKFactory is an ontology management tool; it provides an effective and user-friendly way to create, maintain and extend extensive multilingual terminology systems and ontologies (English, Spanish, French, etc.). It is designed to build, manage and maintain large, complex, language-independent ontologies. |
| Lucene | Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. It is open source |
| LuMriX | A commercial search engine using semantic Web technologies |
| Magpie | Magpie supports the interpretation of web documents through on-the-fly ontologically based enrichment. Semantic services can be invoked either by the user or be automatically triggered by patterns of browsing activity |
| Melita | Melita is a semi-automatic annotation tool using an Adaptive Information Extraction engine (Amilcare) to support the user in document annotation. |
| MetaMatrix | Semantic vocabulary mediation and other tools |
| Metatomix | Commercial semantic toolkits and editors |
| MindRaider | Open source semantic Web outline editor |
| MnM | MnM is an annotation tool which provides both automated and semi-automated support for annotating web pages with semantic contents. MnM integrates a web browser with an ontology editor and provides open APIs to link to ontology servers and for integrating information extraction tools |
| Model Futures OWL Editor | Simple OWL tools, featuring UML (XMI), ErWin, thesaurus and imports |
| Mulgara | The Mulgara Semantic Store is an Open Source, massively scalable, transaction-safe, purpose-built database for the storage and retrieval of RDF, written in Java. It is an active fork of Kowari |
| Muskrat-II | Given a set of knowledge bases and problem solvers, the Muskrat system will try to identify which knowledge bases could be combined with which problem solvers to solve a given problem. |
| MyPlanet | MyPlanet allows users to create a personalised version of a web based newsletter using an ontologically based profile. |
| Net OWL | Entity extraction engine from SRA International |
| NMARKUP | NMARKUP helps the user build ontologies by detecting nouns in texts and by providing support for the creation of an ontology based on the entities extracted. |
| Nokia Semantic Web Server | An RDF based knowledge portal for publishing both authoritative and third party descriptions of URI denoted resources |
| ONTOCOPI | A tool which uncovers Communities of Practice by analysing the connectivity of instances in the 3store knowledge base. |
| OntoEdit/OntoStudio | Engineering environment for ontologies |
| OntoMat Annotizer | Interactive Web page OWL and semantic annotator tool |
| OntoPortal | Enables the authoring and navigation of large semantically-powered portals |
| OpenLink Data Spaces (ODS) | ODS is a distributed collaborative application platform for creating Semantic Web applications such as: blogs, wikis, feed aggregators, etc., with built-in SPARQL support and incorporation of shared ontologies such as SIOC, FOAF, and Atom OWL. ODS is an application of OpenLink Virtuoso and is available in Open Source and Commercial Editions |
| Oracle Spatial 10g | Oracle Spatial 10g includes an open, scalable, secure and reliable RDF management platform |
| Oyster | Peer-to-peer system for storing and sharing ontology metadata |
| OWL Consistency checker | OWL Consistency checker (based on Pellet) |
| OWL-DL Validator | WonderWeb OWL-DL Validator |
| OWLJessKB | OWLJessKB is a description logic reasoner for OWL. The semantics of the language is implemented using Jess, the Java Expert System Shell. It currently implements most of the common features of OWL Lite, plus some and minus some |
| OWLIM | OWLIM is a high-performance semantic repository, packaged as a Storage and Inference Layer (SAIL) for the Sesame RDF database |
| OWLViz | OWLViz is a visual editor for OWL and is available as a Protégé plug-in |
| Pellet | Pellet is an open-source Java based OWL DL reasoner. It can be used in conjunction with both Jena and OWL API libraries; it can also be downloaded and be included in other applications |
| Piggy Bank | A Firefox-based semantic Web browser |
| Pike | A dynamic programming (scripting) language similar to Java and C for the semantic Web |
| pOWL | Semantic Web development platform |
| Protégé | Open source visual ontology editor written in Java with many plug-in tools |
| RACER | A collection of Projects and Tools to be used with the semantic reasoning engine RacerPro |
| RacerPro | RacerPro is an OWL reasoner and inference server for the Semantic Web |
| rdfabout.com’s Validator | RDF/XML and N3 validator |
| RDF Gateway | Intellidimension’s RDF Gateway is an RDF Triple database with RDFS reasoning and SPARQL interface |
| RDF InferEd | Intellidimension’s RDF InferEd is an authoring environment with the ability to navigate and edit RDF documents |
| RDFLib | RDFLib is an RDF library for Python, including a SPARQL API. The library also contains both in-memory and persistent Graph backends |
| RDFReactor | Access RDF from Java using inferencing |
| RDF Server | The RDF server of the PHP RAP environment |
| RDFStore | RDFStore is an RDF storage with Perl and C APIs and SPARQL facilities |
| RDFSuite | The ICS-FORTH RDFSuite is a set of open source, high-level, scalable tools for the Semantic Web. The suite includes a Validating RDF Parser (VRP), an RDF Schema Specific DataBase (RSSDB) and support for the RDF Query Language (RQL) |
| Redland | The Redland RDF Application Framework is a set of free software libraries that provide support for RDF. It provides parsers for RDF/XML, Turtle, N-triples, Atom, RSS; has a SPARQL and GRDDL implementation, and has language interfaces to C#, Python, Obj-C, Perl, PHP, Ruby, Java and Tcl |
| RelationalOWL | Automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OWL |
| ReTAX+ | ReTAX is an aid to help a taxonomist create a consistent taxonomy and in particular provides suggestions as to where a new entity could be placed in the taxonomy whilst retaining the integrity of the revised taxonomy (c.f., problems in ontology modelling). |
| Refiner++ | REFINER++ is a system which allows domain experts to create and maintain their own Knowledge Bases, and to receive suggestions as to how to remove inconsistencies, if they exist. |
| Seamark Navigator | Siderean’s Seamark Navigator provides a platform to combine Web search pages with product catalog databases, document servers, and other digital information from both inside and outside the enterprise |
| Semantic Annotation with MnM | MnM is a semantic annotation tool which provides manual, automated and semi-automated support for annotating web pages with ‘semantics’, i.e., machine interpretable descriptions. |
| Semantical | Open source semantic Web search engine |
| SemanticWorks | A visual RDF/OWL Editor from Altova |
| Semantic Mediawiki | Semantic extension to the MediaWiki wiki |
| Semantic Net Generator | Utility for generating topic maps automatically |
| SemWeb | SemWeb for .NET supports persistent storage in MySQL, PostgreSQL, and SQLite; has been tested with 10-50 million triples; supports SPARQL |
| Sesame | Sesame is an open source RDF database with support for RDF Schema inferencing and querying. It offers developers a wide range of tools to leverage the power of RDF and RDF Schema |
| SMART | System for Managing Applications based on RDF Technology |
| SMORE | OWL markup for HTML pages |
| SPARQL | Query language for RDF |
| SPARQLer | SPARQL query demo and service |
| SPARQLette | A SPARQL demo query service |
| SPARQL JavaScript Library | SPARQL JavaScript Library interfaces to the SPARQL Protocol and interprets the return values as part of an AJAX framework |
| SWCLOS | A semantic Web processor using Lisp |
| SWI-Prolog | SWI-Prolog is a comprehensive Prolog environment, which also includes an RDF Triple store. There is also a separate Prolog library to handle OWL |
| Swish | Swish is a framework for performing deductions in RDF. It has similar features to CWM. It is written for Haskell developers |
| Swoogle | A semantic Web search engine with 1.5 M resources |
| SWOOP | A lightweight ontology editor |
| TopBraid Composer | TopQuadrant’s TopBraid Composer is a complete standards-based platform for developing, testing and maintaining Semantic Web applications |
| Tucana Suite | Northrop Grumman’s Tucana Suite is an industrial quality version of the Kowari metastore |
| Turtle | Terse RDF “Triple” language |
| Visualisations for the CS AKTive Portal | Maps are used to geographically illustrate knowledge from the Triplestore, such as highlighting the locations in the UK that are active in a particular research area. |
| VisualText | VisualText ® is an integrated development environment for building information extraction systems, natural language processing systems, and text analyzers |
| W3C’s RDF Validator | W3C’s RDF Validator |
| WebOnto | WebOnto supports the browsing, creation and editing of ontologies through coarse grained and fine grained visualizations and direct manipulation. |
| Wilbur | Wilbur is Nokia Research Center’s Lisp-based toolkit for Semantic Web programming. It is written in Common Lisp and is used to program Semantic Web applications that use RDF |
| WSMO Studio | A semantic Web service editor compliant with WSMO as a set of Eclipse plug-ins |
| WSMT Toolkit | The Web Service Modeling Toolkit (WSMT) is a collection of tools for use with the Web Service Modeling Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web Service Execution Environment (WSMX) |
| WSMX | Execution environment for dynamic use of semantic Web services |
| XML Army Knife | XML Army Knife |
| XMP | A labeling technology from Adobe that enables data about a file to be embedded as metadata into the file itself. |
| YARS | YARS (Yet Another RDF Store) is a data store for RDF in Java and allows for querying RDF based on a declarative query language, which offers a somewhat higher abstraction layer than the APIs of RDF toolkits such as Jena or Redland |
| Zotero | Firefox add-in (in development) that allows the auto-completion of online citations |
John Newton (co-founder of Documentum, now of Alfresco) puts a telling marker on the table in his recent post on the Commoditization of ECM. Though noting that the term "enterprise content management" did not even exist prior to 1998, he observes that the definition of what belongs in ECM expanded rapidly, and the leading players consolidated just as quickly. He concludes that this process has commoditized the market, with competitive differentiation now based on market size rather than functionality. The platforms from the leading vendors (IBM, Microsoft and EMC-Documentum) can all manage documents, Web content, images, forms and records via basic library services, metadata management, search and retrieval, workflow, portal integration, and development kits.
If such consolidation and standardization of functionality were Newton’s only point, one could say “ho, hum”; the same has been true in all major enterprise software markets.
But, in my reading, he goes on to make two more important and fundamental points, both of which existing enterprise software vendors ignore at their peril.
Poor Foundations and Poor Performance
Newton notes that ECM applications are never bought based on the nature of their repositories, but an inefficient repository can result in the rejection of the system. He also acknowledges that ECM installations are costly to set up and maintain, difficult to use and poorly performing, and that they lack essential automation (such as classification). (Kind of sounds like most enterprise software initiatives, doesn’t it?)
Indeed, I have repeatedly documented these gaps for virtually all large-scale document-centric or federated applications. The root cause, besides rampantly poor interface design, has in my opinion been poorly suited data management foundations. Relational and IR-based systems both perform poorly at managing semi-structured data, though for different reasons. This problem will not be solved by open source per se (see below), though some interesting options emerging from open source may point the way to new alternatives, as may incipient designs from BrightPlanet and others.
The Proprietary Killers of Open Standards and Open Source
Service-oriented architectures (SOA), the various Web services standards (WS-*), certain JSRs (170 and 283 for documents, but also 168 and others), plus all of the various XML and semantic derivatives are moving rapidly, with the very real prospect of “pluggability” and the substitution of various packages, components and applications across the entire enterprise stack.
Citing his own case at Alfresco, Newton notes that by aggregating these existing open source components, the company was able to get its ECM product ready in less than one year:
- Spring – A framework that provides the wiring of the repository and the tools to extend capabilities without rebuilding the repository (Aspect-Oriented Programming)
- Hibernate – An object-relational mapping tool that stores content metadata in the database and handles all the idiosyncrasies of each SQL dialect
- Lucene – An internet-scale full-text and general-purpose information retrieval engine that supports federated search, taxonomic, XML and full-text search
- EHCache – Distributed intelligent caching of content and metadata in a loosely coupled environment
- jBPM – A full featured enterprise production workflow and business process engine that includes BPEL4WS support
- Chiba – A complete XForms interface that can be used for the configuration and management of the repository
- OpenOffice – Provides server-based, Linux-compatible transformation of MS Office-based content
- ImageMagick – Supports transformation and watermarking of images.
Moreover, combining these components produced an inherent architecture including pluggable modules, rules and templating engines, workflow and business process management, security, and other enterprise-level capabilities. In prior times, I estimate, a proprietary vendor would have needed ten times the effort or more to accomplish this.
Similar Trends and Challenges in the Entire Enterprise Space
Newton is obviously well placed to comment on these trends within ECM. But similar trends can be seen in every major enterprise software space. For virtually every component one can imagine, there is a very capable open source offering. Many of the newer open source ventures are indeed centered on aggregating and integrating various open source components, with either dual licensing or support services as the basis of their business models. At its most extreme, this trend has expanded to the whole process of enterprise application integration (EAI) itself, through offerings such as LogicBlaze FUSE with its SOA-oriented standards and open source components. Initiatives such as SCA (Service Component Architecture) will continue to fuel this trend.
So, enterprise software vendors, heed your wake-up call. It is as if gold doubloons, pearls and jewels are lying all over the floor. If you and your developers don’t take the time to bend over and pick them up, someone else will. As Joel Mokyr has compellingly researched, innovation in systems, that is, in how to integrate the pieces, can be every bit as important as the ‘Aha!’ discovery. Open source is now giving a whole new breed of bakers new ingredients for baking the cake.
| NOTE: I have posted a major cleanup and update of what is now called the Advanced TinyMCE Editor, tested beginning with WP v. 2.2. Obtain the plug-in download and documentation HERE. The update announcement is now the best place to post new online comments and discussion. Let me know what you think! MKB |
Author's Note: A zipped plugin, with code and documentation supporting the information in this post, lets you extend the functionality of your TinyMCE rich text editor in WordPress; for immediate instructions, see the end of the post below.
Click here to download the zipped file (101 KB)
My most recent post described the smooth upgrade of my blog software to WordPress v. 2.0.4 and mentioned my popular Comprehensive Guide to a Professional Blog Site, which recounts my experiences setting up, configuring and maintaining my own blog site. A key aspect of that earlier Guide dealt with what I perceived to be an oversight in older versions of WordPress: the lack of a bundled WYSIWYG editor. For my own installation, I had chosen the Xinha editor and had devoted a number of entries in the Guide to its configuration and use.
However, as of WordPress version 2.x, the developers have chosen to bundle the proven JavaScript rich text editor, TinyMCE, as part of the standard distribution package. Since I had come to rue some aspects of Xinha in my earlier implementation (namely, bad HTML for carriage returns and VERY slow times when publishing a post), I decided to give TinyMCE a go as my new editor.
(Actually, this was not such a major shift since we had adopted a sibling TinyMCE application, the Joomla Content Editor (JCE), for the Joomla-based BrightPlanet corporate Web site.)
As implemented, the TinyMCE editor in WordPress is configured much like the prior QuickTags feature set, with only a few editing functions available: bold, underline, bullets, text alignment and so forth. Here is a screen shot from my WordPress administration center with TinyMCE as delivered with WordPress v. 2.0.4:

The only problem is that I have become used to editing support for items such as tables, image manipulation, special characters, font types, and so forth. While I (generally) edit and clean up the HTML before final posting, I very much enjoy the productivity benefits of a more full-featured WYSIWYG editor. So, the rhetorical question to myself was: If I’m going to use TinyMCE, how can I extend its functionality?
The Investigations Begin
Being familiar with other TinyMCE instantiations, I began my investigations with the (as it turns out, naïve) assumption that upgrading to a full-featured TinyMCE would be a snap. Boy, was I wrong.
I began with the TinyMCE Web site itself, checking out the standard distribution package. As with many open source sites, I found the online documentation for TinyMCE to be fragmented, incomplete and hard to navigate. I looked under the ‘Plugins’ tab and found it was documentation for developers creating a new plugin. My first lead came from the online manual (which can also be downloaded for local browsing) and its reference to installation options, specifically these options at the bottom of that page:
Bingo! Clearly, TinyMCE had the advanced features I was seeking, and they were packaged as part of the standard TinyMCE distribution to boot! I now assumed my only remaining step was to find out how to “turn on” these features in my WordPress installation.
What Was Learned
This line of thinking led me to waste time on Web searches and on poking through the forums at both the TinyMCE and WordPress sites. It became clear that the TinyMCE integration in WordPress was both highly tailored and limited to just the simple functionality (Example 00 above). I saw comments by others on the “wisdom” of the WordPress developers in making this choice and thereby reducing the overall size of the WordPress download, but I don’t see it that way. The choice seems rather arbitrary, taking options away from users by unilaterally “whittling down” a more fully featured editor from Moxiecode. Oh, well.
One of the many dead ends I pursued was the TinyMCE staff’s instructions on integrating Moxiecode’s commercial plugins. That reference (http://tinymce.moxiecode.com/downloads/integration/) drew me too deeply into specific WordPress code that I was unable to modify for my plugin purposes (though more capable programmers might have seen a clear path). I also found many requests but little guidance on the WordPress forums.
The first breakthrough came from the TinyMCE Wishlist postings on the WordPress forum, which led me to the Advanced WYSIWYG plugin by Assaf Arkin of Labnotes. Part of the problem in finding it in the first place was that the actual plugin file name was misspelled as “advacned-wysiwyg”. So, I followed the instructions for the plugin and, voilà, it didn’t work!
Grr! More investigation pointed to the likely culprit: the new version 2.x plugins for TinyMCE do NOT work with the Advanced WYSIWYG plugin. As Paul Finch reported on the Labnotes site, reverting to the earlier advanced plugins from TinyMCE versions 1.45 and earlier, which can be found on the SourceForge download site, solved the problem.
As indeed it does, as this updated editor on my blog administration panel shows:

These “standard” advanced plugins for TinyMCE provide functionality beyond the simple installation. In the lists below, functions already present in the simple installation are marked with a single asterisk [*], and ones I could not get to work (though I did not test them all!) are marked with a double asterisk [**]:
Default buttons available in the advanced theme:
* bold
italic
* underline
* strikethrough
* justifyleft
* justifycenter
* justifyright
* justifyfull
* bullist
* numlist
outdent
* indent
cut
copy
paste
* undo
* redo
* link
* unlink
* image
cleanup
help
* code
hr
removeformat
formatselect
fontselect
fontsizeselect
styleselect
sub
sup
forecolor
backcolor
charmap
visualaid
anchor
newdocument
separator
Plugins with the button name same as plugin name:
save
emotions
flash
iespell
preview
print
zoom
fullscreen
advhr
fullpage
spellchecker
Plugins with custom buttons:
advlink (will override the “link” button)
advimage (will override the “image” button)
** style
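The lists above imply a rule worth making explicit: as I understand TinyMCE, a toolbar button only shows up if the theme or a loaded plugin supplies it, so every plugin-provided button also needs its plugin named in the "mce_plugins" filter. The sketch below derives that plugin list from a set of buttons. It targets a current PHP version rather than the PHP of this post's era; the function name `required_mce_plugins` is hypothetical, and the mapping is an assumption drawn from the lists above plus the `table` and `searchreplace` plugins used in the advanced-wysiwyg.php file, not an exhaustive TinyMCE reference.

```php
<?php
// Hypothetical helper: given the toolbar buttons you want, list the extra
// TinyMCE plugins that would need to be loaded (e.g. via "mce_plugins").
// The mapping below is an assumption, not a complete TinyMCE reference.
function required_mce_plugins(array $buttons): array
{
    // Plugins whose button carries the same name as the plugin itself.
    $sameName = array(
        'save', 'emotions', 'flash', 'iespell', 'preview', 'print',
        'zoom', 'fullscreen', 'advhr', 'fullpage', 'spellchecker',
    );
    // Buttons supplied by a plugin with a different name.
    $customButtons = array(
        'tablecontrols' => 'table',
        'search'        => 'searchreplace',
        'replace'       => 'searchreplace',
    );

    $plugins = array();
    foreach ($buttons as $button) {
        if (in_array($button, $sameName, true)) {
            $plugins[] = $button;
        } elseif (array_key_exists($button, $customButtons)) {
            $plugins[] = $customButtons[$button];
        }
        // Anything else (bold, link, separator, ...) is treated as a default
        // advanced-theme button needing no extra plugin. advlink and advimage
        // only override "link" and "image", so they are never required here.
    }
    // Drop duplicates (search and replace share searchreplace) and reindex.
    return array_values(array_unique($plugins));
}
```

For example, asking for `bold`, `search`, `replace`, `tablecontrols` and `fullscreen` yields `searchreplace`, `table` and `fullscreen`, which matches part of the plugin list registered in advanced-wysiwyg.php.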
Early Use Observations
With one major exception — and it is MAJOR! — I have generally been pleased with the new TinyMCE editor in its full functionality version. I have been working with it for nearly a week and have completed four or five published posts. The writing of posts is now much quicker. There are no longer problems with line breaks and paragraph formatting. For most functionality, the editor just feels more “solid” than my previous Xinha editor. For all of that, I am very thankful.
The major issue I have encountered is with long posts (such as this one), particularly when I am toggling between the code (HTML) view and the WYSIWYG view. Without warning, I will suddenly lose entire portions of text at the bottom of the post. This appears to be either strictly a TinyMCE issue or an issue related to my Firefox browser; others have noted it on the WordPress forum.
Best practices, as I have reported elsewhere and in my Guide, generally suggest drafting long posts outside of WordPress anyway, though the loss of any work is distressing. I will monitor this “long posting” issue carefully; until I see a resolution, I will likely save to the clipboard or take other steps to prevent future losses.
Specific Upgrade Instructions
So, because I have generally been pleased with these extensions, I thought I would package and write them up for others to use, saving you the fits and starts I went through. The download at the top of this post includes the instructions and all files noted below; the instructions are included as the readme.txt file in the package. I also made some minor updates to plugin operation (better sizing of popups, for example), corrected the spelling error in the file name, and allowed for multi-line buttons for the extended TinyMCE in the Advanced WYSIWYG plugin. All of these changes, plus the vetted TinyMCE ver. 1.45 advanced plugins, are included in the distribution. Please note this information is provided “as is”; also, you can only do this if you have direct file access to your WordPress installation.
1. Download the enclosed zip file and unzip it to a clean subdirectory; these instructions are repeated in the enclosed readme.txt file.
2. If you don’t like the button order shown in the image above, you may remove buttons, change their order, or add or remove separator bars by editing the advanced-wysiwyg.php file:
<?php
/*
Plugin Name: Advanced WYSIWYG Editor
Plugin URI: http://www.labnotes.org/
Description: Adds more styling options to the WYSIWYG post editor, updated for multi-line buttons.
Version: 0.3
Author: Assaf Arkin
Author URI: http://labnotes.org/
License: Creative Commons Attribution-ShareAlike
Tags: wordpress tinymce
*/

// Register the filters only when running inside WordPress.
if (isset($wp_version)) {
    add_filter("mce_plugins", "extended_editor_mce_plugins", 0);
    add_filter("mce_buttons", "extended_editor_mce_buttons", 0);
    add_filter("mce_buttons_2", "extended_editor_mce_buttons_2", 0);
    add_filter("mce_buttons_3", "extended_editor_mce_buttons_3", 0);
}

// Load the extra TinyMCE plugins whose buttons appear on the toolbars below.
function extended_editor_mce_plugins($plugins) {
    array_push($plugins, "table", "fullscreen", "searchreplace", "advhr", "advimage");
    return $plugins;
}

// First toolbar line. Each button function replaces the default row outright
// (the incoming $buttons array is ignored); "separator" draws a vertical bar.
function extended_editor_mce_buttons($buttons) {
    return array(
        "undo", "redo", "separator", "cut", "copy", "paste", "separator",
        "bold", "italic", "underline", "strikethrough", "separator",
        "bullist", "numlist", "separator", "indent", "outdent", "separator",
        "justifyleft", "justifycenter", "justifyright", "justifyfull", "separator",
        "sub", "sup", "charmap", "hr", "advhr", "separator", "link", "unlink", "anchor", "separator",
        "code", "cleanup", "separator", "search", "replace", "separator", "wphelp");
}

// Second toolbar line
function extended_editor_mce_buttons_2($buttons) {
    return array(
        "formatselect", "fontselect", "fontsizeselect", "styleselect", "separator",
        "forecolor", "backcolor", "separator", "removeformat");
}

// Third toolbar line
function extended_editor_mce_buttons_3($buttons) {
    return array(
        "image", "separator", "tablecontrols", "separator", "fullscreen", "wordpress");
}
?>
3. Copy the resulting advanced-wysiwyg.php file into your standard WordPress plugins directory (wp-content\plugins)
4. Copy all files from the extracted plugins subdirectory to the TinyMCE plugins subdirectory in your WordPress directory (wp-includes\js\tinymce\plugins)
5. Under ‘Plugins’ in your WordPress administrative center, ‘activate’ the Advanced WYSIWYG Editor plugin
6. Now, when you write or manage posts or pages you will have the extended TinyMCE functionality available
7. Enjoy!
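Each button callback in advanced-wysiwyg.php ignores the `$buttons` array that WordPress passes to the filter and returns a fixed row, which discards anything already placed on that row. A gentler variant appends to the incoming row instead. The sketch below is a hypothetical alternative for the second toolbar line only; the function name `extended_editor_append_buttons_2` is my own invention, and it should be treated as untested against a WordPress 2.0.4 install.

```php
<?php
// Hypothetical alternative to extended_editor_mce_buttons_2(): keep whatever
// buttons are already on the second toolbar row and append our own group.
function extended_editor_append_buttons_2(array $buttons): array
{
    $extra = array(
        'formatselect', 'fontselect', 'fontsizeselect', 'styleselect', 'separator',
        'forecolor', 'backcolor', 'separator', 'removeformat',
    );
    // Separate our group from any existing buttons with a vertical bar.
    if (count($buttons) > 0) {
        array_unshift($extra, 'separator');
    }
    // array_merge reindexes numeric keys, so the result is a plain list.
    return array_merge($buttons, $extra);
}

// Inside WordPress it would be registered the same way as the original:
// add_filter('mce_buttons_2', 'extended_editor_append_buttons_2', 0);
```

Given an empty row it returns just the nine new entries; given `array('undo', 'redo')` it keeps those two at the front, followed by a separator and the new group. The trade-off is less control over exact ordering in exchange for not clobbering other plugins' buttons.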