The CKC Challenge Highlights A New Generation of Semantic Web Tools
Many predict — and I concur — collaborative methods to add rigor and structure to tagging and other Web 2.0 techniques will be one of the next growth areas for the semantic Web. Under the leadership of the University of Southampton, Stanford University and the University of Karlsruhe, the Collaborative Knowledge Construction (CKC) Challenge has been designed to seek use and feedback on this new generation of semantic Web collaboration tools.
Anyone is welcome to register and participate during the challenge test period of April 16 – 30, with recognition going to the most active and most insightful testers. The candidate tools are:
Some of these tools are quite new and some I need to add to my Sweet Tools listing. The CKC Challenge Web site has nice write-ups, screen shots, and further information on these tools.
Results from the challenge will be discussed at the broader Workshop on Social and Collaborative Construction of Structured Knowledge at the 16th International World Wide Web Conference (WWW2007) in Banff, Canada, on May 8, 2007. As part of the general program, Jamie Taylor of Metaweb will also give an invited talk.
CKC Challenge participants do not need to attend in Banff to be eligible for recognition; all results and feedback will be made public by the Challenge organizers.
In response to my review last week of OpenLink‘s offerings, Kingsley Idehen posted some really cool examples of what extracted RDF looks like. Kingsley used the content of my review to generate this structure; he also provided some further examples using DBpedia (which I have also recently discussed).
What is really neat about these examples is how amazingly easy it is to create RDF, and the “value added” that results when you do so. Below, I again discuss structure and RDF (only this time more on why it is important), describe how to create it from standard Web content, and link to a couple of other recent efforts to structurize content.
Why is Structure Important?
The first generation of the Web, what some now refer to as Web 1.0, was document-centric. Resources or links almost always referred to the single Web page (or document) that displayed in your browser. Or, stated another way, the basic, most atomic unit of organizing information was the document. Though today’s so-called Web 2.0 has added social collaboration and tagging, search and display still largely occur in this document-centric mode.
Yet, of course, a document or Web page almost always refers to many entities or objects, often thousands, which are the real atomic units of information. If we could extract out these entities or objects — that is the structure within the document, or what one might envision as the individual Lego © bricks from which the document is constructed — then we could manipulate things at a more meaningful level, a more atomic and granular level. Entities now would become the basis of manipulation, not the more jumbled up hodge-podge of the broader documents. Some have called this more atomic level of object information the “Web of data,” others use “Web 3.0.” Either term is OK, I guess, but not sufficiently evocative in my opinion to explain why all of this stuff is important.
So, what does this entity structure give us, what is this thing I’ve been calling the structured Web?
First, let’s take simple search. One problem with conventional text indexing, the basis for all major search engines, is the ambiguity of words. For example, the term ‘driver‘ could refer to a printer driver, Big Bertha golf driver, NASCAR driver, a driver of screws, family car driver, a force for change, or other meanings. Entity extraction from a document can help disambiguate what “class” (often referred to as a “facet” when applied to search or browsing) is being discussed in the document (that is, its context) such that spurious search results can be removed. Extracted structure thus helps to filter unwanted search results.
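To make the ‘driver‘ example concrete, here is a minimal Python sketch of facet filtering. It assumes some entity extractor has already assigned each document one or more classes; the `Doc` type, the facet names and the sample documents are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    facets: frozenset  # classes assigned by a (hypothetical) entity extractor


def search(docs, term, facet=None):
    """Return titles of documents containing `term`, optionally limited to one facet."""
    term = term.lower()
    hits = [d for d in docs if term in d.text.lower()]
    if facet is not None:
        # The extracted class is what removes the spurious matches.
        hits = [d for d in hits if facet in d.facets]
    return [d.title for d in hits]


DOCS = [
    Doc("Printer setup", "Install the printer driver first.", frozenset({"software"})),
    Doc("Golf review", "The Big Bertha driver adds distance.", frozenset({"golf"})),
    Doc("Race recap", "The NASCAR driver led every lap.", frozenset({"motorsport"})),
]

print(search(DOCS, "driver"))                # plain text match: all three documents
print(search(DOCS, "driver", facet="golf"))  # facet-restricted: only the golf review
```

The text index alone cannot tell the three senses apart; the facet assigned at extraction time is what does the filtering.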
Extracted entities may also enable narrowing search requests to, say, only documents published in the last month, or from certain locations, or by certain authors, or from certain blogs, publishers or journals. Such retrieval options are not features of most search engines. Extracted structure thus adds new dimensions to characterize candidate results.
Second, in the structured Web, the basis of information retrieval and manipulation becomes the entity. Assembling all of the relevant information — irrespective of the completeness or location of source content sites — becomes easy. In theory, with a single request, we could collate the entire corpus of information about an entity, say Albert Einstein, from life history to publications to Nobel prizes to who links to these or any other relationship. The six degrees of Kevin Bacon would become child’s play. But sadly today, knowledge workers spend too much of their time assembling such information from disparate sources, with notable incompleteness, imprecision and inefficiency.
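A rough Python sketch of what "collating by entity" could look like, assuming each source has already been reduced to (subject, predicate, object) statements that share one identifier for the entity. The URI, the predicate names and the two sample sources are hypothetical.

```python
from collections import defaultdict


def collate(sources):
    """Merge (subject, predicate, object) statements from many sources
    into one record per subject, dropping exact duplicates."""
    records = defaultdict(lambda: defaultdict(set))
    for statements in sources:
        for subject, predicate, obj in statements:
            records[subject][predicate].add(obj)
    return records


EINSTEIN = "http://example.org/entity/Albert_Einstein"  # hypothetical identifier

site_a = [(EINSTEIN, "bornIn", "Ulm"), (EINSTEIN, "award", "Nobel Prize in Physics")]
site_b = [(EINSTEIN, "bornIn", "Ulm"), (EINSTEIN, "field", "Physics")]

record = collate([site_a, site_b])[EINSTEIN]
for predicate in sorted(record):
    print(predicate, sorted(record[predicate]))
```

The merge is trivial only because both sources use the same identifier for the entity, which is exactly what a shared URI provides.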
Third, the availability of such structured information makes for meaningful information display and presentation. One of the emerging exemplars of structured presentation (among others) is ZoomInfo. This, for example, is what a search on my name produces from ZoomInfo’s person search:
Granted, the listing is a bit out of date. But it is a structured view of what can be found in bits and pieces elsewhere about me, being built from about 50 contributing Web sources. And, the structure it provides in terms of name, title, employers, education, etc., is also more useful than a long page of results links with (mostly) unusable summary abstracts.
Presentations such as ZoomInfo’s will become common as we move to structured entity information on the Web as opposed to documents. And, we will see it for many more classes of entities beyond the categories of people, companies or jobs used by ZoomInfo. We are, for example, seeing such liftoff occurring in other categories of structured data within sources like DBpedia.
Fourth, we can get new types of mash-ups and data displays when this structure is combined, from calendars to tabular reports to timelines, maps, graphs of relatedness and topic clustering. We can also follow links and “explore” or “skate” this network of inter-relatedness, discovering new meanings and relationships.
And, fifth, where all of this is leading is, of course, the semantic Web. We will be able to apply description logics and draw inferences based on these relationships, resulting in the derivation of new information and connections not directly found in any of the atomic parts. However, note that much value still comes from the first four areas of the structured Web alone, achievable immediately, short of this full-blown semantic Web vision.
OK, So What is this RDF Stuff Again?
As my earlier DBpedia review described, RDF — Resource Description Framework — is the data representation model at the heart of these trends. It uses a “triple” of subject-predicate-object, as generally defined by the W3C’s standard RDF model, to represent these informational entities or objects. In such triples, subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. (You can think of subjects and objects as nouns, predicates as verbs, and even think of the triples themselves as simple Dick-and-Jane sentences from a beginning reader.)
Resources are given a URI (as may also be given to predicates or objects that are not specified with a literal) so that there is a single, unique reference for each item. (OK, so here’s a tip: the length and complexity of the URIs themselves make these simple triple structures appear more complicated than they truly are! ‘Dick‘ seems much more complicated when it is expressed as http://www.dick-is-the-subject-of-this-discussion.com/identity/dickResolver/OpenID.xml.)
These URI lookups can themselves be an individual assertion, an entire specification (as is the case, for example, when referencing the RDF or XML standards), or a complete or partial ontology for some domain or world-view. While the RDF data is often stored and displayed using XML syntax, that is not a requirement. Other RDF forms may include N3 or Turtle syntax, and variants or more schematic representations of RDF also exist.
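To show how little there is to the non-XML syntaxes, here is a small Python sketch that writes one statement about my OpenLink review post as a single line, with URIs in angle brackets, the literal in quotes and a closing period. A line of this simple shape is readable as Turtle. The helper names are my own, and the sketch ignores language tags, datatypes and most escaping rules.

```python
def term(value, is_uri):
    """Format one node: URIs in angle brackets, literals in escaped double quotes."""
    if is_uri:
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_triple_line(subject, predicate, obj, obj_is_uri):
    """Serialize one subject-predicate-object statement as a single line."""
    return f"{term(subject, True)} {term(predicate, True)} {term(obj, obj_is_uri)} ."


line = to_triple_line(
    "http://www.mkbergman.com/?p=355",
    "http://purl.org/dc/elements/1.1/title",
    "OpenLink Plugs the Gaps in the Structured Web",
    obj_is_uri=False,
)
print(line)
```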
Here are some sample statements (among a few hundred generated, see later) from my reference blog piece on OpenLink that illustrate RDF triples:
| Subject | Predicate | Object |
| http://www.mkbergman.com/?p=355 | http://purl.org/dc/terms/created | 2007-04-16T19:33:50Z |
| http://www.mkbergman.com/?p=355 | http://rdfs.org/sioc/ns#link | http://www.mkbergman.com/?p=355 |
| http://www.mkbergman.com/?p=355 | http://purl.org/dc/elements/1.1/title | OpenLink Plugs the Gaps in the Structured Web |
| http://www.mkbergman.com/?p=355 | http://rdfs.org/sioc/ns#topic | http://www.mkbergman.com/?cat=2 |
| http://en.wikipedia.org/wiki/SIOC | http://www.w3.org/2000/01/rdf-schema#label | SIOC |
The first four items have the post itself as the subject. The last statement is an entity referenced within my subject blog post. In all cases, the specific subjects of the triple statements are resources.
In all statements, the predicates point to reference URIs that precisely define the schema or controlled vocabularies used in that triple statement. For readability, such links are sometimes aliased, such as created at (time), links to, has title, is within topic, and has label, respectively, for the five example instances. These predicates form the edges or connecting lines between nodes in the conceptual RDF graph.
Lastly, note that the object, the other node in the triple besides the subject, may be either a URI reference or a literal. Depending on the literal type, the material can be full-text indexed (one triple, for example, may point to the entire text of the blog posting, while others point to each post image) or can be used to mash-up or display information in different display formats (such as calendars or timelines for date/time data or maps where the data refer to geo-coordinates).
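As a minimal Python sketch of that last point, the five sample statements can be held as plain tuples and each object classified so a display layer knows what to do with it. The classification rule here (anything starting with `http://` is a URI, anything that parses as a timestamp is a date/time) is a simplifying assumption of mine; a real RDF store records the node type explicitly.

```python
from datetime import datetime

POST = "http://www.mkbergman.com/?p=355"

TRIPLES = [
    (POST, "http://purl.org/dc/terms/created", "2007-04-16T19:33:50Z"),
    (POST, "http://rdfs.org/sioc/ns#link", "http://www.mkbergman.com/?p=355"),
    (POST, "http://purl.org/dc/elements/1.1/title",
     "OpenLink Plugs the Gaps in the Structured Web"),
    (POST, "http://rdfs.org/sioc/ns#topic", "http://www.mkbergman.com/?cat=2"),
    ("http://en.wikipedia.org/wiki/SIOC",
     "http://www.w3.org/2000/01/rdf-schema#label", "SIOC"),
]


def object_kind(obj):
    """Guess how an object node could be used: followed, put on a timeline, or indexed."""
    if obj.startswith(("http://", "https://")):
        return "uri"        # another node in the graph that can be explored
    try:
        datetime.strptime(obj, "%Y-%m-%dT%H:%M:%SZ")
        return "datetime"   # candidate for calendar or timeline views
    except ValueError:
        return "text"       # candidate for full-text indexing


for _, predicate, obj in TRIPLES:
    print(object_kind(obj), predicate)
```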
[Depending on provenance, source format, use of aliases, or other changes to make the display of triples more readable, it may at times be necessary to "dereference" what is displayed to obtain the URI values to trace or navigate the actual triple linkages. Dereferencing in this case means translating the displayed portion (the "reference") of a triple to its actual value and storage location, which means providing its linkable URI value. Note that literals are already actual values and thus not "dereferenced".]
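One common form of this translation is expanding a shortened, prefixed name back to its full URI. A minimal Python sketch follows; the namespace URIs are the ones used in the sample statements, while the short prefix labels and the `expand` helper are illustrative assumptions, not part of any particular tool.

```python
# Prefix labels are conventional shorthand chosen for this example.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(displayed):
    """Translate a displayed 'prefix:name' into its full URI.
    Full URIs and plain literals are returned unchanged."""
    prefix, sep, local = displayed.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return displayed


print(expand("sioc:topic"))  # a displayed alias becomes a linkable URI
print(expand("SIOC"))        # a literal is already an actual value
```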
The absolutely great thing about RDF is how well it lends itself through subsequent logic (not further discussed here) to map and mediate concepts from different sources into an unambiguous semantic representation [my 'glad' == (is the same as) your 'happy' OR my 'glad' is your 'glad']. Further, with additional structure (such as through RDF-S or the various dialects of OWL), drawing inferences and machine reasoning based on the data through more formal ontologies and description logics is also possible.
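The 'glad' and 'happy' mapping can be sketched in a few lines of Python. This is only an illustration of the idea of treating declared equivalences as one concept, using a simple union-find; it is not how any RDF reasoner is actually implemented, and the `mine:` and `yours:` labels are made up for the example.

```python
def build_canonical(same_as_pairs):
    """Given pairs of terms declared equivalent, return a function that maps
    any term to one representative of its equivalence group."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # shorten the chain as we walk it
            term = parent[term]
        return term

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Pick the alphabetically smaller root so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return find


canonical = build_canonical([("mine:glad", "yours:happy")])
print(canonical("yours:happy"))  # resolves to the same concept as mine:glad
print(canonical("mine:sad"))     # an unmapped term stands for itself
```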
How is This Structure Extracted?
The structure extraction necessary to construct an RDF “triple” is thus pivotal, and may require multiple steps. Depending on the nature of the starting content and the participation or not of the site publisher, there is a range of approaches.
Generally, the highest quality and richest structure occurs when the site publisher provides it. This can be done either through various APIs with a variety of data export formats, in which case various converters or translators to canonical RDF may be required by the consumers of that data, or in the direct provision of RDF itself. That is why the conversion of Wikipedia to RDF (done by DBpedia or System One with Wikipedia3) is so helpful.
I anticipate beyond Freebase that other sources, many public, will also become available as RDF or convertible with straightforward translators. We are at the cusp of a veritable explosion of such large-scale, high-quality RDF sources.
The next level of structure extractors are “RDFizers.” These extractors take other internal formats or metadata and convert them to RDF. Depending on the source, more or less structure may be extractable. For example, publishing a Web site with Dublin Core metadata or providing SIOC characterization for a blog (both of which I do for this blog site with available plugins, especially SIOC Plugin by Uldis Bojars or the Zotero COinS Metadata Exposer by Sean Takats), adds considerable structure automatically. For general listings of RDFizers, see my recent OpenLink review or the MIT Simile RDFizer site.
We next come to the general area of direct structure extraction, the least developed — but very exciting — area of gleaning RDF structure. There is a spectrum of challenges here.
At one end of the spectrum are documents or Web sources that are published much like regular data records. Like the ZoomInfo listing above or category-oriented sites such as Amazon, eBay or the Internet Movie Database (IMDb), information is already presented in a record format with labels and internal HTML structure useful for extraction purposes.
Most of the so-called Web “wrappers” or extractors (such as Zotero’s translators or the Solvent extractor associated with Simile’s Semantic Bank and Piggy Bank) evaluate a Web page’s internal DOM structure or use various regular expression filters and parsers to find and extract info from structure. It is in this manner, for example, that an ISBN number or price can be readily extracted from an Amazon book catalog listing. In general, such tools rely heavily on the partial structure within semi-structured documents for such extractions.
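Here is a minimal Python sketch of the regular-expression style of wrapper. The HTML fragment is invented for the example and is not Amazon's actual markup, and the patterns are deliberately simple; real wrappers also walk the DOM and need upkeep whenever a site changes its layout.

```python
import re

# Hypothetical catalog markup, made up for illustration.
SAMPLE_HTML = """
<div class="product">
  <h1>Example Book Title</h1>
  <li><b>ISBN-10:</b> 0123456789</li>
  <span class="price">$24.95</span>
</div>
"""

# Label, optional closing tag, then 10 to 17 characters of digits and hyphens.
ISBN_RE = re.compile(r"ISBN(?:-1[03])?:?\s*(?:</b>)?\s*([0-9][0-9-]{8,15}[0-9Xx])")
PRICE_RE = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")


def extract_listing(html):
    """Pull an ISBN and a price out of semi-structured HTML, if present."""
    isbn = ISBN_RE.search(html)
    price = PRICE_RE.search(html)
    return {
        "isbn": isbn.group(1).replace("-", "") if isbn else None,
        "price": float(price.group(1)) if price else None,
    }


print(extract_listing(SAMPLE_HTML))
```

The extraction works only because the page already labels its fields, which is the "partial structure" such tools depend on.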
The most challenging type of direct structure extraction is from unstructured documents. Approaches here use a family of possible information extraction (IE) techniques including named entity extraction (proper names and places, for example), event detection, and other structural patterns such as zip codes, phone numbers or email addresses. These techniques are most often applied to standard text, but newer approaches have emerged for images, audio and video.
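The "structural patterns" end of this family is the easiest to sketch. The Python below pulls email addresses, US-style phone numbers and US zip codes out of free text with regular expressions. The patterns and the sample sentence are illustrative assumptions and far looser than production IE; named entity extraction and event detection need much more than this.

```python
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),  # US-style numbers only
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),                   # US zip or zip+4
}


def extract_patterns(text):
    """Return every match for each named pattern found in unstructured text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


TEXT = "Contact Jane at jane.doe@example.org or (319) 555-0142, Iowa City, IA 52240."
print(extract_patterns(TEXT))
```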
While IE is the least developed of all structural extraction approaches, recent work is showing it to be possible to do so at scale with acceptable precision, and via semi-automated means. This is a key area of development with tremendous potential for payoff, since 80% to 85% of all content falls into this category.
Structure Extraction with OpenLink is a Snap
In the case of my own blog, I have a relatively well-known framework in WordPress with two resident plugins noted above that do automatic metadata creation in the background. With this minimum of starting material, Kingsley was able to produce two RDF extractions from my blog post using OpenLink’s Virtuoso Sponger (see earlier post). Sponger added to the richness of the baseline RDF extraction by first mapping to the SIOC ontology, followed then by mapping to all tags via the SKOS (Simple Knowledge Organization System) ontology and to all Web documents via the FOAF (friend-of-a-friend) ontology.
In the case of Kingsley’s first demo using the OpenLink RDF Browser Session (which gives a choice of browser, raw triples, SVG graph, Yahoo map or timeline views), you can do the same yourself for any URL with these steps:
It is that simple. You really should use the demo directly yourself.
But here is the graph view for my blog post (note we are not really mashing up anything in this example, so the RDF graph structure has few external linkages, resulting in an expected star aspect with the subject resource of my blog post at the center):
If you then switch to the raw triples view, and are actually working with the live demo, you can click on any URI link within a triple and get a JavaScript popup that gives you these further options:
The ‘Explore’ option on this popup enables you to navigate to that URI, which is often external, and then display its internal RDF triples rather than the normal page. In this manner, you can “skate” across the Web based on any of the linkages within the RDF graph model, navigating based on objects and their relationships and not documents.
The second example Kingsley provided for my write-up was the Dynamic Data Web Page. To create your own, follow these steps:
Again, it is that simple. Here is an example screenshot from this demo (a poor substitute for working with the live version):
This option also includes the ‘Explore’ popup.
These examples, plus other live demos frequently found on Kingsley’s blog (none of which requires more than your browser), show the power of RDF structuring and what can be done to view data and produce rich interrelationships “on the fly”.
Come, Join in the Fun
With the amount of RDF data now emerging, rapid events are occurring in viewing and posting such structure. Here are some options for you to join in the fun:
Michiel Hildebrand has begun a useful survey of Text-based Search on the Semantic Web, which actually is a broader survey of text search and indexing systems applied to this area.
The survey is a natural complement to those maintained, for example, by the W3C's ESW wiki or my own Sweet Tools.
The survey covers about 30 apps, fewer than in Sweet Tools, but with much greater detail in two tables including functionality, features, uses, search syntax supported, interfaces, paper references, applications, storage type and literal indexing engine.
Virtuoso and Related Tools String Together a Surprisingly Complete Score
NOTE: This is a lengthy post.
One of my objectives in starting the comprehensive Sweet Tools listing of 500+ semantic Web and -related tools was to surface hidden gems deserving of more attention. Partially to that end I created this AI3 blog’s Jewels and Doubloons award to recognize such treasures among the many offerings. The idea was and is that there is much of value out there; all we need to do is spend the time to find it.
I am now pleased to help better expose one of those treasures — in fact, a whole string of pearls — from OpenLink Software.
| The semantic Web through its initial structured Web expression is not about to come — it has arrived. Mark your calendar. And OpenLink’s offerings are a key enabler of this milestone. |
Though in existence since 1992 as a provider of ODBC middleware and then virtual databases, OpenLink has been active in the semantic Web space since at least 2003 [1]. The company was an early supporter of open source beginning with iODBC in 1999. But OpenLink’s decision to release its main product lines under dual-license open source in mid-2006 really marked a significant milestone for semantic Web toolsets. Though generally known mostly to the technology cognoscenti and innovators, these events now poise OpenLink for much higher visibility and prominence in the structured Web explosion now unfolding.
OpenLink’s current tools and technologies span the entire spectrum from RDF browsing and structure extraction to create RDF-enabled Web content, to format conversions and basic middleware operations, to ultimate data storage, query and management. (It is also important to emphasize that the company’s offerings support virtual databases, procedure hosting, and general SQL and XML data systems that provide value independent of RDF or the semantic Web.) There are numerous alternatives in other semantic Web tools at specific spots across this spectrum, most of which are also open source [2].
What sets OpenLink apart is the breadth and consistency of vision — correct in my view — that governs the company’s commitment and total offerings. It is this breadth of technology and responsiveness to emerging needs in the structured Web that signals to me that OpenLink will be a deserving player in this space for many years to come [3].
An Overview of OpenLink Products [4]
OpenLink provides a combination of open source and commercial software and associated services. While most attention in the remainder of this piece is given to its open source offerings, the company’s longevity obviously stems from its commercial success. And that success first built from its:
This basis in data federation and interoperability then led to a constant evolution to more complex data integration and application needs, resulting in major expansion of the company’s product lines:
| OpenLink Software Demos |
OpenLink has a number of useful (and impressive) online demos of capabilities. Here are a few of them, with instructions (if appropriate) on how best to run them:
|
Relation to the Overall Semantic Web
One remarkable aspect of the OpenLink portfolio is its support for nearly the full spectrum of semantic Web requirements. Without the need to consider tools from any other company or entity, it is now possible to develop, test and deploy real semantic Web instantiations today, and with no direct software cost. Sure, some of these pieces are rougher than others, and some are likely not the “best” as narrowly defined, and there remain some notable and meaningful gaps. (After all, the semantic Web and its supporting tools will be a challenge for years to come.) But it is telling about the actual status of the semantic Web and its readiness for serious attention that most of the tools to construct a meaningful semantic Web deployment are available today — from a single supplier [5].
We can see the role and placement of OpenLink tools within the broader context of the overall semantic Web. The following diagram represents the W3C’s overall development roadmap (as of early 2006), with the roles played by OpenLink components shown in the opaque red ovals. While this is not necessarily the diagram I would personally draw to represent the current structured Web, it does represent consensus of key players and does represent a comprehensive view of the space. By this measure, OpenLink components contribute in many, many areas and across the full spectrum of the semantic Web architecture (note the full-size diagram is quite large):
This status is no accident. It can be argued that OpenLink’s strong background in structured data, data federation and virtual databases and ODBC were excellent foundations. The company has also been focused on semantic Web issues and RDF support since at least 2003; we are now seeing the obvious fruits of these years of effort. As well, the early roots for some of the basic technology and approaches taken by the company extend back to Lisp and various AI (artificial intelligence) interests. One could also argue that perhaps a certain mindset comes from a self-defined role in “middleware” that lends itself to connecting the dots. And, finally, a vision and a belief in the centrality of data as an organizing principle for the next-generation Web has infused this company’s efforts from the beginning.
This placing of the scope of OpenLink’s offerings in context now allows us to examine some of the company’s individual “pearls” in more depth.
In Focus: Bringing Structure to the Web
No matter how much elegance and consistency of logic some may want to bring to the idea of the semantic Web, the fact will always remain that the medium is chaotic with multiple standards, motivations and statuses of development at work at any point in time. Do you want a consistent format, a consistent schema, a consistent ontology or world view? Forget it. It isn’t going to happen unless you can force it internally or with your suppliers (if you are a hegemonic enterprise, and, likely, not even then).
This real-world context has been a challenge for some semantic Web advocates. (I find it telling and amusing that the standard phrase for real-world applications is “in the wild”.) Many come from an academic background and naturally gravitate to theory or parsimony. Actually, there is nothing wrong with this; quite the opposite. What all of us now enjoy with the Internet would not have come about without the brilliance of understanding of simple and direct standards that early leaders brought to bear. My own mental image is that standards provide the navigation point on the horizon even if there is much tacking through the wind to get there.
But as theory meets practice the next layer of innovation comes from the experienced realists. And it is here that OpenLink (plus many others who recognize such things) provides real benefit. A commitment to data, to data federation, and an understanding of real world data formats and schemas “in the wild” naturally lead to a predilection to conversion and transformation. It may not be grand theory, but real practitioners are the ones who will lead the next phases with prejudices more to workability through such things as “pipeline” models or providing working code [6].
I’ve spoken previously about how RDF has emerged as the canonical storage form for the structured and semantic Web; I’m not sure this was self-evident to many observers up until recently. But it was evident to the folks at OpenLink, and — given their experience in heterogeneous data formats — they acted upon it.
Today, out of the box, you can translate the following formats and schema using OpenLink’s software alone. My guess is that the breadth of the table below would be a surprise to many current semantic Web researchers:
| Accepted Data Formats / Schemas | Query / Access / Transport Protocols | Output Formats |
|
|
(Note, some of these items may reside in multiple categories and thus have been somewhat arbitrarily placed.)
In addition, OpenLink is developing or in beta with these additional formats, application sources or standards: Triplr, Jena, WordPress, Drupal, Zotero (with its support for major citation schemas such as CSL, COinS, etc.), Relax NG, phpBB, MediaWiki, XBRL, and eCRM.
A critical piece in these various transformations is the new Virtuoso Sponger, formally released in version 5.0 (though portions had been in use for quite some time). Depending on the file or format type detected at ingest, Sponger may apply metadata extractors to binary files (a multitude of built-in extractors, Open Office, images, audio, video, etc., plus an API to add new ones), “cartridges” for mapping REST-style Web services and query languages (see above), or cartridges for mapping microformats or metadata or structure embedded in HTML (basic XHTML, eRDF, RDFa, GRDDL profiles, or other sources suitable for XSLT). There is also a UI for simply adding your own cartridges via the Virtuoso Conductor, the system administration console for Virtuoso.
Detection occurs at the time of content negotiation instigated by the retrieval user agent. Sponger first tests for RDF (including N3 or Turtle), then scans for microformats or GRDDL. (If it is GRDDL-based the associated XSLT is used; otherwise Virtuoso’s built-in XSLT processors are used. If it is a microformat, Virtuoso uses its own XSLT to transform and map to RDF.) The next fallback is scanning of the HTML header for different Web 2.0 types or RSS 1.1, RSS 2.0, Atom, etc. In these cases, the format (if supported) is transformed to RDF on the fly using other built-in XSLT processors (via an internal table that associates data sources with XSLT similar to GRDDL patterns but without the dependency on GRDDL profiles). Failing those tests, the scan then uses standard Web 1.0 rules to search in the header tags for metadata (typically Dublin Core) and transform to RDF; other HTTP response header data may also be transformed to RDF.
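The fallback order just described can be sketched as an ordered list of checks, shown below in Python. To be clear, this is my own illustration of the sequence and not Virtuoso's code: the function name, the strategy labels and the crude string tests used to detect each case are all assumptions, and the real Sponger works through cartridges and XSLT rather than substring matching.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


def choose_strategy(content_type, html=""):
    """Return the first applicable extraction strategy, most structured first."""
    media_type = content_type.split(";")[0].strip().lower()
    page = html.lower()
    checks = [
        ("rdf", media_type in RDF_TYPES),                  # already RDF
        ("grddl", GRDDL_PROFILE in page),                  # page names its own XSLT
        ("microformat", 'class="vcard"' in page or 'class="vevent"' in page),
        ("feed", any(t in page for t in FEED_TYPES)),      # RSS or Atom link in the header
        ("meta", '<meta name="dc.' in page),               # Dublin Core meta tags
    ]
    for name, matched in checks:
        if matched:
            return name
    return "http-headers"  # last resort: whatever the HTTP response itself reveals


print(choose_strategy("text/turtle; charset=utf-8"))
print(choose_strategy("text/html", '<meta name="DC.title" content="A post">'))
```

The point of the ordering is that richer, publisher-supplied structure is always preferred, with each later test recovering less.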
“Transform” as used above includes the fact that OpenLink generates RDF based on an internal mapping table that associates the data sources with schemas and ontologies. This mapping will vary depending on whether you are using Virtuoso with or without the ODS layer (see below). If using the ODS layer, OpenLink maps further to SIOC, SKOS, FOAF, AtomOWL, Annotea bookmarks, Annotea annotations, etc., depending on the data source [7]. Getting all “scrubbed” or “sponged” data from these Web sources into RDF enables fast and easy mash-ups and, of course, an appropriate canonical form for querying and inference. Sponger is also fully extensible.
The result of these capabilities is simple: from any URL, it is now possible to get a minimum of RDF, and often quite rich RDF. The bridge is now made between the Web and the structured Web. Though we are now seeing such transformation utilities or so-called “RDFizers” rapidly emerge, none to my knowledge offer the breadth of formats, relation to ontology structure, or ease of integration and extension as do these Sponger capabilities from OpenLink.
In Focus: OAT and Tools
Though the youngest of the major product releases from OpenLink, OAT (the OpenLink Ajax Toolkit) is also one of the most accessible and certainly flashiest of the company’s offerings. OAT is a JavaScript-based toolkit for browser-independent rich-Internet application development. It includes a robust set of standalone applications, database connectivity utilities, complete and supplemental UI (user interface) widgets and controls, an event management system, and supporting libraries. It works on a broad range of Ajax-capable web browsers in a standalone mode, and has notable on-demand library loading to reduce the total amount of downloaded JavaScript code.
OAT is provided as open source under the GPL license, and was first released in August 2006. It is now at version OAT 1.2 with the most recent release including integration of the iSPARQL QBE (see below) into the OAT Form Designer application. The project homepage is found at http://sourceforge.net/projects/oat; source code may be downloaded from http://sourceforge.net/projects/oat/files.
OAT is one of the first toolkits that fully conforms to the OpenAjax Alliance standards; see the conformance test page. OAT’s development team, led by Ondrej Zara, also recently incorporated the OAT data access layer as a module to the Dojo datastore.
OpenLink Ajax Toolkit (OAT) Overview
There are many Ajax toolkits, some with robust widget sets. What really sets OAT apart is its data access with breadth. This is very much in keeping with OpenLink’s data federation and middleware strengths. OAT’s Ajax database connectivity layer supports data binding to the following data source types:
Almost all of the OAT components are data aware, and thus have the same broad data format flexibilities. The table below shows the full listing of OAT components, with live links to their online demos, selectable from an expanding menu, which itself is an example of the OAT Panelbar (menu). (Notes: use demo / demo for username and password if requested; pick “DSN=Local_Instance” if presented with a dropdown that asks for ‘Connection String’):
|
Standalone Applications
Forms Designer [see below]
DB Designer [see below]
SQL QBE [see below]
iSPARQL [see below]
RDF Browser [see below]
|
Libraries
Ajax DB Connectivity
Mapping (with support for Google, Yahoo!, OpenLayers, Microsoft Virtual Earth)
|
Complete Widgets
|
Supplemental Widgets
Date Picker (calendar)
|
Not all of these components are supported on all major browsers; please refer to the OpenLink OAT compatibility chart for any restrictions.
The following screen shot shows one of the OAT complete widgets, the pie charting component:
One of the more complete widgets is the pivot table, which offers general Excel-level functionality including sorts, totals and subtotals, column and row switching, and the like:
Another one of the more interesting controls is the WebDAV Browser. WebDAV extends the basic HTTP protocols to allow moves, copies, file accesses and so forth on remote servers and in support of multiple authoring and versioning environments. OAT’s WebDAV Browser provides a virtual file navigation system across one or more Web-connected servers. Any resource that can be designated by a URI/IRI (the Internationalized Resource Identifier is a Unicode generalization of the Uniform Resource Identifier, or URI) can be organized and managed via this browser:
Again, there are online demos for each of the other standard widgets in the OAT toolkit.
The next subsections cover each of the major standalone applications contained in OAT; most are themselves combinations of multiple OAT components and widgets.
Forms Designer
The Forms Designer is the major UI design component within OAT. The full range of OAT widgets noted above — plus more basic controls such as labels, inputs, text areas, checkboxes, lines, URLs, images, tag clouds and others — can be placed by the designer into a savable and reusable composition. Links to data sources can be specified in any accessible network location with a choice of SQL, SPARQL or GData (and the various data forms associated with them as noted above).
The basic operation of the Forms Designer is akin to other standard RAD-type tools; the screenshot below shows a somewhat complicated data form pop-up overlaid on a map control (there is also a screencast showing this designer’s basic operations):
When completed, this design produces the following result. Another screencast shows the composite mash-up in action:
Because of the common data representation at the core of OAT (which comes from the OpenLink vision), creating these mash-ups is phenomenally easy, a simplicity that can be missed when viewing static examples. The real benefits become apparent when you make these mash-ups on your own.
Database (DB) Designer
The OAT Database (DB) Designer (also the basis for the Data Modeller) addresses a very different problem: laying out and designing a full database, including table structures and schemas. This is one of the truly unique applications within OAT, not shared (to my knowledge) by any other Ajax toolkit.
One starts by importing a schema using XMLA, by working from an existing template, or from scratch. Tables can be easily added, named and modified. As fields are added, it is easy to select data types, which are also color coded by type on the design palette. Key relations between the tables are shown automatically:
The result is a very easy, interactive data design environment. Of course, once completed, there are numerous options for saving or committing to actual data population.
SQL Query by Example
Once created and populated, the database is now available for querying and display. The SQL QBE (query-by-example) application within OAT is a fairly standard QBE interface. That may be a comfort to those used to such tools, but it has a more important role in being an exemplar for moving to RDF queries with SPARQL (see next):
OpenLink also provides a screencast for how to make these table linkages and to modify the results display with a now-common SQL format.
iSPARQL Query Builder
OpenLink’s analog to QBE for SQL applied to RDF is its visual iSPARQL Query Builder (SVG-based, which of course is also an XML format). We are now dealing with graphs, not tables. And we are dealing with SPARQL and triples, not standard two-dimensional relational constructs. Some might claim that with triples we are now challenging the ability of standard users to grasp semantic Web concepts. Well, I say, yeah, OK, but I really don’t see how a relational table and schema with its lousy names is any easier than looking at a graph with URLs as the labeled nodes. Look at the following and judge for yourself:
OK, it looks pretty foreign. (Therefore, this is the one demo you should really try.) In my opinion, the community is still groping for constructs, visuals and paradigms to make the use of SPARQL intuitive. Though OpenLink has a presentation as good as anyone’s, the challenge of finding a better paradigm remains out there.
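For readers who have not seen SPARQL in text form, it may help to remember that a visual builder of this kind ultimately produces a query made of triple patterns. The following is a minimal Python sketch of that idea; the `build_select` helper is hypothetical, the FOAF properties are simply a convenient illustration, and nothing here is actual iSPARQL output.

```python
def build_select(variables, patterns, limit=None):
    """Assemble a SPARQL SELECT query string from (s, p, o) triple patterns."""
    lines = ["SELECT " + " ".join("?" + v for v in variables), "WHERE {"]
    for s, p, o in patterns:
        # Each pattern is one edge of the query graph: subject, predicate, object.
        lines.append(f"  {s} {p} {o} .")
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Illustrative query: find people with a name and a weblog.
query = build_select(
    ["name", "blog"],
    [
        ("?person", "<http://xmlns.com/foaf/0.1/name>", "?name"),
        ("?person", "<http://xmlns.com/foaf/0.1/weblog>", "?blog"),
    ],
    limit=10,
)
print(query)
```

The shared variable `?person` is what ties the two patterns together, which is the textual counterpart of two edges leaving the same node in a graph drawing.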
RDF Browser
Direct inspection of RDF is also hard to describe and even harder to present visually. What does it mean to have data represented as triples that enables such things as graph (network) traversals or inference [8]? The broad approach that OpenLink takes for viewing RDF is via what it calls the RDF Browser. Actually, the browser presents a variety of RDF views, some in mash-up mode. The first view in the RDF Browser is for basic triples:
This particular example is quite rich, with nearly 4000 triples with many categories and structure (see full size view). Note the tabs in the middle of the screen; these are where the other views below are selected.
For example, that same information can also be shown in standard RDF “triples” view (subject-predicate-object):
The basic RDF structural relationships can also be shown in a graph view, itself with many settings for manipulating and viewing the data, with mouseovers showing the content of each graph node:
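As a rough illustration of what the triples and graph views are presenting, here is a small Python sketch that holds a handful of triples, turns them into a graph, and walks it. The `ex:` identifiers and the data are invented for the example, and the sketch says nothing about how the RDF Browser itself is implemented.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); all names are illustrative.
TRIPLES = [
    ("ex:mike", "foaf:name", "Mike"),
    ("ex:mike", "foaf:weblog", "ex:ai3"),
    ("ex:ai3", "sioc:container_of", "ex:post1"),
    ("ex:post1", "dc:title", "OpenLink review"),
]


def build_graph(triples):
    """Index triples by subject so each node lists its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def reachable(graph, start):
    """Breadth-first traversal: every node reachable from start, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, obj in graph.get(node, []):
            if obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


graph = build_graph(TRIPLES)
print(reachable(graph, "ex:mike"))
```

Literal values such as the name and the title appear here as leaf nodes, which is roughly how a graph view shows them when you mouse over a node.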
The data can also be used for “mash-ups”; in this case, plotting the subject’s location on a map:
Or, another mash-up shows the dates of his blog postings on a timeline:
Since the RDF Browser uses SVG, some of these views may be incompatible with IE (6 or 7) and Safari. Firefox (1.5+), Opera 9.x, WebKit (Open Source Safari), and Camino all work well.
These are early views about how to present such “linked” information. Of course, the usability of such interfaces is important and can make or break user acceptance (again, see [8]). OpenLink — and all of us in the space — has the challenge to do better.
Yet what I find most telling about this current OpenLink story is that the data and its RDF structure now fully exist to manipulate this data in meaningful ways. The mash-ups are as easy as A connects to B. The only step necessary to get all of this data working within these various views is simply to point the RDF Browser to the starting Web site’s URL. Now that is easy!
So, we are not fully there in terms of compelling presentation. But in terms of the structured Web, we are already there in surprising measure to do meaningful things with Web data that began as unstructured raw grist.
In Focus: Virtuoso
OK, so we just got some sizzle; now it is time for the steak. OpenLink Virtuoso is the core component of the company’s offerings. The diagram at the top of this write-up provides an architectural overview of this product.
Virtuoso can best be described as a complete deployment environment with its own broad data storage and indexing engine; a virtual database server for interacting with all leading data types, third-party database management systems (DBMSs) and Internet “endpoints” (data-access points); and a virtual Web and application server for hosting its own and external applications written in other leading languages. To my knowledge, this “universal server” is the first cross-platform product that implements Web, file, and database server functionality in a single package.
The Virtuoso architecture exposes modular tools that can be strung together in a very versatile information-processing pipeline. Via the huge variety of structure and data format transformations that the product supports (see above), the developer only need worry about interacting with Virtuoso’s canonical formats and APIs. The messy details of real-world diversities and heterogeneities are largely abstracted from view.
Data and application interactions occur through the system’s virtual or federated database server. This core provides internal storage and application facilities, the ability to transparently expose tables and views from external databases, and the capability of exposing external application logic in a homogeneous way [9]. The variety of data sources supported by Virtuoso can be efficiently joined in any number of ways in order to provide a cohesive view of disparate data from virtually any source and in any form.
Since storage is supported for unstructured, semi-structured and structured data in its various forms, applications and users have a choice of retrieval and query constructs. Free-text searching is provided for unstructured data, conventional documents and literal objects within RDF triples; SQL is provided for conventional structured data; and SPARQL is provided for RDF and -related graph data. These forms are also supplemented with a variety of Web service protocols for retrievals across the network.
The major functional components within Virtuoso are thus:
Virtuoso provides its own scripting language (VSP, similar to Microsoft’s ASP) and Web application scripting language (VSPX, similar to Microsoft’s ASPX or PHP) [12]. Virtuoso Conductor is an accompanying system administrator’s console.
Virtuoso currently runs on Windows (XP/2000/2003), Linux (Red Hat, SUSE), Mac OS X, FreeBSD, Solaris, and other UNIX variants (including AIX and HP-UX), on both 32- and 64-bit platforms. Exclusive of documentation, a typical install of Virtuoso application code is about 100 MB (with help, documentation and examples, about 300 MB).
The development of Virtuoso first began in 1998. A major re-write with its virtual aspects occurred in 2001. WebDAV was added in early 2004, and RDF support with release of an open source version in 2006. Additional description of the Virtuoso history is available, plus a basic product FAQ and comprehensive online documentation. The most recent version 5.0 was released in April 2007.
RDF and SPARQL Enhancements
For the purposes of the structured Web and the semantic Web, Virtuoso’s addition of RDF and SPARQL support in early 2006 was probably the product’s most important milestone. This update required an expansion of Virtuoso’s native database design to accommodate RDF triples and a mapping and transformation of SPARQL syntax to Virtuoso’s native SQL engine.
For those more technically inclined, OpenLink offers an online paper, Implementing a SPARQL Compliant RDF Triple Store using a SQL-ORDBMS, that provides the details of how RDF and its graph IRI structure were superimposed on a conventional relational table design [10].
Virtuoso’s approach adds the IRI as a built-in and distinct data type. Virtuoso’s ANY type then allows for a single-column representation of a triple’s object (o). Since the graph (g), subject (s) and predicate (p) are always IRIs, they are declared as such. Since an ANY value is a valid key part with a well-defined collation order between non-comparable data types, indices can be built using the object (o) of the triple.
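The idea of a "well-defined collation order between non-comparable data types" can be sketched in a few lines of Python. The type ranking below is purely an assumption made for illustration; the article does not describe Virtuoso's actual collation rules, and this is not its implementation.

```python
# Hypothetical rank per Python type; numbers sort before strings in this sketch.
TYPE_RANK = {int: 0, float: 0, str: 1}


def collation_key(value):
    """Sort key: order first by type rank, then by value within that type."""
    return (TYPE_RANK[type(value)], value)


# Mixed object values, as might sit in a single untyped 'o' column.
objects = ["Banff", 42, 3.5, "Annozilla", 7]

# Without the key, sorted(objects) raises TypeError because 42 < "Banff" is undefined.
index = sorted(objects, key=collation_key)
print(index)
```

Because every value maps to a key that can always be compared, the mixed column can be kept in sorted order, which is the property an index on the object column needs.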
While text indexing is not directly supported in SPARQL, Virtuoso easily added it as an option for selected objects. Virtuoso also adds other SPARQL extensions to deal both with the mapping and transformation to native SQL and for other requirements. Though type cast rules of SPARQL and SQL differ, Virtuoso deals with this by offering a special SQL compiler directive that enables efficient SPARQL translation without introducing needless extra type tests. Virtuoso also extends SPARQL with SQL-like aggregate and group by functionality. With respect to storage, Virtuoso allows a mix of storage options for graphs in a single table or graph-specific tables; in some cases, the graph component does not have to be written in the table at all. Work on a system for declaring storage formats per graph is ongoing.
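To give a feel for what mapping SPARQL onto a SQL engine involves, the sketch below stores quads in a single SQLite table and answers a two-pattern graph query with a self-join, followed by a SQL-style aggregate. The table name `rdf_quad`, the column letters (borrowed from the g, s, p, o notation above) and the data are assumptions for illustration; this is not Virtuoso's schema or the SQL it generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The 'o' column is left untyped, loosely echoing the single-column object idea.
conn.execute("CREATE TABLE rdf_quad (g TEXT, s TEXT, p TEXT, o)")
conn.executemany(
    "INSERT INTO rdf_quad VALUES (?, ?, ?, ?)",
    [
        ("ex:g1", "ex:alice", "foaf:name", "Alice"),
        ("ex:g1", "ex:alice", "foaf:weblog", "ex:blogA"),
        ("ex:g1", "ex:bob", "foaf:name", "Bob"),
        ("ex:g1", "ex:bob", "foaf:weblog", "ex:blogB"),
        ("ex:g1", "ex:carol", "foaf:name", "Carol"),
    ],
)

# SPARQL: SELECT ?name ?blog WHERE { ?x foaf:name ?name . ?x foaf:weblog ?blog }
# Each triple pattern becomes one alias of the quad table; the shared
# variable ?x becomes an equality join on the subject column.
join_sql = """
    SELECT t1.o AS name, t2.o AS blog
    FROM rdf_quad AS t1
    JOIN rdf_quad AS t2 ON t1.s = t2.s
    WHERE t1.p = 'foaf:name' AND t2.p = 'foaf:weblog'
    ORDER BY name
"""
rows = conn.execute(join_sql).fetchall()

# SQL-like aggregate over the same table: number of triples per predicate.
counts = conn.execute(
    "SELECT p, COUNT(*) FROM rdf_quad GROUP BY p ORDER BY p"
).fetchall()

print(rows)
print(counts)
```

Carol has a name but no weblog, so she drops out of the join, just as a SPARQL query with both patterns required would not return her.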
OpenLink provides an entire section on its RDF and SPARQL implementation in Virtuoso. In addition, there is an interactive SPARQL demo showing these capabilities at http://demo.openlinksw.com/isparql/.
Open Source Version and Differences
OpenLink released Virtuoso in an open source edition in mid-2006. According to Jon Udell at that time, to have “Virtuoso arrive on the scene as an open source implementation of a full-fledged SQL/XML hybrid out into the wild is a huge step because there just hasn’t been anything like that.” And, of course, now with RDF support and the Sponger, the open source uniqueness and advantages (especially for the semantic Web) are even greater.
Virtuoso’s open source home page is at http://virtuoso.openlinksw.com/wiki/main/ with downloads available from http://virtuoso.openlinksw.com/wiki/main/Main/VOSDownload.
Virtually all aspects of Virtuoso as described in this paper are available as open source. The one major component found in the commercial version, but not in the open source version, is the virtual database federation at the back-end, wherein a single call can access multiple database sources. This is likely extremely important to larger enterprises, but can be overcome in a Web setting via alternative use of inbound Web services or accessing multiple Internet “endpoints.”
Latest Release
The April 2007 version 5.0 release of Virtuoso demonstrates OpenLink’s commitment to continued improvements, especially in the area of the structured Web. There was a re-factoring of the database engine resulting in significant performance improvements. In addition, RDF-related enhancements included:
In Focus: Open Data Spaces (ODS)
With the emergence of Web 2.0 and its many collaborative frameworks such as blogs, wikis and discussion forums, OpenLink came to realize that each of these locations was a “data space” of meaningful personal and community content, but that the systems were isolated “silos.” Each space was unconnected to other spaces, the protocols and standards for communicating between the spaces were fractured or non-existent, and each had its own organizing framework, means for interacting with it, and separate schema (if it had one at all).
Since the Virtuoso platform itself was designed to overcome such differences, it became clear that an application layer could be placed over the core Virtuoso system that would break down the barriers to these siloed “data spaces,” while at the same time providing all varieties of spaces similar functionality and standards. Thus was started what became the OpenLink Data Space (or ODS) collaboration application, first released in open source in late 2006.
ODS is now provided as a packaged solution (in both open source and commercial versions) for use on either the Web or behind corporate firewalls, with security and single sign-on (SSO) capabilities. Mitko Iliev is OpenLink’s ODS program manager.
ODS is highly configurable and customizable. ODS has ten or so basic components, which can be included or not on a deployment-wide level, or by community, or by individual. Customizing the look and feel of ODS is also quite easy via CSS and the simple Virtuoso Web application scripting language, VSPX. (See here for examples of the language, which is based on XML.) A sample intro screen for ODS, largely as configured out of the box, is shown by this diagram:
Note the listing of ODS components on the menu bar across the top. These baseline ODS components are:
Here, for example, is the “standard” screen (again, modifiable if desired) when accessing an individual ODS community:
And, here is another example, this time showing the “standard” photo image gallery:
You can check out for yourself these various integrated components by going to http://myopenlink.net/ods/sfront.vspx.
There are subtle, but powerful, consistencies underlying this suite of components that I am amazed more people don’t talk about. While each individual component performs and has functionality quite similar to its standalone brethren, it is the shared functionality and common adherence to standards across all ODS components where the app really shines. All ODS components, for example, share these capabilities where it natively makes sense to the individual component:
This is a very powerful list, and it doesn’t end there. OpenLink has an API to extend ODS, which has not yet been published, but will likely be so in the near future.
So, with the use of ODS, you can immediately become “semantic Web-ready” in all of your postings and online activities. Or, as an enterprise or community leader, you can immediately connect the dots and remove artificial format, syntax and semantic barriers between virtually all online activities by your constituents. I mean, we’re talking here about hurdles that generally appear daunting, if not unsurmountable, that can be downloaded, installed and used today. Hello!?
Finally, OpenLink Data Spaces are provided as a standard inclusion with the open source Virtuoso.
Performance and Interoperability
I mean, what can one say? With all of this scope and power, why aren’t more people using OpenLink’s software? Why doesn’t the world better know about these capabilities? Is there some hidden flaw in all of this due to lack of performance or interoperability or something else?
I think it’s worth decomposing these questions from a number of perspectives because of what it may say about the state of the structured Web.
First, by way of disclaimer, I have not benchmarked OpenLink’s software v. other applications. There are tremendous offerings from other entities, many on my “to do” list for detailed coverage and perhaps hosannas. Some of those alternatives may be superior or better fits in certain environments.
Second, though, OpenLink is an experienced middleware vendor with data access and interoperability as its primary business. As a relatively small player, it can only compete with the big boys based on performance, responsiveness and/or cost. OpenLink understands this better than I; indeed, the company has both a legacy and reputation for being focused on benchmarks and performance testing.
I believe it is telling that the company has been an active developer and participant in performance testing standards, and is assiduous in posting performance results for its various offerings. This speaks to intellectual and technical integrity. For example, Orri Erling, the OpenLink Virtuoso program manager and lead developer, posts on a frequent basis on both how to improve standards and how OpenLink’s current releases stand up to them. The company builds and releases benchmarking utilities in open source. OpenLink is an active participant in the THALIA data integration benchmarking initiative and the LUBM semantic Web repository benchmark. While RDF performance and benchmarks are obviously still evolving, I think we can readily dismiss performance concerns as a factor limiting early uptake [11].
Third, as for functionality or scope, I honestly cannot point to any alternative that spans as broad a spectrum of structured Web requirements as does OpenLink. Further, the sheer scope of data formats and schemas that OpenLink transforms to RDF is the broadest I know. Indeed, breadth and scope are real technical differentiators for OpenLink and Virtuoso.
Fourth, while back-end operability with other triple stores such as Sesame, Jena or Mulgara/Kowari is not yet active with Virtuoso, there apparently has not yet been a demand for it, the ability to add it is clearly within OpenLink’s virtual database and data federation strengths, and no other option (commercial or open source) currently does so. We’re at the cusp of real interoperability demands, but are not yet quite there.
And, last, and this is very recent, we are only now beginning to see large and meaningful RDF datasets begin to appear for which these capabilities are naturally associated. Many have been active in these new announcements. OpenLink has taken an enlightened self-interest role in its active participation in the Open Data movement and in its strong support for the Linking Open Data effort of the SWEO (Semantic Web Education and Outreach) community of the W3C.
So, via a confluence of many threads intersecting at once I think we have the answer: The time was not ripe until now.
What we are seeing at this precise moment is the very forming of the structured Web, the first wave of the semantic Web visible to the general public. Offerings such as OpenLink’s with real and meaningful capabilities are being released; large structured datastores such as DBpedia and Freebase have just been announced; and the pundit community is just beginning to evangelize this new market and new phase for the Web. Though some prognosticators earlier pointed to 2007 as the breakthrough year for the semantic Web, I actually think that moment is now. (Did you blink?)
The semantic Web through its initial structured Web expression is not about to come — it has arrived. Mark your calendar. And OpenLink’s offerings are a key enabler of this most significant Internet milestone.
A Deserving Jewels and Doubloons Winner
OpenLink, its visionary head, Kingsley Idehen, and full team are providing real leadership, services and tools to make the semantic Web a reality. Providing such a wide range of capable tools, middleware and RDF database technology as open source means that many of the prior challenges of linking together a working production pipeline have now been met and are affordable. For these reasons, I gladly announce OpenLink Software as a deserving winner of the (highly coveted, but still cheesy!) AI3 Jewels & Doubloons award.
An AI3 Jewels & Doubloons Winner
[1] A general history of OpenLink Software and the Virtuoso product may be found at http://virtuoso.openlinksw.com/wiki/main/Main/VOSHistory.
[2] Most, if not all, of my reviews and testing in this AI3 blog focus on open source. That is because of the large percentage of such tools in the semantic Web space, plus the fact that I can install and test all comparable alternatives locally. Thus, there may indeed be better performing or more functional commercial software than what I typically cover.
[3] In terms of disclosure, I have no formal relationship with OpenLink Software, though I do informally participate with some OpenLink staff on some open source and open data groups of mutual interest. I am also actively engaged in testing the company’s products in relation to other products for my own purposes.
[4] An informative introduction to OpenLink and its CEO and founder, Kingsley Idehen, can be found in this April 28, 2006 podcast with Jon Udell, http://weblog.infoworld.com/udell/2006/04/28.html#a1437. Other useful OpenLink screencasts include:
[5] I am not suggesting that single-supplier solutions are unqualifiably better. Some components will be missing, and other third-party components may be superior. But it is also likely the case that a first, initial deployment can be up and running faster the fewer the interfaces that need to be reconciled.
[6] I have to say that Stefano Mazzocchi comes to mind with efforts such as Apache Cocoon and “RDFizers”.
[7] Thus, with ODS you get the additional benefit of having SIOC act as a kind of “gluing” ontology that provides a generic data model of Containers, Items, Item Types, and Associations / Relationships between Items. This approach ends up using RDF (via SIOC) to produce the instance data that makes the model conceptually concrete. (According to OpenLink, this is similar to what Apple formerly used in a closed form in the days of EOF and what Microsoft is grappling with in ADO.NET; it is also a similar model to what is used by Freebase, Dabble, Ning, Google Base, eBay, Amazon, Facebook, Flickr, and others who use Web services in combination or not with proprietary languages to expose data. Dave Beckett’s recently announced Triplr works in a similar manner.) This similarity of approach is why OpenLink can so readily produce RDF data from such services.
[8] I think these not-yet-fully formed constructs for interacting with RDF and using SPARQL are a legacy from the early semantic Web emphasis on “machines talking to machines via data”. While that remains a goal since it will eventually lead to autonomous agents that work in the background serving our interests, it is clear that we humans need to get intimately involved in inspecting, transforming and using the data directly. It is really only in the past year or three that workable user interfaces for these issues have even become a focus of attention. Paradoxically, I believe that sufficient semantic enabling of Web data to support a vision of intelligent and autonomous agents will only occur after we have innovated fun and intuitive ways for us humans to interact and manipulate that data. Whooping it up in the data playpen will also provide tremendous emphasis to bring still-further structure to Web content. These are actually the twin challenges of the structured Web.
[9] Stored procedures first innovated for SQL have been abstracted to support applications written in most of the leading programming languages. Virtuoso presently supports stored procedures or classes for SQL, Java, .NET (Mono), Perl, Python and Ruby.
[10] Many relational databases have been used for storing RDF triples and graphs. Also dedicated non-relational approaches, such as using bitmap indices as primary storage medium for triples, have been implemented. At present, there is no industry consensus on what constitutes the optimum storage format and set of indices for RDF.
[11] The W3C has a very informative article on RDF and traditional SQL databases; also, there is a running tally of large triple store scaling.
[12] Thanks to Ted Thibodeau, Jr, of the OpenLink staff, “This is basically true, but incomplete. In its Application Server role, Virtuoso can host runtime environments for PHP, Perl, Python, JSP, CLR, and others — including ASP and ASPX. Some of these are built-in; some are handled by dynamically loading local library resources. One of the bits of magic here — IIS is not required for ASP or ASPX hosting. Depending on the functionality used in the pages in question, Windows may not be necessary, either. CLR hosting may mandate Windows and Microsoft's .NET Frameworks, because Mono remains an incomplete implementation of the CLI.” Very cool.

If Everyone Could Find These Tools, We’d All be Better Off
About a month ago I announced my Jewels & Doubloons awards for innovative software tools and developments, most often ones that may be somewhat obscure. In the announcement, I noted that our modern open-source software environment is:
“… literally strewn with jewels, pearls and doubloons — tremendous riches based on work that has come before — and all we have to do is take the time to look, bend over, investigate and pocket those riches.”
That entry begged the question of why this value is often overlooked in the first place. If we know it exists, why do we continue to miss it?
The answers to this simple question are surprisingly complex. The question is one I have given much thought to, since the benefits from building off of existing foundations are manifest. I think the reasons for why we so often miss these valuable, and often obscure, tools of our profession range from ones of habit and culture to weaknesses in today’s search. I will take this blog’s Sweet Tools listing of 500 semantic Web and -related tools as an illustrative endpoint of what a complete listing of such tools in a given domain might look like (not all of which are jewels, of course!), including the difficulties of finding and assembling such listings. Here are some reasons:
Search Sucks — A Clarion Call for Semantic Search
I recall late in 1998 when I abandoned my then-favorite search engine, AltaVista, for the new Google kid on the block and its powerful PageRank innovation. But that was tens of billions of documents ago, and I now find all the major search engines to again be suffering from poor results and information overload.
Using the context of Sweet Tools, let’s pose some keyword searches in an attempt to find one of the specific annotation tools in that listing, Annozilla. We’ll also assume we don’t know the name of the product (otherwise, why search?). We’ll also use multiple search terms, and since we know there are multiple tool types in our category, we will also search by sub-categories.
In a first attempt using ‘annotation software mozilla’, we do not find Annozilla in the first 100 results. We try adding more terms, such as ‘annotation software mozilla “semantic web”’, and again it is not in the first 100 results.
Of course, this is a common problem with keyword searches when specific terms or terms of art may not be known or when there are many variants. However, even if we happened to stumble upon one specific phrase used to describe Annozilla, “web annotation tool”, while we do now get a Google result at about position #70, it is also not for the specific home site of interest:
Now, we could have entered annozilla as our search term, assuming somehow we now knew it as a product name, which does result in getting the target home page as result #1. But, because of automatic summarization and choices made by the home site, even that description is also a bit unclear as to whether this is a tool or not:
Alternatively, had we known more, we could have searched on Annotea Mozilla and gotten pretty good results, since that is what Annozilla is, but that presumes a knowledge of the product we lack.
Standard search engines actually now work pretty well in helping to find stuff for which you already know a lot, such as the product or company name. It is when you don’t know these things that the weaknesses of conventional search are so evident.
Frankly, were our content to be specified by very minor amounts of structure (often referred to as “facets”) such as product and category, we could cut through this clutter quickly and get to the results we wanted. Better still, if we could also specify only listings added since some prior date, we could also limit our inspections to new tools since our last investigation. It is this type of structure that characterizes the lightweight Exhibit database and publication framework underlying Sweet Tools itself, as its listing for Annozilla shows:
The limitations of current unstructured search grow daily as Internet content volumes grow.
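The kind of faceted lookup described in this section (by product, category and date added) is easy to sketch. The Python below is a minimal illustration only: the records, the category names and the dates are invented, and the `facet_search` function is hypothetical rather than anything taken from Exhibit or Sweet Tools.

```python
from datetime import date

# Hypothetical records; the fields mirror the facets named in the text.
TOOLS = [
    {"product": "Annozilla", "category": "annotator", "added": date(2007, 1, 15)},
    {"product": "ExampleStore", "category": "database", "added": date(2007, 3, 2)},
    {"product": "ExampleTagger", "category": "annotator", "added": date(2007, 4, 10)},
]


def facet_search(tools, category=None, added_since=None):
    """Return product names matching every facet that was supplied."""
    results = []
    for tool in tools:
        if category is not None and tool["category"] != category:
            continue
        if added_since is not None and tool["added"] < added_since:
            continue
        results.append(tool["product"])
    return results


# All annotation tools, then only those added since the last visit.
print(facet_search(TOOLS, category="annotator"))
print(facet_search(TOOLS, category="annotator", added_since=date(2007, 2, 1)))
```

Two small pieces of structure are enough to find the tool without knowing its name, which is exactly what the keyword searches above could not do.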
We Don’t Know Where to Look
The lack of semantic search also relates to the problem of not knowing where to look, and derives from the losing trade-offs of keywords v. semantics and popularity v. authoritativeness. If, for example, you look for Sweet Tools on Google using “semantic web” tools, you will find that the Sweet Tools listing only appears at position #11 with a dated listing, even though arguably it has the most authoritative listing available. This is because there are more popular sites than the AI3 site, Google tends to cluster multiple site results using the most popular — and generally, therefore, older (250 v. 500 tools in this case!) — page for that given site, and the blog title is used in preference to the posting title:
Semantics is another issue. It is important, in part, because you might enter the search term product or products or software or applications, rather than ‘tools’, which is the standard description for the Sweet Tools site. The current state of keyword search is to sometimes allow plural and singular variants, but not synonyms or semantic variants. The searcher must thus frame multiple queries to cover all reasonable prospects. (If this general problem is framed as one of the semantics for all possible keywords and all possible content, it appears quite large. But remember, with facets and structure it is really those dimensions that best need semantic relationships, which is a more tractable problem than the entire content.)
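To show why limiting semantic relationships to the facet values is the more tractable problem, here is a small Python sketch that maps a searcher's term onto a canonical facet value. The synonym table and the `canonical_term` function are assumptions made for illustration; a real system would draw on a richer vocabulary.

```python
# Hypothetical synonym table covering a single facet value.
SYNONYMS = {
    "tools": {"tool", "tools", "product", "products", "software", "applications"},
}


def canonical_term(term, synonyms=SYNONYMS):
    """Map a search term to its canonical facet value, or return it unchanged."""
    term = term.lower()
    for canonical, variants in synonyms.items():
        if term in variants:
            return canonical
    return term


for query in ["Software", "products", "annotation"]:
    print(query, "->", canonical_term(query))
```

Only the handful of terms that name a facet need entries in the table; the rest of the content can stay as ordinary text.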
We Don’t Have Time
Faced with these standard search limits, it is easy to claim that repeated searches and the time involved are not worth the effort. And, even if somehow we could find those obscure candidate tools that may help us better do our jobs, we still need to evaluate them and modify them for our specific purposes. So, as many claim, these efforts are not worth our time. Just give me a clean piece of paper and let me design what we need from scratch. But this argument is total bullpucky.
Yes, search is not as efficient as it should be, but our jobs involve information, and finding it is one of our essential work skills. Learn how to search effectively.
The time spent in evaluating leading candidates is also time well spent. Studying code is one way to discern a programming answer. Absent such evaluation, how does one even craft a coded solution? No matter how you define it, anything but the most routine coding tasks requires study and evaluation. Why not use existing projects as the learning basis, in addition to books and Internet postings? If, in the process, an existing capability is found upon which to base needed efforts, so much the better.
The excuse of not enough time to look for alternatives is, in my experience, one of laziness and attitude, not a measured evaluation of the most effective use of time.
Concern Over the Viral Effects of Certain Open Source Licenses
Enterprises, in particular, have legitimate concerns about the potential “viral” effects of mixing certain open-source licenses such as GPL with licensed proprietary software or internally developed code. Enterprise developers have a professional responsibility to understand such issues.
That being said, my own impression is that many open-source projects understand these concerns and are moving to more enlightened mix-and-match licenses such as Eclipse, Mozilla or Apache. Also, in virtually any given application area, there is a choice of open-source tools with a diversity of licensing terms. And, finally, even for licenses with commercial restrictions, many tools can still be valuable for internal, non-embedded applications or as sources for code inspection and learning.
Though the license issue is real when it comes to formal deployment and requires understanding of the issues, the fact that some open source projects may have some use limitations is no excuse to not become familiar with the current tools environment.
We Don’t Live in the Right Part of the World
Actually, I used to pooh-pooh the idea that one needed to be in one of the centers of software action — say, Silicon Valley, Boston, Austin, Seattle, Chicago, etc. — in order to be effective and on the cutting edge. But I have come to embrace a more nuanced take on this. There is more action and more innovation taking place in certain places on the globe. It is highly useful for developers to be a part of this culture. General exposure, at work and the watering hole, is a great way to keep abreast of trends and tools.
However, even if you do not work in one of these hotbeds, there are still means to keep current; you just have to work at it a bit harder. First, you can attend relevant meetings. If you live outside of the action, that likely means travel on occasion. Second, you should become involved in relevant open source projects or other dynamic forums. You will find that any time you need to research a new application or coding area, the greater familiarity you have with the general domain, the easier it will be for you to get current quickly.
We Have Not Been Empowered to Look
Dilbert, cubes and big bureaucracies aside, while it may be true that some supervisors are clueless and may not do anything active to support tools research, that is no excuse. Workers may wait until they are “empowered” to take initiative; professionals, in the true sense of the word, take initiative naturally.
Granted, it is easier when an employer provides the time, tools, incentives and rewards for its developers to stay current. Such enlightened management is a hallmark of adaptive and innovative organizations. And it is also the case that if your organization is not supporting research aims, it may be time to get that résumé up to date and circulated.
But knowledge workers today should also recognize that responsibility for professional development and advancement rests with them. It is likely all of us will work for many employers, perhaps even ourselves, during our careers. It is really not that difficult to find occasional time in the evenings or the weekend to do research and keep current.
If It’s Important, Become an Expert
One of the attractions of software development is the constantly advancing nature of its technology, which is truer than ever today. Technology generations are measured in the range of five to ten years, meaning that throughout an expected professional lifetime of, say, about 50 years, you will likely need to remake yourself many times.
The “experts” of each generation generally start from a clean slate and also re-make themselves. How do they do so and become such? Well, they embrace the concept of lifelong learning and understand that expertise is solely a function of commitment and time.
Each transition in a professional career, not to mention the needs that arise in between, requires getting familiar with the tools and techniques of the field. Even if search tools were perfect and some “expert” out there had assembled the best listing of tools available, those tools can all be better characterized and understood.
It’s Mindset, Melinda!
Actually, look at all of the reasons above. They all are based on the premise that we have completely within our own lights the ability and responsibility to take control of our careers.
In my professional life, which I don’t think is all that unusual, I have been involved in a wide diversity of scientific and technical fields and pursuits, most often at some point being considered an “expert” in a part of that field. The actual excitement comes from the learning and the challenges. If you are committed to what is new and exciting, there is much room for open-field running.
The real malaise to avoid in any given career is to fall into the trap of “not enough time” or “not part of my job.” The real limiter to your profession is not time; it is mindset. And, fortunately, that is totally within your control.
Gathering in the Riches
Since each new generation builds on prior ones, your time spent learning and becoming familiar with the current tools in your field will establish the basis for that next change. If more of us had this attitude, the ability for each of us to leverage whatever already exists would be greatly improved. The riches and rewards are waiting to be gathered.