Posted:March 19, 2007

Jewels & Doubloons II: Why Do We Miss this Stuff?

Jewels & Doubloons

If Everyone Could Find These Tools, We’d All be Better Off

About a month ago I announced my Jewels & Doubloons Jewels & Doubloons awards for innovative software tools and developments, most often ones that may be somewhat obscure. In the announcement, I noted that our modern open-source software environment is:

“… literally strewn with jewels, pearls and doubloons — tremendous riches based on work that has come before — and all we have to do is take the time to look, bend over, investigate and pocket those riches.”

That entry begged the question of why this value is often overlooked in the first place. If we know it exists, why do we continue to miss it?

The answers to this simple question are surprisingly complex. The question is one I have given much thought to, since the benefits from building off of existing foundations are manifest. I think the reasons for why we so often miss these valuable, and often obscure, tools of our profession range from ones of habit and culture to weaknesses in today’s search. I will take this blog’s Sweet Tools listing of 500 semantic Web and -related tools as an illustrative endpoint of what a complete listing of such tools in a given domain might look like (not all of which are jewels, of course!), including the difficulties of finding and assembling such listings. Here are some reasons:

Search Sucks — A Clarion Call for Semantic Search

I recall late in 1998 when I abandoned my then-favorite search engine, AltaVista, for the new Google kid on the block and its powerful Page Ranking innovation. But that was tens of billions of documents ago, and I now find all the major search engines to again be suffering from poor results and information overload.

Using the context of Sweet Tools, let’s pose some keyword searches in an attempt to find one of the specific annotation tools in that listing, Annozilla. We’ll also assume we don’t know the name of the product (otherwise, why search?). We’ll also use multiple search terms, and since we know there are multiple tool types in our category, we will also search by sub-categories.

In a first attempt using annotation software mozilla, we do not find Annozilla in the first 100 results. We try adding more terms, such as annotation software mozilla “semantic web”, and again it is not in the first 100 results.

Of course, this is a common problem with keyword searches when specific terms or terms of art may not be known or when there are many variants. However, even if we happened to stumble upon one specific phrase used to describe Annozilla, “web annotation tool”, while we do now get a Google result at about position #70, it is also not for the specific home site of interest:

Google Results - 1

Now, we could have entered annozilla as our search term, assuming somehow we now knew it as a product name, which does result in getting the target home page as result #1. But, because of automatic summarization and choices made by the home site, even that description is also a bit unclear as to whether this is a tool or not:

Google Results - 2

Alternatively, had we known more, we could have searched on Annotea Mozilla and gotten pretty good results, since that is what Annozilla is, but that presumes a knowledge of the product we lack.

Standard search engines actually now work pretty well in helping to find stuff for which you already know a lot, such as the product or company name. It is when you don’t know these things that the weaknesses of conventional search are so evident.

Frankly, were our content to be specified by very minor amounts of structure (often referred to as “facets”) such as product and category, we could cut through this clutter quickly and get to the results we wanted. Better still, if we could also specify only listings added since some prior date, we could also limit our inspections to new tools since our last investigation. It is this type of structure that characterizes the lightweight Exhibit database and publication framework underlying Sweet Tools itself, as its listing for Annozilla shows:

Structured Results

The limitations of current unstructured search grow daily as Internet content volumes grows.

We Don’t Know Where to Look

The lack of semantic search also relates to the problem of not knowing where to look, and derives from the losing trade-offs of keywords v. semantics and popularity v. authoritativeness. If, for example, you look for Sweet Tools on Google using “semantic web” tools, you will find that the Sweet Tools listing only appears at position #11 with a dated listing, even though arguably it has the most authoritative listing available. This is because there are more popular sites than the AI3 site, Google tends to cluster multiple site results using the most popular — and generally, therefore, older (250 v. 500 tools in this case!) — page for that given site, and the blog title is used in preference to the posting title:

Google Sweet Tools

Semantics are another issue. It is important, in part, because you might enter the search term product or products or software or applications, rather than ‘tools‘, which is the standard description for the Sweet Tools site. The current state of keyword search is to sometimes allow plural and single variants, but not synonyms or semantic variants. The searcher must thus frame multiple queries to cover all reasonable prospects. (If this general problem is framed as one of the semantics for all possible keywords and all possible content, it appears quite large. But remember, with facets and structure it is really those dimensions that best need semantic relationships — a more tractable problem than the entire content.)

We Don’t Have Time

Faced with these standard search limits, it is easy to claim that repeated searches and the time involved are not worth the effort. And, even if somehow we could find those obscure candidate tools that may help us better do our jobs, we still need to evaluate them and modify them for our specific purposes. So, as many claim, these efforts are not worth our time. Just give me a clean piece of paper and let me design what we need from scratch. But this argument is total bullpucky.

Yes, search is not as efficient as it should be, but our jobs involve information, and finding it is one of our essential work skills. Learn how to search effectively.

The time spent in evaluating leading candidates is also time well spent. Studying code is one way to discern a programming answer. Absent such evaluation, how does one even craft a coded solution? No matter how you define it, anything but the most routine coding tasks requires study and evaluation. Why not use existing projects as the learning basis, in addition to books and Internet postings? If, in the process, an existing capability is found upon which to base needed efforts, so much the better.

The excuse of not enough time to look for alternatives is, in my experience, one of laziness and attitude, not a measured evaluation of the most effective use of time.

Concern Over the Viral Effects of Certain Open Source Licenses

Enterprises, in particular, have legitimate concerns in the potential “viral” effects of mixing certain open-source licenses such as GPL with licensed proprietary software or internally developed code. Enterprise developers have a professional responsibility to understand such issues.

That being said, my own impression is that many open-source projects understand these concerns and are moving to more enlightened mix-and-match licenses such as Eclipse, Mozilla or Apache. Also, in virtually any given application area, there is also a choice of open-source tools with a diversity of licensing terms. And, finally, even for licenses with commercial restrictions, many tools can still be valuable for internal, non-embedded applications or as sources for code inspection and learning.

Though the license issue is real when it comes to formal deployment and requires understanding of the issues, the fact that some open source projects may have some use limitations is no excuse to not become familiar with the current tools environment.

We Don’t Live in the Right Part of the World

Actually, I used to pooh-pooh the idea that one needed to be in one of the centers of software action — say, Silicon Valley, Boston, Austin, Seattle, Chicago, etc. — in order to be effective and on the cutting edge. But I have come to embrace a more nuanced take on this. There is more action and more innovation taking place in certain places on the globe. It is highly useful for developers to be a part of this culture. General exposure, at work and the watering hole, is a great way to keep abreast of trends and tools.

However, even if you do not work in one of these hotbeds, there are still means to keep current; you just have to work at it a bit harder. First, you can attend relevant meetings. If you live outside of the action, that likely means travel on occasion. Second, you should become involved in relevant open source projects or other dynamic forums. You will find that any time you need to research a new application or coding area, that the greater familiarity you have with the general domain the easier it will be for you to get current quickly.

We Have Not Been Empowered to Look

Dilbert, cubes and big bureaucracies aside, while it may be true that some supervisors are clueless and may not do anything active to support tools research, that is no excuse. Workers may wait until they are “empowered” to take initiative; professionals, in the true sense of the word, take initiative naturally.

Granted, it is easier when an employer provides the time, tools, incentives and rewards for its developers to stay current. Such enlightened management is a hallmark of adaptive and innovative organizations. And it is also the case that if your organization is not supporting research aims, it may be time to get that resume up to date and circulated.

But knowledge workers today should also recognize that responsibility for professional development and advancement rests with them. It is likely all of us will work for many employers, perhaps even ourselves, during our careers. It is really not that difficult to find occasional time in the evenings or the weekend to do research and keep current.

If It’s Important, Become an Expert

One of the attractions of software development is the constantly advancing nature of its technology, which is truer than ever today. Technology generations are measured in the range of five to ten years, meaning that throughout an expected professional lifetime of say, about 50 years, you will likely need to remake yourself many times.

The “experts” of each generation generally start from a clean slate and also re-make themselves. How do they do so and become such? Well, they embrace the concept of lifelong learning and understand that expertise is solely a function of commitment and time.

Each transition in a professional career — not to mention needs that arise in-between — requires getting familiar with the tools and techniques of the field. Even if search tools were perfect and some “expert” out there had assembled the best listing of tools available, they can all be better characterized and understood.

It’s Mindset, Melinda!

Actually, look at all of the reasons above. They all are based on the premise that we have completely within our own lights the ability and responsibility to take control of our careers.

In my professional life, which I don’t think is all that unusual, I have been involved in a wide diversity of scientific and technical fields and pursuits, most often at some point being considered an “expert” in a part of that field. The actual excitement comes from the learning and the challenges. If you are committed to what is new and exciting, there is much room for open-field running.

The real malaise to avoid in any given career is to fall into the trap of “not enough time” or “not part of my job.” The real limiter to your profession is not time, it is mindset. And, fortunately, that is totally within your control.

Gathering in the Riches

Since each new generation builds on prior ones, your time spent learning and becoming familiar with the current tools in your field will establish the basis for that next change. If more of us had this attitude, the ability for each of us to leverage whatever already exists would be greatly improved. The riches and rewards are waiting to be gathered.

Schema.org Markup

headline:
Jewels & Doubloons II: Why Do We Miss this Stuff?

alternativeHeadline:

author:

image:

description:
If Everyone Could Find These Tools, We’d All be Better Off About a month ago I announced my Jewels & Doubloons awards for innovative software tools and developments, most often ones that may be somewhat obscure. In the announcement, I noted that our modern open-source software environment is: “… literally strewn with jewels, pearls and […]

articleBody:
see above

datePublished:

One thought on “Jewels & Doubloons II: Why Do We Miss this Stuff?

  1. I agree with you that “Enterprise developers have a professional responsibility to understand” licensing issues. However, I cannot agree with your characterization of the GPL in contrast to more “enlightened” licenses.

    many open-source projects . . . are moving to more enlightened mix-and-match licenses such as Eclipse, Mozilla or Apache

    For many who use and promote the GPL, it is the more enlightened license because it guarantees to those who contribute that they will benefit from improvements to their work. In contrast, proprietary software is the more “viral” because combining it with, say, BSD code removes all future improvements from the commons. From a practical perspective, what GPL’s guarantee achieves is more diverse contributions. From Steven Weber’s The Success of Open Source:

    BSD-style projects typically rest with a small team of developers who together write almost all the code for the project. . . . There is nothing to stop an outside user from submitting code to the core team; but in most BSD-style projects, there is no regularized process for doing that. The BSD model is open source because the source code is free. But as an ideal type, it is not vitally collaborative on a very large scale, in the sense that Linux is. (pp. 62-63)

    So while licenses without the so-called “viral” clause may appear more convenient for businesses producing proprietary software, they are not necessarily as effective at attracting contributions – never mind the ethical considerations that motivate many to advocate and produce free software.

    Finally, I find it ironic, given your implication that licenses like the GPL have unwanted “commercial restrictions”, that you use a noncommercial Creative Commons license for this blog post (the GPL explicitly allows commercial use, and Richard Stallman has criticized this restriction in Creative Commons licenses) – yet also includes a “viral” “share-alike” provision just like the GPL.

Leave a Reply

Your email address will not be published. Required fields are marked *