Posted: March 19, 2007

Jewels & Doubloons

If Everyone Could Find These Tools, We’d All be Better Off

About a month ago I announced my Jewels & Doubloons awards for innovative software tools and developments, often somewhat obscure ones. In the announcement, I noted that our modern open-source software environment is:

“… literally strewn with jewels, pearls and doubloons — tremendous riches based on work that has come before — and all we have to do is take the time to look, bend over, investigate and pocket those riches.”

That entry raised the question of why this value is overlooked in the first place. If we know it exists, why do we continue to miss it?

The answers to this simple question are surprisingly complex. It is a question I have given much thought to, since the benefits of building off existing foundations are manifest. I think the reasons we so often miss these valuable, and often obscure, tools of our profession range from habit and culture to weaknesses in today’s search. I will take this blog’s Sweet Tools listing of 500 semantic Web and related tools as an illustrative endpoint of what a complete listing of such tools in a given domain might look like (not all of which are jewels, of course!), including the difficulties of finding and assembling such listings. Here are some reasons:

Search Sucks — A Clarion Call for Semantic Search

I recall late in 1998 when I abandoned my then-favorite search engine, AltaVista, for the new Google kid on the block and its powerful PageRank innovation. But that was tens of billions of documents ago, and I now find all the major search engines to again be suffering from poor results and information overload.

Using the context of Sweet Tools, let’s pose some keyword searches in an attempt to find one of the specific annotation tools in that listing, Annozilla. We’ll assume we don’t know the name of the product (otherwise, why search?), so we’ll use multiple search terms and, since we know there are multiple tool types in our category, also search by sub-categories.

In a first attempt using annotation software mozilla, we do not find Annozilla in the first 100 results. We try adding more terms, such as annotation software mozilla “semantic web”, and again it is not in the first 100 results.

Of course, this is a common problem with keyword searches when specific terms or terms of art may not be known or when there are many variants. And even if we happen to stumble upon one specific phrase used to describe Annozilla, "web annotation tool", we only get a Google result at about position #70, and it is not for the specific home site of interest:

Google Results - 1

Now, we could have entered annozilla as our search term, assuming somehow we knew it as a product name, which does return the target home page as result #1. But, because of automatic summarization and choices made by the home site, even that description is a bit unclear as to whether this is a tool at all:

Google Results - 2

Alternatively, had we known more, we could have searched on Annotea Mozilla and gotten pretty good results, since that is what Annozilla is, but that presumes a knowledge of the product we lack.

Standard search engines actually now work pretty well in helping to find stuff for which you already know a lot, such as the product or company name. It is when you don’t know these things that the weaknesses of conventional search are so evident.

Frankly, were our content specified by very minor amounts of structure (often referred to as "facets") such as product and category, we could cut through this clutter quickly and get to the results we wanted. Better still, if we could specify only listings added since some prior date, we could limit our inspections to tools new since our last investigation. It is this type of structure that characterizes the lightweight Exhibit database and publication framework underlying Sweet Tools itself, as its listing for Annozilla shows:

Structured Results
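To make the idea concrete, here is a minimal JavaScript sketch of faceted filtering over a few hypothetical Sweet Tools-style records. The field names, categories and dates are illustrative only and are not Exhibit's actual data model; the point is simply that two or three structured dimensions do the work that a pile of keyword guesses cannot:

    // A handful of hypothetical, structured tool records (illustrative values only).
    var sweetTools = [
      { name: "Annozilla",  category: "Annotator",                      dateAdded: "2006-08-14" },
      { name: "Jena",       category: "Programming Environment",        dateAdded: "2006-09-02" },
      { name: "Piggy Bank", category: "Browser (RDF, OWL or semantic)", dateAdded: "2007-01-20" }
    ];

    // Keep only records whose facet values match; ISO-style dates compare correctly as strings.
    function filterByFacets(records, facets) {
      var matches = [];
      for (var i = 0; i < records.length; i++) {
        var r = records[i];
        if ((!facets.category || r.category === facets.category) &&
            (!facets.addedSince || r.dateAdded >= facets.addedSince)) {
          matches.push(r);
        }
      }
      return matches;
    }

    // "Show me annotation tools added since my last check" -- one query, no keyword guessing.
    var hits = filterByFacets(sweetTools, { category: "Annotator", addedSince: "2006-06-01" });
    // hits -> [ { name: "Annozilla", ... } ]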

The limitations of current unstructured search grow daily as Internet content volume grows.

We Don’t Know Where to Look

The problem of not knowing where to look is related to the lack of semantic search, and derives from the losing trade-offs of keywords v. semantics and popularity v. authoritativeness. If, for example, you look for Sweet Tools on Google using "semantic web" tools, you will find that the Sweet Tools listing only appears at position #11, and with a dated listing, even though it is arguably the most authoritative listing available. This is because there are more popular sites than the AI3 site; Google tends to cluster multiple results from a site under its most popular — and generally, therefore, older (250 v. 500 tools in this case!) — page; and the blog title is used in preference to the posting title:

Google Sweet Tools

Semantics are another issue, important in part because you might enter the search term product or products or software or applications, rather than 'tools', which is the standard description for the Sweet Tools site. The current state of keyword search is to sometimes allow plural and singular variants, but not synonyms or semantic variants. The searcher must thus frame multiple queries to cover all reasonable prospects. (If this general problem is framed as one of the semantics for all possible keywords and all possible content, it appears quite large. But remember, with facets and structure it is really those dimensions that most need semantic relationships — a more tractable problem than the entire content.)
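As a rough sketch of that last point (hypothetical names throughout, not anything the Sweet Tools site actually runs), a small synonym map over facet values is enough to let product, software or applications all resolve to the site's canonical "tools" facet. Reconciling a handful of facet values is a far smaller job than reconciling all possible content:

    // Map a searcher's term onto the site's canonical facet value (illustrative only).
    var facetSynonyms = {
      "product": "tools", "products": "tools",
      "software": "tools",
      "application": "tools", "applications": "tools"
    };

    function canonicalFacet(term) {
      var t = term.toLowerCase();
      return facetSynonyms[t] || t;   // fall back to the term itself if no synonym is known
    }

    canonicalFacet("Applications");   // -> "tools"
    canonicalFacet("tools");          // -> "tools"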

We Don’t Have Time

Faced with these standard search limits, it is easy to claim that repeated searches and the time involved are not worth the effort. And, even if somehow we could find those obscure candidate tools that may help us better do our jobs, we still need to evaluate them and modify them for our specific purposes. So, as many claim, these efforts are not worth our time. Just give me a clean piece of paper and let me design what we need from scratch. But this argument is total bullpucky.

Yes, search is not as efficient as it should be, but our jobs involve information, and finding it is one of our essential work skills. Learn how to search effectively.

The time spent in evaluating leading candidates is also time well spent. Studying code is one way to discern a programming answer. Absent such evaluation, how does one even craft a coded solution? No matter how you define it, anything but the most routine coding task requires study and evaluation. Why not use existing projects as the learning basis, in addition to books and Internet postings? If, in the process, an existing capability is found upon which to base needed efforts, so much the better.

The excuse of not enough time to look for alternatives is, in my experience, one of laziness and attitude, not a measured evaluation of the most effective use of time.

Concern Over the Viral Effects of Certain Open Source Licenses

Enterprises, in particular, have legitimate concerns about the potential "viral" effects of mixing certain open-source licenses such as the GPL with licensed proprietary software or internally developed code. Enterprise developers have a professional responsibility to understand such issues.

That being said, my own impression is that many open-source projects understand these concerns and are moving to more enlightened mix-and-match licenses such as Eclipse, Mozilla or Apache. In virtually any given application area, there is also a choice of open-source tools with a diversity of licensing terms. And, finally, even for licenses with commercial restrictions, many tools can still be valuable for internal, non-embedded applications or as sources for code inspection and learning.

Though the license issue is real when it comes to formal deployment and requires understanding, the fact that some open source projects carry use limitations is no excuse not to become familiar with the current tools environment.

We Don’t Live in the Right Part of the World

Actually, I used to pooh-pooh the idea that one needed to be in one of the centers of software action — say, Silicon Valley, Boston, Austin, Seattle, Chicago, etc. — in order to be effective and on the cutting edge. But I have come to embrace a more nuanced take on this. There is more action and more innovation taking place in certain places on the globe. It is highly useful for developers to be a part of this culture. General exposure, at work and the watering hole, is a great way to keep abreast of trends and tools.

However, even if you do not work in one of these hotbeds, there are still means to keep current; you just have to work at it a bit harder. First, you can attend relevant meetings. If you live outside of the action, that likely means travel on occasion. Second, you should become involved in relevant open source projects or other dynamic forums. You will find that, any time you need to research a new application or coding area, the greater your familiarity with the general domain, the easier it is to get current quickly.

We Have Not Been Empowered to Look

Dilbert, cubes and big bureaucracies aside, while it may be true that some supervisors are clueless and may not do anything active to support tools research, that is no excuse. Workers may wait until they are “empowered” to take initiative; professionals, in the true sense of the word, take initiative naturally.

Granted, it is easier when an employer provides the time, tools, incentives and rewards for its developers to stay current. Such enlightened management is a hallmark of adaptive and innovative organizations. And it is also the case that if your organization is not supporting research aims, it may be time to get that resume up to date and circulated.

But knowledge workers today should also recognize that responsibility for professional development and advancement rests with them. It is likely all of us will work for many employers, perhaps even ourselves, during our careers. It is really not that difficult to find occasional time in the evenings or the weekend to do research and keep current.

If It’s Important, Become an Expert

One of the attractions of software development is the constantly advancing nature of its technology, which is truer than ever today. Technology generations are measured in the range of five to ten years, meaning that throughout an expected professional lifetime of, say, 50 years, you will likely need to remake yourself many times.

The "experts" of each generation generally start from a clean slate and re-make themselves. How do they do so? Well, they embrace the concept of lifelong learning and understand that expertise is solely a function of commitment and time.

Each transition in a professional career — not to mention needs that arise in between — requires getting familiar with the tools and techniques of the field. Even if search tools were perfect and some "expert" out there had assembled the best listing of tools available, those tools could always be better characterized and understood.

It’s Mindset, Melinda!

Actually, look at all of the reasons above. They are all based on the premise that the ability and responsibility to take control of our careers lie completely within our own power.

In my professional life, which I don’t think is all that unusual, I have been involved in a wide diversity of scientific and technical fields and pursuits, most often at some point being considered an “expert” in a part of that field. The actual excitement comes from the learning and the challenges. If you are committed to what is new and exciting, there is much room for open-field running.

The real malaise to avoid in any given career is to fall into the trap of “not enough time” or “not part of my job.” The real limiter to your profession is not time, it is mindset. And, fortunately, that is totally within your control.

Gathering in the Riches

Since each new generation builds on prior ones, your time spent learning and becoming familiar with the current tools in your field will establish the basis for that next change. If more of us had this attitude, the ability for each of us to leverage whatever already exists would be greatly improved. The riches and rewards are waiting to be gathered.

Posted: March 15, 2007

Bill Aron's The Scribe, from http://www.puckergallery.com

OK, you lurkers. You know who you are. You hang out on open source forums, learning, gleaning, scheming . . . . You want to dive in, make contributions, but the truth is that the key developers on the project are really quite remarkable and have programming skills you can’t touch. And, so, you lurk, and maybe occasionally comment.

On the other hand, you are likely a user and implementer. And you probably share with me the observation, which I make frequently here and elsewhere, that one of the biggest weaknesses of most open source projects is their (relative) lack of documentation.

Maybe you’re also no great shakes as a programmer. But you can write! (Actually, overcoming the fear of writing is a lower threshold than overcoming the fear of contributing code to an open source project.) I don’t know if this picture is true for you, but it is true for me.

But there’s hope and there’s a role waiting for you. If you begin contributing documentation to a project, it won’t be long before you start discovering some valuable secrets:

Secret #1. Those developers you admire so much hate to document and will thank you for it. It is actually the rare case where a great developer is also a great writer and communicator.

Secret #2. While you’re afraid of actually touching the code, getting into the app to write about how to use it or its nuances will bring you up the learning chain tremendously! Just as it is a truism that to learn a subject one needs to teach it, to learn about software code one should document it.

Secret #3. If, like me, you are not a natural programmer, then influencing those who are able to tackle some of the development issues of key concern to you is perhaps a more effective use of time than a direct frontal assault on the code itself. Writing documentation is one way to perhaps play to your greater strengths while still making a valuable contribution.

Secret #4. Committed developers behind every worthwhile open source project are typically creative and innovative. It is always rewarding to interact with smart, creative people. Starting to become a documentation cog within a broader open source wheel is personally and socially rewarding.

Finally, the total scope of documentation required by a project includes user support and responses on user forums. If you can pick up some of the slack answering questions from newbies, you will also be doing the overall project a favor by freeing up valuable developer time.

Every open source project worth its promise has way, way more that needs to be done, and that its community desires, than hands to do it. You don’t have to be a world-class code jockey to make a meaningful contribution. So, lurkers and writers unite! Roll up your sleeves and get that quill wet.

Posted by Mike Bergman on March 15, 2007 at 3:50 pm in Open Source, Software Development | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/348/lurkers-and-writers-unite/
The URI to trackback this post is: https://www.mkbergman.com/348/lurkers-and-writers-unite/trackback/
Posted: March 11, 2007

Sweet Tools Listing

AI3’s SemWeb Tools Survey is Now Largely Completed

This AI3 blog maintains Sweet Tools, the largest listing of about 800 semantic Web and -related tools available. Most are open source. Click here to see the current listing!

It has taken nearly six months, but I believe my survey of existing semantic Web and related tools is now largely complete. While new tools will certainly be discovered and new ones are constantly being developed (which, I also believe, is at an accelerating pace), I think the existing backlog has largely been found. Though important gaps remain, today’s picture is one of a surprisingly robust tools environment for promoting semantic Web objectives.

Growth and Tools Characterization

My most recent update of Sweet Tools, also published today, now lists 500 semantic Web and related tools and is in its 8th version. Starting with the W3C’s listing of 70 tools, first referenced in August 2006, I have steadily found and added to the listing. Until now, the predominant source of growth in these listings has come through discovery of extant tools.

In its earliest versions, my search strategy very much focused on all topics directly related to the “semantic Web.” However, as time went on, I came to understand the importance of many ancillary tool sets to the entire semantic Web pipeline (such as language processing and information extraction) and came to find whole new categories of pragmatic tools that embodied semantic Web and data mediation processes but which did not label themselves as such. This latter category has been an especially rich vein to mine, with notable contributions from the humanities, biology and the physical sciences.

But the pace of discovery is now approaching its asymptote. Though I by no means believe I have comprehensively found all extant tools, I do believe that most new tools in future listings will come more from organic growth and new development than discovery of hidden gems. So, enjoy!

My view of what is required for the semantic Web vision to reach some degree of fruition begins with uncharacterized content, which then proceeds through a processing pipeline ultimately resulting in the storage of RDF triples that can be managed at scale. By necessity, such a soup-to-nuts vision embraces tools and requirements that, individually, might not constitute semantic technology strictly defined, but is nonetheless an integral part of the overall pipeline. By (somewhat arbitrary) category, here is the breakdown of the current listing of 500 tools:

No. of Tools / Category
43 Information Extraction
32 Ontology (general)
30 Parser or Converter
29 Composite App/Framework
29 Database/Datastore
26 Annotator
25 Programming Environment
23 Browser (RDF, OWL or semantic)
23 Language Processor
22 Reasoner/Inference Engine
22 Wiki- or blog-related
22 Wrapper (Web data extractor)
20 RDF (general)
19 Search Engine
15 Visualization
13 Query Language or Service
11 Ontology Mapper/Mediator
9 Ontology Editor
8 Data Language
8 Validator
6 NOT ACTIVE (???)
5 Semantic Desktop
4 Harvester
3 Description or Formal Logics
3 RDF Editor
2 RDF Generator
48 Miscellaneous
500 (Total)

I find it amusing that the diversity and sources of such tool listings — importantly, including what is properly in the domain or not — are themselves an interesting example of the difficulties facing semantic mediation and resolution. Alas, such is the real world.

Java is the Preferred Language

There are interesting historical trends as well in the primary development languages used for these tools. Older ones rely on C or C++ or, if they are logic or inference oriented, on the expected languages of Prolog or LISP.

One might be tempted to say that Java is the language of the semantic Web with about 50% of all tools, especially the more developed and prominent ones that embrace a broader spectrum of needs, but I’m not so sure. I’m seeing a trend in the more recently announced tools to use JavaScript or Ruby — these deserve real attention. And while the P languages (Perl, PHP, Python) are also showing some strength, it is not clear that this is anything specific to semantic Web needs but a general reflection of standard Web trends.

So, here is the listing of Sweet Tools apps by primary development language:

Sweet Tools Languages

About half of all apps are written in Java. The next most prevalent language is JavaScript, at 13%, roughly twice the share of the next leading choices, C/C++ and PHP, at about 6% each. As might be expected, the "major" apps are more likely to be written in Java or the C languages; user interface emphases tend to occur in the P languages; browser extensions or add-ons are generally in JavaScript; and logic applications are often in Lisp or Prolog.

An Alternative Simple Listing

I have also created and will maintain a simple listing of Sweet Tools that presents all 500 tools on a single page with live links and each tool’s category. This listing gives users a single access point and compensates for the fact that the Exhibit presentation is based on JavaScript, which virtually no search engine indexes adequately.

Posted by Mike Bergman on March 11, 2007 at 7:15 pm in Semantic Web Tools | Comments (4)
The URI link reference to this post is: https://www.mkbergman.com/347/listing-of-500-semantic-web-and-related-tools/
The URI to trackback this post is: https://www.mkbergman.com/347/listing-of-500-semantic-web-and-related-tools/trackback/
Posted: March 9, 2007

Firebug is hardly a secret weapon for aiding serious JavaScript development, but effectively learning what it is and how it works has been somewhat of a secret. That is, until now.

Firebug is a Firefox add-on that integrates development tools to edit, debug, and monitor CSS, HTML, and JavaScript live in any Web page. You can inspect and edit HTML, tweak CSS and styles, visualize CSS metrics with DOM highlighting, monitor page and JavaScript loading, debug and profile JavaScript, execute JavaScript on the fly and log its functions, quickly find errors, and explore the DOM. There is no more powerful tool to explain to you what is going on within your browser.
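As a small taste of what that means in practice, the lines below exercise a few of the console and debugging hooks Firebug understands (the variables and function are placeholders of my own; see the Firebug documentation for the full API):

    // Hypothetical values, just to have something to log and time.
    var user = "demo";
    var items = [1, 2, 3];
    function renderPage() { for (var i = 0; i < 100000; i++) { /* busy work */ } }

    // Log values, with printf-style substitution, to the Firebug console.
    console.log("loaded %d items for %s", items.length, user);

    // Time a block of code; timeEnd() reports the elapsed milliseconds.
    console.time("render");
    renderPage();
    console.timeEnd("render");

    // Drop into the Firebug script debugger at this statement.
    debugger;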

But the problem to date has been discovering and understanding many of the rich aspects of this tool. Firebug’s innovative developer, Joe Hewitt, presented the tool to Yahoo! developers about a month ago in this 48-minute screencast:

Joe Hewitt introduces Firebug to a developer audience at Yahoo!.

Source: Yahoo! Video (Flash) | download.yahoo.com (MP4; recommended)

This video provides a real power user’s tour of the recently released version 1.0 of Firebug. It is a must-see for Web developers. The download is advisable to better see Joe’s screen examples.

Joe is a Mozilla developer who also wrote the original Mozilla DOM Inspector, and a co-founder of Parakey, which I profiled in an earlier post.

Oh, by the way: Firebug is a most deserving winner of the AI3 Jewels & Doubloons award.

An AI3 Jewels & Doubloons Winner

Posted by Mike Bergman on March 9, 2007 at 11:40 am in Software Development | Comments (2)
The URI link reference to this post is: https://www.mkbergman.com/345/firebug-from-the-horses-mouth/
The URI to trackback this post is: https://www.mkbergman.com/345/firebug-from-the-horses-mouth/trackback/
Posted: February 26, 2007

Image courtesy of www.webdesign.org

To Reach Its Full Potential, JavaScript Collaboration Needs New Approaches and New Tools

This posting begins from the perspective: I’ve found a nifty JavaScript code base and would like to help extend it. How do I understand the code and where do I begin? It then takes that perspective and offers open source projects and developers some guidance on tools and approaches that might facilitate code collaboration.

Rich Internet applications, browser extensions and add-ons, and the emergence of sophisticated JavaScript frameworks and libraries such as Dojo, YUI and Prototype (among dozens of others, see here for a listing) are belying the language’s past reputation as a “toy.” These very same developments are increasingly leading to open source initiatives around these various sophisticated frameworks and applications. But there can be major barriers to open source collaboration with JavaScript.

JavaScript works great for single developers or single embedded functions. It also is easily malleable with the use of obfuscation, minifying (including white-space reductions), and short variable names to keep the scripts delivered by the server small and opaque. Yet as more JavaScript initiatives become ever more complex and seek the involvement of outside parties, these factors and the nature of the language itself work against easy code understanding.
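A tiny before-and-after illustration (contrived, but representative) shows why a minified script delivered to the browser teaches an outside reader almost nothing:

    // As the developer wrote it:
    function calculateTotal(prices, taxRate) {
      var subtotal = 0;
      for (var i = 0; i < prices.length; i++) {
        subtotal += prices[i];
      }
      return subtotal * (1 + taxRate);
    }

    // As typically delivered after minification -- same behavior, nothing left to learn from:
    function c(a,b){var t=0;for(var i=0;i<a.length;i++){t+=a[i]}return t*(1+b)}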

Open source projects are often notorious for poor, even non-existent, documentation. (I very much like Mike Pope’s statement, "If it isn’t documented, it doesn’t exist.") But the documentation curse is magnified for JavaScript. As Doug Crockford notes, the general state of standard language documentation and guide books is poor. Developers also often learn from others’ code examples, and there is much in the way of JavaScript snippets, especially historical ones, that is atrocious. Even widely adopted frameworks, such as Prototype, have for years been nearly undocumented, a shortcoming now being addressed. Rarely (in fact I know of none for major open source packages) do UML class or package diagrams exist for larger JS apps or frameworks. And, because much JavaScript is delivered to the browser as scripts from the server, embedded code comments are of generally poor quality or totally non-existent. All of these factors translate into the general absence of API documentation for larger apps (though most of the leading frameworks and libraries such as Prototype, YUI and Dojo are now beginning to provide professional documentation).

Some of the JavaScript language trade-offs and decisions hark back to early standards tensions between Microsoft and Netscape (and Sun). Some of JavaScript’s challenges arise from the very design of the language. The prototypal nature of inheritance in the language makes it difficult to trace class hierarchies, or even to discern class structure from standard functions via machine translation or interpreter inspection. Again, for embedded JS or small functions, this is not a general limitation. But as apps grow complex, indeed as a result of the inherent extensibility of the language itself, an undocumented prototypal structure can become a limitation to structural understanding. Also, because the language is loosely typed, it can often be difficult to discern intended datatypes. These language characteristics make it difficult to reverse engineer the code (see below) and therefore to tease out structure or round-trip the code with advanced IDEs.
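The short sketch below, written in the prototypal style common to many frameworks of the day (the names are invented for illustration), shows why: the "class" hierarchy exists only as runtime assignments, so there is little for a static analyzer or reverse-engineering tool to latch onto:

    // "Classes" are ordinary functions; the hierarchy is wired up at runtime.
    function Widget(id) {
      this.id = id;                       // nothing declares what type id should be
    }
    Widget.prototype.render = function () { /* ... */ };

    function Dialog(id, title) {
      Widget.call(this, id);              // a "super" call, by convention only
      this.title = title;
    }
    Dialog.prototype = new Widget();      // inheritance established by assignment, not declaration
    Dialog.prototype.open = function () { /* ... */ };

    // Methods can be bolted on anywhere, long after the "class" is defined.
    Dialog.prototype.close = function () { /* ... */ };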

Larger apps may often mix JavaScript with other languages, such as C++ or Java, in an attempt to segregate responsibilities in the code base according to perceived language strengths. Indeed, with the exception of some more recent complete frameworks such as the Tibco General Interface, most larger apps have mixed code bases. In these designs, open source contributors need to understand multiple languages, and the language interfaces can cause their own challenges and bottlenecks. Alternatively, more structured languages such as Java can be used for initial development, with the code then being translated to JS, as is the case with Rhino or the Google Web Toolkit (GWT). Again, though, external contributors may need multiple language skills to contribute productively.

Thus, some of JavaScript’s challenges to open source collaboration are inherent to the language or structural, others are due to inadequate best practices. I discuss below possible tools in three areas — code documentation, code profiling or reverse engineering, and code diagramming — that could aid a bit in overcoming some of these obstacles.

Understand the Code’s Structure and Model Architecture

In any approach to a new code base, especially one that is complex with many thousands of lines of code and multiple files, it is often useful to start with a visual overview of the architectural structure. Preferably, this understanding includes both the static structure (via class, component or package UML diagrams or equivalent) and data flows through the app. Component views of the file structure are one useful starting point for understanding the physical relationships among the JS, HTML, XHTML or XUL, and possible data files. A next desirable view would be to run the code base through a code visualizer such as MIT’s Relo, which has the usefulness of enabling interactive and gradual discovery and presentation of the class structure of a code base.

“Reverse engineering,” or alternatively round-tripping of code with CASE tools or IDEs, has a fairly rich history, with much of the academic work in computer science occurring in Europe. For structured languages such as Java or C#, there are many reverse engineering tools that will enable rapid construction of UML diagrams. There are some experimental reverse engineering approaches for Web sites and Web pages, including ReversiXML and WebUml, but these are limited to HTML with no JavaScript support and are not production quality. In fact, my investigations have discovered no tools capable for reverse engineering JavaScript code in any manner, let alone to UML.

This perhaps should not be all that surprising given the loose structure of the language. While Ruby is also a loosely typed language and there is a code visualizer for Rails, its operation apparently very much relies on the structure within the Rails scaffolding. And, while one might think that techniques used elsewhere for information extraction or data mining applied to unstructured data could also be adapted to infer the much greater structure in JavaScript, any such inferences still involve a considerable degree of guesswork according to Vineet Sinha, Relo’s developer.

So, what other automated options short of reverse engineering to UML are there for gaining this structural understanding of the code base? Frankly, not many.

One commercial product is JavaScript Reporter. It performs syntax checking, variable type and flow analysis on individual JavaScript files, and adds type analysis across files for a group of related JavaScript files. It does not produce program flowcharts or UML. While one can process JavaScript with the commercial Code Visual 2 Flowchart, the code analysis is limited to a single function at a time (not a complete file let alone multiple files) and the output is limited to a flowchart (no UML) in either a BMP or PNG bitmap format.

The Dojo project is moving fairly aggressively with its JS Linker project, the purpose of which is to process HTML/JavaScript code bases to prepare code for deployment: reducing file size, creating source code documentation, obfuscating source code to protect intellectual property, and gathering source code metrics for analysis and improvement. This effort may evolve to create the basis for UML generation from code, but it is still in the pre-release phases and its potential applicability to non-Dojo libraries is not known.

According to one review, "Reverse engineering is successful if the cost of extracting information about legacy systems plus effort to arrive [at] specifications is less than the cost of starting from scratch." If code generation or reverse engineering is essential, one choice could be to work with either Rhino or GWT directly against the Java source, prior to JavaScript generation. Besides the Java reverse engineering tools above, GWT Designer is a commercial visual designer and bi-directional code generator that is tailored specifically to GWT and Ajax applications. Of course, such an approach would require a commitment to Java development at the time of project initiation.

When All Else Fails, Go Manual

Thus, apparently no automatic UML diagramming tools exist that can be driven from existing JavaScript. So, what is an open source project to do that wants to use UML or architectural diagrams to promote external contributions?

The first thing a large, open source JavaScript project should do is document and annotate its file structure. It is surprising, but in my investigations of relevant JavaScript applications, only Zotero has provided such file documentation (though finding the page itself is somewhat hard). (There are likely others that provide file documentation of which I am not aware, but the practice is certainly not common.)

The next thing a project should do is provide an architectural schematic. Since no automated tools exist for JavaScript, the only option is to construct these diagrams manually. Re-constructing a project’s UML structure, though time-consuming and a high threshold to initial participation, is an excellent way to become familiar with the code base. Once created, such diagrams should be prominently posted for subsequent contributors. (New projects, of course, would be advised to include such diagramming and documentation from the outset.)

Manual diagramming, possibly with UML compliance, next raises the question of which drawing program to use. Like program editors, UML and drawing tools are highly varied and a matter of personal preference. I mostly hate selecting new tools since I only want to learn a few, but learn them well. That means tools represent real investments for me, which also means I need to spend substantial time identifying and then testing the candidates.

Dia is a GTK+ based diagram creation program for Linux, Unix and Windows released under the GPL license. Dia is roughly inspired by the commercial Windows program Visio. It is geared more towards informal diagrams for casual use, but it does have UML drawing templates. Graphviz is a general drawing and modeling program used by some other open source UML programs. I liked this program for its general usefulness and kept it on my system after all evaluations were done, but it is really not automated enough on its own for UML use.

In the UML realm, at one end of the spectrum are "light," simpler tools that have lower learning thresholds and fewer options. Some candidates here are Violet (which is quite clean and can reside in Eclipse, but has very limited formatting options) and UMLet (which does not snap to grid and also has limited color options). Getting a bit more complex, I liked UMLGraph, which allows the declarative specification and drawing of UML class and sequence diagrams. Among its related projects, LightUML integrates UMLGraph into Eclipse and XUG extends UMLGraph into a reverse engineering tool.

BOUML is another free tool, recently released in France. Program start-up is a little strange, requiring a unique ID to be set prior to beginning a project. Besides the relative developmental status of this offering, the English documentation is pretty awkward and hard to understand in some critical areas. StarUML has some interesting features, but is not Eclipse-ready. Its last major upgrade was more than one year ago. It has reverse engineering and code generation for the usual suspects, and has many diagramming options. Program settings are highly parameterized and there is a unique (within open source efforts) template to support Word integration. One quirk is that new class diagrams get provided with the (?) names of the Korean development team behind the project. Like all other UML tools investigated, StarUML does not have any support for JavaScript.

Since I use Eclipse as my general IDE, my actual preference is a UML tool integrated into that environment. There are about 40 or so UML options that support that platform, about a dozen of which are either free (generally "community versions" of commercial products) or open source. I reviewed all of them and installed and tested most.

LightUML is a little tricky to install, since it has external dependencies on Graphviz and the separate UMLGraph, though those are relatively straightforward to install. The biggest problem with LightUML for my purposes was its requirement to work off of Java packages or classes. I next checked out green. With green, UML diagrams are educational and easy to create, and it provides a round-tripping editor. I found it interesting as an academic project but somewhat lacking for my desired outputs. Omondo’s free version of EclipseUML looks quite good, but unfortunately will only work with bona fide Java projects, so it was also eliminated from contention. I found eUML2 to be slow, and it seemed to take over the Eclipse resources. I found a similar thing with Blueprint Software Modeler, which weighs in at 85 MB without the Eclipse SDK, rather silly for the degree of functionality I was seeking.

Papyrus is a free, very new, Eclipse plug-in. It has only 5 standard UML diagrams and sparse documentation, but its implementation so far looks clean and its support for the DI2 (diagram interchange) exchange format is a plus (I’ve kept it on my system and will check in on new releases later). Visual Paradigm for UML has a free version usable in Eclipse, which is professionally packaged, as is common for commercial vendors. However, this edition limits the number of diagram types per project, and the download requires all options and a 117 MB install before the actual free version can be selected; even then, the implications of the various choices are obscure. (These not uncommon tactics are not really bait-and-switch, but oftentimes feel so.) The limit of one diagram type per project eliminated this option from consideration; plus, it is incredibly complicated and a real resource hog. The community edition of MagicDraw UML operates similarly and with similar limitations.

After numerous Eclipse installs and deletes, I finally began to settle on ArgoEclipse. It had a very familiar interface and installed as a standard Eclipse plug-in, was generally positively reviewed by others, and appeared to have the basic features I initially thought necessary. However, after concerted efforts with my chosen JavaScript project, I soon found many limitations. First, it was limited to UML 1.4 and lacked a component UML view that I wanted. I found the interface to be very hard to work with for extended periods. I wanted to easily add new UML profiles (an extension mechanism for UML models) specific to my project and JavaScript, but could find no easy way to do so. Lastly, and what caused me to abandon the framework, I was unable to split large diagrams into multiple tiled pieces for coherent printing or distribution.

None of these tools support JS reverse engineering or round-tripping. However, keeping everything in the Eclipse IDE would aid later steps when adding in-line code documentation (see below) to the code base. On the other hand, because of the lack of integration, and because UML diagramming needed to occur before code commenting, I decided I could forego Eclipse integration for a standalone tool so long as it supported the standard XMI (XML metadata interchange) format.

This decision led me back to again try StarUML, which is the tool I ended up using to complete my UML diagramming. I remain concerned about StarUML’s lack of recent development and reliance on Delphi and some proprietary components that limit it as a full open-source tool. On the other hand, it is extensible via user-added profiles and tags using XML, it is very intuitive to use, and it has very flexible export and diagramming options. Because of strong XMI support, should better options later emerge, there is no risk of lock-in.

Looking to the future, because of my Eclipse environment and the growing availability of JS editors in Eclipse (Aptana, which I use, but also Spket and JSEclipse), one project worth monitoring is UML2, an EMF– (and GMF-) based implementation of the UML 2.x metamodel for the Eclipse platform due out in June. This effort is part of the broader Model Development Tools (MDT) Eclipse initiative. UML2 builds are presently available for testing with other bleeding-edge components, but I have not tested these.

Throughout this process of UML tools investigations, I came to discover a number of preferences and criteria for selecting a toolset:

  1) a free, open-source option;
  2) easy to use and modify, with flexible user preferences;
  3) support for UML 2.x and most or all UML diagrams;
  4) support for XMI import and export;
  5) extensible profiles and frameworks via an XML syntax, preferably with some profile-building utilities;
  6) operation within the Eclipse environment, measured by standard plug-in and update installs;
  7) clean functionality and user interface;
  8) the ability to handle large diagrams, particularly in tiling;
  9) a variety of output formats; and
  10) strength in the class, component and package diagrams of most use to me.

Of course, these factors may differ substantially for you.

Now, Document the Code as You Dive Deeper

Since JavaScript apps often have little or no embedded code documentation, one way to contribute is to document the code as you trace through it. Like creating UML diagrams, a requirement to provide in-line documentation is a high threshold to initial participation. However, for motivated contributors, documentation can be an excellent way to learn a code base. But whoever does the initial documentation, it should be done in a maintainable manner, with the ability to generate API documentation, and be posted prominently for other potential contributors.

There are a few emerging JavaScript code documentation systems, which are tools that parse inline documentation in JavaScript source files to produce (preferably) well formatted HTML with links. The “leading” ones are shown in the table below:

JSDoc
  Pros:
  • Best known and most common
  • Simple parser recognition
  • Operates similarly to Javadoc (but is a subset)
  Cons:
  • Requires separate Perl install
  • Doesn’t support multiple declarations for a single function
  • Tags perhaps too constrained or inadequate (for example, lacks the @alias tag)
  • Little formatting flexibility
  • Not workable with Prototype-style programming
  • Found (generally) to be inadequate for modern JavaScript (according to project leads)

JSDoc-2
  Pros:
  • Written in JavaScript
  • Simple parser recognition
  • Same original developer (Mike Mathews) as JSDoc
  • Backwards compatible with JSDoc (at least for now)
  • More open input process to initial design
  • Reports easily configurable
  Cons:
  • No code interpretation
  • Just released; only in beta
  • No existing report templates (as of yet)

ScriptDoc
  Pros:
  • Supported by the Aptana IDE
  • Separate sdoc file can keep documentation separate from source files
  Cons:
  • Has really not gotten traction in nearly 9 months
  • Appears too close to a single sponsor (Aptana)
  • Separate sdoc needs to be synchronized with source files
  • No known presentation templates or examples
  • Schizophrenic between the Aptana site and the ScriptDoc site

NaturalDocs
  Pros:
  • Targets any language; 4 now have full support, more than 15 (including JS) have basic support
  • Nice layouts
  • Flexibility in how inline documentation is written (e.g., any keyword: name)
  Cons:
  • Lack of rigid parsing rules can lead to ambiguous outputs
  • Little traction so far
  • Appears to be a single-person project
  • Perhaps too flexible
The most widely used approach is JSDoc, which is a Perl-based analog to Javadoc (click here for an example report). JSDoc is the most established of the systems and has an interface familiar to most Java developers. The first version is being phased out, with the version 2 release apparently pending. There is a Google Group on this option. JSDocumenter is a graphical user interface built for Javadoc-like documenting using JSDoc. Within this family, the JSDoc-2 option appears to be the choice, since the initial developers themselves have moved in that direction and recognize earlier problems with JSDoc that they are planning to overcome.
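For readers who have not used these systems, the fragment below shows the general Javadoc-like form of inline comment the JSDoc family parses. The function and its tags are illustrative only, and the exact set of supported tags differs between JSDoc and JSDoc-2, so check each tool's own documentation:

    /**
     * Creates a simple annotation object for a Web resource.
     * @param {String} subject    URI of the resource being annotated
     * @param {String} predicate  URI of the annotation property
     * @param {String} comment    free-text body of the annotation
     * @return {Object} the annotation just created
     */
    function annotate(subject, predicate, comment) {
      return { subject: subject, predicate: predicate, comment: comment };
    }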

ScriptDoc is an attempt to standardize documentation formats, and is being pushed and backed by the Aptana open source JS IDE. The major differentiator for the ScriptDoc approach is the ability to link IDs in the code base to external .sdoc files that could perhaps have more extensive commenting than what might be desirable in-line. (The other side of that coin is the synchronization and maintenance of parallel files.) To date, ScriptDoc is also tightly linked with Aptana, with no known other platform support. Also, there is a disappointing level of activity on the ScriptDoc standards site.

NaturalDocs is an open-source, extensible, multi-language documentation generator. You document your code in a natural syntax that reads like plain English. Natural Docs then scans your code and builds high-quality HTML documentation from it. Multiple languages are presently supported in either full or basic mode, including JavaScript. The flexible format for this system is both a strength and weakness.

There are a few other JS documentation systems. Qooxdoo, an open-source, multi-purpose Ajax system, provides a Python script utility that parses JavaScript and also understands Javadoc- and doxygen-like meta information to generate a set of HTML files and a class tree; see further this explanation. JSLinker for Dojo, mentioned above, also includes documentation aspects. For other information, Wikipedia has a fairly useful general comparison of documentation generators, most of which are not related to JavaScript.

The idea of keeping documentation text separate from the code has initial appeal for JavaScript, since it is often necessary to transfer the scripts in minimal form. But this very separation creates synchronization issues and many modern code editors can also support programmatic exclusion of the comments for transferred forms.

On this and other bases, I think JSDoc-2 is the best choice of the options above. The general Javadoc form and the expansion of necessary tags appear to have overcome earlier JSDoc limits, and the easy parser recognition for the comments appears sufficiently flexible. Plus, output formats can be tailored to be as fancy as desired.

Of course, once a documentation generator is selected, it is then necessary to establish in-line comment standards. These should be a standard product documentation requirement, ideally part of the project’s coding standards. Each project should therefore also publish its own coding and documentation standards, which is especially important when outside collaborators are involved. It is beyond the scope of this survey to suggest standards, but one place to start is with the JavaScript coding standards from Doug Crockford (which, however, include only minimal inline documentation recommendations).

Code Validation and Debugging

Though this survey regards efforts prior to coding collaboration, once coding begins there are some other essential tools any JavaScript developer should have. These include:

  • JSLint — a JavaScript syntax checker and validator
  • Firebug — a great inspection and debugging service for use within the Firefox browser, and
  • Venkman — the code name for another Firefox-based debugger, this one from Mozilla.

Since these tools are beyond the scope of this present review, more detailed discussion of them will be left to another day.

Posted by Mike Bergman on February 26, 2007 at 10:17 pm in Open Source, Software Development | Comments (2)
The URI link reference to this post is: https://www.mkbergman.com/341/overcoming-limitations-to-javascript-collaboration/
The URI to trackback this post is: https://www.mkbergman.com/341/overcoming-limitations-to-javascript-collaboration/trackback/