Glut: Mastering Information Through The Ages

Wright’s Book Has Strong Scope, Disappointing Delivery

When I first saw the advance blurb for Glut: Mastering Information through the Ages by Alex Wright I thought, “Wow, here is the book I have been looking for or wanting to write myself.” As the book jacket explains:

Spanning disciplines from evolutionary theory and cultural anthropology to the history of books, libraries and computer science, Wright weaves an intriguing narrative that connects such seemingly far-flung topics as insect colonies, Stone Age jewelry, medieval monasteries, Renaissance encyclopedias, early computer networks, and the World Wide Web. Finally, he pulls these threads together to reach a surprising conclusion, suggesting that the future of the information age may lie deep in our cultural past.

Wham, bang! The PR snaps with promise and scope!

These are themes that have been my passion for decades, and I ordered the book as soon as it was announced. It was therefore with great anticipation that I cracked open the cover as soon as I received it. (BTW, the actual date of posting for this review is much later only because I left this review in draft for some months; itself an indication of how, unfortunately, I lost interest in it. :( ).

Otlet is a Gem

The best aspect of Glut is the attention it brings to Paul Otlet, quite likely one of the most distinctive and overlooked innovators in information science in the 20th century. Frankly, I had only an inkling of who Otlet was prior to this book, and Wright provides a real service by bringing more attention to this forgotten hero.

(I have since gone on to try to learn more about Otlet and his pioneering work in faceted classification, as carried on more notably by S. R. Ranganathan with the Colon classification system, and his ideas behind the creation of the Mundaneum in Brussels in 1910. The Mundaneum and Otlet’s ideas were arguably a forerunner to some aspects of the Internet, Wikipedia and the semantic Web. Unfortunately, the Mundaneum and its 14 million ‘permanent encyclopedia’ items were taken over by German troops in World War II. The facility was ravaged and sank into obscurity, as did the reputation of Otlet, who died in 1944 before the war ended. It was not until Boyd Rayward translated many of Otlet’s seminal works to English in the late 1980s that he was rediscovered.)

Alex Wright’s own Google Tech Talk from Oct. 23, 2007, talks much about Otlet, and is a good summary of some of the other topics in Glut.

Stapled Book Reviews

The real disappointment in Glut is the lack of depth and scholarship. The basic technique seemed to be to find a prominent book on a given topic, summarize it in a popularized tone, sprinkle in a couple of extra references from the source book relied on for that chapter to show a patina of scholarship, and move on to the next chapter. Then, add a few silly appendices to pad the book length.

So, we see, for example, key dependence on relatively few sources for the arguments and points made. Rather than enumerate them here, one approach if interested is to simply peruse the expanded bibliography on Wright’s Glut Web site. That listing is actually quite a good basis for beginning your own collection.

Books are Different

It seems like today, with blogging and digital content flying everywhere, that a greater standard should be set for creating a book and asking the buying public to actually pay for something. That greater standard should be effort and diligence to research the topic at hand.

I feel like Glut is related to similar efforts where not enough homework was done. For example, see Walter Underwood, who in his review of the Everything is Miscellaneous (not!) book, chastises author David Weinberger on similar grounds. (A conclusion I had also reached after viewing this Weinberger video cast.)

In summary, I give Wright an A for scope and a C or D in execution and depth. I realize that is a pretty harsh review; but it is one occasioned by my substantially unmet high hopes and expectations.

The means by which information and document growth has come to be organized, classified and managed have been major factors in humanity’s progress and skyrocketing wealth. Glut’s skimpy hors d’œuvre merely whet the appetite: the full historical repast has yet to be served.

Production Printing Press

Was the Industrial Revolution Truly the Catalyst?

Why, roughly beginning in 1820, did historical economic growth patterns skyrocket?

This is a question of no small import, and one that has occupied economic historians for many decades. We know what some of the major transitions have been in recorded history: the printing press, Renaissance, Age of Reason, Reformation, scientific method, Industrial Revolution, and so forth. But, which of these factors were outcomes, and which were causative?

This is not a new topic for me. Some of my earlier posts have discussed Paul Ormerod’s Why Most Things Fail: Evolution, Extinction and Economics, David Warsh’s Knowledge and the Wealth of Nations: A Story of Economic Discovery, David M. Levy’s Scrolling Forward: Making Sense of Documents in the Digital Age, Elizabeth Eisenstein’s classic Printing Press, Joel Mokyr’s Gifts of Athena: Historical Origins of the Knowledge Economy, Daniel R. Headrick’s When Information Came of Age: Technologies of Knowledge in the Age of Reason and Revolution, 1700-1850, and Yochai Benkler’s The Wealth of Networks: How Social Production Transforms Markets and Freedoms. Thought-provoking references, all.

But, in my opinion, none of them posits the central point.

Statistical Leaps of Faith

Statistics (originally derived from the concept of information about the state) really only began to be collected in France in the 1700s. For example, the first true population census (as opposed to the enumerations of biblical times) occurred in Spain in that same century, with the United States being the first country to set forth a decennial census beginning in 1790. Pretty much everything of a quantitative historical basis prior to that point is a guesstimate, and often a lousy one to boot.

Because no data was collected (indeed, the idea of data and statistics did not exist), attempts in our modern times to re-create economic and population assessments for earlier centuries are a truly heroic, and estimation-laden, exercise. Nonetheless, Angus Maddison, the renowned economic historian who has written a number of definitive OECD studies, and his team have prepared economic and population growth estimates for the world and various regions going back to AD 1 [1].

One summary of their results shows:

Year (AD) | Ave Per Capita GDP (1990 $) | Ave Annual Growth Rate | Yrs Required for Doubling

Note that through at least 1000 AD economic growth per capita (as well as population growth) was approximately flat. Indeed, up to the nineteenth century, Maddison estimates that a doubling of economic well-being per capita only occurred every 3000 to 4000 years. But, by 1820 or so onward, this doubling accelerated at warp speed to every 50 years or so.
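The link between a doubling period and an annual growth rate is simple compound-interest arithmetic. Here is a minimal sketch, assuming steady compound growth. The function names are my own, and the inputs are only the round doubling periods quoted above, not Maddison's underlying series.

```python
import math


def doubling_years(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def annual_rate_for_doubling(years: float) -> float:
    """Constant compound annual growth rate that doubles a quantity in `years`."""
    return 2 ** (1 / years) - 1


# Round doubling periods quoted in the text (illustrative inputs only).
for years in (4000, 3000, 50):
    rate = annual_rate_for_doubling(years)
    print(f"doubling every {years:>4} years -> {rate:.4%} per year")
```

Under that assumption, a doubling every 50 years works out to roughly 1.4% per year, while a doubling every 3000 to 4000 years is only about 0.02% per year.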

Looking at a Couple of Historical Breakpoints

The first historical shift in millennial trends occurred at roughly 1000 AD, when flat or negative growth began to accelerate slightly. The growth trend looks comparatively impressive in the figure below, but that is only because the doubling of economic per capita wealth has now dropped to about every 1000 to 2000 years (note the relatively small differences in the income scale). These are annual growth rates about 30 times lower than today, which, with compounding, prove anemic indeed (see estimated rates in the table above).
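To see how anemic such rates are once compounding is taken into account, here is a rough sketch that again assumes steady growth at a fixed doubling period. The doubling periods are the round figures from the text (1500 years is simply my assumed midpoint of the 1000 to 2000 range), and the helper names are hypothetical.

```python
def growth_factor(doubling_period_years: float, horizon_years: float) -> float:
    """Multiple by which per capita income grows over `horizon_years`,
    assuming it doubles steadily every `doubling_period_years`."""
    return 2 ** (horizon_years / doubling_period_years)


def annual_rate(doubling_period_years: float) -> float:
    """Compound annual growth rate implied by a doubling period."""
    return 2 ** (1 / doubling_period_years) - 1


MODERN = 50        # years per doubling after about 1820 (round figure from the text)
MEDIEVAL = 1500    # assumed midpoint of the 1000 to 2000 year range

ratio = annual_rate(MODERN) / annual_rate(MEDIEVAL)
print(f"modern rate is about {ratio:.0f}x the post-1000 AD rate")
print(f"one century at the modern pace:   x{growth_factor(MODERN, 100):.2f}")
print(f"one century at the medieval pace: x{growth_factor(MEDIEVAL, 100):.2f}")
```

With these inputs the ratio of annual rates comes out at about 30. Over a single century the modern pace quadruples income, while the medieval pace adds less than 5 percent.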

Nonetheless, at about 1000 AD there is an inflection point, though a small one. It is also one that corresponds somewhat to the adoption of raw linen paper versus skins and vellum (among other correlations that might be drawn).

When the economic growth scale gets expanded to include today, these optics change considerably. Yes, there was a bit of growth inflection around 1000 AD, but it is almost lost in the noise over the longer historical horizon. The real discontinuity in economic growth appears to have occurred in the early 1800s compared to all previous recorded history. At this major inflection point in the early 1800s, historically flat income averages skyrocketed. Why?

The fact that this inflection point does not correspond to earlier events such as invention of the printing press or Reformation (or other earlier notable transitions) — and does more closely correspond to the era of the Industrial Revolution — has tended to cement in popular histories and the public’s mind that it was machinery and mechanization that was the causative factor creating economic growth.

Had a notable transition occurred in the mid-1400s to 1500s it would have been obvious to ascribe more modern economic growth trends to the availability of information and the printing press. And, while, indeed, the printing press had massive effects, as Elizabeth Eisenstein has shown, the empirical record of changes in economic growth is not directly linked with adoption of the printing press. Moreover, as the graph above shows, something huge did happen in the early 1800s.

Pulp Paper and Mass Media

In its earliest incarnations, the printing press was an instrument of broader idea dissemination, but still largely to and through a relatively small and elite educated class. That is because books and printed material were still too expensive (I would submit largely due to the exorbitant cost of paper), even though somewhat more available to the wealthy classes. Ideas were fermenting, but the relative percentage of participants in that direct ferment was small. The overall situation was better than monks laboriously scribing manuscripts, but not disruptively so.

However, by the 1800s, those base conditions changed, as reflected in the figure above. The combination of mechanical presses and paper production with the innovation of cheaper “pulp” paper were the factors that truly brought information to the “masses.” Yet, some have even taken “mass media” to be its own pejorative. But, look closely at what that term means and its importance to bringing information to the broader populace.

In Paul Starr's Creation of the Media, he notes how in 15 years from 1835 to 1850 the cost of setting up a mass-circulation paper increased from $10,000 to over $2 million (in 2005 dollars). True, mechanization was increasing costs, but from the standpoint of consumers, the cost of information content was dropping to zero and approaching a near-time immediacy. The concept of “news” was coined, delivered by the “press” for a now-emerging “mass media.” Hmmm.
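For a sense of how steep that rise was, the two endpoints imply a compound annual rate of increase. This is only a back-of-the-envelope sketch: it assumes smooth year-over-year growth between the two quoted figures, treats “over $2 million” as exactly $2 million, and uses a helper name of my own.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Constant compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


START_COST = 10_000      # 1835 start-up cost, 2005 dollars (figure quoted from Starr)
END_COST = 2_000_000     # 1850 cost; the text says "over $2 million", so this is a floor
YEARS = 1850 - 1835

rate = implied_annual_growth(START_COST, END_COST, YEARS)
print(f"{END_COST / START_COST:.0f}-fold increase over {YEARS} years")
print(f"implied compound growth: about {rate:.0%} per year")
```

A 200-fold increase over 15 years works out to start-up costs rising by a little over 40 percent a year, every year.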

This mass publishing and pulp paper were emerging to bring an increasing storehouse of content and information to the public at levels never before seen. Though mass media may prove to be an historical artifact, its role in bringing literacy and information to the “masses” was generally an unalloyed good and the basis for an improvement in economic well-being the likes of which had never been seen.

More recent trends show an upward blip in growth shortly after the turn of the 20th century, corresponding to electrification, but then a much larger discontinuity beginning after World War II:

In keeping with my thesis, I would posit that organizational information efforts and early electromechanical and then electronic computers resulting from the war effort, which in turn led to more efficient processing of information, were possible factors for this post-WWII growth increase.

It is silly, of course, to point to single factors or offer simplistic slogans about why this growth occurred and when. Indeed, the scientific revolution, industrial revolution, increase in literacy, electrification, printing press, Reformation, rise in democracy, and many other plausible and worthy candidates have been brought forward to explain these historical inflections in accelerated growth. By my own lights, I believe each and every one of these factors had its role to play.

But at a more fundamental level, I believe the drivers for this growth change came from the global increase in, and access to, prior human information. Surely, the printing press helped to increase absolute volumes. Declining paper costs (a factor I believe to be greatly overlooked but also conterminous with the growth spurt and the transition from rag to pulp paper in the early 1800s) made information access affordable and universal. With accumulations in information volume came the need for better means to organize and present that information (title pages, tables of contents, indexes, glossaries, encyclopedias, dictionaries, journals, logs, ledgers, etc., all innovations of relatively recent times) that themselves worked to further fuel growth and development.

Of course, were I an economic historian, I would need to argue and document my thesis in a 400-pp book. And, even then, my arguments would appropriately be subject to debate and scrutiny.

Information, Not Machines

Tools and physical artifacts distinguish us from other animals. When we see the lack of a direct correlation of growth changes with the invention of the printing press, or growth changes approximate to the age of machines corresponding to the Industrial Revolution, it is easy and natural for us humans to equate such things to the tangible device. Indeed, our current fixation on technology is in part due to our comfort as tool makers. But, is this association with the technology and the tangible reliable, or (hehe) “artifactual”?

Information, specifically non-biological information passed on through cultural means, is what truly distinguishes us humans from other animals. We have been easily distracted looking at the tangible, when it is the information artifacts (“symbols”) that make us the humans who we truly are.

So, the confluence of cheaper machines (steam printing presses) with cheaper paper (pulp) brought information to the masses. And, in that process, more people learned, more people shared, and more people could innovate. And, yes, folks, we innovated like hell, and continue to do so today.

If the nature of the biological organism is to contain within it genetic information from which adaptations arise that it can pass to offspring via reproduction — an information volume that is inherently limited and only transmittable by single organisms — then the nature of human cultural information is a massive shift to an entirely different plane.

With the fixity and permanence of printing and cheap paper — and now cheap electrons — all prior discovered information across the entire species can now be accumulated and passed on to subsequent generations. Our storehouse of available information is thus accreting in an exponential way, and available to all. These factors make the fitness of our species a truly quantum shift from all prior biological beings, including early humans.

What Now Internet?

The means by which information itself is produced and disseminated is changing and growing. This is an infrastructural innovation that applies multiplier benefits upon the standard multiplier benefit of information. In other words, innovation in the basis of information use and dissemination itself is disruptive. Over history, writing systems, paper, the printing press, mass paper, and electronic information have all had such multiplier effects.

The Internet is but the latest example of such innovations in the infrastructural groundings of information. The Internet will continue to support the inexorable trend to more adaptability, more wealth and more participation. The multiplier effect of information itself will continue to empower and strengthen the individual, not in spite of mass media or any other ideologically based viewpoint but due to the freeing and adaptive benefits of information itself. Information is the natural antidote to entropy and, longer term, to the concentrations of wealth and power.

If many of these arguments for the importance of the availability of information prove correct, then we should conclude that the phenomenon of the Internet and global information access promises still more benefits to come. We are truly seeing access to meaningful information leapfrog anything seen before in history, with soon nearly every person on Earth contributing to the information dialog and leverage.

Endnote: And, oh, to answer the rhetorical question of this piece: No, it is information that has been the source of economic growth. The Industrial Revolution was but a natural expression of then-current information and through its innovations a source of still newer information, all continuing to feed economic growth.

[1] The historical data were originally developed in three books by Angus Maddison: Monitoring the World Economy 1820-1992, OECD, Paris 1995; The World Economy: A Millennial Perspective, OECD Development Centre, Paris 2001; and The World Economy: Historical Statistics, OECD Development Centre, Paris 2003. All these contain detailed source notes. Figures for 1820 onwards are annual, wherever possible.

For earlier years, benchmark figures are shown for 1 AD, 1000 AD, 1500, 1600 and 1700. These figures have been updated to 2003 and may be downloaded by spreadsheet from the Groningen Growth and Development Centre (GGDC), a research group of economists and economic historians at the Economics Department of the University of Groningen headed by Maddison.

Wealth of Networks

This Book Proves the Adage that You See What You Look For

I have been hearing about Yochai Benkler’s book, The Wealth of Networks: How Social Production Transforms Markets and Freedoms, for some time and his exposition around what he (and many others) have called the “networked information economy.” Benkler, a Yale law professor, offers his 527-page (473 in text) book as a free PDF from his web site under a Creative Commons Attribution Noncommercial ShareAlike license. I added his book to my summer reading list.

First, let me say, there are a couple of worthwhile insights in the book, which I’ll get to in a moment. But mostly, I found the book overly long, often off-subject, and too political for my tastes. In fairness, some of this might be due to the fact it was written in 2005 (published in 2006) and the social and participatory aspects of the Web are now widely appreciated. Yet I fear the broader problem with this polemic is that it proves the adage that you see what you look for.

The Main Thesis

Benkler’s argument is that cheap processors and the Internet have removed the physical constraints on effective information production. This is in keeping with the non-proprietary nature of information as a “nonrival” good, and is also leading to the democratization of information production and the emergence of large-scale peer-produced content. Benkler definitely allies himself with the camp of technology optimists, a camp I generally like to visit. His observations about trends and new developments from eBay to Wikipedia to SETI@home and open source software are now commonly appreciated.

With the costs of information duplication and dissemination trending to zero, the limiting factor of production becomes human creativity and effort itself. But here, too, with Internet users approaching a billion in number, just a few hours of contributed content each easily swamps the ability of even the largest firm to compete. These trends to Benkler presage a “radical decentralization” of information production, and many other changes to the political economy and culture.

A Constipated Viewpoint

That radical changes in the nature of information production and authorship and even the role of traditional publishers or the media are underway is without question. Purposeful collaborations like Wikipedia are now clearly successful and were not forecasted by many. Technorati documents literally millions of bloggers online.

The lens, however, through which Benkler looks at all of these trends is the “modern” history of the mass media. Citing Paul Starr’s Creation of the Media, he notes how in 15 years from 1835 to 1850 the cost of setting up a mass-circulation paper increased from $10,000 to over $2 million (in 2005 dollars). In Benkler’s view, these cost increases shifted the ability to publish away from the common citizen into the “problem” hands of the mass media. Fortunately, now with the Internet and cheap processors, this evil can be reversed back to a “radical decentralization” of content. Though Benkler specifically disclaims that he is describing “an exercise in pastoral utopianism,” the fact is that is exactly what he is describing.

There can be no doubt that the role of mass media and traditional publishers is under severe challenge from the emergence of the Internet. It is also the case that we are witnessing citizen publishers and authors emerge by the millions. These changes are momentous, but they do not involve everyone: only comparatively small percentages of Internet users blog, and still smaller percentages contribute to Wikipedia (about 80,000 at present, against a user base of hundreds of millions). These contributors are part of what I have called the “teeny heads” to contrast with the “long tail.” And, as the traditional gatekeepers of printers, publishers and editors lose prominence, new institutions and mechanisms for establishing the authoritativeness and trustworthiness of content will surely need to evolve.

These real trends deserve thoughtful exploration.

However, there is a reason that publishing costs increased so rapidly in that era of the 1800s. Mass publishing and pulp paper were emerging that acted to bring an increasing storehouse of content and information to the public at levels never before seen.

I have earlier written about how the explosion of information content that occurred at this very same time correlates well with the fundamental historical changes in human wealth and economic growth (“The Biggest Disruption in History: Massively Accelerated Growth Since the Industrial Revolution”). Though mass media may prove to be an historical artifact, I would argue that its role in bringing literacy and information to the “masses” was generally an unalloyed good and the basis for an improvement in economic well-being the likes of which had never been seen.

By taking a narrow historical horizon and then viewing it through the lens of the vilified “mass media,” Benkler is both looking in the wrong direction and missing the point.

The means by which information itself is produced and disseminated is changing and growing, and that supports an inexorable trend to more adaptability, more wealth and more participation. What we are seeing now with the Internet is but a natural phase in that trend. The “mass media” and the costs of information production of the 1800s were only a temporary phase in this longer, historical trend. The multiplier effect of information itself will continue to empower and strengthen the individual, not in spite of mass media or any other ideologically based viewpoint but due to the freeing and adaptive benefits of information itself. Information is the natural antidote to entropy and, longer term, to the concentrations of wealth and power.

By trying to push the trends of the Internet through the false needle’s eye of political economics, an effort that Benkler also erroneously makes with his earlier analysis of the growth of radio, what are in essence historical forces of almost informational or technological determinism are falsely presented as matters of political choice. Hogwash.

Insights Around Successful Social Collaboration

Benkler, however, does observe two useful dimensions for measuring social collaboration efforts: modularity and granularity. By modularity, Benkler means “a property of a project that describes the extent to which it can be broken down into smaller components, or modules, that can be independently produced before they are assembled into a whole.” By granularity, Benkler means “the size of the modules, in terms of the time and effort that an individual must invest in producing them.”

Benkler’s insight is that:

“the number of people who can, in principle, participate in a project is therefore inversely related to the size of the smallest scale contribution necessary to produce a usable module. The granularity of the modules therefore sets the smallest possible individual investment necessary to participate in a project. If this investment is sufficiently low, then incentives for producing that component of a modular project can be of trivial magnitude. Most importantly for our purposes of understanding the rising role of nonmarket production, the time can be drawn from the excess time we normally dedicate to having fun and participating in social interactions.”

To illustrate this effect of granularity, he contrasts Wikipedia with its simple entries and editing and bounded topics with the far less successful Wikibooks, which has much larger granularity.

Creators of social collaboration sites are advised to keep granularity small to encourage broader contributions, and if the nature of the site is complex, to increase the number of its modules. Of course, none of this guarantees the magic or timing that also lie behind the most successful sites!
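Benkler’s point about granularity can be pictured with a toy calculation. What follows is purely my own illustration, not anything from the book: the spare-time figures are invented, and the rule that a person can take part only if their spare time covers one whole module is a simplifying assumption.

```python
# Hypothetical spare minutes per week that seven would-be contributors
# are willing to give; the numbers are invented purely for illustration.
SPARE_MINUTES = [5, 10, 15, 30, 60, 120, 480]


def eligible_contributors(module_minutes: int, budgets: list[int]) -> int:
    """Count people whose spare time covers one module of the given size."""
    return sum(1 for budget in budgets if budget >= module_minutes)


# Smaller modules (finer granularity) admit more of the same population.
for module_minutes in (5, 30, 240):
    count = eligible_contributors(module_minutes, SPARE_MINUTES)
    print(f"{module_minutes:>3}-minute modules: "
          f"{count} of {len(SPARE_MINUTES)} people can take part")
```

In this made-up population everyone can handle a five-minute edit, but only one person can commit the four hours a much larger module would demand, which is the Wikipedia versus Wikibooks contrast in miniature.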

Worth a Skim

I think that Benkler’s arguments could have been more effectively distilled into a 30-page article, with much of the political economy claptrap thrown out. But, there are some worthwhile references (including Elizabeth Eisenstein’s Printing Press as an Agent of Change, as well as Starr). The book is definitely worth a skim.

The W3C’s ESW semantic Web wiki, which I recently featured for its listing of 70 semantic Web tools, has now added a compilation of semantic Web books and conference proceedings, strictly defined. The listing presently contains about 20 books, mostly from the last two years, and a similar number of book-length conference proceedings. Though the predominance of listings is for English, books are also listed in French, German and Hungarian.

Readers are encouraged to add to this list, which should be a good reference point moving forward. My only question is what Ivan Herman’s definition of ‘strictly’ really means. For example, I think it is notable that Jeffrey Pollock’s and Ralph Hodgson’s Adaptive Information: Improving Business Through Semantic Interoperability, Grid Computing, and Enterprise Integration (ISBN: 0471488542) is not listed. Does ‘semantic Web’ specifically need to occur in the title to be considered?

I will suggest Adaptive Information for the listing when my review of it is complete. Meanwhile, and perhaps for a long time, you may want to check out this W3C listing.

A Semantic Web Primer, by Grigoris Antoniou and Frank van Harmelen, achieves just what it sets out to achieve: to be a useful undergraduate introduction to the semantic Web. This actually has much broader applicability, because, in the words of the authors:

The question arises whether there is a need for [such an introductory undergraduate] textbook, given that all information is available online. We think there is a need because on the Web there are too many sources of varying quality and too much information. Some information is valid, some outdated, some wrong, and most sources talk about obscure details. Anyone who is a newcomer and wishes to learn something about the Semantic Web, or who wishes to set up a course on the Semantic Web, is faced with these problems. This book is meant to help out.

I obtained the book for that very same purpose, and it does provide a fairly useful basis for self-study for the layperson practitioner.  It also contains exercises at the end of each section making it useful for course teaching.

The book proceeds from a general discussion of the semantic Web and progresses through XML to XML Schema, XPath and XSL and XSLT, then the RDF and RDF Schema frameworks, then on to OWL and predicate logic, applications, example uses and ontologies and possible future developments. The progression builds in line with Berners-Lee’s “layer cake” diagram (see my earlier post) and explains concepts clearly and well.

But it is a pretty slim volume. After removal of blank pages, listings of markup code and accounting for wide white space margins, there are perhaps only 110 pages of useful content in the whole volume.

The references at the end of each section are excellent and will be important follow-on reading for serious students.

I think — as an introductory guide and as a quick way to cut through all of the overlapping and confusing resources on the Web — that this hardcover book deserves attention.  But it does not, unfortunately, alone constitute the one-stop introductory resource it could have been.  After reading this, it is time to move on to the more detailed section references.  I actually suspect that it will also be little consulted as a reference source on the shelf.

But, if you have been wanting a pretty good global, easy introduction to the semantic Web, this is probably worth your purchase.  The book can be obtained for about $30 new from Amazon (April 2004, MIT Press, 272 pp.).

