OK, well, I just finished moving and upgrading some dozen Web sites and wikis, including this one — my main blog — over the weekend, from fixed hosting to the “clouds”. Believe you me, there were some pretty massive changes required.
For someone like me who is relatively clueless about such things, the process has been interesting (to say the least).
It seems like our modern era involves either moving digital things or converting digital things. As for moving, we all experience that laptop or hard drive dying, and then the move. (The Death of a Laptop actually happened to my wife this past week.) But moving also means changing providers and venues — which is what caused me to move all of these Web sites.
So, the mainstream digital age has existed for what, now, some 40 years? How many data formats have we transitioned through (ASCII, EBCDIC, UTF-8, an immense number of them)? And how many systems and environments have we transitioned through?
At the risk of dating myself, when I was in college we still used slide rules; truly the end of an era. Just a year or two later everyone transitioned to TI or HP calculators, some of which they wore on their hips like PDAs and cell phones today.
I won’t bore everyone with my own transition from my first computer (an HP 9100 with 4K RAM and program listings on cash register tapes) through many others including a DEC Rainbow PC with CP/M (a beauty!). For many years, as we moved into the PC era and IBM legitimized the shift, every computer I bought seemed to cost about $3000. Each one was more capable, etc., but they all cost the same.
And, then, about the late 1990s, that changed. In fact, my last capable desktop machine cost way south of $1000.
But, I digress.
What has been the real constant across these decades has been system and data migration. Granted, many of the documents and many of the systems in my own experience from 30 years ago have no relevance today (god, do I miss WordPerfect with its embedded, editable codes!), but an important minority do.
For these, I need to move both apps and data (with readable formats) for each generational transition.
I know that organizations, like the Library of Congress in its NDIIPP program, need to worry about digital preservation, potentially for millennia. These are worthwhile concerns.
But, from my own more prosaic standpoint, I see this issue through my own lens and in my own bas relief. I am constantly moving apps and data, each transition much like a snake shedding its skin.
It makes one wonder about the effort and process by which the entire meaningful cultural history of our species continues to adapt and transition forward.
Hmmm. All of us have seen these transitions and the loss of productivity they bring. (Some might argue that the lack of productivity gains from computers until this decade was due to such transitions; at least now, with the Web, we see a more common migration framework.)
I think we have no choice but to transition to the next latest and greatest as it emerges. Automated means of making such transitions at acceptable cost will also be attractive.
But the real point, I think, is that such transitions are inevitable. Faster apps: Check! Better apps: Check! Easier data exchange: Check!!
Living with transition thus becomes a clear constant for all of us as we move forward. And part of that is accepting downtime to screw around moving the keepable old to the potentially useful new.
After this weekend, I’m now ready for a couple of days off before the real work week begins (yeah, right, keep dreaming).
I recently wrote about WOA (Web-oriented architecture), a term coined by Nick Gall, and how it represented a natural marriage between RESTful Web services and RESTful linked data. There was, of course, a method behind that posting to foreshadow some pending announcements from UMBEL and Zitgist.
Well, those announcements are now at hand, and it is time to disclose some of the method behind our madness.
As Fred Giasson notes in his announcement posting, UMBEL has just released some new Web services with fully RESTful endpoints. We have been working on the design and architecture behind this for some time and, all I can say is, it’s UMBELievable!
As Fred notes, there is further background information on the UMBEL project — which is a lightweight reference structure based on about 20,000 subject concepts and their relationships for placing Web content and data in context with other data — and the API philosophy underlying these new Web services. For that background, please check out those references; that is not my main point here.
We discussed much in coming up with the new design for these UMBEL Web services. Most prominent was taking seriously a RESTful design and grounding all of our decisions in the HTTP 1.1 protocol. Given the shared approaches between RESTful services and linked data, this correspondence felt natural.
What was perhaps most surprising, though, was how complete and well suited HTTP was as a design and architectural basis for these services. Sure, we understood the distinctions of GET and POST and persistent URIs and the need to maintain stateless sessions with idempotent design, but what we did not fully appreciate was how content and serialization negotiation and error and status messages also were natural results of paying close attention to HTTP. The UMBEL Web services design now embraces all of these elements directly.
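To make the content negotiation point concrete, here is a minimal sketch of how a client might negotiate serializations against a RESTful endpoint over plain HTTP. The endpoint URL and query parameter are hypothetical, not the actual UMBEL Web services API.

```python
# Minimal sketch of HTTP content negotiation against a RESTful endpoint.
# The endpoint and parameters are hypothetical, not the real UMBEL API.
import requests

ENDPOINT = "http://example.org/ws/concept/lookup"  # hypothetical URL

def get_concept(label, mime_type="application/rdf+xml"):
    """Request a concept, choosing the serialization via the HTTP
    Accept header rather than a custom query parameter."""
    response = requests.get(
        ENDPOINT,
        params={"label": label},
        headers={"Accept": mime_type},
    )
    # HTTP status codes double as the service's error and status reporting
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    rdf_xml = get_concept("mammal")                 # default RDF/XML
    turtle = get_concept("mammal", "text/turtle")   # alternate serialization
```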
There are likely other services out there that embrace this full extent of RESTful design (though we are not aware of them). What we are finding most exciting, though, is the ease with which we can extend our design into new services and to mesh up data with other existing ones. This idea of scalability and distributed interoperability is truly, truly powerful.
It is almost like, sure, we knew the words and the principles behind REST and a Web-oriented architecture, but had really not fully taken them to heart. As our mindset now embraces these ideas, we feel like we have now looked clearly into the crystal ball of data and applications. We very much like what we see. WOA is most cool.
For lack of a better phrase, Zitgist has an internal plan that it calls its ‘Grand Vision’ for moving forward. Though something of a living document, this reference describes how Zitgist is going about its business and development. It does not describe our markets or products (of course, other internal documents do that), but our internal development approaches and architectural principles.
Just as we have seen a natural marriage between RESTful Web services and RESTful linked data, there are other natural fits and synergies. Some involve component design and architecting for pipeline models. Some involve the natural fit of domain-specific languages (DSLs) to common terminology and design, too. Still others involve use of such constructs in both GUIs and command-line interfaces (CLIs), again all built from common language and terminology that non-programmers and subject matter experts alike can readily embrace. Finally, some involve a preference for Python to wrap legacy apps and to provide a productive scripting environment for DSLs.
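As a purely illustrative sketch (not Zitgist's actual code), here is the flavor of combining a small pipeline model with DSL-like readability in Python; the step names and sample record are invented for the example.

```python
# Illustrative only: small named steps composed into a pipeline that
# reads almost like plain language.
def normalize(record):
    # lowercase keys, trim whitespace from values
    return {k.strip().lower(): v.strip() for k, v in record.items()}

def tag_concepts(record):
    # naive "concept" tagging: keep the longer words of the title
    record["concepts"] = [w for w in record.get("title", "").split() if len(w) > 4]
    return record

def pipeline(*steps):
    def run(record):
        for step in steps:
            record = step(record)
        return record
    return run

ingest = pipeline(normalize, tag_concepts)   # the "sentence" a domain expert can read
print(ingest({"Title ": "  Linked Data for the Enterprise  "}))
```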
If one can step back a bit and realize there are some common threads to the principles behind RESTful Web services and linked data, that very same mindset can be applied to many other architectural and design issues. For us, at Zitgist, these realizations have been like turning on a very bright light. We can see clearly now, and it is pretty UMBELievable. These are indeed exciting times.
BTW, I would like to thank Eric Hoffer for the very clever play on words with the UMBELievable tag line. Thanks, Eric, you rock!
Zotero has long been one of my favorite Firefox plug-ins, being a productive and trusted sidekick for collecting and reporting my voluminous citation and bibliographic data. I think perhaps my review of Zotero from January 2007 was one of my most glowing write-ups.
If you go to the Zotero home page, you will see at the lower left the steady increase of functionality that has come out in this free and open source tool. For example, Zotero now supports more than 1100 bibliographic sources, can capture Web pages and many standard Web sources, and has MS Office and WordPress support. Zotero has been developed and is distributed by the Center for History and New Media at George Mason University.
According to the Courthouse News Service, which has a copy of the complaint filed September 5, Thomson Reuters is suing George Mason University and, as a state institution, the Commonwealth of Virginia, for $10 million in damages and an injunction on further distribution of a beta version of Zotero. Thomson is seeking a jury trial.
Thomson claims that a July 8 beta release of Zotero (version 1.5) included a new feature to read and convert Thomson’s 3,500-plus proprietary .ens style files within the EndNote software into free, open source Zotero .csl files. Thomson claims this is in direct violation of GMU’s current license for EndNote. The Zotero beta release introduces a server-side synchronization function; the standard Zotero release without this feature and the EndNote support is version 1.07.
EndNote is a popular proprietary citation manager used by many academics and researchers, with functionality very similar to Zotero’s. It allows users to search online bibliographic databases, organize references, and store and re-format citations in various publication styles. Single-user licenses are $250, with volume and academic discounts available. Thomson claims “millions” of ultimate users.
File format ingest and conversions have long been a mainstay of interoperable software systems. This lawsuit will bear close monitoring.
Hat tip to Rafael Sidi for this link.
I’m pleased to present a timeline of 100 or so of the most significant events and developments in the innovation and management of information and documents, from cave paintings (ca. 30,000 BC) to the present. Click on the link to the left or on the screen capture below to go to the actual interactive timeline.
This timeline has fast and slow scroll bands — including bubble popups with more information and pictures for each of the entries offered. (See the bottom of this posting for other usage tips.)
Note that the timeline presents only non-electronic innovations and developments, from alphabets to writing to printing to conventions for organizing information. Because there are so many innovations, and because they are concentrated in roughly the last 100 years, digital and electronic communications are somewhat arbitrarily excluded from the listing.
I present below some brief comments on why I created this timeline, some caveats about its contents, and some basic use tips. I conclude with thanks to the kind contributors.
Readers of this AI3 blog or my detailed bio know that information — whether biological, embodied in genes, or cultural, embodied in human artefacts — has been my lifelong passion. I enjoy making connections between the biological and the cultural with respect to human adaptivity and future prospects, and I like to dabble on occasion as an amateur economic or information-science historian.
About 18 months ago I came across David Huynh‘s nifty Exhibit lightweight data display widget, gave it a glowing review, and then proceeded to convert my growing Sweet Tools listing of semantic Web and related tools to that format. Exhibit still powers the listing (which I just updated yesterday for the twelfth time or so).
At the time of first rolling out Exhibit I also noted that David had earlier created another lightweight timeline display widget that looked similarly cool (and which was also the first API for rendering interactive timelines in Web pages). (In fact, Exhibit and Timeline are but two of the growing roster of excellent lightweight tools from David.) Once I completed adopting Exhibit, I decided to find an appropriate set of chronological or time-series data to play next with Timeline.
I had earlier been ruminating on one of the great intellectual mysteries of human development: Why, roughly beginning in 1820 to 1850 or so, did the historical economic growth patterns of all prior history suddenly take off? I first wrote on this about two years ago in The Biggest Disruption in History: Massively Accelerated Growth Since the Industrial Revolution, with a couple of follow-ups and expansions since then.
I realized that, in developing my thesis that wood pulp paper and mechanized printing were the key drivers for this major inflection change in growth (as they affected literacy and broad-scale access to written information), I already had the beginnings of a listing of various information innovations throughout history. So, a bit more than a year ago, I began adding to that list in terms of how humans learned to write, print, share, organize, collate, reproduce and distribute information, and when those innovations occurred.
There are now about 100 items in this listing (I’m still looking for and researching others; please send suggestions at any time). Here are some of the current items, in chronological order from upper left to lower right:
calendars | tree diagram | encyclopedia | pencil (mass produced)
cuneiform | quill pen | capitalization | rotary perfection press
papyrus (paper) | library catalog | magazines | catalogues
hieroglyphs | movable type | taxonomy (binomial classification) | typewriter
alphabet | paper (rag) | timeline | chemical pulp (sulfite)
Phaistos Disc | word spaces | data graphs | classification (Dewey)
scrolls | printing press | punch cards | kraft process (pulp)
manuscripts | advertising (poster) | steam-powered (mechanized) papermaking | flexography
glossaries | bookbinding | book (machine-paper) | classification (LoC)
dictionaries | pagination | chemical symbols | classification (UDC)
parchment (paper) | punctuation | mechanical pencil | offset press
bibliographies | library catalog (printed) | chromolithography | screenprinting
concept of categories | public lending library | paper (wood pulp) | ballpoint pen
library | dictionaries (alphabetic) | rotary press | xerographic copier
classification system (library) | newspapers | mail-order catalog | hyperlink
zero | information graphics | fountain pen | metadata (MARC)
So, off and on, I have been working with and updating the data and display of this timeline in draft. (I may someday also post my notes about how to effectively work with the Timeline widget.)
With the listing above, the timeline was complete enough to finally post this version. One of the neat things with Timeline is the ability to drive the display from a simple XML listing. I will update the timeline when I next have an opportunity to fill in some of the items still missing from my innovations list, such as alphabetization, citations, and tables of contents, among many others.
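As a rough sketch of that XML-driven approach, here is how such a listing might be generated in Python. The element and attribute names are simplified for illustration; consult the Timeline documentation for the exact event schema and date formats it accepts.

```python
# Simplified sketch of generating the kind of XML listing that the SIMILE
# Timeline widget reads; check the Timeline docs for the full event schema.
import xml.etree.ElementTree as ET

events = [
    {"start": "-30000", "title": "cave paintings"},
    {"start": "1450", "title": "printing press (movable type)"},
    {"start": "1844", "title": "paper (wood pulp)"},
]

root = ET.Element("data")
for item in events:
    ev = ET.SubElement(root, "event", start=item["start"], title=item["title"])
    ev.text = item["title"]   # popup body text

ET.ElementTree(root).write("timeline.xml", encoding="utf-8", xml_declaration=True)
```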
Of course, rarely can an innovation be traced to a single individual or a single moment in time. Historians are increasingly documenting the cultural milieu and multiple individuals that affect innovation.
In these regards, then, a timeline such as this one is simplistic and prone to much error and uncertainty. We have no real knowledge, for example, of the precise time certain historical innovations occurred, and others (the ballpoint pen being one case in point) are a matter of interpretation as to what and when constituted the first expression. For instances where the record indicated multiple dates, I chose to use the date when the innovation was released to the public.
Nonetheless, given the time scales here of more than 30,000 years, I do think broad trends and rough time frames can be discerned. As long as one interprets this timeline as indicative, and not as definitive in any scholarly sense, I believe it can inform and provide some insight and guidance for how information has evolved over human history.
The operation of Timeline is pretty straightforward and intuitive. Here are a couple of tips to get a bit more out of playing with it:
For the sake of consistency, nearly all entries and pictures on the timeline are drawn from the respective entries within Wikipedia. Subsequent updates may add to this listing by reference to original sources, at which time all sources will be documented.
The fantastic Timeline was developed by David Huynh while he was a graduate student at MIT. Timeline and its sibling widgets were developed under funding from MIT’s Simile program. Thanks to all in the program and best wishes for continued funding and innovation.
Finally, my sincere thanks go to Professor Michael Buckland of the School of Information at the University of California, Berkeley, for his kind suggestions, input and provision of additional references and sources. Of course, any errors or omissions are mine alone. I also thank Professor Buckland for his admonitions about the use and interpretation of the timeline dates.
The recent LinkedData Planet conference in NYC marked, I think, a real transition point. The conference signaled the beginning movement of the Linked Data approach from the research lab to the enterprise. As a result, there was something of a schizophrenic aspect at many different levels to the conference: business and research perspectives; realists and idealists; straight RDF and linked data RDF; even the discussions in the exhibit area versus some of the talks presented from the podium.
Like any new concept, my sense was a struggle around terminology and common language and the need to bridge different perspectives and world views. Like all human matters, communication and dialog were at the core of the attendees’ attempts to bridge gaps and find common ground. Based on what I saw, much great progress occurred.
The reality, of course, is that Linked Data is still very much in its infancy, and its practice within the enterprise is just beginning. Much of what was heard at the conference was theory versus practice and use cases. That should and will change rapidly.
In an attempt to help move the dialog further, I offer a definition and Zitgist’s perspective on some of the questions posed in one way or another during the conference.
Sources such as the four principles of Linked Data in Tim Berners-Lee’s Design Issues: Linked Data and the introductory statements on the Linked Data Wikipedia entry approximate — but do not completely express — an accepted or formal or “official” definition of Linked Data per se. Building from these sources and attempting to be more precise, here is the definition of Linked Data used internally by Zitgist:
All references to Linked Data below embrace this definition.
I’m sure many other questions were raised, but listed below are some of the more prominent ones I heard in the various conference Q&A sessions and hallway discussions.
Yes. Though other approaches can also model the first-order predicate logic of subject-predicate-object at the core of the Resource Description Framework data model, RDF is the one based on the open standards of the W3C. RDF and first-order logic are powerful because of their simplicity, their ability to express complex schema and relationships, and their suitability for modeling all extant data frameworks for unstructured, semi-structured and structured data.
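As a minimal illustration of that subject-predicate-object model, here is a sketch using the open source rdflib library; the URIs are invented for the example, not actual Linked Data identifiers.

```python
# Minimal RDF sketch with rdflib: three triples about one resource.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")

g = Graph()
alice = URIRef("http://example.org/people/alice")
g.add((alice, RDF.type, FOAF.Person))          # subject, predicate, object
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, EX.worksFor, URIRef("http://example.org/org/acme")))

print(g.serialize(format="turtle"))
```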
No. Linked Data represents a set of techniques applied to the RDF data model that names all objects as URIs and makes them accessible via the HTTP protocol (as well as other considerations; see the definition above and further discussion below).
Some vendors and data providers claim Linked Data support, but if their data is not accessible via HTTP using URIs for data object identification, it is not Linked Data. Fortunately, it is relatively straightforward to convert non-compliant RDF to Linked Data.
There are some excellent references for how to publish Linked Data. Examples include a tutorial, How to Publish Linked Data on the Web, and a white paper, Deploying Linked Data, using the example of OpenLink’s Virtuoso software. There are also recommended approaches and ways to use URI identifiers, such as the W3C’s working draft, Cool URIs for the Semantic Web.
However, there are not yet published guidelines for how to also meet the Zitgist definition above, where there is an added emphasis on class and context matching. A number of companies and consultants, including Zitgist, presently provide such assistance.
The key principles, however, are to make links aggressively between data items with appropriate semantics (properties or relations; that is, the predicate edges between the subject and object nodes of the triple) using URIs for the object identifiers, all being exposed and accessible via the HTTP Web protocol.
Absolutely not, though this is a source of some confusion at present.
The Semantic Web is probably best understood as a vision or goal where semantically rich annotation of data is used by machine agents to make connections, find information or do things automatically in the background on behalf of humans. We are on a path toward this vision or goal, but under this interpretation the Semantic Web is more of a process than a state. By understanding that the Semantic Web is a vision or goal we can see why a label such as ‘Web 3.0’ is perhaps simplistic and incomplete.
Linked Data is a set of practices somewhere in the early middle of the spectrum from the initial Web of documents to this vision of the Semantic Web. (See my earlier post at bottom for a diagram of this spectrum.)
Linked Data is here today, doable today, and pragmatic today. Meaningful semantic connections can be made and there are many other manifest benefits (see below) with Linked Data, but automatic reasoning in the background or autonomic behavior is not yet one of them.
Strictly speaking, then, Linked Data represents doable best practices today within the context both of Web access and of this yet unrealized longer-term vision of the Semantic Web.
Definitely not, though early practice has been interpreted by some as such.
One of the stimulating, but controversial, keynotes of the conference was from Dr. Anant Jhingran of IBM, who made the strong and absolutely correct observation that Linked Data requires the interplay and intersection of people, instances and schema. From his vantage, early exposed Linked Data has been dominated by instance data from sources such as Wikipedia and has lacked the schema (class) relationships that enterprises are built upon. The people aspect, in terms of connections, collaboration and joint buy-in, is also the means for establishing trust in and authority for the data.
In Zitgist’s terminology, class-level mappings ‘explode the domain’ and produce information benefits similar to Metcalfe’s Law as a function of the degree of class linkages. While this network effect is well known to the community, it has not yet been shown much in current Linked Data sets. As Anant pointed out, schemas define enterprise processes and knowledge structures. Demonstrating schema (class) relationships is the next appropriate task for the Linked Data community.
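For a rough sense of that network effect, Metcalfe's original formulation (not a Zitgist-specific metric) counts the potential pairwise linkages among n interlinked nodes, here classes, which grows quadratically:

```latex
% Metcalfe-style network effect: potential pairwise links among n linked classes
\[
  L(n) \;=\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;\sim\; \mathcal{O}(n^{2})
\]
```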
In an RDF context, “ontologies” are the vocabularies and structures that capture the schema structures noted above. Ontologies embody the class and instance definitions and the predicate (property) relations that enable legacy schemas and data to be transformed into Linked Data graphs.
Though many public RDF vocabularies and ontologies presently exist, and should be re-used where possible and where the semantics match the existing legacy information, enterprises will require specific ontologies reflective of their own data and information relationships.
Despite the newness or intimidation perhaps associated with the “ontology” term, ontologies are no more complex — indeed, are simpler and more powerful — than the standard relational schema familiar to enterprises. If you’d like, simply substitute schema for ontology and you will be saying the same thing in an RDF context.
Neither, really, though the rationale and justification for Linked Data is grounded in federating widely disparate sources of data that can also vary widely in existing formalism and structure.
Because Linked Data is a set of techniques and best practices for expressing, exposing and publishing data, it can easily be applied to either centralized or federated circumstances.
However, the real world where any and all potentially relevant data can be interconnected is by definition a varied, distributed, and therefore federated world. Because of its universal RDF data model and Web-based techniques for data expression and access, Linked Data is the perfect vehicle, finally, for data integration and interoperability without boundaries.
The simple case is where two data sources refer to the exact same entity or instance (individual) with the same identity. The standard sameAs predicate is used to assert the equivalence in such cases.
The more important case is where the data sources address similar subjects or concepts, in which case a structure of well-defined reference classes is employed. Furthermore, if these classes can themselves be expressed in a graph structure capturing the relationships amongst the concepts, we now have some fixed points in the conceptual information space for relating and tying together disparate data. Still further, such a conceptual structure also provides the means to relate the people, places, things, organizations, events, etc., of the individual instances of the world to one another as well.
Any reference structure that is composed of concept classes that are properly related to each other may provide this referential “glue” or “backbone”.
One such structure provided in open source by Zitgist is the 21,000 subject concept node structure of UMBEL, itself derived from the Cyc knowledge base. In any event, such broad reference structures may often be accompanied by more specific domain conceptual ontologies to provide focused domain-specific context.
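A small sketch of the two cases above, again using rdflib with invented URIs rather than actual UMBEL identifiers: owl:sameAs asserts identity between individuals, while merely similar concepts are tied to a shared reference class instead of being declared identical.

```python
# Sketch: sameAs for identical individuals, a shared reference class for
# similar concepts. All URIs are illustrative only.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

A = Namespace("http://example.org/datasetA/")
B = Namespace("http://example.org/datasetB/")
REF = Namespace("http://example.org/reference/")

g = Graph()

# Case 1: the two sources describe the exact same individual
g.add((A.Chicago, OWL.sameAs, B.City_of_Chicago))

# Case 2: the two sources use similar but distinct concepts, so both are
# related to a common reference class
g.add((A.Municipality, RDFS.subClassOf, REF.PopulatedPlace))
g.add((B.Town, RDFS.subClassOf, REF.PopulatedPlace))

print(g.serialize(format="turtle"))
```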
No, absolutely not.
While, to date, it is the case that Linked Data has been demonstrated using public Web data and many desire to expose more through the open data movement, there is nothing preventing private, proprietary or subscription data from being Linked Data.
The Linking Open Data (LOD) group, formed about 18 months ago to showcase Linked Data techniques, began with open data. To sever the idea that the approach applies only to open data, François-Paul Servant has specifically identified the parallel concept of Linking Enterprise Data (and see also the accompanying slides).
For example, with Linked Data (and not the more restrictive LOD sense), two or more enterprises or private parties can legitimately exchange private Linked Data over a private network using HTTP. As another example, Linked Data may be exchanged on an intranet between different departments, etc.
So long as the principles of URI naming, HTTP access, and linking predicates where possible are maintained, the approach qualifies as Linked Data.
Absolutely yes, without reservation. Indeed, non-transactional legacy data should perhaps be expressed as Linked Data in order to gain its manifest benefits. See #14 below.
Of course. Since Linked Data can be applied to any data formalism, source or schema, it is perfectly suited to integrating data from inside and outside the firewall, open or private.
The basic query language for Linked Data is SPARQL (pronounced “sparkle”), which bears a close resemblance to SQL, only applied to an RDF data graph rather than to relational tables. The actual datastores used for RDF may also add a fourth element to each triple to identify its graph or namespace, which can bring access and scale efficiencies; in these cases, the system is known as a “quad store”. Additional techniques may be applied to filter the data prior to the SPARQL query for further efficiencies.
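Here is a minimal SPARQL sketch against an in-memory rdflib graph; the data and the query are illustrative only.

```python
# Minimal SPARQL query over a small in-memory RDF graph.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
"""

for person, name in g.query(query):
    print(person, name)
```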
Templated SPARQL queries and other techniques can lead to very efficient and rapid deployment of various Web services and reports, two techniques often applied by Zitgist and other vendors. For example, all Zitgist DataViewer views and UMBEL Web services are expressed using such SPARQL templates.
This SPARQL templating approach may also be combined with the use of templating standards such as Fresnel to bind instance data to display templates.
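A simple way to picture the templating idea (a generic sketch, not an actual Zitgist or UMBEL template) is a parameterized query text that is filled in per request and then executed:

```python
# Generic sketch of a templated SPARQL query: placeholders filled per request.
from string import Template

DESCRIBE_TEMPLATE = Template("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?property ?value ?label
WHERE {
  <$uri> ?property ?value .
  OPTIONAL { ?value rdfs:label ?label . }
}
LIMIT $limit
""")

def build_query(uri, limit=50):
    return DESCRIBE_TEMPLATE.substitute(uri=uri, limit=limit)

print(build_query("http://example.org/people/alice"))
```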
In Zitgist’s view, access control or security occurs at the layer of the HTTP access and protocols, and not at the Linked Data layer. Thus, the same policies and procedures that have been developed for general Web access and security are applicable to Linked Data.
However, standard data-level or Web server access and security may be enhanced by the choice of the system hosting the data. Zitgist, for example, uses OpenLink’s Virtuoso universal server, which has proven and robust security mechanisms. Additionally, it is possible to express security and access policies using RDF ontologies as well. These potentials are largely independent of Linked Data techniques.
The key point is that there is nothing unique or inherent to Linked Data with respect to access or control or security that is not inherent with standard Web access. If a given link points to a data object from a source that has limited or controlled access, its results will not appear in the final results graph for those users subject to access restrictions.
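A small sketch of that point: fetching access-controlled Linked Data looks exactly like any other authenticated HTTP request. The endpoint and credentials below are, of course, hypothetical.

```python
# Access control at the HTTP layer: ordinary Web authentication, nothing
# Linked Data-specific. Endpoint and credentials are hypothetical.
import requests

response = requests.get(
    "https://data.example.org/private/customers/123",
    headers={"Accept": "text/turtle"},
    auth=("analyst", "s3cret"),      # standard HTTP Basic auth
    timeout=10,
)

if response.status_code == 200:
    turtle = response.text           # authorized: this source joins the results graph
elif response.status_code in (401, 403):
    turtle = None                    # unauthorized: the results graph simply omits it
```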
For more than 30 years — since the widespread adoption of electronic information systems by enterprises — the Holy Grail has been complete, integrated access to all data. With Linked Data, that promise is now at hand. Here are some of the key enterprise benefits to Linked Data, which provide the rationales for adoption:
Linked Data is well suited to traditional knowledge base or knowledge management applications. Its near-term application to transactional or material process applications is less apparent.
Of special use is the value added from connecting existing internal and external content via the network effect from the linkages.
Johnnie Linked Data is starting to grow up. Our little semantic Web toddler is moving beyond ga-ga-goo-goo to saying his first real sentences. Language acquisition will come rapidly, and, like what all of us have seen with our own children, they will grow up faster than we can imagine.
There were so many people at this meeting who had an impact on and meaning for this exciting transition point that I won’t list specific names, at the risk of leaving others off. Those of you who made so many great observations or stayed up late interacting with passion know who you are. Let me simply say: Thanks!
The LinkedData Planet conference has shown, to me, that enterprises are extremely interested in what our community has developed and now proven. They are asking hard questions and will be difficult taskmasters, but we need to listen and respond. The attendees were a selective and high-quality group, understanding of their own needs and looking for answers. We did an OK job of providing those answers, but we can do much, much better.
I reflect on these few days now knowing something I did not truly know before: the market is here and it is real. The researchers who have brought us to this point will continue to have much to research. But, those of us desirous of providing real pragmatic value and getting paid for it, can confidently move forward knowing both the markets and the value are real. Linked Data is not magic, but when done with quality and in context, it delivers value worth paying for.
To all of the fellow speakers and exhibitors, to all of the engaged attendees, and to the Jupitermedia organizers and Bob DuCharme and Ken North as conference chairs, let me add my heartfelt thanks for a job well done.
The next LinkedData Planet conference and expo will be October 16-17, 2008, at the Santa Clara Hyatt in Santa Clara, California. The agenda has not been announced, but hopefully we will see a continuing enterprise perspective and some emerging use cases.
Zitgist as a company will continue to release and describe its enterprise products and services, and I will continue to blog on Linked Data matters of specific interest to the enterprise. Pending topics include converting legacy data to Linked Data, converting relational data and schema to Linked Data, adding context to Linked Data, and many others. We think you will like the various announcements as they arise.
Zitgist is also toying with the use of a distinctive icon to indicate the availability of Linked Data conforming to the principles embodied in the questions above. (The color choice is an adoption of the semantic Web logo from the W3C.) The use of a distinctive icon is similar to what RSS feeds or microformats have done to alert users to their specific formats. Drop me a line and let us know what you think of this idea.