I’ve been in the information theory and technology game for quite some time, but believe nothing has matched the pace of advances of the past ten years. As one example, it was a mere eight years ago that I was sitting in a room with language translation vendors contemplating automated translation techniques for US intelligence agencies. The prospects finally looked doable, but the success of large-scale translation was not assured.
At about that same time, and the years until just recently, a whole slew of Grand Challenges  in computing hung out there: tantalizing yet not proven. These areas ranged from information extraction and natural language understanding to speech recognition and automated reasoning.
But things have been changing fast, and with a subtle steadiness that has caused it to go largely unremarked. Sure, all of us have been aware of the huge changes on the Web and search engine ubiquity and social networking. But some of the fundamentally hard problems in computing have also gone through some remarkable (but largely unremarked) advances.
We now have smart phones that speak instructions to us while we instruct them by voice in turn. Virtually all information conceivable is now indexed and made available through the Web; structure is now rapidly characterizing that information, making it even more useful to discover and organize. We can translate documents online with acceptable accuracy into more than 60 languages . We can get directions to or see satellite views of virtually any place on earth. We have in fact become accustomed to new technology magic on a nearly daily basis, so much so that the pace of these advances seems to be a constant, blunting our perspective of just how rapid these advances have been progressing.
These advances are perhaps not the realization of artificial intelligence as articulated in the 1950s to 1980s, but are contributing to a machine-based ability to do tasks useful to humans heretofore impossible and at scales unimaginable. As Google and IBM’s Watson are showing, statistics (among other techniques) applied to massive knowledge bases or text corpora are breaking down all of the Grand Challenges of symbolic computing. The image that is emerging is less one of intelligent machines working autonomously than it is of computers working interactively or semi-automatically with humans to address previously unsolvable problems.
By using a perspective of the decade past, we also demark the seminal paper on the semantic Web by Berners-Lee, Hendler and Lassila from May 2001 . Yet, while this semantic Web vision has been a contributor to the success of the Grand Challenge advances of the past ten years, I think we can also say that it has not been the key or even a primary driver. That day may still yet come. Rather, I think we have to look to natural language and statistics surrounding large-scale corpora as the more telling drivers.
Over the past ten years there have been significant advances on at least ten Grand Challenges in symbolic computation. As the concluding section notes, these advances can be traced in most part to broader advances in natural language processing, the logical and semiotic bases for interoperability, and standards (nominally in the semantic Web) for embracing them. Here are these ten areas of advance, all achieved over the past ten years:
Information extraction (IE) uses various forms of natural language processing (NLP) to identify structured information within unstructured or semi-structured documents. These documents are presented in machine-readable form (including straight text, various document formats or HTML) with the various types of information “tagged” or prompted for inclusion. Information types that can be extracted with one of the various techniques include entities, relations, topics, categories, and so forth. Once tagged or extracted, the information in the documents can now be included and linked to standard structured information (as might come from conventional databases) or to structure in other documents.
Most recently, a large number of online services and open source systems have also become available with strengths in one or more of these extraction types . Some current examples include Yahoo! Term Extraction, OpenCalais, BeliefNetworks, OpenAmplify, Alchemy API, Evri, Extractiv, Illinois Tagger, and about 80 others .
Machine translation is the automatic translation of machine-readable text from one human language to another. Accurate and acceptable machine translation requires applying different types of knowledge including grammar, semantics, facts about the real world, etc. Various approaches have been developed and refined over time.
Especially helpful has been the availability of huge corpora in multiple languages to which large-scale statistical analysis may be applied (as is the case of Google’s machine translation) or human editing and refinement (as is the case with the more than 280 language versions of Wikipedia).
While it is true none of these systems have 100% accuracy (even human translators show much variation), the more advanced ones are truly impressive with remaining ambiguities flagged for resolution by semi-automatic means.
Though sentiment analysis is strictly speaking a subset of information extraction, it has the more demanding and useful task of extracting subjective information, often across a group of documents or texts. Sentiment analysis can be applied to online reviews to determine the “polarity” about specific objects, and it is especially useful for identifying public opinion trends or evaluating social media for ranking, polling or marketing purposes.
Because of its greater difficulty and potential high value, many of the leading sentiment analysis capabilities remain proprietary. Some capable open source versions are available nonetheleless. There is also an interesting online application using Twitter feeds.
Many words have more than one meaning. Word sense disambiguation uses either machine learning, dictionaries (gazetteers) of known entities and concepts, ontologies or linguistic databases such as WordNet, or combinations thereof to evaluate ambiguous terms or phrases and resolve them based on context. Some systems need to be “trained” or some work automatically or others are based on evaulation and prompting (semi-automatic) to complete the disambiguation process.
Speech synthesis is the conversion of text to spoken speech and has been around for quite some time. Speech recognition is a far more difficult task in that a given sound clip or real-time spoken speech of a person must be converted to a textual representation, which itself can then be acted upon such as navigating or making selections. Speech recognition is made difficult because of individual voice differences, the variations of human languages and speech patterns, and the need to segment speech into a sequence of words. (In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the modulated wave form to discrete characters or tokens can be a very difficult process.)
Crude systems of a decade ago required much training with a specific speaker’s voice to show much effectiveness. Today, the range and ability to use these systems without training has markedly improved.
Until recently, improvements largely were driven by military and intelligence requirements. Today, however, with the ubiquity of smart phones and speech interfaces, the consumer market is greatly accelerating progress.
Image recognition is the ability to determine whether or not an electronic image contains some specific object, feature, or activity, and then to extract the image data associated with it. Today, under specific circumstances and for specific tasks, this can be done by computer. However, for the general case of arbitrary objects in arbitrary situations this challenge has not yet been fully met. The systems of today work best for simple geometric objects (e.g., polyhedra), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and orientation of the object relative to the camera.
Auto license recognition at intersections, face recognition by security cameras, and greatly expanded and improved character recognition systems (machine vision) represent some of the current state-of-the-art. Again, smart phone apps are helping to drive advances.
Most of the previous advances are related to extracting structured information or mapping or deriving additional structured information. Once obtained, of course, the next challenge is in how to relate that information together; that is, how to make it interoperate.
We have been steadily climbing a data federation pyramid  — and at an impressively accelerating rate since the adoption of the Internet and Web. These network innovations gave us a common basis and protocols for connecting distributed devices. That, in turn, has freed us to concentrate on the standards for data representation and interoperability.
XML first provided a means for a common data serialization that encouraged various communities and industries to devise exchange vocabularies. RDF provided a means for a common data model, one that was both simple and extensible at the same time . OWL built upon that basis to enable us to build common domain models (see next).
There are alternatives to the semantic Web standards of RDF and OWL such as common logic and there are many competing data exchange formats to XML. None of these standards is essential on its own and all have their communities and advocates. However, because they are standards and they share common network bases, it has also been relatively easy to convert amongst the various available protocols. We are nearly at a global level where everything is connected, machine-readable, and in structured form.
Semantics in machine-readable form means that we can more confidently link and combine available information. We are seeing a veritable explosion of domain models to represent various domains and viewpoints in consensual, interoperable form. What this means is that we are now gaining the computing vocabularies and grammars — along with shared community models (world views) — to get this stuff to work together.
Five years ago we called this phenomena mashups, but no one uses that term any longer because these information brewpots are everywhere, including in our very hands when we interact with the apps on our smart phones. This glue of domain models is generally as invisible to us as is the glue in laminates or the resin in plastics. But they are the strength and foundations nonetheless that enable much of the computing magic unfolding around us.
Once the tyranny of physical separation was shattered between data and machine by the network, the rationale for keeping the data with the app or even the user with the app disappeared. Cloud computing may seem mysterious or sound to have some high-octave hum, but it really is nothing more than saying that the Web enables us to treat all of our computing resources as virtual. Data can be anywhere; machines and hard drives can be anywhere; and applications can be anywhere.
And, virtualness brings benefits in and of itself. Whole computing environments can be installed or removed nearly instantaneously. Peak computing demands can be met with virtual headrooms. Backup and rollover and redundancy practices and strategies can change. Web services mean tailored capabilities can be invoked from anywhere and integrated for local needs. Massive computing resources and server farms can be as accessible to the individual as they are to prior computing behemoths. Combined with continued advances in underlying computing hardware and chips, the computing power available to any user is rising exponentially. There is now even more power in the power curve.
One hears stories of Google or the National Security Agency having access and managing servers measured in the hundreds of thousands. Entirely new operating systems and computing environments — many with roots in open source — such as virtual operating systems and MapReduce approaches like Hadoop have been innovated to deal with the current era of “big data”.
MapReduce is a framework for processing huge datasets using a large number of servers. The “map” step partitions the problem into tractable sub-problems, organized in a tree structure. The “reduce” step then takes the answers to all the sub-problems and combines them to produce the final output.
Such techniques enable analysis of datasets of a size impossible before. This has enabled the development of statistics and analytical techniques that have been able to make correlations and find patterns for some of the Grand Challenge tasks noted before that simply could not be addressed within previous limits. The “big data” approach is providing a brute force alternative to previously intractable problems.
Declining hardware costs and increasing performance (such as from Moore’s Law), combined with the adoption of the Internet + Web network, set the fertile conditions for these unprecedented advances in computing’s Grand Challenges. But the adaptive radiation in innovations now occurring has its own dynamics. In computing terms, we are seeing the equivalent of the Cambrian explosion in evolutionary history.
The dynamics driving this computing explosion are based largely, I believe, on the statistics of information retrieval and extraction needed to cope with the scale of documents on the Web. That, in turn, has impelled innovations in big data and distributed architectures and designs that have pried open previously closed computing lockboxes. As data from everywhere and from every provenance pours into the system, means for handling and interoperating with it have become imperatives. These forces, in turn, have been channeled and are being met through the open and standards-based approaches that helped lead to the development of the Internet and its infrastructure in the first place.
These powerful evolutionary forces in computing are clearly evident in the ten Grand Challenge advances above. But the challenges above are also silent on another factor, underpinning the interoperability initiatives, that is only now just becoming evident and exerting its own powerful force. That is the workable, intellectual foundations for interoperability itself.
Clearly, as the advances in the Grand Challenges show, we are seeing immense exposures of new structured information and impressive means for accessing and managing it on a global, distributed scale. Yet all of this data and this structure begs the question of how to get the information to work together. Further, the sources and viewpoints and methods by which all of this data has been created also puts a huge premium on means to deal with the diversity. Though not evident, and perhaps not even known to many of the innovators and practitioners, there has been a growing intellectual force shaping our foundational views about the nature of things and their representations. This force has been, I believe, one of those root cause drivers helping to show the way to interoperability.
John Sowa, despite his unending criticism of the semantic Web in favor of common logic, has nonetheless been a very positive evangelist for the 19th century American logician and philosopher, Charles Sanders Peirce. Sowa points out that the entire 20th century largely neglected Peirce’s significant contributions in many areas and some philosophers appropriated Peircean insights without proper attribution . Indeed, Peirce has only come to wider attention within the past decade or so. Much of his voluminous lifetime writings have still not yet been committed to publication.
Among many notable contributions, Peirce was passionate about signs and their triadic representations, in a field known as semiotics. The philosophical and logical basis of his triangle of signs deserves your attention, which can not be adequately treated here . However, as summarized by Sowa , “A semiotic view of language and logic gets to the heart of the philosophical controversies and their practical implications for linguistics, artificial intelligence, and related subjects.”
In essence, Peirce’s triadic logic of semiotics helps clarify philosophical questions about things, how they are perceived and how they are named that has vexed philosophers at least since the time of Aristotle. What Peirce was able to put forward was a testable logic for how things and the names of things can be understood and related to one another, via logical statements or structures. These, in turn, can be symbolized and formalized into logical constructs that can capture the structure of natural language as well as more structured data.
The clarity of Peirce’s logic of signs is an underlying factor, I believe, for why we are finally seeing our way clear to how to capture, represent and relate information from a diversity of sources and viewpoints that is defensible and interoperable . As we plumb Peircean logics further, I believe we will continue to gain additional insights and methods for combining and relating information. The next phase of our advances on these Grand Challenges is likely to be fueled more by connections and interoperability than in basic extraction or representation.
We are not seeing the vision of artificial intelligence unfold as posed three decades ago. Nor are we seeing the AI-complete type of problems being solved in their entirety . Rather, we are seeing impressive but incomplete approaches. Full automation and autonomy are not yet at hand, and may be so far in the future as to never be. But we are nevertheless seeing advances across the board in all Grand Challenge areas.
What is emerging is a practical achievement of the Grand Challenges, the scale and scope of which is unprecedented in symbolic computing. As we see Peircean logic continue to take hold and interoperability grow in usefulness and stature, I think it fair to say we can look back in ten years to describe where we stand today as having been in the midst of an evolutionary explosion.