Posted: December 2, 2005

A recent article by Cheryl Gerber, "Smart Searching," in the November 21 issue of Military Information Technology online provides a useful overview of the issues and leading vendors in large-scale content search and discovery. Vendors covered in the article include Endeca Technologies, Basis Technology, Inxight Software, Insightful, Attensity, Convera, SRA International (NetOwl), ClearForest and BrightPlanet.

Gerber characterizes the focus of these efforts at the Defense Intelligence Agency as follows:

The unique requirements of defense intelligence analysts are refining search technology down from mass production, with its vast and sometimes trivial outcomes, to more guided, dynamic navigation able to produce results that are both inclusive and relevant.

As one of the largest collectors of information on the planet, the Defense Intelligence Agency (DIA) is responsible for amassing and analyzing all sources of human intelligence in the field from all information types in a multitude of languages.

"This forces us to deal with huge volumes of data. It's an enormous challenge," said a senior DIA official.

The task is indeed a massive one. Sources of intelligence in the field include feeds from UAVs, intelligence, surveillance and reconnaissance data from a vast array of sensors and overhead platforms, signal intelligence, satellites, film and video, not to mention all the data from the open source world. "We need to manage all that data and make it available as quickly as possible to analysts," the DIA official said.

The intelligence community, like communities forming in the commercial sector, is also relying on shared standards for metadata transfer and management. In the case of the DIA, those standards come from the Intelligence Community Metadata Working Group (ICMWG), which is charged with establishing standards for tagging all data used by DIA systems.

In the article, BrightPlanet’s Duncan Witte commented on the importance of being able to "organize, manage and distribute the huge volume of information as well. You need various specialties that allow collaboration with teammates and effective distribution of information."

This article again affirms that the federal intelligence community continues to assume the lead in large-scale content discovery and evaluation.

As the article notes, the DIA maintains a steady push toward technology improvement. "We try to do the best we can with the volumes. In-house we have a lot of expertise on search algorithms and text analysis. But we need to do a better job of combing through the massive volumes of information to find that which is interesting and nontrivial in a way that leads to knowledge discovery. We need better information retrieval through machine understanding of the semantic meaning of text, regardless of language," the DIA official said.

Posted: November 25, 2005

There were a number of references to the UMBC Semantic Web Reference Card – v2 when it was first posted about a month ago.  Because it is so useful, I bookmarked it to post about again later (today), after the initial attention had faded.

According to the site:

The UMBC Semantic Web Reference Card is a handy "cheat sheet" for Semantic Web developers. It can be printed double sided on one sheet of paper and tri-folded. The card includes the following content:

  • RDF/RDFS/OWL vocabulary
  • RDF/XML reserved terms (they are outside RDF vocabulary)
  • a simple RDF example in different formats
  • SPARQL semantic web query language reference
  • many handy facts for developers.
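To give a flavor of the card’s "simple RDF example in different formats," here is one illustrative triple (the URI and title below are my own placeholders, not taken from the card) written first in N3/Turtle and then in RDF/XML:

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# One statement: the resource at example.org/card has a Dublin Core title
<http://example.org/card> dc:title "Semantic Web Reference Card" .
```

The same single triple in the more verbose RDF/XML serialization:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/card">
    <dc:title>Semantic Web Reference Card</dc:title>
  </rdf:Description>
</rdf:RDF>
```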

The reference card is provided through the University of Maryland, Baltimore County (UMBC) eBiquity program.  The eBiquity site provides excellent links to semantic Web publications as well as generally useful information on context-aware computing; data mining; ecommerce; high-performance computing; knowledge representation and reasoning; language technology; mobile computing; multi-agent systems; networking and systems; pervasive computing; RFID; security, trust and privacy; semantic Web, and Web services.

The UMBC eBiquity program also maintains the Swoogle service.  Swoogle crawls and indexes semantic Web RDF and OWL documents encoded in XML or N3.  As of today, Swoogle contains about 350,000 documents and over 4,100 ontologies.

The Reference Card itself is available as a PDF download.  Highly recommended!

Posted by AI3's author, Mike Bergman Posted on November 25, 2005 at 12:46 pm in Adaptive Information, Searching, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/161/semantic-web-reference-card-update/
The URI to trackback this post is: https://www.mkbergman.com/161/semantic-web-reference-card-update/trackback/
Posted: November 23, 2005

Yeah, I know it is kind of silly to celebrate a six-month anniversary (today!) for my blog site.  It bears too uncomfortable a resemblance to my daughter’s anniversaries regarding her boyfriends.  As for my wife and me, the periods have moved to decades ….

But I DID notice the recent calendar trigger.  It HAS been interesting watching the growing use and popularity of my site; it HAS been instructive getting embedded in the daily/sorta regular posting mentality; it HAS been a change drafting and writing in an online medium; it HAS been true (I hate to admit) that I watch how what I do on this site is paid attention to, indexed, scored or ranked by other sites.

Probably enough said … I remain very intrigued with this medium and what it means from the global to the personal.

Thanks again for listening, occasionally watching, and sometimes commenting on what gets posted here.  Happy 6th month! 

Posted by AI3's author, Mike Bergman Posted on November 23, 2005 at 9:52 pm in Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/167/happy-six-months/
The URI to trackback this post is: https://www.mkbergman.com/167/happy-six-months/trackback/

As a longstanding Search Engine Watch viewer and subscriber, a longstanding fan of Danny Sullivan in his role as observer and prognosticator of the search scene,  and a past speaker at SEW conferences, I kinda feel an affinity for the growth and (what used to be) adolescence of this space.  But recently I’ve come to feel I’m looking more at old age or irrelevance.

I don’t know how many owners have gobbled and then digested SEW, but it has been a few.  The site was originally launched in 1995.  It was bought by Mecklermedia in 1997 (the last time an official history of the site was published), though it has since flown under the banners of the same owner, Alan Meckler, at Internet.com, INT Media Group, Jupitermedia and ClickZ.  The most recent purchaser is Incisive Media, with the transaction occurring in August 2005.

I just got my most recent SEW update (#209 to be exact).  What I found is that it has unfortunately devolved into a compilation of blog listings.  Going to the main SEW site, I also see clutter, ad cram, poor refreshes and inattention to the standard metrics and search engine evaluations that used to be SEW’s claim to fame.

Perhaps this is the way of the world.  Things change faster; sites get bought and re-purposed; yada, yada.  After all, who remembers that Lycos was the first search engine to go public in July 1994 with a mere 54,000 URLs listed shortly before SEW was inaugurated?

And, maybe I’m just in a bad mood.  Perhaps SEW will return tomorrow or next week to its older standards.

But, I suspect not.  I think I will move my attention to more comprehensive RSS feeds (targeted to my specific interests) and let the marketing marvel of just a few years back, the monthly email update, go the way of other mass-media dinosaurs.

At some point I’m going to think about and write about the implications when K-species become r-species with lightning-quick generation times ….

Posted by AI3's author, Mike Bergman Posted on November 23, 2005 at 7:36 pm in Searching | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/166/search-engine-watch-sew-craw-stickers/
The URI to trackback this post is: https://www.mkbergman.com/166/search-engine-watch-sew-craw-stickers/trackback/

In earlier posts I have put forward a vision for the semantic Web in the enterprise: an extensible database supporting semi-structured data at its core, with XML mediating multiple ingest feeds, interaction with analytic tools, and delivery of results to visualization and reporting tools.

This is all well and good as far as it goes.  However, inevitably, whenever more than one tool or semi-structured dataset is added to a system, it brings with it a different “view” of the world.  Formalized and standardized protocols and languages are needed both to:  1) capture these disparate “views,” and 2) provide facilities to map them and so resolve data and schema federation heterogeneities.  These are the roles of RDF and OWL.
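As a minimal sketch of what such a mapping can look like (the two schema namespaces below are invented for illustration), OWL provides axioms that declare terms from different “views” to be equivalent, so that a reasoner or query engine can federate data from both:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix hr:  <http://example.org/schemas/hr#> .
@prefix crm: <http://example.org/schemas/crm#> .

# Two ingest feeds name the same concept and property differently;
# these OWL axioms record that the disparate "views" coincide.
hr:Employee owl:equivalentClass    crm:StaffMember .
hr:worksFor owl:equivalentProperty crm:employedBy .
```

With axioms like these in place, a query phrased against the hr: vocabulary can, under OWL semantics, also return data that arrived tagged with the crm: vocabulary.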

Fortunately, there is a very active community with tools and insights for working in RDF and OWL.  Stanford and UMBC are perhaps the two leading centers of academic excellence.

If you are not generally familiar with this stuff, I recommend beginning with the recent “Order from Chaos” by Natalya Noy of the Protégé group at Stanford Medical.  The piece describes issues, such as trust, that are likely less relevant to applying the semantic Web within enterprise intranets than to the cowboy nature of the broader Internet.  Much of the rest of the article, however, is of general use to an architect considering enterprise applications.

To keep things simple and to promote interoperability, a critical aspect of any enterprise semantic Web implementation will be providing the “data API” standards (including extensible XML, RDF and OWL) that govern the rules of how to play in the sandbox.  Time spent defining these rules of engagement will pay off in spades relative to any other approach for handling multiple ingest feeds, multiple analytic tools, and multiple audiences, reports and collaboration.

Another advantage of this approach is the wealth of open source tools for managing such schema (e.g., Protégé) and for visualization (literally dozens), along with thousands of ontologies and other intellectual property.