Michael Wacey argues in The Semantic Organization: Knowing What You Know that corporations have a tremendous amount of stored information and are likely to be the early adoption point for semantic Web capabilities, similar to the way corporations proved to be the early adopters of Web services and the underlying technologies (UDDI, WSDL, SOAP) initially designed for the Web at large.
I agree with his premise that Web-wide adoption of semantic tagging is unlikely at first and individual organizations offer better and easier proving grounds. However, my experience has been that government agencies have been the leaders in semantic and entity extraction; for reasons noted elsewhere, corporations have been slow on the uptake.
Some of the stumbling blocks appear to be a lack of understanding of the benefits among top management and the lack of automated, accurate means to "tag" content at scale and then manage it. Until these fundamental sticking points are greased, leadership in promoting semantic Web capabilities is likely to remain with government entities, where lives and national security are at stake.
I have been thankful for the many wonderful comments and reactions to my professional blogging guide. My favorite is from Blizzard Internet Marketing:
When you come across such a generous addition to the web community you just have to share. Last night, going through my feeds I came across the … Comprehensive Guide to a Professional Blog Site. What a fantastic resource for the beginner professional blogger. After spending just a few weeks with our own blog, researching and sharing online our quest to educate ourselves on the art, process, and best practices of blogging I came across this gem …. after spending just a few weeks on our own research I don't doubt how long it took to create this easy to read, easy to understand comprehensive guide.
Michael's note taking and attention to detail as he went on his journey is impressive. Just about every step of the way he details his tasks from choosing the program and why, to loading it on the server and configuring the system. Within the guide are a number of resources for even more great information ….
Fantastic! Michael, thank you so much for taking on loading and configuring the system and not giving the task to your tech team. The information you gleaned from doing this yourself only enhances the content of the guide.
I would highly recommend reading the guide if you are planning on installing and using WordPress for your blog. Even if you decide to go with something more out of the box such as Blogger or Typepad the information in the guide on blogging and organizing your thoughts to create a worthy site are just as invaluable.
This post introduces a new category area in my blog related to what I and BrightPlanet are terming the eXtensible Semi-structured Data Model (XSDM). Topics in this category cover all information related to extensible data models and engines applicable to documents, metadata, attributes, semi-structured data, or the processing, storing and indexing of XML, RDF, OWL, or SKOS data formats.
The importance of this category is introduced by Fengdong Du in the master’s thesis Moving from XML Documents to XML Databases, submitted to the University of British Columbia in March 2004. As succinctly stated in the introduction to that thesis:
Depending on the characteristics of XML applications, the current XML storage techniques can be classified into two major categories. Most text-centric applications (e.g., newspapers) choose an existing file system for data storage. Data is usually divided into logical units, and each logical unit is physically stored as a separate file. As an example, a newspaper application may divide the entire year's newspapers into 12 collections by months, and store each collection as a document file. This type of application usually provides a keyword-based search tool and manipulates the data in application-specific processes. While this approach simplifies the storage problem, it has some major drawbacks. First, storing XML data as plain text makes it difficult to develop a generic data manipulation interface.
Second, mapping logical units of data to individual files makes it difficult to view the data from a different perspective. For this reason, this type of application only provides services with limited functionalities and therefore restricts the usage of data.
On the other hand, in data-centric applications such as e-commerce applications, data is typically highly-structured, e.g., extracted from a relational database management system (RDBMS). XML is primarily used as a tool to publish data to the Web or deliver information in a self-descriptive way in place of the conventional relative files. This type of application relies on the RDBMS for data storage. Data received in XML format is eventually put into an RDBMS when persistence is desired. Over the years, an RDBMS has been well developed to efficiently store and retrieve well-structured data. Structured Query Language (SQL) and many useful extended RDBMS utilities (e.g., Programming Language SQL, stored procedures) act as an application-independent data manipulation interface. Applications can communicate with databases through this generic interface and, on top of it, provide services with very rich functionalities.
While storing XML data into an RDBMS can take advantage of the well-developed relational database techniques and open interfaces, this approach requires an extra schema-mapping process applied to XML data, which involves schema transformation and usually decomposition. The schemas of XML data have to be mapped to strictly-defined relational schemas before data is actually stored. This process is strongly application-dependent or domain-dependent because there must be enough information available to determine many relational database design issues such as which table in the target RDBMS is a good place to store the information delivered, what new tables need to be created, which elements/attributes should be indexed, etc. No matter how this kind of information is obtained, whether delivered with XML data as schemas and processing instructions, or the application context makes it obvious, it is hard to develop an automatic and generic schema-mapping mechanism. Instead, application-specific work needs to take care of the schema-mapping problem. This involves non-trivial work of database server-side programming and database administration.
Another drawback of storing XML data in an RDBMS is that it is hard to efficiently support many types of queries that people want to ask on XML data. In RDBMS, each table has a pre-defined primary key field, and possibly a few other indexed fields. Queries not on the key field and not on the indexed fields will result in table scans (i.e., possibly a very large number of I/O's, which can be very time consuming), such as for a path and predicate expression of the form /department[street='…']/student[nationality='…']:
It is very likely that "department" is not indexed on "street" and that "student" is not indexed on "nationality". Therefore, resolving this path expression will cause table scans. Moreover, storing XML data in an RDBMS often results in schema decomposition and produces many small tables. Hence, evaluating a query often needs many expensive join operations.
For unstructured or semi-structured data, an RDBMS has greater difficulty, and query performance is usually unacceptable for relatively large amount of data. For these reasons, a native database management system is expected in the XML world. Like a traditional RDBMS, native XML databases would provide a comprehensive and generic data management interface, and therefore isolate lower level details from the database applications. Unlike an RDBMS, an ideal native XML database would make no distinction between unstructured data and strictly structured data. It treats all valid XML data in the same way and manages them equally efficiently. Its performance is only affected by the type of data manipulation. In other words, an ideal XML native database is not only access transparent but also performance transparent upon the structural difference of data.
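The trade-offs in the excerpt above can be made concrete with a small, entirely hypothetical sketch: an XML fragment is "shredded" by hand-written, application-specific code into relational tables, and the path-and-predicate query then becomes a join over columns that carry no index. Every element, table, and column name here is invented for illustration, echoing only the "street" and "nationality" fields mentioned in the thesis excerpt:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; names are chosen only to match the excerpt's example.
doc = """
<departments>
  <department street="Main St">
    <student nationality="Canadian"><name>Ann</name></student>
    <student nationality="French"><name>Luc</name></student>
  </department>
  <department street="Oak Ave">
    <student nationality="Canadian"><name>Sam</name></student>
  </department>
</departments>
"""
root = ET.fromstring(doc)

# 1. Schema mapping is application-specific: someone must decide which
#    tables to create and which columns, if any, to index.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, street TEXT)")
cur.execute("""CREATE TABLE student (id INTEGER PRIMARY KEY,
               dept_id INTEGER, name TEXT, nationality TEXT)""")

# 2. Shredding: the document's nesting is re-encoded as a foreign key.
for dept in root.findall("department"):
    cur.execute("INSERT INTO department (street) VALUES (?)",
                (dept.get("street"),))
    dept_id = cur.lastrowid
    for s in dept.findall("student"):
        cur.execute("INSERT INTO student (dept_id, name, nationality) "
                    "VALUES (?, ?, ?)",
                    (dept_id, s.findtext("name"), s.get("nationality")))
conn.commit()

# 3. The path-and-predicate query now requires a join; with no index on
#    "street" or "nationality", the tables must be scanned.
sql_hits = [r[0] for r in cur.execute(
    """SELECT s.name FROM department d JOIN student s ON s.dept_id = d.id
       WHERE d.street = 'Main St' AND s.nationality = 'Canadian'
       ORDER BY s.id""")]

# The same query against the tree itself is a single traversal.
xpath = ".//department[@street='Main St']/student[@nationality='Canadian']"
tree_hits = [s.findtext("name") for s in root.findall(xpath)]

print(sql_hits, tree_hits)  # both yield ['Ann']
```

Running SQLite's EXPLAIN QUERY PLAN on the join confirms that at least one of the two tables is fully scanned, which is precisely the cost the thesis attributes to unindexed predicate fields after schema decomposition.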
Future topics in this XSDM area will expand on these challenges and describe new standards-based solutions being developed by BrightPlanet that specifically address them.
I just came across a VC blog pondering the value to a start-up of operating in "Stealth Mode" or not. I’ve come, with some amusement, to the conclusion that all of this — particularly the "stealth" giveaway — is so much marketing hype. When a start-up claims it’s coming out of stealth mode, grab your wallet.
The most interesting and telling example I have of this is Rearden Commerce, which was announced in a breathy cover story in InfoWorld in February 2005 about the company and its founder/CEO Patrick Grady. The company has an obvious "in" with the magazine; in 2001 InfoWorld also carried a similar piece on the predecessor company to Rearden, Talaris Corporation.
According to a recent Business Week article, Rearden Commerce and its predecessors, reaching back to an earlier company called Gazoo founded in 1999, have raised $67 million in venture capital. While it is laudable that the founder has reportedly put his own money into the venture, with its massive funding and a high-water mark of 80 or so employees this venture hardly qualifies as "stealth."
As early as 2001 with the same technology and business model, this same firm was pushing the "stealth" moniker. According to an October 2001 press release:
"The company, under its stealth name Gazoo, was selected by Red Herring magazine as one of its ‘Ten to Watch’ in 2001." [emphasis added]
Even today, though no longer the active name, Talaris Corporation has close to 115,000 citations on Yahoo! Notable VCs such as Charter Ventures, Foundation Capital, JAFCo and Empire Capital have backed it through its multiple incubations.
The Holmes Report, a marketing company, provides some insight into how the earlier Talaris was spun in 2001:
"The goal of the Talaris launch was to gain mindshare among key business and IT trade press and position Talaris as a ‘different kind of start-up’ with a multi-tiered business model, seasoned executive team and tested product offering."
The Holmes Report documents the analyst firms and leading journals and newspapers to which it made outreach. Actually, this outreach is pretty impressive. Good companies do the same all of the time and that is to be lauded. What is to be questioned, however, is how many "stealths" a cat can have. Methinks this one is one too many.
"Stealth" thus appears to be code for an existing company of some duration that has had disappointing traction and now has new financing, a new name, new positioning, or all of the above. So, interested in a start-up that just came out of stealth mode? Let me humbly suggest standard due diligence.
BrightPlanet has announced a major upgrade to its Deep Query Manager knowledge worker document platform. According to its press release, the new version achieves extreme scalability and broad internationalization and file format support, among other enhancements. The DQM has added the ability to harvest and process up to 140 different foreign languages in more than 370 file formats plus new content export and system administration features. The company also claims the new distributed architecture allows scalability into hundreds or thousands of users across multiple machines with the ability to handle incremental growth and expansions.
According to the company:
The Deep Query Manager is a content discovery, harvesting, management and analysis platform used by knowledge workers to collaborate across the enterprise. It can access any document content — inside or outside the enterprise — with strengths in deep content harvesting from more than 70,000 unique searchable databases and automated techniques for the analyst to add new ones at will. The DQM’s differencing engine supports monitoring and tracking, among the product’s other powerful project management, data mining, reporting and analysis capabilities.