Posted: October 24, 2005

For some time now I have had a setting of ‘5 days’ as the limit for the number of posts to display on my site.  My rationale, as in most things, was to limit the number of bytes any individual reader needed to download during a site visit.

However, that may be dial-up thinking.

I was playing around with my own site and viewed the monthly archive options.  Lo and behold, my 5-day limit carried over to that display as well.  So I upped the limit to a full month (31 days) and repeated the step.

Sure, downloads were a little slower, but not noticeably so, and I am on an IDSL line (128 kbps) anyway!  Anyone with anything approaching a modern connection should not have a problem upping post display limits.

Oh, well.  My advice to everyone (and I’m likely the last to realize this) is to up your blog display governors.

Posted by AI3's author, Mike Bergman Posted on October 24, 2005 at 4:25 pm in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/149/re-visit-internal-settings-for-blog-viewing/
The URI to trackback this post is: https://www.mkbergman.com/149/re-visit-internal-settings-for-blog-viewing/trackback/

I’ve just finished reading a fascinating 228-page transcript on the topic of peer production and open data architectures. This discussion, the first of the so-called Union Square Sessions, involved more than 40 prominent Web and other thinkers with a heavy sprinkling of VCs.

That being said, I was disappointed that neither interoperability nor extensibility directly entered into any of the discussions. I suspect this may be due to conjoining the important singular topic of open data architectures with the lens of peer production or social networks. For me, the quote closest to my interests among this disparate group was from Dick Costolo, who stated “the bottom line implication is that an open data architecture will be one that is purely API based and not destination based.”

Nonetheless, this is an interesting start. I’d like to humbly suggest open and extensible data architectures (including, importantly, database engines in addition to extensible exchange formats such as XML) for a future discussion topic.

Here is the link to the Union Square Sessions Transcript.

Posted: October 23, 2005

Michael Wacey argues in The Semantic Organization: Knowing What You Know that corporations have a tremendous amount of stored information and are likely to be the early adoption point for semantic Web capabilities, similar to the ways in which corporations have proven to be the adopters for Web services and the underlying technologies (UDDI, WSDL, SOAP) initially designed for the Web at large.

I agree with his premise that Web-wide adoption of semantic tagging is unlikely at first and individual organizations offer better and easier proving grounds.  However, my experience has been that government agencies have been the leaders in semantic and entity extraction; for reasons noted elsewhere, corporations have been slow on the uptake.

Some of the stumbling blocks appear to be a lack of understanding of the benefits among top management and the lack of automated, accurate means to "tag" content at scale and then manage it.  Until these fundamental sticking points are greased, we are likely to continue to see leadership in promoting semantic Web capabilities come from government entities, where lives and national security are at stake.

Posted: October 22, 2005

I have been thankful for the many wonderful comments and reactions to my professional blogging guide.  My favorite is from Blizzard Internet Marketing:

When you come across such a generous addition to the web community you just have to share.  Last night, going through my feeds I came across the … Comprehensive Guide to a Professional Blog Site.  What a fantastic resource for the beginner professional blogger.  After spending just a few weeks with our own blog, researching and sharing online our quest to educate ourselves on the art, process, and best practices of blogging I came across this gem …. after spending just a few weeks on our own research I don't doubt how long it took to create this easy to read, easy to understand comprehensive guide.   

Michael's note taking and attention to detail as he went on his journey is impressive.  Just about every step of the way he details his tasks from choosing the program and why, to loading it on the server and configuring the system.  Within the guide are a number of resources for even more great information ….

Fantastic!  Michael, thank you so much for taking on loading and configuring the system and not giving the task to your tech team.  The information you gleaned from doing this yourself only enhances the content of the guide. 

I would highly recommend reading the guide if you are planning on installing and using WordPress for your blog.  Even if you decide to go with something more out of the box such as Blogger or Typepad the information in the guide on blogging and organizing your thoughts to create a worthy site are just as invaluable.

Thanks as well, among many others, to the Marketing Defined Blog, Marketing Slave and e-Learning Centre.  Thank you, and others not specifically acknowledged, very much!

Posted by AI3's author, Mike Bergman Posted on October 22, 2005 at 3:07 pm in Blogs and Blogging, Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/144/thanks-comprehensive-blogging-guide-receives-kind-comments/
The URI to trackback this post is: https://www.mkbergman.com/144/thanks-comprehensive-blogging-guide-receives-kind-comments/trackback/
Posted: October 21, 2005

This post introduces a new category area in my blog related to what I and BrightPlanet are terming the eXtensible Semi-structured Data Model (XSDM). Topics in this category cover all information related to extensible data models and engines applicable to documents, metadata, attributes, semi-structured data, or the processing, storing and indexing of XML, RDF, OWL, or SKOS data formats.

The importance of this category is introduced by Fengdong Du in the master’s thesis, Moving from XML Documents to XML Databases, submitted to the University of British Columbia in March 2004. As succinctly stated in the introduction to that thesis:

Depending on the characteristics of XML applications, the current XML storage techniques can be classified into two major categories. Most text-centric applications (e.g., newspapers) choose an existing file system for data storage. Data is usually divided into logical units, and each logical unit is physically stored as a separate file. As an example, a newspaper application may divide the entire year's newspapers into 12 collections by months, and store each collection as a document file. This type of application usually provides a keyword-based search tool and manipulates the data in application-specific processes. While this approach simplifies the storage problem, it has some major drawbacks. First, storing XML data as plain text makes it difficult to develop a generic data manipulation interface.

Second, mapping logical units of data to individual files makes it difficult to view the data from a different perspective. For this reason, this type of application only provides services with limited functionalities and therefore restricts the usage of data.

On the other hand, in data-centric applications such as e-commerce applications, data is typically highly-structured, e.g., extracted from a relational database management system (RDBMS). XML is primarily used as a tool to publish data to the Web or deliver information in a self-descriptive way in place of the conventional relative files. This type of application relies on the RDBMS for data storage. Data received in XML format is eventually put into an RDBMS when persistence is desired. Over the years, an RDBMS has been well developed to efficiently store and retrieve well-structured data. Structured Query Language (SQL) and many useful extended RDBMS utilities (e.g., Programming Language SQL, stored procedures) act as an application-independent data manipulation interface. Applications can communicate with databases through this generic interface and, on top of it, provide services with very rich functionalities.

While storing XML data into an RDBMS can take advantage of the well-developed relational database techniques and open interfaces, this approach requires an extra schema-mapping process applied to XML data, which involves schema transformation and usually decomposition. The schemas of XML data have to be mapped to strictly-defined relational schemas before data is actually stored. This process is strongly application-dependent or domain-dependent because there must be enough information available to determine many relational database design issues such as which table in the target RDBMS is a good place to store the information delivered, what new tables need to be created, which elements/attributes should be indexed, etc. No matter how this kind of information is obtained, whether delivered with XML data as schemas and processing instructions, or the application context makes it obvious, it is hard to develop an automatic and generic schema-mapping mechanism. Instead, application-specific work needs to take care of the schema-mapping problem. This involves non-trivial work of database server-side programming and database administration.

Another drawback of storing XML data in an RDBMS is that it is hard to efficiently support many types of queries that people want to ask on XML data. In RDBMS, each table has a pre-defined primary key field, and possibly a few other indexed fields. Queries not on the key field and not on the indexed fields will result in table scans (i.e., possibly a very large number of I/O's, which can be very time consuming) such as for the following path and predicate expression:

//department[@street="main mall"]/student[@nationality="Chinese"]

It is very likely that "department" is not indexed on "street" and that "student" is not indexed on "nationality". Therefore, resolving this path expression will cause table scans. Moreover, storing XML data in an RDBMS often results in schema decomposition and produces many small tables. Hence, evaluating a query often needs many expensive join operations.

For unstructured or semi-structured data, an RDBMS has greater difficulty, and query performance is usually unacceptable for relatively large amount of data. For these reasons, a native database management system is expected in the XML world. Like a traditional RDBMS, native XML databases would provide a comprehensive and generic data management interface, and therefore isolate lower level details from the database applications. Unlike an RDBMS, an ideal native XML database would make no distinction between unstructured data and strictly structured data. It treats all valid XML data in the same way and manages them equally efficiently. Its performance is only affected by the type of data manipulation. In other words, an ideal XML native database is not only access transparent but also performance transparent upon the structural difference of data.
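As an aside, the table-scan problem the thesis describes can be made concrete. The path expression quoted above can be evaluated directly against an XML document tree using Python's standard-library ElementTree, which supports a limited XPath subset; the document structure and names below are hypothetical, invented only to match the shape implied by the expression:

```python
# Minimal sketch: evaluating the thesis's path expression over a toy XML
# document with Python's xml.etree.ElementTree (limited XPath support).
# The element and attribute values here are illustrative assumptions.
import xml.etree.ElementTree as ET

xml_doc = """
<university>
  <department street="main mall" name="CS">
    <student nationality="Chinese" name="Li"/>
    <student nationality="Canadian" name="Ann"/>
  </department>
  <department street="agronomy road" name="EE">
    <student nationality="Chinese" name="Wei"/>
  </department>
</university>
"""

root = ET.fromstring(xml_doc)

# Equivalent of //department[@street="main mall"]/student[@nationality="Chinese"]
matches = root.findall(
    ".//department[@street='main mall']/student[@nationality='Chinese']"
)
print([s.get("name") for s in matches])  # -> ['Li']
```

Note that an in-memory tree walk like this is exactly the kind of full traversal the thesis warns about: without indexes on "street" or "nationality", a relational shredding of the same data would likewise fall back to table scans and joins to answer this query.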

Future topics in this XSDM area will expand on these challenges and describe new standards-based solutions being developed by BrightPlanet that specifically address them.

Posted by AI3's author, Mike Bergman Posted on October 21, 2005 at 3:06 pm in Adaptive Information, Information Automation, Semantic Web | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/146/the-semantic-web-demands-different-database-models/
The URI to trackback this post is: https://www.mkbergman.com/146/the-semantic-web-demands-different-database-models/trackback/