<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The Murky Depths of the &#8216;Deep Web&#8217;</title>
	<atom:link href="http://www.mkbergman.com/343/the-murky-depths-of-the-deep-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mkbergman.com/343/the-murky-depths-of-the-deep-web/</link>
	<description>Mike Bergman on the semantic Web and structured Web</description>
	<lastBuildDate>Mon, 06 Feb 2012 00:15:33 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: The Invisible Web</title>
		<link>http://www.mkbergman.com/343/the-murky-depths-of-the-deep-web/comment-page-1/#comment-148381</link>
		<dc:creator>The Invisible Web</dc:creator>
		<pubDate>Sun, 22 Jan 2012 09:47:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.mkbergman.com/?p=343#comment-148381</guid>
		<description>[...] Recent estimates put the size of the open web at 167 terabytes while The Invisible Web is estimated a... [...]</description>
		<content:encoded><![CDATA[<p>[...] Recent estimates put the size of the open web at 167 terabytes while The Invisible Web is estimated a&#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Quora</title>
		<link>http://www.mkbergman.com/343/the-murky-depths-of-the-deep-web/comment-page-1/#comment-102942</link>
		<dc:creator>Quora</dc:creator>
		<pubDate>Mon, 29 Aug 2011 17:09:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.mkbergman.com/?p=343#comment-102942</guid>
		<description>&lt;strong&gt;What is the % of Internet information that is not indexed by search engines (the deep Internet)?...&lt;/strong&gt;

The &#039;400 to 500&#039; times claim has actually been retracted by Mike Bergman, the guy whose research paper first proposed it. He admits that the methods he used at the time to calculate that number were spotty, and he backed off the claim almost immediately...</description>
		<content:encoded><![CDATA[<p><strong>What is the % of Internet information that is not indexed by search engines (the deep Internet)?&#8230;</strong></p>
<p>The &#8216;400 to 500&#8217; times claim has actually been retracted by Mike Bergman, the guy whose research paper first proposed it. He admits that the methods he used at the time to calculate that number were spotty, and he backed off the claim almost immediately&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Krassimir Fotev</title>
		<link>http://www.mkbergman.com/343/the-murky-depths-of-the-deep-web/comment-page-1/#comment-84687</link>
		<dc:creator>Krassimir Fotev</dc:creator>
		<pubDate>Thu, 07 Apr 2011 04:26:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.mkbergman.com/?p=343#comment-84687</guid>
		<description>Great article. It seems Bill Breitmayer was right in his assessment that a large-scale collaborative effort is needed to process the information not visible to the crawler. Without it, it is impossible to figure out what type of information human beings are actually digging into. We can only guess, and the estimate will suffer.

When I started Peer Belt, I did not even consider the deep Web and what it really means to us all. The initial goal for Peer Belt was to help quality content publishers reach their audience despite somebody else&#039;s aggressive search engine optimization. Along the way, we discovered that Peer Belt&#039;s approach lets us organize the information that matters most to us as individuals. After reading this post, I am convinced that utilizing human-content interaction can solve yet another problem.

As intuitive as it is, users&#039; implicit actions, not artificial algorithms, indicate relevance! It is amazing no one has seen it.

Once again, great article, Mike, and comments, Bill!

-Krassi</description>
		<content:encoded><![CDATA[<p>Great article. It seems Bill Breitmayer was right in his assessment that a large-scale collaborative effort is needed to process the information not visible to the crawler. Without it, it is impossible to figure out what type of information human beings are actually digging into. We can only guess, and the estimate will suffer.</p>
<p>When I started Peer Belt, I did not even consider the deep Web and what it really means to us all. The initial goal for Peer Belt was to help quality content publishers reach their audience despite somebody else&#8217;s aggressive search engine optimization. Along the way, we discovered that Peer Belt&#8217;s approach lets us organize the information that matters most to us as individuals. After reading this post, I am convinced that utilizing human-content interaction can solve yet another problem.</p>
<p>As intuitive as it is, users&#8217; implicit actions, not artificial algorithms, indicate relevance! It is amazing no one has seen it.</p>
<p>Once again, great article, Mike, and comments, Bill!</p>
<p>-Krassi</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Breitmayer</title>
		<link>http://www.mkbergman.com/343/the-murky-depths-of-the-deep-web/comment-page-1/#comment-44983</link>
		<dc:creator>Bill Breitmayer</dc:creator>
		<pubDate>Wed, 09 May 2007 18:52:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.mkbergman.com/?p=343#comment-44983</guid>
		<description>A heroic effort to quantify a very complex phenomenon.

A separate but related issue is the rankings: how a reference to content on a particular site gets to the top of the list.  Even if something interesting appears toward the top, how likely is it that someone has the patience to get through the 900 or so references listed?

Which raises the issue of Google faking us out about what we are seeing in the results ... I&#039;ve never been able to get past reference 900; nothing seems to exist beyond the first 900 references despite the 9,373,622 hits claimed by the engine.

For example, using the very general search term &quot;financial investments&quot;, the Google engine reported 59,600,000 results.  In this case, dividing the 900 visible results by the 59,600,000 total results, what I&#039;m seeing on the Google results page is only about 0.0015% of everything purported to be in the index.

One should acknowledge the implicit assumption that each result is unique, which is rarely true of course.  If 0.15% of the unique results are displayed by Google, what one is actually seeing is still only about 0.0015% of everything out there.  By whatever method one computes it, the real ratio of references in the surface web to those in the deep web is something like one in many thousands ...  

So, search engines have become a serious &quot;knowledge bottleneck&quot;, to use an old-fashioned AI term.  In fact, Google seems to be very aware of it and is eager to be part of whatever solution emerges. One presumes this would be a large-scale collaborative effort of some sort, akin to a Wiki.

Actually, your list of &quot;500 Semantic Web Tools&quot; is probably as close as anything I could name to what the Semantic Web would do to break up the search engine logjam. In one version of a solution, many individual lists like your &quot;500 SemWeb Tools&quot; would be linkable by pre-defined semantic classifiers and tags. The function of the search engine would be to integrate across the many more or less static references compiled by many people on a given subject.  Something like FOAF sharing of resources.  Maybe ...  

Thanks for another interesting article.

- Bill</description>
		<content:encoded><![CDATA[<p>A heroic effort to quantify a very complex phenomenon.</p>
<p>A separate but related issue is the rankings: how a reference to content on a particular site gets to the top of the list.  Even if something interesting appears toward the top, how likely is it that someone has the patience to get through the 900 or so references listed?</p>
<p>Which raises the issue of Google faking us out about what we are seeing in the results &#8230; I&#8217;ve never been able to get past reference 900; nothing seems to exist beyond the first 900 references despite the 9,373,622 hits claimed by the engine.</p>
<p>For example, using the very general search term &#8220;financial investments&#8221;, the Google engine reported 59,600,000 results.  In this case, dividing the 900 visible results by the 59,600,000 total results, what I&#8217;m seeing on the Google results page is only about 0.0015% of everything purported to be in the index.</p>
<p>One should acknowledge the implicit assumption that each result is unique, which is rarely true of course.  If 0.15% of the unique results are displayed by Google, what one is actually seeing is still only about 0.0015% of everything out there.  By whatever method one computes it, the real ratio of references in the surface web to those in the deep web is something like one in many thousands &#8230;  </p>
<p>So, search engines have become a serious &#8220;knowledge bottleneck&#8221;, to use an old-fashioned AI term.  In fact, Google seems to be very aware of it and is eager to be part of whatever solution emerges. One presumes this would be a large-scale collaborative effort of some sort, akin to a Wiki.</p>
<p>Actually, your list of &#8220;500 Semantic Web Tools&#8221; is probably as close as anything I could name to what the Semantic Web would do to break up the search engine logjam. In one version of a solution, many individual lists like your &#8220;500 SemWeb Tools&#8221; would be linkable by pre-defined semantic classifiers and tags. The function of the search engine would be to integrate across the many more or less static references compiled by many people on a given subject.  Something like FOAF sharing of resources.  Maybe &#8230;  </p>
<p>Thanks for another interesting article.</p>
<p>- Bill</p>
]]></content:encoded>
	</item>
</channel>
</rss>

