From: Spudboy100@aol.com
Date: Wed Jun 18 2003 - 23:51:00 MDT
Have we been limited by Google?
http://www.microdoc-news.info/newsGoogle/2003/06/19.html#a719
<<
Thursday, 19 June 2003
Google Powered Search: How Google Edits the Web
http://www.microdocs-news.info/newsGoogle/2003/06/19.html#a719
Google may be an amazing database of webpages; however, it is certainly not a
comprehensive one. I wanted to find out how many webpages carry Google WebSearch
(http://www.google.com.au/services/index.html). Such pages typically display text
similar to "Powered by Google", although not always. Trying to build a
comprehensive database of sites carrying the "Powered by Google" label soon taught
me how much of the web Google edits out.
Narrowing the Web World
First of all, we need to understand what Google Inc. is in fact building. In an
article about the Grub Project (which seeks to index the whole of the web), Peter
Norvig, Director of Search Quality at Google Inc., describes the Google strategy.
Norvig does not consider the problem to be crawling more of the web and getting
more pages into the Google database. Rather, he considers the fundamental problem
to be how to narrow the index:
"Going from tens of thousands of machines to hundreds of thousands of
machines is fundamentally going to change the nature of search," Stechert said.
"Going to millions of machines allows us to ask, ‘What can we do with all that
computing power?’"
But Peter Norvig, director of search quality at Google, said while the Grub
project is topical and interesting; improving Web searches isn't a problem of
widening an index, but narrowing it.
[Wired]
And how Google has narrowed it is certainly interesting. Look at what we found
when we tried to build a comprehensive list of sites bearing the words
"Powered by Google".
Our Search for "Powered by Google"
First we searched in Google using the following search query:
search "powered by google" -translate -google-com
http://www.google.com/search?q=search+%22powered+by+google%22+-translate+-google-com&num=100&hl=en&lr=lang_en&ie=UTF-8&oe=utf-8&safe=off&as_qdr=y&start=900&sa=N&filter=0
The search was restricted to pages from the past year and to English-language pages.
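To make the query URL above easier to read, here is a small sketch that decodes it with Python's standard urllib.parse. The meanings given for the parameters are assumptions based on how Google's 2003-era URLs are commonly read (start as a 0-based offset, num as results per page, lr=lang_en for English, as_qdr=y for the past year); the function name describe_google_query is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Google query URL from the article (2003-era parameters; meanings are assumptions).
GOOGLE_URL = (
    "http://www.google.com/search?q=search+%22powered+by+google%22+-translate+-google-com"
    "&num=100&hl=en&lr=lang_en&ie=UTF-8&oe=utf-8&safe=off&as_qdr=y&start=900&sa=N&filter=0"
)

def describe_google_query(url: str) -> dict:
    """Hypothetical helper: decode the query string and report the result window."""
    # parse_qs turns '+' into spaces and '%22' into '"'.
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    start = int(params["start"])  # assumed: 0-based offset of the first result on the page
    num = int(params["num"])      # assumed: number of results per page
    return {
        "query": params["q"],
        "language": params["lr"],        # assumed: lang_en = English-language pages only
        "date_range": params["as_qdr"],  # assumed: y = pages from the past year
        "first_result": start + 1,
        "last_result": start + num,
    }

info = describe_google_query(GOOGLE_URL)
print(info["query"], info["first_result"], info["last_result"])
```

Under these assumptions the URL asks for results 901 to 1000, i.e. the last page before the cut-off the article describes next.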
At the time of writing, the Google information bar reported 713,000 results.
Next we worked our way down the list, checking each result to make sure that a
Google WebSearch box actually appeared on the site. We excluded any pages in the
Google domain and any sites that displayed "Translation Powered by Google".
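The screening rules above were applied by hand. As a rough illustration only, they can be written as a predicate. Everything here is hypothetical: the function name, the has_search_box flag standing in for the human check, and the assumption that "the Google domain" means google.com and its subdomains.

```python
from urllib.parse import urlsplit

def counts_as_powered_by_google(url: str, page_text: str, has_search_box: bool) -> bool:
    """Hypothetical re-creation of the manual screening rules.

    has_search_box stands in for the human check that a Google WebSearch
    box actually appears on the page.
    """
    host = urlsplit(url).hostname or ""
    # Rule 1: skip pages in Google's own domain (assumed: google.com and subdomains).
    if host == "google.com" or host.endswith(".google.com"):
        return False
    # Rule 2: skip pages whose "Powered by Google" label refers to translation.
    if "translation powered by google" in page_text.lower():
        return False
    # Rule 3: keep only pages that really carry a Google search box.
    return has_search_box
```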
NOTE THIS: even though Google reports 713,000 results for this search, you and I
can see only 998 of them. Google will not serve any more results than that for
this particular search. The other 712,002 results are counted, but we cannot see
them; Google simply does not serve any results past that number.
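The cut-off can be checked with simple arithmetic. This sketch assumes pages advance by the num=100 page size from the query URL, so the reachable start= offsets run from 0 to 900; the constant names are illustrative.

```python
PAGE_SIZE = 100     # num=100 in the query URL
REPORTED = 713_000  # result count shown in Google's information bar
VIEWABLE = 998      # results Google actually served

# Offsets (the start= parameter) of the pages a user can step through,
# assuming each page starts PAGE_SIZE results after the previous one.
reachable_offsets = list(range(0, VIEWABLE, PAGE_SIZE))
hidden = REPORTED - VIEWABLE

print(len(reachable_offsets), reachable_offsets[-1], hidden)
```

On these assumptions only ten pages are reachable, the last one at start=900 (the page requested in the query URL), and 713,000 minus 998 leaves 712,002 results out of reach.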
AllTheWeb Behavior
We then checked whether AllTheWeb, a search engine competing with Google, would
reveal anything more. We ran a similar search in AllTheWeb:
search AND "powered by google" ANDNOT "google.com"
http://www.alltheweb.com/search?t=adv&q=search+AND+%22powered+by+google%22+ANDNOT+%22google.com%22&c=web&cs=iso-8859-1&ics=iso-8859-1&o=3900&h=100&l=en&no=on&qtf=n&av=1&wf%5Bn%5D=4&wf%5B0%5D%5Br%5D=%2B&wf%5B1%5D%5Br%5D=-&wf%5B1%5D%5Bq%5D=google.com&wf%5B1%5D%5Bwr%5Bm%5D=1&dfr%5By%5D=1980&dto%5Bd%5D=18&dto%5Bm%5D=6&dto%5By%5D=2003
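The AllTheWeb URL can be decoded the same way. This sketch uses a trimmed copy of the URL, keeping only the query and paging parameters, and assumes o is a 0-based offset and h is the number of hits per page; result_window is a hypothetical helper.

```python
from urllib.parse import urlsplit, parse_qs

# Trimmed copy of the AllTheWeb URL, keeping only the parameters used here.
ALLTHEWEB_URL = (
    "http://www.alltheweb.com/search?t=adv"
    "&q=search+AND+%22powered+by+google%22+ANDNOT+%22google.com%22"
    "&c=web&o=3900&h=100&l=en"
)

def result_window(url: str, offset_key: str, size_key: str) -> tuple[str, int, int]:
    """Hypothetical helper: return (decoded query, first hit, last hit) for a page URL."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    offset = int(params[offset_key])  # assumed: 0-based offset of the first hit
    size = int(params[size_key])      # assumed: hits per page
    return params["q"], offset + 1, offset + size

query, first, last = result_window(ALLTHEWEB_URL, "o", "h")
print(query, first, last)
```

Read this way, the page requested covers hits 3,901 to 4,000, matching the 4,000-result limit reported below.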
Even though AllTheWeb reports only 477,231 results in its database, it actually
displays more results to us than Google does: we could see 4,000 of the 477,231
possible results. We actually learnt more about the "Powered by Google" phrase by
researching the AllTheWeb results than we did from the Google results pages.
There was more to research...>>
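The comparison above can be put as percentages, using only the counts quoted in the article; the dictionary and function names are illustrative.

```python
# (reported results, results actually viewable), as quoted in the article, June 2003.
ENGINES = {
    "Google": (713_000, 998),
    "AllTheWeb": (477_231, 4_000),
}

def percent_reachable(reported: int, viewable: int) -> float:
    """Share of the reported results that a user can actually page through."""
    return 100.0 * viewable / reported

for name, (reported, viewable) in ENGINES.items():
    print(f"{name}: {viewable:,} of {reported:,} ({percent_reachable(reported, viewable):.2f}%)")
```

On these figures roughly 0.14% of Google's reported results were reachable, against roughly 0.84% for AllTheWeb, despite AllTheWeb's smaller count.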
This archive was generated by hypermail 2.1.5 : Thu Jun 19 2003 - 00:02:32 MDT