Re: web snapshot

From: Eugene Leitl (
Date: Thu Jan 04 2001 - 06:18:35 MST

On Wed, 3 Jan 2001, Spike Jones wrote:

> You know, a technique for archiving the entire content of
> the web on any given day in history would be valuable as all
> hell. Have we not all read something somewhere on the web

Content tends to fill all available storage space.
I'm certain people will find ways of filling cubic feet of
molecular tape with something -- especially when machines
become people, and start producing content of their own.

Even with date stamps and version control you can't expect other people to
provide reliable storage (obsolete content is usually pulled down
regularly), and you certainly can't afford the storage space yourself.
Google runs a Linux cluster of about 6 k machines, and hence can afford
to store not only the full-text index but also a cache of part of the
content.
Even if there were a way of storing a content snapshot in a central
database (you won't be able to do so reliably in one maintained by end
customers, since they'd object to wasting the storage they paid for with
their money), there's no way you'd scale this up to include multiple
versions which reach all the way back to the beginning of the web.

> some time in the past and now we cannot find it? Think of
> the post singularity historians, trying to dig thru all the web
> content trying to reconstruct it all. Looks like an opportunity
> begging for some sharp computer guru to make a ton of
> money by inventing such a thing. spike

The business model being that someone from the Far Future will fork over
the equivalent of a major wad of future currency for your content database?
(Assuming that the future needs historians, and that these have resources.)

This archive was generated by hypermail 2b30 : Mon May 28 2001 - 09:56:16 MDT