s t o r e ALL OF (unrestricted-access) WWW pages FOREVER ;
the second a commercial outfit developing tools to browse and
reuse such cumulative/multi-generation archive contents.
According to their owner Brewster Kahle --formerly of the Thinking
Machines Corp., and a father of WAIS-- one of the target functions
of Alexa-derived software is to be a `"reliability service" that
will resurrect dead links. Give the URL and an approximate date
to the Archive, and it will dig up the document.'..... rings a
bell, doesn't it?
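For the curious, here is a rough sketch of what "give the URL and an
approximate date" could look like as a lookup. It is only an illustration,
not how Alexa or the Archive actually does it: the ARCHIVE table, the
resurrect() name, the example.com URL, and the capture dates are all made up,
and the only assumption is that each URL maps to a date-sorted list of copies.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical in-memory archive: URL -> list of (capture_date, content),
# kept sorted by capture date. The real storage layout is not described
# in the article; this table exists only for illustration.
ARCHIVE = {
    "http://example.com/page.html": [
        (date(1996, 11, 1), "first sweep copy"),
        (date(1997, 3, 15), "second sweep copy"),
    ],
}


def resurrect(url, approx_date, archive=ARCHIVE):
    """Return the (capture_date, content) pair closest to approx_date,
    or None if the URL was never captured."""
    snapshots = archive.get(url)
    if not snapshots:
        return None
    dates = [captured for captured, _ in snapshots]
    # Position of the first capture on or after the requested date.
    i = bisect_left(dates, approx_date)
    # Only the neighbours on either side of that position can be closest.
    candidates = snapshots[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda snap: abs(snap[0] - approx_date))


hit = resurrect("http://example.com/page.html", date(1997, 1, 1))
print(hit)
```

Binary search keeps the lookup cheap even when a page has been captured in
many sweeps; a dead link simply becomes a question of which frozen copy
sits nearest the date you remember.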
The Alexa archives are made of successive sweep-n-suck (BIIIG
sucks, too) sessions of the entire WWW dataspace resulting in
consecutive "frozen Webs" stored at one location -- currently
a warehouse in SF; ultimately in the digital storage facility of
the US National Archives in Washington, D.C. Treating an entire
docuverse as a collection of "barts" (or "stamps", I keep mixing
them up) may sound like a bit of overkill, but whoever said that
the (yellow brick) road to Xanadu must be straight and narrow?
__Ian
______________________________________________
Based on Paul Bissex' article at:
http://webreview.com/97/05/09/edge/index2.html
> [...] whereas keyword search engines [AltaVista etc]
> store an index to the Web, the Archive consists of a
> copy of the Web itself. Kahle estimates the current
> size of the Web at about two terabytes (that's two
> million megabytes). Having completed two full sweeps
> of the Web, the Archive now contains about four
> terabytes of data. A recent upgrade of the Archive's
> connection from two T1 lines to a full T3 brings
> a welcome 15-fold increase in bandwidth, meaning
> that future Web "snapshots" will be conducted much
> faster than the first two. With some researchers
> estimating the average life of a Web page at 75 days,
> speed matters.
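The bandwidth figures in the quote are easy to sanity-check. The sketch below
assumes the nominal line rates (1.544 Mbit/s for a T1, 44.736 Mbit/s for a
T3), decimal megabytes, and a link kept fully saturated with no protocol
overhead, so treat the results as a best case. The sweep_days() helper and
the constant names are mine, not the article's.

```python
# Nominal line rates in megabits per second.
T1_MBPS = 1.544
T3_MBPS = 44.736

# Kahle's estimate of the Web's size: two terabytes = two million megabytes.
WEB_SIZE_MB = 2_000_000


def sweep_days(size_mb, link_mbps):
    """Days needed to move size_mb megabytes over a link of link_mbps,
    assuming the link is saturated and ignoring all overhead."""
    seconds = size_mb * 8 / link_mbps  # 8 bits per byte
    return seconds / 86_400            # seconds per day


speedup = T3_MBPS / (2 * T1_MBPS)
old_days = sweep_days(WEB_SIZE_MB, 2 * T1_MBPS)
new_days = sweep_days(WEB_SIZE_MB, T3_MBPS)

print(f"speedup: {speedup:.1f}x")
print(f"two T1s: {old_days:.0f} days, T3: {new_days:.1f} days")
```

Under those assumptions the speedup comes out a little under the quoted
15-fold, and a full two-terabyte sweep drops from roughly two months to a
handful of days. Set against an average page life of 75 days, that is the
difference between a snapshot that is stale on arrival and one that is not.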