Re: Galaxy brain problem

Dan Clemmensen (Dan@Clemmensen.ShireNet.com)
Mon, 18 Aug 1997 18:53:41 -0400


Anders Sandberg wrote:
>
> On Sat, 16 Aug 1997, Steve Witham wrote:
>
> > Remember, you don't have to duplicate all the information unless you're
> > worried about losing *all* (and only one copy of) the information. To
> > recover from one bit lost out of m, you need something like log2(m) extra
> > check bits. As the number of errors you want to be safe from approaches
> > infinity (it could be > m) the checkbits per loseable bit goes down toward
> > one. RAID experts correct me if I'm wrong.
>
> Do you have any more information or references on this? I'm just
> trying to finish my Jupiter Brain paper, and if this holds I'll have
> to adjust some of my equations. Unfortunately I slept through error
> correcting codes in combinatorics, I liked generating series too
> much.
>

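Steve's figure is essentially the Hamming bound: to correct any single
flipped bit in a block of m data bits, you need r check bits such that
2^r >= m + r + 1, so r grows like log2(m). The bound itself is standard;
the little loop below is just my own toy illustration (Python) of how
slowly the overhead grows:

    # How many check bits r does single-error correction need for a
    # block of m data bits?  Hamming bound: 2**r >= m + r + 1 (the
    # syndrome must name any of the m + r positions, or "no error").
    for m in (4, 64, 1024, 2**20):
        r = 1
        while 2**r < m + r + 1:
            r += 1
        print(m, r)    # prints r = 3, 7, 11, 21 -- roughly log2(m)
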
Look up "Hamming distance" in any computer hardware text. Conceptually,
it's fairly straightforward, but the details are a bit awkward. Caution:
theory and practice diverge considerably. About half of all attempts to
recover from backups fail due to procedural mishaps, because recovery is
a rare operation and the administrators are not used to it. Funny thing:
in the 1960's, Burroughs mainframes had extremely reliable recovery
mechanisms that nearly always worked. They had to, because the hardware
was unreliable enough that recovery was a relatively frequent operation.

I think you need to factor this into your calculations. For example, you
may want to run your galaxy with the ability to detect errors in pairs
instead of just single errors (in each block). Then you can continuously
induce single errors to verify that the backup system is working. This
continuous effort will occupy resources over and above the resources
needed for the redundant data encoding.
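
To make the induced-error idea concrete, here is a toy SECDED
(single-error-correct, double-error-detect) sketch in Python. The
extended Hamming(8,4) code is the textbook construction; the scrubbing
loop at the end is just my illustration of the scheme, not a real
backup system:

    import random

    def encode(data):
        # data: 4 bits -> 8-bit extended Hamming (SECDED) word.
        d1, d2, d3, d4 = data
        p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
        p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
        p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
        word = [p1, p2, d1, p4, d2, d3, d4]   # positions 1..7
        p0 = 0                     # overall parity, stored at position 0
        for bit in word:
            p0 ^= bit
        return [p0] + word

    def decode(word):
        # Returns (data, status). Corrects one flipped bit; flags two.
        syndrome = 0
        for pos in range(1, 8):
            if word[pos]:
                syndrome ^= pos
        overall = 0
        for bit in word:
            overall ^= bit         # even parity: 0 when word is consistent
        word = word[:]             # work on a copy
        if syndrome == 0 and overall == 0:
            status = "clean"
        elif overall == 1:
            word[syndrome] ^= 1    # single error; syndrome names the spot
            status = "corrected"   # (syndrome 0 means p0 itself was hit)
        else:
            status = "double error detected"   # detect only, no correction
        return [word[3], word[5], word[6], word[7]], status

    # Continuously induce single errors to prove the recovery path works:
    data = [1, 0, 1, 1]
    for trial in range(1000):
        word = encode(data)
        word[random.randrange(8)] ^= 1         # deliberate single-bit hit
        recovered, status = decode(word)
        assert status == "corrected" and recovered == data

    # And a double error is caught rather than silently miscorrected:
    word = encode(data)
    word[1] ^= 1
    word[5] ^= 1
    assert decode(word)[1] == "double error detected"

Scaled up, the decode path that fixes the deliberately injected errors
is exactly the path a real failure would exercise, which is the whole
point of exercising it continuously.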