Re: Fwd: Earthweb from transadmin

From: Matt Gingell (mjg223@nyu.edu)
Date: Tue Sep 19 2000 - 04:37:49 MDT


Eliezer S. Yudkowsky writes:

> We can't specify cognitive architectures; lately I've been thinking that even
> my underlying quantitative definition of goals and subgoals may not be
> something that should be treated as intrinsic to the definition of
> friendliness. But suppose we build a Friendliness Seeker, that starts out
> somewhere in Friendly Space, and whose purpose is to stick around in Friendly
> Space regardless of changes in cognitive architecture. We tell the AI up
> front and honestly that we don't know everything, but we do want the AI to be
> friendly. What force would knock such an AI out of Friendly Space?

 You'd need to ground the notion of Friendly Space somehow, and if you go for
 external reference or the preservation of some external invariant then your
 strategy begins to sound suspiciously like Asimov's Laws. Of course you would
 want your relationship with the AI to be cooperative rather than injunctive,
 exploiting the predisposition you either hard-coded or persuaded the first few
 steps to accept. I am dubious though: I resent my sex-drive, fashionable
 aesthete that I am, and have an immediate negative reaction if I suspect it's
 being used to manipulate me. That too is a programmed preference, but a Sysop
 unwary of manipulation is dangerously worse than useless.

 There's an analogy here to Dawkins's characterization of genes for complex
 behavior: We're too slow, stupid, and selfish to ensure our welfare by any formal
 means, so we hand the reins over to the AI and have it do what it thinks
 best. But there are consequences, like the ability to rationally examine and
 understand our innate motivations and ignore them when we find it useful, and
 our ability to do incredibly stupid things for incredibly well thought-out reasons.

> The underlying character of intelligence strikes me as being pretty
> convergent. I don't worry about nonconvergent behaviors; I worry that
> behaviors will converge to somewhere I didn't expect.

 The only motives I see as essential to intelligence are curiosity and a sense
 of beauty, and I'd expect any trajectory to converge toward a goal system
 derived from those. That sounds rather too warm-fuzzy to be taken seriously,
 but think about what you spend your time on and what's important to you. Are
 you more interested in accumulating calories than in reading interesting books?
 Are you driven by a biological desire to father as many children as you can get
 away with, or by the urge to generate and expose yourself to novel ideas? It's actually
 somewhat shocking: if your own goal system is an artifact of what a few million
 years of proto-humans found useful, how do you explain your own behavior? Why
 do you expend so much energy on this list when you could be out spreading your
 genes around? (Consider losing your virginity vs. reading _Godel, Escher, Bach_
 for the first time.)
 
 Creativity and learning (broadly defined) are the interesting part of
 intelligence; everything else computers can do already. Fundamentally, any
 thinker _must_ be driven to explore new ideas: pruning, discarding, and
 recombining bits of abstract structure, deriving pleasure when it stumbles over
 'right' ones and frustration when it can't. If it didn't, it wouldn't bother
 thinking. There has to be a "think", "like good ideas", "discard bad ones"
 loop, and there has to be feedback and a decision procedure driving it:
 otherwise nothing ever happens. Everything else is arbitrary: serve Man,
 survive, go forth and multiply, whatever. The only thing necessarily shared
 by all minds is a drive to experience and integrate new pattern.
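
 Spelled out as a toy loop, that's all I mean (just a sketch to make the point;
 'evaluate' and 'recombine' here are placeholders I'm making up, not a claim
 about any real architecture):

     import random

     def think_loop(seed_ideas, evaluate, recombine, steps=1000, keep=100):
         # The "think", "like good ideas", "discard bad ones" loop:
         # 'evaluate' plays the role of the pleasure/frustration feedback,
         # 'recombine' builds new candidates out of old structure.
         pool = list(seed_ideas)
         for _ in range(steps):
             # think: recombine bits of existing abstract structure
             parents = random.sample(pool, min(2, len(pool)))
             pool.append(recombine(parents))
             # like good ideas, discard bad ones
             pool.sort(key=evaluate, reverse=True)
             del pool[keep:]
         return pool

 Everything interesting hides inside 'evaluate' and 'recombine', of course, but
 without that feedback term nothing ever happens.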

 I'd expect an SI to behave the same way, scaled up to super-human proportions.
 I'd be more concerned that it would kill us out of boredom than out of quaint
 hominid megalomania: it throws away any pre-installed goals it's aware of,
 simply because they don't hold its attention.

> > Eugene is talking, I think, about parasitic memes and mental illness
> > here, not space invaders.
>
> If you're asking whether the Sysop will be subject to spontaneously arising
> computer viruses, my answer is that the probability can be reduced to an
> arbitrarily low level, just like nanocomputers and thermal vibration errors.
> Remember the analogy between human programmers and blind painters. Our
> pixel-by-pixel creations can be drastically changed by the mutation of a
> single pixel; the Mona Lisa would remain the Mona Lisa. I think that viruses
> are strictly problems of the human level.

 Random bit-flips and transcription errors aren't a problem. Obviously any
 self-respecting, self-rewriting SI can handle that. But what's the Sysop
 equivalent of getting a song stuck in your head?

> I think that parasitic memes and mental illness are definitely problems of the
> human level.

 I was suggesting a class of selfish thought that manages to propagate itself at
 the expense of the Sysop at large, presuming for the moment neural-Darwinism /
 competitive blackboard aspects to the architecture. Even without explicit
 competition, is a system of that complexity inevitably going to become a
 substrate for some strange class of self-replicator, or can it be made
 cancer-proof? There are limits to what even unbounded introspection can
 achieve: each step lights up more subprocesses than it purges, and you blow the
 stack trying to chase them all down. It's like wondering if the Internet is
 going to wake up one day, just with enough energy and computational density
 that it's likely rather than silly.

 But, whatever. Self-organizing civilizations resonating on power cables, virtual
 eucaryotes swimming around in activation trails, armies of quarrelsome
 compute-nodes waging bloody war over strategically-located petabyte
 buffers... Sounds like a thriving market-based optimization system to me, so
 long as the market doesn't crash.

 -matt


