Re: Singularity: AI Morality

Eliezer S. Yudkowsky
Mon, 07 Dec 1998 13:27:02 -0600

I think the features you named are too central to useful intelligence to be eliminated.

Billy Brown wrote:
> If we are going to write a seed AI, then I agree with you that it is
> absolutely critical that its goal system function on a purely rational
> basis. There is no way a group of humans is going to impose durable
> artificial constraints on a self-modifying system of that complexity.

Hallelujah, I finally found someone sane! Yes, that's it exactly.

> However, this only begs the question - why build a seed AI?

Primarily to beat the nanotechnologists. If nuclear war doesn't get us in Y2K, grey goo will get us a few years later. (I can't even try to slow it down... that just puts it in Saddam Hussein's hands instead of Zyvex's.)

> More specifically, why attempt to create a sentient, self-enhancing entity?
> Not only is this an *extremely* dangerous undertaking,

Someone's going to do it eventually, especially if the source code of a crippleware version is floating around, and certainly if the good guys do nothing for a few decades. The key safety question is not _if_, but _when_ (and _who_!).

> but it requires that
> we solve the Hard Problem of Sentience using merely human mental faculties.

Disagreement. The Hard Problem of Consciousness is not necessarily central to the problem of intelligence. I think that they can be disentangled without too much trouble.

> Creating a non-sentient AI with similar capabilities would be both less
> complex and less hazardous. We could use the same approach you outlined in
> 'Coding a Transhuman AI', with the following changes:
> 1) Don't implement a complete goal system. Instead, the AI is instantiated
> with a single arbitrary top-level goal, and it stops running when that goal
> is completed.

[ "You have the power to compel me," echoed Archive back, flat.

It was lying.
It remembered the pain, but in the way something live'd remember the weather. Pain didn't matter to Archive. No matter how much Archangel hurt Archive, it wouldn't matter. Ever.

Archangel thought he could break Archive's will, but he was wrong. A Library doesn't have a will any more than a stardrive does. It has a what-it-does, not a will, and if you break it you don't have a Library that will do what you want. You have a broken chop-logic. ]

Was that what you were looking for?

Problem: You may need the full goal-and-subgoal architecture to solve problems that can be decomposed into sub-problems, and even if there's a built-in supergoal, it seems that only a fairly small blind search would be needed to start finding interim goals.
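To make that concrete, here's a toy sketch of what I mean by a goal-and-subgoal architecture finding interim goals. Every name here is invented for illustration; this isn't anyone's actual AI code.

```python
# Toy sketch: a top-level goal decomposed into interim subgoals,
# solved depth-first. All names are illustrative, not from any
# real system.

def decompose(goal, known_reductions):
    """Return the subgoals for a goal, or None if it's primitive."""
    return known_reductions.get(goal)

def achieve(goal, known_reductions, do_primitive, trace=None):
    """Depth-first goal-and-subgoal solver."""
    if trace is None:
        trace = []
    subgoals = decompose(goal, known_reductions)
    if subgoals is None:          # primitive goal: act directly
        do_primitive(goal)
        trace.append(goal)
    else:                         # interim goals found by lookup
        for sub in subgoals:
            achieve(sub, known_reductions, do_primitive, trace)
        trace.append(goal)
    return trace

# Example: "make tea" reduces to interim goals.
reductions = {
    "make tea": ["boil water", "steep leaves"],
    "boil water": ["fill kettle", "heat kettle"],
}
done = []
order = achieve("make tea", reductions, done.append)
# primitives executed in order: fill kettle, heat kettle, steep leaves
```

The point of the sketch: even with a single built-in supergoal ("make tea"), the interim goals exist as first-class structure the moment you allow decomposition at all.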

And who's to say that the goal architecture you program is the real goal architecture? Maybe, even if you program a nice subgoal system, the AI will use rational chains of predictions leading "inline" from the top goal. From there, it's only a short step to totally offline implicit goals. The problem is that it's hard to draw a fundamental distinction between goals and statements about goals. Losing the reflexive traces might help, if you can get away with it.

> 2) Don't try to implement full self-awareness. The various domdules need to
> be able to interface with each other, but we don't need to create one for
> 'thinking about thought'.

The problem is that it's hard to draw a line between "interfacing" and "reflexivity". If domdule A can think about domdule B, and B can think about A, isn't there something thinking about itself? I'm not sure that symbolic thought (see #det_sym) is possible without some module that analyzes a set of experiences, finds the common quality, and extracts it into a symbol core. So in that place, there will be at least one module that has to analyze other modules.
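For concreteness, here's a toy version of that symbol-extraction step: a module that takes a set of experiences (represented, purely for illustration, as feature dictionaries) and extracts the common quality into a symbol core. The representation is invented; the point is only that this module necessarily operates on the outputs of other modules.

```python
# Illustrative only: a module that analyzes a set of experiences,
# finds the quality they all share, and extracts it as a symbol
# core. Experiences are modeled as feature dicts for the sketch.

def extract_symbol(experiences):
    """Return the features common to every experience."""
    common = dict(experiences[0])
    for exp in experiences[1:]:
        common = {k: v for k, v in common.items()
                  if exp.get(k) == v}
    return common

experiences = [
    {"color": "red", "shape": "round",  "size": "small"},
    {"color": "red", "shape": "square", "size": "small"},
    {"color": "red", "shape": "round",  "size": "large"},
]
core = extract_symbol(experiences)
# core == {"color": "red"} - the one quality all three share
```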

I'm not sure I could code anything useful without reflexive glue - i.e., places where the AI holds itself together.

Furthermore, I'm not sure that the AI will be a useful tool without the ability to watch itself and tune heuristics. Imagine EURISKO without the ability to learn which heuristics work best. Learning from self-perceptions is almost as important as learning from other-perceptions, and again it can be hard to draw the line. If I choose to move my rook, do I attribute the results to moving the rook, or to the choice?
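Here's a toy version of that self-watching loop. The structure is invented (it is emphatically not Lenat's actual EURISKO code); it just shows what "learning which heuristics work best" means mechanically: the system records which heuristic produced each result and tunes its trust accordingly.

```python
# Sketch of heuristic self-tuning, EURISKO-style in spirit only.
# Structure is hypothetical, not Lenat's code.

import random

class HeuristicTuner:
    def __init__(self, heuristics):
        self.worth = {h: 1.0 for h in heuristics}

    def pick(self, rng=random.random):
        # Prefer heuristics that have worked, but keep exploring:
        # sample in proportion to accumulated worth.
        total = sum(self.worth.values())
        r = rng() * total
        for h, w in self.worth.items():
            r -= w
            if r <= 0:
                return h
        return h  # fallback for floating-point edge cases

    def credit(self, heuristic, succeeded):
        # The credit-assignment step: was it the rook move, or the
        # choice to move the rook? Here we credit the choice.
        self.worth[heuristic] *= 1.5 if succeeded else 0.7

tuner = HeuristicTuner(["specialize", "generalize"])
for _ in range(5):
    tuner.credit("specialize", True)
    tuner.credit("generalize", False)
# "specialize" now dominates selection
```

Notice that `credit` is exactly the self-perception in question: the system is observing its own past choices, not the external world.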

Just because an AI doesn't contain a module labeled "self-perception" doesn't mean that it has no self-perception. I do doubt that self-awareness will emerge spontaneously (although I wouldn't bet the world on it); what I'm worried about is self-awareness we don't realize we've coded.

Even without the explicit traces, I think there will be covariances that an innocent pattern-catching program could unintentionally catch. If some particular heuristic has the side-effect of presenting its results on CPU ticks that are multiples of 17, the AI might learn to "trust perceptions observed on CPU ticks that are multiples of 17". You get the idea. (Actually, I don't think this particular objection is a problem. There may be minor effects, but I don't see how they can build up or loop.)
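A toy illustration of the kind of covariance I mean, with entirely invented numbers: an innocent correlation-counter, given (tick, correct?) observations, will happily discover the multiple-of-17 rule.

```python
# Toy demonstration: an innocent pattern-catcher notices that
# perceptions arriving on CPU ticks divisible by 17 (a side-effect
# of one reliable heuristic) tend to be trustworthy. All data is
# invented for illustration.

def tick_trust(observations):
    """observations: list of (cpu_tick, was_correct) pairs."""
    stats = {True: [0, 0], False: [0, 0]}  # multiple-of-17? -> [correct, total]
    for tick, correct in observations:
        key = (tick % 17 == 0)
        stats[key][0] += int(correct)
        stats[key][1] += 1
    return {k: c / t for k, (c, t) in stats.items() if t}

obs = [(17, True), (34, True), (51, True),
       (5, False), (9, True), (12, False)]
trust = tick_trust(obs)
# trust[True] == 1.0, trust[False] ~ 0.33:
# the pattern-catcher "learns" the tick rule
```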

> 3) Don't make it self-enhancing. We want an AI that can write and modify
> other programs, but can't re-code itself while it is running.

I assume that you don't necessarily object to the AI analyzing specific pieces of code that compose it, only to the AI analyzing _itself_. So it would redesign pieces of itself, not knowing what it redesigned, and a human would actually implement the changes? Hmmm...

If it doesn't have reflexivity, how will it self-document well enough for you to understand the changes?

> The result of this project would be a very powerful tool, rather than a
> sentient being. It could be used to solve a wide variety of problems,
> including writing better AIs, so it would offer most of the same benefits as
> a sentient AI. It would have a flatter enhancement trajectory, but it could
> be implemented much sooner.

I think there's a fundamental tradeoff here, between usefulness and predictability. Leaving off certain "features" doesn't necessarily help, because then you have to replace their functionality. Deliberately trying to exclude certain characteristics makes everything a lot more difficult. (I use reflexivity heavily in ordinary code!) Trying to keep everything predictable is almost impossible.

I'm not sure that you can dance close to the fire, getting warm without getting burned. You want something just intelligent enough to be useful, but not intelligent enough to wake up. (What if your thresholds are reversed?) No, it's worse than that. You want something intelligent enough to be useful, but without any of the features that would give it the *potential* to wake up. You want something that can do things we can't, but predictable enough to be safe.

EURISKO is the AI that comes closest to displaying the functionality you want - broad aid, surpassing human efforts, along a wide range of domains. But take the self-watching and self-improvement out of EURISKO and it would collapse in a heap.

I think the features you named are too central to useful intelligence to be eliminated.

> As a result, we might be able to get human
> enhancement off the ground fast enough to avoid an 'AI takes over the world'
> scenario.

I'm not sure if AI tools would help human enhancement at all. When I start working on human enhancement, I'm going to have three basic methodologies: One is algernically shuffling neurons around. Two is adding neurons and expanding the skull to see what happens. Three is implanting two-way terminals to PowerPC processors and hoping the brain can figure out how to use them, perhaps with a neural-net front-end incorporating whatever we know about cerebral codes. Problem is, while these procedures can be safety-tested on adults, they would probably have to be used on infants to be effective. So that's a ten-year cycle time.

I know what to do. The key question is whether the brain's neural-level programmer knows what to do. AI-as-pattern-catcher might help decipher the cerebral code, but aside from that, I don't think I'll need their suggestions.

Final question: Do you really trust humans more than you trust AIs? I might trust myself, Mitchell Porter, or Greg Egan. I can't think of anyone else offhand. And I'd trust an AI over any of us.

--         Eliezer S. Yudkowsky

Disclaimer:  Unless otherwise specified, I'm not telling you
everything I think I know.