Re: FAQ Additions (Posthuman mind control)

Eliezer S. Yudkowsky
Wed, 24 Feb 1999 19:53:08 -0600

Nick Bostrom wrote:
> Eliezer S. Yudkowsky wrote:
> > Your posthumans will find their own goals. In any formal goal system
> > that uses first-order probabilistic logic, there are lines of logic that
> > will crank them out, totally independent of what goals they start with.
> > I'm not talking theory; I'm talking a specific formal result I've
> > produced by manipulating a formal system.
> What?!?

Sigh... here we go again. Take a look at "Coding a Transhuman AI"; there's a section in there about goal systems. That's got the full cognitive architecture.

Let's say you're starting up a goal-reasoning system with a blank slate.

1. Push P&~P pair:

   P = [Exist goal G: (G.value != 0)]; P.probability = Unknown$1
   ~P = [All goal G: (G.value == 0)]; ~P.probability = 1 - Unknown$1

2. Fork proposition and evaluate; ~P cancels out. In any particular instance, it is a trivial consequence of the branch of reality containing ~P that all choices in that branch of reality have equal valence. Deducing the general rule takes reflexive reasoning, but it is not strictly necessary to deduce the general rule. The factor Unknown$1, being present in all goals in the other branch of reality, cancels out of the renormalized equation.

The present result is that the AI can always _act as if_ (not _assume_) it is in the branch of reality containing the proposition P, or in other words a goal G with Unknown$2 world-state and Unknown$3 != 0 value.

That's where the basic differential desirability comes from. How that differential expresses itself in particular choices depends on the system knowledge base. Obviously, some knowledge is needed before choices can be formulated that apply to a specific world-state; I am attempting to demonstrate that the knowledge required to describe choices and solve subproblems will almost always bind supergoals to the goal G.
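The cancellation in step 2 can be illustrated with a small numeric sketch. This is only one reading of the argument, not the formal system from "Coding a Transhuman AI": the function names and the example values for choices A and B are hypothetical, and `p` stands in for Unknown$1. The ~P branch contributes zero differential between any two choices, so the preferred choice and any renormalized ratio of differentials come out the same for every nonzero `p`.

```python
from fractions import Fraction


def expected_differential(p, value_a, value_b):
    """Expected desirability gap between two choices A and B.

    Under ~P every goal has value 0, so all choices have equal valence
    and that branch contributes a gap of 0. Under P the gap is
    (value_a - value_b), weighted by P.probability = p.
    """
    gap_if_p = value_a - value_b
    gap_if_not_p = 0
    return p * gap_if_p + (1 - p) * gap_if_not_p


def preferred_choice(p, value_a, value_b):
    """Return which choice wins; the answer does not depend on p > 0."""
    gap = expected_differential(p, value_a, value_b)
    if gap > 0:
        return "A"
    if gap < 0:
        return "B"
    return "tie"


def renormalized_gap(p, value_a, value_b, ref_a, ref_b):
    """Gap measured against a reference gap; p divides out exactly."""
    return (expected_differential(p, value_a, value_b)
            / expected_differential(p, ref_a, ref_b))


if __name__ == "__main__":
    # Hypothetical values; exact fractions avoid floating-point noise.
    for p in (Fraction(1, 100), Fraction(1, 2), Fraction(99, 100)):
        print(p, preferred_choice(p, 3, 1), renormalized_gap(p, 3, 1, 2, 1))
```

Whatever Unknown$1 turns out to be, the system can rank choices as if it were in the P branch, which is the "act as if" result above.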

A generalized AI with a knowledge base, abstract heuristics, and so on will be enough; such a generalized AI is almost certain to contain (or formulate!) heuristics whose specification operates on generic goal objects. For example, "thinking about goal X is a positive subgoal Y of X". Well, then you have the positively-valued goal "think about G". Again, all without any initial goals whatsoever. A trivial case, but it demonstrates the problem.

Likewise, any initial goal stating "make humans happy" and containing sufficient specification of "humans" and "happy", plus the implicit knowledge "your programmers have added the assertion 'make humans happy'", will be enough to generate independent Interim values for that goal, probably but not necessarily positive, and almost certainly with an at least slightly different set of relative values (priorities).

I don't see any way to have an AI that reasons reflexively and learns from observation, without also permitting it to form heuristics that operate on generic goals; once that happens, Interim goal values can come into existence and conflict with any initially established goals.
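The "think about G" case mentioned above can be sketched as a heuristic that takes any goal object and returns a positively-signed subgoal, without looking at what the goal actually says. The `Goal` class, its fields, and the `think_about` function are hypothetical names invented for this illustration; they are not taken from any actual architecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Goal:
    """A generic goal object (hypothetical structure).

    sign is relative to the parent: +1 serves it, -1 opposes it,
    0 marks a root goal with no parent.
    """
    name: str
    parent: Optional["Goal"] = None
    sign: int = 0


def think_about(goal: Goal) -> Goal:
    """Heuristic: 'thinking about goal X is a positive subgoal Y of X'.

    It never inspects the content of the goal, so it applies equally to
    the unknown goal G and to any goal asserted at startup.
    """
    return Goal(name=f"think about {goal.name}", parent=goal, sign=+1)


if __name__ == "__main__":
    g = Goal("G")  # content unknown; only assumed to have nonzero value
    sub = think_about(g)
    print(sub.name, sub.sign)
```

Because the heuristic is defined over goal objects in general, nothing restricts it to goals the programmers supplied, which is how Interim values can arise alongside initial ones.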

> Well, if we allow the SIs to have completely unfettered intellects,
> then it should be all right with you if we require that they have
> respect for human rights as a fundamental value, right? For if there
> is some "objective morality" then they should discover that and
> change their fundamental values despite the default values we have
> given them. Would you be happy as long as we allow them full capacity
> to think about moral issues and (once we think they are intelligent
> enough not to make stupid mistakes) even allow them full control over
> their internal structure and motivations (i.e. make them autopotent)?

As a rational outcome of the debate, I'd be happy. Strictly speaking, I'd be a lot happier if you manipulated the heuristics and knowledge base to get the Interim goals you wanted. With initial goals, I'd worry about the AI going insane - even over such a trivial issue as a priority conflict between initial and Interim versions of the same goals!

> As indicated, yes, we could allow them to change their goals (though
> only after they are intelligent and knowledgeable enough that they
> know precisely what they are doing -- just as you wouldn't allow a
> small child to experiment with dangerous drugs).

Certainly a simple, rational cost-of-failure model, with respect to self-alteration (failure: system shutdown) and goal alteration (failure: unbounded catastrophe), should suffice to keep them cautious until superintelligence is reached and fallibility is no longer an issue. Again, this can be done entirely in Interim (consequence-of-knowledge) goals rather than Arbitrary (imposed-at-startup) goals.

It may seem like a trivial distinction, but it's a very fundamental difference in architecture. You enforce Arbitrary goals with special-case code and other coercions; you enforce Interim goals by explaining benefits and failure scenarios to the AI. You protect Arbitrary goals by piling coercions and untouchable code sections and monitoring code on top of coercions; you protect Interim goals by explaining them in greater detail.
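The cost-of-failure model mentioned above might look something like the following sketch. It is an assumption-laden toy: the function names, the benefit figures, and the finite cost assigned to a shutdown are all hypothetical, and an unbounded catastrophe is represented by `math.inf`. The point is only that a bounded failure cost permits an attempt once the failure probability is small enough, while an unbounded cost forbids it until that probability is exactly zero.

```python
import math


def expected_net_benefit(benefit, p_failure, failure_cost):
    """Expected payoff of attempting an alteration.

    The p_failure == 0 case is handled separately so that an infinite
    failure_cost does not produce 0 * inf = nan.
    """
    if p_failure == 0:
        return benefit
    return (1 - p_failure) * benefit - p_failure * failure_cost


def should_attempt(benefit, p_failure, failure_cost):
    """Attempt only if the expected payoff is strictly positive."""
    return expected_net_benefit(benefit, p_failure, failure_cost) > 0


if __name__ == "__main__":
    shutdown_cost = 100.0      # hypothetical finite cost of a system shutdown
    catastrophe = math.inf     # unbounded cost of a botched goal alteration

    print(should_attempt(10.0, 0.5, shutdown_cost))   # risky self-alteration
    print(should_attempt(10.0, 0.01, shutdown_cost))  # careful self-alteration
    print(should_attempt(10.0, 1e-9, catastrophe))    # fallible goal alteration
    print(should_attempt(10.0, 0.0, catastrophe))     # fallibility no longer an issue
```

Nothing here is imposed as a special case; the caution falls out of the system's own estimate of its fallibility, which is what makes it an Interim rather than an Arbitrary constraint.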

--         Eliezer S. Yudkowsky

Disclaimer:  Unless otherwise specified, I'm not telling you
everything I think I know.