RE: Posthuman mind control (was RE: FAQ Additions)

Nick Bostrom
Fri, 26 Feb 1999 01:15:54 +0000

Billy Brown wrote:

> Here we have the root of our disagreement. The problem rests on an
> implementation issue that people tend to gloss over: how exactly do you
> ensure that the AI doesn't violate its moral directives?

So we have now narrowed it down to an implementation issue. Good.

> For automatons this is pretty straightforward. The AI is incapable of doing
> anything except blindly following whatever orders it is given. Any
> safeguards will be simple things, like "stop moving if the forward bumper
> detects an impact". They won't be perfect, of course, but that is only
> because we can never anticipate every possible situation that might come up.
> For more complex, semi-intelligent devices the issue is harder.
Paradoxically, I think that when we move up to the level of an SI this problem gets easier again, since we can formulate its values in a human language.

> For instance, the initial directive "harming humans is wrong" can easily
> form a foundation for "harming sentient life is wrong", leading to "harming
> living things is wrong" and then to "killing cows is morally equivalent to
> killing humans". Since "it is permissible to use force to prevent murder"
> is likely to be part of the same programming, we could easily get an AI that
> is willing to blow up McDonald's in order to save the cows!

That kind of unintended consequence can, it seems, easily be avoided if we explicitly give the SI the desire to interpret all its values the way its human creators intended them.

> Once you start talking about self-modifying AI with posthuman intelligence,
> the problem gets even worse. Now we're trying to second-guess something
> that is smarter than we are. If the intelligence difference is large, we
> should expect it to interpret our principles in ways we never dreamed of.
> It will construct a philosophy far more elaborate than ours, with better
> grounding in reality. Sooner or later, we should expect that it will decide
> some of its fundamental values need to be amended in some way - after all, I
> doubt that our initial formulation of these values will be perfect.

Question: What are the criteria whereby the SI determines whether its fundamental values "need" to be changed?

> But if
> it can do that, then its values can eventually mutate into something with no
> resemblance to their original state.
> Worse, what happens when it decides to improve its own goal system?

Improve according to what standard?

Nick Bostrom
Department of Philosophy, Logic and Scientific Method
London School of Economics