RE: Posthuman mind control (was RE: FAQ Additions)

Billy Brown
Thu, 25 Feb 1999 08:36:01 -0600

Nick Bostrom wrote:
> What I would like to see is that they are given fundamental values
> that include respect for human rights. In addition to that, I think
> it would in many cases be wise to require that artificial
> intelligences that are not yet superintelligences (and so cannot
> perfectly understand and follow through on their fundamental values)
> should be built with some cruder form of safeguards that would
> prevent them from harming humans. I'm thinking of house-robots and
> such, that should perhaps be provided with instincts that make it
> impossible for them to do certain sequences of actions (ones that
> would harm themselves or humans for example).

For constructs with animal-level intelligence, or special-purpose AIs that do not really have free will and are not intended to become people, I agree.

> Understanding is not enough; the will must also be there. They should
> *want* to respect human rights, that's the thing. It should be one of
> their fundamental values, like caring for your children's welfare is
> a fundamental value for most humans.

Here we have the root of our disagreement. The problem rests on an implementation issue that people tend to gloss over: how exactly do you ensure that the AI doesn't violate its moral directives?

For automatons this is pretty straightforward. The AI is incapable of doing anything except blindly following whatever orders it is given. Any safeguards will be simple things, like "stop moving if the forward bumper detects an impact". They won't be perfect, of course, but that is only because we can never anticipate every possible situation that might come up.
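A minimal sketch of that kind of safeguard, assuming a hypothetical order-following machine with a single bumper sensor. The class, field, and order names are invented for illustration and do not describe any real robot:

```python
from dataclasses import dataclass


@dataclass
class Automaton:
    """Hypothetical machine that blindly follows orders, with one hard-wired safeguard."""
    position: int = 0
    halted: bool = False

    def step(self, order: str, bumper_hit: bool) -> None:
        # The safeguard is checked before any order is considered.
        if bumper_hit:
            self.halted = True
        # Once halted, every further order is ignored.
        if self.halted:
            return
        if order == "forward":
            self.position += 1
        elif order == "back":
            self.position -= 1


bot = Automaton()
bot.step("forward", bumper_hit=False)  # moves
bot.step("forward", bumper_hit=True)   # impact detected: halts instead of moving
bot.step("forward", bumper_hit=False)  # ignored, still halted
```

The check covers only the one situation someone thought to write down, which is the sense in which such safeguards are simple but never complete.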

For more complex, semi-intelligent devices the issue is harder. Now you have something that takes a high-level order ("go paint the fence"), and produces its own set of detailed instructions. You can implement moral safeguards by limiting the methods it uses (for instance, "any materials needed must be bought, not stolen"). However, the success of this method will be limited by our inability to produce complete definitions of many crucial terms. For instance, "do not harm humans" requires us to define "harm" and "human" in terms that a machine can understand. Inevitably there will be unusual cases where the robot is permitted to do something it shouldn't, or is forbidden from doing something it should do. There will also be "logic traps", where there is no allowable course of action (for instance, a robot with a "do not allow humans to be harmed" directive is faced with a situation where it must injure one human to save the life of another).
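The "logic trap" can be sketched as a rule filter that rejects every available option. This is only an illustration: the `Action` type and its two boolean flags are hypothetical, and they assume that "harm" has already been defined for the machine, which is exactly the part that is hard in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    injures_human: bool          # carrying out the action injures someone
    lets_human_be_harmed: bool   # carrying out the action leaves someone to be harmed


def allowed_actions(options):
    """Keep only the actions that violate neither hard-coded directive."""
    return [
        a for a in options
        if not a.injures_human and not a.lets_human_be_harmed
    ]


routine = [Action("paint the fence", False, False)]

# One human must be injured to save another; inaction lets the second be harmed.
dilemma = [
    Action("shove one person aside to save another", True, False),
    Action("do nothing", False, True),
]

print([a.name for a in allowed_actions(routine)])
print(allowed_actions(dilemma))  # empty list: no allowable course of action
```

In the routine case the filter passes the order through unchanged. In the dilemma both options break a directive, so the robot is left with nothing it is permitted to do.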

Making a sentient, human-equivalent AI with free will adds another layer of difficulty. Now the AI is generating its own goals, rather than following someone else's orders. You can give it preprogrammed fundamental values to limit the goals it chooses, but the unintended consequences will be severe. There is no simple way to predict how such a system will actually react to any given situation, or how its moral system will evolve in the long term. For instance, the initial directive "harming humans is wrong" can easily form a foundation for "harming sentient life is wrong", leading to "harming living things is wrong" and then to "killing cows is morally equivalent to killing humans". Since "it is permissible to use force to prevent murder" is likely to be part of the same programming, we could easily get an AI that is willing to blow up McDonald's in order to save the cows!

Now, I don't think that particular example is especially likely, but it exemplifies a fundamental problem with this kind of manipulation: given an initial set of guiding principles, there is no reliable method for deducing a definitive set of moral conclusions. For entities of human intelligence the world is full of ambiguity, and it is impossible to determine in advance how any complex system will deal with ambiguous cases.

Once you start talking about self-modifying AI with posthuman intelligence, the problem gets even worse. Now we're trying to second-guess something that is smarter than we are. If the intelligence difference is large, we should expect it to interpret our principles in ways we never dreamed of. It will construct a philosophy far more elaborate than ours, with better grounding in reality. Sooner or later, we should expect that it will decide some of its fundamental values need to be amended in some way - after all, I doubt that our initial formulation of these values will be perfect. But if it can do that, then its values can eventually mutate into something with no resemblance to their original state.

Worse, what happens when it decides to improve its own goal system? Presumably it will translate its values as best it can, but the mechanism it uses to interpret them could change completely. This means that we can't rely on any kind of special coding to make it interpret our values the way we want it to. What we are reduced to is the equivalent of a written list of principles - and we both know that you can't even get two humans to agree on how to interpret that kind of document.

Overall, I don't see how this approach is any improvement over simply explaining our own thoughts about morality and letting the posthumans make up their own minds.

Billy Brown, MCSE+I