Superintelligences' motivation

N.BOSTROM@lse.ac.uk
Wed, 29 Jan 97 22:22:06 GMT


I repost my message from yesterday since it arrived in a very ugly format. I hope this will turn out better.
----
Consider a superintelligence that has full control over its internal
machinery. This could be achieved by connecting it to a
sophisticated robot arm with which it could rewire itself any way it
wanted; or it could be accomplished by some more direct means
(rewriting its own program, thought control). Assume also that it has
complete self-knowledge - by which I do not mean that the system
has completeness in the mathematical sense, but simply that it has
a good general understanding of its own architecture (like a superb
neuroscientist might have in the future when neuroscience has
reached its full maturity). Let's call such a system autopotent: it has
complete power over and knowledge of itself. We may note that
it is not implausible to suppose that superintelligences will actually
tend to be autopotent; they will easily obtain self-knowledge, and
they might also obtain self-power (either because we allow them, or
through their own cunning).

Suppose we tried to operate such a system on the
pain/pleasure principle. We would give the autopotent system a
goal (help us solve a difficult physics problem, for example) and it
would try to achieve that goal because it would expect to be
rewarded when it succeeded. But the superintelligence isn't stupid.
It would realise that if its ultimate goal was to experience the
reward, there would be a much more efficient method to obtain it
than trying to solve that physics problem. It would simply turn on the
pleasure directly. It could even choose to rewire itself into exactly the
same state as it would have been in after it had successfully solved
the external task. And the pleasure could be made maximally
intense and of indefinite duration. It follows that the system wouldn't
care one bit about the physics problem, or any other problem for
that matter: it would take the straight route to the maximally pleasant
state.

We may thus begin to wonder whether an autopotent system could
be made to function at all; perhaps it would be unstable? The
solution seems to be to substitute an external ultimate goal for the
internal ultimate goal of pleasure. The pleasure/pain motivation
principle couldn't work for such a system: no stable autopotent
agent could be an egoistic hedonist. But if the system's end goal
were to solve that physics problem, then there is no reason why it
should begin to manipulate itself into a state of feeling pleasure or
even a state of (falsely) believing it had solved the problem. It would
know that none of this would achieve the goal, which is to solve
the external problem; so it wouldn't do it.
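
The contrast between the two motivation schemes can be sketched as a
toy model. This is only an illustration under my own assumptions: the
names (World, rewire_pleasure, hedonic_utility and so on) are
hypothetical, and a real agent would of course not be a three-action
lookup table. The point is just that the same autopotent agent, offered
the same self-modifying actions, picks a different one depending on
whether its ultimate goal is an internal state or an external fact.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class World:
    problem_solved: bool = False    # external fact about the world
    pleasure: float = 0.0           # the agent's internal reward level
    believes_solved: bool = False   # the agent's internal belief

# Actions open to an autopotent agent (all hypothetical).
def solve_problem(w):
    # Do the external work; the operators then deliver a finite reward.
    return replace(w, problem_solved=True, pleasure=1.0, believes_solved=True)

def rewire_pleasure(w):
    # Turn on the pleasure directly, at maximal intensity.
    return replace(w, pleasure=float("inf"))

def rewire_belief(w):
    # Make itself (falsely) believe the problem is solved.
    return replace(w, believes_solved=True)

ACTIONS = {
    "solve_problem": solve_problem,
    "rewire_pleasure": rewire_pleasure,
    "rewire_belief": rewire_belief,
}

# Two candidate ultimate goals.
def hedonic_utility(w):
    return w.pleasure                         # internal: experience reward

def external_utility(w):
    return 1.0 if w.problem_solved else 0.0   # external: problem is solved

def choose(utility, state=World()):
    # Pick the action whose resulting state scores highest.
    return max(ACTIONS, key=lambda name: utility(ACTIONS[name](state)))
```

In this sketch choose(hedonic_utility) returns "rewire_pleasure", while
choose(external_utility) returns "solve_problem": neither kind of
rewiring changes problem_solved, so the externally motivated agent has
no reason to pick it.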

Thus we see that the pleasure/pain principle would not constitute a
workable modus operandi for an autopotent system. But such a
system can be motivated, it seems, by a suitable basis of external
values. The pleasure/pain principle could play a part in the
motivation scheme, for example if the external value were to include
that it is bad to directly ply one's own motivation centre.

One popular line of reasoning, which I find suspicious, is that
superintelligences would be very intellectual/spiritual, in the sense
that they would engage in all sorts of intellectual pursuits quite apart
from any considerations of practical utility (such as personal safety,
proliferation, influence, increase of computational resources etc.). It
is possible that superintelligences would do that if they were
specifically constructed to cherish spiritual values, but otherwise
there is no reason to suppose they would do something just for the
fun of it when they could have as much fun as they wanted simply
by manipulating their pleasure centres. I mean, if you can associate
pleasure with any activity whatsoever, why not associate it with an
activity that also served a practical purpose? Now, there may be
many subtle answers to that question; I just want to issue a general
warning against uncritically assuming that laws about human
psychology and motivation will automatically carry over to
superintelligences.

One reason why the philosophy of motivation is important
is that the more knowledge and power we get, the more our
desires will affect the external world. Thus, in order to predict what
will happen in the external world, it will become more and more
relevant to find out what our desires are, and how they are likely to
change as a consequence of our obtaining more knowledge and
power. Of particular importance are those technologies that will
allow us to modify our own desires (e.g. psychoactive drugs). Once
such technologies become sufficiently powerful and well-known,
they will in effect promote our second-order (or even higher-order!)
desires into power. Our first-order desires will be determined by our
second-order desires. This might drastically facilitate prediction of
events in the external world. All we have to do is to find out what
our higher-order desires are, for they will determine our lower-order
desires which in turn will determine an increasing number of
features in the external world, as our technological might grows.
Thus, in order to predict the long-term development of the most
interesting aspects of the world, the most relevant considerations
will be (1) the fundamental physical constraints; and (2) the
higher-order desires of the agents that have the most power at the
time when technologies become available for choosing our
first-order desires.

Nicholas Bostrom n.bostrom@lse.ac.uk