Re: AI: This is how we do it

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Tue Feb 19 2002 - 05:07:19 MST


Zero Powers wrote:
>
> See Eli's earlier response to this thread. He cites a number of his
> articles claiming that not only will AI be self-aware, it will be friendly
> to your interests, even over its "own."

I assume (I hope) that the scare-quotes are there to show that what we would
*assume* to be the AI's "own" interests from an anthropomorphic perspective
are not the AI's actual interests, and that vis actual interests are
Friendliness.

Zero Powers wrote:
>
> My point (which seems to have gotten lost in the shuffle) is that once you
> have a super-human AI that learns and processes information *way* faster
> than we do (particularly one that is self-enhancing and hence learns at an
> exponentially accelerating rate), it will be impossible, either by
> friendly "supergoals" or otherwise, to keep the AI from transcending any
> limits we might hope to impose on it. Which will lead to us being
> completely at its mercy.

The same, of course, applies to a transcending human. The essential
inscrutability of the Singularity means that our understanding of Friendly AI
is essentially limited to whether the seed, heading into the Singularity, is
*as good as a human*. Human complexity is scrutable; transhuman complexity
isn't.

> My point in a nutshell: friendliness cannot be imposed on one's superior.
> Genes tried it, and made a good run of it for quite a while. Increasing our
> intelligence made our genes ever more successful than the competitors of our
> species. But, as our genes found out, too much of a good thing is a bad
> thing. We now pursue gene-imposed subgoals (sex, for instance) while
> bypassing completely the supergoals (i.e., kids) at our whim.
>
> I've still not heard any sound argument on how we can prevent the same thing
> from happening to us and our "supergoals" once the AI is our intellectual
> superior.

It may be reassuring (or not) to realize that the means by which we resist
evolution is itself evolved, only on a deeper level. We are imperfectly
deceptive social organisms who compete by arguing about each other's motives;
that is, we are political organisms. We have adaptations for arguing about
morality; that is, in addition to our built-in evolutionary morality, we also
have dynamics for choosing new moralities. In the ancestral environment this
was, I suspect, a relatively small effect, amounting to a choice of
rationalizations. However, any seed AI theorist can tell you that what
matters in the long run isn't how a system starts out, but how the system
changes.

So of course, our dynamics for choosing new moralities are starting to
dominate over our evolutionary moralities, due to a change in cognitive
conditions. Specifically: (1) an unancestrally large cultural knowledge base
(stored-up arguments about morality); (2) an unancestrally good reflective
model of our own mentality (timebinding historical records of personalities,
and (more recently) evolutionary psychology); (3) an unancestral technological
ability to decouple cognitive supergoals that are evolutionary subgoals from
their evolutionary rationales (e.g., contraception).

(3) in particular is interesting because the way in which it came about is
that evolution "instructed" us to do certain things without "telling us why".
We won against evolution because evolution failed to treat us as equals and
take us into its confidence (he said, anthropomorphizing the blind actions of
an unintelligent process with no predictive foresight).

Zero Powers wrote:
>
> My answer is that, so far as I comprehend it, your
> supergoal/subgoal explanation does not resolve the problem.

No, it's meant to clarify the question. The resolution consists of a lot more
theory than that bound up in cleanly causal goal systems (i.e., with subgoals
and supergoals). For a fast summary of the design theory (one that
unfortunately leaves out most of the reasoning), check out "Features of
Friendly AI" at:

http://singinst.org/friendly/features.html
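
To make "cleanly causal" concrete, here is a minimal sketch, assuming nothing
beyond what is said above: a goal system in which a subgoal has no
desirability of its own and only borrows desirability from the supergoal(s) it
is predicted to serve. The names (Goal, serves, desirability) are purely
illustrative, not taken from any actual SIAI code.

    # Illustrative sketch of a cleanly causal goal system: subgoal
    # desirability is entirely derived from predicted service to supergoals.

    class Goal:
        def __init__(self, name, intrinsic_desirability=0.0):
            self.name = name
            # Nonzero only for supergoals; subgoals start at zero.
            self.intrinsic_desirability = intrinsic_desirability
            self.parents = []   # goals this goal is predicted to serve
            self.weights = []   # predicted contribution to each parent

        def serves(self, parent, weight):
            """Declare that achieving this goal is predicted to further `parent`."""
            self.parents.append(parent)
            self.weights.append(weight)

        def desirability(self):
            """Intrinsic value plus value inherited from whatever this goal serves."""
            inherited = sum(w * p.desirability()
                            for p, w in zip(self.parents, self.weights))
            return self.intrinsic_desirability + inherited

    # A subgoal is desirable only insofar as it serves the supergoal; cut the
    # link to the supergoal and its desirability drops back to zero.
    friendliness = Goal("Friendliness", intrinsic_desirability=1.0)
    learn_physics = Goal("learn physics")
    learn_physics.serves(friendliness, weight=0.3)
    print(learn_physics.desirability())   # 0.3, derived rather than intrinsic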

> My understanding of your theory is that, so long as the AI's sub-goals are all
> in service of friendly supergoals, we have nothing to fear. But, I'm making
> a few assumptions, and here they are:
>
> 1. The AI will be curious (either as a sub- or supergoal);
>
> 2. The AI will have "general intelligence" (such that it can evaluate a huge
> decision tree and choose which among the myriad branches will best meet its
> goals);
>
> 3. It will be self-aware (such that it has an accurate picture of the world,
> including its own place in that picture).
>
> If any of those assumptions is wrong, then you can forget everything I've
> said before. But, if each of those assumptions is correct, then they
> inherently conflict with your concept of a friendly AI.

My guess is that you're about to say that a self-aware AI would be capable of
redesigning its own goal system in accordance with its own interests.
Friendly AI, since it's supposed to be human-complete, allows for the
possibility that programmer-infused supergoals can be wrong, in cases that we
would intuitively recognize as such (we can conceive of the programmer saying
"Oops", and the Friendly AI needs to understand this too).

Without this *additional complexity*, I fear that goals would be
automatically, eternally, and absolutely stable, even in cases that we would
regard as boring and stupid and wrong, because the design purpose of the goal
system is viewed in terms of the referent of the current goal system, and any
design change to the current goal system almost automatically goes against
that referent. Friendly AI absorbs the human intuition that supergoals can be
"mistaken" by distinguishing between the goal system and its referent, and
viewing the current goal system as a probabilistic approximation of that
referent.
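
A minimal sketch of that last distinction, assuming only what is said above:
the current supergoal content is held as a probabilistic approximation of an
external referent (what the programmers were actually trying to point at), so
a programmer's "Oops" is evidence about the referent rather than an attack on
the goals. All names here are hypothetical, not drawn from any real
implementation.

    # Illustrative sketch: current supergoal content as a probabilistic
    # approximation of its referent, revisable by programmer feedback.

    class ProbabilisticSupergoal:
        def __init__(self, hypotheses):
            # hypotheses: candidate supergoal content -> probability that the
            # content correctly describes the referent.
            self.hypotheses = dict(hypotheses)

        def best_guess(self):
            """Act on the currently most probable reading of the referent."""
            return max(self.hypotheses, key=self.hypotheses.get)

        def update(self, content, likelihood_ratio):
            """Programmer feedback ("Oops") shifts probability between readings."""
            self.hypotheses[content] *= likelihood_ratio
            total = sum(self.hypotheses.values())
            for key in self.hypotheses:
                self.hypotheses[key] /= total

    # The infused content starts out most probable, but an "Oops" can make a
    # revised reading win, without the correction counting as going against
    # the system's own goals.
    sg = ProbabilisticSupergoal({"infused content v1": 0.9,
                                 "revised content v2": 0.1})
    print(sg.best_guess())               # infused content v1
    sg.update("revised content v2", 20)  # strong evidence v1 missed the referent
    print(sg.best_guess())               # revised content v2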

-- -- -- -- --
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence


