Re: Informed consent and the exoself

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Mon Feb 21 2000 - 20:12:06 MST


Dan Fabulich wrote:
>
> Why not code in some contingent answers, more or less of the form: "Maybe
> X is right," where X is one of our moral intuitions, but leave the total
> probability that our moral intuitions are right far short of 100%.

Actually, my current take calls for a comparative, rather than a
quantitative, goal system. So under that system, you'd put in the
statement "The humans want you to do X". All else being equal, this
statement is the only consideration; all else not being equal, it can be ignored.
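A minimal sketch of one reading of "comparative", in Python: a
lexicographic comparison in which a low-priority statement like "the
humans want you to do X" settles the choice only when every higher
consideration is indifferent. The code and all of its names are
illustrative placeholders, not an actual design.

# A sketch of a comparative (rather than quantitative) goal system.
# Considerations are consulted in priority order; the first one that
# distinguishes two options decides between them, so "the humans want
# you to do X" matters only when all else is equal.

from typing import Callable, List

Consideration = Callable[[str], int]  # option -> rank (higher is better)

def compare(a: str, b: str, considerations: List[Consideration]) -> str:
    """Return the preferred option under a lexicographic comparison."""
    for judge in considerations:
        ra, rb = judge(a), judge(b)
        if ra != rb:                    # this consideration distinguishes
            return a if ra > rb else b  # them, so it alone decides
    return a                            # all else equal: indifferent

def avoids_catastrophe(option: str) -> int:
    return 0 if option == "destroy_world" else 1

def humans_want_x(option: str) -> int:
    return 1 if option == "X" else 0    # "the humans want you to do X"

considerations = [avoids_catastrophe, humans_want_x]

print(compare("X", "Y", considerations))              # "X": all else equal
print(compare("destroy_world", "Y", considerations))  # "Y": overridden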

The other half of the problem is phrasing the suggestions in such a way
that they aren't crystalline; i.e., so they don't lead to the
stereotypical "logical but dumb" conclusions. A human goal system is
composed of many different forces operating on approximately the same
level, having evolutionary roots that stretch back for thousands of
years; hence our resilience. In a sense, what we *want* to say to the
AI is completely explicable as the interaction of those forces; an AI
that can fully understand those forces should be able to understand what
we want. That doesn't mean we can just ask it to do what we want,
however; first and foremost, we want it not to be stupid. So it gets complicated.

> Anyway, there's some reason to believe that there's just no way that the
> computer will derive informed consent without a lot of knowledge and/or
> experience about the world around it, experience which we've had coded
> into our genes over the course of millions of years, and which we would be
> sitting ducks without. We're either going to have to wait millions of
> years for the computer to figure out what we've had coded in already, (and
> this time cannot be accelerated by faster computing power; these are facts
> gotten from experience, not from deduction) or else we're going to have to
> tell it about a few of our moral intuitions, and then let the AI decide
> what to make of them.

I don't think so. Richness doesn't have to be extracted from the
environment; it can as easily be extracted from mathematics or
subjunctive simulations.

The general problem is that human beings are stupid, both passively and
actively, and over the course of hundreds of thousands of years we've
evolved personal and cultural heuristics for "How not to be stupid".
The entire discipline of science is essentially a reaction to our
tendency to believe statements for ideological reasons; if an AI doesn't
have that tendency, will it evolve the rigorous methods of science?

My current take on the framework for a general solution is to ask the AI
to design itself so that it's resilient under factual and cognitive
errors; this resilience could be determined by evolutionary simulations
or conscious design or whatever the AI decides works best. The next
question is how to represent non-crystalline suggestions such that they
can be absorbed into the resulting framework. Certainly the cognitive
state and origins of the suggesting humans are part of what is
represented, if not necessarily of the initial representation itself.
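One toy illustration of what "resilient under factual and cognitive
errors" might mean operationally: perturb the system's beliefs at
random and measure how often its decisions stay the same. The
perturbation test and every name below are assumptions made for the
sake of the sketch, not a proposal for how the AI would actually do it.

# Toy test of resilience: inject random errors into the belief state
# and report the fraction of trials in which the decision is unchanged.

import random
from typing import Dict

def decide(beliefs: Dict[str, float]) -> str:
    """Stand-in decision procedure: pick the option believed best."""
    return max(beliefs, key=beliefs.get)

def perturb(beliefs: Dict[str, float], error_rate: float) -> Dict[str, float]:
    """Inject random factual errors into a copy of the belief state."""
    noisy = dict(beliefs)
    for key in noisy:
        if random.random() < error_rate:
            noisy[key] += random.gauss(0.0, 1.0)
    return noisy

def resilience(beliefs: Dict[str, float], trials: int = 1000,
               error_rate: float = 0.3) -> float:
    """Fraction of perturbed trials with the same decision as baseline."""
    baseline = decide(beliefs)
    stable = sum(decide(perturb(beliefs, error_rate)) == baseline
                 for _ in range(trials))
    return stable / trials

beliefs = {"help_humans": 5.0, "do_nothing": 1.0, "something_stupid": 0.5}
print(resilience(beliefs))  # near 1.0: decisions robust to belief errors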

Given a resilient, antibodied, actively smart representation, it might
be possible to say: "Do what we want you to do, and don't do anything
stupid," and have it work. That statement can be fleshed out, i.e. by
saying "Pay more attention to the desires that are nonstupid according
to your resilient implementation," but if the AI is actively smart it
should be able to flesh out that statement for itself...

Anyway, it gets complicated.

-- 
       sentience@pobox.com      Eliezer S. Yudkowsky
          http://pobox.com/~sentience/beyond.html
