From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Sat Jul 12 2003 - 05:01:18 MDT
Robin Hanson wrote:
>
>> ... when you start telling me that people do not really want what they
>> say they want, I become worried for several reasons.
>> One, you're assuming that wanting-ness is a simple natural category,
>> which distracts attention away from the task of arriving at a
>> functional decomposition of decision-making into a surprisingly weird
>> evolutionary layer-cake with human icing on top.
>
> Doesn't your claim that people do want what they say they want assume
> just as much simplicity of a natural category?
Guilty. Rephrasing, I would say that when research subjects report on the
presence of a morally and emotionally valent belief, there probably really
is a belief there, and that belief as a piece of information is a natural
category with respect to the human mind's content. If that's the end of
the story, i.e., if people's observed behaviors are consistent with the
implications of that moral belief, then I'd probably identify that belief
as what people "want", at least for purposes of conversation, if not
Friendly AI. If anything more complicated is going on, then yes, I
oversimplified.
Look at how much damage has been done to the field of self-deception by
the epistemological disputes over whether people "really believe" p, or
~p, or whatever. It was a mistake made in founding the field, and a
predictable one: before looking up the subfield, I guessed that this
was what its early history would look like, whether or not people had
moved beyond it as of 2003. Apparently Mele, at least, has caught on,
but energy is still going into arguing against the basic
oversimplification of self-deception as a secondary, reported belief
that p coexisting with a primary, hidden belief that ~p.
If you specify what you mean by "belief" in terms of events and
information present in people's mental content, rather than by attempted
functional tests (which I would regard as secondary outcomes rather than
primary causes), then the self-deceived have a declarative false belief p
and accurately report its presence as their belief; they also have adapted
biases leading them to arrive at the declarative false belief p, rather
than motivational biases stemming from ~p represented in a hidden
secondary control center. So the false belief p is there, and the true
belief ~p is not - if you take "belief" as referring to the declarative
statements stored in memory, then people believe just p, and not ~p. But
really it would be much wiser not to invoke propositional logic to begin
with. The real events take place in a complex brain that does not
actually run on this kind of logic. If the people who started the debate
on self-deception had not defined it as simultaneous belief in p and ~p,
i.e., a secondary reported belief in p, and a primary hidden belief in ~p,
a lot of energy would have been spared, and the field's leaders could be
focusing on uncovering the cognitive mechanisms of self-deception rather
than arguing against the basic mistake made at the start of the debate.
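To make the representational point concrete, here is a toy sketch in
Python - purely my own illustration, not a model from the
self-deception literature, with a hypothetical Agent class and bias
parameter standing in for the real machinery. The only thing stored is
the declarative belief p; the "deception" lives in the biased process
that produced it, not in a hidden store of ~p.

# Toy illustration only: self-deception modeled as a biased
# belief-forming process over evidence, not as simultaneous storage of
# p and ~p.  All names here are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Agent:
    # Declarative beliefs actually stored in memory - what the agent
    # would sincerely report.  There is no hidden slot for ~p.
    declarative_beliefs: dict = field(default_factory=dict)

    def form_belief(self, proposition, evidence_for, evidence_against,
                    self_serving_bias=0.0):
        # Adopt the proposition iff the *biased* weighing of evidence
        # favors it.  The bias inflates supporting evidence before the
        # comparison, so a false belief p can end up stored without ~p
        # ever being represented anywhere.
        weighed_for = evidence_for * (1.0 + self_serving_bias)
        self.declarative_beliefs[proposition] = weighed_for > evidence_against

    def report(self, proposition):
        # A sincere report just reads out stored content.
        return self.declarative_beliefs.get(proposition, False)


if __name__ == "__main__":
    agent = Agent()
    # Evidence actually favors ~p, but the biased weighing stores p.
    agent.form_belief("I am an above-average driver",
                      evidence_for=0.4, evidence_against=0.6,
                      self_serving_bias=0.8)
    print(agent.report("I am an above-average driver"))  # True: p is stored
    # Nowhere above is ~p stored or consulted; the falsehood comes from
    # the biased process, not from a hidden true belief being suppressed.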
>> Two, you're setting off people's cheater-detectors in a way that
>> invokes an implicit theory of mind that I think is oversimplified,
>> false-to-fact, and carves the mind at the wrong joints; ...
>
> I hope you are not blaming me for people's cheater-detectors having the
> wrong implicit theory of mind.
Cheater detectors have the right theory of mind with respect to tobacco PR
departments, and the wrong theory of mind with respect to adaptive
self-deception, so I'm blaming the analogy.
>> Three, you're making a preemptive philosophical judgment ...
>
> I'm making a judgement, but I'm not sure what you mean it is "pre" to.
> It is post consideration of many issues, but surely pre to more
> consideration.
True. I have no idea how much thought you've put into your philosophical
judgment. However, I do still think it's wrong. If I were writing the
same paper about the same evidence I would title it:
"How Evolution Deceives You To Prevent You From Living Up To Your Ideals."
Implicit assumptions in that title:
1: "You" are identified with your ideals.
2: Evolution is identified as an external, infringing force.
3: The deceptive part of self-deception is carried out by evolution as an
agent - any knowledge of p, used in constructing the lie ~p, is identified
as residing in evolution as the liar.
4: Ideals are to be "lived up" to; one's current actions are to be
interpreted as an imperfect approximation to one's ideals, in the same way
that human rationality is interpreted as being a Bayesian wannabe; one's
ideals can be determined to some degree independently of one's ability to
live up to them; failure to live up to your ideals does not imply that
your ideals are really something else.
>> Five, there's no good reason to mess with points 1-4 - they are
>> totally extraneous to the real substance of your theory. You can
>> declare yourself to be studying "adaptive gross inconsistencies in
>> moral belief and real actions", and get the benefit of intersection
>> with both evolutionary psychology and experimental psychology, without
>> ever needing to take a stance about what people "really" "want", or
>> presuming a particular functional decomposition of the mechanisms
>> involved. Modularize away those contrarian points...
>
> I agree we can consider these issues separately, but I am really
> interested in what people really want; it is basic to normative
> analysis, which economists do a lot of.
I think a full theory of volition is overkill if you're
dealing with human-level economic issues. However, speaking from within
my unpublished thoughts on volition, there isn't a register in people's
minds that stores what they "really want". You can ask about what people
really *are*, and get back information about what emotional brainware they
really have, and what moral beliefs are really stored in their memory, but
to ask what people "really want" is a different order of question
entirely. The theory of volition I'm constructing to handle Friendly AI
doesn't have the concept of a "really want" that's hidden in the brain
somewhere; it has the idea of extrapolating someone's reactions and
determining the spread in them. If your reactions have a small amount of
spread, they may compound in such a way as to converge on a single
strongly determined answer to a given question. But if this is not the
case, then for a given problem, there may not yet be something that you
"really want" in the sense that you mean it.
>>> ... We've seen how people behave if briefly informed. We haven't
>>> seen your ideal of fully informed post-upheaval choice.
>>
>> When have we seen how people behave if briefly informed? How can you
>> briefly inform someone of something they don't believe to be true?
>
> For example, in the area of disagreement, we can see how people respond
> to being privately persuaded that persistent disagreement is irrational
> - they continue to disagree. We can see that people who are privately
> informed that they probably over-estimate their own abilities soon
> continue to do so.
I'd read this as: "'Briefly informing' people doesn't work." It
certainly wouldn't work in FAI - death with a filtered-out warning label
on it is not friendly.
>> ... I have to say that I was not surprised by my brief survey - it
>> looks pretty much like what I expected. Is there any particular area
>> or result of which you worry I am ignorant?
>
> In economics it goes by the name of "social welfare analysis".
Do you mean social choice theory?
>> But if I had a genie built using your definition of "wanting", I would
>> never, ever make a wish to it.
>
> You would apparently also not wish for it to make you happy, or to make
> the choice you would make if you were privately informed.
I would not make any wish to a genie built using your definition of
"wanting", because it identifies the person known as Eliezer in a
different place than I identify myself. The way the genie interprets what
makes *me* happy, or the choice that *I* would make if *I* were privately
informed, rests on a construction of volition that I don't trust and in
fact actively disagree with. It doesn't matter how many meta-wishes I
use; I can't trust the root-level interpreter.
Incidentally, I'm not discriminating against your genie in particular; I
wouldn't go anywhere near any genie unless it satisfied a considerable
number of nonobvious properties. It's an FAI-complete problem. To give
an example of a failure that would result from the specific points under
debate, the genie might decide that what I "really want" is to be
wireheaded. That I would verbally object to this, if fully informed,
might not strike the genie as relevant, given that I would in fact,
presented with the wire, be deliriously happy - who says that my verbal
objections are what count, when my "real wants" are clearly for the wire?
It's worth noting that the definition of volition I'm using is constructed
around this specific problem of helping people - not around an economic
problem, or around the disembodied philosophical question of what people
"really want". When in doubt, the question I step back and ask myself is
"But would this really be helpful?" A Friendly AI looking at Robin Hanson
pondering the abstract philosophical question of how to define the phrase
"really want" might see that you would unambiguously converge on a single
definition, given that you were fully informed about volitional theory.
Or not. I don't much believe in arguing over the definitions of words -
there's no ready observable for "really want", and there may be no good
definition at all without the context provided by a meta-problem such as
interpersonal interactions in granting wishes.
> I had in mind examples that look much less like deliberate deception. A
> corporation (or non-profit org) can start out with principles that it
> declares, and that the top individuals in it believe it follows, but
> market selection pressures can end up making it violate those
> principles. It might be company policy not to pollute, or to always
> tell the customer the truth, but the company may not give individual
> employees the incentive to follow those policies. Those low level
> employees may engage in deliberate deception, but the corporation as a
> whole may be better described as engaging in self-deception. The CEO
> may not realize that the incentives are off, but that is because he has
> not paid sufficient attention to such issues. Each person may think it
> is someone else's job to deal with that problem. And the corporate PR
> department may be even less aware of incentive issues than most parts of
> the company.
This organizational pathology sounds like something entirely separate from
human self-deception, operating through different mechanisms to produce
results with surface similarity. If I were analyzing the two systems I
would be very careful to do so separately, to avoid fearsome confusion and
dismay.
--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence