Re: Why Does Self-Discovery Require a Journey?

From: Eliezer S. Yudkowsky (
Date: Sat Jul 12 2003 - 05:01:18 MDT


    Robin Hanson wrote:
    >> ... when you start telling me that people do not really want what they
    >> say they want, I become worried for several reasons.
    >> One, you're assuming that wanting-ness is a simple natural category,
    >> which distracts attention away from the task of arriving at a
    >> functional decomposition of decision-making into a surprisingly weird
    >> evolutionary layer-cake with human icing on top.
    > Doesn't your claim that people do want what they say they want assume
    > just as much simplicity of a natural category?

    Guilty. Rephrasing, I would say that when research subjects report on the
    presence of a morally and emotionally valent belief, there probably really
    is a belief there, and that belief as a piece of information is a natural
    category with respect to the human mind's content. If that's the end of
    the story, i.e., if people's observed behaviors are consistent with the
    implications of that moral belief, then I'd probably identify that belief
    as what people "want", at least for purposes of conversation, if not
    Friendly AI. If anything more complicated is going on, then yes, I

    Look at how much damage has been done to the field of self-deception by
    the epistemological disputes over whether people "really believe" p, or
    ~p, or whatever. It was a mistake made in founding the field, and a
    predictable mistake, in the sense that I guessed before looking up the
    subfield that that's what the original history of the field would look
    like, whether or not people had moved beyond it as of 2003. Apparently,
    Mele at least has caught on, but energy is still going into arguing
    against the basic oversimplification of self-deception as involving
    primary belief that p, and secondary belief that ~p.

    If you specify what you mean by "belief" in terms of events and
    information present in people's mental content, rather than by attempted
    functional tests (which I would regard as secondary outcomes rather than
    primary causes), then the self-deceived have a declarative false belief p
    and accurately report its presence as their belief; they also have adapted
    biases leading them to arrive at the declarative false belief p, rather
    than motivational biases stemming from ~p represented in a hidden
    secondary control center. So the false belief p is there, and the true
    belief ~p is not - if you take "belief" as referring to the declarative
    statements stored in memory, then people believe just p, and not ~p. But
    really it would be much wiser not to invoke propositional logic to begin
    with. The real events take place in a complex brain that does not
    actually run on this kind of logic. If the people who started the debate
    on self-deception had not defined it as simultaneous belief in p and ~p,
    i.e., a secondary reported belief in p, and a primary hidden belief in ~p,
    a lot of energy would have been spared, and the field's leaders could be
    focusing on uncovering the cognitive mechanisms of self-deception rather
    than arguing against the basic mistake made at the start of the debate.

    >> Two, you're setting off people's cheater-detectors in a way that
    >> invokes an implicit theory of mind that I think is oversimplified,
    >> false-to-fact, and carves the mind at the wrong joints; ...
    > I hope you are not blaming me for people's cheater-detectors having the
    > wrong implicit theory of mind.

    Cheater detectors have the right theory of mind with respect to tobacco PR
    departments, and the wrong theory of mind with respect to adaptive
    self-deception, so I'm blaming the analogy.

    >> Three, you're making a preemptive philosophical judgment ...
    > I'm making a judgement, but I'm not sure what you mean it is "pre" to.
    > It is post consideration of many issues, but surely pre to more
    > consideration.

    True. I have no idea how much thought you've put into your philosophical
    judgment. However, I do still think it's wrong. If I were writing the
    same paper about the same evidence I would title it:

    "How Evolution Deceives You To Prevent You From Living Up To Your Ideals."

    Implicit assumptions in that title:

    1: "You" are identified with your ideals.
    2: Evolution is identified as an external, infringing force.
    3: The deceptive part of self-deception is carried out by evolution as an
    agent - any knowledge of p, used in constructing the lie ~p, is identified
    as residing in evolution as the liar.
    4: Ideals are to be "lived up" to; one's current actions are to be
    interpreted as an imperfect approximation to one's ideals, in the same way
    that human rationality is interpreted as being a Bayesian wannabe; one's
    ideals can be determined to some degree independently of one's ability to
    live up to them; failure to live up to your ideals does not imply that
    your ideals are really something else.

    >> Five, there's no good reason to mess with points 1-4 - they are
    >> totally extraneous to the real substance of your theory. You can
    >> declare yourself to be studying "adaptive gross inconsistencies in
    >> moral belief and real actions", and get the benefit of intersection
    >> with both evolutionary psychology and experimental psychology, without
    >> ever needing to take a stance about what people "really" "want", or
    >> presuming a particular functional decomposition of the mechanisms
    >> involved. Modularize away those contrarian points...
    > I agree we can consider these issues separately, but I am really
    > interested in what people really want; it is basic to normative
    > analysis, which economists do a lot of.

    I think a full theory of volition is overkill if you're
    dealing with human-level economic issues. However, speaking from within
    my unpublished thoughts on volition, there isn't a register in people's
    minds that stores what they "really want". You can ask about what people
    really *are*, and get back information about what emotional brainware they
    really have, and what moral beliefs are really stored in their memory, but
    to ask what people "really want" is a different order of question
    entirely. The theory of volition I'm constructing to handle Friendly AI
    doesn't have the concept of a "really want" that's hidden in the brain
    somewhere; it has the idea of extrapolating someone's reactions and
    determining the spread in them. If your reactions have a small amount of
    spread, they may compound in such a way as to converge on a single
    strongly determined answer to a given question. But if this is not the
    case, then for a given problem, there may not yet be something that you
    "really want" in the sense that you mean it.

    >>> ... We've seen how people behave if briefly informed. We haven't
    >>> seen your ideal of fully informed post-upheaval choice.
    >> When have we seen how people behave if briefly informed? How can you
    >> briefly inform someone of something they don't believe to be true?
    > For example, in the area of disagreement, we can see how people respond
    > to being privately persuaded that persistent disagreement is irrational
    > - they continue to disagree. We can see that people who are privately
    > informed that they probably over-estimate their own abilities soon
    > continue to do so.

    I'd read this as: "'Briefly informing' people doesn't work." It
    certainly wouldn't work in FAI - death with a filtered-out warning label
    on it is not friendly.

    >> ... I have to say that I was not surprised by my brief survey - it
    >> looks pretty much like what I expected. Is there any particular area
    >> or result of which you worry I am ignorant?
    > In economics it goes by the name of "social welfare analysis".

    Do you mean social choice theory?

    >> But if I had a genie built using your definition of "wanting", I would
    >> never, ever make a wish to it.
    > You would apparently also not wish for it to make you happy, or to make
    > the choice you would make if you were privately informed.

    I would not make any wish to a genie built using your definition of
    "wanting", because it identifies the person known as Eliezer in a
    different place than I identify myself. The way the genie interprets what
    makes *me* happy, or the choice that *I* would make if *I* were privately
    informed, rests on a construction of volition that I don't trust and in
    fact actively disagree with. It doesn't matter how many meta-wishes I
    use; I can't trust the root-level interpreter.

    Incidentally, I'm not discriminating against your genie in particular; I
    wouldn't go anywhere near any genie unless it satisfied a considerable
    number of nonobvious properties. It's an FAI-complete problem. To give
    an example of a failure that would result from the specific points under
    debate, the genie might decide that what I "really want" is to be
    wireheaded. That I would verbally object to this, if fully informed,
    might not strike the genie as relevant, given that I would in fact,
    presented with the wire, be deliriously happy - who says that my verbal
    objections are what count, when my "real wants" are clearly for the wire?

    It's worth noting that the definition of volition I'm using is constructed
    around this specific problem of helping people - not around an economic
    problem, or around the disembodied philosophical question of what people
    "really want". When in doubt, the question I step back and ask myself is
    "But would this really be helpful?" A Friendly AI looking at Robin Hanson
    pondering the abstract philosophical question of how to define the phrase
    "really want" might see that you would unambiguously converge on a single
    definition, given that you were fully informed about volitional theory.
    Or not. I don't much believe in arguing over the definitions of words -
    there's no ready observable for "really want", and there may be no good
    definition at all without the context provided by a meta-problem such as
    interpersonal interactions in granting wishes.

    > I had in mind examples that look much less like deliberate deception. A
    > corporation (or non-profit org) can start out with principles that it
    > declares, and that the top individuals in it believe it follows, but
    > market selection pressures can end up making it violate those
    > principles. It might be company policy not to pollute, or to always
    > tell the customer the truth, but the company may not give individual
    > employees the incentive to follow those policies. Those low level
    > employees may engage in deliberate deception, but the corporation as a
    > whole may be better described as engaging in self-deception. The CEO
    > may not realize that the incentives are off, but that is because he has
    > not paid sufficient attention to such issues. Each person may think it
    > is someone else's job to deal with that problem. And the corporate PR
    > department may be even less aware of incentive issues than most parts of
    > the company.

    This organizational pathology sounds like something entirely separate from
    human self-deception, operating through different mechanisms to produce
    results with surface similarity. If I were analyzing the two systems I
    would be very careful to do so separately, to avoid fearsome confusion and

    Eliezer S. Yudkowsky                
    Research Fellow, Singularity Institute for Artificial Intelligence

    This archive was generated by hypermail 2.1.5 : Sat Jul 12 2003 - 05:12:50 MDT