Re: Parallel Universes

From: Wei Dai (weidai@weidai.com)
Date: Wed Feb 12 2003 - 19:15:00 MST


    On Wed, Feb 12, 2003 at 10:59:49AM -0500, Eliezer S. Yudkowsky wrote:
    > Before you see anything, the expected global utility is maximized by
    > following the rule "always choose 1". After you see 0, your local utility
    > is maximized by choosing 1, according with your *previous* expected
    > global utilities for what you would do *after* seeing evidence. However,
    > on the next round, if you see 0 *again*, this time it will make more sense
    > to choose 0. So if you see 0, you wish that you could tell all the
    > observers that you saw 0, in which case it would make more sense for
    > everyone else to follow the policy "choose whatever number you see".
    > However, you can't tell them this. Your expected local utility after
    > seeing evidence should always be consistent with your expected global
    > utility before seeing evidence.

    I don't understand why you're bringing a second round into this. There is
    no second round in either of the thought experiments.

    > After you see evidence your expected
    > global utility for policies will change, reflecting the new strategy that
    > you would feel best for the second round.

    Ok, but why do you bring it up since there is no second round? Does
    imagining a second round help you make a decision in the first and only
    round?

    > This was a description of "expected global utility" in the sense of
    > describing what the observer thinks has *already* happened. This is
    > a confusing and inconsistent sense of "expected global utility", which I
    > used without thinking, and I apologize for it. I should have called it an
    > "outcome estimation" or something like that.

    Outcome estimation doesn't make sense either. See last paragraph below.

    > There is nothing inconsistent about having a global policy such that:
    >
    > 1) The expected global utility of all observers, before the evidence, is
    > maximized by the policy "always choose 1".
    > 2) The expected local utility of a Bayesian observer, after the evidence,
    > is maximized by "always choose 1".

    I'm still confused about what you mean by "global" versus "local". Do you
    mean that for any given observer-moment and possible course of action,
    there are two different expected utilities, one global and one local? Or
    are you using them to denote the expected utilities of two different
    observer-moments? At first I thought it was the former because you gave
    different global and local utilities to the same observer-moment, but then
    you said that what you gave for "global utility" was actually "outcome
    estimation" so now I don't know.

    > 3) If a Bayesian observer sees 0, his outcome estimation for all
    > observers having followed the rule "always choose 1" will be less than vis
    > "wistful hindsight outcome estimation" for the rule "choose what you see".

    Suppose you see 0. At this point it's not too late to change your mind and
    guess 0. If you did, then that would be equivalent to having the rule
    "choose what you see" to begin with. So if you're feeling regret about
    "always choose 1", why not change your mind? Obviously you should not be
    feeling regret since you haven't done anything yet, therefore the "outcome
    estimation" formula you gave must be wrong.

    Eliezer, how about we switch roles for a bit. Let me tell you the decision
    making process I would use in more detail, and you can tell me where you
    disagree.

    First, for each observer-moment and course of action (i.e., policy or
    rule), there is one expected utility. Each observer-moment should choose
    the course of action with the greatest expected utility. Expected utility
    of a course of action is defined as the average of the utility function
    evaluated on each possible state of the multiverse, weighted by the
    probability of that state being the actual state if the course was chosen.

    (Actually this is a drastic simplification of what I'm actually proposing,
    but it's sufficient to deal with this thought experiment. If you see any
    problems with this definition, please read
    http://www.mail-archive.com/everything-list@eskimo.com/msg03812.html where
    you'll get the full proposed definition of expected utility in the
    multiverse context.)
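
    In code form, the simplified definition is just a weighted average. A
    minimal Python sketch (the names states, prob, and utility are my own
    placeholders, not anything from the thought experiment):

        def expected_utility(action, states, prob, utility):
            # Average the utility function over each possible state of the
            # multiverse, weighted by the probability of that state being
            # the actual state if 'action' is chosen.
            return sum(prob(s, action) * utility(s) for s in states)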

    There are three observer-moments in the thought experiment: (A) you before
    seeing any printout, (B) you that sees 0, and (C) you that sees 1. They
    each make their own decisions but all of the decisions will be consistent
    with each other (since they are after all the same person). So A has three
    possible courses of action: (A0) always choose 0, (A1) always choose 1,
    and (A2) choose what you see. B has two possible courses of action: (B0)
    choose 0, (B1) choose 1. C has (C0) choose 0, (C1) choose 1.

    Discard A0 and C0 first since they're obviously wrong. Then for A:

    EU(A1) = P_A(X=0)*U(1m P) + P_A(X=1)*U(1m E)
    = .5*(-1) + .5*1000 = 499.5

    EU(A2) = P_A(X=0)*U(.99m R & .01m P) + P_A(X=1)*U(.99m E & .01m P)
    = .5*(.99-.01) + .5*(990-.01) = 495.485
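
    Here is the same arithmetic as a rough Python sketch, so you can check
    it. The utility encoding is my own assumption: utilities are per million
    observers, with P (punished) = -1, R (rewarded) = +1, and E (extra
    reward) = +1000.

        def U(R=0.0, E=0.0, P=0.0):
            # Assumed per-million utility units: R = +1, E = +1000, P = -1.
            return R * 1 + E * 1000 - P

        EU_A1 = 0.5 * U(P=1.0) + 0.5 * U(E=1.0)
        EU_A2 = 0.5 * U(R=0.99, P=0.01) + 0.5 * U(E=0.99, P=0.01)
        print(EU_A1, EU_A2)   # 499.5 495.485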

    So A should choose A1. This part I think we both agree on. I'll go into
    more detail for B:

    EU(B1) = P_B(X=0)*U(state of the multiverse if B chooses 1 and X=0) +
    P_B(X=1)*U(state of the multiverse if B chooses 1 and X=1)

    EU(B0) = P_B(X=0)*U(state of the multiverse if B chooses 0 and X=0) +
    P_B(X=1)*U(state of the multiverse if B chooses 0 and X=1)

    What's the state of the multiverse if B chooses 1 and X=0? C is certain to
    choose 1. So if B chooses 1 and X=0 then 1m observers get punished.
    And if B chooses 1 and X=1? Since C also chooses 1, all 1m observers get
    rewarded extra. So:

    EU(B1) = P_B(X=0)*U(1m P) + P_B(X=1)*U(1m E)

    If B chooses 0 and X=0, B gets rewarded and C gets punished, and B has
    measure .99m and C has measure .01m, so it means .99m R & .01m P. If B
    chooses 0 and X=1, B gets punished and C gets rewarded extra, and B has
    measure .01m and C has measure .99m, so it means .01m P & .99m E. So:

    EU(B0) = P_B(X=0)*U(.99m R & .01m P) + P_B(X=1)*U(.99m E & .01m P)

    Do you agree so far?
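
    The same two formulas in sketch form, with B's probability for X=0 left
    as a parameter p (same assumed per-million utility units as in the
    sketch above):

        def U(R=0.0, E=0.0, P=0.0):
            # Assumed per-million utility units: R = +1, E = +1000, P = -1.
            return R * 1 + E * 1000 - P

        def EU_B1(p):   # p = P_B(X=0)
            return p * U(P=1.0) + (1 - p) * U(E=1.0)

        def EU_B0(p):
            return p * U(R=0.99, P=0.01) + (1 - p) * U(E=0.99, P=0.01)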

    Now what should P_B(X=0) and P_B(X=1) be? If B applies Bayes's rule and
    makes them .99 and .01, then he's going to choose B0, which we've agreed is
    wrong and also inconsistent with A's decision. So they must remain .5 and
    .5.
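
    A standalone numeric check makes the inconsistency explicit (same
    assumed utility units; nothing here beyond the numbers already given
    above):

        for p in (0.5, 0.99):   # candidate values for P_B(X=0)
            eu_b1 = p * (-1) + (1 - p) * 1000
            eu_b0 = p * (0.99 - 0.01) + (1 - p) * (0.99 * 1000 - 0.01)
            print(p, eu_b1, eu_b0)
        # 0.5  -> EU(B1) = 499.5, EU(B0) = 495.485  (B1 wins, consistent with A)
        # 0.99 -> EU(B1) = 9.01,  EU(B0) = 10.8701  (B0 wins, inconsistent with A)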


