Re: Parallel Universes

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Feb 12 2003 - 08:59:49 MST

    Wei Dai wrote:
    > On Wed, Feb 12, 2003 at 02:10:41AM -0500, Eliezer S. Yudkowsky wrote:
    >
    >> You know, there are days when you want to just give up and say:
    >> "Don't tell ME how Bayes' Theorem works, foolish mortal." Anyway,
    >> for some strange reason you appear to be applying Bayes adjustments
    >> to the measure of observers and *and* to those observers' subjective
    >> probabilities. That's where the double discount is coming from in
    >> your strange calculations. You're applying numbers to the measure of
    >> observers that just don't go there.
    >
    > I was trying to point out your double discounting by doing the same
    > thing where it's more obviously wrong. Look at what you wrote earlier
    > again:
    >
    >> p(.99)*u(.99m observers see 0 => .99m observers choose 0 => .99m
    >> observers are rewarded && .01m observers see 1 => .01m observers
    >> choose 1 => .01m observers are punished) + p(.01)*u(.99m observers
    >> see 1 => .99m observers choose 1 => .99m observers are rewarded &&
    >> .01m observers see 0 => .01m observers choose 0 => .01m observers are
    >> punished)
    >
    > Now compare this to what I wrote:
    >
    >>> EU(choose 0)
    >>> = .99*u(.99m R & .01m P) + .01*u(.99m E & .01m P)
    >>> = .99*.98 + .01*(990-.01)
    >>> = 10.8701
    >>>
    >>
    >> Um... I just don't see where you get these numbers.
    >
    > I got them by simple substitution from your own formula. Other than
    > substituting some words with symbols, I just replaced ".99m observers
    > are rewarded" inside .01*u(...) with ".99m E".

    Ah. Okay, now I see where you got that from and *now* I understand where
    your doubling is coming from.
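    (As a quick check, the quoted 10.8701 is just the arithmetic of that
    line. A minimal Python sketch, taking the .98 and 990-.01 branch
    utilities and the .99/.01 weights as given from the quoted formula,
    read here as posterior probabilities of X=0 and X=1 after seeing 0:)

    # Check of the quoted expected-utility arithmetic for EU(choose 0).
    # The branch utilities are taken as given from the earlier exchange;
    # only the weighted sum is verified here.
    p_x0, p_x1 = 0.99, 0.01           # weights on X=0 and X=1 after seeing 0
    u_x0 = 0.98                       # u(.99m R & .01m P), as quoted
    u_x1 = 990 - 0.01                 # u(.99m E & .01m P), as quoted
    print(p_x0 * u_x0 + p_x1 * u_x1)  # ~10.8701, matching the quoted figure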

    Before you see anything, the expected global utility is maximized by
    following the rule "always choose 1". After you see 0, your local utility
    is maximized by choosing 1, in accordance with your *previous* expected
    global utilities for what you would do *after* seeing evidence. However,
    on the next round, if you see 0 *again*, it will then make more sense to
    choose 0. So if you see 0, you wish you could tell all the observers
    that you saw 0, in which case it would make more sense for everyone else
    to follow the policy "choose whatever number you see". But you can't
    tell them this. Your expected local utility after seeing evidence should
    always be consistent with your expected global utility before seeing
    evidence. After you see evidence, your expected global utility for
    policies will change, reflecting the new strategy that you would feel is
    best for the second round.
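    To make this concrete, here is a small Python sketch. The parameters
    are reconstructed from the figures quoted above rather than taken from
    the original problem statement, so treat them as assumptions: 1m
    observers, 99% of whom see the true digit X; a 50/50 prior on X; +1 per
    observer for correctly choosing 0; +1000 per observer for correctly
    choosing 1; and -1 per observer for a wrong choice. Under those
    assumptions, "always choose 1" wins the pre-evidence comparison of
    global policies, while the post-evidence comparison shifts in favor of
    "choose what you see":

    # A sketch under reconstructed parameters, not the original problem
    # statement; the payoffs below are inferred from the figures quoted in
    # this thread and may not match the scenario exactly.
    ACC = 0.99                # fraction of observers who see the true digit X
    PRIOR_X1 = 0.5            # assumed prior probability that X = 1
    R0, R1, PUNISH = 1.0, 1000.0, -1.0   # per-observer payoff for a correct 0,
                                         # a correct 1, and any wrong choice

    def global_u(policy, x):
        """Total utility (per million observers) when the true digit is x
        and every observer follows policy (a map from seen digit to choice)."""
        total = 0.0
        for seen, frac in ((x, ACC), (1 - x, 1 - ACC)):
            if policy(seen) == x:
                total += frac * (R1 if x == 1 else R0)
            else:
                total += frac * PUNISH
        return total

    policies = {"always choose 1": lambda seen: 1,
                "choose what you see": lambda seen: seen}

    # Expected global utility before seeing any evidence:
    for name, pol in policies.items():
        eu = PRIOR_X1 * global_u(pol, 1) + (1 - PRIOR_X1) * global_u(pol, 0)
        print("pre-evidence: ", name, round(eu, 4))   # "always choose 1" wins

    # Posterior probability of X = 0 for an observer who has just seen 0:
    p_x0 = ACC * (1 - PRIOR_X1) / (ACC * (1 - PRIOR_X1) + (1 - ACC) * PRIOR_X1)
    for name, pol in policies.items():
        eu = p_x0 * global_u(pol, 0) + (1 - p_x0) * global_u(pol, 1)
        print("post-evidence:", name, round(eu, 4))   # "choose what you see" wins

    The post-evidence figure for "choose what you see" comes out to the
    10.8701 quoted above; the shift in ranking is the wishing described in
    this paragraph, while the policy everyone actually follows is still
    fixed by the pre-evidence comparison.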

    I apologize for not making this clearer in my original reply. The
    original statement I was replying to was:

    [Wei Dai:]
    > Note that if he did apply Bayes's rule, then his expected utility would
    > instead become .99*U(.99m people rewarded) + .01*U(.01m people
    > punished) which would weight the reward too heavily. It doesn't matter
    > in this case but would matter in other situations.

    [Eliezer's response:]
    > Roughly speaking, if a Bayesian altruist-Platonist sees X=0, his
    > expected global utility is:
    >
    > p(.99)*u(.99m observers see 0 => .99m observers choose 0 => .99m
    > observers are rewarded && .01m observers see 1 => .01m observers choose
    > 1 => .01m observers are punished)

    This used "expected global utility" to describe what the observer thinks
    has *already* happened. That is a confusing and inconsistent sense of
    "expected global utility", which I adopted without thinking, and I
    apologize for it. I should have called it an "outcome estimation" or
    something like that.

    There is nothing inconsistent about having a global policy such that:

    1) The expected global utility of all observers, before the evidence, is
    maximized by the policy "always choose 1".
    2) The expected local utility of a Bayesian observer, after the evidence,
    is maximized by "always choose 1".
    3) If a Bayesian observer sees 0, his outcome estimation for all
    observers having followed the rule "always choose 1" will be less than his
    "wistful hindsight outcome estimation" for the rule "choose what you see".

    > BTW, I wish you would answer my question about global versus local
    > utility. Which one is the altruist-Platonist supposed to maximize, and
    > what's the purpose of the other one? If there is any misunderstanding
    > on my part about what you're saying, getting this question answered
    > should help correct that.

    You're supposed to maximize global utility.

    -- 
    Eliezer S. Yudkowsky                          http://singinst.org/
    Research Fellow, Singularity Institute for Artificial Intelligence
    

