From: Wei Dai (weidai@weidai.com)
Date: Wed Feb 12 2003 - 19:15:00 MST
On Wed, Feb 12, 2003 at 10:59:49AM -0500, Eliezer S. Yudkowsky wrote:
> Before you see anything, the expected global utility is maximized by
> following the rule "always choose 1". After you see 0, your local utility
> is maximized by choosing 1, according with your *previous* expected
> global utilities for what you would do *after* seeing evidence. However,
> on the next round, if you see 0 *again*, this time it will make more sense
> to choose 0. So if you see 0, you wish that you could tell all the
> observers that you saw 0, in which case it would make more sense for
> everyone else to follow the policy "choose whatever number you see".
> However, you can't tell them this. Your expected local utility after
> seeing evidence should always be consistent with your expected global
> utility before seeing evidence.
I don't understand why you're bringing a second round into this. There is
no second round in either of the thought experiments.
> After you see evidence your expected
> global utility for policies will change, reflecting the new strategy that
> you would feel best for the second round.
Ok, but why do you bring it up since there is no second round? Does
imagining a second round help you make a decision in the first and only
round?
> This was a description of "expected global utility" in the sense of
> describing what the observer thinks has *already* happened. This is
> a confusing and inconsistent sense of "expected global utility", which I
> used without thinking, and I apologize for it. I should have called it an
> "outcome estimation" or something like that.
Outcome estimation doesn't make sense either. See last paragraph below.
> There is nothing inconsistent about having a global policy such that:
>
> 1) The expected global utility of all observers, before the evidence, is
> maximized by the policy "always choose 1".
> 2) The expected local utility of a Bayesian observer, after the evidence,
> is maximized by "always choose 1".
I'm still confused about what you mean by "global" versus "local". Do you
mean that for any given observer-moment and possible course of action,
there are two different expected utilities, one global and one local? Or
are you using them to denote the expected utilities of two different
observer-moments? At first I thought it was the former, because you gave
different global and local utilities to the same observer-moment, but then
you said that what you gave for "global utility" was actually "outcome
estimation", so now I don't know.
> 3) If a Bayesian observer sees 0, his outcome estimation for all
> observers having followed the rule "always choose 1" will be less than vis
> "wistful hindsight outcome estimation" for the rule "choose what you see".
Suppose you see 0. At this point it's not too late to change your mind and
guess 0. If you did, then that would be equivalent to having the rule
"choose what you see" to begin with. So if you're feeling regret about
"always choose 1", why not change your mind? Obviously you should not be
feeling regret since you haven't done anything yet, therefore the "outcome
estimation" formula you gave must be wrong.
Eliezer, how about we switch roles for a bit. Let me tell you the decision
making process I would use in more detail, and you can tell me where you
disagree.
First, for each observer-moment and course of action (i.e., policy or
rule), there is one expected utility. Each observer-moment should choose
the course of action with the greatest expected utility. Expected utility
of a course of action is defined as the average of the utility function
evaluated on each possible state of the multiverse, weighted by the
probability of that state being the actual state if the course was chosen.
(Actually this is a drastic simplification of what I'm actually proposing,
but it's sufficient to deal with this thought experiment. If you see any
problems with this definition, please read
http://www.mail-archive.com/everything-list@eskimo.com/msg03812.html where
you'll get the full proposed definition of expected utility in the
multiverse context.)
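Just to be concrete, here is the simplified definition as I'd code it up (a sketch only; the argument names are mine, not notation from the linked post):

```python
# Sketch of the simplified expected-utility definition above:
# EU(course) = sum over possible multiverse states of
#   P(state is actual | course is chosen) * U(state).

def expected_utility(course, states, prob, utility):
    """prob(state, course): probability that `state` is the actual state
    of the multiverse if `course` were chosen; utility(state): the
    utility function evaluated on that state."""
    return sum(prob(s, course) * utility(s) for s in states)
```

The decision rule is then just: each observer-moment picks the course of action maximizing this quantity.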
There are three observer-moments in the thought experiment: (A) you before
seeing any printout, (B) you that sees 0, and (C) you that sees 1. They
each make their own decisions but all of the decisions will be consistent
with each other (since they are after all the same person). So A has three
possible courses of action: (A0) always choose 0, (A1) always choose 1,
and (A2) choose what you see. B has two possible courses of action: (B0)
choose 0, (B1) choose 1. C has (C0) choose 0, (C1) choose 1.
Discard A0 and C0 first since they're obviously wrong. Then for A:
EU(A1) = P_A(X=0)*U(1m P) + P_A(X=1)*U(1m E)
= .5*(-1) + .5*1000 = 499.5
EU(A2) = P_A(X=0)*U(.99m R & .01m P) + P_A(X=1)*U(.99m E & .01m P)
= .5*(.99-.01) + .5*(990-.01) = 495.485
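Spelled out as a quick sketch (my own labels; the per-million-observer utilities P = -1, R = +1, E = +1000 are just read off the arithmetic above, not given anywhere in the experiment itself):

```python
# Hypothetical per-million-observer utilities implied by the numbers above:
# P (punished) = -1, R (rewarded) = +1, E (rewarded extra) = +1000.
U_P, U_R, U_E = -1.0, 1.0, 1000.0

def U(*parts):
    # parts: (measure in millions, per-observer utility) pairs describing
    # one possible state of the multiverse
    return sum(m * u for m, u in parts)

p0 = p1 = 0.5  # A's probabilities for X=0 and X=1

eu_A1 = p0 * U((1.0, U_P)) + p1 * U((1.0, U_E))      # "always choose 1"
eu_A2 = (p0 * U((0.99, U_R), (0.01, U_P))
         + p1 * U((0.99, U_E), (0.01, U_P)))         # "choose what you see"

print(eu_A1, eu_A2)  # approximately 499.5 and 495.485
```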
So A should choose A1. This part I think we both agree on. I'll go into
more detail for B:
EU(B1) = P_B(X=0)*U(state of the multiverse if B chooses 1 and X=0) +
P_B(X=1)*U(state of the multiverse if B chooses 1 and X=1)
EU(B0) = P_B(X=0)*U(state of the multiverse if B chooses 0 and X=0) +
P_B(X=1)*U(state of the multiverse if B chooses 0 and X=1)
What's the state of the multiverse if B chooses 1 and X=0? C is certain to
choose 1. So if B chooses 1 and X=0 then 1m observers get punished.
And if B chooses 1 and X=1? Since C is also certain to choose 1, 1m
observers get rewarded extra. So:
EU(B1) = P_B(X=0)*U(1m P) + P_B(X=1)*U(1m E)
If B chooses 0 and X=0, B gets rewarded and C gets punished, and B has
measure .99m and C has measure .01m, so it means .99m R & .01m P. If B
chooses 0 and X=1, B gets punished and C gets rewarded extra, and B has
measure .01m and C has measure .99m, so it means .01m P & .99m E. So:
EU(B0) = P_B(X=0)*U(.99m R & .01m P) + P_B(X=1)*U(.99m E & .01m P)
Do you agree so far?
Now what should P_B(X=0) and P_B(X=1) be? If B applies Bayes's rule and
makes them .99 and .01, then he's going to choose B0, which we've agreed is
wrong and also inconsistent with A's decision. So they must remain .5 and
.5.
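The point is easy to check numerically (same sketch and same hypothetical per-million-observer utilities P = -1, R = +1, E = +1000 as I read off the arithmetic for A):

```python
# Hypothetical per-million-observer utilities, as before:
# P (punished) = -1, R (rewarded) = +1, E (rewarded extra) = +1000.
U_P, U_R, U_E = -1.0, 1.0, 1000.0

def U(*parts):
    # parts: (measure in millions, per-observer utility) pairs for one
    # possible state of the multiverse
    return sum(m * u for m, u in parts)

def EU_B(p0):
    # p0 = P_B(X=0); returns (EU(B0), EU(B1))
    p1 = 1.0 - p0
    eu_b1 = p0 * U((1.0, U_P)) + p1 * U((1.0, U_E))
    eu_b0 = (p0 * U((0.99, U_R), (0.01, U_P))
             + p1 * U((0.99, U_E), (0.01, U_P)))
    return eu_b0, eu_b1

# Bayesian update, P_B(X=0) = .99: B0 beats B1, inconsistent with A.
print(EU_B(0.99))  # approximately (10.87, 9.01)
# Probabilities kept at .5: B1 beats B0, consistent with A choosing A1.
print(EU_B(0.5))   # approximately (495.485, 499.5)
```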
This archive was generated by hypermail 2.1.5 : Wed Feb 12 2003 - 21:16:21 MST