From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Feb 12 2003 - 00:10:41 MST
Wei Dai wrote:
> On Tue, Feb 11, 2003 at 11:13:32PM -0500, Eliezer S. Yudkowsky wrote:
>
>>If he chooses to apply Bayes's rule, then his expected *global* utility is
>>p(1)*u(.99m rewarded) + p(1)*u(.01m punished), while his expected *local*
>>utility is p(.99)*u(1 rewarded) + p(.01)*u(1 punished).
>
>
> I assume an altruist-Platonist is supposed to try to maximize his global
> utility. What role is the local utility supposed to play? BTW, the only
> difference between your "global" utility and my utility is that I
> subtracted out the parts of the multiverse that can't be causally affected
> by the decision maker, so they're really equivalent. But for the argument
> below I'll use your definition.
>
>
>>If he sees X=0
>>locally it doesn't change his estimate of the global truth that .99m
>>observers see the true value of X and .01m observers see a false value of
>>X. Roughly speaking, if a Bayesian altruist-Platonist sees X=0, his
>>expected global utility is:
>>
>>p(.99)*u(.99m observers see 0 => .99m observers choose 0 => .99m observers
>>are rewarded && .01m observers see 1 => .01m observers choose 1 => .01m
>>observers are punished)
>>+
>>p(.01)*u(.99m observers see 1 => .99m observers choose 1 => .99m observers
>>are rewarded && .01m observers see 0 => .01m observers choose 0 => .01m
>>observers are punished)
>
> Remember in my last post I wrote:
>
> Note that if he did apply Bayes's rule, then his expected utility would
> instead become .99*U(.99m people rewarded) + .01*U(.01m people punished)
> which would weight the reward too heavily. It doesn't matter in this case
> but would matter in other situations.
No, Wei Dai, this does *not* happen. I see the mistake you're making but
I honestly don't know why you're making it, which makes it hard to
correct. If you see 1, it affects your Bayesian probability that the real
digit is 1 or 0; it does not affect your estimate of the global outcome of
people following a Bayesian strategy. You seem to be assuming that
Bayesian reasoners think that all observers see the same digit... or
something. Actually, I don't know what you're assuming. I can't make the
math come out your way, period.
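If it helps, here's a quick Python sketch of the distinction I'm drawing.
This is my own illustration, not anything from the earlier posts, and it
assumes a 50/50 prior over the digit, which is my reading of the setup:

# Local vs. global: seeing 0 updates *your* probability of the real digit,
# but the .99m/.01m split of observers is a fact about the world either way.
p_prior = 0.5         # assumed 50/50 prior that the real digit is 0
p_see_true = 0.99     # fraction of observers shown the true digit

# Your local Bayesian update on seeing 0:
p_real_is_0 = (p_see_true * p_prior) / (
    p_see_true * p_prior + (1 - p_see_true) * (1 - p_prior))
print(p_real_is_0)    # ~0.99

# The global split of observers, whichever digit is actually real:
n_see_true = 990_000   # .99m observers see the true value
n_see_false = 10_000   # .01m observers see a false value
# Seeing 0 yourself changes p_real_is_0, not these two numbers.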
> You're basically making exactly this mistake,
Please actually read my numbers.
> and I'll describe one of the
> "other situations" where it's easy to see that it is a mistake. Suppose
> in the thought experiment, the reward for guessing 1 if X=1 is 1000 times
> as high as the reward for guessing 0 if X=0 (and the punishments stay the
> same). Wouldn't you agree that in this case, you should guess 1 even if
> the printout says 0?
Yes.
> The way you compute global utility, however, your
> utility is still higher if you guess 0 when you see 0.
No.
> Here're the calculations. Let R stand for "observers rewarded", E stand
> for "observers rewarded extra", i.e. rewarded 1000 times more, and P for
> "observers punished". And let u(E) = 1000*u(R) = -1000*u(P) = 1000.
> According to your utility function, the expected utilities of choosing 0
> or 1 are:
>
> EU(choose 0)
> .99*u(.99m R & .01m P) + .01*u(.99m E & .01m P)
> =
> .99*.98 + .01*(990-.01)
> =
> 10.8701
Um... I just don't see where you get these numbers. Are you talking about
the expected total utility of choosing 0 every time, or the expected
utility of choosing 0 given that you saw 0?
Given that you see 0, the expected utility of all reasoners choosing 0 is:
.99*u(1.0m R) + .01*u(1.0m P)
=
.99 - .01
=
.98
Given that you see 0, the expected utility of all reasoners choosing 1 is:
.99*u(1.0m P) + .01*u(1.0m E)
=
-.99 + 10
=
9.01
Since 9.01 > .98, choosing 1 has the higher expected global utility even
given that you see 0, which is exactly the answer I gave above.
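If it's easier to check the arithmetic in code, here's a small Python
sketch of those two numbers. The variable names are mine; the payoffs are
the ones you stipulated, u(R) = 1, u(P) = -1, u(E) = 1000 per million
observers:

# Expected global utility given that you see 0, for the two strategies.
u_R, u_P, u_E = 1.0, -1.0, 1000.0   # reward, punishment, extra reward
p = 0.99                            # P(the digit you saw is the real one)

# Everyone who sees 0 chooses 0: with p = .99 all 1.0m are rewarded,
# with p = .01 all 1.0m are punished.
eu_choose_0 = p * (1.0 * u_R) + (1 - p) * (1.0 * u_P)   # .99 - .01 = .98

# Everyone who sees 0 chooses 1: with p = .99 all 1.0m are punished,
# with p = .01 all 1.0m get the extra reward.
eu_choose_1 = p * (1.0 * u_P) + (1 - p) * (1.0 * u_E)   # -.99 + 10 = 9.01

print(eu_choose_0, eu_choose_1)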
> If you do the same computations without applying Bayes's rule first,
> you'll see that EU(choose 1) > EU(choose 0), but I won't go through the
> details. What's going on here is that by applying Bayes's rule, you're
> discounting the extra reward twice, once through the probability, and
> another time through the measure of observers, so the extra reward is
> discounted by a factor of 10000 instead of 100. That's why I choose to
> make the extra reward worth 1000 times the normal reward.
You know, there are days when you want to just give up and say: "Don't
tell ME how Bayes' Theorem works, foolish mortal." Anyway, for some
strange reason you appear to be applying Bayesian adjustments to the measure
of observers *and* to those observers' subjective probabilities.
That's where the double discount is coming from in your strange
calculations. You're applying numbers to the measure of observers that
just don't go there.
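For concreteness, here's a deliberately schematic Python snippet, my own
and not anything from either of our posts, showing the effective weight on
the extra reward when the .01 Bayes factor is applied once (to the
probability of which digit is real) versus twice (to the observer measure
as well):

u_E = 1000.0   # extra reward per million observers, as you stipulated

# Applied once: the E payoff covers the full 1.0m reasoners in that branch.
print(0.01 * 1.0 * u_E)    # the "+10" term in my numbers above

# Applied twice: an extra .99/.01 adjustment inside the utility term
# shrinks the E term by another factor of 100, i.e. 10000 instead of 100.
print(0.01 * 0.01 * u_E)   # ~0.1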
--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence