Expected Value and Variance: Formulas and Derivations.
March 14, 2011
I'm writing this post in anticipation of the question, "Yeah, but why are these formulas the same?" from some of my statistics students. Everything here is for discrete random variables.
For some probability distribution where the probability of \(x_i\) in \(X\) happening is given by \(p(x_i)\), the expected value of \(X\) is given by the following formula:
\[ E(X) = \sum_i x_i \, p(x_i) \]
and what this formula means is, "multiply each value \(x_i\) by the probability of \(x_i\), and then add all of these up." The idea behind this is the following: if you have a 20% chance of winning $1000, then what are your "expected winnings"? Well, since you have a 0.2 chance of getting $1000 (and a 0.8 chance of getting nothing), your expected winnings should be $1000(0.2) = $200. This means that if you played this game over and over and over again forever, the average amount of money that you’d get per play would be $200. This has a cool corollary, which is up next.
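Before moving on, here is the "multiply and add" recipe as a quick Python sketch, using the 20%-chance-of-$1000 game. The function names `expected_value` and `average_winnings`, the dictionary layout, and the seed are my own choices for illustration. The simulation is only there to show the "play it over and over" idea, with a finite number of plays standing in for "forever".

```python
import random


def expected_value(dist):
    """Multiply each value by its probability and add everything up.

    `dist` maps each possible value to its probability.
    """
    return sum(value * prob for value, prob in dist.items())


def average_winnings(dist, plays, seed=0):
    """Play the game `plays` times and return the average amount won."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    values = list(dist.keys())
    weights = list(dist.values())
    draws = rng.choices(values, weights=weights, k=plays)
    return sum(draws) / plays


# The game from the text: 20% chance of $1000, otherwise nothing.
game = {1000: 0.2, 0: 0.8}

exact = expected_value(game)
simulated = average_winnings(game, plays=100_000)

print("expected winnings:", exact)
print("average over 100,000 plays:", simulated)
```

The exact value should come out to 200, and the simulated average should land close to it. How close depends on the number of plays and on the seed.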
Kind of Real Life Application!: Suppose there is a lottery where \(n\) tickets are bought. Let’s suppose (somewhat unrealistically) that each of these tickets is distinct. Let’s suppose that if you win, you win $\(x\), where \(x\) stands for some amount of money. The question: how much should you spend on each ticket?
This comes up in real life (or, a nice approximation of this does) when we have something like the New York lottery paying out something like $6 million and millions upon millions of tickets being bought for a dollar each. Let’s think about this. What are the odds you will win? Well, if there are \(n\) tickets bought and each is distinct, then you have a \(\frac{1}{n}\) chance of winning. This is usually pretty low. What is the expected pay-off then?
\[ E = x \cdot \frac{1}{n} + 0 \cdot \frac{n-1}{n} \]
where, recall, \(x\) is the amount you’ll win if you win, and the fraction on the far right, \(\frac{n-1}{n}\), is the probability you don’t win. You get $0 if you don’t win. Sad. Either way,
\[ E = \frac{x}{n} \]
which is a nice way to find the expected value. So, let’s put some real numbers in. For the New York Lotto, there are 59 picks for each of the 6 lotto numbers. This means that there are \(59 \cdot 58 \cdot 57 \cdot 56 \cdot 55 \cdot 54\) possibilities (counting the order in which the numbers are picked), which is 32,441,381,280. According to the US Census Bureau, the population of NY in July of 2009 was 19,541,452. Let’s assume that each of these people (not all of whom are of gambling age!) bought 5 tickets. This brings the total number of tickets to 97,707,260. These will probably not be distinct, but it probably will not make a huge difference to assume they are. So what are the expected winnings if we only count the winnings of those who win the *big prize*? As of right now, that jackpot is $12 million, so by our calculations, your expected winnings are
\[ E = \frac{12{,}000{,}000}{97{,}707{,}260} \approx 0.1228 \]
which implies the expected value that you’ll get is about $0.12. Thus, you should not spend more than about 12 cents for a lotto ticket. Hm. You’re getting ripped off.
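If you want to redo the lottery arithmetic yourself, here is a small Python sketch. It uses the numbers quoted above (population, 5 tickets per person, a $12 million jackpot) and the same simplifying assumption that exactly one of the distinct tickets wins. The function name `expected_jackpot_winnings` is just something I made up for this example, and `math.perm` needs Python 3.8 or later.

```python
import math


def expected_jackpot_winnings(jackpot, tickets_sold):
    """Expected pay-off of one ticket, assuming exactly one of
    `tickets_sold` distinct tickets wins `jackpot` and the rest win $0."""
    p_win = 1 / tickets_sold
    p_lose = 1 - p_win
    return jackpot * p_win + 0 * p_lose


population = 19_541_452    # NY population, July 2009, as quoted above
tickets = 5 * population   # assumption from the post: 5 tickets per person
jackpot = 12_000_000

# Ordered ways to pick 6 different numbers out of 59: 59*58*57*56*55*54.
ordered_picks = math.perm(59, 6)

fair_price = expected_jackpot_winnings(jackpot, tickets)

print(tickets)               # 97707260
print(ordered_picks)         # 32441381280
print(round(fair_price, 2))  # 0.12
```

Changing `jackpot` or the tickets-per-person guess is an easy way to see how big the prize would have to get before a one dollar ticket stops being a rip-off under these assumptions.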
I won’t spend too much time explaining variance, but there are two different formulas that we use to compute variance, and a few students might be confused as to why these formulas are actually the same. So, let’s show both and then derive one from the other. Recall that \(\mu\) is the expected value of the random variable \(X\).
\[ \sigma^2 = \sum_i (x_i - \mu)^2 \, p(x_i) \]
\[ \sigma^2 = \sum_i x_i^2 \, p(x_i) - \mu^2 \]
These are equivalent, but it takes a bit of manipulating to see that. Most of the time, I prefer the second equation (because it’s somewhat annoying to calculate the \((x_i - \mu)^2\) term), though I find the first one more intuitive and so that’s the one I usually wind up remembering. Let’s derive the second equation starting with the first equation. First, expand out the binomial:
\[ \sigma^2 = \sum_i \left( x_i^2 - 2\mu x_i + \mu^2 \right) p(x_i) = \sum_i x_i^2 \, p(x_i) - \sum_i 2\mu x_i \, p(x_i) + \sum_i \mu^2 \, p(x_i) \]
Now, let’s look and see what we have here. The first sum is going to stay where it is, but what about the second sum? Note that \(2\mu\) is just a fixed number here, so we may factor it out of the sum. But then what is the sum of \(x_i \, p(x_i)\)? Yes, that’s right, it’s the expected value! In this case, we call that \(\mu\), and so our second sum here is equal to
\[ \sum_i 2\mu x_i \, p(x_i) = 2\mu \sum_i x_i \, p(x_i) = 2\mu \cdot \mu = 2\mu^2 \]
Now what about the last sum? Well, \(\mu^2\) is also a fixed number, so it factors out too, and all we need to think about is: what do we get if we add up all of the probabilities of every possible situation? We get exactly 1! If these are the only possibilities, then exactly one of them must happen; thus the sum of the probabilities is equal to 1. This means
\[ \sum_i \mu^2 \, p(x_i) = \mu^2 \sum_i p(x_i) = \mu^2 \]
which allows us to simplify our equation above. Recopying a bit, we get
\[ \sigma^2 = \sum_i x_i^2 \, p(x_i) - 2\mu^2 + \mu^2 = \sum_i x_i^2 \, p(x_i) - \mu^2 \]
which is exactly the second formula. Neat.
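If the algebra still feels slippery, it can help to check the two variance formulas on actual numbers. Here is a short Python sketch that computes the variance both ways: once as the probability-weighted sum of squared distances from the mean, and once as the probability-weighted sum of squares minus the mean squared. The function names and the two example distributions (a fair die, and the 20%-chance-of-$1000 game from earlier) are my own picks for illustration. I use `Fraction` so the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def mean(dist):
    """Expected value: sum of value * probability."""
    return sum(x * p for x, p in dist.items())


def variance_definition(dist):
    """First formula: sum of (x - mu)^2 * p(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def variance_shortcut(dist):
    """Second formula: sum of x^2 * p(x), minus mu^2."""
    mu = mean(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The game from earlier: 20% chance of $1000, otherwise $0.
game = {1000: Fraction(1, 5), 0: Fraction(4, 5)}

print(variance_definition(die), variance_shortcut(die))    # 35/12 35/12
print(variance_definition(game), variance_shortcut(game))  # 160000 160000
```

Of course, two examples agreeing is not a proof; the derivation above is what shows the formulas match for every discrete distribution.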