23 Jul 2015

Note on Think Stats: Given the PMF of a random variable X, compute the variance of X

Tags	probability

This week, I started using some free time to read Think Stats by Allen B. Downey. So far it’s been a pretty good book. I was reading the start of Chapter 2.4, where the author says to download the Pmf.py file from http://thinkstats.com/Pmf.py. Curious about what’s inside, I read through the code and encountered this method in the Pmf class:

def Var(self, mu=None):
    """Computes the variance of a PMF.

    Args:
        mu: the point around which the variance is computed;
            if omitted, computes the mean

    Returns:
        float variance
    """
    if mu is None:
        mu = self.Mean()

    var = 0.0
    for x, p in self.d.iteritems():
        var += p * (x - mu)**2
    return var

Without going into details of the Pmf.Mean method and what the Pmf.d dictionary stores, I hope you trust me and the author that Pmf.Mean computes the mean of the PMF of a random variable $ X $ correctly using the following formula:

$$ \mathbb{E} [X] = \sum_{i=1}^{n} p_i X_i $$

and that Pmf.d is a dictionary that maps each value the random variable can take on to the probability of that value.
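To make the formula concrete, here is a minimal sketch of the same computation. It is not the actual Pmf.Mean source: the helper name `pmf_mean`, the plain dict `d`, and the fair-die example are my own assumptions, and it uses Python 3's `items()` rather than the `iteritems()` seen in Pmf.py.

```python
def pmf_mean(d):
    """Mean of a PMF given as {value: probability} (hypothetical helper)."""
    # E[X] = sum over all values of p_i * X_i
    return sum(p * x for x, p in d.items())


# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {x: 1 / 6 for x in range(1, 7)}
mean = pmf_mean(die)  # approximately 3.5, up to floating-point rounding
```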

What kind of surprised me about the Pmf.Var code above was this part:

var = 0.0
for x, p in self.d.iteritems():
    var += p * (x - mu)**2
return var

which translates to:

$$ Var(X) = \sum_{i=1}^{n} p_i * (X_i - \mu)^{2} $$
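For a feel of what this sum does, here is a small standalone sketch of the same loop in Python 3. The function name `pmf_var` and the fair-coin example are assumptions of mine, not part of Pmf.py.

```python
def pmf_var(d, mu=None):
    """Variance of a PMF given as {value: probability} (hypothetical helper)."""
    if mu is None:
        # Fall back to the mean, as Pmf.Var does when mu is omitted.
        mu = sum(p * x for x, p in d.items())
    # Probability-weighted sum of squared deviations from mu.
    return sum(p * (x - mu) ** 2 for x, p in d.items())


# Assumed example: a fair coin taking the values 0 and 1.
coin = {0: 0.5, 1: 0.5}
var = pmf_var(coin)  # 0.5 * (0 - 0.5)**2 + 0.5 * (1 - 0.5)**2 = 0.25
```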

OK, I gotta admit, my probability chops are kind of rusty. To cut myself some slack, I did not recall doing exercises to compute variance from a PMF, nor did I recall seeing this formula in the lecture notes and textbooks I’ve used for several introductory probability and statistics classes. Or maybe my memory is just plain bad (see the Afterthoughts section). So I was doubtful, and the one thought on my mind was: is that really how you compute the variance of a random variable $ X $ given its PMF? I did a little googling on computing variance, and one of the top results was this familiar formula: $ Var ( X ) = \mathbb{E} [ X^2 ] - ( \mathbb{E} [ X ] )^{2} $

which looks quite different from $ \sum_{i=1}^{n} p_i * (X_i - \mu)^2 $. Could it be that they’re actually equivalent? Let’s prove it.

We want to show that:

$$ Var(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \sum_{i=1}^{n} p_i * (X_i - \mu)^2 $$

It’s actually easier to start from $ \sum_{i=1}^{n} p_i * (X_i - \mu)^2 $, so let’s start there.

Proof:

$$
\begin{aligned}
\sum_{i=1}^{n} p_i * (X_i - \mu)^2
&= \sum_{i=1}^{n} p_i * ({X_i}^2 - 2 X_i \mu + {\mu}^2) \\
&= \sum_{i=1}^{n} [ p_i {X_i}^2 - 2 p_i X_i \mu + p_i {\mu}^2 ] \\
&= \sum_{i=1}^{n} [ p_i {X_i}^2 ] - 2 \mu \sum_{i=1}^{n} [ p_i X_i ] + {\mu}^2 \sum_{i=1}^{n} p_i \\
&= \mathbb{E}[X^2] - 2 \mu \mathbb{E}[X] + {\mu}^2 \cdot 1 \\
&= \mathbb{E}[X^2] - 2 \mu \cdot \mu + {\mu}^2 \\
&= \mathbb{E}[X^2] - {\mu}^2 \\
&= \mathbb{E}[X^2] - (\mathbb{E}[X])^2
\end{aligned}
$$

A proof starting from $ \mathbb{E}[X^2] - (\mathbb{E}[X])^2 $ requires a little bit of trickery, namely the use of $ a + (-a) = 0 $, but it’s not too complicated, so let’s do it.

Proof:

$$
\begin{aligned}
\mathbb{E}[X^2] - (\mathbb{E}[X])^2
&= \sum_{i=1}^{n} [ p_i {X_i}^2 ] - \mathbb{E}[X] \cdot \mathbb{E}[X] \\
&= \sum_{i=1}^{n} [ p_i {X_i}^2 ] - \mathbb{E}[X] \cdot \sum_{i=1}^{n} [ p_i X_i ] \\
&= \sum_{i=1}^{n} [ p_i {X_i}^2 ] - \mu \sum_{i=1}^{n} [ p_i X_i ] \\
&= \sum_{i=1}^{n} [ p_i {X_i}^2 - \mu p_i X_i ] \\
&= \sum_{i=1}^{n} [ p_i X_i (X_i - \mu) ] \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu + \mu) \cdot (X_i - \mu) ] \\
&= \sum_{i=1}^{n} [ p_i ({X_i}^{2} - X_i \mu - X_i \mu + {\mu}^{2} + X_i \mu - {\mu}^{2}) ] \\
&= \sum_{i=1}^{n} [ p_i (({X_i}^{2} - X_i \mu - X_i \mu + {\mu}^{2}) + (X_i \mu - {\mu}^{2})) ] \\
&= \sum_{i=1}^{n} [ p_i (({X_i}^{2} - 2 X_i \mu + {\mu}^{2}) + (X_i \mu - {\mu}^{2})) ] \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu)^2 + p_i (X_i \mu - {\mu}^{2}) ] \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu)^2 ] + \sum_{i=1}^{n} [ p_i (X_i \mu - {\mu}^{2}) ] \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu)^2 ] + \mu \sum_{i=1}^{n} [ p_i X_i ] - \sum_{i=1}^{n} [ p_i {\mu}^{2} ] \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu)^2 ] + \mu \cdot \mu - {\mu}^2 \sum_{i=1}^{n} p_i \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu)^2 ] + {\mu}^2 - {\mu}^2 \cdot 1 \\
&= \sum_{i=1}^{n} [ p_i (X_i - \mu)^2 ]
\end{aligned}
$$

So there we have it: $ Var(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \sum_{i=1}^{n} p_i (X_i - \mu)^2 $. More importantly, $ \sum_{i=1}^{n} p_i (X_i - \mu)^2 $ is indeed the correct way to compute the variance of a random variable $ X $ given its PMF.
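If you’d rather see the two formulas agree on actual numbers, here is a quick sanity check. It is only a sketch under my own assumptions: the function names and the fair-die PMF are made up for illustration, and I use `fractions.Fraction` so the comparison is exact instead of subject to floating-point rounding. A check on one PMF is of course no substitute for the proof.

```python
from fractions import Fraction


def var_by_definition(d):
    """Sum of p_i * (X_i - mu)^2 over a {value: probability} dict."""
    mu = sum(p * x for x, p in d.items())
    return sum(p * (x - mu) ** 2 for x, p in d.items())


def var_by_shortcut(d):
    """E[X^2] - (E[X])^2 over the same kind of dict."""
    mean = sum(p * x for x, p in d.items())
    mean_of_squares = sum(p * x ** 2 for x, p in d.items())
    return mean_of_squares - mean ** 2


# Assumed example: a fair six-sided die with exact rational probabilities.
die = {x: Fraction(1, 6) for x in range(1, 7)}
same = var_by_definition(die) == var_by_shortcut(die)  # True; both are 35/12
```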

Afterthoughts

Halfway through this post I concluded that it’s really my poor memory. Granted, I typically compute variance using $ \mathbb{E}[X^2] - (\mathbb{E}[X])^2 $ instead of $ \sum_{i=1}^{n} p_i * (X_i - \mu)^2 $, as do most students in an introductory probability class, I guess. Or you just use a computer program to do it and don’t worry about what goes on in the background. Ah well. More reason to brush up on the fundamentals of probability. It’s a good exercise though =)
