Calculating the mean and variance with one pass

Calculating the mean and variance with one pass February 15th, 2011
Patrick Stein

A friend showed me this about 15 years ago. I use it every time I need to calculate the variance of some data set. I always forget the exact details and have to derive it again. But, it’s easy enough to derive that it’s never a problem.

I had to derive it again on Friday and thought, I should make sure more people get this tool into their utility belts.

First, a quick refresher on what we’re talking about here. The mean $\mu$ of a data set ${ x_1, x_2, \ldots, x_n }$ is defined to be $\frac{1}{n} \sum_{i=1}^n x_i$ . The variance $\sigma^2$ is defined to be $\frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2$ .

A naÃ¯ve approach to calculating the variance then goes something like this:

(defun mean-variance (data)
(flet ((square (x) (* x x)))
(let* ((n (length data))
(sum (reduce #'+ data :initial-value 0))
(mu (/ sum n))
(vv (reduce #'(lambda (accum xi)
(+ accum (square (- xi mu))))
data :initial-value 0)))
(values mu (/ vv n)))))

This code runs through the data list once to count the items, once to calculate the mean, and once to calculate the variance. It is easy to see how we could count the items at the same time we are summing them. It is not as obvious how we can calculate the sum of squared terms involving the mean until we’ve calculated the mean.

If we expand the squared term and pull the constant $\mu$ outside of the summations it ends up in, we find that:

$\frac{\sum (x_i - \mu)^2}{n} = \frac{\sum x_i^2}{n} - 2 \mu \frac{\sum x_i}{n} + \mu^2 \frac{\sum 1}{n}$

When we recognize that $\frac{\sum x_i}{n} = \mu$ and $\sum_{i=1}^n 1 = n$ , we get:

$\sigma^2 = \frac{\sum x_i^2}{n} - \mu^2 = \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2$ .

This leads to the following code:

(defun mean-variance (data)
(flet ((square (x) (* x x)))
(destructuring-bind (n xs x2s)
(reduce #'(lambda (accum xi)
(list (1+ (first accum))
(+ (second accum) xi)
(+ (third accum) (square xi))))
data :initial-value '(0 0 0))
(let ((mu (/ xs n)))
(values mu (- (/ x2s n) (square mu)))))))

The code is not as simple, but you gain a great deal of flexibility. You can easily convert the above concept to continuously track the mean and variance as you iterate through an input stream. You do not have to keep data around to iterate through later. You can deal with things one sample at a time.

The same concept extends to higher-order moments, too.

Happy counting.

Edit: As many have pointed out, this isn’t the most numerically stable way to do this calculation. For my part, I was doing it with Lisp integers, so I’ve got all of the stability I could ever want. 🙂 But, yes…. if you are intending to use these numbers for big-time decision making, you probably want to look up a really stable algorithm.

gambiteer

View 2011-02-15

See this page for a more accurate algorithm.

Reply
Jan Van lent

View 2011-02-15

For certain data sets this may not be the best approach.
See [1] for some alternatives.

[1] http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

Reply
Andrej Panjkov

View 2011-02-15

This looks like a standard single pass algorithm, which can have some numerical problems, arising from catastrophic cancellation in the final subtraction at the end. For example, this algorithm can sometimes result in negative variances. (The two-pass algorithm is also unstable for some data sets too.) You can fix things somewhat by using Kahan compensation.

Have you looked at the wikipedia page This is a one-pass Welford algorithm that is numerically stable. I have compared this algorithm to kahan-compensated two-pass computations for large data sets and the differences are very small. If you analyse the Welford algorithm, the computations tend to reduce round-off errors (by dividing subtractions by the data count, a number greater than one).

I recommend reading the whole page and the Chan and Terriberry references, which extend the one-pass method to higher moments, also in a numerically stable way.

I have a python implementation of the Chan-Terriberry-Welford system which works quite nicely over data sets with up to 1.2 million samples. I also have an implementation of the Welford one-pass algorithm for weighted data. In the weighted case, the variance computations do not automatically reduce round-off errors, as you can’t be sure that the divisor of the subtractions, (the running sum of weights) is greater than one , so you have to build it with Kahan compensation. I’m happy to supply the code to anyone who is interested.

Reply
Andrej Panjkov

View 2011-02-15

oh darn I got the link wrong.

The wikipedia page I meant to refer to is: Algorithms_for_calculating_variance#On-line_algorithm

sorry about that!

Reply
Ivan Boldyrev

View 2011-02-16

AFAIR, second method is less numerically stable. It doesn’t mean it is useless, but it may behave poorly in some situations. 🙂

Reply

nklein software

software development and consulting