Calculating the mean and variance with one pass February 15th, 2011
Patrick Stein

A friend showed me this about 15 years ago. I use it every time I need to calculate the variance of some data set. I always forget the exact details and have to derive it again. But, it’s easy enough to derive that it’s never a problem.

I had to derive it again on Friday and thought, I should make sure more people get this tool into their utility belts.

First, a quick refresher on what we’re talking about here. The mean \mu of a data set { x_1, x_2, \ldots, x_n } is defined to be \frac{1}{n} \sum_{i=1}^n x_i. The variance \sigma^2 is defined to be \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2.

A naïve approach to calculating the variance then goes something like this:

(defun mean-variance (data)
  (flet ((square (x) (* x x)))
    (let* ((n (length data))
           (sum (reduce #'+ data :initial-value 0))
           (mu (/ sum n))
           (vv (reduce #'(lambda (accum xi)
                           (+ accum (square (- xi mu))))
                       data :initial-value 0)))
      (values mu (/ vv n)))))

This code runs through the data list once to count the items, once to calculate the mean, and once to calculate the variance. It is easy to see how we could count the items at the same time we are summing them. It is not as obvious how we can calculate the sum of squared terms involving the mean until we’ve calculated the mean.

If we expand the squared term and pull the constant \mu outside of the summations it ends up in, we find that:

\frac{\sum (x_i - \mu)^2}{n} = \frac{\sum x_i^2}{n} - 2 \mu \frac{\sum x_i}{n} + \mu^2 \frac{\sum 1}{n}

When we recognize that \frac{\sum x_i}{n} = \mu and \sum_{i=1}^n 1 = n, we get:

\sigma^2 = \frac{\sum x_i^2}{n} - \mu^2 = \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2
.

This leads to the following code:

(defun mean-variance (data)
  (flet ((square (x) (* x x)))
    (destructuring-bind (n xs x2s)
        (reduce #'(lambda (accum xi)
                    (list (1+ (first accum))
                          (+ (second accum) xi)
                          (+ (third accum) (square xi))))
                data :initial-value '(0 0 0))
      (let ((mu (/ xs n)))
        (values mu (- (/ x2s n) (square mu)))))))

The code is not as simple, but you gain a great deal of flexibility. You can easily convert the above concept to continuously track the mean and variance as you iterate through an input stream. You do not have to keep data around to iterate through later. You can deal with things one sample at a time.

The same concept extends to higher-order moments, too.

Happy counting.

Edit: As many have pointed out, this isn’t the most numerically stable way to do this calculation. For my part, I was doing it with Lisp integers, so I’ve got all of the stability I could ever want. πŸ™‚ But, yes…. if you are intending to use these numbers for big-time decision making, you probably want to look up a really stable algorithm.

USerial Library — v0.1.2010.12.26 December 27th, 2010
Patrick Stein

I am putting together a networking library atop usocket for use in a multiplayer Lisp game. So far, I have implemented a library for serializing data into a byte buffer.

Here is a simple example of serializing some things into a buffer:

(make-enum-serializer :opcode (:login :run :jump :logout))
(make-bitfield-serializer :login-flags (:hidden :stay-logged-in))

(serialize* (:opcode :login
             :uint32 sequence-number
             :login-flags '(:hidden)
             :string login-name
             :string password) buffer)

You can unserialize those bits into existing places:

(let (opcode sequence-number flags login-name password)
  (unserialize* (:opcode opcode
                 :uint32 sequence-number
                 :login-flags flags
                 :string login-name
                 :string password) buffer)
  ...)

You can unserialize them into newly-created variables for use within a body:

(unserialize-let* (:opcode opcode
                   :uint32 sequence-number
                   :login-flags flags
                   :string login-name
                   :string password) buffer
  ...)

Or, you can unserialize them into a list:

(let ((parts (unserialize-list* (:opcode
                                 :uint32
                                 :login-flags
                                 :string
                                 :string) buffer)))
  ...)

You can find out more about the serialization library on my unet page.

A Currying Pipeline November 20th, 2010
Patrick Stein

I have been reading a great deal about Haskell and thinking a great deal about a networked Lisp game that I intend to work on soon. For the Lisp project, I will need to serialize and unserialize packets to send them over the network. I re-read the chapter on parsing binary files in Practical Common Lisp and started to think about how I could make readers and writers that worked on buffers. Thanks to the Haskell influence, I was also trying to do this serialization without side effects.

The Goal

I wanted to accomplish something like this without all of the SETF action and verbiage:

(let (vv)
  (setf vv (serialize 'real xx vv))
  (setf vv (serialize 'real yy vv))
  (setf vv (serialize 'real zz vv))
  vv)

The First Attempt

Well, also thanks to Haskell, my instinct was to make a CURRY-PIPELINE macro that gets called something like this:

(curry-pipeline nil
  (serialize 'real xx)
  (serialize 'real yy)
  (serialize 'real zz))

and expands into something like this:

(serialize 'real zz (serialize 'real yy (serialize 'real xx nil)))

Unfortunately, this changes the order of evaluation of xx, yy, and zz entirely. So, this was suboptimal. It also involved a fifteen-line macro with macro recursion and two conditionals.

Slightly Cleaner

My next attempt was about a ten-line macro with one conditional that turned it into a bunch of nested LET statements.

(let ((#:V-1 nil))
  (let ((#:V-1 (serialize 'real xx #:V-1)))
    (let (#:V-1 (serialize 'real yy #:V-1)))
      (let (#:V-1 (serialize 'real zz #:V-1)))
        #:V-1))))

Cleaner Still

Then, I realized that most of my simple examples would be simplest if I curried into the first argument instead of the last. (In fact, it would have even fixed my order of evaluation problem in the initial version.) And, I realized that I could abandon the nested LET if I used a LET*. Now, I have a six-line macro that I really like.

(defmacro curry-pipeline (initial &body body)
  (let ((vv (gensym "VARIABLE-")))
    (labels ((curry-and-store (line)
               `(,vv (,(first line) ,vv ,@(rest line)))))
      `(let* ((,vv ,initial)
              ,@(mapcar #'curry-and-store body))
         ,vv))))

So, here is an example that does one of those grade school magic tricks. Pick a number, multiply by five, add six, multiply by four, add nine, and multiply by five. Tell me the answer. I subtract 165 and divide by one hundred to tell you your original number.

(defun magic-trick (n)
  (let ((rube (curry-pipeline n
                (* 5)
                (+ 6)
                (* 4)
                (+ 9)
                (* 5))))
    (curry-pipeline rube
      (- 165)
      (/ 100))))

Back to the Original Problem

Now, my serialize functions can simply take a buffer and return a buffer which is the concatenation of the input buffer and the bytes required to encode the given value. The unserialize is not as nice since I will have to return both a buffer and a value, but I am sure I can work something out using a CONS as an accumulator. And, heck, it is going to kill my performance anyway if I really copy the buffer every time I want to add another item to it. I am probably going to ditch the functional aspect anyway. *shrug*

A New Tack

If you don’t like currying or need to have more control over where the accumulator goes in each step of the chain, you can still get the same kind of chaining if you require a declaration of the variable. And, it simplifies the macro:

(defmacro let-pipeline ((var initial) &body body)
  `(let* ((,var ,initial)
          ,@(mapcar #'(lambda (line) `(,var ,line)) body))
     ,var))

(defun magic-trick (n)
  (let ((rube (let-pipeline (r n)
                (* 5 r)
                (+ 6 r)
                (* 4 r)
                (+ 9 r)
                (* 5 r))))
    (let-pipeline (a rube)
      (- a 165)
      (/ a 100))))

From there, it is not too big of a leap to allow MULTIPLE-VALUE-BIND instead. To accomplish the unserialize, as I mentioned, I will need to track multiple values. I am now down to a four line macro:

(defmacro mv-pipeline ((vars call &rest rest-pipeline) &body body)
  `(multiple-value-bind ,vars ,call
     ,(if rest-pipeline
          `(mv-pipeline ,rest-pipeline ,@body)
          `(progn ,@body))))

Here is an easy to follow example that returns the first five Fibonacci numbers:

(mv-pipeline ((f0 f1) (values 0 1)
              (f2) (+ f1 f0)
              (f3) (+ f2 f1)
              (f4) (+ f3 f2))
  (list f0 f1 f2 f3 f4))

Now, my unserialize case might look something like this:

(defmethod unserialize ((type (eq 'vector)) buffer)
  (mv-pipeline ((xx buffer) (unserialize 'real buffer)
                (yy buffer) (unserialize 'real buffer)
                (zz buffer) (unserialize 'real buffer))
    (values (vector xx yy zz) buffer)))

Conclusion

Luckily, I am not paying myself based on lines of code per hour. Almost every time I have done more work, I have reduced the number of lines of code. I am reminded of this quote from Blaise Pascal: I have made this letter longer, because I have not had the time to make it shorter.

Won An “Award” November 19th, 2010
Patrick Stein

Early this year, I wrote a small start of a game for a 7-Day Lisp Programming Contest. I just got some hilarious please, give us information about you we can sell to third parties spam saying that my product has been granted the Famous Software Award.

There is an (apparently apocryphal) story that the World Series of Baseball was not meant to imply something global, but rather was to reflect that it was sponsored by the newspaper The New York World. In the present case, however, there is no implication that my software is famous. The company sponsoring the award has the word famous in its name.

Anyhow, I found it quite amusing that my half-a-game experiment with a one-button interface was being recognized for:

The Famous Software Award has been initiated by [Spammer’s URL Here] to recognize Famous Software, which come up with innovative and efficient ways to reflect the best relationship with users assuring their satisfacation.

The broken English there makes it tough to discern if they’re claiming that my famous software assures user satisfaction or if the Spammer company does. Either way, Go, me! πŸ™‚

Installed Quicklisp November 17th, 2010
Patrick Stein

I installed Quicklisp tonight. It was super-simple. In about 1/2 an hour, I got slime up and running and installed all of the packages that I regularly use.

It installs itself in a quicklisp/ subdirectory of your home directory. I didn’t really want it cluttering up my normal ls output, so I moved it to .quicklisp/ and updated my .sbclrc to refer to this new path. It had to recompile everything when I loaded it next, but it handled it gracefully.

It took me less than a minute to get slime set up. This is an improvement of about five hours and fifty-nine minutes over the previous time that I set up slime.

I definitely give my two thumbs up for Quicklisp.

Thanks, Zach!

Updates In Email

Email:

l