It looks like Google Maps is having some longitude problems along the eastern coast of the U.S. This is from the CL-USERS Google Map:
Swim, Zach! Swim!
February 26th, 2011
Patrick Stein
Calculating the mean and variance with one pass
February 15th, 2011
Patrick Stein
A friend showed me this about 15 years ago. I use it every time I need to calculate the variance of some data set. I always forget the exact details and have to derive it again. But, it’s easy enough to derive that it’s never a problem.
I had to derive it again on Friday and thought, I should make sure more people get this tool into their utility belts
.
First, a quick refresher on what we’re talking about here. The mean of a data set is defined to be . The variance is defined to be .
A naïve approach to calculating the variance then goes something like this:
(flet ((square (x) (* x x)))
(let* ((n (length data))
(sum (reduce #'+ data :initial-value 0))
(mu (/ sum n))
(vv (reduce #'(lambda (accum xi)
(+ accum (square (- xi mu))))
data :initial-value 0)))
(values mu (/ vv n)))))
This code runs through the data list once to count the items, once to calculate the mean, and once to calculate the variance. It is easy to see how we could count the items at the same time we are summing them. It is not as obvious how we can calculate the sum of squared terms involving the mean until we’ve calculated the mean.
If we expand the squared term and pull the constant outside of the summations it ends up in, we find that:
When we recognize that and , we get:
This leads to the following code:
(flet ((square (x) (* x x)))
(destructuring-bind (n xs x2s)
(reduce #'(lambda (accum xi)
(list (1+ (first accum))
(+ (second accum) xi)
(+ (third accum) (square xi))))
data :initial-value '(0 0 0))
(let ((mu (/ xs n)))
(values mu (- (/ x2s n) (square mu)))))))
The code is not as simple, but you gain a great deal of flexibility. You can easily convert the above concept to continuously track the mean and variance as you iterate through an input stream. You do not have to keep data around to iterate through later. You can deal with things one sample at a time.
The same concept extends to higher-order moments, too.
Happy counting.
Edit: As many have pointed out, this isn’t the most numerically stable way to do this calculation. For my part, I was doing it with Lisp integers, so I’ve got all of the stability I could ever want. 🙂 But, yes…. if you are intending to use these numbers for big-time decision making, you probably want to look up a really stable algorithm.
USerial Library — v0.1.2010.12.26
December 27th, 2010
Patrick Stein
I am putting together a networking library atop usocket for use in a multiplayer Lisp game. So far, I have implemented a library for serializing data into a byte buffer.
Here is a simple example of serializing some things into a buffer:
(make-bitfield-serializer :login-flags (:hidden :stay-logged-in))
(serialize* (:opcode :login
:uint32 sequence-number
:login-flags '(:hidden)
:string login-name
:string password) buffer)
You can unserialize those bits into existing places:
(unserialize* (:opcode opcode
:uint32 sequence-number
:login-flags flags
:string login-name
:string password) buffer)
...)
You can unserialize them into newly-created variables for use within a body:
:uint32 sequence-number
:login-flags flags
:string login-name
:string password) buffer
...)
Or, you can unserialize them into a list:
:uint32
:login-flags
:string
:string) buffer)))
...)
You can find out more about the serialization library on my unet page.
A Currying Pipeline
November 20th, 2010
Patrick Stein
I have been reading a great deal about Haskell and thinking a great deal about a networked Lisp game that I intend to work on soon. For the Lisp project, I will need to serialize and unserialize packets to send them over the network. I re-read the chapter on parsing binary files in Practical Common Lisp and started to think about how I could make readers and writers that worked on buffers. Thanks to the Haskell influence, I was also trying to do this serialization without side effects.
The Goal
I wanted to accomplish something like this without all of the SETF action and verbiage:
(setf vv (serialize 'real xx vv))
(setf vv (serialize 'real yy vv))
(setf vv (serialize 'real zz vv))
vv)
The First Attempt
Well, also thanks to Haskell, my instinct was to make a CURRY-PIPELINE macro that gets called something like this:
(serialize 'real xx)
(serialize 'real yy)
(serialize 'real zz))
and expands into something like this:
Unfortunately, this changes the order of evaluation of xx, yy, and zz entirely. So, this was suboptimal. It also involved a fifteen-line macro with macro recursion and two conditionals.
Slightly Cleaner
My next attempt was about a ten-line macro with one conditional that turned it into a bunch of nested LET statements.
(let ((#:V-1 (serialize 'real xx #:V-1)))
(let (#:V-1 (serialize 'real yy #:V-1)))
(let (#:V-1 (serialize 'real zz #:V-1)))
#:V-1))))
Cleaner Still
Then, I realized that most of my simple examples would be simplest if I curried into the first argument instead of the last. (In fact, it would have even fixed my order of evaluation problem in the initial version.) And, I realized that I could abandon the nested LET if I used a LET*. Now, I have a six-line macro that I really like.
(let ((vv (gensym "VARIABLE-")))
(labels ((curry-and-store (line)
`(,vv (,(first line) ,vv ,@(rest line)))))
`(let* ((,vv ,initial)
,@(mapcar #'curry-and-store body))
,vv))))
So, here is an example that does one of those grade school magic tricks. Pick a number, multiply by five, add six, multiply by four, add nine, and multiply by five. Tell me the answer. I subtract 165 and divide by one hundred to tell you your original number.
(let ((rube (curry-pipeline n
(* 5)
(+ 6)
(* 4)
(+ 9)
(* 5))))
(curry-pipeline rube
(- 165)
(/ 100))))
Back to the Original Problem
Now, my serialize functions can simply take a buffer and return a buffer which is the concatenation of the input buffer and the bytes required to encode the given value. The unserialize is not as nice since I will have to return both a buffer and a value, but I am sure I can work something out using a CONS as an accumulator. And, heck, it is going to kill my performance anyway if I really copy the buffer every time I want to add another item to it. I am probably going to ditch the functional aspect anyway. *shrug*
A New Tack
If you don’t like currying or need to have more control over where the accumulator goes in each step of the chain, you can still get the same kind of chaining if you require a declaration of the variable. And, it simplifies the macro:
`(let* ((,var ,initial)
,@(mapcar #'(lambda (line) `(,var ,line)) body))
,var))
(defun magic-trick (n)
(let ((rube (let-pipeline (r n)
(* 5 r)
(+ 6 r)
(* 4 r)
(+ 9 r)
(* 5 r))))
(let-pipeline (a rube)
(- a 165)
(/ a 100))))
From there, it is not too big of a leap to allow MULTIPLE-VALUE-BIND instead. To accomplish the unserialize, as I mentioned, I will need to track multiple values. I am now down to a four line macro:
`(multiple-value-bind ,vars ,call
,(if rest-pipeline
`(mv-pipeline ,rest-pipeline ,@body)
`(progn ,@body))))
Here is an easy to follow example that returns the first five Fibonacci numbers:
(f2) (+ f1 f0)
(f3) (+ f2 f1)
(f4) (+ f3 f2))
(list f0 f1 f2 f3 f4))
Now, my unserialize case might look something like this:
(mv-pipeline ((xx buffer) (unserialize 'real buffer)
(yy buffer) (unserialize 'real buffer)
(zz buffer) (unserialize 'real buffer))
(values (vector xx yy zz) buffer)))
Conclusion
Luckily, I am not paying myself based on lines of code per hour. Almost every time I have done more work, I have reduced the number of lines of code. I am reminded of this quote from Blaise Pascal: I have made this letter longer, because I have not had the time to make it shorter.
Won An “Award”
November 19th, 2010
Patrick Stein
Early this year, I wrote a small start of a game for a 7-Day Lisp Programming Contest. I just got some hilarious please, give us information about you we can sell to third parties
spam saying that my product has been granted the Famous Software Award
.
There is an (apparently apocryphal) story that the World Series of Baseball was not meant to imply something global, but rather was to reflect that it was sponsored by the newspaper The New York World. In the present case, however, there is no implication that my software is famous. The company sponsoring the award
has the word famous
in its name.
Anyhow, I found it quite amusing that my half-a-game experiment with a one-button interface was being recognized
for:
The Famous Software Award has been initiated by [Spammer’s URL Here] to recognize
Famous Software, which come up with innovative and efficient ways to reflect the best relationship with users assuring their satisfacation.
The broken English there makes it tough to discern if they’re claiming that my famous software assures user satisfaction or if the Spammer company does. Either way, Go, me!
🙂