Casting to Integers Considered Harmful ⇐

Casting to Integers Considered Harmful August 6th, 2009
Patrick Stein

Background

Many years back, I wrote some ambient music generation code. The basic structure of the code is this: Take one queen and twenty or so drones in a thirty-two dimensional space. Give them each random positions and velocities. Limit the velocity and acceleration of the queen more than you limit the same for the drones. Now, select some point at random for the queen to target. Have the queen accelerate toward that target. Have the drones accelerate toward the queen. Use the average distance from the drones to the queens in the $i$ -th dimension as the volume of the $i$ -th note where the notes are logarithmically spaced across one octave. Clip negative volumes to zero. Every so often, or when the queen gets close to the target, give the queen a new target.

It makes for some interesting ambient noise that sounds a bit like movie space noises where the lumbering enemy battleship is looming in orbit as its center portion spins to create artificial gravity within.

I started working on an iPhone application based on this code. The original code was in C++. The conversion to Objective C was fairly straightforward and fairly painless (as I used the opportunity to try to correct my own faults by breaking things out into separate functions more often).

Visualization troubles

The original code though chose random positions and velocities from uniform distributions. The iPhone app is going to involve visualization as well as auralization. The picture at the right here is a plot of five thousand points with each coordinate selected from a uniform distribution with range [-20,+20]. Because each axis value is chosen independently, it looks very unnatural.

What to do? The obvious answer is to use Gaussian random variables instead of uniform ones. The picture at the right here is five thousand points with each coordinate selected from a Gaussian distribution with a standard-deviation of 10. As you can see, this is much more natural looking.

How did I generate the Gaussians?

I have usually used the Box-Muller method of generating two Gaussian-distributed random variables given two uniformly-distributed random variables:

(defun random-gaussian ()
(let ((u1 (random 1.0))
(u2 (random 1.0)))
(let ((mag (sqrt (* -2.0 (log u1))))
(ang (* 2.0 pi u2)))
(values (* mag (cos ang))
(* mag (sin ang))))))

But, I found an article online that shows a more numerically stable version:

(defun random-gaussian ()
(flet ((pick-in-circle ()
(loop as u1 = (random 1.0)
as u2 = (random 1.0)
as mag-squared = (+ (* u1 u1) (* u2 u2))
when (< mag-squared 1.0)
return (values u1 u2 mag-squared))))
(multiple-value-bind (u1 u2 mag-squared) (pick-in-circle)
(let ((ww (sqrt (/ (* -2.0 (log mag-squared)) mag-squared))))
(values (* u1 ww)
(* u2 ww))))))

For a quick sanity check, I thought, let’s just make sure it looks like a Gaussian. Here, I showed the code in Lisp, but the original code was in Objective-C. I figured, If I just change the function declaration, I can plop this into a short C program, run a few thousand trials into some histogram buckets, and see what I get.

The trouble with zero

So, here comes the problem with zero. I had the following main loop:

#define BUCKET_COUNT 33
#define STDDEV 8.0
#define ITERATIONS 100000

for ( ii=0; ii < ITERATIONS; ++ii ) {
int bb = val_to_bucket( STDDEV * gaussian() );
if ( 0 <= bb && bb < BUCKET_COUNT ) {
++buckets[ bb ];
}
}

I now present you with three different implementations of the val_to_bucket() function.

int val_to_bucket( double _val ) {
return (int)_val + ( BUCKET_COUNT / 2 );
}

int val_to_bucket( double _val ) {
return (int)( _val + (int)( BUCKET_COUNT / 2 ) );
}

int val_to_bucket( double _val ) {
return (int)( _val + (int)( BUCKET_COUNT / 2 ) + 1 ) - 1;
}

As you can probably guess, after years or reading trick questions, only the last one actually works as far as my main loop is concerned. Why? Every number between -1 and +1 becomes zero when you cast the double to an integer. That’s twice as big a range as any other integer gets. So, for the first implementation, the middle bucket has about twice as many things in it as it should. For the second implementation, the first bucket has more things in it than it should. For the final implementation, the non-existent bucket before the first one is the overloaded bucket. In the end, I used this implementation instead so that I wouldn’t even bias non-existent buckets:

int val_to_bucket( double _val ) {
return (int)lround(_val) + ( BUCKET_COUNT / 2 );
}

nklein software

software development and consulting