October, 2009 :: nklein software

Lisp Fourier Transform Library Faster October 12th, 2009
Patrick Stein

I released version 1.2 of my Common Lisp Fourier Transform library today. It is significantly faster than the old version. On my MacBook Pro in SBCL 1.0.30 with a 512 by 512 by 16 array, version 1.2 takes 9.77 seconds where version 1.0 took 25.31 seconds. Version 1.2 consed 2.4M while version 1.0 consed 7.4M. Neither version should have to cons nearly that much, so there’s still work to be done.

This release is significantly faster under Clozure64 1.3-r11936, too. For a 512 by 512 array, version 1.2 takes 2.72 seconds (single-threaded) and 2.07 seconds (multi-threaded) where version 1.0 took 4.45 seconds (single-threaded) and 3.06 seconds (multi-threaded).

My library is still about 1/2 to 1/3 the speed of Bordeaux FFT and still allocates memory in situations that Bordeaux does not. Hopefully, I can close those gaps most of the way soon.

Lisp Troubles: Fabricating a Closure… October 7th, 2009
Patrick Stein

Edit: lispnik on Livejournal pointed out this section of the Hyperspec which gaurantees that my pointer math is valid. I’d still like to know how to make a closure like this though, for future reference.

Edit(2): See tfb’s comment below for a solution that does make a closure using a nested (LAMBDA …) inside the backquote.

My Fast Fourier Transform library operates on multi-dimensional arrays. When you do a Fourier Transform on a two-dimensional array, you do a Fourier Transform on each row and then on each column of the result (or each column and then on each row of the result). When you do a three-dimensional Fourier Transform, you do a two-dimensional Fourier Transform on each horizontal slice and then a one-dimensional Fourier Transform on each vertical column.

The problem

To accomplish this on arrays of any number of dimensions, I wrote a wrapper class called virtual-row that one can use to index any axis-parallel row in a data hypercube with an API that makes it look like a one-dimensional array. So, for example, if I had a four-dimensional array, zero-ing one of those rows explicitly might look like this:

(dotimes (kk (array-dimension array 2))
(setf (aref array 1 3 kk 5) 0)

In my abstracted world, it would look like this:

(let ((row (virtual-row array '(1 3) '(5))))
(dotimes (kk (row-length row))
(setf (row-ref row kk) 0)))

A simple, straightforward implementation of the virtual row setter function might look like this:

(defun (setf row-ref) (row index value)
(with-slots (array pre post) row
(let ((index (apply #'array-row-major-index
array
(append pre (list index) post))))
(setf (row-major-aref index) value))))

The problem with that implementation is that it allocates a new list every time. This function is called $O(n \log n)$ times per row (and there are $\sum_{i=1}^{k} \prod_{i\neq j} d_j$ virtual rows in a $k$ -dimensional hypercuboid where the $j$ -th dimension has $d_j$ entries).

The Goal

What I wanted to do instead was to generate a closure that incorporated the pre and post lists. What I wanted was almost this:

(defun generate-setter (array pre post)
(eval `(lambda (index value)
(setf (aref array ,@pre index ,@post) value))))

I say almost because (EVAL …) occurs in the null lexical environment. This means that it doesn’t capture the array variable. If I put a comma in front of the array, then I get a representation of the value to which array is bound, which doesn’t help me. Nor does passing in ’array since array is a lexical variable where I’m calling it from, too.

Putting the (EVAL …) inside the (LAMBDA …) doesn’t help because then it happens at runtime and I just spend all of my time in crazy functions like (APPEND …) and (READ-FROM-STRING …).

I tried assembling a list by hand instead of with the backquote. This was hampered both by the fact that (LAMBDA …) is a macro and because I still couldn’t capture a lexical variable (only its value). Other attempts that lead nowhere are:

(let ((aa "ABCDEFG"))
(let ((ff (read-from-string "(lambda () aa)")))
(funcall ff)))

(let ((aa "ABCDEFG"))
(let ((ff (compile nil (macroexpand '(lambda () aa)))))
(funcall ff)))

The (READ-FROM-STRING …) tells me that AA is UNBOUND. This is unsurprising. Why should it care about my current lexical environment?

The (MACROEXPAND …) gives me a fabulous message under SBCL:

FUNCTION fell through ECASE expression.
Wanted one of (SB-C:LAMBDA-WITH-LEXENV LAMBDA
SB-KERNEL:INSTANCE-LAMBDA SB-INT:NAMED-LAMBDA).
[Condition of type SB-KERNEL:CASE-FAILURE]

It seems like it might be slightly possible to work something out with an extra macro-level and some use of the &ENVIRONMENT lambda list keyword. But, I am not even sure where to start trying to figure out how that all plays together.

The Hack

Having failed to make a good closure on the fly and not being interested in consing as much as the simple implementation above does, I have settled instead for cheating. Here is how I am cheating.

(defun generate-setter (array pre post)
(flet ((compute-index (pre index post)
(apply #'array-row-major-index
(append pre (list index) post))))
(let* ((axis (length pre))
(offset (compute-index pre 0 post))
(second (min 1 (1- (array-dimension array axis))))
(span (- (compute-index pre second post) offset)))
(lambda (index value)
(setf (row-major-aref array (+ (* index span) offset)) value)))))

The idea here is to compute the difference in (ARRAY-ROW-MAJOR-INDEX …) between the first and second elements in the row. (Note: the whole business with (MIN …) is to avoid trouble if there is no second element in the row.) Then, I assume that the $n$ -th element of the row will be $n$ times that difference beyond the initial element of the row.

This works well and quickly. But, it is somewhat cheesy. There is no requirement that my index math be correct for any given implementation. It would shock me to find that some implementation disagreed with the above index math. Still, I’d really like to just fabricate a proper closure around (AREF …) instead.

Can anyone lend me a half a cup of clue?

Public Domain Fourier Transform Library for Common Lisp October 7th, 2009
Patrick Stein

In response to my recent post about genetically selecting cosine waves for image approximation, several reddit commentors said that I should just have taken the Fourier transform, kept the largest 100 coefficients and did the inverse Fourier transform. [I haven’t run the RMS calculation on the result yet, but visually, it looks pretty nice with my test image. More details on that in a later post.]

The Lisp code that I used in that article didn’t actually use Fast Fourier Transforms. To test the commentors’ claims, I needed an FFT library in Lisp. I searched around a bit and found an FFI (Foreign Function Interface) wrapper around the FFTW library. After fiddling with that library and the wrapper for about two hours, I bailed on it and wrote my own Fast Fourier Transform library entirely in Lisp.

More information as well as links to getting the code are here: nklein / Software / CL-FFT.

Image Approximation with Genetically Selected Cosines October 2nd, 2009
Patrick Stein

I started with the target image here on the right. I then used a genetic algorithm to try to find an approximation of this image as a sum of cosine waves.

Each cosine wave has the following form

$f_i(x,y) = A_i \cos \left( 2 \pi \left( x \cos \theta_i + y \cos \theta_i \right) \omega_i + 2 \pi \phi_i \right)$ The genetic algorithm can tweak the amplitude $A_i$ , the orientation in the plane $\theta_i$ , the frequency $\omega_i$ , and the phase $\phi_i$ of each wave.

The image at the right here, is from the 443rd generation of a population of 200 where each gene represents 100 cosine waves. For the video below, I used a population of 100 genes, each representing the sum of 50 cosine waves.

Image Approximation with Genetically Selected Cosines from Patrick Stein on Vimeo.

Nitty Gritty Details

The genes are represented internally as an array of fixed-point numbers. Each number is five-bits integer, ten-bits fractional, and a sign bit. I scale the amplitudes down by a factor of 50 and scale the frequency and offset up by 50% when I convert them back to floating point from the fixed point representation just to get them into a more useful ballpark. During each generation, the fittest two quartiles of the population are cross-bred (in randomly selected pairs with two-point crossover) to replace the third quartile of the population, and the fourth quartile is replaced by mutations of randomly selected members of the top quartile. I also run through and remove any duplicate genes and replace them with randomly generated ones so that I don’t breed myself into a total corner.

I toyed around with two different fitness functions: RMS and entropy. So, I would calculate the image represented by the gene and subtract that from the original image. Then, I would calculate the RMS ( $\frac{\sum_{i=1}^{n} v_i^2}{n}$ ) or the sum of the entropies of the possible values ( $\sum_{v=0}^{255} - p_v \log p_v$ , where $p_v$ is the proportion of the pixels in the resulting image that had value $v$ ).

I bothered with entropy because I was hoping that if I could encode a gene in just a hundred or two hundred bytes that I might be able to compress the remainder (losslessly) and save bytes over just compressing the original image. The big problem with the entropy calculation though is that I get the same answer when every pixel value is off by 100 as I get when every pixel value is right on target. So, for a time, I was multiplying the RMS and the entropy for the fitness function. For the video and images above, I just used the RMS since it reached very close to the same solution as the multiplying method, and I saved myself fifty histograms per generation.

Source Code

Here is the Common Lisp source code for the genetic algorithm. It depends on the CL-PNG library for image input and output. As it appears here, it will create an output image from the fittest member of each generation, an image of the fittest member of the current generation, an image of the original with the fittest member of the current generation subtracted, a Lisp file containing the fittest gene of the current generation, and the current state of the random number generator when the run ends.

For the video above, I quickly put together some code to demonstrate the different parameters that the genetic algorithm can tweak and to show incrementally adding in more of the cosine waves. Here is some general utility code that is display-independent. And, here is the OpenGL display code which depends on the CL-OpenGL libraries. I included the source code here because I had trouble finding an example on the net of using Lisp-created textures with CL-OpenGL. I render the cosine waves into a buffer, copy that buffer into an OpenGL texture, and draw a quad with that texture. Note: it looks a little sluggish when run, but that is because I put a big (SLEEP …) in there to keep things from flying by too quickly.

nklein software

software development and consulting

Lisp Fourier Transform Library Faster October 12th, 2009
Patrick Stein