Common Lisp vs. C/ASM: Battle FFT October 16th, 2009
Patrick Stein

Dr. David McClain of Refined Audiometrics sent me his code for interfacing with the Apple vDSP library. Here it is along with a Makefile that I put together for it: vdsp-wrapper.tar.gz. I have not actually run the code myself. I started to convert his Lispworks FLI code to CFFI code, but bailed on it in favor of making a minimal CFFI interface to FFTW.

Dr. McClain says that Apple’s vDSP library usually gains him 20% or so versus FFTW. Further, there may well be plenty of alignment issues that should be taken into account in allocating the foreign buffers. Regardless, here is my very basic wrapper around FFTW’s fftw_plan_dft_1d() function: fftw.lisp.

To get this working with SBCL on my Mac, I needed to get FFTW compiling 32-bit shared libraries. By default, it compiles 64-bit static libraries on my system.

####### for 32-bit
% CC="gcc -arch i386" ./configure --prefix=/usr/local --enable-shared

Before I recompiled for 32-bit though, I manually moved the libfftw3.* files in /usr/local/lib into libfftw3_64.* files (of course, moving both the symlink libfftw3.dylib and its target).

Doing the FFI to FFTW does a 1,048,576-sample buffer in 0.202 seconds under SBCL 1.0.30 versus 0.72 seconds for my all-Lisp version. Under Clozure-64, it takes 0.91 seconds versus nearly 30 seconds for the all-Lisp version. I should check to make sure FFTW performs as well when my buffer is randomly filled instead of filled with a constant value. But, yep. There you have it.

So, certainly, if you need high-performance FFTs, you’ll want to FFI wrap something rather than use my all-Lisp version. If you don’t need such high-performance, I give you no-hassle portability.

Adventures in Package Dependencies October 14th, 2009
Patrick Stein

A variety of people have asked how the speed of my Lisp Fourier Transform Library compares to an FFI call to the FFTW Library. I downloaded the FFTW Library, built it, and installed it. Then, I tried to get someone else’s FFI wrappers for the FFTW Library going. I worked on this for two hours several weeks ago. Today, I put almost an hour into it. If it’s going to get done, I’m just going to have to write my own wrapper.

Here’s the problem…. package dependencies. This wouldn’t be so much of a problem if all of the packages were in the Cliki and thus accessible to ASDF-Install. Most of them are not.

To install fftw-ffi, I needed to install ch-image and clem. One of those depended on ch-util. That (and fftw-ffi) depended on ch-asdf. That depended on smarkup. Something there depended on cl-typesetting which depends on cl-pdf. The ch-asdf also depended on cl-bibtex and ch-bib (who knows why? or where to find it?). And, something in the above depended on cl-fad. CL-FAD was the only one in that list that ASDF-Install could find on its own. Several of the typesetting and PDF libraries along with the ch-image library did not build properly. There was much reassuring ASDF-Install that everything was going to be okay even if there were no PGP signatures for half of this or if the PGP signatures didn’t jive with the owner of the website. There was also much reassuring ASDF-Install that I wanted it to try to forge ahead even if it couldn’t build cl-bibtex for me.

I couldn’t find a list of projects supported by clbuild, so I went to build darcs to fetch clbuild. Did you know darcs needs the GNU Multi-precision Library? I’m assuming it uses its own hashing functions that want big integers or something, but I wasn’t really expecting all of that. So, after a half hour getting clbuild, it knows nothing about fftw-ffi.

Alright, maybe I managed to get enough stuff installed that I can just load fftw-ffi now? No dice. The fftw-ffi seems to have been generated with some tool that spits out an XML file and a corresponding C file. Then, it tries to use special ASDF tricks to get the C file compiled. Alas, these blow up on me.

debugger invoked on a ASDF::FORMATTED-SYSTEM-DEFINITION-ERROR:
  don't recognize component type FFTW-FFI-GCC-XML-C-SOURCE-FILE

Erf. Defeated.

Lisp Fourier Transform Library Faster Yet (v1.3) October 13th, 2009
Patrick Stein

I released version 1.3 of my Common Lisp Fourier Transform library today. It is significantly faster than yesterday’s version. On my MacBook Pro in SBCL 1.0.30 with a 512 by 512 by 16 array, version 1.3 takes 3.74 seconds where version 1.2 took 9.77 seconds and version 1.0 took 25.31 seconds. For a breakdown of performance on various array sizes with various Lisp implementations, see the performance section of the library page.

Most of the speed improvement in this version came from memory improvements. Version 1.3 doesn’t cons at all in SBCL during the processing of each row. Version 1.2 inadvertently consed three rows worth of complex numbers for every row transformed.

My library is still about 25% slower than Bordeaux FFT for one-dimensional arrays. My library, however, has full support for multi-dimensional arrays where Bordeaux-FFT does not.

I also included some validation test cases using NST to this release. To try them, go to the library directory, hop in your Lisp, and load regression.lisp.

Lisp Fourier Transform Library Faster October 12th, 2009
Patrick Stein

I released version 1.2 of my Common Lisp Fourier Transform library today. It is significantly faster than the old version. On my MacBook Pro in SBCL 1.0.30 with a 512 by 512 by 16 array, version 1.2 takes 9.77 seconds where version 1.0 took 25.31 seconds. Version 1.2 consed 2.4M while version 1.0 consed 7.4M. Neither version should have to cons nearly that much, so there’s still work to be done.

This release is significantly faster under Clozure64 1.3-r11936, too. For a 512 by 512 array, version 1.2 takes 2.72 seconds (single-threaded) and 2.07 seconds (multi-threaded) where version 1.0 took 4.45 seconds (single-threaded) and 3.06 seconds (multi-threaded).

My library is still about 1/2 to 1/3 the speed of Bordeaux FFT and still allocates memory in situations that Bordeaux does not. Hopefully, I can close those gaps most of the way soon.

Lisp Troubles: Fabricating a Closure… October 7th, 2009
Patrick Stein

Edit: lispnik on Livejournal pointed out this section of the Hyperspec which gaurantees that my pointer math is valid. I’d still like to know how to make a closure like this though, for future reference.

Edit(2): See tfb’s comment below for a solution that does make a closure using a nested (LAMBDA …) inside the backquote.

My Fast Fourier Transform library operates on multi-dimensional arrays. When you do a Fourier Transform on a two-dimensional array, you do a Fourier Transform on each row and then on each column of the result (or each column and then on each row of the result). When you do a three-dimensional Fourier Transform, you do a two-dimensional Fourier Transform on each horizontal slice and then a one-dimensional Fourier Transform on each vertical column.

The problem

To accomplish this on arrays of any number of dimensions, I wrote a wrapper class called virtual-row that one can use to index any axis-parallel row in a data hypercube with an API that makes it look like a one-dimensional array. So, for example, if I had a four-dimensional array, zero-ing one of those rows explicitly might look like this:

(dotimes (kk (array-dimension array 2))
    (setf (aref array 1 3 kk 5) 0)

In my abstracted world, it would look like this:

(let ((row (virtual-row array '(1 3) '(5))))
  (dotimes (kk (row-length row))
    (setf (row-ref row kk) 0)))

A simple, straightforward implementation of the virtual row setter function might look like this:

(defun (setf row-ref) (row index value)
  (with-slots (array pre post) row
    (let ((index (apply #'array-row-major-index
                        array
                        (append pre (list index) post))))
      (setf (row-major-aref index) value))))

The problem with that implementation is that it allocates a new list every time. This function is called O(n \log n) times per row (and there are \sum_{i=1}^{k} \prod_{i\neq j} d_j virtual rows in a k-dimensional hypercuboid where the j-th dimension has d_j entries).

The Goal

What I wanted to do instead was to generate a closure that incorporated the pre and post lists. What I wanted was almost this:

(defun generate-setter (array pre post)
  (eval `(lambda (index value)
           (setf (aref array ,@pre index ,@post) value))))

I say almost because (EVAL …) occurs in the null lexical environment. This means that it doesn’t capture the array variable. If I put a comma in front of the array, then I get a representation of the value to which array is bound, which doesn’t help me. Nor does passing in ’array since array is a lexical variable where I’m calling it from, too.

Putting the (EVAL …) inside the (LAMBDA …) doesn’t help because then it happens at runtime and I just spend all of my time in crazy functions like (APPEND …) and (READ-FROM-STRING …).

I tried assembling a list by hand instead of with the backquote. This was hampered both by the fact that (LAMBDA …) is a macro and because I still couldn’t capture a lexical variable (only its value). Other attempts that lead nowhere are:

(let ((aa "ABCDEFG"))
  (let ((ff (read-from-string "(lambda () aa)")))
    (funcall ff)))

(let ((aa "ABCDEFG"))
  (let ((ff (compile nil (macroexpand '(lambda () aa)))))
    (funcall ff)))

The (READ-FROM-STRING …) tells me that AA is UNBOUND. This is unsurprising. Why should it care about my current lexical environment?

The (MACROEXPAND …) gives me a fabulous message under SBCL:

FUNCTION fell through ECASE expression.                                        
Wanted one of (SB-C:LAMBDA-WITH-LEXENV LAMBDA                                  
               SB-KERNEL:INSTANCE-LAMBDA SB-INT:NAMED-LAMBDA).
   [Condition of type SB-KERNEL:CASE-FAILURE]

It seems like it might be slightly possible to work something out with an extra macro-level and some use of the &ENVIRONMENT lambda list keyword. But, I am not even sure where to start trying to figure out how that all plays together.

The Hack

Having failed to make a good closure on the fly and not being interested in consing as much as the simple implementation above does, I have settled instead for cheating. Here is how I am cheating.

(defun generate-setter (array pre post)
  (flet ((compute-index (pre index post)
           (apply #'array-row-major-index
                  (append pre (list index) post))))
    (let* ((axis   (length pre))
           (offset (compute-index pre 0 post))
           (second (min 1 (1- (array-dimension array axis))))
           (span   (- (compute-index pre second post) offset)))
      (lambda (index value)
        (setf (row-major-aref array (+ (* index span) offset)) value)))))

The idea here is to compute the difference in (ARRAY-ROW-MAJOR-INDEX …) between the first and second elements in the row. (Note: the whole business with (MIN …) is to avoid trouble if there is no second element in the row.) Then, I assume that the n-th element of the row will be n times that difference beyond the initial element of the row.

This works well and quickly. But, it is somewhat cheesy. There is no requirement that my index math be correct for any given implementation. It would shock me to find that some implementation disagreed with the above index math. Still, I’d really like to just fabricate a proper closure around (AREF …) instead.

Can anyone lend me a half a cup of clue?

Updates In Email

Email:

l