Lisp Jump Tables July 12th, 2009
Patrick Stein

A few weeks back, I posted about how I know that I have more to grok about Lisp macros. In that post, I needed to explain Lisp macros a bit for those not in the know. The example that I used was creating a jump table.

Geoff Wozniak just posted an article in his blog wherein he explores using a macro to do jump tables using an array of functions. He performed timing tests using Tim Bradshaw’s ncase macro versus case and switch.

The timing differences there are quite amazing. I’ve toyed around with optimizations quite a bit and have never seen Allegro outperform SBCL before this. It is also interesting that CLISP may already do this jump table optimization. I have never seen CLISP come anywhere close to SBCL or Allegro before this.

Update: Including timing numbers from my own machine

My machine is a 2.4GHz Intel-based Mac with 4GB of RAM. Here are the numbers that I get for this test. (All numbers in milliseconds).

  n case ncase switch
SBCL 1.0.29 10 166 149 207
  100 185 146 191
  1000 572 147 207
CLISP 2.47 10 115 211 188
  100 117 205 219
  1000 111 206 189
ACL 8.1 (heap-limited) 10 10 20 90
  100 20 10 90
  1000 1420 20 130
CMUCL 19f 10 120 120 180
  100 150 120 210
  1000 520 120 210
Clozure v1.3-r11936 10 19 15 128
  100 108 14 130
  1000 1214 14 117

  • With Allegro (ACL), I couldn’t compile easily. I had to load it, then compile it, then load the compiled version. If I just started Allegro and tried to compile, it complained that WITH-GENSYMS was an undefined function.
  • With a heap-limited LispWorks, I failed to even compile the program. *shrug*

Lisp History July 12th, 2009
Patrick Stein

To make sure that I’m extra prepared for my upcoming talk on Lisp Macros, I’ve been scouring the net reading everything I can find that other folks have written about macros.

This set of lecture notes is a really nice explanation of basic Lisp macros and the whole backquote/comma paradigm. I think it’s off base with its Lisp history though.

It says:

In the late 1950’s, John McCarthy invented Lisp to cover many common weaknesses of languages such as FORTRAN, COBOL and ALGOL. Thus, Lisp was the first language to truly embrace concepts such as recursion, first-class treatment of functions (i.e. they are treated just like any other data), weak typing, and macros.

Maybe I am way off base, but my understanding was that McCarthy made Lisp because FORTRAN was useful for computation and nigh impossible for proving anything. Turing machines were useful for proving things and nigh impossible for computation. My understanding was that Lisp was created to be something easier to compute in than a Turing machine yet easier to prove things in than the other languages at the time. My understanding was that Lisp macros came around three or four years later.

I suppose it’s arguable that not being able to prove things in it is a common weakness in programming languages. Heck, you’d be hard pressed to prove anything about Common Lisp without restricting yourself to a ridiculously small subset, and Lisp is way better off than Ruby, Python, Java, C, C#, C++, etc. I suppose it’s also arguable that the lecture notes don’t claim that McCarthy invented Lisp with macros in situ, but rather that Lisp embraced them quickly.

Me, I think it was kind of an accident. The very same things that made it easy to prove things about Lisp programs also made it easy to graft in macros. The functional nature, played out in s-expressions, is suited for macros in a way that infix languages will never be.

Why Not Return A Function? July 2nd, 2009
Patrick Stein

This morning, I was catching up on the RSS feeds I follow. I noticed an interesting snippet of code in the Abstract Heresies journal for the Discrete-time Fourier Transform. Here is his snippet of code:

(define (dtft samples)
  (lambda (omega)
    (sum 0 (vector-length samples)
        (lambda (n)
          (* (vector-ref samples n)
             (make-polar 1.0 (* omega n)))))))

My first thought was That’s way too short. Then, I started reading through it. My next thought was, maybe I don’t understand scheme at all. Then, my next thought was, I do understand this code, it just didn’t do things the way I expected.

Here, I believe is a decent translation of the above into Common Lisp:

(defun dtft (samples)
  #'(lambda (omega)
      (loop for nn from 0 below (length samples)
           summing (let ((angle (* omega nn)))
                     (* (aref samples nn)
                        (complex (cos angle) (sin angle)))))))

Now, what I find most interesting here is that most implementations you’ll find for the DTFT (Discrete-Time Fourier Transform) take an array of samples and a wavelength, and returns a result. This, instead, returns a function which you can call with a wavelength omega. It returns an evaluator. This is an interesting pattern that I will have to try to keep in mind. I have used it in some places before. But, I am sure there are other places that I should have used it, and missed. This is one where I would have missed.

Usage for the above would go something like:

(let ((dd (dtft samples)))
  (dd angle1)
  (dd angle2)
  ...)

For those not into Lisp either (who are you?), here is a rough translation into C++.

#include <cmath>
#include <complex>

class DTFT {
  unsigned int len;
  double* samples;

public:
  DTFT( double* _samples, unsigned int _len ) {
    this->len = _len;
    this->samples = new double[ this->len ];
    for ( unsigned int ii=0; ii < this->len; ++ii ) {
      this->samples[ ii ] = _samples[ ii ];
    }
  }

  ~DTFT( void ) {
    delete[] this->samples;
  }

  std::complex< double > operator () ( double omega ) const {
    std::complex< double > sum( 0.0, 0.0 );
    for ( unsigned int ii=0; ii < this->len; ++ii ) {
      sum += this->samples[ ii ] * std::polar( 1.0, omega );
    }
    return sum;
  }
};

With usage like:

DTFT dd( samples, 1024 );
dd( angle1 );
dd( angle2 );
...

So, six lines of Scheme or Lisp. Twenty-five lines of C++ including explicit definition of a class to act as a pseudo-closure, explicit copying and management of the samples buffer, etc. I suppose, a more direct translation would have used a std::vector to hold the samples and would have just kept a pointer to the buffer. That would have shaved off six or seven lines and the whole len and _len variables.

Speedy Matrix Multiplication in Lisp, Again… June 30th, 2009
Patrick Stein

In response to my earlier post about hidden memory allocations, Jason Cornez pointed out how to rework my loop so that Allegro wouldn’t need to allocate memory. The new structure avoids using the summing keyword in (loop …) and doesn’t use the return-value from (loop …).

(declaim (ftype (function ((simple-array single-float (12))
                           (simple-array single-float (3))
                           (simple-array single-float (3)))
                          (simple-array single-float (3))) mvl*-na))
(defun mvl*-na (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec)
           (type (simple-array single-float (3)) ret)
           (optimize (speed 3) (safety 0)))
  (loop for jj fixnum from 0 below 3
     do (let ((offset (* jj 4)))
          (declare (type fixnum offset))
          (let ((current (aref matrix (+ offset 3))))
            (declare (type single-float current))
            (loop for ii fixnum from 0 below 3
               for kk fixnum from offset below (+ offset 3)
               do (incf current
                        (* (aref vec ii)
                           (aref matrix kk))))
            (setf (aref ret jj) current))))
  ret)

Now, the Allegro moved up in the rankings. All of the implementations (except 32-bit Clozure) ran slightly faster with this version. Allegro improved the most here though as its allocation/garbage dropped to nothing.

  wall non-GC alloced
SBCL 1.0.29 0.406s 0.406s 0
CMUCL 19f 0.539s 0.539s 32
Allegro 8.1 (Personal) 0.720s 0.720s 0
Clozure-64bit 1.3-r11936 1.100s 1.100s ?? 0
Clozure-32bit 1.3-r11936 6.884s 5.023s 1,680,000,000
LispWorks 5.1 (Personal) 12.587s ?? 3,120,000,516
ECL 9.6.2 30.875s ?? 17,280,000,256
GNU CLISP 2.47 88.008s 74.455s 2,160,000,000

CMUCL compiled with 30 Notes. I should look into those some time. It still ran quite fast though.

Allegro definitely liked this version better. It kept from allocating anything.

For comparison, I am including the previous runtimes here as well:

  wall non-GC alloced
SBCL 1.0.29 0.444s 0.444s 0
CMUCL 19f 0.567s 0.567s 0
Clozure-64bit 1.3-r11936 1.272s 1.272s ?? 0
Clozure-32bit 1.3-r11936 5.009s 4.418s 1,200,000,000
Allegro 8.1 (Personal) 6.131s 2.120s 1,440,000,000
LispWorks 5.1 (Personal) 14.054s ?? 3,360,000,480
ECL 9.6.2 33.009s ?? 18,240,000,256
GNU CLISP 2.47 93.190s 77.356s 2,520,000,000

To CLOS or not to CLOS June 29th, 2009
Patrick Stein

I am working on some Lisp code. I am trying to mimic the basic structure of a large C++ project. I think the way the C++ project is structured is a good fit for the tasks involved.

Most of the C++ stuff is done with classes. Most of the methods of those classes are virtual methods. Many of the methods will be called multiple times every hundredth of a second. Very few of the virtual methods ever get overridden by subclasses.

So, I got to thinking. Does it make sense for me to use CLOS stuff at all for most of this? Would it be significantly faster to use (defstruct …) and (defun …) instead of (defclass …) and (defgeneric …)?

My gut instinct was: Yes. My first set of test code didn’t bear that out. For both the classes and the structs, I used (MAKE-INSTANCE …) to allocate and (WITH-SLOTS …) to access. When doing so, the classes with generic functions took about 1.5 seconds for 10,000,000 iterations under SBCL while the structs with non-generic functions took about 2.2 seconds.

For the next iteration of testing, I decided to use accessors instead of slots. I had the following defined:

(defclass base-class ()
  ((slot-one :initform (random 1.0f0) :type single-float
             :accessor class-slot-one)
   (slot-two :initform (random 1.0f0) :type single-float
             :accessor class-slot-two)))

(defclass sub-class (base-class)
  ((slot-three :initform (random 1.0f0) :type single-float
               :accessor class-slot-three)
   (slot-four  :initform (random 1.0f0) :type single-float
               :accessor class-slot-four)))

(defstruct (base-struct)
  (slot-one (random 1.0f0) :type single-float)
  (slot-two (random 1.0f0) :type single-float))

(defstruct (sub-struct (:include base-struct))
  (slot-three (random 1.0f0) :type single-float)
  (slot-four (random 1.0f0) :type single-float))

I then switched from using calls like (slot-value instance ‘slot-three) in my functions to using calls like (class-slot-three instance) or (sub-struct-slot-three instance). Now, 10,000,000 iterations took 2.6 seconds for the classes and 0.3 seconds for the structs in SBCL. In Clozure (64-bit), 10,000,000 iterations with classes and with method-dispatch and with accessors took 11.0 seconds, with classess and without method-dispatch but with accessors took 6.1 seconds, and with structs and without method-dispatch and with accessors took 0.4 seconds.

There is much more that I can explore to tune this code. But, for now, I think the answer is fairly easy. I am going to use structs, functions, and accessors for as many cases as I can. Here is the generic-function, class, struct test code with accessors. The first loop uses classes and generic functions. The second loop uses classes and regular functions. The third loop uses structs and regular functions.

Updates In Email

Email:

l