Trying to Unconfound Lisp Speeds June 24th, 2009
Patrick Stein

In the original version of my Optimizing Lisp Some More article, I did a bad comparison between SBCL and Clozure. SBCL supports two different ways to declare the arguments to a function. Clozure only supports one of those ways. As such, my declarations didn’t matter at all to Clozure.

I updated that post with new numbers after putting in both types of declarations. Clozure was much closer to SBCL. I then decided to expand the list to include CMUCL, Allegro (Personal), Lispworks (Personal), ECL, and CLISP. I failed to get GCL or ABCL up and running on my Mac, and Scieneer CL isn’t available for the Mac.

As it turns out, the Allegro and LispWorks versions that I have are heap limited. Thus, they spent a great deal of time cleaning up garbage. To try to even the playing field, I reworked the function to take the ret buffer as a third argument so that allocation is no longer inside the timing loop.

(declaim (ftype (function ((simple-array single-float (12))
                           (simple-array single-float (3))
                           (simple-array single-float (3)))
                          (simple-array single-float (3))) mvl*-na))
(defun mvl*-na (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec)
           (type (simple-array single-float (3)) ret)
           (optimize (speed 3) (safety 0)))
  (loop for jj fixnum from 0 below 3
     do (let ((offset (* jj 4)))
          (declare (type fixnum offset))
          (setf (aref ret jj)
                (+ (aref matrix (+ offset 3))
                   (loop for ii fixnum from 0 below 3
                      for kk fixnum from offset below (+ offset 3)
                      summing (* (aref vec ii)
                                 (aref matrix kk))
                      of-type single-float)))))
  ret)
(let ((matrixes (make-ring-of-matrixes '(12) 4096))
      (vectors  (make-ring-of-matrixes '(3) 4095))
      (ret      (make-array 3 :element-type 'single-float
                              :initial-element 0.0f0)))
  (time (loop for jj fixnum from 1 to 10000000
           for mm in matrixes
           for vv in vectors
           do (mvl*-na mm vv ret))))

Here are the results in terms of total user time, non-GC time, and bytes allocated:

  wall non-GC alloced
SBCL 1.0.29 0.444s 0.444s 0
CMUCL 19f 0.567s 0.567s 0
Clozure-64bit 1.3-r11936 1.272s 1.272s ?? 0
Clozure-32bit 1.3-r11936 5.009s 4.418s 1,200,000,000
Allegro 8.1 (Personal) 6.131s 2.120s 1,440,000,000
LispWorks 5.1 (Personal) 14.054s ?? 3,360,000,480
ECL 9.6.2 33.009s ?? 18,240,000,256
GNU CLISP 2.47 93.190s 77.356s 2,520,000,000

As you can see, I failed to keep most of the implementations from allocating things (especially, ECL). Intriguingly, the 32-bit Clozure allocates a bunch where the 64-bit Clozure doesn’t seem to do so. It looks like Allegro would be pretty competitive if it weren’t using all of this extra memory.

I’m not sure why any of them are allocating with this code. Do they allocate loop counters? loop sums? function parameters? I may delve into the assembly of some of them at a later time. But, at this point, I’m just going to focus on those that don’t cons when I’m not looking.

Even More Optimization June 23rd, 2009
Patrick Stein

In the previous article, I showed a chunk of code for multiplying a vector by a matrix. Nikodemus Silvola, on the sbcl-help mailing list, recommended that I change the matrix to a one-dimensional array instead of a two dimensional array.

Doing that takes me from ten million iterations in 1.038 seconds to ten million iterations in 0.645 seconds. That brings me an overall improvement of 14x from the original, untyped code.

Thanks, all!

For completeness, here is the final code:

(declaim (ftype (function ((simple-array single-float (12))
                           (simple-array single-float (3)))
                          (simple-array single-float (3))) mvl*))
(defun mvl* (matrix vec)
  (declare (optimize (speed 3) (safety 0)))
  (let ((ret (make-array 3 :initial-element 0.0f0
                           :element-type 'single-float)))
    (loop for jj from 0 below 3
       do (let ((offset (* jj 4)))
            (setf (aref ret jj)
                  (+ (aref matrix (+ offset 3))
                     (loop for ii from 0 below 3
                        for kk from offset below (+ offset 3)
                        summing (* (aref vec ii)
                                   (aref matrix kk))
                        single-float)))))
    ret))

To test this code, I made a ring of 4096 random matrices and a ring of 4095 random vectors. Then, I cycled through each ring as I iterated.

(let ((matrixes (make-ring-of-matrixes '(12) 4096))
      (vectors  (make-ring-of-matrixes '(3) 4095)))
  (time (loop for jj from 1 to 10000000
           for mm in matrixes
           for vv in vectors
           do (mvl* mm vv))))

Making a random matrix is simple enough.

(defun random-matrix (dims)
  (let ((ret (make-array dims :element-type 'single-float)))
    (loop for jj from 0 below (array-total-size ret)
         do (setf (row-major-aref ret jj) (random 1.0f0)))
    ret))

Making a bunch of them into a ring takes slightly more effort (in the form of tail recursion), but not much.

(defun make-ring-of-matrixes (dims count &optional ring first)
  (cond
    ((zerop count) (setf (cdr first) ring))
    (t (let ((mm (cons (random-matrix dims) ring)))
         (make-ring-of-matrixes dims (1- count) mm (or first mm))))))

Optimizing Lisp Some More June 22nd, 2009
Patrick Stein

Today, I spent a large chunk of the day trying optimize a very small chunk of code. The code takes a matrix with three rows and four columns. It multiplies it by a vector that has three rows (and a pretend fourth row that is set to one). Here is the code that I started from:

(defun mv*-nodecl (matrix vec)
  (let ((ret (make-array 3 :initial-element 0.0f0
                           :element-type 'single-float)))
    (loop for jj from 0 below 3
       do (setf (aref ret jj)
                (+ (aref matrix jj 3)
                   (loop for ii from 0 below 3
                      summing (* (aref vec ii)
                                 (aref matrix jj ii))))))
    ret))

I can do four million iterations of this function in 4.113 seconds. As such, it doesn’t seem like it’s necessarily the highest priority to optimize. But, my Lisp optimizing fu is not so great. This, being a small function, seemed like a good case study.

Compiler optimization without any type declarations

First, we’ll just see how it does if we ask SBCL to optimize for speed and throw caution to the wind.

(defun mv*-nodecl-opt (matrix vec)
  (declare (optimize (speed 3) (safety 0))
  (let ((ret (make-array 3 :initial-element 0.0f0
                           :element-type 'single-float)))
    (loop for jj from 0 below 3
       do (setf (aref ret jj)
                (+ (aref matrix jj 3)
                   (loop for ii from 0 below 3
                      summing (* (aref vec ii)
                                 (aref matrix jj ii))))))
    ret))

The compiler spits out 38 notes telling me all of the stuff it failed to optimize for me. Then, it runs four million iterations in 3.944 seconds.

Type declarations without compiler optimizations

So, what can we tackle by declaring the types of the variables and the intermediate values?

(declaim (ftype (function ((simple-array single-float (3 4))
                           (simple-array single-float (3)))
                          (simple-array single-float (3))) mv*-noopt))
(defun mv*-noopt (matrix vec)
  (let ((ret (make-array 3 :initial-element 0.0f0
                           :element-type 'single-float)))
    (loop for jj from 0 below 3
       do (setf (aref ret jj)
                (+ (aref matrix jj 3)
                   (loop for ii from 0 below 3
                      summing (* (aref vec ii)
                                 (aref matrix jj ii))
                      single-float))))
    ret))

Now, I can do four million iterations in 0.344 seconds. That’s better by a factor of eleven.

Both compiler optimizations and type declarations

Now, if I turn on compiler optimizations and turn off compiler safety:

(declaim (ftype (function ((simple-array single-float (3 4))
                           (simple-array single-float (3)))
                          (simple-array single-float (3))) mv*))
(defun mv* (matrix vec)
  (declare (optimize (speed 3) (safety 0)))
  (let ((ret (make-array 3 :initial-element 0.0f0
                           :element-type 'single-float)))
    (loop for jj from 0 below 3
       do (setf (aref ret jj)
                (+ (aref matrix jj 3)
                   (loop for ii from 0 below 3
                      summing (* (aref vec ii)
                                 (aref matrix jj ii))
                      single-float))))
    ret))

With that, I am up to 0.394 seconds for four million iterations. I’m not sure why the compiler optimization slows me down slightly. So, I think I am going to nuke the speed request at the moment and call it a day.

Other Lisp Implementations

Justin Grant pointed out in a comment below that Clozure expects the declarations of types to be inside the function instead of externally as I did using the (declaim (ftype …)). So, the numbers in the following paragraph are slightly skewed.

SBCL cleaned Clozure‘s clock. The fully typed and optimized version took 6.272 seconds with 32-bit Clozure while the untyped, unoptimized version took 6.405 seconds. The fully typed and optimized version took 2.029 seconds with 64-bit Clozure while the untyped, unoptimized version too 2.129 seconds. It is hard to say how much of that is overhead with the one to four million loop.

CLisp took 158.738 seconds for the fully typed and optimized version. I didn’t bother letting it run the others.

Here are the numbers for the Lisps that I have running on my Mac with ten million iterations:

  full-decls,
2D-matrix
full-decls,
1D-matrix
no decls,
2D-matrix
SBCL 1.0.29 1.040s 0.633s 8.903s
CMUCL 19f 1.110s 1.960s 25.250s
Clozure-64bit 1.3-r11936 1.775s 1.949s 5.290s
Clozure-32bit 1.3-r11936 8.806s 5.492s 11.455s
Allegro 8.1 (Personal) 8.044s 8.270s 43.220s
LispWorks 5.1 (Personal) 14.092s 14.548s 20.369s
ECL 9.6.2 40.143s 43.651s 38.765s
GNU CLISP 2.47 365.477s 382.488s 362.511s

Interesting tidbits from the above:

  • For the no declarations case, Allegro says it allocated -1,549,934,592 other bytes. I am assuming that it just crossed over 2^{31} and it really meant 2,745,032,704.
  • LispWorks did about 40% better when I selected Compile and Load from the File menu instead of typing (load (compile-file “tm.lisp”)) at the REPL.
  • I tried, but failed, to get GCL and ABCL installed. I may have to try those again some other time.
  • Except for CLISP and ECL, the implementations all allocated significantly more memory for the no-decls case. SBCL allocated ten times as much in the no-decls case. CLISP allocated the same in all cases. ECL allocated 20% less in the no-decls case.

Important Lessons

I ran into all sorts of compiler notes and warnings and errors while trying to optimize this code. I was trying to use (SVREF …) to reference into an array. SVREF means simple-vector reference. I am not allowed to use it on an array. I hadn’t realized before that in a simple-vector, the elements are required to have unspecified type. For a vector or a simple-array, you can specify the type of the elements. For a simple-vector, you cannot. Richard Krueter, on the sbcl-help mailing list, set me straight.

My other big stumbling block was this. The following two lines mean exactly the same thing (they both create an array of three zeros):

(make-array 3 :initial-element 0.0f0 :element-type 'single-float)
(make-array '(3) :initial-element 0.0f0 :element-type 'single-float)

If I wanted a multidimensional array, I would need to use the second form there and put a list of numbers in where the three is. For a one-dimensional array, I can use either the length as a number or a list with the length as its only element.

On the other hand, the following two lines mean very different things:

(declare (type (simple-array single-float 3) foo))
(declare (type (simple-array single-float (3)) foo))

The former means that foo is a three-dimensional array of single-precision floating point numbers with the length of each array dimension left unspecified. The latter means that foo is a one-dimensional array with three single-precision floating point elements.

How I Know I Don’t Know Enough About Lisp Macros June 19th, 2009
Patrick Stein

I have a pretty good grasp on macros in Lisp, C/C++, TeX, m4, and nroff. Lisp macros are, far and away, the most powerful in that set. In fact, Lisp macros get a bad rap by sharing a name with macros in those other languages. Still, I often hit the same wall when trying to write a Lisp macro so I know that there’s more I have to learn.

A Quick Look at Macros

First, let’s review macros for a moment to make sure we’re all on the same page. The C/C++ macros are the simplest to understand so we’ll start there. The C and C++ compilers run in two stages. First, the C preprocessor runs to expand all of your macros. Then, the compiler runs to actually compile your code. In C and C++, macros are declared using the #define command to the preprocessor. For example, you might do something like the following:
Read the rest of this entry ⇒

Design of Everyday Parentheses June 15th, 2009
Patrick Stein

This weekend, I was reading The Design of Everyday Things by Donald Norman. At one point, he is talking about automatic behaviours:

Doing several things at once is essential even in carrying out a single task. To play the piano, we must move the fingers properly over the keyboard while reading the music, manipulating the pedals and listening to the resulting sounds. But to play the piano well we should do these things automatically. Our conscious attention should be focused on the higher levels of the music, on style, and on phrasing. So it is with every skill. The low-level, physical movements should be controlled subconsciously.

[Wow. In typing that paragraph out, I see all sorts of parallels to math, volleyball, cooking, and pretty much every other area of my life. But, for this essay, I’m just going to focus on what brought me here.]

People who are new to Lisp often end up complaining The parentheses. Man, too much with the parentheses. It’s an understandable complaint, but it’s just like complaining There are too many keys on this piano. It is a complaint that will go away after a small amount of practice.

Here is one sample from many, many similar threads in the Usenet news group comp.lang.lisp:

Ken Tilton wrote:
    Bill Dubuque wrote:
        Leo Sarasua wrote:
        Let's face it: all the ((((... and )))) in Lisp are awfully unreadable.
    Seasoned Lisp programmers see past parentheses just as a seasoned
    reader of a printed language sees past the individual characters
    comprising a word.
Parentheses? What parentheses? <g> Like you say, I stopped seeing them
after about a week of serious Lisp.
...
My retort to "what about all those parentheses?" is "do spaces between
words bother you?". <g>

The parenthesis are an unconscious action. They are part of the flow, not an interruption in it. Looking back on the programming classes that I took in high school and college, most everything was at the this is the next piano key level. Part of that is out of necessity. You have to learn where your feet are before you learn to walk or to run or to dance. [And, in a classroom setting, thirty people have to learn where their feet are before any of you can move on.] But, part of it is just tragic omission.

Teachers should share some glimpses of the future: the higher levels of the music, on style, and on phrasing. Teachers should talk about how some day, you’ll no longer be tripping over the parentheses. You will be dancing with them, complex dances, with grace and power.

Live coding demos? If you watch someone skilled at programming doing it, you see that this stuff doesn’t get in the way. With practice, the syntax becomes unconscious and automatic. So, I leave you with some live coding demos. The first is Andrew Sorensen performing with the scheme package Impromptu.

Day of the Triffords from Andrew Sorensen on Vimeo.

The second is Sven Van Caekenberghe implementing a Reddit clone in Lisp in twenty minutes:

Updates In Email

Email:

l