Today, I spent a large chunk of the day trying optimize a very small chunk of code. The code takes a matrix with three rows and four columns. It multiplies it by a vector that has three rows (and a pretend fourth row that is set to one). Here is the code that I started from:
(defun mv*nodecl (matrix vec)
(let ((ret (makearray 3 :initialelement 0.0f0
:elementtype 'singlefloat)))
(loop for jj from 0 below 3
do (setf (aref ret jj)
(+ (aref matrix jj 3)
(loop for ii from 0 below 3
summing (* (aref vec ii)
(aref matrix jj ii))))))
ret))
I can do four million iterations of this function in 4.113 seconds. As such, it doesn’t seem like it’s necessarily the highest priority to optimize. But, my Lisp optimizing fu is not so great. This, being a small function, seemed like a good case study.
Compiler optimization without any type declarations
First, we’ll just see how it does if we ask SBCL to optimize for speed and throw caution to the wind.
(defun mv*nodeclopt (matrix vec)
(declare (optimize (speed 3) (safety 0))
(let ((ret (makearray 3 :initialelement 0.0f0
:elementtype 'singlefloat)))
(loop for jj from 0 below 3
do (setf (aref ret jj)
(+ (aref matrix jj 3)
(loop for ii from 0 below 3
summing (* (aref vec ii)
(aref matrix jj ii))))))
ret))
The compiler spits out 38 notes telling me all of the stuff it failed to optimize for me. Then, it runs four million iterations in 3.944 seconds.
Type declarations without compiler optimizations
So, what can we tackle by declaring the types of the variables and the intermediate values?
(declaim (ftype (function ((simplearray singlefloat (3 4))
(simplearray singlefloat (3)))
(simplearray singlefloat (3))) mv*noopt))
(defun mv*noopt (matrix vec)
(let ((ret (makearray 3 :initialelement 0.0f0
:elementtype 'singlefloat)))
(loop for jj from 0 below 3
do (setf (aref ret jj)
(+ (aref matrix jj 3)
(loop for ii from 0 below 3
summing (* (aref vec ii)
(aref matrix jj ii))
singlefloat))))
ret))
Now, I can do four million iterations in 0.344 seconds. That’s better by a factor of eleven.
Both compiler optimizations and type declarations
Now, if I turn on compiler optimizations and turn off compiler safety:
(declaim (ftype (function ((simplearray singlefloat (3 4))
(simplearray singlefloat (3)))
(simplearray singlefloat (3))) mv*))
(defun mv* (matrix vec)
(declare (optimize (speed 3) (safety 0)))
(let ((ret (makearray 3 :initialelement 0.0f0
:elementtype 'singlefloat)))
(loop for jj from 0 below 3
do (setf (aref ret jj)
(+ (aref matrix jj 3)
(loop for ii from 0 below 3
summing (* (aref vec ii)
(aref matrix jj ii))
singlefloat))))
ret))
With that, I am up to 0.394 seconds for four million iterations. I’m not sure why the compiler optimization slows me down slightly. So, I think I am going to nuke the speed request at the moment and call it a day.
Other Lisp Implementations
Justin Grant pointed out in a comment below that Clozure expects the declarations of types to be inside the function instead of externally as I did using the (declaim (ftype …)). So, the numbers in the following paragraph are slightly skewed.
SBCL cleaned Clozure‘s clock. The fully typed and optimized version took 6.272 seconds with 32bit Clozure while the untyped, unoptimized version took 6.405 seconds. The fully typed and optimized version took 2.029 seconds with 64bit Clozure while the untyped, unoptimized version too 2.129 seconds. It is hard to say how much of that is overhead with the one to four million loop.
CLisp took 158.738 seconds for the fully typed and optimized version. I didn’t bother letting it run the others.
Here are the numbers for the Lisps that I have running on my Mac with ten million iterations:

fulldecls, 2Dmatrix 
fulldecls, 1Dmatrix 
no decls, 2Dmatrix 
SBCL 1.0.29 
1.040s 
0.633s 
8.903s 
CMUCL 19f 
1.110s 
1.960s 
25.250s 
Clozure64bit 1.3r11936 
1.775s 
1.949s 
5.290s 
Clozure32bit 1.3r11936 
8.806s 
5.492s 
11.455s 
Allegro 8.1 (Personal) 
8.044s 
8.270s 
43.220s 
LispWorks 5.1 (Personal) 
14.092s 
14.548s 
20.369s 
ECL 9.6.2 
40.143s 
43.651s 
38.765s 
GNU CLISP 2.47 
365.477s 
382.488s 
362.511s 
Interesting tidbits from the above:
 For the no declarations case, Allegro says it allocated
1,549,934,592 other bytes
. I am assuming that it just crossed over and it really meant 2,745,032,704.
 LispWorks did about 40% better when I selected Compile and Load from the File menu instead of typing (load (compilefile “tm.lisp”)) at the REPL.
 I tried, but failed, to get GCL and ABCL installed. I may have to try those again some other time.
 Except for CLISP and ECL, the implementations all allocated significantly more memory for the nodecls case. SBCL allocated ten times as much in the nodecls case. CLISP allocated the same in all cases. ECL allocated 20% less in the nodecls case.
Important Lessons
I ran into all sorts of compiler notes and warnings and errors while trying to optimize this code. I was trying to use (SVREF …) to reference into an array. SVREF means simplevector reference. I am not allowed to use it on an array. I hadn’t realized before that in a simplevector, the elements are required to have unspecified type. For a vector or a simplearray, you can specify the type of the elements. For a simplevector, you cannot. Richard Krueter, on the sbclhelp mailing list, set me straight.
My other big stumbling block was this. The following two lines mean exactly the same thing (they both create an array of three zeros):
(makearray 3 :initialelement 0.0f0 :elementtype 'singlefloat)
(makearray '(3) :initialelement 0.0f0 :elementtype 'singlefloat)
If I wanted a multidimensional array, I would need to use the second form there and put a list of numbers in where the three is. For a onedimensional array, I can use either the length as a number or a list with the length as its only element.
On the other hand, the following two lines mean very different things:
(declare (type (simplearray singlefloat 3) foo))
(declare (type (simplearray singlefloat (3)) foo))
The former means that foo is a threedimensional array of singleprecision floating point numbers with the length of each array dimension left unspecified. The latter means that foo is a onedimensional array with three singleprecision floating point elements.