Speedy Matrix Multiplication in Lisp, Again… June 30th, 2009
Patrick Stein

In response to my earlier post about hidden memory allocations, Jason Cornez pointed out how to rework my loop so that Allegro wouldn’t need to allocate memory. The new structure avoids using the summing keyword in (loop …) and doesn’t use the return-value from (loop …).

(declaim (ftype (function ((simple-array single-float (12))
                           (simple-array single-float (3))
                           (simple-array single-float (3)))
                          (simple-array single-float (3))) mvl*-na))
(defun mvl*-na (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec)
           (type (simple-array single-float (3)) ret)
           (optimize (speed 3) (safety 0)))
  (loop for jj fixnum from 0 below 3
     do (let ((offset (* jj 4)))
          (declare (type fixnum offset))
          (let ((current (aref matrix (+ offset 3))))
            (declare (type single-float current))
            (loop for ii fixnum from 0 below 3
               for kk fixnum from offset below (+ offset 3)
               do (incf current
                        (* (aref vec ii)
                           (aref matrix kk))))
            (setf (aref ret jj) current))))
  ret)

Now, the Allegro moved up in the rankings. All of the implementations (except 32-bit Clozure) ran slightly faster with this version. Allegro improved the most here though as its allocation/garbage dropped to nothing.

  wall non-GC alloced
SBCL 1.0.29 0.406s 0.406s 0
CMUCL 19f 0.539s 0.539s 32
Allegro 8.1 (Personal) 0.720s 0.720s 0
Clozure-64bit 1.3-r11936 1.100s 1.100s ?? 0
Clozure-32bit 1.3-r11936 6.884s 5.023s 1,680,000,000
LispWorks 5.1 (Personal) 12.587s ?? 3,120,000,516
ECL 9.6.2 30.875s ?? 17,280,000,256
GNU CLISP 2.47 88.008s 74.455s 2,160,000,000

CMUCL compiled with 30 Notes. I should look into those some time. It still ran quite fast though.

Allegro definitely liked this version better. It kept from allocating anything.

For comparison, I am including the previous runtimes here as well:

  wall non-GC alloced
SBCL 1.0.29 0.444s 0.444s 0
CMUCL 19f 0.567s 0.567s 0
Clozure-64bit 1.3-r11936 1.272s 1.272s ?? 0
Clozure-32bit 1.3-r11936 5.009s 4.418s 1,200,000,000
Allegro 8.1 (Personal) 6.131s 2.120s 1,440,000,000
LispWorks 5.1 (Personal) 14.054s ?? 3,360,000,480
ECL 9.6.2 33.009s ?? 18,240,000,256
GNU CLISP 2.47 93.190s 77.356s 2,520,000,000

2 Responses to “Speedy Matrix Multiplication in Lisp, Again…”

  1. pat
    2009-07-06 @ 10:08 PM

    As malkia noted in another thread, LispWorks has a few additional flags that keep it from wrapping single-precision floats so much. With the addition of (float 0) (safety 0), the LispWorks time drops to 0.562 seconds with only 516 bytes allocated.

  2. leonardo
    2009-09-02 @ 6:40 PM

    For comparison I have created a simple D (V1) version; it uses a matrix and an array, it’s not the faster possible version.
    You may compile it yourself with LDC (codepad runs it, but it doesn’t use any optimization flags):
    http://codepad.org/SxUHubIn
    Email me if you have questions.

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <br> <cite> <code> <dd> <del datetime=""> <dl> <dt> <em> <i> <img alt="" height="" longdesc="" src="" width=""> <ins datetime="" cite=""> <li> <ol> <p> <q cite=""> <s> <strike> <strong> <sub> <sup> <u> <ul>

l