<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Trying to Unconfound Lisp Speeds</title>
	<atom:link href="http://nklein.com/2009/06/trying-to-unconfound-lisp-speeds/feed/" rel="self" type="application/rss+xml" />
	<link>http://nklein.com/2009/06/trying-to-unconfound-lisp-speeds/</link>
	<description>software development and consulting</description>
	<lastBuildDate>Sun, 20 May 2012 18:07:58 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: pat</title>
		<link>http://nklein.com/2009/06/trying-to-unconfound-lisp-speeds/comment-page-1/#comment-136</link>
		<dc:creator>pat</dc:creator>
		<pubDate>Tue, 30 Jun 2009 18:24:06 +0000</pubDate>
		<guid isPermaLink="false">http://nklein.com/?p=616#comment-136</guid>
		<description>Indeed, that is a great deal faster, and without allocating.  Thank you.  I think I will write the numbers up in a separate article.  Reworking it with &lt;em&gt;(incf&#160;...)&lt;/em&gt; instead, and using a local variable to avoid evaluating &lt;em&gt;(aref&#160;ret&#160;jj)&lt;/em&gt; multiple times, resulted in 0.40 seconds for SBCL and 0.72 seconds for Allegro.</description>
		<content:encoded><![CDATA[<p>Indeed, that is a great deal faster, and without allocating.  Thank you.  I think I will write the numbers up in a separate article.  Reworking it with <em>(incf&nbsp;&#8230;)</em> instead, and using a local variable to avoid evaluating <em>(aref&nbsp;ret&nbsp;jj)</em> multiple times, resulted in 0.40 seconds for SBCL and 0.72 seconds for Allegro.</p>
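<p>A minimal sketch of that kind of rework, for illustration only: the name <em>mvl*-acc</em> and the <em>sum</em> variable are hypothetical, and this is not necessarily the exact code behind the timings above.  It accumulates each row in a local <em>single-float</em> and writes to <em>ret</em> once per row.</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><pre class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">;; Hypothetical variant: accumulate in a typed local, store once per row.
(defun mvl*-acc (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec ret)
           (optimize (speed 3) (safety 0)))
  (dotimes (jj 3 ret)
    (let* ((offset (* jj 4))
           ;; start from the fourth entry of the row, as in the original
           (sum (aref matrix (+ offset 3))))
      (declare (type fixnum offset)
               (type single-float sum))
      (dotimes (ii 3)
        (incf sum (* (aref vec ii) (aref matrix (+ offset ii)))))
      (setf (aref ret jj) sum))))</pre></div>
<p>Declaring <em>sum</em> as a <em>single-float</em> is what gives the compiler a chance to keep it unboxed; whether it actually does so depends on the implementation.</p>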
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Cornez</title>
		<link>http://nklein.com/2009/06/trying-to-unconfound-lisp-speeds/comment-page-1/#comment-135</link>
		<dc:creator>Jason Cornez</dc:creator>
		<pubDate>Tue, 30 Jun 2009 14:11:02 +0000</pubDate>
		<guid isPermaLink="false">http://nklein.com/?p=616#comment-135</guid>
		<description>In Allegro 8.1, try the following formulation instead. It avoids the boxing and results in no extra memory allocation. Hence it is quite a bit faster. Unless I’ve made a silly mistake, it should compute the same result…

-Jason

&lt;code&gt;
(defun mvl*-acl (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec)
           (type (simple-array single-float (3)) ret)
           (optimize (speed 3) (safety 0)))
  (loop for jj fixnum from 0 below 3
     do (let ((offset (* jj 4)))
          (declare (type fixnum offset))
          (setf (aref ret jj) (aref matrix (+ offset 3)))
          (loop for ii fixnum from 0 below 3
              for kk fixnum from offset below (+ offset 3)
              do (incf (aref ret jj) (* (aref vec ii)
                                        (aref matrix kk))))))
  ret)
&lt;/code&gt;

</description>
		<content:encoded><![CDATA[<p>In Allegro 8.1, try the following formulation instead. It avoids the boxing and results in no extra memory allocation. Hence it is quite a bit faster. Unless I’ve made a silly mistake, it should compute the same result…</p>
<p>-Jason</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><pre class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(defun mvl*-acl (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec)
           (type (simple-array single-float (3)) ret)
           (optimize (speed 3) (safety 0)))
  (loop for jj fixnum from 0 below 3
     do (let ((offset (* jj 4)))
          (declare (type fixnum offset))
          (setf (aref ret jj) (aref matrix (+ offset 3)))
          (loop for ii fixnum from 0 below 3
              for kk fixnum from offset below (+ offset 3)
              do (incf (aref ret jj) (* (aref vec ii)
                                        (aref matrix kk))))))
  ret)</pre></div>
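<p>A small usage sketch, assuming the definition above has been compiled.  The input values are made up for illustration.  Because the function is compiled with <em>(safety 0)</em>, the arguments really must be specialized <em>single-float</em> arrays of the declared sizes.</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><pre class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">;; Illustrative inputs: three rows of four entries, stored row by row.
(let ((matrix (make-array 12 :element-type 'single-float
                             :initial-contents '(1.0 0.0 0.0 10.0
                                                 0.0 1.0 0.0 20.0
                                                 0.0 0.0 1.0 30.0)))
      (vec (make-array 3 :element-type 'single-float
                         :initial-contents '(1.0 2.0 3.0)))
      (ret (make-array 3 :element-type 'single-float
                         :initial-element 0.0)))
  ;; The result is written into RET, which is also returned.
  (mvl*-acl matrix vec ret))
;; => #(11.0 22.0 33.0)</pre></div>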
]]></content:encoded>
	</item>
	<item>
		<title>By: pat</title>
		<link>http://nklein.com/2009/06/trying-to-unconfound-lisp-speeds/comment-page-1/#comment-130</link>
		<dc:creator>pat</dc:creator>
		<pubDate>Sun, 28 Jun 2009 17:16:12 +0000</pubDate>
		<guid isPermaLink="false">http://nklein.com/?p=616#comment-130</guid>
		<description>It does seem likely that boxing and unboxing floats is what causes the allocations.  I would have hoped some of that could be done on the stack instead of in the heap, but....

If I get some time in the near future, I may explore the respective documentation to see how one is &quot;supposed to&quot; do such things with minimal memory thrashing.

Thanks....</description>
		<content:encoded><![CDATA[<p>It does seem likely that boxing and unboxing floats is what causes the allocations.  I would have hoped some of that could be done on the stack instead of in the heap, but&#8230;.</p>
<p>If I get some time in the near future, I may explore the respective documentation to see how one is &#8220;supposed to&#8221; do such things with minimal memory thrashing.</p>
<p>Thanks&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathan</title>
		<link>http://nklein.com/2009/06/trying-to-unconfound-lisp-speeds/comment-page-1/#comment-129</link>
		<dc:creator>Nathan</dc:creator>
		<pubDate>Sat, 27 Jun 2009 18:24:59 +0000</pubDate>
		<guid isPermaLink="false">http://nklein.com/?p=616#comment-129</guid>
		<description>The consing likely comes from allocating boxed single floats in the generic AREF routines and/or the arithmetic routines.  SBCL and CMUCL know how to inline AREF on single-float arrays and the arithmetic routines so they don&#039;t have to cons.  Allegro and LispWorks should be able to do that too; LispWorks might require a (FLOAT 0) optimization setting or similar.  I should think Clozure CL can do that too, although the 32-bit version obviously doesn&#039;t.  I do know that Clozure has different boxed representations for single-floats in 32-bit vs. 64-bit (the 64-bit boxed representation doesn&#039;t require allocating any extra memory), so maybe Clozure is not inlining at all, but just relying on the generic version, which happens not to cons on 64-bit implementations.</description>
		<content:encoded><![CDATA[<p>The consing likely comes from allocating boxed single floats in the generic AREF routines and/or the arithmetic routines.  SBCL and CMUCL know how to inline AREF on single-float arrays and the arithmetic routines so they don&#8217;t have to cons.  Allegro and LispWorks should be able to do that too; LispWorks might require a (FLOAT 0) optimization setting or similar.  I should think Clozure CL can do that too, although the 32-bit version obviously doesn&#8217;t.  I do know that Clozure has different boxed representations for single-floats in 32-bit vs. 64-bit (the 64-bit boxed representation doesn&#8217;t require allocating any extra memory), so maybe Clozure is not inlining at all, but just relying on the generic version, which happens not to cons on 64-bit implementations.</p>
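<p>A rough sketch of what such a declaration could look like, assuming LispWorks accepts a FLOAT optimize quality as suggested above.  The function name <em>mvl*-lw</em> is hypothetical, and the read-time conditional hides the extra quality from other implementations.</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><pre class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">;; Assumption: (float 0) is only meaningful on LispWorks, so it is guarded
;; with #+lispworks and skipped by the reader everywhere else.
(defun mvl*-lw (matrix vec ret)
  (declare (type (simple-array single-float (12)) matrix)
           (type (simple-array single-float (3)) vec ret)
           (optimize (speed 3) (safety 0) #+lispworks (float 0)))
  (dotimes (jj 3 ret)
    (let ((offset (* jj 4)))
      (declare (type fixnum offset))
      ;; the fourth entry of the row plus the dot product of its
      ;; first three entries with VEC
      (setf (aref ret jj)
            (+ (aref matrix (+ offset 3))
               (* (aref vec 0) (aref matrix offset))
               (* (aref vec 1) (aref matrix (+ offset 1)))
               (* (aref vec 2) (aref matrix (+ offset 2))))))))</pre></div>
<p>Whether this actually avoids consing would need to be checked per implementation, for example by wrapping a loop of calls in <em>(time&nbsp;&#8230;)</em> and looking at the reported allocation.</p>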
]]></content:encoded>
	</item>
</channel>
</rss>

