Syntactic Corn Syrup June 16th, 2015
Patrick Stein

I’ve been bouncing around between Java and C++ and C and loads of JNI cruft in between. At some point today, I accidentally used a semicolon to separate parameters in my C function declaration:

void JNI_myJNIMethod( int paramA; int paramB; int paramC )
{
  ...
}

It looked wrong to me. But, I had one of those brain-lock moments where I couldn’t tell if it was wrong. I was pretty sure that it was wrong by the time my brain locked on pre-ANSI K&R:

void
JNI_myJNIMethod(paramA, paramB, paramC)
  int paramA;
  int paramB;
  int paramC;
{
  ...
}

Regardless, it got me thinking about the programming maxims: Deleted code has no bugs and Deleted code is debugged code.

I never have this kind of brain-lock in Lisp. Some of that is because my Emacs configuration has been molded to my Lisp habits better than to my C/C++/Java habits. Most of it, though, is that Lisp understands the difference between syntactic sugar and syntactic cruft.

Lisp decided long ago that writing code should be easy even if it makes writing the compiler tougher. C and C++ and Java all decided that LALR(1) was more important than me. As if that weren’t bad enough, C++ and Java have thrown the lexers and parsers under the bus now, too. No one gets a free ride.

Struggling to Keep the Faith April 11th, 2015
Patrick Stein

Six years ago, I wrote about how the choice of programming language dramatically affects the way in which I write code. In that post from six years ago, I said:

In Lisp, I will pull something out into a separate function if it makes the current function more self-contained, more one idea. In C++, I will only pull something out into a separate function if I need the same functionality in multiple places.

At the moment, I’m getting paid to write in C++ for the first time in two years (technically, I suppose I did write some small C++ sample code in my previous job, too). I am struggling at the moment to make my C++ functions as small as I would have made them if I were writing in Lisp.

There are obvious barriers to this imposed by C++. While one of the projects at my work is converting a large swath of code to C++11, my project is still stuck on C++98. This means that I can’t use lambda functions, auto, or for-each style for loops and that I can’t get away with letting the compiler figure out my template types in almost any situation.

For instance, I have one function that is about twenty lines of comments and ten lines of code comprising exactly three statements. Three statements took ten lines of code because when you want to use a pointer to a member function in a std::count_if call, you need to jump through about a hundred characters of declarations and syntax to turn the pointer to a member function of one argument into a regular function of two arguments (using std::mem_fun1_t) and another pile of characters to turn it into a regular function of one parameter again (using std::bind1st). And, I spent nearly an hour trying to glark the type I’d have had to declare in one of the intermediate steps to turn those three statements into four instead. I gave up.

bool MyClass::attemptToFrob( item_t& item )
{
  return item.frob( _frobRetryCount, _frobTimeoutValue );
}

bool MyClass::frobAllOfMyItems()
{
  return performItemOperationOnAllItems( &MyClass::attemptToFrob );
}

bool MyClass::performItemOperationOnAllItems(
                bool (MyClass::*op)(item_t&)
              )
{
  std::mem_fun1_t<bool,MyClass,item_t> binOp(op);
  unsigned int successes = std::count_if( _items.begin(),
                                          _items.end(),
                                          std::bind1st( binOp, this ) );
  return ( _items.size() == successes );
}

In C++11, I’d have written the function as two statements, only one of which had to span more than one line to stay under 80 columns. In C++0X, I’d at least have been able to auto an intermediate statement to hold the result of std::bind1st that I couldn’t manage to satisfy myself.

I really believe that trying to keep the functions down to three or four lines with at most one conditional is a goal that goes a long way toward readability. But, man, it sucks for writability when you’re stuck in a strongly-typed language with a compiler that isn’t at all interested in helping you out with that.

And, a pet peeve here while I’m already ranting about C++…. Am I right in believing that neither C++98, C++0X, C++11, or C++14 have any way to iterate over the keys of a std::map or to wrap a function of one argument that takes a pair and just uses the first half of the pair to invoke the original function? Something like this:

template <class R, class T, class U>
class call1st {
public:
  call1st( R (*fn)(T) ) : _fn(fn) {};
  R operator ( std::pair<T,U>& p ) { return (_fn)(p.first); };
private:
  R (*_fn)(T);
};

Most of why I would want this would be more clear with lambdas instead. But, if there is still going to be crap in the language like std::mem_fun1_ref_t and such, why is there still no functional way (in the #include <functional> sense) to get to the members of a pair? Or, I am just missing something obvious?

Keeping Server and Client Separate September 8th, 2011
Patrick Stein

The problem

Whenever I write client-server applications, I run into the same problem trying to separate the code. To send a message from the server to the client, the server has to serialize that message and the client has to unserialize that message. The server doesn’t need to unserialize that message. The client doesn’t need to serialize that message.

It seems wrong to include both the serialization code and the unserialization code in both client and server when each side will only be using 1/2 of that code. On the other hand, it seems bad to keep the serialization and unserialization code in separate places. You don’t want one side serializing A+B+C and the other side trying to unserialize A+C+D+B.

One approach: data classes

Some projects deal with this situation by making every message have its own data class. You take all of the information that you want to be in the message and plop it into a data class. You then serialize the data class and send the resulting bytes. The other side unserializes the bytes into a data class and plucks the data out of the data class.

The advantage here is that you can have some metaprogram read the data class definition and generate a serializer or unserializer as needed. You’re only out-of-sync if one side hasn’t regenerated since the data class definition changed.

The disadvantage here is that I loathe data classes. If my top-level interface is going to be (send-login username password), then why can’t I just serialize straight from there without having to create a dippy data structure to hold my opcode and two strings?

Another approach: suck it up

Who cares if the client contains both the serialization and unserialization code? Heck, if you’re really all that concerned, then fmakunbound half the universe before you save-lisp-and-die.

Of course, unless you’re using data classes, you’re either going to have code in your client that references a bunch of functions and variables that only exist in your server or your client and server will be identical except for:

(defun main ()
  #+server (server-main)
  #-server (client-main))

Now, of course, your server is going to accidentally depend on OpenGL and OpenAL and SDL and a whole bunch of other -L’s it never actually calls. Meanwhile, your client is going to accidentally depend on Postmodern and Portable-Threads and a whole bunch of other Po-‘s it never actually calls.

Another approach: tangle and weave, baby

Another way that I’ve got around this is to use literate programming tools to let me write the serialiization and unserialization right next to each other in my document. Then, anyone going to change the serialize code would be immediately confronted with the unserialize code that goes with it.

The advantage here is that you can tangle the client code through an entirely separate path than the server code keeping only what you need in each.

The disadvantage here is that now both your client code and your server code have to be in the same document or both include the same sizable chunk of document. And, while there aren’t precisely name-capturing problems, trying to include the “serialize-and-send” chunk in your function in the client code still requires that you use the same variable names that were in that chunk.

How can Lisp make this better?

In Lisp, we can get the benefits of a data-definition language and data classes without needing the data classes. Here’s a snippet of the data definition for a simple client-server protocol.

;;;; protocol.lisp
(userial:make-enum-serializer :opcode (:ping :ping-ack))
(defmessage :ping     :uint32 ping-payload)
(defmessage :ping-ack :uint32 ping-payload)

I’ve declared there are two different types of messages, each with their own opcode. Now, I have macros for define-sender and define-handler that allow me to create functions which have no control over the actual serialization and unserialization. My functions can only manipulate the named message parameters (the value of ping-payload in this case) before serialization or after unserialization but cannot change the serialization or unserialization itself.

With this protocol, the client side has to handle ping messages by sending ping-ack messages. The define-sender macro takes the opcode of the message (used to identify the message fields), the name of the function to create, the argument list for the function (which may include declarations for some or all of the fields in the message), the form to use for the address to send the resulting message to, and any body needed to set fields in the packet based on the function arguments before the serialization. The define-handler macro takes the opcode of the message (again, used to identify the message fields), the name of the function to create, the argument list for the function, the form to use for the buffer to unserialize, and any body needed to act on the unserialized message fields.

;;;; client.lisp
(define-sender  :ping-ack send-ping-ack (ping-payload) *server-address*)
(define-handler :ping     handle-ping   (buffer) buffer
  (send-ping-ack ping-payload))

The server side has a bit more work to do because it’s going to generate the sequence numbers and track the round-trip ping times.

;;;; server.lisp

(defvar *last-ping-payload* 0)
(defvar *last-ping-time*    0)

(define-sender :ping send-ping (who) (get-address-of who)
  (incf *last-ping-payload*)
  (setf *last-ping-time*    (get-internal-real-time)
        ping-payload        *last-ping-payload*))

(define-handler :ping-ack handle-ping-ack (who buffer) buffer
  (when (= ping-payload *last-ping-payload*)
    (update-ping-time who (- (get-internal-real-time) *last-ping-time*))))

Problems with the above

It feels strange to leave compile-time artifacts like the names and types of the message fields in the code after I’ve generated the functions that I’m actually going to use. But, I guess that’s just part of Lisp development. You can’t (easily) unload a package. I can makunbound a bunch of stuff after I’m loaded if I don’t want it to be convenient to modify senders or handlers at run-time.

There is intentional name-capture going on. The names of the message fields become names in the handlers. The biggest problem with this is that the defmessage calls really have to be in the same namespace as the define-sender and define-handler calls.

I still have some work to do on my macros to support &key and &optional and &aux and &rest arguments properly. I will post those macros once I’ve worked out those kinks.

Anyone care to share how they’ve tackled client-server separation before?

XML Parser Generator March 16th, 2010
Patrick Stein

A few years back (for a very generous few), we needed to parse a wide variety of XML strings. It was quite tedious to go from the XML to the native-language representations of the data (even from a DOM version). Furthermore, we needed to parse this XML both in Java and in C++.

I wrote (in Java) an XML parser generator that took an XML description of how you’d like the native-language data structures to look and where in the XML it could find the values for those data structures. The Java code-base for this was ugly, ugly, ugly. I tried several times to clean it up into something publishable. I tried to clean it up several times so that it could actually generate the parser it used to read the XML description file. Alas, the meta-ness, combined with the clunky Java code, kept me from completing the circle.

Fast forward to last week. Suddenly, I have a reason to parse a wide variety of XML strings in Objective C. I certainly didn’t want to pull out the Java parser generator and try to beat it into generating Objective C, too. That’s fortunate, too, because I cannot find any of the copies (in various states of repair) that once lurked in ~/src.

What’s a man to do? Write it in Lisp, of course.

Example

Here’s an example to show how it works. Let’s take some simple XML that lists food items on a menu:

<menu>
        <food name="Belgian Waffles" price="$5.95" calories="650">
                <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
        </food>
        <!-- ... more food entries, omitted here for brevity ... -->
</menu>

We craft an XML description of how to go from the XML into a native representation:

<parser_generator root="menu" from="/menu">
  <struct name="food item">
    <field type="string" name="name" from="@name" />
    <field type="string" name="price" from="@price" />
    <field type="string" name="description" from="/description/." />
    <field type="integer" name="calories" from="@calories" />
  </struct>

  <struct name="menu">
    <field name="menu items">
      <array>
        <array_element type="food item" from="/food" />
      </array>
    </field>
  </struct>
</parser_generator>

Now, you run the parser generator on the above input file:

% sh parser-generator.sh --language=lisp \
                           --types-package menu \
                           --reader-package menu-reader \
                           --file menu.xml

This generates two files for you: types.lisp and reader.lisp. This is what types.lisp looks like:

(defpackage :menu
  (:use :common-lisp)
  (:export #:food-item
             #:name
             #:price
             #:description
             #:calories
           #:menu
             #:menu-items))

(in-package :menu)

(defclass food-item ()
  ((name :initarg :name :type string)
   (price :initarg :price :type string)
   (description :initarg :description :type string)
   (calories :initarg :calories :type integer)))

(defclass menu ()
  ((menu-items :initarg :menu-items :type list :initform nil)))

I will not bore you with all of reader.lisp as it’s 134 lines of code you never had to write. The only part you need to worry about is the parse function which takes a stream for or pathname to the XML and returns an instance of the menu class. Here is a small snippet though:

;;; =================================================================
;;; food-item struct
;;; =================================================================
(defmethod data progn ((handler sax-handler) (item food-item) path value)
  (with-slots (name price description calories) item
    (case path
      (:|@name| (setf name value))
      (:|@price| (setf price value))
      (:|/description/.| (setf description value))
      (:|@calories| (setf calories (parse-integer value))))))

Where it’s at

I currently have the parser generator generating its own parser (five times fast). I still have a little bit more that I’d like to add to include assertions for things like the minimum number of elements in an array or the minimum value of an integer. I also have a few kinks to work out so that you can return some type other than an instance of a class for cases like this where the menu class just wraps one item.

My next step though is to get it generating Objective C parsers.

Somewhere in there, I’ll post this to a public git repository.

Casting to Integers Considered Harmful August 6th, 2009
Patrick Stein

Background

Many years back, I wrote some ambient music generation code. The basic structure of the code is this: Take one queen and twenty or so drones in a thirty-two dimensional space. Give them each random positions and velocities. Limit the velocity and acceleration of the queen more than you limit the same for the drones. Now, select some point at random for the queen to target. Have the queen accelerate toward that target. Have the drones accelerate toward the queen. Use the average distance from the drones to the queens in the i-th dimension as the volume of the i-th note where the notes are logarithmically spaced across one octave. Clip negative volumes to zero. Every so often, or when the queen gets close to the target, give the queen a new target.

It makes for some interesting ambient noise that sounds a bit like movie space noises where the lumbering enemy battleship is looming in orbit as its center portion spins to create artificial gravity within.

I started working on an iPhone application based on this code. The original code was in C++. The conversion to Objective C was fairly straightforward and fairly painless (as I used the opportunity to try to correct my own faults by breaking things out into separate functions more often).

Visualization troubles

uniform
The original code though chose random positions and velocities from uniform distributions. The iPhone app is going to involve visualization as well as auralization. The picture at the right here is a plot of five thousand points with each coordinate selected from a uniform distribution with range [-20,+20]. Because each axis value is chosen independently, it looks very unnatural.

gauss
What to do? The obvious answer is to use Gaussian random variables instead of uniform ones. The picture at the right here is five thousand points with each coordinate selected from a Gaussian distribution with a standard-deviation of 10. As you can see, this is much more natural looking.

How did I generate the Gaussians?

I have usually used the Box-Muller method of generating two Gaussian-distributed random variables given two uniformly-distributed random variables:

(defun random-gaussian ()
  (let ((u1 (random 1.0))
        (u2 (random 1.0)))
    (let ((mag (sqrt (* -2.0 (log u1))))
          (ang (* 2.0 pi u2)))
      (values (* mag (cos ang))
              (* mag (sin ang))))))

But, I found an article online that shows a more numerically stable version:

(defun random-gaussian ()
  (flet ((pick-in-circle ()
           (loop as u1 = (random 1.0)
                as u2 = (random 1.0)
                as mag-squared = (+ (* u1 u1) (* u2 u2))
                when (< mag-squared 1.0)
                return (values u1 u2 mag-squared))))
    (multiple-value-bind (u1 u2 mag-squared) (pick-in-circle)
      (let ((ww (sqrt (/ (* -2.0 (log mag-squared)) mag-squared))))
        (values (* u1 ww)
                (* u2 ww))))))

For a quick sanity check, I thought, let’s just make sure it looks like a Gaussian. Here, I showed the code in Lisp, but the original code was in Objective-C. I figured, If I just change the function declaration, I can plop this into a short C program, run a few thousand trials into some histogram buckets, and see what I get.

The trouble with zero

So, here comes the problem with zero. I had the following main loop:

#define BUCKET_COUNT 33
#define STDDEV       8.0
#define ITERATIONS   100000

  for ( ii=0; ii < ITERATIONS; ++ii ) {
    int bb = val_to_bucket( STDDEV * gaussian() );
    if ( 0 <= bb && bb < BUCKET_COUNT ) {
      ++buckets[ bb ];
    }
  }

I now present you with three different implementations of the val_to_bucket() function.

int val_to_bucket( double _val ) {
  return (int)_val + ( BUCKET_COUNT / 2 );
}

int val_to_bucket( double _val ) {
  return (int)( _val + (int)( BUCKET_COUNT / 2 ) );
}

int val_to_bucket( double _val ) {
  return (int)( _val + (int)( BUCKET_COUNT / 2 ) + 1 ) - 1;
}

As you can probably guess, after years or reading trick questions, only the last one actually works as far as my main loop is concerned. Why? Every number between -1 and +1 becomes zero when you cast the double to an integer. That’s twice as big a range as any other integer gets. So, for the first implementation, the middle bucket has about twice as many things in it as it should. For the second implementation, the first bucket has more things in it than it should. For the final implementation, the non-existent bucket before the first one is the overloaded bucket. In the end, I used this implementation instead so that I wouldn’t even bias non-existent buckets:

int val_to_bucket( double _val ) {
  return (int)lround(_val) + ( BUCKET_COUNT / 2 );
}
l