Some simple tools for processing debug output and log files August 29th, 2012
Patrick Stein

In my current job, we do a great deal of “ex post facto” debugging from debug data that we record during operations. To help me do off-the-cuff analysis of these, I make heavy use of egrep and sed. I have created three scripts that encapsulate the most common patterns where I had complicated sed lines before.

The three scripts are:

  • clip – used to pluck out a particular portion of a line based on a regex of what comes before the portion and a regex of what comes after it
  • clipc – like clip except counts the number of occurrences of each unique plucked-out portion
  • clips – like clip but assumes the plucked out portion is numeric and outputs the sum of all of the plucked out portions

For example, if I had a snippet of Apache log files:

127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET / HTTP/1.1" 500 606 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET /favicon.ico HTTP/1.1" 200 766 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET / HTTP/1.1" 500 612 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET /favicon.ico HTTP/1.1" 200 766 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:53:10 +0300] "GET / HTTP/1.1" 500 612 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/1.0" 200 3700 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /style.css HTTP/1.1" 200 614 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /img/pti-round.jpg HTTP/1.1" 200 17524 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:21 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:54:51 +0300] "GET / HTTP/1.1" 200 34 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:54:51 +0300] "GET /favicon.ico HTTP/1.1" 200 11514 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:54:53 +0300] "GET /cgi/pti.pl HTTP/1.1" 500 617 "http:/contact.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/0.9" 200 3700 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:58:27 +0300] "GET / HTTP/1.1" 200 3700 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:58:34 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:58:45 +0300] "GET /talks/Fundamentals/read-excel-file.html HTTP/1.1" 404 311 "http://pti.local/unix_sysadmin.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"

Then, I might like to quickly see how often different URLs are fetched. I could do this with:

% clipc GET < apache.log | sort -n
1 /cgi/pti.pl
1 /img/pti-round.jpg
1 /style.css
1 /talks/Fundamentals/read-excel-file.html
2 /unix_sysadmin.html
3 /favicon.ico
7 /

If I want to see how often GET is done from various IP addresses, I could do this:

% clipc '^' '- .* "GET' < apache.log
8 127.0.0.1
2 139.12.0.2
6 217.0.22.3

If I wanted to add up the number of bytes sent sending successful pages, do:

% clips '" 200' < apache.log
50078

The clip program defaults to having the what-comes-before regex matching open paren, open bracket, double quote, single quote, or equal sign. It defaults to having the what-comes-after regex matching close paren, close bracket, double quote, single quote, comma, or whitespace. So, if I wanted just the first few dates from the above, I wouldn’t need any arguments:

% clip < apache.log | head -3
10/Apr/2007:10:39:11
10/Apr/2007:10:39:11
10/Apr/2007:10:40:54

Or, I might be interested in my busiest minute:

% clipc '\[' ':\d\d ' < apache.log | sort -nr | head -1
8 10/Apr/2007:10:54

Implementations of the Above Scripts

The clip could be written simply in a few lines of Perl:

my $pre  = shift || '[\[("\'=]';
my $post = shift || '[,\s"\')\]]';

while ( my $line = <> ) {
  # look for $pre followed by whitespace then the (non-greedy) portion to keep
  # followed by some (non-greedy) amount of whitespace followed by $post.
  print "$1\n" if ( $line =~ m{$pre\s*(.*?)\s*?$post}o );
}

From there, clipc is simply:

#!/bin/sh
clip "$@" | sort | uniq -c | sed -e 's/^ *//'

And, clips is simply:

#!/bin/sh
clip "$@" | awk '// { SUM += $1 } END { printf("%d\n", SUM) }'

Visualizing Galois Fields (Follow-up) May 21st, 2012
Patrick Stein

In my previous article, I should have finished by remapping the multiplication and addition tables so that the multiplication table was in cyclic order. In cyclic order, the zeroth column (or row) represents zero and the (i+1)-th column (or row) (for i \neq 0) represents a^i for some generator a. As such, the multiplication table is simply 0 in the zeroth row and column and 1 + ((i+j) \mod 255) for the spot (i+1,j+1).

Once resorted by the order of the powers of a generator a, the multiplication table look the same regardless of the a. The addition, however, looks different for different generators. Below are the addition tables for two of them: a = (1+x) on the right and a = x^3 + x^6 + x^7 on the left. They make as decent a stereo as any of the other pairs so far.

 

Here’s the code that I used to generate the remapping for a given generator.

(defun make-cyclic-remapping (fn generator &optional (limit 256))
  (let ((to (make-array (list limit) :initial-element 0))
        (from (make-array (list limit) :initial-element 0))
        (used (make-hash-table :test #'equal)))
    ;; fill up the lookup tables
    (setf (gethash 0 used) t)
    (nlet fill-tables ((exp 1) (acc 1))
      (when (< exp limit)
        (setf (aref to exp) acc
              (aref from acc) exp
              (gethash acc used) t)
        (fill-tables (1+ exp) (funcall fn acc generator))))
    ;; return a closure around the lookup tables
    (when (= (hash-table-count used) limit)
      (lambda (direction n)
        (ecase direction
          (:to (aref to n))
          (:from (aref from n)))))))

If you’ve read it, you can probably tell by my code that I’m still under the influence of Let Over Lambda. If you haven’t read it, it is quite worth the read.

Then, I used a modified version of the draw-heightmap function which also takes in the remapping function.

(defun draw-mapped-heightmap (op map filename &optional (limit 256))
  (let ((png (make-instance 'zpng:pixel-streamed-png
                            :color-type :grayscale
                            :width limit
                            :height limit)))
    (with-open-file (stream filename
                            :direction :output
                            :if-does-not-exist :create
                            :element-type '(unsigned-byte 8))
      (zpng:start-png png stream)
      (dotimes (xx limit)
        (dotimes (yy limit)
          (let* ((a (funcall map :to xx))
                 (b (funcall map :to yy))
                 (c (funcall map :from (funcall op a b))))
            (zpng:write-pixel (list c) png))))
      (zpng:finish-png png)))
  (values))

The Anti-Cons February 27th, 2012
Patrick Stein

Motivation

I no longer remember what led me to this page of synthetic division implemented in various languages. The author provides a common lisp implementation for taking a list representing the coefficients of a polynomial in one variable x and a number \alpha and returning the result of dividing the polynomial by (x-\alpha).

The author states: I’m very sure this isn’t considered Lispy and would surely seem like an awkward port from an extremely Algol-like mindset in the eyes of a seasoned Lisper. In the mood for the exercise, I reworked his code snippet into slightly more canonical lisp while leaving the basic structure the same:

(defun synthetic-division (polynomial divisor)                                  
  (let* ((result (first polynomial))                                            
         (quotient (list result)))                                              
    (dolist (coefficient (rest polynomial))                                    
      (setf result (* result divisor))                                          
      (incf result coefficient)                                                
      (push result quotient))                                                  
    (let ((remainder (pop quotient)))                                          
      (list :quotient (nreverse quotient) :remainder remainder))))

From there, I went on to implement it using tail recursion to get rid of the #'setf and #'incf and #'push:

(defun synthetic-division-2 (polynomial divisor)                                
  (labels ((divide (coefficients remainder quotient)                            
             (if coefficients                                                  
                 (divide (rest coefficients)                                    
                         (+ (* divisor remainder) (first coefficients))        
                         (list* remainder quotient))                            
               (list :quotient (reverse quotient) :remainder remainder))))      
    (divide (rest polynomial) (first polynomial) nil)))

What I didn’t like about this was the complexity of calling the tail-recursive portion. If I just called it like I wished to (divide polynomial 0 nil) then I ended up with one extra coefficient in the answer. This wouldn’t do.

The Joke

There’s an old joke about a physicist, a biologist, and a mathematician who were having lunch at an outdoor café. Just as their food was served, they noticed a couple walk into the house across the street. As they were finishing up their meal, they saw three people walk out of that very same house.

The physicist said, We must have miscounted. Three must have entered before.

The biologist said, They must have pro-created.

The mathematician said, If one more person enters that house, it will again be empty.

The Anti-Cons

What I needed was a more-than-empty list. I needed a negative cons-cell. I needed something to put in place of the nil in (divide polynomial 0 nil) that would annihilate the first thing it was cons-ed to.

I haven’t come up with the right notation to make this clear. It is somewhat like a quasigroup except that there is only one inverse element for all other elements. Let’s denote this annihilator: \omega. Let’s denote list concatenation with \oplus.

Only having one inverse-ish element means we have to give up associativity. For lists a and b, evaluating (a \oplus \omega) \oplus b equals b, but a \oplus (\omega \oplus b) equals a.

Associativity is a small price to pay though for a prettier call to my tail-recursive function, right?

The Basic Operations

For basic operations, I’m going to need the anti-cons itself and a lazy list of an arbitrary number of anti-conses.

(defconstant anti-cons 'anti-cons)

(defclass anti-cons-list ()
  ((length :initarg :length :reader anti-cons-list-length))
  (:default-initargs :length 1))

(defmethod print-object ((obj anti-cons-list) stream)
  (print-unreadable-object (obj stream)
    (prin1 (loop :for ii :from 1 :to (anti-cons-list-length obj)
              :collecting 'anti-cons)
           stream)))

Then, I’m going to make some macros to define generic functions named by adding a minus-sign to the end of a Common Lisp function. The default implementation will simply be the common-lisp function.

(defmacro defun- (name (&rest args) &body methods)
  (let ((name- (intern (concatenate 'string (symbol-name name) "-"))))
    `(defgeneric ,name- (,@args)
       (:method (,@args) (,name ,@args))
       ,@methods)))

I’m even going to go one step further for single-argument functions where I want to override the body for my lazy list of anti-conses using a single form for the body:

(defmacro defun1- (name (arg) a-list-form &body body)
  `(defun- ,name (,arg)
     (:method ((,arg anti-cons-list)) ,a-list-form)
     ,@body))

I need #'cons- to set the stage. I need to be able to cons an anti-cons with a normal list. I need to be able to cons an anti-cons with a list of anti-conses. And, I need to be able to cons something other than an anti-cons with a list of anti-conses.

(defun- cons (a b)
  (:method ((a (eql 'anti-cons)) (b list))
    (if (null b)
        (make-instance 'anti-cons-list)
        (cdr b)))
 
  (:method ((a (eql 'anti-cons)) (b anti-cons-list))
    (make-instance 'anti-cons-list :length (1+ (anti-cons-list-length b))))
 
  (:method (a (b anti-cons-list))
    (let ((b-len (anti-cons-list-length b)))
      (when (> b-len 1)
        (make-instance 'anti-cons-list :length (1- b-len))))))

Now, I can go on to define some simple functions that can take either anti-cons lists or regular Common Lisp lists.

(defun1- length (a) (- (anti-cons-list-length a))

(defun1- car (a) :anti-cons)

(defun1- cdr (a)
  (let ((a-len (anti-cons-list-length a)))
    (when (> a-len 1)
      (make-instance 'anti-cons-list :length (1- a-len)))))

(defun1- cadr (a) (when (> (anti-cons-list-length a) 1) :anti-cons))
(defun1- caddr (a) (when (> (anti-cons-list-length a) 2) :anti-cons))

To give a feel for how this all fits together, here’s a little interactive session:

ANTI-CONS> (cons- anti-cons nil)
#<(ANTI-CONS)>

ANTI-CONS> (cons- anti-cons *)
#<(ANTI-CONS ANTI-CONS)>

ANTI-CONS> (cons- :a *)
#<(ANTI-CONS)>

ANTI-CONS> (length- *)
-1

Denouement

Only forty or fifty lines of code to go from:

(divide (rest polynomial) (first polynomial) nil)

To this:

(divide polynomial 0 (cons anti-cons nil))

Definitely worth it.

My Favorite Macro Patterns February 17th, 2012
Patrick Stein

I read Jorge Tavares’s article on Macro Patterns a few days ago.  I was thinking about replying to mention a few of my favorites:

  • The with- pattern which makes sure a special variable is bound for the body and makes sure the tied resources are released at the end of the block.
  • Macros which collect content (usually into a special variable) so they can do something with the content at the end of the close of the macro.

Then, I was working on something tonight when I re-discovered a favorite pattern that I’d forgotten about: Putting multiple wrappers on the same body.

I am working on an HTML+JavaScript+CSS project. In the end, I need static files. But, I thought I would use the opportunity to really experience CL-Who, Parenscript, and CSS-Lite.

I have now made a macro called define-web-file which takes a CL-Who or Parenscript body and wraps it up as both a Hunchentoot handler and a write-to-file wrapper. Now, I can test interactively with Hunchentoot and generate the whole web application when I’m ready.

Keeping Server and Client Separate September 8th, 2011
Patrick Stein

The problem

Whenever I write client-server applications, I run into the same problem trying to separate the code. To send a message from the server to the client, the server has to serialize that message and the client has to unserialize that message. The server doesn’t need to unserialize that message. The client doesn’t need to serialize that message.

It seems wrong to include both the serialization code and the unserialization code in both client and server when each side will only be using 1/2 of that code. On the other hand, it seems bad to keep the serialization and unserialization code in separate places. You don’t want one side serializing A+B+C and the other side trying to unserialize A+C+D+B.

One approach: data classes

Some projects deal with this situation by making every message have its own data class. You take all of the information that you want to be in the message and plop it into a data class. You then serialize the data class and send the resulting bytes. The other side unserializes the bytes into a data class and plucks the data out of the data class.

The advantage here is that you can have some metaprogram read the data class definition and generate a serializer or unserializer as needed. You’re only out-of-sync if one side hasn’t regenerated since the data class definition changed.

The disadvantage here is that I loathe data classes. If my top-level interface is going to be (send-login username password), then why can’t I just serialize straight from there without having to create a dippy data structure to hold my opcode and two strings?

Another approach: suck it up

Who cares if the client contains both the serialization and unserialization code? Heck, if you’re really all that concerned, then fmakunbound half the universe before you save-lisp-and-die.

Of course, unless you’re using data classes, you’re either going to have code in your client that references a bunch of functions and variables that only exist in your server or your client and server will be identical except for:

(defun main ()
  #+server (server-main)
  #-server (client-main))

Now, of course, your server is going to accidentally depend on OpenGL and OpenAL and SDL and a whole bunch of other -L’s it never actually calls. Meanwhile, your client is going to accidentally depend on Postmodern and Portable-Threads and a whole bunch of other Po-‘s it never actually calls.

Another approach: tangle and weave, baby

Another way that I’ve got around this is to use literate programming tools to let me write the serialiization and unserialization right next to each other in my document. Then, anyone going to change the serialize code would be immediately confronted with the unserialize code that goes with it.

The advantage here is that you can tangle the client code through an entirely separate path than the server code keeping only what you need in each.

The disadvantage here is that now both your client code and your server code have to be in the same document or both include the same sizable chunk of document. And, while there aren’t precisely name-capturing problems, trying to include the “serialize-and-send” chunk in your function in the client code still requires that you use the same variable names that were in that chunk.

How can Lisp make this better?

In Lisp, we can get the benefits of a data-definition language and data classes without needing the data classes. Here’s a snippet of the data definition for a simple client-server protocol.

;;;; protocol.lisp
(userial:make-enum-serializer :opcode (:ping :ping-ack))
(defmessage :ping     :uint32 ping-payload)
(defmessage :ping-ack :uint32 ping-payload)

I’ve declared there are two different types of messages, each with their own opcode. Now, I have macros for define-sender and define-handler that allow me to create functions which have no control over the actual serialization and unserialization. My functions can only manipulate the named message parameters (the value of ping-payload in this case) before serialization or after unserialization but cannot change the serialization or unserialization itself.

With this protocol, the client side has to handle ping messages by sending ping-ack messages. The define-sender macro takes the opcode of the message (used to identify the message fields), the name of the function to create, the argument list for the function (which may include declarations for some or all of the fields in the message), the form to use for the address to send the resulting message to, and any body needed to set fields in the packet based on the function arguments before the serialization. The define-handler macro takes the opcode of the message (again, used to identify the message fields), the name of the function to create, the argument list for the function, the form to use for the buffer to unserialize, and any body needed to act on the unserialized message fields.

;;;; client.lisp
(define-sender  :ping-ack send-ping-ack (ping-payload) *server-address*)
(define-handler :ping     handle-ping   (buffer) buffer
  (send-ping-ack ping-payload))

The server side has a bit more work to do because it’s going to generate the sequence numbers and track the round-trip ping times.

;;;; server.lisp

(defvar *last-ping-payload* 0)
(defvar *last-ping-time*    0)

(define-sender :ping send-ping (who) (get-address-of who)
  (incf *last-ping-payload*)
  (setf *last-ping-time*    (get-internal-real-time)
        ping-payload        *last-ping-payload*))

(define-handler :ping-ack handle-ping-ack (who buffer) buffer
  (when (= ping-payload *last-ping-payload*)
    (update-ping-time who (- (get-internal-real-time) *last-ping-time*))))

Problems with the above

It feels strange to leave compile-time artifacts like the names and types of the message fields in the code after I’ve generated the functions that I’m actually going to use. But, I guess that’s just part of Lisp development. You can’t (easily) unload a package. I can makunbound a bunch of stuff after I’m loaded if I don’t want it to be convenient to modify senders or handlers at run-time.

There is intentional name-capture going on. The names of the message fields become names in the handlers. The biggest problem with this is that the defmessage calls really have to be in the same namespace as the define-sender and define-handler calls.

I still have some work to do on my macros to support &key and &optional and &aux and &rest arguments properly. I will post those macros once I’ve worked out those kinks.

Anyone care to share how they’ve tackled client-server separation before?

Updates In Email

Email:

l