Exploring Clean Architecture February 4th, 2016
Patrick Stein

This is the first in what will likely be a series of blog posts about Clean Architecture. Uncle Bob Martin has written numerous blog posts and given lots of talks about it.

The goal of Clean Architecture is to have the directory structure of your application shout out what your application does rather than what framework was used to present your application or what database is nestled in the depths of your application. Your program is divided into Entities, Use Cases, and Interface Adapters.

Entities encapsulate “Enterprise-wide business rules.” Use Cases encapsulate “Application-specific business rules.” Interfaces and Adapters represent how your Use Cases want to interact with the outside world (e.g. databases, users, printers, etc.).

In Clean Architecture, the Entities cannot know that the Use Cases exist and the Use Cases cannot know anything about the Adapters except for the Interface to them which is defined by the Use Case rather than by the Adapter. The Use Case does not know whether the application is being used from the command-line or from the web or from a remote service calling into it. The Use Case does not know whether the data is being stored in the file system or in a relational database (or conjured from the ether as needed). Nothing in the Adapters can know anything about the Entities.

Simple Example

I have a project that I am just starting. I thought I would use this new project to see how Clean Architecture works for me.

There are a large number of talks and videos about Clean Architecture. However, there are not many examples of it despite several years of Stack Overflow questions and blog posts asking for examples.

There are a few simple examples around the web. The most notable is Mark Paluch’s Clean Architecture Example. It is just big enough to get a sense of how things hang together. If you’re willing to put up with Java’s insane directory hierarchies, you can get a pretty good idea of what the application does just by poking around the Use Cases directory.

My First Use Case

My first Use Case is to let the User browse a list of Book Summaries. The User should be able to sort by Title, Author, Publication Date, or Date the Book was acquired. The User should be able to filter the list based upon Genre or Keyword. The Use Case should allow the caller to implement pagination, so the Use Case needs to support returning up to a given number of Book Summaries starting with a specific number.

Some might argue that that is multiple Use Cases glommed together. If that were the case, then I would need some way to pipeline together Use Cases if I’m going to make any sort of reasonably navigable app atop my Use Cases.

But, let’s start with baby steps.

The Simplified Version of my First Use Case

Let’s just say the User wishes to see a list of all of the Book Summaries. The User is fine with seeing all of them at once in whatever order my app sees fit.

This simple version of the Use Case is implemented in an accompanying repository under the tag the-dream.

Class Diagram (explained below)

The architecture consists of some simple structures with no “business logic” in them at all: book-summary and book.

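(defstruct book-summary
  ;; slots inferred from the keywords passed to make-book-summary below
  isbn title author cover-page-thumbnail)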

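(defstruct book
  ;; slots inferred from the accessors used in summarize-book below
  isbn title author cover-page-thumbnail)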

There is one use case, browse-books, which defines the use-case interface browse-books-use-case along with its input structure browse-books-request and its output structure browse-books-response. The use case defines the method browse-books which must be called with a browse-books-use-case instance, a browse-books-request instance, and a browse-books-response instance.

(defstruct browse-books-request)

(defstruct browse-books-response
  list-of-book-summaries)

(defclass browse-books-use-case ()
  ())

(defgeneric browse-books (use-case request response))

In my implementation, the browse-books-response is a simple data structure. One could easily imagine that the browse-books method would return one rather than filling one in that was passed to it. In some variants of Clean Architecture (like the Paluch example cited above), the response model is a class instance upon which a method is called to complete the interaction. But, it would have to be clear from the outset that anyone using this Use Case cannot depend on it being synchronous or asynchronous.

The use case also defines the book-repository interface that it needs.

(defclass book-repository ()
  ())

(defgeneric find-book-by-isbn (book-repository isbn))
(defgeneric all-books (book-repository))

In the Paluch example, all of the use cases share the same repository interfaces (though Paluch and others have separate repository interfaces for Users and Invoices and Items). In several of Uncle Bob’s videos, he makes the claim (or claims equivalent to the claim) that each use case should define an interface for just the methods it needs to use on a Book repository. In this use case, it would need only the ability to retrieve the list of Books and so I should not have defined find-book-by-isbn here at all, and I should have named this interface browse-books-book-repository.

I wrote a browse-books-impl class which extends the browse-books-use-case. It takes an instance of book-repository on construction.

(defclass browse-books-impl (browse-books-use-case)
  ((book-repository :initarg :book-repository :reader book-repository)))

(defun make-browse-books-use-case (book-repository)
  (check-type book-repository book-repository)
  (make-instance 'browse-books-impl :book-repository book-repository))

It uses that to retrieve the list of Books. Then, it creates a book-summary from each book instance retrieved from the book-repository.

(defun summarize-book (book)
  (check-type book book)
  (make-book-summary :isbn (book-isbn book)
                     :title (book-title book)
                     :author (book-author book)
                     :cover-page-thumbnail (book-cover-page-thumbnail book)))

(defmethod browse-books ((use-case browse-books-impl)
                         (request browse-books-request)
                         (response browse-books-response))
  (let* ((books (all-books (book-repository use-case)))
         (summaries (mapcar #'summarize-book books)))
    (setf (browse-books-response-list-of-book-summaries response) summaries)))

To test the design so far, I wrote an in-memory-book-repository backend which implements the book-repository interface that was defined in the Use Case.

(defclass in-memory-book-repository (book-repository)
  ((books :initarg :books :reader books)))

(defun make-in-memory-book-repository (books)
  (check-type books list)
  (assert (every #'book-p books))
  (make-instance 'in-memory-book-repository :books books))

(defmethod all-books ((book-repository in-memory-book-repository))
  (mapcar #'copy-book (books book-repository)))
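One way to exercise the Use Case without any frontend is to call it directly against a stub repository. The following is a minimal, self-contained sketch of that idea. It repeats trimmed-down versions of the definitions above (fewer struct slots, the summarizing done inline), and the names stub-book-repository and browse-titles are hypothetical; they are not taken from the accompanying repository.

```lisp
;;; Trimmed copies of the article's definitions, so the sketch loads alone.
(defstruct book isbn title author)
(defstruct book-summary isbn title author)
(defstruct browse-books-request)
(defstruct browse-books-response list-of-book-summaries)

(defclass book-repository () ())
(defgeneric all-books (book-repository))

(defclass browse-books-use-case () ())
(defgeneric browse-books (use-case request response))

(defclass browse-books-impl (browse-books-use-case)
  ((book-repository :initarg :book-repository :reader book-repository)))

(defmethod browse-books ((use-case browse-books-impl)
                         (request browse-books-request)
                         (response browse-books-response))
  ;; Fill in the response that was passed to us, as in the article.
  (setf (browse-books-response-list-of-book-summaries response)
        (mapcar (lambda (book)
                  (make-book-summary :isbn (book-isbn book)
                                     :title (book-title book)
                                     :author (book-author book)))
                (all-books (book-repository use-case))))
  response)

;;; Hypothetical stub: hands back copies of a fixed list of books.
(defclass stub-book-repository (book-repository)
  ((books :initarg :books :reader books)))

(defmethod all-books ((repo stub-book-repository))
  (mapcar #'copy-book (books repo)))

;;; Hypothetical helper: run the Use Case and return the summary titles.
(defun browse-titles (books)
  (let ((use-case (make-instance 'browse-books-impl
                                 :book-repository
                                 (make-instance 'stub-book-repository
                                                :books books)))
        (response (make-browse-books-response)))
    (browse-books use-case (make-browse-books-request) response)
    (mapcar #'book-summary-title
            (browse-books-response-list-of-book-summaries response))))
```

Calling browse-titles with a list of two books should return their two titles in repository order, with no console or database involved.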

I also wrote a console frontend which invokes the browse-books-use-case.

(defun console-browse-books ()
  (let ((request (make-browse-books-request))
        (response (make-browse-books-response)))
    (browse-books *browse-books-use-case* request response)
    (mapcar #'console-print-summary
            (browse-books-response-list-of-book-summaries response))))


(defun console-main-loop ()
  (catch 'console-quit
    (loop
       :do (mapc #'console-print
                 (console-eval (console-read))))))

To tie it all together, I wrote app-context which holds the current browse-books instance.

(defvar *browse-books-use-case*)

And, I wrote the app which creates an instance of the in-memory-book-repository and creates the browse-books-impl for the app-context. Then, it runs the main loop of the console frontend.

(defun run-console-app-with-memory-db (&optional (books *book-list*))
  (let* ((book-repo (make-in-memory-book-repository books))
         (*browse-books-use-case* (make-browse-books-use-case book-repo)))
    (console-main-loop)))

Trouble In Paradise

Already, in this simple interface, I am torn. For this Use Case, I do not need the repository to return me the list of Books. I could, instead, ask the repository to return me the list of Book Summaries. If I do that, my application is just a fig-leaf over the repository.

(defgeneric all-book-summaries (book-repository))

Well, the argument against asking the repository for Book Summaries is that it should not be up to the database to decide how I would like to have my Books summarized. That certainly seems like it should be “business logic” and probably “application specific” business logic at that.

So, fine. I will have the repository return Books and the Use Case will summarize them.

Now, let me extend the Use Case the next little bit forward. What if I want to support pagination? My choices are to push the pagination down to the repository so that I can ask it to give me up to 20 Books starting with the 40th Book. Or, I can let the repository give me all of the books and do the pagination in the Use Case.

(defstruct browse-books-request
   start max-results)
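The second choice, paginating inside the Use Case after the repository has returned everything, could look roughly like the sketch below. The helper name clip-to-page is hypothetical, and treating start as a zero-based index is an assumption; the request structure above does not say which convention it uses.

```lisp
(defun clip-to-page (items start max-results)
  "Return up to MAX-RESULTS elements of the list ITEMS,
beginning at the zero-based index START (an assumed convention)."
  (check-type start (integer 0))
  (check-type max-results (integer 0))
  (let* ((len (length items))
         ;; Clamp both ends so that a START past the end yields NIL
         ;; rather than an out-of-bounds error from SUBSEQ.
         (from (min start len))
         (to (min len (+ from max-results))))
    (subseq items from to)))
```

The Use Case would apply this to the summaries (or to the books before summarizing them), which keeps the logic testable but still requires the repository to hand over every book.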

Here, I can find no guidance in any of the Clean Architecture videos that I have watched nor in the examples that I have found online. Everyone seems happy with the repositories being able to return one item given that item’s unique identifier or return all of the items.

If the repository is going to return all of the Books, then why wouldn’t my Use Case just return them all and leave the caller to do any pagination that is needed?

This works fine when there are a few dozen books and they are small. It does not scale, and I don’t know how it is supposed to scale without pushing most of the responsibility onto the repository.

(defgeneric all-books-in-range (book-repository start max-results))

Sure, I can push the responsibility onto the repository. But, one of the reasons that Clean Architecture is so structured is to allow easy testing of all of the application logic. The more that I push into the repository, the less that I actually exercise when I run my unit tests with my mock repository (and the more complex my mock repository should probably be).

One possible approach would be to have the all-books method instead be all-isbns. Then, I can retrieve all of the ISBNs and use find-book-by-isbn to get all of the books.

Now, if I want to sort by Author then by Title, I need to:

  • fetch all of the ISBNs with all-isbns,
  • fetch each ISBN’s title,
  • sort my ISBNs by title,
  • fetch each ISBN’s author,
  • stable-sort my ISBNs by author,
  • clip to my range,
  • fetch each book in my range,
  • summarize each fetched book

Or, I have to write an SQL query that can do all of that in one database call instead of 2N + R + 1 calls (where N is the number of books in the database and R is the number of books in my range), making my Use Case a fig-leaf again.
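To make the call count concrete, here is a small sketch of the ISBN-based approach against a repository that counts its own round trips. Everything in it is an assumption made for illustration: counting-repository, lookup, and browse-sorted are hypothetical names, and a single lookup function stands in for the separate "fetch title", "fetch author", and "fetch book" calls.

```lisp
(defstruct book isbn title author)

;;; Hypothetical repository that counts every call made to it.
(defclass counting-repository ()
  ((books :initarg :books :reader books)
   (calls :initform 0 :accessor calls)))

(defun all-isbns (repo)
  (incf (calls repo))
  (mapcar #'book-isbn (books repo)))

(defun lookup (repo isbn accessor)
  "One counted round trip: find the book with ISBN, apply ACCESSOR to it."
  (incf (calls repo))
  (funcall accessor
           (find isbn (books repo) :key #'book-isbn :test #'string=)))

(defun browse-sorted (repo start max-results)
  "Return the books in the requested range, ordered by author, then title."
  (flet ((fetch-all (isbns accessor)
           ;; N calls, cached in a table so that sorting does not refetch.
           (let ((table (make-hash-table :test #'equal)))
             (dolist (isbn isbns table)
               (setf (gethash isbn table) (lookup repo isbn accessor))))))
    (let* ((isbns (all-isbns repo))                      ; 1 call
           (titles (fetch-all isbns #'book-title))       ; N calls
           (by-title (stable-sort (copy-list isbns) #'string<
                                  :key (lambda (i) (gethash i titles))))
           (authors (fetch-all isbns #'book-author))     ; N calls
           ;; The stable sort keeps the title order within each author.
           (by-author (stable-sort by-title #'string<
                                   :key (lambda (i) (gethash i authors))))
           (end (min (length by-author) (+ start max-results)))
           (page (subseq by-author (min start end) end)))
      (mapcar (lambda (i) (lookup repo i #'identity)) page)))) ; R calls
```

With three books in the repository and a range of two, the counter ends at 2·3 + 2 + 1 = 9 calls, which is the cost the single SQL query would avoid.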

Population in Politics, Simple Frequency Counts January 13th, 2016
Patrick Stein

The first coding assignment of the Data Management and Visualization class that I am doing on Coursera is just to do some frequency analysis on some of the variables that will be involved in the research you want to do.

I am using the 2012 U.S. Presidential Election data broken down by county.

Frequency Counts

The assignment was to do frequency counts. If I did tables of raw frequency counts, the tables would be huge. There are 4588 counties in the data set. There are 4075 different values for the total number of votes cast in a county. As such, I bucketed the counts based upon the power of ten of the value. Here is the output for the total number of votes cast:

CL-USER> (print-log-buckets "Total" #'vote-distribution-votes-cast)
|    Votes Total | Number of counties |
|            1's | 3                  |
|           10's | 72                 |
|          100's | 646                |
|        1,000's | 2102               |
|       10,000's | 1513               |
|      100,000's | 247                |
|    1,000,000's | 5                  |

Here is the output for the total number of votes for Democratic candidates and for the Republican candidates:

CL-USER> (print-log-buckets "Dem" #'vote-distribution-dem)
|      Votes Dem | Number of counties |
|            1's | 15                 |
|           10's | 166                |
|          100's | 1171               |
|        1,000's | 2380               |
|       10,000's | 730                |
|      100,000's | 124                |
|    1,000,000's | 2                  |
CL-USER> (print-log-buckets "GOP" #'vote-distribution-gop)
|      Votes GOP | Number of counties |
|            1's | 12                 |
|           10's | 171                |
|          100's | 937                |
|        1,000's | 2349               |
|       10,000's | 1009               |
|      100,000's | 110                |

With the above frequency count, we can see that of the five counties with over a million votes cast, the Democrats got more than a million votes in two of them whilst the Republicans did not get a million votes in any county.

The numbers are pretty close the whole way through, but that still doesn’t mean a great deal. It could be that the fifteen counties where Democrats got fewer than ten votes were counties with ten thousand votes cast. So, I put together a small function to get the worst counties for a given party:

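(defun get-worst-counties (key &optional (how-many 10))
  ;; Sort by the party's share of the votes cast, lowest share first,
  ;; and keep the first HOW-MANY counties.
  (subseq (stable-sort (copy-seq *by-county*)
                       #'<
                       :key (lambda (dist)
                              (/ (funcall key dist)
                                 (max 1
                                      (vote-distribution-votes-cast dist)))))
          0 how-many))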

The worst counties for Democrats and Republicans?

CL-USER> (get-worst-counties #'vote-distribution-dem)
CL-USER> (get-worst-counties #'vote-distribution-gop)

As you can see from this, there are two counties which show no votes cast. In both of those cases, there are no precincts reporting in the data set. The data set tells the number of precincts in the county along with the number of precincts reporting. These counties with none of the precincts reporting are significant glitches in the data. On the other hand, some counties in the data have hundreds of precincts where all but one reported. I could remove a county from the data if not all of its precincts reported. However, I believe that within a county, single precincts will not differ very much from other precincts which were counted in the data. Further, as I do not have any hope of determining the population density down to the precinct level, I am just going to roll with what I have.


I put together some simple utilities around Fare-CSV to retrieve particular columns of a CSV file formatted in particular ways. Here is the source code for those utilities.

One of the things that immediately became apparent is that there are two separate columns in the database labelled "TOTAL VOTES CAST". I wanted to make sure there was no confusion, so I wrote a quick function to check that both of those columns agree everywhere.

(defun both-total-votes-columns-agree-everywhere ()
  (let ((columns (find-columns-with-label "TOTAL VOTES CAST")))
    (flet ((votes-cast-agrees (*row*)
             (apply #'= (get-columns-as #'parse-integer-allowing-junk
                                        columns))))
      (every #'votes-cast-agrees (data-rows)))))

(assert (both-total-votes-columns-agree-everywhere))

Spoiler: They do. Whew!

The data here has one row per county. I might have preferred there be one row per county/candidate pair. Regardless, I wrote a short function that takes a party name and all of the columns identifying parties along with the columns identifying how many votes a given party received.

(defun count-votes (party parties votes)
  (loop :for p :in parties
     :for v :in votes
     :when (string= p party)
     :sum v))

For example, this might get arguments party = "DEM", parties = ("DEM" "GOP" "LIB" "GRN" "" "" "" "" "" "" "" "" "" "" "" ""), and votes = (91696 121234 5539 2127 NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL). This function sums up all of the numbers in the votes list where the corresponding entry in the parties list matches the given party.
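As a worked example, the call described above can be tried at the REPL. The definition of count-votes is repeated so the snippet loads on its own, and the two parameter names are hypothetical; the lists are shortened to two trailing empty entries rather than twelve.

```lisp
(defun count-votes (party parties votes)
  (loop :for p :in parties
     :for v :in votes
     :when (string= p party)
     :sum v))

;; Shortened versions of the example row (hypothetical variable names).
(defparameter *example-parties* '("DEM" "GOP" "LIB" "GRN" "" ""))
(defparameter *example-votes*   '(91696 121234 5539 2127 nil nil))

(count-votes "DEM" *example-parties* *example-votes*) ; => 91696
(count-votes "GOP" *example-parties* *example-votes*) ; => 121234
;; A party that never appears sums nothing, so the result is zero.
(count-votes "IND" *example-parties* *example-votes*) ; => 0
```

The NIL entries are harmless here only because their party labels are empty strings and so never match; asking for the party "" would try to add NIL and signal an error.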

I made a little data structure to hold the data that I am interested in for each county.

(defstruct vote-distribution
  (state "" :type string)
  (county "" :type string)
  (dem 0 :type integer)
  (gop 0 :type integer)
  (votes-cast 0 :type integer))

I then created a function which returns a function. The returned function returns an instance of my data structure for the row passed into it. Note: the data set contains rows which roll-up the results for a whole state. For those rows, the FIPS code for the county is zero.

(defun make-votes-by-county-data-collector ()
  (let ((state-column  (find-column-with-label "State Postal"))
        (county-column (find-column-with-label "County Name"))
        (fips-column   (find-column-with-label "FIPS Code"))
        (total-column  (find-column-with-label "TOTAL VOTES CAST"))
        (party-columns (find-columns-with-label "Party"))
        (votes-columns (find-columns-with-label "Votes")))

    (lambda (*row*)
      (when (plusp (get-column-as #'parse-integer-allowing-junk fips-column))
        (let* ((state (get-column-as #'string-upcase state-column))
               (county (get-column-as #'string-upcase county-column))
               (total (get-column-as #'parse-integer-allowing-junk
                                     total-column))
               (parties (get-columns-as #'string-upcase party-columns))
               (votes (get-columns-as #'parse-integer-allowing-junk
                                      votes-columns)))
          (make-vote-distribution :state state
                                  :county county
                                  :dem (count-votes "DEM" parties votes)
                                  :gop (count-votes "GOP" parties votes)
                                  :votes-cast total))))))

I did that because I originally had all of that functionality in the function which loops over each of the rows in the data set. Now, the function that collects all of these is simpler, but I’m not sure the overall simplicity is much improved.

(defun get-votes-by-county ()
  (loop :with collector := (make-votes-by-county-data-collector)
     :for row :in (data-rows)
     :for dist := (funcall collector row)
     :when dist
     :collect dist))

(defparameter *by-county*
  (stable-sort (get-votes-by-county)
               #'<                      ; ascending order is assumed here
               :key #'vote-distribution-votes-cast))

I created a function to bucket them based on their base-10 logarithm. Of course, this immediately freaked out on the couple of counties for which there is no data in the data set, so I had to take care not to take the logarithm of zero.

(defun log-buckets (&optional (key #'vote-distribution-votes-cast))
  (let ((buckets (make-hash-table :test #'equal))
        (max-bucket 0))
    (labels ((bucket-number (dist)
               (floor (log (max (funcall key dist) 1)
                           10)))
             (incorporate (dist)
               (let ((n (bucket-number dist)))
                 (setf max-bucket (max n max-bucket)
                       (gethash n buckets) (1+ (gethash n buckets 0))))))
      (mapc #'incorporate *by-county*)
      (loop :for n :to max-bucket
         :collect (gethash n buckets 0)))))
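One caveat with bucketing by a floating-point logarithm is that a count sitting exactly on a power of ten can come out fractionally low (in some implementations (log 1000 10) evaluates to slightly less than 3), which would drop that county into the bucket below. A sketch of an exact, integer-only alternative follows; power-of-ten-bucket is a hypothetical name and is not part of the code used to produce the tables above.

```lisp
(defun power-of-ten-bucket (n)
  "Return the largest integer K with 10^K <= N, treating N = 0 like N = 1."
  (check-type n (integer 0))
  ;; Compare against 10, 100, 1000, ... using exact integer arithmetic.
  (loop :for k :from 0
        :for limit := 10 :then (* limit 10)
        :while (>= n limit)
        :finally (return k)))
```

It could stand in for the bucket-number helper by calling it on (funcall key dist), at the cost of a short loop per county instead of one call to log.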

I made a wrapper function for that which pretty-prints the results as a table.

(defun print-log-buckets (label &optional (key #'vote-distribution-votes-cast))
  (let ((buckets (loop :for pow :from 0
                    :for buck :in (log-buckets key)
                    :appending (list (expt 10 pow) buck))))
    (format t "+~16,,,'-<~>+~20,,,'-<~>+~%")
    (format t "|~16< Votes ~A ~>| Number of counties |~%" label)
    (format t "+~16,,,'-<~>+~20,,,'-<~>+~%")
    (format t "~{| ~12:D's | ~D ~38T|~%~}" buckets)
    (format t "+~16,,,'-<~>+~20,,,'-<~>+~%")))

Here is the source code for all of the above snippets.

Building Supplies January 8th, 2016
Patrick Stein

I have a couple of nascent projects on my plate which require custom, server-side software. I’ve been trying to use these projects to explore the Clean Architecture concepts using Test-Driven Development (TDD).

I can’t even begin.

According to Clean Architecture, I should start with my application logic independent of whether it will be a command-line tool, a series of tools, a web-application, or what-have-you. So, let me start on the application code.

This is TDD though. So, I need to start with a test. How do I do that?

(ql:quickload :nst)

(nst:def-test-group querying-data () ...)

Wait a minute? I just committed myself to Lisp. I just made a huge business decision, a huge implementation decision, a decision that will shape my life for the next n years, and neither TDD nor Clean Architecture had anything to say.

I need building supplies. I can’t go anywhere with my architecture or my development without a programming language.

I want to write my apps in Lisp. One of them, I already have mostly done in Lisp. I am still trepidatious about deploying Lisp.

Why am I trepidatious about deploying Lisp? Is it because there is inadequate support for web programming in Lisp? Absolutely not. Is it because I have some doubt in Lisp’s staying power? Absolutely not.

It’s email. Email is holding me back.

I use my current web hosting provider to give me a LAMP stack upon which I run WordPress for this blog and some git repositories and a bug-tracking database (that I can’t remember how to log into). All of that, I could move with confidence in a few hours.

What I dread is having to collect the several hundred mail-forwards that I currently have along with the half-dozen IMAP accounts and move them anywhere, let alone to somewhere that I have to manage them myself and deal with SPAM and mail queues and crap.

It seems that for only a few bucks more per month than I’m paying now, I can get Plesk on a VPS that should be big enough for my purposes. Does anyone have any experience with Plesk? Is it going to make email painless for me on Ubuntu? Or, am I going to hate my life? Does someone have a VPS provider they strongly recommend?

Do I really have to write my application in PHP on the chance that I’ll want to deploy it on the web just because email is scary?

Population Density in Politics January 3rd, 2016
Patrick Stein

I am taking a Coursera course by Wesleyan University titled Data Management and Visualization.

During Bush v. Gore, there was a ton of freely available data about that election. I did a quick-and-dirty graph showing that Gore won the most populous counties by wide margins and Bush won most of the rest of the counties by wide margins.

For this course, I am going to revisit that analysis with the 2012 election data and maybe the 2008 election data.

The Hypothesis

In the United States, there is a strong correlation between population density and voting for the Democratic candidate for president.

The Data

The 2012 by-county results are available through The Guardian newspaper website. The 2008 election data is available for purchase through Dave Leip’s Election Data Store. The US Census Data website has information available about the population and land area of each U.S. county.

Related Work

My first web search for related data was: correlation population density political party.

This search turns up several articles about a scatterplot by Conor Sen of the Cook Partisan Voting Index (PVI) against population density, based on 2012 data. There are related heat-maps by others from the same data.

That search also turns up a paper by Jowei Chen of the University of Michigan and Jonathan Rodden of Stanford University about why compact voting districts are bad for Democrats. That paper focuses mostly on the shapes of voting districts in Florida and how they have been gerrymandered to make those in population-dense areas very compact while those in less populated areas are tentacled and sprawling. It argues that this results in a higher number of Republican representatives than is warranted by overall population numbers.

A related aspect that shows up in this search is that on specific issues, like transit infrastructure, the congressional voting record is strongly correlated with the population density of the congressperson’s district. This effect is a second-order effect, however. The vote of a congressperson will likely be entwined with what the party as a whole wants as much as (or even more than) with what their constituents want.

ArcGIS contains a map correlating political affiliation of congresspersons with the population density of their districts.

The Tools

I will, of course, being me, use Common Lisp for all of this. I suspect that I will use Fare-CSV for ingesting CSV data. If I have to parse TIGER data, I will likely rely on some blend of esrap and CL-EWKB or custom geometry code. For plotting, I will likely rely on Vecto but may also try out some of the other libraries like adw-charting or finally get around to making my own multi-backend charting library.

Grid-Generators v0.4.20151016 (and List-Types v0.4.20151029) Released October 29th, 2015
Patrick Stein

Two new Common Lisp libraries released: GRID-GENERATORS and LIST-TYPES.


I often find myself wanting to iterate over the points in a rectangular region of a grid in an arbitrary number of dimensions. If my grid is only one-dimensional or two-dimensional then I often just write nested loops to traverse the points of interest.

(loop :for y :from 0 :to 10 :by 1/2
    :do (loop :for x :from 0 :to 10 :by 1/2
             :do (something x y)))

However, when I want the code to be flexible in the number of dimensions, I always end up writing application-specific code to increment arbitrarily long lists of integers with given bounds. I finally got sick of repeating this code every time I needed it and created a library.
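For a sense of what that repeated application-specific code tends to look like, here is a minimal sketch of an odometer-style generator over integer coordinates. It is not the GRID-GENERATORS implementation: the names make-counting-generator and collect-points are hypothetical, it handles only integer steps of one starting from zero, and it assumes a non-empty list of limits.

```lisp
(defun make-counting-generator (limits)
  "Return a closure that yields each list of integers (i1 ... in) with
0 <= ik <= the k-th element of LIMITS, and then NIL once exhausted."
  (let ((current (make-list (length limits) :initial-element 0))
        (done nil))
    (lambda ()
      (unless done
        (prog1 (copy-list current)
          ;; Advance like an odometer: bump the first coordinate that
          ;; still has room, resetting every coordinate before it.
          (loop :for cell :on current
                :for limit :in limits
                :when (< (car cell) limit)
                  :do (incf (car cell)) (return)
                :do (setf (car cell) 0)
                :finally (setf done t)))))))

(defun collect-points (limits)
  "Gather every point the generator produces into a list."
  (loop :with next := (make-counting-generator limits)
        :for point := (funcall next)
        :while point
        :collect point))
```

For limits of (1 2) this walks six points with the first coordinate varying fastest. Supporting fractional steps, lower bounds, or other traversal orders is where the hand-written versions start to diverge from one another.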

For the particular application that I have in mind though, I wanted more than just walking the points inside some rectangular hyper-prism. I wanted to traverse all of the points at a given taxicab distance from a starting point.

The GRID-GENERATORS package facilitates generating points in a rectangular hypergrid, generating points based on taxicab distance, and generating points based upon the number of steps in an arbitrary lattice (given the generators of the fundamental parallelepiped of the lattice). The GRID-ITERATE package provides ITERATE clauses for those generators.

For example, one could iterate over the same points in my nested example above like this:

(loop :with generator := (make-grid-generator '(10 10) :by '(1/2 1/2))
    :for (x y) := (funcall generator)
    :while x
    :do (something x y))

Or, with iterate:

(iterate:iterate
  (iterate:for point on-grid-to '(10 10) by '(1/2 1/2))
  (destructuring-bind (x y) point
     (something x y)))

LIST-TYPES package

The LIST-TYPES package provides a way to generate `SATISFIES` type clauses that ensure that a list contains elements all of the given type. For example, if I wanted to ensure that my list was entirely rational numbers, I could use the declaration:

(check-type my-list (list-types:list-of rational))
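For comparison, the hand-written equivalent for a single element type looks something like the sketch below. This shows the general SATISFIES technique rather than how LIST-TYPES itself is implemented; every-rational-p and list-of-rationals are hypothetical names, and the predicate assumes it is given a proper list.

```lisp
(defun every-rational-p (object)
  "True when OBJECT is a list whose elements are all rational numbers."
  (and (listp object)
       (every #'rationalp object)))

;; SATISFIES needs the name of a global one-argument predicate, which is
;; why a separate named function is required for each element type.
(deftype list-of-rationals ()
  '(and list (satisfies every-rational-p)))

(typep '(1 1/2 3) 'list-of-rationals)   ; true
(typep '(1 2.0 3) 'list-of-rationals)   ; false, 2.0 is a float
```

Having to write one named predicate and one deftype per element type is the boilerplate that a generated list-of type removes.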
