For some time, I’ve been wanting to find some big data source to dig around in and make plots of. Yesterday, I realized that I have access to #lisp logs from IRC going back several years.
The first question that I wanted to look at was: How well does talkativeness on IRC follow a Power Law?
It looks pretty close in the raw data if you limit yourself to the top 100 to 300 people. Once you get up near the top 500 people, the best-fit curve really goes through the roof. There are just tons of speakers who said only one or two lines in the given time period. I also made no effort to track lurkers, so I have no zeros in my data set.
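For readers who want to try the same kind of fit, here is a minimal sketch in Python (not the Lisp used for this post). It assumes the fit is an ordinary least-squares line through the points (log rank, log lines spoken), which is one common way to get a "best-fit power-law curve"; the post does not say which method was actually used. The function name `fit_power_law` and the synthetic data are made up for illustration.

```python
import math

def fit_power_law(counts):
    """Fit counts[rank - 1] ~ a * rank**b by least squares on log-log axes.

    `counts` must be positive and sorted from most to least talkative.
    Returns the pair (a, b).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the regression line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data only (not from the logs): an exact power law,
# so the fit should recover the parameters it was built from.
synthetic = [1000.0 * rank ** -1.2 for rank in range(1, 251)]
a, b = fit_power_law(synthetic)
print(f"a = {a:.1f}, b = {b:.3f}")
```

Because every speaker needs a positive count for the logarithm, a fit like this has no place for lurkers with zero lines, and it gives a long tail of one-line speakers as much weight per point as the top talkers.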
Here is a plot of the top 250 speakers (ranked by lines spoken). stassats is the leader, followed by pjb, then H4ns, then Xach. I made a best-effort to collate different handles for the same person (e.g. Xach_ vs. Xach). The least-squares, best-fit power-law curve here is . So, if we’re going to match the curve exactly, we’ll need stassats to talk more than twice as much. If you’d like to know how much more (or less) you should talk, drop me a note. 🙂
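The "how much more (or less) should you talk" number is just the fitted value at your rank divided by your actual line count. A rough Python sketch of that arithmetic follows. The handle-folding rule (strip trailing underscores), the function names, the toy counts, and the fit parameters are all assumptions for illustration; the collation in the post was done on a best-effort basis and its exact rules are not given.

```python
from collections import Counter

def canonical_nick(nick):
    """Fold simple variants such as 'Xach_' into 'Xach' (assumed rule)."""
    return nick.rstrip("_") or nick

def talk_ratios(lines_by_nick, a, b):
    """Return {nick: fitted / actual} for a curve a * rank**b.

    A ratio above 1 means the speaker is below the curve and would
    need to talk that many times as much to sit exactly on it.
    """
    merged = Counter()
    for nick, lines in lines_by_nick.items():
        merged[canonical_nick(nick)] += lines
    ranked = merged.most_common()  # most talkative first
    return {
        nick: (a * rank ** b) / lines
        for rank, (nick, lines) in enumerate(ranked, start=1)
    }

# Made-up counts and made-up fit parameters, for illustration only.
toy_counts = {"alice": 400, "bob": 250, "bob_": 50, "carol": 100}
ratios = talk_ratios(toy_counts, a=1000.0, b=-1.0)
for nick, ratio in ratios.items():
    print(f"{nick}: {ratio:.2f}x")
```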
Click on the image above for the full-size version. I used optima.ppcre to read the log files and vecto to draw the graph. Here is the relevant source code: package.lisp, read.lisp, and power.lisp.
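As a rough idea of what the log-reading step involves, here is a small Python sketch that counts lines per speaker with a regular expression. It is not a port of read.lisp. The line shape it assumes ("HH:MM:SS <nick> message") is a guess at a typical IRC log layout and may not match the actual #lisp logs; the pattern, function name, and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "12:34:56 <nick> message text". Real logs may differ.
SPOKEN_LINE = re.compile(r"^\d{2}:\d{2}:\d{2} <([^>]+)> ")

def count_lines(log_lines):
    """Count spoken lines per nick, ignoring anything that is not speech."""
    counts = Counter()
    for line in log_lines:
        match = SPOKEN_LINE.match(line)
        if match:  # joins, parts, and other notices do not match
            counts[match.group(1)] += 1
    return counts

# Invented sample lines, for illustration only.
sample = [
    "12:00:01 <alice> hello",
    "12:00:05 <bob> hi",
    "12:00:09 --- carol has joined #lisp",
    "12:00:12 <alice> anyone tried vecto?",
]
counts = count_lines(sample)
print(counts.most_common())
```

Only people who actually said something end up in the counter, which is why a data set built this way contains no zeros.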