IRC Graphs July 24th, 2013
Patrick Stein

For some time, I’ve been wanting to find some big data source to dig around in and make plots of. Yesterday, I realized that I have access to #lisp logs from IRC going back several years.

The first question that I wanted to look at was: How well does talkativeness on IRC follow a Power Law?

The raw data looks pretty close if you limit yourself to the top 100 to 300 people. Once you get up near the top 500 people, the best-fit curve goes through the roof: there are just tons of speakers who said only one or two lines in the given time period. I also made no effort to track lurkers, so I have no zeros in my data set.

Here is a plot of the top 250 speakers (ranked by lines spoken). stassats is the leader, followed by pjb, then H4ns, then Xach. I made a best effort to collate different handles for the same person (e.g. Xach_ vs. Xach). The least-squares, best-fit power-law curve here is 68435 k^{-1.0638}, where k is the speaker's rank. So, if we’re going to match the curve exactly, we’ll need stassats to talk more than twice as much. If you’d like to know how much more (or less) you should talk, drop me a note. 🙂
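As a rough sketch of what a fit like this involves (this is not the code in power.lisp, and it is in Python rather than Lisp), one common approach is ordinary least squares on the logarithms of rank and line count. The post does not say whether the fit was done in log-log space or on the raw counts, so that choice is an assumption, and the names `fit_power_law` and `predicted_lines` are made up for illustration.

```python
import math

def fit_power_law(counts):
    """Least-squares fit of counts[k-1] ~ c * k**(-alpha) in log-log space.

    counts: positive line counts, sorted from most to least talkative,
    with at least two entries. Returns (c, alpha).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the line in log-log space
    intercept = mean_y - slope * mean_x    # log of the leading coefficient
    return math.exp(intercept), -slope

def predicted_lines(rank, c=68435.0, alpha=1.0638):
    """Lines the fitted curve from the post predicts for a given rank."""
    return c * rank ** (-alpha)
```

Reading the fitted curve at rank 1 with `predicted_lines(1)` gives the leading coefficient itself, 68435 lines, which is the number the top speaker would have to reach to sit exactly on the curve.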

IRC Top 250 Talkers on #lisp by lines spoken

I used optima.ppcre to read the log files and vecto to draw the graph. Here is the relevant source code: package.lisp, read.lisp, and power.lisp.
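For readers without the Lisp sources at hand, the counting step might look roughly like the Python sketch below. It is not a port of read.lisp. The log line format (`HH:MM:SS <nick> message`) is an assumption, since the post does not show the logs, and the alias rule (dropping trailing underscores and backticks) is only a guess at what "best effort" collation could mean.

```python
import re
from collections import Counter

# Hypothetical log line format: "12:34:56 <nick> message text".
LINE_RE = re.compile(r"^\d{2}:\d{2}(?::\d{2})? <([^>]+)> ")

def canonical_nick(nick):
    """Fold trailing underscores/backticks so "Xach_" counts as "Xach"."""
    return nick.rstrip("_`") or nick

def count_lines(lines):
    """Count spoken lines per speaker, skipping joins, parts and actions."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[canonical_nick(match.group(1))] += 1
    return counts

def top_speakers(counts, n=250):
    """Return the n most talkative speakers as (nick, lines) pairs."""
    return counts.most_common(n)
```

Real alias collation needs more than suffix stripping (people change nicks entirely), so a hand-maintained alias table would probably be required on top of this.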

5 Responses to “IRC Graphs”

  1. Anton Vodonosov
    2013-07-24 @ 2:27 AM

    That’s interesting 🙂

    If you want other ideas for big data to dig around in and make plots of, there are the cl-test-grid results. We now have a considerable history of test results.

    I am thinking about some quality metric: a single number representing (approximately) the quality of the CL ecosystem at a particular point in time. If we have such a metric, we could build a graph and find out whether CL is improving or not, and how fast.

    The metric could be something like number of tests/number of failed tests or similar.

    • pat
      2013-07-24 @ 9:44 AM

      I have been thinking of using cl-test-grid results sometime, too. I like the idea of trying to summarize or visualize improvement over time.

      I like the idea of a single number for the whole ecosystem, but I also like the idea of finding ways to cram as much data as possible into a single chart to let one plumb the depths.

  2. Zach Beane
    2013-07-24 @ 8:21 AM

    I made a chart showing activity by hour for IRC in 2008:

    http://xach.com/charts/l.html

    Nicks are colored based on the hour of the highest activity for that individual, with three different groups, 7-14, 15-22, 23-6. Code is at https://github.com/xach/wormtrails/blob/master/irc.lisp

    • Lawrence
      2013-07-26 @ 4:32 AM

      Note that least-squares is demonstrably the worst way of fitting power laws. See http://arxiv.org/abs/0706.1062. tl;dr: use a maximum likelihood estimator for the power-law exponent, and do bootstrap resampling to determine goodness of fit.

      • pat
        2013-07-26 @ 9:06 AM

        Thank you… I will read that today…
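A minimal Python sketch of the maximum likelihood estimator mentioned above, under some loudly stated assumptions. It uses the continuous-data formula alpha = 1 + n / sum(ln(x_i / x_min)), which is only an approximation for integer line counts. It estimates the exponent of the distribution of line counts per speaker, not the rank-versus-lines exponent plotted in the post (a rank exponent b corresponds to a distribution exponent of 1 + 1/b, so about 1.94 for b = 1.0638). The bootstrap here only shows the spread of the estimate; the full goodness-of-fit test described in the paper is more involved and is not reproduced. The function names and the choice of `x_min` are illustrative.

```python
import math
import random

def mle_exponent(values, x_min):
    """Continuous-data MLE of alpha for p(x) ~ x**(-alpha), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

def bootstrap_exponents(values, x_min, rounds=200, seed=0):
    """Re-estimate alpha on resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        resample = [rng.choice(values) for _ in values]
        estimates.append(mle_exponent(resample, x_min))
    return estimates
```

The standard deviation of the values returned by `bootstrap_exponents` gives a rough error bar on the exponent, which a plain least-squares fit does not provide.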
