IRC Graphs :: nklein software

IRC Graphs July 24th, 2013
Patrick Stein

For some time, I’ve been wanting to find some big data source to dig around in and make plots of. Yesterday, I realized that I have access to #lisp logs from IRC going back several years.

The first question that I wanted to look at was: How well does talkativeness on IRC follow a Power Law?

The raw data looks pretty close to a power law if you limit yourself to the top 100 to 300 people. Once you get up near the top 500 people, the best-fit curve skyrockets. There are just tons of speakers who have said only one or two lines in the given time period. And I made no effort to track lurkers, so I have no zeros in my data set.

Here is a plot of the top 250 speakers (ranked by lines spoken). stassats is the leader, followed by pjb, then H4ns, then Xach. I made a best effort to collate different handles for the same person (e.g. Xach_ vs. Xach). The least-squares, best-fit power-law curve here is $68435 k^{-1.0638}$, where $k$ is the speaker's rank. At $k = 1$ the curve predicts about 68,435 lines, so if we’re going to match the curve exactly, we’ll need stassats to talk more than twice as much. If you’d like to know how much more (or less) you should talk, drop me a note. 🙂

IRC Top 250 Talkers on #lisp by lines spoken

I used optima.ppcre to read the log files and vecto to draw the graph. Here is the relevant source code: package.lisp, read.lisp, and power.lisp.
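For readers who want to try a fit like the one quoted above, here is a minimal Python sketch of one common approach: ordinary least squares on the logarithms of rank and line count. This is not the author's Lisp code, and the post does not say whether power.lisp fits in log-log space or on the raw values, so treat the method as an assumption. The function name `fit_power_law` and the synthetic input are hypothetical, used only for illustration.

```python
import math


def fit_power_law(counts):
    """Fit counts[k-1] ~ a * k**b by least squares in log-log space.

    `counts` must be positive numbers sorted in descending order, so that
    index 0 is rank 1. Returns the pair (a, b).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the straight line log(count) = log(a) + b * log(rank)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic input (not the real #lisp data): points lying exactly on the
# curve quoted in the post, so the fit should recover its two parameters.
synthetic = [68435 * k ** -1.0638 for k in range(1, 251)]
a, b = fit_power_law(synthetic)
print(f"a = {a:.1f}, b = {b:.4f}")
```

With real line counts, `a` is the curve's prediction for the rank-1 speaker, which is the number stassats would need to reach to sit exactly on the curve.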

5 Responses to “IRC Graphs”

Anton Vodonosov
2013-07-24 @ 2:27 AM
That’s interesting 🙂

If you want other ideas for big data to dig around in and make plots of, there are cl-test-grid results. We now have a considerable history of test results.

I am thinking about some quality metric – a single number representing (approximately) the quality of the CL ecosystem at a particular point in time. If we had such a metric, we could build a graph and find out whether CL is improving or not, and how fast.

The metric could be something like number of tests/number of failed tests or similar.
- pat
  2013-07-24 @ 9:44 AM
  
  I have been thinking of using cl-test-grid results sometime, too. I like the idea of trying to summarize or visualize improvement over time.
  
  I like the idea of a single number for the whole ecosystem, but I also like the idea of finding ways to cram as much data as possible into a single chart to let one plumb the depths.
Zach Beane
2013-07-24 @ 8:21 AM
I made a chart showing activity by hour for IRC in 2008:

http://xach.com/charts/l.html

Nicks are colored based on the hour of the highest activity for that individual, in three different groups: 7-14, 15-22, and 23-6. Code is at https://github.com/xach/wormtrails/blob/master/irc.lisp
- Lawrence
  2013-07-26 @ 4:32 AM
  
  Note that least-squares is demonstrably the worst way of fitting power laws. See http://arxiv.org/abs/0706.1062. tl;dr: use a maximum likelihood estimator for the power law exponent, and do bootstrap resampling to determine goodness of fit.
  - pat
    2013-07-26 @ 9:06 AM
    
    Thank you… I will read that today…
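
As a rough illustration of the maximum likelihood estimator suggested in the comment above, here is a short Python sketch using the standard continuous-data formula $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$. Several things here are assumptions: the function name `mle_exponent` is hypothetical, the sample is synthetic rather than taken from the #lisp logs, `x_min` is taken as given rather than estimated, and line counts are really discrete, so the continuous formula is only an approximation for them. The bootstrap goodness-of-fit step is not shown. Note also that $\alpha$ describes the distribution of line counts per speaker, so it is not directly comparable to the rank-curve exponent $-1.0638$ from the post.

```python
import math
import random


def mle_exponent(values, x_min):
    """Continuous MLE for alpha in p(x) proportional to x**-alpha, x >= x_min.

    Returns (alpha, std_err), where std_err is the leading-order
    standard error (alpha - 1) / sqrt(n) over the n values >= x_min.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need values above x_min to estimate alpha")
    alpha = 1.0 + len(tail) / log_sum
    std_err = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, std_err


# Synthetic check: draw from a power law with a known exponent by
# inverse-transform sampling, then see how close the estimate lands.
random.seed(0)
true_alpha = 2.5
sample = [(1.0 - random.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(20000)]
alpha_hat, std_err = mle_exponent(sample, x_min=1.0)
print(f"alpha = {alpha_hat:.3f} +/- {std_err:.3f}")
```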

nklein software

software development and consulting
