Yesterday I did a little work with the TwitterCounter API. Today I’ve gone a little further and (purely as an experiment) ranked a list of Porter Novelli people on Twitter by their number of followers.
What happens if we chart this? Here’s a kind of Pareto chart showing users ranked in order of followers and the total reach that we get at each stage.
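For anyone who wants to reproduce the numbers behind a chart like this, here is a minimal sketch. It assumes you already have a mapping of usernames to follower counts; the function name, usernames and counts below are all made up for illustration and are not the Porter Novelli data.

```python
from itertools import accumulate


def cumulative_reach(follower_counts):
    """Rank users by follower count (largest first) and attach a running total.

    Returns a list of (user, followers, running_total) tuples.
    """
    ranked = sorted(follower_counts.items(), key=lambda kv: kv[1], reverse=True)
    totals = accumulate(count for _, count in ranked)
    return [(user, count, total) for (user, count), total in zip(ranked, totals)]


# Hypothetical sample data, invented for this example.
sample_counts = {"alice": 1200, "bob": 300, "carol": 4500, "dave": 50}
reach = cumulative_reach(sample_counts)

for user, count, total in reach:
    print(user, count, total)
```

The bars of the chart would be the individual follower counts and the curve would be the running total. Note that this simply adds the counts up, so anyone who follows two of the users is counted twice.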
If you’ve seen this kind of thing before, it looks wrong, doesn’t it? That red curve should be steeper at the beginning and have a longer, flatter approach to its asymptote. If you’ve ever heard of the 80/20 rule, this is one of the graphs that describes it. Normally the head of the graph (the first 20% of the x-axis) accounts for around 80% of the value, while the tail (the remaining 80% of the x-axis) accounts for around 20% of the value. If you’ve ever heard about the long tail, it’s this tail that Chris Anderson et al. are talking about.
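One quick way to check how close a set of numbers comes to the 80/20 pattern is to work out what share of the total sits in the top 20% of items. This is only a sketch: the function name and the sample values are assumptions made up for the example.

```python
def head_share(values, head_fraction=0.2):
    """Return the share of the total held by the top `head_fraction` of items."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Always include at least one item in the head.
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / total


# Hypothetical values with a strong head and a long tail.
sample_values = [80, 5, 4, 3, 2, 2, 1, 1, 1, 1]
share = head_share(sample_values)
print(f"Top 20% of items hold {share:.0%} of the total")
```

A result near 0.8 suggests the classic Pareto shape; a much lower figure is a sign that the curve will look flatter than expected.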
What’s wrong with the data?
It’s not so much the data as what I’ve not done with it. There must be many, many duplicated connections here. So now I need to write something that will go through the followers of all the Porter Novelli Twitter usernames in ranked order, and only count unique (or unduplicated) followers.
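The de-duplication step could look something like the sketch below. It assumes the follower IDs for each username have already been fetched into sets (how they are fetched is not covered here), and the function name, usernames and IDs are invented for the example.

```python
def unique_cumulative_reach(followers_by_user):
    """Walk users in order of follower count and count only unseen followers.

    `followers_by_user` maps a username to a set of follower IDs.
    Returns a list of (user, new_unique_followers, unique_reach_so_far) tuples.
    """
    ranked = sorted(
        followers_by_user.items(), key=lambda kv: len(kv[1]), reverse=True
    )
    seen = set()
    rows = []
    for user, followers in ranked:
        new_followers = set(followers) - seen  # followers not counted yet
        seen |= new_followers
        rows.append((user, len(new_followers), len(seen)))
    return rows


# Hypothetical follower sets with deliberate overlap.
sample_followers = {
    "alice": {1, 2, 3, 4},
    "bob": {3, 4, 5},
    "carol": {1, 2},
}
unique_reach = unique_cumulative_reach(sample_followers)

for user, new_count, running in unique_reach:
    print(user, new_count, running)
```

In this toy data the raw follower counts add up to 9, but only 5 distinct people are reached, and everyone after the first user adds little or nothing new. That is the kind of effect that should steepen the head of the curve and flatten its tail.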
I’m hoping that when I re-do the chart, it will look something more like this:

