A perl script to create Twitter friend/follow matrices

Geek alert: if the title of this post isn’t a dead giveaway I should tell you — unless you’re interested in APIs and badly-put-together bits of code — this probably isn’t for you.

I’ve recently found myself using a service provided by Damon Clinkscale called DoesFollow. All it does is answer the simple question “does twitter user A follow twitter user B?” Apart from a frill which lets you reverse the order of your question (“does twitter user B follow twitter user A?”) that’s all it does. You can even interrogate it from the address bar like this: http://doesfollow.com/barackobama/mediaczar

does barack obama follow mediaczar?

While I was thinking about how useful a service this is, I was suddenly struck by a moment of clarity. A lot of the research I’ve been doing could be simplified by something like this.
Continue reading
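To make the idea of a friend/follow matrix concrete, here is a minimal sketch in Python (not the perl script the title refers to). It assumes you have some way of answering "does A follow B?"; the `does_follow` function and the `known_follows` data below are hypothetical stand-ins for that lookup, not DoesFollow's actual interface.

```python
def follow_matrix(users, does_follow):
    """Build a square 0/1 matrix: row = follower, column = person followed."""
    return [
        [1 if a != b and does_follow(a, b) else 0 for b in users]
        for a in users
    ]


# Hypothetical, made-up relationships standing in for a real lookup service.
known_follows = {("alice", "bob"), ("bob", "alice"), ("carol", "alice")}


def does_follow(a, b):
    """Stand-in for a real 'does A follow B?' query (an assumption)."""
    return (a, b) in known_follows


users = ["alice", "bob", "carol"]
matrix = follow_matrix(users, does_follow)

for name, row in zip(users, matrix):
    print(name, row)
```

Note that asking the question pair by pair means N × (N − 1) lookups for N users, so this approach only suits small groups.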

The #interestingOPMLexperiment

Interesting OPML experiment

A couple of weeks ago, I asked a bunch of people to send me their OPML files (for those of you who aren’t aware, an OPML file is what tells your RSS reader what feeds you’ve subscribed to — it can act as a way of moving your subscriptions between readers.) Some of the more trusting among them agreed, and that gave me the raw material for the first bit of my experiment.

Some red herrings

Along the way I uncovered a couple of things that were interesting but not (entirely) relevant to the experiment.

  1. Some people are cagey about sharing their list of feeds: whether they consider it intellectual property, or whether they think that it may be too revealing, I don’t know.
  2. Lots of people said things like “oh — my RSS reader? Haven’t looked at that in a while. I get all my news off Twitter these days.”

Continue reading

Thinking differently about word-of-mouth

The current approach to WOM is to try to stimulate positive WOM while addressing or countering negative WOM. A sort of “accentuate the positive, eliminate the negative and don’t mess with Mr In-Between” strategy.

But what if we could do it a different way?

This idea stems from a conversation I had back in February with Martin Kelly and Andy Cocker of Infectious Media. Since that time I’ve chatted it through a couple of times with various interesting people. It’s not properly thought through yet, but following a chat a couple of weeks ago with Ketchum London’s new Head of Digital, the excellent Fernando Rizo, I’ve decided to put the idea out into the public domain to gauge what (if any) interest there is and whether I should continue to work on it.

“Word of Mouth” is hard to do well

I’ve read lots of word of mouth marketing case studies (there’s a great list over at WOMMA) and it strikes me that WOM is hard to do well for a few reasons. I don’t want to go into these in too much detail, but here are a couple of the structural issues:

  1. Unless I’m a journalist, an A-list blogger or media personality or have some kind of platform, I probably have a very low reach.

    Despite everything pointing towards personal contact being the best impetus for positive word of mouth, most word of mouth campaigns compensate for my low reach by trying to get me to self-service my relationship with the brand and the campaign.

  2. “Viral” distribution just doesn’t work the way most people seem to think it does; and this is particularly true when it comes to WOM.

    While I’m quite likely to tell stories about my personal experience of a brand and fairly likely to tell stories that involve a mutual friend, I’m much less likely to tell stories about other friends’ experience, and not likely at all to tell stories about friends-of-friends.

    Furthermore because of the ‘clumpiness’ of most people’s social graphs, geometric progression (the “I tell two people and they each tell two people and so on” effect) just doesn’t happen.

Homophily

One of the many reasons that WOM works is a thing called homophily — which roughly translates to “birds of a feather flock together”, or “you can tell a man by the company he keeps.”

I’ve written about examples of this before: for example, my analyses of twittering US Congresspersons and Westminster MPs which showed that one can predict with some reasonable degree of accuracy the political colouration of any given twitter account based on their mutual friends and follows (if you want to know more about the methodology, it’s worth reading Robert Hanneman’s chapter on cliques and subgroups.)

But there’s another side to the homophily coin; the social pressure to conform to the group’s norms.
Continue reading

Today’s “Integration Triangle” presentation

These are the slides from a presentation I did this morning on the topic of the Integration Triangle. I’ve talked about this here before in the article “5 Straightforward Ways To Integrate Your Communication Activities” — this includes some quick case studies.

I created these slides to support the presentation I was giving: they aren’t the presentation itself. This means that while you’ll be able to have a good guess at what I was saying most of the time, there will be moments when my meaning is opaque.

There are 70 slides in the presentation, including the front and back cover. Nevertheless, I gave the presentation in under 25 minutes. To save you doing the maths, that averages out at around 3 slides every minute (actually, there was a 4 minute delay in the middle of the presentation — so it’s more like 3-and-a-half slides per minute.)

In fact, my slides fall into two categories — those on which I spend fewer than 5 seconds, and those on which I spend more than a minute. This is more an artistic decision than anything else — I think that lots of slides going past very quickly give an appearance of pace and energy (which I dearly need first thing in the morning), but can rapidly become exhausting to watch and hard to follow without the occasional pause for breath.

Even with 70 slides, there’s so much more that I can say about the “Integration Triangle” as a planning tool — but I was trying to keep this to a single simple message. I’m hoping that (whatever they thought about my presentation, and no matter whether they liked it or believed what I was saying) the audience will remember what it was that I was saying, and be able to tell a version of the story themselves.

There’s just so much that we can talk about when it comes to the whole Digital PR thing that it all becomes rather overwhelming. I’ve just got off the phone to a colleague in Vienna (where I’m speaking next week) who wants me to talk to his audience about “Facebook and Twitter and Blogs” (oh my!) And I’ve got 45 minutes to do this. Of course I can do it. But what on earth is the “one thing” I want them to remember?

Can we calculate party affiliation using Twitter networks? (US Congress Edition)

Using nothing more than their public twitter relationships, is it possible to predict whether a US Congressperson is a Republican or a Democrat? The answer seems to be a guarded “yes” — our tools predict correctly 40/46 times (or around 87% of the cases.)

Calculated Party Affinity US Congress

This post follows on from a post earlier today in which I asked, “can we calculate party affiliation?” The data set in the earlier post was gathered from the 16 members of the UK parliament who are on Twitter and the relationships between them.

Tweetcongress maintains a list of US congresspeople on Twitter. Today (February 13, 2009) there are 76 congresspeople on the service, but when I collected my data set of “who follows who” on February 3, 2009 there were only 65. Of these 65, fully 19 (29%) lived a life of noble isolation with regard to the network: none of their peers linked to them, and they in turn linked to none of their peers. Removing these Miss Havishams from the data set leaves me with 46 twittering congresspeople who form a network.

Now as both social network analysis and Aesop would have it, “a man is known by the company he keeps.” What I mean by this is that given the partisan nature of politics, we should expect that Democrats will link to other Democrat twitterers more often than they link to Republican twitterers and vice versa. So that’s what NetDraw[1], the software I’m using for most of this stuff, looks for, or more accurately:

To identify factions, NetDraw software iteratively searches for a distribution of nodes among a selected number of factions to minimise the number of connections between factions and to maximize the number of connections within factions.

Whatever. So I let NetDraw loose on the data, and here’s what it did.

Calculated Party Affinity US Congress

I coloured the nodes red for Republican and blue for Democrats[2], labeled the nodes by party (for the sake of clarity, and for the hard-of-thinking, that’s “R” for Republican and “D” for Democrat) then counted all the nodes where the label said one thing but the colour another. There were six of these nodes; so NetDraw got the answer right 40/46 of the time (just about 87%.) This is less than the astonishing 93.75% accuracy we got with the Westminster twittering members of parliament in the previous post. Nevertheless I think we can safely say that it’s not a particularly integrated (or bipartisan) network if we can predict party affiliation with quite such success.
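For readers who want to see the idea in miniature, here is a rough Python sketch. It is not NetDraw's search procedure (the post doesn't describe that); it simply tries every two-way split of a tiny made-up network and keeps the split with the fewest "misfits", where a misfit is a missing tie inside a faction or a tie that crosses between factions. The node names (`d1`, `r1` and so on), the ties and the party labels are all invented for illustration.

```python
from itertools import permutations, product


def misfit(nodes, ties, faction):
    """Count ordered pairs that break the ideal pattern: no tie inside
    a faction, or a tie running between two factions."""
    errors = 0
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            same_faction = faction[a] == faction[b]
            tied = (a, b) in ties
            if same_faction != tied:
                errors += 1
    return errors


def best_two_factions(nodes, ties):
    """Brute force: try every split into two factions (toy sizes only)."""
    best_score, best = None, None
    for bits in product((0, 1), repeat=len(nodes)):
        faction = dict(zip(nodes, bits))
        score = misfit(nodes, ties, faction)
        if best_score is None or score < best_score:
            best_score, best = score, faction
    return best


def wrongly_guessed(faction, party):
    """Match factions to parties whichever way round gives fewest errors,
    then list the nodes whose faction disagrees with their real party."""
    factions = sorted(set(faction.values()))
    parties = sorted(set(party.values()))
    best_wrong = None
    for perm in permutations(parties, len(factions)):
        guess = dict(zip(factions, perm))
        wrong = [n for n in faction if guess[faction[n]] != party[n]]
        if best_wrong is None or len(wrong) < len(best_wrong):
            best_wrong = wrong
    return best_wrong


def mutual(*pairs):
    """Turn each pair into a two-way follow relationship."""
    ties = set()
    for a, b in pairs:
        ties.add((a, b))
        ties.add((b, a))
    return ties


# Invented data: d4 is a Democrat who only has ties to Republicans.
party = {"d1": "D", "d2": "D", "d3": "D", "d4": "D",
         "r1": "R", "r2": "R", "r3": "R"}
ties = mutual(("d1", "d2"), ("d1", "d3"), ("d2", "d3"),
              ("r1", "r2"), ("r1", "r3"), ("r2", "r3"),
              ("d4", "r1"), ("d4", "r2"))

nodes = list(party)
faction = best_two_factions(nodes, ties)
wrong = wrongly_guessed(faction, party)
accuracy = (len(nodes) - len(wrong)) / len(nodes)
print(wrong, round(accuracy * 100, 1))
```

In this toy network the one Democrat whose only ties are to Republicans gets filed with the Republicans, which is the same kind of mistake as the errant sheep on the map. Brute force only works at this scale: 46 nodes would mean 2^46 possible splits, so real software has to search rather than enumerate.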

Here’s exactly the same map with the errant sheep re-labeled with their proper names so it’ll be easier to refer to them (if it helps, you can click on the image to view or download a larger version.)

congress guesswork incorrect labels

You’ll see, I hope, that NetDraw has made a pretty good fist of the job. Where it has gone wrong on the whole is where the data clearly suggests something else. So Rep. Jared Polis for instance follows (and is followed by) no Democrat peers. Rep. Nancy Pelosi (D) and Sen. Richard Durbin (D) follow each other, but since Pelosi is followed by several Republicans and none of her other Democrat peers you can see why the algorithm has made the incorrect guess that the two of them are Republicans. Long-serving member Neil Abercrombie, as discussed in a previous post on US Congress Twitter folk, forms a bit of a bridge between the two parties, so despite his membership of the Congressional Progressive Caucus and liberal voting record, from the Twitter network point of view, his affiliation is somewhat ambiguous.

Sen. McCain follows none of his peers, and appears to inherit his incorrect attribution from Sen. Susan Collins. For the life of me, I can’t work out what makes it think that Sen. Susan Collins is a Democrat. She really isn’t, you know.

Note 1: NetDraw is a free program written by Steve Borgatti from the University of Kentucky. If you’re interested in playing around with this stuff, you’ll need to get yourself a copy.

Note 2: Actually, that’s not true. Despite a friend sharing the simple mnemonic that “‘Republicans’ and ‘red’ begin with the same letter,” I just can’t get it out of my English head that the Republicans should be blue and the Democrats red. As a result I waste precious minutes re-colouring these maps in Illustrator. It is worth pointing out that I also have problems with “left” and “right” on occasion — preferring instead the binary opposition “left” and “No! no! The other left, for God’s sake!”

Can we calculate party affiliation using Twitter networks? (Westminster edition)

This is a follow-up post to Why doesn’t the Tory MP have Twitter friends? — a report on some early research into the interrelationships between the few Westminster MPs who are on Twitter.

According to Tweetminster, the number of UK MPs on Twitter has doubled since this time last month. Where there were eight Twittering MPs, there are now sixteen. Here’s the map that shows who follows whom (the labels may be too small to read; if you want to see a larger image, click on the map).

Actual factions among Westminster MPs on Twitter

I’ve coloured each node to show party affiliation; for those of you who are unfamiliar with British politics, Labour (our left-of-centre party) shows up in red, Conservatives (our right-of-centre party) in blue, and Liberal Democrats (what it says on the tin) in yellow.

The size of each node represents the individual’s “betweenness centrality” — a network analysis term that helps us place a value on individuals within a network. To give you a sense of what it means, the higher the betweenness centrality of an individual, the greater the impact when you take them out of the network. For those of you who work in large companies, it may be worth noting that senior management’s personal assistants generally have very high betweenness — something that is mostly remarked upon when they go on holiday (simultaneous translation: “take a vacation”.)
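As a rough illustration of how betweenness can be worked out, here is a short Python sketch using Brandes' algorithm on a made-up office network. The names (`pa`, `boss` and so on) are invented, and the score counts ordered pairs, so a path from A to B and the path back from B to A each count once. It is not the calculation used to size the nodes on the map above.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for unweighted graphs.

    graph maps each node to the nodes it links to; every node must
    appear as a key. Scores count ordered (source, target) pairs."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v is on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing out the credit.
        credit = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != s:
                score[w] += credit[w]
    return score


# Invented example: everyone talks to each other only through the PA.
office = {
    "pa": ["boss", "alice", "bob", "carol"],
    "boss": ["pa"],
    "alice": ["pa"],
    "bob": ["pa"],
    "carol": ["pa"],
}
scores = betweenness(office)
print(scores)
```

Every shortest path between two of the other four people runs through the PA, so the PA collects the whole score and everyone else gets none. Take the PA out and nobody can reach anybody.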

So far so good. By now, regular readers will probably be kissing their teeth and thinking “so what?” I’ve done a lot of these Twitter maps in the past and the novelty must be wearing off on you by now.

So here’s the thing. There are a few network analysis techniques that let one identify cliques and factions. What we’ve got here is a small set where we already know what people’s affiliations should be. How interesting, I thought, would it be to see how well the calculated result fits the real world data? Here’s what I found:
Continue reading

Republicans vs. Democrats: Pareto charts of unduplicated Twitter reach

A couple of days ago I did a little more analysis on Republican and Democratic Congresspeople on Twitter. Towards the end of the post, I realized that the unduplicated reach pareto chart that I’d built would only make sense if the US were a one-party state (or to be fair, if both parties had a single issue that they were united in wanting to promote.)

Pareto chart showing unduplicated reach for US congress

So — wanting to make this a little more representative — I went back and produced two charts; one showing Republican unduplicated reach (which follows a typical 80:20 distribution)…

Pareto chart showing unduplicated reach of Republican Twitterers in the US Congress
Continue reading

Republicans still outperforming Democrats on TweetCongress

Three weeks ago (and at the prompting of my colleague Eddie Garrett who heads up Porter Novelli DC’s digital team) I mapped out the interconnections between US Congress Tweeters. We’d been working on a Twitter crawler and it seemed like a good opportunity to test things out on a new data set.

This is a follow-up post. Once again it was prompted by a third party: Christie Findlay at Politics Magazine asked whether it would be OK to print a copy of one of the maps in their March edition. I’ve heard that three weeks are a long time in politics, so I thought I’d better run the crawl again just in case. Also I’ve got a new crawler that uses the proper Twitter API (I can see some of your eyes glazing over you know. Just skip ahead when that happens.) I’d tried it out on the Porter Novelli data set, but welcomed a chance to try it on something more meaty.

So yesterday morning before work I ran the crawl. I use the excellent Tweet Congress as my source of information about which congress people are on Twitter.
Continue reading

Porter Novelli Twitter folk – the 80/20 rule

Last weekend I posted a chart of Porter Novelli Twitter folk and their followers. If you read it, you’ll recall that I was dissatisfied by what it implied about the collective reach of Porter Novelli twitterers.

The pareto chart should look more like this

Well, thanks to a long-ish train journey to Bolton and back, I was able to fudge a little perl script together to look through the data to find and remove everything other than the first instance of a follower. Let’s make that a little clearer. Let’s say that we’re looking at three Twitter people, Alice, Bob, and Carol. The first thing to do is to see who follows them:

alice     bob       carol
bob       alice     alice
carol     carol     bob
dave      edward    frank
xerxes    william   william
yasmine   xerxes    xerxes
zeus      yasmine
          zeus

Now we need to rank them in order of “who has the most followers” (also known as “popularity” as it happens). Here I’ve done that from left to right. Bob has the most followers and Carol the fewest.

bob       alice     carol
alice     bob       alice
carol     carol     bob
edward    dave      frank
william   xerxes    william
xerxes    yasmine   xerxes
yasmine   zeus
zeus

And finally we go through from left to right removing all followers who have already shown up on someone else’s list.

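bob       alice     carol
alice     bob       [alice]
carol     [carol]   [bob]
edward    dave      frank
william   [xerxes]  [william]
xerxes    [yasmine] [xerxes]
yasmine   [zeus]
zeus

(Names in square brackets are followers who have already appeared in a list further to the left, and so are removed.)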

Bob, being at the top of the list, gets to keep all his followers, which may seem unfair. But it’s not unfair if the question we’re trying to answer is “how do I reach as many people as possible by speaking to as few people as possible?” That is, I’m looking for reach. (Marketing people often express themselves in terms of “reach”, the number of people who are exposed to a message, and “frequency”, the number of times the average person is exposed to that message.)

Looking at the example above, we can see that Alice really delivers an incremental benefit of two new people, and Carol only reaches one new person. That gives us a much better idea of how valuable the most popular person (Bob) really is.
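The rank-and-dedupe steps above can be sketched in a few lines of Python. This is an illustration only, not the perl script from the train; the function name `unduplicated_reach` is made up, and the data is the Alice, Bob and Carol example.

```python
followers = {
    "alice": ["bob", "carol", "dave", "xerxes", "yasmine", "zeus"],
    "bob": ["alice", "carol", "edward", "william", "xerxes", "yasmine", "zeus"],
    "carol": ["alice", "bob", "frank", "william", "xerxes"],
}


def unduplicated_reach(followers):
    """Rank users by follower count, then keep only the followers
    that nobody higher up the ranking has already delivered."""
    ranked = sorted(followers, key=lambda user: len(followers[user]), reverse=True)
    seen = set()
    result = []
    for user in ranked:
        new = [f for f in followers[user] if f not in seen]
        seen.update(new)
        result.append((user, new))
    return result


reach = unduplicated_reach(followers)
for user, new in reach:
    print(user, len(new), new)
```

Bob keeps all seven of his followers, Alice adds two (bob and dave) and Carol adds one (frank), matching the worked example. Ties in follower count are left in whatever order the data arrives in, which a real script would probably want to settle more deliberately.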

Applying this to the Porter Novelli data set

Clearly it would be extraordinarily boring to perform the process described above for the 205 people in the Porter Novelli data set that I want to analyse. But the analysis script that I wrote (with plenty of help from the perl monks) goes through exactly these steps. It’s a pretty straightforward job, ranking and deduping. Here’s what we get.

Pareto chart showing unduplicated reach among Porter Novelli Twitter Users

This makes much more sense than the last run. According to the Pareto principle, roughly 80% of the effects should come from 20% of the causes. Here we see that 20% of the Porter Novelli Twitter users (marked in black) account for slightly more than 80% of the reach (marked in red.) It’s pretty much a text-book example. Things are as they should be, I suppose.
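To check a data set against the 80/20 rule without drawing the chart, one rough approach is to work out what share of the total the top fifth of users holds. Here is a hedged Python sketch; the reach figures are invented for illustration and are not the Porter Novelli numbers.

```python
def cumulative_share(counts):
    """Running share of the total, in the order given (the red curve
    on a Pareto chart, if counts are ranked largest first)."""
    total = sum(counts)
    running = 0
    shares = []
    for count in counts:
        running += count
        shares.append(running / total)
    return shares


def head_share(counts, head_fraction=0.2):
    """Share of the total held by the top head_fraction of users."""
    ranked = sorted(counts, reverse=True)
    head = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head]) / sum(ranked)


# Made-up unduplicated reach for ten users, largest first.
new_followers = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]

shares = cumulative_share(new_followers)
top_fifth = head_share(new_followers)
print(shares)
print(top_fifth)
```

With these invented numbers the top two users out of ten account for 80 of the 100 people reached, which is the textbook shape.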

More to the point, we can now assign appropriate value to coverage at the head of the graph. This is of great value when thinking about our media planning and engagement.

By the way — if you’d like a copy of either the Twitter follower API query engine (it’s a well-behaved command-line thing that was developed by the excellent Joachim Larsen) or the slightly shonky perl script that I wrote on the train, you have only to ask: I’ll be pleased to share. Send me a tweet at @mediaczar and I’ll send you the scripts.

Porter Novelli Twitter folk ranked by number of followers

Yesterday I did a little work with the TwitterCounter API. Today I’ve gone a little further and (purely as an experiment) ranked a list of Twitter people in Porter Novelli by the number of their followers.

What happens if we chart this? Here’s a kind of Pareto chart showing users ranked in order of followers and the total reach that we get at each stage.

Porter Novelli Twitter people ranked by #followers

If you’ve seen this kind of thing before, it looks wrong, doesn’t it? That red curve should be steeper at the beginning and have a longer, flatter asymptote. If you’ve ever heard of the 80/20 rule this is one of the graphs that describes it. Normally the head of the graph (the first 20% of the x-axis) controls around 80% of the value while the tail (the remaining 80% of the x-axis) controls around 20% of the value. If you’ve ever heard about the long tail, it’s this tail that Chris Anderson et al. are talking about.

What’s wrong with the data?

It’s not so much the data as what I’ve not done with it. There must be many, many duplicated connections here. So now I need to write something that will go through the followers of all the Porter Novelli Twitter usernames in ranked order, and only count unique (or unduplicated) followers.

I’m hoping that when I re-do the chart, it will look something more like this:

The pareto chart should look more like this