Orange Project – The problem with retweets


In my last post, I looked at how to count bigrams, and touched in passing on their value to the keyword researcher. It’s notable when looking at Twitter data how many of those bigrams are in the form “rt @{username}”, and how they’re distributed. In the 7 days of tweets that I’m using as my […]
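As a rough illustration of the pattern described in this excerpt, here is a minimal Python sketch that builds bigrams from a few tweets and counts those of the form "rt @{username}". The sample tweets, the whitespace tokeniser, and the function names are all assumptions made for illustration; this is not the code from the post itself.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs from a list of tokens."""
    return list(zip(tokens, tokens[1:]))


def count_retweet_bigrams(tweets):
    """Count bigrams whose first token is 'rt' and whose second is an @-mention."""
    counts = Counter()
    for tweet in tweets:
        # Naive tokenisation (an assumption): lowercase and split on whitespace.
        tokens = tweet.lower().split()
        for first, second in bigrams(tokens):
            # Drop trailing punctuation such as the colon in "@alice:".
            second = second.rstrip(":,.!?")
            if first == "rt" and second.startswith("@") and len(second) > 1:
                counts[(first, second)] += 1
    return counts


# Hypothetical sample data, not taken from the real corpus.
sample_tweets = [
    "RT @alice: orange is the new black",
    "rt @alice: fresh orange juice",
    "RT @bob: orange sunset tonight",
    "I love orange juice",
]
retweet_counts = count_retweet_bigrams(sample_tweets)
```

Calling `retweet_counts.most_common()` then lists the retweeted accounts from most to least frequent, which is one way to look at how those bigrams are distributed.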

Orange Project Step 4 – Bigrams


So far I’ve managed to do some very simple keyword identification; nothing too dramatic, and it’s taken a while to get here, what with all the collecting and data cleaning scripts and processes I’ve had to write. Last night, Mrs Mediaczar asked me why I was doing this. “Surely”, she pointed out, “you’re reinventing the […]

Orange Project Step 3 – Tokenise and Stopword Removal


Which words are most often associated with the term “orange” in tweets? How might I improve my search construction so that I can focus in on only those mentions that are relevant to my interests, either by inclusion or exclusion? How might I use social signals to improve my keyword planning for SEO or PPC? […]
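One simple way to approach the first question is sketched below: tokenise each tweet, drop stopwords, and count the words that appear alongside the keyword. The tiny stopword list, the regular-expression tokeniser, and the sample tweets are assumptions for illustration only, not the method used in the post.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "is", "i", "my", "of", "and", "to", "in"}


def tokenise(text):
    """Lowercase the text and extract runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def associated_words(tweets, keyword="orange"):
    """Count non-stopword tokens in tweets that contain the keyword."""
    counts = Counter()
    for tweet in tweets:
        tokens = [t for t in tokenise(tweet) if t not in STOPWORDS]
        if keyword in tokens:
            # Count every remaining token except the keyword itself.
            counts.update(t for t in tokens if t != keyword)
    return counts


# Hypothetical sample data, not taken from the real corpus.
sample_tweets = [
    "I love fresh orange juice",
    "Orange juice is the best",
    "My orange phone contract is a pain",
]
word_counts = associated_words(sample_tweets)
```

Words that turn up often but are irrelevant to the topic of interest are candidates for exclusion terms in a search query, while relevant ones suggest terms to include.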

Orange Project Step 2 – Munging & Cleaning


This is the second in a series of posts tracing the evolution of a project. In the first post I downloaded 35k English-language tweets from Sysomos containing the keyword “orange”. Here’s a quick glance at what the data look like: A lot of the really interesting data is off screen. I could, of course, load […]

Orange Project Step 1 – Data Collection


This is the first in a series of planned posts that track how my workflows evolve and develop around a project. They’re a bit edited and idealised (I’ll only include my errors and dead ends if they might be enlightening, for example.) I’m collecting together a corpus of tweets to run some experiments over the […]

Seasonal Chocolate


I’m doing some jiggery-buggery at the moment around the general theme of “Social Listening 2.0”. I think we’re all more or less agreed that the promise of the early Social Listening platforms (“Funded by Homeland Security grants! Now available to marketers!”) hasn’t really been borne out in practice. But there are interesting and exciting things […]