Playing around with Google Correlate

I’ve just spent a happy lunch hour playing with Google Correlate. It lets you enter real world time series data (weekly sales, temperature, or footfall for example) and then tries to correlate it with search trends. This could be extremely useful for planning content or paid search, or just for winkling out that all-important ‘insight’ that delivers the edge.

So to try it out I downloaded some historic weather data from the Met Office, munged it around a bit (Correlate seems to take well to two column CSV formatted data using US-formatted dates — mm/dd/yyyy), then uploaded it into the tool.
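The munging step can be sketched in a few lines of Python. This is only an illustration of the two-column, mm/dd/yyyy layout described above; the function name and the sample temperatures are made up, not real Met Office figures.

```python
import csv
import io
from datetime import date


def to_correlate_csv(rows):
    """Render (date, value) pairs as two-column CSV with mm/dd/yyyy dates."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for day, value in rows:
        # US-style date first, then the measurement.
        writer.writerow([day.strftime("%m/%d/%Y"), value])
    return buffer.getvalue()


# Hypothetical weekly mean temperatures (degrees C), for illustration only.
sample = [(date(2011, 1, 2), 3.7), (date(2011, 1, 9), 4.1)]
csv_text = to_correlate_csv(sample)
print(csv_text)
```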

Google Correlate - entering data

Running the tool gives me a set of 10 searches that closely match the seasonality and trends in my data set.

Google Correlate - Matching Data

When the temperature increases in the UK, we’re likely to go fishing, be bothered by flies and spot grass snakes. Only after we’ve trimmed the hedge and creosoted our garden fences, of course.

I slightly resent this image of the average English person as a coarse-fishing gardener obsessed by party boundaries, but in my soul of souls I fear that it’s probably fairly accurate.

The Engagement Thing

I’m cold on engagement. Sure, I used to have a planning chart that I rolled out from time to time that said, more or less, “The secret to social media success is to listen, respond, influence and engage,” but when I became a man I put away childish things. In this particular case, it involved replacing the word “engage” with the word “enlist” (I had a thing about creating zombie armies. I still do, if I’m being honest.)

One of the articles I share most often is Martin Weigel’s ‘Engagement: Fashionable Yet Bankrupt’. The paper has become a bit of a touchstone for me, and I can’t recommend it highly enough. On re-reading it recently, however, I was struck by something I’d not really noticed before: the fashion for Engagement that he was discussing seemed to be much older than I’d thought.

So, armed with an exciting new discovery (ScraperWiki) I set off to datamine the BrandRepublic archives. My aim was to find articles that mentioned the term “engagement”, and chart them day by day.

Here’s the scraper I built. And here’s what I managed to come up with as a first stab at the visualisation.

daily_engagement

It was deeply flawed, and over-plotting hides most of the detail, but I felt I was on the right track. So I ran the plot again, only this time I plotted the monthly totals instead of day by day (I’d never used R’s zoo library or table function before, but together they represent another nail in the coffin for my everyday use of Excel):

Monthly engagement
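I did the daily-to-monthly roll-up with R’s zoo library and table function. The same idea, sketched in Python rather than R, looks roughly like this; the dates are invented and the function name is mine.

```python
from collections import Counter
from datetime import date


def monthly_totals(article_dates):
    """Count articles per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in article_dates)
    return sorted(counts.items())


# Hypothetical publication dates, one entry per matching article.
dates = [date(2010, 1, 4), date(2010, 1, 4), date(2010, 1, 20), date(2010, 2, 1)]
totals = monthly_totals(dates)
print(totals)
```

Plotting one point per month instead of one per day is what removes most of the over-plotting.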

That’s a nice-and-steady looking upward trend. But looking at the data, I could see one or two problems. For one thing, Brand Republic’s search takes a few liberties: a search for “engagement” turns up results for “engaged” or “engaging.” For another, it seems that many clients engage agencies, not just their audiences. I chickened out a little, and using the keyword analysis toolset that I’ve been building, I tried to narrow and focus the list. This gave me a list of just over 60 bigrams (like “engage audiences”, “customer engagement”, “engaged with”). While this list would significantly reduce the number of results returned, I’d feel more secure about the findings.
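To make the bigram filter concrete, here is a minimal Python sketch. It is not the keyword toolset mentioned above, just an assumed approach: split the text into adjacent word pairs and keep an article only if one of the pairs is on the target list. Only three of the sixty-odd bigrams are shown.

```python
import re


def bigrams(text):
    """Return the lowercase word pairs that sit next to each other in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}


def mentions_engagement(text, target_bigrams):
    """True if the text contains at least one of the target bigrams."""
    return not bigrams(text).isdisjoint(target_bigrams)


targets = {"engage audiences", "customer engagement", "engaged with"}
audience_hit = mentions_engagement("Brand X wants deeper customer engagement.", targets)
agency_hit = mentions_engagement("Brand X will engage a new agency.", targets)
print(audience_hit, agency_hit)
```

The second sentence is the “clients engage agencies” case: it contains the word but none of the bigrams, so it drops out.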

Armed with this list, I did a little more mining, and finally produced this chart, comparing the increase in posts mentioning “Engagement” and “Social Media” in the Brand Republic archive:

Things to notice

The trend for Engagement begins long before the trend for Social Media, and possibly even some time before that misbegotten ur-text, those Plates of Nephi of Social Media, “The Cluetrain Manifesto”. I believe that this has coloured thinking about Social Media’s strategic and business objectives, and not for the better. I suspect that we have inherited and assimilated the idea of engagement as a goal, as a KPI, into our practice in the same way that the early Christians absorbed elements of paganism into their beliefs.

Marketers — who are preternaturally sensitive to trends as it is — are swamped by mentions of Social Media and Engagement. They can’t escape them.

And — don’t both lines look suspiciously as though there’s a feedback loop in place? Journalists write about trend x. Marketers read about trend x, come up with their response. Journalists write about their response. Should we worry about this? Or just accept it as the way of the world?

Decomposing Time Series Data with R

In an earlier post I started looking at how I might use R to forecast Google search volume.

Now I find the useful `decompose` function, which decomposes a time series into seasonal, trend and irregular components using moving averages.

Which produces the following:

Search trend for “cold remedies” – showing the original time series and beneath that, the decomposed trend, seasonal fluctuation and noise.
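For anyone curious about what a moving-average decomposition does under the hood, here is a rough Python sketch of the classical additive method. It follows the same general recipe as I understand it (centred moving average for the trend, per-period averages for the seasonal figure, whatever is left over as noise), but it is my own illustration and I have not checked it against R’s output. The function names and the toy series are invented, and the series needs at least two full periods.

```python
def moving_average_trend(series, period):
    """Centred moving average; None where the window does not fit."""
    half = period // 2
    if period % 2 == 0:
        # Even periods need half-weights at both ends to stay centred.
        weights = [0.5] + [1.0] * (period - 1) + [0.5]
    else:
        weights = [1.0] * period
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(w * x for w, x in zip(weights, window)) / period
    return trend


def decompose_additive(series, period):
    """Split series into trend, seasonal and noise (series = sum of the three)."""
    trend = moving_average_trend(series, period)
    detrended = [x - t if t is not None else None for x, t in zip(series, trend)]
    means = []
    for phase in range(period):
        values = [d for d in detrended[phase::period] if d is not None]
        means.append(sum(values) / len(values))
    centre = sum(means) / period
    figure = [m - centre for m in means]  # seasonal figure averages to zero
    seasonal = [figure[i % period] for i in range(len(series))]
    noise = [x - t - s if t is not None else None
             for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, noise


# Toy data: a straight-line trend plus a repeating four-step pattern.
pattern = [2, -1, -2, 1]
toy = [i + pattern[i % 4] for i in range(12)]
trend, seasonal, noise = decompose_additive(toy, period=4)
print(trend[2:10])
print(seasonal[:4])
```

On the toy series the trend comes back as the straight line and the seasonal figure as the repeating pattern, which is the behaviour you would hope for.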

Forecasting Google search volume using R

This is by way of being a bit of an experiment. I’ve been reading John Foreman’s excellent and fascinating Analytics Made Skeezy blog and came across the post ‘Projecting Meth Demand using Exponential Smoothing’, in which the protagonist helps a drug lord forecast monthly demand. I was trying to follow along with the spreadsheet, but fell at the first hurdle: it turns out that the Mac version of Excel doesn’t allow array formulae. So I turned to R.
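If exponential smoothing is new to you, the simplest variant fits in a few lines. This Python sketch shows single (simple) exponential smoothing only; it is not the spreadsheet from the post, which may well use a richer model with trend and seasonality. The function name, the alpha value and the demand figures are all made up.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The final level doubles as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for value in series[1:]:
        # New level = a blend of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


demand = [10.0, 12.0, 14.0]  # hypothetical monthly demand
levels = simple_exponential_smoothing(demand, alpha=0.5)
print(levels)
```

A higher alpha chases recent values more closely; a lower one smooths harder.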

I wanted some nice seasonally-dependent data and a bit of a trend with which to play. Google Trends provides a lot of that sort of thing:

[trend w="590" h="400" q="cold+remedies" geo="US"]

Recover data from PowerPoint charts when the linked file is not available

One of our media partners kindly shared a couple of slide decks today with lots of information about some broad audience segments. There were a couple of graphs that looked interesting, and I wanted to grab the raw data. As so often happens, though, the charts had been copied and pasted from Excel, so those data weren’t available. Instead, PowerPoint responded with the familiar error message, “The linked file is not available.” This has probably happened to you. It can be particularly frustrating when it’s your own presentation that you were trying to update, and you have deleted the original Excel file that you were using as a scratchpad.

But this time, I noticed (possibly for the first time) that — no matter what PowerPoint was telling me — all the data were still in place. Just mousing over the chart data points told me what I needed to know – but (short of manually noting down 96 data points per line) how could I get them out?

It turns out that you can open the PowerPoint document using my text editor of choice, BBEdit (a little checking shows you can also use the free version, TextWrangler). This shows you a nice directory tree, and if you drop into the /ppt/charts/ directory, all the charts will be there in a readily-parseable XML format.


Update


I’ve been looking for another way to get at these data, and thanks to Pat Parslow I can share an even easier method. It transpires that Microsoft Office’s Open XML formats are simple zipped collections of XML files, so here’s how you can get at the data:

  • Unzip the .pptx file, either by typing unzip filename.pptx in the terminal, or by changing the extension from .pptx to .zip, then double clicking on it (this has the advantage of working on either OS X or Windows machines).
  • That’s more or less it. Open the folder, go to the /ppt/charts/ directory and fill your boots.
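The two steps above can also be scripted. The Python sketch below reads the chart parts straight out of the zip and pulls the cached numbers for each series. It assumes the values sit under c:val/c:numRef/c:numCache, which is what I would expect for column and line charts; other chart types (scatter, for instance) store their numbers elsewhere and would need a different path. The function name is mine, and the tiny in-memory “deck” is a stand-in so the example runs without a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CHART_NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def chart_values(pptx_file):
    """Map each chart part in a .pptx to its series' cached numeric values."""
    results = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for name in archive.namelist():
            if name.startswith("ppt/charts/chart") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                series = []
                for ser in root.iter("{%s}ser" % CHART_NS["c"]):
                    points = ser.findall(
                        "c:val/c:numRef/c:numCache/c:pt/c:v", CHART_NS)
                    series.append([float(p.text) for p in points])
                results[name] = series
    return results


# Stand-in for a real deck: one chart part with a single two-point series.
demo_xml = (
    '<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">'
    "<c:chart><c:plotArea><c:lineChart><c:ser><c:val><c:numRef><c:numCache>"
    '<c:pt idx="0"><c:v>1.5</c:v></c:pt><c:pt idx="1"><c:v>2</c:v></c:pt>'
    "</c:numCache></c:numRef></c:val></c:ser></c:lineChart></c:plotArea></c:chart>"
    "</c:chartSpace>"
)
demo_deck = io.BytesIO()
with zipfile.ZipFile(demo_deck, "w") as archive:
    archive.writestr("ppt/charts/chart1.xml", demo_xml)
demo_deck.seek(0)

extracted = chart_values(demo_deck)
print(extracted)
```

With a real presentation you would pass the path to the .pptx file instead of the in-memory stand-in.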

The BBEdit route in pictures

PowerPoint Chart

Here are two charts for which I’d like the raw data.



How photos spread virally through Facebook

Three videos on Facebook Stories from Stamen Design visualise how photos posted by George Takei were shared and re-shared over time. Stamen’s work is always wonderful — this is remarkable.

There’s no rubric, but I’m assuming that this is a form of network graph — where each node may itself become the source of new edges.

What’s interesting, if that’s the case, is how only a few shares lead to huge bursty cascades of sharing. I’ve seen this before on much smaller data sets:

There’s a potentially useful paper, The Dynamics of Viral Marketing (Leskovec et al., 2007), that addresses this phenomenon. I’d be grateful for links to other papers, or your thoughts.

Thanks to somerandomnerd for the link.

Some basic R resources

R

Perl and R

ggplot2 (for pretty charts)

General

Column and bar chart

Column & Bar Chart

I wanted to compare two sets of data on the same chart (in this case, post volume by day, and average engagement by day.) It doesn’t really make sense to have a clustered column chart (because it’s hard to read) and the column-and-line chart (simple to do in Excel, but that doesn’t make it right) misleadingly implies that the line represents trend data — where no such trend exists.


Log Scales

I’m learning at long last how changing scales can expose patterns. Today I’ve been looking at the Facebook Page of Marks & Spencer (a UK retailer.)

Using a linear scale on the y-axis, we see how engagement has increased rapidly over recent months. That’s pretty much textbook stuff for a well-run Page.

Marks linear

But by dropping a log2(x) scale onto the y-axis (why log 2? I was experimenting after reading When Should I Use Logarithmic Scales in My Charts and Graphs?) we can see a couple of strange patterns emerging.

Marks log 2

Notice the distinct vertical lines around October and December 2011, and again in February 2012? Also the horizontal lines that extend from around January to April 2012?

Both can, it turns out, be explained by odd posting behaviour related to photo albums, and could raise some interesting issues of what is and what isn’t best practice.

But the short point is: I wouldn’t have seen it but for the log scale.
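Why does the log scale help? A quick Python sketch, using invented engagement figures rather than the Marks & Spencer data: on a log2 axis, each doubling takes up the same vertical distance, so small values are no longer squashed flat against the bottom of the chart.

```python
import math


def log2_positions(values):
    """Return where each positive value sits on a log2 axis."""
    return [math.log2(v) for v in values]


# Hypothetical engagement counts spanning several orders of magnitude.
engagement = [4, 64, 1024, 16384]
positions = log2_positions(engagement)
print(positions)
```

On a linear axis the first three values would all huddle in the bottom tenth of the chart; on the log2 axis the four points are evenly spaced.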