Forecasting Google search volume using R

This is by way of being a bit of an experiment. I’ve been reading John Foreman’s excellent and fascinating Analytics Made Skeezy blog and came across the Projecting Meth Demand using Exponential Smoothing, in which the protagonist helps a drug lord forecast monthly demand. I was trying to follow along with the spreadsheet, but fell at the first hurdle: it turns out that the Mac version of Excel doesn’t allow array formulae. So I turned to R.

I wanted some nice seasonally-dependent data and a bit of a trend with which to play. Google Trends provides a lot of that sort of thing:

[trend w=”590″ h=”400″ q=”cold+remedies” geo=”US”]

I downloaded the Google Trends data and dropped it into R, converting the data to a time series with a 52-week frequency (Google Trends reports weekly search data). I’d not used the ts() function before. It seems fantastically useful. Here’s the code (including all the Google data):

Which gave me this:

Google search volume on 'cold remedies' as visualised in R

Next, I converted the search_index time series into a Holt Winters object. The Holt Winters function comes as standard with R, so nothing to do here. There are all sorts of tweaks I could make to improve results if I understood the process better. But this is just a proof-of-concept run:

Which produces this nice-looking fit, based on the trend and 52 individual coefficients:

Google search volume on 'cold remedies' with Holt Winters fitting

Next, I load the forecast package, and run the forecast.HoltWinters function for the next year.

Which, in turn, gives this:

The blue line is the forecast, the orange area gives the 80% confidence interval and the yellow the 95% confidence interval.

As I say, I really need to learn how to tweak this if I’m ever going to sell methamphetamine.

Comments

  1. Josh Dulberger says

    Nice post, and definitely good to do this in R instead of excel. That said, array formulae do work in mac excel – use command+shift+enter. For an easy example, try the frequency function: create some data, some bins, then highlight the column/rows next to the bins, go to the forumla bar and enter frequency(array, bins), then command+shift+enter. It works on the skeezy workbook too, and I think John is on a mac.

Please tell me what you think.