Recover data from PowerPoint charts when the linked file is not available

One of our media partners kindly shared a couple of slide decks today with lots of information about some broad audience segments. There were a couple of graphs that looked interesting, and I wanted to grab the raw data. As so often happens, though, the charts had been copied and pasted from Excel, so those data weren’t available. Instead, PowerPoint responded with the familiar error message, “The linked file is not available.” This has probably happened to you. It can be particularly frustrating when it’s your own presentation that you were trying to update, and you have deleted the original Excel file that you were using as a scratchpad.

But this time, I noticed (possibly for the first time) that — no matter what PowerPoint was telling me — all the data were still in place. Just mousing over the chart data points told me what I needed to know – but (short of manually noting down 96 data points per line) how could I get them out?

It turns out that you can open the PowerPoint document using my text editor of choice, BBEdit (a little checking shows you can also use the free version, TextWrangler). This shows you a nice directory tree, and if you drop into the /ppt/charts/ directory, all the charts will be there in a readily-parseable XML format.


Update


I’ve been looking for another way to get at these data, and thanks to Pat Parslow I can share an even easier method. It transpires that Microsoft Office’s Open XML formats are simple zipped collections of XML files, so here’s how you can get at the data:

  • Unzip the .pptx file, either by typing unzip filename.xml in the terminal, or by changing the extension from .pptx to .zip, then double clicking on it (this has the advantage of working on either OS X or Windows machines.
  • That’s more or less it. Open the folder, go to /ppt/charts/ directory and fill your boots.

The BBEdit route in pictures

PowerPoint Chart
Here are two charts for which I’d like the raw data.

edit data menu item
The usual approach is to right-click and select “Edit Data…” from the context menu
error message: the linked file is not available
“Edit Data…” draws a blank if the chart has been copied and pasted from Excel, rather than created in place. Sadly this means that the raw data aren’t available
Mousing around on the chart reveals hidden data. But there are 96 data points per line. I’m not going to do this by hand.
BBEdit can look inside the .pptx package and expose the raw data (click for bigger)

Then you can do what you like to clean the data. Here’s the (ugly, incomplete, but near-enough) sed command I used.

In retrospect though, I should probably have used Google Refine. It took Refine a few seconds to do the work of that sed command, and it took me more than 10 minutes to write the regular expression (sed has its own weird regexes, and it seems that it’s even more clunky on OS X)

Here’s the Google Refine approach (I need to play with Refine more):

Comments

Please tell me what you think.