When London PR blogger Melanie Seasons started her blog two and a half years ago, the subject of her first post was her first post from her MySpace blog. In fact, she took most of her content from there as well. She calls her first post “a cop-out first post of another first post”, but I think that she might have spun it as a “metapost”.
In some ways, the post you’re reading now could be another metapost — a post about first posts. But it’s really about new ways of working.
I know about Melanie’s first post because I’ve been carrying out some quantitative research using first posts. I took a user-generated list of UK PR blogs that I helped curate last October, and attempted to identify the date of the first ever post for each blog.
This is a task that’s almost impossible to automate. Getting the newest post is a cinch for a computer; the oldest post, not so much. And yet it’s relatively simple for a human to perform the task: generally it’s just boring and repetitive (although I challenge you to find the first post on Jed Hallam’s blog, Rock Star PR). I’m not one of those people who enjoys repetitive tasks, so I decided to take this opportunity to set up the Magic Bean Lab’s first experiment: to test the efficiency of various alternative labour sources.
Method 1: e-lancers

I used Freelancer.com: a well-established e-lance and outsourcing marketplace I’ve used several times in the past. As we get more used to buying things that we can’t see over the web, the e-lance market has become a no-brainer.
I employed two e-lance researchers. I’ve found that running researchers in parallel on projects like these reduces the need for overmuch error-checking. Quality Assurance (QA) rapidly becomes the biggest overhead in any project like this.
My quick-and-dirty QA process runs as follows:
- Compare results side by side
- If the results agree, accept this as the correct answer
- If the results disagree, do some checking myself.
In the event, the two freelancers agreed in 88% of cases, and I only had to check the remaining 12%.
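The side-by-side comparison above can be sketched in a few lines of Python. This is only an illustration of the idea, not the process I actually ran: the function name, the blog names and the dates are all made up for the example.

```python
def compare_results(first, second):
    """Split blogs into agreed answers and blogs that need a manual check.

    `first` and `second` map a blog name to the first-post date reported
    by each researcher (hypothetical data structure for this sketch).
    """
    accepted = {}
    to_check = []
    for blog, answer in first.items():
        if blog in second and second[blog] == answer:
            # Both researchers agree: accept without further checking.
            accepted[blog] = answer
        else:
            # Disagreement (or a missing answer): check by hand.
            to_check.append(blog)
    return accepted, to_check


# Made-up answers from two researchers, for illustration only.
researcher_a = {"blog-one": "2007-03-14", "blog-two": "2008-09-01", "blog-three": "2009-01-20"}
researcher_b = {"blog-one": "2007-03-14", "blog-two": "2008-10-12", "blog-three": "2009-01-20"}

accepted, to_check = compare_results(researcher_a, researcher_b)
print(accepted)
print(to_check)
```

Note that the sketch shares the weakness of the process itself: an answer that both researchers get wrong in the same way lands in `accepted` and is never checked.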
An obvious problem: if both freelancers agree on an incorrect answer, I won’t check it. This happened in approximately 4% of cases during this test (I’m working with a known set of data here, or — of course — I wouldn’t be able to tell that.)
Another, less obvious problem: it’s fairly time-intensive. I have to post the project, wait for the bids to roll in, assess the bids and so on. The whole process took about 48 hours from start to finish.
Still, it beats doing the work myself, and it’s fairly scalable.
Method 2: Amazon Mechanical Turk

I’ve been noodling around with Amazon Mechanical Turk for a few months now. Mechanical Turk, for those of you who don’t know, is named after a late eighteenth century hoax that — while it purported to be a machine that could play chess — was in fact powered by a person sitting inside a box.
Amazon describes its Mechanical Turk service as follows:
Amazon Mechanical Turk is a marketplace for work that requires human intelligence. The Mechanical Turk service gives businesses access to a diverse, on-demand, scalable workforce and gives workers a selection of thousands of tasks to complete whenever it’s convenient.
So instead of paying a freelancer a fee to perform all the research, I can split the job into its constituent parts, which Amazon calls HITs (Human Intelligence Tasks), and pay a fractional piecework fee (say $0.10) for each blog on the list. This has three advantages:
- It’s much faster. Instead of one person working through the list in sequence, the list is processed in parallel;
- It’s cheaper. There seems to be a lower limit to the bids on Freelancer.com of around $30 per job; and
- It’s extremely scalable. Using Amazon’s API, I can embed humans into automated processes. More on this in later posts, I hope.
The only problem? In the words of one Turker I’ve interviewed, “There are few spammers in mturk who spoil the mturk community.” In other words, I have more QA problems. It seems that people are more inclined to be dishonest when smaller sums and greater anonymity are involved. I’m sure that a behavioural economist like Dan Ariely could explain this in detail — but for the moment, please let’s just accept that most people are more inclined to cheat you out of $0.10 than $100.
So here’s the first thing I tried.
- Run three Turkers in parallel
- Accept the earliest date received
Clearly there are problems with this method. But not as many problems as you might think. Results were approximately 80% accurate. And it took less than 2 hours from start to finish: that’s 46 hours faster than the e-lancer method. And remember it’s costing me less, although I’m paying for answers that the method deems to be inaccurate.
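For the curious, the “accept the earliest date” rule amounts to nothing more than taking a minimum. A minimal sketch, using made-up dates and a hypothetical function name:

```python
from datetime import date


def earliest_date(answers):
    """Return the earliest of the dates submitted for one blog."""
    return min(answers)


# Three made-up answers for a single blog, one per Turker.
turker_answers = [date(2008, 6, 1), date(2007, 3, 14), date(2009, 1, 1)]

print(earliest_date(turker_answers))
```

One of the problems is easy to see from the sketch: a single careless or dishonest answer that happens to be too early wins outright, whatever the other two Turkers say.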
But surely I could do better.
Method 3: Amazon Mechanical Turk and “elementary game theory”
To be honest, I probably know less about game theory than you do. I know about the Prisoner’s Dilemma and the Three White Hats (which probably isn’t even game theory, and only goes to show how little I know).
But I’m using “game theory” both very loosely and very specifically to mean “I know a little about how people try to game systems, and I’m going to do my damnedest to use that knowledge to hack their behaviour.”
So here’s what I told the Turkers:
Each HIT is being performed three times, and the results will be checked against each other:
- When all three HITs agree, a $0.10 bonus will be paid to all workers.
- When only two HITs agree, those two will be accepted as the correct answer. The third result will be rejected.
- If all three HITs disagree the requester may consider rejecting all three.
- Occasionally, it may be the case that it is very hard for you to find the correct date. We want to make it worth your while to find this. Please note this in the Additional Comments box together with the process you used to discover the date and we will consider paying a discretionary bonus of $0.50 on top of any other bonuses. For particularly hard HITs, therefore, the total potential upside is $0.80
Please, take your time, and get the right answer. It will be worth it for you, and worth it for the other workers performing this task!
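The rules in these instructions can be sketched as a small Python function. This is an illustration under assumptions rather than the code I ran: the function name, the worker IDs and the dates are invented, the discretionary $0.50 bonus is left out because it is a human judgment, and the three-way disagreement case is simply flagged for review.

```python
from collections import Counter

ALL_AGREE_BONUS = 0.10  # bonus per worker when all three answers match


def judge_hit(submissions):
    """Apply the majority rule to three (worker_id, answer) pairs."""
    counts = Counter(answer for _, answer in submissions)
    top_answer, top_count = counts.most_common(1)[0]

    if top_count == 1:
        # All three disagree: nothing is accepted automatically.
        return {"accepted": None, "approve": [], "reject": [],
                "review": [worker for worker, _ in submissions],
                "bonus_each": 0.0}

    approve = [worker for worker, answer in submissions if answer == top_answer]
    reject = [worker for worker, answer in submissions if answer != top_answer]
    # The bonus is only paid when nobody disagreed.
    bonus = ALL_AGREE_BONUS if not reject else 0.0
    return {"accepted": top_answer, "approve": approve, "reject": reject,
            "review": [], "bonus_each": bonus}


# Made-up submissions for one blog.
two_agree = judge_hit([("w1", "2007-03-14"), ("w2", "2007-03-14"), ("w3", "2008-01-01")])
all_agree = judge_hit([("w1", "2007-03-14"), ("w2", "2007-03-14"), ("w3", "2007-03-14")])
none_agree = judge_hit([("w1", "2007-03-14"), ("w2", "2007-05-02"), ("w3", "2008-01-01")])
print(two_agree)
```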
My hope was that I would discourage the spammers. Here’s the logic:
Spammers know that they won’t get paid unless at least 1 other person agrees with them. For this to happen, either:
- They’d have to be lucky enough to stumble across the right date by mistake (at odds of around 3650:1 for blogs created in the past decade); or
- They’d need to be accidentally matched by another spammer (same odds).
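The 3650 figure is simply the number of days in a decade, ignoring leap years. A back-of-the-envelope sketch, assuming (and it is only an assumption) that a spammer picks a date uniformly at random from the past ten years:

```python
DAYS_PER_YEAR = 365  # leap years ignored for this rough estimate
YEARS = 10           # "blogs created in the past decade"

possible_dates = DAYS_PER_YEAR * YEARS

# Chance that one uniformly random guess lands on one specific date.
p_match = 1 / possible_dates

print(possible_dates)
print(f"{p_match:.6f}")
```

Real spammers probably don’t guess uniformly (today’s date and round dates seem likelier), so treat the number as an order of magnitude rather than a measurement.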
Incidentally, it turns out this “majority rule” is unpopular with the Turkers; mostly because they fear the spammers don’t read the instructions, and will still queer the pitch (which seems like a legitimate and logical strategy for spammers).
Nevertheless, the task was completed in much the same time as with the previous method, and with startling results. The answers received were 97% accurate. That’s more accurate than the freelancers (just under 96%), at a lower cost, and in a fraction of the time.
Comparison of the three methods
In the table below, Cost per Response is adjusted for accuracy: it is the total cost divided by the number of correct responses.
| Method | Total cost | Correct responses (accuracy) | Cost per correct response |
|---|---|---|---|
| 1 (e-lancers) | $61.00 | 66 (95.7%) | $0.92 |
| 2 (mturk simple) | $51.15 | 55 (79.7%) | $0.93 |
| 3 (mturk + “game theory”) | $54.92 | 67 (97.1%) | $0.82 |
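The last column can be reproduced from the other two. A short sketch using the figures from the table (the variable and function names are mine):

```python
# (total cost in dollars, number of correct responses), taken from the table.
results = {
    "1 (e-lancers)": (61.00, 66),
    "2 (mturk simple)": (51.15, 55),
    "3 (mturk + game theory)": (54.92, 67),
}


def cost_per_correct(total_cost, correct):
    """Total cost divided by the number of correct responses, to the cent."""
    return round(total_cost / correct, 2)


cost_table = {
    method: cost_per_correct(cost, correct)
    for method, (cost, correct) in results.items()
}

for method, cost in cost_table.items():
    print(f"{method}: ${cost:.2f}")
```

Dividing by correct responses rather than by all responses is what penalises Method 2: it has the lowest total cost but the highest cost per usable answer.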
Just thinking about how you could automate this… You could use the date Google first saw the page. This wouldn’t be perfect, but would be a reasonable estimate in many cases.
So, do searches like this one: http://www.google.co.uk/search?hl=en&safe=off&tbo=p&tbs=qdr:y15&q=inurl:rock-star-pr.com&start=0&sa=N&filter=0
Scrape all the dates and pick the earliest. Works in the above case; the first post was on 8th May 2008.
Does pick up the wrong post though, but only because Google doesn’t seem to have http://rock-star-pr.com/my-first-ever-post/ in its index.
Awesome point, this is a really smart strategy, thank you for sharing.