Tag Archives: experiman

Alternate Backends for PLASMA Crowdsourcing Tools

Although AutoMan and SurveyMan were both designed to make their backends pluggable, we have yet to implement an alternate backend for either, because there simply aren't any real AMT competitors out there. There are plenty of crowdsourcing websites, but none are as programmable as AMT and few are as general. That is to say, all competitors appear to offer specialized labor markets, to be designed for specialized work, or both.

A known problem with the labor market on Amazon is that, even if you pay your workers minimum wage based on time actually spent on a task, they spend a significant amount of time searching for tasks. There are websites set up to facilitate this process, but it's still time spent searching for work instead of actually working. A major family of alternate approaches involves eliciting work either voluntarily or in contexts where non-monetary compensation makes sense. Quizz uses Google's advertising system to embed knowledge-mining quizzes alongside its usual ads. Other approaches substitute consumer marketing tasks or questions for paywalls. In both cases, users are motivated by something other than payment.

I've been wondering for a while whether thefacebook would be a good platform for our software. The general understanding is that AMT respondents are anonymous, but we know this is not true. Researchers have also assumed that workers are independent, yet recent work out of MSR has found that some Indian workers actually collaborate on tasks. For these reasons, I think Facebook would be a perfectly reasonable alternate platform for crowdsourcing. In fact, I believe Facebook is a better platform for crowdsourcing, since it overcomes one of the major shortcomings of AMT: people are already hanging out there*. Rather than appeal to a populace that is explicitly looking for work, sometimes as a primary source of income, we would like to instead use Facebook to tap into people's excess capacity**.

Since Facebook doesn't currently have a crowdsourcing interface, could we mock up a substitute using what's already available? Amazon currently handles listing, pool management, payment, and presentation, and offers a sandboxed version of AMT for testing. A minimal implementation would involve hosting our own *Man servers and just using Facebook advertising to recruit workers. However, this diverts users away from the Facebook ecosystem, which defeats the purpose of using Facebook in the first place (for example, we could just as easily use Google AdWords instead).

To keep users in the ecosystem, we could write a SurveyMan app***. I looked into this briefly, and while it isn't as integrated into the main Facebook experience as I'd want, it's closer than routing users to an outside website. We could use Facebook advertising to do the initial recruitment and then use wall updates to bootstrap that process. If Facebook advertising let us target ads at particular demographics, we would have an easier time managing bias in our sample.

* Admittedly, I am not a regular user of thefacebook. I've heard the "so-and-so spends their whole day on Facebook" complaint, but I really don't know how common this is. Consequently, this post is predicated on the idea that thefacebook is a place where people spend a lot of time not doing work. I have heard that this is less the case since mobile became ubiquitous.

** TBH, I think the cult of excess capacity is unethical, but for the sake of capitalism and this blog post, let's assume it isn't. I will save a discussion of ethics and excess capacity for later.

*** Word on the street is that no one actually uses these things anyway...


There’s this website I came across while investigating what other people are doing for online experiments. It’s called SocialSci.com. They have a platform for writing experimental surveys. They also boast of a well-curated participant pool. Here’s the claim on their front page:

We take a three-tiered approach to our participant pool. We first authenticate users to make sure they are human and not creating multiple accounts. We then send them through our vetting process, which ensures that our participants are honest by tracking every demographic question they answer across studies. If a participant claims to be 18-years-old one week and 55-years-old the next, our platform will notify you and deliver another quality participant free-of-charge. Finally, we compensate participants via a secure online transaction where personally identifiable information is never revealed.

Ummmmm, okay. That’s not game-able at all.
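The vetting process they describe amounts to a cross-study consistency check on self-reported demographics. A minimal sketch of that idea, with hypothetical data and a made-up tolerance:

```python
from collections import defaultdict

# Hypothetical cross-study response log: (participant, study, question, answer)
responses = [
    ("p1", "studyA", "age", 18),
    ("p1", "studyB", "age", 55),
    ("p2", "studyA", "age", 30),
    ("p2", "studyB", "age", 30),
]

def flag_inconsistent(responses, question="age", tolerance=1):
    """Flag participants whose answers to a numeric demographic
    question drift more than `tolerance` across studies."""
    by_participant = defaultdict(list)
    for pid, study, q, answer in responses:
        if q == question:
            by_participant[pid].append(answer)
    return {pid for pid, answers in by_participant.items()
            if max(answers) - min(answers) > tolerance}

print(flag_inconsistent(responses))  # the 18-then-55 participant gets flagged
```

Of course, a participant who lies *consistently* sails right through a check like this, which is why it's game-able.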

If you were thinking about just using their survey design tools and instead posting your instrument on AMT, be warned! They point out that AMT is full of scammers! I'm not sure this is still true; my own research has led me to believe that traditional survey issues such as fatigue and inattention are a bigger threat to validity. In any case, I looked up quotes for using their participant pool. The costs for 150 respondents (assured to be valid via some common-sense heuristics and their more complete knowledge of the participant pool), with no restrictions on the demographics (e.g., they can include non-native English speakers), are:

[Table of quotes: my estimated time (in minutes) vs. price (in USD)]

Even with bad actors, I’m pretty sure AMT is cheaper. The results were the same when I submitted a request for country of origin==USA. There were also options for UK and Australia, but these are not yet available (buttons disabled). I’ll leave any analysis of the pay to Sara.

Their filters include Country, Language, Age, Sex, Gender, Sexual Orientation, Relationship Status, Ethnicity, Income, Employment Status, Education, Occupation, Lifestyle, and Ailments. What I find potentially useful for survey writers here is the opportunity to target low-frequency groups. There's our now-infamous NPR story on how teens respond mischievously to survey questions. Responses from these teens contain a high number of low-frequency answers, which amplifies spurious correlations when researchers analyze low-frequency populations. A service like SocialSci could provide useful, curated pools that rely less on one-time self-reporting. However, it doesn't look like they're there yet, since these are the only options available (options shown but currently disabled are listed in brackets):

Country: USA, [UK], [Australia]
Language: English, [Spanish]
Age: 13-17, 18-40, 41-59, [51+], [60+]
Sex: Male, Female, [Transgender], [Intersex]
Gender: Cis Male, Cis Female, [Trans Male], [Trans Female]
Sexual Orientation: Heterosexual, Homosexual, BiSexual (sic), Other
Relationship Status: Married, Single, Co-Habitating, Dating
Ethnicity: Caucasian, [Asian], [Hispanic], [Black], [Native American], [Multiracial]
Income: less than 25K, 25K-50K, 50K-75K, 75K-100K, [100K-125K], [125K or more]
Employment Status: Full Time, Unemployed, [Part Time], [Temporary Employee Or Independent Contractor]
Education: Some College, Associate’s Degree, High School Diploma, Bachelors Degree, Some High School, [Masters Degree], [Doctoral Degree]
Occupation: Student, Professional, Technical, Teacher, Sales, Corporate
Lifestyle: Smoke, Used to Smoke, Have Children, Have Cell Phone
Ailments: Chronic Pain, Addiction (Smoking)

What do you think of these demographic categories? Please leave comments!
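The mischievous-responder amplification effect mentioned above is easy to demonstrate with a toy simulation (all rates here are invented for illustration): two genuinely rare, independent traits appear strongly associated once a small fraction of answer-yes-to-everything responders enters the pool.

```python
import random

random.seed(0)
N = 10000
honest_rate = 0.02    # true, independent rate of each rare trait
mischief_rate = 0.01  # fraction who claim every rare trait

rows = []
for _ in range(N):
    if random.random() < mischief_rate:
        rows.append((1, 1))  # mischievous: "yes" to both traits
    else:
        rows.append((int(random.random() < honest_rate),
                     int(random.random() < honest_rate)))

# Among respondents reporting rare trait A, what fraction also report rare trait B?
a_yes = [b for a, b in rows if a == 1]
overlap = sum(a_yes) / len(a_yes)
print(overlap)  # far above the honest 2% baseline
```

With these numbers, roughly a third of trait-A reporters are mischievous, so the apparent conditional rate of trait B is inflated by an order of magnitude over the true 2%.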

I want to note that the front page advertises "a global pool." In keeping with the spirit of the times, I've included a screenshot (a trick I've learned from people shaming bigoted celebrities who express themselves a little too freely on Twitter!).

[Screenshot: SocialSci front page, 2014-07-31]

Now, it’s quite possible that they have at least one person from every demographic listed, but not 150. Until they can compete with AMT in size (where 150 respondents ain’t nothin), I think they should be careful about what they advertise to researchers.

It's not clear to me how SocialSci recruits their participants. Here's the page. I had never heard of them until I went looking for resources used in other experiments (via this page Emery recommended). I just did a search of craigslist to see if there's anything there. No dice. $10 says it's just their team plus friends:


Snark aside, I'd like to see what a well-curated pool looks like. I'm not sure these folks are up to the task, but just in case, I did them a solid and posted their signup page on craigslist. I'm not holding out hope that it will be as effective as the Colbert bump, but a girl can dream.

Reproducibility and Privacy

What would it take to have an open database for various scientific experiments? An increasing number of researchers are posting data online, and many are willing (and sometimes required) to share their data if you ask for it. This is fine for a single experiment, but what if you'd like to reuse data from two different studies?

There is a core group of AMT respondents who are very active. Sometimes AMT respondents contact requesters, at which point they are no longer anonymous. My colleague, Dan Barowy, received an email from a respondent thanking him for the quality of the HIT. I asked him the respondent's name, and as it turned out, they had contacted me when I was running my experiments as well.

So we have the general case of trying to pair similar pieces of data into a single unit (i.e., a person), and the specific case of AMT workers who are definitely the same people (they have unique identifiers). How can we combine this information in a way that's meaningful? In the case of the AMT workers, we will need to obfuscate some information for the sake of privacy. For other sources of data, could we take specific data, infer something about the population, and build a statistical "profile" of that population to use as input to another test? Clearly we can use standard techniques to learn summary information about a population, but could we take pieces of data, unify them into a single entity, and say that with high probability these measurements are within some epsilon of a "true" respondent? How would we use the uncertainty inherent in this unification to ensure privacy?
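For the AMT case specifically, one standard way to obfuscate worker identifiers while keeping records joinable is a keyed hash: the same worker maps to the same pseudonym in every released dataset, but the mapping can't be inverted without the key. A sketch, where the key and worker ids are placeholders:

```python
import hashlib
import hmac

SECRET_KEY = b"held-by-the-data-curator"  # placeholder; never published

def pseudonymize(worker_id: str) -> str:
    """Stable, non-invertible pseudonym for an AMT worker id."""
    return hmac.new(SECRET_KEY, worker_id.encode(), hashlib.sha256).hexdigest()[:16]

# Two hypothetical released datasets keyed by worker id.
study1 = {"AWORKER1": {"age": 30}, "AWORKER2": {"age": 45}}
study2 = {"AWORKER1": {"income": "25K-50K"}}

merged = {}
for study in (study1, study2):
    for wid, fields in study.items():
        merged.setdefault(pseudonymize(wid), {}).update(fields)
# merged now links AWORKER1's records across studies without exposing the raw id.
```

This handles the easy half of the problem (linking records for workers we *know* are the same person); the probabilistic pairing of similar-but-unidentified records is the harder, open part.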

Is it possible to unify data in such a way that an experimenter could execute a query asking for samples of some observation, and get a statistically valid Frankenstein version of that sample? I’m sure there’s literature out there on this. Might be worth checking into…
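The crudest version of such a Frankenstein sample just draws each attribute independently from its observed marginal. A sketch with made-up data:

```python
import random

random.seed(1)

# Marginals observed in two hypothetical, separate studies.
study_ages = [18, 22, 25, 30, 41, 55, 60]
study_incomes = ["<25K", "25K-50K", "50K-75K"]

def frankenstein_sample(n):
    """Synthetic respondents drawn attribute-by-attribute from the
    observed marginals. Each marginal distribution is matched, but any
    real age/income correlation is destroyed."""
    return [{"age": random.choice(study_ages),
             "income": random.choice(study_incomes)}
            for _ in range(n)]

sample = frankenstein_sample(150)
```

The limitation is the point: preserving cross-attribute structure while still guaranteeing that no synthetic respondent leaks a real one is exactly the unification-with-privacy question above.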