As discussed in yesterday’s post, the SurveyMan language allows sophisticated survey and experimental design. Since we’re about to launch the full version of Sara Kingsley’s wage survey, I thought I’d step through the process of designing the branching and blocking components.
The survey in question is completely flat (i.e. no blocking, no branching) and contains a variety of ordered and unordered radio (exclusive) and checkbox (not exclusive) questions. We would like to be able to run a version that is completely randomized and a version that’s totally static and based on conventional survey methodology. Ideally we’d like to be able to run both surveys under the same set of conditions.
We could run both versions of the survey at the same time on AMT. However, we’d run into trouble with people who try to take it twice. To solve this problem, we could run one version, issue “reverse qualifications” à la AutoMan to the people who answered the first one, and then run the second version. This would entail some extra coding on the side: supporting longitudinal studies and repeated independent experiments requires keeping a database of past participants and flagging them as uniquely qualified or disqualified for a survey, and SurveyMan does not currently support this. What we’d really like to do, though, is run the two surveys concurrently, in the same HIT, so that each respondent is randomly routed to exactly one of the two versions.
Fortunately for us, we can implement both versions of the survey as a single survey using the underlying language.
Let’s start by looking at what the two versions would look like if we were to run them as separate surveys:
The survey on the left groups all of the questions into one block. The survey on the right places each question into its own block. To make the survey on the right fully static, we would also need to set the RANDOMIZE column to false.*
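To make the two layouts concrete, here is a minimal Python sketch. It is a hypothetical, simplified model, not SurveyMan’s actual CSV format: the `Row` type and the helper names are made up for illustration, dotted sub-block ids such as `2.1` are assumed, and the `randomize` flag merely stands in for the RANDOMIZE column.

```python
from dataclasses import dataclass

@dataclass
class Row:
    """Hypothetical stand-in for one survey row (not SurveyMan's real schema)."""
    question: str
    block: str
    randomize: bool  # stands in for the RANDOMIZE column

def randomized_version(questions):
    """Put every question into a single block, so question order can be shuffled."""
    return [Row(q, "1", True) for q in questions]

def static_version(questions, top="2"):
    """Give each question its own sub-block (2.1, 2.2, ...) to fix the order,
    and turn RANDOMIZE off so nothing else is shuffled either."""
    return [Row(q, f"{top}.{i}", False) for i, q in enumerate(questions, start=1)]

qs = ["q1", "q2", "q3"]
print([r.block for r in randomized_version(qs)])  # ['1', '1', '1']
print([r.block for r in static_version(qs)])      # ['2.1', '2.2', '2.3']
```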
As we’ll see in my future blog post on the runtime system, the evaluation strategy forces all paths in the survey to join. If there isn’t a semantically meaningful way to join, this can be done trivially by creating a final block that contains an instructional “question” (e.g. “Thank you for taking the time to complete our survey”) and a submit button.
The three orange blocks are top-level: let the fully randomized version be Block 1, the fully static version be Block 2, and the joining “question” Block 3. If we were to simply annotate them according to this scheme, every respondent would end up taking both versions of the survey. We now need some way of ensuring that the two versions of the survey represent parallel paths through the survey, such that the expected number of respondents on each path is 50% of the sample size.
Let’s work backwards from Block 3. Assume that the respondent is already on one of the parallel paths. We can ensure that Block 3 follows immediately after both Block 1 and Block 2 by using the BRANCH column: we add a branch to Block 3 for every option in the last question of the fully static survey, and do the same for that question’s equivalent in the fully randomized survey (in our current, flat survey, these two questions are not yet separated into different blocks). Clearly, respondents on the static path will branch to Block 3 after answering the last question of the fully static version. On the randomized path, the branching question may appear anywhere in the shuffled block; however, because we require that the respondent answer every question in a block before a branch is evaluated, the respondent will still answer every question in the randomized version before moving on to Block 3.
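The effect of these branches can be illustrated with a small traversal model. This is a simplified sketch under stated assumptions, not SurveyMan’s runtime: each top-level block is treated as a single unit (the static version’s sub-blocks are collapsed into Block 2), and `branch_to` is a hypothetical map recording where the final question of a block branches, with every option pointing to the same target.

```python
def traverse(start, order, branch_to):
    """Return the sequence of top-level blocks a respondent visits.

    order: top-level block ids in their default sequence, e.g. ["1", "2", "3"].
    branch_to: hypothetical map from a block id to the block that every option
               of its final question branches to. The branch is only taken
               after the whole block has been answered.
    """
    path = []
    current = start
    while current is not None:
        path.append(current)
        if current in branch_to:
            current = branch_to[current]  # explicit BRANCH jump
        else:
            i = order.index(current)
            # No branch: fall through to the next block, or stop at the end.
            current = order[i + 1] if i + 1 < len(order) else None
    return path

order = ["1", "2", "3"]
print(traverse("1", order, {}))                    # ['1', '2', '3']: both versions
print(traverse("1", order, {"1": "3", "2": "3"}))  # ['1', '3']
print(traverse("2", order, {"1": "3", "2": "3"}))  # ['2', '3']
```

Without branches, a respondent starting at Block 1 falls through into Block 2; with both versions branching to Block 3, each path skips the other version and joins at Block 3.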
Finally, we need to be able to randomly assign respondents to each of the paths. At first, I thought this would be a cute example of randomized blocks, where we could change Block 1 to Block _1. However, this would not give us an even split in the sample population. There are three possible orderings of the blocks: [_1 2 3], [2 _1 3], [2 3 _1]. Two-thirds of the population would then see Block 2 first.**
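The arithmetic behind that uneven split can be checked by enumerating the slots a single floating block can occupy among the fixed ones. This sketch assumes each slot is equally likely; `floating_orderings` is a hypothetical helper, not part of SurveyMan.

```python
from fractions import Fraction

def floating_orderings(fixed, floating):
    """All orderings obtained by inserting one floating block into each slot
    among the fixed blocks (before, between, and after them)."""
    return [fixed[:i] + [floating] + fixed[i:] for i in range(len(fixed) + 1)]

orders = floating_orderings(["2", "3"], "_1")
print(orders)  # [['_1', '2', '3'], ['2', '_1', '3'], ['2', '3', '_1']]

# Fraction of equally likely orderings in which Block 2 comes first.
share = Fraction(sum(o[0] == "2" for o in orders), len(orders))
print(share)  # 2/3
```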
We could instead overwrite custom.js to randomly assign each respondent to the first question of the appropriate block.
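The assignment logic itself is just a fair coin flip between the two starting blocks. The sketch below models that logic in Python rather than showing actual custom.js code; `assign_path` is a hypothetical name, and nothing here reflects SurveyMan’s real JavaScript hooks. It simulates many respondents to show that each path receives roughly half of them.

```python
import random

def assign_path(rng):
    """Pick a respondent's starting block: '1' (randomized) or '2' (static),
    each with probability 1/2."""
    return rng.choice(["1", "2"])

rng = random.Random(0)  # seeded so the simulation is reproducible
n = 10_000
counts = {"1": 0, "2": 0}
for _ in range(n):
    counts[assign_path(rng)] += 1

print({block: c / n for block, c in counts.items()})  # each share close to 0.5
```

Unlike the floating-block approach, a direct coin flip makes the 50/50 expectation hold by construction, independent of how many other top-level blocks the survey has.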
* Looks like I finally found a use case for that RANDOMIZE column: the experimental control! Maybe I shouldn’t deprecate it after all…
** As an aside, I want to point out that this particular example illustrates a case for Presley’s suggestion that we interpret the available indices for randomization as the indices of the blocks marked as randomizable. Here we would have marked both Block 1 and Block 2 as randomizable. I’m still not sure whether this would be overly constrained, and I will discuss it further in a future blog post.
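To see why that interpretation restores the even split, here is a small sketch that permutes randomizable blocks only among the positions they already occupy, while leaving all other blocks in place. It reflects my reading of the suggestion; `presley_orderings` is a hypothetical helper, not an existing SurveyMan function.

```python
from fractions import Fraction
from itertools import permutations

def presley_orderings(blocks, randomizable):
    """Orderings in which randomizable blocks are permuted only among the
    positions they already occupy; all other blocks stay where they are."""
    slots = [i for i, b in enumerate(blocks) if b in randomizable]
    result = []
    for perm in permutations([blocks[i] for i in slots]):
        order = list(blocks)
        for slot, b in zip(slots, perm):
            order[slot] = b
        result.append(order)
    return result

orders = presley_orderings(["1", "2", "3"], {"1", "2"})
print(orders)  # [['1', '2', '3'], ['2', '1', '3']]
print(Fraction(sum(o[0] == "2" for o in orders), len(orders)))  # 1/2
```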