Ordered | Unordered | Ordered-Unordered | ||
---|---|---|---|---|

Correlation | Spearman's rho | Cramer's V | Cramer's V | |

Question Order Bias | Mann-Whitney U-Test | Chi-Squared Test | N/A | |

Question Wording Bias | Mann-Whitney U-Test | Chi-Squared Test | N/A | |

Breakoff | Nonparametric Bootstrap | Nonparametric Bootstrap | N/A | |

Bad Actors | Nonparametric Bootstrap on Empirical Entropy | Nonparametric Bootstrap on Empirical Entropy | N/A |

### Correlation

Recall that correlation is typically measured as the degree to which two variables covary. We are generally interested in correlation as a measure of predictive power. Correlation coefficients give the magnitude of a monotone relationship between two variables. Completely random responses for two questions will result in a low correlation coefficient.

We use two measures of correlation. For ordered questions, we use Spearman’s $$\rho$$; this coefficient ranks responses and measures to degree to which the ranked results have a monotone relationship. For unordered questions, we use Cramer’s $$V$$. The procedure for Cramer’s $$V$$ is based on the $$\chi^2$$ statistic; we take one question and compute the empirical probability for each answer. We then use this estimator to compute the expected values for each of the values in the other question and find the normalized differences between observed and expected values. The sum of these values is the $$\chi^2$$ statistic, which has a known distribution. Cramer’s $$V$$ scales the value of the $$\chi^2$$ statistic by the sample size and the minimum degrees of freedom.

Both of these tests are sensitive to small counts for some categories. We generally do not collect sufficient information to produce meaningful confidence intervals; as a result, we simply flag correlations that may be of interest.

### Order Bias

We use the $$\chi^2$$ statistic directly to compute order bias for unordered questions. For any question pair $$q_i, q_j, i\neq j$$, we partition the sample into two sets : $$S_{i<j}$$, the set of questions where $$q_i$$ precedes $$q_j$$ and $$S_{j<i}$$, the set of questions where $$q_i$$ follows $$q_j$$. We assume each set is independent.* We show wolog how to test for bias in $$q_i$$ when $$q_j$$ precedes it:

- Compute frequencies $$f_{i<j}$$ for the answer options of $$q_i$$ in the set of responses $$S_{i<j}$$. We will use these values to compute the estimator.
- Compute frequencies $$f_{j<i}$$ for answer options $$q_i$$ in the set of responses $$S_{j<i}$$. These will be our observations.
- Compute the $$\chi^2$$ statistic on the data set. The degrees of freedom will be one less than the number of answer options, squared. If the probability of computing such a number is less than the value at the $$\chi^2$$ distribution with these parameters, there is a significant difference in the ordering

We compute these values for every unique question pair.

### Wording

Wording bias classified in the same was as order bias, except in this case, instead of comparing two sets of responses, we compare $$k$$ sets of responses, corresponding to the number of variants we are interested in.

### Breakoff

We address two kinds of breakoff : ones determined by position, and ones determined by question. For both analyses, we use the nonparametric bootstrap procedure to determine a one-sided 95% confidence intervals and flag indices and questions whose counts exceed the threshhold.

Breakoff by position is often an indicator that the survey is too long. Breakoff by question may indicate that a question is unclear, offensive, or burdensome to the respondent. There are also some cases where breakoff may indicate a kind of order bias.

### Adversaries

We’ve tried a variety of methods for detecting adversaries. The bests empirical results we’ve seen so far have been for a method that uses entropy.

We first compute the empirical probabilities for each question’s answer options. Then for every response $$r$$, we calculate a score styled after entropy : $$score_{r} = \sum_{i=1}^n p(o_{r,q_i}) * \log_2(p(o_{r,q_i}))$$. We then use the bootstrap method to find a one-sided 95% confidence interval and flag any responses that are above the threshold.

*Independence is based on the assumption that each worker is truly unique and that workers do not collude.