Monthly Archives: December 2014

When I first started at UMass, I had effectively no background in statistics or probability. So, when I was taking the first course in the graduate stats sequence, I tried to frame what I was learning in terms of things I already understood. When I saw the conditional probability $$\mathbb{P}(Y\;\vert\; X)$$, I couldn’t help but think:

$$\begin{array}{|l} X \\ \hline \vdots \\ Y \\ \hline \end{array}\\ X \rightarrow Y$$

Assumption seems to be a close analogy of observation, and if we analyze each construct operationally, they both have a strict order (i.e., observe/assume $$X$$, then derive/calculate the probability of $$Y$$). Both hold $$X$$ fixed in some way for part of the calculation. Suppose we then say that $$X$$ implies $$Y$$ with some probability $$p$$. If we denote this as $$X \overset{p}{\rightarrow} Y$$, then we have some equivalence relation where $$X \overset{p}{\rightarrow} Y \equiv \mathbb{P}(X\rightarrow Y) = p \equiv \mathbb{P}(Y\;\vert\;X) = p$$.

Since $$X \overset{p}{\rightarrow} Y$$ is just normal logical implication, with a probability attached, we should be able to use the usual rewrite rules and identities (after all, what’s the point of modeling this as a logic if we don’t get our usual identities, axioms, and theorems for free?). In classical logic, implication is short for a particular instance of disjunction: $$X \rightarrow Y \hookrightarrow \neg X \vee Y$$. We can then rewrite our probabilistic implication as $$\neg X \overset{p}{\vee} Y$$ and say $$\mathbb{P}(\neg X \vee Y) = p \equiv \mathbb{P}(\neg X \cup Y) = p$$.

Similarly, we want to have the usual rules of probability at our disposal, so by the definition of conditional probabilities, $$\mathbb{P}(Y\;\vert\; X) = \frac{\mathbb{P}(Y\;\cap\;X)}{\mathbb{P}(X)}$$. We can apply the above rewrite rule for implication to say $$\mathbb{P}(\neg X \cup Y) = p \equiv \frac{\mathbb{P}(Y\;\cap\;X)}{\mathbb{P}(X)} = p$$. This statement must be true for all events/propositions $$X$$ and $$Y$$.

Let’s take a closer look at a subset of events: those where $$X$$ is independent of $$Y$$, denoted $$X \bot Y$$. Independence is defined by the property $$\mathbb{P}(Y\;\vert\; X)=\mathbb{P}(Y)$$. From this definition, we can also derive the identities $$\mathbb{P}(X\cap Y) = \mathbb{P}(X)\mathbb{P}(Y)$$ and, by inclusion-exclusion, $$\mathbb{P}(X\cup Y) = \mathbb{P}(X) + \mathbb{P}(Y) - \mathbb{P}(X)\mathbb{P}(Y)$$. Because $$\neg X$$ is independent of $$Y$$ whenever $$X$$ is, we can rewrite $$\mathbb{P}(\neg X \cup Y) = p \equiv \frac{\mathbb{P}(Y\;\cap\;X)}{\mathbb{P}(X)} = p$$ as $$\mathbb{P}(\neg X) + \mathbb{P}(Y) - \mathbb{P}(\neg X)\mathbb{P}(Y) = p \equiv \mathbb{P}(Y) = p$$. Since the relations on either side are equivalent, we can then substitute the right into the left and obtain $$\mathbb{P}(\neg X)(1 - p) = 0$$, so for $$p < 1$$ we have $$\mathbb{P}(\neg X) = 0 \equiv \mathbb{P}(Y) = p$$. Although this looks a little weird, it’s still consistent with our rules: we’re just saying that when the events are independent (a notion that has no correspondence in our logical framework), the probability of the implication (i.e., the conditional probability) is wholly determined by $$Y$$. If $$X$$ happens (which it will, almost surely) then $$Y$$’s marginal is $$p$$. If $$X$$ never happens (which it won’t), then $$Y$$ is 0, and the probability of the whole implication is 0.
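As a sanity check on the independent case, here is a minimal sketch that computes both sides directly. The function name and the example probabilities are my own, chosen for illustration, and the union is computed with inclusion-exclusion under the assumption that $$\neg X$$ and $$Y$$ are independent.

```python
from fractions import Fraction

def implication_vs_conditional(p_x, p_y):
    """Return (P(not X or Y), P(Y | X)) for independent events X and Y."""
    p_not_x = 1 - p_x
    # Independence of X and Y carries over to not-X and Y,
    # so P(not X and Y) = P(not X) * P(Y).
    p_union = p_not_x + p_y - p_not_x * p_y  # inclusion-exclusion
    p_cond = p_y  # P(Y | X) = P(Y) under independence (needs P(X) > 0)
    return p_union, p_cond

# X happens almost surely: the two sides agree.
print(implication_vs_conditional(Fraction(1), Fraction(2, 5)))
# X is uncertain: the two sides no longer agree.
print(implication_vs_conditional(Fraction(1, 2), Fraction(2, 5)))
```

With $$\mathbb{P}(X) = 1$$ both values come out to $$2/5$$; with $$\mathbb{P}(X) = 1/2$$ the union is $$7/10$$ while the conditional stays at $$2/5$$.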

Now let’s consider how this works over events that are not independent. For this example, let’s gin up some numbers:

$$\mathbb{P}(X) = 0.1 \quad\quad \mathbb{P}(Y) = 0.4 \quad\quad \mathbb{P}(X \cap Y) = 0.09$$.

Note that $$X\not\bot\; Y$$ because $$\mathbb{P}(X\cap Y) = 0.09 \neq \mathbb{P}(X)\mathbb{P}(Y) = 0.04$$. Recall that because both $$X$$ and $$Y$$ are supersets of $$X\cap Y$$, neither marginal can have a lower probability than the intersection.

Now let’s compute values for either side of the equivalence $$\mathbb{P}(\neg X \cup Y) = p \equiv \mathbb{P}(Y\;\vert\; X) = p$$. First, the conditional probability:

$$\mathbb{P}(Y\;|\; X) = \frac{\mathbb{P}(Y\cap X)}{\mathbb{P}(X)} = \frac{0.09}{0.1} = 0.9 = p$$

Now for the left side of the equivalence, recall the inclusion-exclusion rule for the probability of a union:
$$\mathbb{P}(\neg X \cup Y) = \mathbb{P}(\neg X) + \mathbb{P}(Y) - \mathbb{P}(\neg X \cap Y)$$.

Since we don’t have $$\mathbb{P}(\neg X \cap Y)$$ on hand, we will need to invoke the law of total probability to compute it: $$\mathbb{P}(\neg X \cap Y) = \mathbb{P}(Y) - \mathbb{P}(X\cap Y) = 0.4 - 0.09 = 0.31$$.

We can now substitute values in:
$$\mathbb{P}(\neg X \cup Y) = 0.9 + 0.4 - 0.31 = 0.99 = p$$.

Now our equivalence looks like this:
$$\mathbb{P}(\neg X \cup Y) = 0.99 \equiv \mathbb{P}(Y\;\vert\; X) = 0.9$$,
which isn’t really much of an equivalence at all.
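The arithmetic above can be double-checked by brute force. The sketch below is my own illustration: it builds the four-outcome sample space generated by $$X$$ and $$Y$$ from the three given numbers and sums atoms directly, using exact fractions to avoid floating-point noise. The names `atoms` and `prob` are made up for this example.

```python
from fractions import Fraction

# Numbers from the example above.
p_x = Fraction(1, 10)
p_y = Fraction(4, 10)
p_x_and_y = Fraction(9, 100)

# The four atoms of the sample space generated by X and Y,
# keyed by (X occurs, Y occurs).
atoms = {
    (True, True): p_x_and_y,
    (True, False): p_x - p_x_and_y,
    (False, True): p_y - p_x_and_y,
    (False, False): 1 - p_x - p_y + p_x_and_y,
}
assert sum(atoms.values()) == 1

def prob(event):
    """Probability of the set of atoms (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in atoms.items() if event(x, y))

p_implication = prob(lambda x, y: (not x) or y)
p_conditional = prob(lambda x, y: x and y) / prob(lambda x, y: x)

print(float(p_implication))  # 0.99
print(float(p_conditional))  # 0.9
```

The only atom excluded from the implication is the one where $$X$$ occurs and $$Y$$ does not, which has probability $$0.01$$.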

So what went wrong? Clearly things are different when our random variables are independent. Throughout the above reasoning, we assumed there was a correspondence between propositions and sets. This correspondence is flawed. Logical propositions are atomic, but sets are not. The intersection of non-independent sets illustrates this. We could have identified the source of this problem earlier, had we properly defined the support of the random variables. Instead, we proceeded with an ill-defined notion that propositions and sets are equivalent in some way.
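The size of the mismatch can also be made explicit. This is a side calculation rather than part of the argument above, and it assumes $$\mathbb{P}(X) > 0$$ so that the conditional probability is defined. Since the complement of $$\neg X \cup Y$$ is $$X \cap \neg Y$$:

$$\mathbb{P}(\neg X \cup Y) = 1 - \mathbb{P}(X \cap \neg Y) = 1 - \mathbb{P}(X)\,\big(1 - \mathbb{P}(Y\;\vert\; X)\big)$$

Plugging in the numbers from the example gives $$1 - 0.1 \times (1 - 0.9) = 0.99$$. The two quantities coincide only when $$\mathbb{P}(X) = 1$$ or $$\mathbb{P}(Y\;\vert\; X) = 1$$.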

Alternate Backends for PLASMA Crowdsourcing Tools


Although AutoMan and SurveyMan were both designed with pluggable backends, we have yet to implement an alternate backend for either because there simply aren’t any AMT competitors out there. There are plenty of crowdsourcing websites, but none are as programmable as AMT and few are as general. That is to say, all competitors appear to offer specialized labor markets and/or be designed for specialized work.

A known problem with the labor market on Amazon is that, even if you pay your workers minimum wage based on time actually spent on a task, they spend a significant amount of time searching for tasks. There are websites set up to facilitate this process, but it’s still time spent searching for work, instead of actually working. A major subfield of alternate approaches involves extracting work either voluntarily, or in contexts where non-monetary compensation makes sense. Quizz uses Google’s advertising system to embed knowledge-mining quizzes in with its usual ads. Other approaches substitute consumer marketing tasks or questions for paywalls. In both cases, users are motivated by something other than payment.

I’ve been wondering for a while whether thefacebook would be a good platform for our software. The general understanding is that respondents are anonymized, but we know this is not true. Researchers have also assumed that workers are independent, yet recent work out of MSR has found that some Indian workers actually collaborate on tasks. For these reasons, I think Facebook would be a perfectly reasonable alternate platform for crowdsourcing. In fact, I believe that Facebook is a better platform for crowdsourcing, since it overcomes one of the major shortcomings of AMT: people are already hanging out there*. Rather than appeal to a populace that is explicitly looking for work, sometimes as a primary source of income, we would like to instead use Facebook to tap into people’s excess capacity**.

Since Facebook doesn’t currently have a crowdsourcing interface, could we mock up a substitute using what’s already available? Amazon currently handles listing, pool management, payment, presentation, and offers a sandboxed version of AMT for testing. A minimal implementation would involve hosting our own *Man servers and just using Facebook advertising to recruit workers. However, this diverts the user away from the Facebook ecosystem, which defeats the purpose of using Facebook in the first place (for example, we could just as easily use Google AdWords instead).
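To make the "minimal implementation" concrete, here is a rough sketch of what a pluggable backend could look like. Everything in it is hypothetical: the class and method names are invented for illustration and are not taken from AutoMan or SurveyMan, it covers only listing and payment, and the ad-recruited variant just records what it would advertise instead of calling any real advertising API.

```python
from abc import ABC, abstractmethod


class CrowdBackend(ABC):
    """Hypothetical interface; names are not from AutoMan or SurveyMan."""

    @abstractmethod
    def list_task(self, title: str, url: str) -> str:
        """Advertise a task and return a backend-specific task id."""

    @abstractmethod
    def pay(self, worker_id: str, amount_cents: int) -> None:
        """Compensate a worker (could be a no-op on volunteer platforms)."""


class AdRecruitedBackend(CrowdBackend):
    """Minimal variant: a self-hosted task server plus an ad linking to it."""

    def __init__(self) -> None:
        self.ads: dict[str, str] = {}
        self.payments: list[tuple[str, int]] = []

    def list_task(self, title: str, url: str) -> str:
        task_id = f"task-{len(self.ads) + 1}"
        # A real implementation would call an ad platform here;
        # this sketch only records what it would have advertised.
        self.ads[task_id] = f"{title} -> {url}"
        return task_id

    def pay(self, worker_id: str, amount_cents: int) -> None:
        self.payments.append((worker_id, amount_cents))


backend = AdRecruitedBackend()
task_id = backend.list_task("Five-minute survey", "https://example.org/survey")
print(task_id, backend.ads[task_id])
```

Pool management, presentation, and a sandbox mode would each need their own methods, and those are the pieces AMT currently provides for free.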

To keep users in the ecosystem, we could write a SurveyMan app***. I looked into this briefly, and while it isn’t as integrated into the main Facebook experience as I’d want, it’s closer than routing users to an outside website. We could use Facebook advertising to do the initial recruitment and then use wall updates to bootstrap that process. If Facebook advertising provided a way to target ads to particular demographics, we would have a better time with bias in our sample.

* Admittedly, I am not a regular user of thefacebook. I've heard the "so and so spends their whole day on facebook" complaint, but I really don't know how common this is. Consequently, this post is predicated on the idea that thefacebook is a place where people spend a lot of time, not doing work. I have heard that this is less the case since mobile became ubiquitous.


** TBH, I think the cult of excess capacity is unethical, but for the sake of capitalism and this blog post, let's assume it isn't. I will save a discussion of ethics and excess capacity for later.

*** Word on the street is that no one actually uses these things anyway...

Emma's Research and Television Blog

I <3 Science and T.V.
