There are some old arguments against probabilistic models as models of language, but these no longer seem to have much force, especially now that we have models that can compute probabilities over the same representations that we use in generative linguistics. For background: Andries Coetzee and I have an overview of probabilistic models of phonology in our Handbook chapter; Mark Johnson has a nice explanation of the development of MaxEnt models and how they differ from PCFGs, as well as other useful material on probabilistic models as models of language learning; and Steve Abney has a provocative and useful piece about how the goals of statistical computational linguistics can be seen as the goals of generative linguistics. See more broadly the recent debate between Chomsky and Peter Norvig on probabilistic approaches to AI, and also the Probabilistic Linguistics book and Charles Yang’s review.
That’s not to say that formalizing probabilistic models of language is free of issues. In a paper to appear in Phonology (available here), Robert Daland discusses issues that can arise in defining a probability distribution over the infinite set of possible words, in particular with Hayes and Wilson’s (2008) MaxEnt phonotactic grammar model. In the general case, for this to succeed, the probability assigned to strings must fall off sharply enough as length increases that the running total over ever longer strings never exceeds 1 and instead converges to it (equivalently, in a MaxEnt model, the normalizing sum over all strings must be finite). Daland states the conditions under which this holds in the Hayes and Wilson model as requirements on the weight of a *Struc constraint, whose penalty increases with string length.
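To make the convergence requirement concrete, here is a minimal sketch in Python of a deliberately simplified case. It is not Daland's analysis or the Hayes and Wilson model: I assume that *Struc is the only active constraint, that it assigns one violation per segment, and that every string over the alphabet is a candidate. The function names and parameters are hypothetical. Under those assumptions there are N^n strings of length n, each with unnormalized score exp(-w·n), so the normalizing sum is a geometric series that converges only when w > ln N.

```python
import math

def partial_mass(alphabet_size, struc_weight, max_len):
    """Unnormalized mass of all strings of length 0..max_len in the toy model.

    Assumption: each string of length n scores exp(-struc_weight * n),
    and there are alphabet_size ** n such strings.
    """
    ratio = alphabet_size * math.exp(-struc_weight)
    return sum(ratio ** n for n in range(max_len + 1))

def converges(alphabet_size, struc_weight):
    """The geometric series converges iff its ratio is below 1,
    i.e. iff struc_weight > ln(alphabet_size)."""
    return struc_weight > math.log(alphabet_size)

def limit_mass(alphabet_size, struc_weight):
    """Closed-form normalizing constant when the series converges."""
    if not converges(alphabet_size, struc_weight):
        raise ValueError("normalizing sum diverges: weight too small")
    ratio = alphabet_size * math.exp(-struc_weight)
    return 1.0 / (1.0 - ratio)

if __name__ == "__main__":
    # Two-segment alphabet: the threshold weight is ln 2, about 0.693.
    for w in (0.5, 1.0):
        print(w, converges(2, w), partial_mass(2, w, 50))
```

With a weight below the threshold the partial sums keep growing as the length bound increases, so no normalization is possible; above it they settle on a finite value. A realistic grammar has other constraints that change the per-length mass, so the actual condition is the one given in Daland's paper, not this toy threshold.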
In the question period after Robert’s presentation of this material at the GLOW computational phonology workshop in Paris in April, Jeff Heinz raised an objection against the general notion of formalizing well-formedness in terms of probabilities, and he repeated this argument at the Manchester fringe workshop last week. Here’s my reconstruction of it (hopefully Jeff will correct me if I get it wrong; I also don’t have the references to the earlier work that made this argument). Take a (relatively) ill-formed short string. Give it some probability. Now take a (relatively) well-formed string. Give it some probability. Now concatenate the well-formed string with itself enough times that the whole thing has lower probability than the ill-formed string, which it eventually will.
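The arithmetic behind the argument can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption that I am adding: that the probability of k concatenated copies is the k-th power of the probability of one copy, as it would be in a model that scores each copy independently. The function name and the probabilities are made up. The general point does not depend on that assumption: the repeated strings are all distinct, and their probabilities must sum to at most 1, so only finitely many of them can be at least as probable as the ill-formed string.

```python
import math

def copies_until_less_probable(p_well, p_ill):
    """Smallest k such that k copies of the well-formed string are less
    probable than the ill-formed string.

    Assumption (for illustration only): P(k copies) = p_well ** k.
    Log probabilities are compared to avoid numerical underflow.
    """
    if not (0.0 < p_ill < p_well < 1.0):
        raise ValueError("expected 0 < p_ill < p_well < 1")
    log_well = math.log(p_well)
    log_ill = math.log(p_ill)
    k = 1
    while k * log_well >= log_ill:
        k += 1
    return k

if __name__ == "__main__":
    # Hypothetical values: a well-formed string at 0.01, an ill-formed one at 1e-7.
    print(copies_until_less_probable(0.01, 1e-7))  # prints 4
```

With these made-up numbers, three copies still have probability 1e-6, above the ill-formed string, and the fourth copy brings the total down to 1e-8.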
This is meant to be a paradox for the view that we can formalize well-formedness in terms of probabilities: the long well-formed string has lower probability than the short ill-formed string. It’s not clear to me, however, that there is a problem (and it wasn’t clear to Robert Daland either; the question period discussion lasted well into lunch, with Ewan Dunbar taking up Jeff’s position at our end of the table). Notice that Jeff’s argument makes an empirical claim: that concatenating well-formed strings does not decrease well-formedness. When I talked to him last week, he claimed that this is clearer in syntax than in phonology. Robert’s position (which I agree with) is that concatenation likely does decrease well-formedness, though from his review of the literature on phonotactic well-formedness judgments, we don’t seem to have empirical data on this point.
Robert asked us to work with him on designing an experiment to test this, and at the time I wasn’t sure that this was the best use of our lunch time, but I think he has a point. If this is in fact an empirical issue, and we can agree beforehand on how to test it, then this would save a lot of time compared with the usual process, in which the advocates of one position design an experiment that, even if it turns out the way they hope, can then be criticized by the advocates of the other position as not having operationalized their claim properly, and so on…
It’s also of course possible that this is not an empirical issue: that there is a concept of perfect well-formedness that probabilistic models cannot capture. This reminds me of a comment I once got from a prominent syntactician on a talk in which I discussed probabilistic models that can give probability vanishingly close to zero to ill-formed structures: “but there are sentences that I judge as completely out for English – they should have probability zero”. My response was simply to repeat the phrase “vanishingly close to zero”, and check to make sure he knew what I meant.