Discussion: Generative linguistics and neural networks at 60

From Joe Pater

The commentaries on my paper “Generative Linguistics and Neural Networks at 60: Foundation, Friction and Fusion” are now all posted online on the authors’ websites, at the links below. The linked version of my paper and – I presume – the linked versions of the commentaries are the non-copyedited but otherwise final versions that will appear in the March 2019 issue of Language, in the Perspectives section.

I decided not to write a reply to the commentaries, since they nicely illustrate a range of possible responses to the target article, and since most of what I would have written in a reply would have repeated or elaborated on points that are already in my paper. But there is of course lots more to talk about, so I thought I’d set up this blog post with open comments to allow further, relatively well-archived discussion.

Iris Berent and Gary Marcus. No integration without structured representations: reply to Pater.

Ewan Dunbar. Generative grammar, neural networks, and the implementational mapping problem.

Tal Linzen. What can linguistics and deep learning contribute to each other?

Lisa Pearl. Fusion is great, and interpretable fusion could be exciting for theory generation.

Chris Potts. A case for deep learning in semantics.

Jonathan Rawski and Jeff Heinz. No Free Lunch in Linguistics or Machine Learning.

7 comments on “Discussion: Generative linguistics and neural networks at 60”
  1. Sam Bowman says:

    Re. Berent and Marcus: Is there a public write-up of “How we reason about innateness”? It’s cited as evidence for a fairly bold claim (“resistance to innate ideas could well be grounded in core cognition itself”), but all I can find is a talk with no associated paper.

  2. Sam Bowman says:

    (Porting over from Twitter)

    Either I don’t get the spirit of the Rawski & Heinz response, or it misses an easy opportunity to draw parallels between symbolic and representation learning-based approaches to language. They claim that “any serious scientific application of neural architectures within linguistics must always strive to make the learner’s biases transparent,” with the strong implication that this isn’t being done.

    Specifying the architecture and learning algorithm used for an NN model specifies that model’s bias, no? Nearly every paper in this literature gives that key information, and many, many CL papers discuss the consequences of those specifications for what is learnable easily and what is learnable at all. Very little of this discussion is couched in the language of learning theory and the Chomsky Hierarchy, but the discussion is absolutely happening. I see a clear opportunity for bridge-building here, but I don’t see a clear failing in current work unless you take that body of theory as a particularly privileged approach to the science of language.

    (Of course, the original article could have commented on this too, but I don’t think it was clearly called for.)

  3. Tal Linzen says:

    Reposted from Twitter:

    The Berent & @Marcus response summarizes familiar arguments against 1980s-style connectionism, but it would have been useful to see a discussion of more recent work: for example, recent neural architectures with structured inductive biases (syntactic, relational, compositional, etc.), which I assume would not be taken to follow the “associationist hypothesis” (from Chris Dyer, Jacob Andreas, Richard Socher, Sam Bowman), or Kirov and Cotterell’s experimental work showing that modern seq2seq networks (without explicit algebraic representations) can in fact learn the English past tense (https://arxiv.org/abs/1807.04783). The only recent papers that do get mentioned are ones that support the authors’ argument – @LakeBrenden & Baroni’s (very cool) experimental work demonstrating a lack of systematicity in standard seq2seq networks (https://arxiv.org/abs/1711.00350 and https://arxiv.org/abs/1807.07545).

    In any case I agree with Berent & Marcus that (1) the goal is to create a model that generalizes like humans, and (2) to get there we need to run experiments on both models and humans, and if necessary add different/stronger inductive biases (nothing controversial here).

    • Joe Pater says:

      Thanks Tal! Just in case someone reading your post hasn’t read the paper on which Berent and Marcus were commenting, I should point out that I tried to have a balanced discussion there of the issue of whether explicit linguistic representations, including symbols, are needed in neural net models of language, and that I included Kirov and Cotterell as an example of an interesting recent result suggesting that current architectures can do more without variables than earlier ones could. It seems that, from Berent and Marcus’ perspective, I was leaning too far in the direction of endorsing symbol-free models. The only major thing missing from the commentaries, I think, is someone from the other side, arguing that I was being too optimistic about the need for explicit linguistic representations.

      • Tal Linzen says:

        To summarize my main points: (1) current practice in the neural network world has moved beyond what Gary has termed eliminative connectionism, and many “deep learning” systems have components that could qualify as symbols, variables and compositional representations; (2) discussion of the abilities and limitations of neural networks should make reference to specific experimental results obtained on specific neural network architectures.
