
ODG – Salle du conseil (533)
Tal Linzen (LSCP & IJN)
Can recurrent neural networks acquire hierarchical representations from natural texts?
Recent technological advances have made it possible to train artificial neural networks on a much larger scale than before. These networks are highly effective in natural language processing tasks such as machine translation and speech recognition. These engineering advances are surprising from a linguistic perspective: linguists generally consider sentences to have rich hierarchical structure, but most of these highly successful systems are based on sequence models without pre-engineered hierarchical syntactic representations. Is the performance of neural networks really as impressive when evaluated using rigorous linguistic diagnostics of hierarchy? If so, how do these networks represent or approximate hierarchical structures?
This talk will report on the results of an ongoing analysis of the ability of contemporary recurrent neural networks to master English subject-verb number agreement, one of the numerous phenomena that serve as evidence for hierarchical syntactic structure (joint work with Emmanuel Dupoux and Yoav Goldberg). We probed the syntactic capabilities of the architecture using training objectives with explicit grammatical targets (verb number prediction and acceptability judgments), as well as a word prediction task with no grammatical supervision ("language modeling").
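As a rough illustration of the verb number prediction objective described above, each training example pairs the words preceding a verb with that verb's grammatical number; the model never sees the verb itself. The data format and function below are a hypothetical simplification for exposition, not the released setup of the original work.

```python
# A minimal sketch (assumed format) of building a verb-number
# prediction example from an annotated sentence.

def make_agreement_example(tokens, verb_index, verb_number):
    """Turn an annotated sentence into a (prefix, label) pair.

    tokens      -- the sentence as a list of words
    verb_index  -- position of the verb whose number is predicted
    verb_number -- "singular" or "plural"
    """
    prefix = tokens[:verb_index]  # everything up to, not including, the verb
    return prefix, verb_number

# Example: "The keys to the cabinet are on the table."
tokens = ["The", "keys", "to", "the", "cabinet", "are", "on", "the", "table"]
prefix, label = make_agreement_example(tokens, verb_index=5, verb_number="plural")
# From the prefix alone, the model must infer that the subject is
# "keys" (plural), not the linearly closer noun "cabinet" (singular).
```

The example sentence shows why the task probes hierarchy: the noun nearest to the verb ("cabinet") has the wrong number, so a purely linear heuristic fails.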
The verb number prediction model achieved an impressive error rate of less than 1%; inspection of the network suggests that some of its units can detect syntactic structure, at least when there are overt cues to it (function words). Closer examination revealed that error rates increased in sentences where structural and linear factors conflicted; in difficult cases, the model's performance approached chance levels. Models that were trained without explicit attention to verb number performed much worse, approaching a 100% error rate on challenging sentences. These results suggest that contemporary recurrent neural networks can learn to approximate certain aspects of syntactic structure surprisingly well, but only in common sentence types and when given explicit grammatical supervision. Stronger inductive biases may be necessary to eliminate errors altogether and acquire syntax from natural input.
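The "conflicting" cases mentioned above are typically quantified by counting agreement attractors: intervening nouns whose number differs from the subject's. The helper below is a hedged sketch of that count; the number annotations it takes as input are assumed to be available, not part of any specific released corpus format.

```python
# A sketch of counting agreement attractors: intervening nouns whose
# grammatical number conflicts with the subject's (assumed annotations).

def count_attractors(intervening_noun_numbers, subject_number):
    """Count intervening nouns whose number differs from the subject's.

    intervening_noun_numbers -- numbers ("singular"/"plural") of the
                                nouns between subject and verb
    subject_number           -- "singular" or "plural"
    """
    return sum(1 for n in intervening_noun_numbers if n != subject_number)

# "The keys to the cabinet are ...": subject "keys" is plural and the
# intervening noun "cabinet" is singular, so there is one attractor.
print(count_attractors(["singular"], "plural"))  # prints 1
```

Sentences with more attractors put the linear cue (the nearest noun) in sharper conflict with the structural cue (the true subject), which is where the error rates reported above climb.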