Maximin Coavoux

Former PhD student

Status : PhD student

Address :

LLF, CNRS – UMR 7110
Université Paris Diderot-Paris 7
Case 7031 – 5, rue Thomas Mann,
75205 Paris cedex 13

E-mail : zpbnibhk@yvathvfg.havi-cnevf-qvqrebg.se

Website : https://github.com/mcoavoux/

Thesis

Title : Discontinuous constituency parsing of morphologically rich languages

Supervision :
  Benoît Crabbé

PhD Defense : 2017-12-11

Enrollment : 2014 at Paris Diderot

Jury :

  • Benoît Crabbé (thesis supervisor), Université Paris Diderot;
  • Claire Gardent (reviewer), CNRS;
  • Alexis Nasr (reviewer), Aix-Marseille Université;
  • Carlos Gómez Rodríguez (examiner), Université de La Corogne;
  • Alexandre Allauzen (examiner), Université Paris Sud.

Abstract :

Syntactic parsing consists in assigning syntactic trees to sentences in natural language. Syntactic parsing of non-configurational languages, or languages with a rich inflectional morphology, raises specific problems. These languages suffer more from lexical data sparsity and exhibit word order variation phenomena more frequently. For these languages, exploiting information about the internal structure of word forms is crucial for accurate parsing.

This dissertation investigates transition-based methods for robust discontinuous constituency parsing. First, we propose a multitask learning neural architecture that performs joint parsing and morphological analysis. Then, we introduce a new transition system that is able to predict discontinuous constituency trees, i.e., syntactic structures that can be seen as derivations of mildly context-sensitive grammars, such as LCFRS. Finally, we investigate the question of lexicalization in syntactic parsing. Some syntactic parsers are based on the hypothesis that constituents are organized around a lexical head and that modelling bilexical dependencies is essential to resolve ambiguities. We introduce an unlexicalized transition system for discontinuous constituency parsing and a scoring model based on constituent boundaries. The resulting parser is simpler than lexicalized parsers and achieves better results in both discontinuous and projective constituency parsing.
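As a rough illustration of what a transition system for discontinuous constituents can look like, the sketch below implements a toy stack/deque machine with three actions. It is not the transition system of the dissertation: the names `Node`, `Config`, `shift`, `gap`, `reduce_` and `parse`, and the exact semantics of each action, are assumptions chosen for illustration. The idea shown is only that a gap-like action can set aside the top of the stack so that two non-adjacent items can be combined.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A tree node; leaves carry the position of their token."""
    label: str
    children: List["Node"] = field(default_factory=list)
    index: Optional[int] = None

    def span(self) -> Set[int]:
        """Set of token positions covered by this node."""
        if self.index is not None:
            return {self.index}
        covered: Set[int] = set()
        for child in self.children:
            covered |= child.span()
        return covered


@dataclass
class Config:
    """Hypothetical parser state: a stack, a deque and a buffer of nodes."""
    stack: List[Node]
    deque: List[Node]
    buffer: List[Node]


def shift(c: Config) -> None:
    # Flush the deque onto the stack, then move the next token to the deque.
    c.stack.extend(c.deque)
    c.deque = [c.buffer.pop(0)]


def gap(c: Config) -> None:
    # Set the stack top aside (bottom of the deque) to expose the item below it.
    c.deque.insert(0, c.stack.pop())


def reduce_(c: Config, label: str) -> None:
    # Combine the stack top and the deque top into one labelled constituent.
    right = c.deque.pop()
    left = c.stack.pop()
    c.stack.extend(c.deque)  # items set aside by gap() return to the stack
    c.deque = [Node(label, [left, right])]


def parse(tokens: List[str], actions: List[tuple]) -> Node:
    """Apply a fixed action sequence (no scoring model) and return the root."""
    c = Config([], [], [Node(t, index=i) for i, t in enumerate(tokens)])
    for name, *args in actions:
        {"shift": shift, "gap": gap, "reduce": reduce_}[name](c, *args)
    assert not c.stack and not c.buffer and len(c.deque) == 1
    return c.deque[0]


def is_discontinuous(node: Node) -> bool:
    """True if the node's yield is not a contiguous block of positions."""
    s = node.span()
    return max(s) - min(s) + 1 != len(s)


# Toy derivation: X covers tokens 0 and 2, skipping token 1.
root = parse(
    ["a", "b", "c"],
    [("shift",), ("shift",), ("shift",), ("gap",), ("reduce", "X"), ("reduce", "S")],
)
```

In this toy derivation, the `gap` action moves `b` out of the way so that `a` and `c` can be reduced into the discontinuous constituent `X`; `b` then returns to the stack and is attached under `S`. A real parser would choose each action with a learned scoring model rather than follow a fixed sequence.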

Bibliography

  • Maximin Coavoux, Benoît Crabbé. 2017. Représentation et analyse automatique des discontinuités syntaxiques dans les corpus arborés en constituants du français. TALN 2017 (long paper).
  • Maximin Coavoux, Benoît Crabbé. 2017. Multilingual Lexicalized Constituency Parsing with Word-Level Auxiliary Tasks. EACL 2017 (short paper).
  • Maximin Coavoux, Benoît Crabbé. 2017. Incremental Discontinuous Phrase Structure Parsing with the GAP Transition. EACL 2017. Outstanding paper award.
  • Chloé Braud, Maximin Coavoux, Anders Søgaard. 2017. Cross-lingual RST Discourse Parsing. EACL 2017.
  • Maximin Coavoux, Benoît Crabbé. 2016. Neural greedy constituent parsing with dynamic oracles. Proceedings of ACL.
  • Maximin Coavoux, Benoît Crabbé. 2016. Prédiction structurée pour l’analyse syntaxique en constituants par transitions : modèles denses et modèles creux. Traitement Automatique des Langues 57(1), ATALA.
  • Maximin Coavoux, Benoît Crabbé. 2015. Comparaison d’architectures neuronales pour l’analyse syntaxique en constituants. In Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles, p. 293–304, Caen, France: Association pour le Traitement Automatique des Langues. TALN 2015 prize (best paper award).