Antoine Simoulin

Former PhD student

Status : PhD Student

Address :

LLF, CNRS – UMR 7110
Université Paris Diderot-Paris 7
Case 7031 – 5, rue Thomas Mann,
75205 Paris cedex 13

Mail :


Title : Sentence embeddings and their relation with sentence structures

Supervision :
  Benoît Crabbé

PhD Defense : 2022-07-07

Enrollment : 2019 at Paris Cité

Jury :

  • Claire Gardent, CNRS and Université de Lorraine, reviewer;
  • Eric Gaussier, Université Grenoble Alpes, reviewer;
  • Rachel Bawden, Inria, examiner;
  • Loïc Barrault, Le Mans Université, examiner;
  • Nicolas Brunel, ENSIIE and Laboratoire de Mathématiques et Modélisation d'Évry, invited jury member;
  • Benoît Crabbé, Université Paris Cité, supervisor.

Abstract :

Historically, models of human language assume that sentences have a symbolic structure and that this structure allows us to compute their meaning by composition. In recent years, deep learning models have successfully processed tasks automatically without relying on an explicit language structure, thus challenging this fundamental assumption. This thesis seeks to clearly identify the role of structure in language modeling by deep learning methods. The dissertation specifically investigates the construction of sentence embeddings—semantic representations based on vectors—by deep neural networks. Firstly, we study the integration of linguistic biases in neural network architectures to constrain their composition sequence based on a traditional tree structure. Secondly, we relax these constraints to analyze the latent structures induced by the neural networks. In both cases, we analyze the compositional properties of the models as well as the semantic properties of the sentence embeddings.