Vincent Segonne

Former PhD student

Status : PhD student

Address :

LLF, CNRS – UMR 7110
Université Paris Diderot-Paris 7
Case 7031 – 5, rue Thomas Mann,
75205 Paris cedex 13

E-mail :


Title : French Verb Sense Disambiguation

Supervision :
  Benoît Crabbé

PhD Defense : 2021-12-16

Enrollment : 2017 at University of Paris

Jury :

  •     Philippe Langlais (reviewer), Université de Montréal
  •     Emmanuel Morin (reviewer), Université de Nantes
  •     Marianna Apidianaki (examiner), University of Pennsylvania
  •     Didier Schwab (examiner), Université Grenoble Alpes
  •     Benoît Crabbé (thesis supervisor), Université de Paris
  •     Marie Candito (co-supervisor), Université de Paris

Abstract :

Word Sense Disambiguation (WSD) is a Natural Language Processing (NLP) task whose goal is to automatically predict the meaning of words in context, based on a predefined sense inventory. Its success relies on lexical resources and sense-annotated data. Moreover, the recent development of contextual representations produced by deep neural networks has greatly improved the performance of disambiguation systems.
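To make the task definition concrete, here is a minimal sketch of one common way to use contextual representations for WSD: nearest-centroid classification, where each sense in the inventory is represented by the average vector of its annotated examples and a new occurrence receives the sense whose centroid is most similar. This is an illustration under stated assumptions, not the method of the thesis: the verb "jouer", its sense labels, and the 3-dimensional vectors are hypothetical toy values standing in for real contextual embeddings and real sense-annotated data.

```python
import math

# Hypothetical sense inventory for the French verb "jouer". Each sense is paired
# with toy 3-dimensional vectors standing in for the contextual embeddings that
# a neural encoder would produce for sense-annotated example sentences.
SENSE_EXAMPLES = {
    "jouer_play_game": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "jouer_play_instrument": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
    "jouer_act_role": [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
}


def centroid(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disambiguate(context_vector, sense_examples):
    """Return the sense whose example centroid is most similar to the occurrence."""
    centroids = {sense: centroid(vecs) for sense, vecs in sense_examples.items()}
    return max(centroids, key=lambda sense: cosine(context_vector, centroids[sense]))


# A new occurrence whose (toy) contextual vector lies near the instrument examples.
print(disambiguate([0.2, 0.7, 0.1], SENSE_EXAMPLES))  # prints: jouer_play_instrument
```

The design choice here is deliberately simple: a nearest-centroid classifier needs only a handful of annotated examples per sense, which is why approaches of this kind are often considered when sense-annotated data is scarce.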

In this thesis, we focus on the disambiguation of French verbs, French being a language with little or no usable data for this task. First, we review the state of the art in neural contextual representations and disambiguation methods.

Then, we investigate the role of syntax in verb disambiguation. To do so, we first conduct a corpus study exploring the potential correlation between verbs' argument structures and their senses. We then study whether the argument structure of verbs is encoded in contextual representations obtained from attention-based neural networks. We also propose a model that learns contextual representations from the syntactic structure of sentences, provided beforehand by a parser, and we test these representations on the disambiguation task.

Finally, in the last part of this thesis, we address the problem of data availability for the WSD task in languages other than English, using French as an example. After studying various automatically produced resources, we propose to use Wiktionary, a free, collaborative dictionary based on the Wikipedia model, and we release FrenchSemEval, the first evaluation corpus for French verb disambiguation. We evaluate several disambiguation systems on this dataset and obtain the first results for this task.