Zulipiye Yusupujiang

Recent PhD graduates

Status : PhD student

Address :

LLF, CNRS – UMR 7110
Université Paris Diderot-Paris 7
Case 7031 – 5, rue Thomas Mann,
75205 Paris cedex 13

E-mail : mhyvcvlr.lhfhchwvnat@yvathvfg.havi-cnevf-qvqrebg.se

Thesis

Title : Characterizing the Response Space of Questions in Dialogue across Languages

Supervision :
  Jonathan Ginzburg

PhD Defense : 2024-05-15

Enrollment : 2019 at Paris Cité

Jury :

  • David TRAUM (Reviewer), Institute for Creative Technologies (ICT), University of Southern California (USC)
  • Claire GARDENT (Reviewer), Université de Lorraine
  • Abdurishid YAKUP (Examiner), Berlin-Brandenburgische Akademie der Wissenschaften
  • Heather Susan BURNETT (Examiner), Université Paris Cité
  • Jonathan GINZBURG (Thesis supervisor), Université Paris Cité

Abstract :

This thesis develops a systematic classification of responses to questions in dialogue. We introduce a theoretically grounded and empirically tested response space taxonomy with nine response classes, and provide a formal representation of each class within a dialogical formal semantics framework. To evaluate the taxonomy across languages, we conducted a comparative study with Uyghur, a low-resource Turkic language with characteristics that set it apart from English. Since no Uyghur dialogue corpus existed, we collected conversations using an open-source customizable communication platform. The result is the first freely available Uyghur chat-based dialogue corpus (UgChDial), annotated with our response space taxonomy. Our comparative study of English and Uyghur responses to questions reveals a generally similar distribution of response classes in the two languages, with some exceptions. The taxonomy covered over 99.0% of the question-response pairs in both languages. We also conducted preliminary investigations into automating the classification of the response space of questions in dialogue. We designed 26 features to capture the syntactic, semantic, and lexical characteristics of questions and responses, and used them to compare automatic classification results from traditional machine learning algorithms with those obtained from a large pre-trained BERT language model. Finally, the thesis tackles the challenge of interpreting indirect answers to various wh-questions. We constructed a corpus of such responses in English, which we believe to be the first of its kind, and conducted a preliminary study on generating these responses with the pre-trained generative language model DialoGPT. Our findings suggest that this task poses significant challenges for models like GPT, owing to the complex and inference-heavy nature of indirect answers.