Flexique: an inflectional lexicon for spoken French

Project lead: Olivier Bonami

Documentation: PDF

Download: ZIP (1 MB)

Participants: Olivier Bonami (U. Paris-Sorbonne, IUF), Gauthier Caron (U. de la Réunion), Clément Plancq (CNRS)

Description: Flexique was designed as a tool for studying the structure of the French inflection system [1]. In its current form, it consists of three tables of French nouns, adjectives, and verbs.

POS          lexemes    words
nouns         31,002   65,111
adjectives    11,252   45,008
verbs          4,987  253,174
total         47,241  363,293
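
As a quick sanity check on these figures, the following sketch recomputes the totals and derives the average number of listed forms per lexeme. The counts are copied from the table; the variable names are illustrative and nothing here is read from the Flexique files themselves.

```python
# Counts copied from the table: (lexemes, inflected words) per part of speech.
counts = {
    "nouns": (31_002, 65_111),
    "adjectives": (11_252, 45_008),
    "verbs": (4_987, 253_174),
}

# Recompute the "total" row from the per-category rows.
total_lexemes = sum(lexemes for lexemes, _ in counts.values())
total_words = sum(words for _, words in counts.values())

# Average number of listed forms per lexeme, by part of speech.
forms_per_lexeme = {
    pos: words / lexemes for pos, (lexemes, words) in counts.items()
}

for pos, ratio in forms_per_lexeme.items():
    print(f"{pos}: {ratio:.2f} forms per lexeme")
print(f"total: {total_lexemes} lexemes, {total_words} words")
```

The ratios differ sharply by part of speech, which mostly reflects how many paradigm cells each category has.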

Flexique derives from Lexique [2], a lexical database of phonetic, lexical, morphosyntactic, and frequency information on French. Like its predecessor, Flexique is distributed as an open-source resource, under a Creative Commons Attribution-NonCommercial-ShareAlike license.

Features: Flexique was derived from Lexique version 3.70, a database that collects information of various kinds on 142,694 French words and is distributed as an open-source resource. Lexique is an extremely useful resource, but it has several limitations for the study of inflection:

  • Lexique only collects entries for word forms that have occurrences in one of two corpora, post-1950 Frantext and the French Subtitles Corpus. As a result, inflectional paradigms are not complete; in particular there are very few verbs whose full paradigm is documented.
  • Because Lexique is word-centric rather than lexeme-centric, forms of the same lexeme are sometimes described inconsistently.
  • The phonetic transcriptions of Lexique are too close to the surface for many purposes. In particular, there is no explicit representation of schwa optionality or mid vowel neutralization.
  • Although Lexique is constantly being improved, it has never been thoroughly validated by hand. Thus many scattered errors remain, both in transcriptions and in morphosyntactic annotations.
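
To make the word-centric versus lexeme-centric contrast concrete, here is a minimal sketch that regroups word-form rows by lexeme and lists the paradigm cells with no attested form. The row layout, the cell labels, and the tiny set of expected cells are all assumptions made for illustration; they are not the actual column names or cell inventory of Lexique or Flexique.

```python
from collections import defaultdict

# Hypothetical word-centric rows: (form, lemma, part of speech, paradigm cell).
# Layout and cell labels are illustrative, not those of Lexique or Flexique.
rows = [
    ("lave", "laver", "verb", "prs.1sg"),
    ("lavons", "laver", "verb", "prs.1pl"),
    ("lavez", "laver", "verb", "prs.2pl"),
    ("cheval", "cheval", "noun", "sg"),
]

# A deliberately tiny inventory of expected cells per part of speech (assumption).
EXPECTED_CELLS = {
    "verb": ["prs.1sg", "prs.2sg", "prs.3sg", "prs.1pl", "prs.2pl", "prs.3pl"],
    "noun": ["sg", "pl"],
}


def group_by_lexeme(rows):
    """Turn a flat list of word forms into one paradigm per (lemma, POS)."""
    paradigms = defaultdict(dict)
    for form, lemma, pos, cell in rows:
        paradigms[(lemma, pos)][cell] = form
    return dict(paradigms)


def missing_cells(paradigms):
    """Return, for each lexeme, the expected cells that have no attested form."""
    gaps = {}
    for (lemma, pos), cells in paradigms.items():
        absent = [cell for cell in EXPECTED_CELLS[pos] if cell not in cells]
        if absent:
            gaps[(lemma, pos)] = absent
    return gaps


paradigms = group_by_lexeme(rows)
gaps = missing_cells(paradigms)
for lexeme, absent in gaps.items():
    print(lexeme, "is missing", absent)
```

With corpus-attested forms only, most lexemes end up with gaps of this kind, which is the incompleteness described in the first point above.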

Flexique was designed with the goal of complementing Lexique in these domains. In particular:

  • Flexique is organized by lexemes rather than words; it provides full paradigms for all adjectives, nouns, and verbs that have at least one form documented in Lexique.
  • The phonetic transcriptions strike a balance between surface correctness and generality; the aim is to give each word a single phonological representation from which all of its phonologically predictable phonetic variants can be deduced. This requires systematic information on the location of possible schwas, even those that are only very seldom realized, as well as notations for neutralized vowels.
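
The idea of a single representation from which variants can be deduced can be sketched as follows. The notation "(ə)" for an optional schwa is a hypothetical convention chosen for this example, not Flexique's actual encoding, and the function is purely combinatorial: it does not model the phonological constraints that decide which schwas can actually be dropped together.

```python
import itertools

# Hypothetical marker for an optional schwa; not Flexique's actual notation.
OPTIONAL_SCHWA = "(ə)"


def surface_variants(underlying):
    """Expand every optional-schwa slot into its realized and dropped variants."""
    parts = underlying.split(OPTIONAL_SCHWA)
    n_slots = len(parts) - 1
    variants = []
    # Each slot is independently either realized ("ə") or dropped ("").
    for choice in itertools.product(("ə", ""), repeat=n_slots):
        pieces = [parts[0]]
        for realized, rest in zip(choice, parts[1:]):
            pieces.append(realized + rest)
        variants.append("".join(pieces))
    return variants


# "semaine" with an optional schwa in the first syllable (illustrative input).
print(surface_variants("s(ə)mɛn"))
```

A representation without any optional slot is returned unchanged as its only variant, so the same function can be applied to every entry.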

[1] We wish to thank Boris New and Christophe Pallier, whose work on Lexique was an indispensable precondition for the constitution of Flexique, as well as an inspiration for attempting to build it. Boris New has been extremely helpful in sharing his expertise. Gilles Boyé and Delphine Tribout provided crucial help at various points. Flexique was designed at the Laboratoire de Linguistique Formelle and funded by Olivier Bonami’s IUF grant.

[2] New, B., Pallier, C., Ferrand, L., & Matos, R. (2001). Une base de données lexicales du français contemporain sur internet : LEXIQUE. L’Année Psychologique, 101, 447-462. http://www.lexique.org