Developing a Large-Scale Lexicon for a Less-Resourced Language: General Methodology and Preliminary Experiments on Sorani Kurdish

Title: Developing a Large-Scale Lexicon for a Less-Resourced Language: General Methodology and Preliminary Experiments on Sorani Kurdish
Publication Type: Conference proceedings article
Conference Year: 2010
Authors: Walther, Géraldine, and Benoît Sagot
Conference Name: Proceedings of the 7th SaLTMiL Workshop on Creation and use of basic lexical resources for less-resourced languages (LREC 2010 Workshop)
Conference Location: Valletta, Malta
Abstract

In this paper, we describe a general methodology for developing a large-scale lexicon for a less-resourced language, i.e., a language for which raw internet-based corpora and general-purpose grammars are virtually the only existing resources. We apply this methodology to the development of a morphological lexicon for Sorani Kurdish, an Iranian language mostly spoken in northern Iraq and north-western Iran. Although preliminary, our results demonstrate the relevance of this methodology. 

URL: http://web.me.com/gwalther/homepage/Publications_(fr)_files/saltmil10soralex.pdf