Alpage Seminar: Anders Søgaard

Friday, 22 May 2015, 15:00 to 17:00
Organizer: Alpage
Location: ODG – Room 118

Anders Søgaard (University of Copenhagen)
Language technology for everyone

High-quality NLP tools - from tokenizers to semantic parsers - exist for only 10-15 of the world’s seven thousand languages, even though we have digital texts for at least a quarter of them. Even for major languages such as English, our tools fare reasonably well only on standard language, not on informal language or dialect. We even see gender and age biases affect our tools’ performance. In addition, our tools often over-fit to arbitrary annotation choices, arguably making them even less robust to linguistic diversity. This talk surveys recent efforts in the COASTAL group to bridge these gaps.