Alpage Seminar: Yuval Marton

Friday, 19 June 2015, 11:00 to 13:00
Organizer: Alpage
Location: ODG – 165

Yuval Marton (Microsoft)
Distributional Paraphrasing with Distributional and Hybrid Semantic Distance Measures

Semantic distance measures estimate how close in meaning two words or phrases (or larger text units) are. These measures are useful in paraphrase generation, which, in turn, is useful in NLP tasks such as statistical machine translation (SMT), information retrieval (IR), syntactic parsing, summarization, and language generation. I will start by presenting semantic measures: lexicon-based semantic measures rely on a dictionary, thesaurus, or taxonomy (e.g., WordNet), while distributional measures rely instead only on word distributions in a large corpus of non-annotated text (word2vec being a recent example). Lexicon-based measures tend to have higher correlation with human judgments, but lower coverage than distributional measures, especially for multi-word terms, specialized domains, resource-poor ("low density") languages, or non-classical semantic relations.

Therefore, we are motivated to explore hybrid lexicon/corpus-based models that would get the best of both worlds. Previous work used shallow thesaurus-based “concepts” (lists of related words) for defining a coarse-grained aggregated distributional representation. I will show that finer-grained hybrid models can benefit from concept information while retaining a high-coverage word-based distributional representation. Next, I will present a largely language-independent distributional paraphrase generation method, employing some of these semantic measures. Time permitting, I will conclude by describing the integration and evaluation of paraphrasing in state-of-the-art SMT and in the IR task of event discovery and annotation.

About the speaker:

Yuval Marton is a computational linguist, active in lexical semantics, paraphrasing, parsing, statistical machine translation, and information retrieval (search engine result ranking). His interests also span using and adapting machine learning (ML) methods for natural language processing and understanding (NLP/NLU), and using linguistically informed learning bias and feature design to make such ML-with-NLP methods more effective. He is currently a Senior Applied Scientist at Microsoft, working on semantic ranking and fact extraction from free text. Dr. Marton co-organized the ACL 2012 SP-Sem-MRL, the EMNLP 2013 SPMRL, and the COLING 2014 SPMRL-SANCL workshops; served (or is serving) as the publication chair of the *SEM 2012, *SEM 2013, EMNLP 2014, and EMNLP 2015 conferences; and delivered a tutorial session, “On-Demand Distributional Paraphrasing,” at NAACL-HLT 2012. He received his Ph.D. in Linguistics from the University of Maryland in 2009, concentrating on computational linguistics, with a Neuroscience and Cognitive Science (NACS) Program Certificate. He received his Master's in Computer Science from NYU/Poly in 2004.

http://www1.ccls.columbia.edu/~ymarton