Anaphoric Pronouns and their Antecedents in the Dundee Eye-Tracking Corpus

APADEC (Anaphorical Pronouns and their Antecedents in the Dundee Eye-Tracking Corpus)

by Olga Seminck and Pascal Amsili

We propose a new annotation layer for the English part of the Dundee Eye-Tracking Corpus, a resource of eye-tracking data from ten participants reading newspaper articles (~50K words) (Kennedy & Pynte 2005). In this corpus, we manually annotated all the anaphoric pronouns and their antecedents. This allows us to study anaphoric relations in natural text reading.

The resource can be downloaded as a CSV file with three columns. The first column gives the word number of each pronoun. The second gives the number of the text in which the pronoun appears. Please note that word numbers restart at zero for every new text, so a pronoun is identified by its text number and word number together. The third column gives the word span of the antecedent if the pronoun is anaphoric, and "other" otherwise. Please note that our annotation does not include the word forms of the pronouns and antecedents, because we do not have the licence to distribute the Dundee Corpus.
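As an illustration of the three-column layout described above, here is a minimal Python sketch for loading the annotations. Several details are assumptions rather than facts about the distributed file: the comma delimiter, the absence of a header row (controlled by the hypothetical `has_header` flag), and the names `PronounAnnotation` and `read_annotations`. The encoding of the antecedent span is not specified here, so the sketch keeps it as a raw string, and the sample rows are made up for the example.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class PronounAnnotation(NamedTuple):
    word_number: int                # position of the pronoun within its text
    text_number: int                # text in which the pronoun appears
    antecedent_span: Optional[str]  # raw span string, or None when "other"


def read_annotations(handle, has_header: bool = False) -> List[PronounAnnotation]:
    """Read rows of (word number, text number, antecedent span or "other")."""
    reader = csv.reader(handle)  # assumption: comma-separated values
    if has_header:
        next(reader, None)  # skip a header row if the file has one
    annotations = []
    for record in reader:
        if len(record) < 3:
            continue  # ignore blank or incomplete lines
        word, text, antecedent = (field.strip() for field in record[:3])
        # "other" marks a non-anaphoric pronoun; anything else is kept verbatim
        span = None if antecedent.lower() == "other" else antecedent
        annotations.append(PronounAnnotation(int(word), int(text), span))
    return annotations


# Made-up rows for illustration only; the real span format may differ.
sample = io.StringIO("12,1,5-6\n40,1,other\n3,2,0-1\n")
annotations = read_annotations(sample)
anaphoric = [a for a in annotations if a.antecedent_span is not None]
```

Because word numbers restart at zero in every text, the pair `(text_number, word_number)` is the natural key when aligning these rows with the corpus itself.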

Note that a Universal Dependencies annotation layer is also available for the English part of the Dundee Eye-Tracking Corpus. It was built by Maria Barrett and her colleagues (Barrett et al., 2015).

The construction of our dataset is described in the following article (in press). Please refer to it when using our data.

Download (CSV)

Acknowledgements

Thanks to Amandine Martinez for her help with the annotation.

References

  • Kennedy, A., & Pynte, J. (2005). Parafoveal-on-foveal effects in normal reading. Vision Research, 45(2), 153-168.
  • Barrett, M. J., Agic, Z., & Søgaard, A. (2015). The Dundee Treebank. In Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories: TLT14 (pp. 242-248). Warsaw, Poland: Association for Computational Linguistics.