We present a new annotation layer for the English part of the Dundee Eye-Tracking Corpus (Kennedy & Pynte 2005), a resource of eye-tracking data from ten participants reading newspaper articles (~50K words). In this corpus, we manually annotated all the anaphoric pronouns and their antecedents, which makes it possible to study anaphoric relations in natural text reading.
The resource can be downloaded as a CSV file with three columns. The first gives the word number of each pronoun, and the second gives the number of the text in which the pronoun appears; word numbers restart at zero for every new text. The third column gives the word span of the antecedent if the pronoun is anaphoric, and "other" otherwise. Our annotation does not include word forms for the pronouns or their antecedents, because we do not hold a licence to distribute the Dundee Corpus.
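As a rough illustration of the three-column layout described above, the following Python sketch reads such a file and indexes the pronouns. Several details are assumptions rather than documented facts about the release: the comma delimiter, the possible presence of a header row, the notation used for antecedent spans (shown here as "3-4"), and the names `PronounAnnotation` and `read_annotations`, which are hypothetical. The sample rows are made up.

```python
import csv
import io
from typing import NamedTuple, Optional


class PronounAnnotation(NamedTuple):
    word_number: int           # position of the pronoun within its text (restarts at 0 per text)
    text_number: int           # number of the text in which the pronoun appears
    antecedent: Optional[str]  # raw antecedent span, or None when the file says "other"


def read_annotations(handle):
    """Parse rows of (word number, text number, antecedent span or "other")."""
    annotations = []
    for row in csv.reader(handle):  # assumption: comma-separated values
        if len(row) != 3 or not row[0].strip().isdigit():
            continue  # skip blank lines and a possible header row
        word, text, span = (cell.strip() for cell in row)
        annotations.append(
            PronounAnnotation(int(word), int(text), None if span == "other" else span)
        )
    return annotations


# Made-up rows for illustration only; the span notation "3-4" is an assumption.
sample = io.StringIO("12,1,3-4\n40,1,other\n0,2,other\n7,2,2-5\n")
annotations = read_annotations(sample)

# Word numbers restart for every text, so the pair (text_number, word_number)
# is needed to identify a pronoun uniquely.
by_position = {(a.text_number, a.word_number): a for a in annotations}
anaphoric = [a for a in annotations if a.antecedent is not None]
print(len(annotations), len(anaphoric))
```

To use the real data, replace `sample` with an open file handle on the downloaded CSV. The antecedent span is kept as a raw string because its exact notation should be checked against the file itself before parsing it further.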
Note that a Universal Dependencies annotation layer is also available for the English part of the Dundee Eye-Tracking Corpus; it was built by Maria Barrett and her colleagues (Barrett et al., 2015).
The construction of our dataset is described in the following article (in press); please refer to it when using our data.
Download (CSV)
Thanks to Amandine Martinez for her help with the annotation.
Laboratoire de Linguistique Formelle – UMR 7110, CNRS and Université Paris Cité – RNSR: 200112497J
Publication director: Heather Burnett – Last updated: 2025-11-08