Adverbial Presupposition Triggering Dataset

The Adverbial Presupposition Triggering Dataset comprises processed versions of the Penn Treebank (PTB) corpus (Marcus et al. 1993) and the English Gigaword corpus (Graff et al. 2007). It was created for the task of predicting adverbial presupposition triggers, presented in the following paper:

Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers
Andre Cianflone*, Yulan Feng*, Jad Kabbara* and Jackie C.K. Cheung
ACL 2018, Melbourne, Australia, July 2018.
[ACL anthology link]

The study considers five adverbs: again, also, still, too, yet.


We define a sample in our dataset as a 3-tuple consisting of a label (the target adverb, or 'none' for a negative sample), a list of tokens extracted from the context before and after the adverb, and a list of the corresponding POS tags. In every sample, positive or negative, we also insert a special token "@@@@" immediately before the head word and before the head word's POS tag. This special token marks the candidate context in the passage for the model. The figure below shows a single positive sample in our dataset:
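
Separately from that figure, the 3-tuple layout described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the `Sample` alias, the `make_sample` helper, the example tokens, and their POS tags are all hypothetical and are not taken from the dataset files. We also assume the "@@@@" marker is inserted at the same index in both the token list and the POS-tag list, so that the two lists stay aligned.

```python
from typing import List, Tuple

# Labels named in the text: the five adverbs, plus 'none' for negative samples.
LABELS = ("again", "also", "still", "too", "yet", "none")
MARKER = "@@@@"  # special token placed right before the head word

# Hypothetical type alias for one sample: (label, tokens, POS tags).
Sample = Tuple[str, List[str], List[str]]


def make_sample(label: str, tokens: List[str], pos_tags: List[str],
                head_index: int) -> Sample:
    """Build a (label, tokens, pos_tags) tuple with the marker before the head word.

    Assumption: the marker goes at the same position in both lists,
    which keeps tokens and POS tags aligned one-to-one.
    """
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    if not 0 <= head_index < len(tokens):
        raise IndexError("head_index out of range")
    marked_tokens = tokens[:head_index] + [MARKER] + tokens[head_index:]
    marked_tags = pos_tags[:head_index] + [MARKER] + pos_tags[head_index:]
    return (label, marked_tokens, marked_tags)


# Made-up example, not drawn from the dataset.
sample = make_sample("again", ["she", "called", "him"],
                     ["PRP", "VBD", "PRP"], head_index=1)
print(sample)
```

A negative sample would use the same helper with the label 'none'; only the label differs, since the marker is added in both cases.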