Adverbial Presupposition Triggering Dataset
When extracting positive data, we scan through all the documents, searching for target adverbs. For each occurrence of a target adverb, we store its location and its governor. Taking each occurrence of a governor as a pivot, we extract the 50 unlemmatized tokens preceding it, together with the tokens following it up to the end of the sentence containing the adverb, removing the adverb itself. If fewer than 50 tokens precede the pivot, we simply extract all of them. In preliminary testing with a logistic regression classifier, we found that a window of 50 tokens yielded higher accuracy than windows of 25 or 100 tokens. Because some governors are themselves stopwords, we do not remove any stopwords from the samples; doing so would discard many important samples.
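For concreteness, the following is a minimal sketch of the positive extraction step. It assumes spaCy for tokenization and dependency parsing; the trigger set, the restriction to the advmod relation, and all function names here are illustrative assumptions rather than the exact implementation.

```python
import spacy

# Assumed trigger set and window size, for illustration only
TARGET_ADVERBS = {"again", "also", "still", "too", "yet"}
WINDOW = 50  # left-context size chosen in preliminary testing

nlp = spacy.load("en_core_web_sm")  # any parser exposing dependency heads works

def extract_positive_sample(doc, adverb_tok):
    """Context around the governor of a target adverb, with the adverb removed."""
    governor = adverb_tok.head           # the token the adverb modifies
    start = max(0, governor.i - WINDOW)  # up to 50 unlemmatized tokens of left context
    left = [t.text for t in doc[start:governor.i]]
    # tokens from the governor to the end of its sentence, minus the adverb itself
    right = [t.text for t in doc[governor.i:governor.sent.end]
             if t.i != adverb_tok.i]
    return left + right, governor.lemma_

def positive_samples(doc):
    for tok in doc:
        if tok.text.lower() in TARGET_ADVERBS and tok.dep_ == "advmod":
            yield extract_positive_sample(doc, tok)
```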
We filter out occurrences of "too" whose governors have the POS tags "JJ" or "RB" (adjectives and adverbs), because such cases correspond to a different sense of "too", one that indicates excess and does not trigger a presupposition (e.g., "rely too heavily on", "it's too far from").
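Continuing the sketch above, this filter might look as follows; the exact match on the two Penn Treebank tags mirrors the description here, though the actual implementation may handle more cases.

```python
def keep_too_occurrence(adverb_tok):
    """Drop "too" when its governor is an adjective or adverb (excess sense)."""
    if adverb_tok.text.lower() != "too":
        return True
    return adverb_tok.head.tag_ not in {"JJ", "RB"}  # Penn Treebank tags
```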
After extracting the positive cases, we use the governor information from the positive cases to extract negative data. Specifically, we extract as negatives those sentences that contain the same governors but none of the target adverbs. In this way, a model cannot rely on the identity of the governor alone to predict the class. This procedure also roughly balances the number of samples in the positive and negative classes.
For each governor in a positive sample, we locate a corresponding context in the corpus where the governor occurs without being modified by any of the target adverbs. We then extract the surrounding tokens in the same fashion as above. Moreover, we try to control for position-related confounding factors with two randomization steps: 1) we randomize the order in which documents are scanned, and 2) within each document, we begin scanning at a random position. Note that the number of negative cases may not exactly equal the number of positive cases in every dataset, because some governors appearing in positive cases are rare words, and we are unable to find any (or only a few) occurrences that match them for the negative cases.
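A sketch of the negative extraction with both randomizations is given below, reusing `TARGET_ADVERBS` and `WINDOW` from the positive-extraction sketch. The per-governor quota bookkeeping (`needed`) is a hypothetical device for matching the counts observed in the positive class; the sentence-level exclusion of target adverbs follows the description above.

```python
import random

def negative_samples(docs, needed):
    """Governor-matched negatives; `needed` maps governor lemma -> sample count."""
    remaining = dict(needed)
    docs = list(docs)
    random.shuffle(docs)                     # 1) randomize document order
    for doc in docs:
        if len(doc) == 0:
            continue
        offset = random.randrange(len(doc))  # 2) start scanning at a random position
        for step in range(len(doc)):
            tok = doc[(offset + step) % len(doc)]
            if remaining.get(tok.lemma_, 0) <= 0:
                continue
            sent = tok.sent
            # a negative sentence must contain the governor but no target adverb
            if any(t.text.lower() in TARGET_ADVERBS for t in sent):
                continue
            start = max(0, tok.i - WINDOW)
            left = [t.text for t in doc[start:tok.i]]
            right = [t.text for t in doc[tok.i:sent.end]]
            yield left + right, tok.lemma_
            remaining[tok.lemma_] -= 1
```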