The Wikilinks Rare Entity Prediction Dataset
The dataset is contained in this single compressed file:
rare_entity_dataset.tar.gz (1042563276 bytes, MD5: 39b816112052038dd06e4fc8928df218)
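The size and MD5 listed above can be checked after download. The following is a minimal sketch, not part of the dataset's own tooling; the function names `md5_of_file` and `verify_archive` are hypothetical, and it assumes the archive has been saved locally under its original name.

```python
import hashlib
import os

# Values copied from the listing above.
EXPECTED_MD5 = "39b816112052038dd06e4fc8928df218"
EXPECTED_SIZE = 1042563276


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so the ~1 GB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file at `path` matches the expected size and MD5."""
    # Cheap size check first; skip hashing if the size is already wrong.
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5
```

Calling `verify_archive("rare_entity_dataset.tar.gz")` returns `True` only when both the byte count and the checksum match the values listed above.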
This archive contains three files and one directory:
corpus.txt: The content of the parsed web pages. Each line is a single document, with every entity mention replaced by its corresponding freebase_id.
entities.txt: Information about the entities that appear in the corpus. Each line consists of five tab-separated fields: freebase_id, anchor_text, wiki_url, freebase_name, and description.
splits/: This directory contains the train/valid/test split of corpus.txt used for the experiments.
README.md: A readme.
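
The five-field layout of entities.txt described above could be read along the following lines. This is only a sketch under stated assumptions: the `Entity` type and the functions `parse_entity_line` and `read_entities` are hypothetical names, and the code assumes UTF-8 encoding, no header row, and no tab characters inside any field. None of these details are confirmed by the description above, so check them against README.md.

```python
from typing import Iterator, NamedTuple


class Entity(NamedTuple):
    # Field names and order follow the entities.txt description.
    freebase_id: str
    anchor_text: str
    wiki_url: str
    freebase_name: str
    description: str


def parse_entity_line(line: str) -> Entity:
    """Split one tab-separated line into its five fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    return Entity(*fields)


def read_entities(path: str, encoding: str = "utf-8") -> Iterator[Entity]:
    """Yield one Entity per non-empty line of the file.
    The UTF-8 default is an assumption, not a documented property."""
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.strip():
                yield parse_entity_line(line)
```

A lookup table keyed by ID can then be built with `{e.freebase_id: e for e in read_entities("entities.txt")}`, which is convenient for resolving the freebase_ids that stand in for entity mentions in corpus.txt.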