Geo-referencing Twitter User Profile Locations

Reference: Beatrice Alex, Clare Llewellyn, Claire Grover, Jon Oberlander and Richard Tobin. 2016. Homing in on Twitter users: Evaluating an Enhanced Geoparser for User Profile Locations. Full paper, to appear at LREC 2016 (May 2016).

Gold standard data sets: Syria 2014, Commonwealth Games 2014, Ukraine 2014, Cities 2014 and the random test set.

This data contains a manually checked ground truth of geo-referencing information for Twitter user profile locations (UPLs). The geo-referencing information consists of latitude/longitude coordinates, the code of the country the location is in (where applicable) and the place name taken from the reference gazetteer entry. By default, we used GeoNames as the reference gazetteer for creating this gold standard and backed off to Wikipedia or Google Maps when a location was not in GeoNames. We therefore provide GeoNames identifiers, or mark the geo-reference as coming from Google Maps or Wikipedia where necessary. If a UPL did not represent a geo-referenceable location, we left the geo-referencing information blank. We collected this data in 2014, and more information on each data set can be found in the aforementioned paper. Alongside the geo-referencing information, we provide the tweet ID, the user ID and the original UPL. Some users will have updated their UPL since this data was created, which is one of the reasons why we provide the tweet ID: it anchors the gold standard data in time. It also allows others to reconstruct the tweets using the Twitter API.
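To make the structure of a gold standard entry concrete, the sketch below models one record in Python. It is only an illustration under stated assumptions: the class name `GoldRecord`, its field names and the source labels `"geonames"`, `"wikipedia"` and `"google_maps"` are hypothetical and do not describe the actual file layout of the released data. It captures the rules described above: GeoNames entries carry a GeoNames identifier, back-off entries are only marked with their source, and UPLs that are not geo-referenceable have blank geo-referencing fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoldRecord:
    """Hypothetical model of one gold standard entry (field names are assumptions)."""
    tweet_id: str                        # anchors the record in time
    user_id: str
    upl: str                             # original user profile location string
    latitude: Optional[float] = None     # blank if the UPL is not geo-referenceable
    longitude: Optional[float] = None
    country_code: Optional[str] = None   # only where applicable
    place_name: Optional[str] = None     # taken from the gazetteer entry
    source: Optional[str] = None         # assumed labels: "geonames", "wikipedia", "google_maps"
    geonames_id: Optional[int] = None    # only set when source == "geonames"

    def is_georeferenced(self) -> bool:
        # A record counts as geo-referenced when coordinates are present.
        return self.latitude is not None and self.longitude is not None

    def reference_label(self) -> str:
        # GeoNames entries carry an identifier; back-off sources are only named.
        if not self.is_georeferenced():
            return "not geo-referenceable"
        if self.source == "geonames":
            return f"GeoNames:{self.geonames_id}"
        return self.source or "unknown"
```

For example, a record whose UPL could not be resolved would be created with only its tweet ID, user ID and UPL, and `reference_label()` would return `"not geo-referenceable"`.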

Full topic-specific data sets: Syria 2014, Commonwealth Games 2014, Ukraine 2014 and Cities 2014.

These are the full topic-specific Twitter user profile location data sets collected in 2014, which formed the basis of our geo-referencing experiment. A full description of how and when the data was collected can be found in the LREC paper. Although we did not use the tweet content in our study, we provide tweet IDs and user IDs so that others can reconstruct the tweets.

Collecting Topic-Specific Data from Twitter

Reference: Clare Llewellyn, Claire Grover, Beatrice Alex, Jon Oberlander and Richard Tobin. 2015. Extracting a Topic Specific Dataset from a Twitter Archive. In Proceedings of TPDL 2015, September 2015, Poznań, Poland, pp. 364-367. ***Winner of the best poster/demo award.*** [pdf, poster]

Data: available on request from the first author.