Switchboard in NXT

The Switchboard Corpus in NXT

The Switchboard in NXT project aims to bring together the major annotations of the Switchboard corpus within a unified XML framework. The Switchboard corpus consists of telephone conversations between speakers of American English and is one of the longest-standing corpora of fully spontaneous speech. As a result, a wide range of linguistic information has been annotated on it, including syntax, discourse semantics and prosody. In this project, we have converted all of these annotations into XML within the NITE XML Toolkit (NXT) framework. This allows users to query the corpus and extract data using any combination of features drawn from the whole range of annotated linguistic information.

This site gives an overview of the corpus and the aims of the project. It also gives more detailed information about the data structure of the corpus in NXT and the annotations included in it. There is also a short guide to using the corpus with NXT tools, a summary of the data representation in NXT, and instructions on how to download the corpus. Finally, we provide links to further information about the different annotations, including those done outside the project, and a list of publications from research that has used the corpus so far.

With such a diverse range of annotations, we believe the corpus is one of the richest resources available for the study of discourse in spontaneous speech. With its public release, we hope it will be widely used and further developed by researchers in the linguistics and NLP communities.