Switchboard in NXT
The Switchboard Corpus in NXT
The Switchboard in NXT project aims to bring together the major
annotations of the Switchboard corpus within a unified framework in XML
format. The Switchboard corpus, consisting of telephone conversations
between speakers of American English, is one of the longest-standing
corpora of fully spontaneous speech. As such, a wide range of linguistic
information has been annotated on it, including syntax, discourse
semantics and prosody. In this project, we have converted all of these
annotations into XML format within the Nite XML Toolkit (NXT)
framework. This allows users to query the corpus and extract data using
any combination of features from the whole range of annotated
linguistic information.
This site gives an overview of the corpus and the aims of the project, as well as more detailed information about the data structure of the corpus in NXT and the annotations included in it. There is also a short guide to using the corpus with NXT tools, a summary of the data representation in NXT, and instructions on how to download the corpus. Finally, we provide links to further information about the different annotations, including those done outside the project, and a list of publications from research done using the corpus so far.
With such a diverse range of annotations, we believe the corpus offers one of the richest resources available for the study of discourse in spontaneous speech. We hope that, through its public release, it will be widely used and developed by researchers in the linguistics and NLP communities.