Switchboard in NXT


The corpus currently has the following layers of annotation, integrated within the XML structure. Annotation layers are grouped according to the version of the Switchboard transcript they are based on. See here for a summary of the XML elements, their attributes and their relationships to the rest of the annotations in the XML. See here for links to detailed explanations of the annotation guidelines used for each layer, and for discussions of technical issues to be aware of in the integration of the data into NXT.

Note that not all annotations are available for all conversations in the NXT release; see here for a summary of annotation coverage by conversation.
  • Based on the Penn Treebank transcript:
    • terminals: the original orthographic transcription of the corpus, as included in the Switchboard Penn Treebank release. Includes words, punctuation and silence, as well as traces marking the origin of 'moved' syntactic elements. Part-of-speech information is included. This version did not originally include timing information, so word timings have been derived by automatic alignment with the MS-State version of the transcript.
    • syntax: the syntactic structure of the terminals, annotated as part of the Penn Treebank project. The hierarchical syntax structure is represented by parent-child relationships in the XML. The syntactic phrase category (e.g. VP, NP), optional sub-category (e.g. SBJ, MNR), timing information and a word count of the phrase are included.
    • movement: marks the link between traces and antecedents as co-indexed in the Treebank annotation. For example, in "What book_i did you buy t_i?", what book is the antecedent of the trace (t).
    • turns: encodes the speaker turns within each conversation, i.e. the approximate linear order in which the sentences were said by each speaker (note that overlapping speech can appear before or after the other speaker's turn).
    • disfluency: coding of disfluent speech from the Treebank release. Disfluencies consist of a reparandum, i.e. the words where the speaker hesitated or made a false start, and a repair, where the speaker corrected the error, e.g. "the-_reparandum the government_repair".
    • active: sentences that have been automatically identified as being in the active voice.
    • markable: selected NPs, annotated at Edinburgh and Stanford for information status (old, mediated or new) and animacy (e.g. human, animal, non-concrete). Note that only a portion of the corpus is annotated for information status.
    • coreference: marks the relationship between each anaphor (i.e. NP marked as old) and its antecedent, i.e. the previous mention of the referent of that NP in the discourse. This was done as part of the information status annotation.
    • kontrast: selected content words (e.g. nouns, verbs, adjectives), annotated at Edinburgh as to whether they are kontrastive, i.e. made salient to distinguish them from alternatives that could have been used in the context. Coding was done according to certain categories of kontrast, e.g. contrastive, subset or answer. Note that only the portion of the corpus annotated for information status was annotated for kontrast.
    • trigger: encodes the relationship between certain kontrasts and the word(s) that motivated their marking. For example, if A says "I live in Garland", and B replies "Well, I prefer San Antonio", then Garland motivates the marking of San Antonio as contrastive (a type of kontrast).
    • dialAct: the dialogue (or speech) act of units of the discourse, e.g. statement or question, as annotated by Shriberg et al. (1998). Note that the units used were based on conversation purpose, not syntax. They are roughly equivalent to syntactic sentences, but often do not align with them.
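
The syntax layer described above stores phrase structure as parent-child relationships in the XML, with a category, an optional sub-category and a word count per phrase. The sketch below shows one way such a hierarchy could be walked to recompute the word count of each phrase. The element and attribute names (nt, word, cat, subcat, orth) and the sample fragment are assumptions made for illustration only; consult the summary of XML elements for the names actually used in the release.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element and attribute names are illustrative
# and are NOT taken from the NXT Switchboard release.
SAMPLE = """
<nt cat="S">
  <nt cat="NP" subcat="SBJ">
    <word orth="you"/>
  </nt>
  <nt cat="VP">
    <word orth="bought"/>
    <nt cat="NP">
      <word orth="a"/>
      <word orth="book"/>
    </nt>
  </nt>
</nt>
"""


def phrase_word_counts(root):
    """Return (category, sub-category, word count) for every phrase.

    Phrases are visited in document order, starting with the root itself.
    A phrase's word count covers all words it dominates, at any depth.
    """
    rows = []
    for phrase in root.iter("nt"):
        n_words = sum(1 for _ in phrase.iter("word"))
        rows.append((phrase.get("cat"), phrase.get("subcat"), n_words))
    return rows


root = ET.fromstring(SAMPLE)
counts = phrase_word_counts(root)
for cat, subcat, n_words in counts:
    print(cat, subcat or "-", n_words)
```

Because the hierarchy is carried by XML nesting, a phrase's words are simply its descendants, so no separate index of phrase boundaries is needed in this sketch.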

  • Based on the MS-State transcript:
    • phonwords: representation of the corrected, time-aligned MS-State transcript of the corpus. Includes words, laughter and noise. Timing information and the stress profile of the syllables in the word are included, e.g. "agree" has the profile 'np', i.e. no stress followed by primary stress.
    • syllables: automatically derived syllable information, done at Stanford. Includes stress information (primary, secondary or no stress).
    • phones: automatically derived phone information based on MS-State transcript, done at Stanford. Includes phone identity, and timing information. Users should be aware of technical issues in the automatic phone boundary detection when using the phone times.
    • accent: pitch accents, associated with words in the MS-State transcript. The time of the accent peak, as annotated, and the strength of the accent (weak, full) are given. Annotations fall into three sets according to their source: some of the annotation was done at the University of Washington (UW prosody), some we have converted from the UW set to our standards (Ed converted) and some are our own annotations (Ed original). For the Ed annotations, word association was marked manually, whereas for the UW annotations it was derived automatically from word timing. Accent type is also given for accents in the Edinburgh/Stanford set (nuclear, pre-nuclear, plain). Note that prosody annotation has only been completed on a portion of the corpus.
    • phrase: grouping of words in the MS-State transcript into prosodic phrases. Timing information is given, as well as the phrase type, determined by the ToBI break index following the last word in the phrase (minor, major, disfluent, backchannel). Note that in the Edinburgh/Stanford annotations phrases were directly marked by annotators, whereas for the University of Washington annotations they were determined automatically from the break information and silences.
    • breaks: ToBI break index, for the University of Washington annotations only. Breaks have been aligned automatically with the nearest word boundary in the MS-State transcript. The ToBI break index, and phrase and boundary tone (where applicable), are given, along with the time of the break in the original University of Washington annotation. For the Ed original set, these were generated automatically from the phrase files and do not include boundary tones.
    • prosnotes: notes made by annotators during the prosody annotations, for the Edinburgh set only. Includes comments on errors in the transcription, f0 tracking problems, etc.
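
The stress profile on phonwords, described above, packs one stress code per syllable into a short string, e.g. 'np' for "agree". The sketch below expands such a profile into one label per syllable. The codes 'n' (no stress) and 'p' (primary stress) come from the "agree" example; the code 's' for secondary stress is an assumption, as is the function name.

```python
# 'n' and 'p' follow the "agree" example in the phonwords description.
# 's' for secondary stress is an ASSUMPTION and should be checked
# against the corpus documentation before use.
STRESS_CODES = {
    "n": "no stress",
    "p": "primary stress",
    "s": "secondary stress",
}


def expand_stress_profile(profile):
    """Turn a stress profile such as 'np' into one label per syllable."""
    labels = []
    for code in profile:
        if code not in STRESS_CODES:
            raise ValueError(
                f"unknown stress code {code!r} in profile {profile!r}"
            )
        labels.append(STRESS_CODES[code])
    return labels


agree = expand_stress_profile("np")
print(agree)
```

Raising an error on an unrecognised code, rather than skipping it, keeps the number of labels equal to the number of syllables in the word.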