Switchboard in NXT

Data Summary

Below is a summary of how the corpus data is represented within the XML structure. It is intended as a reference guide when constructing queries. Annotation layers are grouped according to the version of the Switchboard transcript they are based on. Follow the links for full lists of the values of certain attributes, as well as full information about the annotations provided by the authors of each layer.

Note that not all coding layers are available for all conversations in the NXT release. A summary of annotation coverage by conversation is provided separately.

The tables list: the name of the codings file for each layer of annotation; the possible elements within each coding layer; the attributes of each element (not all elements take all attributes; see the metadata file for more details); the values, or examples of values, for each attribute; the relationships between those elements and other layers of annotation; and the authors of each layer of annotation. Only "parent of" and "points at" relationships are shown. Where a pointer relationship is named, the name is given in brackets.
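
As a rough illustration of the two relationship kinds listed in the tables, the Python sketch below reads a small made-up fragment and separates "parent of" links from named pointers. The fragment, its IDs and file names, the namespace URI, and the assumption that children are serialized as nite:child elements and pointers as nite:pointer elements with a role attribute are all illustrative; check the metadata file and the corpus files for the actual serialization.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the nite: prefix; verify against the corpus files.
NITE_NS = "http://nite.sourceforge.net/"

# Made-up fragment for illustration only; IDs and file names are hypothetical.
SAMPLE = f"""
<nite:root xmlns:nite="{NITE_NS}">
  <word nite:id="w1" orth="dogs" pos="NNS">
    <nite:pointer role="phon" href="example.phonwords.xml#id(pw1)"/>
  </word>
  <nt nite:id="nt1" cat="NP">
    <nite:child href="example.terminals.xml#id(w1)"/>
  </nt>
</nite:root>
"""


def relationships(xml_text):
    """Return (element tag, relation kind, role or None, href) tuples."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        for link in elem:
            if link.tag == f"{{{NITE_NS}}}child":
                # Corresponds to a "parent of" entry in the tables.
                found.append((elem.tag, "parent of", None, link.get("href")))
            elif link.tag == f"{{{NITE_NS}}}pointer":
                # Corresponds to a "point(role) at" entry in the tables.
                found.append((elem.tag, "point at", link.get("role"), link.get("href")))
    return found


rels = relationships(SAMPLE)
for rel in rels:
    print(rel)
```

In this sketch the role string plays the part of the bracketed pointer name, as in point(phon) at phonword.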

Based on the Penn Treebank transcript:

| Coding | Elements | Attributes | Values | Relationships | Authors |
| --- | --- | --- | --- | --- | --- |
| terminals | word, punc, sil, trace | nite:start, nite:end, orth, pos | pos = VB, NN, ... (all) | point(phon) at phonword | Switchboard/Penn Treebank |
| syntax | nt | nite:start, nite:end, cat, subcat, wc (word count) | cat = NP, VP, S, ... (all); subcat = SBJ, MNR, ... (all) | parent of nt, terminals | Penn Treebank |
| movement | movement | label | *, *T*, *EXP*, *ICH* | point(source) at nt, terminal; point(target) at trace | Penn Treebank |
| turns | turn | - | - | parent of nt (S*) | Penn Treebank |
| disfluency | disfluency, reparandum, repair | - | - | disfluency parent of reparandum/repair; reparandum/repair parent of word | Penn Treebank |
| active | active | - | - | point at nt (S*) | Edinburgh |
| markable | markable | animacy, status, statustype | animacy = human, animal, ... (all); status = old, med, new, ... (all); statustype = ident, event, ... (all) | point at nt (NP) | Edinburgh and Stanford |
| coreference | link | - | - | point(anaphor) at markable (old); point(antecedent) at markable | Edinburgh |
| kontrast | kontrast | type, level | type = contrastive, subset, ... (all); level = word, np | parent of word | Edinburgh |
| trigger | trigger | - | - | point(referent) at kontrast; point(trigger) at kontrast | Edinburgh |
| dialAct | da | niteType, swbdType | niteType = statement, yn_q, ...; swbdType = sd, qy^t, ... (all) | parent of word | Shriberg et al (1998) |
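
To show how the attributes in the terminals row might be used when building a query, here is a minimal Python sketch that filters word elements by their pos value and reads the nite:start and nite:end timings. The sample fragment and its values are invented for illustration, and the namespace URI is an assumption to verify against the corpus files; only the element and attribute names come from the table above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the nite: prefix; verify against the corpus files.
NITE_NS = "http://nite.sourceforge.net/"

# Hypothetical terminals-style fragment; the values are invented, not corpus data.
SAMPLE = f"""
<nite:root xmlns:nite="{NITE_NS}">
  <word nite:id="w1" nite:start="0.10" nite:end="0.42" orth="dogs" pos="NNS"/>
  <word nite:id="w2" nite:start="0.42" nite:end="0.80" orth="bark" pos="VBP"/>
  <punc nite:id="p1" orth="."/>
</nite:root>
"""


def words_with_pos(xml_text, pos_prefix):
    """Return (orth, pos, duration in seconds) for words whose pos starts with pos_prefix."""
    root = ET.fromstring(xml_text)
    hits = []
    for word in root.iter("word"):
        if word.get("pos", "").startswith(pos_prefix):
            # Namespaced attributes are looked up with the {uri}name form.
            start = float(word.get(f"{{{NITE_NS}}}start"))
            end = float(word.get(f"{{{NITE_NS}}}end"))
            hits.append((word.get("orth"), word.get("pos"), round(end - start, 2)))
    return hits


nouns = words_with_pos(SAMPLE, "NN")
print(nouns)
```

Matching on a prefix such as "NN" is a design choice of this sketch, so that related tags such as NN and NNS are found together; an exact comparison would work equally well for a single tag.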

Based on the MS-State transcript:

| Coding | Elements | Attributes | Values | Relationships | Authors |
| --- | --- | --- | --- | --- | --- |
| phonwords | phonword, laughter, noise | nite:start, nite:end, orth, stressProfile | stressProfile = stress of syls, e.g. np | parent of syllable | MS-State |
| syllables | syllable | stress | stress = p (primary), s (secondary), n (none) | parent of ph | Stanford |
| phones | ph | nite:start, nite:end | text values | - | Stanford |
| accents | accent | nite:start=nite:end, strength, type (Ed/Stan) | strength = full, weak; type = nuclear, pre-nuclear, plain | point at phonword | Ed/Stan and University of Washington |
| phrases | phrase | nite:start, nite:end, type | type = major, minor, disfluent, backchannel | parent of phonword | Ed/Stan and University of Washington |
| breaks | break | nite:start=nite:end, UWtime, index, phraseTone, boundaryTone | index = ToBI: 0-4, p, X; phraseTone = L, H; boundaryTone = L, H | point at phonword | University of Washington |
| prosnotes | prosnotes | time, comment | - | - | Ed/Stan |
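
The relation between the syllable stress values and the stressProfile attribute of a phonword can be sketched as below. This assumes the profile is simply the left-to-right concatenation of the per-syllable stress codes, which is consistent with the example value np but is not confirmed here; the function name is hypothetical.

```python
# Stress codes as listed for the syllable element.
STRESS_LABELS = {"p": "primary", "s": "secondary", "n": "none"}


def stress_profile(syllable_stresses):
    """Concatenate per-syllable stress codes into a profile string (assumed format)."""
    for code in syllable_stresses:
        if code not in STRESS_LABELS:
            raise ValueError(f"unexpected stress value: {code!r}")
    return "".join(syllable_stresses)


# A two-syllable word with an unstressed first syllable and primary stress on the second.
profile = stress_profile(["n", "p"])
print(profile)
```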