Below is a summary of how the corpus data is represented within the XML structure. It is intended as a reference guide when constructing queries. Annotation layers are grouped according to the version of the Switchboard transcript they used. Follow the links for full lists of values of certain attributes, as well as full information about annotations provided by the authors of each layer.
Note that not all coding layers are available for all conversations in the NXT release. See here for a summary of annotation coverage by conversation.
The tables list: the name of the codings file for each layer of annotation; the possible elements within each coding layer; the attributes of each element (not all elements take all attributes; see the metadata file for more details); the values, or examples of values, for each attribute; the relationships between those elements and other layers of annotation; and the authors of each layer of annotation. Only parent-of and points-at relationships are shown; where the pointer relationship is named, the name is given in brackets.
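The two relationship types can be illustrated with a minimal Python sketch. It assumes that parent-of relationships across files are stored as `nite:child` elements and points-at relationships as `nite:pointer` elements carrying a `role` attribute, each with an `href` of the form `file.xml#id(...)`. The namespace URI, file names, and ids below are hypothetical, so check them against the corpus files and the metadata file before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the nite: prefix; verify against the corpus files.
NITE = "http://nite.sourceforge.net/"

# Hypothetical syntax-layer fragment: an nt that is parent of two terminals.
SYNTAX_XML = f"""
<nite:root xmlns:nite="{NITE}">
  <nt nite:id="s1_500" cat="NP">
    <nite:child href="sw0000.A.terminals.xml#id(s1_1)"/>
    <nite:child href="sw0000.A.terminals.xml#id(s1_2)"/>
  </nt>
</nite:root>
"""

# Hypothetical terminals-layer fragment: a word pointing (role "phon") at a phonword.
TERMINALS_XML = f"""
<nite:root xmlns:nite="{NITE}">
  <word nite:id="s1_1" orth="the" pos="DT">
    <nite:pointer role="phon" href="sw0000.A.phonwords.xml#id(ms1A_pw1)"/>
  </word>
</nite:root>
"""


def split_href(href):
    """Split 'file.xml#id(x)' into ('file.xml', 'x')."""
    filename, _, fragment = href.partition("#")
    return filename, fragment[len("id("):-1]


def children(element):
    """Return (file, id) pairs for the parent-of links of an element."""
    return [split_href(c.get("href")) for c in element.findall(f"{{{NITE}}}child")]


def pointers(element):
    """Return {role: (file, id)} for the points-at links of an element."""
    return {
        p.get("role"): split_href(p.get("href"))
        for p in element.findall(f"{{{NITE}}}pointer")
    }


nt = ET.fromstring(SYNTAX_XML).find("nt")
word = ET.fromstring(TERMINALS_XML).find("word")
print(children(nt))    # parent-of targets in the terminals file
print(pointers(word))  # named pointer targets, keyed by role
```

Keeping the role name as the dictionary key mirrors the bracketed names in the tables, such as point(phon).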
Based on the Penn Treebank transcript:

| Coding | Elements | Attributes | Values | Relationships | Authors |
|---|---|---|---|---|---|
| terminals | word, punc, sil, trace | nite:start, nite:end, orth, pos = | VB, NN, ... (all) | point(phon) at phonword | Switchboard/Penn Treebank |
| syntax | nt | nite:start, nite:end; cat =; subcat =; wc (word count) | NP, VP, S, ... (all); SBJ, MNR, ... (all) | parent of nt, terminals | Penn Treebank |
| movement | movement | label | *, *T*, *EXP*, *ICH* | point(source) at nt, terminal; point(target) at trace | Penn Treebank |
| turns | turn | - | - | parent of nt (S*) | Penn Treebank |
| disfluency | disfluency, reparandum, repair | - | - | disfluency parent of reparandum/repair; reparandum/repair parent of word | Penn Treebank |
| active | active | - | - | point at nt (S*) | Edinburgh |
| markable | markable | animacy =; status =; statustype = | human, animal, ... (all); old, med, new, ... (all); ident, event, ... (all) | point at nt (NP) | Edinburgh and Stanford |
| coreference | link | - | - | point(anaphor) at markable (old); point(antecedent) at markable | Edinburgh |
| kontrast | kontrast | type =; level = | contrastive, subset, ... (all); word, np | parent of word | Edinburgh |
| trigger | trigger | - | - | point(referent) at kontrast; point(trigger) at kontrast | Edinburgh |
| dialAct | da | niteType =; swbdType = | statement, yn_q, ...; sd, qy^t, ... (all) | parent of word | Shriberg et al (1998) |
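As a rough illustration of how the terminals attributes might be read outside a query, the following Python sketch selects words by `pos` and derives a duration from `nite:start` and `nite:end`. The fragment, its ids, words, and timings are invented for illustration, and the namespace URI is an assumption; only the element and attribute names come from the table.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the nite: prefix; verify against the corpus files.
NITE = "http://nite.sourceforge.net/"

# Hypothetical terminals fragment; ids, words, and times are invented.
TERMINALS_XML = f"""
<nite:root xmlns:nite="{NITE}">
  <word nite:id="s1_1" nite:start="0.50" nite:end="0.62" orth="the" pos="DT"/>
  <word nite:id="s1_2" nite:start="0.62" nite:end="1.01" orth="dog" pos="NN"/>
  <punc nite:id="s1_3"/>
</nite:root>
"""


def words_with_pos(root, pos):
    """Return (orth, duration in seconds) for each word with the given pos tag."""
    results = []
    for word in root.iter("word"):
        if word.get("pos") != pos:
            continue
        start = word.get(f"{{{NITE}}}start")
        end = word.get(f"{{{NITE}}}end")
        if start is None or end is None:
            # Not all elements take all attributes, so skip untimed words.
            continue
        duration = round(float(end) - float(start), 3)
        results.append((word.get("orth"), duration))
    return results


root = ET.fromstring(TERMINALS_XML)
print(words_with_pos(root, "NN"))
```

Skipping elements without timing attributes reflects the note that not every element carries every attribute.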
Based on the MS-State transcript:

| Coding | Elements | Attributes | Values | Relationships | Authors |
|---|---|---|---|---|---|
| phonwords | phonword, laughter, noise | nite:start, nite:end, orth; stressProfile = | stress of syllables, e.g. np | parent of syllable | MS-State |
| syllables | syllable | stress = | p (primary), s (secondary), n (none) | parent of ph | Stanford |
| phones | ph | nite:start, nite:end | text values | - | Stanford |
| accents | accent | nite:start=nite:end; strength =; (Ed/Stan) type = | full, weak; nuclear, pre-nuclear, plain | point at phonword | Ed/Stan and University of Washington |
| phrases | phrase | nite:start, nite:end; type = | major, minor, disfluent, backchannel | parent of phonword | Ed/Stan and University of Washington |
| breaks | break | nite:start=nite:end; UWtime, index =; phraseTone =; boundaryTone = | ToBI: 0-4, p, X; L, H; L, H | point at phonword | University of Washington |
| prosnotes | prosnotes | time, comment | - | - | Ed/Stan |
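The relation between the `stress` attribute on syllables and the `stressProfile` attribute on phonwords can be sketched as follows. This assumes that a stress profile is simply the per-syllable stress values joined in order, which is inferred from the example value np rather than stated by the layer authors; the function name is hypothetical.

```python
# Stress values listed for the syllable element: primary, secondary, none.
VALID_STRESS = {"p", "s", "n"}


def stress_profile(syllable_stresses):
    """Join per-syllable stress values into a profile string such as 'np'."""
    for stress in syllable_stresses:
        if stress not in VALID_STRESS:
            raise ValueError(f"unexpected stress value: {stress!r}")
    return "".join(syllable_stresses)


# A two-syllable phonword: unstressed syllable, then primary stress.
print(stress_profile(["n", "p"]))  # prints np
```

A profile built this way could be compared against the stored `stressProfile` value as a consistency check, under the same assumption.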