AMI Corpus - Annotation
Different NXT-formatted annotations are available for select portions of the AMI meeting corpus. (See the table for an overview of meeting IDs and the corresponding annotations available in the current data release.) Annotation schemes were developed by working groups that included people with experience managing such efforts, along with AMI consumers and theorists who specialize in the particular phenomenon being labeled. Once processing is complete, each scheme should have a version-controlled coding manual and a document that discusses reliability results. It is expected that all annotations in the corpus will have been brought up to the standard of the final scheme. Reliability documents normally describe early cross-coded data from Hub Set A, the agreement measure used, and where any substantive disagreement lies.
A brief description of each annotation is provided below, along with links to coding manuals, information about reliability testing, and an explanation of how the data are structured. Note that several of the schemes treat the annotation of scenario and non-scenario data differently.
Dialogue Acts
Dialogue act annotations code speaker intentions, and feature the segmentation of transcripts into separate dialogue acts classified as follows: acts about information exchange, acts about possible actions, comments on previous discussion, social acts, and a special set of tags for labeling non-intentional acts, e.g. backchannels (such as um and uh-huh) and attempts to maintain the floor. The latter category, plus an additional bucket class for all other intentional acts, allows for complete segmentations, with no part of the transcript left unmarked. Annotators were provided with a decision tree to assist in classification, and used an implementation of the NITE XML Toolkit to code transcripts. This task was carried out in conjunction with extractive summarization (see below).
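As a rough illustration (not the actual NXT stand-off schema), the sketch below assumes a simplified in-memory representation of a dialogue act and checks the complete-segmentation property, i.e. that every word of a speaker's transcript belongs to exactly one act.

    from dataclasses import dataclass

    # Illustrative only: the field names below are hypothetical and do not
    # reproduce the actual NXT stand-off schema used in the release.
    @dataclass
    class DialogueAct:
        speaker: str      # meeting participant ID, e.g. "A"
        da_type: str      # e.g. an information-exchange, action, social, or backchannel tag
        start_word: int   # index of the act's first word in this speaker's transcript
        end_word: int     # index one past the act's last word

    def is_complete_segmentation(acts, n_words):
        """True if one speaker's acts tile the transcript with no gaps or
        overlaps, i.e. no part of the transcript is left unmarked."""
        spans = sorted((a.start_word, a.end_word) for a in acts)
        pos = 0
        for start, end in spans:
            if start != pos:
                return False
            pos = end
        return pos == n_words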
Click to view dialogue acts annotation guidelines.
Reliability test results are not currently available for this scheme.
Topic Segmentation
For the Topic Segmentation scheme, the annotators divided meetings into topic segments using an annotation tool written in NXT that we first deployed on the ICSI Meeting Corpus. Segmentations feature a shallow hierarchical decomposition into subtopics, extending no more than two child nodes below the root. Annotators were provided with a list of topic descriptions to choose from; if none was suitable, the annotator created his or her own brief description. Labels fall under three categories: top-level topics, subtopics, and functional descriptions. Examples of functional labels include the Opening of a meeting, Agenda/Equipment Issues, and Chitchat. Annotators were provided with a slightly modified set of guidelines for coding scenario (versus non-scenario) meetings.
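As a rough illustration of the shallow hierarchy, the following sketch assumes a simple nested representation of topic segments; the field names are hypothetical, and the released annotations themselves are NXT XML.

    from dataclasses import dataclass, field
    from typing import List

    # Hypothetical nested representation of one topic segment; the released
    # annotations are NXT XML and the field names here are illustrative.
    @dataclass
    class TopicSegment:
        label: str          # e.g. "opening", "agenda/equipment issues", "chitchat"
        start_time: float   # segment boundaries in seconds
        end_time: float
        subtopics: List["TopicSegment"] = field(default_factory=list)

    def depth(segment: TopicSegment) -> int:
        """Depth of a segment's subtree; the scheme keeps this hierarchy shallow."""
        if not segment.subtopics:
            return 1
        return 1 + max(depth(s) for s in segment.subtopics)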
Click to view topic segmentation guidelines (for scenario data).
Click to view topic segmentation guidelines (for non-scenario data).
Reliability test results are not currently available for this scheme.
Abstractive and Extractive Summaries
Following topic segmentation of an AMI Corpus meeting, the same annotator was asked to write an abstractive summary of that meeting, later extracting a subset of the meeting's dialogue acts and linking these with sentences in the abstractive summary using a special implementation of the NITE XML Toolkit. Abstractive summaries consist of around 200 words; do not include abbreviations; feature the organizational headings ABSTRACT, DECISIONS, PROBLEMS/ISSUES, and ACTIONS; and present content that can be understood by those not present during the meeting. Extractive summaries constitute roughly 10% of the total transcript, and identify a set of dialogue acts that jointly convey specific information from the abstractive summary, thereby functioning as indices into the meeting's video and audio streams.
Abstractive and extractive summaries were generated for both scenario and non-scenario meetings.
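The sketch below illustrates, under hypothetical record types, how the two summary layers relate: extracted dialogue acts point at abstract sentences, and the extraction ratio can be checked against the roughly 10% target.

    from dataclasses import dataclass
    from typing import Set

    # Hypothetical record types; in the released data, extracted dialogue acts
    # carry NXT pointers to sentences of the abstractive summary.
    @dataclass
    class AbstractSentence:
        sentence_id: str
        section: str          # "abstract", "decisions", "problems/issues", or "actions"
        text: str

    @dataclass
    class ExtractiveLink:
        dialogue_act_id: str  # an extracted dialogue act in the transcript
        sentence_id: str      # the abstract sentence that the act supports

    def extraction_ratio(extracted_act_ids: Set[str], all_act_ids: Set[str]) -> float:
        """Fraction of the meeting's dialogue acts chosen for the extractive
        summary; the guidelines aim for roughly 10% of the transcript."""
        return len(extracted_act_ids) / len(all_act_ids)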
Click to view abstractive hand summaries guidelines (for scenario data).
Click to view abstractive hand summaries guidelines (for non-scenario data).
Click to view extractive hand summaries guidelines.
Reliability test results are not currently available for this scheme.
Named Entities
The named entities scheme was derived from the NIST "1999 Named Entity Recognition Task Definition"
manual, and codes entities (people, locations, organizations,
artefacts), temporal information (dates, times, durations), and
quantities (money, measures, percents, and cardinal numbers). Annotators
used an implementation of the NITE XML Toolkit to tag all relevant phrases.
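As a rough illustration, the following sketch groups the categories listed above; the strings are illustrative rather than the exact tag names used in the released annotations.

    from dataclasses import dataclass

    # Illustrative grouping of the categories listed above; the actual tag
    # names in the released annotations may differ from these strings.
    NE_CATEGORIES = {
        "entities":   ["person", "location", "organization", "artefact"],
        "temporal":   ["date", "time", "duration"],
        "quantities": ["money", "measure", "percent", "cardinal number"],
    }

    @dataclass
    class NamedEntity:
        ne_type: str      # one of the tags above
        start_word: int   # span of the tagged phrase in the transcript
        end_word: int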
Click to view named entities annotation guidelines and reliability test results.
Individual Actions
Individual actions and gestures were coded using sets of mutually exclusive tags for labeling communicative head, hand, and movement events. Using Event Editor, annotators labeled the three event classes separately, one meeting participant at a time, making appropriate use of the different close-up and room-view videos provided. Annotators were instructed to make use of the accompanying dialogue and any additional cues (linguistic, paralinguistic, or otherwise) when selecting the appropriate tag. As with the other schemes, data are fully annotated so that no part of the video is without a label. Head event tags code communicative intent (e.g. concord_signal, discord_signal, and deixis_signal) and feature a subset of formal attributes for indicating whether the movement consisted of a head nod or shake. Hand event tags also reflect communicative intent, and fall under the headings deictic and non-deictic (e.g. the tags point_a_whiteboard and comm_interact_a, the latter of which signals that an object is in focus). Movement (or leg event) tags track the activities of meeting participants and their location in the room, e.g. take_notes, stand_whiteboard, and the default sit label.
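The sketch below assumes a flat, hypothetical event record per participant and event class, and checks the full-coverage property described above.

    from dataclasses import dataclass
    from typing import Iterable

    # Hypothetical event record; the tag strings come from the description
    # above, while the field names themselves are illustrative.
    @dataclass
    class ActionEvent:
        participant: str    # meeting participant ID
        event_class: str    # "head", "hand", or "movement"
        tag: str            # e.g. "concord_signal", "point_a_whiteboard", "sit"
        start_time: float   # seconds into the meeting video
        end_time: float

    def covers_whole_video(events: Iterable[ActionEvent], video_length: float,
                           event_class: str, participant: str) -> bool:
        """Check that one participant's events of one class leave no gap,
        since the scheme labels the full video."""
        spans = sorted((e.start_time, e.end_time) for e in events
                       if e.event_class == event_class and e.participant == participant)
        pos = 0.0
        for start, end in spans:
            if start > pos:
                return False
            pos = max(pos, end)
        return pos >= video_length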
Click to view individual actions annotation guidelines.
Reliability test results are not currently available for this scheme.
Focus of Attention
Focus of attention coding tracks meeting participants' head orientation and eye gaze. Annotations were performed with Event Editor.
Tags refer to the table, whiteboard, slidescreen, and all relevant meeting participant IDs. An additional tag was used for indicating an other category.
Click to view focus of attention annotation guidelines.
Reliability test results are not currently available for this scheme.
Person Location
NB: person location annotation is not available for the AMI corpus.
For this task, sequences from multiple video streams were time-stamped for the following: presence status, i.e. whether a participant was in view; occlusion status, indicating whether a participant was in some way covered (e.g. by herself/himself, another participant, or an object in the room); head location, represented by a tightly fitted bounding box drawn with the Head Annotation Interface (HAI); hand location, marked with the Hands Annotation Interface (HSAI); face location, represented by a face triad marked with the Face Annotation Interface (FAI); and mouth location, marked as a 2-D image point with the Mouth Annotation Interface (MAI). All annotation tools were developed in Matlab. Separate annotations are available per meeting participant and camera view.
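As a rough illustration, the sketch below shows what a per-frame record for one participant and camera view might look like; the field names are hypothetical and only some of the annotated attributes are included.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    # Hypothetical per-frame record for one participant in one camera view;
    # the original annotations were produced with the Matlab tools named above.
    @dataclass
    class PersonLocationFrame:
        timestamp: float                               # seconds into the video
        present: bool                                  # presence status: is the participant in view?
        occluded: bool                                 # occlusion status
        head_box: Optional[Tuple[int, int, int, int]]  # head bounding box (x, y, width, height)
        mouth_point: Optional[Tuple[int, int]]         # 2-D image point for the mouth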
Click to view person location annotation guidelines.
Reliability test results are not currently available for this scheme.
More speculative annotations
NB: these more speculative annotations are not generally available for the AMI corpus.
In addition to the released annotations described above, we have considered annotation of the following types, and have released the coding instructions we piloted, but not the annotations themselves.
AmiEmotion
The AmiEmotion subgroup is concerned with the recognition of emotional content in meetings. Two different approaches to the annotation of emotion in meetings have been followed. In the first approach, annotators used a modified version of FEELTRACE to code perceived changes in emotional state for each of the meeting participants. Informational sources include speech prosody, linguistic content, and facial expressions. Emotion labels are mapped to a two-dimensional emotion space, with the positive/negative and active/passive dimensions functioning as the x and y axes respectively. Labels were tested for appropriateness in a preliminary user study and include: joking, interested, agreeable, contemplative, relaxed, bored, uninterested, disappointed, confused, nervous, and angry. Fusion of acoustic and semantic information was achieved using a multi-layer perceptron feed-forward neural network.
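The sketch below assumes a hypothetical FEELTRACE-style sample with valence on the x axis and activation on the y axis; the field names and value ranges are illustrative.

    from dataclasses import dataclass

    # Hypothetical FEELTRACE-style sample: one point in the emotion space,
    # with valence (negative..positive) on x and activation (passive..active) on y.
    @dataclass
    class EmotionSample:
        participant: str
        timestamp: float
        valence: float      # -1.0 (negative) .. 1.0 (positive)
        activation: float   # -1.0 (passive)  .. 1.0 (active)

    def quadrant(sample: EmotionSample) -> str:
        """Coarse quadrant of the emotion space; the category labels from the
        user study (e.g. "angry", "relaxed") occupy regions of this plane."""
        x = "positive" if sample.valence >= 0 else "negative"
        y = "active" if sample.activation >= 0 else "passive"
        return f"{x}/{y}"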
Click to view the FEELTRACE-based AmiEmotion guidelines.
The second approach consists of finding emotionally relevant segments in the data and labeling them with categories as well as evaluation and intensity values.
Click to view new AmiEmotion guidelines.
Reliability test results are not currently available for this scheme.
Twente Argumentation Structure
The Twente Argument Schema (TAS) coding scheme is a model that formalizes observations about argumentation patterns in meetings. The resulting annotations reveal the trail a discussion has taken and make it possible to preserve the arguments and their coherence relations for future exploration.
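As a rough illustration, the sketch below assumes a generic argument-graph representation in the spirit of TAS; the node and relation fields are hypothetical rather than the actual TAS inventory.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical sketch of an argument graph: statement nodes linked by
    # typed relations; the labels here are illustrative, not the TAS inventory.
    @dataclass
    class ArgumentNode:
        node_id: str
        dialogue_act_ids: List[str]  # transcript material the node covers
        label: str                   # type of contribution, e.g. a statement or issue

    @dataclass
    class ArgumentRelation:
        source: str      # node_id of the contributing node
        target: str      # node_id it relates to
        relation: str    # coherence relation preserved between the two nodes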
Click to view TAS coding guidelines (v0.99).