AMI Corpus - Annotation

Different NXT-formatted annotations are available for select portions of the AMI meeting corpus (see the table for an overview of meeting IDs and the corresponding annotations available in the current data release). Annotation schemes were developed by working groups that included people with experience managing such efforts, along with AMI consumers and theorists specializing in the particular phenomenon being labeled. Once processing is complete, each scheme should have a version-controlled coding manual and a document that discusses reliability results. It is expected that all annotations in the corpus will have been brought up to the standard of the final scheme. Reliability documents normally describe early cross-coded data from Hub Set A, the agreement measure used, and where any substantive disagreement lies.

A brief description of each annotation is provided below, along with links to coding manuals, information about reliability testing, and an explanation of how the data are structured. Note that several of the schemes treat the annotation of scenario and non-scenario data differently.

Dialogue Acts
Dialogue act annotations code speaker intentions. Transcripts are segmented into separate dialogue acts, classified as follows: acts about information exchange, acts about possible actions, comments on previous discussion, social acts, and a special set of tags for labeling non-intentional acts, e.g. backchannels (such as um and uh-huh) and attempts to maintain the floor. The latter category, plus an additional bucket class for all other intentional acts, allows for complete segmentations, with no part of the transcript left unmarked. Annotators were provided with a decision tree to assist in classification, and used an implementation of the NITE XML Toolkit to code transcripts. This task was carried out in conjunction with extractive summarization (see below).
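The completeness property above (every stretch of transcript carries exactly one dialogue-act label, with the bucket class absorbing anything else) can be sketched as a simple interval check. The class names below are illustrative placeholders, not the official AMI tag set, and the time-interval representation is an assumption for the sake of the example.

```python
from dataclasses import dataclass

# Illustrative only: these are placeholder class names, not the
# official AMI dialogue-act labels.
DA_CLASSES = {
    "information_exchange", "possible_action", "commentary",
    "social", "backchannel", "stall", "other",
}

@dataclass
class DialogueAct:
    start: float      # seconds into the meeting
    end: float
    da_class: str

def is_complete_segmentation(acts, transcript_end, eps=1e-6):
    """True if the acts tile [0, transcript_end] with no gaps or
    overlaps, using only known classes -- the bucket class ('other'
    here) is what makes full coverage always achievable."""
    acts = sorted(acts, key=lambda a: a.start)
    cursor = 0.0
    for a in acts:
        if a.da_class not in DA_CLASSES:
            return False
        if abs(a.start - cursor) > eps or a.end < a.start:
            return False
        cursor = a.end
    return abs(cursor - transcript_end) <= eps
```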

Click to view dialogue acts annotation guidelines.
Reliability test results are not currently available for this scheme.

Topic Segmentation
For the Topic Segmentation scheme, the annotators divided meetings into topic segments using an annotation tool written in NXT that we first deployed on the ICSI Meeting Corpus. Segmentations feature a shallow hierarchical decomposition into subtopics, extending no more than two levels below the root. Annotators were provided with a list of topic descriptions to choose from. In the event that none was suitable, the annotator created his or her own brief description. Labels fall under three categories: top-level topics, subtopics, and functional descriptions. Examples of functional labels include the Opening of a meeting, Agenda/Equipment Issues, and Chitchat. Annotators were provided with a slightly modified set of guidelines for coding scenario (versus non-scenario) meetings.
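The shallow hierarchy can be pictured as a small tree with a bounded depth. Reading the constraint as "at most two levels of nesting below the root" is an assumption; the class below is a minimal sketch, not the NXT representation.

```python
from dataclasses import dataclass, field

@dataclass
class TopicSegment:
    label: str                       # e.g. "opening", "agenda/equipment issues"
    children: list = field(default_factory=list)

def depth(seg):
    """Levels of subtopic nesting below this segment (0 for a leaf)."""
    return 0 if not seg.children else 1 + max(depth(c) for c in seg.children)

def is_shallow(root):
    """Check the shallow-hierarchy constraint, read here (an assumption)
    as: subtopics extend at most two levels below the root."""
    return depth(root) <= 2
```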

Click to view topic segmentation guidelines (for scenario data).

Click to view topic segmentation guidelines (for non-scenario data).

Reliability test results are not currently available for this scheme.

Abstractive and Extractive Summaries
Following topic segmentation of an AMI Corpus meeting, the same annotator was asked to write an abstractive summary of that meeting, then to extract a subset of the meeting's dialogue acts and link these to sentences in the abstractive summary using a special implementation of the NITE XML Toolkit. Abstractive summaries consist of around 200 words; do not include abbreviations; feature the organizational headings ABSTRACT, DECISIONS, PROBLEMS/ISSUES, and ACTIONS; and present content that can be understood by those not present during the meeting. Extractive summaries constitute roughly 10% of the total transcript, and identify a set of dialogue acts that jointly convey specific information from the abstractive summary, thereby functioning as indices into the meeting's video and audio streams.

Abstractive and extractive summaries were generated for both scenario and non-scenario meetings.
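The linking between extracted dialogue acts and abstract sentences is essentially a many-to-many relation, from which the extractive summary and its compression ratio fall out directly. The miniature data below is entirely hypothetical; the real corpus stores dialogue acts, abstract sentences, and their links in NXT XML files.

```python
# Hypothetical miniature example of the dialogue-act / abstract-sentence link.
dialogue_acts = {
    "da.1": "I think we should go for a rubber case",
    "da.2": "uh-huh",
    "da.3": "the battery is the main cost driver",
}
abstract_sentences = {
    "s.1": "The group chose a rubber case.",
    "s.2": "Cost was dominated by the battery.",
}
# Many-to-many links from extracted dialogue acts to abstract sentences.
links = [("da.1", "s.1"), ("da.3", "s.2")]

def extracted_ids(links):
    """Dialogue acts appearing in at least one link form the extractive summary."""
    return {da for da, _ in links}

def compression_ratio(dialogue_acts, links):
    """Fraction of transcript words kept in the extractive summary
    (the scheme description targets roughly 10% on real meetings)."""
    total = sum(len(t.split()) for t in dialogue_acts.values())
    kept = sum(len(dialogue_acts[i].split()) for i in extracted_ids(links))
    return kept / total
```

Because the extracted acts carry time stamps in the corpus, the same link structure is what lets the summary serve as an index into the audio and video streams.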

Click to view abstractive hand summaries guidelines (for scenario data).

Click to view abstractive hand summaries guidelines (for non-scenario data).

Click to view extractive hand summaries guidelines.

Reliability test results are not currently available for this scheme.

Named Entities
The named entities scheme was derived from the NIST "1999 Named Entity Recognition Task Definition" manual, and codes entities (people, locations, organizations, artefacts), temporal information (dates, times, durations), and quantities (money, measures, percentages, and cardinal numbers). Annotators used an implementation of the NITE XML Toolkit to tag all relevant phrases.

Click to view named entities annotation guidelines.

Click to view reliability test results.

Individual Actions
Individual actions and gestures were coded using sets of mutually exclusive tags for labeling communicative head, hand, and movement events. Using Event Editor, annotators labeled the three event classes separately, one meeting participant at a time, making appropriate use of the different close-up and room-view videos provided. Annotators were instructed to make use of the accompanying dialogue and any additional cues (linguistic, paralinguistic, or otherwise) when selecting the appropriate tag. As with the other schemes, data are fully annotated so that no part of the video is without a label. Head event tags code communicative intent -- e.g. concord_signal, discord_signal, and deixis_signal -- and include formal attributes indicating whether the movement consisted of a head nod or shake. Hand event tags also reflect communicative intent, and fall under the headings deictic and non-deictic -- e.g. the tags point_a_whiteboard and comm_interact_a, the latter of which signals that an object is in focus. Movement (or leg event) tags track the activities of meeting participants and their location in the room -- e.g. take_notes, stand_whiteboard, and the default sit label.
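One way to picture how a default label like sit yields full coverage is to fill every unlabeled span with the default. Treating gap-filling this way is an assumption about how the default is applied, and the tuple representation is illustrative rather than the Event Editor format.

```python
def fill_with_default(events, end, default="sit"):
    """Given labeled movement events as (start, stop, tag) tuples, fill
    unlabeled spans with the default tag so the whole recording is
    covered.  Assumes the input events do not overlap."""
    out, cursor = [], 0.0
    for start, stop, tag in sorted(events):
        if start > cursor:
            out.append((cursor, start, default))  # unlabeled gap -> default
        out.append((start, stop, tag))
        cursor = stop
    if cursor < end:
        out.append((cursor, end, default))        # trailing gap -> default
    return out
```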

Click to view individual actions annotation guidelines.
Reliability test results are not currently available for this scheme.

Person Location
For this task, sequences from multiple video streams were time-stamped for the following: presence status, i.e. whether a participant was in view; occlusion status, indicating whether a participant was in some way covered (e.g. by herself/himself, another participant, or an object in the room); head location, represented by a tightly fitted bounding box that annotators were instructed to draw using the Head Annotation Interface (HAI); face location, represented by a face triad that annotators marked using the Face Annotation Interface (FAI); mouth location, marked as a 2-D image point using the Mouth Annotation Interface (MAI); and the location of a participant's hands, marked with the Hands Annotation Interface (HSAI). All annotation tools were developed in Matlab. Separate annotations are available per meeting participant and camera view.
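A single time-stamped record per participant and camera view might look like the sketch below. The field layout is a guess at how the annotations could be represented in code, not the actual file format produced by the Matlab tools; the consistency check is likewise only an illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PersonLocationFrame:
    """One time-stamped record for one participant in one camera view.
    Hypothetical layout, not the actual annotation file format."""
    timestamp: float
    present: bool                                          # participant in view?
    occluded: bool                                         # covered by self/other/object?
    head_box: Optional[Tuple[int, int, int, int]] = None   # x, y, width, height
    mouth: Optional[Tuple[int, int]] = None                # 2-D image point

def mouth_within_head(frame):
    """Illustrative consistency check: the annotated mouth point should
    fall inside the head bounding box when both are present."""
    if frame.head_box is None or frame.mouth is None:
        return True
    x, y, w, h = frame.head_box
    mx, my = frame.mouth
    return x <= mx <= x + w and y <= my <= y + h
```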

Click to view person location annotation guidelines.
Reliability test results are not currently available for this scheme.

Focus of Attention
Focus of attention coding tracks meeting participants' head orientation and eye gaze. Annotations were performed with Event Editor. Tags refer to the table, whiteboard, slidescreen, and all relevant meeting participant IDs. An additional tag indicates the other category.

Click to view focus of attention annotation guidelines.
Reliability test results are not currently available for this scheme.

More speculative annotations

In addition to these released annotations, we have considered annotation of the following types, and have released the coding instructions we have piloted, but not the annotations themselves.

AmiEmotion
The AmiEmotion subgroup is concerned with the recognition of emotional content in meetings. Two different approaches to the annotation of emotion in meetings have been followed. In the first approach, annotators used a modified version of FEELTRACE to code perceived changes in emotional state for each of the meeting participants. Informational sources include speech prosody, linguistic content, and facial expressions. Emotion labels are mapped to a multi-dimensional emotion space, with the positive/negative and active/passive dimensions functioning as the x and y axes, respectively. Labels were tested for appropriateness in a preliminary user study and include: joking, interested, agreeable, contemplative, relaxed, bored, uninterested, disappointed, confused, nervous, and angry. Fusion of acoustic and semantic information was achieved using a Multi-Layer Perceptron feed-forward neural network.
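Placing the labels in a two-dimensional space makes it straightforward to map a continuous FEELTRACE-style cursor position to its nearest label. The coordinates below are illustrative placements on the positive/negative (x) and active/passive (y) axes, not values from the AMI user study.

```python
import math

# Illustrative coordinates on the positive/negative (x) and
# active/passive (y) axes -- not values from the AMI user study.
EMOTION_SPACE = {
    "joking": (0.7, 0.6), "interested": (0.6, 0.3), "agreeable": (0.5, 0.1),
    "contemplative": (0.1, -0.3), "relaxed": (0.4, -0.5), "bored": (-0.3, -0.6),
    "uninterested": (-0.4, -0.4), "disappointed": (-0.6, -0.2),
    "confused": (-0.4, 0.3), "nervous": (-0.5, 0.6), "angry": (-0.8, 0.8),
}

def nearest_label(x, y):
    """Map a cursor position in the emotion space to the closest label
    by Euclidean distance."""
    return min(EMOTION_SPACE, key=lambda k: math.dist((x, y), EMOTION_SPACE[k]))
```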

Click to view the feeltrace based AmiEmotion guidelines.

The second approach consists of finding emotionally relevant segments in the data and labeling those with category labels as well as evaluation and intensity values.

Click to view new AmiEmotion guidelines.
Reliability test results are not currently available for this scheme.

Twente Argumentation Structure
The Twente Argument Schema (TAS) is a coding scheme that formalizes observations about argumentation patterns in meetings. The resulting annotations reveal the path a discussion has taken and make it possible to preserve the arguments and their coherence relations for future exploration.

Click to view TAS coding guidelines (v0.99).