Welcome to the AMI Corpus

The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. For a gentle introduction to the corpus, see the corpus overview. To access the data, follow the directions given there. Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains. Detailed information can be found in the documentation section.

Synchronised recording devices:

close-talking and far-field microphones, individual and room-view video cameras, projection, a whiteboard, individual pens.

Annotation:

orthographic transcription, annotations for many different phenomena (dialogue acts, head movement, etc.).

Although the AMI Meeting Corpus was created for the use of a consortium that is developing meeting browsing technology, it is designed to be useful for a wide range of research areas. The downloads on this website include videos that are suitable for most purposes, but higher-resolution videos are available for researchers engaged in video processing.

All of the signals and transcription, and some of the annotations, have been released publicly under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0).