AMI corpus download

Use this page to download signals and annotations from the AMI corpus. The annotations, including the orthographic transcription, are packaged in two zip files: one containing the manual annotations and one containing automatically derived data. The signals are too large to package this way, so use the chooser below to select the ones you want to download. Full-size videos are not available from this page; to obtain them, order a FireWire drive distribution of the data through the contact page. See the documentation for more information about the annotations and signals.

Annotations, including transcription

Annotations are in NXT format. To use them with the signals downloaded below, unzip one or both of these files into the 'amicorpus' directory. NXT version 1.4.4 is required.
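If you prefer to script the unzip step, here is a minimal Python sketch using only the standard library's zipfile module. It assumes you have already saved one or both annotation archives locally. The function name extract_annotations is hypothetical, and the sketch assumes the archive contents belong directly under 'amicorpus', so check the archive layout before relying on it.

```python
import zipfile
from pathlib import Path


def extract_annotations(zip_paths, corpus_dir="amicorpus"):
    """Unzip each annotation archive into the corpus directory.

    Hypothetical helper: it assumes each archive's contents should sit
    directly under corpus_dir. Returns the list of extracted entry names.
    """
    target = Path(corpus_dir)
    target.mkdir(parents=True, exist_ok=True)  # create 'amicorpus' if missing
    extracted = []
    for zp in zip_paths:
        with zipfile.ZipFile(zp) as archive:
            archive.extractall(target)  # only extract archives you trust
            extracted.extend(archive.namelist())
    return extracted
```

For example, extract_annotations(["manual_annotations.zip"]) would unpack a locally saved archive (the file name here is only illustrative) into ./amicorpus.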

Other annotations


1) Select one or more AMI meetings

NOTE: For scenario meetings, each one-day recording session is divided into four 1-hour meetings, labelled a, b, c and d. For example, selecting the ES2008 session together with 'a' below gets you the signals for meeting ES2008a.
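The naming rule in the note can be expressed as a small helper, which may be handy if you keep lists of meetings in scripts. This is a sketch: meeting_ids is a hypothetical name, and it covers only scenario meetings, whose IDs follow the session-plus-letter pattern described above.

```python
def meeting_ids(sessions, parts="abcd"):
    """Expand scenario session IDs (e.g. 'ES2008') into meeting IDs.

    Each one-day scenario session is split into four meetings lettered
    a to d, so a meeting ID is the session ID followed by one letter.
    """
    valid = set("abcd")
    ids = []
    for session in sessions:
        for part in parts:
            if part not in valid:
                raise ValueError(f"unknown meeting letter: {part!r}")
            ids.append(f"{session}{part}")
    return ids


print(meeting_ids(["ES2008"], "a"))  # ['ES2008a']
print(meeting_ids(["ES2008"]))       # ['ES2008a', 'ES2008b', 'ES2008c', 'ES2008d']
```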

Scenario Meetings    Non-Scenario Meetings


2) Select media streams

For detailed information about formats and exact descriptions of each signal type, please visit the AMI corpus documentation.

Media type                  Average size per meeting   Notes

Video-related media streams
Low-size DivX AVI videos    400 MB    downloading video files with wget is recommended
RealMedia videos            40 MB     can be played with the SMIL file and the RealMedia audio mix
SMIL file                   10 KB     can be used with the RealMedia videos and audio mix

Audio-related media streams
RealMedia audio mix         2 MB      can be played via the SMIL file with the RealMedia videos
Headset mix                 30 MB     single WAV file
Lapel mix                   30 MB     single WAV file; not available for TS meetings
Individual lapels           120 MB    four individual lapel WAV files; not available for TS meetings
Individual headsets         120 MB    four individual headset WAV files
Microphone array            360 MB    audio channels from the two tabletop microphone arrays

Other streams
Slides                      7.0 MB    automatically captured projected slides and their OCR text output
Shared Docs                 2.1 MB    shared presentations, documents and minutes
Pen files                   1.6 MB    Logitech io pen files
Whiteboard files            20 MB     electronic whiteboard output
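To get a rough idea of how much you will be downloading, multiply the per-meeting averages in the table above by the number of meetings selected. The Python sketch below does that. The figures are averages copied from the table, treating M as megabytes and 10K as 0.01 MB, so real totals will vary. It also skips the lapel streams for TS meetings, since the table notes they are not available. The names AVG_SIZE_MB, LAPEL_STREAMS and estimate_download_mb are hypothetical.

```python
# Average size per meeting in megabytes, copied from the table (approximate).
AVG_SIZE_MB = {
    "Low-size DivX AVI videos": 400,
    "RealMedia videos": 40,
    "SMIL file": 0.01,
    "RealMedia audio mix": 2,
    "Headset mix": 30,
    "Lapel mix": 30,
    "Individual lapels": 120,
    "Individual headsets": 120,
    "Microphone array": 360,
    "Slides": 7.0,
    "Shared Docs": 2.1,
    "Pen files": 1.6,
    "Whiteboard files": 20,
}

# Streams the table marks as unavailable for TS meetings.
LAPEL_STREAMS = {"Lapel mix", "Individual lapels"}


def estimate_download_mb(streams, meetings):
    """Rough total download size in MB for the chosen streams and meetings."""
    total = 0.0
    for meeting in meetings:
        for stream in streams:
            if meeting.startswith("TS") and stream in LAPEL_STREAMS:
                continue  # lapel recordings are not offered for TS meetings
            total += AVG_SIZE_MB[stream]
    return total


print(estimate_download_mb(["Headset mix", "Lapel mix"], ["ES2008a", "TS3003a"]))  # 90.0
```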

3) Press the button once you have made your selection

All of the signals and transcription, and some of the annotations, have been released publicly under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0).