ICSI corpus download

Use this page to download audio and annotations from the ICSI corpus. The annotations, including the orthographic transcription, are packaged in two zip files: one with the core annotations (transcription and dialogue acts) and one with third-party annotations. The audio files are too large to package this way, so use the chooser to select the ones you want to download.

Annotations, including transcription

Annotations are in NXT format. To use them with the signals downloaded below, unzip one or both of these files into the 'amicorpus' directory. The annotations require NXT version 1.4.4.
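Unzipping both archives into the same directory can be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the function name `extract_annotations` is hypothetical, and the archive file names you pass in are whatever you downloaded from this page (none are assumed here).

```python
import zipfile
from pathlib import Path


def extract_annotations(zip_paths, target_dir):
    """Unzip each annotation archive into target_dir, creating it if needed.

    Returns the list of member names extracted from all archives.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for zp in zip_paths:
        with zipfile.ZipFile(zp) as zf:
            zf.extractall(target)          # keeps the archive's internal layout
            extracted.extend(zf.namelist())
    return extracted
```

For example, `extract_annotations(["<core annotations zip>", "<third-party zip>"], "amicorpus")` would place the contents of both archives under `amicorpus`, with the placeholders replaced by your downloaded file paths.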

To use the signals below with NXT, download the Headset mix files and unzip them into the directory that contains your ICSI-metadata.xml file. Some programs expect the wav files to sit directly in the Signals directory; to use those programs with the audio, you may need to create soft links.
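Creating the soft links can be automated. This is a minimal sketch that assumes, hypothetically, a nested layout of the form `Signals/<meeting>/<file>.wav`; check the layout you actually get after unzipping and adjust the glob pattern if it differs. The function name `link_wavs_flat` is illustrative only. Symbolic links may require extra privileges on Windows.

```python
from pathlib import Path


def link_wavs_flat(signals_dir):
    """Create a symlink directly in signals_dir for each .wav one level down.

    Assumes (hypothetically) a layout like Signals/<meeting>/<file>.wav.
    Existing files or links with the same name are left untouched.
    Returns the list of links created.
    """
    signals = Path(signals_dir)
    created = []
    for wav in sorted(signals.glob("*/*.wav")):
        link = signals / wav.name
        if link.exists() or link.is_symlink():
            continue  # never overwrite an existing file or link
        # Relative target, so the links survive moving the whole directory.
        link.symlink_to(wav.relative_to(signals))
        created.append(link)
    return created
```

Because existing names are skipped, running the function again after downloading more meetings only adds links for the new files.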


1) Select one or more ICSI meetings

2) Select Audio streams

For detailed information about formats and exact descriptions of each signal type, please visit the ICSI corpus documentation.

Media type                    Average size per meeting   Format
Headset mix                   120 MB                     Single wav file
Individual channel headsets   350 MB                     Individual SPH files (original ICSI release)

3) Press the button once your selection is done
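The individual channel headset files are SPH (NIST SPHERE) files, which begin with a plain-ASCII header: the line `NIST_1A`, a line giving the header size in bytes, then one `name -type value` field per line, terminated by `end_head`. The following is a minimal sketch for inspecting that header (for example, to check the sample rate); `read_sphere_header` is a hypothetical helper, and it does not read or decode the audio samples that follow the header.

```python
def read_sphere_header(path):
    """Parse the ASCII header of a NIST SPHERE (.sph) file into a dict.

    Converts -i fields to int and -r fields to float; -sN (string) fields
    are kept as str. The audio data after the header is not read.
    """
    with open(path, "rb") as f:
        if f.readline().strip() != b"NIST_1A":
            raise ValueError(f"{path} is not a NIST SPHERE file")
        header_size = int(f.readline().strip())
        f.seek(0)
        raw = f.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in raw.splitlines()[2:]:  # skip the magic and size lines
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # blank line or comment
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # malformed field line
        key, kind, value = parts
        if kind == "-i":
            fields[key] = int(value)
        elif kind == "-r":
            fields[key] = float(value)
        else:  # -sN: string field
            fields[key] = value
    return fields
```

Typical fields include `sample_rate`, `channel_count` and `sample_coding`, but which fields are present depends on the file.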

All of the signals and transcription, and some of the annotations, have been released publicly under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0).