ICSI corpus download
Use this page to download audio and annotations from the ICSI corpus. The annotations, which include the orthographic transcription, are packaged in two zip files: one containing the core annotations (transcription and dialogue acts), and one that also includes third-party annotations. The audio files are too large to package in this way, so use the chooser to indicate which ones you wish to download.
Annotations, including transcription
Annotations are in NXT (NITE XML Toolkit) format. To use them with the signals downloaded below, unzip one or both of these files into the 'amicorpus' directory. Requires NXT version 1.4.4.
- ICSI core annotations v1.0 22-July-2016 (19MB): transcripts plus dialogue act coding
- ICSI core plus contributed annotations v1.0 (53MB): all the above plus third-party annotations for topic, hotspot, summarization etc.
- ICSI original MRT format transcripts with documentation (4MB)
To use the signals below with NXT, download the Headset mix files, and unzip them into the directory where your ICSI-metadata.xml file is. Some programs assume the wav files are directly in the Signals directory, so you may need to use soft links to use those programs with audio.
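If a program expects the wav files directly in the Signals directory, the soft links mentioned above can be created with a short script. The following is a minimal Python sketch, not an official tool. It assumes the downloaded wav files sit in subdirectories of a directory named Signals next to ICSI-metadata.xml; the function name link_wavs_flat and the path in the usage line are hypothetical, so adjust them to your own layout.

```python
import os
from pathlib import Path


def link_wavs_flat(signals_dir):
    """Soft-link every .wav found below signals_dir into signals_dir itself.

    Assumption: the audio was unpacked into subdirectories of signals_dir.
    Existing files and links at the top level are never overwritten.
    Returns the list of links that were created.
    """
    signals = Path(signals_dir)
    created = []
    # sorted() collects all matches first, so links made in this loop
    # are not picked up again by the directory walk.
    for wav in sorted(signals.rglob("*.wav")):
        if wav.parent == signals or wav.is_symlink():
            continue  # already at the top level, or itself a link
        link = signals / wav.name
        if link.exists() or link.is_symlink():
            continue  # a file or link with this name is already there
        # Use a relative target so the tree can be moved as a whole.
        os.symlink(os.path.relpath(wav, signals), link)
        created.append(link)
    return created


if __name__ == "__main__":
    # Hypothetical location; point this at your own Signals directory.
    for made in link_wavs_flat("Signals"):
        print("linked", made)
```

The script only adds links and leaves the original files where they are, so it can be run again after downloading more meetings. Creating soft links may need extra privileges on Windows.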
Signals
1) Select one or more ICSI meetings
All of the signals and transcription, and some of the annotations, have been released publicly under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0).