AMI Corpus - Data Problems

Issues concerning the collection and pre-processing of audio, video, and auxiliary data, and resolution status

As indicated in the table below, some of the recordings are affected by signal discontinuities. In most cases, this was due to a sudden failure of the audio recording equipment. To generate transcriptions and annotated data for these files, the signals were concatenated with zero padding (performed on the 48kHz files) matching the length of the discontinuity. Frame dropping also occurred in some of the video signals collected at Edinburgh and Idiap when they were encoded from DV tape to DivX. These have undergone additional processing and are now synchronized with the audio signals. Due to hardware issues during the recording stage, video signals collected at TNO are not perfectly synchronized. However, these data have undergone an additional processing step (see Meeting Rooms) and now exhibit acceptable levels of synchronization.
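
As a rough illustration of the zero-padding step, the following Python sketch joins two audio segments with a run of silent samples whose length matches the gap. The function name, the use of plain lists of samples, and the gap expressed in seconds are assumptions made for illustration; the tooling actually used on the corpus is not described here.

```python
SAMPLE_RATE_HZ = 48_000  # padding was performed on the 48 kHz files


def concatenate_with_zero_padding(before, after, gap_seconds,
                                  sample_rate=SAMPLE_RATE_HZ):
    """Join two sample sequences, filling the discontinuity with zeros.

    `before` and `after` are the samples recorded either side of the
    dropout; `gap_seconds` is the length of the discontinuity.
    All three inputs are hypothetical.
    """
    if gap_seconds < 0:
        raise ValueError("gap_seconds must be non-negative")
    # Number of silent samples needed to cover the gap.
    gap_samples = round(gap_seconds * sample_rate)
    return list(before) + [0] * gap_samples + list(after)
```

Padding in this way keeps every sample after the dropout at its original position on the meeting timeline, so transcription times remain valid across the gap.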

Other data collection issues relate to seat swapping by meeting participants. In a few cases, this has resulted in the mislabeling of participant IDs and thumbnails. A very small number of meetings also feature an inappropriate positioning of headset microphones. In one meeting, a participant removes his headset altogether. These and other data collection and pre-processing issues are presented in the following table, along with details of whether and, where relevant, how the issue was resolved.

Other general points to note: some Edinburgh non-scenario meetings last longer than 1 hour, but the video files for those meetings are exactly 1 hour long, as that was the length of the tapes used (see table for details). Also, TNO recorded scenario meetings 3 and 4 as one meeting, together with the discussion between the two designers that occurred between the two meetings; in the corpus as distributed, the recording has been split into its component parts. Finally, for all Edinburgh meetings collected prior to May 2005, the Logitech pens were not synchronized before the start of recording. Human intervention is therefore needed to establish the relationship between pen time and the standardized timecode.
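
Once a person has identified one event that can be located both in the pen log and in the meeting recording, the relationship could be approximated by a constant offset. The Python sketch below assumes that pen timestamps and meeting times are both expressed in seconds and that clock drift within a meeting is small enough to ignore; the function and parameter names are hypothetical.

```python
def pen_offset(pen_time_at_anchor, meeting_time_at_anchor):
    """Offset (seconds) to add to a pen timestamp to obtain meeting time.

    The anchor is a single event that a human has located in both the
    pen data and the meeting recording.
    """
    return meeting_time_at_anchor - pen_time_at_anchor


def pen_to_meeting_time(pen_times, offset):
    """Apply a constant offset to a list of pen timestamps (seconds)."""
    return [t + offset for t in pen_times]
```

A single anchor only corrects for the unknown starting offset. If drift mattered for a given task, a second anchor near the end of the meeting would be needed to fit a linear correction.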

More Information on Synchronization and Drop-outs in TNO Data

Due to a faulty hardware configuration, acquisition on the video devices was not properly synchronised and the timecode insertion did not work correctly. TNO have worked hard to improve the data as much as possible by laboriously recovering synchronisation between each video file after detailed analysis of skew and drift. As a result, the video-video and video-audio synchronisation (there is no problem with audio-audio synchronisation) is now within reasonable bounds for a human observer looking for lip sync. Multiple people have checked a portion of the data, and human agreement has been found to be within a bound of ~100ms (~5 frames). In addition, the hardware setup has been rectified and tested, ensuring that the problem will not occur in any future data recorded at TNO.

We have therefore approved this data for inclusion in the AMI Hub Corpus, as we feel that this level of synchronisation will in practice be sufficient for AMI research tasks. This and any other problems with data in the AMI corpus (e.g. location of dropped frames) will be noted in metadata in the eventual corpus.

While all research tasks are encouraged to use AMI data from the 3 sites (TNO, Idiap, UEDIN), researchers whose tasks require stricter synchronisation, such as emotion recognition or audio-visual speech recognition, can define evaluation protocols using only the UEDIN and Idiap portion of the data (~65 hours).
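
A protocol restricted to the UEDIN and Idiap data could select meetings by the first letter of the meeting ID, following the prefixes used in the table below (E for Edinburgh, I for Idiap, T for TNO). The Python sketch below is one way to do this; the function name and the prefix constant are hypothetical, and the prefix convention is inferred from the table rather than stated as a rule.

```python
# Assumption: IDs starting with E were recorded at Edinburgh (UEDIN) and
# IDs starting with I at Idiap; IDs starting with T come from TNO.
STRICT_SYNC_PREFIXES = ("E", "I")


def strict_sync_subset(meeting_ids):
    """Keep only meetings whose ID starts with an Edinburgh or Idiap prefix."""
    return [m for m in meeting_ids if m.startswith(STRICT_SYNC_PREFIXES)]
```

For example, given the IDs ES2002a, TS3003d, IS1000a and EN2001a, only TS3003d would be excluded.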

If any researcher foresees a major problem in using the TNO data due to the looser synchronisation, please email the WP2 list with details of your task and your rationale.


| Meeting | Problem | Resolved | Notes |
| * all meetings | Logitech I/O digital pen output not synchronized with rest of data | N | pens' internal clocks do not drift by more than a few seconds during each meeting, providing sufficiently accurate calibration |
| T* meetings | participant data not gathered | N | Language skills and other data of the kind stored in XML format at corpusResources/participants.xml was not gathered for TNO-recorded meetings. |
| TS* meetings | video signals not perfectly synchronized | Y | manual processing performed to reach an acceptable level of synchronization quality; see Meeting Rooms |
| E and I meetings | some frame drops in video signals when encoding from DV tape to DivX | Y | video now synchronized with audio signals |
| E* meetings | RM RealAudio gain level is low and probably needs amplifying | N | |
| | JPEG captured slides are numerous | N | slide change detection mechanism probably too sensitive |
| | shared files in other/ directories copied every four meetings | | |
| | pen stroke files recorded at end of each trial instead of after every meeting (scenario meetings only) | N | |
| ES2002a | Kick-off ppt presentation not stored and only two different slides captured from presentation | N | The two captured slides are the last two from the standard kick-off meeting presentation, and it's clear from watching that the rest of the presentation was shown even if it has not been captured. |
| ES2002a,b,c | participant 1 not wearing headset mic properly | N/A | lapel mic files okay for this participant |
| ES2002b | audio dropout; ME (cam3) and UI (cam2) switch places at 00:11:00; ID (cam 1) takes seat off camera (at projector) for remainder of meeting at 00:22:37 | Y | single concatenated file generated, zero-padded from 01:04:04 to 01:40:16 |
| ES2002c | ME (cam2) and UI (cam3) switch places at 00:10:20; ID (cam1) and UI (cam2) switch places at 00:14:52 | N/A | closeup videos inappropriately labeled for remaining 25 mins |
| ES2004d | 2 audio dropouts; different audio file sizes | Y | section 1, channel 24 padded at end by 512 samples; section 2, channel 19 padded at end by 32768 samples, channels 20-24 padded at end by 512 samples; section 3 channel 19 padded at end by 32768 samples, channel 20-24 padded at end by 512 samples --- audio dropout1 zero-padded from 21:52:03 to 22:50:19; dropout2 zero-padded from 31:02:23 to 31:34:02 |
| ES2005a | start of meeting not recorded; captured material lasts around 8 minutes | N | |
| ES2006b | cam2 delayed | N | |
| ES2006d | different audio file sizes | Y | channels 22,23 and 24 zero-padded with 512 blank audio samples |
| ES2006d | artefacts (coloured blocks) appear on Closeup1 video 0-450s | N | |
| ES2008a | participant loses lapel mic | N/A | headset okay |
| ES2008b | all video files end at about 34min45sec while audio goes till the end (37min11sec) | N | audio video in sync |
| ES2008c | audio dropout; different audio file sizes | Y | single concatenated file generated, zero-padded from 24:14:01 to 24:27:01 |
| ES2009a | encoding problem with cam4 | N | |
| ES2010d | audio dropout; different audio file sizes | Y | dropout1 zero-padded from 10:56:17 to 11:05:14; second section, channel 22 zero-padded at end by 32768 samples, and channels 23 and 24 zero-padded at end by 512 samples; dropout2 zero-padded from 13:50:02 to 14:04:00 |
| ES2012* | audio quality is very weak | N | |
| ES2012b | audio dropout | Y | single concatenated file generated, zero-padded from 23:25:15 to 23:34:19 |
| ES2012c | 2 audio dropouts | Y | single concatenated file generated, zero-padded from 12:16:14 to 13:48:10, and 16:32:23 to 16:44:15 |
| ES2013b | participants move microphone array B at 00:03:53 | N | it remains mispositioned until the end of the meeting |
| ES2016a | audio dropout; different audio file sizes | Y | first section, channel 23 padded at end by 32768 samples, channel 24 padded at end by 512 samples; dropout zero-padded from 22:18:18 to 22:28:17 |
| EN2001a,e; EN2005a; EN2009d | Video files shorter than audio files | N | Video tapes were 60 minutes long so for these 4 meetings which are longer than 1 hour, the videos do not contain the full meeting. Audio files are the correct length. |
| EN2001a | audio dropout; 5th person off-camera | Y | audio zero-padded from 00:03:17:24-00:03:30:07; 5th person sitting next to camera 2 |
| EN2001d,e | 5th person off-camera | N/A | 5th person sitting next to camera 2 |
| EN2005a | audio dropout | Y | audio zero-padded from 01:09:11:24 - 01:09:25:01 |
| EN2006b | 2 audio dropouts; participants not properly seated | Y/N | audio zero-padded from 00:26:21:18 - 00:26:38:10 and from 00:26:52:18 - 00:27:07:17; all participants clustered around whiteboard side of table |
| IB4002 | Channel mapping for audio is wrong (to be fixed in any release post July 2021). | N | Channel mapping should be A->3; B->2; C->0; D->1. Many thanks to Naoyuki Kanda of Microsoft for describing this error |
| IN1014 | Recording stops before the end of the meeting. | N/A | |
| IS1000a | PM and UI remove microphones for part of meeting | N/A | |
| IS1000b | PM and UI swap seats | N/A | |
| IS1000c | audio dropout | N | videos un-synched with audio after 23:05; no audio dropout timing information available |
| IS1001c,d | ID and UI swap seats before meeting | N/A | |
| IS1002* | incomplete trial due to dropout in IS1002a | N | |
| IS1003b | no mic array audio | | |
| IS1003c,d | ID and UI swap seats before meetings | N/A | correct map in metadata xml files |
| IS1005* | incomplete trial due to dropout in IS1005d | N | |
| IS1007d | no mic array audio | N | lost |
| TS3003d | No closeup 1, 2, 3 | N | - |
| TS3011d | missing beginning of meeting due to dropout | N | audio begins at 00:03:55.66 |
| TS3012c | missing end of meeting due to dropout | N | audio ends at 00:39:36.5 |
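
The four-field timecodes in the table (for example 00:03:17:24) appear to follow an hours:minutes:seconds:frames layout. Assuming that reading, and assuming a frame rate of 25 frames per second (neither is stated in this document), the length of a padded region in 48 kHz samples could be estimated as in the Python sketch below. The function names are hypothetical.

```python
FRAMES_PER_SECOND = 25   # assumption: not stated in this document
SAMPLE_RATE_HZ = 48_000  # padding was performed on the 48 kHz files


def timecode_to_frames(timecode, fps=FRAMES_PER_SECOND):
    """Convert 'hh:mm:ss:ff' to a frame count from the start of the recording."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def padding_samples(start, end, fps=FRAMES_PER_SECOND,
                    sample_rate=SAMPLE_RATE_HZ):
    """Number of audio samples spanned by the interval from start to end."""
    frame_span = timecode_to_frames(end, fps) - timecode_to_frames(start, fps)
    if frame_span < 0:
        raise ValueError("end timecode precedes start timecode")
    # Integer arithmetic avoids floating-point rounding of the sample count.
    return frame_span * sample_rate // fps
```

Under these assumptions, the EN2001a dropout from 00:03:17:24 to 00:03:30:07 spans 308 frames, which corresponds to 591360 samples. The three-field timecodes used for the ES meetings would need their own parsing rule, since their layout is not specified here.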