AMI Corpus - Data Problems
Issues concerning the collection and pre-processing of audio, video, and auxiliary data, and resolution status
As indicated in the table below, some of the recordings are affected by signal discontinuities. In most cases, this was due to a sudden failure of the audio recording equipment. For the purpose of generating transcriptions and annotated data for these files, the signals were concatenated with zero padding (performed on 48 kHz files) matching the length of the discontinuity. Frame dropping also occurred with some of the video signals collected in Edinburgh and Idiap when encoding them from DV tape to DivX. These have undergone additional processing and are now synchronized with the audio signals. Due to hardware issues during the recording stage, video signals collected at TNO are not perfectly synchronized. However, these data have undergone an additional processing step (see Meeting Rooms) and now exhibit acceptable levels of synchronization.
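The zero-padded concatenation described above can be illustrated with a minimal sketch. It is not the tooling actually used for the corpus: the function name `concat_with_zero_padding`, its parameters, and the representation of audio as plain Python lists of samples are all assumptions made for illustration. Only the 48 kHz sample rate comes from the text.

```python
# Illustrative sketch only; not the script used to prepare the AMI corpus.
SAMPLE_RATE = 48_000  # padding was performed on 48 kHz files (from the text)


def concat_with_zero_padding(segments, gaps_seconds, sample_rate=SAMPLE_RATE):
    """Join audio segments, inserting zero samples for each discontinuity.

    segments     -- list of sample sequences, in recording order (hypothetical input)
    gaps_seconds -- length of the discontinuity between each adjacent pair
    """
    if not segments:
        raise ValueError("at least one segment is required")
    if len(gaps_seconds) != len(segments) - 1:
        raise ValueError("need exactly one gap per pair of adjacent segments")

    out = list(segments[0])
    for gap, segment in zip(gaps_seconds, segments[1:]):
        # Silence whose length corresponds to the length of the discontinuity.
        out.extend([0] * round(gap * sample_rate))
        out.extend(segment)
    return out
```

The result is a single signal whose timeline matches the original recording session, so that transcription and annotation times stay aligned across the dropout.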
Other data collection issues relate to seat swapping by meeting participants. In a few cases, this has resulted in the mislabeling of participant IDs and thumbnails. A very small number of meetings also feature an inappropriate positioning of headset microphones. In one meeting, a participant removes his headset altogether. These and other data collection and pre-processing issues are presented in the following table, along with details of whether and, where relevant, how the issue was resolved.
Some other general points to note. Some Edinburgh non-scenario meetings last longer than 1 hour, but the video files for those meetings are exactly 1 hour long because that was the length of the tapes used (see table for details). Also, TNO recorded scenario meetings 3 and 4 as one meeting, together with the discussion between the two designers that occurred between the two meetings; in the corpus as distributed, the recording has been split into its component parts. Finally, for all Edinburgh meetings collected prior to May 2005, the Logitech pens were not synchronized before the start of recording, so human intervention is needed to establish the relationship between pen time and the standardized timecode.
More Information on Synchronization and drop-outs in TNO data
Due to a faulty hardware configuration, acquisition on video devices was not properly synchronised and the timecode insertion did not work correctly. TNO have worked hard to improve the data as much as possible by laboriously recovering synchronisation between each video file after detailed analysis of skew and drift. As a result, the video-video and video-audio synchronisation (there is no problem with audio-audio synch) is now within reasonable bounds for a human observer looking for lip synch. Multiple people have checked a portion of the data and human agreement has been found to be within a bound of ~100ms (~5 frames). In addition, the hardware setup has been rectified and tested, ensuring that the problem will not occur in any future data recorded at TNO.
We have therefore approved this data for inclusion in the AMI Hub Corpus, as we feel that this level of synchronisation will in practice be sufficient for AMI research tasks. This and any other problems with data in the AMI corpus (e.g. location of dropped frames) will be noted in metadata in the eventual corpus.
While all research tasks are encouraged to use AMI data from the 3 sites (TNO, Idiap, UEDIN), researchers who require stricter synchronisation, for example for emotion recognition or audio-visual speech recognition, can define evaluation protocols using only the UEDIN & Idiap portion of the data (~65 hours).
If any researcher foresees a major problem in using the TNO data due to the looser synchronisation, please email the WP2 list with details of your task and your rationale.
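As a rough illustration of restricting an evaluation protocol to the UEDIN and Idiap portion of the data, the sketch below filters meeting IDs by recording site. It assumes the site can be read from the first letter of the meeting ID (E for Edinburgh, I for Idiap, T for TNO), which is inferred from how the table below groups meetings; the names `SITE_BY_PREFIX`, `site_of`, and `strict_sync_subset` are hypothetical.

```python
# Illustrative sketch; the prefix-to-site mapping is inferred from the
# "E* meetings", "E and I meetings" and "T* meetings" rows of the table.
SITE_BY_PREFIX = {"E": "UEDIN", "I": "Idiap", "T": "TNO"}


def site_of(meeting_id):
    """Return the recording site for a meeting ID such as 'ES2002a'."""
    try:
        return SITE_BY_PREFIX[meeting_id[0]]
    except (KeyError, IndexError):
        raise ValueError(f"unrecognised meeting ID: {meeting_id!r}")


def strict_sync_subset(meeting_ids):
    """Keep only meetings recorded at UEDIN or Idiap, preserving order."""
    return [m for m in meeting_ids if site_of(m) != "TNO"]
```

For example, `strict_sync_subset(["ES2002a", "TS3003d", "IS1000b"])` keeps the Edinburgh and Idiap meetings and drops the TNO one.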
|MEETING ID||PROBLEM||RESOLVED (Y/N)||NOTES|
|* all meetings||Logitech I/O digital pen output not synchronized with rest of data||N||pens' internal clocks do not drift by more than a few seconds during each meeting, providing sufficiently accurate calibration|
|T* meetings||participant data not gathered||N||Language skills and other data of the kind stored in XML format at corpusResources/participants.xml were not gathered for TNO-recorded meetings.|
|TS* meetings||video signals not perfectly synchronized||Y||manual processing performed to reach an acceptable level of synchronization quality; see Meeting Rooms|
|E and I meetings||some frame drops in video signals when encoding from DV tape to DivX||Y||video now synchronized with audio signals|
|E* meetings||RM RealAudio gain level is low and probably needs amplifying||N|
|E* meetings||JPEG captured slides are numerous||N||slide change detection mechanism probably too sensitive|
|E* meetings||shared files in other/ directories copied every four meetings|
|E* meetings||pen stroke files recorded at end of each trial instead of after every meeting (scenario meetings only)||N|
|ES2002a||Kick-off ppt presentation not stored and only two different slides captured from presentation||N||The two captured slides are the last two from the standard kick-off meeting presentation, and it's clear from watching that the rest of the presentation was shown even if it has not been captured.|
|ES2002a,b,c||participant 1 not wearing headset mic properly||N/A||lapel mic files okay for this participant|
|ES2002b||audio dropout; ME (cam3) and UI (cam2) switch places at 00:11:00; ID (cam 1) takes seat off camera (at projector) for remainder of meeting at 00:22:37||Y||single concatenated file generated, zero-padded from 01:04:04 to 01:40:16|
|ES2002c||ME (cam2) and UI (cam3) switch places at 00:10:20; ID (cam1) and UI (cam2) switch places at 00:14:52||N/A||closeup videos inappropriately labeled for remaining 25 mins|
|ES2004d||2 audio dropouts; different audio file sizes||Y||section 1, channel 24 padded at end by 512 samples; section 2, channel 19 padded at end by 32768 samples, channels 20-24 padded at end by 512 samples; section 3 channel 19 padded at end by 32768 samples, channel 20-24 padded at end by 512 samples --- audio dropout1 zero-padded from 21:52:03 to 22:50:19; dropout2 zero-padded from 31:02:23 to 31:34:02|
|ES2005a||start of meeting not recorded; captured material lasts around 8 minutes||N|
|ES2006d||different audio file sizes||Y||channels 22,23 and 24 zero-padded with 512 blank audio samples|
|ES2006d||artefacts (coloured blocks) appear on Closeup1 video 0-450s||N|
|ES2008a||participant loses lapel mic||N/A||headset okay|
|ES2008b||all video files end at about 34min45sec while audio goes till the end (37min11sec)||N||audio video in sync|
|ES2008c||audio dropout; different audio file sizes||Y||single concatenated file generated, zero-padded from 24:14:01 to 24:27:01|
|ES2009a||encoding problem with cam4||N|
|ES2010d||audio dropout; different audio file sizes||Y||dropout1 zero-padded from 10:56:17 to 11:05:14; second section, channel 22 zero-padded at end by 32768 samples, and channels 23 and 24 zero-padded at end by 512 samples; dropout2 zero-padded from 13:50:02 to 14:04:00|
|ES2012*||audio quality is very weak||N|
|ES2012b||audio dropout||Y||single concatenated file generated, zero-padded from 23:25:15 to 23:34:19|
|ES2012c||2 audio dropouts||Y||single concatenated file generated, zero-padded from 12:16:14 to 13:48:10, and 16:32:23 to 16:44:15|
|ES2013b||participants move microphone array B at 00:03:53||N||it remains mispositioned until the end of the meeting|
|ES2016a||audio dropout; different audio file sizes||Y||first section, channel 23 padded at end by 32768 samples, channel 24 padded at end by 512 samples; dropout zero-padded from 22:18:18 to 22:28:17|
|EN2001a,e; EN2005a; EN2009d||Video files shorter than audio files||N||Video tapes were 60 minutes long so for these 4 meetings which are longer than 1 hour, the videos do not contain the full meeting. Audio files are the correct length.|
|EN2001a||audio dropout; 5th person off-camera||Y||audio zero-padded from 00:03:17:24-00:03:30:07; 5th person sitting next to camera 2|
|EN2001d,e||5th person off-camera||N/A||5th person sitting next to camera 2|
|EN2005a||audio dropout||Y||audio zero-padded from 01:09:11:24 - 01:09:25:01|
|EN2006b||2 audio dropouts; participants not properly seated||Y/N||audio zero-padded from 00:26:21:18 - 00:26:38:10 and from 00:26:52:18 - 00:27:07:17; all participants clustered around whiteboard side of table|
|IB4002||Channel mapping for audio is wrong (to be fixed in any release post July 2021).||N||Channel mapping should be A->3; B->2; C->0; D->1. Many thanks to Naoyuki Kanda of Microsoft for describing this error.|
|IN1014||Recording stops before the end of the meeting.||N/A|
|IS1000a||PM and UI remove microphones for part of meeting||N/A|
|IS1000b||PM and UI swap seats||N/A|
|IS1000c||audio dropout||N||videos un-synched with audio after 23:05; no audio dropout timing information available|
|IS1001c,d||ID and UI swap seats before meeting||N/A|
|IS1002*||incomplete trial due to dropout in IS1002a||N|
|IS1003b||no mic array audio|
|IS1003c,d||ID and UI swap seats before meetings||N/A||correct map in metadata xml files|
|IS1005*||incomplete trial due to dropout in IS1005d||N|
|IS1007d||no mic array audio||N||lost|
|TS3003d||No closeup 1, 2, 3||N||-|
|TS3011d||missing beginning of meeting due to dropout||N||audio begins at 00:03:55.66|
|TS3012c||missing end of meeting due to dropout||N||audio ends at 00:39:36.5|
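To work with the dropout boundaries listed in the table, the four-field timecodes (as in the EN2001a, EN2005a and EN2006b rows) can be converted to seconds. The sketch below assumes these are `hh:mm:ss:ff` values with a frame field at 25 frames per second; the frame rate is an assumption, not something the table states. Three-field values in other rows are ambiguous (they may be `hh:mm:ss` or `mm:ss:ff`) and are deliberately rejected. The function names are hypothetical.

```python
# Illustrative sketch; FPS = 25 is an assumption, not stated in the table.
FPS = 25


def timecode_to_seconds(timecode, fps=FPS):
    """Convert an 'hh:mm:ss:ff' timecode to seconds.

    Only the four-field form is accepted, because three-field values
    in the table cannot be interpreted without further information.
    """
    parts = timecode.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected hh:mm:ss:ff, got {timecode!r}")
    hours, minutes, seconds, frames = (int(p) for p in parts)
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def dropout_duration(start, end, fps=FPS):
    """Length in seconds of a zero-padded span given its two timecodes."""
    return timecode_to_seconds(end, fps) - timecode_to_seconds(start, fps)
```

Under the 25 fps assumption, the EN2001a span from 00:03:17:24 to 00:03:30:07 comes out at about 12.32 seconds.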