AMI Corpus Meeting Rooms

Description of meeting rooms from AMI Deliverable D2.2

Data for the AMI Meeting Corpus were collected in instrumented meeting rooms constructed at the University of Edinburgh (U.K.), Idiap (Switzerland), and the TNO Human Factors Research Institute (The Netherlands). While the three meeting rooms feature the same types of equipment, there are some minor differences in the ways they were configured. In the following sections, descriptions of the Edinburgh meeting room and the data collected there are presented in detail. Differences between the Edinburgh room and the other two meeting rooms are subsequently outlined.

Edinburgh Room

Audio setup
The room contains 24 microphones, from which 24 mono audio channels are recorded directly to hard disk. Sixteen Sennheiser MK2E-P-C miniature omnidirectional electret microphones are arranged in two circular arrays of eight, each with a radius of 10cm. These are placed in the center of the meeting room table, one between the participants and one at the end of the table closest to the presentation screen and whiteboard. The MK2E-P-C was chosen for its linear frequency response from 20Hz to 20kHz and its ability to draw phantom power directly from the microphone pre-amplifier. Eight Sennheiser EW300 Series radio microphones are used for recording the four meeting participants. Each person wears an ME 3-N close-talking headset condenser mic and an MKE 2-EW omnidirectional lapel mic, the wireless equivalent of the microphones used in the arrays. Use of a radio-based system allows participants to move freely around the room without diminishing the quality of the audio recordings.
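The geometry of one eight-element array can be sketched as follows. This is only an illustration: the even angular spacing, the coordinate origin at the array center, and the angle of the first microphone are assumptions, and the function name is hypothetical rather than part of any corpus tooling.

```python
import math


def circular_array_positions(num_mics=8, radius_m=0.10, first_angle_rad=0.0):
    """Return (x, y) positions in metres of microphones on a circle.

    Assumes the microphones are evenly spaced around the circle and that
    the origin is the array center; the source states only the element
    count and the 10cm radius.
    """
    step = 2 * math.pi / num_mics
    return [
        (
            radius_m * math.cos(first_angle_rad + i * step),
            radius_m * math.sin(first_angle_rad + i * step),
        )
        for i in range(num_mics)
    ]


positions = circular_array_positions()

# Distance between neighbouring microphones (chord length 2 * r * sin(pi / N)).
adjacent_spacing_m = math.dist(positions[0], positions[1])
print(f"{adjacent_spacing_m * 100:.2f} cm between adjacent microphones")
```

Under these assumptions, neighbouring microphones are roughly 7.7cm apart.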

Three Focusrite Octopre eight-channel microphone pre-amplifiers with analogue-to-digital converters supporting up to 24-bit, 96kHz operation are used to amplify and digitize the microphone outputs. Each channel has a separate class A amplifier with independent gain control. Digitized output is via a single ADAT Lightpipe fiber optic cable carrying all eight channels. The A-to-D converters can sample at a variety of rates, using either the Octopre's internal clock or an external source via a word-clock input. Here the data are captured at a 48kHz sampling rate with 16-bit resolution. The Octopres also provide phantom power for the MK2E-P-C microphones.
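As a rough guide to the volume of audio this produces, the raw data rate for 24 channels at 48kHz and 16 bits can be estimated as below. The sketch assumes uncompressed PCM with no file-format overhead, which the description above does not state; the function name is illustrative.

```python
CHANNELS = 24            # three Octopres, eight channels each
SAMPLE_RATE_HZ = 48_000  # capture rate used in the meeting room
BYTES_PER_SAMPLE = 2     # 16-bit resolution


def raw_audio_bytes(duration_s, channels=CHANNELS,
                    sample_rate_hz=SAMPLE_RATE_HZ,
                    bytes_per_sample=BYTES_PER_SAMPLE):
    """Size in bytes of uncompressed PCM audio (assumes no container overhead)."""
    return duration_s * channels * sample_rate_hz * bytes_per_sample


bytes_per_second = raw_audio_bytes(1)     # 2_304_000
bytes_per_hour = raw_audio_bytes(3600)    # 8_294_400_000
print(f"{bytes_per_second / 1e6:.3f} MB/s, {bytes_per_hour / 1e9:.2f} GB/hour")
```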

The Mark of the Unicorn (MOTU) 2408 MKIII is an audio interface for PC-based hard disk audio recording. It consists of a 19-inch rack-mounted I/O unit connected via a Firewire-like interface to a PCI card. The I/O unit supports 24 input/output channels in three banks of eight, with all 24 channels capable of operating simultaneously. Software installed on the PC allows configuration and acquisition of each of the channels via the PCI card. In the meeting room, the ADAT Lightpipe output from each Octopre A-to-D converter is connected to one bank of a single I/O unit, and the channels are subsequently acquired by the PC via the PCI card.

The audio capture computer is a 3GHz P4 with two 40MB SCSI hard drives configured as a RAID 0 array for streaming audio. The operating system used is Windows XP for compatibility with the MOTU driver software. Audio is captured and exported using Cakewalk Sonar recording software.

Video setup
Six cameras are used to record video of the proceedings. Four Sony XC555 subminiature cameras with 6mm lenses mounted under the central microphone array provide close-up views of each of the meeting participants. Two Sony SSC-DC58AP CCTV cameras, each with a 3.6mm semi-fisheye lens, provide wide-angle views of the room. One is mounted above the center of the table and gives an overhead view of the entire floor area of the room. The other is mounted in the corner of the room and provides a view of the whiteboard and presentation areas.

Six Sony GV-D1000E digital video recorders are used to record the output of the cameras directly to Mini-DV cassettes. Using Mini-DV provides reliable video capture with few errors or dropped frames. It also provides an immediate tape backup of the raw video data.

Synchronization
Special hardware is used to provide synchronization signals. Global time-stamping allows the A-to-D converters in the Octopres to sample each channel at the same time, thereby avoiding a time skew between audio channels. Cameras also acquire frames at the same time, avoiding lags between video channels. The Horita BSG-50 PAL Blackburst Generator generates a composite video timing signal which is used as a reference signal to which all other devices are locked. The signal is fed directly to each of the video cameras to ensure that they sample frames at exactly the same time. A further output is connected to a MOTU MIDI Timepiece AV (MTP-AV), which generates all other timing signals. The MTP-AV is capable of locking to and generating a number of different timing signals. In the meeting room, the MTP-AV locks to the Blackburst reference signal and generates a 48kHz word clock for triggering the A-to-D converters in the three Octopres. This ensures that each audio channel is sampled at precisely the same instant. The MTP-AV also creates a Longitudinal Time Code (LTC) for each video frame. The LTC is encoded as an 80-bit word (Hours:Minutes:Seconds:Frames) and output as a 2kHz audio signal. In addition, the MTP-AV outputs a MIDI Time Code. This is the LTC output in a format which can be read by MIDI devices. In the meeting room, it is read by the Sonar recording software and used to time-stamp the audio samples.
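The relationship between these timing signals can be checked with a little arithmetic. The sketch below assumes the standard PAL rate of 25 frames per second, which the text implies through the PAL Blackburst reference but does not state as a number; the constant names are illustrative.

```python
FRAME_RATE_FPS = 25        # assumption: standard PAL frame rate
LTC_BITS_PER_FRAME = 80    # one 80-bit LTC word per video frame
WORD_CLOCK_HZ = 48_000     # word clock generated by the MTP-AV

# Bit rate of the LTC stream: one 80-bit word for every video frame.
ltc_bit_rate = FRAME_RATE_FPS * LTC_BITS_PER_FRAME                    # 2000

# Audio samples per video frame; a zero remainder means every frame
# boundary falls exactly on an audio sample boundary.
samples_per_frame, remainder = divmod(WORD_CLOCK_HZ, FRAME_RATE_FPS)  # 1920, 0

# Audio samples spanned by a single LTC bit.
samples_per_ltc_bit = WORD_CLOCK_HZ / ltc_bit_rate                    # 24.0

print(ltc_bit_rate, samples_per_frame, remainder, samples_per_ltc_bit)
```

At 25 frames per second the LTC stream carries 2000 bits per second, which is consistent with the 2kHz figure quoted above, and each video frame spans exactly 1920 audio samples.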

The Horita AVG-50 time-code inserters translate the 80-bit LTC audio signal into a 90-bit Vertical Interval Time Code. This 90-bit code is then inserted into the top two lines of each video frame as a series of black and white blocks, which may subsequently be read during video playback. Since this code corresponds directly to the MIDI Time Code being used to time-stamp the audio recording, precise synchronization of the audio and video signals can be achieved.
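Because the audio samples and the video frames carry the same time code, a frame's time code can be mapped to a position in the audio. The following is a minimal sketch, assuming 25 frames per second (PAL), 48kHz audio, and a time code string of the form HH:MM:SS:FF counted from the same zero point as the audio; the function name is hypothetical and not part of any corpus tooling.

```python
def timecode_to_sample(timecode, fps=25, sample_rate_hz=48_000):
    """Map an HH:MM:SS:FF time code to the index of the first audio sample
    of that frame.

    Assumes audio and video time codes share the same zero point and that
    the frame rate is 25 fps (PAL); neither value is read from the data.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid time code: {timecode!r}")
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    # 48000 / 25 = 1920 samples per frame, so this division is exact.
    return total_frames * sample_rate_hz // fps


print(timecode_to_sample("00:00:00:01"))  # 1920
print(timecode_to_sample("00:01:00:12"))  # 2903040
```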

Auxiliary data
In addition to audio and video, any auxiliary data generated by the participants during a meeting are recorded. (See Pen, Whiteboard, and Slide Data for more information.)

Idiap Room

The Idiap room has three wide-angle video cameras. One is positioned at the end of the table facing the projector screen. The other two wide-angle cameras are positioned so that they capture pairs of participants seated side-by-side. There is also an 8-microphone, 10cm-radius uniform circular microphone array. The second Idiap circular microphone array has only four elements and is mounted on the ceiling rather than on the table. A binaural manikin is placed at the end of the table furthest from the screen, providing two additional audio channels.

TNO Room

The TNO room contains a single eight-element circular microphone array mounted in the center of the table and a second, 10-element linear array mounted above the presentation screen. Each participant has a headset-mounted radio microphone, but no lapel mic. The TNO room also has two wide-angle cameras, one mounted above and behind the table, and one on the left-hand side of the room, angled across the table. The TNO audio recording and synchronization hardware is identical to that installed in the Edinburgh room. However, a different approach is used for the video processing. Three Windows XP computers, each fitted with two Osprey 210 video capture cards, are used to capture and encode the video data and stream it directly to hard disk, rather than recording to digital video tapes.
