Speech-driven 3D facial motions describe the dynamics of a 3D human face while the person is speaking. The behavior is repeatable and person-specific, making it promising for many applications, e.g., person recognition and lip-language analysis. This database focuses on dynamic human faces while the subjects speak a short phrase. It collects 1030 samples in two parts: Speaking with Frontal Pose (S3DFM-FP) and Speaking with Varying Pose (S3DFM-VP). There are 770 samples from 77 participants in the FP sub-dataset and 260 samples from 26 participants in the VP sub-dataset. The participants span different ages, genders, ethnicities, and mother tongues.
A high-frame-rate (500 fps) 3D video sensor from DI4D Ltd was used to capture the data. The sensor is a binocular stereo vision system consisting mainly of two intensity cameras. Each participant was asked to repeat a short phrase, ni'hao (a Chinese word meaning 'Hello'), 10 times while looking naturally straight at the cameras. For each repetition, we captured a video sequence with the sensor and a synchronized audio sequence via a microphone. For the varying-pose capture, the participant repeated the same phrase while moving the head naturally.
The 3D reconstruction of each video sequence was done using DI4D's commercial software with additional spatial smoothing and temporal filtering. Each sample contains a depth/3D sequence, a pixel-wise registered intensity sequence, and a short 'passphrase' (the synchronized audio sequence). Each video sequence contains 500 frames, and each audio sequence likewise covers 1 second at a sampling frequency of 44.1 kHz. The depth/3D and intensity images each have a resolution of 600×600 points. (The video sequences were downsampled from their original resolution of 1200×1200 pixels to improve processing efficiency and reduce 3D noise.)
Overall, the database contains 2 parts: Frontal Pose (S3DFM-FP), Varying Pose (S3DFM-VP).
In the S3DFM-FP, there are 770 samples with frontal pose, 10 from each of the 77 participants.
In the S3DFM-VP, there are 260 samples with varying pose, 10 from each of the 26 participants.
As examples, Fig. 1 shows the cosine-shaded depth data from two participants, their registered intensity frames (frames 50, 150, 300, and 450) from one video sequence, and the synchronized audio sequence.
The mouth is the principal dynamic region of a speaking face. We represent the 3D mouth region by its width and opening. Fig. 2 shows how the 3D mouth region of one participant changes over time, and how repeatable this change is across the 10 sequences.
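As an illustration, the two mouth measurements can be computed as 3D Euclidean distances between landmark pixels in a frame's XYZ array. This is a minimal sketch, not the authors' exact procedure; the landmark pixel coordinates are assumed to come from an external landmark detector.

```python
import numpy as np

def mouth_measurements(xyz, left_corner, right_corner, top_lip, bottom_lip):
    """Mouth width and opening as 3D Euclidean distances.

    xyz: (H, W, 3) array of per-pixel (x, y, z) points for one frame.
    The four landmark arguments are (row, col) pixel coordinates, assumed
    to come from an external landmark detector (not part of the database).
    """
    def dist(a, b):
        return float(np.linalg.norm(xyz[a] - xyz[b]))

    width = dist(left_corner, right_corner)    # mouth-corner to mouth-corner
    opening = dist(top_lip, bottom_lip)        # upper lip to lower lip
    return width, opening
```

Tracking these two values over the 500 frames of a sequence yields curves like those in Fig. 2.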
The database is freely available for use by other researchers or parties, under CC-BY-NC-ND license terms. Note that the database can only be used for academic research. If you use the data in a publication, please cite:
Each file listed below contains the 10 sequences (3D, intensity, and audio) from one participant. You can download them by clicking on a file and then unzipping it.
Each frame file (e.g., seq1_050.mat) contains 2 arrays: Img(600,600) and XYZ(600,600,3), both stored in single precision. Img is the infrared intensity image at the corresponding frame. XYZ is the (x,y,z) point computed for the corresponding pixel.

The demographics structure array describes the participants:
demographics(1:77,1).Subject - subject identifier (1..77)
demographics(1:77,2).Age - age category: 1-Youth; 2-Middle age; 3-Senior (minimum age is 16, maximum age is 73)
demographics(1:77,3).Gender - 1-Female; 2-Male (27 females, 50 males)
demographics(1:77,4).Nationality - 0-Unknown; 1-North American; 2-South American; 3-African; 4-European; 5-East Asian; 6-South/Southeast Asian
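The frame files can also be read outside MATLAB. Below is a minimal Python sketch using scipy.io.loadmat; scipy is assumed to be installed, and the example path in the usage comment is illustrative only.

```python
def load_frame(path):
    """Load one S3DFM frame file and return its two arrays.

    Img is the (600, 600) single-precision infrared intensity image;
    XYZ is the (600, 600, 3) single-precision array of per-pixel
    (x, y, z) points.
    """
    from scipy.io import loadmat  # requires scipy to be installed
    data = loadmat(path)
    return data["Img"], data["XYZ"]

# Illustrative usage (the directory layout here is hypothetical):
# img, xyz = load_frame("participant01/seq1_050.mat")
```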
Participant 1: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 2: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 3: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 4: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 5: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 6: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 7: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 8: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 9: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 10: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 11: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 12: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 13: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 14: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 15: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 16: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 17: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 18: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 19: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 20: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 21: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 22: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 23: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 24: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 25: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 26: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 27: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 28: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 29: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 30: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 31: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 32: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 33: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 34: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 35: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 36: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 37: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 38: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 39: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 40: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 41: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 42: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 43: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 44: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 45: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 46: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 47: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 48: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 49: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 50: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 51: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 52: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 53: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 54: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 55: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 56: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 57: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 58: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 59: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 60: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 61: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 62: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 63: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 64: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 65: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 66: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 67: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 68: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 69: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 70: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 71: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 72: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 73: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 74: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 75: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 76: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 77: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
The video sequences above recorded the participants facing forward and essentially static, apart from their speaking. For each of 26 participants, we recorded an additional 10 videos in which the participant moves the head while speaking the same passphrase.
Participant 1: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 2: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 3: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 4: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 5: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 6: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 7: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 8: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 9: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 10: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 11: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 12: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 13: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 14: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 15: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 16: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 17: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 18: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 19: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 20: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 21: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 22: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 23: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 24: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 25: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Participant 26: Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10
Note: the participants in S3DFM-VP are a subset of those in S3DFM-FP. The identity-number correspondences are given below for linking purposes.
S3DFM-VP | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S3DFM-FP | 1 | 60 | 43 | 59 | 11 | 2 | 18 | 7 | 49 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 |
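For programmatic linking, the correspondence table above can be encoded directly, e.g., as a Python dictionary mapping S3DFM-VP identity numbers to their S3DFM-FP counterparts:

```python
# S3DFM-VP participant ID -> corresponding S3DFM-FP participant ID
# (values transcribed from the correspondence table above)
VP_TO_FP = {
     1:  1,  2: 60,  3: 43,  4: 59,  5: 11,  6:  2,  7: 18,  8:  7,  9: 49,
    10: 61, 11: 62, 12: 63, 13: 64, 14: 65, 15: 66, 16: 67, 17: 68, 18: 69,
    19: 70, 20: 71, 21: 72, 23: 74, 22: 73, 24: 75, 25: 76, 26: 77,
}
```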
The database was established by Jie Zhang as part of her PhD research while she was a visiting PhD student at the University of Edinburgh (UoE). Jie Zhang was with Beihang University and UoE. Robert B. Fisher and Luis Horna are with UoE.
You might be interested in these related papers:
If you have any questions, please don't hesitate to contact us.
This research was supported by funding from the China Scholarship Council (CSC) under grant 201606020087 and the National Council for Science and Technology (CONACyT) of Mexico. We thank all the participants in the data acquisition, and DI4D Ltd for their support.