The dataset consists of videos of students watching educational videos, together with annotations of each student's level of intellectual challenge while viewing. The original goal of the data collection was to provide data for developing an automatic feedback tool for lecturers delivering material by video (either pre-recorded or live-streamed).
The problem that motivated this research is: how can an instructor get feedback on the effectiveness of a presentation at various points in the presentation when all of the students are remote? This contrasts with the traditional classroom, where the instructor can see the students and assess their facial expressions. This motivates the PUZZLED research project, which is investigating whether a feedback signal can be extracted from the facial expressions of the students as they watch the videos on their laptops, observed via the laptop's webcam.
An example frame from the educational video that the students are watching is on the left below, and an example frame from the video of the student watching the educational video is on the right.
The original 9:35 video (83 Mb) that the students are watching can be downloaded here.
A screenshot of the interface that the students see while watching and self-annotating is on the left below, and a screenshot of the facial-feature analysis results is on the right.
The dataset consists of 10 videos of student heads, each about 10 minutes long, showing the volunteers watching the same educational video. We attempted to balance the genders and ethnicities of the volunteers so as to provide a variety of viewing styles and skin tones; however, we were limited to a small pool of volunteers and so had little flexibility in balancing their characteristics. Ethical permission was given to record and distribute this data, and all volunteers agreed to allow these videos to be distributed.
Associated with each video is a second-by-second annotation of the student's degree of engagement. The annotations were produced manually by 1) the student and 2) three researchers.
The engagement annotations have 4 values:
There were few instances of the Bored label.
Note: for these videos, the students produced the labels at the same time as they watched the videos. This required the students to shift their gaze from the video to a labeling panel.
The CSV annotation files consist of a sequence of lines of the form VIDEO_ID,ANNOTATOR_ID,START_LABEL,LABEL. VIDEO_ID is the same as the CSV filename. ANNOTATOR_ID is 0, 1, 2, or 3, where 0 is the student and 1-3 are the three researchers. START_LABEL gives the time in seconds at which LABEL starts to apply; that label holds for all subsequent seconds until the start time on the next line of the CSV file. The video is assumed to start at the 'OK' level of engagement.
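The run-length format above can be expanded into one label per second, for example as in this minimal sketch (the function name and the label strings in the sample rows are illustrative, not part of the dataset specification):

```python
import csv
from io import StringIO

def expand_annotations(csv_text, duration, annotator_id, default="OK"):
    """Expand run-length CSV lines into one label per second.

    Per the dataset description, the video is assumed to start at the
    'OK' engagement level until the first annotated change.
    """
    labels = [default] * duration
    for video_id, ann_id, start, label in csv.reader(StringIO(csv_text)):
        if int(ann_id) != annotator_id:
            continue
        # This label applies from its start time onward; a later line
        # with a larger start time overwrites the tail of the list.
        for t in range(int(start), duration):
            labels[t] = label
    return labels

# Hypothetical excerpt in the documented VIDEO_ID,ANNOTATOR_ID,START_LABEL,LABEL format:
sample = "163904405748,0,12,Puzzled\n163904405748,0,40,OK\n"
per_second = expand_annotations(sample, duration=60, annotator_id=0)
```

Expanding all four annotators this way gives aligned per-second label sequences that can be compared directly, e.g. for inter-annotator agreement.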
The table below links to the videos and CSV files. Ethnicities were self-declared as C: Caucasian, SA: South Asian, EA: East Asian; genders were self-declared as F: female, M: male.
| Video | CSV | Video Size (Mb) | Est. FPS | Gender | Ethnicity |
|---|---|---|---|---|---|
| 163904405748.webm | 163904405748.csv | 220 | 30 | F | SA |
| 163904657409.webm | 163904657409.csv | 206 | 26 | M | EA |
| 163949565032.webm | 163949565032.csv | 335 | 11.8 | F | C |
| 163965258378.webm | 163965258378.csv | 193 | 30 | F | C |
| 163974758282.webm | 163974758282.csv | 202 | 24 | F | SA |
| 16400282145.webm | 16400282145.csv | 164 | 12.5 | F | C |
| 164002054426.webm | 164002054426.csv | 188 | 30 | M | C |
| 164006412052.webm | 164006412052.csv | 234 | 29 | F | EA |
| 164007913498.webm | 164007913498.csv | 215 | 30 | M | C |
| 164008493608.webm | 164008493608.csv | 177 | 30 | F | EA |
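Because the webcam recordings have varying (and sometimes low) frame rates, as shown in the Est. FPS column, mapping an individual video frame to its per-second annotation requires the frame rate. A minimal sketch, assuming per-second labels have already been expanded from the CSV (the function name and sample labels are illustrative):

```python
def label_for_frame(frame_idx, fps, per_second_labels):
    """Return the annotation label covering the second in which a frame falls."""
    second = int(frame_idx / fps)
    # Clamp in case the video runs slightly past the last annotated second.
    second = min(second, len(per_second_labels) - 1)
    return per_second_labels[second]

# Illustrative per-second labels: 10 s of 'OK' then 5 s of 'Puzzled'.
labels = ["OK"] * 10 + ["Puzzled"] * 5

# For the 11.8 fps video, frame 140 falls at about 11.9 s.
label_for_frame(140, fps=11.8, per_second_labels=labels)
```

Note that the estimated FPS values are approximate, so frame-to-second alignment near label boundaries may be off by a frame or two.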
The data is freely available for research use. Any publications or public display of images or videos based on the data must cite:
A. Linson, Y. Xu, A. R. English, R. B. Fisher; Identifying student struggle by analyzing facial expressions during asynchronous video lecture viewing: Towards an automated tool to support instructors, Proc. 23rd Int. Conf. on Artificial Intelligence in Education, Durham, 2022.
Funding for the data collection was by the University of Edinburgh Regional Skills program. Ethics approval for the data collection and dissemination was given by the School of Informatics, University of Edinburgh.
© 2022 Robert Fisher