DEyeAdicContact

Dataset (2022). Related to: Müller, Philipp, Ekta Sood, and Andreas Bulling. 2020. Anticipating Averted Gaze in Dyadic Interactions. In Proceedings of the ACM International Symposium on Eye Tracking Research and Applications (ETRA), pp. 1-10. doi: 10.1145/3379155.3391332.
DOI: 10.18419/darus-3289

Abstract

We created our own dataset of natural dyadic interactions with fine-grained eye contact annotations using videos of dyadic interviews published on YouTube. Compared to lab-based recordings in particular, these YouTube interviews allow us to analyse behaviour in a natural situation. All interviews were conducted via video conferencing and provide frontal views of interviewer and interviewee side by side. Specifically, we downloaded videos from the YouTube channels “Wisdom From North” and “The Spa Dr.”, which both provide a large number of interviews, often in high video quality. Each channel features a single host interviewing a different guest in each session. We manually selected videos with high video quality, resulting in 60 videos for “The Spa Dr.” and 61 videos for “Wisdom From North”. All videos are recorded at a frame rate between 24 and 30 fps and vary in length from 17 to 58 minutes (average: 37 minutes). In total, the videos contain 74 hours of conversation, amounting to 7,817,821 video frames.

We instructed five human annotators to classify the gaze of interviewer and interviewee (in the following referred to as “subjects”). Even though in this study we were only interested in a binary classification of averted gaze versus eye contact, a more fine-grained distinction of averted gaze might prove beneficial for future research. To this end, we used a total of 11 mutually exclusive classes during annotation. Annotators were asked to select the class “eye contact” if the subject was looking at the location of the other person on their screen or at the camera from which they were recorded. We found that annotators were able to reliably determine the placement of camera and screen by skimming through the video prior to starting the annotation. If there was no eye contact, annotators classified whether the subject gazed “up”, “down”, “left”, “right”, “upper left”, “lower left”, “upper right”, or “lower right”. In the following, we refer to the union of these classes as the “no eye contact” class. A separate class was dedicated to blinks, while yet another class indicated instances in which annotators were unsure how to decide, e.g. as a result of low image quality. As annotators worked on disjoint sets of videos, one of the authors was present throughout the first sessions to ensure consistency.

To strike a good balance between sufficient coverage and annotation effort, we collected these annotations on a frame-by-frame basis every 30 seconds for the “Wisdom From North” interviews and every 15 seconds for “The Spa Dr.” interviews. We collected annotations for “The Spa Dr.” on the finer timescale because the host of that channel almost always keeps eye contact with her interviewees; a coarser timescale would have increased the risk of missing the no eye contact classes in the annotation. In total, we collected 23,131 annotated video frames, of which 83% were labelled as “eye contact”.
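To make the annotation taxonomy concrete, the following Python sketch encodes the 11 mutually exclusive classes described above and collapses them onto the binary eye-contact-versus-averted-gaze task. The class and function names are hypothetical illustrations, not identifiers from the dataset release.

```python
from enum import Enum
from typing import Optional

class GazeLabel(Enum):
    """The 11 mutually exclusive annotation classes (hypothetical encoding)."""
    EYE_CONTACT = "eye contact"
    UP = "up"
    DOWN = "down"
    LEFT = "left"
    RIGHT = "right"
    UPPER_LEFT = "upper left"
    LOWER_LEFT = "lower left"
    UPPER_RIGHT = "upper right"
    LOWER_RIGHT = "lower right"
    BLINK = "blink"
    UNSURE = "unsure"

# The union of the eight directional classes forms the "no eye contact" class.
NO_EYE_CONTACT = {
    GazeLabel.UP, GazeLabel.DOWN, GazeLabel.LEFT, GazeLabel.RIGHT,
    GazeLabel.UPPER_LEFT, GazeLabel.LOWER_LEFT,
    GazeLabel.UPPER_RIGHT, GazeLabel.LOWER_RIGHT,
}

def to_binary(label: GazeLabel) -> Optional[bool]:
    """Collapse a fine-grained label onto the binary task.

    Returns True for eye contact, False for averted gaze, and None for
    blink and unsure frames, which a downstream user might choose to discard.
    """
    if label is GazeLabel.EYE_CONTACT:
        return True
    if label in NO_EYE_CONTACT:
        return False
    return None
```

How blink and unsure frames are treated (discarded, merged, or kept separate) is a downstream choice; returning None here simply keeps that decision explicit rather than forcing them into one of the two binary classes.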

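The sampling scheme can likewise be illustrated with a small sketch that computes which frame indices would be annotated for a video of a given length, frame rate, and sampling interval (30 s for “Wisdom From North”, 15 s for “The Spa Dr.”). The helper name is again a hypothetical illustration, not part of the release.

```python
def annotation_frame_indices(num_frames: int, fps: float, interval_s: float) -> list:
    """Return the frame indices sampled for annotation.

    One frame is annotated every `interval_s` seconds: 30 s for
    "Wisdom From North" videos and 15 s for "The Spa Dr." videos.
    """
    step = round(fps * interval_s)  # frames between consecutive annotated frames
    return list(range(0, num_frames, step))

# Example: an average-length (37 min) video at 30 fps, annotated every 15 s,
# yields 148 sampled frames per subject.
frames = annotation_frame_indices(num_frames=37 * 60 * 30, fps=30.0, interval_s=15.0)
print(len(frames))  # -> 148
```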