CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions
R. Hedeshy, R. Menges, and S. Staab. Interspeech 2023, August 20--24, 2023, Dublin, Ireland (2023)
Abstract
Non-verbal voice expressions (NVVEs) have been adopted as a means of human-computer interaction in research studies. However, exploring non-verbal voice-based interactions has been constrained by the limited availability of suitable training data and computational methods for classifying such expressions, leading to a focus on simple binary inputs. We address this issue with a new dataset containing 950 audio samples comprising 6 classes of voice expressions. The data were collected from 42 speakers who donated voice recordings. The classifier was trained on the data using features derived from mel-spectrograms. Furthermore, we studied the effectiveness of data augmentation and improved over the baseline model accuracy significantly with a test accuracy of 96.6% in a 5-fold cross-validation. We have made CNVVE publicly accessible in the hope that it will serve as a benchmark for future research.
@inproceedings{hedeshy2023cnvve,
abstract = {Non-verbal voice expressions (NVVEs) have been adopted as a means of human-computer interaction in research studies. However, exploring non-verbal voice-based interactions has been constrained by the limited availability of suitable training data and computational methods for classifying such expressions, leading to a focus on simple binary inputs. We address this issue with a new dataset containing 950 audio samples comprising 6 classes of voice expressions. The data were collected from 42 speakers who donated voice recordings. The classifier was trained on the data using features derived from mel-spectrograms. Furthermore, we studied the effectiveness of data augmentation and improved over the baseline model accuracy significantly with a test accuracy of 96.6% in a 5-fold cross-validation. We have made CNVVE publicly accessible in the hope that it will serve as a benchmark for future research.},
added-at = {2023-06-09T10:48:42.000+0200},
address = {Dublin, Ireland},
author = {Hedeshy, Ramin and Menges, Raphael and Staab, Steffen},
biburl = {https://puma.ub.uni-stuttgart.de/bibtex/2eb2b23b12f68fe90a64960396ff203b2/hedeshy},
booktitle = {Interspeech 2023, August 20--24, 2023, Dublin, Ireland},
eventdate = {2023-08-20/2023-08-24},
eventtitle = {Interspeech 2023},
interhash = {6adf2287678455a9fe702bdad6058b80},
intrahash = {eb2b23b12f68fe90a64960396ff203b2},
keywords = {myown},
timestamp = {2023-06-09T10:53:36.000+0200},
title = {CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions},
year = 2023
}