VQA-MHUG: A gaze dataset to study multimodal neural attention in VQA

E. Sood, F. Kögel, F. Strohm, P. Dhar, and A. Bulling. Proc. ACL SIGNLL Conference on Computational Natural Language Learning (CoNLL), pages 27--43. Association for Computational Linguistics, November 2021. Note: spotlight.
DOI: 10.18653/v1/2021.conll-1.3

Abstract

We present VQA-MHUG - a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker. We use our dataset to analyze the similarity between human and neural attentive strategies learned by five state-of-the-art VQA models: Modulated Co-Attention Network (MCAN) with either grid or region features, Pythia, Bilinear Attention Network (BAN), and the Multimodal Factorized Bilinear Pooling Network (MFB). While prior work has focused on studying the image modality, our analyses show - for the first time - that for all models, higher correlation with human attention on text is a significant predictor of VQA performance. This finding points at a potential for improving VQA performance and, at the same time, calls for further research on neural text attention mechanisms and their integration into architectures for vision and language tasks, including but potentially also beyond VQA.

Links and resources

BibTeX key: sood21_conll
Entry type: inproceedings
Book title: Proc. ACL SIGNLL Conference on Computational Natural Language Learning (CoNLL)
Year: 2021
Month: November
Pages: 27--43
Publisher: Association for Computational Linguistics
code: https://git.hcics.simtech.uni-stuttgart.de/public-projects/vqa-mhug-interpretability
award: Oral presentation
dataset: https://perceptualui.org/research/datasets/VQA-MHUG/
DOI: 10.18653/v1/2021.conll-1.3
Note: spotlight

Cite this publication

@inproceedings{sood21_conll,
  abstract = {We present VQA-MHUG - a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker. We use our dataset to analyze the similarity between human and neural attentive strategies learned by five state-of-the-art VQA models: Modulated Co-Attention Network (MCAN) with either grid or region features, Pythia, Bilinear Attention Network (BAN), and the Multimodal Factorized Bilinear Pooling Network (MFB). While prior work has focused on studying the image modality, our analyses show - for the first time - that for all models, higher correlation with human attention on text is a significant predictor of VQA performance. This finding points at a potential for improving VQA performance and, at the same time, calls for further research on neural text attention mechanisms and their integration into architectures for vision and language tasks, including but potentially also beyond VQA.},
  added-at = {2025-02-17T14:54:07.000+0100},
  author = {Sood, Ekta and Kögel, Fabian and Strohm, Florian and Dhar, Prajit and Bulling, Andreas},
  award = {Oral presentation},
  biburl = {https://puma.ub.uni-stuttgart.de/bibtex/22332687c0dcf57f8a4e6bfc4adde675c/hermann},
  booktitle = {Proc. ACL SIGNLL Conference on Computational Natural Language Learning (CoNLL)},
  code = {https://git.hcics.simtech.uni-stuttgart.de/public-projects/vqa-mhug-interpretability},
  dataset = {https://perceptualui.org/research/datasets/VQA-MHUG/},
  doi = {10.18653/v1/2021.conll-1.3},
  interhash = {1d3abae0619209fa4857a347bb8db61c},
  intrahash = {2332687c0dcf57f8a4e6bfc4adde675c},
  keywords = {exc2075 pn7 pn7-5 updated},
  month = {November},
  note = {spotlight},
  pages = {27--43},
  publisher = {Association for Computational Linguistics},
  timestamp = {2025-02-17T14:54:07.000+0100},
  title = {VQA-MHUG: A gaze dataset to study multimodal neural attention in VQA},
  year = 2021
}
