<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:community="http://www.bibsonomy.org/ontologies/2008/05/community#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:cc="http://web.resource.org/cc/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:swrc="http://swrc.ontoware.org/ontology#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="https://puma.ub.uni-stuttgart.de/group/simtech_test"><owl:Ontology rdf:about=""><rdfs:comment>PUMA publications for /group/simtech_test</rdfs:comment><owl:imports rdf:resource="http://swrc.ontoware.org/ontology/portal"/></owl:Ontology><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2a3c99a8a008a1c87735c6a17b70d04dd/janrange"><owl:sameAs rdf:resource="/uri/bibtex/2a3c99a8a008a1c87735c6a17b70d04dd/janrange"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><owl:sameAs rdf:resource="http://dx.doi.org/10.1016/j.envsoft.2025.106792"/><swrc:date>Fri Feb 27 18:01:12 CET 2026</swrc:date><swrc:journal>Environmental Modelling &amp; Software</swrc:journal><swrc:month>04</swrc:month><swrc:pages>106792</swrc:pages><swrc:publisher><swrc:Organization swrc:name="Elsevier BV"/></swrc:publisher><swrc:title>Advancing geospatial data infrastructure in Dataverse via metadata automation, interactive tools and LLM case study</swrc:title><swrc:volume>199</swrc:volume><swrc:year>2026</swrc:year><swrc:keywords>myown rdm fair </swrc:keywords><swrc:abstract>In the era of big data and interdisciplinary research, the effective dissemination and reuse of geospatial data have become vital across various fields such as economics, biostatistics, epidemiology, environmental health, and sciences. This study investigates the challenges associated with managing geospatial data and presents the implementation of tools designed to address these challenges. We present an overview of the current state of geospatial data in a general-purpose research data repository Dataverse and outline a series of implemented advancements for improving the management and utilization of geospatial datasets. These advancements include building the capability to extract structured metadata automatically, enabling programmatic engagement with data assets, incorporating checklists, facilitating geospatial-specific searches, and providing previews of geographic dataset coverage. In this paper, we include two case studies. In the first, we evaluate the effectiveness of the automatic metadata extraction feature, part of our proposed advancements, using the large language model GPT-4 and find that the extracted metadata offers unique information, which is not typically provided by the user. 
In the second case study, we introduce the community of practice around climate-health data at Dataverse, coordinated through the CAFE Research Coordinating Center.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="English" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="1364-8152" swrc:key="issn"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1016/j.envsoft.2025.106792" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ana Trišović"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Jan Range"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Philip Durbin"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Katherine Mika"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Amber Leahey"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Wei Li"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2d5ddc756c2b5f9a2cf500b26bec8aa2a/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2d5ddc756c2b5f9a2cf500b26bec8aa2a/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#TechnicalReport"/><owl:sameAs rdf:resource="https://arxiv.org/pdf/2109.13139.pdf"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:month>10</swrc:month><swrc:note>arxiv:2109.13139</swrc:note><swrc:pages>1--11</swrc:pages><swrc:title>Multimodal Integration of Human-Like Attention in Visual Question Answering</swrc:title><swrc:year>2021</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>Human-like attention as a supervisory signal to guide neural attention has shown significant promise but is currently limited to uni-modal integration – even for inherently multi-modal tasks such as visual question answering (VQA). We present the Multimodal Human-like Attention Network (MULAN) – the first method for multimodal integration of human-like attention on image and text during training of VQA models. MULAN integrates attention predictions from two state-of-the-art text and image saliency models into neural self-attention layers of a recent transformer-based VQA model. Through evaluations on the challenging VQAv2 dataset, we show that MULAN achieves a new state-of-the-art performance of 73.98% accuracy on test-std and 73.72% on test-dev and, at the same time, has approximately 80% fewer trainable parameters than prior work. Overall, our work underlines the potential of integrating multimodal human-like and neural attention for VQA.</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ekta Sood"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Fabian Kögel"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Philipp Müller"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Dominike Thomas"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Mihai Bâce"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Andreas Bulling"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2732572932bd773a1d162712578fb9ecd/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2732572932bd773a1d162712578fb9ecd/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="https://proceedings.mlr.press/v210/strohm23a.html"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Proc. 
The 1st Gaze Meets ML workshop</swrc:booktitle><swrc:month>03 Dec</swrc:month><swrc:pages>165--183</swrc:pages><swrc:publisher><swrc:Organization swrc:name="PMLR"/></swrc:publisher><swrc:series>Proceedings of Machine Learning Research</swrc:series><swrc:title>Facial Composite Generation with Iterative Human Feedback</swrc:title><swrc:volume>210</swrc:volume><swrc:year>2023</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>We propose the first method in which human and AI collaborate to iteratively reconstruct the human’s mental image of another person’s face only from their eye gaze. Current tools for generating digital human faces involve a tedious and time-consuming manual design process. While gaze-based mental image reconstruction represents a promising alternative, previous methods still assumed prior knowledge about the target face, thereby severely limiting their practical usefulness. The key novelty of our method is a collaborative, iterative query engine: Based on the user’s gaze behaviour in each iteration, our method predicts which images to show to the user in the next iteration. Results from two human studies (N=12 and N=22) show that our method can visually reconstruct digital faces that are more similar to the mental image, and is more usable compared to other methods. As such, our findings point at the significant potential of human-AI collaboration for reconstructing mental images, potentially also beyond faces, and of human gaze as a rich source of information and a powerful mediator in said collaboration.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="https://proceedings.mlr.press/v210/strohm23a/strohm23a.pdf" swrc:key="pdf"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Florian Strohm"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Ekta Sood"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Dominike Thomas"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Mihai Bâce"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Andreas Bulling"/></rdf:_5></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ismini Lourentzou"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Joy Wu"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Satyananda Kashyap"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Alexandros Karargyris"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Leo Anthony Celi"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Ban Kawas"/></rdf:_6><rdf:_7><swrc:Person swrc:name="Sachin Talathi"/></rdf:_7></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2a6b4a6f7774f1abad2001d0bfaf39f02/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2a6b4a6f7774f1abad2001d0bfaf39f02/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="https://aclanthology.org/2020.conll-1.2"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:address>Online</swrc:address><swrc:booktitle>Proceedings of the 24th Conference on Computational Natural Language Learning</swrc:booktitle><swrc:month>11</swrc:month><swrc:pages>12--25</swrc:pages><swrc:publisher><swrc:Organization swrc:name="Association for Computational Linguistics"/></swrc:publisher><swrc:title>Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension</swrc:title><swrc:year>2020</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>While neural networks with attention mechanisms have achieved superior performance on 
many natural language processing tasks, it remains unclear to which extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce a novel 23 participant eye tracking dataset - MQA-RC, in which participants read movie plots and answered pre-defined questions. We compare state of the art networks based on long short-term memory (LSTM), convolutional neural models (CNN) and XLNet Transformer architectures. We find that higher similarity to human attention and performance significantly correlates to the LSTM and CNN models. However, we show this relationship does not hold true for the XLNet models – despite the fact that the XLNet performs best on this challenging task. Our results suggest that different architectures seem to learn rather different neural attention strategies and similarity of neural to human attention does not guarantee best performance.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="10.18653/v1/2020.conll-1.2" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ekta Sood"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Simon Tannert"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Diego Frassinelli"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Andreas Bulling"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Ngoc Thang Vu"/></rdf:_5></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2f97b9935cacc514b3e94fb2318b6f61e/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2f97b9935cacc514b3e94fb2318b6f61e/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="https://doi.org/10.1145/3379155.3391332"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:address>New York, NY, USA</swrc:address><swrc:booktitle>ACM Symposium on Eye Tracking Research and Applications</swrc:booktitle><swrc:month>06</swrc:month><swrc:pages>1–10</swrc:pages><swrc:publisher><swrc:Organization swrc:name="Association for Computing Machinery"/></swrc:publisher><swrc:series>ETRA &#039;20 Full Papers</swrc:series><swrc:title>Anticipating Averted Gaze in Dyadic Interactions</swrc:title><swrc:year>2020</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:day>2</swrc:day><swrc:abstract>We present the first method to anticipate averted gaze in natural dyadic interactions.
The task of anticipating averted gaze, i.e. that a person will not make eye contact
in the near future, remains unsolved despite its importance for human social encounters
as well as a number of applications, including human-robot interaction or conversational
agents. Our multimodal method is based on a long short-term memory (LSTM) network
that analyses non-verbal facial cues and speaking behaviour. We empirically evaluate
our method for different future time horizons on a novel dataset of 121 YouTube videos
of dyadic video conferences (74 hours in total). We investigate person-specific and
person-independent performance and demonstrate that our method clearly outperforms
baselines in both settings. As such, our work sheds light on the tight interplay between
eye contact and other non-verbal signals and underlines the potential of computational
modelling and anticipation of averted gaze for interactive applications.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="9781450371339" swrc:key="isbn"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Stuttgart, Germany" swrc:key="location"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1145/3379155.3391332" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Philipp Müller"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Ekta Sood"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Andreas Bulling"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2116525f8369c3bf70a00a497ae4363ef/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2116525f8369c3bf70a00a497ae4363ef/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Proc. the 45th Annual Meeting of the Cognitive Science Society (CogSci)</swrc:booktitle><swrc:month>07</swrc:month><swrc:note>spotlight</swrc:note><swrc:pages>3639--3646</swrc:pages><swrc:title>Improving Neural Saliency Prediction with a Cognitive Model of Human Visual Attention</swrc:title><swrc:year>2023</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>We present a novel method for saliency prediction that leverages a cognitive model of visual attention as an inductive bias. This approach is in stark contrast to recent purely data-driven saliency models that achieve performance improvements mainly by increased capacity, resulting in high computational costs and the need for large-scale training datasets. We demonstrate that by using a cognitive model, our method achieves competitive performance to the state of the art across several natural image datasets while only requiring a fraction of the parameters. Furthermore, we set the new state of the art for saliency prediction on information visualizations, demonstrating the effectiveness of our approach for cross-domain generalization. We further provide augmented versions of the full MSCOCO dataset with synthetic gaze data using the cognitive model, which we used to pre-train our method. 
Our results are highly promising and underline the significant potential of bridging between cognitive and data-driven models, potentially also beyond attention.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="https://git.hcics.simtech.uni-stuttgart.de/public-projects/neural-saliency-prediction-with-a-cognitive-model/" swrc:key="code"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Yes" swrc:key="supp"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="https://perceptualui.org/research/datasets/MSCOCOEMMAFigureQAEMMA/" swrc:key="dataset"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ekta Sood"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Lei Shi"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Matteo Bortoletto"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Yao Wang"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Philipp Müller"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Andreas Bulling"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/25f0b7060bff3df6fe3393b50d82945b2/hermann"><owl:sameAs rdf:resource="/uri/bibtex/25f0b7060bff3df6fe3393b50d82945b2/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:address>Piscataway</swrc:address><swrc:booktitle>2021 IEEE/CVF International Conference on Computer Vision (ICCV)</swrc:booktitle><swrc:pages>245-254</swrc:pages><swrc:publisher><swrc:Organization swrc:name="IEEE"/></swrc:publisher><swrc:title>Neural Photofit : Gaze-based Mental Image Reconstruction</swrc:title><swrc:year>2021</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:hasExtraField><swrc:Field swrc:value="Online" swrc:key="venue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="{978-1-6654-2812-5} and {978-1-6654-2813-2}" swrc:key="isbn"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Computer Science" swrc:key="research-areas"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="eng" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="2021-10-10/2021-10-17" swrc:key="eventdate"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="2021 IEEE/CVF International Conference on Computer Vision (ICCV)" swrc:key="eventtitle"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Strohm, F (Corresponding Author), Univ Stuttgart, Stuttgart, Germany.
   Strohm, Florian; Sood, Ekta; Bace, Mihai; Bulling, Andreas, Univ Stuttgart, Stuttgart, Germany.
   Mayer, Sven, Ludwig Maximilians Univ Munchen, Munich, Germany.
   Mueller, Philipp, German Res Ctr Artificial Intelligence DFKI, Saarbrucken, Germany." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="WOS:000797698900025" swrc:key="unique-id"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1109/ICCV48922.2021.00031" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Florian Strohm"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Ekta Sood"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Sven Mayer"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Philipp Müller"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Mihai Bâce"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Andreas Bulling"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/288a5d21da4a41f15c79cfacfc16236b3/hermann"><owl:sameAs rdf:resource="/uri/bibtex/288a5d21da4a41f15c79cfacfc16236b3/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)</swrc:booktitle><swrc:note>spotlight</swrc:note><swrc:pages>1--10</swrc:pages><swrc:title>Impact of Privacy Protection Methods of Lifelogs on Remembered Memories</swrc:title><swrc:year>2023</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>Lifelogging is traditionally used for memory augmentation. However, recent research shows that users’ trust in the completeness and accuracy of lifelogs might skew their memories. Privacy-protection alterations such as body blurring and content deletion are commonly applied to photos to circumvent capturing sensitive information. However, their impact on how users remember memories remain unclear. To this end, we conduct a white-hat memory attack and report on an iterative experiment (N=21) to compare the impact of viewing 1) unaltered lifelogs, 2) blurred lifelogs, and 3) a subset of the lifelogs after deleting private ones, on confidently remembering memories. Findings indicate that all the privacy methods impact memories’ quality similarly and that users tend to change their answers in recognition more than recall scenarios. Results also show that users have high confidence in their remembered content across all privacy methods. Our work raises awareness about the mindful designing of technological interventions.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="10.1145/3544548.3581565" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Passant Elagroudy"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Mohamed Khamis"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Florian Mathis"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Diana Irmscher"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Ekta Sood"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Andreas Bulling"/></rdf:_6><rdf:_7><swrc:Person swrc:name="Albrecht Schmidt"/></rdf:_7></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/289a5c7ed25d6ec5d7638bcfd2a68f910/hermann"><owl:sameAs rdf:resource="/uri/bibtex/289a5c7ed25d6ec5d7638bcfd2a68f910/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Proc. 
International Symposium on Eye Tracking Research and Applications (ETRA)</swrc:booktitle><swrc:pages>1--18</swrc:pages><swrc:title>Gaze-enhanced Crossmodal Embeddings for Emotion Recognition</swrc:title><swrc:volume>6</swrc:volume><swrc:year>2022</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g. speech during a phone call. While previous work proposed crossmodal emotion embeddings to improve monomodal recognition performance, despite its importance, a representation of gaze was not included. We propose a new approach to emotion recognition that incorporates an explicit representation of gaze in a crossmodal emotion embedding framework. We show that our method outperforms the previous state of the art for both audio-only and video-only emotion classification on the popular One-Minute Gradual Emotion Recognition dataset. Furthermore, we report extensive ablation experiments and provide insights into the performance of different state-of-the-art gaze representations and integration strategies. Our results not only underline the importance of gaze for emotion recognition but also demonstrate a practical and highly effective approach to leveraging gaze information for this task.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="https://git.hcics.simtech.uni-stuttgart.de/public-projects/gaze-enhanced-crossmodal-embeddings-for-emotion-recognition" swrc:key="code"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.1145/3530879" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ahmed Abdou"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Ekta Sood"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Philipp Müller"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Andreas Bulling"/></rdf:_4></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/22332687c0dcf57f8a4e6bfc4adde675c/hermann"><owl:sameAs rdf:resource="/uri/bibtex/22332687c0dcf57f8a4e6bfc4adde675c/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Proc. ACL SIGNLL Conference on Computational Natural Language Learning (CoNLL)</swrc:booktitle><swrc:month>11</swrc:month><swrc:note>spotlight</swrc:note><swrc:pages>27--43</swrc:pages><swrc:publisher><swrc:Organization swrc:name="Association for Computational Linguistics"/></swrc:publisher><swrc:title>VQA-MHUG: A gaze dataset to study multimodal neural attention in VQA</swrc:title><swrc:year>2021</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>We present VQA-MHUG - a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker. We use our dataset to analyze the similarity between human and neural attentive strategies learned by five state-of-the-art VQA models: Modulated Co-Attention Network (MCAN) with either grid or region features, Pythia, Bilinear Attention Network (BAN), and the Multimodal Factorized Bilinear Pooling Network (MFB). While prior work has focused on studying the image modality, our analyses show - for the first time - that for all models, higher correlation with human attention on text is a significant predictor of VQA performance. 
This finding points at a potential for improving VQA performance and, at the same time, calls for further research on neural text attention mechanisms and their integration into architectures for vision and language tasks, including but potentially also beyond VQA.</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="https://git.hcics.simtech.uni-stuttgart.de/public-projects/vqa-mhug-interpretability" swrc:key="code"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Oral presentation" swrc:key="award"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="https://perceptualui.org/research/datasets/VQA-MHUG/" swrc:key="dataset"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.18653/v1/2021.conll-1.3" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ekta Sood"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Fabian Kögel"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Florian Strohm"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Prajit Dhar"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Andreas Bulling"/></rdf:_5></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2f38edfcdce37f45530abaf7145d5a8d9/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2f38edfcdce37f45530abaf7145d5a8d9/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="https://aclanthology.org/2024.lrec-main.802/"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Proc. 31st Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)</swrc:booktitle><swrc:pages>9154--9169</swrc:pages><swrc:title>InteRead: An Eye Tracking Dataset of Interrupted Reading</swrc:title><swrc:year>2024</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:abstract>Eye movements during reading offer a window into cognitive processes and language comprehension, but the scarcity of reading data with interruptions – which learners frequently encounter in their everyday learning environments – hampers advances in the development of intelligent learning technologies. We introduce InteRead – a novel 50-participant dataset of gaze data recorded during self-paced reading of real-world text. InteRead further offers fine-grained annotations of interruptions interspersed throughout the text as well as resumption lags incurred by these interruptions. Interruptions were triggered automatically once readers reached predefined target words. We validate our dataset by reporting interdisciplinary analyses on different measures of gaze behavior. In line with prior research, our analyses show that the interruptions as well as word length and word frequency effects significantly impact eye movements during reading. We also explore individual differences within our dataset, shedding light on the potential for tailored educational solutions. 
InteRead is accessible from our datasets web-page: https://www.ife.uni-stuttgart.de/en/llis/research/datasets/.</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Francesca Zermiani"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Prajit Dhar"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Ekta Sood"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Fabian Kögel"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Andreas Bulling"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Maria Wirzberger"/></rdf:_6></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/271dca1344ec8f44c6e672bafb003df36/hermann"><owl:sameAs rdf:resource="/uri/bibtex/271dca1344ec8f44c6e672bafb003df36/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="https://proceedings.neurips.cc/paper/2020/file/460191c72f67e90150a093b4585e7eb4-Paper.pdf"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:booktitle>Advances in Neural Information Processing Systems</swrc:booktitle><swrc:pages>6327--6341</swrc:pages><swrc:publisher><swrc:Organization swrc:name="Curran Associates, Inc."/></swrc:publisher><swrc:title>Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention</swrc:title><swrc:volume>33</swrc:volume><swrc:year>2020</swrc:year><swrc:keywords>pn7 PN7-5 updated exc2075 </swrc:keywords><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Ekta Sood"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Simon Tannert"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Philipp Mueller"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Andreas Bulling"/></rdf:_4></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="H. Larochelle"/></rdf:_1><rdf:_2><swrc:Person swrc:name="M. Ranzato"/></rdf:_2><rdf:_3><swrc:Person swrc:name="R. Hadsell"/></rdf:_3><rdf:_4><swrc:Person swrc:name="M. F. Balcan"/></rdf:_4><rdf:_5><swrc:Person swrc:name="H. 
Lin"/></rdf:_5></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2c75873ce735b3cc04365c6cc575964c3/hermann"><owl:sameAs rdf:resource="/uri/bibtex/2c75873ce735b3cc04365c6cc575964c3/hermann"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Mon Feb 17 14:54:07 CET 2025</swrc:date><swrc:address>Stroudsburg</swrc:address><swrc:booktitle>Proceedings of the 7th Workshop on Representation Learning for NLP</swrc:booktitle><swrc:pages>143-155</swrc:pages><swrc:publisher><swrc:Organization swrc:name="Association for Computational Linguistics"/></swrc:publisher><swrc:title>Video Language Co-Attention with Multimodal Fast-Learning Feature Fusion for VideoQA</swrc:title><swrc:year>2022</swrc:year><swrc:keywords>pn7 updated pn7-5 exc2075 </swrc:keywords><swrc:hasExtraField><swrc:Field swrc:value="{Dublin, Ireland} and {Online}" swrc:key="venue"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="978-1-955917-48-3" swrc:key="isbn"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Computer Science" swrc:key="research-areas"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="eng" swrc:key="language"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="2022-05-26" swrc:key="eventdate"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="7th Workshop on Representation Learning for NLP (RepL4NLP 2022)" swrc:key="eventtitle"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="Abdessaied, A (Corresponding Author), Univ Stuttgart, Inst Visualizat &amp; Interact Syst VIS, Stuttgart, Germany.
   Abdessaied, Adnen; Sood, Ekta; Bulling, Andreas, Univ Stuttgart, Inst Visualizat &amp; Interact Syst VIS, Stuttgart, Germany." swrc:key="affiliation"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="WOS:000847242200013" swrc:key="unique-id"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="10.18653/v1/2022.repl4nlp-1.15" swrc:key="doi"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Adnen Abdessaied"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Ekta Sood"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Andreas Bulling"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2316925253e231493e407c90abf096169/simtech_test"><owl:sameAs rdf:resource="/uri/bibtex/2316925253e231493e407c90abf096169/simtech_test"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Fri Nov 08 09:56:54 CET 2024</swrc:date><swrc:month>01</swrc:month><swrc:title>Test Post Deletion Copy Version 3</swrc:title><swrc:volume>1</swrc:volume><swrc:year>2024</swrc:year><swrc:keywords>pn1 curated test </swrc:keywords><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2f3808778b44fb208a5d531b65f7b10bc/simtech_test"><owl:sameAs rdf:resource="/uri/bibtex/2f3808778b44fb208a5d531b65f7b10bc/simtech_test"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Fri Nov 08 09:56:26 CET 2024</swrc:date><swrc:journal>my test journal</swrc:journal><swrc:title>Test Post Deletion Copy Version 2</swrc:title><swrc:volume>1</swrc:volume><swrc:year>2024</swrc:year><swrc:keywords>pn1 test </swrc:keywords><swrc:abstract>my test abstract</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2316925253e231493e407c90abf096169/fwang"><owl:sameAs rdf:resource="/uri/bibtex/2316925253e231493e407c90abf096169/fwang"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Fri Nov 08 09:20:25 CET 2024</swrc:date><swrc:month>01</swrc:month><swrc:title>Test Post Deletion Copy Version 3</swrc:title><swrc:volume>1</swrc:volume><swrc:year>2024</swrc:year><swrc:keywords>pn1 curated test </swrc:keywords><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2c1829e1296a4eec6aaa9761063e0c925/fwang"><owl:sameAs rdf:resource="/uri/bibtex/2c1829e1296a4eec6aaa9761063e0c925/fwang"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Fri Nov 08 09:12:50 CET 2024</swrc:date><swrc:journal>my updated test journal</swrc:journal><swrc:title>Test Post Deletion Copy Version 5</swrc:title><swrc:year>2024</swrc:year><swrc:keywords>pn1 </swrc:keywords><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description><rdf:Description 
rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/22400ade403666203352e41bae2aeda68/fwang"><owl:sameAs rdf:resource="/uri/bibtex/22400ade403666203352e41bae2aeda68/fwang"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Tue Oct 29 10:10:04 CET 2024</swrc:date><swrc:journal>my updated test journal</swrc:journal><swrc:month>01</swrc:month><swrc:title>Test Post Deletion Copy Version 3</swrc:title><swrc:volume>1</swrc:volume><swrc:year>2024</swrc:year><swrc:keywords>pn1 curated test </swrc:keywords><swrc:abstract>my test abstract</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/24f483a8e0b3cd726e47e5310aefa7ba5/fwang"><owl:sameAs rdf:resource="/uri/bibtex/24f483a8e0b3cd726e47e5310aefa7ba5/fwang"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Misc"/><swrc:date>Mon Oct 28 15:38:37 CET 2024</swrc:date><swrc:booktitle>unknown </swrc:booktitle><swrc:journal>my test journal</swrc:journal><swrc:month>01</swrc:month><swrc:number>2</swrc:number><swrc:pages>3-4</swrc:pages><swrc:title>Test Post Deletion Copy Version 5</swrc:title><swrc:volume>1</swrc:volume><swrc:year>2024</swrc:year><swrc:keywords>pn1 curated test </swrc:keywords><swrc:abstract>my updated test abstract</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:editor></rdf:Description><rdf:Description rdf:about="https://puma.ub.uni-stuttgart.de/bibtex/2578a0288dff5b31c98feeb77ababfac2/fwang"><owl:sameAs rdf:resource="/uri/bibtex/2578a0288dff5b31c98feeb77ababfac2/fwang"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><swrc:date>Wed Oct 23 15:05:18 CEST 2024</swrc:date><swrc:booktitle>unknown </swrc:booktitle><swrc:journal>my test journal</swrc:journal><swrc:month>01</swrc:month><swrc:number>2</swrc:number><swrc:pages>3-4</swrc:pages><swrc:title>Test Post Deletion Copy Version 3</swrc:title><swrc:volume>1</swrc:volume><swrc:year>2024</swrc:year><swrc:keywords>pn1 curated test </swrc:keywords><swrc:abstract>my updated test abstract</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:author><swrc:editor><rdf:Seq><rdf:_1><swrc:Person swrc:name="John Doe"/></rdf:_1></rdf:Seq></swrc:editor></rdf:Description></rdf:RDF>