
Code for Improving Video Caption Accuracy with LLMs: Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models

Dataset (2025). Related to: Fathallah, N., Bhole, M., & Staab, S. (2024, November 30). Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models. In Proceedings of the 11th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, 2024. arXiv:2412.00342.
DOI: 10.18419/darus-4776

Abstract

As part of the IKILeUS project at the University of Stuttgart, research was conducted to explore how Large Language Models (LLMs) can enhance the accuracy and contextual relevance of automatic speech recognition (ASR)-generated captions. While ASR tools provide a foundation for accessibility, they often produce grammatical errors, misinterpret homophones, and struggle with domain-specific terminology. To address these challenges, experiments were conducted using LLMs such as GPT-3.5 and Llama2-13B to refine and correct captioning errors. The models were evaluated using standard NLP metrics such as Word Error Rate (WER), BLEU, and ROUGE scores, demonstrating notable improvements in caption accuracy. The findings suggest that LLMs can effectively enhance the readability, coherence, and precision of automatically generated captions, offering a promising direction for improving video accessibility for the Deaf and Hard of Hearing (DHH) community.
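For illustration, here is a minimal Python sketch of the workflow the abstract describes: an ASR-generated caption is post-corrected by an LLM and scored against a reference transcript with Word Error Rate (WER). The `correct_caption` placeholder and the example sentences are hypothetical and are not taken from the published code or dataset; in the study, the correction step was performed with GPT-3.5 and Llama2-13B.

```python
# Minimal sketch (not the dataset's actual code): score an ASR caption with WER
# before and after a hypothetical LLM post-correction step.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def correct_caption(asr_caption: str) -> str:
    """Hypothetical placeholder for the LLM correction step.

    In the study this would be a prompt to GPT-3.5 or Llama2-13B asking the model
    to fix grammar, homophone, and domain-terminology errors without changing meaning.
    """
    raise NotImplementedError("Plug in an LLM client here.")


if __name__ == "__main__":
    reference = "the eigenvalues of the matrix are real"      # ground-truth transcript
    asr_output = "the eigen values of the matrix our reel"    # raw ASR caption
    print(f"WER before correction: {wer(reference, asr_output):.2f}")
```

The same reference/hypothesis pairs can be fed to BLEU and ROUGE implementations (e.g., from standard NLP toolkits) to reproduce the kind of multi-metric evaluation the abstract mentions.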
