Representation Problems in Linguistic Annotations: Ambiguity, Variation, Uncertainty, Error and Bias
C. Beck, H. Booth, M. El-Assady, and M. Butt. Proceedings of the 14th Linguistic Annotation Workshop, pages 60--73. Barcelona, Spain, Association for Computational Linguistics (December 2020)
Abstract
The development of linguistic corpora is fraught with various problems of annotation and representation. These constitute a very real challenge for the development and use of annotated corpora, but as yet not much literature exists on how to address the underlying problems. In this paper, we identify and discuss five sources of representation problems, which are independent though interrelated: ambiguity, variation, uncertainty, error and bias. We outline and characterize these sources, discussing how their improper treatment can have stark consequences for research outcomes. Finally, we discuss how an adequate treatment can inform corpus-related linguistic research, both computational and theoretical, improving the reliability of research results and NLP models, as well as informing the more general reproducibility issue.
%0 Conference Paper
%1 beck-etal-2020-representation
%A Beck, Christin
%A Booth, Hannah
%A El-Assady, Mennatallah
%A Butt, Miriam
%B Proceedings of the 14th Linguistic Annotation Workshop
%C Barcelona, Spain
%D 2020
%I Association for Computational Linguistics
%K 2020 d02 from:christinawarren sfbtrr161
%P 60--73
%T Representation Problems in Linguistic Annotations: Ambiguity, Variation, Uncertainty, Error and Bias
%U https://www.aclweb.org/anthology/2020.law-1.6
%X The development of linguistic corpora is fraught with various problems of annotation and representation. These constitute a very real challenge for the development and use of annotated corpora, but as yet not much literature exists on how to address the underlying problems. In this paper, we identify and discuss five sources of representation problems, which are independent though interrelated: ambiguity, variation, uncertainty, error and bias. We outline and characterize these sources, discussing how their improper treatment can have stark consequences for research outcomes. Finally, we discuss how an adequate treatment can inform corpus-related linguistic research, both computational and theoretical, improving the reliability of research results and NLP models, as well as informing the more general reproducibility issue.
@inproceedings{beck-etal-2020-representation,
abstract = {The development of linguistic corpora is fraught with various problems of annotation and representation. These constitute a very real challenge for the development and use of annotated corpora, but as yet not much literature exists on how to address the underlying problems. In this paper, we identify and discuss five sources of representation problems, which are independent though interrelated: ambiguity, variation, uncertainty, error and bias. We outline and characterize these sources, discussing how their improper treatment can have stark consequences for research outcomes. Finally, we discuss how an adequate treatment can inform corpus-related linguistic research, both computational and theoretical, improving the reliability of research results and NLP models, as well as informing the more general reproducibility issue.},
added-at = {2021-06-28T10:34:03.000+0200},
address = {Barcelona, Spain},
author = {Beck, Christin and Booth, Hannah and El-Assady, Mennatallah and Butt, Miriam},
biburl = {https://puma.ub.uni-stuttgart.de/bibtex/273385cdbf95100ccad0d5bd3ba8a28cc/sfbtrr161},
booktitle = {Proceedings of the 14th Linguistic Annotation Workshop},
interhash = {c0ca7b443d89c942df667b6f0baa53cd},
intrahash = {73385cdbf95100ccad0d5bd3ba8a28cc},
keywords = {2020 d02 from:christinawarren sfbtrr161},
month = dec,
pages = {60--73},
publisher = {Association for Computational Linguistics},
timestamp = {2022-09-28T15:35:12.000+0200},
title = {Representation Problems in Linguistic Annotations: Ambiguity, Variation, Uncertainty, Error and Bias},
url = {https://www.aclweb.org/anthology/2020.law-1.6},
year = {2020}
}