Inproceedings,

A Textual Entailment Dataset from German Web Forum Text

B. Zeller and S. Padó.
Proceedings of IWCS, pages 288--299. Potsdam, Germany (2013)

Abstract

We present the first freely available large German dataset for Textual Entailment (TE). Our dataset builds on posts from German online forums concerned with computer problems and models the task of identifying relevant posts for user queries (i.e., descriptions of their computer problems) through TE. We use a sequence of crowdsourcing tasks to create realistic problem descriptions through summarisation and paraphrasing of forum posts. The dataset is represented in RTE-5 Search task style and consists of 172 positive and over 2800 negative pairs. We analyse the properties of the created dataset and evaluate its difficulty by applying two TE algorithms and comparing the results with results on the English RTE-5 Search task. The results show that our dataset is roughly comparable to the RTE-5 data in terms of both difficulty and balancing of positive and negative entailment pairs. Our approach to create task-specific TE datasets can be transferred to other domains and languages.

BibTeX key: zeller13:_textual_entail_datas_german_web_forum_text
entry type: inproceedings
address: Potsdam, Germany
booktitle: Proceedings of IWCS
year: 2013
pages: 288--299
Document: http://www.aclweb.org/anthology/W13-0125.pdf
