Article in conference proceedings

All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German

Proceedings of the 12th Language Resources and Evaluation Conference, pages 4368--4378. Marseille, France, European Language Resources Association (May 2020)

Abstract

In this paper we present the GerCo dataset of adjective-noun collocations for German, such as alter Freund 'old friend' and tiefe Liebe 'deep love'. The annotation was performed by experts following the annotation scheme introduced in this paper. The resulting dataset contains 4,732 positive and negative instances of collocations and covers all 16 semantic classes of adjectives defined in the German wordnet GermaNet. The dataset can serve as a reliable empirical basis for comparing theoretical frameworks concerned with collocations, or as material for data-driven approaches to the study of collocations, including machine learning experiments. This paper addresses the latter use by evaluating different models on the task of automatic collocation identification with the GerCo dataset. We compare lexical association measures with static and contextualized word embeddings. The experiments show that word embeddings outperform methods based on statistical association measures by a wide margin.
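As a rough illustration of what a lexical association measure computes, the sketch below scores adjective-noun pairs with pointwise mutual information (PMI). The abstract does not name the measures used in the paper, so PMI is an assumption chosen as one common example. The pair counts are invented toy values, not taken from GerCo, and the function name `pmi_scores` is hypothetical.

```python
import math
from collections import Counter

# Toy adjective-noun co-occurrence counts.
# These numbers are invented for illustration; they are NOT from GerCo.
pair_counts = Counter({
    ("alter", "Freund"): 8,
    ("alter", "Tisch"): 2,
    ("tiefe", "Liebe"): 6,
    ("tiefe", "Tasse"): 1,
    ("neuer", "Tisch"): 7,
    ("neuer", "Freund"): 4,
})


def pmi_scores(pair_counts):
    """Return log2 PMI for each (adjective, noun) pair.

    PMI(a, n) = log2( P(a, n) / (P(a) * P(n)) ), with all probabilities
    estimated as relative frequencies over the observed pairs.
    """
    total = sum(pair_counts.values())

    # Marginal counts for adjectives and nouns.
    adj_counts = Counter()
    noun_counts = Counter()
    for (adj, noun), count in pair_counts.items():
        adj_counts[adj] += count
        noun_counts[noun] += count

    scores = {}
    for (adj, noun), count in pair_counts.items():
        p_pair = count / total
        p_adj = adj_counts[adj] / total
        p_noun = noun_counts[noun] / total
        scores[(adj, noun)] = math.log2(p_pair / (p_adj * p_noun))
    return scores


scores = pmi_scores(pair_counts)

# Print pairs from the strongest to the weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A higher score means the adjective and noun co-occur more often than their individual frequencies would predict. Rare pairs can also receive high PMI from very little evidence, which is a known weakness of purely count-based measures.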
