Abstract
It has been argued that BERT ``rediscovers the traditional NLP
pipeline'', with lower layers extracting morphosyntactic features and
higher layers creating holistic sentence-level representations.
In this paper, we critically examine this assumption through a
principal-component-guided analysis, extracting sets of inputs that
correspond to specific activation patterns in BERT sentence representations.
We find that even in higher layers, the model mostly picks up on a
heterogeneous set of low-level features, many of them related to sentence
complexity, which presumably arise from its specific pre-training
objectives.
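
The abstract does not spell out the extraction procedure, but a minimal sketch of one plausible reading follows: mean-pool a layer's BERT token vectors into sentence representations, decompose them with PCA, and surface the inputs with the most extreme projections on each principal component for manual inspection. The model name, layer index, pooling strategy, component count, and use of scikit-learn's PCA are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

# Toy corpus for illustration; the paper's actual inputs are not specified here.
sentences = [
    "The cat sat on the mat.",
    "Colorless green ideas sleep furiously.",
    "Although it was raining, the committee that the senator appointed convened.",
    "Dogs bark.",
    "The report, which nobody had read, was approved without discussion.",
    "She left.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

LAYER = 10  # a "higher" layer of bert-base; the choice is an assumption

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**enc).hidden_states[LAYER]   # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)   # exclude padding from pooling
    # Mean-pool token vectors into one sentence representation per input.
    reps = (hidden * mask).sum(1) / mask.sum(1)

# Fit PCA over the sentence representations and project onto top components.
pca = PCA(n_components=2)
scores = pca.fit_transform(reps.numpy())

# For each principal component, list the inputs with the most extreme
# projections -- these are the activation patterns a human would inspect.
for k in range(scores.shape[1]):
    order = np.argsort(scores[:, k])
    print(f"PC{k}: low={sentences[order[0]]!r}  high={sentences[order[-1]]!r}")
```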