Abstract

It has been argued that BERT "rediscovers the traditional NLP pipeline", with lower layers extracting morphosyntactic features and higher layers creating holistic sentence-level representations. In this paper, we critically examine this assumption through a principal-component-guided analysis, extracting sets of inputs that correspond to specific activation patterns in BERT sentence representations. We find that even in higher layers, the model mostly picks up on a variegated set of low-level features, many related to sentence complexity, that presumably arise from its specific pre-training objectives.
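A minimal sketch of the kind of principal-component-guided analysis the abstract describes, assuming Hugging Face transformers and scikit-learn: pool a chosen layer's BERT activations into sentence vectors, fit PCA, and list the inputs that project most strongly onto each component. The model name, layer index, pooling strategy, and example sentences are illustrative assumptions, not the authors' exact setup.

# Sketch: inspect which inputs drive individual principal components of
# BERT sentence representations (layer choice and pooling are assumptions).
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

sentences = [
    "The cat sat on the mat.",
    "Although it was raining, the committee that the senator had appointed met anyway.",
    "Dogs bark.",
    "The report, which nobody had read, was approved without discussion.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

LAYER = 10  # a "higher" layer, chosen here for illustration

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**enc).hidden_states[LAYER]           # (batch, tokens, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # mask out padding tokens
    pooled = (hidden * mask).sum(1) / mask.sum(1)        # mean-pooled sentence vectors

# Fit PCA on the sentence vectors and report the inputs with the highest
# projection on each component.
pca = PCA(n_components=3)
scores = pca.fit_transform(pooled.numpy())

for c in range(scores.shape[1]):
    top = np.argsort(-scores[:, c])[:2]
    print(f"Component {c}: " + " | ".join(sentences[i] for i in top))

With a larger input sample, comparing the extracted sentence sets across layers is one way to probe whether higher layers track holistic sentence meaning or surface properties such as length and clausal complexity.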
