Incollection,

Text Mining and Topic Modelling

Handbook of Computational Social Science, Routledge, London, (2021)

Abstract

Working with text poses important conceptual and methodological challenges. Topic models are a popular tool for reducing the complexity of texts and finding meaningful themes in large corpora. After an overview of existing work, we explain how to employ structural topic models, one of the topic modeling variants most relevant to social researchers. The chapter's main emphasis, however, is the selection of an appropriate number of topics K and its relation to preprocessing. We investigate how preprocessing decisions influence (i) the choice of K and (ii) the quality of a topic model (i.e., its predictive power and consistency). To that end, we examine a multitude of model setups using both established metrics and innovative measures. From our empirical results, we derive several practical recommendations for researchers and provide easy-to-use code to approximate an appropriate number of topics and test the robustness of one's choice. We develop these arguments with comprehensive data on over 137,000 education-related dissertations completed at U.S. universities.
