Article

Big Data, Big Noise

Social Science Computer Review, 35 (4): 427-443 (2017)
DOI: 10.1177/0894439316643050

Abstract

In this article, we focus on noise in the sense of irrelevant information in a data set as a specific methodological challenge of web research in the era of big data. We empirically evaluate several methods for filtering hyperlink networks in order to reconstruct networks that contain only webpages that deal with a particular issue. The test corpus of webpages was collected from hyperlink networks on the issue of food safety in the United States and Germany. We applied three filtering strategies and evaluated their performance in excluding irrelevant content from the networks: keyword filtering, automated document classification with a machine-learning algorithm, and extraction of core networks with network-analytical measures. Keyword filtering and automated classification of webpages were the most effective methods for reducing noise, whereas extracting a core network did not yield satisfactory results for this case.
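Two of the filtering strategies named in the abstract can be illustrated in a few lines. The sketch below is a hypothetical, simplified rendering, not the authors' implementation: the keyword list, helper names, and the toy hyperlink network are all illustrative assumptions. Keyword filtering keeps only pages whose text mentions the issue; core-network extraction iteratively prunes weakly connected pages (a basic k-core).

```python
# Illustrative sketch of two filtering strategies for hyperlink networks.
# Keywords, function names, and the toy network are assumptions for the
# example, not taken from the paper.

KEYWORDS = {"food safety", "foodborne", "contamination"}  # illustrative terms

def keyword_filter(edges, page_texts):
    """Keep only hyperlinks whose endpoints both mention an issue keyword."""
    relevant = {page for page, text in page_texts.items()
                if any(kw in text.lower() for kw in KEYWORDS)}
    return [(u, v) for u, v in edges if u in relevant and v in relevant]

def k_core(edges, k=2):
    """Return the pages of the k-core: iteratively drop pages with < k links."""
    edges = set(edges)
    while True:
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        weak = {n for n, d in degree.items() if d < k}
        if not weak:
            return sorted(degree)
        edges = {(u, v) for u, v in edges
                 if u not in weak and v not in weak}

# Toy hyperlink network: page "c" is popular but off-topic.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
texts = {"a": "Food safety rules", "b": "Foodborne illness report",
         "c": "Celebrity gossip", "d": "Contamination recall notice"}

print(keyword_filter(edges, texts))  # [('a', 'b')]
print(k_core(edges, k=2))            # ['a', 'b', 'c']
```

The toy output hints at the paper's finding: keyword filtering drops the off-topic page "c", while the purely structural k-core retains it because it is well connected, which is why core extraction alone can fail to remove topically irrelevant content.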
