Efficient Sparse Spherical K-Means for Document Clustering
J. Knittel, S. Koch, und T. Ertl. Proceedings of the 21st ACM Symposium on Document Engineering, New York, NY, USA, Association for Computing Machinery, (2021)
DOI: 10.1145/3469096.3474937
Zusammenfassung
Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, the time complexity increases linearly with the number of clusters k, which limits the suitability of the algorithm for larger values of k depending on the size of the collection. Optimizations targeted at the Euclidean k-Means algorithm largely do not apply because the cosine distance is not a metric. We therefore propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
%0 Conference Paper
%1 Knittel21Clustering
%A Knittel, Johannes
%A Koch, Steffen
%A Ertl, Thomas
%B Proceedings of the 21st ACM Symposium on Document Engineering
%C New York, NY, USA
%D 2021
%I Association for Computing Machinery
%K analysis clustering document k-means large-scale vis(us) vis-gis visus:ertl visus:knittejs visus:kochsn
%R 10.1145/3469096.3474937
%T Efficient Sparse Spherical K-Means for Document Clustering
%U https://doi.org/10.1145/3469096.3474937
%X Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, the time complexity increases linearly with the number of clusters k, which limits the suitability of the algorithm for larger values of k depending on the size of the collection. Optimizations targeted at the Euclidean k-Means algorithm largely do not apply because the cosine distance is not a metric. We therefore propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
%@ 9781450385961
@inproceedings{Knittel21Clustering,
abstract = {Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, the time complexity increases linearly with the number of clusters k, which limits the suitability of the algorithm for larger values of k depending on the size of the collection. Optimizations targeted at the Euclidean k-Means algorithm largely do not apply because the cosine distance is not a metric. We therefore propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.},
added-at = {2022-02-14T15:36:27.000+0100},
address = {New York, NY, USA},
articleno = {6},
author = {Knittel, Johannes and Koch, Steffen and Ertl, Thomas},
biburl = {https://puma.ub.uni-stuttgart.de/bibtex/2409c8ed2e30a5580de1bd6e7a05fd04f/knittejs},
booktitle = {Proceedings of the 21st ACM Symposium on Document Engineering},
doi = {10.1145/3469096.3474937},
interhash = {8220850f0e7cf5d5440189cffe13b01a},
intrahash = {409c8ed2e30a5580de1bd6e7a05fd04f},
isbn = {9781450385961},
keywords = {analysis clustering document k-means large-scale vis(us) vis-gis visus:ertl visus:knittejs visus:kochsn},
location = {Limerick, Ireland},
numpages = {4},
publisher = {Association for Computing Machinery},
series = {DocEng '21},
timestamp = {2022-02-14T14:36:27.000+0100},
title = {Efficient Sparse Spherical K-Means for Document Clustering},
url = {https://doi.org/10.1145/3469096.3474937},
year = 2021
}