Abstract
Model compression is an active research field aimed at reducing the size and computational cost of models. In a recent noteworthy line of work, ToMe and its variants use the Bipartite Soft Matching (BSM) algorithm, in which tokens representing image patches are split into two sets and the top-k most similar tokens from one set are merged into the other. This approach reuses pre-trained weights, improves speed, and reduces memory usage. However, these algorithms have drawbacks. First, the choice of token-splitting strategy significantly influences performance, since tokens in one set can only perceive tokens in the other set, leading to mis-merging. Furthermore, although ToMe is effective in the initial layers, it becomes increasingly problematic in deeper layers as the number of tokens diminishes, because informative tokens are damaged. To address these limitations, rather than relying on a specific splitting strategy such as BSM, we propose a new algorithm called PiToMe, which prioritizes the protection of informative tokens using an additional factor called the energy score. In experiments, PiToMe achieved up to a 50% memory reduction while exhibiting superior off-the-shelf performance over previous BSM-based approaches that rely solely on token similarity: a 1.71% average performance drop on image classification (versus 2.6% for ToMe) and a 1.35% average drop on image-text retrieval (versus 6.89% for ToMe).
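To make the BSM mechanism discussed above concrete, the following is a minimal sketch of bipartite soft matching on token embeddings. It is an illustrative NumPy implementation under assumed conventions (alternating split into sets A and B, cosine similarity, merging by averaging), not the authors' actual code; function and variable names are hypothetical.

```python
import numpy as np

def bipartite_soft_matching(x: np.ndarray, r: int) -> np.ndarray:
    """Illustrative sketch of Bipartite Soft Matching (BSM).

    x: (n, d) array of token embeddings; r: number of tokens to merge.
    Tokens are split alternately into sets A and B; each A-token can only
    "see" B-tokens, which is the mis-merging limitation noted above.
    """
    a, b = x[::2], x[1::2]                       # alternating split into two sets
    a_n = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=-1, keepdims=True)
    scores = a_n @ b_n.T                         # cosine similarity, A -> B only
    best_idx = scores.argmax(axis=-1)            # each A-token's closest B-token
    best_val = scores.max(axis=-1)
    order = np.argsort(-best_val)                # A-tokens ranked by similarity
    merged_a, kept_a = order[:r], order[r:]      # merge the r most similar
    b = b.copy()
    for i in merged_a:                           # average merged pairs into B
        j = best_idx[i]
        b[j] = (b[j] + a[i]) / 2.0
    return np.concatenate([a[kept_a], b], axis=0)  # n - r tokens remain
```

With n = 10 input tokens and r = 3, the output has 7 tokens: 3 A-tokens are absorbed into their most similar B-tokens, which is where an unlucky split can destroy informative tokens in deeper layers.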