Abstract
Model compression is an active research field aimed at reducing the size and computational cost of models. In a recent noteworthy line of work, ToMe and its variants use the Bipartite Soft Matching (BSM) algorithm, in which tokens representing image patches are split into two sets and the top-k most similar tokens from one set are merged into the other. This approach reuses pre-trained weights, improves speed, and reduces memory usage. However, these algorithms have drawbacks. First, the choice of token-splitting strategy significantly influences performance, since tokens in one set can only perceive tokens in the other set, leading to mis-merging. Furthermore, although ToMe is effective in the initial layers, it becomes increasingly problematic in deeper layers as the number of tokens diminishes, because informative tokens are damaged. To address these limitations, rather than relying on a specific splitting strategy such as BSM, we propose a new algorithm called PiToMe, which prioritizes the protection of informative tokens using an additional factor called the energy score. In experiments, PiToMe achieved up to a 50% memory reduction while exhibiting superior off-the-shelf performance over previous BSM-based approaches that rely solely on token similarity: a 1.71% average performance drop on image classification (versus 2.6% for ToMe) and a 1.35% average drop on image-text retrieval (versus 6.89% for ToMe).
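To make the BSM mechanism discussed above concrete, the following is a minimal sketch of bipartite soft matching on token embeddings. It is an illustrative NumPy implementation under assumed conventions (alternating split into sets A and B, cosine similarity, merging by averaging), not the authors' actual code; function and variable names are hypothetical.

```python
import numpy as np

def bipartite_soft_matching(x: np.ndarray, r: int) -> np.ndarray:
    """Illustrative sketch of Bipartite Soft Matching (BSM).

    x: (n, d) array of token embeddings; r: number of tokens to merge.
    Tokens are split alternately into sets A and B; each A-token can only
    "see" B-tokens, which is the mis-merging limitation noted above.
    """
    a, b = x[::2], x[1::2]                       # alternating split into two sets
    a_n = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=-1, keepdims=True)
    scores = a_n @ b_n.T                         # cosine similarity, A -> B only
    best_idx = scores.argmax(axis=-1)            # each A-token's closest B-token
    best_val = scores.max(axis=-1)
    order = np.argsort(-best_val)                # A-tokens ranked by similarity
    merged_a, kept_a = order[:r], order[r:]      # merge the r most similar
    b = b.copy()
    for i in merged_a:                           # average merged pairs into B
        j = best_idx[i]
        b[j] = (b[j] + a[i]) / 2.0
    return np.concatenate([a[kept_a], b], axis=0)  # n - r tokens remain
```

With n = 10 input tokens and r = 3, the output has 7 tokens: 3 A-tokens are absorbed into their most similar B-tokens, which is where an unlucky split can destroy informative tokens in deeper layers.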