EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance
Published in arXiv, 2024
Recommended citation: Yingxin Li, Ye Li, Yuan Meng, Xinzhu Ma, Zihan Geng, Shu-Tao Xia, Zhi Wang. "EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance." arXiv:2412.08521, 2024. https://arxiv.org/abs/2412.08521
EMS compresses the KV cache of large language models via an adaptive evict-then-merge strategy that allocates budget head-wise based on global-local importance.