TurboQuant model weight compression support added to Llamacpp

📰 Updated 2026-04-04 18:30

🔸 TurboQuant model weight compression support added to Llamacpp

🔥 10 points

Original:
Adds CUDA dequantization for TQ4_1S (5.0 bpv) and TQ3_1S (4.0 bpv) WHT-rotated weight compression types. These achieve 27-37% model size reduction at +1.0-1.9% PPL on Qwen/Phi families. Base types + Metal + CPU quantize/dequant from TheTom’s PR TheTom#45. CUDA additions:

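Translation:
This change adds CUDA dequantization for the TQ4_1S (5.0 bpv) and TQ3_1S (4.0 bpv) WHT-rotated weight compression types. On the Qwen and Phi model families, these types reduce model size by 27-37% at a cost of a 1.0-1.9% increase in perplexity (PPL). The base types, together with the Metal and CPU quantize/dequantize paths, come from TheTom's PR TheTom#45. CUDA additions: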
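
Background note (not part of the PR, and not the CUDA additions themselves): the following is a minimal Python sketch of what "WHT-rotated" quantization could look like, assuming WHT stands for the Walsh-Hadamard transform. The block size of 32, the symmetric 4-bit rounding, and the function names fwht, quantize_block, and dequantize_block are all hypothetical choices made for illustration; the actual TQ4_1S and TQ3_1S bit layouts are not described in the text above and are not reproduced here.

```python
import math
import random


def fwht(values):
    """Orthonormal Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in out]


def quantize_block(rotated, bits=4):
    """Symmetric round-to-nearest quantization, one scale per block (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in rotated)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in rotated]
    return scale, codes


def dequantize_block(scale, codes):
    """Rescale the integer codes, then undo the rotation."""
    rotated = [c * scale for c in codes]
    # The orthonormal WHT is its own inverse, so applying it again un-rotates.
    return fwht(rotated)


random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(32)]
block[5] = 12.0  # a single outlier weight; the rotation spreads it across the block

scale, codes = quantize_block(fwht(block))
restored = dequantize_block(scale, codes)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(block, restored)) / len(block))
print(f"RMS reconstruction error: {err:.4f}")
```

The idea this sketch tries to convey is that rotating a block before quantizing spreads an outlier's energy over all positions, so a single per-block scale wastes less precision. A dequantization path then has to rescale and apply the inverse rotation, which is presumably the step a GPU kernel would need to implement.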

