Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22 → .76. No training

📰 Updated 2026-03-19 09:00

🔸 Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22 → .76. No training

🔗 Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
🔥 26 points

Original:
I replicated Ng’s RYS method and found that duplicating 3 specific layers in Qwen2.5-32B boosts reasoning by 17% and duplicating layers 12-14 in Devstral-24B improves logical deduction from 0.22→0.76 on BBH — no training, no weight changes, just routing hidden states through the same circuit twice. Tools included. Two AMD GPUs, one evening. Duplicate 3 layers. No training. Logical deduction goes from 0.22 → 0.76. This toolkit finds and exploits “reasoning circuits” hidden inside transformer m…
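The post's toolkit itself isn't shown, but the core trick — running hidden states through the same block of layers twice, with no new weights — is easy to sketch. Below is a minimal, hypothetical illustration using toy PyTorch modules; `ToyBlock` and `duplicate_layers` are illustrative names, not the post's actual API, and the real method presumably targets specific layers found by a search over benchmark scores.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one transformer decoder layer (residual + MLP)."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.tanh(self.ff(x))

def duplicate_layers(layers, start, end):
    """Return a layer list where layers[start:end] appear twice in sequence.

    The duplicated entries are the *same* module objects, so no weights
    are copied or trained -- the hidden state simply passes through the
    same circuit a second time.
    """
    layers = list(layers)
    return nn.ModuleList(layers[:end] + layers[start:end] + layers[end:])

# Toy 6-layer "model"; duplicate layers 2-4 (0-indexed, half-open [2, 5)),
# analogous to duplicating layers 12-14 in Devstral-24B.
dim = 8
original = nn.ModuleList([ToyBlock(dim) for _ in range(6)])
expanded = duplicate_layers(original, 2, 5)

x = torch.randn(2, dim)
for blk in expanded:   # forward pass through the expanded stack
    x = blk(x)

print(len(expanded))                      # 9 blocks, still 6 sets of weights
print(expanded[2] is expanded[5])         # True: shared module, no new params
```

In a real Hugging Face-style model the same idea would amount to rebuilding `model.model.layers` with the chosen slice repeated; since the repeated entries alias the originals, memory cost is unchanged and only inference time grows.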



Auto-updated · Full-text fetch · Bilingual translation
