📰 2026-03-20 04:00 更新
🔸 NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute / NanoGPT Slowrun:无限算力下实现10倍数据效率
🔗 NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute
🔥 27 points
原文:
We’ve achieved 10x data efficiency with NanoGPT Slowrun within a few weeks. An ensemble of 1.8B parameter models (18B total params) trained on 100M tokens matches what would normally require 1B tokens with a standard LM baseline. Data efficiency matters because compute grows much faster than data. Since our current scaling laws require proportional increases in both, intelligence will eventually be bottlenecked by data, not compute. This data efficiency result allows us to improve model pe…
译文:
我们在几周内使用NanoGPT Slowrun实现了10倍的数据效率。一个由1.8B参数模型组成的集成(总参数18B)在100M tokens上训练,即可达到标准LM基线通常需要1B tokens才能达到的水平。数据效率之所以重要,是因为算力的增长速度远快于数据。由于当前的缩放定律要求两者按比例同步增加,智能最终将受限于数据而非算力。这一数据效率结果使我们能够提升模型pe…
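原文提到"1.8B参数模型的集成(总参数18B)",即约10个独立训练的成员模型。下面是一个最小示意(假设做法:在概率空间对各成员的下一词分布取平均;函数名与细节均为示例假设,并非原帖实现):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_next_token_probs(logits_per_model):
    # logits_per_model: shape (K, vocab) — one logit vector per ensemble member.
    # Average in probability space (one common ensembling choice; the post
    # does not specify its exact combination rule).
    probs = softmax(np.asarray(logits_per_model))
    return probs.mean(axis=0)

rng = np.random.default_rng(0)
K, vocab = 10, 8  # 10 members, matching 18B total / 1.8B per model
logits = rng.normal(size=(K, vocab))
p = ensemble_next_token_probs(logits)  # a valid distribution over the vocab
```

按此理解,"10倍数据效率"指该集成在100M tokens上的表现与单个基线模型在1B tokens上的表现相当(1B / 100M = 10)。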
自动更新 · 正文抓取 · 双语翻译