📰 Updated 2026-05-07 17:00
🔸 How Unsloth and Nvidia made LLM training 25% faster on consumer GPUs
🔗 How Unsloth and Nvidia made LLM training 25% faster on consumer GPUs
🔥 9 points
Original:
The model still needs to know where each original sequence starts and ends. So, alongside the packed tokens, we carry sequence metadata such as:
- sequence lengths
- cumulative sequence offsets (cu_seqlens)
- the maximum sequence length
- attention structure derived from the three items above
Auto-updated · Article extraction · Bilingual translation