Hugging Face AI Updates: May 23, 2026

1. NVIDIA ships Nemotron-Labs Diffusion LLMs with up to 6.4x token-per-forward speedups over autoregressive baselines

Hugging Face. NVIDIA released the Nemotron-Labs Diffusion family on the Hub, with base and instruction-tuned text models at 3B, 8B, and 14B parameters plus an 8B vision-language model, converting pretrained autoregressive checkpoints into hybrid diffusion models through continued pretraining on 1.3T tokens followed by 45B tokens of post-training. The 8B model posts a 1.2% average accuracy gain over Qwen3 8B while delivering 2.6x tokens-per-forward in pure diffusion mode, 6x with linear self-speculation, and 6.4x with quadratic self-speculation, hitting roughly 865 tok/s on a B200 in the speedbench dataset. The architecture uses a block-wise attention mechanism that preserves KV-cache compatibility, supports three switchable inference modes (autoregressive, FastDiffuser 32-token block generation, and LinearSpec bidirectional drafting with causal verification that is lossless versus AR at temperature 0), and integrates with SGLang through a single-line config change. Text models ship under the commercially-friendly NVIDIA Nemotron Open Model License while the VLM uses the research-focused NVIDIA Source Code License, with training code available through the Megatron Bridge framework. Source

2. Dharma AI argues specialized 3B OCR model beats Claude Opus 4.6 at 1/52nd the cost on Brazilian Portuguese

Hugging Face. Dharma AI published a procurement-focused study contending that distributional alignment between a model’s training history and the deployment task is more predictive of performance than raw parameter count, using Brazilian Portuguese OCR as the empirical anchor. On their benchmark Nanonets-OCR2-3B scored 0.911 with a 0.20% text degeneration rate while costing roughly 52x less than Claude Opus 4.6 (0.833), outperforming Gemini 3.1 Pro (0.820), GPT-5.4 (0.750), and GPT-4o (0.635) on the same task. The team also reports that specialization compounds in stages: identical OCR training applied to the pre-specialized Nanonets-OCR2-3B versus the generalist Qwen2.5-VL-3B yielded a 16% quality improvement (0.921 vs 0.793) and a 7x reduction in failure rate, supporting the case that starting from a domain-aligned checkpoint pays dividends under fixed training budgets. The authors flag the evidence is bounded to a single domain and explicitly do not argue frontier models are inferior, but rather that training history is underweighted in current enterprise procurement frameworks that lean on public benchmark leaderboards. Source