Meta AI Updates: June 24, 2026

1. PyTorch blog details 5x throughput gain serving DeepSeek-V4 on GB300 with SGLang

Meta. A post on the PyTorch blog reports that SGLang reached roughly 11,200 tokens per second per GPU serving DeepSeek-V4 on NVIDIA GB300 hardware. That is a 5x throughput improvement over the day-zero launch, measured at the same per-user interactivity of about 50 tokens per second. The gains came from kernel optimizations, including MHC fusion, KV Compression V2, and W4A4 MegaMoE, together with runtime changes: improved SWA budgeting, better disaggregated decode admission, and support for breakable CUDA graphs. The work also included bug fixes across SGLang and Dynamo to stabilize the serving stack. Source
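The two headline numbers are related by simple arithmetic. The sketch below makes two assumptions that the post does not state: the per-GPU figure counts generated tokens, and every concurrent request decodes at the stated ~50 tokens per second. Under those assumptions, dividing per-GPU throughput by the per-user rate gives the approximate number of simultaneous streams per GPU. Dividing the reported figure by the 5x speedup gives the implied day-zero throughput. The function names are hypothetical and do not come from the post.

```python
# Back-of-envelope arithmetic for the figures reported in the post.
# Assumptions (not stated in the post): throughput counts generated tokens,
# and every concurrent request decodes at the ~50 tok/s interactivity target.

REPORTED_TPS_PER_GPU = 11_200  # "roughly", per the post
PER_USER_TPS = 50              # "about", per the post
SPEEDUP = 5                    # vs. the day-zero launch


def implied_baseline(tps_per_gpu: float, speedup: float) -> float:
    """Per-GPU throughput implied for the day-zero launch (hypothetical helper)."""
    return tps_per_gpu / speedup


def implied_concurrency(tps_per_gpu: float, per_user_tps: float) -> float:
    """Approximate simultaneous streams one GPU sustains (hypothetical helper)."""
    return tps_per_gpu / per_user_tps


if __name__ == "__main__":
    baseline = implied_baseline(REPORTED_TPS_PER_GPU, SPEEDUP)
    print(f"day-zero baseline: ~{baseline:,.0f} tok/s/GPU")
    print(f"streams per GPU now: ~{implied_concurrency(REPORTED_TPS_PER_GPU, PER_USER_TPS):,.0f}")
    print(f"streams per GPU at launch: ~{implied_concurrency(baseline, PER_USER_TPS):,.0f}")
```

Under these assumptions, a 5x gain at fixed interactivity amounts to roughly 5x more concurrent users per GPU: about 45 at launch versus about 224 now.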