Daily News · 2 min read

NVIDIA AI Updates: April 25, 2026

1. NVIDIA Posts Day-0 DeepSeek V4 Recipes Targeting 1M-Token Inference on Blackwell

NVIDIA. Coinciding with DeepSeek’s launch of V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B / 13B active), NVIDIA published serving recipes for both models on Blackwell GB200 NVL72 and B300 systems via SGLang and vLLM, including multinode prefill/decode disaggregation that scales past 100 GPUs. NVIDIA reports out-of-the-box throughput of more than 150 tokens/sec/user on V4-Pro at the model’s full 1M-token context window, citing DeepSeek’s 73% FLOP reduction and 90% KV cache reduction versus V3.2 as the reasons long-context agentic workloads are now economical. The models are also live as hosted endpoints on build.nvidia.com and as day-0 NIM downloads. Source
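The economics claim is easiest to see as back-of-envelope KV cache arithmetic. The sketch below uses hypothetical layer counts, widths, and precision (not DeepSeek V4's actual architecture, which NVIDIA's post does not detail) to show how a 90% KV cache reduction changes per-sequence memory at a 1M-token context:

```python
def kv_cache_gib(tokens, n_layers, kv_dim, bytes_per_elem, reduction=0.0):
    """Per-sequence KV cache size in GiB: 2 tensors (K and V) per layer,
    kv_dim values each, per token, scaled by an architectural reduction."""
    raw_bytes = 2 * n_layers * kv_dim * bytes_per_elem * tokens
    return raw_bytes * (1.0 - reduction) / 2**30

# Hypothetical 60-layer model with an 8192-wide KV projection per layer,
# stored in 1-byte (FP8) precision, at the full 1M-token context window:
baseline = kv_cache_gib(1_000_000, 60, 8192, 1)                 # ~915 GiB/sequence
reduced  = kv_cache_gib(1_000_000, 60, 8192, 1, reduction=0.90)  # ~92 GiB/sequence
```

Under these illustrative numbers, a single 1M-token sequence drops from roughly a full GB200 NVL72 rack's worth of HBM for cache alone to something a handful of GPUs can hold, which is the sense in which long-context agentic workloads become economical.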

2. NVIDIA FLARE Adds Two-Step API for Drop-In Federated Learning

NVIDIA. The latest NVIDIA FLARE release reframes federated learning around a two-step workflow: a minimal client API (flare.init, flare.receive, flare.send) that drops into existing PyTorch or PyTorch Lightning training scripts without forcing Executor or Learner inheritance, and Python-based job recipes that replace the older JSON job configurations. NVIDIA frames the change as closing a familiar gap: federated pilots that work locally often stall on the way to production because federation has historically required deep refactoring of training code. The post positions FLARE as the runtime layer for regulated and data-sovereignty-bound deployments where centralizing data is off the table. Source
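The shape of a converted training script can be sketched as follows. This is an illustrative sketch, not FLARE's implementation: in a real deployment you would import nvflare.client as flare, and the FlareStub class below is a hypothetical stand-in that only mimics the receive/send handshake so the control flow runs without a federation server.

```python
class FlareStub:
    """Hypothetical stand-in for nvflare.client, serving two federated rounds."""
    def __init__(self, rounds=2):
        self.rounds = rounds
        self.global_weights = {"w": 0.0}

    def init(self):
        pass  # the real flare.init() attaches the script to the FLARE runtime

    def is_running(self):
        return self.rounds > 0

    def receive(self):
        return dict(self.global_weights)  # the real call yields the current global model

    def send(self, local_weights):
        # the real call ships local updates back for server-side aggregation
        self.global_weights = local_weights
        self.rounds -= 1


flare = FlareStub()

flare.init()                    # step 1: initialize once at script start
while flare.is_running():
    model = flare.receive()     # pull the current global model
    model["w"] += 1.0           # placeholder for the existing local training loop
    flare.send(model)           # step 2: return the locally trained weights
```

The point of the pattern is that the existing training loop stays where it is; the federation boundary is just the receive/send pair wrapped around it.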