Meta AI Updates: May 20, 2026
1. PyTorch ships aarch64 CUDA wheels on the default PyPI index
Meta. The PyTorch core team (Alban Desmaison, Nikita Shulga, Andrey Talman) coordinated with NVIDIA and vLLM through the PyTorch Foundation’s Technical Advisory Council to publish CUDA-enabled aarch64 Linux wheels on the default PyPI index, starting with PyTorch 2.11. Previously, pip install torch on Grace Hopper and Grace Blackwell systems (GH200, GB200, GB300) silently resolved to CPU-only wheels, forcing users to pin --index-url https://download.pytorch.org/whl/cu128. The new build dynamically links NCCL and cuBLAS to keep wheel size manageable. vLLM’s use_existing_torch.py and no-build-isolation-package workarounds are no longer needed for standard installs. Source
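Because the old failure mode was silent, a quick check of which build pip actually resolved can be useful after installing. The following is a minimal sketch, not part of the announcement: the helper name classify_torch_build is hypothetical, and it relies only on torch.version.cuda, which is None on CPU-only builds and a version string on CUDA builds.

```python
import platform


def classify_torch_build(machine, cuda_version):
    """Label a torch build from the machine string and torch.version.cuda.

    Hypothetical helper for illustration; cuda_version is None for CPU-only wheels.
    """
    # Linux reports "aarch64"; some platforms report "arm64" for the same ISA.
    arch = "aarch64" if machine.lower() in ("aarch64", "arm64") else machine.lower()
    if cuda_version is None:
        return f"{arch}: CPU-only build"
    return f"{arch}: CUDA {cuda_version} build"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        # On an aarch64 GPU host, a CPU-only result means pip resolved
        # the wrong wheel.
        print(classify_torch_build(platform.machine(), torch.version.cuda))
```

On an aarch64 GPU host, a "CPU-only build" result would indicate the silent fallback described above.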
2. ExecuTorch lands an experimental MLX delegate for Apple Silicon GPUs
Meta. Meta’s ExecuTorch team shipped a new MLX delegate that lowers torch.export graphs through MLXPartitioner and runs the resulting .pte file on Apple Silicon GPUs via Apple’s MLX framework. The team reports 3-6x higher throughput on generative AI workloads versus existing macOS backends. The delegate covers roughly 90 ATen ops, including quantized matmul, multi-head attention, rotary embeddings, MoE routing, and recurrent state-space operations, and supports BF16/FP16/FP32, 2/4/8-bit TorchAO affine quantization, and NVFP4. It has been validated on Llama 3.2, Qwen 3, Phi-4, Gemma 3, Qwen 3.5 35B (256-expert MoE), Whisper, Mistral Voxtral, and NVIDIA Parakeet, and it uses the same C++/Python runtime API as the XNNPACK, CoreML, Vulkan, and CUDA backends. Source
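The export-then-lower flow can be sketched with ExecuTorch's general lowering API. This is a hedged illustration, not the MLX delegate's documented usage: it assumes the MLX partitioner plugs into to_edge_transform_and_lower like other ExecuTorch partitioners, the helper name lower_to_pte is hypothetical, and the partitioner is passed in as an argument because the announcement does not give the import path for MLXPartitioner.

```python
def lower_to_pte(model, example_inputs, partitioner, out_path):
    """Export `model`, delegate supported subgraphs, and write a .pte file.

    Hypothetical helper: `partitioner` is an already-constructed ExecuTorch
    partitioner instance, and `example_inputs` is a tuple of sample inputs.
    """
    # Imported lazily so the sketch can be loaded without the packages installed.
    import torch
    from executorch.exir import to_edge_transform_and_lower

    # 1. Capture the model as a torch.export graph.
    exported = torch.export.export(model, example_inputs)

    # 2. Let the partitioner claim the ops its backend supports and lower them.
    edge = to_edge_transform_and_lower(exported, partitioner=[partitioner])

    # 3. Serialize to the ExecuTorch program format and write it to disk.
    program = edge.to_executorch()
    with open(out_path, "wb") as f:
        f.write(program.buffer)
    return out_path
```

Taking the partitioner as a parameter keeps the same helper usable for any backend, which matches the point that the runtime API is shared across delegates.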