Hugging Face AI Updates: May 18, 2026

1. NVIDIA publishes a single-GPU LoRA recipe for Cosmos Predict 2.5 robot video generation

Hugging Face. A new walkthrough from NVIDIA shows how to fine-tune the 2B-parameter Cosmos Predict 2.5 world model for domain-specific robot video generation parameter-efficiently, using LoRA and DoRA adapters. The recipe injects roughly 50M trainable parameters into the DiT’s attention and feedforward projections at rank 32, leaving the VAE, text encoder, and base DiT weights frozen. On the open nvidia/GR1-100 dataset (92 robot manipulation videos), it reports a 17-hour run on a single H100, or 2.5 hours on 8x H100s. The fine-tuned model improves on the base model in Sampson error (a measure of geometric consistency), LLM-as-judge physical plausibility scores, and instruction following on hand selection and object interactions. The team finds that rank 8 and rank 32 converge similarly on geometry but that rank 32 helps instruction following, and that DoRA’s magnitude-direction split mostly matters at very low ranks. Source
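
To make the rank comparison concrete, here is a rough Python sketch of how adapter size scales with rank. It is not taken from NVIDIA's recipe: the layer widths, block count, and the helper names (lora_params, dora_params, adapter_total) are hypothetical placeholders, and the DoRA count assumes one learned magnitude scalar per output feature.

```python
# Back-of-the-envelope adapter parameter counting.
# All layer sizes below are hypothetical placeholders, NOT the Cosmos Predict 2.5 config.


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    The weight update is factored as B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def dora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA parameters plus a learned magnitude vector.

    Assumption: one magnitude scalar per output feature.
    """
    return lora_params(d_in, d_out, rank) + d_out


def adapter_total(layers, rank, per_layer=lora_params):
    """Sum adapter parameters over a list of (d_in, d_out) layer shapes."""
    return sum(per_layer(d_in, d_out, rank) for d_in, d_out in layers)


# Hypothetical transformer block: four attention projections and two feedforward projections.
HIDDEN, FFN, BLOCKS = 2048, 8192, 28
block_layers = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, FFN), (FFN, HIDDEN)]
all_layers = block_layers * BLOCKS

for rank in (8, 32):
    lora = adapter_total(all_layers, rank)
    dora = adapter_total(all_layers, rank, per_layer=dora_params)
    print(f"rank {rank:>2}: LoRA {lora:,} params, DoRA {dora:,} params")
```

Under these assumptions the LoRA count grows linearly with rank, so rank 32 carries four times the trainable parameters of rank 8, while the extra DoRA magnitude parameters stay the same size at every rank.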

2. PaddleOCR 3.5 swaps in a Transformers backend for OCR and document parsing

Hugging Face. PaddlePaddle published PaddleOCR 3.5 with a new Transformers backend that lets users run the toolkit’s OCR and document parsing pipelines directly through the Hugging Face Transformers library instead of through the Paddle runtime. The change makes the models easier to integrate into existing Transformers-based pipelines, simplifies inference on standard GPU stacks, and brings PaddleOCR closer to the rest of the document-AI ecosystem on the Hub. The release is positioned as a practical option for teams that want PaddleOCR’s accuracy on Asian scripts and structured documents without taking on the Paddle dependency. Source

3. IBM Research launches the Open Agent Leaderboard to benchmark full agent systems

Hugging Face. IBM Research, in collaboration with Hugging Face, launched the Open Agent Leaderboard, an evaluation framework that scores complete agent systems rather than just the underlying models. The leaderboard runs a unified protocol across six tasks: SWE-Bench Verified for bug fixing, BrowseComp+ for deep web research, AppWorld for multi-app personal tasks, and tau2-Bench Airline, Retail, and Telecom for policy-following customer service. It reports success rate alongside cost per task, including the extra cost of failed runs, which cost 20-54% more than successful ones. The headline conclusion from the launch data is that agent architecture matters as much as model choice: the same underlying model produces very different outcomes once planning, memory, tool use, and error recovery differ. The leaderboard, the Exgentic evaluation framework, and the methodology paper are all open. Source
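
As an illustration of the two headline metrics, the sketch below computes success rate, mean cost per task, and the failed-run markup from a list of run records. It is an assumption-laden example, not the Exgentic framework's code: the Run record, the summarize helper, and the dollar figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One agent attempt at one task (hypothetical record layout)."""
    task: str
    success: bool
    cost_usd: float


def summarize(runs):
    """Return success rate, mean cost per task, and failed-run markup.

    The markup is how much more an average failed run costs than an
    average successful one, e.g. 0.25 means 25% more.
    """
    if not runs:
        raise ValueError("need at least one run")
    ok = [r.cost_usd for r in runs if r.success]
    bad = [r.cost_usd for r in runs if not r.success]
    total = sum(r.cost_usd for r in runs)
    markup = None
    if ok and bad:
        markup = (sum(bad) / len(bad)) / (sum(ok) / len(ok)) - 1
    return {
        "success_rate": len(ok) / len(runs),
        "mean_cost_per_task": total / len(runs),
        # Total spend divided by solved tasks: failures still have to be paid for.
        "cost_per_solved_task": total / len(ok) if ok else None,
        "failed_run_markup": markup,
    }


# Made-up example data, not leaderboard results.
runs = [
    Run("task-a", True, 2.0),
    Run("task-b", True, 2.0),
    Run("task-c", True, 2.0),
    Run("task-d", False, 2.5),
]
summary = summarize(runs)
print(summary)
```

Dividing total spend by solved tasks rather than attempted tasks is one way to show why failures matter for cost: every failed run raises the effective price of each success.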