Hugging Face AI Updates: June 12, 2026

1. Allen AI Releases olmo-eval, an Open-Source Evaluation Workbench for LLM Development

Allen AI published olmo-eval on the Hugging Face blog, describing it as an evaluation framework that builds on OLMES and extends across the rest of the LLM development loop. The framework separates benchmark logic from runtime execution through a task, suite, and harness abstraction, and it adds a sandbox layer that supports tool-use evaluation including code execution and web browsing. It records configurations and results in a normalized experiment schema, provides a results viewer for checkpoint-to-checkpoint comparisons, and treats multi-turn and agentic evaluation as primary use cases. The code is available at the allenai/olmo-eval GitHub repository. Source
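
To make the task, suite, and harness split concrete, here is a minimal sketch in plain Python. It is not the olmo-eval API: the class names `Task`, `Suite`, and `Harness`, the exact-match scoring rule, and the `toy_model` function are all hypothetical and chosen only for illustration. The idea it shows is that benchmark logic (examples and scoring) lives in the task, while the harness owns how the model is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """Benchmark logic only: examples plus a scoring rule (hypothetical)."""
    name: str
    examples: List[Tuple[str, str]]  # (prompt, expected answer)

    def score(self, prediction: str, expected: str) -> float:
        # Exact match after trimming whitespace; real tasks may score differently.
        return 1.0 if prediction.strip() == expected else 0.0


@dataclass
class Suite:
    """A named group of tasks that are run together (hypothetical)."""
    name: str
    tasks: List[Task]


class Harness:
    """Runtime execution only: knows how to call the model (hypothetical)."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model

    def run(self, suite: Suite) -> Dict[str, float]:
        results: Dict[str, float] = {}
        for task in suite.tasks:
            scores = [task.score(self.model(prompt), expected)
                      for prompt, expected in task.examples]
            results[task.name] = sum(scores) / len(scores)
        return results


def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call; an assumption for this sketch.
    answers = {"2+2=": "4", "capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


suite = Suite("demo", [
    Task("arithmetic", [("2+2=", "4"), ("3+3=", "6")]),
    Task("geography", [("capital of France?", "Paris")]),
])
results = Harness(toy_model).run(suite)
print(results)
```

Because the harness only sees a callable, the same suite could in principle be rerun against a different checkpoint or execution backend without touching the task definitions.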

2. Hugging Face Publishes a PyTorch Profiling Guide on Fusing MLP Operations

Hugging Face published the second part of its PyTorch profiling series on its blog. The post, authored by Aritra Roy Gosthipaty, Rémi Ouazan Reboul, Sergio Paniego, and Sayak Paul, focuses on profiling and fusing multilayer perceptron (MLP) operations on the GPU. It explains that nn.Linear already fuses bias addition into the matrix multiplication epilogue, that torch.compile merges the GeLU and the multiplication into a single Triton kernel when three linear layers form a GeGLU MLP, and that the kernels library offers a pre-built LigerGEGLUMLP with the fusion baked into hardware-optimized code. It reports that the compile-generated kernels ran in roughly 89.4 microseconds but required recompilation when input shapes changed, while the hand-tuned kernels ran in 92.8 microseconds and stayed robust across varying input dimensions. Source
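
As a rough illustration of what fusing the GeLU and the multiplication means, the sketch below writes a GeGLU MLP in plain Python rather than PyTorch. It assumes the common formulation down(gelu(gate(x)) * up(x)) with bias-free layers and the exact erf-based GeLU. The function names and the tiny weight matrices are made up for this example and are not taken from the post. The unfused version makes two element-wise passes and materializes an intermediate list, while the fused version does both operations in one pass, which is the effect a fused GPU kernel achieves at much larger scale.

```python
import math


def gelu(x: float) -> float:
    """Exact (erf-based) GeLU."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(weight, x):
    """Bias-free matrix-vector product; weight is a list of rows (out x in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def geglu_mlp_unfused(x, w_gate, w_up, w_down):
    gate = linear(w_gate, x)
    up = linear(w_up, x)
    activated = [gelu(g) for g in gate]               # pass 1: writes an intermediate
    hidden = [a * u for a, u in zip(activated, up)]   # pass 2: reads it back
    return linear(w_down, hidden)


def geglu_mlp_fused(x, w_gate, w_up, w_down):
    gate = linear(w_gate, x)
    up = linear(w_up, x)
    hidden = [gelu(g) * u for g, u in zip(gate, up)]  # one pass, no intermediate
    return linear(w_down, hidden)


# Made-up toy weights: 2 input features, 3 hidden units, 2 outputs.
x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
w_down = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]

y_unfused = geglu_mlp_unfused(x, w_gate, w_up, w_down)
y_fused = geglu_mlp_fused(x, w_gate, w_up, w_down)
print(y_unfused, y_fused)
```

Both functions compute the same values; the difference is only in how many times the hidden activations are written and read, which is what the timing comparison in the post is about.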