Hugging Face AI Updates: June 26, 2026

1. Hugging Face Jobs Can Now Launch a vLLM Server in One Command

Hugging Face published a guide showing how developers can spin up a private, OpenAI-compatible vLLM server on HF Jobs infrastructure with a single command, removing the need to manually provision servers. The service runs on pay-per-second billing and targets short-lived workloads such as experiments, one-off evaluations, batch generation, and quickly testing a model. The post positions Jobs as complementary to Hugging Face Inference Endpoints, which remains suited to production scenarios that need scale-to-zero and finer access controls. Source

2. Allen Institute Study Maps Which Tokens Hybrid Models Predict Better

Hugging Face hosted a post from the Allen Institute for AI comparing Olmo 3, a transformer, against Olmo Hybrid, which mixes attention and recurrent layers, to see where each architecture wins token by token. The hybrid model performed better on meaning-bearing tokens such as nouns, verbs, and adjectives, along with pronouns that require tracking context, while the transformer was stronger at predicting repeated sequences and closing brackets where exact recall matters. The researchers argue that a single average loss hides these differences and recommend evaluating models by token category to guide future hybrid design. Source