Hugging Face AI Updates: May 20, 2026

1. AllenAI ships OlmoEarth v1.1 with a 3x cheaper tokenizer for satellite imagery

Hugging Face. AllenAI released OlmoEarth v1.1 on the Hub, an updated family of remote-sensing foundation models (Nano, Tiny, Base) that cut compute cost by up to 3x versus the v1 generation released last November while holding performance parity on internal research benchmarks and partner tasks. The savings come from redesigning how Sentinel-2 imagery is tokenized. v1 emitted one token per timestep per resolution (10m, 20m, 60m), giving an (H/p x W/p x T x 3) sequence; v1.1 collapses all three resolutions into a single token per timestep, giving an (H/p x W/p x T) sequence and a 3x shorter context. That matters because self-attention cost in a transformer grows quadratically with sequence length. A naive switch costs about 10 points on the m-eurosat kNN benchmark, but the team recovered the gap with a modified pre-training regimen (detailed in the technical report) rather than reintroducing per-band tokens, on the theory that cross-band relationships can be learned through training strategy instead of token layout. Weights ship in the allenai/olmoearth collection alongside the olmoearth_pretrain GitHub repo. Source
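To make the sequence-length arithmetic concrete, here is a minimal sketch that counts tokens under the two layouts described above. The function names and the example tile size, patch size, and timestep count are illustrative assumptions, not values from the OlmoEarth code or report.

```python
def sequence_lengths(height, width, patch, timesteps, resolutions=3):
    """Return (v1_tokens, v1_1_tokens) for one image time series.

    v1 layout:   one token per patch, per timestep, per resolution group.
    v1.1 layout: one token per patch, per timestep (resolutions merged).
    Hypothetical helper for illustration; not part of any OlmoEarth API.
    """
    patches = (height // patch) * (width // patch)
    v1_tokens = patches * timesteps * resolutions
    v1_1_tokens = patches * timesteps
    return v1_tokens, v1_1_tokens


def pairwise_attention_terms(num_tokens):
    """Number of query-key pairs in full self-attention over num_tokens."""
    return num_tokens * num_tokens


# Assumed example: a 64x64 pixel tile, patch size 8, 12 timesteps.
v1_len, v11_len = sequence_lengths(64, 64, 8, 12)
print(v1_len, v11_len)  # 2304 768
print(pairwise_attention_terms(v1_len) // pairwise_attention_terms(v11_len))  # 9
```

In this toy accounting the context is 3x shorter and the query-key pair count drops 9x, while per-token layers shrink only 3x. It is not an attempt to reproduce the reported "up to 3x" figure, which covers the models' overall compute.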

2. Johns Hopkins releases the Ettin reranker family, six ModernBERT cross-encoders from 17M to 1B

Hugging Face. Hugging Face highlighted a new open cross-encoder reranker family built on the Ettin ModernBERT backbones from Johns Hopkins, with six Apache 2.0 checkpoints at 17M, 32M, 68M, 150M, 400M, and 1B parameters and 8K-token context across the lineup. On MTEB(eng, v2) the 1B variant posts 0.6114 mean NDCG@10 while running 2.4x faster than its 1.54B teacher (mixedbread-ai/mxbai-rerank-large-v2) and staying within 0.0001 of its score. The 17M model beats ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10 with roughly half the parameters, which is useful for production retrieve-then-rerank pipelines where latency budgets are tight. Throughput on a single H100 ranges from 7,517 pairs per second at 17M down to 928 at 1B. The architecture uses RoPE, GeGLU, and unpadded Flash Attention 2 with a CLS-pooled classification head, and training was a single-stage pointwise MSE distillation on ~143M (query, document, teacher_score) triples drawn from LightOn pre-training and fine-tuning data. Sentence Transformers CrossEncoder loads the models directly via cross-encoder/ettin-reranker-{size}-v1 identifiers. Source
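For a rough sense of what those throughput numbers mean for a rerank step, the sketch below turns pairs per second into per-query latency and into a candidate budget. It assumes the quoted steady-state H100 throughput carries over to a single query's batch, which will usually understate real latency (tokenization, document length, and small-batch overhead are ignored). The helper names are hypothetical.

```python
# Throughput figures quoted above for a single H100 (pairs per second).
H100_PAIRS_PER_SECOND = {"17M": 7517, "1B": 928}


def rerank_latency_ms(num_candidates, pairs_per_second):
    """Estimated time in milliseconds to score num_candidates (query, doc) pairs."""
    return 1000.0 * num_candidates / pairs_per_second


def max_candidates(budget_ms, pairs_per_second):
    """Largest candidate count that fits in budget_ms at the given throughput."""
    return int(budget_ms * pairs_per_second) // 1000


for size, rate in H100_PAIRS_PER_SECOND.items():
    latency = rerank_latency_ms(100, rate)
    fits = max_candidates(50, rate)
    print(f"{size}: 100 candidates ~ {latency:.1f} ms; 50 ms budget fits {fits}")
```

Under these assumptions, reranking 100 candidates takes on the order of 13 ms with the 17M model and about 108 ms with the 1B model, so the smaller checkpoints are the ones that fit a tight per-query budget.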