AI News: May 4, 2026

1. Xiaomi Releases MIT-Licensed MiMo-V2.5 and V2.5-Pro for Hours-Long Autonomous Coding

Xiaomi. MiMo-V2.5-Pro, a 1.02-trillion-parameter mixture-of-experts model with a 1M-token context window, is released as open weights under the MIT license and targets long-horizon agentic work. In Xiaomi's published runs, it built a SysY-to-RISC-V compiler in Rust over 4.3 hours and 672 tool calls (passing all 233 of the course's hidden tests) and produced an 8,192-line video editor across 1,868 tool calls and 11.5 hours of autonomous work. On Xiaomi's MiMo Coding Bench it scores 73.7 versus 77.1 for Claude Opus 4.6, while reportedly using 40–60% fewer tokens than Claude Opus 4.6 or Gemini 3.1 Pro on equivalent agentic tasks. Source

2. Harvard Study Finds Frontier LLM Beats Two Human Doctors on ER Diagnostic Accuracy

Harvard / Beth Israel. Researchers at Harvard Medical School and Beth Israel Deaconess Medical Center evaluated frontier language models against two physicians on a battery of emergency-room diagnostic scenarios and found that at least one model produced more accurate diagnoses than both clinicians. The paper joins a growing line of LLM-vs-doctor evaluations but stands out for its emergency-department setting, where time pressure and incomplete information typically favor experienced humans. The authors caution that diagnostic accuracy on vignettes does not translate directly to clinical deployment, where workflow integration, liability, and the cost of false positives remain unresolved. Source

3. MIT Paper Pins Reliable LLM Scaling on Feature Superposition

MIT. A new MIT study attributes the empirical regularity of language-model scaling laws to superposition: the well-documented phenomenon in which networks represent more linear features than they have neurons. The authors argue that the steady, predictable loss reductions observed as parameter counts grow are mechanically explained by how additional capacity lets models disentangle features that previously had to share dimensions. That gives a principled account of why scaling has held up across architectures and orders of magnitude, and a candidate answer to a question that has dogged the field since Kaplan et al. (2020): are the curves coincidence or consequence? Source
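The geometric intuition behind superposition can be illustrated with a toy sketch. This is not the MIT paper's method or code; it simply assigns random unit-vector directions to more features than the space has dimensions and measures how much those directions overlap. The function names `random_unit_vector` and `interference_stats`, and the ratio of 4 features per dimension, are illustrative assumptions. Holding that ratio fixed, average interference between features shrinks roughly as 1/sqrt(dim), one simple way to see why extra capacity lets packed features separate.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def interference_stats(n_features, dim, seed=0):
    """Return (mean, max) absolute cosine similarity between distinct feature directions.

    Hypothetical toy model: each feature gets a random direction in a dim-sized
    space, so n_features > dim means features must share dimensions.
    """
    rng = random.Random(seed)
    feats = [random_unit_vector(dim, rng) for _ in range(n_features)]
    overlaps = []
    for i in range(n_features):
        for j in range(i + 1, n_features):
            # Vectors are unit length, so the dot product is the cosine similarity.
            overlaps.append(abs(sum(a * b for a, b in zip(feats[i], feats[j]))))
    return sum(overlaps) / len(overlaps), max(overlaps)


if __name__ == "__main__":
    # Keep 4 features per dimension and watch interference fall as dim grows.
    for dim in (16, 32, 64):
        mean_ov, max_ov = interference_stats(n_features=4 * dim, dim=dim)
        print(f"dim={dim:3d} features={4 * dim:4d} "
              f"mean|cos|={mean_ov:.3f} max|cos|={max_ov:.3f}")
```

Random directions are the crudest possible packing; trained networks can arrange features more carefully, so these numbers only bound the intuition rather than reproduce any result from the paper.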

4. New Benchmark Shows Frontier Models Diverge Sharply on Real-World Ethical Dilemmas

Researchers. A new evaluation tests leading language models on 100 real-world ethical scenarios and finds that models from different providers often resolve the same prompt in substantially different ways. The results reframe alignment as less a question of whether a model has values and more a question of whose values get embedded, which matters when the same product surface (a chatbot) is asked to advise users in jurisdictions and contexts with very different moral defaults. The benchmark is positioned as a way to surface and audit those divergences rather than to declare a winner. Source

5. US Government Benchmark Claims China Trails by Eight Months on AI

US government. A US-government-produced benchmark reports that Chinese frontier AI capabilities lag the United States by roughly eight months, though independent metrics (including Chatbot Arena rankings, agentic-task scores for models from Xiaomi, Moonshot, and DeepSeek, and per-token cost comparisons) paint a less decisive picture. The Decoder's coverage notes that the benchmark's framing serves an export-control narrative while glossing over China's clear lead in cost-efficient open-weight models. The piece is a useful reminder that "who is ahead" depends heavily on which axis is measured. Source

6. ‘This Is Fine’ Creator Accuses Artisan of Using His Artwork Without Permission

KC Green / Artisan. KC Green, creator of the "this is fine" dog meme, has publicly accused Artisan, the AI company behind the controversial "stop hiring humans" billboard campaign, of using his art without permission in marketing material. The dispute joins a lengthening list of artist-vs-AI-startup confrontations and is unusual mainly for the visibility of both parties: Green's image is among the most widely circulated in internet culture, and Artisan has spent the past year courting backlash as a marketing strategy. Whether Green will pursue legal action remains unclear. Source