AI News: May 3, 2026

1. ARC Prize Foundation Names Three Reasoning Failure Modes Common to Frontier Models

ARC Prize Foundation. A new analysis of 160 game runs from GPT-5.5 and Claude Opus 4.7 on ARC-AGI-3 finds both models stuck below 1% accuracy (0.43% and 0.18% respectively) and isolates three recurring failure modes. Models fail to stitch local observations into a coherent world model, misidentify novel environments as familiar games from their training data such as Breakout or Tetris, and do not validate hypotheses when a guess happens to work. The throughline is that frontier reasoning is still pattern matching against training-data analogues rather than constructing fresh causal models, which has direct implications for any agent system being evaluated on tasks outside its pretraining distribution. Source

2. The Academy Bars AI-Generated Actors and Scripts From Oscar Eligibility

AMPAS. The Academy of Motion Picture Arts and Sciences updated its eligibility rules to disqualify AI-generated performances and AI-written scripts from Oscar consideration, formalizing a policy line that several guilds had already drawn in their own contracts. The rule does not bar the use of AI in production workflows broadly, but it does require that performances and screenplays be substantively human-authored to qualify. The decision is the highest-profile institutional rejection of AI-generated creative output to date, and is likely to be referenced in upcoming labor negotiations and award-show eligibility frameworks across the industry. Source

3. Replit Opens a 24-Hour Window of Free Agent Access for Its 10th Anniversary

Replit. Replit marked its 10-year anniversary with a 24-hour promotion on May 2 that removed usage caps on its Agent product for all accounts, free or paid. The Agent autonomously writes code, provisions databases, fixes bugs, and deploys applications, and the company paired the giveaway with a $100K Buildathon and renewed enterprise positioning. The promotion is also a competitive jab at the AI coding market — Cursor, Claude Code, Codex — at a moment when several incumbents are testing pricing and packaging changes of their own. Source