Apple AI Updates: June 24, 2026

1. Apple research finds LLM evaluation panels deliver far less independent signal than expected

Apple. Apple researchers published work showing that a panel of nine large language model judges carries only about two independent votes’ worth of information, because the judges’ errors are correlated. The paper reports that panel accuracy falls short of what truly independent voting would achieve, and that a single strong judge often outperforms the full panel. The findings suggest that adding more models to a panel does not help when the models share the same decision-making patterns. Source
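
To give a rough sense of how nine correlated judges can shrink to about two independent votes, here is a minimal Python sketch. It is not the paper's method. It assumes the standard design-effect formula n / (1 + (n - 1) * rho) for equally correlated votes, plus a toy mixture model in which, with probability rho, all judges make the same call and otherwise vote independently. The function names, the correlation value 0.4375, and the per-judge accuracy 0.8 are illustrative assumptions, not figures from the paper.

```python
from math import comb


def effective_votes(n: int, rho: float) -> float:
    """Effective number of independent votes among n equally correlated votes.

    Uses the standard design-effect formula (an assumption here, not
    necessarily the definition used in the paper).
    """
    return n / (1 + (n - 1) * rho)


def majority_accuracy(n: int, p: float) -> float:
    """Accuracy of a majority vote over n independent judges, each correct with probability p."""
    need = n // 2 + 1  # votes required for a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def correlated_majority_accuracy(n: int, p: float, rho: float) -> float:
    """Majority-vote accuracy under a toy mixture model.

    With probability rho every judge makes the same call (correct with
    probability p); otherwise the judges vote independently. In this model
    the pairwise correlation between judges' correctness equals rho.
    """
    return rho * p + (1 - rho) * majority_accuracy(n, p)


if __name__ == "__main__":
    n, p, rho = 9, 0.8, 0.4375  # illustrative values only
    print("effective votes:", effective_votes(n, rho))
    print("independent panel accuracy:", majority_accuracy(n, p))
    print("correlated panel accuracy:", correlated_majority_accuracy(n, p, rho))
```

Under these assumptions, a pairwise correlation of 0.4375 is the value at which nine votes count as two, and the correlated panel's accuracy sits between a single judge's accuracy and the independent panel's.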

2. Apple studies annotation saturation when learning from label distributions

Apple. Apple Machine Learning Research released a paper on metric-dependent annotation saturation for learning from label distributions. The work examines how the number of annotations required for reliable training depends on the evaluation metric being optimized. It offers guidance on when additional labels stop improving model quality for label-distribution learning tasks. Source