Apple Services Engineering (ASE) powers the AI and LLM features behind experiences that hundreds of millions of users love every day. As these systems increasingly rely on human-in-the-loop evaluation, the quality of our products is directly constrained by the quality of our evaluation systems. We b