About the Role

We're evaluating and comparing the performance of frontier AI coding agents on real-world software engineering tasks. Your job is to craft challenging, realistic prompts based on actual open-source pull requests, then judge which model produces better results. Your evaluations directl