You’ll be stress-testing the world’s most advanced models to see where they break. Your work will directly impact how frontier LLMs handle complex, multilingual tasks, and it will be supervised by our in-house research staff.

Evaluate & Benchmark: Run rigorous evaluations on frontier LLMs and