Seeking **experienced researchers and technical experts** to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in **data science, machine learning, finance, and coding** to help identify reasoning and problem-solving gaps.