Machine Learning Evaluator

**Job description**

hackajob is collaborating with **Moody's Corporation**
At Moody's, we unite the brightest minds to turn today’s risks into tomorrow’s opportunities. We do this by striving to create an inclusive environment where everyone feels welcome to be who they are—with the freedom to exchange ideas, think innovatively, and listen to each other and customers in meaningful ways. Moody’s is transforming how the world sees risk.
As a global leader in ratings and integrated risk assessment, we’re advancing AI to move from insight to action—enabling intelligence that not only understands complexity but responds to it. We decode risk to unlock opportunity, helping our clients navigate uncertainty with clarity, speed, and confidence.
If you are excited about this opportunity but do not meet every single requirement, please apply! You may still be a great fit for this role or other open roles. We are seeking candidates who model our values: invest in every relationship, lead with curiosity, champion diverse perspectives, turn inputs into actions, and uphold trust through integrity.
**Skills And Competencies**
- Ph.D. in Computer Science, Machine Learning, Natural Language Processing, Statistics, or a related quantitative field; or Master’s degree with 2-3 years of experience in machine learning evaluation or a related area
- Strong foundations in statistical methods, experimental design, and hypothesis testing
- Experience evaluating machine learning or NLP models, including designing experiments and interpreting results
- Familiarity with LLM evaluation benchmarks and methodologies
- Strong programming skills in Python or R
- Excellent communication skills in English (both written and verbal)
**Preferred**
- Experience evaluating LLMs or generative AI systems
- Experience with production machine learning systems
- Exposure to cloud platforms such as AWS, GCP, or Azure
- Publications or demonstrated work in model evaluation, benchmarking, or related areas
**Responsibilities**
- Evaluate and validate large language models for production-grade analytical and decision-support systems
- Design and implement evaluation frameworks for assessing LLM performance in credit analytics and decision-support contexts
- Develop metrics and benchmarks to measure model robustness, reliability, consistency, and output quality
- Analyze model behavior across diverse inputs, identifying failure modes, edge cases, and areas for improvement
- Collaborate with model development and deployment teams to integrate validation processes into the model lifecycle
- Conduct systematic assessments of model stability over time and across updates
- Evaluate model outputs for bias, fairness, and economic relevance to credit risk applications
- Develop and maintain documentation for evaluation methodologies, findings, and recommendations
- Contribute to the advancement of best practices for LLM evaluation within the Credit COE