
i18n Evaluation Lead — Program & Engineering

Technology
Raindrops Technology
Mountain View, United States
Posted 1 month ago
Until 4/4/2026
On-site

Job description

Quantryx.ai (Raindrops Technology) is hiring an i18n Evaluation Lead to own both program delivery and operations engineering for a multilingual AI evaluation engagement supporting a leading technology company's conversational AI product. This is a combined lead and hands-on engineering role: you'll manage a team of Linguistic QA Analysts while building and maintaining the automated evaluation pipeline that powers the entire operation.

What You'll Do

Program Leadership

Own end-to-end program delivery for multilingual evaluation across 7 locales

Serve as primary client interface with the technology partner's program team

Manage and mentor a team of Linguistic QA Analysts across Spanish (US & MX), French (Canada), Italian, Portuguese (Brazil), and Japanese

Design and maintain multi-dimension rating questionnaires and calibration protocols

Ensure inter-rater reliability standards are met across all dimensions and locales

Deliver headroom reports with actionable recommendations to client stakeholders

Conduct quarterly business reviews and rubric alignment sessions

Engineering & Tooling

Design, build, and maintain the end-to-end evaluation pipeline: query generation, model invocation, response capture, rating UI, and statistical analysis

Develop and operate the rating UI used by Linguistic QA Analysts

Implement automated headroom calculation, delta tracking, and report generation

Build real-time dashboards for quality scores, reliability metrics, and trend analysis

Manage data infrastructure: version-controlled query sets, encrypted eval data, audit logs

Implement blind double-rating protocols and adjudication workflows in tooling

Ensure pipeline scalability for large-volume multilingual evaluation runs

What We're Looking For

7+ years in NLP evaluation, i18n quality, or language technology — with at least 3 years in a leadership or program management capacity

Strong software engineering skills: Python and/or Node.js for pipeline automation

Experience managing multilingual evaluation or localization teams

Statistical literacy: hypothesis testing, confidence intervals, inter-rater reliability metrics (Cohen's kappa)

Experience building rating/annotation UIs or evaluation tooling for human assessment

Database design (SQL + NoSQL) and cloud infrastructure experience (GCP preferred)

Familiarity with CI/CD, data visualization frameworks, and dashboard tooling

Excellent client-facing communication and presentation skills

Bachelor's degree required; Master's in Linguistics, Computational Linguistics, CS, or related field preferred

Nice to Have

Experience with Google's evaluation methodologies or vendor program structures

Background in both program management and engineering (rare but ideal for this role)

Familiarity with multimodal AI evaluation

Keywords
NLP Evaluation, i18n Quality, Language Technology, Software Engineering, Python, Node.js, Statistical Literacy, Database Design, Cloud Infrastructure, CI/CD, Data Visualization, Client-Facing Communication, Presentation Skills, Linguistic QA, Automated Evaluation, Multilingual Evaluation, NLP, i18n, Statistical Analysis, Quality Assurance, Program Management, Conversational AI
