Research Engineer, Evaluations
Technology
Fully, Switzerland · Posted 1 month ago · Open until 25.5.2026
Internship
Job description
- Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics
- Build and maintain competitive benchmarking pipelines
- Design and run systematic experiments to measure the impact of model changes
- Onboard, curate, and maintain evaluation datasets
- Create evaluation subsets to stress-test specific capabilities and edge cases
- Define evaluation metrics for real-world performance
- Translate qualitative customer feedback into quantifiable evaluation criteria
- Work with customer-facing teams to understand pain points and convert them into research priorities
- Maintain clean evaluation pipelines and clear documentation
- Identify evaluation gaps proactively and propose solutions
Requirements
- ML fundamentals: Enough to interpret results and debug issues, without needing to train models from scratch
- Strong Python skills: Write clean evaluation scripts and work with data pipelines; comfortable with SQL and cloud infrastructure
- Metric intuition: Understand what makes a good evaluation metric and ensure statistical rigor
- Voice agent stack familiarity: Understand how VAD (voice activity detection), ASR (automatic speech recognition), turn detection, LLM, and TTS (text-to-speech) systems interact
- Tinkerer mentality: Prefer shipping and iterating quickly
- Communication skills: Explain technical results, summarize findings, and translate customer feedback
- Ownership mindset: Proactively fill evaluation gaps
- Work at least 3-4 hours overlapping with the US Eastern Time Zone
Pay range: $210K - $260K
About AssemblyAI
AssemblyAI builds industry-leading Speech AI models that automatically recognize and understand speech.
Keywords
OCaml, Cloud computing, Python, SQL, Stress Testing, Debugger, Debugging