ML Engineer - Speech recognition
About the Job
About the Company
Established in 2020 as a spin‑off from a major Japanese corporate group, we aim to democratize mobility by enabling anyone to "move" to remote locations instantly and sustainably. We develop core technologies that transmit human presence and skills in real time through robots and diverse mobility solutions.
Our flagship product is a mobile, communication‑focused AI avatar robot. Its rollout is the first step toward an instantaneous, sustainable, and inclusive mobility network that connects people with places and experiences. Since 2021, we've expanded from aquariums and museums into public spaces such as airports, hotels, hospitals, government offices, train stations, and commercial facilities in Japan and worldwide.
Position Summary (AI Engineer, Speech AI - AI & Robotics Team)
As an AI Engineer on the AI & Robotics team, you will add new AI capabilities to our flagship avatar robot and help design, build, and optimize real‑time voice chatbots for customer service. We build solutions that improve our partners' service efficiency, using computer vision (both traditional and deep‑learning‑based), automatic speech recognition (ASR), text‑to‑speech (TTS), and retrieval‑augmented generation (RAG).
As a Speech AI specialist, you will work on:
- ASR modeling
- Voice activity detection (VAD)
- Language identification
- Emotion recognition
- Speaker diarization
- Speech noise reduction and audio cleaning
Responsibilities
- Implement speech‑processing pipelines for customer projects
- Design, build, and maintain real‑time voice chatbots for customer service (streaming ASR/TTS + dialog orchestration), ensuring robust performance in noisy environments
- Optimize for accented/non‑native speech, code‑switching, barge‑in/over‑talk, and overlapping speech scenarios
- Establish evaluation protocols and continuous improvement loops for accuracy, latency, and robustness
- Research and apply the latest advances in machine learning
- Build high‑performance, maintainable code that can be deployed to thousands of robot units
Must‑Have Skills
- Production experience deploying Speech AI systems, including:
  - ASR
  - Speaker diarization
  - Speech emotion recognition
- ASR‑specific expertise, including:
  - Model distillation
  - Model evaluation
  - Fine‑tuning methods, with a focus on accented English (e.g., Chinese‑accented English), code‑switching, and noisy real‑world audio
- Demonstrated ability to handle barge‑in/over‑talk and overlapping speech via endpointing, VAD tuning, and diarization‑aware pipelines
- Experience with streaming/real‑time architectures and latency‑sensitive speech systems
- Strong programming ability in Python and solid knowledge of C/C++
- Speech AI fundamentals: speech pre‑processing, VAD, speaker diarization
- Hands‑on experience with Speech AI libraries and toolkits, such as the Hugging Face ecosystem, OpenAI Whisper, and NVIDIA NeMo
- Professional fluency in English (team communication is conducted in English)
Nice‑to‑Have Skills
- Master's degree in Computer Science or a deep‑learning‑related field
- Knowledge of distributed systems, cloud computing, or high‑performance computing (HPC)
- Solid software‑engineering foundations (system design, testing, debugging)
- Understanding of NVIDIA technologies (CUDA / TensorRT / Triton)
Team Culture
- Highly collaborative communicators with a growth mindset
- Able to contribute in autonomous, cross‑functional teams
- Enjoy a fast pace of learning, experimentation, and continuous improvement
- Engineering mindset with a focus on performance and robustness
- Proactive and positive, with a strong sense of ownership
- Day‑to‑day development communication is conducted in English
Working Conditions
- Flexible hours: 8 hours/day, 5 days/week (working hours can be scheduled between 7:00 and 22:00)
- Hybrid: 2 remote days/week (up to 4 based on performance)
- Consecutive leave program (up to 1 month)
- Monthly/quarterly company‑sponsored team lunches & dinners
- Company‑wide recreation events (lunch gatherings, BBQs, offsites, etc.)
- Technical team operates fully in English
Benefits
- Annual paid leave: 15 days (unused days can be carried over for up to 2 years)
- Commuting allowance (from the station nearest your home to the office)
- Housing allowance: JPY 30,000/month (within 5 km of the office)
- Child allowance: JPY 10,000/month per child (up to 2 children, through age 14)
- Learning & self‑development support: up to JPY 30,000/year (for career‑relevant expenses)
- Late‑night work allowance (after 22:00)
- Social insurance provided (health, pension, unemployment; costs shared 50% by the employee and 50% by the company)
- Parental leave: prenatal/postnatal leave and childcare leave (up to 1 year; available after 1 year of employment)