Why Are We Hiring for This Role

Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders.
Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning.
Integrate LLM-based reasoning with acti