Deep Learning Solution Architect

Technology
NVIDIA
Beijing, China · Posted 1 month ago · Open until 2026/6/6
Full-time

Job Description

NVIDIA is seeking dynamic Solution Architects with specialized expertise in Large Language Model (LLM) training, retrieval-augmented generation (RAG) workflows, and agentic inference. You will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers. With competitive salaries and a generous benefits package, we are widely considered to be one of the world’s most desirable employers!

We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are rapidly growing. If you're a creative and autonomous person with a real passion for technology, we want to hear from you.

What You Will Be Doing

  • Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
  • Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
  • Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.
  • Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
  • Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).

What We Need To See

  • Master’s / Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.
  • 4 years of hands-on experience in AI, focusing on open-source LLM training, fine-tuning, and production inference optimization.
  • Deep understanding of mainstream LLM architectures and proficiency in LLM customization with PyTorch and Hugging Face Transformers.
  • Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
  • Competency in designing agentic inference and in using AI agents to solve business challenges.
  • Strong communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders.

Ways To Stand Out From The Crowd

  • Hands-on experience with NVIDIA’s generative AI ecosystem (TRT-LLM, Megatron-LM, NVIDIA NeMo).
  • Advanced skills in LLM optimization (quantization, KV cache tuning, memory footprint reduction).
  • Experience with Docker and Kubernetes for containerized, on-prem deployment of LLM and agent workflows.
  • In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.

#deeplearning, JR2015520

Keywords
monthsOfExperience: 48, Orchestration, PyTorch, Scigress, GNU parallel, DEMOS, Cluster analysis, Deep learning, Quantization, Data cluster, InterSystems Caché, Nemo, Parallel, Docker, Kubernetes
