Rajat Saini

MLOps and AI Infrastructure Engineering

Talent

Rewāri, Rewari

Member since 12 August 2025

About

Expertise in architecting, developing, and operating scalable, GPU-accelerated AI systems (NLP and CV), with proficiency in Python and Java. Experienced in cloud and on-premises infrastructure, container orchestration (Kubernetes), MLOps (Kubeflow), distributed inference and optimization (vLLM, Ray, TensorRT-LLM, Triton, ONNX, DeepSpeed, NVIDIA Model Optimizer), parameter-efficient fine-tuning (PEFT with LoRA/QLoRA), and system observability.

Experience

AI Platform & Infrastructure Engineer | Senior Software Engineer: Designing and managing large-scale AI infrastructure with expertise in GPU scheduling and optimization, distributed model serving (vLLM + Ray), GPU-accelerated analytics, and production-grade deployment pipelines. I deliver full-lifecycle AI solutions, from model training and convergence to drift monitoring and high-availability production deployments, using Kubeflow, SageMaker, and custom MLOps pipelines. Skilled in fine-tuning large language models with LoRA and optimizing training with frameworks such as DeepSpeed, in both resource-constrained environments and distributed multi-node clusters.

Specialized in containerization and orchestration with Docker and Kubernetes, architecting scalable, fault-tolerant systems. Experienced in building CI/CD pipelines with Jenkins and automating infrastructure provisioning with Terraform and Ansible.

Strong software engineering foundation across FastAPI (Python), Spring Boot (Java), and Laravel (PHP), with expertise in microservices (gRPC/REST) and high-throughput asynchronous processing (Kafka, RabbitMQ, Redis Streams). Versatile in deploying solutions across AWS, GCP, OpenStack, and on-premises environments with Proxmox virtualization.