Sanchit Jain

AI Engineer at Bi42 (2024-05 – 2026-01)

Developed and optimized advanced AI solutions including RAG pipelines, LLM fine-tuning, and production-grade multi-agent systems

Optimized multi-source RAG pipelines in Python and LangChain for contextual retrieval across diverse multimodal datasets, supporting analytics and production Q&A
Designed structured prompt + parsing frameworks to make field extraction more reliable in Tally accounting workflows (keyword/report/config/period style tasks)
Led LLM fine-tuning optimization using QLoRA + DDP multi-GPU training, tuning LoRA rank/alpha and training strategy to reach ~90% accuracy on domain evaluation
Implemented vLLM-based inference + batching on NVIDIA H100, reducing validation runtime from ~1 hour → ~7 seconds through throughput-optimized serving
Improved fine-tuned model quality via DPO, delivering ~20% gains on internal evaluation metrics and better cross-split generalization
Deployed open-source LLMs via Hugging Face + llama.cpp, tuning decoding (greedy/temperature/top-k/top-p) to balance determinism vs creativity across production scenarios
Worked hands-on with NVIDIA H100/H200 for large-scale fine-tuning, quantized inference, and multi-model orchestration under latency constraints
Built production-grade multi-agent systems using LangChain + CrewAI, enabling autonomous multi-step reasoning, tool execution, and task routing
Delivered an agentic workflow for a Domino's client to automate decision-making and execution in video analytics + dense-caption RAG pipelines
Implemented low-latency vector retrieval using Qdrant (text) and FAISS (image), focusing on high-throughput indexing and retrieval
Engineered a hybrid search stack combining dense retrieval with keyword + metadata filtering to improve relevance and reduce false positives
Built conversational interfaces for interactive data exploration, translating natural-language intent into structured retrieval and analysis actions
Conducted Video-LLM R&D by fine-tuning InternVL / Qwen-VL for security event understanding (e.g., theft and unsafe behavior) in surveillance footage
Co-invented a patent for ultra-compact transformer deployment on embedded devices for real-time anomaly detection and alerting in remote oil & gas operations
Led R&D on domain tuning + compression (distillation, pruning, 8/4/2-bit quantization) and resource-aware inference (dynamic quantization switching)

AI Engineer - Intern at Bi42 (2024-02 – 2024-04)

Developed knowledge graphs and RAG pipelines for intelligent data retrieval and multi-format data integration

Developed Neo4j-based knowledge graphs for intelligent data retrieval and relationship mapping
Built Retrieval-Augmented Generation (RAG) pipelines integrating multiple data formats (Excel, PDF, Word, SQL databases)
Fine-tuned text-to-SQL models (Mistral 7B, LLaMA2 8B) on the CCED dataset as a proof of concept, using QLoRA and synthetic dataset creation

