AI Engineer at Bi42 (2024-05 – 2026-01)
Developed and optimized advanced AI solutions including RAG pipelines, LLM fine-tuning, and production-grade multi-agent systems
- Optimized multi-source RAG pipelines in Python + LangChain for contextual retrieval across multimodal datasets, supporting analytics and production Q&A
- Designed structured prompt + parsing frameworks for reliable field extraction in Tally accounting workflows (keyword/report/config/period-style tasks)
- Led LLM fine-tuning optimization using QLoRA + DDP multi-GPU training, tuning LoRA rank/alpha and training strategy to reach ~90% accuracy on domain evaluation
- Implemented vLLM-based inference + batching on NVIDIA H100, reducing validation runtime from ~1 hour → ~7 seconds through throughput-optimized serving
- Improved fine-tuned model quality via DPO, delivering ~20% gains on internal evaluation metrics and better cross-split generalization
- Deployed open-source LLMs via Hugging Face + llama.cpp, tuning decoding (greedy/temperature/top-k/top-p) to balance determinism vs creativity across production scenarios
- Worked hands-on with NVIDIA H100/H200 for large-scale fine-tuning, quantized inference, and multi-model orchestration under latency constraints
- Built production-grade multi-agent systems using LangChain + CrewAI, enabling autonomous multi-step reasoning, tool execution, and task routing
- Delivered an agentic workflow for a Domino's client to automate decision-making and execution in video analytics + dense-caption RAG pipelines
- Implemented low-latency vector retrieval using Qdrant (text) and FAISS (image), focusing on high-throughput indexing and retrieval
- Engineered a hybrid search stack combining dense retrieval with keyword + metadata filtering to improve relevance and reduce false positives
- Built conversational interfaces for interactive data exploration, translating natural-language intent into structured retrieval and analysis actions
- Conducted Video-LLM R&D by fine-tuning InternVL / Qwen-VL for security event understanding (e.g., theft and unsafe behavior) in surveillance footage
- Co-inventor on a patent for ultra-compact transformer deployment on embedded devices for real-time anomaly detection and alerting in remote oil & gas operations
- Led R&D on domain tuning + compression (distillation, pruning, 8/4/2-bit quantization) and resource-aware inference (dynamic quantization switching)
AI Engineer - Intern at Bi42 (2024-02 – 2024-04)
Developed knowledge graphs and RAG pipelines for intelligent data retrieval and multi-format data integration
- Developed Neo4j-based knowledge graphs for intelligent data retrieval and relationship mapping
- Built Retrieval-Augmented Generation (RAG) pipelines integrating multiple data formats (Excel, PDF, Word, SQL databases)
- Optimized text-to-SQL models (Mistral 7B, LLaMA2 8B) on the CCED dataset as a proof of concept, using QLoRA + synthetic dataset creation