AI / RAG System Intern at Zhironghe, Xiamen, Fujian (2025-06 – 2025-08)
- Built retrieval-side components for an IP-domain RAG prototype developed by a 4-person team, supporting repeatable search across patent and case materials for internal research workflows.
- Developed a natural-language-to-query translation module that converted user intent into constrained Boolean-style expressions over a patent/case corpus of 10M+ documents, typically narrowing first-pass retrieval to about 1K candidates per query for downstream grounding and improving early-result relevance in internal evaluation.
- Structured a two-stage retrieval pipeline that reduced noisy large-corpus results to manageable candidate sets for grounding, cutting irrelevant hits and improving answer traceability.
- Integrated workflows with Dify, RAGFlow, and Docker-based local environments to support reproducible testing, debugging, and rapid iteration across retrieval and generation components.
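The two-stage pattern described above works in two steps. A constrained Boolean filter first narrows the corpus. A finer ranking pass then runs over the surviving candidates only. The sketch below illustrates this under stated assumptions. The `BooleanQuery` structure, with its required terms, OR-groups, and excluded terms, is hypothetical. So are the whitespace tokenizer and the term-overlap scoring used as a stand-in second stage. The Boolean query is also written by hand here, whereas the translation module would produce it from natural language. This is not the prototype's code, which would run against a real search index.

```python
from dataclasses import dataclass, field


@dataclass
class BooleanQuery:
    """Hypothetical constrained query: every `must` term, at least one term
    from each `any_of` group, and none of the `must_not` terms."""
    must: list[str] = field(default_factory=list)
    any_of: list[list[str]] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Naive lowercase whitespace tokenizer (a stand-in for a real analyzer)."""
    return set(text.lower().split())


def matches(query: BooleanQuery, doc: str) -> bool:
    """Stage-1 predicate: evaluate the Boolean constraints against one document."""
    toks = tokens(doc)
    return (
        all(t in toks for t in query.must)
        and all(any(t in toks for t in group) for group in query.any_of)
        and not any(t in toks for t in query.must_not)
    )


def two_stage_retrieve(query: BooleanQuery, corpus: list[str], question: str, k: int = 3) -> list[str]:
    # Stage 1: the Boolean filter narrows the full corpus to a candidate set.
    candidates = [doc for doc in corpus if matches(query, doc)]
    # Stage 2: rank only the candidates, here by term overlap with the question.
    # sorted() is stable, so ties keep their corpus order.
    q = tokens(question)
    ranked = sorted(candidates, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


# Toy usage with made-up documents.
corpus = [
    "battery electrode patent lithium coating",
    "battery separator patent polymer film",
    "court case trademark dispute",
    "electrode patent sodium coating method",
]
query = BooleanQuery(must=["patent"], any_of=[["electrode", "separator"]], must_not=["polymer"])
results = two_stage_retrieve(query, corpus, "sodium electrode coating method")
print(results)
```

The point of the split is cost and traceability. The cheap, explainable filter runs over the full corpus. The more expensive scoring step, which in a real system might be embeddings or a reranker, only sees the small candidate set.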
Researcher (Remote) at Singapore University of Technology and Design (2025-04 – 2025-06)
- Built a split-safe regression pipeline for supercapacitor capacitance prediction from composition and test variables on a 620-sample dataset, covering preprocessing, outlier treatment, normalization, and holdout-based evaluation.
- Benchmarked baseline, boosting, and stacked models under a fixed train/validation/test protocol to identify approaches that generalized reliably in a small-data setting.
- Engineered 10 domain-informed features spanning doping summaries, interaction terms, and nonlinear surface/defect descriptors to strengthen predictive signal and interpretability.
- Improved holdout performance in the best run from RMSE 42.99 / R² 0.7680 to RMSE 26.85 / R² 0.9095 (about a 37.5% reduction in RMSE); used SHAP and residual diagnostics to explain model behavior and check its consistency.
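The split-safe idea in the bullets above can be sketched as a minimal, stdlib-only pipeline. It fits outlier bounds and normalization statistics on the training split only, then scores the untouched holdout with RMSE and R². The sketch makes several assumptions. It uses synthetic data with a single feature, not the 620-sample dataset. The IQR clipping rule and the one-feature least-squares model stand in for the actual preprocessing and boosting/stacking models. The helper names (`split`, `fit_preprocessor`, `fit_line`) are hypothetical.

```python
import math
import random
import statistics


def split(rows, test_frac=0.2, seed=0):
    """Shuffle once with a fixed seed, then hold out a test split."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = int(len(rows) * test_frac)
    return [rows[i] for i in order[n_test:]], [rows[i] for i in order[:n_test]]


def fit_preprocessor(xs):
    """Learn IQR clip bounds and z-score stats from TRAINING values only."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    clipped = [min(max(x, lo), hi) for x in xs]
    mu, sd = statistics.fmean(clipped), statistics.pstdev(clipped) or 1.0
    # The returned transform reuses the frozen training statistics on any split.
    return lambda x: (min(max(x, lo), hi) - mu) / sd


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ~ a * x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: a * x + (my - a * mx)


def rmse(y, p):
    return math.sqrt(statistics.fmean([(a - b) ** 2 for a, b in zip(y, p)]))


def r2(y, p):
    my = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the real dataset): y = 3x + 5 + Gaussian noise.
rng = random.Random(42)
rows = [(i * 0.5, 3 * i * 0.5 + 5 + rng.gauss(0, 0.5)) for i in range(60)]
train, test = split(rows)
prep = fit_preprocessor([x for x, _ in train])  # fitted on the training split only
model = fit_line([prep(x) for x, _ in train], [y for _, y in train])
preds = [model(prep(x)) for x, _ in test]
y_test = [y for _, y in test]
print(f"holdout RMSE={rmse(y_test, preds):.3f}  R2={r2(y_test, preds):.4f}")
```

The design choice that matters is that `fit_preprocessor` never sees holdout values. Computing clip bounds or the mean and standard deviation on the full dataset would leak test information and inflate holdout scores. This leakage is a common failure in small-data settings like a 620-sample set.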