A 21-year-old building production-grade data systems by day and expanding into every adjacent layer of the data stack by night.
Data Engineer with production experience building large-scale ETL/ELT pipelines and analytics platforms on Databricks, Apache Spark, and Delta Lake. Delivered end-to-end data systems across financial services, healthcare, and gaming verticals, processing 60+ TB across 1,500+ tables. Alongside core engineering work, actively developing skills across the analytics stack: Python (Pandas, NumPy, Matplotlib, Seaborn), advanced SQL (SQL Server), Power BI (DAX, dashboards), Tableau, and applied statistics — building capability to operate across both the engineering and insight layers of data.
Pursuing dual degrees in Computer Science (G.B. Pant DSEU, 2026) and Data Science (IIT Madras, 2027) alongside full-time employment.
Data Engineer · June 2025 – Present
Eucloid Data Solutions · Gurugram, India
Enterprise Financial Analytics Platform — BFSI Client (primary data engineer, 7-month engagement)
– Primary data engineer on a shared Databricks platform: ~1,500 tables, 60+ TB of structured financial data, individual tables up to 500 GB / 300B rows.
– Architected a Bronze → Silver → Gold medallion pipeline framework adopted across the engineering team.
· Ingestion patterns: full-load, incremental, overwrite, and merge.
· SCD Type 1 & 2 at production scale with lineage tracking via Delta Live Tables.
– Engineered a metadata-driven framework generating Databricks SQL and PySpark logic automatically.
· Eliminated manual query writing for new pipeline onboarding.
· Reduced table onboarding time; framework adopted across team workflows.
– Optimised Spark workloads on tables with hundreds of billions of rows.
· Reduced shuffle read volumes by re-architecting join strategies, applying partition pruning and predicate pushdown.
· Diagnosed data skew and tuned execution plans using the Spark UI, for an estimated 30–50% reduction in pipeline runtimes on critical workflows.
– Delivered a multi-layer validation framework (row-count, aggregation, and schema reconciliation) across all medallion layers.
– Managed pipeline infrastructure as code via Terraform; primary contributor to 6 of 14 enterprise data marts.
B.Tech, Computer Science & Engineering · Expected June 2026
G.B. Pant DSEU, New Delhi · CGPA: 8.3 / 10
B.Sc, Programming & Data Science · Expected June 2027
IIT Madras · Pursuing alongside full-time employment