Rio Deb

A 21-year-old building production-grade data systems by day and expanding into every adjacent layer of the data stack by night.

Talent

Noida · Member since 18 May 2026

databricks · Apache Spark · pyspark · etl · etl developer · Terraform · python · python developer · bash scripting · langchain · ollama · computer science · llm · gen-ai · genai · genai engineer · pandas · numpy · sql · sql developer · power bi · power bi developer · data engineer · data analyst · data analysis · data scientist · excel

About

Data Engineer with production experience building large-scale ETL/ELT pipelines and analytics platforms on Databricks, Apache Spark, and Delta Lake. Delivered end-to-end data systems across financial services, healthcare, and gaming verticals, processing 60+ TB across 1,500+ tables. Alongside core engineering work, actively developing skills across the analytics stack: Python (Pandas, NumPy, Matplotlib, Seaborn), advanced SQL (SQL Server), Power BI (DAX, dashboards), Tableau, and applied statistics, with the aim of operating across both the engineering and insight layers of data.

Pursuing dual degrees in Computer Science (G.B. Pant DSEU, 2026) and Data Science (IIT Madras, 2027) alongside full-time employment.

Experience

Data Engineer · June 2025 – Present

Eucloid Data Solutions · Gurugram, India

Enterprise Financial Analytics Platform — BFSI Client (primary data engineer, 7-month engagement)

– Primary data engineer on a shared Databricks platform: ~1,500 tables, 60+ TB of structured financial data, individual tables up to 500 GB / 300B rows.

– Architected Bronze → Silver → Gold medallion pipeline framework adopted across the engineering team.

· Ingestion patterns: full-load, incremental, overwrite, and merge.

· SCD Type 1 & 2 at production scale with lineage tracking via Delta Live Tables.
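For readers unfamiliar with the pattern: SCD Type 1 overwrites a changed attribute, while SCD Type 2 closes the current row and appends a new version so history is preserved. The following is a minimal sketch in plain Python, not the production Delta Lake implementation. The function `apply_scd2` and the columns `customer_id`, `tier`, `valid_from`, `valid_to`, and `is_current` are hypothetical names chosen for illustration.

```python
from datetime import date


def apply_scd2(dimension, updates, key, tracked, as_of):
    """Return a new dimension list with SCD Type 2 history applied.

    dimension: list of dict rows carrying valid_from / valid_to / is_current.
    updates:   list of dict rows holding the latest source values.
    All names here are illustrative assumptions, not a real schema.
    """
    result = [dict(row) for row in dimension]  # copy so the input stays untouched
    current = {row[key]: row for row in result if row["is_current"]}

    for new in updates:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # tracked columns unchanged: keep the current version
        if old is not None:
            # Close the existing version rather than overwriting it.
            old["valid_to"] = as_of
            old["is_current"] = False
        version = {key: new[key], **{c: new[c] for c in tracked},
                   "valid_from": as_of, "valid_to": None, "is_current": True}
        result.append(version)
        current[new[key]] = version
    return result


dim = [{"customer_id": 1, "tier": "silver",
        "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True}]
src = [{"customer_id": 1, "tier": "gold"}, {"customer_id": 2, "tier": "bronze"}]
history = apply_scd2(dim, src, key="customer_id", tracked=["tier"],
                     as_of=date(2025, 6, 1))
```

After this run, `history` holds the closed "silver" row, a current "gold" row for customer 1, and a first version for customer 2. On a Delta table the same logic would typically be expressed as a merge, which this sketch does not attempt to reproduce.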

– Engineered a metadata-driven framework generating Databricks SQL and PySpark logic automatically.

· Eliminated manual query writing for new pipeline onboarding.

· Reduced table onboarding time; framework adopted across team workflows.
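As a rough illustration of the metadata-driven idea, a table's onboarding entry can be reduced to a small config from which the SQL is rendered. This is a sketch under stated assumptions, not the framework itself: the metadata keys (`target`, `source`, `keys`, `columns`, `load_type`), the function names, and the table names are all hypothetical.

```python
def build_merge_sql(meta):
    """Render a MERGE statement from one table's metadata entry."""
    keys, cols = meta["keys"], meta["columns"]
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    # Key columns identify the row, so only non-key columns are updated.
    updates = ", ".join(f"t.{c} = s.{c}" for c in cols if c not in keys)
    insert_cols = ", ".join(cols)
    insert_vals = ", ".join(f"s.{c}" for c in cols)
    return (
        f"MERGE INTO {meta['target']} AS t\n"
        f"USING {meta['source']} AS s\n"
        f"ON {on}\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


def build_sql(meta):
    """Pick a statement shape from the (hypothetical) load_type field."""
    if meta["load_type"] == "merge":
        return build_merge_sql(meta)
    if meta["load_type"] == "overwrite":
        cols = ", ".join(meta["columns"])
        return f"INSERT OVERWRITE {meta['target']} SELECT {cols} FROM {meta['source']}"
    raise ValueError(f"unsupported load_type: {meta['load_type']}")


# Illustrative metadata entry; table and column names are made up.
accounts = {
    "target": "silver.accounts",
    "source": "bronze.accounts",
    "keys": ["account_id"],
    "columns": ["account_id", "balance", "status"],
    "load_type": "merge",
}
sql = build_sql(accounts)
```

Onboarding a new table then means adding one metadata entry rather than writing a query by hand, which is the time saving the bullets above describe.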

– Optimised Spark workloads on tables in the hundreds-of-billions-of-rows range.

· Reduced shuffle read volumes by re-architecting join strategies, applying partition pruning and predicate pushdown.

· Diagnosed data skew and tuned execution plans via Spark UI — estimated 30–50% improvement in pipeline runtimes on critical workflows.

– Delivered multi-layer validation framework (row count, aggregation, schema reconciliation) across all medallion layers.

– Managed pipeline infrastructure as code via Terraform; primary contributor to 6 of 14 enterprise data marts.
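The three checks named in the validation bullet above (row count, aggregation, schema reconciliation) can be sketched in plain Python as follows. This is an illustrative assumption of how such checks might be composed between two medallion layers, not the delivered framework. The function names and the `txn_id` / `amount_paise` columns are hypothetical, and amounts are kept as integers so that sums compare exactly.

```python
def infer_schema(rows):
    """Map column name -> Python type name, taken from the first row."""
    return {name: type(value).__name__ for name, value in rows[0].items()} if rows else {}


def reconcile(source_rows, target_rows, amount_column):
    """Compare two layers and return a pass/fail flag per check."""
    return {
        "row_count": len(source_rows) == len(target_rows),
        "aggregation": sum(r[amount_column] for r in source_rows)
                       == sum(r[amount_column] for r in target_rows),
        "schema": infer_schema(source_rows) == infer_schema(target_rows),
    }


# Made-up sample data standing in for a Bronze table and its Silver copy.
bronze = [{"txn_id": 1, "amount_paise": 12_500}, {"txn_id": 2, "amount_paise": 7_300}]
silver_ok = [dict(r) for r in bronze]
silver_bad = bronze[:1]  # simulates a row lost between layers

report_ok = reconcile(bronze, silver_ok, "amount_paise")
report_bad = reconcile(bronze, silver_bad, "amount_paise")
```

With the sample data, `report_ok` passes all three checks, while `report_bad` fails the row count and aggregation checks but still passes the schema check, which shows why the checks are reported separately.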