Senior Data Engineer at Mint.ai (2026-01 – Present)
Contributing to the modernization of Mint 2.0's core advertising data platform by developing scalable ingestion frameworks, data models, and analytics infrastructure that reliably process over 40,000 data load operations per month across 12 distributed ad networks.
- Revamped Mint 2.0's unified analytics infrastructure, owning the Snowflake ingestion framework across key networks and improving schema consistency, ingestion reliability, and operational isolation between reporting workloads.
- Resolved critical production stability issues by developing a memory-aware distributed ingestion engine on Kubernetes, eliminating recurring pod OOM failures while increasing data throughput 2x to 3x across large-scale reporting workloads.
- Built the pipeline data observability framework, instrumenting Prometheus to track schema drift, low-watermark lag, and data staleness in real time; exposed these metrics through centralized Grafana dashboards and routed critical alerts to Slack, enabling sub-hour incident response.
- Collaborated with stakeholders to design and deliver a centralized Snowflake + dbt analytical data layer, enforcing Slowly Changing Dimension Type 2 (SCD2) modeling for budget tracking, currency normalization, and media plan propagation to provide stable, ML-ready features for downstream modeling teams.
Senior Data Engineer at Drest (2022-11 – 2026-01)
Architect and platform owner for the end-to-end data ecosystem during a period of 10x growth, scaling the infrastructure from 3M to 30M daily events on a core architecture designed to sustain 300M+ events/day. Guided the platform's evolution from an early-stage batch system to a decoupled, near-real-time production Lakehouse.
- Designed Drest's initial data stack (S3 + Airflow + Redshift + dbt) from scratch to meet immediate business reporting needs, then led its long-term architectural evolution as data volume scaled 10x, onboarding new high-volume data sources and third-party APIs without service interruption.
- Migrated the core transformation layer from Redshift-native dbt to dbt + Trino on Apache Iceberg, decoupling compute from storage. Moved historical and streaming event data into Iceberg tables on S3, using AWS Glue for catalog management and incremental compaction, and retained Redshift solely for aggregated reporting.
- Replaced fragile, batch-scheduled file drops with an event-driven architecture built on Apache Kafka, enabling deterministic replay across integrations. Built Spark Structured Streaming applications on AWS EMR to absorb burst traffic for sessionization and real-time feature extraction, reducing data availability from overnight batch to near real time.
- Deployed a dedicated Kafka-to-ClickHouse streaming pipeline for high-concurrency "Brand Day" live events, absorbing sudden 10x to 15x traffic bursts over a 24-hour window. Designed a denormalized schema with ClickHouse materialized views to shift heavy aggregations to write time, achieving sub-minute event-to-dashboard latency so business teams could make live catalog decisions during peak traffic. Decommissioned the cluster after the event to control cloud spend.
- Established data quality guardrails by introducing schema versioning and backward-compatible event contracts at the Kafka layer, eliminating downstream model breakage caused by upstream producer changes. Implemented end-to-end pipeline SLA monitoring, maintaining a 99.9% success rate, and used Iceberg time-travel queries to recover from data corruption incidents in under 30 minutes.
Senior Data Engineer at Superside (2021-10 – 2022-11)
Led the end-to-end modernization of Superside's legacy analytical stack, moving the company to a centralized, cloud-native data architecture. Owned the architectural roadmap, data validation strategy, and post-migration platform maintenance for over 12 months.
- Led the full migration from a fragmented legacy stack (AWS Glue + Druid + ad-hoc SQL) to a centralized Snowflake, dbt, and Airflow platform. Consolidated siloed business and reporting logic into version-controlled, tested dbt models, eliminating 70% of legacy AWS Glue batch job dependencies and reducing operational overhead.
- Optimized Snowflake performance and compute utilization by introducing clustering and warehouse-sizing strategies that reduced analytical query runtimes by 40% to 60% across core data workloads.
- Built the data team's foundational governance and security framework, introducing Dev/Staging/Prod environment isolation and RBAC to enable safe concurrent development across a globally distributed team. Executed historical data migrations with multi-layer validation (row reconciliation, aggregate matching, and anomaly detection), achieving zero post-cutover data discrepancies.
Data Engineer at ComplyAdvantage (2019-07 – 2021-10)
Core data and platform engineer responsible for scaling high-throughput ingestion systems and production ML compute environments. Architected streaming pipelines and real-time indexing to process and serve an active corpus of 10M+ risk and financial sanctions entities.
- High-Throughput Streaming Systems: Engineered backpressure-aware Kafka ingestion pipelines for real-time global news feeds, removing ingestion bottlenecks while processing millions of entity updates per day.
- Low-Latency Search & Reliability Infrastructure: Tuned a large Elasticsearch cluster to support low-latency search across 10M+ processed entities; implemented rigorous schema governance for event topics, reducing downstream pipeline breakage by over 80%.
- Distributed ML Compute Infrastructure: Optimized distributed Spark applications for complex entity resolution and automated text extraction on the production data streams feeding downstream machine learning classification and risk-scoring models.
Data Engineer at BlueJeans Network (2018-12 – 2019-07)
- Built Spark-based batch processing pipelines for usage metrics and system telemetry, supporting product analytics and infrastructure monitoring.
- Optimized Redshift queries and introduced workload isolation, improving analytics reliability and reducing query contention for downstream reporting teams.
Lead Analytics Engineer at Belong (2015-11 – 2018-12)
Founding data engineer who established, scaled, and maintained the company's entire analytics ecosystem from the ground up. Built the core data platform architecture and managed a growing team of engineers.
- Designed core dimensional schemas for user acquisition, retention, and revenue operations, establishing the company's single source of truth. Maintained full operational ownership of the platform for 24+ months, executing schema evolution and historical backfills with zero downstream disruption.
- Managed and mentored a team of 3 data engineers, defining company-wide SQL/dbt data modeling standards, Git workflows, and technical code review processes.
Data & Analytics Roles at EY / HSBC / Infosys (2012-07 – 2015-11)
Enterprise Consulting & Banking
- Built ETL workflows and regulatory/financial reporting pipelines using SQL, Python, and enterprise data warehouses.
- Contributed to risk analytics and compliance data modeling, collaborating closely with business stakeholders to translate ambiguous KPIs into validated datasets.