Data Engineer - Epsilon - Boston, MA
(2024-07)
- Led design and deployment of scalable ETL/ELT pipelines on Azure Data Lake and Azure databases for Property and Casualty insurance claims data, integrating GenAI models for enhanced data enrichment and reducing processing time by 30%.
- Architected a Delta Lakehouse on Snowflake with partitioning, incremental updates, schema enforcement, and version control, improving data-integration reliability and cutting query latency for analysts.
- Developed interactive analytics platforms leveraging Agentic AI for self-service reporting with Power BI, Tableau, and Looker, delivering real-time KPIs to business units.
- Implemented data governance, lineage, and compliance controls using Unity Catalog and SAS tokens to meet enterprise standards.
- Automated deployment of Azure Data Factory and Databricks workflows with CI/CD pipelines.
- Led code reviews and mentored junior engineers, improving pipeline quality and reducing data errors by 20%.
- Collaborated with architects to define KPIs and design data models, enabling actionable insights across teams and supporting business-development consulting initiatives.
- Implemented cost-optimized Snowflake workloads by tuning virtual warehouse sizes, clustering strategies, and caching, reducing compute spend while maintaining strict SLA adherence for P&C insurance analytics.
Jr. Data Engineer - Publicis Groupe - NYC, NY
(2021-06 - 2023-08)
- Engineered PySpark and SQL ETL/ELT pipelines on Azure Databricks for large-scale batch and near-real-time processing, giving downstream analytics teams faster access to fresh data.
- Orchestrated ingestion and transformation workflows via Azure Data Factory, improving reliability and monitoring with Azure Monitor.
- Optimized ADLS Gen2 storage using partitioning, lifecycle management, and curated dataset modeling to support analytics teams.
- Enforced data quality through validation rules, reconciliation processes, and automated checks across pipelines, lowering error rates for downstream consumers.
- Improved pipeline performance with modular, parameterized code and parallel execution strategies.
- Collaborated with business teams in the utilities, energy, and public services sectors to onboard new data sources using Azure Data Factory, and documented end-to-end data workflows in Confluence, speeding integration and improving data availability for analytics.
- Designed reusable data models and conformed dimensions to standardize customer and policy views across business units, simplifying downstream reporting and machine learning use cases.
Data Engineer (Internship) - Verizon - Irving, TX
(2020-12 - 2021-05)
- Assisted in developing ETL pipelines using Python and AWS Lambda for small- to medium-scale datasets, applying data pipeline design principles to support analytics initiatives.
- Supported S3 data management and AWS Glue workflows for reliable ingestion and transformation.
- Participated in cleaning, validating, and preparing datasets to enable accurate reporting and dashboard generation.
- Assisted in optimizing data models and warehouse structures in Amazon Redshift, improving query performance.
- Monitored CloudWatch logs and assisted in troubleshooting job failures to maintain pipeline reliability and SLA compliance.
- Created documentation for data flows, pipeline processes, and operational procedures using Confluence and Markdown, making ETL jobs easier for the team to understand and maintain.
- Collaborated with data scientists and analysts to provision curated datasets in Redshift, accelerating experimentation and deployment of predictive analytics solutions.