Chinmay Naik

Data Engineer - Exusia India Pvt Ltd - Pune

(2022-12)

Designed and implemented a Medallion Architecture (Bronze, Silver, Gold) using Databricks and Delta Lake.
Built scalable ingestion pipelines using Databricks Auto Loader for incremental loading of JSON and CSV files.
Developed transformation pipelines using Delta Live Tables (DLT) to automate ETL workflows and enforce data quality rules.
Processed multi-format datasets including JSON (Customers, Orders), CSV (Addresses, Payments), and image files (Membership cards ingested and stored as part of the Lakehouse pipeline).
Integrated Azure SQL Database to ingest external Refunds data into the Lakehouse architecture.
Implemented data validation rules within DLT pipelines for schema enforcement, null handling, and duplicate removal.
Designed Silver-layer datasets by performing cleansing, standardization, and business rule-based transformations.
Created Gold-layer aggregated datasets for customer analytics, order insights, payment tracking, and refund analysis.
Applied Delta Lake features including ACID transactions, schema evolution, and time travel for reliable data processing.
Established relational data models using primary and composite key relationships across entities.
Optimized PySpark transformations using partitioning and DataFrame optimization techniques for performance improvement.
Built end-to-end data pipelines using Databricks notebooks integrated with Auto Loader and DLT workflows.
Performed data validation using Databricks SQL to ensure accuracy and consistency across pipeline layers.
Used Git-based deployment workflows with Databricks Asset Bundles (DABs) for managing and deploying data pipelines.
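
The validation rules listed above (schema enforcement, null handling, duplicate removal) can be illustrated with a minimal plain-Python sketch. This is not the DLT pipeline code from the project; the field names in `REQUIRED_FIELDS` and the function `validate_rows` are hypothetical and chosen only for illustration.

```python
from typing import Any

# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"customer_id": int, "email": str}


def validate_rows(
    rows: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split rows into (valid, rejected) using schema, null, and duplicate checks."""
    valid: list[dict[str, Any]] = []
    rejected: list[dict[str, Any]] = []
    seen_keys: set[Any] = set()
    for row in rows:
        # Schema enforcement and null handling: each required field must be
        # present, non-null, and of the expected type (None fails isinstance).
        schema_ok = all(
            isinstance(row.get(name), expected)
            for name, expected in REQUIRED_FIELDS.items()
        )
        key = row.get("customer_id")
        # Duplicate removal: keep the first row seen for each key.
        if schema_ok and key not in seen_keys:
            seen_keys.add(key)
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected
```

Keeping the rejected rows rather than dropping them silently makes it possible to reconcile counts between layers later.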

Data Engineer SSR/ADV - Datavant

DATAVANT - Redshift to Snowflake Migration. Migrated a legacy Amazon Redshift-based data warehouse to Snowflake for an insurance domain client to improve scalability, performance, and reporting efficiency. Data from multiple source systems including NetSuite, ERP, Oracle, and PostgreSQL was ingested using Fivetran, processed through a modern medallion architecture (bronze, silver, gold layers), and served to business users via Sigma dashboards.

Designed and implemented Snowflake ELT architecture using Procedures, Functions, Pipes, Tasks, Masking Policies and event-based scheduling.
Built and maintained dbt models following a Medallion architecture (bronze, silver, gold layers).
Developed incremental models and snapshots to support SCD Type 2 historical tracking.
Created reusable dbt macros for column masking and dynamic model generation.
Optimized SQL transformations by eliminating redundant joins and improving dbt model performance.
Performed data validation and reconciliation across Redshift and Snowflake environments.
Led code reviews, mentored junior developers, and ensured production-ready deployments across release cycles.
Managed production issues and supported release and deployment activities.
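
The SCD Type 2 historical tracking mentioned above can be sketched in plain Python. This is an illustration of the general technique only, not the dbt snapshot configuration used in the project; the function `apply_scd2` and the `valid_from` / `valid_to` column names are assumptions made for this example.

```python
from datetime import date
from typing import Any


def apply_scd2(
    history: list[dict[str, Any]],
    incoming: list[dict[str, Any]],
    key: str,
    tracked: list[str],
    as_of: date,
) -> list[dict[str, Any]]:
    """Return a new history list with SCD Type 2 changes applied.

    A row with valid_to=None is the current version of its key.
    """
    # Copy rows so the caller's history is not mutated.
    result = [dict(row) for row in history]
    current = {row[key]: row for row in result if row["valid_to"] is None}
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[col] == new[col] for col in tracked):
            continue  # No change in tracked columns: keep the current version.
        if old is not None:
            old["valid_to"] = as_of  # Close the superseded version.
        # Open a new current version for changed or brand-new keys.
        result.append({**new, "valid_from": as_of, "valid_to": None})
    return result
```

Each change closes the previous version and opens a new one, so the full history of a key can be reconstructed for any date.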

Senior Analyst, Associate - Datavant

DATAVANT - NetSuite to Redshift migration. Developed and maintained a data integration and reporting platform for an insurance-domain client. The platform extracts live patient and financial data from NetSuite and ERP systems, applies business-specific transformation rules, and generates billing and revenue-related metrics such as records billed, pages processed, requests received, revenue generated, accounts receivable, daily revenue, and general ledger reporting.

The extracted data is loaded into Amazon Redshift staging tables and transformed into dimensional and fact tables based on business requirements. The processed data is consumed through Power BI dashboards and IBM Cognos reports by sales and business teams for billing validation, revenue analysis, and operational reporting. As part of a source system migration, the existing ELT framework was enhanced to replace Oracle-based data extraction with NetSuite and ERP integrations.

The extraction logic, orchestration workflows, and downstream reporting processes were modified to ensure accurate data processing and seamless report generation.

Designed and developed ELT pipelines integrating NetSuite and ERP data sources with Amazon Redshift to support business reporting and billing processes.
Modified and enhanced existing extraction and transformation logic to support migration from Oracle-based source systems to NetSuite and ERP platforms.
Developed and maintained Apache Airflow DAGs for workflow orchestration, scheduling, monitoring, and error handling of ELT processes.
Migrated Airflow workloads to AWS MWAA (Managed Workflows for Apache Airflow) and optimized workflows for scalability and maintainability.
Implemented data chunking mechanisms to process large datasets efficiently, improve performance, and prevent authentication token expiration issues.
Performed extensive unit testing, integration testing, and data validation to ensure data accuracy and alignment with business requirements.
Designed optimized Redshift data loading and transformation processes, improving reporting performance and data availability.
Supported Power BI and IBM Cognos reporting systems by ensuring timely and accurate data availability for business users.
Resolved production issues, performed root cause analysis, and implemented fixes during deployment and post-production support activities.
Led technical discussions, reviewed code, mentored team members, and provided guidance on development best practices.
Addressed database deadlock issues by implementing scheduling controls and dependency management to prevent concurrent access to shared tables.
Utilized Python, AWS S3, Amazon Redshift, AWS MWAA, Git, PostgreSQL, Oracle, NetSuite, and ERP systems throughout the project lifecycle.
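
The data chunking approach mentioned above, where large extracts are split so that no single request outlives its authentication token, might look roughly like the following sketch. The callables `fetch_page` and `get_token` and all parameter names are hypothetical stand-ins; the actual NetSuite and ERP client calls are not shown in the source.

```python
import time
from typing import Any, Callable


def extract_in_chunks(
    fetch_page: Callable[[str, int, int], list[Any]],
    get_token: Callable[[], str],
    total_rows: int,
    chunk_size: int,
    token_ttl_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> list[Any]:
    """Fetch total_rows in fixed-size chunks, refreshing the token when it ages out."""
    token, issued_at = get_token(), clock()
    rows: list[Any] = []
    for offset in range(0, total_rows, chunk_size):
        # Refresh before each chunk if the token has reached its time-to-live.
        if clock() - issued_at >= token_ttl_seconds:
            token, issued_at = get_token(), clock()
        limit = min(chunk_size, total_rows - offset)
        rows.extend(fetch_page(token, offset, limit))
    return rows
```

Because the token age is checked per chunk, the chunk size should be small enough that one chunk finishes well within the token lifetime.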

Sr. Analyst - CIOXHealth

CIOXHEALTH - Oracle to Redshift migration. Worked on a large-scale insurance domain data platform responsible for processing patient and financial data from multiple Oracle source systems. The platform generated key business metrics such as records billed, pages processed, requests received, revenue generated, accounts receivable, daily revenue, and general ledger reports.

Multiple ELT processes were developed to support business and accounting operations. The processed data was consumed by sales and business teams through Power BI dashboards and IBM Cognos reports for billing validation, financial analysis, and client reporting.

Initially, the source data was stored in Oracle (Lawson schema), where all transformations were performed within the Oracle database and the results were loaded back into Oracle fact tables. This architecture resulted in performance bottlenecks: processing ran slowly, which delayed report generation as well as debugging and correcting data when required.

Designed and developed enterprise ELT pipelines integrating multiple Oracle source systems with Amazon Redshift for high-performance data processing and reporting.
Built the end-to-end data migration framework from scratch, enabling the transition from Oracle-to-Oracle processing to Oracle-to-Redshift architecture.
Improved overall system performance by offloading complex transformation workloads from Oracle databases to Amazon Redshift.
Developed Python-based automation frameworks to eliminate manual execution of database scripts and streamline data processing workflows.
Created and maintained Apache Airflow DAGs for orchestration, scheduling, monitoring, dependency management, and error handling of ELT processes.
Optimized Airflow workflows to improve maintainability, operational efficiency, and ease of support for client teams.
Performed extensive data validation, unit testing, integration testing, and business reconciliation to ensure data accuracy and consistency.
Collaborated closely with business and accounting stakeholders to gather requirements, analyze reporting needs, and implement data model enhancements.
Provided technical leadership by reviewing code, mentoring team members, recommending implementation approaches, and ensuring adherence to development standards.
Investigated and resolved production issues, performed root cause analysis, and supported deployment activities in production environments.
Leveraged AWS services including Amazon S3 and Amazon Redshift to build scalable and efficient cloud-based data solutions.
Maintained source code repositories and CI/CD processes using GitLab.
Configured AWS Database Migration Service (DMS) to replicate data from source systems into Amazon Redshift for analytics and reporting.

Sr. Analyst - Exusia

Exusia - EN-COM (Enabling Ecommerce). Designed and implemented a scalable big data pipeline simulating an e-commerce/healthcare product ecosystem. The system processes multiple data feeds such as product catalogs, inventory, pricing, and sales orders using AWS-based ETL workflows.

Built an event-driven architecture where data uploaded to Amazon S3 triggers AWS Glue jobs via AWS Lambda. Implemented data validation, transformation, and parent-child dependency checks before storing processed data in relational databases (AWS RDS (Oracle)). Developed reporting datasets to analyze sales trends, product performance, and business insights.

Records that passed validation were stored in the master schema. For feeds such as price and inventory that depend on a parent record, each record was checked against the parent table; records without a matching parent were moved to the unprocessed schema. AWS RDS (Oracle) was used for storage.
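
The parent-child dependency check described here can be sketched as follows. This is a simplified plain-Python illustration, not the PySpark code from the Glue jobs; the function `route_child_records` and the `product_id` key are hypothetical names.

```python
from typing import Any


def route_child_records(
    children: list[dict[str, Any]],
    parent_ids: set[Any],
    parent_key: str = "product_id",
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split child records into (master, unprocessed) by parent existence."""
    master: list[dict[str, Any]] = []
    unprocessed: list[dict[str, Any]] = []
    for record in children:
        # A child is accepted only if its parent key exists in the parent table.
        if record.get(parent_key) in parent_ids:
            master.append(record)
        else:
            unprocessed.append(record)
    return master, unprocessed
```

For example, a price record whose `product_id` is absent from the product table would land in the unprocessed list and could be retried once the parent arrives.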

Each feed had multiple jobs that extracted data from RDS according to the feed's extract logic and wrote it to a target S3 location, from which the SAP processes fetched the data.

Designed and developed ETL pipelines using PySpark in AWS Glue, processing approximately 8GB of simulated data to mimic real-world workloads.
Implemented data validation, cleansing, and transformation logic for multiple product categories.
Implemented data processing logic using PySpark and SQL for data transformation, aggregation, and validation.
Built parent-child data validation framework to handle dependencies (e.g., product vs inventory/price).
Stored processed and unprocessed datasets in Amazon S3 and AWS RDS (PostgreSQL).
Debugged data-related issues.
Developed SQL-based reporting for business use cases such as top-selling products and sales trends.
Simulated and processed approximately 8GB of data, including product (approximately 6GB) and sales order (approximately 2GB) datasets, to support transformation and sales analytics use cases.
Generated analytical outputs (CSV datasets) for downstream visualization and reporting.
Simulated real-world production scenarios including job monitoring and debugging data issues.
Used Python, AWS Glue, AWS S3, AWS RDS, AWS Lambda, and PostgreSQL throughout the project.
Resolved performance issues caused by uneven data distribution (data skew) in PySpark jobs by optimizing transformations and repartitioning data to improve execution time.
Optimized job performance by tuning configurations (e.g., worker nodes in AWS Glue).
Handled data quality issues such as missing/null values and inconsistent formats by implementing validation and cleansing logic in PySpark.
