Data Engineer
Data Engineer with 3.6+ years of experience designing and developing ETL/ELT pipelines and data warehousing solutions using Python, SQL, PySpark, Apache Airflow, dbt, AWS, Oracle, PostgreSQL, Snowflake, and Amazon Redshift. Experienced in Spark performance optimization, workflow orchestration, and cloud-based data engineering solutions. Worked on insurance-domain projects for US-based clients as Sr. Analyst, Associate, and Data Engineer.
Experienced in building data ingestion workflows and data transformation solutions on AWS and Azure cloud data platforms. Hands-on experience developing PySpark-based ETL pipelines for data ingestion, transformation, data quality enforcement, and performance optimization.
Designed and implemented end-to-end ETL and data engineering solutions using PySpark, AWS Glue, and AWS Lambda for data ingestion, transformation, validation, and performance optimization. Hands-on exposure to Databricks, Azure Data Factory, ADLS and AWS Step Functions through self-built projects focused on ETL development and Spark optimization techniques. Strong experience in SQL development, data warehousing, workflow orchestration, performance tuning, and building high-quality data processing solutions.
Experienced in Agile and Scrum methodologies, including sprint planning, production support, client communication, troubleshooting, peer code reviews, and mentoring team members. Developed ETL/ELT workflows, including a NetSuite to Amazon Redshift pipeline orchestrated with Apache Airflow, and handled production issue resolution and SQL-based data fixes. Served as acting technical lead: proposed solutions, reviewed designs, and coordinated with teams on delivery and deployments.
Quick learner with strong analytical, problem-solving, and technical collaboration skills.
Data Engineer - Exusia India Pvt Ltd - Pune
(2022-12)
Data Engineer SSR/ADV - Datavant
DATAVANT - Redshift to Snowflake Migration. Migrated a legacy Amazon Redshift-based data warehouse to Snowflake for an insurance domain client to improve scalability, performance, and reporting efficiency. Data from multiple source systems including NetSuite, ERP, Oracle, and PostgreSQL was ingested using Fivetran, processed through a modern medallion architecture (bronze, silver, gold layers), and served to business users via Sigma dashboards.
Senior Analyst, Associate - Datavant
DATAVANT - NetSuite to Redshift migration. Developed and maintained a data integration and reporting platform for an insurance-domain client. The platform extracts live patient and financial data from NetSuite and ERP systems, applies business-specific transformation rules, and generates billing and revenue-related metrics such as records billed, pages processed, requests received, revenue generated, accounts receivable, daily revenue, and general ledger reporting.
The extracted data is loaded into Amazon Redshift staging tables and transformed into dimensional and fact tables based on business requirements. The processed data is consumed through Power BI dashboards and IBM Cognos reports by sales and business teams for billing validation, revenue analysis, and operational reporting. As part of a source system migration, the existing ELT framework was enhanced to replace Oracle-based data extraction with NetSuite and ERP integrations.
The extraction logic, orchestration workflows, and downstream reporting processes were modified to ensure accurate data processing and seamless report generation.
Sr. Analyst - CIOXHealth
CIOXHEALTH - Oracle to Redshift migration. Worked on a large-scale insurance domain data platform responsible for processing patient and financial data from multiple Oracle source systems. The platform generated key business metrics such as records billed, pages processed, requests received, revenue generated, accounts receivable, daily revenue, and general ledger reports.
Multiple ELT processes were developed to support business and accounting operations. The processed data was consumed by sales and business teams through Power BI dashboards and IBM Cognos reports for billing validation, financial analysis, and client reporting.
Initially, the source data was stored in Oracle (Lawson schema), where all transformations were performed within the Oracle database and the results were loaded into Oracle fact tables. This architecture resulted in performance bottlenecks: the process ran very slowly, which delayed report generation as well as debugging and data fixes.
Sr. Analyst - Exusia
Exusia - EN-COM (Enabling Ecommerce). Designed and implemented a scalable big data pipeline simulating an e-commerce/healthcare product ecosystem. The system processes multiple data feeds such as product catalogs, inventory, pricing, and sales orders using AWS-based ETL workflows.
Built an event-driven architecture in which data uploaded to Amazon S3 triggers AWS Glue jobs via AWS Lambda. Implemented data validation, transformation, and parent-child dependency checks before storing processed data in a relational database (Oracle on AWS RDS). Developed reporting datasets to analyze sales trends, product performance, and business insights.
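The S3 to Lambda to Glue trigger described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the FEED_JOBS mapping, the Glue job names, and the --source_path argument are hypothetical, and the optional glue parameter exists only so the handler can be exercised without AWS access.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from S3 key prefix to Glue job name (assumption).
FEED_JOBS = {
    "catalog/": "encom-catalog-job",
    "inventory/": "encom-inventory-job",
    "price/": "encom-price-job",
}


def glue_runs_for_event(event):
    """Translate an S3 event notification into Glue job run requests."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        for prefix, job_name in FEED_JOBS.items():
            if key.startswith(prefix):
                runs.append({
                    "JobName": job_name,
                    "Arguments": {"--source_path": f"s3://{bucket}/{key}"},
                })
                break
    return runs


def handler(event, context, glue=None):
    """Lambda entry point: start one Glue job run per matching object."""
    if glue is None:
        import boto3  # available in the Lambda runtime
        glue = boto3.client("glue")
    return [
        glue.start_job_run(**run)["JobRunId"]
        for run in glue_runs_for_event(event)
    ]
```

Objects whose key matches no known prefix are ignored in this sketch; a real pipeline would likely log or quarantine them.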
Records that passed validation were stored in the master schema. For feeds that depend on a parent record, such as price and inventory, we verified that the parent record was present in the parent table; if it was not, the records were moved to the unprocessed schema. AWS RDS (Oracle) was used for storage.
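The parent-child dependency check can be illustrated with a small plain-Python stand-in. In the project this would more likely be a join inside a Glue/PySpark job; the function name, the product_id key field, and the sample records below are assumptions for illustration only.

```python
def route_child_records(child_records, parent_keys, key_field="product_id"):
    """Split child-feed records by whether their parent record exists.

    Returns (master, unprocessed): records whose parent key is present
    go to the master schema, the rest to the unprocessed schema.
    """
    master, unprocessed = [], []
    for record in child_records:
        if record.get(key_field) in parent_keys:
            master.append(record)
        else:
            unprocessed.append(record)
    return master, unprocessed


# Hypothetical sample data: keys already loaded in the parent table,
# plus an incoming price feed.
parent_keys = {"P1", "P2"}
price_feed = [
    {"product_id": "P1", "price": 10.0},
    {"product_id": "P9", "price": 4.5},  # no parent record
]
master, unprocessed = route_child_records(price_feed, parent_keys)
```

Holding orphaned records in a separate schema, rather than dropping them, allows them to be reprocessed once the parent record arrives.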
Each feed had multiple jobs that extracted data from RDS according to the extract logic and wrote it to an S3 target location, from which the SAP processes fetched it.
Bachelor of Technology - Deogiri Institute of Engineering and Management Studies (2021)