Data Engineer | PySpark | AWS Glue | Apache Airflow | SQL
Data Engineer with 2.6 years of experience in designing, developing, and migrating scalable ETL pipelines using PySpark and AWS Glue in the insurance domain. Experienced in migrating legacy BDM workflows to AWS Glue and converting TWS schedulers into Apache Airflow DAGs for workflow orchestration and monitoring. Skilled in building configuration-driven ETL frameworks, batch-based incremental processing, data validation, and source-to-target reconciliation for large-scale claims data processing.
Hands-on experience with PySpark transformations, Apache Hive, Amazon S3, PostgreSQL, and AWS services to support scalable cloud-based data engineering solutions. Proficient in CI/CD deployments using Azure DevOps and GitHub Actions, with expertise in workflow monitoring through Airflow, Control-M, and CloudWatch.
Project: Upstream Rebuild – Data Migration & ETL Modernization (Aviva UK Insurance Domain)
Source Systems: Ireland Claims, Pelican Claims, Guidewire Claims
Tech Stack: PySpark, AWS Glue, Apache Airflow, MWAA, SQL, Apache Hive, Amazon S3, PostgreSQL, CloudWatch, Azure DevOps, GitHub Actions, Control-M
Migrated legacy ETL workflows from BDM to AWS Glue using PySpark, improving scalability and reducing manual intervention by ~30%.
Converted legacy TWS schedulers into Apache Airflow DAGs for metadata refresh, batch extraction, validation, ETL execution, and workflow orchestration.
Developed configuration-driven ETL mapping frameworks with source-to-target mappings, transformation logic, and reconciliation rules.
Built scalable PySpark ETL pipelines for large-scale insurance claims processing using Apache Hive and Amazon S3.
Implemented PySpark DataFrame transformations, including joins, filters, aggregations, and column-level mappings, based on business requirements.
Developed batch ID-based incremental processing for delta data loads, improving execution efficiency and reducing unnecessary full-load processing.
Performed source-to-target reconciliation, ANR validations, null checks, and duplicate validations to ensure migration accuracy and data quality.
Loaded transformed datasets into PostgreSQL using JDBC connections for downstream reporting and analytics.
Monitored and troubleshot AWS Glue jobs using Amazon CloudWatch, improving production stability and reducing job failures.
RMK Engineering College
Bachelor of Engineering (B.E.) – Electronics & Telecommunication Engineering