Associate Software Engineer - Data Engineer at Mphasis (2023-08 – Present)
Designed and managed data pipelines in Azure Databricks for large-scale data processing and advanced analytics. Implemented the Medallion Architecture and managed ETL pipelines with Delta Lake.
- Ingested structured and semi-structured data (CSV, Excel, Text, JSON, Parquet) from Azure Data Lake and other heterogeneous sources.
- Implemented the Medallion Architecture (Bronze, Silver, and Gold layers) within the Data Lakehouse Architecture to manage raw, cleansed, and curated datasets, and implemented a Star Schema data model to enable optimized analytics and reporting.
- Performed data transformations, cleansing, and aggregations using PySpark, SQL, and Delta Lake to meet client-specific requirements.
- Developed and scheduled Databricks Jobs and Delta Live Tables (DLT) to enable automated, reliable, and scalable data pipelines.
- Leveraged Delta Live Tables for declarative pipeline design, schema enforcement, data quality expectations, and incremental data processing.
- Implemented incremental load strategies and Delta Lake ACID transactions to ensure efficiency, reliability, and consistency in data pipelines.
- Designed and implemented robust data pipelines utilizing both batch and real-time stream processing techniques to handle large-scale data ingestion and transformation.
- Ensured data quality, governance, and monitoring through validation checks, auditing, error handling, and alerting frameworks.
- Leveraged Python to design and implement data transformation logic for complex scenarios within ETL pipelines, improving data quality and processing efficiency.
Associate Software Engineer - ETL Developer at Mphasis (2023-03 – 2023-08)
Developed and implemented robust ETL pipelines using SQL Server Integration Services (SSIS) for data extraction, transformation, and loading from heterogeneous sources. Optimized SQL queries and improved ETL performance.
- Developed and implemented a robust ETL pipeline using SQL Server Integration Services (SSIS) to perform data extraction, transformation, and loading from heterogeneous sources (Excel, CSV, flat files) into a centralized data warehouse. Optimized SQL queries and ensured data integrity, improving ETL performance by 30%.
- Designed and deployed dynamic SSIS packages capable of efficiently processing millions of records with performance tuning and minimal manual intervention.
- Integrated custom C# scripts in SSIS Script Tasks and Script Components to handle dynamic, frequently changing file layouts and transformation logic for fixed-width files.
- Implemented event handlers and custom logging mechanisms to track package execution and errors, significantly improving troubleshooting and operational visibility.
- Optimized data load performance using batching, parallel execution, and staging techniques, reducing ETL runtime by over 30%.
- Collaborated with business analysts and stakeholders to gather requirements, map source-to-target transformations, and deliver data solutions aligned with business needs.
- Replaced SSIS-based orchestration with Databricks Jobs to manage and schedule end-to-end ETL pipelines, enhancing scalability and operational efficiency.
Associate Software Engineer - ETL Developer at Mphasis (2022-02 – 2023-03)
Designed and implemented a scalable ETL framework using SQL Server Integration Services (SSIS) to extract, transform, and load data from diverse sources into SQL Server.
- Designed and implemented a scalable ETL framework using SQL Server Integration Services (SSIS) to extract, transform, and load data from diverse sources, including Excel, fixed-width text files, and flat files, into SQL Server.
- Used a range of SSIS transformations, such as Lookup, Sort, Merge, Derived Column, and Conditional Split, to standardize and enrich incoming datasets. Incorporated Script Tasks for advanced file handling.
- Developed custom logs and event handlers to monitor package execution and facilitate effective error handling, enhancing troubleshooting and operational visibility.
- Designed and developed stored procedures to extract data and send validation reports to clients.