Data Engineer at Broadridge Fund Communication Solutions (2024-12 – Present)
Client: Aviva
- Initiated requirements-gathering sessions with cross-functional users across the organization.
- Developed end-to-end ETL workflows integrating multiple sources using Event Hub and REST APIs with Databricks, ensuring seamless data ingestion and improved consistency across analytics pipelines.
- Designed robust and reusable Databricks notebooks for multi-source integration, reducing development effort for new projects and maintaining consistent architecture across environments.
- Optimized Databricks cluster performance by fine-tuning memory, caching, and partition strategies for large datasets, reducing runtimes and compute usage and improving overall platform efficiency.
- Built high-performance PySpark transformations in Databricks for data cleansing, aggregation, and enrichment, reducing data latency and supporting downstream and upstream systems.
- Configured Delta Lake optimization techniques, including Z-ordering, partitioning, shuffling, sharding, data compaction, compression, and caching strategies, improving query performance, reducing compute costs, and enhancing large dataset processing efficiency.
- Leveraged Delta Lake features such as schema evolution, schema enforcement, and time travel (versioned data) to handle schema drift and other source schema changes and to audit data from source to target.
- Designed batch and real-time data processing solutions using Databricks and Azure Stream Analytics, enabling continuous data flow for near real-time business reporting and analytics.
- Developed Databricks Asset Bundles and CI/CD pipelines in Azure DevOps to automate deployment of ADF pipelines, Databricks notebooks, and Synapse scripts, improving release quality, reducing manual intervention, and accelerating delivery cycles.
- Monitored pipeline and cluster performance using Azure Monitor and Databricks metrics, setting up proactive alerts to minimize downtime, maintain SLAs, and ensure overall platform reliability.
Data Engineer at Broadridge Fund Communication Solutions (2023-01 – 2024-12)
Client: Invesco
- Worked with internal and external stakeholders and functional users to gather requirements for the end-to-end data platform and BI reports, and translated them into technical solutions.
- Managed Git version control for ADF pipelines and Databricks notebooks, ensuring code management, collaboration, auditability, streamlined development, and consistent deployment across environments.
- Implemented automated data validation scripts in Python using pytest, unittest, and Great Expectations to ensure production accuracy and reliability and to reduce data discrepancies.
- Managed Databricks clusters, notebooks, and jobs, configuring autoscaling, required libraries, and runtime environments for batch and streaming processing of large datasets across diverse domains.
- Optimized Databricks cluster performance and streaming jobs by tuning runtime configurations, autoscaling parameters, and job parallelism to reduce compute costs and improve processing speed.
- Loaded large datasets into Azure Synapse Analytics using PolyBase, improving query performance, supporting large-scale analytics, and enabling an efficient data warehouse for reporting needs.
- Implemented near real-time data processing solutions using Azure Stream Analytics, Event Hub, and Service Bus Queue, enabling rapid ingestion and transformation of streaming datasets for analytics.
- Applied RBAC and Active Directory security policies, securing sensitive business data, maintaining compliance with regulatory requirements, and ensuring enterprise-wide data governance.
- Scheduled and orchestrated ETL jobs using ADF triggers and Databricks job scheduling, improving reliability, ensuring timely data availability, and fully automating enterprise data workflows.
- Integrated web applications and external data sources using REST APIs, implemented pagination and API rate-limit handling, and processed the data using Databricks with storage in ADLS.
Data Engineer at Capgemini (2021-12 – 2022-08)
Client: GE-EIC
- Collaborated with cross-functional teams to migrate legacy systems to modern cloud platforms, improving usability, visualization, accessibility, and adoption by end users across the enterprise.
- Developed parameterized ADF pipelines to automate ingestion from legacy systems across multiple environments, maintaining consistent processing logic and reducing manual intervention.
- Created robust, metadata-driven ADF pipelines using activities such as Copy, Get Metadata, Lookup, Filter, and ForEach, together with Databricks notebooks, to extract, transform, and load data from disparate sources efficiently, supporting enterprise-wide analytics initiatives.
- Performed ongoing monitoring, automation, and refinement of data pipelines, integrating Azure Logic Apps for workflow automation, ensuring operational stability, and optimizing overall performance.
- Built dimensional and star schema models in Azure SQL DB, optimizing query performance, enabling efficient reporting for both SSAS cubes and Power BI, and supporting enterprise decision-making.
- Developed metadata-driven pipelines using ADF and ADLS Gen2, allowing dynamic schema handling, lineage tracking, reusable pipelines, and efficient data processing for multiple business units.
- Automated ADF pipelines and data flows using triggers, Azure DevOps, and ARM templates.
- Designed Power BI reports with interactive features, including drill-through, filters, and incremental refresh, providing business users with actionable insights, enhanced reporting, and improved decision-making.
Data Analyst at Capgemini (2020-07 – 2021-12)
Client: MetLife
- Gathered reporting and analytics requirements from business stakeholders and translated KPIs into scalable ETL and data warehouse solutions using SSIS, SSAS, and SQL Server.
- Developed and optimized SSIS ETL pipelines and SQL queries to load large volumes of data into enterprise data warehouses, improving data quality, performance, and reporting efficiency.
- Designed SSAS multidimensional cubes, measures, hierarchies, and calculated fields to support analytical reporting, dashboards, and business intelligence initiatives.
- Built Alteryx workflows for data profiling, cleansing, transformation, and enrichment, improving data preparation efficiency and consistency across reporting processes.
- Developed automated SSRS reports with scheduling and subscriptions, enabling timely delivery of critical business insights and reducing manual reporting effort.
- Implemented data quality, reconciliation, security, and governance controls to ensure compliance, data accuracy, and reliable enterprise reporting.
QA Engineer to Senior Consultant at Capgemini (2013-10 – 2020-07)
Clients: MetLife, NBC Universal. Progressed from QA Engineer to Senior Consultant, leading cross-functional testing teams and delivering enterprise-scale projects across insurance, media, and financial services domains.
- Gathered and analysed business and technical requirements, developed test strategies, and managed project schedules, risks, and stakeholder communications to ensure successful project delivery.
- Designed and implemented automated testing frameworks using Selenium, UFT, Jenkins, and CI/CD pipelines, significantly improving release quality and reducing manual testing effort.
- Mentored junior engineers, coordinated with business stakeholders and project managers, and drove continuous process improvements through automation, governance, and quality assurance best practices.