I have 6 years of experience and proven knowledge of programming, data engineering concepts, ETL, data warehousing and data visualization tools, machine learning, and DevOps.
Experience
Continuously identify areas to improve our data engineering practices, data quality, and processes.
Create and maintain data documentation (ERDs, data dictionaries, lineage) to provide transparency to users.
Perform development and solution design from project inception through implementation, managing resources, timelines, and budgets.
Maintain and troubleshoot current data pipelines and answer ad hoc questions about them.
Create, enhance, and monitor data processes to ensure data quality and consistency.
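For illustration, a minimal sketch of the kind of data-quality check described above; the table and column names here are hypothetical placeholders, not the actual production objects:

```python
# Minimal data-quality check sketch, assuming a hypothetical "orders" table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.table("orders")  # hypothetical source table

# Count null keys and duplicate business keys.
null_count = df.filter(F.col("order_id").isNull()).count()
dup_count = (df.groupBy("order_id").count()
               .filter(F.col("count") > 1)
               .count())

# Fail the pipeline run if either check finds bad records.
if null_count > 0 or dup_count > 0:
    raise ValueError(f"DQ failure: {null_count} null ids, {dup_count} duplicate ids")
```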
Deploy the data service infrastructure on Kubernetes using Terraform scripts.
Create Azure Databricks notebooks based on business requirements.
Create DAX queries to develop reports and measures in Power BI, and deploy them using Azure DevOps and PowerShell scripts.
Collaborate with cross-functional teams to understand data requirements, and design and implement data integration solutions in Informatica.
Use Informatica to extract, transform, and load (ETL) data from various sources into target systems, ensuring data accuracy and consistency.
Build MLOps applications leveraging MLflow on Databricks.
Implement model deployment and serving using the Databricks MLflow APIs.
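A brief sketch of the MLflow workflow this refers to: training a model, logging it to a run, and registering it so it can be deployed and served. The dataset, model, and registry name below are illustrative assumptions, not the actual project:

```python
# Hedged MLflow sketch: train, log, and register a model.
# Assumes an MLflow tracking server with a model registry (e.g. Databricks).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)  # stand-in dataset for illustration

with mlflow.start_run() as run:
    model = RandomForestRegressor(n_estimators=100).fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the logged model under a (hypothetical) registry name so it can
# be promoted through stages and served.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "diabetes_rf")
```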
Use PySpark in Databricks for ETL/ELT and big data workloads, working across the RDD, SQL, Structured Streaming, and ML APIs.
Work with diverse source types in PySpark: streaming sources such as Kafka and Azure Event Hubs; relational databases such as SQL Server and Oracle; NoSQL databases such as Cosmos DB and MongoDB; blob storage such as Azure Blob Storage, Azure Data Lake Gen2, and AWS S3; and file formats including CSV, Excel, Avro, Parquet, XML, and JSON (a read sketch follows below).
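The read sketch referenced above shows how a few of these source types are typically opened in PySpark; every path, host, credential, and topic name is a placeholder, not a real endpoint:

```python
# Illustrative PySpark reads from a sample of the source types listed above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("source-examples").getOrCreate()

# File source: Parquet from Azure Data Lake Gen2 (placeholder path).
parquet_df = spark.read.parquet(
    "abfss://container@account.dfs.core.windows.net/data/")

# Relational source: SQL Server over JDBC (placeholder connection details).
jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:sqlserver://host:1433;databaseName=db")
           .option("dbtable", "dbo.orders")
           .option("user", "user")
           .option("password", "password")
           .load())

# Streaming source: Kafka via Structured Streaming (placeholder broker/topic).
kafka_df = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "events")
            .load())
```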
Package existing Python code and deploy it as libraries to the Databricks FileStore folder, automating the process with Git.
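As a minimal sketch of that packaging step, assuming a setuptools layout; the package name and version are hypothetical:

```python
# Hypothetical setup.py for building the code as a wheel.
from setuptools import setup, find_packages

setup(
    name="my_data_utils",   # placeholder package name
    version="0.1.0",
    packages=find_packages(),
)

# Build the wheel with `python -m build` (or `python setup.py bdist_wheel`),
# then copy it to DBFS, e.g. with the Databricks CLI:
#   databricks fs cp dist/my_data_utils-0.1.0-py3-none-any.whl dbfs:/FileStore/libraries/
```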