Data Engineer
I led the end-to-end design and optimization of 100+ data pipelines, reducing processing time by 40% and enhancing query performance in BigQuery and Snowflake. I developed complex SQL scripts utilizing partitioning, materialized views, stored procedures, and functions, which improved data retrieval efficiency by 30%.
To streamline data movement across hybrid cloud platforms, I implemented Apache Airflow workflows, orchestrating Dataflow pipelines for efficient processing. I also designed and executed automated data quality checks, reducing ETL errors by 25%. Leveraging Informatica, I built robust ETL processes that ensured seamless data integration across environments.
I designed and implemented 300+ data quality rules in Snowflake to ensure data integrity, accuracy, and consistency during the data ingestion and transformation process. These rules were integrated into automated data quality frameworks to validate data at various stages of the ETL pipeline, ensuring that anomalies, missing values, and inconsistencies were detected and addressed before loading into production tables.
Additionally, I enhanced CI/CD pipelines using Bamboo and Bitbucket, automating deployment and reducing manual intervention by 50%. I also developed hundreds of automated tests to validate pipeline integrity between GCP and Snowflake. Security was a key focus: using Control-M, I implemented PII data hashing before securely transferring files to GCP cloud storage.
By collaborating with cross-functional teams, I optimized data models for business reporting, ensuring the organization had accurate, timely, and actionable insights. Throughout, I monitored Airflow DAGs to guarantee uninterrupted data ingestion, which directly supported business intelligence capabilities. I also analyzed data by comparing it across Snowflake, Denodo, and Tableau, which significantly improved data availability, performance, and reliability and drove better decision-making across the organization. I am looking for an organization where I can apply these skills and knowledge.
Results-driven Data Engineer with 6+ years of experience and a proven track record of optimizing 100+ data pipelines and enhancing query performance in BigQuery and Snowflake. Skilled in Apache Airflow, GCP, SQL, and CI/CD pipelines, I excel in collaboration and automated testing, achieving a 40% reduction in processing time while ensuring data quality and governance. Adept at leveraging Apache Spark, Hadoop, and machine learning to optimize analytics solutions and improve data quality and performance.
Master's in Applied Computer Science