Maryville, Township of Polk, Nodaway · Member since June 20, 2025
About
Cloud Data Engineer with 3+ years of experience building scalable ETL pipelines across AWS and GCP using Python, PySpark, Airflow, and AWS Glue.
Skilled in data warehousing with Snowflake, BigQuery, and Redshift, and in pipeline optimization through query tuning and cost reduction.
Hands-on with Docker for containerization and Kubernetes for container orchestration; implemented CI/CD pipelines using Git, Jenkins, and Terraform.
Developed real-time streaming pipelines using Kafka, Kinesis, Lambda, and Spark Streaming; migrated legacy SQL/Oracle systems to modern cloud warehouses.
Designed fault-tolerant workflows monitored with CloudWatch and audited with CloudTrail; built automated reporting dashboards using QuickSight and BigQuery.
Collaborated cross-functionally with analysts, architects, and DevOps to ensure scalable, business-aligned data solutions.
Experience
Built and optimized data pipelines using Apache Spark, Hive, Kafka, MySQL, and HBase; improved processing efficiency by 30% through Spark-Hive and Spark-HBase integration and achieved 5x faster performance through Spark job tuning.
Automated data transfers using Sqoop and Airflow, halving MySQL-to-Hive sync time; developed scalable ingestion workflows using AWS Glue, Lambda, and S3, maintaining 99.8% uptime across batch and streaming pipelines.
Designed and deployed real-time streaming applications using Kinesis and Lambda, reducing data lag by 40%; implemented cost-effective storage strategies using S3 lifecycle policies.
Deployed containerized ETL workflows on AWS ECS and Fargate for high availability; configured real-time monitoring and alerts for 15+ pipelines using CloudWatch and CloudTrail.
Integrated Snowflake as a centralized cloud data warehouse; built ingestion pipelines from S3 and Kafka using Airflow and Python, and optimized queries and warehouse sizing to reduce compute costs by 25%.