Six years of experience in the IT industry, including three years as a data engineer.
Proficient in designing and implementing Big Data projects within the Hadoop ecosystem, using MapReduce, HDFS, Hive, and Python for data pipeline development and ETL operations.
Proficient in Spark architecture, including the DataFrame, Dataset, and RDD APIs, as well as Hadoop HDFS architecture and the MapReduce framework.
Experienced with AWS services such as S3, EMR, Glue, and Airflow, with a strong understanding of cloud-based data storage, data processing, and workflow management.
Strong hands-on experience with relational databases.
Proficient with Docker, Git, CI/CD, and Agile methodologies.
Experience
Led a healthcare project ingesting data into Postgres and Redshift tables, implementing event-driven loads based on business requirements to support evolving data needs.
Contributed to the development of five Spark applications in Python, leveraging advanced data processing capabilities.
Optimized existing Hadoop code using Spark SQL, DataFrames, and RDDs, improving performance by 50%.
Implemented ETL processes by loading CSV data into intermediate tables, performing transformations, and persisting the results in Redshift tables.
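The staging-table ETL pattern above can be sketched in a few lines. This is an illustrative sketch only, using Python's built-in sqlite3 as a local stand-in for Redshift; the table and column names (stage_claims, claims, claim_id, amount) are hypothetical, not taken from the actual project.

```python
import csv
import io
import sqlite3

# Stand-in warehouse: sqlite3 replaces Redshift for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_claims (claim_id TEXT, amount TEXT)")
conn.execute("CREATE TABLE claims (claim_id TEXT, amount REAL)")

# 1. Load raw CSV rows into the intermediate (staging) table as-is.
raw_csv = io.StringIO("claim_id,amount\nC1,100.5\nC2,200\n")
rows = list(csv.DictReader(raw_csv))
conn.executemany("INSERT INTO stage_claims VALUES (:claim_id, :amount)", rows)

# 2. Transform while promoting staged rows to the final table
#    (here the transformation is a cast from text to a numeric type).
conn.execute(
    "INSERT INTO claims SELECT claim_id, CAST(amount AS REAL) FROM stage_claims"
)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM claims").fetchone()[0]
print(total)  # 300.5
```

Keeping raw data in a staging table before transforming lets failed loads be inspected and re-run without touching the final tables.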
Designed and implemented six Airflow pipelines to handle diverse data sources and streamline data workflows in the healthcare domain.
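The core idea behind such pipelines, tasks executed in dependency order, can be sketched without Airflow itself. This is a minimal pure-Python illustration using the standard-library graphlib; the task names are hypothetical, and a real pipeline would declare an airflow DAG with operators instead of plain functions.

```python
from graphlib import TopologicalSorter

results = []

def extract():
    results.append("extract")

def transform():
    results.append("transform")

def load():
    results.append("load")

# Each task maps to the set of tasks it depends on, mirroring
# Airflow's upstream >> downstream wiring.
dag = {
    "transform": {"extract"},
    "load": {"transform"},
}
tasks = {"extract": extract, "transform": transform, "load": load}

# Run tasks in a valid topological (dependency-respecting) order.
for name in TopologicalSorter(dag).static_order():
    tasks[name]()

print(results)  # ['extract', 'transform', 'load']
```

Declaring dependencies rather than a fixed call sequence is what lets a scheduler like Airflow retry, parallelize, and backfill individual tasks.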