Data Enthusiast & Data Engineer
I'm an enthusiastic and diligent software technology professional. Having worked on a variety of projects across various domains, I am able to adapt, enhance and optimize towards quality solution delivery. I also perform well under pressure, with a core focus on adhering to timelines.
02/2023 - Present, Achievements/Tasks
AWS (EC2, EMR, S3) | PySpark | Airflow | Python | Git | Docker | Jenkins | Bash | ClickHouse | Kafka | Data Pipelines | Data Lake | Data Warehousing | ETL/ELT
Backfill automation | Data QA automation | Processing large-scale (petabytes) data on a daily basis | Ensuring data delivery and consistency | Maintaining, writing and optimizing data pipelines on a daily basis
11/2022 - 01/2023, Achievements/Tasks
(Python | MySQL | MongoDB | Spark | Airflow | Bash | REST API | Multiprocessing and Multithreading)
Create data pipelines for preparing US demographics data
Big Data Engineer, The Entertainer
05/2022 - 10/2022, Achievements/Tasks
(Data Factory | Spark | Kafka | Hadoop | Hive | Azure Synapse Analytics | Python | MySQL)
Build ETL Pipelines | Perform Data Modeling for DWH | Build Data Lake in AWS S3 | Build Analytics Pipelines using PySpark
Apache (NiFi | Kafka | Druid | Airflow | Spark | Hadoop | Hive)
Python | Java | Scala | JavaScript | C++
MySQL | PostgreSQL | SQL Server
Avro | Parquet
MongoDB | Cassandra
Microsoft Azure (Data Factory | Databricks)
Docker | Git
AWS (Redshift | S3 | EC2)
SQL (UDF | Stored Procedures)
Data Manipulation (NumPy | Pandas | Matplotlib | Seaborn)
ML | DL (Mlxtend | scikit-learn | Keras)
Data Pipelines
04/2021 - 04/2022, Achievements/Tasks
(Python | Airflow | MySQL | MongoDB | Bash | Web Scraping | Multiprocess and Multithreaded scripts | REST API | Spark)
News Classification & Recommendation Application (Python | Scrapy | Apriori | Random Forest | JavaScript | MERN)
Data Modeling (PostgreSQL | Apache Cassandra | Redshift)
Data Warehouse (ETL pipeline | S3 | Redshift)
ETL pipeline & Data Modeling (NiFi | Airflow)
Data Mining Tool (Python/ML)
Pakistan E-Market website (MERN)
Product Affinity Analysis using Apriori algorithm
Zameen.com Scraping Engine
Stock (Time Series) forecasting using deep learning (LSTM)
Heart Disease Classification using ML (Random Forest)
Retail Churn Prediction using ML (Random Forest)
Extract data from PDF (retail invoices) using Python
C++ program optimization based on memory management
Data Engineering (Nanodegree - Udacity)
LinkedIn Assessments: C | C++ | OOP | Java | Python | JavaScript | Machine Learning | MySQL | MongoDB | Node.js | REST APIs | PHP
HackerRank: SQL (Intermediate) | Problem Solving (Basic | 3 Stars)
Introduction to Big Data with Spark and Hadoop (Coursera | IBM)
ETL and Data Pipelines with Shell, Airflow and Kafka (Coursera | IBM)
Advanced Python (LinkedIn)
Advanced SQL for Query Tuning and Performance Optimization (LinkedIn)
Apache Spark Essential Training: Big Data Engineering (LinkedIn)
Build and optimize multiprocess and multithreaded processes to improve efficiency in a controlled environment, so that QPS can be managed and resources used efficiently.
Develop data pipelines for multiple clients in the US and UK, e.g. AT&T, T-Mobile, Virgin Media. (Airflow | MySQL | MongoDB)
02/2017 - 02/2021,
GPA - 3.14