- Created new procedures to handle complex business logic and modified existing stored procedures, functions, views, and tables to support project enhancements and resolve existing defects.
- Ingested real-time data using Kafka.
- Used Sqoop to ingest data from an Oracle database and store it on S3.
- Ingested data from JSON and CSV files using Spark and EMR and stored the output in Parquet format on S3.
- Built ETL pipelines on Snowflake; stakeholders query the resulting data products, which also serve as backend objects for visualizations.
- Excellent knowledge of AWS services (S3, EMR, Athena, EC2), Snowflake, and Big Data technologies.
- Proficient in Teradata SQL, BTEQ, FastLoad, MultiLoad, TPump, and FastExport for efficient data extraction, transformation, and loading (ETL).
- Experienced in writing complex SQL queries, stored procedures, macros, and functions to optimize data retrieval and reporting.
- Skilled in Teradata performance tuning, including indexing strategies, query optimization, and analyzing execution plans using EXPLAIN.
- Created internal and external stages and transformed data during load.
- Transformed business problems into Big Data solutions and defined Big Data strategy and roadmap.
- Designed the business requirements collection approach based on project scope and SDLC methodology.
- Used SSIS to create data mappings that load data from source to destination.
- Created automated scripts to run batch jobs and reports using SSIS.
- Built SSIS packages to fetch files from remote locations such as FTP and SFTP servers, decrypt and transform them, load them into the data warehouse, and provide error handling and alerting.
- Developed and maintained data pipelines for streaming data using Apache Kafka.
- Implemented ETL processes to automate data ingestion and transformation using Python and Airflow.
- Automated the extraction of source files from SFTP/FTP.
- Met with business and user groups to understand business processes and gather requirements, then carried out analysis, design, development, and implementation according to client requirements.
- Designed and developed Azure Data Factory (ADF) pipelines extensively to ingest data from relational and non-relational source systems to meet business functional requirements.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Automated jobs in ADF using event, schedule, and tumbling window triggers.
- Ingested a high volume and variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data Factory V2.
- Developed Azure Databricks notebooks to apply business transformations and perform data cleansing operations.
- Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake Storage.
- Created and provisioned Databricks clusters, notebooks, and jobs, and configured autoscaling.
- Used Azure Maps to calculate distance, latitude, and longitude.
- Created reusable pipelines in Data Factory to extract, transform, and load data into Azure SQL DB and SQL Data Warehouse.
- Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB, and SQL Data Warehouse.
- Developed streaming pipelines using Apache Spark with Python.
- Developed Power BI reports using Power Query from SQL Server and other data sources.
- Created notifications, alerts, and report subscriptions in the Power BI service.
- Provided post-publication support for Power BI reports in response to data, system, or requirement changes.
- Experienced in developing an audit, balance, and control framework using SQL DB audit tables to control the ingestion, transformation, and load processes in Azure.
- Used Azure Logic Apps to develop workflows that send alerts and notifications for different jobs in Azure.
- Used Azure DevOps to build and release different versions of code to different environments.
- Created external tables in Azure SQL Database for data visualization and reporting purposes.
- Well-versed in Azure authentication mechanisms such as service principals, managed identities, and Key Vault.
- Improved performance by reducing compute time for streaming data processing and cut costs by optimizing cluster run time.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
- Worked with complex SQL views, stored procedures, triggers, and packages in large databases across various servers.
- Generated alerts on daily event metrics for the product team.
- Suggested fixes to complex issues through thorough root-cause and impact analysis of defects.
- Designed and optimized BTEQ scripts for large-scale data extraction, transformation, and loading (ETL) in Teradata, improving query efficiency by 30%+.
- Developed high-performance TPT (Teradata Parallel Transporter) scripts (Load, Update, Stream, DDL) for fast data movement, handling multi-terabyte datasets with minimal downtime.
- Automated BTEQ/TPT workflows using UNIX shell scripting, reducing manual effort by 60% and ensuring timely data delivery.
- Enhanced ETL job performance by 40% through query tuning, partitioning, and optimizing Teradata utility parameters (sessions, buffers, checkpoints).
- Executed large-scale data migrations using FastLoad, MultiLoad, FastExport, and TPump, ensuring zero data loss for billions of records.
- Developed and maintained end-to-end data pipelines using Apache Spark and Python, processing large volumes of data from various sources, and transforming it into meaningful insights for stakeholders.
- Worked with SnowSQL and Snowpipe.
- Created Snowpipe pipes for continuous data loading and used the COPY command to bulk load data.
- Collaborated with cross-functional teams including data scientists, analysts, and software engineers to define data requirements, design data models, and optimize data storage and retrieval.