Harish R

Data Engineer

Talent

Charlotte, Mecklenburg

60 USD/hour

Member since May 12, 2025

Azure Data Factory, Power BI, data warehousing, data modeling, Databricks, ETL, data engineer, SQL

About

5 years of total IT experience and technical proficiency in data warehousing, covering business requirements analysis, application design, data pipelines, data modelling, development, testing, and documentation.
Excellent understanding of Enterprise Data Warehouse best practices, with involvement in full life cycle development of data warehouses.
Hands-on experience creating pipelines in Azure Data Factory V2 using activities such as Move & Transform, Copy, Filter, ForEach, Get Metadata, Lookup, and Databricks.
Excellent knowledge of integrating Azure Data Factory V2/V1 with a variety of data sources and processing data using pipelines, pipeline parameters, activities, activity parameters, and manual, window-based, or event-based job scheduling.
Hands-on experience working with file formats such as JSON, CSV, Avro, and Parquet using Databricks and Data Factory.
Extensive experience reading continuous JSON data from different source systems through Event Hubs into various downstream systems using Stream Analytics and Apache Spark Structured Streaming (Databricks).
Extensive knowledge of and hands-on experience implementing cloud data lakes such as Azure Data Lake Gen1 and Azure Data Lake Gen2.
Expert in coding SQL, stored procedures, macros, functions, and triggers.
Expertise in creating visualizations and reports using Power BI.
Built interactive Power BI dashboards and published Power BI reports using parameters, calculated fields, table calculations, user filters, action filters, and sets to handle views more efficiently.
Providing Azure technical expertise including strategic design and architectural mentorship, assessments, POCs, etc., in support of the overall sales lifecycle or consulting engagement process.
Worked on data warehouse design, implementation, and support (SQL Server, Azure SQL DB, Azure SQL Data Warehouse, Teradata).
Experience implementing ETL and ELT solutions using large data sets.
Experience in Teradata Database design, implementation, and maintenance mainly in large scale Data
Expertise in querying and testing RDBMS such as Teradata, Oracle and SQL Server using SQL for data integrity.
Managed Teradata workload scheduling using TASM (Teradata Active System Management) to prioritize critical queries and improve system performance.
Performed data migration from legacy systems to Teradata, ensuring data integrity and minimal downtime.
Created and maintained documentation for database schemas, ETL processes, and best practices for team reference.
Proficient in Data Modelling Techniques using Star Schema, Snowflake Schema, Fact and Dimension tables, RDBMS, Physical and Logical data modelling for Data Warehouse and Data Mart.
Excellent communication and interpersonal skills; proactive, dedicated, and keen to learn new technologies and tools.
Strong experience in the design and development of Business Intelligence solutions using Tableau, R Shiny, Python Flask, data modelling, dimensional modelling, ETL processes, data integration, OLAP, and client/server applications.
Strong commitment to quality, with experience ensuring compliance with coding standards and review processes.

Experience

Created new procedures to handle complex business logic and modified existing stored procedures, functions, views, and tables to support project enhancements and resolve existing defects.
Worked on ingesting the real-time data using Kafka.
Used Sqoop to ingest data from an Oracle database and store it on S3.
Ingested data from JSON and CSV files using Spark and EMR and stored the output in Parquet format on S3.
Built ETL pipelines on Snowflake; the resulting data products are queried by stakeholders and serve as backend objects for visualizations.
Excellent knowledge of AWS services (S3, EMR, Athena, EC2), Snowflake, and Big Data technologies.
Proficient in Teradata SQL, BTEQ, FastLoad, MultiLoad, TPump, and FastExport for efficient data extraction, transformation, and loading (ETL).
Experienced in writing complex SQL queries, stored procedures, macros, and functions to optimize data retrieval and reporting.
Skilled in Teradata performance tuning, including indexing strategies, query optimization, and analyzing execution plans using EXPLAIN.
Created internal and external stages and transformed data during load.
Transformed business problems into Big Data solutions and defined Big Data strategy and roadmap.
Designed the business requirement collection approach based on the project scope and SDLC methodology.
Used SSIS to create data mappings that load data from source to destination.
Created automated scripts to run batch jobs and reports using SSIS.
Built SSIS packages to fetch files from remote locations such as FTP and SFTP, decrypt and transform them, load them into the data warehouse, and provide error handling and alerting.
Developed and maintained data pipelines for streaming data using Apache Kafka.
Implemented ETL processes to automate data ingestion and transformation using Python and Airflow.
Automated the process of extracting the various source files from SFTP/FTP.
Met with business and user groups to understand business processes and gather requirements, then carried out analysis, design, development, and implementation according to client requirements.
Designed and developed Azure Data Factory (ADF) pipelines extensively to ingest data from relational and non-relational source systems to meet business functional requirements.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
Automated jobs in ADF using event, schedule, and tumbling window triggers.
Ingested large volumes and a wide variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data Factory V2.
Developed Azure Databricks notebooks to apply business transformations and perform data cleansing operations.
Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake storage.
Created and provisioned Databricks clusters, notebooks, and jobs, and configured autoscaling.
Used Azure Maps to calculate distance, latitude, and longitude.
Created reusable pipelines in Data Factory to extract, transform, and load data into Azure SQL DB and SQL Data Warehouse.
Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB, and SQL Data Warehouse.
Developed streaming pipelines using Apache Spark with Python.
Developed Power BI reports using Power Query from SQL Server and other data sources.
Created notifications, alerts, and report subscriptions in the Power BI service.
Supported published Power BI reports through data changes, system changes, and requirement changes.
Experienced in developing audit, balance and control framework using SQL DB audit tables to control the ingestion, transformation, and load process in Azure.
Used Azure Logic Apps to develop workflows which can send alerts/notifications on different jobs in Azure.
Used Azure DevOps to build and release different versions of code in different environments.
Created external tables in Azure SQL Database for data visualization and reporting purposes.
Well versed in Azure authentication mechanisms such as service principals, managed identities, and Key Vault.
Improved performance by reducing the compute time needed to process streaming data and cut costs by optimizing cluster run time.
Performed ongoing monitoring, automation, and refinement of data engineering solutions.
Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
Worked with complex SQL views, Stored Procedures, Triggers, and packages in large databases from various servers.
Generated alerts on daily event metrics for the product team.
Suggested fixes for complex issues through thorough analysis of the root cause and impact of each defect.
Designed and optimized BTEQ scripts for large-scale data extraction, transformation, and loading (ETL) in Teradata, improving query efficiency by 30%+.
Developed high-performance TPT (Teradata Parallel Transporter) scripts (Load, Update, Stream, DDL) for fast data movement, handling multi-terabyte datasets with minimal downtime.
Automated BTEQ/TPT workflows using UNIX shell scripting, reducing manual effort by 60% and ensuring timely data delivery.
Enhanced ETL job performance by 40% through query tuning, partitioning, and optimizing Teradata utility parameters (sessions, buffers, checkpoints).
Executed large-scale data migrations using FastLoad, MultiLoad, FastExport, and TPump, ensuring zero data loss for billions of records.
Developed and maintained end-to-end data pipelines using Apache Spark and Python, processing large volumes of data from various sources, and transforming it into meaningful insights for stakeholders.
Worked on SnowSQL and Snowpipe.
Created Snowpipe pipes for continuous data loading and used the COPY command to bulk load data.

Collaborated with cross-functional teams including data scientists, analysts, and software engineers to define data requirements, design data models, and optimize data storage and retrieval.