Senior Data Engineer - Morgan Stanley - Houston, TX
(2024-12)
Environment: AWS (Redshift, S3, Glue, Lambda, Kinesis, API Gateway, Step Functions, RDS, EMR, Athena, IAM, VPC), Databricks (PySpark, Delta Lake, Unity Catalog), Snowflake, Hadoop, Informatica, SSIS, SQL Server, PL/SQL, Power BI
- Designed and built an AWS-centric data platform combining an S3 data lake, a Redshift data warehouse, and Glue ETL to support batch and real-time analytics.
- Developed PySpark/Databricks data pipelines using Delta Lake and medallion architecture (Bronze/Silver/Gold) for scalable ingestion, transformation, and ACID-compliant upserts.
- Implemented Kinesis + Lambda + Step Functions for high-throughput streaming ingestion into S3/Redshift and automated workflows across Glue and Databricks.
- Managed Snowflake environments, including warehouse sizing and auto-suspend policies, role/privilege structures, and environment hygiene (schemas, stages, retention), to balance performance, cost, and governance.
- Modeled dimensional schemas (Star/Snowflake) and tuned Redshift (distribution/sort keys, WLM, materialized views), cutting query runtimes by 30–50%.
- Worked hands-on with Snowflake for analytics workloads, including schema design, virtual warehouse configuration, query optimization, and cost control (micro-partitioning, clustering, Time Travel, zero-copy cloning).
- Used Snowflake's account usage views and query history, plus cloud monitoring, to track usage patterns, detect anomalies, and continuously tune warehouses for cost and performance.
- Implemented complex Snowflake stored procedures and tasks to orchestrate multi-step transformations and data quality checks.
- Developed advanced Power BI dashboards on Redshift and Delta Lake, with complex DAX measures, time intelligence, dynamic KPIs, drill-down/drill-through, and row-level security, for executive-level reporting on P&C policy and claims performance, trend analysis, and operational monitoring.
- Used Jira in Agile teams to manage stories, tasks, and defects for data engineering and BI projects.
- Created Python REST APIs using Lambda + API Gateway to expose data and metrics for internal/external consumers.
- Implemented CI/CD (GitLab/Azure DevOps) for AWS ETL and Databricks jobs, and configured governance/lineage via Unity Catalog and enterprise data governance tools.
Data Engineer - Walgreens - Houston, TX
(2023-11 - 2024-11)
Environment: Azure (Synapse, Data Factory, Data Lake, HDInsight, Azure Functions, Azure DevOps), Databricks (PySpark, Delta Lake), Kafka, PostgreSQL, SQL Server, Power BI, Terraform, Python, Shell
- Designed and maintained Azure-based data platforms using Azure Data Lake and Synapse for analytics and downstream reporting.
- Built Databricks (PySpark) data pipelines reading from Azure Data Lake and streaming sources (Kafka), writing curated Delta Lake tables and Synapse models.
- Orchestrated pipelines with Azure Data Factory, integrating Databricks, Synapse, and external data sources in both batch and near real-time patterns.
- Created advanced Power BI solutions on Azure data models: semantic (tabular) models with DAX, calculation groups, and shared datasets, plus retail/pharmacy dashboards (customer, basket, store performance) with drill-through, bookmarks, KPI scorecards, and row-level security by region/store.
- Used Azure DevOps for CI/CD of data pipelines and Power BI artifacts, and Terraform for IaC across Azure resources.
Junior Data Engineer - EigerTech Knowledge Services - India
(2018-02 - 2021-09)
Environment: GCP (BigQuery, Dataflow/Apache Beam, Pub/Sub, Dataproc, Cloud Composer, GCS, Cloud Functions, Data Catalog), Databricks, Spark, Hive, SQL, Power BI
- Built GCP-based data platforms with BigQuery as the analytical warehouse and GCS as the data lake.
- Developed batch and streaming data pipelines using Cloud Composer (Airflow), Dataflow (Apache Beam), Pub/Sub, and Dataproc, loading multi-layer BigQuery models (staging/refined/curated).
- Optimized BigQuery with partitioning, clustering, and cost-aware query design; applied GCP IAM and Data Catalog for security, metadata, and lineage.
- Created Power BI dashboards on BigQuery datasets for operational and management reporting with advanced DAX and row-level security.
Data Analyst - EigerTech Knowledge Services - India
(2017-04 - 2018-02)
Environment: Python, Jupyter Notebook, MySQL, PostgreSQL, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, Power BI Desktop, Power BI Service Pro, Teradata, Salesforce, Einstein Analytics Studio, Business Objects, SharePoint, SAP HANA, SAP Logon, QlikView, IBM Lotus Notes, SQL, SQL Server, MS Access, Microsoft Office Suite (Word, PowerPoint, Excel)
- Developed Python scripts using Pandas and NumPy for data processing, automation, and reporting, improving efficiency and accuracy.
- Conducted hypothesis testing with Python libraries such as SciPy and statsmodels to validate marketing campaign performance, resulting in a 15% increase in conversion rate.
- Built interactive Power BI dashboards and paginated reports using multiple data sources (SQL, Excel, CSV).
- Developed DAX measures, hierarchies, drill-down reports, and KPI scorecards for business stakeholders, and worked closely with business teams to translate requirements into effective data models and visualizations.