Lakshay Goel

Data Engineer at Vrentin Tech (2025-04 – Present)

The project involved building a robust, real-time financial data pipeline to ingest, transform, and deliver financial data to end-users and reporting dashboards. The system processes transactional and market data from multiple sources in near real-time, ensuring data freshness, accuracy, and reliability for financial reporting and compliance purposes.

Designed and developed 4–5 end-to-end data pipelines covering batch and real-time streaming use cases
Optimized Spark job run times from ~1 hour to ~20 minutes (roughly a 67% reduction) through code redesign, query optimization, and efficient partitioning strategies
Implemented Databricks Instance Pools, reducing cluster spin-up time from ~10 minutes to under 2 minutes, significantly lowering infrastructure costs and improving SLA adherence
Built structured streaming jobs to consume and process data from Kafka topics in real time
Developed an automated job monitoring and alerting system using Databricks system tables to proactively notify teams of failing or stale jobs/pipelines
Loaded processed data into Snowflake tables using the Snowflake-Databricks connector for downstream reporting
Implemented role-based access control (RBAC) to enforce data security and regulatory compliance
Collaborated with a cross-functional team of 10 members, participating in design reviews, sprint planning, and code reviews

Sr. Technical Solutions Engineer at Databricks Pvt. Ltd. (2022-07 – 2025-04)

Worked as an individual contributor with 10+ enterprise customers across the Finance, Retail, and E-commerce domains. Engagements were project- or need-based and focused on Data Lakehouse migration, real-time streaming pipeline development, performance tuning, and helping customers adopt new Databricks platform features. Handled data volumes up to 12 TB and delivered up to 50% job run time improvements across client engagements.

Designed and developed 8–10 Notebook-based Spark jobs and pipelines across multiple customer engagements
Worked individually with 10+ enterprise customers (JPMC, Swiggy, Meesho, and others) handling data volumes up to 12 TB
Delivered up to 50% improvement in Spark job run times through deep-dive performance profiling, query plan analysis, and resource tuning
Led Data Lakehouse migration engagements, guiding customers from legacy data warehouses to Databricks Delta Lake architecture
Designed and implemented real-time streaming pipelines using Delta Live Tables (DLT) and Structured Streaming for event-driven use cases
Diagnosed and resolved complex Spark job failures, OOM errors, data skew issues, and pipeline bottlenecks for customer workloads
Collaborated with the Databricks Engineering team to test upcoming platform features and provide actionable customer feedback
Served as the technical bridge between customers and Databricks engineering, translating business requirements into scalable data solutions

Senior Software Engineer — Data Engineer at Optum Global Solutions (2020-06 – 2022-07)

Commercial Cradle is a healthcare claims intelligence platform at Optum, designed to analyze, prioritize, and flag insurance claims using a rule-based recommendation engine. The system ingests large volumes of claims data from Oracle, applies rule-based scoring and prioritization logic, and stores processed data in Cassandra and Azure Cosmos DB for downstream consumption. The platform improved fraud detection accuracy and enabled analysts to focus on high-risk claims first.

Designed and developed end-to-end Spark modules to ingest claims data from Oracle DB into Cassandra tables, enabling fast, scalable data access
Built a Python-based rule prioritization module that scored and ranked insurance claims by risk weight, directly improving fraud detection outcomes
Developed automated Airflow DAGs to orchestrate daily Spark jobs, ensuring reliable pipeline execution and scheduling
Implemented Azure Cosmos DB as the primary store for claims metadata, enabling globally distributed, low-latency read access
Designed and built a daily reporting Spark job generating operational analytics reports for business stakeholders
Containerized data workloads using Docker and deployed on AKS, ensuring scalability and environment consistency
Integrated Azure Data Factory (ADF) and Azure Blob Storage for seamless data movement across the Azure ecosystem

Technology Analyst / Big Data Developer at Infosys Limited (2014-06 – 2020-04)

Prime Therapeutics is a leading pharmacy benefits management (PBM) company in the US healthcare space. The project involved designing and building a reusable, scalable ETL Pipeline Framework using Apache Spark and the Hadoop ecosystem to process large volumes of pharmaceutical and claims data. The framework performed data cleansing, change data capture (CDC), historical data purging, and analytical data loading — significantly reducing data processing time and improving data quality for downstream healthcare analytics.

Designed and developed multiple reusable Spark modules in Scala for data cleansing, change data capture (CDC), and data transformation
