Software Development Engineer - 1 at COMMERCEIQ (2022-01 – Present)
Joined as an ASDE - 1 and was promoted to SDE - 1 in Mar 2023.
- Designed and built a scalable asynchronous microservices platform for crawling and scraping millions of pages/day across multiple retailers using event-driven architecture.
- Designed and led the migration of MDS from monolithic ingestion to a distributed event-driven architecture using SQS-based event chaining and client-specific Snowflake warehouses, eliminating bottlenecks and achieving 15x scalability, 3x throughput, and a 70% reduction in end-to-end latency.
- Optimized crawler logic to eliminate redundant requests, reducing BD crawl costs by 50% and AWS infra spend from $3.2K/day to $1.6K/day (50% reduction), while improving data reliability and supporting ARR/GDR growth.
- Designed and built an E2E pipeline for zip-code-based and category analytics crawling, improving data coverage and enabling advanced marketplace insights.
- Designed and led the re-extraction process, incorporating Gather-SDK support, UI creation, and data flow job implementation to provide a seamless and efficient re-extraction experience.
- Reduced Databricks warehouse expenses by 62%, saving $6K/month, and cut runtime from 11 hours to 30 minutes (about 95%) by optimizing compute resources and query performance, improving stability and efficiency and supporting ARR and GDR growth.
- Owned the 3P platform end-to-end as primary POC, leading architecture, integrations, and execution; drove cross-service development and sprint delivery, improving scalability, reliability, and operational efficiency.
- Designed and built scalable PySpark pipelines (Databricks) for in-house seed generation, processing high-volume data from Elasticsearch, BigQuery, and Databricks, enabling multi-retailer crawling at scale.
- Reduced Databricks pipeline runtime from 24 hours to 1 hour (96%) and cut infrastructure costs from $11K/month to $480/month (95% savings) using BigQuery materialized views and query optimizations.
- Designed and implemented a centralized Airflow scheduler (GitOps-based), unifying workflow orchestration across in-house and 3P pipelines and improving observability and maintainability.
- Designed and built Seed Gatekeeper, a centralized cloud-native Spring Boot service on Kubernetes to validate crawling seeds against provider rules, improving data quality and preventing invalid pipeline executions.
- Improved production stability by resolving on-call and CS issues, reducing the associated support workload from 2-3 hours per day to 2-3 hours per week.
- Mentored and onboarded engineers, accelerating team productivity and contributing to design reviews and engineering best practices.
Software Engineer Intern at RAKUTEN INDIA (2021-07 – 2021-12)
Worked with the Rakuten Catalogue Platform team.
- Automated pre- and post-checks of crawled store data, reducing manual data checks and increasing work efficiency by 70%.
Software Engineer Intern at CDK GLOBAL (2021-05 – 2021-07)
Worked with the Dev Platform team.
- Developed and trained a chatbot for the Dev-Platforms team to help resolve JIRA tickets.