Site Reliability Engineer at DeepIntent (2025-01 – Present)
Architected AWS infrastructure and implemented Zero Trust security across multi-cloud environments
- Architected AWS infrastructure with Terraform, covering EC2, Lambda, S3, IAM, VPC, ALB, and CloudWatch.
- Implemented Zero Trust security using IRSA, RBAC, and Pod Security Standards across 30+ microservices, eliminating long-lived credentials and improving SOC2 compliance from 65% to 98%.
- Designed VPC architecture with public/private subnets and NAT gateways; secured traffic for a multi-tier web application demo.
- Architected and implemented a secure multi-cloud bridge between AWS and GCP using a site-to-site VPN.
- Automated multi-cloud provisioning by authoring reusable Terraform modules that standardize the setup of OIDC providers, IAM roles, and VPC peering.
- Built automated CI/CD pipelines on GitHub self-hosted runners for EKS and on-premises Kubernetes infrastructure, with integrated vulnerability scanning (Trivy, SonarQube).
- Led SOC2 compliance initiative, driving vulnerability tracking and remediation across media services, SRE-owned services, and internal tools.
- Enforced platform-wide security scanning, ensuring all services were covered by automated vulnerability detection and compliance checks.
- Automated serverless workflows with Lambda, S3 triggers, and EventBridge; integrated SNS and SES for real-time alerting.
- Upgraded and optimized monitoring exporters for critical data sources, improving reliability and observability of production systems.
- Drove AWS cost optimization efforts, achieving $12K/month savings through rightsizing, lifecycle management, and resource governance.
Site Reliability Engineer at One2n Consulting (2024-03 – 2024-12)
Led architecture design and deployment of scalable AWS infrastructure; managed microservices and CI/CD optimization
- Led the architecture design and deployment of scalable, secure, and cost-efficient cloud infrastructure on AWS using Terraform, enhancing system reliability and performance.
- Deployed and managed microservices using Jenkins and Bitbucket Pipelines, reducing deployment errors and improving release reliability.
- Migrated CI/CD from Jenkins to Bitbucket Pipelines.
- Automated scheduled start and stop of instances and ECS clusters/services in non-production environments to reduce cost.
- Designed and stress-tested scalable AWS infrastructure using JMeter, scaling ECS clusters from 5 to 25 nodes to handle 5x traffic spikes while maintaining SLAs, saving $8K/month in over-provisioning costs.
- Implemented AWS identity federation with Google Workspace as the identity provider (IdP) to manage users and their access boundaries.
- Served as the primary DevOps support contact, resolving 95%+ of infrastructure issues within 2 hours and reducing mean time to resolution (MTTR) by 40%.
Site Reliability Engineer at One2n Consulting (2024-03 – 2024-12)
Set up an observability stack and automated CI/CD pipelines for Smartwinnr
- Set up an observability stack (Grafana, Prometheus, Promtail, Loki), provisioned with Ansible, to monitor services and servers.
- Developed an NGINX log parser (Python/Bash) that extracts 2xx/4xx/5xx status codes, enabling near-real-time SLI reporting to Slack at 10-minute intervals for 10+ services.
- Built and optimized GitLab pipeline automation, reducing deployment time by 60% and enabling one-click deployments for the development team.
DevOps Engineer at Poonawalla Fincorp (2022-06 – 2024-02)
Managed Kubernetes clusters, led cloud migration, and implemented security and monitoring solutions
- Created Ansible playbooks to configure RHEL and Oracle Linux servers for Kubernetes and OpenShift deployments.
- Led the migration from on-premises infrastructure to OCI and AWS.
- Managed Kubernetes clusters on Oracle Cloud Infrastructure (OKE) and Amazon Web Services (EKS on Fargate), ensuring high availability and scalability for a microservices architecture.
- Led efforts to improve security by integrating security scanning tools such as Trivy and SonarQube into the CI/CD pipelines and applying best practices for image scanning and vulnerability management.
- Implemented monitoring and logging with Prometheus and Grafana on OCI, and with CloudWatch and New Relic on AWS, improving visibility into system performance.
- Migrated CI/CD from cloud-provided tooling on OCI and AWS to GitHub Actions.
- Provided Tier-1 DevOps support, resolving 95%+ of infrastructure issues within 2 hours (40% MTTR reduction).