ASSISTANT SYSTEM ENGINEER - IOPEX TECHNOLOGIES - BANGALORE
(2024-05)
Incident Management & Operations - RRPS Project
- Spearheaded P0–P3 incident management across global environments, maintaining 95%+ SLA compliance and minimizing service disruption.
- Commanded critical outages as Incident Commander, driving cross-functional coordination and reducing MTTR by 20%.
- Evaluated and prioritized large-scale disruptions impacting 20%+ of customers, enabling faster, better-informed recovery decisions.
- Communicated real-time updates to stakeholders and leadership, ensuring 100% adherence to communication SLAs and issuing Cloud Availability Notifications within 30–60 minutes.
- Managed 30+ incidents/month, ensuring 10-minute acknowledgment SLO compliance and timely SME engagement.
- Automated incident workflows using Slack bots, improving response time by 25% and enhancing visibility across teams.
- Led post-incident reviews (PIRs), ensuring 100% RCA completion and driving continuous improvement actions.
- Reduced repeat incident volume by ~15% by identifying recurring issues and implementing preventive measures through RCA insights.
- Transitioned 10+ critical incidents/month to full CINC, ensuring seamless handover and reducing resolution delays by ~15%.
- Governed incident lifecycle compliance, achieving 100% closure within the 72-hour SLO and reducing backlog by ~20%.
ASSISTANT SYSTEM ENGINEER - IOPEX TECHNOLOGIES - BANGALORE
(2024-05)
Splunk Applications & Support - Kaleidoscope Project
- Owned end-to-end support for Splunk applications and add-ons, resolving 5+ tickets/week across data ingestion, parsing, and search performance.
- Oversaw full application lifecycle (installation, configuration, upgrades, patching, decommissioning) across multiple Splunk environments.
- Troubleshot and resolved complex technical issues, ensuring 95%+ SLA adherence with minimal service downtime.
- Optimized SPL queries for log analysis, improving search performance by ~15%.
- Designed and maintained 10+ dashboards and alerts, enhancing monitoring, visibility, and operational insights.
- Configured data onboarding pipelines, ensuring accurate, consistent, and reliable data flow.
- Reduced alert noise by ~20% through threshold tuning, alert optimization, and misfire resolution.
- Partnered with Tier 2/3 teams on root cause analysis (RCA) and implemented preventive fixes for recurring issues.
- Refined knowledge objects (fields, lookups, dashboards), improving data usability and reporting accuracy by ~20%.