Triage & Incident Ownership
o Perform rapid intake, triage, and prioritization of alerts, tickets, and incidents.
o Act as Incident Owner during high-severity events, ensuring clear communication, timely updates, and swift restoration of service.
o Maintain accurate, real-time incident timelines and post-incident documentation.
Troubleshooting & Restoration
o Execute root-cause isolation across application, middleware, APIs, data, and infrastructure layers.
o Use observability/monitoring tools (e.g., Kibana, Dynatrace, CloudWatch, Grafana) to correlate logs, metrics, and traces; identify anomalies, performance bottlenecks, and failure patterns.
o Perform targeted mitigations, rollbacks, and config fixes, and coordinate hotfixes to restore service quickly.
Cross-Team Collaboration
o Engage with App Dev, DevOps, Database, Network, Security, QA, and vendor partners to drive efficient problem resolution.
o Provide clear technical context, hypothesis-driven analysis, and evidence from monitoring tools to accelerate fixes.
o Facilitate postmortems and continuous improvement actions.
Platform & Application Stack Awareness
o Identify the application stack (UI-frontend, backend services, APIs, queues, databases, caches, containers, orchestration, networking) for each impacted service to quickly isolate the source of issues.
o Maintain runbooks, service maps, and dependency diagrams to speed up diagnosis.
Service Quality & Process Excellence
o Contribute to automation and self-healing routines (alert tuning, auto-remediation, playbooks).
o Identify monitoring gaps and recommend improvements to observability.

Mandatory Skills Description:
4 years of experience in application support of web applications.
4 years of experience supporting JavaScript with React v18 and modern patterns, Redux, Redux-Saga, Vite (current build tool), and ES6, or equivalent experience
• Advanced knowledge of software engineering standard processes such as versioning and versioni