Key Responsibilities:
- Own the health, stability and performance of mission-critical services: respond to incidents, perform root-cause analysis, implement corrective actions and prevent recurrence.
- Diagnose latency, throughput or availability issues in production and design scalable solutions to