What You’ll Do

Reliability & Operations

- Own availability, latency, and scalability across SaaS and AI systems
- Define and enforce SLOs, SLIs, and error budgets
- Participate in a global on-call rotation (~1 week every 4 weeks)
- Lead incident response and drive blameless postmortems with systemic