Site Reliability Engineer Professional
Technology
IBM
Guerrero, México
Posted 1 month ago
Open until 10/4/2026
Job Description
Introduction
At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You’ll work with diverse technologies and colleagues worldwide to deliver resilient, future‑ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunity to build expertise and shape the infrastructure that drives progress.
Your Role And Responsibilities
We’re seeking a Site Reliability Engineer Professional to support the availability, performance, and day‑to‑day operations of our services and platforms. The engineer in this role will apply SRE best practices such as automation, observability, and CI/CD, working with platforms such as Kubernetes. Responsibilities include system maintenance, tooling improvements, on‑call participation, and contributions to the reliability and scalability of services.
Key Responsibilities
Operations & Reliability
- Participate in an on‑call rotation with mentorship and established runbooks
- Perform operational tasks: log reviews, rollouts, restarts, configuration updates, certificate renewals
- Maintain and update runbooks, dashboards, diagrams, and documentation
Monitoring & Observability
- Build or update dashboards and alerts using Prometheus, Grafana, and Loki
- Tune alerts to reduce noise and improve signal quality
- Apply the golden signals and the RED/USE monitoring patterns under guidance
Automation & Tooling
- Develop automation scripts with Python, Bash, or Go to eliminate repetitive tasks
- Contribute to CI/CD pipelines (linting, gates, templates)
Cloud & Platform
- Support deployment and operation of workloads on Docker, Kubernetes, and OpenShift
- Contribute to infrastructure changes using Terraform and Ansible, subject to code review
- Assist with basic cloud provisioning tasks
Networking & Security
- Apply foundational networking concepts (TCP/IP, DNS, routing, HTTP, TLS) in troubleshooting
- Follow least‑privilege and proper secrets‑management practices
Collaboration & Process
- Participate in and/or lead Agile ceremonies (standups, planning, retros)
- Contribute to and/or lead blameless post‑incident reviews
- Collaborate with cross‑functional teams and use standard Git workflows
Required Technical And Professional Expertise
- Between 1 and 3 years of experience in SRE/DevOps/Platform Engineering or related fields
- Advanced English proficiency is a must
- Strong Linux fundamentals: CLI, processes, permissions, logs, troubleshooting
- Proficiency in at least one scripting language (Python, Bash, or Go)
- Experience with Git and GitHub workflows
- Familiarity with Docker and Kubernetes basics
- Experience with CI/CD implementations
- Basic networking knowledge
Preferred Technical And Professional Experience
- OpenShift experience
- Hands‑on exposure to Terraform and Ansible
- Experience with Prometheus, Grafana, Loki, Thanos, or OpenTelemetry
- Cloud platform fundamentals (IBM Cloud, AWS, Azure, or GCP)
- Optional experience with JavaScript or TypeScript