DevOps Engineer – SRE

Technology
Triune Infomatics Inc
California, United States
1 week ago
Until 6/8/2026
Fully remote

Job description

Senior DevOps Engineer – SRE

Location: Remote

Duration: 12+ months

Overview

We are seeking a highly skilled Senior DevOps Engineer – Site Reliability Engineering (SRE) to lead the design, implementation, and reliability of scalable cloud infrastructure. This role focuses on ensuring high availability, performance optimization, and automation across AWS environments.

The ideal candidate will bring deep expertise in AWS, monitoring, and automation, with a strong SRE mindset to support mission-critical applications in a 24/7 production environment. You will work closely with engineering and operations teams to build resilient systems, improve observability, and drive operational excellence.

Required Skills

  • Strong hands-on experience with AWS cloud services and infrastructure management
  • Experience implementing alerts, alarms, and notifications using CloudWatch and/or Dynatrace
  • Experience working with AWS services such as Kafka, ECS, and EKS
  • Expertise in Infrastructure as Code (IaC) using Terraform or AWS CDK
  • Strong background in automation and configuration management
  • Experience with CI/CD pipelines (Jenkins, Azure DevOps, or similar tools)
  • Proven Site Reliability Engineering (SRE) experience in production environments
  • Strong Linux system administration and OS-level troubleshooting skills
  • Experience supporting 24/7 production environments, including incident response and RCA
  • Solid understanding of monitoring, observability, and performance tuning
  • Experience with networking fundamentals (TCP/IP, DNS, load balancing)

Preferred Skills

  • AWS certifications (DevOps Engineer or Solutions Architect)
  • Experience with Ansible, Python scripting, or other automation tools
  • Familiarity with high availability (HA) and disaster recovery (DR) architectures
  • Experience with container orchestration and microservices architecture
  • Knowledge of security best practices and vulnerability management tools
  • Experience working in enterprise-scale environments
  • Exposure to Java/.NET application deployments
  • Understanding of databases (SQL Server, Oracle)
  • Strong troubleshooting and problem-solving skills across infrastructure and applications
  • Experience with multi-region / multi-AZ AWS deployments
