DevOps Engineer

Technology
Qlay
2 weeks ago
Until 10/06/2026
Fully remote

Job description

As a Principal Engineer, you will:
  • Own and evolve the architecture of cloud and on-prem infrastructure
  • Lead design and implementation of highly available, scalable, and secure systems
  • Set infrastructure standards for reliability, observability, performance, and cost efficiency
  • Drive company-wide initiatives around availability, disaster recovery, and capacity planning
  • Act as a technical authority for infrastructure, SRE, and platform-related decisions
  • Review and guide complex infrastructure designs, migrations, and incident responses
  • Identify systemic risks and proactively reduce operational and security debt
  • Mentor senior engineers and raise the bar for infrastructure engineering practices
  • Partner with product, security, and leadership to align infra strategy with business goals
Requirements
  • Preferably a graduate of a top Vietnamese university (HCMUT, HCMUS, HUST, UIT, etc.) and/or a top university elsewhere in the world
  • 6+ years of experience in infrastructure, systems, or platform engineering
  • Deep expertise in cloud platforms (AWS, GCP, or Azure) and cloud-native architecture
  • Strong background in Linux, networking (TCP/IP, DNS, load balancing), and distributed systems
  • Extensive experience with Infrastructure as Code (Terraform, CloudFormation, Pulumi, etc.)
  • Proven experience designing and operating high-availability production systems
  • Strong incident management and root cause analysis experience
  • Excellent communication and technical leadership skills
Technical Skills
  • Cloud services: compute, networking, storage, IAM
  • Containers and orchestration (Docker, Kubernetes)
  • CI/CD systems and deployment automation
  • Observability: monitoring, logging, tracing, alerting
  • Security best practices (identity, secrets, encryption, least privilege)
  • Cost optimization and capacity planning
Nice to Have
  • Experience with SRE practices and SLIs/SLOs
  • Experience running infrastructure at scale (high traffic, global systems)
  • Multi-region and disaster recovery architecture
  • Experience with compliance-heavy environments (SOC2, ISO, HIPAA, PCI)
  • Background in platform or developer experience teams
  • This will be structured as contractor work.
  • Devices: You will be expected to use your own computer to perform the work.
If you are interested in this position, please:
