Senior DevOps Engineer
Position Description
Our client is a HealthTech product company building an innovative AI-powered digital health SaaS platform and is currently looking for a Senior AWS DevOps Engineer to lead the development of their AWS, ML and K8s infrastructure.
Responsibilities:
> Lead operations for multi-tenant SaaS workloads on AWS, ensuring scalability, high availability, and cost efficiency.
> Design, implement, and maintain reliable infrastructure for production, data, and AI/ML workloads.
> Own incident response, postmortems, and operational runbooks to improve system reliability and reduce MTTR.
> Manage and enhance CI/CD pipelines supporting both application and ML deployment workflows.
> Build and maintain infrastructure automation using Infrastructure-as-Code (AWS CDK or Terraform).
> Enable self-service capabilities for engineering and data science teams.
> Monitor and optimize cloud usage across compute, GPU, and storage resources, implementing cost controls and forecasting.
> Support and automate ML pipelines, including training, testing, and deployment using AWS SageMaker, Kubeflow, or MLflow.
> Manage GPU and compute clusters (EKS, ECS, EC2) for model training and inference workloads.
> Develop and maintain monitoring, alerting, observability, and security best practices.
> Collaborate closely with Engineering, Data, AI/ML, and PlatformOps teams to ensure smooth cross-team delivery.
Requirements:
> 7 years of experience in DevOps/CloudOps/SRE.
> Solid hands-on experience with AWS (Fargate, EKS, EC2, S3, RDS, Lambda, IAM, CloudWatch, CloudTrail), K8s and containerized workloads.
> Proficiency with CI/CD tools, Infrastructure-as-Code, infrastructure automation, and scripting.
> Proven experience with AI/ML platforms (AWS SageMaker, Kubeflow, MLflow, or equivalent) and with cost-efficient GPU/compute optimization.
> Working knowledge of MongoDB operations, monitoring, and performance tuning.
> Solid understanding of FinOps principles, cloud cost monitoring, and right-sizing strategies.
> Experience with production monitoring & incident management (Splunk, Grafana, OpenTelemetry).
> Exposure to multi-tenant SaaS architectures and security or compliance frameworks is a plus.
> Strong collaboration, mentoring, and communication skills, with the ability to thrive in a fast-paced, evolving environment.
Working with them has its perks:
> Additional Health Insurance (optionally extended to a family member).
> Sports Card (optionally extended to a family member).
> Certifications and training.
> Personal Career Growth plan.
> Transportation Allowance.
... and much more if you're up for it.
Let's talk: reach out anytime at niki@cadabra.bg
(No. 2709 from 17.01.2019)