We are seeking a DevOps Engineer with strong expertise in cloud infrastructure, deployment, and system reliability, along with recent hands-on experience integrating Large Language Model (LLM)-based solutions. This role focuses on building and managing scalable, secure, and production-ready platforms that support modern Generative AI applications.
Key Responsibilities
- Design, build, and manage CI/CD pipelines for AI/LLM-enabled applications
- Deploy and manage containerized applications using Docker and orchestration tools (e.g., Kubernetes)
- Integrate LLM-based services (APIs, inference endpoints) into production environments
- Ensure system reliability, scalability, and performance for AI-driven workloads
- Conduct vulnerability assessments, security scans, and compliance checks across code and infrastructure
- Implement automated testing, including regression testing and validation for AI systems
- Monitor applications and infrastructure using logging and observability tools
- Collaborate with data scientists, ML engineers, and software developers to productionize AI solutions
Required Skills
DevOps (Primary Focus)
- Strong experience with:
  - CI/CD tools (Jenkins, GitHub Actions, GitLab CI)
  - Containerization and orchestration (Docker, Kubernetes)
  - Cloud platforms (AWS, Azure, or GCP)
  - Infrastructure as Code (Terraform, CloudFormation)
- Experience with system monitoring (Prometheus, Grafana, ELK stack)
Security & Quality
- Knowledge of:
  - Code vulnerability scanning (SAST/DAST tools)
  - Secure coding and infrastructure practices
- Experience with:
  - Automated regression testing
  - Code quality and compliance checks