DevOps Engineer / Sr. DevOps Engineer
Technology
TechBlocks | Gurugram, India | 2 months ago | Until 19/4/2026
Full time | On-site
Job description
Roles & Responsibilities
Duties & accountabilities:
- Provide second line client-facing technical support for issues escalated by first line support teams.
- Apply strong technical skills and good business knowledge, together with investigative techniques and problem-solving skills, to identify and resolve issues efficiently and in a timely manner.
- Work collaboratively with the development team as required for third line escalation.
- Coordinate with product and delivery teams to ensure the Service Management team is ready for new releases and engaged in early design of new enhancements.
- Work on initiatives and continuous improvement processes around proactive application health monitoring, reporting, and technical support.
- Apply AI/ML techniques to detect anomalies, predict alerts, and enhance support operations.
Key areas of the team's responsibilities:
- Proactive monitoring and management of business-critical 24x7 real-time applications, rectifying issues in a timely fashion where required to restore application functionality.
- Ensure incidents are correctly processed, assessing business and technical impact and severity.
- Taking ownership of application incidents and ensuring that they are resolved; this includes retaining ownership of incidents that require 3rd Line or IT Change activity to resolve.
- Ensuring that communication to the business community remains active.
- Application responsibilities will cover Application Infrastructure, Data Fixes, User Queries, User Education, and Incident Investigation.
- Monitoring of application events, alerts, job schedules, capacity monitors, and performance KPIs. Creation and ownership of change requests raised to address any of the above issues.
- Proactively share knowledge with the team and update the knowledge base with support documentation (Confluence).
- Work to provide services to agreed Service Level Targets and Operating Level Agreements.
- Leverage AIOps techniques to analyse logs, metrics, traces, and event data, enabling proactive trend identification and continuous optimization of system performance.
Education and hands-on experience required:
- Preferably 4+ years of direct experience in Site Reliability Engineering or DevOps roles, covering high availability and incident response in AWS, Azure, or GCP.
- Proficiency with cloud computing environments (AWS / GCP / Azure).
- Good understanding of Application Support processes.
- Ideally familiar with monitoring tools such as Splunk, CloudWatch, Dotcom, and Monolith.
- Expertise in Oracle SQL/PostgreSQL: proficiency in advanced SQL techniques, query optimization, and experience with complex database systems.
- Experience with advanced observability tools (e.g., Prometheus, Grafana, Splunk) for monitoring, logging, and tracing.
- Experience in leading post-mortem analyses and implementing preventative measures to avoid recurrence of incidents.
- Excellent problem-solving skills and the capacity to lead effectively under pressure during incident response and outage management.
- Must understand operating systems, especially Windows and Linux. Good scripting experience (preferably including Python) is an advantage.
Keywords
confluence, amazon-web-services, azure-devops, microsoft-azure, google-cloud-platform, splunk, amazon-cloudwatch, oracle, postgresql, prometheus, grafana, windows, python