Software Support Engineer
Luxoft
Job Description
Project Description
The ideal candidate will have a strong understanding of Starlink Production Core Support.
The candidate will be responsible for eyes-on-glass monitoring, Triage & Incident Ownership, Troubleshooting & Restoration, Cross-Team Collaboration, Platform & Application Stack Awareness, and Service Quality & Process Excellence.
Responsibilities:
- Triage & Incident Ownership
o Perform rapid intake, triage, and prioritization of alerts, tickets, and incidents.
o Act as Incident Owner during high-severity events, ensuring clear communication, timely updates, and swift restoration of service.
o Maintain accurate, real-time incident timelines and post-incident documentation.
- Troubleshooting & Restoration
o Execute root-cause isolation across application, middleware, APIs, data, and infrastructure layers.
o Use observability/monitoring tools (e.g., Kibana, Dynatrace, CloudWatch, Grafana) to correlate logs, metrics, and traces; identify anomalies, performance bottlenecks, and failure patterns.
o Perform targeted mitigations, rollbacks, and config fixes, and coordinate hotfixes to restore service quickly.
- Cross-Team Collaboration
o Engage with App Dev, DevOps, Database, Network, Security, QA, and vendor partners to drive efficient problem resolution.
o Provide clear technical context, hypothesis-driven analysis, and evidence from monitoring tools to accelerate fixes.
o Facilitate postmortems and continuous improvement actions.
- Platform & Application Stack Awareness
o Identify the application stack (UI/frontend, backend services, APIs, queues, databases, caches, containers, orchestration, networking) for each impacted service to quickly isolate the source of issues.
o Maintain runbooks, service maps, and dependency diagrams to speed up diagnosis.
- Service Quality & Process Excellence
o Contribute to automation and self-healing routines (alert tuning, auto-remediation, playbooks).
o Identify monitoring gaps and recommend improvements to observability.
Mandatory Skills Description:
- 3-5+ years in Support or Production Operations roles.
- Strong triage and troubleshooting mindset with proven incident ownership and sense of urgency.
- Hands-on with observability/monitoring tools like Kibana (log analysis), Dynatrace (APM), plus familiarity with metrics and tracing.
- Solid understanding of application architectures (microservices, APIs, message queues, databases, AWS services).
- Ability to correlate logs/metrics/traces and form data-driven hypotheses to isolate root causes.
- Proficient in SQL basics and comfortable reading application logs (JSON, stack traces).
- Excellent communication skills for incident coordination and stakeholder updates (concise, structured, timely).
- Expertise in C# and .NET Core, MySQL/MSSQL/NoSQL.
- Solid understanding of SOAP, REST, and Web API based web service protocols
- Solid understanding of AWS Cloud services and integration
- Cloud Technology: AWS - Lambda, Neptune, DynamoDB, DocumentDB, RDS SQL, MongoDB (medium), Redis cache, EC2 hosting, integration with .NET Core (middleware), Dynatrace/Datadog.
- Experience with Microsoft Entity Framework and/or LINQ to SQL
- Logging: Kibana, Elasticsearch, Logstash
- Dev Tools: Fiddler, Postman, and Swagger
- Knowledge of Git
Nice-to-Have Skills Description:
- Testing Frameworks: xUnit/NUnit/MSTest
- Experience with Microsoft Enterprise Library
- Experience with Event-Based Architecture
- Exceptional at working through problems of moderate scope where analyzing situations or data requires reviewing a variety of factors
- Exceptional at triage and analysis of production support situations
- Effective Communication (verbal + written)
- Airline domain knowledge is a plus