Software Support Engineer

Project Description

The ideal candidate will have a strong understanding of Starlink Production Core Support.

The candidate will be responsible for Eyes-on-Glass Monitoring, Triage & Incident Ownership, Troubleshooting & Restoration, Cross-Team Collaboration, Platform & Application Stack Awareness, and Service Quality & Process Excellence.

Responsibilities:

Triage & Incident Ownership

o Perform rapid intake, triage, and prioritization of alerts, tickets, and incidents.

o Act as Incident Owner during high-severity events, ensuring clear communication, timely updates, and swift restoration of service.

o Maintain accurate, real-time incident timelines and post-incident documentation.

Troubleshooting & Restoration

o Execute root-cause isolation across application, middleware, APIs, data, and infrastructure layers.

o Use observability/monitoring tools (e.g., Kibana, Dynatrace, CloudWatch, Grafana) to correlate logs, metrics, and traces; identify anomalies, performance bottlenecks, and failure patterns.

o Perform targeted mitigations, rollbacks, and config fixes, and coordinate hotfixes to restore service quickly.

Cross-Team Collaboration

o Engage with App Dev, DevOps, Database, Network, Security, QA, and vendor partners to drive efficient problem resolution.

o Provide clear technical context, hypothesis-driven analysis, and evidence from monitoring tools to accelerate fixes.

o Facilitate postmortems and continuous improvement actions.

Platform & Application Stack Awareness

o Identify the application stack (UI/frontend, backend services, APIs, queues, databases, caches, containers, orchestration, networking) for each impacted service to quickly isolate the source of issues.

o Maintain runbooks, service maps, and dependency diagrams to speed up diagnosis.

Service Quality & Process Excellence

o Contribute to automation and self-healing routines (alert tuning, auto-remediation, playbooks).

o Identify monitoring gaps and recommend improvements to observability.

Mandatory Skills Description:

3-5+ years in Support or Production Operations roles.
Strong triage and troubleshooting mindset with proven incident ownership and sense of urgency.
Hands-on experience with observability/monitoring tools such as Kibana (log analysis) and Dynatrace (APM), plus familiarity with metrics and tracing.
Solid understanding of application architectures (microservices, APIs, message queues, databases, AWS services).
Ability to correlate logs/metrics/traces and form data-driven hypotheses to isolate root causes.
Proficient in SQL basics and comfortable reading application logs (JSON, stack traces).
Excellent communication skills for incident coordination and stakeholder updates (concise, structured, timely).
Expertise in C# and .NET Core, MySQL/MSSQL/NoSQL.
Solid understanding of SOAP, REST, and Web API based web service protocols
Solid understanding of AWS cloud services and integration
Cloud Technology: AWS - Lambda, Neptune, DynamoDB, DocumentDB, RDS SQL, MongoDB (medium), Redis cache, EC2 hosting, integration with .NET Core (middleware), Dynatrace/Datadog.
Experience with Microsoft Entity Framework and/or LINQ to SQL
Logging: Kibana, Elasticsearch, Logstash
Dev Tools: Fiddler, Postman, and Swagger
Knowledge of Git

Nice-to-Have Skills Description:

Testing Framework: xUnit/NUnit/MSTest
Experience with Microsoft Enterprise Library
Experience with event-based architecture
Exceptional at working problems of moderate scope where analysis of situations or data requires review of a variety of factors
Exceptional at triage or analysis of situations for production support
Effective communication (verbal and written)
Airline domain knowledge is a plus
