Selby Jennings

We're looking for a Databricks Platform Engineer to design, operate, and evolve a mission‑critical data and analytics platform used across the entire organization.
In this role, you'll own platform reliability, environment architecture, automation patterns, security controls, and operational governance. You'll work across engineering, data, product, and trading/analytics teams to ensure that Databricks environments are scalable, cost‑efficient, and safe for high‑impact use cases.
This role is ideal for someone who blends SRE/DevOps fundamentals with hands‑on Databricks administration and Python automation. You'll help shape a rapidly growing platform, influence engineering standards, and contribute to new capabilities as the team expands. As the first dedicated platform engineer in the US, you'll have meaningful ownership, autonomy, and visibility.
Key Responsibilities:
Platform Operations & Reliability:
Administer and support Databricks workspaces, clusters, permissions, policies, and configuration baselines with a focus on uptime, security, and predictable performance.
Use Terraform to define infrastructure and platform components across multiple environments, ensuring consistency, repeatability, and policy‑driven guardrails.
Scripting & Tooling Development:
Build Python‑based automation, operational tooling, health checks, and runbooks to reduce manual work and streamline platform management.
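To illustrate the kind of Python health-check tooling this bullet describes, here is a minimal sketch. It assumes cluster metadata has already been fetched (for example from the Databricks REST API's `clusters/list` endpoint, whose responses include `cluster_name` and `state` fields); the set of states treated as healthy is an illustrative choice, not an official rule.

```python
# Sketch of an automated cluster health check over already-fetched metadata.
# Field names (cluster_name, state) follow the Databricks clusters/list
# response shape; the HEALTHY_STATES set is an assumption for illustration.

HEALTHY_STATES = {"RUNNING", "PENDING", "RESIZING"}

def find_unhealthy_clusters(clusters: list[dict]) -> list[str]:
    """Return the names of clusters that are not in a healthy state."""
    return [
        c.get("cluster_name", "<unnamed>")
        for c in clusters
        if c.get("state") not in HEALTHY_STATES
    ]

if __name__ == "__main__":
    sample = [
        {"cluster_name": "etl-prod", "state": "RUNNING"},
        {"cluster_name": "adhoc-dev", "state": "TERMINATED"},
    ]
    print(find_unhealthy_clusters(sample))  # ['adhoc-dev']
```

A check like this would typically run on a schedule and feed an alerting channel rather than print to stdout.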
CI/CD & Governance:
Implement and maintain CI/CD patterns for infrastructure and configuration changes, including automated validations, testing, and environment promotion.
Collaborate with senior engineering leadership to shape standards, evaluate new Databricks features, and influence platform roadmap decisions.
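As a sketch of the automated validations mentioned above, the snippet below checks that every cluster configuration in a proposed change sets an auto-termination timeout within an allowed bound. `autotermination_minutes` is a real Databricks cluster setting, but the 120-minute limit and the config shape are illustrative assumptions, not platform rules.

```python
# Sketch of a CI validation gate for cluster configuration changes.
# The 120-minute ceiling is a hypothetical policy choice for illustration.

MAX_IDLE_MINUTES = 120

def validate_cluster_configs(configs: list[dict]) -> list[str]:
    """Return human-readable violations; an empty list means the change passes."""
    errors = []
    for cfg in configs:
        name = cfg.get("cluster_name", "<unnamed>")
        minutes = cfg.get("autotermination_minutes", 0)
        if not 0 < minutes <= MAX_IDLE_MINUTES:
            errors.append(
                f"{name}: autotermination_minutes must be between 1 and {MAX_IDLE_MINUTES}"
            )
    return errors
```

In a CI pipeline, a non-empty result would fail the build and block promotion to the next environment.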
Incident Response & Diagnostics:
Troubleshoot platform‑level issues, including configuration drift, cluster failures, job orchestration problems, networking concerns, and user‑level operational challenges.
Cost & Resource Management:
Improve visibility into spend, enforce cost‑related policies, and optimize cluster usage patterns.
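Spend-visibility tooling of the sort described above often starts with simple aggregation over usage exports. The sketch below sums usage per cluster from billing records; the record shape (`cluster`, `dbus`) is a hypothetical simplification of what a usage export might contain.

```python
from collections import defaultdict

# Sketch of cost-visibility tooling: aggregate DBU usage per cluster from
# usage records. The record shape is an illustrative assumption.

def usage_by_cluster(records: list[dict]) -> dict[str, float]:
    """Sum DBU usage per cluster, ordered highest spender first."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["cluster"]] += record["dbus"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

A report like this makes it straightforward to spot clusters that dominate spend and target them for policy or right-sizing work.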
Cross‑Functional Collaboration:
Work with engineering, quant/data teams, and analysts to understand workflows, triage issues, and guide best practices for secure and efficient platform usage.
Qualifications:

Databricks Administration:
Hands‑on experience configuring workspaces, access controls, cluster policies, and workspace‑level governance.
Terraform & IaC Expertise:
Ability to design reusable modules, multi‑environment infrastructure structures, and policy‑as‑code patterns.
Python Scripting:
Strong automation experience, including API integrations, lifecycle management tasks, and platform tooling.
CI/CD & DevOps Workflow Knowledge:
Familiarity with Git‑based workflows, automated testing pipelines, and environment promotion strategies.
Operational Practices:
Understanding of monitoring, alerting, observability, log analysis, and safe change management practices.
Collaboration & Communication:
Ability to partner with engineers, data practitioners, and business users to distill ambiguous problems into practical, well‑designed solutions.
Risk & Governance Mindset:
Comfort implementing controls that prevent unsafe changes in shared, high‑impact environments.
Core Technical Skills:
Terraform (multi‑env IaC, modules, policies)
Python for platform automation and API integration
Databricks administration (clusters, workspace config, permissions, jobs)
Linux CLI and shell scripting
CI/CD tooling and Git workflows
Cloud fundamentals (IAM, networking basics, storage, compute)
Nice to Have:
Experience operating shared developer or data platforms
Familiarity with Databricks REST APIs, Unity Catalog, Delta Lake, or cost governance
Exposure to AWS infrastructure defined via Terraform
Background working in regulated or risk‑sensitive environments
Experience in SRE, DevOps, or infrastructure engineering roles
Education:
Bachelor's degree in Computer Science, Engineering, or a related technical discipline
Location: Chicago, IL
Employment type: Full‑time
Interested in this role?