Anson McCade
We are seeking a highly skilled Senior Data Engineer to join our systems engineering team. In this role, you will be at the forefront of innovation, designing and maintaining the robust data architectures that power mission-critical AI and NLP initiatives.
Since we operate in a highly regulated and secure environment, you will focus heavily on on-premise infrastructure, ensuring our Generative AI capabilities are powerful, private, and resilient.
Architect & Build: Design, develop, and optimize scalable data pipelines (ETL/ELT) within air-gapped or on-premise data centers.
AI Integration: Engineer data structures specifically for Natural Language Processing (NLP) and Large Language Models (LLMs), including local vector databases and private model hosting.
Manage and scale on-premise big data clusters, ensuring high availability without reliance on public cloud providers.
Maintain rigorous data quality and security standards—crucial for sensitive engineering—while managing complex datasets from disparate sources.
Work alongside Data Scientists to transition GenAI prototypes into production-ready, locally hosted solutions.
Expert-level Python and advanced SQL.
Experience with ETL/ELT and orchestration tools like Apache Airflow or NiFi.
On-Prem Tech: Proficiency with Hadoop/HDFS, Spark, and containerization via Docker/Kubernetes (K3s/OpenShift).
Practical experience with NLP (HuggingFace) and GenAI frameworks (LangChain) tailored for local execution.
Experience with PostgreSQL and on-prem Vector DBs (e.g., Milvus, Qdrant, or pgvector).
Experience working within Linux-based secure environments and air-gapped networking.
A degree in Computer Science, Data Engineering, Mathematics, or a related technical field.
Must be eligible for high-level security clearance (SC or DV level).
A "security-first" mindset with the ability to troubleshoot complex hardware/software interactions on-site.