Role Overview
We are looking for an experienced Data Engineer to join the Data & AI team delivering core data platform capabilities. This is a hands-on engineering role focused on designing and building robust Python-based data pipelines, ETL processes, and data models across a modern data platform. The ideal candidate is a strong Python developer with solid SQL skills and a good understanding of data lake architecture.
Exposure to data quality tooling and techniques, whether Informatica, Microsoft Purview, or custom-built frameworks, is a plus.
Key Responsibilities
- Design, build, and maintain end-to-end data pipelines using Python, ensuring reliable data ingestion, transformation, and delivery across the platform.
- Develop and optimise ETL/ELT processes to move and transform data across bronze, silver, and gold layers following the Medallion architecture pattern.
- Write clean, modular, production-grade Python code for data processing, orchestration, and automation tasks.
- Support the design and implementation of data models, schemas, and storage strategies that meet downstream analytics and reporting requirements.
- Build and maintain SQL-based transformations, stored procedures, and views for data validation and reconciliation.
- Develop and manage data ingestion frameworks, handling a variety of source formats (flat files, APIs, databases, streaming).
- Implement data quality checks and validations within pipelines, applying rules across completeness, validity, consistency, uniqueness, accuracy, and timeliness dimensions.
- Monitor pipeline health, build alerting mechanisms, and troubleshoot data issues in production environments.
- Contribute to CI/CD pipelines for data workloads, including automated testing, deployment, and version control practices.
- Produce clear technical documentation for pipelines, data models, and operational runbooks.
Required Skills & Experience
- Strong Python development skills with hands-on experience building production data pipelines (pandas, PySpark, or equivalent).
- Solid SQL skills for complex queries, data transformations, and performance tuning.
- Experience designing and implementing ETL/ELT processes at scale.
- Good understanding of the Medallion architecture (Bronze / Silver / Gold) and modern data lake/lakehouse design patterns.
- Experience with data orchestration tools such as Airflow, Dagster, or Azure Data Factory.
- Working knowledge of cloud data platforms (Azure, AWS, or GCP).
- Familiarity with relational and non-relational databases.
- Strong problem-solving skills with the ability to debug complex data pipeline issues.