Data Engineer (Data Preprocessing)
Job Description
Hello,
Please review the JD below and, if you're interested, reply with an updated resume.
100% Remote – Mexico, Brazil & Argentina (can be anywhere in Latin America)
12+ Month Contract
Language: Must speak fluent English
About the Opportunity
Our client, a leader in the Analytics and AI space, is seeking two Data Engineers to join a specialised team focused on product-agnostic data preprocessing. This team is responsible for transforming data upstream, before it feeds into the various core products. The project primarily involves a strategic migration from a legacy platform (the Informatica sunset) to a modern Spark-based environment.
This is a high-impact role for a self-motivated professional who can deliver results in a fast-paced, evolving technical landscape.
Key Responsibilities & Deliverables
This role focuses on building the data layer and migrating historical client data.
Your responsibilities will include:
- Legacy Code Migration: Reviewing and interpreting legacy code (including Scala) and refactoring it into Python/PySpark for the new platform.
- Pipeline Development: Building and maintaining a new data migration module and data pipelines for both historical data resets and ongoing batch ingestion.
- Data Transformation: Developing product-agnostic data layers to ensure clean data flow across multiple internal products.
- Cloud Data Management: Using AWS Glue for Spark jobs and Lambda for serverless functions, and managing data files in S3 buckets (a sketch of this kind of job follows this list).
- Quality Assurance: Ensuring data integrity through rigorous validation and end-to-end testing during the baseline reset for migrated clients.
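
To give a concrete flavour of the pipeline work described above, here is a minimal raw-to-cleansed PySpark sketch. All bucket paths, column names, and the dedup key are hypothetical placeholders, not taken from the client's codebase; in practice the same logic would typically run inside an AWS Glue Spark job, with Lambda triggering it as new files land in the raw bucket.

```python
# Minimal raw-to-cleansed PySpark sketch. Paths, columns, and the dedup
# key are hypothetical placeholders, not the client's actual schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-to-cleansed").getOrCreate()

# Read a raw batch that has landed in S3 (placeholder path).
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

cleansed = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))            # normalise types
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .withColumn("ingest_date", F.to_date("order_ts"))              # partition column
    .filter(F.col("order_id").isNotNull())                         # drop unusable rows
    .dropDuplicates(["order_id"])                                  # keep re-runs idempotent
)

# Write the product-agnostic cleansed layer for downstream products.
(cleansed.write
    .mode("overwrite")
    .partitionBy("ingest_date")
    .parquet("s3://example-bucket/cleansed/orders/"))
```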
Required Skills & Experience
We are looking for technical experts who can hit the ground running with the following skills:
- PySpark Mastery: Deep expertise in Python and PySpark (DataFrames) for data transformation and ETL.
- Code Refactoring: The ability to read and understand Scala for the purpose of rewriting it into Python (actual coding will be in Python/PySpark).
- Core AWS Stack: Strong hands-on experience with AWS Glue, S3, and Lambda. Familiarity with other AWS data tools is a significant plus.
- ETL Infrastructure: Proven experience building multi-layer pipelines (Raw to Cleansed) and handling complex schema mapping.
- Analytical Rigour: Strong SQL skills for data validation and reconciliation (a reconciliation sketch follows below).
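
As an illustration of the validation side, a baseline-reset reconciliation usually compares row counts and a numeric checksum between the legacy extract and the migrated table. The sketch below expresses this in PySpark rather than raw SQL to keep the examples in one language; the table paths and the checksum column are invented for illustration.

```python
# Hypothetical baseline-reset reconciliation: row counts plus a SUM
# checksum on an amount column. Paths and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("baseline-reconciliation").getOrCreate()

legacy = spark.read.parquet("s3://example-bucket/legacy/orders/")
migrated = spark.read.parquet("s3://example-bucket/cleansed/orders/")

def summarise(df):
    # One-row summary: total rows and a checksum over the amount column.
    return df.agg(F.count("*").alias("rows"),
                  F.sum("amount").alias("amount_sum")).first()

before, after = summarise(legacy), summarise(migrated)
assert before.rows == after.rows, f"row count drift: {before.rows} vs {after.rows}"
assert before.amount_sum == after.amount_sum, "amount checksum drift"
print(f"Reconciled {after.rows} rows; checksums match.")
```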
Interested in this position?