
Data Engineer (Data Preprocessing)

Technology
Tek Leaders Inc
México, México · Posted 1 month ago · Open until 19/5/2026

Job description

Hello,

Please review the JD below and, if you're interested, reply with an updated resume.

Data Engineer (Data Preprocessing)

Remote (100%) – Mexico, Brazil, or anywhere else in Latin America

12+ Months Contract

Language: Must speak fluent English

About the Opportunity

Our client, a leader in the Analytics and AI space, is seeking two Data Engineers to join a specialised team focused on product-agnostic data preprocessing. This team transforms data upstream, before it feeds into various core products. The project primarily involves a strategic migration from a legacy platform (Informatica, which is being sunset) to a modern Spark-based environment.

This is a high-impact role for a self-motivated professional who can deliver results in a fast-paced, evolving technical landscape.

Key Responsibilities & Deliverables

This role focuses on building the data layer and migrating historical client data.

Your responsibilities will include:

  • Legacy Code Migration: Reviewing and interpreting legacy code (including Scala) to refactor and rewrite into Python/PySpark for the new platform.
  • Pipeline Development: Building and maintaining a new data migration module and data pipelines for both historical data resets and ongoing batch ingestion.
  • Data Transformation: Developing product-agnostic data layers to ensure clean data flow across multiple internal products.
  • Cloud Data Management: Utilizing AWS Glue for Spark jobs, Lambda for serverless functions, and managing data files within S3 buckets.
  • Quality Assurance: Ensuring data integrity through rigorous validation and end-to-end testing during the baseline reset for migrated clients.
Required Skills & Experience

We are looking for technical experts who can hit the ground running with the following skills:
  • PySpark Mastery: Deep expertise in Python and PySpark (DataFrames) for data transformation and ETL.
  • Code Refactoring: The ability to read and understand Scala for the purpose of rewriting it into Python (actual coding will be in Python/PySpark).
  • Core AWS Stack: Strong hands-on experience with AWS Glue, S3, and Lambda. Familiarity with other AWS data tools is a significant plus.
  • ETL Infrastructure: Proven experience building multi-layer pipelines (Raw to Cleansed) and handling complex schema mapping.
  • Analytical Rigour: Strong SQL skills for data validation and reconciliation.
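As an illustration of the SQL validation and reconciliation work described in the last bullet, the sketch below compares a legacy table against its migrated counterpart: row-count parity, a column checksum, and a check for rows lost in migration. SQLite stands in for the actual platforms, and every table and column name here is hypothetical.

```python
import sqlite3

# Minimal reconciliation sketch: a "legacy" table and its migrated copy.
# All names and data are illustrative, not the client's actual schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE legacy_orders   (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE migrated_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO legacy_orders   VALUES (1, 10.0), (2, 25.5), (3, 7.25);
    INSERT INTO migrated_orders VALUES (1, 10.0), (2, 25.5), (3, 7.25);
""")

# Row-count parity and per-column sums are a cheap first-pass validation.
legacy_count, legacy_sum = cur.execute(
    "SELECT COUNT(*), ROUND(SUM(amount), 2) FROM legacy_orders").fetchone()
migrated_count, migrated_sum = cur.execute(
    "SELECT COUNT(*), ROUND(SUM(amount), 2) FROM migrated_orders").fetchone()

# A fuller check also looks for keys present on one side only.
missing = cur.execute("""
    SELECT COUNT(*) FROM legacy_orders l
    LEFT JOIN migrated_orders m ON l.order_id = m.order_id
    WHERE m.order_id IS NULL
""").fetchone()[0]

print(legacy_count == migrated_count, legacy_sum == migrated_sum, missing)
# prints: True True 0
```

In a baseline reset, checks like these would run per client and per layer before the migrated data is declared the new source of truth.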
Keywords

Coding, Scala, schema, Apache Spark, baseline, Python, SQL, data management
