Performance Test Data Engineer

Technology
Rezolve Ai
Roseville, United States · 1 month ago · Until 5/23/2026
Contract

Job description

Job Description: Data Platform Engineer (QA Storage Focus)

Role Overview

We are looking for a Data Platform Engineer with strong QA and Data Validation experience to support large-scale data platforms. The ideal candidate will have hands-on experience in testing data pipelines, validating data lakes/storage systems, and ensuring data quality, accuracy, and performance across distributed environments.

Key Responsibilities

  • Design, develop, and execute data validation and QA test strategies for ETL/ELT pipelines
  • Perform end-to-end data validation between source systems and target data platforms (Data Lake / Data Warehouse)
  • Validate large-scale datasets (millions/billions of records) using SQL, Python, and PySpark
  • Perform file-level and storage validation across data lakes (S3 / ADLS / HDFS):
      • File count validation
      • Schema validation
      • Partition validation
      • Data completeness checks
  • Test and validate data ingestion pipelines (batch & streaming)
  • Validate data across Bronze / Silver / Gold layers (Medallion architecture)
  • Perform data reconciliation and consistency checks across multiple systems
  • Develop and maintain automated data validation frameworks using Python (PyTest or similar)
  • Implement and monitor data quality checks (nulls, duplicates, referential integrity)
  • Validate data formats such as Parquet, ORC, Delta Lake
  • Conduct performance testing of data pipelines and queries (Spark / SQL)
  • Analyze and validate data processing performance, latency, and throughput
  • Collaborate with Data Engineers to identify and fix data issues and optimize pipelines
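As an illustration of the automated validation work described above, here is a minimal sketch of null, duplicate, and row-count reconciliation checks. The function names and sample records are hypothetical; in practice these checks would run against SQL engines or PySpark DataFrames rather than Python lists, typically inside a PyTest suite.

```python
# Hypothetical data-quality checks sketched for illustration only; a real
# pipeline would execute these against a warehouse or Spark, not lists.

def check_nulls(rows, column):
    """Return indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]

def check_duplicates(rows, key):
    """Return key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

def reconcile_counts(source_rows, target_rows):
    """Source-to-target row-count reconciliation: (counts_match, difference)."""
    diff = len(target_rows) - len(source_rows)
    return diff == 0, diff

source = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 175},
    {"id": 3, "amount": 250},
]
target = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": None},   # null to be flagged
    {"id": 2, "amount": None},   # duplicate key to be flagged
    {"id": 3, "amount": 250},
]

print(check_nulls(target, "amount"))     # → [1, 2]
print(check_duplicates(target, "id"))    # → [2]
print(reconcile_counts(source, target))  # → (False, 1)
```

Checks like these are usually wrapped in PyTest test functions so that a failed reconciliation fails the CI run for the pipeline.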

Required Skills

Data QA / Testing

  • Strong experience in ETL/ELT testing and data validation
  • Expertise in SQL for data validation and reconciliation
  • Experience with test case design, execution, and defect tracking
  • Knowledge of data quality frameworks and validation techniques

Data Engineering Knowledge

  • Understanding of data pipelines (ADF / Airflow / Glue / Databricks)
  • Experience with PySpark / Apache Spark (basic to intermediate)
  • Familiarity with data modeling and transformations

Storage / Data Lake Validation (MANDATORY)

  • Hands-on experience with Data Lakes (AWS S3 / Azure ADLS / HDFS)
  • Strong knowledge of:
      • File-based validation
      • Partitioning strategies
      • Schema evolution
  • Experience validating Parquet / ORC / Delta Lake datasets
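To make the file-level and partition checks above concrete, here is a small sketch over a simulated data-lake listing. The paths and expected-partition list are illustrative assumptions; a real check would list S3/ADLS objects via the cloud SDK and read Parquet footers for the actual schema.

```python
# Hypothetical file-count and partition-completeness validation; the listing
# below stands in for an S3/ADLS object listing.
from collections import Counter

def files_per_partition(paths):
    """Count data files under each dt=... partition prefix."""
    counts = Counter()
    for path in paths:
        # The partition key is the directory segment like "dt=2024-01-01".
        partition = next(seg for seg in path.split("/") if seg.startswith("dt="))
        counts[partition] += 1
    return dict(counts)

def missing_partitions(paths, expected):
    """Partitions the pipeline expects but that are absent from the listing."""
    present = set(files_per_partition(paths))
    return sorted(set(expected) - present)

listing = [
    "lake/sales/dt=2024-01-01/part-0000.parquet",
    "lake/sales/dt=2024-01-01/part-0001.parquet",
    "lake/sales/dt=2024-01-02/part-0000.parquet",
]
expected = ["dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03"]

print(files_per_partition(listing))
# → {'dt=2024-01-01': 2, 'dt=2024-01-02': 1}
print(missing_partitions(listing, expected))
# → ['dt=2024-01-03']
```

The same per-partition counts can be reconciled against source-system extract counts to catch dropped or double-loaded files.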

Programming & Tools

  • Python (for automation/testing)
  • SQL (strong)
  • Experience with PyTest / automation frameworks
  • Git / CI-CD basics

Cloud Platforms (Any One)

  • AWS (S3, Glue, Athena) OR
  • Azure (ADLS, ADF, Databricks)

Nice to Have

  • Experience with Great Expectations / Deequ (data quality tools)
  • Knowledge of Kafka / streaming validation
  • Experience with Delta Lake features (time travel, versioning)
  • Exposure to data governance tools (Glue Catalog, Unity Catalog)

Ideal Candidate Profile

  • Strong Data Engineer with QA/testing experience
  • Hands-on with data validation across storage systems and data lakes
  • Comfortable working with large-scale distributed data platforms
  • Detail-oriented with a focus on data accuracy, quality, and performance
