
Data Engineer – GCP & Python

Technology
HashRoot
Houston, United States · 1 month ago · Until 5/13/2026
Hybrid

Job description

Position: Data Engineer

Experience: 5+ years

Locations: Houston, TX or New York, NY

Notice Period: Immediate Joiners

Work mode: Hybrid. The candidate should be able to work from the office 5 days per month.
**Job Overview**
We are seeking Data Engineers to join the Data Onboarding Engineering team. This role is focused on building and operating robust, scalable data pipelines that ingest and process 30+ TB of data daily, primarily using Python on Google Cloud Platform (GCP).

The engineer will collaborate closely with business partners, researchers, and trading teams to onboard high-value datasets that directly power systematic trading and research workflows.

The ideal candidate is highly hands-on, production-focused, and comfortable operating in a high-performance, data-intensive environment.

**Key Responsibilities**
  • Work closely with business stakeholders to understand data requirements and usage patterns
  • Collaborate with engineers, researchers, and portfolio managers to onboard new and complex datasets
  • Design, build, and support production-grade ETL and data ingestion pipelines using Python
  • Operate and scale data pipelines running on Google Cloud infrastructure
  • Ensure strong standards around data quality, reliability, monitoring, and operational support
  • Handle large-scale batch data ingestion volumes (30TB+ per day)
  • Extend and enhance the existing data onboarding framework to support new data sources and formats
  • Troubleshoot and resolve pipeline failures and data quality issues in production
  • Contribute to documentation, operational runbooks, and engineering best practices
**Desired Skills and Experience**

**Essential Skills**
  • 3+ years of professional experience as a Data Engineer or in a similar role
  • 3+ years of hands-on experience building ETL pipelines in production environments
  • Strong Python programming skills for data processing and pipeline development
  • Practical experience with cloud-based data platforms, preferably Google Cloud Platform (GCP)
  • Solid understanding of data operations, including ingestion, processing, storage, quality, and lifecycle management
  • Strong SQL skills and familiarity with data modeling concepts
**Nice-to-Have Skills**
  • Experience with Snowflake as a cloud data warehouse
  • Exposure to Spark or other distributed data processing frameworks
  • Familiarity with Lakehouse concepts (Delta Lake or similar formats)
  • Experience with event-driven or streaming data pipelines
  • Background working with financial, market, or alternative datasets
  • Knowledge of data observability, lineage, and governance tooling
**Behavioral Competencies**
  • Strong problem-solving and analytical mindset
  • Excellent collaboration and communication skills
  • Ability to work effectively with cross-functional technical and non-technical teams
  • High ownership and accountability in a production environment
  • Comfortable working in a fast-paced, data-driven organization
**Educational Requirement**
Bachelor's or Master's degree in Computer Science.
Keywords: google-cloud-platform, python, snowflake, spark, delta-lake
