Theoris Services is assisting our client in their search for a Data Engineer to join their growing team. The client is seeking an individual to build and maintain high-performance data pipelines and lakehouse architectures on AWS. The role integrates, harmonizes, and enables fast querying of massive, multi-modal scientific datasets (e.g., compounds, assays, experiments) from 30+ diverse sources, giving researchers reliable, scalable data for drug discovery and experimental analysis.
Design, build, and optimize data pipelines and ETL processes for scientific data integration across 30+ heterogeneous data sources
Implement and maintain lakehouse architectures on AWS (S3, Glue, Athena) supporting multibillion-record scientific datasets
Develop federated query capabilities using Trino and other distributed query engines to enable unified data access
Build data harmonization solutions to standardize compound, assay, and experimental data across modalities
Performance & Scalability
Optimize database performance for PostgreSQL, Iceberg, and other data platforms handling complex analytical workloads
Implement caching strategies and query optimization techniques to improve response times and user experience
Monitor and troubleshoot data pipeline performance, addressing bottlenecks proactively
Design scalable architectures that support growing data volumes and user bases
Data Quality & Governance
Implement data validation, quality checks, and monitoring frameworks
Create and maintain comprehensive data documentation and metadata management
Ensure compliance with data governance policies and regulatory requirements
Education & Experience
Bachelor's degree in Computer Science, Data Engineering, Information Systems, or related technical field
3+ years of experience in data engineering, data warehousing, or related roles
Proven track record of building production-grade data pipelines and platforms
Strong proficiency in Python and SQL; experience with data manipulation libraries (pandas, PySpark)
Deep expertise in relational databases (PostgreSQL, Oracle) and modern data warehouses (Snowflake, Redshift)
Hands-on experience with AWS services (S3, Glue, Athena, Lambda, RDS)
Experience with distributed processing frameworks (Spark, Trino, Presto, or similar)
Proficiency with data integration tools and experience building scalable data pipelines
Experience with visualization tools such as Spotfire or Power BI
Experience with Git and collaborative development workflows
Strong problem-solving skills with ability to debug complex data issues
Excellent communication skills to translate technical concepts for non-technical stakeholders
Ability to work independently and collaboratively in cross-functional teams
Attention to detail and commitment to data quality and accuracy
Best-in-Class Benefits:
We are in the people business; treating people right is our ONLY priority.
Theoris Services consultants are full-time employees with full benefits, including:
401(k) plan
Our goal is to Fuel Your Career. As a Theoris team member, you join a culture based on people-centered values and an environment that fosters both personal and professional growth.
We build long-term relationships with our clients and our consultants. With over 30 years of building strong relationships in the industry, we're uniquely positioned to make the right connections. This knowledge is used to find the right job placement. Our recruiting teams are experts dedicated to the information technology and engineering staffing space and are highly respected by our client base.