Backend/Data Engineer (Python, Entry-Level)
Technology
De Canaria
México, México · Posted 1 week ago · Open until 15/7/2026
Job Description
We process data from 200,000+ employer career portals and major job boards, running large-scale scraping infrastructure, complex ETL pipelines, and ML enrichment models across billions of records. This is not a typical web app role. You will work directly with one of the largest job market datasets in the world: 900M+ unique postings, each enriched with 82 fields.
What You Will Do
- Build and maintain web scraping systems that collect job postings from thousands of sources using Scrapy, Playwright, and custom crawlers.
- Design and optimize data processing pipelines that clean, deduplicate, and transform raw job postings into structured, enriched records.
- Work with our database layer across PostgreSQL, MongoDB, Redis, Aerospike, and ClickHouse, each serving a specific role in our data architecture.
- Write Python scripts and services for data ingestion, validation, and quality assurance across the pipeline.
- Deploy and monitor your work using Docker on cloud infrastructure (AWS, Google Cloud).
- Collaborate with ML engineers who build the NLP models that enrich the data you process.
Who You Are
- You have a Master's degree (preferred) or Bachelor's degree in Computer Science, Engineering, or a related field.
- You are proficient in Python, with experience writing production scripts, data processing code, or backend services.
- You are comfortable working in Linux/Unix environments and writing shell scripts.
- You have experience with at least two of: PostgreSQL, MongoDB, Redis, or another database system.
- You have some experience with web scraping (Scrapy, Playwright, Selenium, BeautifulSoup, or similar tools).
- You understand data formats (JSON, CSV, Parquet) and have worked with messy, real-world data.
Nice to Have
- Experience with large-scale data pipelines or ETL systems.
- Familiarity with Docker and containerized deployments.
- Experience with cloud platforms (AWS or Google Cloud).
- Knowledge of asynchronous programming in Python (asyncio, aiohttp).
- Familiarity with message queues (RabbitMQ, Kafka, or Redis queues).
- Experience with ClickHouse, Aerospike, or other specialized databases.
- Exposure to NLP, machine learning, or data enrichment workflows.
- Familiarity with Kubernetes.
Keywords
Unix, ClickHouse, OCaml, Apache Kafka, Aerospike, Redis, JSON, MongoDB, Cloud computing, RabbitMQ, Linux, PostgreSQL, Python, Apache Parquet