Backend/Data Engineer (Python Entry-Level)

Technology
De Canaria
México, México
Posted 1 week ago
Open until 15/7/2026

Job Description

We process data from 200,000+ employer career portals and major job boards, running large-scale scraping infrastructure, complex ETL pipelines, and ML enrichment models across billions of records. This is not a typical web app role. You will work directly with one of the largest job market datasets in the world: 900M+ unique postings, each enriched with 82 fields.

What You Will Do

  • Build and maintain web scraping systems that collect job postings from thousands of sources using Scrapy, Playwright, and custom crawlers.
  • Design and optimize data processing pipelines that clean, deduplicate, and transform raw job postings into structured, enriched records.
  • Work with our database layer across PostgreSQL, MongoDB, Redis, Aerospike, and ClickHouse, each serving a specific role in our data architecture.
  • Write Python scripts and services for data ingestion, validation, and quality assurance across the pipeline.
  • Deploy and monitor your work using Docker on cloud infrastructure (AWS, Google Cloud).
  • Collaborate with ML engineers who build the NLP models that enrich the data you process.

Who You Are

  • You have a Master's degree (preferred) or Bachelor's degree in Computer Science, Engineering, or a related field.
  • You are proficient in Python, with experience writing production scripts, data processing code, or backend services.
  • You are comfortable working in Linux/Unix environments and writing shell scripts.
  • You have experience with at least two of: PostgreSQL, MongoDB, Redis, or another database system.
  • You have some experience with web scraping (Scrapy, Playwright, Selenium, BeautifulSoup, or similar tools).
  • You understand data formats (JSON, CSV, Parquet) and have worked with messy, real-world data.

Nice to Have

  • Experience with large-scale data pipelines or ETL systems.
  • Familiarity with Docker and containerized deployments.
  • Experience with cloud platforms (AWS or Google Cloud).
  • Knowledge of asynchronous programming in Python (asyncio, aiohttp).
  • Familiarity with message queues (RabbitMQ, Kafka, or Redis queues).
  • Experience with ClickHouse, Aerospike, or other specialized databases.
  • Exposure to NLP, machine learning, or data enrichment workflows.
  • Familiarity with Kubernetes.

Keywords

Unix, ClickHouse, OCaml, Apache Kafka, Aerospike, Redis, JSON, MongoDB, Cloud computing, RabbitMQ, Linux, PostgreSQL, Python, Apache Parquet