We process data from 200,000+ employer career portals and major job boards, running large-scale scraping infrastructure, complex ETL pipelines, and ML enrichment models across billions of records. This is not a typical web app role. You will work directly with one of the largest job market datasets in the world: 900M+ unique postings, each enriched with 82 fields.

What You Will Do

Build and maintain web scraping systems that collect job postings from thousands of sources using Scrapy, Playwright, and custom crawlers.
Design and optimize data processing pipelines that clean, deduplicate, and transform raw job postings into structured, enriched records.
Work with our database layer across PostgreSQL, MongoDB, Redis, Aerospike, and ClickHouse, each serving a specific role in our data architecture.
Write Python scripts and services for data ingestion, validation, and quality assurance across the pipeline.
Deploy and monitor your work using Docker on cloud infrastructure (AWS, Google Cloud).
Collaborate with ML engineers who build the NLP models that enrich the data you process.

Who You Are

You have a Master's degree (preferred) or Bachelor's degree in Computer Science, Engineering, or a related field.
You are proficient in Python, with experience writing production scripts, data processing code, or backend services.
You are comfortable working in Linux/Unix environments and writing shell scripts.
You have experience with at least two database systems, such as PostgreSQL, MongoDB, or Redis.
You have some experience with web scraping (Scrapy, Playwright, Selenium, BeautifulSoup, or similar tools).
You understand data formats (JSON, CSV, Parquet) and have worked with messy, real-world data.

Nice to Have

Experience with large-scale data pipelines or ETL systems.
Familiarity with Docker and containerized deployments.
Experience with cloud platforms (AWS or Google Cloud).
Knowledge of asynchronous programming in Python (asyncio, aiohttp).
Familiarity with message queues (RabbitMQ, Kafka, or Redis queues).
Experience with ClickHouse, Aerospike, or other specialized databases.
Exposure to NLP, machine learning, or data enrichment workflows.
Familiarity with Kubernetes.

Backend/Data Engineer (Python Entry-Level)
