Senior Software Engineer - Python (Contract)

Technology
Tekmetric
Posted yesterday, open until 20.07.2026

Job Description

What You’ll Do

We are seeking a Senior Software Engineer with expertise in web scraping, data processing, and search technologies to help build a large-scale data ingestion and classification system. You will be responsible for extracting data from diverse sources (web pages, APIs, PDFs), cleaning and normalizing it, and building search capabilities using Elasticsearch/OpenSearch. You will work with Python, Scrapy, Airflow, Kubernetes, AWS, and Spark to create scalable, high-performance data pipelines.

  • Design and build large-scale, distributed crawling bots (possibly AI agents) and supporting infrastructure that operate in an adversarial environment with low operational overhead.
  • Develop and maintain data pipelines to extract data from large volumes of web pages, documents, PDFs (OCR), and APIs.
  • Help unify heterogeneous documents from varied source formats into a coherent data schema.
  • Preprocess and normalize raw data for downstream classification, ML/NLP, and search indexing.
  • Build APIs to expose structured, classified data via Elasticsearch/OpenSearch.
  • Collaborate with ML/NLP teams to integrate classification models into the pipeline.
  • Automate workflows using Apache Airflow and deploy solutions in Kubernetes on AWS.
  • Optimize and scale data pipelines using Spark (EMR) for processing large datasets.
What You’ll Bring
  • 4 years of Python experience building crawling/scraping solutions at scale.
  • Experience working with REST APIs and PDF processing (OCR, Tesseract, PyMuPDF, etc.).
  • Proficiency in data processing and search technologies (Elasticsearch/OpenSearch, NoSQL/SQL databases).
  • Experience with React.
  • Strong problem-solving skills in handling anti-scraping mechanisms and data scaling challenges.
  • Hands-on experience with AWS or GCP.
Nice to Have
  • Familiarity with NLP and Machine Learning (a plus but not required).
  • Experience with LLMs, NLP models, or ML frameworks (e.g., Hugging Face, spaCy, TensorFlow, PyTorch).
  • Prior experience in automated document classification.
  • Experience working in high-scale, production environments with petabytes of data.
  • Hands-on experience with Kubernetes.
Keywords
ReactOS, TensorFlow, OCaml, PyTorch, SpaCy, SCHEMA, Apache Spark, OpenSearch, Elasticsearch, Apache Airflow, Airflow, Python, Sql, Apache License, Apache Http Server
