Senior Software Engineer - Python (Contract)
Technology
Tekmetric
Yesterday · Until 20.07.2026
Job description
What You’ll Do
We are seeking a Senior Software Engineer with expertise in web scraping, data processing, and search technologies to help build a large-scale data ingestion and classification system. You will be responsible for extracting data from diverse sources (web pages, APIs, PDFs), cleaning and normalizing it, and building search capabilities using ElasticSearch/OpenSearch. You will work with Python, Scrapy, Airflow, Kubernetes, AWS, and Spark to create scalable, high-performance data pipelines.
- Design and build large-scale, distributed crawling bots (possibly AI agents) and infrastructure that operate in an adversarial environment with low operational overhead.
- Develop and maintain data pipelines to extract data from large volumes of web pages, documents, PDFs (OCR), and APIs.
- Help unify heterogeneous documents from varied source formats into a coherent data schema.
- Preprocess and normalize raw data for downstream classification, ML/NLP, and search indexing.
- Build APIs to expose structured, classified data via ElasticSearch/OpenSearch.
- Collaborate with ML/NLP teams to integrate classification models into the pipeline.
- Automate workflows using Apache Airflow and deploy solutions in Kubernetes on AWS.
- Optimize and scale data pipelines using Spark (EMR) for processing large datasets.
- 4 years of experience in Python, including building crawling/scraping solutions at scale.
- Experience working with REST APIs and PDF processing (OCR, Tesseract, PyMuPDF, etc.).
- Proficiency in data processing & search technologies (ElasticSearch/OpenSearch, NoSQL/SQL databases).
- Experience with React.
- Strong problem-solving skills in handling anti-scraping mechanisms and data scaling challenges.
- Hands-on experience with AWS or GCP.
- Familiarity with NLP and Machine Learning (a plus but not required).
- Experience with LLMs, NLP models, or ML frameworks (e.g., Hugging Face, spaCy, TensorFlow, PyTorch).
- Prior experience in automated document classification.
- Experience working in high-scale, production environments with petabytes of data.
- Hands-on experience with Kubernetes.
Keywords
ReactOS, TensorFlow, OCaml, PyTorch, SpaCy, SCHEMA, Apache Spark, OpenSearch, Elasticsearch, Apache Airflow, Airflow, Python, Sql, Apache License, Apache Http Server