Description : Snapmint is looking for a skilled Data Engineer with 3-5 years of experience to design, build, and manage real-time data pipelines using technologies like Kafka, Flink, and Spark Streaming. The role involves optimizing scalable, fault-tolerant pipelines, performing real-time transformations, and collaborating with data scientists for feature development. The ideal candidate will have strong programming skills in Python, Java, or Scala, along with solid SQL expertise and a good understanding of data modeling, data warehousing, and OLTP vs. OLAP systems.

Experience with CDC tools, data lakes/lakehouse architectures (Databricks), open table formats (Delta Lake, Iceberg, Hudi), and orchestration tools like Airflow is essential.

Roles and Responsibilities :

Key Responsibilities :

Design, build, and manage real-time data pipelines using tools like Apache Kafka, Apache Flink, and Apache Spark Streaming.
Optimize data pipelines for performance, scalability, and fault-tolerance.
Perform real-time transformations, aggregations, and joins on streaming data.
Collaborate with data scientists to onboard new features and ensure they're discoverable, documented, and versioned.
Optimize feature retrieval latency for real-time inference use cases.
Ensure strong data governance: lineage, auditing, schema evolution, and quality checks, using tools such as dbt and OpenLineage.

Requirements :

Bachelor's degree in Engineering from a premier institute (IIT/NIT/BIT).
3-5 years of experience in an Indian startup or tech company.
Strong programming skills in Python, Java, or Scala, and proficiency in SQL.
Solid understanding of data modeling, data warehousing concepts, and the differences between OLTP and OLAP workloads.
Experience ingesting and processing various data formats, including semi-structured (JSON, Avro), unstructured, and document-based data from sources like NoSQL databases (e.g., MongoDB), APIs, and event tracking platforms (e.g., PostHog).
Hands-on experience with Change Data Capture (CDC) tools such as Debezium or AWS DMS for replicating data from transactional databases.
Proven experience designing and building scalable data lakes or lakehouse architectures on platforms like Databricks.
Hands-on experience with modern open table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.
Hands-on experience with real-time streaming technologies like Kafka, Flink, and Spark Streaming.
Proficiency with data pipeline orchestration tools like Apache Airflow.
Exposure to event-driven microservices architecture.
Strong written and verbal communication skills.

Good to have :

Familiarity with cloud data warehouse systems like BigQuery or Snowflake.
Experience with real-time analytical databases like ClickHouse.
Familiarity with designing, building, and maintaining feature store infrastructure to support machine learning use cases.

Snapmint - Data Engineer - Python/Java/Scala
