Anamika Thakur

Data Engineer at Cognizant Technology Solutions (2022-08 – Present)

Data Engineer with expertise in building scalable data pipelines using Databricks, Delta Lake, and AWS services

Architected end-to-end ingestion pipelines in a scalable medallion (Bronze–Silver–Gold) lakehouse using Databricks and Delta Lake, integrating 6+ enterprise sources (SAP, Oracle, Veeva CRM, APIs, partner feeds) to enable batch and near real-time data processing for a unified Snowflake analytics warehouse
Developed PySpark pipelines on Databricks using broadcast joins, window functions for SCD Type 2 history, and dynamic partition overwrite for late-arriving data, processing millions of records within SLA
Built event-driven ingestion pipelines using Auto Loader and Kafka Structured Streaming, enabling efficient, low-latency data ingestion with schema evolution and fault tolerance
Developed CDC-based pipelines with Delta MERGE for idempotent upserts and deduplication, ensuring accurate current-state data and handling late-arriving records
Implemented data governance and security using Unity Catalog, including row-level access control, column-level masking for PII, and centralized lineage tracking
Orchestrated end-to-end data workflows using Apache Airflow, managing 10+ interdependent DAGs with sensor-based dependencies, SLA tracking, and automated failure alerting for reliable pipeline execution
Built a config-driven data quality framework enforcing schema validation, null checks, referential integrity, and composite-key deduplication before downstream data promotion
Architected a 3-layer Snowflake warehouse (Staging → Enriched → DataMart) using MERGE-based loads, clustering keys for performance optimization, and scalable transformation patterns
Designed a multi-pattern ingestion architecture (CDC, streaming, API, batch) on AWS, handling 100M+ records/day, ensuring zero data loss via checkpoint-based recovery, DLQs, and idempotent processing
Built end-to-end CDC pipelines using AWS DMS (full-load + CDC) with Multi-AZ failover, optimized LOB handling, and Op-based downstream merge logic enabling reliable replay and consistency
Engineered a 3-zone S3 data lake (Raw, Quarantine, Curated) using AWS Glue with strict data contracts, enabling schema validation, auditability, and automated reprocessing workflows for failed records
Engineered a schema drift detection and quarantine framework that compares fingerprints at runtime against the Glue Data Catalog and routes non-conforming records to a quarantine S3 zone with enriched error metadata
Developed scalable Glue Spark pipelines with Delta Lake SCD Type 2 MERGE, incorporating Deequ-based data quality checks, dynamic partitioning, and Z-order optimization for large-scale transformations
Orchestrated pipelines using Step Functions with parallel execution and idempotent design, and built Redshift integration using manifest-based COPY for transactional loads and high-performance BI consumption
