PayPay Corporation

Job Description

At the core of PayPay's architecture, we use Kafka for high-performance data streaming; every payment that goes through the app is handled by multiple Kafka topics.
To keep up with the challenges of our growing product and future expansion, we are expanding our Streaming Platform team and are looking for an engineer with deep experience in Kafka to help lead the next stage.
The role involves strengthening our existing platform to bring a truly event-driven architecture to our development process, while adding performance and resilience to the existing application.
A key part of our current roadmap includes migrating our self-hosted Kafka clusters to KRaft mode and operating Kafka both in self-managed form and via AWS MSK.
We are looking for someone who is not only hands-on with Kafka internals, but also confident operating it at scale in production environments.
This is a senior role but still very hands-on, and an excellent opportunity to build and modernize the streaming infrastructure that powers payments for over 69 million users.
Tech leadership, architecture, development, and operations for Apache Kafka, including both self-hosted clusters and AWS MSK
Ensure deployment, management, and smooth operation of highly available Kafka infrastructure
Set up monitoring and alerting for Kafka reliability, throughput, and latency
Performance tuning and instrumentation of Kafka clusters and pipelines
Partner with application teams to guide Kafka topic design, schema management, and best practices
Contribute to the development of scalable, secure, and observable data pipelines
Automate infrastructure and operations workflows using infrastructure-as-code tools
Languages | Java, Python, Go
Data & Streaming | Kafka, Elasticsearch/OpenSearch, AWS MSK, Athena, Glue
Infrastructure | Docker, Kubernetes, ArgoCD, Argo Rollouts, Argo Workflows, Artifactory, AWS, GCP
CI/CD & Automation | GitHub, GitHub Actions, Jenkins, Terraform, Ansible, Microservices, GitOps
Observability | Logstash, Fluent Bit, Vector, VictoriaMetrics, Grafana, Prometheus
Collaboration | Slack, Zoom, Confluence, JIRA
Minimum of 3 years of engineering experience with Apache Kafka in production environments
Minimum of 3 years of experience with AWS, Terraform, Ansible, and Linux administration
Strong hands-on experience with Kafka cluster operations, including setup, tuning, and maintenance
Experience with Kafka authentication and authorization operations
Familiarity with the AWS cloud platform, especially Amazon MSK (Managed Streaming for Apache Kafka) * We rely heavily on AWS, so candidates without prior AWS experience should expect to ramp up on it after joining
Proficiency in one or more general-purpose programming languages (e.g., Python, Java, Go)
Experience with infrastructure automation and configuration management tools, such as Terraform or Ansible
Understanding of modern system design using microservice architecture
Working knowledge of Git and CI/CD tools
Deep expertise in running and scaling Apache Kafka both self-hosted and in managed cloud environments (e.g., AWS MSK)
Experience migrating or operating Kafka in KRaft mode (no ZooKeeper)
Experience with data replication between Kafka clusters, enabling seamless data migration, disaster recovery, and cross-region data synchronization
Exposure to Kafka security, including ACLs, TLS, SASL, and IAM-based auth on MSK
Deep expertise in Kafka internals, including broker tuning, partitioning, replication, and fault tolerance
Contributions to Kafka-related open source projects or community involvement
Experience with Kafka Connect, Kafka Streams, or other stream processing frameworks
Experience guiding cross-team adoption of Kafka in microservice architectures
Experience in operating distributed systems
Working experience in a full remote environment
Bachelor's or Master's Degree in Computer Science or a related field
Interested in this role?