🚀 Hiring: SwarmBench Task Engineer — SWE / Code
📍 Location: Remote
💼 Job Type: Freelance / Contract
💰 Payment: Hourly Basis
🕒 Shift Timing: 7:30 PM – 12:30 AM IST + Flexible 4 Hours (PST Overlap Required)
⏳ Availability: Full-time (8 Hours Daily) with 4 Hours PST Overlap
We are looking for experienced SwarmBench Task Engineers (Code / SWE) to design and build high-quality multi-agent benchmark tasks based on real-world software engineering workflows.
🔹 Experience Required: 5+ Years
🔧 Key Skills:
- Strong experience in Python & JavaScript development
- Hands-on experience with AI coding benchmarks such as SWE-bench and Terminal-Bench
- Ability to navigate large open-source codebases (e.g., Django, Flask, FastAPI, Node.js)
- Strong understanding of Git workflows: pull requests, diffs, cherry-picking, and commits
- Comfort with Docker (writing Dockerfiles, building images, debugging containers)
- Experience writing test scripts using pytest, unittest, or custom assertions
- Excellent technical documentation and specification writing skills
📌 Role Responsibilities:
- Build multi-agent benchmark tasks using real-world open-source code changes
- Work with the Harbor evaluation framework inside Docker environments
- Write detailed task instructions with expected behavior and constraints
- Create Python-based verification scripts for validating AI-generated code changes
- Design decomposition strategies for multi-agent workflows
- Debug and refine tasks for reproducibility and deterministic execution
- Improve benchmark quality, clarity, and evaluation signals
🎯 Ideal Candidate:
Someone who enjoys deep codebase analysis, software engineering workflows, and debugging complex systems, and who wants to work at the intersection of AI and software engineering.