SwarmBench Task Engineer SWE / Code Role

Responsibilities

- Build multi-agent benchmark tasks based on real-world open-source code changes such as bug fixes, migrations, and refactors
- Work with the Harbor evaluation framework to run and validate tasks inside Docker environments
- Write clear and precise