Key Responsibilities
- Systematically analyze, solve, and document benchmark tasks involving Docker, shell scripting, and Linux system administration
- Evaluate agent outputs for correctness, reproducibility, and reliability across complex multi-step CLI workflows
- Provide detailed, evidence-based reason
1 week ago