The intern will design, evaluate and bring to the state of the art the internal Windmill agentic loop for generating scripts, flows and full-stack apps - and build the benchmarking system that measures its progress. The work tackles several open questions: how to objectively evaluate a generated wor