Applied AI Engineer: Production-Grade LLM Systems

Technology
Zoolatech
Gasteiz / Vitoria, Spain
Posted 2 weeks ago
Open until 4/5/2026

Job description

A government-backed Abu Dhabi organization focused on advanced technology R&D (est. 2020) that defines strategy, funding, and policies across AI, robotics, and emerging technologies. It oversees the full innovation lifecycle, from research and programs to commercialization, through dedicated applied research, innovation, and venture entities.

The first production system is an AI-enabled operational platform that gives a senior leadership team a shared situational picture, an AI-classified signal feed, a daily AI-generated briefing, and an action accountability tracker. MVP target: operational within two weeks of team formation. The platform is also the technical foundation for all subsequent Data & AI systems across the organization.

Build, own, and continuously improve the AI capabilities in the DAIO's (Data & AI Office) production systems: real-time signal classification against a defined scenario framework, and daily AI-generated briefings. This is not a research role and not a fine-tuning role. It is applied AI engineering: structured prompts, observable outputs, deterministic fallbacks, and measurable quality.

The AI capabilities must work reliably under production conditions including API outages, malformed signal data, and edge-case classification scenarios. This role also designs the migration path from the initial LLM runtime to the sovereign model runtime in Phase 2.

WHAT THIS ROLE BUILDS & OWNS

AI Classification & Briefing Service — FastAPI wrapper around the LLM API with two versioned prompt templates

Signal classification prompt — structured prompt against a defined scenario taxonomy, returning JSON with scenario tag, confidence level, and rationale

Daily briefing generation prompt — structured 400–600 word output covering signal summary, scenario assessment, delta from prior day, and recommended decision agenda

Prompt versioning system — templates stored in configuration, editable by authorized users without code changes

Observability layer — every API call logged with input hash, model version, output, latency, and token count

Fallback logic — graceful degradation when the LLM API is unavailable: items stored as unclassified and surfaced for manual review

Classification quality evaluation framework — weekly precision measurement against human reviewer sample

Phase 2: sovereign model runtime migration plan — prompt adaptation, integration testing, performance benchmarking
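
For orientation, the fallback and observability items above could fit together roughly as in the following Python sketch. It is illustrative only, not the actual service design: the `call_llm` client, its response keys, the JSON field names, and the three confidence levels are all assumptions made for the example.

```python
import hashlib
import json
import time
from typing import Callable

REQUIRED_FIELDS = {"scenario_tag", "confidence", "rationale"}  # assumed field names
CONFIDENCE_LEVELS = {"high", "medium", "low"}                  # assumed levels


def parse_classification(raw: str) -> dict:
    """Validate the model's JSON output; raise ValueError if it is malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        raise ValueError("missing required fields")
    if data["confidence"] not in CONFIDENCE_LEVELS:
        raise ValueError("unknown confidence level")
    return data


def classify_signal(text: str, call_llm: Callable[[str], dict], log: list) -> dict:
    """Classify one signal, degrading to 'unclassified' on any failure."""
    started = time.perf_counter()
    record = {"input_hash": hashlib.sha256(text.encode("utf-8")).hexdigest()}
    try:
        # Hypothetical client: returns a dict with output, model_version, tokens.
        response = call_llm(text)
        record.update(model_version=response["model_version"],
                      output=response["output"],
                      token_count=response["tokens"])
        result = parse_classification(response["output"])
        # Low confidence is flagged for human review (threshold is an assumption).
        result["needs_review"] = result["confidence"] == "low"
    except Exception as exc:  # outage, timeout, or malformed output
        record["error"] = repr(exc)
        result = {"scenario_tag": "unclassified", "confidence": None,
                  "rationale": None, "needs_review": True}
    record["latency_s"] = time.perf_counter() - started
    log.append(record)  # one log record per call, success or failure
    return result
```

In this sketch an outage and a malformed response both end in the same place: the item is kept, tagged as unclassified, and marked for manual review, while the log record keeps whatever was observed before the failure.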

KEY DECISIONS THIS ROLE OWNS

Prompt design for each capability — structure, temperature, output format, system vs. user message split

Confidence threshold definition — what triggers a low-confidence flag requiring human review

Context window management for briefing generation — what signal subset to include within the token budget

When to trigger prompt iteration vs. accept current classification quality

Which classification errors are acceptable vs. unacceptable given operational stakes

Sovereign model prompt adaptation scope for Phase 2 — what needs rewriting, what transfers

WHAT THIS ROLE DOES NOT DO

Build the backend API or ingestion pipeline — this role calls the API, it does not build it

Fine-tune or train models — this is prompt engineering and integration, not ML research

Define the operational scenario taxonomy — that is business domain knowledge owned by designated owners

Own the data schema for signals — that is the Head of Data Architecture

PROFILE OF THE IDEAL CANDIDATE

Has shipped an LLM-based feature that non-AI users depend on daily, and has been responsible when it breaks. Knows that the hardest part of applied AI is not the prompt: it is the fallback, the observability, and the human review loop. Can write a classification prompt in the morning, evaluate its precision against a ground-truth set in the afternoon, and ship an improved version the next day. Not attached to a particular model: the job is reliable output, not elegant architecture.

Anthropic Claude API — structured output prompting, JSON mode, system prompt design

Prompt engineering for classification tasks — zero-shot and few-shot with examples

Python — async API calls, error handling, retry logic with exponential backoff

LLM evaluation — precision/recall for classification, human-AI agreement measurement

Structured output design — JSON schema enforcement, output validation with Pydantic

APIs (Falcon, Llama, or equivalent)

Token budgeting and context window management

Observability for AI systems — output quality monitoring, anomaly detection

FastAPI — building the AI service wrapper

Docker deployment of AI service components

Engagement Model: Direct Independent Contractor (Please read carefully)

This is an independent contractor opportunity based on a direct contractual relationship between Zoolatech and the individual service provider.

To facilitate this direct partnership, we engage with professionals who are registered and operate as a sole proprietorship, private entrepreneur, or an equivalent self-employment status in their country.

Please note, our model does not accommodate contracts through third-party intermediaries such as agencies, incubators, or umbrella companies. The essential requirement is your ability to enter into a service agreement and invoice Zoolatech directly. This is not an offer of direct employment.

Please note that only candidates whose profiles closely match our requirements will be contacted.
