The first version of PAM's AI assistant was a chat. You asked a question, it answered. It was useful — agents could surface a player's withdrawal history in a sentence instead of clicking through four screens — but it was bounded by what fit in a single turn. "Show me this player's recent activity" worked. "Find every player at risk of churning, segment them by behavior, and draft a retention campaign for each segment" did not.
That second request isn't a question. It's a piece of work. It has steps, dependencies, and decision points. The right answer to it isn't a paragraph — it's a series of tool calls that an operator can audit, intervene in, and approve.
Building that turned out to require a different kind of architecture. Not a smarter model. A planner.
Why a planner exists
The natural instinct, given a capable language model, is to let it do everything in one prompt. Stuff the conversation history, the tool list, and the user's request into a single call and let the model decide what to do. For a question with a one-shot answer, this works. For multi-step work, it breaks for three reasons.
- The audit trail collapses. The model produced an answer; you can't see the steps it took to get there. In a regulated environment, "the AI did it" is not an acceptable log entry.
- Approval gates have nowhere to live. Some actions require a human in the loop — drafting a bonus, sending a message, modifying a player limit. If the work is one indivisible call, there's no point at which an operator can say "stop, I want to review."
- Errors compound silently. If step two of an implicit five-step process fails, the model often invents a plausible answer instead of surfacing the failure. By the time anyone notices, the wrong action has been taken on the basis of fabricated data.
A planner solves all three by making the steps first-class. The model still does the thinking — but what it produces is a structured plan, not a free-form response.
The plan as a data structure
An AI Studio session, when given a non-trivial request, produces a plan: an ordered list of typed steps. Each step is one of three types:
- Query — read data from PAM. Tools in this category fetch player records, transactions, bonuses, KYC status, segmentation results. They never modify state.
- Analyze — operate on the results of previous steps to produce a derived answer. Aggregations, segmentations, comparisons, summaries.
- Write — propose a change to PAM. Drafting a bonus, sending a message, scheduling an action, updating a player limit. Every write step carries an explicit approval flag.
Each step has a status that progresses through a defined lifecycle: Pending → Running → AwaitingApproval (for write steps) → Approved → Completed. Steps can also be Skipped by the operator or Failed with a captured error message. The plan itself is an auditable object — persisted, queryable, replayable.
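To make that concrete, here is a minimal sketch of the plan as C# types. The three step types, the status lifecycle, and the IsReplanned flag that appears later in this piece come from the description; every other name and field is an assumption for illustration, not PAM's actual code.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; names beyond the lifecycle described above
// are assumptions, not PAM's actual classes.
public enum StepType { Query, Analyze, Write }

public enum StepStatus
{
    Pending,
    Running,
    AwaitingApproval, // only write steps pass through this state
    Approved,
    Completed,
    Skipped,          // set by the operator
    Failed            // carries a captured error message
}

public sealed record PlanStep(
    int Order,
    StepType Type,
    string ToolName,          // which IAITool implementation to invoke
    string ArgumentsJson,     // typed arguments, serialized
    bool RequiresApproval,    // always true for Write steps
    bool IsReplanned = false) // set when a step was produced by a replan
{
    public StepStatus Status { get; set; } = StepStatus.Pending;
    public string? Error { get; set; }      // populated on Failed
    public string? ResultJson { get; set; } // populated on Completed
}

// The plan itself: persisted, queryable, replayable.
public sealed record Plan(Guid Id, string Request, IReadOnlyList<PlanStep> Steps);
```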
When something goes wrong — and in any non-trivial system, eventually something will — the question is never "what did the AI do." The question is always "which step did it fail at, with what input, and what did the operator approve before it ran." A typed plan with persisted step status answers all three. A free-form chat does not.
How the plan gets built
The model is given the user's request, the agent's permission context, and the tool catalog — every IAITool implementation that this agent's role is allowed to invoke. Its job is not to produce the answer. Its job is to produce a plan that, if executed, would produce the answer.
This shifts the model's role in a way that turns out to matter. Instead of "answer this question," the prompt becomes "decompose this request into the smallest sequence of tool calls that would satisfy it, and explain why each step is necessary." The model is good at this — better, in our experience, than at giving a complete answer in one shot. Decomposition is a more constrained task than synthesis, and the constraints help.
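A sketch of what that assembly might look like: IAITool is the interface named above, but its members, and the prompt builder around them, are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch of assembling the planner's input. IAITool is the real interface
// name; its members and everything else here are illustrative assumptions.
public interface IAITool
{
    string Name { get; }
    string Description { get; }         // what the model reads when choosing a tool
    string ParameterSchemaJson { get; } // JSON schema for the typed arguments
}

public static class PlannerPrompt
{
    // allowedTools is already filtered by the agent's role and skin scope.
    public static string Build(string userRequest, IEnumerable<IAITool> allowedTools)
    {
        var catalog = string.Join("\n", allowedTools.Select(t =>
            $"- {t.Name}: {t.Description}\n  args: {t.ParameterSchemaJson}"));

        // The model is asked to decompose, not to answer.
        return $"""
            Decompose the following request into the smallest sequence of tool
            calls that would satisfy it. For each step, name the tool, give its
            typed arguments, and explain why the step is necessary. Do not
            answer the request yourself.

            Tools available to this agent:
            {catalog}

            Request: {userRequest}
            """;
    }
}
```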
Once the plan is generated, it isn't executed automatically. It's shown to the operator first. The operator sees the steps, the tool each one will call, the arguments, and any approval gates. They can edit the plan, skip steps, or run it as-is. Only then does execution begin.
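On the execution side, the approval gate is a property of the loop, not of the model. A sketch, reusing the Plan and PlanStep types from earlier; IOperatorChannel and the dispatch stub stand in for the real operator UI and tool catalog.

```csharp
using System;
using System.Threading.Tasks;

// Sketch of approval-gated execution. The executor, not the model,
// decides when to pause; write steps cannot run without a decision.
public interface IOperatorChannel
{
    Task<bool> RequestApprovalAsync(PlanStep step); // resolves when the operator decides
}

public sealed class PlanExecutor
{
    private readonly IOperatorChannel _operator;
    public PlanExecutor(IOperatorChannel op) => _operator = op;

    public async Task ExecuteAsync(Plan plan)
    {
        foreach (var step in plan.Steps)
        {
            if (step.Status == StepStatus.Skipped) continue; // operator opted out

            if (step.RequiresApproval) // every write step
            {
                step.Status = StepStatus.AwaitingApproval;
                if (!await _operator.RequestApprovalAsync(step))
                {
                    step.Status = StepStatus.Skipped;
                    continue;
                }
                step.Status = StepStatus.Approved;
            }

            step.Status = StepStatus.Running;
            try
            {
                step.ResultJson = await InvokeToolAsync(step.ToolName, step.ArgumentsJson);
                step.Status = StepStatus.Completed;
            }
            catch (Exception ex)
            {
                step.Status = StepStatus.Failed;
                step.Error = ex.Message; // surfaced, never papered over
                break;                   // the caller decides whether to replan
            }
        }
    }

    private Task<string> InvokeToolAsync(string toolName, string argumentsJson)
        => throw new NotImplementedException(); // dispatch into the IAITool catalog
}
```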
Replanning when reality disagrees
A plan is a hypothesis about how to satisfy a request. Sometimes the hypothesis is wrong. The query that was supposed to return at-risk players returns zero rows. The segmentation that was supposed to produce three groups produces one. A write step's preconditions no longer hold by the time it runs.
When this happens, the planner has a choice: fail the plan and surface the error, or replan. Replanning means feeding the partial results back into the model and asking it to produce a revised plan that takes the new information into account. The revised steps are flagged with IsReplanned = true so the operator can see what changed and why.
This is a deliberate design choice. Replanning is not the same as retrying. Retrying assumes the original plan was correct and the failure was transient. Replanning assumes the original plan was based on incorrect assumptions about the data, and that a different plan is now appropriate. In a system with as many moving parts as PAM, the second case is far more common than the first.
Replanning never means "the AI decides on its own to do something different than what was approved." If the operator approved a three-step plan, the agent executes that plan. Replanning happens before approval, or after a step has materially changed the available context. A replanned step that involves a write operation goes back to the operator for fresh approval — never piggybacking on an earlier authorization.
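As a sketch of that contract: completed steps keep their results, everything from the failure onward is replaced, and the replacements carry IsReplanned = true and start back at Pending. IPlannerModel and the surrounding names are assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Sketch of the replan path. IsReplanned comes from the text; IPlannerModel
// and everything else here are illustrative assumptions.
public interface IPlannerModel
{
    // Sees the original request, the completed steps with their results,
    // and the failure; proposes replacement steps from that point onward.
    Task<IReadOnlyList<PlanStep>> ReviseAsync(Plan plan, PlanStep failedStep);
}

public static class Replanner
{
    public static async Task<Plan> ReplanAsync(
        IPlannerModel model, Plan plan, PlanStep failedStep)
    {
        var revised = await model.ReviseAsync(plan, failedStep);

        var steps = plan.Steps
            .Where(s => s.Status == StepStatus.Completed) // keep what already ran
            .Concat(revised.Select(s => s with { IsReplanned = true }))
            .ToList();

        // Replanned steps start at Pending, so a write step among them hits
        // the AwaitingApproval gate again; it never reuses an earlier approval.
        return plan with { Steps = steps };
    }
}
```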
The boring part that makes it work
Most of the engineering effort in AI Studio is not in the model interaction. It's in the tools. PAM exposes around forty tool implementations across Banking, Bonus, Casino, KYC, Sportsbook, Players, Reporting, Responsible Gaming, Messaging, and a few others. Each one is a strongly-typed C# class with a JSON schema, a permission check, a sanitization layer for personal data, and a deterministic execution path.
The model never writes SQL, never sees the database schema, never gets a free-form data interface. It picks tools from a catalog, fills in typed arguments, and waits for typed results. The tools enforce skin scoping, role permissions, and PII handling — exactly the same way the back-office UI does, because they go through the same service layer.
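In outline, one of these tools might look like the sketch below, implementing the IAITool shape from earlier. The SearchPaymentsTool name echoes the example later in this piece; the service and sanitizer interfaces, the AgentContext, and the execution signature are guesses at shape, not PAM's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Outline of a query tool with the anatomy described above: typed arguments,
// a permission check, PII sanitization, and a deterministic execution path.
// All names and signatures here are illustrative assumptions.
public sealed record SearchPaymentsArgs(string Type, string Status, decimal MinAmount, int Days);

public sealed record PaymentRow(string PlayerId, decimal Amount, string ReasonCode);

public sealed record AgentContext(string SkinId, IReadOnlySet<string> Permissions)
{
    public bool HasPermission(string permission) => Permissions.Contains(permission);
}

public interface IPaymentsService // the same service layer the back-office UI calls
{
    Task<IReadOnlyList<PaymentRow>> SearchAsync(SearchPaymentsArgs args, string skinId);
}

public interface IPiiSanitizer
{
    IReadOnlyList<PaymentRow> Mask(IReadOnlyList<PaymentRow> rows);
}

public sealed class SearchPaymentsTool : IAITool
{
    public string Name => "SearchPaymentsTool";
    public string Description => "Search payment transactions by type, status, amount, and age.";
    public string ParameterSchemaJson => /* generated from SearchPaymentsArgs */ "{ ... }";

    private readonly IPaymentsService _payments;
    private readonly IPiiSanitizer _sanitizer;

    public SearchPaymentsTool(IPaymentsService payments, IPiiSanitizer sanitizer)
        => (_payments, _sanitizer) = (payments, sanitizer);

    // Execution entry point; where this lives on the real interface is unknown.
    public async Task<string> ExecuteAsync(string argumentsJson, AgentContext ctx)
    {
        // Role permission and skin scoping are enforced here; the model is
        // never trusted to enforce them.
        if (!ctx.HasPermission("payments.read"))
            throw new UnauthorizedAccessException("Agent role may not query payments.");

        var args = JsonSerializer.Deserialize<SearchPaymentsArgs>(argumentsJson)
                   ?? throw new ArgumentException("Arguments did not match the tool schema.");

        var rows = await _payments.SearchAsync(args, ctx.SkinId);

        // Personal data is masked before anything reaches the model.
        return JsonSerializer.Serialize(_sanitizer.Mask(rows));
    }
}
```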
This is the unglamorous piece, and it's the piece that determines whether the whole approach is deployable. A planner that orchestrates well-defined tools is auditable, scopeable, and safe. A planner that orchestrates ad-hoc database access is none of those things, no matter how good the model is.
What it feels like in use
An operator opens AI Studio and types: "Pull the last 30 days of failed withdrawals over €1,000, group by reason code, and draft a brief for the payments team."
The planner produces three steps:
- Query: SearchPaymentsTool with type=withdrawal, status=failed, amount≥1000, days=30
- Analyze: group results by failure reason, count and sum each bucket
- Write: DraftMessageTool targeting the payments team channel, with the analyzed summary as the body
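Expressed with the PlanStep sketch from earlier, that plan would look roughly like this; the argument shapes and the analyze tool's name are invented for the example.

```csharp
// The worked example above, expressed with the Plan/PlanStep sketch from
// earlier. Argument shapes and the analyze tool's name are invented.
var plan = new Plan(
    Guid.NewGuid(),
    "Pull the last 30 days of failed withdrawals over €1,000, group by reason code, and draft a brief for the payments team.",
    new[]
    {
        new PlanStep(1, StepType.Query, "SearchPaymentsTool",
            """{"type":"withdrawal","status":"failed","minAmount":1000,"days":30}""",
            RequiresApproval: false),
        new PlanStep(2, StepType.Analyze, "GroupBySummaryTool", // hypothetical name
            """{"input":"step1","groupBy":"reasonCode","metrics":["count","sum"]}""",
            RequiresApproval: false),
        new PlanStep(3, StepType.Write, "DraftMessageTool",
            """{"target":"payments-team","body":"<summary from step 2>"}""",
            RequiresApproval: true) // pauses at AwaitingApproval before sending
    });
```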
The operator sees the plan before anything runs. They notice the analyze step doesn't break out by payment provider, and ask for that to be added. The model regenerates the plan with the additional grouping. The operator approves. The first two steps execute and complete in seconds. The third step pauses, shows the draft message, and waits. The operator reads it, makes a small edit to the tone, and approves. The message is sent.
Total elapsed time: under a minute. Total decisions delegated to the AI: zero. The agent did the work; the operator made every choice that mattered.
AI Studio is in production for early-access PAM operators. The planner handles requests that span up to a dozen tool calls across multiple modules. Every plan is persisted with its steps, approvals, results, and any replans — searchable for audit and replayable for debugging. The most common feedback from operators is not about the model. It's about the plan view: knowing exactly what's about to happen before it happens is the part that turns "AI assistant" into "AI coworker."
The principle
The interesting question in agentic AI for regulated operations is not "how capable is the model." Capable models are widely available. The interesting question is "what shape does the work take when the model is involved." If the answer is "an opaque blob of model output," the system is not deployable. If the answer is "a structured plan of typed steps, each with a status, an approval gate where appropriate, and a persisted record," the system is deployable.
The planner is the difference. It's the layer that converts "the model decided to do something" into "the operator approved a step that the model proposed." Those two sentences describe very different products — and only one of them belongs in a back office that has to answer to a regulator.