
Write Tools with Human-in-the-Loop

Would you let an LLM hit prod? Only if it shows you the diff first. Inside PAM's write-tool preview model — the layer between an AI's intent and a real side effect.

WebPrefer Engineering
April 2026
Write Tool Lifecycle: Draft → Preview → Approve → Execute

1. Draft: the model proposes arguments (tool: GrantBonusTool, playerId: 4821, amount: 50 EUR).
2. Preview: the tool builds a WriteToolPreview (label: "Grant 50 EUR bonus", affects: Player 4821, reversible: true).
3. Approve: the operator decides — approve as-is, edit the arguments, or reject.
4. Execute: the side effect runs and is audited — bonus granted, logged with the approver, reversal token issued.

Steps 1–3 produce no side effects; only step 4 mutates state. The approval is recorded against a specific preview, not a vague "yes, do whatever you planned."

Letting an AI assistant tell an operator something is one product. Letting an AI assistant do something on behalf of an operator is a different product. The model is the same. The risk profile is not.

A read-only assistant that hallucinates is annoying. A write-capable assistant that hallucinates moves money, grants bonuses, sends messages to real customers, or mutates limits on accounts that shouldn't have been touched. The model's confidence is exactly the same in both cases. The blast radius is not.

So the design question for the write side of PAM's AI was not "how do we make the model more accurate." It was "how do we make sure no side effect happens until a human has seen, in concrete terms, exactly what is about to happen and authorized it."

Two interfaces, not one

PAM's tool catalog is split. There is IAITool, the base interface every tool implements — a name, a description, a JSON schema, a permission check, and an execute method. Read-only tools (querying players, fetching transactions, summarizing bonuses, segmenting audiences) implement only this.

Write tools implement an additional interface, IAIWriteTool. The contract adds one method: BuildPreview(args, context). Before a write tool can be executed, this method must be called. It returns a WriteToolPreview — a small record with four fields: a human-readable label, an impact summary, the set of affected entity IDs, and a reversibility flag.

The preview is generated deterministically from the same arguments the execute method would consume. There is no separate "describe what you would do" code path that the model could use to lie about its intent. The preview is a function of the args, computed by the same tool that will execute them.
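The two-interface split can be sketched as follows. This is an illustration in TypeScript rather than the platform's own code; the member names (IAITool, IAIWriteTool, WriteToolPreview, BuildPreview) come from the text, while everything else — the field names, SessionContext, and the GrantBonusTool internals — is assumed for the example:

```typescript
interface SessionContext { operatorId: string; }

// Base contract every tool implements: metadata, a permission check,
// and an execute method.
interface IAITool {
  name: string;
  description: string;
  schema: object;                                  // JSON schema for the tool's arguments
  canExecute(context: SessionContext): boolean;
  execute(args: any, context: SessionContext): Promise<unknown>;
}

// The preview record: deliberately small, exactly four fields.
interface WriteToolPreview {
  label: string;                 // e.g. "Grant 50 EUR bonus"
  impactSummary: string;         // what the action will do, in operator terms
  affectedEntityIds: string[];   // which entities it touches
  isReversible: boolean;         // can it be taken back
}

// Write tools add exactly one method on top of the base contract.
interface IAIWriteTool extends IAITool {
  buildPreview(args: any, context: SessionContext): WriteToolPreview;
}

// Illustrative write tool: preview and execute consume the same args,
// so the preview cannot describe an action other than the one executed.
class GrantBonusTool implements IAIWriteTool {
  name = "GrantBonusTool";
  description = "Grants a monetary bonus to a player";
  schema = {};  // JSON schema elided in this sketch
  canExecute(_context: SessionContext): boolean { return true; }

  buildPreview(args: { playerId: number; amount: number; currency: string }): WriteToolPreview {
    return {
      label: `Grant ${args.amount} ${args.currency} bonus`,
      impactSummary: `Player ${args.playerId} will be credited ${args.amount} ${args.currency}`,
      affectedEntityIds: [String(args.playerId)],
      isReversible: true,
    };
  }

  async execute(args: { playerId: number; amount: number; currency: string }) {
    // The real side effect would happen here, only after approval.
    return { granted: true, playerId: args.playerId };
  }
}
```

The single-code-path property falls out of the structure: buildPreview and execute take the same argument object, so there is nothing for a separate "describe" path to diverge from.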

Why a record, and why those four fields

The shape of WriteToolPreview looks small. It is — deliberately. Anyone designing this for the first time wants to put more in: a structured diff, a validation report, a probability score, a recommendation. We tried versions of all of those. Each one made the operator's decision harder, not easier.

The operator does not need a full diff to decide whether to approve. They need to know what kind of action this is, what it will touch, and whether they can take it back. That is exactly what the four fields encode.

The reversibility flag is not cosmetic

Operators read previews differently when IsReversible is false. The UI surfaces this with a distinct colour and a confirmation step. We've watched operators approve a hundred reversible actions and pause for thirty seconds on the first irreversible one. That pause is the entire point. A preview without a reversibility flag would let those two cases blur into each other — which is the failure mode we built the preview to prevent.
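A front end might gate on the flag like this. This is a hypothetical sketch, not the actual PAM UI code; the step names are invented for illustration:

```typescript
interface WriteToolPreview { label: string; isReversible: boolean; }

// Hypothetical approval flow: irreversible previews insert an extra,
// deliberate confirmation step before the approve button is reachable.
function approvalSteps(preview: WriteToolPreview): string[] {
  const steps = ["show-preview", "approve-button"];
  if (!preview.isReversible) {
    // Distinct styling plus a typed confirmation for one-way actions.
    steps.splice(1, 0, "irreversible-warning", "typed-confirmation");
  }
  return steps;
}
```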

What this prevents

This pattern eliminates three failure modes common in LLM-driven write systems.

The off-by-one player. The model picks playerId: 4821 when it meant 4812. With direct execution, the wrong player gets credited and someone notices a week later in a reconciliation report. With preview-and-approve, the operator sees "Grant 50 EUR bonus to Player 4821 (J. Doe)" and either confirms or catches the error before the bonus is granted. Same model, same mistake — different outcome.

The plausible but wrong rate. The model picks weeklyDepositLimit: 5,000 SEK when the operator said "five thousand" and meant their default unit, EUR. With direct execution, the limit is wrong by an exchange rate factor. With preview-and-approve, the impact summary shows "5,000 SEK" in unambiguous terms. The operator either confirms or fixes it.

The implicit bulk action. The operator says "send a follow-up to everyone who churned" expecting a draft for review. The model interprets this as authorization to send. With direct execution, three hundred customers get an email no human read. With preview-and-approve, the operator sees "Send retention email to 312 players" and decides whether that's the right scope.

None of these scenarios require the model to be malicious or unusually wrong. They require the model to behave normally. That's the case the preview is designed for — not the worst case, but the average case.

Approval is bound to a specific preview

An approval in PAM is not "yes, the operator agrees this is fine." It is "the operator approved this preview, with this label, this impact summary, this set of affected IDs, and this reversibility flag, at this timestamp." The approval record references the preview. The execution record references the approval. Every link is preserved.

This matters when something has to be reconstructed later. If a regulator or an internal audit asks why a particular bonus was granted to a particular player at a particular time, the answer is not "the AI did it." The answer is: the operator approved this preview, generated by this tool, from these arguments, proposed by the model in response to this request, in this session. Every step is replayable.
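The chain of references might look like the following. The record shapes and ID scheme here are assumptions for illustration; what the article specifies is only the linkage itself — approval points at preview, execution points at approval:

```typescript
interface PreviewRecord   { previewId: string; label: string; affectedEntityIds: string[]; isReversible: boolean; }
interface ApprovalRecord  { approvalId: string; previewId: string; operatorId: string; approvedAt: string; }
interface ExecutionRecord { executionId: string; approvalId: string; executedAt: string; }

// The approval references one specific preview; the execution references
// one specific approval. Walking the chain backwards answers "who approved
// exactly what, and when" without consulting the model at all.
function recordApproval(preview: PreviewRecord, operatorId: string): ApprovalRecord {
  return {
    approvalId: `appr-${preview.previewId}`,
    previewId: preview.previewId,
    operatorId,
    approvedAt: new Date().toISOString(),
  };
}

function recordExecution(approval: ApprovalRecord): ExecutionRecord {
  return {
    executionId: `exec-${approval.approvalId}`,
    approvalId: approval.approvalId,
    executedAt: new Date().toISOString(),
  };
}
```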

What we don't allow

There is no "approve all future actions like this one" toggle. There is no implicit batching of approvals across steps. There is no "approve plan" that authorizes write tools without their previews being shown individually. Each write produces its own preview and requires its own approval. An operator who wants to grant ten bonuses approves ten previews. That's slower. It's also the only design where the audit trail means anything.

The exception: trusted actions in scheduled sessions

There is one place where the human-in-the-loop pattern is relaxed, and it is deliberately narrow. Scheduled AI sessions — recurring jobs that run without an operator present — can be configured with a list of trusted actions: specific write-tool names that the operator has pre-authorized for that schedule.

A scheduled session can execute a trusted action without a per-step approval, but it cannot execute any write tool that isn't on its trust list. A schedule that's allowed to send a "weekly retention email" cannot, on its own initiative, decide to grant a bonus. That action is structurally not available to it, regardless of what the model proposes. The trust list is not advice to the model; it's an enforceable filter on the tool catalog the model sees.
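The "enforceable filter, not advice" point can be sketched as catalog filtering. The names below are illustrative, but the mechanism is the one described: untrusted write tools are removed from what the model sees, not merely discouraged in the prompt:

```typescript
interface ToolDescriptor { name: string; isWrite: boolean; }

// For a scheduled session, read tools pass through unconditionally;
// write tools pass only if their name is on the schedule's trust list.
// A write tool absent from the list is structurally unavailable.
function catalogForSchedule(
  allTools: ToolDescriptor[],
  trustedActions: Set<string>,
): ToolDescriptor[] {
  return allTools.filter(t => !t.isWrite || trustedActions.has(t.name));
}
```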

Even within trusted actions, the preview is still generated and recorded. The audit trail of a scheduled session contains every preview, every execution, and the schedule's trust list at the time of execution — so a regulator can see not only what happened, but what was permitted to happen.

The cost

The preview pattern is not free. Every write tool has to implement a deterministic preview generator that mirrors the execute path. This is real engineering work, and it has to be kept in sync — when the execute method changes, the preview has to change with it. We've found this cost to be smaller than it looks, because the preview generator is usually a thin formatting layer over the same validation that execute already performs.
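One way to keep the two paths in sync is to route both through a single validation function, so the preview really is a thin formatting layer over the same checks execute performs. A hypothetical sketch, with invented argument and rule names:

```typescript
interface LimitArgs { playerId: number; weeklyLimit: number; currency: string; }

// Shared validation: preview and execute both call this first, so a
// change to the rules forces both paths to move together.
function validate(args: LimitArgs): LimitArgs {
  if (args.weeklyLimit <= 0) throw new Error("limit must be positive");
  if (!["EUR", "SEK"].includes(args.currency)) throw new Error("unknown currency");
  return args;
}

function buildPreview(args: LimitArgs): string {
  const a = validate(args);                 // same checks as execute
  return `Set weekly deposit limit to ${a.weeklyLimit} ${a.currency} for player ${a.playerId}`;
}

function execute(args: LimitArgs): void {
  const a = validate(args);                 // preview cannot silently diverge
  // ...perform the actual mutation with the validated args...
}
```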

The cost the pattern does impose is on operator time. An operator who wants to take ten actions has to approve ten previews. There's no shortcut. We've considered building one and decided against it: every shortcut we sketched ended up looking like "approve a category of action without seeing the specific instance," which is exactly the failure mode the preview was built to prevent. The friction is the feature.

Today

Every write tool in PAM's AI catalog implements IAIWriteTool with a deterministic preview generator. Operators see a draft of every change before it happens, with affected entity IDs and a reversibility flag. Approval records are linked to specific previews and persisted with the session. No write tool can execute without going through the preview path — scheduled sessions included, where the relaxation is bounded by an explicit trust list. No exceptions, no overrides, no "this one is fine."

The principle

An AI that writes is different from an AI that reads, and the difference is not on the model side. It's on the side effect side. The right question is never "do you trust the model" — the model is what it is — but "what runs between the model's intent and the real change."

If nothing runs between, the model is the system. If a deterministic preview runs between, and the preview is shown to an operator, and the operator's approval is recorded against that exact preview, then the operator is the system and the model is a proposal generator. The first design ships AI features. The second one ships them into a regulated back office.
