
40 Tools, One Brain

The model never sees the database. It picks from a catalog of forty typed, permission-scoped tools — and the things that aren't in the catalog are the ones we thought hardest about.

WebPrefer Engineering
April 2026
AI Tool Registry
40 typed tools, grouped by domain · structurally bounded
In the catalog
Players · 6
Banking · 5
Bonus · 4
Casino · 3
Sportsbook · 3
RG · 4
CashBack · 2
Bingo · 2
Tournament · 2
Reporting · 3
Messaging · 2
Audit · 2
Behave · 1
Segmentation · 1
Deliberately absent
✕ Raw SQL execution
✕ Direct wallet adjustment
✕ KYC override / approval bypass
✕ Player password reset
✕ Self-exclusion removal
✕ Schema migrations
✕ Operator config changes
✕ Bulk customer messaging
✕ Cross-skin data access
A tool catalog is a security boundary. The boundary is what's not in it.
Negative space matters as much as the listing.

The shortest path from "we want AI in the back office" to a working prototype is to give the model a database connection and a system prompt. It works on day one. It impresses on day two. It hits its first compliance review on day three, and it never recovers.

The architecture that survived contact with regulated production is structurally different. The model never sees the database. It never reads schema. It never composes SQL. It picks tools from a typed catalog — about forty of them — and fills in arguments that match a published JSON schema. The tools are the entire surface area between the model and the platform.

This article is a tour of that catalog: what's in it, how it's organized, and — equally important — what was deliberately left out.

The shape of a tool

Every tool implements IAITool: a name, a description, a permission requirement, a function declaration (the JSON schema the model fills in), an availability check, and an execute method. Read-only tools stop there. Write tools also implement IAIWriteTool, adding a deterministic preview generator.
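The contract can be sketched as follows. This is a TypeScript rendering for illustration — the platform's actual signatures are not published here, and every member name besides IAITool and IAIWriteTool is an assumption:

```typescript
// Hypothetical sketch of the tool contract described above.
// Member names other than IAITool / IAIWriteTool are assumptions.

type JsonSchema = Record<string, unknown>;

interface SessionContext {
  skinId: string;
  permissions: Set<string>;
}

interface IAITool {
  name: string;
  description: string;
  // controller/action pairs, matching the back-office RBAC model
  requiredPermissions: { controller: string; action: string }[];
  // the JSON schema the model fills in
  declaration: JsonSchema;
  // availability check for the current session
  isAvailable(ctx: SessionContext): boolean;
  execute(args: unknown, ctx: SessionContext): Promise<unknown>;
}

// Write tools additionally produce a deterministic preview
// of what the write would do, before anything is committed.
interface IAIWriteTool extends IAITool {
  preview(args: unknown, ctx: SessionContext): Promise<string>;
}

// A minimal read-only tool satisfying the contract:
const pingTool: IAITool = {
  name: "Ping",
  description: "Health check",
  requiredPermissions: [{ controller: "System", action: "Read" }],
  declaration: { type: "object", properties: {} },
  isAvailable: () => true,
  execute: async () => ({ ok: true }),
};
```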

The arguments are typed. SearchPlayersTool doesn't accept "filter by lifetime value greater than five thousand" — it accepts a strongly-typed args object with named fields, deserialized from a model-supplied JSON payload that conforms to the schema. The model can produce malformed JSON; it cannot produce malformed args, because malformed args fail to deserialize and the tool reports a clean error back into the planner. The planner can then replan, ask the operator, or fail the step. What it cannot do is execute partial garbage.

Each tool also declares its required permissions — controller and action pairs that map onto the same RBAC model the back-office UI uses. When the registry assembles the catalog for a session, it filters to tools whose permission requirements are satisfied by the calling agent's role. A support agent at level 1 sees a smaller catalog than a payments lead. The model never sees tools the agent isn't authorized to use, so it cannot propose them. This is the same principle as the trust list for scheduled sessions: structural availability beats prompted refusal.
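Catalog assembly reduces to a filter over granted permissions. A minimal sketch, assuming permissions are flattened to "Controller.Action" strings (the names and tool examples are ours):

```typescript
// Hypothetical sketch of catalog assembly: tools the role cannot use
// are filtered out before the model ever sees the catalog, so an
// unauthorized tool is structurally absent rather than merely refused.

interface CatalogTool {
  name: string;
  requiredPermissions: string[]; // flattened "Controller.Action" pairs
}

function assembleCatalog(
  allTools: CatalogTool[],
  granted: Set<string>,
): CatalogTool[] {
  return allTools.filter((t) =>
    t.requiredPermissions.every((p) => granted.has(p)),
  );
}
```

A level-1 support role granted only Players.Read would receive a catalog containing SearchPlayers but not, say, a bonus write tool — the model cannot propose what it cannot see.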

The thirteen domains

The catalog is grouped by operational domain, mirroring the structure of the platform itself. Each domain holds one or more tools that read or write within that area.

One additional domain — Behave — exposes the BeAware behavior engine. Admin and Studio contain meta-tools the planner uses to introspect its own session.

The numbers shift as we add tools. The shape doesn't. Every tool maps onto a domain with an existing back-office UI, an existing permission set, and an existing audit hook. We don't add a tool that a human couldn't already do through the back office. The AI's reach is the same as the operator's; only the speed is different.

The PII layer

Tool results pass through a sanitizer before they reach the model. Personally identifying fields — names, email addresses, phone numbers, postal addresses, government IDs, dates of birth — are tokenized into placeholders of the form [FNAME_1], [EMAIL_1], and so on. The model sees the tokens. The operator sees the original values, restored from the same token map after the model produces a response.
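The mechanism is a straightforward substitution with an in-memory reverse map. A minimal sketch handling only email addresses for brevity (the regex and function names are illustrative assumptions):

```typescript
// Hypothetical sketch of the tokenization layer: PII is replaced with
// numbered placeholders before the result reaches the model, and the
// operator-facing response is restored from the same in-memory map.

function tokenize(text: string): { masked: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let n = 0;
  const masked = text.replace(
    /[\w.+-]+@[\w-]+\.[\w.]+/g, // email addresses only, for brevity
    (email) => {
      const token = `[EMAIL_${++n}]`;
      map.set(token, email);
      return token;
    },
  );
  return { masked, map };
}

function restore(masked: string, map: Map<string, string>): string {
  let out = masked;
  for (const [token, original] of map) out = out.split(token).join(original);
  return out;
}
```

The model reasons over "[EMAIL_1]"; the operator reads the real address; the map never leaves the session.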

This is not privacy theatre. The third-party model API has a data policy; we don't rely on it as the only line of defense. The sanitizer ensures that the worst-case data leakage from a model API breach is a structured set of placeholder tokens with no resolvable mapping outside our environment. The resolution map lives only in PAM, only for the duration of the session, only in memory.

Beyond tokenization, sensitive fields are conditionally included based on the agent's permission context. Some fields are never sent to the model regardless of role; some are sent only when the agent has a specific permission; some are masked to "***" instead of being omitted, so the model knows the field exists without seeing its value. The decision matrix is per-field, not per-tool — the tool authors don't decide what's sensitive, the platform does.
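One way to picture the per-field matrix — a hypothetical sketch in which the field names, rule kinds, and permission strings are all our assumptions:

```typescript
// Hypothetical sketch of the per-field decision matrix: the platform,
// not the tool author, decides whether a field is dropped, masked to
// "***", or sent only when a specific permission is held.

type FieldRule =
  | { kind: "never" }                        // never sent to the model
  | { kind: "mask" }                         // sent as "***"
  | { kind: "requires"; permission: string } // sent only with permission
  | { kind: "allow" };

const fieldRules: Record<string, FieldRule> = {
  governmentId: { kind: "never" },
  dateOfBirth: { kind: "mask" },
  email: { kind: "requires", permission: "Players.ReadContact" },
  displayName: { kind: "allow" },
};

function applyRules(
  row: Record<string, string>,
  granted: Set<string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [field, value] of Object.entries(row)) {
    const rule = fieldRules[field] ?? { kind: "never" }; // default deny
    if (rule.kind === "never") continue;
    if (rule.kind === "mask") out[field] = "***";
    else if (rule.kind === "requires") {
      if (granted.has(rule.permission)) out[field] = value;
      // otherwise the field is omitted entirely
    } else out[field] = value;
  }
  return out;
}
```

Note the default: a field without a rule is dropped, not forwarded.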

Skin scoping is the other invisible filter

PAM is multi-tenant. Every player, payment, bonus, and message belongs to a specific skin (operator brand). Every tool's execute method receives the agent's CurrentSkinId and uses it to scope its query. A support agent at Operator A querying for "all players with deposit limits above €500" sees only Operator A's players — never Operator B's, regardless of how the model phrases the request. Cross-skin access is a tool that doesn't exist.
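The key property is that the skin id is session context, not a model-supplied argument. A minimal sketch against an in-memory dataset (the data shape and function names are illustrative assumptions):

```typescript
// Hypothetical sketch of skin scoping: CurrentSkinId comes from the
// agent's session and is applied inside execute; the model cannot
// widen the query because skin is not a parameter it controls.

interface Player {
  id: number;
  skinId: string;
  depositLimit: number;
}

function playersWithLimitAbove(
  db: Player[],
  threshold: number,
  ctx: { currentSkinId: string }, // from the session, never the model
): Player[] {
  return db.filter(
    (p) => p.skinId === ctx.currentSkinId && p.depositLimit > threshold,
  );
}
```

However the model phrases "all players", the predicate on skinId is compiled in before the model's arguments are even considered.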

What's deliberately not in the catalog

The interesting design conversations weren't about which tools to add. They were about which tools to leave out — capabilities that would have been straightforward to expose and that we decided, on review, the AI shouldn't have.

Raw SQL execution. The most powerful possible tool. Also the one that makes every other security control irrelevant. There is no SQL tool. The model cannot compose ad-hoc queries against the database, ever, under any circumstance. If a question can't be answered through a tool, the answer is "we'd need to add a tool for that," not "the model can figure something out."

Direct wallet adjustment. Crediting or debiting a player's wallet outside the bonus and payment flows is a back-office capability — for the right roles — but it isn't an AI capability. The audit trail and reconciliation requirements around wallet adjustments are stricter than what an AI-mediated path can satisfy with the same level of confidence.

KYC overrides. Approving a KYC document, marking a SOW review as complete, or unblocking an account on RG grounds are decisions that involve professional judgment and personal accountability. The AI can summarize a case file. It cannot make the decision.

Player password resets. A small tool that would be very useful to a support agent. Also a credible vector for social-engineering attacks if the AI participates in the decision. The path to a password reset goes through identity verification flows that don't include the AI.

Self-exclusion removal. A self-excluded player who asks for the exclusion to be lifted goes through a dedicated cooling-off and review process. There is no tool the AI can call that shortens this. Adding one would be a regulatory-grade mistake.

Schema migrations and operator config changes. Configuration is code. Code goes through a deployment pipeline. The AI is not in the deployment pipeline. There is no tool to "change the deposit limit policy for Brand X" — that's a config change, made by a human, code-reviewed, deployed.

Bulk customer messaging. Drafting a customer message is a tool. Sending one to three hundred customers in a single action is not. Outbound campaigns go through a campaign engine with its own approval and throttling — the AI can draft, the operator can approve a draft, the campaign engine handles the send. The AI does not get to be the sending mechanism.

A note on plausibility

Each of the absent tools above was, at some point, proposed in good faith — usually by someone making the reasonable observation that the AI could "obviously" do this. The capability check is the easy part. The "should this be a tool" check is the part that takes the meeting. The default is no. Adding a tool is permanent in the same way adding an API endpoint is permanent: every operator everywhere now has it, and the audit story now includes it. We add tools when we can describe the audit story in advance. We don't add them when "the AI did it" would be the only available answer.

How a tool gets added

The process for adding a tool is deliberately heavier than the process for adding a back-office button.

Most tool proposals don't reach implementation. The ones that do are usually narrower than the original idea, by design.

Why one big brain works

An obvious alternative architecture is to have multiple specialized agents — a payments AI, a KYC AI, a bonus AI — each with its own tool subset. We chose against this, and the reason was operational rather than technical.

Operators don't think in agents; they think in problems. A churn analysis is a player problem and a bonus problem and a messaging problem and a reporting problem. Splitting the AI by domain forces the operator to choose the agent before they know what kind of help they need. The single planner with a unified catalog lets the operator describe the problem in their own terms and lets the planner pick tools across domains as needed.

The complexity moves to the planner. That's the right place for it. The catalog stays flat, every tool is independent, and the relationships between tools — "this analysis usually follows from that query" — are emergent properties of the planner's behavior, not configuration.

Today

PAM's AI tool registry is around forty tools across thirteen operational domains, with new tools added on a deliberate review path. Every tool is typed, permission-scoped, skin-bounded, and PII-aware. Write tools generate previews. Tool results pass through a tokenization layer before reaching the model. The list of tools we've decided not to build is longer than the list of tools we have built — and the conversations about the absent tools are the ones that determined whether the system was deployable in regulated jurisdictions at all.

The principle

A tool catalog is not a feature list. It's a security boundary. Adding a tool is a permanent decision that expands the AI's reach across every operator running PAM. Refusing to add a tool is a quiet decision that shapes what the AI can and cannot ever be asked to do.

Most of the thinking in agentic AI for regulated industries goes into the planner, the model, the prompt. In our experience, more of the thinking should go into the catalog. The model is a moving part; the catalog is the immovable one. Get it right and the moving parts have somewhere stable to move within. Get it wrong and no amount of prompt engineering will save you.
