
The Wallet Problem

What high-concurrency production failures taught us about financial integrity — and the optimistic concurrency pattern we built to fix it.

WebPrefer Engineering
October 2025
[Figure: Race condition in read-modify-write. Request A (a withdrawal) and Request B (a bonus credit) both read a wallet balance of 1,000 SEK inside the race window. Request A checks 1,000 ≥ 800 and writes balance = 200; Request B computes 1,000 + 200 and writes balance = 1,200. Last write wins: one operation is silently lost. Fix: optimistic concurrency, checking RowVersion before the WRITE and retrying on conflict.]

Gaming platforms handle money. That sentence sounds obvious, but its implications shape every architecture decision we make — especially around concurrency.

When PAM was processing thousands of simultaneous transactions during high-traffic promotions, we encountered a class of problem that no load test prepares you for: the race condition in a read-modify-write cycle.

The incident

A player initiates a withdrawal during an active promotion. Simultaneously, the bonus engine credits a freespin win. Both operations read the wallet balance at the same instant, and each computes a new balance from its own snapshot. The result: two writes based on stale reads. Whichever write commits last overwrites the other, so the player's wallet reflects only one of the two transactions correctly — or worse, funds are over-committed.

This isn't hypothetical. It happened. And in a regulated gaming environment, a wallet discrepancy isn't a bug report — it's a compliance event.

Why the obvious fix doesn't work

The first instinct is to wrap everything in a transaction and call it done. Database transactions do provide atomicity, but a transaction boundary doesn't prevent two separate transactions from reading the same row before either has committed. This is the classic lost update anomaly — and it occurs at the default isolation level in SQL Server (Read Committed).
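The lost update described above can be reproduced deterministically without any database at all. The sketch below (illustrative values only, matching the figure's 1,000 SEK example) shows two operations each computing a new balance from its own stale snapshot:

```python
# Deterministic illustration of the lost-update anomaly: both
# "transactions" read the balance before either writes, so the
# second write silently overwrites the first.

balance = 1000  # shared wallet balance, in SEK

# Both operations read the same stale snapshot.
snapshot_a = balance  # withdrawal reads 1,000
snapshot_b = balance  # bonus credit reads 1,000

# Withdrawal: 1,000 >= 800 passes the check, so it writes 200.
balance = snapshot_a - 800

# Bonus credit: computed from its own stale read, writes 1,200.
balance = snapshot_b + 200

print(balance)  # 1200: the 800 SEK withdrawal is lost
```

Under Read Committed, nothing in either transaction detects that the snapshot it computed from is no longer current.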

The second instinct is to use Serializable isolation. This works, but it comes with a cost that's unacceptable in a high-throughput system: every read in a serializable transaction places a range lock, dramatically reducing concurrency and increasing the likelihood of deadlocks under load.

We tried this path. During a promotional freespin release where tens of thousands of credits were being applied in parallel, the deadlock rate spiked. The system was correct — but it was effectively serializing a workload that should be parallel.

The pattern that held

The solution we settled on is optimistic concurrency with a row version check and automatic retry. The idea is simple: don't assume you're the only writer. Instead, read the current state, compute the new state, and only commit if the state hasn't changed since you read it.

In practice, this means the wallet table carries a RowVersion (timestamp) column. Every update includes the original row version in its WHERE clause:

-- Read
SELECT Balance, RowVersion
FROM Wallets
WHERE PlayerId = @id;

-- Update: succeeds only if RowVersion hasn't changed.
-- (A rowversion column is advanced automatically by SQL Server;
-- it cannot be assigned explicitly.)
UPDATE Wallets
SET Balance = @newBalance
WHERE PlayerId = @id
  AND RowVersion = @originalVersion;

-- If 0 rows affected: another writer got there first → retry

If the update affects zero rows, a concurrent writer committed between our read and our write. We don't panic — we retry. The retry reads fresh state, recomputes, and attempts again. In practice, under realistic concurrency, retries rarely exceed one or two attempts.

The key insight

Optimistic concurrency is not about assuming nothing goes wrong — it's about detecting when something did and recovering cleanly. The cost of a retry is a few milliseconds. The cost of a wrong wallet balance is a compliance incident.

Handling the retry correctly

The retry logic itself needs care. A naive implementation retries indefinitely, which can turn a transient conflict into an infinite loop under extreme load. Our implementation:

- caps attempts with a fixed retry budget rather than looping forever,
- re-reads fresh state before every attempt instead of reusing stale values, and
- logs every exhausted retry budget so the conflict rate is visible in monitoring.

The last point matters more than it sounds. If a wallet is seeing consistent retry exhaustion, that's a signal — either the conflict rate is abnormally high (a bug elsewhere) or the retry budget is too tight. The logs surface this before it becomes a problem.
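A bounded retry wrapper with conflict logging can be sketched as follows. This is a generic helper, not the platform's code; the `with_retries` name, the jittered backoff, and the attempt budget are all assumptions for illustration:

```python
import logging
import random
import time

log = logging.getLogger("wallet.retry")

def with_retries(operation, max_attempts=5, base_delay=0.001):
    """Run an optimistic-concurrency operation (a callable returning
    True on success, False on version conflict), retrying with a small
    jittered backoff. Logs and raises once the budget is exhausted so
    consistent exhaustion shows up in monitoring."""
    for attempt in range(1, max_attempts + 1):
        if operation():
            return attempt  # number of attempts it took
        time.sleep(base_delay * attempt * random.random())  # jittered backoff
    log.warning("retry budget exhausted after %d attempts", max_attempts)
    raise RuntimeError("optimistic concurrency conflict not resolved")
```

An operation that succeeds on its third attempt returns 3; one that never succeeds raises after five attempts and leaves a log line behind.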

Where this pattern is applied

Optimistic concurrency is not applied uniformly across the platform. We apply it specifically to operations that:

- follow a read-modify-write cycle against shared financial state,
- run at volumes where pessimistic locking would serialize throughput, and
- can be triggered concurrently by several independent subsystems.

The third point is particularly important in PAM's architecture. A player deposit can trigger — almost simultaneously — a balance update from the payment provider callback, a bonus evaluation from the behavior engine, and a CRM notification from FastTrack. All of these may touch the wallet. Without optimistic concurrency, the order of operations becomes critical. With it, each operation either succeeds on its own terms or retries with fresh state.

Concurrent wallet creation

A related problem surfaced during registration flows: duplicate wallet creation. When a player registers, the system creates their wallet. If the registration request is duplicated at the network layer (a retry from a slow client, or a double-submit), two wallet creation requests can arrive within milliseconds of each other.

This one we fixed at the database layer: a unique index on (PlayerId, CurrencyId, WalletType) ensures that even if two creation requests arrive simultaneously, exactly one commits and the other receives a constraint violation — which we catch, log, and resolve by returning the existing wallet. No money is duplicated. No orphaned records. No try/catch paper-over.
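The create-or-return-existing pattern can be demonstrated end to end with SQLite, which raises the same kind of constraint violation on a duplicate insert. Table and column names mirror the article; the `create_wallet` helper is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Wallets (
        WalletId   INTEGER PRIMARY KEY,
        PlayerId   TEXT NOT NULL,
        CurrencyId TEXT NOT NULL,
        WalletType TEXT NOT NULL
    )
""")
# The unique index that makes duplicate creation impossible.
conn.execute("""
    CREATE UNIQUE INDEX UX_Wallets
        ON Wallets (PlayerId, CurrencyId, WalletType)
""")

def create_wallet(player_id, currency_id, wallet_type):
    """Insert a wallet; if a duplicate request already created it,
    return the existing row instead of failing the caller."""
    try:
        conn.execute(
            "INSERT INTO Wallets (PlayerId, CurrencyId, WalletType)"
            " VALUES (?, ?, ?)",
            (player_id, currency_id, wallet_type),
        )
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()  # the other request won the race; fall through
    return conn.execute(
        "SELECT WalletId FROM Wallets"
        " WHERE PlayerId=? AND CurrencyId=? AND WalletType=?",
        (player_id, currency_id, wallet_type),
    ).fetchone()[0]
```

Two identical creation requests return the same WalletId, and the table holds exactly one row.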

Today

PAM runs high-volume promotions — mass freespin releases, deposit bonuses with tight time windows, jackpot credit distributions — without wallet integrity incidents. The pattern has held across the range of operators and transaction volumes we've seen in five regulated markets. When the retry logger is quiet, we know the system is doing what it should.

The broader lesson

The wallet concurrency problem taught us something that applies across the platform: correctness under concurrency cannot be tested into a system. You can write a test that simulates two concurrent requests, but tests run sequentially by default, and the timing windows that produce race conditions in production are measured in microseconds.

The right approach is to design operations so that concurrent execution is safe by construction — not by luck. Optimistic concurrency, database-level constraints, and idempotent operation design are the tools. Load testing can validate that the retry logic performs acceptably. It cannot validate that the correctness model is sound.

If your wallet engine doesn't have an explicit answer to "what happens when two operations hit simultaneously," it's not a hypothetical question. It's a timer.
