
The Wallet Problem

What high-concurrency production failures taught us about financial integrity — and the optimistic concurrency pattern we built to fix it.

WebPrefer Engineering
October 2025
[Figure: Race condition in read-modify-write. Request A (a withdrawal) and Request B (a bonus credit) both read a wallet balance of 1,000 SEK inside the race window. Request A checks 1,000 ≥ 800 and writes balance = 200; Request B computes 1,000 + 200 and writes balance = 1,200. Last write wins: one operation is silently lost. Fix: optimistic concurrency, checking RowVersion before the WRITE and retrying on conflict.]

Gaming platforms handle money. That sentence sounds obvious, but its implications shape every architecture decision we make — especially around concurrency.

When PAM was processing thousands of simultaneous transactions during high-traffic promotions, we encountered a class of problem that no load test prepares you for: the race condition in a read-modify-write cycle.

The incident

A player initiates a withdrawal during an active promotion. Simultaneously, the bonus engine credits a freespin win. Both operations read the wallet balance at the same instant, and each computes a new balance from its own snapshot. The result: two writes based on stale reads. Whichever write commits last overwrites the other, so the player's wallet reflects only one of the two transactions correctly — or worse, funds are over-committed.

This isn't hypothetical. It happened. And in a regulated gaming environment, a wallet discrepancy isn't a bug report — it's a compliance event.

Why the obvious fix doesn't work

The first instinct is to wrap everything in a transaction and call it done. Database transactions do provide atomicity, but a transaction boundary doesn't prevent two separate transactions from reading the same row before either has committed. This is the classic lost update anomaly — and it occurs at the default isolation level in SQL Server (Read Committed).
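The lost update described above can be reproduced deterministically without any database at all. The sketch below (illustrative values only, matching the figure's 1,000 SEK example) shows two operations each computing a new balance from its own stale snapshot:

```python
# Deterministic illustration of the lost-update anomaly: both
# "transactions" read the balance before either writes, so the
# second write silently overwrites the first.

balance = 1000  # shared wallet balance, in SEK

# Both operations read the same stale snapshot.
snapshot_a = balance  # withdrawal reads 1,000
snapshot_b = balance  # bonus credit reads 1,000

# Withdrawal: 1,000 >= 800 passes the check, so it writes 200.
balance = snapshot_a - 800

# Bonus credit: computed from its own stale read, writes 1,200.
balance = snapshot_b + 200

print(balance)  # 1200: the 800 SEK withdrawal is lost
```

Under Read Committed, nothing in either transaction detects that the snapshot it computed from is no longer current.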

The second instinct is to use Serializable isolation. This works, but it comes with a cost that's unacceptable in a high-throughput system: every read in a serializable transaction places a range lock, dramatically reducing concurrency and increasing the likelihood of deadlocks under load.

We tried this path. During a promotional freespin release where tens of thousands of credits were being applied in parallel, the deadlock rate spiked. The system was correct — but it was effectively serializing a workload that should be parallel.

The pattern that held

The solution we settled on is optimistic concurrency with a row version check and automatic retry. The idea is simple: don't assume you're the only writer. Instead, read the current state, compute the new state, and only commit if the state hasn't changed since you read it.

In practice, this means the wallet table carries a RowVersion (timestamp) column. Every update includes the original row version in its WHERE clause:

-- Read
SELECT Balance, RowVersion
FROM Wallets
WHERE PlayerId = @id;

-- Update: succeeds only if RowVersion hasn't changed.
-- (A rowversion column is advanced automatically by SQL Server;
-- it cannot be assigned explicitly.)
UPDATE Wallets
SET Balance = @newBalance
WHERE PlayerId = @id
  AND RowVersion = @originalVersion;

-- If 0 rows affected: another writer got there first → retry

If the update affects zero rows, a concurrent writer committed between our read and our write. We don't panic — we retry. The retry reads fresh state, recomputes, and attempts again. In practice, under realistic concurrency, retries rarely exceed one or two attempts.

The key insight

Optimistic concurrency is not about assuming nothing goes wrong — it's about detecting when something did and recovering cleanly. The cost of a retry is a few milliseconds. The cost of a wrong wallet balance is a compliance incident.

Handling the retry correctly

The retry logic itself needs care. A naive implementation retries indefinitely, which can turn a transient conflict into an infinite loop under extreme load. Our implementation:

- caps attempts with a fixed retry budget rather than looping forever,
- re-reads fresh state before every attempt instead of reusing stale values, and
- logs every exhausted retry budget so the conflict rate is visible in monitoring.

The last point matters more than it sounds. If a wallet is seeing consistent retry exhaustion, that's a signal — either the conflict rate is abnormally high (a bug elsewhere) or the retry budget is too tight. The logs surface this before it becomes a problem.
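A bounded retry wrapper with conflict logging can be sketched as follows. This is a generic helper, not the platform's code; the `with_retries` name, the jittered backoff, and the attempt budget are all assumptions for illustration:

```python
import logging
import random
import time

log = logging.getLogger("wallet.retry")

def with_retries(operation, max_attempts=5, base_delay=0.001):
    """Run an optimistic-concurrency operation (a callable returning
    True on success, False on version conflict), retrying with a small
    jittered backoff. Logs and raises once the budget is exhausted so
    consistent exhaustion shows up in monitoring."""
    for attempt in range(1, max_attempts + 1):
        if operation():
            return attempt  # number of attempts it took
        time.sleep(base_delay * attempt * random.random())  # jittered backoff
    log.warning("retry budget exhausted after %d attempts", max_attempts)
    raise RuntimeError("optimistic concurrency conflict not resolved")
```

An operation that succeeds on its third attempt returns 3; one that never succeeds raises after five attempts and leaves a log line behind.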

Where this pattern is applied

Optimistic concurrency is not applied uniformly across the platform. We apply it specifically to operations that:

- follow a read-modify-write cycle against shared financial state,
- run at volumes where pessimistic locking would serialize throughput, and
- can be triggered concurrently by several independent subsystems.

The third point is particularly important in PAM's architecture. A player deposit can trigger — almost simultaneously — a balance update from the payment provider callback, a bonus evaluation from the behavior engine, and a CRM notification from FastTrack. All of these may touch the wallet. Without optimistic concurrency, the order of operations becomes critical. With it, each operation either succeeds on its own terms or retries with fresh state.

Concurrent wallet creation

A related problem surfaced during registration flows: duplicate wallet creation. When a player registers, the system creates their wallet. If the registration request is duplicated at the network layer (a retry from a slow client, or a double-submit), two wallet creation requests can arrive within milliseconds of each other.

This one we fixed at the database layer: a unique index on (PlayerId, CurrencyId, WalletType) ensures that even if two creation requests arrive simultaneously, exactly one commits and the other receives a constraint violation — which we catch, log, and resolve by returning the existing wallet. No money is duplicated. No orphaned records. No try/catch paper-over.
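The create-or-return-existing pattern can be demonstrated end to end with SQLite, which raises the same kind of constraint violation on a duplicate insert. Table and column names mirror the article; the `create_wallet` helper is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Wallets (
        WalletId   INTEGER PRIMARY KEY,
        PlayerId   TEXT NOT NULL,
        CurrencyId TEXT NOT NULL,
        WalletType TEXT NOT NULL
    )
""")
# The unique index that makes duplicate creation impossible.
conn.execute("""
    CREATE UNIQUE INDEX UX_Wallets
        ON Wallets (PlayerId, CurrencyId, WalletType)
""")

def create_wallet(player_id, currency_id, wallet_type):
    """Insert a wallet; if a duplicate request already created it,
    return the existing row instead of failing the caller."""
    try:
        conn.execute(
            "INSERT INTO Wallets (PlayerId, CurrencyId, WalletType)"
            " VALUES (?, ?, ?)",
            (player_id, currency_id, wallet_type),
        )
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()  # the other request won the race; fall through
    return conn.execute(
        "SELECT WalletId FROM Wallets"
        " WHERE PlayerId=? AND CurrencyId=? AND WalletType=?",
        (player_id, currency_id, wallet_type),
    ).fetchone()[0]
```

Two identical creation requests return the same WalletId, and the table holds exactly one row.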

Today

PAM runs high-volume promotions — mass freespin releases, deposit bonuses with tight time windows, jackpot credit distributions — without wallet integrity incidents. The pattern has held across the range of operators and transaction volumes we've seen in five regulated markets. When the retry logger is quiet, we know the system is doing what it should.

The broader lesson

The wallet concurrency problem taught us something that applies across the platform: correctness under concurrency cannot be tested into a system. You can write a test that simulates two concurrent requests, but tests run sequentially by default, and the timing windows that produce race conditions in production are measured in microseconds.

The right approach is to design operations so that concurrent execution is safe by construction — not by luck. Optimistic concurrency, database-level constraints, and idempotent operation design are the tools. Load testing can validate that the retry logic performs acceptably. It cannot validate that the correctness model is sound.

If your wallet engine doesn't have an explicit answer to "what happens when two operations hit simultaneously," it's not a hypothetical question. It's a timer.
