We had a background job that processed thousands of records in parallel. Batches ran concurrently, and shared counters tracked the total number of successful and failed records.
Everything worked perfectly.
For almost a year.
Then one day, the totals started becoming… wrong.
The Setup
- Records processed in chunks
- Multiple chunks running concurrently
- Shared counters tracking totals
- Periodic database updates with progress
The Symptom
- Some runs showed fewer successful records than expected
- Re-running the same data produced different counts
- The issue appeared only in one environment
What Was Actually Happening
Initial total = 10
Worker A reads total (10)
Worker B reads total (10)
Worker A increments → 11
Worker B increments → 11
Final total = 11 (should be 12)
This is a classic lost-update race condition: both workers read the same value, so the second write silently overwrites the first.
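To make the trace concrete, here is a minimal sketch that replays the same interleaving by hand on a single thread, so the result is deterministic. The class and method names (LostUpdateTrace, Replay) are made up for illustration.

```csharp
using System;

public static class LostUpdateTrace
{
    // Replays the interleaving from the trace step by step on one thread.
    public static int Replay()
    {
        int total = 10;

        int readByA = total;   // Worker A reads total (10)
        int readByB = total;   // Worker B reads total (10)

        total = readByA + 1;   // Worker A writes 11
        total = readByB + 1;   // Worker B writes 11, overwriting A's update

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Replay()); // prints 11, although two increments ran
    }
}
```

With real threads the same steps happen, but whether they overlap depends on scheduling, which is why the bug can stay hidden for a long time.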
The Buggy Code
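int totalSuccess = 0;
Parallel.ForEach(records, record =>
{
    if (Process(record))
    {
        // Not atomic: the read, the add, and the write are separate steps,
        // so two threads can read the same value and lose an increment.
        totalSuccess++;
    }
});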
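If you want to see the symptom for yourself, a small reproducer along these lines should do it. This is a sketch, not our production code: RaceRepro and CountUnsafely are hypothetical names, and every iteration counts as a success, so the correct total is always the iteration count.

```csharp
using System;
using System.Threading.Tasks;

public static class RaceRepro
{
    // Runs `iterations` unsynchronized increments in parallel and
    // returns the observed total.
    public static int CountUnsafely(int iterations)
    {
        int total = 0;
        Parallel.For(0, iterations, _ =>
        {
            total++; // read, add, write: not atomic
        });
        return total;
    }

    public static void Main()
    {
        const int iterations = 1_000_000;
        for (int run = 1; run <= 3; run++)
        {
            int observed = CountUnsafely(iterations);
            Console.WriteLine($"run {run}: expected {iterations}, observed {observed}");
        }
    }
}
```

On a multi-core machine the observed value will typically come up short and differ from run to run. On a lightly loaded or single-core machine it may be correct every time, which matches the "only in one environment" symptom.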
Why volatile Alone Doesn't Fix It
private static volatile int totalSuccess = 0;
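volatile guarantees that a thread never works with a stale cached copy of the field, but totalSuccess++ is still three separate steps (read, add, write). Two threads can still read the same value and overwrite each other's increment. It fixes visibility, not atomicity.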
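One way to see the difference between visibility and atomicity is a compare-and-swap loop, which only writes if nobody else changed the value since it was read. The sketch below is for illustration only: CasCounter and its members are hypothetical names, and it shows the guarantee an atomic increment provides, not how the runtime implements it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CasCounter
{
    private static int _total;

    // Atomically adds 1 by retrying until no other thread changed
    // _total between our read and our write.
    public static int IncrementWithCas()
    {
        while (true)
        {
            int seen = Volatile.Read(ref _total);
            int wanted = seen + 1;

            // Writes `wanted` only if _total still equals `seen`;
            // returns the value that was actually there.
            if (Interlocked.CompareExchange(ref _total, wanted, seen) == seen)
            {
                return wanted;
            }
        }
    }

    public static int Run(int iterations)
    {
        _total = 0;
        Parallel.For(0, iterations, _ => IncrementWithCas());
        return Volatile.Read(ref _total);
    }

    public static void Main()
    {
        Console.WriteLine(Run(1_000_000)); // prints 1000000
    }
}
```

A volatile read gives you a fresh `seen`, but only the compare step notices when that value has gone stale before the write.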
The Fix: Atomic Counters
int totalSuccess = 0;
Parallel.ForEach(records, record =>
{
    if (Process(record))
    {
        Interlocked.Increment(ref totalSuccess);
    }
});
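Here is the same fix as a small self-contained program that tracks both counters. It is a sketch under assumptions: the Process rule (even-numbered records succeed) is invented so the expected totals are known in advance, and AtomicTotals is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AtomicTotals
{
    // Hypothetical stand-in for Process(record):
    // even-numbered records succeed, odd-numbered ones fail.
    private static bool Process(int record) => record % 2 == 0;

    public static (int Success, int Failed) Run(int recordCount)
    {
        int totalSuccess = 0;
        int totalFailed = 0;

        Parallel.ForEach(Enumerable.Range(0, recordCount), record =>
        {
            if (Process(record))
            {
                Interlocked.Increment(ref totalSuccess);
            }
            else
            {
                Interlocked.Increment(ref totalFailed);
            }
        });

        // Parallel.ForEach has returned, so all increments are complete.
        return (totalSuccess, totalFailed);
    }

    public static void Main()
    {
        var (success, failed) = Run(100_001);
        Console.WriteLine($"{success} succeeded, {failed} failed");
        // prints "50001 succeeded, 50000 failed"
    }
}
```

Unlike the unsynchronized version, this prints the same totals on every run.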
Snapshot-Based Progress Reporting
var finished = Interlocked.Increment(ref completedChunks);
if (finished % maxConcurrency == 0)
{
    var successSnapshot = Volatile.Read(ref totalSuccess);
    var failureSnapshot = Volatile.Read(ref totalFailed);
    job.TotalSuccessfulRecords = successSnapshot;
    job.TotalFailedRecords = failureSnapshot;
    await UpdateJobProgress(job);
}
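To show the snippet above in context, here is a runnable sketch of the whole loop. Several things are assumptions made for the example: chunks run as tasks gated by a SemaphoreSlim, every third record fails, and UpdateJobProgress just records the snapshot in an in-memory queue instead of writing to a database. It also passes the snapshot values directly rather than mutating a shared job object, so two overlapping reports cannot interfere with each other.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SnapshotProgress
{
    private static int _totalSuccess, _totalFailed, _completedChunks;

    // Hypothetical stand-in for the database: one entry per progress write.
    private static ConcurrentQueue<(int Success, int Failed)> _writes =
        new ConcurrentQueue<(int Success, int Failed)>();

    private static Task UpdateJobProgress(int success, int failed)
    {
        _writes.Enqueue((success, failed));
        return Task.CompletedTask;
    }

    public static async Task<(int Success, int Failed, int ProgressWrites)> RunAsync(
        int chunkCount, int chunkSize, int maxConcurrency)
    {
        _totalSuccess = _totalFailed = _completedChunks = 0;
        _writes = new ConcurrentQueue<(int Success, int Failed)>();
        var gate = new SemaphoreSlim(maxConcurrency); // caps chunks in flight

        var chunks = Enumerable.Range(0, chunkCount).Select(_ => Task.Run(async () =>
        {
            await gate.WaitAsync();
            try
            {
                for (int i = 0; i < chunkSize; i++)
                {
                    // Hypothetical rule: every third record fails.
                    if (i % 3 == 2) Interlocked.Increment(ref _totalFailed);
                    else Interlocked.Increment(ref _totalSuccess);
                }

                var finished = Interlocked.Increment(ref _completedChunks);
                if (finished % maxConcurrency == 0)
                {
                    // Copy each counter once, then report the copies.
                    var successSnapshot = Volatile.Read(ref _totalSuccess);
                    var failureSnapshot = Volatile.Read(ref _totalFailed);
                    await UpdateJobProgress(successSnapshot, failureSnapshot);
                }
            }
            finally
            {
                gate.Release();
            }
        })).ToList();

        await Task.WhenAll(chunks);
        return (_totalSuccess, _totalFailed, _writes.Count);
    }

    public static async Task Main()
    {
        var result = await RunAsync(chunkCount: 8, chunkSize: 9, maxConcurrency: 4);
        Console.WriteLine($"{result.Success} succeeded, {result.Failed} failed, " +
                          $"{result.ProgressWrites} progress writes");
        // prints "48 succeeded, 24 failed, 2 progress writes"
    }
}
```

The two Volatile.Read calls are separate reads, so a snapshot pair is not guaranteed to describe a single instant. That is usually fine for a progress display; the final totals are read only after Task.WhenAll completes.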
Lessons Learned
- Thread-safe collections ≠ thread-safe logic
- ++ is not atomic
- volatile ensures visibility, not atomicity
- Use Interlocked for counters
- Use snapshot reads for reporting
- Reduce shared mutable state
- Concurrency bugs are timing-dependent
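For the "reduce shared mutable state" point, one option is the Parallel.ForEach overload that takes thread-local state: each worker keeps a private subtotal and touches the shared counter only once, when it finishes. This is a sketch rather than what we shipped; LocalTotals and the Process rule are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LocalTotals
{
    // Hypothetical stand-in for Process(record): even records succeed.
    private static bool Process(int record) => record % 2 == 0;

    public static int CountSuccesses(int recordCount)
    {
        int totalSuccess = 0;

        Parallel.ForEach(
            Enumerable.Range(0, recordCount),
            // localInit: each worker starts with its own subtotal of 0.
            () => 0,
            // body: updates only the worker's private subtotal.
            (record, loopState, localSuccess) =>
                Process(record) ? localSuccess + 1 : localSuccess,
            // localFinally: one atomic write to shared state per worker.
            localSuccess => Interlocked.Add(ref totalSuccess, localSuccess));

        return totalSuccess;
    }

    public static void Main()
    {
        Console.WriteLine(CountSuccesses(100_001)); // prints 50001
    }
}
```

The shared counter still needs Interlocked, but it is written a handful of times instead of once per record.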
Takeaway
If you're running parallel batch jobs and tracking totals:
- Use atomic counters
- Take snapshot reads for reporting
- Avoid frequent shared writes
Otherwise, everything may look fine… until it doesn't.