Every two years, when the BES auto-enrollment cycle comes around, someone in a management meeting will say it out loud: "It's just adding new participants. The enrollment logic already works."
That sentence is where the incident bridge starts.
Regulatory auto-enrollment and re-enrollment mandates are not a marketing operation. They are a synchronized, legally timestamped mass policy inception event, and they expose every optimistic assumption your contribution pipeline, EGM reporting stack, and state contribution (devlet katkısı) reconciliation layer were built on. The enrollment logic itself is rarely the problem. The problem is everything downstream of it.
What Actually Happens in the Batch Window
On a normal business day, a BES operator ingests maybe a few hundred new participant records across dozens of employer files. Contribution files arrive throughout the month. State contribution requests are generated on a predictable monthly cadence. EGM feeds are incremental.
During an auto-enrollment or re-enrollment cycle, all of that changes in one night:
- Tens of thousands (sometimes hundreds of thousands) of new participant records with the same legally valid effective date
- First contributions attached to those policies within the same batch
- Employer files that were built by HR systems with wildly inconsistent quality
- EGM notifications required within regulatory windows
- State contribution eligibility flags that must be correctly set on day one, because getting them wrong triggers reconciliation exceptions that take months to unwind
The enrollment engine handles it. It was designed to. The rest of the stack was not.
Where Things Actually Break
1. The Contribution Pipeline Assumes Temporal Spread
Most contribution processing pipelines were designed under the implicit assumption that policy inception dates are distributed across the calendar. Indexes, partitioning strategies, and staging table designs all quietly depend on this.
When 200,000 policies share the same effective date, you get:
- Partition hot spots on date-partitioned tables
- Index bloat on effective_date composite keys
- Aggregation queries that used to return in seconds timing out
- Reconciliation reports that group by inception cohort producing single rows with hundreds of thousands of members
The pipeline doesn't fail. It just gets very, very slow, and it gets slow at the exact moment the operations team needs it to be fast.
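One cheap guard is to measure how concentrated the effective dates are before the file reaches the date-partitioned tables. The following is a minimal sketch, not a description of any particular operator's pipeline: the function name effective_date_skew and the 0.5 threshold are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date


def effective_date_skew(effective_dates, threshold=0.5):
    """Return (top_date, share, is_skewed) for a batch of effective dates.

    threshold is a hypothetical cut-off: the share of the batch that a
    single date may hold before the batch is treated as a mass-inception
    event rather than a normal daily load.
    """
    if not effective_dates:
        return None, 0.0, False
    top_date, top_count = Counter(effective_dates).most_common(1)[0]
    share = top_count / len(effective_dates)
    return top_date, share, share >= threshold


# A normal day: inception dates spread across the calendar.
normal = [date(2025, 1, day) for day in range(1, 11)]
# An enrollment night: nine of ten records share one effective date.
batch = [date(2025, 1, 1)] * 9 + [date(2025, 1, 2)]

normal_result = effective_date_skew(normal)
batch_result = effective_date_skew(batch)
```

A batch flagged as skewed can then be routed to whatever mitigation the team has prepared, such as a separate staging path, instead of being discovered through a slow query.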
2. EGM Reporting Was Built for Deltas
EGM (Emeklilik Gözetim Merkezi) reporting stacks are typically designed around incremental change. New policy today, status change tomorrow, contribution posted the day after. The reporting layer builds its extracts around "what changed since the last successful run."
An auto-enrollment batch inverts this. Everything changed. The delta is the population. Reports that normally emit a few megabytes now emit gigabytes. Validation routines that iterate row-by-row hit timeouts. And because EGM submission windows are regulatory, you cannot simply "try again tomorrow."
I have seen teams discover, at 3 AM on the morning after enrollment, that their EGM extract job has a hardcoded row limit somewhere in a stored procedure written seven years ago by someone who no longer works at the company. That limit was 100,000. The batch was 340,000.
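A hardcoded limit is dangerous mainly because it truncates silently. A minimal sketch of the alternative, assuming the extract can be emitted in several parts: split the population into fixed-size chunks and fail loudly if the chunk totals do not add up to the input. The names chunked and extract_in_chunks are illustrative, and the 100,000 chunk size simply reuses the figure from the anecdote above.

```python
from itertools import islice


def chunked(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def extract_in_chunks(records, chunk_size):
    """Return per-chunk row counts, raising if any row goes missing."""
    records = list(records)
    sizes = [len(chunk) for chunk in chunked(records, chunk_size)]
    if sum(sizes) != len(records):
        raise RuntimeError("extract dropped rows")
    return sizes


# 340,000 rows against a 100,000-row chunk size: four parts, nothing lost.
sizes = extract_in_chunks(range(340_000), 100_000)
```

Whether the receiving side accepts multi-part submissions is a separate question that has to be confirmed before the batch, not during it.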
3. State Contribution Reconciliation Assumes Steady-State
Devlet katkısı reconciliation is the quiet killer. The logic that matches state contribution requests against Treasury responses, and reconciles rejections back to participants, was built assuming a normal monthly volume with predictable rejection patterns.
Auto-enrollment produces:
- A first-month state contribution request larger than the entire previous quarter
- Rejection rates that spike because employer-provided TCKN and identity data is dirtier than normal ongoing data
- Withdrawal-window exits (the 2-month opt-out right) that must be reconciled backwards against state contributions that may have already been requested
That last point is the one that catches everyone. A participant who opts out within the legal window is entitled to a full refund including any state contribution accrual. If your reconciliation layer processed the state contribution request before the opt-out was recorded (and it will, because the opt-out window extends past the first contribution cycle), you now have a manual unwind for every single opt-out.
At scale, "manual unwind for every opt-out" means a dedicated team for six months.
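The unwind set itself is mechanical to compute once requests and opt-outs sit side by side. The sketch below assumes two simplified record types, StateContributionRequest and OptOut, which are hypothetical and stand in for whatever the reconciliation layer actually stores. Amounts are held as integer kuruş to avoid float rounding.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StateContributionRequest:  # hypothetical record shape
    participant_id: str
    requested_on: date
    amount_kurus: int  # integer minor units


@dataclass(frozen=True)
class OptOut:  # hypothetical record shape
    participant_id: str
    recorded_on: date


def find_unwinds(requests, opt_outs):
    """Return requests sent on or before the participant's opt-out was recorded."""
    recorded = {o.participant_id: o.recorded_on for o in opt_outs}
    return [
        r for r in requests
        if r.participant_id in recorded
        and r.requested_on <= recorded[r.participant_id]
    ]


requests = [
    StateContributionRequest("P1", date(2025, 2, 5), 30_000),
    StateContributionRequest("P2", date(2025, 2, 5), 45_000),
    StateContributionRequest("P3", date(2025, 2, 5), 30_000),
]
opt_outs = [OptOut("P2", date(2025, 3, 10))]

unwinds = find_unwinds(requests, opt_outs)
exposure_kurus = sum(r.amount_kurus for r in unwinds)
```

Running this against a modeled opt-out rate before the batch gives a rough count of unwinds and the amount at stake, which is the number the staffing plan depends on.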
The Assumptions That Fail Silently
Across incident postmortems from several enrollment cycles, the pattern is consistent. The failures are almost never in code that was written for enrollment. They are in code that was written years earlier under assumptions that were true at the time:
- Timestamp precision assumptions. When 50,000 policies have created_at values within the same millisecond, ordering-dependent logic breaks.
- Sequence and ID generation. Sequences that were fine at 1,000/day become bottlenecks at 10,000/minute.
- Downstream fan-out. CRM, document generation, welcome-letter dispatch, SMS notification — each of these has its own capacity envelope, and none of them were sized for a mass event.
- Reconciliation cutoffs. End-of-day reconciliation windows that assume the day's activity fits within the overnight window.
- Reference data lookups. Employer master data, sector codes, workplace mappings — all get hit thousands of times per second during the batch, and the caching layer either wasn't there or was never tuned for this pattern.
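The timestamp item is the easiest of these to demonstrate. A minimal sketch, assuming rows carry a created_at value and a unique policy_id (both field names are illustrative): sorting on the timestamp alone returns whatever order the rows arrived in when the timestamps collide, while adding a unique tiebreaker makes the order reproducible.

```python
from datetime import datetime


def processing_order(rows):
    """Sort by created_at, breaking ties on the unique policy_id."""
    return sorted(rows, key=lambda r: (r["created_at"], r["policy_id"]))


# Three policies created within the same millisecond.
same_instant = datetime(2025, 1, 1, 0, 0, 0, 1000)
rows_a = [{"policy_id": pid, "created_at": same_instant} for pid in (3, 1, 2)]
rows_b = list(reversed(rows_a))  # same rows, different arrival order

# Timestamp-only ordering just echoes the arrival order.
naive_a = [r["policy_id"] for r in sorted(rows_a, key=lambda r: r["created_at"])]
naive_b = [r["policy_id"] for r in sorted(rows_b, key=lambda r: r["created_at"])]

# With the tiebreaker, both arrival orders give the same result.
stable_a = [r["policy_id"] for r in processing_order(rows_a)]
stable_b = [r["policy_id"] for r in processing_order(rows_b)]
```

The same idea applies to SQL: an ORDER BY on the timestamp column alone is not deterministic once ties exist, so any ordering-dependent step needs a unique column appended to the sort key.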
What Actually Works
From experience, the operational patterns that survive an auto-enrollment cycle look like this:
- Treat the enrollment batch as a first-class release event, not a business-as-usual run. Freeze other changes, staff the bridge, pre-warm caches, pre-allocate sequence ranges.
- Shadow-run the batch against production-scale synthetic data at least twice before the real date. Not UAT volume. Production volume. The bugs only appear at scale.
- Decouple the enrollment write from the downstream fan-out. Get the participant records committed with correct effective dates first. Let EGM notification, document generation, and welcome communications flow through async queues with backpressure. Regulatory clocks care about the record; the SMS can wait an hour.
- Pre-calculate the state contribution reconciliation impact. Model the opt-out unwind scenario before the batch runs. Know your worst case.
- Instrument the batch with cohort-level metrics, not just row counts. "350,000 rows processed" tells you nothing. "Cohort X: 12,000 policies, 98.2% with valid TCKN, 87 flagged for manual review" tells you what to do next.
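To make the last point concrete, here is a minimal sketch of cohort-level instrumentation. It assumes each record carries a cohort label and a tckn string, both hypothetical field names. The TCKN check is deliberately format-only (11 ASCII digits, not starting with 0) and performs no checksum validation, so treat it as a placeholder for the real rule.

```python
from collections import defaultdict


def looks_like_tckn(value):
    """Format-only check: 11 ASCII digits, not starting with 0.

    Simplified on purpose; no checksum validation is performed here.
    """
    return (
        isinstance(value, str)
        and len(value) == 11
        and value.isascii()
        and value.isdigit()
        and value[0] != "0"
    )


def cohort_metrics(records):
    """Summarise policies, valid-TCKN share, and manual-review count per cohort."""
    totals = defaultdict(lambda: {"policies": 0, "valid_tckn": 0})
    for record in records:
        bucket = totals[record["cohort"]]
        bucket["policies"] += 1
        if looks_like_tckn(record["tckn"]):
            bucket["valid_tckn"] += 1
    return {
        cohort: {
            "policies": b["policies"],
            "valid_tckn_pct": round(100 * b["valid_tckn"] / b["policies"], 1),
            "manual_review": b["policies"] - b["valid_tckn"],
        }
        for cohort, b in totals.items()
    }


# Dummy identifiers for illustration only.
records = [
    {"cohort": "employer-A", "tckn": "12345678901"},
    {"cohort": "employer-A", "tckn": "1234567890"},   # too short
    {"cohort": "employer-A", "tckn": "23456789012"},
    {"cohort": "employer-A", "tckn": "34567890123"},
    {"cohort": "employer-B", "tckn": "0123456789A"},  # not numeric
]
metrics = cohort_metrics(records)
```

Grouping by employer file is one reasonable choice of cohort, since data quality tends to vary by source system, but any grouping that maps to an owner who can fix the problem will do.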
The Real Lesson
Auto-enrollment is not hard because the enrollment logic is complex. It is hard because it is the only operation in BES that simultaneously exercises every downstream system at peak load with a hard regulatory deadline and legally binding effective dates.
Everything that has ever been approximate, quietly under-tested, or built around "normal daily volume" surfaces in the same batch window. And unlike most production incidents, you cannot roll back. The effective dates are legally established the moment the batch commits.
The teams that treat auto-enrollment like a marketing campaign spend the following quarter cleaning up. The teams that treat it like a coordinated mass-inception release, with the same rigor as a core system migration, spend that quarter working on something else.