2026-07-05

The Matching Problem: Why EGM Participant Identity Is Harder to Resolve Than Any Other Financial ID in Turkey

Every pipeline I have seen in a Turkish pension operation starts with the same wrong assumption: that a participant is uniquely identified by their TC kimlik numarası, and that a LEFT JOIN on that column resolves the question of "who is this person." It does not. In BES, participant identity is the single hardest matching problem in Turkish financial data — harder than customer deduplication in retail banking, harder than policyholder resolution in general insurance, and materially harder than the mükellef matching done by the tax authority.

The reason is not technical. It is historical. BES was built on top of employer payroll files, contracts transferred from company to company (şirket-to-şirket), legacy sigorta portfolios converted in 2013, and a regulatory environment in which EGM (Emeklilik Gözetim Merkezi) reconciles daily against records that were never designed to be reconciled.

Why TC Kimlik Is Not the Clean Key You Think It Is

On paper, TC kimlik numarası is an 11-digit unique national identifier with a checksum. In practice, in a pension book of business built up over 20+ years, you will find:

If your identity model assumes TC is the primary key, every one of these cases becomes a silent data quality bomb that surfaces during a MASAK inquiry or a beneficiary payout.
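For reference, the checksum itself is cheap to verify. The sketch below uses the commonly published rule for the 10th and 11th digits; the function name is hypothetical and this is not taken from any particular pension system. A number that passes is only well-formed. The check says nothing about whether the number belongs to the person on the record, which is why it cannot stand in for identity resolution.

```python
def is_valid_tc_kimlik(value: str) -> bool:
    """Return True if `value` is a well-formed TC kimlik numarası.

    Well-formed means: 11 ASCII digits, first digit non-zero, and both
    check digits consistent. It does NOT mean the number was ever issued
    or that it belongs to the person on the record.
    """
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    if value[0] == "0":
        return False

    d = [int(ch) for ch in value]
    odd_sum = sum(d[0:9:2])   # digits 1, 3, 5, 7, 9
    even_sum = sum(d[1:8:2])  # digits 2, 4, 6, 8

    expected_tenth = (odd_sum * 7 - even_sum) % 10
    expected_eleventh = sum(d[:10]) % 10

    return d[9] == expected_tenth and d[10] == expected_eleventh
```

Running this at ingestion catches typos and truncated values early, but a record that passes should still go through matching like any other.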

The Employer Submission Problem

OKS (Otomatik Katılım Sistemi, the auto-enrollment scheme) made this worse, not better. Employers submit monthly contribution files with participant identifiers, and the mismatch rate against the master participant record is not zero. Common patterns:

Each of these requires a matching decision, not a lookup. And the decision has to be defensible: if you match employer contribution X to participant Y with 87% confidence and the fon allocation is wrong, that is a legal exposure under the pension regulations, not a data engineering inconvenience.

Transfer-In Records: The Worst Category

When a participant transfers from one pension company to another (aktarım), the receiving company gets a file with:

None of this is guaranteed to match what the receiving company already knows about that person — because that person may already be a participant at the receiving company under a slightly different identity record. I have personally seen cases where a participant transferred in from another firm and was created as a new participant because the incoming name had "Ş" where the existing record had "S", or the doğum tarihi was off by one day due to a legacy conversion from Hicri calendar entries in older records.
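The "Ş" versus "S" case is the kind of difference a comparison key can absorb. Below is a minimal sketch of a name-folding function, assuming the goal is a key used only for matching, never a replacement for the stored name. The function name and the exact folding table are illustrative assumptions.

```python
import unicodedata

# Explicit folding for Turkish letters. Dotless "ı" has no Unicode
# decomposition, so it must be mapped by hand; the rest are listed
# for clarity.
_TR_FOLD = str.maketrans({
    "ş": "s", "Ş": "S", "ı": "i", "İ": "I",
    "ğ": "g", "Ğ": "G", "ç": "c", "Ç": "C",
    "ö": "o", "Ö": "O", "ü": "u", "Ü": "U",
})


def name_match_key(name: str) -> str:
    """Build an ASCII-leaning, upper-case key for comparing names."""
    # Compose first so "s" + combining cedilla becomes a single "ş".
    composed = unicodedata.normalize("NFC", name)
    folded = composed.translate(_TR_FOLD)
    # Strip any remaining accents (e.g. the circumflex in "â").
    decomposed = unicodedata.normalize("NFKD", folded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Upper-casing is safe here because no dotted/dotless i remains.
    return " ".join(stripped.upper().split())
```

Folding is lossy by design: genuinely different names can collapse to the same key, so the key should feed a match score rather than decide a match on its own.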

The correct behavior is: run a fuzzy match against existing participants, produce a confidence score, and route anything below a threshold to a human review queue. What most systems actually do: create a duplicate participant, which then has to be manually merged three years later when the participant calls to ask why their birikim looks like half of what it should be.
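As a rough illustration of the fuzzy-match step, the sketch below scores an incoming name against existing participants and returns the plausible candidates instead of creating a new record. It uses the standard library's difflib.SequenceMatcher purely as a stand-in similarity measure. The Participant type, the find_candidates helper and the 0.85 cut-off are assumptions for the example, and names are assumed to be normalized already. A real registry would also need blocking, since comparing against every participant does not scale.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Participant:
    participant_id: str
    name_key: str  # assumed to be normalized before it gets here


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def find_candidates(incoming_name_key, registry, min_similarity=0.85):
    """Return (similarity, participant) pairs at or above the cut-off,
    best match first. An empty list means no plausible existing record."""
    scored = (
        (name_similarity(incoming_name_key, p.name_key), p) for p in registry
    )
    hits = [(s, p) for s, p in scored if s >= min_similarity]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


registry = [
    Participant("P1", "SUKRU YILDIZ"),
    Participant("P2", "AYSE DEMIR"),
]
# One-character difference in the surname still surfaces P1 as a candidate.
candidates = find_candidates("SUKRU YILDIS", registry)
```

Name similarity alone is not a confidence score. It is one input, and the candidates it returns still need the other identity attributes compared before anything is matched or sent to review.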

Identity Resolution Is a Confidence Score

The architectural mistake is treating participant identity as a boolean — either this record IS the participant or it is not. In reality, every incoming record (employer file, transfer file, agent submission, corporate enrollment) should produce a match score against the master participant registry, composed of:

The output is a probability, and the pipeline should have explicit thresholds: auto-match above 0.95, manual review 0.75–0.95, auto-reject below 0.75. Every decision gets logged with the score, the features, and the reviewer if applicable. That log is what you produce when SPK or EGM asks how you concluded that contribution X belongs to participant Y.
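The thresholds and the decision log can be sketched in a few lines. This assumes the score has already been computed elsewhere and treats the boundaries as follows: strictly above 0.95 auto-matches, 0.75 through 0.95 inclusive goes to review, anything lower is rejected. The function names, log fields and feature names are hypothetical, chosen only to show the shape of an auditable record.

```python
import json
from datetime import datetime, timezone
from enum import Enum

AUTO_MATCH_ABOVE = 0.95
REVIEW_FROM = 0.75


class Decision(str, Enum):
    AUTO_MATCH = "auto_match"
    MANUAL_REVIEW = "manual_review"
    AUTO_REJECT = "auto_reject"


def decide(score: float) -> Decision:
    """Map a match probability to a routing decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > AUTO_MATCH_ABOVE:
        return Decision.AUTO_MATCH
    if score >= REVIEW_FROM:
        return Decision.MANUAL_REVIEW
    return Decision.AUTO_REJECT


def build_log_entry(record_id, participant_id, score, features, reviewer=None):
    """Serialize one matching decision as a JSON line for the audit log."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "record_id": record_id,
        "participant_id": participant_id,
        "score": score,
        "features": features,  # whatever inputs produced the score
        "decision": decide(score).value,
        "reviewer": reviewer,  # None until a human has looked at it
    }
    return json.dumps(entry, ensure_ascii=False, sort_keys=True)


# Hypothetical feature names, for illustration only.
line = build_log_entry(
    record_id="EMP-2026-07-000123",
    participant_id="P1",
    score=0.87,
    features={"name_similarity": 0.92, "birth_date_exact": False},
)
```

Keeping the features next to the score is what makes the entry useful later: a reviewer or auditor can see not only what was decided but what the decision was based on.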

What This Means for the Data Layer

A few concrete implications for anyone building or maintaining a BES data platform:

The Legal Dimension

Identity errors in pension systems are not tolerated the way they are in, say, marketing databases. A misallocated contribution affects fon değeri, which affects the participant's future emekli maaşı, which affects payout calculations at retirement or on death. Regulators expect that the pension company can prove — with evidence — why every kuruş sits in the account it sits in.

This is why the matching problem cannot be treated as a lookup. A lookup that returns the wrong row produces silent, compounding financial error over a 20-year holding period. A confidence-scored match produces auditable decisions that can be defended, corrected, and improved.

Most pension data teams in Turkey are still running the lookup. The mature ones have already moved to the confidence model — usually after their first serious reconciliation failure taught them the difference.