Onur Altıntaşlı — Data Management Leader

Every pipeline I have seen in a Turkish pension operation starts with the same wrong assumption: that a participant is uniquely identified by their TC kimlik numarası, and that a LEFT JOIN on that column resolves the question of "who is this person." It does not. In BES, participant identity is the single hardest matching problem in Turkish financial data — harder than customer deduplication in retail banking, harder than policyholder resolution in general insurance, and materially harder than the mükellef matching done by the tax authority.

The reason is not technical. It is historical. The BES system was built on top of employer payroll files, contracts transferred from şirket to şirket, legacy sigorta portfolios converted in 2013, and a regulatory environment where EGM (Emeklilik Gözetim Merkezi) reconciles daily against records that were never designed to be reconciled.

Why TC Kimlik Is Not the Clean Key You Think It Is

On paper, TC kimlik numarası is an 11-digit unique national identifier with a checksum. In practice, in a pension book of business built up over 20+ years, you will find:

Foreign participants who joined before yabancı kimlik numarası (starting with 99) was standardized, entered with passport numbers padded to 11 digits, or with placeholder values like 11111111111.
Minors enrolled in OKS (Otomatik Katılım Sistemi) whose TC was entered incorrectly by the employer's payroll system. The checksum does not catch every transposition (swapping two digits that sit in positions of the same parity leaves both check digits unchanged), so a mistyped number can still pass validation and belong to a different real person.
Deceased participants whose TC was copied onto a family member's file during a beneficiary claim, creating two participants with the same key.
Corporate transfers where the incoming şirket's file had TC values that failed Nüfus validation and were "cleaned" by an ETL job that dropped the row or, worse, replaced the value with a default.
Name-only participants from pre-2001 individual pension contracts migrated into BES, where TC was retroactively populated from a name match against Nüfus records — a match that was already probabilistic when it was done.
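The transposition case is easy to verify against the check-digit rule itself. Below is a minimal Python sketch, assuming the commonly published TC kimlik algorithm: the 10th digit is (7 × the sum of digits 1, 3, 5, 7, 9, minus the sum of digits 2, 4, 6, 8) mod 10, and the 11th digit is the sum of the first ten digits mod 10. The helper names are hypothetical, and `12345678950` is a synthetic value that merely satisfies the rule.

```python
def is_valid_tc_kimlik(value: str) -> bool:
    """Structural check: 11 ASCII digits, non-zero first digit, both check digits valid.

    Passing says nothing about WHO the number belongs to.
    """
    if len(value) != 11 or not all(c in "0123456789" for c in value) or value[0] == "0":
        return False
    d = [int(c) for c in value]
    odd_sum = d[0] + d[2] + d[4] + d[6] + d[8]   # digits 1, 3, 5, 7, 9
    even_sum = d[1] + d[3] + d[5] + d[7]         # digits 2, 4, 6, 8
    if (odd_sum * 7 - even_sum) % 10 != d[9]:    # Python's % never returns a negative here
        return False
    return sum(d[:10]) % 10 == d[10]


def swap(value: str, i: int, j: int) -> str:
    """Return `value` with the characters at 0-based positions i and j exchanged."""
    chars = list(value)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


if __name__ == "__main__":
    synthetic = "12345678950"  # synthetic value, not a real person
    print(is_valid_tc_kimlik(synthetic))              # True
    print(is_valid_tc_kimlik("11111111111"))          # False: placeholder fails the 11th digit
    print(is_valid_tc_kimlik(swap(synthetic, 0, 2)))  # True: digits 1 and 3 swapped, undetected
    print(is_valid_tc_kimlik(swap(synthetic, 0, 1)))  # False: this adjacent swap is caught
```

Swapping two digits in same-parity positions changes neither the odd-position sum nor the total, so no structural check can distinguish the typo from a genuine number; only cross-checking name and doğum tarihi against the master record can.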

If your identity model assumes TC is the primary key, every one of these cases becomes a silent data quality bomb that surfaces during a MASAK inquiry or a beneficiary payout.

The Employer Submission Problem

OKS made this worse, not better. Employers submit monthly contribution files with participant identifiers, and the mismatch rate against the master participant record is not zero. Common patterns:

Employer HR system stores the participant's maiden name; the pension company has the married name after a Nüfus update.
Employer submits an old TC that was reassigned after a citizenship correction (rare, but it happens with naturalized participants).
Employer's file has the participant's IBAN or SGK number in the TC column due to a mapping error in their payroll export.
Two employees at the same firm swapped in the file — same employer, same contribution amount, wrong participants.
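Before any matching, it helps to triage what actually arrived in the TC column. This is a minimal Python sketch under stated assumptions: a Turkish IBAN is recognized by shape only (`TR` plus 24 digits, no mod-97 check); 99-prefixed foreign numbers are assumed to follow the same check-digit rule as TC numbers; SGK numbers are not specifically detected, because the sketch does not assume their format. The function and category names are hypothetical.

```python
import re

# Shape of a Turkish IBAN: "TR" followed by 24 digits (26 characters). No mod-97 check here.
IBAN_TR = re.compile(r"TR[0-9]{24}")


def tc_checksum_ok(value: str) -> bool:
    """TC kimlik check-digit rule; ASSUMED to apply to 99-prefixed foreign numbers too."""
    d = [int(c) for c in value]
    if d[0] == 0:
        return False
    if (7 * (d[0] + d[2] + d[4] + d[6] + d[8]) - (d[1] + d[3] + d[5] + d[7])) % 10 != d[9]:
        return False
    return sum(d[:10]) % 10 == d[10]


def triage_identifier(raw: str) -> str:
    """Classify whatever an employer file put in the TC column."""
    value = "".join(raw.split()).upper()  # drop all whitespace, e.g. IBANs printed in groups
    if IBAN_TR.fullmatch(value):
        return "looks_like_iban"
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return "unrecognized"          # e.g. passport numbers, truncated values
    if len(set(value)) == 1:
        return "placeholder"           # 11111111111, 00000000000
    if not tc_checksum_ok(value):
        return "checksum_failed"       # includes zero-padded values with a leading 0
    return "foreign_99" if value.startswith("99") else "tc"


if __name__ == "__main__":
    for raw in ["12345678950", "99123456740", "11111111111",
                "12345678951", "TR12 3456 7890 1234 5678 9012 34", "U1234567"]:
        print(f"{raw!r:40} -> {triage_identifier(raw)}")
```

Each category is meant to become a feature or routing hint for the matcher rather than a reason to drop the row. The swapped-employee case cannot be caught here at all, since both identifiers are individually valid.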

Each of these requires a matching decision, not a lookup. And the decision has to be defensible: if you match employer contribution X to participant Y with 87% confidence and the fon allocation is wrong, that is a legal exposure under the pension regulations, not a data engineering inconvenience.

Transfer-In Records: The Worst Category

When a participant transfers from one pension company to another (aktarım), the receiving company gets a file with:

The participant's identity as recorded by the sending company
Contribution history
Fon allocation
Beneficiary data

None of this is guaranteed to match what the receiving company already knows about that person — because that person may already be a participant at the receiving company under a slightly different identity record. I have personally seen cases where a participant transferred in from another firm and was created as a new participant because the incoming name had "Ş" where the existing record had "S", or the doğum tarihi was off by one day due to a legacy conversion from Hicri calendar entries in older records.
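Both defects in that example are cheap to neutralize at the feature level. A minimal Python sketch, assuming that ASCII folding of Turkish letters is acceptable for comparison (never for storage) and assuming a one-day tolerance on doğum tarihi; `fold_name` and `dob_compatible` are hypothetical names.

```python
import unicodedata
from datetime import date

# Turkish-specific letters folded to ASCII. Used for comparison only, never for storage.
TURKISH_FOLD = str.maketrans("ÇçĞğİıÖöŞşÜü", "CcGgIiOoSsUu")


def fold_name(name: str) -> str:
    """Fold Turkish letters, strip other diacritics, uppercase and collapse whitespace."""
    folded = name.translate(TURKISH_FOLD)  # fold Turkish letters first, then the rest
    folded = unicodedata.normalize("NFKD", folded)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return " ".join(folded.upper().split())


def dob_compatible(a: date, b: date, tolerance_days: int = 1) -> bool:
    """True if two birth dates are within `tolerance_days` of each other."""
    return abs((a - b).days) <= tolerance_days


if __name__ == "__main__":
    print(fold_name("Şahin") == fold_name("SAHİN"))                # True: both fold to "SAHIN"
    print(dob_compatible(date(1975, 3, 14), date(1975, 3, 15)))  # True: one-day shift tolerated
```

A tolerance like this should lower the date feature's weight rather than count as an exact match, so that a one-day difference still shows up in the logged features.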

The correct behavior is: run a fuzzy match against existing participants, produce a confidence score, and route anything below a threshold to a human review queue. What most systems actually do is create a duplicate participant, which then has to be manually merged three years later when the participant calls to ask why their birikim looks like half of what it should be.

Identity Resolution Is a Confidence Score

The architectural mistake is treating participant identity as a boolean — either this record IS the participant or it is not. In reality, every incoming record (employer file, transfer file, agent submission, corporate enrollment) should produce a match score against the master participant registry, composed of:

TC kimlik exact match (weighted heavily, but not absolute)
Ad + soyad normalized match (Turkish character folding, common variant handling: Mehmet/Mehmed, Ayşe/Ayse)
Doğum tarihi match with tolerance for known conversion errors
Anne adı, baba adı where available (still the strongest disambiguators for common names)
Historical employer overlap
Address and iletişim history
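One way to combine these features is a weighted average of per-feature similarities. This is a minimal Python sketch, not a calibrated model: the weights, the `NAME_VARIANTS` table and every function name are hypothetical, and turning the score into a true probability would need fitting against reviewed decisions (for example with a Fellegi-Sunter style model or logistic regression).

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical weights; real ones would be fitted on reviewed match decisions.
WEIGHTS = {
    "tc_exact": 0.35,
    "name": 0.20,
    "dob": 0.15,
    "parents": 0.15,           # anne adı / baba adı
    "employer_overlap": 0.10,
    "contact_history": 0.05,   # address and iletişim history
}

# Hypothetical variant table mapping folded spellings to one canonical form.
NAME_VARIANTS = {"MEHMED": "MEHMET"}
_FOLD = str.maketrans("ÇçĞğİıÖöŞşÜü", "CcGgIiOoSsUu")


def canonical_name(name: str) -> str:
    tokens = name.translate(_FOLD).upper().split()
    return " ".join(NAME_VARIANTS.get(t, t) for t in tokens)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] of the canonical forms (difflib ratio)."""
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


def match_score(features: Dict[str, Optional[float]]) -> float:
    """Weighted average of available per-feature similarities, each in [0, 1].

    A feature set to None (e.g. anne adı not on file) is skipped rather than
    scored as a mismatch, and the remaining weights are renormalised.
    """
    weighted = total = 0.0
    for name, weight in WEIGHTS.items():
        value = features.get(name)
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        weighted += weight * value
        total += weight
    return weighted / total if total else 0.0


if __name__ == "__main__":
    score = match_score({
        "tc_exact": 1.0,
        "name": name_similarity("Mehmed Yılmaz", "MEHMET YILMAZ"),  # 1.0 after folding + variants
        "dob": 1.0,
        "parents": None,          # not supplied in the incoming file
        "employer_overlap": 0.0,
        "contact_history": None,
    })
    print(round(score, 3))  # 0.875
```

Because TC is just the heaviest weight, a TC mismatch lowers the score without forcing it to zero, which matches "weighted heavily, but not absolute".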

The output is a probability, and the pipeline should have explicit thresholds: auto-match above 0.95, manual review 0.75–0.95, auto-reject below 0.75. Every decision gets logged with the score, the features, and the reviewer if applicable. That log is what you produce when SPK or EGM asks how you concluded that contribution X belongs to participant Y.
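The thresholds and the decision log translate directly into code. A minimal Python sketch, assuming the band edges read as: strictly above 0.95 auto-matches, 0.75 up to and including 0.95 goes to review, anything lower is rejected. The `MatchDecision` fields and outcome labels are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field, replace
from datetime import datetime, timezone
from typing import Any, Dict, Optional

AUTO_MATCH_ABOVE = 0.95   # strictly above: auto-match
REVIEW_FROM = 0.75        # 0.75 <= score <= 0.95: manual review; below: auto-reject


def route(score: float) -> str:
    if score > AUTO_MATCH_ABOVE:
        return "auto_match"
    if score >= REVIEW_FROM:
        return "manual_review"
    return "auto_reject"


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class MatchDecision:
    incoming_record_id: str          # e.g. a row in an employer or aktarım file
    candidate_participant_id: str    # surrogate key, not TC
    score: float
    features: Dict[str, Any]
    outcome: str
    reviewer: Optional[str] = None
    decided_at: str = field(default_factory=_now)


def decide(incoming_id: str, candidate_id: str, score: float,
           features: Dict[str, Any]) -> MatchDecision:
    return MatchDecision(incoming_id, candidate_id, score, features, route(score))


def resolve_review(decision: MatchDecision, reviewer: str, accepted: bool) -> MatchDecision:
    """Record a human decision as a new entry; the original entry is kept, not edited."""
    if decision.outcome != "manual_review":
        raise ValueError("only manual_review decisions can be resolved by a reviewer")
    return replace(decision, outcome="review_accepted" if accepted else "review_rejected",
                   reviewer=reviewer, decided_at=_now())


def to_log_line(decision: MatchDecision) -> str:
    """One JSON line per decision, suitable for an append-only audit log."""
    return json.dumps(asdict(decision), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    d = decide("aktarim-0001", "P-000042", 0.88, {"tc_exact": 1.0, "name": 0.71, "dob": 1.0})
    print(d.outcome)  # manual_review
    print(to_log_line(resolve_review(d, "reviewer-07", accepted=True)))
```

Writing both the routed decision and the reviewer's resolution as separate log lines keeps the original score and features visible next to the human override.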

What This Means for the Data Layer

A few concrete implications for anyone building or maintaining a BES data platform:

The participant table should have both a business identifier (TC) and a synthetic surrogate key. Never join across systems on TC alone.
Every incoming file needs a staging layer where matching happens before insertion, not after.
Match scores need to be persisted as a first-class field on every relationship — contribution-to-participant, transfer-to-participant, beneficiary-to-participant. If it is not stored, it did not happen from an audit perspective.
Duplicate detection cannot be a monthly batch job. It has to run on every ingest, because a merged duplicate three years later means recalculating fon getirisi across the entire holding period.
Manual review queues are not a failure mode. They are a required control. If your pipeline has zero manual matches, your thresholds are wrong.
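The first three points can be sketched as a schema. This uses SQLite through Python's standard `sqlite3` module only so it runs anywhere; table and column names are hypothetical. The idea it illustrates: TC is indexed but not unique, incoming rows land in a staging table first, and a contribution row cannot exist without its match score and outcome. It does not show the matching itself or the review queue.

```python
import sqlite3

SCHEMA = """
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,   -- synthetic surrogate key, used for every join
    tc_kimlik      TEXT,                  -- business identifier: indexed, deliberately NOT unique
    full_name      TEXT NOT NULL,
    birth_date     TEXT
);
CREATE INDEX ix_participant_tc ON participant (tc_kimlik);

-- Incoming rows land here untouched; matching happens before promotion.
CREATE TABLE staging_contribution (
    staging_id   INTEGER PRIMARY KEY,
    source_file  TEXT NOT NULL,
    raw_tc       TEXT,
    raw_name     TEXT,
    amount_kurus INTEGER NOT NULL
);

-- A contribution cannot exist without the evidence that linked it to a participant.
CREATE TABLE contribution (
    contribution_id INTEGER PRIMARY KEY,
    participant_id  INTEGER NOT NULL REFERENCES participant (participant_id),
    staging_id      INTEGER NOT NULL REFERENCES staging_contribution (staging_id),
    amount_kurus    INTEGER NOT NULL,
    match_score     REAL NOT NULL CHECK (match_score BETWEEN 0 AND 1),
    match_outcome   TEXT NOT NULL CHECK (match_outcome IN ('auto_match', 'review_accepted')),
    reviewer        TEXT,
    CHECK (match_outcome = 'auto_match' OR reviewer IS NOT NULL)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def promote(conn: sqlite3.Connection, staging_id: int, participant_id: int,
            score: float, outcome: str, reviewer: str = None) -> int:
    """Copy a matched staging row into contribution together with its match evidence."""
    row = conn.execute(
        "SELECT amount_kurus FROM staging_contribution WHERE staging_id = ?", (staging_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no staging row {staging_id}")
    cur = conn.execute(
        "INSERT INTO contribution (participant_id, staging_id, amount_kurus, match_score,"
        " match_outcome, reviewer) VALUES (?, ?, ?, ?, ?, ?)",
        (participant_id, staging_id, row[0], score, outcome, reviewer),
    )
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO participant (tc_kimlik, full_name, birth_date) VALUES (?, ?, ?)",
                 ("12345678950", "AYSE YILMAZ", "1980-05-01"))
    sid = conn.execute(
        "INSERT INTO staging_contribution (source_file, raw_tc, raw_name, amount_kurus)"
        " VALUES (?, ?, ?, ?)", ("employer_2024_06.csv", "12345678950", "Ayşe Yılmaz", 150000)
    ).lastrowid
    promote(conn, sid, participant_id=1, score=0.97, outcome="auto_match")
    print(conn.execute("SELECT participant_id, match_score, match_outcome FROM contribution").fetchall())
```

Leaving `tc_kimlik` non-unique is deliberate: a second row with the same TC is a duplicate candidate for the matcher to surface, not a constraint error that silently rejects the file.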

The Legal Dimension

Identity errors in pension systems are not tolerated the way they are in, say, marketing databases. A misallocated contribution affects fon değeri, which affects the participant's future emekli maaşı, which affects payout calculations at retirement or on death. Regulators expect that the pension company can prove — with evidence — why every kuruş sits in the account it sits in.

This is why the matching problem cannot be treated as a lookup. A lookup that returns the wrong row produces silent, compounding financial error over a 20-year holding period. A confidence-scored match produces auditable decisions that can be defended, corrected, and improved.

Most pension data teams in Turkey are still running the lookup. The mature ones have already moved to the confidence model — usually after their first serious reconciliation failure taught them the difference.