2026-06-06

Data Quality Metrics That Actually Matter in Pension Data vs. the Ones Consultants Sell You

Every consultant who walks into a pension operation with a data quality deck shows the same six dimensions: completeness, accuracy, consistency, timeliness, uniqueness, validity. They built those for transactional systems — order entry, CRM, billing. Then they exported them, unchanged, into pension and individual retirement contexts where the underlying data model is fundamentally different. The result is dashboards that glow green while real money sits in the wrong bucket.

I've spent years running HAYMER, GEV, and state contribution reconciliation pipelines in parallel. The metrics that catch real problems look nothing like what's on those slides.

Why Generic DQ Frameworks Miss the Point

Pension data is not transactional. It is temporal, bitemporal, and event-sourced, whether your vendor admits it or not. Every participant record carries:

A "completeness score of 99.7%" tells you nothing useful here. A record can have every field populated and still be catastrophically wrong because the effective date of a state transition was recorded one business day late, which then misaligns three months of government contribution matching.

Accuracy as a percentage is even worse. One wrong status flag on a high-balance participant can cost more than ten thousand correct rows are worth.

The Metrics That Actually Matter

After enough late-night reconciliations, the operationally useful metrics fall into five categories. None of them ship in a vendor tool.

1. Effective Date Drift

For every state transition (entry, exit, suspension, fund switch, retirement), measure the distribution of:

system_date - effective_date

The mean is meaningless. What you want is the p95 and p99 tail, broken down by transition type and source channel. A fund switch backdated 45 days is an operational problem. A state-contribution-eligible suspension backdated 45 days is a regulatory problem because it changes what the state owes the participant retroactively.

The metric to operationalize: percentage of transitions where drift exceeds the matching cycle boundary. That single number predicts how much manual reconciliation your back office will eat next month.
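A minimal sketch of how that number could be computed, in Python. The record layout, the sample rows, and the function names are all hypothetical. The big assumption is that the matching cycle is a calendar month, so a transition "exceeds the boundary" when its effective date and system date land in different months. Swap in your own cycle definition.

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical rows: (transition_type, source_channel, effective_date, system_date)
TRANSITIONS = [
    ("fund_switch", "web", date(2026, 3, 2), date(2026, 3, 2)),
    ("fund_switch", "web", date(2026, 3, 10), date(2026, 3, 12)),
    ("suspension", "branch", date(2026, 3, 31), date(2026, 4, 1)),
    ("suspension", "branch", date(2026, 2, 15), date(2026, 4, 1)),
    ("suspension", "branch", date(2026, 4, 3), date(2026, 4, 4)),
]


def crosses_cycle(effective, system):
    """Assumption: the matching cycle is a calendar month."""
    return (effective.year, effective.month) != (system.year, system.month)


def nearest_rank(values, pct):
    """Nearest-rank percentile; returns an observed value, never an interpolation."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def drift_report(rows):
    """Tail drift and cycle-crossing rate per (transition_type, source_channel)."""
    groups = defaultdict(list)
    for ttype, channel, effective, system in rows:
        groups[(ttype, channel)].append((effective, system))

    report = {}
    for key, pairs in groups.items():
        drifts = [(system - effective).days for effective, system in pairs]
        crossed = sum(1 for effective, system in pairs if crosses_cycle(effective, system))
        report[key] = {
            "n": len(pairs),
            "p95_days": nearest_rank(drifts, 95),
            "p99_days": nearest_rank(drifts, 99),
            "pct_crossing_cycle": 100.0 * crossed / len(pairs),
        }
    return report


REPORT = drift_report(TRANSITIONS)
for key, stats in sorted(REPORT.items()):
    print(key, stats)
```

Note that the one-day drift from March 31 to April 1 counts as a crossing while a two-day drift inside March does not. That is the point: the boundary matters more than the size of the drift.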

2. Bucket Alignment Integrity

Contributions live in buckets — by period, by fund, by contribution type (employee, employer, state match), and by regulatory regime. The integrity check that matters is not "does the total reconcile." Totals always reconcile eventually because someone forces them to.

The real check: for each (participant, period, contribution_type) triple, does the bucket assignment in HAYMER match the bucket assignment in the state contribution feed and in GEV's fund-level allocations?

A one-business-day misalignment between when a contribution was deducted from payroll and when it was allocated to the fund unit price produces a unit count error that compounds forever. You cannot fix it later without overriding history.

The operational metric: count of triples where any two of the three systems disagree on the bucket, weighted by balance impact. Not row count. Balance-weighted disagreement count.
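Here is a rough sketch of that check in Python. Everything about the shape of the data is an assumption: each system is reduced to a dictionary from the (participant, period, contribution_type) triple to a bucket label, the bucket labels are made-up strings, and the balance impact per triple is supplied separately. A triple missing from one system is treated as a disagreement.

```python
from decimal import Decimal

# Hypothetical extracts: (participant_id, period, contribution_type) -> bucket label
K1 = ("P001", "2026-03", "employee")
K2 = ("P001", "2026-03", "state_match")
K3 = ("P002", "2026-03", "employer")

SYSTEMS = {
    "haymer": {K1: "2026-03/FUND_A", K2: "2026-03/FUND_A", K3: "2026-03/FUND_B"},
    "state_feed": {K1: "2026-03/FUND_A", K2: "2026-04/FUND_A", K3: "2026-03/FUND_B"},
    "gev": {K1: "2026-03/FUND_A", K2: "2026-03/FUND_A"},  # K3 never arrived
}

# Hypothetical balance impact per triple, in account currency
BALANCE_IMPACT = {K1: Decimal("300.00"), K2: Decimal("1250.00"), K3: Decimal("80.50")}


def bucket_breaks(systems, impact):
    """Return triples where at least two systems disagree on the bucket."""
    all_keys = set()
    for extract in systems.values():
        all_keys.update(extract)

    breaks = []
    for key in sorted(all_keys):
        # A missing triple shows up as None and counts as a disagreement.
        seen = {name: extract.get(key) for name, extract in systems.items()}
        if len(set(seen.values())) > 1:
            breaks.append((key, seen, impact.get(key, Decimal("0"))))
    return breaks


def balance_weighted_disagreement(breaks):
    """Sum of balance impact over disagreeing triples, not a row count."""
    return sum((amount for _, _, amount in breaks), Decimal("0"))


BREAKS = bucket_breaks(SYSTEMS, BALANCE_IMPACT)
WEIGHTED = balance_weighted_disagreement(BREAKS)
print(len(BREAKS), "disagreeing triples, balance-weighted:", WEIGHTED)
```

Amounts are kept as Decimal rather than float so the weighted figure does not pick up rounding noise of its own.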

3. State Transition Legality

Pension state machines have illegal transitions. A participant cannot go from "surrendered" back to "active" without a specific reinstatement event. They cannot be in two active contracts under the same regime simultaneously beyond a defined grace window. They cannot receive a state match on a contribution that arrived after the eligibility cutoff.

A validity check that says "status is one of the allowed enum values" passes all of these broken cases. You need a transition graph check: for every status change in the last N days, does the (from_state, to_state, trigger_event, effective_date) tuple correspond to a legal edge?

Illegal transitions are usually one or two per million records. They are also almost always the ones that produce complaints, regulatory questions, or write-offs.
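The transition graph check can be sketched in a few lines of Python. The state names, trigger events, and the edge set below are illustrative assumptions, not any regulator's or vendor's actual state machine. The effective-date rule (a transition may not be effective before the participant's previous one) is likewise just one plausible way to bring effective_date into the check.

```python
from datetime import date

# Assumed legal edges: (from_state, to_state, trigger_event)
LEGAL_EDGES = {
    ("applied", "active", "enrollment"),
    ("active", "suspended", "suspension_request"),
    ("suspended", "active", "resume_request"),
    ("active", "surrendered", "surrender"),
    ("surrendered", "active", "reinstatement"),
    ("active", "retired", "retirement"),
}

# Hypothetical status changes in recorded order:
# (participant_id, from_state, to_state, trigger_event, effective_date)
HISTORY = [
    ("P001", "applied", "active", "enrollment", date(2025, 1, 6)),
    ("P001", "active", "surrendered", "surrender", date(2026, 2, 10)),
    ("P001", "surrendered", "active", "manual_status_fix", date(2026, 3, 1)),
    ("P002", "active", "suspended", "suspension_request", date(2026, 3, 5)),
    ("P002", "suspended", "active", "resume_request", date(2026, 2, 20)),
]


def illegal_transitions(history):
    """Flag status changes with no legal edge or an out-of-order effective date."""
    last_effective = {}
    violations = []
    for pid, from_state, to_state, trigger, effective in history:
        if (from_state, to_state, trigger) not in LEGAL_EDGES:
            violations.append((pid, from_state, to_state, trigger, "no legal edge"))
        elif pid in last_effective and effective < last_effective[pid]:
            violations.append((pid, from_state, to_state, trigger, "effective date out of order"))
        # Track the latest effective date seen so far for this participant.
        last_effective[pid] = max(effective, last_effective.get(pid, effective))
    return violations


VIOLATIONS = illegal_transitions(HISTORY)
for violation in VIOLATIONS:
    print(violation)
```

Every row in the sample carries a valid enum status, so an enum-only validity check would pass all five. The graph check flags two of them.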

4. Cross-System Identity Stability

The same participant exists under different identifiers across payroll, the pension admin platform, the custodian, and the regulator's database. Vendors will sell you a uniqueness metric on TCKN or national ID. Useless. The actual failure mode is identity drift over time: the same TCKN mapped to different internal participant IDs across system snapshots, or two TCKNs collapsed into one ID after a merge that should not have happened.

Metric: for each (external_id, internal_id) pair, how many distinct mappings have existed historically, and how many participants currently have more than one active internal ID across systems? This catches merger artifacts, manual data entry collisions, and the slow-motion identity corruption that happens when migration scripts run without proper deduplication.
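As a sketch, assuming you can pull periodic snapshots of the external-to-internal mapping, the metric comes down to a few set operations in Python. The snapshot layout and the placeholder IDs are hypothetical, and "currently" is taken to mean the latest snapshot date in the data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping snapshots: (snapshot_date, external_id, internal_id)
SNAPSHOTS = [
    (date(2026, 1, 31), "EXT-1", "P001"),
    (date(2026, 1, 31), "EXT-2", "P002"),
    (date(2026, 1, 31), "EXT-3", "P003"),
    (date(2026, 2, 28), "EXT-1", "P001"),
    (date(2026, 2, 28), "EXT-1", "P010"),  # second active internal ID
    (date(2026, 2, 28), "EXT-2", "P009"),  # remapped between snapshots
    (date(2026, 2, 28), "EXT-3", "P003"),
    (date(2026, 2, 28), "EXT-4", "P003"),  # two external IDs collapsed into one
]


def identity_report(rows):
    """Summarize mapping history and current multi-mapping per external ID."""
    ext_to_int = defaultdict(set)  # every internal ID ever seen per external ID
    int_to_ext = defaultdict(set)  # every external ID ever seen per internal ID
    current = defaultdict(set)     # mappings in the latest snapshot only
    latest = max(snapshot for snapshot, _, _ in rows)

    for snapshot, external_id, internal_id in rows:
        ext_to_int[external_id].add(internal_id)
        int_to_ext[internal_id].add(external_id)
        if snapshot == latest:
            current[external_id].add(internal_id)

    return {
        "mappings_per_external_id": {e: len(ids) for e, ids in ext_to_int.items()},
        "collapsed_internal_ids": sorted(i for i, exts in int_to_ext.items() if len(exts) > 1),
        "currently_multi_mapped": sorted(e for e, ids in current.items() if len(ids) > 1),
    }


IDENTITY = identity_report(SNAPSHOTS)
print(IDENTITY)
```

The three outputs map to the three failure modes: drift over time, bad merges, and duplicates that are live right now.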

5. Reconciliation Latency, Not Just Reconciliation Status

Everyone reports whether reconciliation passed. Nobody reports how long it took for a discrepancy to close. The useful metric is the age distribution of open breaks, segmented by break type and monetary impact.

A break that has been open for 90 days is not a data quality issue anymore. It is an operational debt that someone decided not to pay. Surfacing the age distribution makes that decision visible to people who can actually do something about it.
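A minimal aging report might look like the Python below. The break types, the age bands, and the sample amounts are assumptions for illustration. The only thing taken from the text above is that 90 days gets its own band.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Assumed age bands: (min_days, max_days or None for open-ended, label)
AGE_BANDS = [(0, 7, "0-7d"), (8, 30, "8-30d"), (31, 89, "31-89d"), (90, None, "90d+")]

# Hypothetical open breaks: (break_type, opened_on, monetary_impact)
OPEN_BREAKS = [
    ("bucket_mismatch", date(2026, 5, 28), Decimal("120.00")),
    ("bucket_mismatch", date(2026, 2, 20), Decimal("4300.00")),
    ("unit_count", date(2026, 5, 10), Decimal("15.75")),
    ("bucket_mismatch", date(2026, 1, 5), Decimal("900.00")),
]


def age_band(days_open):
    """Map an age in days to its band label."""
    for low, high, label in AGE_BANDS:
        if days_open >= low and (high is None or days_open <= high):
            return label
    raise ValueError(f"negative age: {days_open}")


def aging_report(breaks, as_of):
    """Count and total impact of open breaks per (break_type, age band)."""
    report = defaultdict(lambda: {"count": 0, "amount": Decimal("0")})
    for break_type, opened_on, amount in breaks:
        cell = report[(break_type, age_band((as_of - opened_on).days))]
        cell["count"] += 1
        cell["amount"] += amount
    return dict(report)


AGING = aging_report(OPEN_BREAKS, as_of=date(2026, 6, 1))
for key, cell in sorted(AGING.items()):
    print(key, cell)
```

Reporting the amount next to the count keeps a single large old break from hiding behind a pile of small fresh ones.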

What to Stop Reporting

If your DQ dashboard prominently features any of these, replace them:

These metrics optimize for looking healthy, not being healthy. They also create perverse incentives: teams chase the completeness number by defaulting empty fields, which destroys the signal that something was actually missing.

How to Roll This Out Without Setting Off a War

The political problem with replacing vendor metrics is that someone signed off on them, usually at a senior level. Two things work:

  1. Run both in parallel for one quarter. Show the case where the vendor dashboard is green and the new metrics caught a real loss. There is always one. Usually several.
  2. Tie the new metrics to specific incidents. Don't pitch a framework. Pitch: "This metric would have caught the contribution bucket misalignment that cost us 11 days of reconciliation work in March."

Frameworks lose to anecdotes every time in operational meetings. Use that.

The Underlying Point

Data quality in pension systems is not about clean fields. It is about whether the temporal, financial, and regulatory state of every participant is internally consistent across every system that touches them, at every point in time someone might need to ask. None of the standard DQ dimensions measure that directly. Most of them actively distract from it.

Build the metrics that match how the data actually fails. Throw away the rest of the deck.