Published

The industry has spent two years hardening itself against AI bots, and I have written here before about deepfakes in video KYC. Talking this past week with a few friends who work at the big AI labs, I realized the same threat model has been quietly attacking a softer surface: evals. The humans who produce RLHF training signal and eval ground truth are increasingly not who they claim to be, and as models get better at mimicking exactly the judgment those humans are paid to supply, the problem is about to get significantly worse.
Where ground truth comes from
Every serious eval pipeline bottoms out in human judgment: people label the data that defines correct, express the preferences RLHF optimizes against, red-team for failures automated checks miss, and supply expertise where being wrong is expensive. When you report a GPQA number or a preference win rate, you are trusting that the ground truth underneath was produced by real, competent, independent people. That assumption is doing enormous load-bearing work, and it is weakening.
The pools leak in four predictable ways. Contributors run an AI model behind the curtain and submit its output as their own judgment, which is especially corrosive because the thing you are measuring is now grading itself. One operator runs many accounts, defeating the independence that makes aggregated labels meaningful. People assert expertise they do not have, so your "expert" ratings are not expert at all. And contributors collude on the gold-standard items meant to catch exactly this.
The irony is plain: the data used to teach models what good looks like, and to verify whether a model is safe, is being polluted by the same synthetic-identity attack the industry spends heavily to defeat everywhere else.
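To make the independence point concrete, here is a toy sketch of how one operator with several accounts flips a majority vote. The account names, labels, and the `operator_of` mapping are invented for illustration; in a real pipeline that mapping is exactly what you do not have.

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label among (account_id, label) votes."""
    counts = Counter(label for _, label in votes)
    return counts.most_common(1)[0][0]

def one_vote_per_operator(votes, operator_of):
    """Keep only the first vote from each underlying operator."""
    seen = set()
    kept = []
    for account, label in votes:
        operator = operator_of[account]
        if operator not in seen:
            seen.add(operator)
            kept.append((account, label))
    return kept

# Three independent annotators pick "A"; one operator runs four accounts that pick "B".
votes = [
    ("acct1", "A"), ("acct2", "A"), ("acct3", "A"),
    ("acct4", "B"), ("acct5", "B"), ("acct6", "B"), ("acct7", "B"),
]

# Hypothetical ground truth about who is behind each account.
operator_of = {
    "acct1": "ann", "acct2": "ben", "acct3": "cho",
    "acct4": "mallory", "acct5": "mallory", "acct6": "mallory", "acct7": "mallory",
}

naive = majority_label(votes)                                       # counts accounts
deduped = majority_label(one_vote_per_operator(votes, operator_of)) # counts people
print(naive, deduped)
```

Counting accounts, the farmed label wins four to three. Counting people, it loses three to one. The aggregation math is unchanged; only the uniqueness assumption moved.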
The trend lines all point the wrong way, and the trajectory matters more than the current severity. Per-task pay rewards throughput and penalizes honesty. The tooling to impersonate a person, or to run ten accounts as one, gets cheaper every quarter. But the real forcing function is model capability: as frontier models close the gap on the exact judgment annotators are paid to supply, detection-by-inspection becomes structurally unreliable. The thing you are trying to catch fraud with is the same thing the fraudster is using. That dynamic does not plateau; it compounds. Agentic evals make the endpoint clear: the bottleneck becomes a human verifying that a multi-step agent trajectory did the right thing, and when that sign-off is your safety guarantee, the provenance of the signature matters as much as the signature itself.
You do not solve a provenance problem with better observability. Dashboards tell you what was submitted, not whether a unique, real, competent human stood behind it. That requires proof generated at the point of submission, not inferred statistically after the fact.
Prove the human, do not trust the claim
This is the premise Self was built on. The protocol lets a person prove they are a unique, real human using the cryptographic chip in their passport, and prove specific attributes about themselves, without disclosing their identity to whoever requested the proof. Proof of personhood and selective proof of credentials, enforced with zero-knowledge cryptography rather than a vendor's promise.
Attach a zero-knowledge attestation to each unit of human-produced data, and it establishes several properties at once, cryptographically rather than by trust. It proves a real, unique human produced the submission, ruling out the bot, the script, and the model-in-a-costume. It binds that submission to a single identity, so one operator cannot farm many accounts and break the independence your aggregation assumes. Where the task needs expertise, it proves the contributor holds the credential, without exposing who they are. And it does all this without the contributor handing any personal data to the lab or vendor, because the proof travels while the identity stays put.
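As a rough sketch of what "attached to each unit" could look like in a pipeline, the example below gates writes on three checks: the proof verifies, the required credential is present, and the same human has not already labeled the item. Everything here is an assumption for illustration and not Self's actual API: the field names, the `AttestedStore` class, and especially `stub_verify`, which stands in for real zero-knowledge proof verification. It assumes the proof exposes a per-scope pseudonymous identifier (often called a nullifier in proof-of-personhood designs) that stays the same no matter how many accounts one person opens.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

@dataclass(frozen=True)
class Attestation:
    nullifier: str             # hypothetical: same person -> same value within this scope
    credential: Optional[str]  # hypothetical: a disclosed attribute such as "md", or None
    proof: bytes               # opaque proof blob; its real format is not modeled here

@dataclass(frozen=True)
class Submission:
    item_id: str
    label: str
    attestation: Attestation

class AttestedStore:
    """Accepts a submission only if all attestation checks pass at write time."""

    def __init__(self, verify_proof: Callable[[Attestation], bool],
                 required_credential: Optional[str] = None) -> None:
        self._verify = verify_proof
        self._required = required_credential
        self._seen: Set[Tuple[str, str]] = set()  # (item_id, nullifier) pairs
        self.accepted: List[Submission] = []

    def write(self, sub: Submission) -> str:
        att = sub.attestation
        if not self._verify(att):
            return "rejected: invalid proof"
        if self._required is not None and att.credential != self._required:
            return "rejected: missing credential"
        key = (sub.item_id, att.nullifier)
        if key in self._seen:
            return "rejected: duplicate human for item"
        self._seen.add(key)
        self.accepted.append(sub)
        return "accepted"

def stub_verify(att: Attestation) -> bool:
    # Placeholder only: a real verifier would check the zero-knowledge proof itself.
    return att.proof == b"valid"

store = AttestedStore(stub_verify, required_credential="md")
alice = Attestation("n-alice", "md", b"valid")
r1 = store.write(Submission("q1", "A", alice))
r2 = store.write(Submission("q1", "B", alice))  # same human, same item
r3 = store.write(Submission("q1", "A", Attestation("n-bob", None, b"valid")))
r4 = store.write(Submission("q1", "A", Attestation("n-eve", "md", b"forged")))
print(r1, r2, r3, r4, sep="\n")
```

Note what the store never sees: a name, a passport number, or an account history. It only sees a pseudonym, an attribute, and a proof.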
The downstream effect is what matters. The lab buying the dataset no longer takes the vendor's word that contributors were real and qualified, because that guarantee arrives attached and independently checkable. The vendor stops policing fraud reactively with gold-question traps, because the properties fraud violates are now enforced at write time. Sybil resistance, human-ness, and credential gating stop being best-effort and become invariants of the pipeline.
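On the buying side, "independently checkable" might look like the audit sketched below, which a lab could run over a delivered batch without asking the vendor anything. The record layout, the `audit_dataset` helper, and the byte-comparison verifier are all hypothetical stand-ins; a real check would verify the actual proofs.

```python
from collections import defaultdict

def audit_dataset(records, verify_proof):
    """Return a list of human-readable violations found in a delivered batch."""
    violations = []
    humans_per_item = defaultdict(set)  # item_id -> nullifiers already seen
    for i, rec in enumerate(records):
        if not verify_proof(rec["proof"]):
            violations.append(f"record {i}: proof failed verification")
            continue
        if rec["nullifier"] in humans_per_item[rec["item_id"]]:
            violations.append(f"record {i}: repeated human on item {rec['item_id']}")
        humans_per_item[rec["item_id"]].add(rec["nullifier"])
    return violations

# Invented batch: record 2 reuses a human on q1, record 3 carries a bad proof.
records = [
    {"item_id": "q1", "label": "A", "nullifier": "n1", "proof": b"valid"},
    {"item_id": "q1", "label": "A", "nullifier": "n2", "proof": b"valid"},
    {"item_id": "q1", "label": "B", "nullifier": "n1", "proof": b"valid"},
    {"item_id": "q2", "label": "B", "nullifier": "n3", "proof": b"forged"},
]

# Placeholder verifier; stands in for real zero-knowledge proof verification.
violations = audit_dataset(records, lambda proof: proof == b"valid")
for line in violations:
    print(line)
```

The design point is that the audit needs nothing but the batch itself, so the guarantee does not depend on the vendor's cooperation or honesty.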
A primitive, not another evals product
We are not building an evals company. Self sits underneath that layer. Verified-human attestation is a primitive you compose into any eval harness, any RLHF workflow, any red-team program. The eval platforms are integration surfaces, not competitors, and the data vendors are the parties who get to certify their output as verified-human and price it accordingly.
The general shape is proof of human work: cryptographic proof that a unique, real, optionally credentialed human produced this artifact, resistant to AI-generated forgery. The threat model is the same as in deepfake KYC: synthetic identity attacking a trust system. The surface is new: the data supply chain that trains and evaluates frontier models. And the regulatory path rhymes with age verification: "verified human" moves from nice-to-have, to procurement line item, to compliance requirement, and someone has to be the neutral, privacy-preserving standard that satisfies it. Better an open protocol than a black box owned by whoever moved first, because a trust primitive is worthless if you have to trust its operator.
The industry built its foundation on human judgment and stopped checking whether the humans were real. That was a reasonable bet when models were weak enough that a fraudulent label was still recognizably human. It stops being reasonable as models get better at producing the exact signal those humans are paid to supply, because at that point the unchecked assumption becomes a load-bearing crack running through both training and evaluation, and it widens with every capability jump. We built the primitive that closes it before that crack becomes structural. Prove the human, and the rest of the measurement can be trusted again.


