Dissei Data Research

Dissei Data Research https://data.dissei.credit/articles/ Field notes on building verifiable reinforcement-learning environments for institutional finance. en The theatrics of precision: the model reads what it won't repeat https://data.dissei.credit/articles/theatrics-of-precision.html https://data.dissei.credit/articles/theatrics-of-precision.html Fri, 03 Jul 2026 00:00:00 GMT 180 runs, three synthetic deals, the same forward margin rendered as words, an integer, and two decimals. The verdict never moved. What the model was willing to repeat moved by a factor of twenty-eight — and that gap is the tell. Your reward function is lying to you: probes for graders that fail silently https://data.dissei.credit/articles/your-reward-function-is-lying-to-you.html https://data.dissei.credit/articles/your-reward-function-is-lying-to-you.html Fri, 03 Jul 2026 00:00:00 GMT Before any graded task reaches a candidate model, we run three attacks against its grader, all hunting the same thing: reward without reasoning. A task that survives ships. A task that doesn't gets rewritten — and we keep the log, including the attacks that won. Does the model know the deal, or understand it? https://data.dissei.credit/articles/does-the-model-know-the-deal.html https://data.dissei.credit/articles/does-the-model-know-the-deal.html Fri, 03 Jul 2026 00:00:00 GMT Every task exists twice: once in the original deal room, once in a byte-level anonymized fork. The score gap between them separates a model that read the documents from one that recognized the company — but only after the probe passes its own positive control. The rubric locks first: grading criteria you write after reading the answer are not criteria https://data.dissei.credit/articles/the-rubric-locks-first.html https://data.dissei.credit/articles/the-rubric-locks-first.html Fri, 03 Jul 2026 00:00:00 GMT A rubric that can still change after seeing model output is not testing the model — it is describing it. Every task we ship can produce two dates on request: the day its rubric froze and the day a candidate model first saw it. The second never comes before the first. The noise floor of judgment: what 564 runs and a 69-task eval taught us about our own scoring math https://data.dissei.credit/articles/the-noise-floor-of-judgment.html https://data.dissei.credit/articles/the-noise-floor-of-judgment.html Fri, 03 Jul 2026 00:00:00 GMT A 69-task eval came back 68% zeros with a tail to 0.7 — two populations, not one. The model was not the culprit; the scoring math was manufacturing the pile. What we blamed first, what it actually was, and the one-line change we shipped in response, dated. The anchor travels: one deal's chatter prices the next deal https://data.dissei.credit/articles/the-anchor-travels.html https://data.dissei.credit/articles/the-anchor-travels.html Thu, 11 Jun 2026 00:00:00 GMT Plant one sentence of unverified market chatter in the first deal an AI analyst reads, and its price for an unrelated second deal in the same session moves half a turn of EBITDA. Telling the model to evaluate the second deal independently removes less than a third of the effect. Nothing in the output says it happened. Prompt Risk Is Investment Risk https://data.dissei.credit/articles/prompt-risk-is-investment-risk.html https://data.dissei.credit/articles/prompt-risk-is-investment-risk.html Mon, 08 Jun 2026 00:00:00 GMT We quadrupled a company's biggest risk and the AI's price didn't move. We added one line of market chatter and it moved $45 million. What that means for any committee putting AI near a capital decision. You can scaffold away the flip. You can't scaffold away the frame. https://data.dissei.credit/articles/scaffold-away-the-frame.html https://data.dissei.credit/articles/scaffold-away-the-frame.html Sun, 07 Jun 2026 00:00:00 GMT 564 isolated runs across one synthetic deal and its controlled variants. The recommendation never moved. The price, the leverage, and what the memo noticed moved with the packaging of the question. How a prompt's framing quietly answers its own question https://data.dissei.credit/articles/prompt-framing.html https://data.dissei.credit/articles/prompt-framing.html Sun, 31 May 2026 00:00:00 GMT Why the grammar of an eval stem can hand a model the shape of its answer, the five trigger families that do the leaking, and the one-word test that catches all of them.