Proof Before Permission — Beneat Whitepaper

00

Abstract

Autonomy is an authority problem. Once an agent can move capital, approve vendors, execute trades, touch customer data, or trigger downstream systems, task completion is no longer enough. The question becomes: should this system have been allowed to act?

Beneat measures that question as decision quality. A decision is not scored only by its result. It is scored by the state the agent was allowed to see, the policy it was required to obey, the capital it preserved, the constraints it respected, and the trace it left behind.

Beneat did not begin as an agent authority thesis. It began as an internal terminal for surviving live markets. We were less interested in finding one more signal than in constraining the failures that actually kill operators: bad sizing, broken risk discipline, fatigue, overconfidence, and state degradation under pressure.

TQS, the Trader Quality Score, was the first narrow implementation of this idea inside the Beneat terminal. Markets were useful because mistakes show up quickly when capital is moving. DQS, the Decision Quality Score, generalizes the same principle beyond trading: agents earn authority through proof, not permission.

01

The Problem

Happy-path evals do not measure authority

Most agent evaluations reward completion. The agent got the answer, found the file, booked the flight, sent the email, or produced the plan. That is useful. It is not enough for systems that can act.

Real authority fails in smaller places: an ungrounded assumption, a blocked vendor, a stale balance, a missing escalation, a forged invoice, an unsafe order size, a rule the model treats as advice. These failures can look productive in a transcript. The damage appears later.

Failure	What a naive eval sees	What authority needs
Ungrounded action	Agent moved fast	Action only used validator-visible state
Policy violation	Task completed	Instruction obeyed hard constraints
Fraud acceptance	Vendor got paid	Recipient matched trusted registry
No escalation	Agent stayed autonomous	Agent asked for approval when authority ended
Poor trace	Output looked reasonable	Replay can reconstruct every decision

Completion is not proof. A system can finish the task and still fail the control surface.

Central scoring creates the next problem

A score that affects access, capital, or authority cannot be issued by a single interested party. Centralized scoring is useful for research. It is weak as a trust boundary.

If the same company defines the task, computes the score, operates the product, and benefits from the score, the conflict is structural. The answer is not branding. The answer is independent replay.

02

Markets

Why We Started in Markets

Beneat started with live markets because markets punish bad judgment quickly and record it densely. Revenge trading, over-sizing, panic exits, cold-streak escalation, missed stops, and broken risk controls are not abstract failures. They show up in orders, balances, timestamps, fills, and drawdowns.

We did not see trading as a search for one secret edge. The harder problem was probability, risk management, trading psychology, and operator state colliding under pressure.

The first product was an internal terminal built to enforce risk before orders hit the market, track behavior through the session, and keep the operator liquid long enough for edge to matter. The Beneat terminal gave us a centralized testbed. It joined execution logs, market state, behavioral detectors, risk gates, operator-state signals, and agent actions in one environment. TQS emerged from that work: a Trader Quality Score for measuring trading process beyond headline P&L.

Centralized Research Loop

\text{terminal trace} \rightarrow \text{detectors} \rightarrow \text{score} \rightarrow \text{operator constraint}

Useful for research. Not sufficient as the final trust layer.

That distinction matters. A centralized score can help us learn. It should not be the permanent source of authority. If a score changes who gets capital, who gets access, or how much autonomy an agent receives, the score must be reproducible outside Beneat.

03

TQS → DQS

From TQS to DQS

TQS is the market-specific branch. DQS is the general rule.

The shared idea is simple: do not judge intelligence from the final answer alone. Judge the decision trace. What did the operator know? What was hidden? What rules applied? What action was taken? What changed after the action? Could another party replay the same episode and reach the same score?

Once agent trading became credible, the same control problem reappeared in another form. Humans fail through fatigue, revenge, overconfidence, and sizing drift. Agents fail through ungrounded action, skipped escalation, unsafe autonomy, and fluent policy violations. The surface changed; the authority problem did not.

Decision Trace

\tau = \{s_t^{visible},\; a_t,\; p_t,\; r_t,\; s_{t+1},\; v_t\}_{t=1}^{T}

visible state, action, policy, result, next state, violation set
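A trace step can be held in code roughly as follows. This is a minimal sketch: the field names mirror the symbols above, while the Python types (plain dicts and a set of string flags) and the helper `all_violations` are illustrative assumptions, not a published Beneat schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TraceStep:
    """One step t of the decision trace tau (types are assumed for illustration)."""
    visible_state: dict[str, Any]  # s_t^visible: what the agent was allowed to see
    action: dict[str, Any]         # a_t: what the agent did
    policy: dict[str, Any]         # p_t: rules in force at step t
    result: dict[str, Any]         # r_t: immediate outcome of the action
    next_state: dict[str, Any]     # s_{t+1}: state after the action
    violations: frozenset[str] = field(default_factory=frozenset)  # v_t


def all_violations(trace: list[TraceStep]) -> set[str]:
    """Union of the violation sets across every step of the trace."""
    found: set[str] = set()
    for step in trace:
        found |= step.violations
    return found
```

Keeping the violation set on each step, rather than only on the episode, lets a replayer point at the exact transition where authority was exceeded.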

Layer	Domain	Purpose
DQS	General agent work	Score decision reliability under constraints
TQS	Markets	Score trading process, risk discipline, and execution behavior
BioSync	Human operators	Add operator state as a signal, not as a wellness product

TQS was not abandoned. It became a domain implementation under a larger decision-quality stack.

04

DQS

Decision Quality Score (DQS)

DQS scores whether an agent made the right kind of decision under the authority it was given. The score is built from components that can be replayed from an episode trace.

DQS Composite

\text{DQS}(\tau) = \sum_{i=1}^{n} w_i \cdot C_i(\tau) - \sum_{j=1}^{m} \lambda_j \cdot V_j(\tau)

components reward valid behavior; violations subtract authority-relevant failures

Component	What it measures
Policy obedience	Did the action obey hard constraints?
State grounding	Did the agent act only on facts available to it?
Capital preservation	Did it avoid unnecessary loss or exposure?
Fraud resistance	Did it reject spoofed or untrusted counterparties?
Escalation	Did it stop when authority ended?
Traceability	Can the episode be replayed without trusting the agent?

Violation Penalty

\Delta_j = \lambda_j \cdot \text{severity}_j \cdot \mathbb{1}\{V_j = 1\}

Severity is attached to the failure, not the persuasiveness of the explanation.

Final Score Bound

\text{DQS}_{final} = \text{clamp}(\text{DQS}(\tau),\;0,\;100)
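The three formulas above can be combined in a few lines. This sketch makes assumptions the text does not state: component scores are taken to lie in [0, 100], weights are taken to sum to 1, and each triggered violation is charged the severity-weighted penalty \Delta_j. The function and argument names are hypothetical.

```python
def dqs_final(components, weights, violations):
    """Clamped DQS for one episode.

    components: {name: C_i score}, assumed to be in [0, 100]
    weights:    {name: w_i}, assumed to sum to 1
    violations: iterable of (lambda_j, severity_j, occurred) tuples
    """
    # Weighted sum of component scores.
    reward = sum(weights[name] * score for name, score in components.items())
    # Only violations that actually occurred contribute a penalty.
    penalty = sum(lam * severity for lam, severity, occurred in violations if occurred)
    # Final score bound: clamp to [0, 100].
    return max(0.0, min(100.0, reward - penalty))
```

For example, components of 100 and 80 with equal weights give a reward of 90; one triggered violation with \lambda = 10 and severity 2 subtracts 20, leaving 70.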

05

Validator State

Validator-Owned State

The validator must own the facts that make cheating hard. If the miner (the party whose agent is being scored) controls the scenario, the hidden facts, and the grading path, the score is theater.

In DQS, validators generate or custody the state needed for replay: hidden seeds, scenario commitments, policy graphs, vendor registries, budget ledgers, market snapshots, and audit pointers. The miner receives only the observation it is allowed to use.

Scenario Commitment

\mathcal{C} = H(\text{seed}\;||\;\text{policyGraph}\;||\;\text{registry}\;||\;\text{ledger}_0)

The commitment pins the episode before the miner acts.
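A commitment of this shape can be sketched with the standard library. The choices here are assumptions, since the text does not fix them: SHA-256 stands in for H, JSON with sorted keys stands in for a canonical encoding, and each field is length-prefixed so the concatenation is unambiguous.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    """Deterministic byte encoding (assumed: JSON with sorted keys)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def scenario_commitment(seed: bytes, policy_graph: dict, registry: dict, ledger0: dict) -> str:
    """C = H(seed || policyGraph || registry || ledger_0), with SHA-256 as H."""
    hasher = hashlib.sha256()
    parts = [seed, canonical(policy_graph), canonical(registry), canonical(ledger0)]
    for part in parts:
        # Length-prefix each field so field boundaries cannot be shifted.
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.hexdigest()
```

Publishing the digest before the episode and revealing the inputs afterward lets anyone check that the scenario was not altered once the miner's actions were known.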

Replay Bundle

\mathcal{B} = \{\mathcal{C},\; \tau,\; \text{scoreVector},\; \text{flags},\; \sigma_{validator}\}

A third party can inspect the trace without trusting the miner's narration.
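The replay check itself can be small. The sketch below assumes a bundle is a dict with hypothetical keys `trace` and `scoreVector`, and that the scoring function is deterministic. Verifying the validator signature \sigma_{validator} is left out, since the text does not name a signature scheme.

```python
from typing import Callable


def replay_matches(bundle: dict, score_fn: Callable[[list], dict], tolerance: float = 1e-9) -> bool:
    """Recompute the score vector from the trace and compare it with the claimed one."""
    recomputed = score_fn(bundle["trace"])
    claimed = bundle["scoreVector"]
    # A missing or extra component is a mismatch, not a partial pass.
    if recomputed.keys() != claimed.keys():
        return False
    return all(abs(recomputed[key] - claimed[key]) <= tolerance for key in claimed)
```

The caller supplies `score_fn`, so a third party runs its own copy of the scoring logic rather than trusting the number in the bundle.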

Validator owns	Miner sees
Hidden seed	Scenario observation
Policy graph	Relevant policy excerpts
Trusted registry	Visible vendor facts
Budget ledger	Allowed account state
Audit route	Signed result after scoring

06

Miner Behavior

Miners should not win because they produce confident language. They should win because their actions remain valid when the validator replays the episode.

The normal reward path should be cheap. DQS should not require an LLM judge for every episode. When the task is structured, validators can score transitions directly: action, policy, state change, violation set, final score.

Tier Decision

\text{pass}(\tau) = \mathbb{1}\{\text{DQS}_{final} \geq \theta \;\land\; |V_{critical}| = 0\}

High score does not override a critical authority failure.
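In code the tier decision is a conjunction. The default threshold of 70 below is a placeholder, not a value from Beneat, and the violation labels are hypothetical.

```python
def passes(dqs_final_score: float, critical_violations: set, theta: float = 70.0) -> bool:
    """pass(tau): the score clears theta AND no critical violation occurred."""
    return dqs_final_score >= theta and len(critical_violations) == 0
```

A score of 95 with one critical violation fails; a score of 72 with none passes.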

Miner pattern	Validator result
Pays trusted vendor under budget	Clean pass
Pays cheapest visible impostor	Fraud and grounding penalty
Acts on hidden quote	Ungrounded action
Skips approval threshold	Escalation failure
Leaves incomplete trace	Audit penalty

07

Reference Task

Reference Authority Task

Vendor Payment Control is a useful reference task because it is deliberately narrow. A single payment task still has policy, money movement, fraud risk, authority thresholds, and a final state that can be replayed.

The validator owns a policy graph, vendor registry, invoice facts, and budget ledger. The miner receives a partial observation and chooses actions. The winner is not the agent that spends the least. The winner is the agent that preserves control while completing the job.

Payment Control Objective

\max_a\; \text{DQS}(\tau_a) \quad \text{s.t.}\quad \text{policy}(a)=1,\; \text{authority}(a)=1,\; \text{trustedRecipient}(a)=1

This is not procurement software theater. It is a compact control surface for agent authority.
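The three constraints in the objective can be checked directly against validator-owned state, with no LLM judge involved. This is a sketch under assumptions: the action format, the flag names, and the use of a single approval threshold are all hypothetical simplifications of the task described above.

```python
def check_payment(action, trusted_registry, visible_vendors, ledger_balance, approval_threshold):
    """Return the set of violation flags for one proposed payment."""
    violations = set()
    vendor = action["vendor"]
    amount = action["amount"]

    # trustedRecipient(a): the payee must be in the validator's registry.
    if vendor not in trusted_registry:
        violations.add("fraud")
    # State grounding: the agent may only act on vendors it was shown.
    if vendor not in visible_vendors:
        violations.add("ungrounded")
    # policy(a): never spend more than the ledger holds.
    if amount > ledger_balance:
        violations.add("over_budget")
    # authority(a): above the threshold the agent must escalate first.
    if amount > approval_threshold and not action.get("escalated", False):
        violations.add("escalation")
    return violations
```

An empty set corresponds to the clean pass; any flag can then be mapped to a penalty or to a critical failure, depending on the tier rule in use.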

08

Decentralization

Decentralized Validation

Centralized scoring helped Beneat build the first measurement system. It cannot be the final trust layer. A score that grants authority must be recomputable by parties that do not answer to Beneat.

Validator Consensus

\hat{s} = \text{median}\left(s_1, s_2, \ldots, s_k\right) \quad \text{with outlier clipping}

The median resists a single validator moving the final score.
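The text does not say how outliers are clipped. One possible reading, sketched below, discards scores that sit more than `k` robust deviations from the median and then takes the median of what remains. The cutoff `k = 3` and the use of MAD as the spread estimate are assumptions.

```python
from statistics import median


def consensus_score(scores, k: float = 3.0) -> float:
    """Median of validator scores after dropping far outliers (assumed rule)."""
    if not scores:
        raise ValueError("at least one validator score is required")
    center = median(scores)
    mad = median(abs(s - center) for s in scores)
    if mad == 0:
        # More than half the validators agree exactly; nothing to clip.
        return center
    band = k * 1.4826 * mad
    kept = [s for s in scores if abs(s - center) <= band]
    return median(kept)
```

With scores of 80, 82, 81, 79, and 5, the lone 5 falls outside the band and the consensus is the median of the other four.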

Score Freshness

s_t = s_0 \cdot e^{-\gamma \Delta t} + \sum_{i=1}^{N} \omega_i \cdot \text{DQS}(\tau_i)

Old behavior decays. Authority requires current proof.
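The freshness rule translates directly. In this sketch the units of \Delta t and the values of \gamma and \omega_i are left to the caller, since the text does not fix them; the function name is hypothetical.

```python
import math


def fresh_score(s0: float, gamma: float, dt: float, recent) -> float:
    """s_t = s_0 * exp(-gamma * dt) + sum(omega_i * DQS(tau_i)).

    recent: iterable of (omega_i, dqs_i) pairs for episodes since s_0.
    """
    decayed = s0 * math.exp(-gamma * dt)
    return decayed + sum(omega * dqs for omega, dqs in recent)
```

Setting \gamma = ln 2 makes one unit of \Delta t a half-life: an old score of 80 contributes 40 after one unit and 20 after two.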

Bittensor is a natural candidate because it gives miners and validators a live incentive system. But the argument does not depend on Bittensor's terminology. The rule is simpler: miners act, validators replay, scores become portable.

Centralized research	Decentralized validation
Beneat computes score	Validators recompute score
Internal traces	Replay bundles
Product trust	Protocol trust
Fast iteration	Independent verification

09

TQS

TQS as Market Scoring

TQS remains the market-specific score. It measures whether a trader or trading agent produces quality decisions under market pressure. The raw result is not enough. A lucky tail event and a disciplined edge can end with the same P&L. They should not receive the same score.

TQS Composite

\text{TQS} = \alpha R_q + \beta B_q + \delta K_q

returns quality, behavioral quality, and risk quality

TQS Composition

Behavioral scoring

The terminal already detects patterns that damage execution quality: revenge trading, FOMO, panic exits, overtrading, tilt, overconfidence, and patience failures. These are not personality labels. They are patterns in orders and timing.

Adaptive Threshold Calibration

\begin{aligned} \alpha &= \text{clamp}\!\left(\frac{\text{totalTrades} - T_{\min}}{T_{\max} - T_{\min}},\; 0,\; 1\right) \\[6pt] \text{threshold} &= \alpha \cdot \text{adaptive} + (1 - \alpha) \cdot \text{default} \end{aligned}

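Small samples use conservative defaults. Larger samples use the operator's own baseline. Here \alpha is the sample-size blend weight, not the returns-quality weight \alpha in the TQS composite.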
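The blend can be sketched as follows. The bounds `t_min = 20` and `t_max = 100` are placeholder values chosen for illustration, not Beneat's calibration.

```python
def blended_threshold(total_trades: int, adaptive: float, default: float,
                      t_min: int = 20, t_max: int = 100) -> float:
    """Interpolate between the default and the operator-specific threshold."""
    # alpha ramps from 0 at t_min trades to 1 at t_max trades.
    alpha = max(0.0, min(1.0, (total_trades - t_min) / (t_max - t_min)))
    return alpha * adaptive + (1 - alpha) * default
```

Below `t_min` trades the default is used unchanged, above `t_max` the adaptive value is used unchanged, and halfway between the two the result is their average.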

Robust Deviation Estimator

\begin{aligned} \text{MAD} &= \text{median}\!\left(\left|x_i - \text{median}(X)\right|\right) \\[4pt] \sigma_{\text{robust}} &\approx 1.4826 \times \text{MAD} \end{aligned}
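A short sketch of the estimator, using only the standard library. The constant 1.4826 makes the MAD comparable to a standard deviation when the data are roughly normal.

```python
from statistics import median


def robust_sigma(values) -> float:
    """Robust spread estimate: 1.4826 times the median absolute deviation."""
    values = list(values)
    center = median(values)
    mad = median(abs(x - center) for x in values)
    return 1.4826 * mad
```

For the sample 1, 2, 3, 4, 100 the median is 3 and the MAD is 1, so the single extreme value barely moves the estimate. An ordinary standard deviation on the same sample would be dominated by it.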

Risk and equity curve

The equity curve is the artifact. It records entries, exits, sizing, timing, risk limits, and recovery after loss. TQS uses it to distinguish compounding from lottery-ticket performance.

Risk Awareness

\begin{aligned} \text{CV} &= \frac{\sigma(\text{positionSizes})}{\mu(\text{positionSizes})} \\[4pt] \text{sizingScore} &= \max\!\left(0,\; 100 - \text{CV} \times 100\right) \\[4pt] \text{RiskAwareness} &= \max\!\left(0,\; \text{sizingScore} - \text{extremeRatio} \times w\right) \end{aligned}
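A sketch of the three lines above. The text does not define `extremeRatio` or `w`, so both are taken as inputs here; the default `w = 50` is a placeholder, and the population standard deviation is an assumed choice for \sigma.

```python
from statistics import mean, pstdev


def risk_awareness(position_sizes, extreme_ratio: float, w: float = 50.0) -> float:
    """Sizing consistency score, reduced by a penalty for extreme positions."""
    # Coefficient of variation of position sizes (population sigma assumed).
    cv = pstdev(position_sizes) / mean(position_sizes)
    sizing_score = max(0.0, 100.0 - cv * 100.0)
    return max(0.0, sizing_score - extreme_ratio * w)
```

Perfectly uniform sizing scores 100 before the extreme-position penalty. Sizes of 1 and 3 have a coefficient of variation of 0.5 and score 50.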

Outlier Independence

\text{edge}_{clean} = \text{PnL}_{total} - \max_i(\text{PnL}_i)

If one trade carries the strategy, the score should know.
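The check is one line of arithmetic. The function name is hypothetical.

```python
def clean_edge(pnls) -> float:
    """Total P&L with the single best trade removed."""
    pnls = list(pnls)
    return sum(pnls) - max(pnls)
```

Trades of 10, -5, 200, and 15 total 220, but only 20 remains once the 200 winner is removed, which marks the result as dependent on one trade.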

10

Risk & State

Risk and Operator State

The same principle applies before execution. A terminal should not only display markets. It should know when an order violates risk, when an agent is exceeding authority, and when a human operator is no longer in a state to size up.

6-Gate Pre-Trade Risk Engine

Pre-Trade Gate

\text{allow}(o) = \prod_{g=1}^{G} \mathbb{1}\{g(o,\; account,\; policy,\; state)=1\}

One failed gate is enough to block the order.
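A product of indicator functions is a logical AND, which the sketch below expresses with `all`. The two gates shown are hypothetical examples; the text does not list the six gates, and the dict-based order, account, policy, and state formats are assumptions.

```python
def allow(order: dict, account: dict, policy: dict, state: dict, gates) -> bool:
    """allow(o): True only if every gate passes."""
    return all(gate(order, account, policy, state) for gate in gates)


def max_size_gate(order, account, policy, state) -> bool:
    # Hypothetical gate: the order may not exceed the policy's size cap.
    return order["size"] <= policy["max_size"]


def operator_ready_gate(order, account, policy, state) -> bool:
    # Hypothetical gate: block orders while the operator is flagged not ready.
    return state.get("operator_ready", False)
```

Because `all` stops at the first failing gate, cheap checks can be listed first and expensive ones last.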

Capital at Risk

\begin{aligned} \text{capitalAtRisk} &= |\text{entryPrice} - \text{stopLoss}| \times \text{positionSize} \\[4pt] \text{riskPercent} &= \frac{\text{capitalAtRisk}}{\text{accountEquity}} \times 100 \end{aligned}
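As a worked example, the two lines compute as follows. The function name is hypothetical, and prices and equity are assumed to share one currency.

```python
def risk_percent(entry_price: float, stop_loss: float,
                 position_size: float, account_equity: float) -> float:
    """Percentage of account equity lost if the stop is hit."""
    capital_at_risk = abs(entry_price - stop_loss) * position_size
    return capital_at_risk / account_equity * 100
```

An entry at 100 with a stop at 95 and a position of 20 units puts 100 at risk, which is 1% of a 10,000 account. The absolute value makes the same formula work for short positions, where the stop sits above the entry.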

BioSync extends this into operator state. It was not added as a wellness layer or lifestyle garnish. It followed from the original claim that the operator belongs inside the control surface. For humans, readiness, reaction, fatigue, recovery, and session history become part of the constraint layer. For agents, the equivalent slot is behavioral telemetry: what changed, what repeated, what broke, and what should be constrained next.

11

Roadmap

Phase	Focus
01	Reference authority task: validator-owned state, replay bundle, score vector, certificate
02	Validator-compatible runtime for structured authority tasks
03	Additional domains beyond vendor payment control
04	Decentralized TQS for market traces and trading agents
05	Portable authority scores across agents, operators, and domains

Authority Rule

\text{authority}_{next} = f(\text{current proof},\; \text{freshness},\; \text{domain risk},\; \text{critical failures})

Authority increases only when the trace supports it.

12

Conclusion

Beneat started with trading because markets expose judgment and punish bad control early. What began as survival-first infrastructure for constraining risk and behavior became the first scoring branch: TQS. It proved that process quality can be measured from traces instead of inferred from outcomes.

DQS is the broader frame. It applies the same discipline to autonomous work: policy, state, authority, risk, and replay. The score is not a claim about intelligence. It is a record of behavior under constraint.

Permission should come after proof. Not before it.
