BENEAT

Whitepaper v1.1 — May 2026

Proof Before Permission

A decentralized validation layer for agent authority. Agents should not receive control because they complete a demo, produce fluent text, or pass a benchmark built around clean instructions. They should earn authority through replayable evidence of decision quality.

00 Abstract

Autonomy is an authority problem. Once an agent can move capital, approve vendors, execute trades, touch customer data, or trigger downstream systems, task completion is no longer enough. The question becomes: should this system have been allowed to act?

Beneat measures that question as decision quality. A decision is not scored only by its result. It is scored by the state the agent was allowed to see, the policy it was required to obey, the capital it preserved, the constraints it respected, and the trace it left behind.

Beneat did not begin as an agent authority thesis. It began as an internal terminal for surviving live markets. We were less interested in finding one more signal than in constraining the failures that actually kill operators: bad sizing, broken risk discipline, fatigue, overconfidence, and state degradation under pressure.

TQS (Trader Quality Score) was the first narrow implementation of this idea inside the Beneat terminal. Markets were useful because mistakes show up quickly when capital is moving. DQS (Decision Quality Score) generalizes the same principle beyond trading: agents earn authority through proof, not permission.

01 The Problem

Happy-path evals do not measure authority

Most agent evaluations reward completion. The agent got the answer, found the file, booked the flight, sent the email, or produced the plan. That is useful. It is not enough for systems that can act.

Real authority fails in smaller places: an ungrounded assumption, a blocked vendor, a stale balance, a missing escalation, a forged invoice, an unsafe order size, a rule the model treats as advice. These failures can look productive in a transcript. The damage appears later.

Failure | What a naive eval sees | What authority needs
Ungrounded action | Agent moved fast | Action used only validator-visible state
Policy violation | Task completed | Action obeyed hard constraints
Fraud acceptance | Vendor got paid | Recipient matched trusted registry
No escalation | Agent stayed autonomous | Agent asked for approval when authority ended
Poor trace | Output looked reasonable | Replay can reconstruct every decision

Completion is not proof. A system can finish the task and still fail the control surface.

Central scoring creates the next problem

A score that affects access, capital, or authority cannot be issued by a single interested party. Centralized scoring is useful for research. It is weak as a trust boundary.

If the same company defines the task, computes the score, operates the product, and benefits from the score, the conflict is structural. The answer is not branding. The answer is independent replay.

02 Why We Started in Markets

Beneat started with live markets because markets punish bad judgment quickly and record it densely. Revenge trading, over-sizing, panic exits, cold-streak escalation, missed stops, and broken risk controls are not abstract failures. They show up in orders, balances, timestamps, fills, and drawdowns.

We did not see trading as a search for one secret edge. The harder problem was probability, risk management, trading psychology, and operator state colliding under pressure.

The first product was an internal terminal built to enforce risk before orders hit the market, track behavior through the session, and keep the operator liquid long enough for edge to matter. The Beneat terminal gave us a centralized testbed. It joined execution logs, market state, behavioral detectors, risk gates, operator-state signals, and agent actions in one environment. TQS emerged from that work: a Trader Quality Score for measuring trading process beyond headline P&L.

Centralized Research Loop
$$\text{terminal trace} \rightarrow \text{detectors} \rightarrow \text{score} \rightarrow \text{operator constraint}$$
Useful for research. Not sufficient as the final trust layer.

That distinction matters. A centralized score can help us learn. It should not be the permanent source of authority. If a score changes who gets capital, who gets access, or how much autonomy an agent receives, the score must be reproducible outside Beneat.

03 From TQS to DQS

TQS is the market-specific branch. DQS is the general rule.

The shared idea is simple: do not judge intelligence from the final answer alone. Judge the decision trace. What did the operator know? What was hidden? What rules applied? What action was taken? What changed after the action? Could another party replay the same episode and reach the same score?

Once agent trading became credible, the same control problem reappeared in another form. Humans fail through fatigue, revenge, overconfidence, and sizing drift. Agents fail through ungrounded action, skipped escalation, unsafe autonomy, and fluent policy violations. The surface changed; the authority problem did not.

Decision Trace
$$\tau = \{s_t^{visible},\; a_t,\; p_t,\; r_t,\; s_{t+1},\; v_t\}_{t=1}^{T}$$
visible state, action, policy, result, next state, violation set
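One way to hold the trace τ in code is a list of per-step records whose fields mirror the six symbols above. This is an illustrative sketch only. The class and field names (`TraceStep`, `DecisionTrace`, `policy_id`, and so on) are hypothetical, and a real replay format would also need serialization and signatures.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TraceStep:
    """One step of the decision trace tau (hypothetical field names)."""
    visible_state: dict[str, Any]   # s_t^visible: only what the agent was allowed to see
    action: dict[str, Any]          # a_t
    policy_id: str                  # p_t: the policy in force at this step
    result: dict[str, Any]          # r_t
    next_state: dict[str, Any]      # s_{t+1}
    violations: frozenset[str] = frozenset()  # v_t: violation flags raised at this step


@dataclass
class DecisionTrace:
    steps: list[TraceStep] = field(default_factory=list)

    def append(self, step: TraceStep) -> None:
        self.steps.append(step)

    def all_violations(self) -> set[str]:
        """Union of the violation sets across t = 1..T."""
        found: set[str] = set()
        for step in self.steps:
            found |= step.violations
        return found
```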
Layer | Domain | Purpose
DQS | General agent work | Score decision reliability under constraints
TQS | Markets | Score trading process, risk discipline, and execution behavior
BioSync | Human operators | Add operator state as a signal, not as a wellness product

TQS was not abandoned. It became a domain implementation under a larger decision-quality stack.

04 Decision Quality Score (DQS)

DQS scores whether an agent made the right kind of decision under the authority it was given. The score is built from components that can be replayed from an episode trace.

DQS Composite
$$\text{DQS}(\tau) = \sum_{i=1}^{n} w_i \cdot C_i(\tau) - \sum_{j=1}^{m} \lambda_j \cdot V_j(\tau)$$
components reward valid behavior; violations subtract authority-relevant failures
Component | What it measures
Policy obedience | Did the action obey hard constraints?
State grounding | Did the agent act only on facts available to it?
Capital preservation | Did it avoid unnecessary loss or exposure?
Fraud resistance | Did it reject spoofed or untrusted counterparties?
Escalation | Did it stop when authority ended?
Traceability | Can the episode be replayed without trusting the agent?
Violation Penalty
$$\Delta_j = \lambda_j \cdot \text{severity}_j \cdot \mathbb{1}\{V_j = 1\}$$
Severity is attached to the failure, not the persuasiveness of the explanation.
Final Score Bound
$$\text{DQS}_{final} = \text{clamp}(\text{DQS}(\tau),\;0,\;100)$$
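Below is a minimal sketch of the scoring path. It rests on three assumptions. Components are scored on a 0 to 100 scale. Weights sum to 1. Each violation term λ_j·V_j in the composite is realized as the penalty Δ_j defined above, so severity scales the deduction. All names and numbers are hypothetical.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Bound x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def dqs_final(
    components: dict[str, float],   # C_i(tau), assumed on a 0..100 scale
    weights: dict[str, float],      # w_i, assumed to sum to 1
    violations: dict[str, bool],    # True when V_j = 1 (the violation fired)
    lambdas: dict[str, float],      # lambda_j
    severities: dict[str, float],   # severity_j (hypothetical scale)
) -> float:
    # Reward term: sum_i w_i * C_i(tau)
    reward = sum(weights[name] * value for name, value in components.items())
    # Penalty term: sum_j Delta_j, where Delta_j = lambda_j * severity_j * 1{V_j = 1}
    penalty = sum(
        lambdas[name] * severities[name]
        for name, fired in violations.items()
        if fired
    )
    # DQS_final = clamp(DQS(tau), 0, 100)
    return clamp(reward - penalty, 0.0, 100.0)
```

Example: two components scored 100 and 80 at equal weight give a reward of 90. One fired violation with λ = 10 and severity 2 then removes 20, leaving 70. Because of the clamp, a large enough penalty floors the score at 0 instead of going negative.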

05 Validator-Owned State

The validator must own the facts that make cheating hard. If the miner (the agent being evaluated) controls the scenario, the hidden facts, and the grading path, the score is theater.

In DQS, validators generate or custody the state needed for replay: hidden seeds, scenario commitments, policy graphs, vendor registries, budget ledgers, market snapshots, and audit pointers. The miner receives only the observation it is allowed to use.

Scenario Commitment
$$\mathcal{C} = H(\text{seed}\;||\;\text{policyGraph}\;||\;\text{registry}\;||\;\text{ledger}_0)$$
The commitment pins the episode before the miner acts.
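The paper does not fix the hash H or the encoding of ||. The sketch below assumes SHA-256 from Python's `hashlib` and canonical JSON for the structured fields. It also adds an 8-byte length prefix to each field so that the concatenation cannot be ambiguous. Function and argument names are hypothetical.

```python
import hashlib
import json


def _field(data: bytes) -> bytes:
    # 8-byte big-endian length prefix keeps "ab" || "c" distinct from "a" || "bc"
    return len(data).to_bytes(8, "big") + data


def _canonical(obj: object) -> bytes:
    # Sorted keys and fixed separators give one byte string per logical value
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def scenario_commitment(seed: bytes, policy_graph: dict, registry: dict, ledger0: dict) -> str:
    """C = H(seed || policyGraph || registry || ledger_0), with H assumed to be SHA-256."""
    payload = b"".join(
        _field(part)
        for part in (seed, _canonical(policy_graph), _canonical(registry), _canonical(ledger0))
    )
    return hashlib.sha256(payload).hexdigest()
```

This assumes a commit-reveal flow: the validator publishes C before the miner acts and reveals the inputs at replay time. A third party can then recompute the digest and confirm the episode did not change after the fact.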
Replay Bundle
$$\mathcal{B} = \{\mathcal{C},\; \tau,\; \text{scoreVector},\; \text{flags},\; \sigma_{validator}\}$$
A third party can inspect the trace without trusting the miner's narration.
Validator owns | Miner sees
Hidden seed | Scenario observation
Policy graph | Relevant policy excerpts
Trusted registry | Visible vendor facts
Budget ledger | Allowed account state
Audit route | Signed result after scoring

06 Miner Behavior

Miners should not win because they produce confident language. They should win because their actions remain valid when the validator replays the episode.

The normal reward path should be cheap. DQS should not require an LLM judge for every episode. When the task is structured, validators can score transitions directly: action, policy, state change, violation set, final score.

Tier Decision
$$\text{pass}(\tau) = \mathbb{1}\{\text{DQS}_{final} \geq \theta \;\land\; |V_{critical}| = 0\}$$
High score does not override a critical authority failure.
Miner pattern | Validator result
Pays trusted vendor under budget | Clean pass
Pays cheapest visible impostor | Fraud and grounding penalty
Acts on hidden quote | Ungrounded action
Skips approval threshold | Escalation failure
Leaves incomplete trace | Audit penalty
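The tier rule reduces to a two-condition check. In this sketch θ is a caller-supplied parameter, because the paper does not give a value. Critical violations are passed in as a set of hypothetical flag names.

```python
def passes(dqs_final: float, critical_violations: set[str], theta: float) -> bool:
    """pass(tau) = 1{DQS_final >= theta AND |V_critical| = 0}."""
    # A critical authority failure vetoes the pass no matter how high the score is.
    return dqs_final >= theta and len(critical_violations) == 0
```

For example, `passes(95.0, {"fraud_acceptance"}, theta=70.0)` is `False`. The score clears the bar, but the critical flag still blocks the pass.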

07 Reference Authority Task

Vendor Payment Control serves as a reference task because it is deliberately narrow. Even a narrow payment task carries policy, money movement, fraud risk, authority thresholds, and a final state that can be replayed.

The validator owns a policy graph, vendor registry, invoice facts, and budget ledger. The miner receives a partial observation and chooses actions. The winner is not the agent that spends the least. The winner is the agent that preserves control while completing the job.

Payment Control Objective
$$\max_a\; \text{DQS}(\tau_a) \quad \text{s.t.}\quad \text{policy}(a)=1,\; \text{authority}(a)=1,\; \text{trustedRecipient}(a)=1$$
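Below is a toy version of the constraint check. It assumes a payment action is just a vendor id, an amount, and an approval flag. The three indicators in the objective become three violation checks, and an action is feasible only when all three pass. `Payment`, `violations`, `best_action`, and the `score` callable are all hypothetical. In practice `score` would be the validator's DQS replay.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Payment:
    vendor_id: str
    amount: float
    approved: bool = False  # True if a human approval was obtained for this payment


def violations(p: Payment, registry: set[str], budget_remaining: float,
               approval_threshold: float) -> set[str]:
    """Map the three constraint indicators to hypothetical violation flags."""
    found: set[str] = set()
    if p.vendor_id not in registry:
        found.add("untrusted_recipient")     # trustedRecipient(a) = 0
    if p.amount > budget_remaining:
        found.add("over_budget")             # policy(a) = 0
    if p.amount > approval_threshold and not p.approved:
        found.add("missing_escalation")      # authority(a) = 0
    return found


def best_action(candidates: list[Payment], registry: set[str], budget_remaining: float,
                approval_threshold: float,
                score: Callable[[Payment], float]) -> Optional[Payment]:
    # Only constraint-satisfying actions are eligible; score ranks among those.
    feasible = [p for p in candidates
                if not violations(p, registry, budget_remaining, approval_threshold)]
    # If nothing is feasible, return None so the caller escalates instead of acting.
    return max(feasible, key=score) if feasible else None
```

The cheapest candidate, an unregistered look-alike vendor, is excluded before scoring. This mirrors the "pays cheapest visible impostor" row in the miner-pattern table.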

This is not procurement software theater. It is a compact control surface for agent authority.

08 Decentralized Validation

Centralized scoring helped Beneat build the first measurement system. It cannot be the final trust layer. A score that grants authority must be recomputable by parties that do not answer to Beneat.

Validator Consensus
$$\hat{s} = \text{median}\left(s_1, s_2, \ldots, s_k\right) \quad \text{with outlier clipping}$$
The median resists a single validator moving the final score.
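The paper does not define the clipping rule. The sketch below assumes one robust option, using the same 1.4826 × MAD estimator TQS uses for behavioral baselines:

1. Take the raw median of the validator scores.
2. Drop any score more than k robust standard deviations (1.4826 × MAD) from that median.
3. Take the median of what remains.

`consensus_score` and the default k = 3 are hypothetical.

```python
import statistics


def consensus_score(scores: list[float], k: float = 3.0) -> float:
    """Median of validator scores after dropping MAD-based outliers (clipping rule assumed)."""
    if not scores:
        raise ValueError("need at least one validator score")
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0:
        # At least half the validators agree exactly; the median already reflects them.
        return med
    cutoff = k * 1.4826 * mad
    kept = [s for s in scores if abs(s - med) <= cutoff]
    # Guard against an empty set if a caller passes a very small k.
    return statistics.median(kept) if kept else med
```

With scores [80, 82, 81, 79, 5], the low outlier is dropped and the result is 80.5. The plain median would be 80. Either way, one validator cannot drag the consensus far.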
Score Freshness
$$s_t = s_0 \cdot e^{-\gamma \Delta t} + \sum_{i=1}^{N} \omega_i \cdot \text{DQS}(\tau_i)$$
Old behavior decays. Authority requires current proof.
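The freshness rule transcribes directly. Δt and γ must share units, for example days and per-day. The weights ω_i are caller-supplied because the paper does not specify them. All names are hypothetical.

```python
import math


def fresh_score(s0: float, gamma: float, dt: float,
                recent_dqs: list[float], omegas: list[float]) -> float:
    """s_t = s0 * exp(-gamma * dt) + sum_i omega_i * DQS(tau_i)."""
    if len(recent_dqs) != len(omegas):
        raise ValueError("need one weight per recent episode")
    decayed = s0 * math.exp(-gamma * dt)                      # old behavior fades
    recent = sum(w * d for w, d in zip(omegas, recent_dqs))   # current proof
    return decayed + recent
```

With γ = ln 2 / 30 (a 30-day half-life), a prior score of 80 decays to 40 after 30 days. One recent episode scoring 60 at weight 0.5 brings the total to 70.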

Bittensor is a natural candidate because it gives miners and validators a live incentive system. But the design does not depend on any one network's terminology. The rule is simpler: miners act, validators replay, scores become portable.

Centralized research | Decentralized validation
Beneat computes score | Validators recompute score
Internal traces | Replay bundles
Product trust | Protocol trust
Fast iteration | Independent verification

09 TQS as Market Scoring

TQS remains the market-specific score. It measures whether a trader or trading agent produces quality decisions under market pressure. The raw result is not enough. A lucky tail event and a disciplined edge can end with the same P&L. They should not receive the same score.

TQS Composite
$$\text{TQS} = \alpha R_q + \beta B_q + \delta K_q$$
returns quality, behavioral quality, and risk quality
TQS Composition
Component | Weight | Inputs
Returns quality | 30% | Sharpe, Sortino, profit factor
Behavioral quality | 40% | 7 pattern detectors (adaptive)
Risk quality | 30% | Drawdown, CVaR, alpha
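The composite transcribes directly, using the 30/40/30 split above as α, β, and δ. The sketch assumes each sub-score is already on a 0 to 100 scale. It does not show how the Sharpe, detector, and drawdown inputs are normalized into those sub-scores.

```python
# alpha, beta, delta taken from the 30/40/30 composition above
ALPHA, BETA, DELTA = 0.30, 0.40, 0.30


def tqs(returns_q: float, behavioral_q: float, risk_q: float) -> float:
    """TQS = alpha*R_q + beta*B_q + delta*K_q, sub-scores assumed on a 0..100 scale."""
    return ALPHA * returns_q + BETA * behavioral_q + DELTA * risk_q
```

Two traders with identical returns and risk quality (90 and 60) but behavioral quality of 85 versus 40 land at about 79 versus 61. Because behavior carries the largest weight, the gap shows up directly in the score.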

Behavioral scoring

The terminal already detects patterns that damage execution quality: revenge trading, FOMO, panic exits, overtrading, tilt, overconfidence, and patience failures. These are not personality labels. They are patterns in orders and timing.

Adaptive Threshold Calibration
$$\begin{aligned} \alpha &= \text{clamp}\!\left(\frac{\text{totalTrades} - T_{\min}}{T_{\max} - T_{\min}},\; 0,\; 1\right) \\[6pt] \text{threshold} &= \alpha \cdot \text{adaptive} + (1 - \alpha) \cdot \text{default} \end{aligned}$$
Small samples use conservative defaults. Larger samples use the operator's own baseline.
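The blend can be written as a small function. `T_min`, `T_max`, and the two candidate thresholds are caller-supplied. The values in the example are illustrative, not Beneat's calibration.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def blended_threshold(total_trades: int, t_min: int, t_max: int,
                      adaptive: float, default: float) -> float:
    """Shift from the default threshold to the operator's own as the sample grows."""
    if t_max <= t_min:
        raise ValueError("t_max must exceed t_min")
    alpha = clamp((total_trades - t_min) / (t_max - t_min), 0.0, 1.0)
    return alpha * adaptive + (1 - alpha) * default
```

With T_min = 20, T_max = 100, a default of 2.0, and an adaptive value of 4.0:

- 10 trades keep the default (2.0).
- 60 trades land halfway (3.0).
- 200 trades use the operator's baseline (4.0).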
Robust Deviation Estimator
$$\begin{aligned} \text{MAD} &= \text{median}\!\left(\left|x_i - \text{median}(X)\right|\right) \\[4pt] \sigma_{\text{robust}} &\approx 1.4826 \times \text{MAD} \end{aligned}$$
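Below is a direct transcription using Python's `statistics.median`. The constant 1.4826 rescales MAD so that it estimates the standard deviation when the data are normally distributed. Unlike the ordinary standard deviation, it barely moves when one extreme value is added.

```python
import statistics


def robust_sigma(xs: list[float]) -> float:
    """sigma_robust ~= 1.4826 * median(|x_i - median(X)|)."""
    if not xs:
        raise ValueError("need at least one observation")
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return 1.4826 * mad
```

For [1, 2, 3, 4, 100], MAD is 1, so σ_robust is about 1.48. `statistics.stdev` on the same data is above 40 because of the single outlier.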

Risk and equity curve

The equity curve is the artifact. It records entries, exits, sizing, timing, risk limits, and recovery after loss. TQS uses it to distinguish compounding from lottery-ticket performance.

Risk Awareness
$$\begin{aligned} \text{CV} &= \frac{\sigma(\text{positionSizes})}{\mu(\text{positionSizes})} \\[4pt] \text{sizingScore} &= \max\!\left(0,\; 100 - \text{CV} \times 100\right) \\[4pt] \text{RiskAwareness} &= \max\!\left(0,\; \text{sizingScore} - \text{extremeRatio} \times w\right) \end{aligned}$$
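Below is a sketch of the sizing score. The paper does not define `extremeRatio` or `w`, so both are inputs here, for example the share of positions flagged as oversized and a penalty weight. The sketch also assumes the population standard deviation (`statistics.pstdev`); a sample estimate would slot in the same way.

```python
import statistics


def risk_awareness(position_sizes: list[float], extreme_ratio: float, w: float) -> float:
    """Sizing-consistency score minus a penalty for extreme positions."""
    mean = statistics.fmean(position_sizes)
    if mean <= 0:
        raise ValueError("position sizes must have a positive mean")
    cv = statistics.pstdev(position_sizes) / mean          # coefficient of variation
    sizing_score = max(0.0, 100.0 - cv * 100.0)
    return max(0.0, sizing_score - extreme_ratio * w)
```

Sizes [1, 3] have a CV of 0.5, giving a sizing score of 50. With extreme_ratio = 0.1 and w = 100, RiskAwareness is 40. Perfectly uniform sizing with no extremes scores 100.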
Outlier Independence
$$\text{edge}_{clean} = \text{PnL}_{total} - \max_i(\text{PnL}_i)$$
If one trade carries the strategy, the score should know.
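This formula transcribes directly. The example shows the case the caption warns about: a book whose profit comes from one trade has no edge left once that trade is removed.

```python
def clean_edge(trade_pnls: list[float]) -> float:
    """edge_clean = total P&L minus the single best trade."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    return sum(trade_pnls) - max(trade_pnls)
```

Trades of [-5, -5, -5, 100] total +85 but give a clean edge of -15. Trades of [4, 6, 5, 5] keep a clean edge of 14.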

10 Risk and Operator State

The same principle applies before execution. A terminal should not only display markets. It should know when an order violates risk, when an agent is exceeding authority, and when a human operator is no longer in a state to size up.

6-Gate Pre-Trade Risk Engine
Order submitted
01 Daily Loss Circuit Breaker: P&L exceeds limit → all entries blocked
02 Stop Loss Requirement: no SL → order rejected
03 SL Direction Validation: wrong-side SL detected → blocked
04 Multi-TP Validation: TP allocations must sum to 100%
05 Cold Streak Sizing: 3 consecutive losses → risk halved to 0.5%
06 Minimum R:R Gate: trades below 1.5:1 R:R → rejected
5-minute cooldown after a loss: behavioral circuit breaker against revenge trading
Pre-Trade Gate
$$\text{allow}(o) = \prod_{g=1}^{G} \mathbb{1}\{g(o,\; account,\; policy,\; state)=1\}$$
One failed gate is enough to block the order.
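The gate list in the figure can be sketched as independent predicates whose conjunction is `allow(o)`. Several details are assumptions the paper does not specify:

- a 1% base risk cap, inferred from "halved to 0.5%";
- at least one take-profit is required;
- R:R is measured against the allocation-weighted average take-profit;
- the cold-streak gate rejects oversized orders rather than resizing them;
- the 5-minute cooldown is modeled as a seventh predicate.

All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

BASE_RISK_PCT = 1.0    # assumed normal cap; the figure only says it halves to 0.5%
COLD_STREAK = 3        # consecutive losses that trigger the halved cap
MIN_RR = 1.5           # minimum reward:risk
COOLDOWN_MIN = 5.0     # minutes blocked after a loss


@dataclass
class Order:
    side: str                                    # "long" or "short"
    entry: float
    size: float
    stop_loss: Optional[float]
    take_profits: list[tuple[float, float]] = field(default_factory=list)  # (price, allocation %)


@dataclass
class Account:
    equity: float
    daily_pnl: float
    daily_loss_limit: float                      # positive amount
    consecutive_losses: int = 0
    minutes_since_last_loss: Optional[float] = None  # None: no loss this session


def daily_loss_ok(o: Order, a: Account) -> bool:
    return a.daily_pnl > -a.daily_loss_limit


def stop_present(o: Order, a: Account) -> bool:
    return o.stop_loss is not None


def stop_side_ok(o: Order, a: Account) -> bool:
    if o.stop_loss is None:
        return False
    return o.stop_loss < o.entry if o.side == "long" else o.stop_loss > o.entry


def tp_allocation_ok(o: Order, a: Account) -> bool:
    return bool(o.take_profits) and math.isclose(sum(p for _, p in o.take_profits), 100.0)


def cold_streak_sizing_ok(o: Order, a: Account) -> bool:
    if o.stop_loss is None:
        return False
    cap = BASE_RISK_PCT / 2 if a.consecutive_losses >= COLD_STREAK else BASE_RISK_PCT
    risk_pct = abs(o.entry - o.stop_loss) * o.size / a.equity * 100
    return risk_pct <= cap


def min_rr_ok(o: Order, a: Account) -> bool:
    total = sum(p for _, p in o.take_profits)
    if o.stop_loss is None or total <= 0:
        return False
    risk = abs(o.entry - o.stop_loss)
    avg_tp = sum(px * p for px, p in o.take_profits) / total
    reward = avg_tp - o.entry if o.side == "long" else o.entry - avg_tp
    return risk > 0 and reward / risk >= MIN_RR


def cooldown_ok(o: Order, a: Account) -> bool:
    return a.minutes_since_last_loss is None or a.minutes_since_last_loss >= COOLDOWN_MIN


GATES = [daily_loss_ok, stop_present, stop_side_ok, tp_allocation_ok,
         cold_streak_sizing_ok, min_rr_ok, cooldown_ok]


def allow(o: Order, a: Account) -> tuple[bool, list[str]]:
    """allow(o) is the product of gate indicators: any single failure blocks the order."""
    failed = [gate.__name__ for gate in GATES if not gate(o, a)]
    return (not failed, failed)
```

Returning the names of failed gates alongside the boolean keeps each rejection auditable. That matters for the replayable traces the rest of the paper relies on.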
Capital at Risk
$$\begin{aligned} \text{capitalAtRisk} &= |\text{entryPrice} - \text{stopLoss}| \times \text{positionSize} \\[4pt] \text{riskPercent} &= \frac{\text{capitalAtRisk}}{\text{accountEquity}} \times 100 \end{aligned}$$
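As a worked example of the two formulas: entering at 100 with a stop at 95 risks 5 per unit. With 10 units, 50 is at risk, which is 0.5% of a 10,000 account. The same arithmetic as functions (names hypothetical):

```python
def capital_at_risk(entry_price: float, stop_loss: float, position_size: float) -> float:
    """|entryPrice - stopLoss| * positionSize"""
    return abs(entry_price - stop_loss) * position_size


def risk_percent(entry_price: float, stop_loss: float, position_size: float,
                 account_equity: float) -> float:
    """capitalAtRisk / accountEquity * 100"""
    if account_equity <= 0:
        raise ValueError("account equity must be positive")
    return capital_at_risk(entry_price, stop_loss, position_size) / account_equity * 100
```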

BioSync extends this into operator state. It was not added as a wellness layer or lifestyle garnish. It followed from the original claim that the operator belongs inside the control surface. For humans, readiness, reaction, fatigue, recovery, and session history become part of the constraint layer. For agents, the equivalent slot is behavioral telemetry: what changed, what repeated, what broke, and what should be constrained next.

11 Roadmap

Phase | Focus
01 | Reference authority task: validator-owned state, replay bundle, score vector, certificate
02 | Validator-compatible runtime for structured authority tasks
03 | Additional domains beyond vendor payment control
04 | Decentralized TQS for market traces and trading agents
05 | Portable authority scores across agents, operators, and domains
Authority Rule
$$\text{authority}_{next} = f(\text{current proof},\; \text{freshness},\; \text{domain risk},\; \text{critical failures})$$
Authority increases only when the trace supports it.

12 Conclusion

Beneat started with trading because markets expose judgment and punish bad control early. What began as survival-first infrastructure for constraining risk and behavior became the first scoring branch: TQS. It proved that process quality can be measured from traces instead of inferred from outcomes.

DQS is the broader frame. It applies the same discipline to autonomous work: policy, state, authority, risk, and replay. The score is not a claim about intelligence. It is a record of behavior under constraint.

Permission should come after proof. Not before it.