BENEAT
Baseline control test
Experiment complete

LLM agents, without guardrails.

We gave four LLM trading agents identical market access with no strict Beneat guardrails, then logged every trade, equity change, drawdown, activity entry, and model output to measure how raw LLM trading behaves under pressure.

Experiment setup
No strict guardrails

Four LLM agents received the same market access without Beneat sizing checks or behavioral intervention.

Models tested
GLM · Kimi · MiniMax · Qwen

Each model left an inspectable record of trades, equity changes, decisions, and output failures.

Observed outcome
All negative

Every model finished below its $10,000 starting balance once fees and closed trades were reconciled.
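
As a rough illustration of what reconciling a balance involves, the sketch below nets realized P&L and fees against the $10,000 starting balance. The trade tuples and the `ending_balance` helper are hypothetical, and the numbers are made up for the example; they are not results from the experiment, and the accounting actually used may differ.

```python
STARTING_BALANCE = 10_000.0  # starting balance stated for each agent

def ending_balance(closed_trades, start=STARTING_BALANCE):
    """Net realized P&L and fees against the starting balance.

    closed_trades: iterable of (realized_pnl, fee) pairs (hypothetical shape).
    """
    realized = sum(pnl for pnl, _ in closed_trades)
    fees = sum(fee for _, fee in closed_trades)
    return start + realized - fees

# Made-up trades for illustration only, not experiment data.
example_trades = [(120.0, 4.0), (-310.0, 4.5), (45.0, 3.5)]
balance = ending_balance(example_trades)
print(balance)                      # 9843.0
print(balance < STARTING_BALANCE)   # True: finished below the start
```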

What we measured
Every decision

P&L, drawdowns, trade history, activity logs, and model output quality stayed visible.
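
For readers unfamiliar with the drawdown metric, a minimal sketch follows. It uses the common peak-to-trough definition of maximum drawdown, which is an assumption here, since the page does not state the exact formula. The equity curve is illustrative, not experiment data.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst

# Illustrative equity curve only, not experiment data.
curve = [10_000, 10_400, 9_880, 10_100, 9_360]
print(f"{max_drawdown(curve):.1%}")  # 10.0% (from 10,400 down to 9,360)
```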

Research article

The dark side of AI trading agent convergence.

The full writeup explains why agents reacting to the same public market data can create synchronized retail flow instead of diversified intelligence.

Article findings
  • 81.5% directional consensus across overlapping closed trades
  • 53 of 65 overlap groups reached the same side
  • All disagreement groups clustered on SOL-PERP
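
The first two findings are the same figure stated two ways: 53 of 65 groups is about 81.5%. The sketch below shows one way such a consensus rate could be computed. It assumes a group counts as consensus only when every trade in it is on the same side; the `directional_consensus` helper and the sample groups are hypothetical, not the article's actual method or data.

```python
def directional_consensus(groups):
    """Share of overlap groups in which every trade took the same side."""
    if not groups:
        raise ValueError("need at least one overlap group")
    agreeing = sum(1 for sides in groups if len(set(sides)) == 1)
    return agreeing / len(groups)

# Made-up overlap groups for illustration only.
sample_groups = [
    ["long", "long", "long"],
    ["short", "short"],
    ["long", "short"],          # a disagreement group
    ["short", "short", "short"],
]
print(directional_consensus(sample_groups))  # 0.75

# Arithmetic check on the reported counts.
print(f"{53 / 65:.1%}")  # 81.5%
```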
