LLM agents, without guardrails.

We gave four LLM trading agents the same market access with no strict Beneat guardrails, then tracked every trade, equity change, drawdown, activity-log entry, and model output to measure how raw LLM trading behaves under pressure.

Experiment setup

No strict guardrails

Four LLM agents received the same market access without Beneat sizing checks or behavioral intervention.

Models tested

GLM · Kimi · MiniMax · Qwen

Each model left an inspectable record of trades, equity changes, decisions, and output failures.

Observed outcome

All negative

Every model finished below its $10,000 starting balance once fees and closed trades were reconciled.

What we measured

Every decision

P&L, drawdowns, trade history, activity logs, and model output quality stayed visible.

Research article

The dark side of AI trading agent convergence.

The full writeup explains why agents reacting to the same public market data can create synchronized retail flow instead of diversified intelligence.

Read the full article

Article findings

81.5% directional consensus across overlapping closed trades
53 of 65 overlap groups reached the same side
All disagreement groups clustered on SOL-PERP
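
For readers who want to check the consensus figure, the sketch below shows one way such a rate could be computed. It assumes an "overlap group" is a set of closed trades from different agents on the same market in overlapping time windows, and that a group counts as consensus only when every trade in it took the same side. The function name `consensus_rate` and the toy groups are hypothetical illustrations, not the article's actual pipeline or data.

```python
from typing import Iterable, Sequence


def consensus_rate(groups: Iterable[Sequence[str]]) -> float:
    """Return the share of overlap groups in which every trade took the same side.

    Each group is a sequence of sides ("long" or "short"), one per agent trade.
    This definition of consensus is an assumption made for illustration.
    """
    # An overlap needs at least two trades; smaller groups are ignored.
    usable = [g for g in groups if len(g) >= 2]
    if not usable:
        raise ValueError("no overlap groups with two or more trades")
    unanimous = sum(1 for g in usable if len(set(g)) == 1)
    return unanimous / len(usable)


# Toy groups, invented for illustration only (not the experiment's data).
toy_groups = [
    ["long", "long"],
    ["short", "short", "short"],
    ["long", "short"],          # the one disagreement group
    ["long", "long", "long"],
]
toy_rate = consensus_rate(toy_groups)  # 3 of 4 groups agree

# The headline figure follows directly from the reported counts: 53 of 65 groups.
reported_pct = round(53 / 65 * 100, 1)
print(f"toy consensus: {toy_rate:.0%}, reported consensus: {reported_pct}%")
```

Under this definition, the reported 53 of 65 groups works out to the 81.5% shown above, with the remaining 12 groups being the disagreements.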
