We tested four LLM trading agents with identical market access and without strict Beneat guardrails. We recorded every trade, equity change, and drawdown, along with activity logs and model outputs, to measure how raw LLM trading behaves under pressure.
Four LLM agents received the same market access without Beneat sizing checks or behavioral intervention.
Each model left an inspectable record of trades, equity changes, decisions, and output failures.
Every model finished below its $10,000 starting balance once fees and closed trades were reconciled.
P&L, drawdowns, trade history, activity logs, and model output quality stayed visible.
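To make "reconciled" and "drawdown" concrete, here is a minimal Python sketch of one common way to compute them. It assumes equity is the starting balance plus realized P&L minus fees, and that drawdown is measured from the running equity peak. The `ClosedTrade` type, the function names, and the trade figures are all hypothetical and invented for illustration; they are not taken from the experiment's data or from Beneat's code.

```python
from dataclasses import dataclass


@dataclass
class ClosedTrade:
    """Hypothetical record of one closed trade (illustration only)."""
    realized_pnl: float  # profit or loss before fees, in USD
    fees: float          # total fees paid on the trade, in USD


def equity_curve(starting_balance, trades):
    """Return account equity after each closed trade, net of fees."""
    curve = [starting_balance]
    for trade in trades:
        curve.append(curve[-1] + trade.realized_pnl - trade.fees)
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = curve[0]
    worst = 0.0
    for equity in curve:
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Made-up trades, not results from the experiment.
trades = [
    ClosedTrade(realized_pnl=120.0, fees=8.0),
    ClosedTrade(realized_pnl=-300.0, fees=8.0),
    ClosedTrade(realized_pnl=50.0, fees=8.0),
]
curve = equity_curve(10_000.0, trades)
final_equity = curve[-1]
drawdown = max_drawdown(curve)
```

In this made-up example the account ends below its $10,000 start even though two of the three trades were profitable before fees, which is the kind of outcome that only shows up once fees are included in the reconciliation.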
The full writeup explains why agents reacting to the same public market data can create synchronized retail flow instead of diversified intelligence.