Four rounds of synthetic persona simulation against joinotto.com. A strong baseline, three persistent objections, and a clear ceiling: some conversion problems require product evidence, not more copy.
Baseline intent: 6.20
Explore rate: 20/20
Sim rounds: 4
Persona evals: 80
Research cost: $0
Key Findings
- Strong baseline for the category: 6.20/10 intent, 20/20 would explore. The homepage communicates clearly and lands with the right audience.
- The free tier's "$25K limit" confused 17/20 personas, who could not tell whether it meant revenue, transactions, or something else.
- The lack of a free trial blocked 18/20 personas earning more than $25K/year. Adding a 14-day trial was the largest single-round lift.
- AI accuracy anxiety persisted through all 4 rounds despite added explanations: personas want proof, not mechanism descriptions.
- The sweet-spot audience is 2-6 years into business: outgrown spreadsheets, not yet committed to a full accountant. Year-one founders and 10+ year veterans both scored lower.
- Pricing is not the conversion problem: price guesses matched the actual price, and the $500/month bookkeeper comparison landed cleanly.
Complete evaluation infrastructure on a single Mac Studio: six eval dimensions, a 5.8x throughput advantage uncovered by switching backends, and a routing system built on real data.
Models: 24
Eval dims: 6
MLX vs Ollama: 5.8x
Peak tok/s: 88k
Cloud cost: $0
Key Findings
- Size is not quality for conversation: qwen2.5:7b (5GB) scored 100% on multi-turn; llama3.3:70b (42GB) scored 47.8%.
- MLX delivers 5.8x aggregate throughput vs Ollama at 32 concurrent users; the gap is invisible at low concurrency and massive at scale.
- --decode-concurrency 8 made things worse: MLX's dynamic batcher outperforms any fixed value.
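A concurrency-sweep comparison like the one above can be run with a small async load harness: spawn N workers, have each issue several completion requests, and divide total tokens by wall-clock time. This is a minimal sketch under stated assumptions; `fake_generate` is a hypothetical stand-in for a real HTTP call to an MLX or Ollama server endpoint, and the latency and token counts are invented.

```python
# Hypothetical async load harness for measuring aggregate tokens/sec
# at a given concurrency level. In a real benchmark, fake_generate
# would be an HTTP POST to the backend's completion endpoint.
import asyncio
import time

async def fake_generate(prompt: str) -> int:
    await asyncio.sleep(0.01)   # simulated per-request decode latency
    return 128                  # simulated tokens produced

async def measure_throughput(n_concurrent: int,
                             requests_per_worker: int = 4) -> float:
    async def worker() -> int:
        total = 0
        for _ in range(requests_per_worker):
            total += await fake_generate("prompt")
        return total

    start = time.perf_counter()
    totals = await asyncio.gather(*(worker() for _ in range(n_concurrent)))
    elapsed = time.perf_counter() - start
    return sum(totals) / elapsed   # aggregate tokens/sec

tok_s_1 = asyncio.run(measure_throughput(1))
tok_s_32 = asyncio.run(measure_throughput(32))
```

Running the same sweep against both backends at 1, 8, and 32 concurrent workers is what makes a gap like 5.8x visible: single-stream numbers look similar, and the difference only appears once requests overlap.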