All Case Studies

The Work

Every project documented with full methodology, data tables, what worked, what failed, and all deliverables linked. Updated as new work ships.

Landing Page Simulation  ·  March 7, 2026
Talk Stories: Landing Page to 6.65/10
5 rounds of synthetic persona simulation lifted conversion intent by 55% in a single day. No real users. No waiting. All models ran locally.
+55% intent lift  ·  5 sim rounds  ·  100 persona evals  ·  4 variants tested  ·  2x share rate

Key Findings

  • Subtraction beat addition in round one: removing three things (the ghostwriter label, "beta," a scary Slack line) lifted intent by 1.75 points
  • "Voice Engine" won the framing test at 7.35/10. "Story Engineer" failed for the same reason abstract labels always fail
  • Security section dropped privacy objections from 35% to 25% in one iteration: mechanisms, not reassurances
  • Testimonials doubled the share rate when engineered to the exact objection, not generic praise
  • The page hit a copy ceiling at 6.65/10: the remaining objection requires product experience, not more words
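The round-over-round mechanics behind these numbers reduce to a small aggregation step. A minimal sketch, assuming each persona eval carries an intent score and a list of objection tags (a hypothetical schema; the study's raw format isn't shown here):

```python
from statistics import mean

def summarize_round(evals):
    """Aggregate one simulation round of persona evals.

    `evals` is a list of dicts like {"intent": 7.0, "objections": ["privacy"]}.
    This per-persona schema is an assumption, not the study's published format.
    """
    return {
        "mean_intent": mean(e["intent"] for e in evals),
        "objection_rate": sum(1 for e in evals if e["objections"]) / len(evals),
    }

def intent_lift_pct(baseline, current):
    """Percent lift in mean intent between two round summaries."""
    return (current["mean_intent"] - baseline["mean_intent"]) / baseline["mean_intent"] * 100
```

On this arithmetic, a 6.65 final score over a ~4.29 baseline is the headline +55% lift.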

Homepage Simulation  ·  March 7, 2026
Otto: Baseline Homepage Study
4 rounds of synthetic persona simulation against joinotto.com. Strong baseline, three persistent objections, and a clear ceiling: some conversion problems require product evidence, not more copy.
6.20 baseline intent  ·  20/20 explore rate  ·  4 sim rounds  ·  80 persona evals  ·  $0 research cost

Key Findings

  • Strong baseline for the category: 6.20/10 intent, 20/20 would explore. The homepage communicates clearly and lands with the right audience
  • Free tier "$25K limit" confused 17/20 personas — unclear if it meant revenue, transactions, or something else
  • No free trial blocked 18/20 who earn more than $25K/year. Adding a 14-day trial was the highest single-round lift
  • AI accuracy anxiety persisted through all 4 rounds despite adding explanations. Personas want proof, not mechanism descriptions
  • Sweet spot audience is 2-6 years in: outgrown spreadsheets, not yet committed to a full accountant. Year-one and 10+ year veterans both scored lower
  • Pricing is not the conversion problem: guesses matched actual price, and the $500/month bookkeeper comparison landed cleanly
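Findings like "17/20 confused" fall out of tallying objection tags across the persona panel. A minimal sketch, again assuming a hypothetical per-eval list of objection tags:

```python
from collections import Counter

def objection_table(evals, panel_size=None):
    """Tally objection tags across persona evals, most common first.

    Returns (tag, "count/panel") pairs for n/20-style reporting.
    The tag schema is assumed, not taken from the study.
    """
    counts = Counter(tag for e in evals for tag in e["objections"])
    n = panel_size or len(evals)
    return [(tag, f"{c}/{n}") for tag, c in counts.most_common()]
```

Running this per round also makes the persistence of an objection (like the AI-accuracy anxiety that survived all four rounds) visible as a tag that never drops off the top of the table.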

AI Infrastructure  ·  March 6–7, 2026
Local LLM Eval Farm: 24 Models, Zero Cloud
Complete evaluation infrastructure on a single Mac Studio. Six eval dimensions, a 5.8x throughput advantage discovered in a different backend, and a routing system built on real data.
24 models  ·  6 eval dims  ·  5.8x MLX vs Ollama  ·  88k peak tok/s  ·  $0 cloud cost

Key Findings

  • Size is not quality for conversation: qwen2.5:7b (5GB) scored 100% multi-turn; llama3.3:70b (42GB) scored 47.8%
  • MLX delivers 5.8x aggregate throughput vs Ollama at 32 concurrent users; the gap is invisible at low concurrency, massive at scale
  • --decode-concurrency 8 made things worse. MLX's dynamic batcher outperforms any fixed value
  • qwen2.5:7b wins on value: 80.6% quality, 100% multi-turn, 93% domain, 10k tok/s, 5GB
  • When every model scores 0%, the task is broken: found and fixed a wrong answer key mid-run
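The last point generalizes into a cheap harness guard: when the entire roster zeroes out on one task, suspect the task before the models. A sketch of that check (function name and score layout are assumptions, not the farm's actual API):

```python
def suspect_tasks(scores, floor=0.0):
    """Return tasks where every model scored at or below `floor`.

    `scores` maps task -> {model: score}. If all 24 models hit 0% on
    the same task, the likelier bug is the task or its answer key,
    not 24 independent model failures.
    """
    return [task for task, by_model in scores.items()
            if by_model and all(s <= floor for s in by_model.values())]
```

Run between eval batches, a guard like this catches a wrong answer key mid-run instead of after the full sweep.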