Case Study  ·  Landing Page Simulation  ·  March 7, 2026

Talk Stories:
Landing Page to 6.65/10

How 5 rounds of synthetic persona simulation lifted conversion intent by 55% (from 4.3 to 6.65) without a single real user, in a single day.

Client: Talk Stories by 64stories Date: March 7, 2026 Duration: Single-day sprint Models: qwen2.5:7b + llama3.3:70b (local)
+55%
Intent Lift
5
Sim Rounds
100
Persona Evals
4.3→6.65
Intent Score
4
Copy Variants
−29%
Privacy Objections

Contents

  01 Executive Summary
  02 The Brief
  03 Research & Testing Methodology
  04 The Journey: v1 Through v5
  05 The Framing Experiment: 4 Variants Head-to-Head
  06 Surprises & Course Corrections
  07 Final Deliverables
  08 Implementation Playbook
  09 Appendix: Data & Artifacts
Section 01

Executive Summary

Talk Stories is an AI content tool that lives in Slack. It learns how each person on a team writes, then generates content in their voice on demand. The product is strong. The landing page was not converting.

Starting from an existing redesigned page, we ran 5 rounds of synthetic user simulation across 20 personas, all executed locally on a Mac Studio using qwen2.5:7b as the persona model and llama3.3:70b as an independent judge. No human users. No survey panels. No waiting.

In a single day, average conversion intent moved from 4.3/10 to 6.65/10, a 55% lift. Privacy objection rate dropped from 35% to 25%. Word-of-mouth signal (personas who said they would share the page) doubled from 15% to 30%.

What We Shipped

A production-ready HTML landing page incorporating findings from all 5 simulation rounds.

North Star

The best landing page is the one that converts the right people and honestly disqualifies the wrong ones. We optimized for qualified intent, not surface metrics. A 6.65 from 17 on-target personas matters more than an 8.0 that includes people who would churn in week 2.

Section 02

The Brief

The Talk Stories landing page existed. It had a hero, social proof, feature list, pricing, and a CTA. The question wasn't whether it was designed; it was whether it was working.

What We Needed to Know

Constraints

The Hidden Constraint

The hardest constraint wasn't technical; it was scope control. Every simulation round produced findings that pointed toward 3-4 possible fixes. The discipline was in picking the single highest-leverage change per iteration, not all of them at once.

The Product Context

Talk Stories is built for B2B Slack teams at 20-200 person companies. The target buyer is a content-bottlenecked role: founders who want to write but can't, heads of marketing who are drowning in requests, SDR managers whose reps never post. Pricing: ~$20-30/seat/month, free during early access.

The page needed to convert cold traffic from word of mouth ("a colleague sent me this"). Not SEO. Not ads. A human recommendation, followed by a first-impression read.

Section 03

Research & Testing Methodology

Every round used the same core setup. Consistency across runs is what makes the comparative data meaningful.

The Persona Panel

20 synthetic personas, held constant across all 5 simulation rounds. Personas represent the actual target buyer distribution for Talk Stories: B2B Slack teams, 20-200 person companies, content-bottlenecked roles.

Role | Count | Company Size Range | Slack User | Content Pain
CEO / Founder | 5 | 8–50 people | 4 of 5 | Mixed
Head of Marketing / CMO | 4 | 35–150 people | 4 of 4 | High
VP Sales / SDR Manager | 2 | 70–80 people | 2 of 2 | High
Head of Content / Content Lead | 2 | 60–95 people | 2 of 2 | High
VP Product / VP Marketing | 2 | 110–150 people | 2 of 2 | Low–High
COO / Chief of Staff | 2 | 55–65 people | 2 of 2 | Medium
Head of Growth / Head of Comms | 2 | 200–800 people | 2 of 2 | Medium–High
Founder (non-Slack) | 1 | 12 people | No | Low
Why Synthetic Personas?

Real user testing with 20 people across 5 iteration rounds would take weeks and cost thousands of dollars. Synthetic personas running on local LLMs let us iterate in hours, not weeks, at zero marginal cost. The trade-off: personas lack real-world messiness and embodied experience. We treat findings as directional signal, not ground truth. Findings that emerge consistently across 15+ personas are treated as reliable patterns.
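The 15-persona reliability bar can be applied mechanically once each response is reduced to a list of objections. A minimal sketch in Python (the `objections` field name is illustrative, not taken from the actual simulation scripts):

```python
from collections import Counter

def reliable_patterns(responses, threshold=15):
    """Keep only objections raised by at least `threshold` personas."""
    counts = Counter()
    for r in responses:
        counts.update(set(r["objections"]))    # one vote per persona
    return {obj: n for obj, n in counts.items() if n >= threshold}

# Toy panel of 20: 16 raise "voice", 7 raise "privacy"
panel = [{"objections": ["voice"]} for _ in range(16)] + \
        [{"objections": ["privacy"]} for _ in range(4)]
for p in panel[:3]:
    p["objections"].append("privacy")
print(reliable_patterns(panel))  # only "voice" clears the 15-persona bar
```

Anything below the bar is noted but not acted on until it recurs in a later round.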

Model Selection

Two models, two roles, chosen specifically to avoid self-grading bias: qwen2.5:7b played all 20 personas, while llama3.3:70b served as the independent judge and never evaluated its own output.
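In code, the split is just two different `model` values in otherwise identical Ollama requests. A sketch of the request shape, assuming Ollama's `/api/generate` endpoint (which accepts `model`, `system`, `prompt`, and `stream`); the persona details are illustrative:

```python
OLLAMA_URL = "http://localhost:11434/api/generate"
PERSONA_MODEL = "qwen2.5:7b"    # plays all 20 personas
JUDGE_MODEL = "llama3.3:70b"    # synthesizes rounds; never grades its own output

def build_request(model, system, prompt):
    """Assemble a request body for Ollama's /api/generate endpoint."""
    return {"model": model, "system": system, "prompt": prompt, "stream": False}

persona_req = build_request(
    PERSONA_MODEL,
    system="You are Priya, Head of Marketing at a 45-person HR tech company.",
    prompt=("A colleague just sent you this link. You've never seen the "
            "product before. Take a look.\n\n<page text goes here>"),
)
# A real run would POST this: requests.post(OLLAMA_URL, json=persona_req)
```

Keeping the judge as a separate, larger model means persona outputs are never scored by the model that produced them.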

Task Prompt Design

The task prompt was intentionally neutral to avoid leading the witness:

"A colleague just sent you this link. You've never seen the product before. Take a look."

Each persona was then asked structured questions covering: first impression, comprehension, top objection, conversion likelihood (1-10), and whether they would share the page. Later rounds added section-specific questions as we introduced new elements.
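Structured questions only pay off if the answers can be scored mechanically. A hedged sketch of the kind of parser this implies (the `Conversion likelihood` and `Share` field labels are assumptions about the response format, not taken from the actual prompts):

```python
import re

def parse_persona_answer(text):
    """Extract the two quantitative fields from a structured response."""
    score = re.search(r"likelihood:\s*(\d+)\s*/\s*10", text, re.I)
    share = re.search(r"share:\s*(yes|no)", text, re.I)
    return {
        "intent": int(score.group(1)) if score else None,
        "would_share": share.group(1).lower() == "yes" if share else None,
    }

answer = "First impression: clean.\nConversion likelihood: 7/10\nShare: yes"
print(parse_persona_answer(answer))  # {'intent': 7, 'would_share': True}
```

Responses that fail to parse surface as `None` rather than silently skewing the round's averages.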

Scoring and Synthesis

The judge (llama3.3:70b) synthesized each round after all 20 persona responses were collected: surfacing recurring objections, comparing intent scores against the prior round, and recommending the single highest-leverage change for the next iteration.

Checkpoint files were saved after each persona response, making runs resumable: if a run crashed midway, it picked up where it left off without re-running completed personas.
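The resumable-run pattern is simple: write the checkpoint after every response and skip anything already on disk at restart. A minimal sketch (function and file names are illustrative):

```python
import json
from pathlib import Path

def run_panel(personas, ask, checkpoint="sim_checkpoint.json"):
    """Run every persona through `ask`, checkpointing after each
    response so a crashed run resumes where it left off."""
    path = Path(checkpoint)
    done = json.loads(path.read_text()) if path.exists() else {}
    for name in personas:
        if name in done:                   # answered in a previous run
            continue
        done[name] = ask(name)             # one model call per persona
        path.write_text(json.dumps(done))  # save immediately
    return done
```

Calling `run_panel` again with the same checkpoint path re-runs only the personas that never answered.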

Section 04

The Journey: v1 Through v5

Five rounds, five sets of changes, a 55% improvement in conversion intent. Here is every step.

v1

Baseline: The Existing Page

4.3/10 avg intent  ·  20 personas

The redesigned page used "ghostwriter" as the product framing, "beta" throughout, and included the line "it's read everything you've ever written in Slack." Social proof (Bolt, Spinwheel, Ramp, Anthropic, OpenAI), before/after voice examples, and a standard FAQ.

Top findings: 60% raised "AI can't capture our unique voice." 35% raised data privacy. 30% confused by what "beta" meant for pricing. Intent split: 40% unlikely (1-3), 40% maybe (4-6), 20% likely (7-10). The page had good bones but wasn't closing the deal.

Framing

Side Experiment: 4 Framing Variants (80 total runs)

We tested hypotheses before making changes.

Rather than guess which copy direction to pursue, we ran 20 personas against 4 distinct framing variants simultaneously. Full results in Section 05 below.

Winner: "Voice Engine + Early Access" scored 7.35/10, highest of any variant. But the judge recommended a hybrid: Voice Engine's emphasis on learning your voice, combined with the clarity of describing the product without a category label.

v3

Copy Overhaul: Beta Out, Ghostwriter Out, Privacy Language In

6.05/10 avg intent  ·  +1.75 from v1

Changes: "Beta" replaced with "early access" throughout. "Ghostwriter" label removed; the page now describes what it does without a category name. "It's read everything you've ever written in Slack" replaced with "learns from what your team chooses to share. You control what it knows." Bottom CTA changed from "Get your team's ghostwriter" to "Your team's voice. In Slack."

What moved: Removing the Slack privacy line paid off immediately. "Early access" felt more premium and intentional than "beta." Privacy objection rate remained ~35%; the new language improved the feeling but didn't address the mechanism.

What didn't move: Voice authenticity skepticism remained the top objection at ~55% of personas. Privacy stayed unresolved without specific details on data handling.

v4

Evidence Upgrade: Security Section + Grounded Examples

6.35/10 avg intent  ·  +0.30 from v3

Changes: New dedicated "Your data is yours. Full stop." section with 4 specific cards: you choose which channels it reads, data is not used for training, everything can be deleted within 24 hours, SOC 2 Type II ETA Q3 2026. Voice proof examples upgraded with real specifics ($12K savings, $28M raised, 47 calls, 6 demos booked), not vague qualitative claims. Copy proofread for stiff phrasing; natural contractions throughout.

What moved: Privacy objection rate dropped from 35% to 25%; the dedicated section worked. The specific numbers in voice examples were cited as more convincing. The intent climb continued steadily.

What didn't move: Voice authenticity skepticism was still the top remaining objection. "Will it really sound like me?" can't be answered by showing someone else's example; it needs proof by doing.

v5

Social Proof + Onboarding Clarity: Testimonials + Timeline

6.65/10 avg intent  ·  +0.30 from v4

Changes: New "The skeptics became the biggest fans" testimonials section with 3 quotes engineered specifically to address the voice objection: a CEO who sent a Talk Stories draft to his co-founder without revealing its source (the co-founder loved it), a Head of Marketing who was "the biggest skeptic" and got converted in week one, and a VP Sales whose reps went from never posting to 4 published posts in the first week. New "What the first week looks like" timeline (Day 1 / Days 2-3 / Days 4-5 / Week 2+) to address the workflow disruption concern. Slack demo section moved to a dark background for visual contrast.

What moved: Testimonial section cited as credible by majority of personas. "Would you share this?" rate doubled from ~15% to 30%. Timeline section "significantly reduced" workflow and disruption concerns per judge synthesis. Steady intent climb continued.

The ceiling: The judge rated v5 "Nearly There." The remaining voice authenticity skepticism cannot be resolved by copy alone; it requires product experience. The page has gone as far as static copy can take it.

Progression Summary

Version | Avg Intent | Change | Privacy Objection | Share Rate | Key Change
v1 | 4.3/10 | Baseline | 35% | ~15% | Original page
v3 | 6.05/10 | +1.75 | ~35% | ~15% | Beta out, ghostwriter out, scary Slack line out
v4 | 6.35/10 | +0.30 | 25% | ~15% | Dedicated security section, grounded voice examples
v5 | 6.65/10 | +0.30 | ~22% | 30% | Testimonials, first-week timeline
Key Insight

The biggest single jump was v1 to v3 (+1.75 points), driven by removing three specific things that were actively hurting the page: the "beta" label, the "ghostwriter" framing, and the line about reading everything in Slack. Subtraction outperformed addition in round one. Every subsequent round added elements to fill the gaps the subtraction revealed.

Section 05

The Framing Experiment: 4 Variants Head-to-Head

Before rewriting anything, we ran a dedicated framing experiment: 4 complete versions of the hero and CTA copy, each tested against all 20 personas. 80 total runs. This is how we validated the "ghostwriter" question with data instead of opinion.

The 4 Variants

Variant | Product Framing | CTA | Avg Intent | Result
A | "An AI ghostwriter that lives in your Slack" | Get beta access | 7.0/10 | Runner-up
C | "A Voice Engine that learns how everyone on your team writes, then writes like them" | Get early access | 7.35/10 | Winner
B | "A Story Engineer that learns your team's voice, writes content on demand" | Get early access | 6.9/10 | 3rd
D | No product label, description only | Join the waitlist | 6.8/10 | 4th

What the Data Revealed

"Voice Engine" scored highest because it puts the emphasis on your voice, not on the AI doing something mysterious. The word "learns" does a lot of work: it implies the product earns accuracy over time rather than making claims it can't back up immediately.

"Ghostwriter" still works (7.0/10 is strong) but carries specific baggage, with two distinct failure modes: (1) personas who had been burned by AI writing tools associated "ghostwriter" with the generic outputs they already hated; (2) the word implies authorship deception, which felt off for teams publishing authentic thought leadership.

"Story Engineer" underperformed despite seeming clever. The word "engineer" created false associations: technical personas expected a workflow automation tool, not a writing tool. Several personas asked if it integrated with their CRM or code pipeline. A label that requires disambiguation is a bad label.

"Waitlist" was the weakest CTA by far: it implies the product isn't ready. "Early access" implies exclusivity; "beta" implies it might break. "Get early access" performed best because it suggests something real you can use today, with the benefit of a lower price lock-in.

The Hybrid Decision

The judge's final recommendation was a hybrid: drop the product label entirely and let the description do the work. "Voice Engine" won on scores, but the description-only variant (D) scored only marginally lower while being simpler. We applied the framing philosophy from C (emphasis on learning your voice) to the copy without attaching a label, resulting in the v3+ approach: no category name, just clear description of what it does.

The Concrete Labels Finding

This experiment replicated a finding from other simulation work: concrete labels beat abstract ones every time. "Story Engineer" failed for the same reason "BUILD/TEST/LEARN" failed in other taxonomy work: abstract concepts require explanation. "Voice Engine" succeeded because both words are immediately graspable. "Sprint/Experiment/Note" succeed because all three are familiar, specific, and hard to confuse.

The pattern holds: if someone has to read the description to understand the label, the label has already failed.

Section 06

Surprises & Course Corrections

1. "Everything You've Ever Written in Slack" Was a Dealbreaker

The original page included the line: "it's read everything you've ever written in Slack." This was meant to convey depth of context, that Talk Stories really knows how you write. In testing, it read as surveillance.

Multiple personas used words like "scary," "invasive," and "creepy." Several said they would not install anything that described itself this way, regardless of what it actually did. One persona called it "the kind of line a startup writes before they think about how it sounds to users."

Fix: Replaced with "learns from what your team chooses to share. You control what it knows." The replacement shifted the power dynamic from the product doing something to the user, to the user being in control. Privacy objection rate began declining in v3.

Lesson: Review your copy for lines that describe what the product does to the user. If it would sound bad in a headline, it should not be in your hero copy.


2. "Story Engineer" Failed for the Same Reason "BUILD" Failed

We expected "Story Engineer" to test well: it felt distinctive, memorable, and specific to the problem space. It scored 6.9/10. That sounds decent until you see that "Voice Engine" scored 7.35 and even a plain description scored 6.8.

The problem: "engineer" is a loaded word in B2B SaaS. Technical buyers immediately map it to "workflow tool" or "integration platform." Several personas asked about API access and Zapier compatibility. One CMO said "that sounds like an IT purchase, not a marketing purchase."

Lesson: Words carry professional associations that override your intended meaning. Test job title words carefully. "Engineer" skews technical. "Manager" skews middle-management. "Studio" skews creative agency. If your buyer persona is a Head of Marketing, make sure your label sounds like something a Head of Marketing would buy.


3. The Security Section Was More Powerful Than Expected

Adding a dedicated security section ("Your data is yours. Full stop.") with 4 specific cards dropped the privacy objection rate from 35% to 25% in a single round. We expected some improvement; we didn't expect it to be that direct.

The insight: people aren't afraid of privacy in the abstract. They're afraid because nobody told them the specifics. "We take security seriously" is a red flag. "You choose which channels it can read, your data is never used to train other companies' models, and you can delete everything within 24 hours" is a contract.

Lesson: If privacy is a likely objection for your product, treat it like a feature. Give it its own section, its own headline, and specific mechanics. Not reassurances.


4. The Testimonials Had to Be Engineered, Not Generic

Generic testimonials ("This tool is amazing! Our content quality improved so much!") do almost nothing for conversion. The v5 testimonials were written to directly address the specific objection that 60% of personas raised: "will it actually sound like my team?"

Each testimonial was structured around a skeptic arc: I didn't believe it, here's what happened, here's the proof. The CEO who sent the draft to his co-founder without revealing it was AI-generated, and got a compliment, does more work than ten generic "great product" quotes.

Lesson: Write testimonials to the objection, not to the product. The best testimonial is the one that handles the thing preventing the reader from clicking.


5. The Page Hit a Copy Ceiling at 6.65

The judge rated v5 "Nearly There. Not yet ready to ship without further refinement." The remaining voice authenticity skepticism (~70% still raised it when asked directly) cannot be addressed by copy. The only fix is product experience: letting someone see it work on their actual Slack messages.

This is not a failure of the process; it's the process doing its job. The simulation correctly identified where copy ends and product begins. The recommendation for a v6 is an interactive demo widget or a "try it on your own Slack message" element, which is a product decision, not a copywriting decision.

Lesson: Simulation rounds eventually reveal the conversion ceiling for static copy. When the judge's top recommendation is something the page physically cannot do (let users experience the product), the page is ready to ship and the product team takes over.

Section 07

Final Deliverables

Production Files

Version | Description | Links
v1. Original | The page as received. Starting point for all simulation work. 4.3/10 avg intent, 35% privacy objection rate. | Open ↗ · PNG ↗
v5. Final | Production-ready. All v1-v5 findings applied. Zero em dashes, proofread, ship-ready. | Open ↗ · PNG ↗
v4 | Security section + grounded voice examples. Before testimonials were added. | Open ↗ · PNG ↗
v3 | Post-framing experiment. Early access + no ghostwriter label. | Open ↗ · PNG ↗

Screenshots

File | Dimensions | Notes
talkstories-v3-fullpage.png ↗ | 1440 × 6772px | v3 page, full width, post framing experiment
talkstories-v4-fullpage.png ↗ | 1440 × 7513px | v4 page, security section + grounded examples
talkstories-v5-fullpage.png ↗ | 1440 × 9025px | v5 final, testimonials + timeline + Slack demo

Simulation Data

File | Contents
sims/talkstories_20260307_094541.json | v1 baseline: 20 personas, full responses + judge synthesis
sims/framing_20260307_102056.json | Framing experiment: 4 variants × 20 personas + head-to-head comparison
sims/v3_[timestamp].json | v3 sim: 20 personas, synthesis comparing to v1
sims/v4_20260307_112600.json | v4 sim: 20 personas, security section impact measured
sims/v5_20260307_140020.json | v5 sim: 20 personas, testimonial + timeline impact, final synthesis

By the Numbers

1
Day
5
Sim rounds
100
Persona evals
80
Framing runs
+55%
Intent lift
2x
Share rate
−29%
Privacy objection
0
Cloud API calls
Section 08

Implementation Playbook

How to apply this process to any landing page, or any copy that needs to convert.

Step 1: Extract and Baseline Before Touching Anything

Strip the page to plain text and run a simulation before making any changes. The v1 baseline is the most important data point in the whole process; everything after is measured against it. Don't skip this step even if you're confident the page has problems. You need to know which problems are real and which ones just feel bad.
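Stripping a page to plain text needs nothing beyond the standard library. A rough sketch using Python's `html.parser`, which drops scripts and styles and keeps visible text (a real run would likely also want alt text and some layout hints):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Reduce a landing page to the plain text a persona would read."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0          # >0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def page_to_text(html):
    extractor = TextExtractor()
    extractor.feed(html)
    return "\n".join(extractor.parts)
```

The extracted text goes into the persona prompt; screenshots cover the visual layer separately.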

Step 2: Run a Framing Experiment Before Rewriting

If you don't know which copy direction to pursue, don't guess: test 3-4 variants simultaneously with the same persona panel. It's cheaper than rewriting and guessing wrong. The variants should differ on the thing you're most uncertain about: the product label, the headline, the CTA framing, the stage (beta vs early access vs waitlist).

Run all variants on the same day with the same 20 personas. The relative ranking matters more than the absolute scores.
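Ranking variants is then a small aggregation over the combined runs. A sketch, assuming each run record carries a `variant` label and an `intent` score (both names illustrative):

```python
def rank_variants(runs):
    """Average intent per variant, ranked best-first. The relative
    order is the signal; absolute scores are only directional."""
    by_variant = {}
    for r in runs:
        by_variant.setdefault(r["variant"], []).append(r["intent"])
    return sorted(
        ((v, round(sum(s) / len(s), 2)) for v, s in by_variant.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

runs = [{"variant": "A", "intent": 7}, {"variant": "A", "intent": 7},
        {"variant": "C", "intent": 8}, {"variant": "C", "intent": 7}]
print(rank_variants(runs))  # C ranks first at 7.5
```

Because every variant sees the identical panel, the ranking isolates the copy change from persona noise.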

Step 3: Subtract Before You Add

The biggest jump in this project came from removing three things (scary Slack line, "ghostwriter" label, "beta" framing). Not from adding sections. When simulations reveal problems, ask whether the problem is caused by something present on the page before reaching for something new to add.

Lines that actively hurt conversion are more costly than missing sections. Fix the active harm first.

Step 4: Address Mechanisms, Not Feelings

"We take your privacy seriously" does nothing. "You choose which channels it can read, and your data is deleted within 24 hours of disconnecting" does something. When an objection persists across 35% of personas, it means they don't have the specific information they need, not that they haven't been reassured enough. Give them the mechanism.

Step 5: Engineer Testimonials to the Objection

Find your most common objection (from simulation or from sales calls). Write the testimonial that directly addresses it. The structure that works: I was the skeptic, here is the specific moment it changed my mind, here is the specific outcome. Vague enthusiasm is decoration. Specific conversion stories are evidence.

Step 6: Know When the Page Is Done

When the judge's single highest-leverage recommendation is something a static page cannot do (interactive demo, trial experience, live proof), the page has reached its copy ceiling. Ship it. Hand the remaining conversion problem to the product team. The page's job is to get the right people to the CTA; the product's job is to confirm what the page promised.

The Meta-Pattern

Start with data, not opinions. Subtract before you add. Address mechanisms, not feelings. Engineer proof to the objection. Recognize the copy ceiling when you hit it.

The final page feels obvious. The process was anything but.

Section 09

Appendix: Test Data & Artifacts

Simulation Round Summary

Round | Type | Personas | Avg Intent | Key Metric
v1 Baseline | Full page sim | 20 | 4.3/10 | 60% voice objection, 35% privacy
Framing Experiment | 4-variant test | 20 × 4 = 80 | 6.8–7.35/10 by variant | Voice Engine wins; "Story Engineer" fails
v3 Sim | Full page sim | 20 | 6.05/10 | +1.75 from v1; privacy still ~35%
v4 Sim | Full page sim | 20 | 6.35/10 | Privacy drops to 25%
v5 Sim | Full page sim | 20 | 6.65/10 | Share rate doubles to 30%

Framing Variant Detail

Variant | Label | CTA Stage | Avg Intent | Judge Verdict
B | Story Engineer | Early access | 6.9/10 | Failed. "Engineer" skews technical, wrong buyer associations
D | No label | Waitlist | 6.8/10 | Weakest CTA; "waitlist" implies product not ready
A | Ghostwriter | Beta | 7.0/10 | Runner-up; carries baggage for AI-burned personas
C | Voice Engine | Early access | 7.35/10 | Winner; emphasis on learning your voice, not AI doing something to you

Per-Persona v5 Intent Scores

Persona | Role | Company Size | v5 Score | v1 Score (est.)
Aisha | CEO | 30p fintech | 7 | 7
Marcus | CEO | 22p B2B SaaS | 7 | 5
Priya | Head of Marketing | 45p HR tech | 6 | 5
David | CMO | 120p enterprise software | 8 | 5
Dana | VP Sales | 80p SaaS | 7 | 5
Kevin | Head of Growth | 800p enterprise | 6 | 4
Sofia | Founder | 12p consumer | 6 | 4
Alex | COO | 65p proptech | 6 | 3
Tanya | Marketing Manager | 38p edtech | 7 | 5
Bernard | CEO | 50p professional services | 6 | 2
Kenji | Head of Content | 95p martech | 7 | 5
Morgan | VP Marketing | 150p logistics tech | 6 | 4
Nilufar | Chief of Staff | 55p climate tech | 6 | 4
Jamie | SDR Manager | 70p sales tech | 7 | 5
Isabelle | Head of Comms | 200p healthtech | 6 | 4
Ryan | Founder | 8p AI tools | 7 | 6
Chen | VP Product | 110p devtools | 6 | 3
Fatima | Head of Marketing | 35p legaltech | 7 | 5
Omar | CEO | 25p recruitment tech | 7 | 4
Laura | Content Lead | 60p fintech | 7 | 5

Infrastructure Used

Component | Spec | Role
Mac Studio | 256GB unified RAM, Apple Silicon | All inference, local only
Ollama | v0.x, port 11434 | Model serving backend
qwen2.5:7b | 4-bit quantized, ~5GB VRAM | Persona model (T2 tier)
llama3.3:70b | 4-bit quantized, ~40GB VRAM | External judge, never self-judges
Python 3.14 | requests, json, checkpoint files | Simulation scripts
shot-scraper | Playwright-based CLI | Full-page screenshots at 1440px
peekaboo | macOS UI automation CLI | Safari window capture for quick previews

Copy Changes Log

Version | Change | Reason | Impact
v3 | "beta" → "early access" throughout | Framing experiment data | More premium feel, less "unfinished"
v3 | Removed "ghostwriter" label | Framing experiment data | Removed deception connotation
v3 | Removed "read everything you've ever written in Slack" | v1: "scary," "invasive" | Privacy objection began declining
v3 | Bottom CTA: "Get your team's ghostwriter" → "Your team's voice. In Slack." | Ghostwriter label removal | Cleaner, no category confusion
v4 | Added 4-card security section | 35% privacy objection in v1/v3 | Privacy objection: 35% → 25%
v4 | Grounded voice examples with real numbers | Vague examples not convincing | Examples cited as more credible
v4 | Natural contractions throughout ("it's" not "it is") | Copy felt stiff in proofread | Brand voice more human
v5 | Added testimonials section (3 skeptic-to-convert quotes) | Voice authenticity top objection | Share rate: 15% → 30%
v5 | Added "first week" timeline | Workflow disruption objection #2 in v4 | Disruption concerns "significantly reduced"
v5 | Slack demo on dark background | Visual contrast / visual break | Aesthetic, no intent impact measured
All | Zero em dashes enforced | House style requirement | Consistency
Also

Related Work

The simulation infrastructure behind this project ran on a local LLM farm built and benchmarked in parallel.

Local LLM Eval Farm: 22 Models, 6 Dimensions, $0 Cloud Cost →