GEO Case Studies: Real Generative Engine Optimization Results

A credible GEO case study proves a measurable lift in how often AI engines mention or cite a brand — and in its share of voice against named rivals — using a clear before/after method on the same prompts and the same engines.

TL;DR

Proof, not promises: a real GEO case study shows a baseline, an intervention, and a re-measure — not a single screenshot of one good answer.
Three metrics matter: mentions, citations and share of voice are different things; a trustworthy case separates them and names the engines.
Our policy: CapstonAI publishes verified cases only. This page ships with marked placeholders until real client data is validated — and we never guarantee citations.

What a credible GEO case study must show

The internet is filling with “GEO case studies” that prove nothing. A flattering screenshot of ChatGPT naming a brand once is not evidence — it’s a moment. Generative engines are probabilistic: ask the same question twice and you can get two different answers, with different sources cited. So before you believe any result, including ours, hold it to a standard. Learning to read proof is itself a core part of generative engine optimization, because the discipline only matures when claims can be checked.

1. A clear baseline before anything changed

Every honest case starts by measuring the starting point. What share of relevant prompts mentioned the brand? How often was it cited with a link? Which competitors were named instead? Without a documented baseline, there is no “before,” and therefore no real “after.” A case that opens with the result and skips the starting line is asking you to take the lift on faith.

2. A named, specific intervention

What actually changed between the two measurements? Strong cases describe the work: structured data added, content restructured for extractability, an llms.txt published, entity signals strengthened, a comparison page created. Vague language like “we optimized the brand” hides whether the lift came from the intervention or from background noise. The more specific the action, the more believable the outcome.

3. A re-measure on the same prompts and the same engines

The “after” has to be measured the same way as the “before”: same prompts, same engines, and ideally a measurement window of the same length. If the baseline tracked ChatGPT and Perplexity but the result quietly adds Gemini where the brand happened to do well, the comparison is rigged. A credible case re-runs the identical measurement set so the only meaningful variable is the work that was done.
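One way to make the like-for-like rule checkable is to compare the set of (prompt, engine) pairs in each measurement before comparing any numbers. The sketch below is illustrative only: the `Run` record and the `compare_sets` helper are hypothetical names invented for this example, not part of any CapstonAI API.

```python
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    """One measured answer: which prompt was asked on which engine (hypothetical record)."""
    prompt: str
    engine: str


def measurement_set(runs: Iterable[Run]) -> set[tuple[str, str]]:
    # Normalise case and whitespace so "ChatGPT" and "chatgpt " count as the same engine.
    return {(r.prompt.strip().lower(), r.engine.strip().lower()) for r in runs}


def compare_sets(before: Iterable[Run], after: Iterable[Run]) -> dict:
    """Report which (prompt, engine) pairs were dropped or added between measurements."""
    b, a = measurement_set(before), measurement_set(after)
    return {
        "like_for_like": b == a,
        "dropped": sorted(b - a),
        "added": sorted(a - b),
    }


if __name__ == "__main__":
    before = [Run("best crm", "ChatGPT"), Run("best crm", "Perplexity")]
    # The "after" quietly adds an engine that was not in the baseline.
    after = before + [Run("best crm", "Gemini")]
    print(compare_sets(before, after))
```

If `like_for_like` is false, the before/after numbers describe two different measurement sets and should not be compared directly.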

4. The right metric, named honestly

These three are routinely blurred, and the difference decides whether a claim is impressive or empty:

Metric	What it actually means
Mentions	The engine names your brand in its answer — but may not link to you. Useful for awareness, weaker as proof of authority.
Citations	The engine cites your site as a source, usually with a link. Harder to earn and the strongest signal that your content is trusted.
Share of voice	How often you appear relative to named competitors for the same set of prompts. The metric that shows whether you’re winning or losing the category.

A case that says “mentions tripled” but never touched citations or share of voice may be describing noise. Always ask which metric moved, and against whom.
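To see why the three metrics can move independently, it helps to compute them from the same set of answers. The following is a minimal sketch under stated assumptions: the `Answer` record is hypothetical, and share of voice is defined here as the brand's mentions divided by all mentions of the tracked brands, which is one reasonable formula among several.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """One engine answer to one prompt (hypothetical record for illustration)."""
    prompt: str
    engine: str
    brands_named: set[str] = field(default_factory=set)
    domains_cited: set[str] = field(default_factory=set)


def mention_rate(answers: list[Answer], brand: str) -> float:
    # Share of answers that name the brand, with or without a link.
    if not answers:
        raise ValueError("no answers to measure")
    return sum(brand in a.brands_named for a in answers) / len(answers)


def citation_rate(answers: list[Answer], domain: str) -> float:
    # Share of answers that cite the brand's site as a source.
    if not answers:
        raise ValueError("no answers to measure")
    return sum(domain in a.domains_cited for a in answers) / len(answers)


def share_of_voice(answers: list[Answer], brand: str, competitors: list[str]) -> float:
    # Assumed definition: brand mentions / mentions of all tracked brands.
    tracked = {brand, *competitors}
    counts = {b: sum(b in a.brands_named for a in answers) for b in tracked}
    total = sum(counts.values())
    return counts[brand] / total if total else 0.0


if __name__ == "__main__":
    answers = [
        Answer("p1", "chatgpt", {"Acme", "Rival"}, {"acme.example"}),
        Answer("p2", "chatgpt", {"Rival"}),
        Answer("p3", "perplexity", {"Acme"}),
        Answer("p4", "perplexity", {"Rival"}, {"rival.example"}),
    ]
    print(mention_rate(answers, "Acme"))
    print(citation_rate(answers, "acme.example"))
    print(share_of_voice(answers, "Acme", ["Rival"]))
```

In this made-up data the brand is named in half the answers, cited in a quarter, and still trails the competitor on share of voice, which is why a claim should say which of the three moved.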

5. Controls for engine variance

Single runs are noisy. Because answers vary between identical prompts, a serious measurement repeats each prompt multiple times and reports a rate, not a one-off. If a case study rests on one query on one day, treat it as anecdote. The trustworthy version averages across repeated runs so that random variation doesn’t masquerade as a result.
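Reporting a rate is more convincing when it comes with a sense of how uncertain it is. As a rough sketch, the code below puts a Wilson score interval around a citation rate and checks whether the before and after intervals overlap. The choice of the Wilson interval and the overlap heuristic are assumptions made for this example, not a method described by any engine or by CapstonAI, and overlap is a coarse screen rather than a formal significance test.

```python
import math


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= hits <= runs:
        raise ValueError("hits must be between 0 and runs")
    p = hits / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    # True when the two ranges share at least one point.
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Same observed rates (20% before, 70% after) at two different sample sizes.
    for runs in (10, 100):
        before = wilson_interval(hits=runs * 2 // 10, runs=runs)
        after = wilson_interval(hits=runs * 7 // 10, runs=runs)
        print(runs, before, after, intervals_overlap(before, after))
```

With only ten runs per measurement, even a jump from 20% to 70% leaves overlapping intervals; with a hundred runs the same rates separate cleanly. That is the practical argument for repeating each prompt many times.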

6. A stated time window

GEO outcomes are not instant. Engines re-crawl, re-index and update their training and retrieval layers on their own schedules. A real case names the window between intervention and re-measure — weeks, not hours — so you can judge whether the lift had time to be real rather than coincidental.

Red flags: no baseline, a single screenshot, undated results, an unnamed engine, “guaranteed” citations, or a metric that’s never specified. Any one of these means the case proves less than it claims. If you want the underlying playbook, see how to do generative engine optimization.
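The red flags above can be turned into a simple checklist you run against any case study you are asked to believe. This is a hypothetical sketch: the `CaseStudy` fields and the `red_flags` function are invented for illustration, and filling them in still requires reading the case carefully.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

KNOWN_METRICS = {"mentions", "citations", "share of voice"}


@dataclass
class CaseStudy:
    """What a reader could extract from a published case (hypothetical fields)."""
    has_baseline: bool
    runs_per_prompt: int
    engines: list[str] = field(default_factory=list)
    metric: Optional[str] = None
    measured_on: Optional[date] = None
    guarantees_citations: bool = False


def red_flags(case: CaseStudy) -> list[str]:
    """Return the red flags, in the order listed in the text, that apply to a case."""
    flags = []
    if not case.has_baseline:
        flags.append("no baseline")
    if case.runs_per_prompt <= 1:
        flags.append("single screenshot or single run")
    if case.measured_on is None:
        flags.append("undated results")
    if not case.engines:
        flags.append("unnamed engine")
    if case.guarantees_citations:
        flags.append("guaranteed citations")
    if case.metric not in KNOWN_METRICS:
        flags.append("metric never specified")
    return flags


if __name__ == "__main__":
    weak = CaseStudy(has_baseline=False, runs_per_prompt=1, guarantees_citations=True)
    print(red_flags(weak))
```

An empty list does not prove a case is sound; it only means none of these specific warning signs were found.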

How CapstonAI measures GEO outcomes

CapstonAI is a measurement and methodology platform, not an agency. Our entire model is built so that a case study can actually be verified — because we measure the same way before and after, every time. Here’s the loop the platform runs:

Free baseline scan. It starts at app.capston.ai/audit. The scan establishes your “before”: which prompts mention you, which cite you, and who’s named instead. That baseline is what any later result is compared against.
Per-prompt tracking across the major engines. The platform tracks mentions, citations and share of voice for each prompt across ChatGPT, Perplexity, Gemini and Google AI Overviews — separately, because each engine selects and cites sources differently. Aggregate-only tools hide where you actually win or lose.
Act on structural gaps. The reasons a brand goes uncited are usually structural: unparseable content, missing schema, no llms.txt, weak entity signals. CapstonAI connects measurement to action through agents for WordPress, Shopify, Drupal and Chrome, so the gaps a scan surfaces can be fixed in your stack.
Re-scan and compare the trend. After the work, the platform re-runs the same measurement and shows the trend over time, not a single snapshot. That repeated, like-for-like comparison is what turns activity into a defensible result.
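The loop above comes down to a per-engine, like-for-like comparison. The sketch below is not CapstonAI's implementation; it is a minimal illustration with hypothetical function names, showing how a per-engine delta can reveal a gain on one engine and a loss on another that a single aggregate number would blur.

```python
from collections import defaultdict
from typing import Iterable

# Each observation is (engine, cited), where cited is True if the brand's site was cited.
Observation = tuple[str, bool]


def rate_by_engine(observations: Iterable[Observation]) -> dict[str, float]:
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for engine, cited in observations:
        totals[engine] += 1
        hits[engine] += int(cited)
    return {engine: hits[engine] / totals[engine] for engine in totals}


def delta_by_engine(before: Iterable[Observation], after: Iterable[Observation]) -> dict[str, float]:
    """Change in citation rate per engine; refuses to compare different engine sets."""
    b, a = rate_by_engine(before), rate_by_engine(after)
    if b.keys() != a.keys():
        raise ValueError("before and after must cover the same engines")
    return {engine: a[engine] - b[engine] for engine in b}


if __name__ == "__main__":
    before = [("chatgpt", True)] + [("chatgpt", False)] * 3 \
           + [("perplexity", True)] * 2 + [("perplexity", False)] * 2
    after = [("chatgpt", True)] * 3 + [("chatgpt", False)] \
          + [("perplexity", True)] + [("perplexity", False)] * 3
    print(delta_by_engine(before, after))
```

In this invented data the aggregate citation rate rises from 3 of 8 to 4 of 8, which hides the fact that one engine improved sharply while the other got worse.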

This is deliberately the same structure we described above for a credible case study — baseline, intervention, re-measure, named engines, repeated runs. If you’re weighing this against classic search work, GEO compared with SEO explains why ranking on Google and being cited by ChatGPT are separate outcomes that need separate measurement. And if you want to compare measurement platforms directly, see our overview of GEO tools.

We run this on ourselves (dogfooding)

CapstonAI uses its own platform to track and improve CapstonAI’s AI visibility. We’re not above the discipline we sell — we run the same baseline scan on ourselves, monitor how often the major engines mention and cite us for the questions our buyers ask, watch our share of voice against the rest of the GEO category, and act on the structural gaps the scan surfaces using the same agents we offer customers.

We’re describing the practice, not parading metrics. We won’t publish our own before/after numbers here as marketing, because that would hold us to a lower standard than the one we just laid out — verified, dated, repeatable. What we will say plainly is that dogfooding keeps the product honest: when an engine stops citing us, we feel it first, and fixing it on our own properties is how we pressure-test what we recommend to you.

Case studies

Below is where CapstonAI’s verified client and self-measured cases will appear — each with a named context, the metric that moved, the before/after method, the engines tested, the time window, and a checkable source. We publish a case only after the data is validated, so that every figure on this page can survive the scrutiny we ask you to apply.

[case data, pending product validation: client, sector, before/after metric, source, date]

Until then, treat the absence of numbers as a feature, not a gap: a missing case study is more trustworthy than a fabricated one. If you’d like to become one of these cases, the path starts with a baseline — covered next.

How to request your own baseline

Every case study begins the same way: with a measured starting point. You can generate yours in minutes. Run the free scan at app.capston.ai/audit to see, across ChatGPT, Perplexity, Gemini and Google AI Overviews, where your brand is mentioned, where it’s cited, and who’s winning the answers you should be in. That baseline is the “before” — and the foundation of any real result you measure later.

From there, the platform tracks the trend over time and helps you close structural gaps through agents for WordPress, Shopify, Drupal and Chrome. We measure and help you improve. We do not, and will not, guarantee that an AI engine will cite you — no honest GEO platform can.

Frequently asked questions

What makes a GEO case study trustworthy?

A documented baseline, a named intervention, a re-measure on the same prompts and engines, a clearly stated metric (mentions, citations or share of voice), repeated runs to control for engine variance, and a stated time window. Missing any of these weakens the claim.

What’s the difference between mentions, citations and share of voice?

A mention is the engine naming your brand. A citation is the engine linking your site as a source — harder to earn and a stronger signal. Share of voice is how often you appear relative to named competitors for the same prompts.

How long before GEO results show?

Weeks, not hours. Generative engines re-crawl, re-index and update retrieval on their own schedules, so a credible case names the time window between the intervention and the re-measure. Anyone promising overnight results is not measuring carefully.

Can you guarantee citations?

No. CapstonAI is a measurement and methodology platform, not an agency, and no honest GEO tool can guarantee an AI engine will cite you. We measure your visibility, surface the structural gaps, and help you improve the odds — the engines decide.

Why does this page have placeholders instead of numbers?

Because we publish verified cases only. The marked placeholder is intentional: a missing case study is more honest than an invented one. Real client and self-measured cases will be inserted once the data is validated against the standard described above.

How do I get a GEO baseline for my brand?

Run the free AI visibility scan at app.capston.ai/audit. It measures your mentions, citations and share of voice across ChatGPT, Perplexity, Gemini and Google AI Overviews — the “before” that any future result is compared against.
