
Intro
Most generative engine optimisation (GEO) conversations collapse two very different events into a single metric. “Did the engine cite us?” is treated as a binary win. It is not.
Getting picked as a source by ChatGPT, Perplexity, Google AI Overviews or Gemini is the first stage — citation selection. Being usefully absorbed into the generated answer, with the brand’s claims surviving paraphrase and ending up in front of the user, is a separate second stage — citation absorption. A page can be selected and barely absorbed. A page can be absorbed heavily without being visibly cited. Both stages must be optimised, and they pull on different levers.
This page sets out the framework, the empirical evidence behind it, and how it maps onto Capston Core measurement. Read it before starting any AI visibility scoring work.
Why two stages, not one
A generative engine does not write answers from your homepage. It writes them from a working set.
The retrieval pipeline first fetches a candidate pool of URLs: search results, indexed snippets, cached embeddings. From that pool it selects a smaller set of sources judged eligible to inform the answer. Only then does it absorb material from those selected sources into the synthesised response — extracting claims, weighting them, paraphrasing, occasionally quoting, and threading them through the user-facing answer.
Selection and absorption use different signals. Selection rewards authority, domain recognition, language match and topical fit. Absorption rewards semantic alignment with the prompt, evidence density, structural legibility (clean headings, tight paragraphs, scannable lists), and the presence of concrete facts the model can lift without rewriting.
Treating GEO as a single funnel hides the diagnosis. A brand that is rarely selected has a source-authority problem. A brand that is selected but rarely absorbed has a content-shape problem. The fixes are not the same.
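As a rough illustration of the diagnosis above, the sketch below routes a brand to one of the two problem types from two numbers. The function name, the two input rates and both thresholds are hypothetical assumptions chosen for illustration; they are not values from the study or from Capston Core.

```python
def diagnose(selection_rate: float, absorption_weight: float,
             min_selection: float = 0.3, min_absorption: float = 0.1) -> str:
    """Name the likely bottleneck for one brand on one engine.

    selection_rate: share of prompts where the brand is cited (0..1).
    absorption_weight: share of answer text traceable to the brand (0..1).
    Both thresholds are illustrative assumptions, not published cut-offs.
    """
    if selection_rate < min_selection:
        # Rarely selected: the page never gets the chance to be absorbed.
        return "source-authority problem"
    if absorption_weight < min_absorption:
        # Selected often, but little of the page reaches the answer.
        return "content-shape problem"
    return "no clear bottleneck"


print(diagnose(0.05, 0.02))  # rarely selected
print(diagnose(0.60, 0.03))  # selected but barely absorbed
```

Selection is checked first because a brand that is never selected has nothing to absorb, so a low absorption figure is only informative once selection is adequate.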
Stage 1: Selection
Selection answers a narrow question: is this URL eligible to inform the answer at all?
The variables that move citation selection are mostly off-page and structural.
- Source authority — domain rating, mentions in trusted indices, presence in the engine’s training corpus, references from publications the engine already trusts.
- Recognizability — the brand and the domain are named together often enough that the engine treats them as a single entity.
- Language and locale match — the URL serves content in the language and region of the prompt, with hreflang aligned and canonical clean.
- Domain context — the wider domain is topically coherent. A single excellent page on an otherwise unrelated domain is selected less often than the same page on a domain whose whole topic graph reinforces it.
- Freshness signals — last-modified dates, recent inbound mentions, recent on-page updates.
If selection is the bottleneck, the work is editorial PR, entity consistency across the data evidence layer, and tightening the domain’s topical perimeter. Writing more pages does not fix a selection problem.
Stage 2: Absorption
Absorption answers a different question: once selected, how much of this page actually ends up in the answer?
The variables here are on-page, and structural in the sense of how the content itself is written and organised.
- Semantic alignment — the page directly matches the prompt’s intent, not a neighbouring intent. Pages built for one intent absorb better than pages built for three.
- Evidence density — concrete numbers, dates, named entities, defined terms, citations. Models lift facts more readily than adjectives.
- Structural legibility — short paragraphs, descriptive H2s, tight definitions near the top, FAQ blocks that mirror common phrasings. The engine extracts from spans it can isolate cleanly.
- Modular content — self-contained sections that make sense lifted out of context. A 200-word block that survives extraction is worth more than a 1,200-word essay that needs to be read whole.
- Claim survivability — the brand’s specific claim is phrased in a way that paraphrases without losing the brand. Generic phrasing dissolves into generic answer text.
If absorption is the bottleneck, the work is on-page: rewrite for evidence density, restructure for legibility, ensure each page targets exactly one intent. The Capston Core methodology covers the rewrite protocol in detail.
What the empirical data shows
The two-stage framework is not theoretical. Zhang Kai, He Xinyue and Yao Jingang (2026) ran 602 prompts across ChatGPT, Perplexity, Google AI Overview and Gemini, capturing 21,143 citations and 23,745 citation-level features. They report a sharp divergence between citation breadth and citation depth.
Three findings matter for brands.
First, Perplexity cites the most sources per prompt on average; Google AI Overview also cites broadly. ChatGPT cites fewer sources, but each fetched page exerts substantially higher average citation influence on the produced answer. The platforms run different selection-to-absorption ratios. A brand optimised only for breadth will underperform on the engine that concentrates influence.
Second, selection breadth does not predict absorption depth. A URL can sit in the cited list with negligible influence on the synthesised text. Counting citations without weighting absorption overstates visibility.
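A small worked example can make the overstatement concrete. It assumes each citation carries a hypothetical influence score between 0 and 1 (how such a score is derived is not specified here) and compares a raw citation count with the influence-weighted total. The domains and numbers are invented for illustration only.

```python
def citation_count(citations, domain):
    """Number of times `domain` appears among the citations."""
    return sum(1 for c in citations if c["domain"] == domain)


def weighted_visibility(citations, domain):
    """Sum of the (assumed) influence scores recorded for `domain`."""
    return sum(c["influence"] for c in citations if c["domain"] == domain)


# Invented data: brand-a is cited four times with little influence,
# brand-b is cited once with heavy influence.
citations = [
    {"domain": "brand-a.example", "influence": 0.0625},
    {"domain": "brand-a.example", "influence": 0.0625},
    {"domain": "brand-a.example", "influence": 0.125},
    {"domain": "brand-a.example", "influence": 0.0},
    {"domain": "brand-b.example", "influence": 0.5},
]

for domain in ("brand-a.example", "brand-b.example"):
    print(domain, citation_count(citations, domain),
          weighted_visibility(citations, domain))
```

On a raw count, brand-a looks four times as visible as brand-b. On the weighted total, brand-b carries twice the influence (0.5 against 0.25).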
Third, the features that predict selection and the features that predict absorption only partly overlap. Authority signals dominate selection. Structural and semantic signals dominate absorption. This is empirical support for treating GEO as two coupled but distinct optimisation problems.
For a premium brand, the operational consequence is direct: measuring “did we get cited?” is insufficient. The measurement layer has to separate selection rate, absorption weight, and per-engine concentration.
How this fits into Capston Core
Capston Core measurement reports selection and absorption as separate dimensions, not a single citation count.
- Selection rate — share of prompts in the locked prompt set where the brand domain appears in the cited source list, by engine.
- Absorption weight — share of the synthesised answer text traceable to the brand’s own sources, by engine.
- Concentration index — how much of the answer leans on the top one or two sources, which determines whether broad citation or deep citation is the right play per engine.
- Gap diagnosis — for each weak prompt, is the failure at selection or at absorption? The recommended next move differs.
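The first three dimensions above could be computed along the following lines. This is a minimal sketch, not Capston Core's implementation: the `AnswerRecord` shape, the `engine_metrics` function, the per-domain `attributed` shares (assumed to be supplied by some upstream attribution step) and the choice to average per-prompt shares are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AnswerRecord:
    """One engine answer to one prompt (hypothetical record shape)."""
    engine: str
    prompt: str
    cited_domains: list = field(default_factory=list)  # visible citation list
    # domain -> share of the answer text attributed to it (0..1), assumed given
    attributed: dict = field(default_factory=dict)


def engine_metrics(records, brand_domain, top_k=2):
    """Return per-engine selection rate, absorption weight and concentration."""
    by_engine = defaultdict(list)
    for record in records:
        by_engine[record.engine].append(record)

    metrics = {}
    for engine, rows in by_engine.items():
        n = len(rows)
        # Share of prompts where the brand appears in the cited list.
        selection_rate = sum(brand_domain in r.cited_domains for r in rows) / n
        # Mean share of answer text attributed to the brand.
        absorption_weight = sum(
            r.attributed.get(brand_domain, 0.0) for r in rows
        ) / n
        # Mean share of answer text carried by the top_k sources.
        concentration = sum(
            sum(sorted(r.attributed.values(), reverse=True)[:top_k])
            for r in rows
        ) / n
        metrics[engine] = {
            "selection_rate": selection_rate,
            "absorption_weight": absorption_weight,
            "concentration_index": concentration,
        }
    return metrics
```

Keeping the three figures separate per engine is the point of the design: a high concentration index alongside a low absorption weight suggests that deep citation, not broader citation, is the gap on that engine.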
Selection work routes into the data evidence layer and entity programme. Absorption work routes into the page-level rewrites defined by the Capston Core methodology and is scored as part of AI visibility scoring.
FAQ
Is citation selection or citation absorption more important?
Neither in isolation. A brand that is never selected cannot be absorbed. A brand that is selected but never absorbed is invisible in the user-facing answer. Capston Core measures both and routes work to whichever stage is the binding constraint.
Why do ChatGPT and Perplexity score so differently in the research?
The Zhang et al. study (602 prompts, 21,143 citations) shows Perplexity cites broadly while ChatGPT cites fewer sources with higher per-source influence. They run different selection-to-absorption ratios, so a brand needs an engine-specific plan, not a single GEO playbook.
Can a page have high absorption without visible citation?
Yes. Engines sometimes use a source to inform an answer without surfacing it in the visible citation list, especially when content has been absorbed into training corpora or cached embeddings. Absorption measurement looks at the answer text itself, not only the citation badge.
How often should selection and absorption be retested?
Quarterly as a baseline, monthly for high-stakes accounts. Model updates routinely shift the selection-to-absorption ratio on the same prompt set without any change on the brand side.
Reference
Zhang, K., He, X., & Yao, J. (2026). From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms. arXiv:2604.25707v2. Empirical study of 602 prompts, 21,143 citations and 23,745 citation-level features across ChatGPT, Perplexity, Google AI Overview and Gemini.