GEO Scientific Research 2026: Peer-Reviewed Evidence Map for AI Citation Optimization

Most published GEO content in 2026 is vendor marketing dressed as research. Only a handful of academic empirical studies, most of them still arXiv preprints, actually quantify how ChatGPT, Perplexity, Claude, and Google AI Overview select and use citations. This silo page maps the academic GEO literature that practitioners can cite with confidence: the University of Toronto comparative study of AI Search vs. Google (Chen et al., 2025), the citation-selection-vs-absorption measurement framework from the geo-citation-lab dataset (Zhang, He & Yao, 2026), and the original GEO formalization by Aggarwal et al. (2024). Below: the empirical findings practitioners must internalize, the methodologies behind them, and a citation-ready reading list organized by topic.

TL;DR: Two academic papers anchor evidence-based GEO in 2026: (1) Chen et al. (arXiv:2509.08919) document an overwhelming earned-media bias in AI Search, low cross-language domain stability, and a structural big-brand bias; (2) Zhang, He & Yao (arXiv:2604.25707) separate citation selection from citation absorption, show that ChatGPT cites fewer but deeper sources, and reveal that Q&A formatting alone does not improve absorption. Use these as the empirical foundation, not vendor claims.


The academic GEO landscape in 2026

Three streams of academic work shape evidence-based GEO. The first stream formalized the field: Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan & Deshpande (2024) introduced the GEO framework and the GEO-bench benchmark, and showed that black-box content interventions can lift visibility by up to 40% in generative engines. The second stream measures observed engine behavior: Chen, Wang, Chen & Koudas (2025) at the University of Toronto ran large-scale controlled experiments across verticals, languages, and paraphrases to compare Google with ChatGPT, Claude, Perplexity, and Gemini. The third stream studies citation mechanics: Zhang, He & Yao (2026) built the geo-citation-lab dataset (602 prompts, 21 143 valid citations, 23 745 citation-level feature records) to separate which sources get selected from which sources actually shape the generated answer.

The combined picture is more useful than any single paper. Aggarwal et al. showed that GEO works as an intervention under benchmark conditions. Chen et al. mapped which sources AI engines prefer (overwhelmingly earned media, with sharp engine-by-engine differences). Zhang et al. quantified the depth-vs-breadth tradeoff and identified the page-level features associated with answer-level influence.

What the Toronto study (Chen et al., 2025) established

The Toronto group ran controlled experiments comparing Google’s top-10 results with web-enabled responses from Claude (3.5 Sonnet), ChatGPT (4o search-preview), Perplexity (sonar-pro), and Gemini (2.5 Flash with Google Search grounding). Each cited URL was classified as Brand (official manufacturer/retailer), Earned (independent reviews, media, government), or Social (community platforms like Reddit, YouTube, Quora).
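The Brand/Earned/Social labeling can be approximated with a small rule-based pass. The sketch below is only an illustration of that idea: the domain lists, the example URLs, and the function names are hypothetical, and the study's actual rules and AI-assisted labeling step are not reproduced here.

```python
from urllib.parse import urlparse

# Hypothetical rule lists for illustration only; not the study's actual rules.
SOCIAL_DOMAINS = {"reddit.com", "youtube.com", "quora.com"}
BRAND_DOMAINS = {"toyota.com", "apple.com"}  # stand-ins for official manufacturer/retailer sites


def simple_domain(url: str) -> str:
    """Naive domain extraction: lowercase hostname minus a leading 'www.'.

    Other subdomains (for example 'm.youtube.com') are not collapsed here;
    a real pipeline would use a public-suffix-aware extractor.
    """
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(url: str) -> str:
    """Return 'Social', 'Brand', or 'Earned' for a cited URL."""
    domain = simple_domain(url)
    if domain in SOCIAL_DOMAINS:
        return "Social"
    if domain in BRAND_DOMAINS:
        return "Brand"
    # Assumption: anything not matched above is treated as Earned
    # (independent reviews, media, government).
    return "Earned"


def category_shares(urls: list[str]) -> dict[str, float]:
    """Share of citations per category for one engine's cited URLs."""
    counts = {"Brand": 0, "Earned": 0, "Social": 0}
    for url in urls:
        counts[classify(url)] += 1
    total = len(urls) or 1  # avoid division by zero on empty input
    return {label: n / total for label, n in counts.items()}
```

Running category_shares on each engine's cited URLs for the same prompt panel gives per-engine shares that can be compared side by side. The fallback to Earned is a simplification, so unmatched brand sites would inflate the earned share until the rule lists are extended.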

Earned-media dominance. Across automotive, consumer electronics, software, and other verticals, AI Search overwhelmingly cited earned sources. Claude and ChatGPT were the most earned-heavy (above 80% in most verticals); Gemini sat in the middle with more brand content; Perplexity included more social sources (notably YouTube).
Low cross-engine overlap. Jaccard similarity between engines on the same vertical was typically 0.10 to 0.25. Each engine is sampling a different evidence pool.
Cross-language instability. Domain overlap across languages was generally near zero for GPT (it swaps site ecosystems by language), while Claude maintained much higher cross-language stability by reusing the same authority domains.
Big brand bias. Unbranded prompts in the cola vertical defaulted to market leaders (Coca-Cola, Pepsi, Dr Pepper), with niche brands appearing at much lower frequencies.
Paraphrase sensitivity is smaller than language sensitivity. Reformulating a query changes some citations; translating it changes the entire evidence ecology.
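The overlap figures above are Jaccard similarities between sets of cited domains. As a minimal sketch of that computation, assuming each engine's citations have already been reduced to a set of domains (the engine names and domains below are placeholders, not data from the study):

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def pairwise_overlap(cited: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every pair of engines (or languages)."""
    return {
        (x, y): jaccard(cited[x], cited[y])
        for x, y in combinations(sorted(cited), 2)
    }


# Placeholder data: two engines answering the same prompts in one vertical.
cited_domains = {
    "engine_a": {"a.example", "b.example", "c.example", "d.example"},
    "engine_b": {"d.example", "e.example", "f.example", "g.example"},
}

for pair, score in pairwise_overlap(cited_domains).items():
    print(pair, round(score, 3))
```

In this toy case the two engines share one domain out of seven distinct domains, a similarity of about 0.14, which sits inside the 0.10 to 0.25 band reported for cross-engine overlap. The same function works for cross-language comparison by keying the dictionary on language instead of engine.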

Deep-dive: Earned media bias in AI Search →

What the citation-absorption study (Zhang, He & Yao, 2026) added

The Zhang group analyzed 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity using the public geo-citation-lab dataset. They proposed and tested a two-stage measurement framework: citation selection (which sources a platform chooses) and citation absorption (how deeply a cited page shapes the generated answer, measured by a composite influence score combining reference count, position, paragraph coverage, TF-IDF similarity, and n-gram overlap).

Citation breadth and depth diverge. Mean citations per prompt: ChatGPT 6.88, Google 12.06, Perplexity 16.35. Mean fetched-page influence: ChatGPT 0.2713, Google 0.0584, Perplexity 0.0646. ChatGPT cites fewer sources but uses each one more intensively.
Q&A format alone is weak. Q&A pages averaged 0.0947 influence versus 0.1005 for non-Q&A pages, a relative difference of -5.74%. FAQ packaging without underlying evidence density does not earn deeper absorption.
Evidence genres matter. Pages containing code (+76.88%), numbers/statistics (+61.55%), definition markers (+57.33%), comparison content (+55.28%), and how-to content (+41.20%) showed substantially higher mean influence.
Domain type ranks differently in selection vs. absorption. News appears frequently in candidate pools but news_media pages averaged only 0.0726 influence, while encyclopedia pages averaged 0.2144. Selection probability and absorption intensity are separate outcomes.
Top selected domains in the dataset: youtube.com (560), en.wikipedia.org (352), reddit.com (315), reuters.com (287), linkedin.com (187), nytimes.com (174), pmc.ncbi.nlm.nih.gov (167), facebook.com (151), forbes.com (146), finance.yahoo.com (146).
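To make the breadth-versus-depth distinction concrete, the sketch below computes both quantities from citation-level records. The record layout (platform, prompt id, influence score) and the sample values are assumptions made for illustration; they do not reproduce the geo-citation-lab schema or its numbers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical citation-level records: (platform, prompt_id, influence_score).
records = [
    ("chatgpt", "p1", 0.30),
    ("chatgpt", "p1", 0.20),
    ("perplexity", "p1", 0.05),
    ("perplexity", "p1", 0.10),
    ("perplexity", "p1", 0.05),
    ("perplexity", "p1", 0.00),
]


def breadth_and_depth(rows):
    """Per platform: mean citations per prompt (breadth) and mean influence (depth)."""
    counts = defaultdict(lambda: defaultdict(int))  # platform -> prompt -> citation count
    scores = defaultdict(list)                      # platform -> influence scores
    for platform, prompt_id, score in rows:
        counts[platform][prompt_id] += 1
        scores[platform].append(score)
    # Note: prompts with zero citations never appear in the records,
    # so they are not counted in the breadth average here.
    return {
        platform: {
            "mean_citations_per_prompt": mean(counts[platform].values()),
            "mean_influence": mean(scores[platform]),
        }
        for platform in scores
    }


summary = breadth_and_depth(records)
for platform, stats in summary.items():
    print(platform, stats)
```

In the toy data the platform with fewer citations per prompt has the higher mean influence, mirroring the pattern reported for ChatGPT. Keeping the two metrics separate is the point: a page can be selected often and still contribute little to the generated answer.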

Deep-dive: Citation selection vs. absorption →

Combined empirical map: what practitioners should believe in 2026

Claim	Evidence	Source	Confidence
AI engines overwhelmingly prefer earned media over brand-owned content	Multiple verticals, multiple engines, Brand/Earned/Social classification	Chen et al. 2025	High (replicated across verticals)
ChatGPT cites fewer sources but uses them more deeply than Perplexity or Google	21 143 citations, mean influence 0.2713 vs. 0.0646 vs. 0.0584	Zhang, He & Yao 2026	Medium-high (snapshot, single dataset)
Q&A formatting alone does not improve absorption	23 745 citation features, -5.74% relative difference	Zhang, He & Yao 2026	Medium (descriptive only, no controlled intervention)
Cross-language domain overlap is engine-dependent and generally low for GPT	Cross-language Jaccard heatmaps across 6 languages	Chen et al. 2025	High (multiple language pairs)
Unbranded prompts default to market leaders (big brand bias)	Cola vertical: Coca-Cola, Pepsi dominate ChatGPT and Perplexity outputs	Chen et al. 2025	Medium-high (one vertical extensively tested)
Evidence genres (code, statistics, definitions, comparisons, how-to) are associated with deeper absorption than format alone	+41% to +77% mean influence uplift	Zhang, He & Yao 2026	Medium (observational, not experimental)
GEO interventions can lift visibility up to 40% in benchmark conditions	GEO-bench experiments on Perplexity.ai and synthetic benchmarks	Aggarwal et al. 2024	Medium (benchmark, not field)

The full GEO research reading list

Wave 21: Toronto empirical study deep-dives

Wave 22: Citation absorption framework deep-dives (FR)

How to cite this research properly

All three studies are available on arXiv with permanent identifiers. Cite them as follows in any GEO whitepaper, blog post, or client report:

Chen, M., Wang, X., Chen, K., & Koudas, N. (2025). Generative Engine Optimization: How to Dominate AI Search. University of Toronto. arXiv:2509.08919. https://arxiv.org/abs/2509.08919
Zhang, K., He, X., & Yao, J. (2026). From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms. arXiv:2604.25707. https://arxiv.org/abs/2604.25707
Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. KDD 2024. arXiv:2311.09735. https://arxiv.org/abs/2311.09735

FAQ — GEO scientific research

Are these papers peer-reviewed?

Only one of the three is. Aggarwal et al. (2024) appeared at KDD 2024. Chen et al. (2025) is an arXiv preprint from a University of Toronto research group, and Zhang, He & Yao (2026) is also an arXiv preprint, with a public dataset and a reproducible pipeline. arXiv preprints are common in this fast-moving field because journal cycles are too slow for the pace of AI search evolution. Treat findings as empirical descriptions of observed behavior at the moment of publication, not as permanent laws.

Why focus on academic studies when vendors publish their own data?

Vendor data is valuable but suffers from selection bias (vendors report what makes their tools look good). Academic studies use open prompt sets, multiple platforms, transparent methodology, and reproducible pipelines. They are the only sources that can be cited as independent evidence in client proposals and board reports.

How fast does this research age?

Quickly. The Toronto group explicitly warns that engine behavior is dynamic and that exact percentages should be treated as illustrative of relative trends, not permanent facts. Re-baseline every 6 months by running the same prompt panels and comparing engine outputs. The conceptual frameworks (earned-media bias, evidence-container hypothesis, selection vs. absorption) are more durable than the specific numbers.

Are there other peer-reviewed GEO sources worth tracking?

Yes. The Aggarwal et al. (2024) original GEO paper introduced GEO-bench. Citation-repair, agentic GEO, and structural GEO work has expanded the field. Studies of source attribution in answer engines (mentioned in the related-work sections of both Chen et al. and Zhang et al.) document hallucination, inaccurate citation, and evidence-claim mismatches. Track the arXiv cs.IR (Information Retrieval) and cs.CL (Computation and Language) categories monthly for new contributions.

Methodology cheat-sheet (cite these in your own work)

Chen et al. methodology: ranking-style prompts (e.g., “Top 10 X”), web-enabled API calls to each engine, domain extraction via tldextract, Brand/Earned/Social classification with rule-based and AI-assisted labeling, Jaccard overlap for cross-engine and cross-language stability, paraphrase templates (justification, source, quote, confidence, ranked, imperative, keyword-only).
Zhang, He & Yao methodology: 602 prompts across 4 layers (main A=432, style B=60, language C=60, scenarios D=50), influence_score = 0.20·ref_count + 0.15·(1-first_position_ratio) + 0.20·paragraph_coverage + 0.25·TF-IDF cosine + 0.20·(bigram+trigram overlap)/2, 72 feature dimensions per citation, fractional logit or beta regression recommended for absorption modeling.
Aggarwal et al. methodology: GEO-bench benchmark with synthetic and real-world queries, content intervention strategies (citations, quotations, statistics, formal tone), position-adjusted word count and subjective impression scores as visibility metrics.
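The influence_score formula quoted above is a weighted sum whose weights add up to 1.00. A minimal sketch follows, assuming every component has already been normalized to the range 0 to 1; the source does not state how ref_count is normalized, and the class and field names here are illustrative rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class CitationFeatures:
    ref_count: float             # assumed pre-normalized to [0, 1]
    first_position_ratio: float  # 0 = cited at the very start of the answer
    paragraph_coverage: float    # share of answer paragraphs referencing the page
    tfidf_cosine: float          # TF-IDF cosine similarity between page and answer
    bigram_overlap: float
    trigram_overlap: float


def influence_score(f: CitationFeatures) -> float:
    """Composite influence score using the weights quoted in the cheat-sheet."""
    ngram_overlap = (f.bigram_overlap + f.trigram_overlap) / 2
    return (
        0.20 * f.ref_count
        + 0.15 * (1 - f.first_position_ratio)  # earlier first mention scores higher
        + 0.20 * f.paragraph_coverage
        + 0.25 * f.tfidf_cosine
        + 0.20 * ngram_overlap
    )


# Illustrative values only, not taken from the dataset.
example = CitationFeatures(
    ref_count=0.5,
    first_position_ratio=0.2,
    paragraph_coverage=0.4,
    tfidf_cosine=0.3,
    bigram_overlap=0.2,
    trigram_overlap=0.1,
)
print(round(influence_score(example), 3))
```

For the example page the terms are 0.10 + 0.12 + 0.08 + 0.075 + 0.03, a score of about 0.405. Because TF-IDF similarity carries the largest single weight, textual closeness between the page and the answer moves the score more than any other one component.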

Tools and related reading

🇫🇷 French version: Recherche Scientifique GEO 2026. All Zhang 2026 analyses are available in French, along with CapstonAI's 5 V22 pages.


Last updated: May 2026. Primary sources: Chen, M., Wang, X., Chen, K., & Koudas, N. (2025). Generative Engine Optimization: How to Dominate AI Search. University of Toronto. arXiv:2509.08919. Zhang, K., He, X., & Yao, J. (2026). From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms. arXiv:2604.25707. Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. arXiv:2311.09735.

