GEO Scientific Research 2026: Peer-Reviewed Evidence Map for AI Citation Optimization
Most published GEO content in 2026 is vendor marketing dressed as research. Only a handful of academic empirical studies, most of them still arXiv preprints, actually quantify how ChatGPT, Perplexity, Claude, and Google AI Overview select and use citations. This silo page maps the academic GEO literature that practitioners can cite with confidence: the University of Toronto comparative study of AI Search vs. Google (Chen et al., 2025), the citation-selection-vs-absorption measurement framework from the geo-citation-lab dataset (Zhang, He & Yao, 2026), and the original GEO formalization by Aggarwal et al. (2024). Below: the empirical findings practitioners must internalize, the methodologies behind them, and a citation-ready reading list organized by topic.
TL;DR: Two academic papers anchor evidence-based GEO in 2026: (1) Chen et al. (arXiv:2509.08919) document an overwhelming earned-media bias in AI Search, low cross-language domain stability, and a structural big-brand bias; (2) Zhang, He & Yao (arXiv:2604.25707) separate citation selection from citation absorption, show that ChatGPT cites fewer but deeper sources, and reveal that Q&A formatting alone does not improve absorption. Use these as the empirical foundation, not vendor claims.
The academic GEO landscape in 2026
Three streams of academic work shape evidence-based GEO. The first stream formalized the field: Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan & Deshpande (2024) introduced the GEO framework and the GEO-bench benchmark, and showed that black-box content interventions can lift visibility by up to 40% in generative engines. The second stream measures observed engine behavior: Chen, Wang, Chen & Koudas (2025) at the University of Toronto ran large-scale controlled experiments across verticals, languages, and paraphrases to compare Google with ChatGPT, Claude, Perplexity, and Gemini. The third stream studies citation mechanics: Zhang, He & Yao (2026) built the geo-citation-lab dataset (602 prompts, 21 143 valid citations, 23 745 citation-level feature records) to separate which sources get selected from which sources actually shape the generated answer.
The combined picture is more useful than any single paper. Aggarwal et al. showed that GEO works as an intervention under benchmark conditions. Chen et al. mapped which sources AI engines prefer (overwhelmingly earned media, with sharp engine-by-engine differences). Zhang et al. quantified the depth-vs-breadth tradeoff and identified the page-level features associated with answer-level influence.
What the Toronto study (Chen et al., 2025) established
The Toronto group ran controlled experiments comparing Google’s top-10 results with web-enabled responses from Claude (3.5 Sonnet), ChatGPT (4o search-preview), Perplexity (sonar-pro), and Gemini (2.5 Flash with Google Search grounding). Each cited URL was classified as Brand (official manufacturer/retailer), Earned (independent reviews, media, government), or Social (community platforms like Reddit, YouTube, Quora).
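The Brand/Earned/Social labeling step can be illustrated with a minimal rule-based sketch. Everything here is an assumption for illustration: the lookup tables are hypothetical, the default-to-Earned rule is a simplification, and the domain extraction uses only the standard library rather than the tldextract package the study reports using. The AI-assisted part of the study's labeling is not reproduced.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup tables, for illustration only. The study combined
# rule-based and AI-assisted labeling; these sets are not from the paper.
BRAND_DOMAINS = {"toyota.com", "apple.com"}
SOCIAL_DOMAINS = {"reddit.com", "youtube.com", "quora.com"}


def naive_domain(url: str) -> str:
    """Return the hostname without a leading 'www.'.

    Stdlib stand-in for tldextract: it does not strip other subdomains
    and does not understand multi-part suffixes such as .co.uk.
    """
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(url: str) -> str:
    """Label a cited URL as Brand, Social, or (by default) Earned."""
    domain = naive_domain(url)
    if domain in BRAND_DOMAINS:
        return "Brand"
    if domain in SOCIAL_DOMAINS:
        return "Social"
    # Assumption: anything not explicitly listed is treated as Earned.
    return "Earned"


def type_shares(urls: list) -> dict:
    """Share of citations per source type for one engine response."""
    counts = Counter(classify(u) for u in urls)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


cited = [
    "https://www.reddit.com/r/cars",
    "https://www.toyota.com/camry",
    "https://www.caranddriver.com/reviews",
    "https://www.edmunds.com/sedan",
]
shares = type_shares(cited)
```

Running `type_shares` per engine and per vertical yields the kind of earned-share comparison summarized in the findings that follow.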
- Earned-media dominance. Across automotive, consumer electronics, software, and other verticals, AI Search overwhelmingly cited earned sources. Claude and ChatGPT were the most earned-heavy (above 80% in most verticals); Gemini sat in the middle with more brand content; Perplexity included more social sources (notably YouTube).
- Low cross-engine overlap. Jaccard similarity between engines on the same vertical was typically 0.10 to 0.25. Each engine is sampling a different evidence pool.
- Cross-language instability. Domain overlap across languages was generally near zero for GPT (it swaps site ecosystems by language), while Claude maintained much higher cross-language stability by reusing the same authority domains.
- Big brand bias. Unbranded prompts in the cola vertical defaulted to market leaders (Coca-Cola, Pepsi, Dr Pepper), with niche brands appearing at much lower frequencies.
- Paraphrase sensitivity is smaller than language sensitivity. Reformulating a query changes some citations; translating it changes the entire evidence ecology.
Deep-dive: Earned media bias in AI Search →
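The overlap figures above rest on Jaccard similarity between sets of cited domains. The sketch below shows that calculation only; the two domain sets are hypothetical examples, not data from the study.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical cited-domain sets for one vertical (illustration only).
chatgpt_domains = {"caranddriver.com", "edmunds.com", "consumerreports.org", "kbb.com"}
perplexity_domains = {"edmunds.com", "youtube.com", "reddit.com", "motortrend.com"}

# One shared domain out of seven distinct domains.
overlap = jaccard(chatgpt_domains, perplexity_domains)
```

The same function applies to cross-language stability: compare the domain set an engine cites for a prompt in one language against the set it cites for the translated prompt.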
What the citation-absorption study (Zhang, He & Yao, 2026) added
The Zhang group analyzed 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity using the public geo-citation-lab dataset. They proposed and tested a two-stage measurement framework: citation selection (which sources a platform chooses) and citation absorption (how deeply a cited page shapes the generated answer, measured by a composite influence score combining reference count, position, paragraph coverage, TF-IDF similarity, and n-gram overlap).
- Citation breadth and depth diverge. Mean citations per prompt: ChatGPT 6.88, Google 12.06, Perplexity 16.35. Mean fetched-page influence: ChatGPT 0.2713, Google 0.0584, Perplexity 0.0646. ChatGPT cites fewer sources but uses each one more intensively.
- Q&A format alone is weak. Q&A pages averaged 0.0947 influence versus 0.1005 for non-Q&A pages, a relative difference of -5.74%. FAQ packaging without underlying evidence density does not earn deeper absorption.
- Evidence genres matter. Pages containing code (+76.88%), numbers/statistics (+61.55%), definition markers (+57.33%), comparison content (+55.28%), and how-to content (+41.20%) showed substantially higher mean influence.
- Domain type ranks differently in selection vs. absorption. News appears frequently in candidate pools but news_media pages averaged only 0.0726 influence, while encyclopedia pages averaged 0.2144. Selection probability and absorption intensity are separate outcomes.
- Top selected domains in the dataset: youtube.com (560), en.wikipedia.org (352), reddit.com (315), reuters.com (287), linkedin.com (187), nytimes.com (174), pmc.ncbi.nlm.nih.gov (167), facebook.com (151), forbes.com (146), finance.yahoo.com (146).
Deep-dive: Citation selection vs. absorption →
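The percentage figures in this section are relative differences between group means. A minimal sketch of that arithmetic, using the rounded means quoted above, follows. With rounded inputs the Q&A gap comes out at about -5.8% rather than the reported -5.74%, presumably because the paper computed it from unrounded means (an assumption on our part).

```python
def relative_difference(group_mean: float, baseline_mean: float) -> float:
    """Relative difference of a group mean against a baseline mean."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return (group_mean - baseline_mean) / baseline_mean


# Q&A pages vs. non-Q&A pages, using the rounded means quoted in the text.
qa_gap = relative_difference(0.0947, 0.1005)

# Depth ratio: mean fetched-page influence, ChatGPT vs. Perplexity.
depth_ratio = 0.2713 / 0.0646

print(f"Q&A relative difference: {qa_gap:.2%}")
print(f"ChatGPT / Perplexity influence ratio: {depth_ratio:.1f}x")
```

On these rounded figures, ChatGPT's mean influence per fetched page is roughly four times Perplexity's, even though Perplexity cites more than twice as many sources per prompt.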
Combined empirical map: what practitioners should believe in 2026
| Claim | Evidence | Source | Confidence |
|---|---|---|---|
| AI engines overwhelmingly prefer earned media over brand-owned content | Multiple verticals, multiple engines, Brand/Earned/Social classification | Chen et al. 2025 | High (replicated across verticals) |
| ChatGPT cites fewer sources but uses them more deeply than Perplexity or Google | 21 143 citations, mean influence 0.2713 vs. 0.0646 vs. 0.0584 | Zhang, He & Yao 2026 | Medium-high (snapshot, single dataset) |
| Q&A formatting alone does not improve absorption | 23 745 citation features, -5.74% relative difference | Zhang, He & Yao 2026 | Medium (descriptive only, no controlled intervention) |
| Cross-language domain overlap is engine-dependent and generally low for GPT | Cross-language Jaccard heatmaps across 6 languages | Chen et al. 2025 | High (multiple language pairs) |
| Unbranded prompts default to market leaders (big brand bias) | Cola vertical: Coca-Cola, Pepsi dominate ChatGPT and Perplexity outputs | Chen et al. 2025 | Medium-high (one vertical extensively tested) |
| Evidence genres (code, stats, definitions, comparisons, how-to) are associated with deeper absorption than format alone | +41% to +77% mean influence uplift | Zhang, He & Yao 2026 | Medium (observational, not experimental) |
| GEO interventions can lift visibility up to 40% in benchmark conditions | GEO-bench experiments on Perplexity.ai and synthetic benchmarks | Aggarwal et al. 2024 | Medium (benchmark, not field) |
The full GEO research reading list
Wave 21: Toronto empirical study deep-dives
- Earned Media Bias in AI Search: What the Toronto Study Reveals
- AI Engines Domain Diversity Compared: Claude vs ChatGPT vs Perplexity vs Gemini
- Cross-Language GEO Stability: Why Claude Reuses Domains and GPT Swaps Ecosystems
- Big Brand Bias in AI Search: Why Niche Brands Lose and How to Break Through
- The 5 Pillars of Generative Engine Optimization: Strategic Agenda from Toronto Empirical Study
Wave 22: Citation absorption framework deep-dives (FR)
- Citation Selection vs Citation Absorption: The Two-Layer GEO Framework
- The Evidence-Container Hypothesis: Why Modular Pages Win in AI Answers
- Q&A Formatting Does NOT Improve GEO: The Counterintuitive Result from 23 745 Citations
- The GEO Influence Score: Methodology, Components, and How to Measure Absorption
- Evidence Genres Ranked: Code, Stats, Definitions, Comparisons, How-To, and What Triggers AI Citations
How to cite this research properly
Both primary studies are available on arXiv with permanent identifiers. Cite them as follows in any GEO whitepaper, blog post, or client report:
- Chen, M., Wang, X., Chen, K., & Koudas, N. (2025). Generative Engine Optimization: How to Dominate AI Search. University of Toronto. arXiv:2509.08919. https://arxiv.org/abs/2509.08919
- Zhang, K., He, X., & Yao, J. (2026). From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms. arXiv:2604.25707. https://arxiv.org/abs/2604.25707
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. KDD 2024. arXiv:2311.09735. https://arxiv.org/abs/2311.09735
FAQ — GEO scientific research
Are these papers peer-reviewed?
Not all of them. Chen et al. (2025) is an arXiv preprint from the University of Toronto research group; Zhang, He & Yao (2026) is also an arXiv preprint, with a public dataset and a reproducible pipeline. Aggarwal et al. (2024) appeared at KDD 2024. arXiv preprints are common in this fast-moving field because journal cycles are too slow for the pace of AI search evolution. Treat findings as empirical descriptions of observed behavior at the moment of publication, not as permanent laws.
Why focus on academic studies when vendors publish their own data?
Vendor data is valuable but suffers from selection bias (vendors report what makes their tools look good). Academic studies use open prompt sets, multiple platforms, transparent methodology, and reproducible pipelines. They are the only sources that can be cited as independent evidence in client proposals and board reports.
How fast does this research age?
Quickly. The Toronto group explicitly warns that engine behavior is dynamic and that exact percentages should be treated as illustrative of relative trends, not permanent facts. Re-baseline every 6 months by running the same prompt panels and comparing engine outputs. The conceptual frameworks (earned-media bias, evidence-container hypothesis, selection vs. absorption) are more durable than the specific numbers.
Are there other peer-reviewed GEO sources worth tracking?
Yes. The Aggarwal et al. (2024) original GEO paper introduced GEO-bench. Citation-repair, agentic GEO, and structural GEO work has expanded the field. Studies of source attribution in answer engines (mentioned in the related-work sections of both Chen et al. and Zhang et al.) document hallucination, inaccurate citation, and evidence-claim mismatches. Track the arXiv cs.IR (Information Retrieval) and cs.CL (Computation and Language) categories monthly for new contributions.
Methodology cheat-sheet (cite these in your own work)
- Chen et al. methodology: ranking-style prompts (e.g., “Top 10 X”), web-enabled API calls to each engine, domain extraction via tldextract, Brand/Earned/Social classification with rule-based and AI-assisted labeling, Jaccard overlap for cross-engine and cross-language stability, paraphrase templates (justification, source, quote, confidence, ranked, imperative, keyword-only).
- Zhang, He & Yao methodology: 602 prompts across 4 layers (main A=432, style B=60, language C=60, scenarios D=50), influence_score = 0.20·ref_count + 0.15·(1-first_position_ratio) + 0.20·paragraph_coverage + 0.25·TF-IDF cosine + 0.20·(bigram+trigram overlap)/2, 72 feature dimensions per citation, fractional logit or beta regression recommended for absorption modeling.
- Aggarwal et al. methodology: GEO-bench benchmark with synthetic and real-world queries, content intervention strategies (citations, quotations, statistics, formal tone), position-adjusted word count and subjective impression scores as visibility metrics.
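The influence score formula quoted above can be written out as a short function. This is a sketch under stated assumptions, not the authors' implementation: the weights come from the formula as quoted, but the field names are ours, and every component (including ref_count) is assumed to be pre-normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class CitationFeatures:
    """Per-citation inputs; all are assumed to be normalized to [0, 1]."""
    ref_count: float             # normalized reference count (assumption)
    first_position_ratio: float  # 0 = cited at the very start of the answer
    paragraph_coverage: float    # share of answer paragraphs citing the page
    tfidf_cosine: float          # TF-IDF cosine similarity, page vs. answer
    bigram_overlap: float
    trigram_overlap: float


def influence_score(f: CitationFeatures) -> float:
    """Weighted composite as quoted in the cheat-sheet (weights sum to 1.0)."""
    ngram_overlap = (f.bigram_overlap + f.trigram_overlap) / 2
    return (
        0.20 * f.ref_count
        + 0.15 * (1 - f.first_position_ratio)  # earlier citation scores higher
        + 0.20 * f.paragraph_coverage
        + 0.25 * f.tfidf_cosine
        + 0.20 * ngram_overlap
    )


# Hypothetical feature values for a single cited page (illustration only).
example = CitationFeatures(
    ref_count=0.5,
    first_position_ratio=0.2,
    paragraph_coverage=0.4,
    tfidf_cosine=0.3,
    bigram_overlap=0.2,
    trigram_overlap=0.1,
)
score = influence_score(example)
```

Because the score is bounded between 0 and 1 under these assumptions, it fits the fractional logit or beta regression setup the cheat-sheet recommends for absorption modeling.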
Tools and related reading
- CapstonAI platform overview
- CapstonAI AI Citation Tracking
- Best AI citation tracking tool 2026
- How to build a prompt panel for tracking
- Structured data audit for AI engines
- Wikidata setup for brands
- sameAs in Organization schema
- Glossary: AI Search, GEO, AEO, SEO
Last updated: May 2026. Primary sources: Chen et al. (2025), arXiv:2509.08919; Zhang, He & Yao (2026), arXiv:2604.25707; Aggarwal et al. (2024), arXiv:2311.09735.