GEO Scientific Research 2026: Peer-Reviewed Evidence Map for AI Citation Optimization

Most published GEO content in 2026 is vendor marketing dressed as research. Only a handful of academic empirical studies, most of them still arXiv preprints, actually quantify how ChatGPT, Perplexity, Claude, and Google AI Overview select and use citations. This page maps the academic GEO literature that practitioners can cite with confidence: the University of Toronto comparative study of AI Search vs. Google (Chen et al., 2025), the citation-selection-vs-absorption measurement framework from the geo-citation-lab dataset (Zhang, He & Yao, 2026), and the original GEO formalization by Aggarwal et al. (2024). Below: the empirical findings practitioners must internalize, the methodologies behind them, and a citation-ready reading list organized by topic.

TL;DR: Two academic papers anchor evidence-based GEO in 2026: (1) Chen et al. (arXiv:2509.08919) document an overwhelming earned-media bias in AI Search, low cross-language domain stability, and a structural big-brand bias; (2) Zhang, He & Yao (arXiv:2604.25707) separate citation selection from citation absorption, show that ChatGPT cites fewer but deeper sources, and reveal that Q&A formatting alone does not improve absorption. Use these as the empirical foundation, not vendor claims.

The academic GEO landscape in 2026

Three streams of academic work shape evidence-based GEO. The first stream formalized the field: Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan & Deshpande (2024) introduced the GEO framework and the GEO-bench benchmark, and showed that black-box content interventions can lift visibility by up to 40% in generative engines. The second stream measures observed engine behavior: Chen, Wang, Chen & Koudas (2025) at the University of Toronto ran large-scale controlled experiments across verticals, languages, and paraphrases to compare Google with ChatGPT, Claude, Perplexity, and Gemini. The third stream studies citation mechanics: Zhang, He & Yao (2026) built the geo-citation-lab dataset (602 prompts, 21,143 valid citations, 23,745 citation-level feature records) to separate which sources get selected from which sources actually shape the generated answer.

The combined picture is more useful than any single paper. Aggarwal et al. showed that GEO can work as an intervention under benchmark conditions. Chen et al. mapped which sources AI engines prefer (overwhelmingly earned media, with sharp engine-by-engine differences). Zhang et al. quantified the depth-vs-breadth tradeoff and identified the page-level features associated with answer-level influence.

What the Toronto study (Chen et al., 2025) established

The Toronto group ran controlled experiments comparing Google’s top-10 results with web-enabled responses from Claude (3.5 Sonnet), ChatGPT (4o search-preview), Perplexity (sonar-pro), and Gemini (2.5 Flash with Google Search grounding). Each cited URL was classified as Brand (official manufacturer/retailer), Earned (independent reviews, media, government), or Social (community platforms like Reddit, YouTube, Quora).
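The Brand/Earned/Social labeling step can be illustrated with a minimal rule-based sketch. This is not the study's classifier: the lookup tables `BRAND_DOMAINS` and `SOCIAL_DOMAINS`, the helper names, and the "everything else is Earned" fallback are assumptions made for illustration, and the standard-library `urlparse` stands in for the `tldextract` step the authors describe.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup tables; the study's real rules and labels are not reproduced here.
BRAND_DOMAINS = {"toyota.com", "samsung.com"}
SOCIAL_DOMAINS = {"reddit.com", "youtube.com", "quora.com"}


def host_of(url):
    """Return the lowercased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify(url):
    """Label a cited URL as Brand, Earned, or Social."""
    host = host_of(url)
    if matches(host, SOCIAL_DOMAINS):
        return "Social"
    if matches(host, BRAND_DOMAINS):
        return "Brand"
    # Assumption: any domain not listed above is treated as Earned.
    return "Earned"


def source_mix(urls):
    """Share of each source type among the cited URLs of one response set."""
    counts = Counter(classify(u) for u in urls)
    total = sum(counts.values())
    return {label: counts[label] / total for label in counts} if total else {}
```

Calling `source_mix` on the URLs cited by one engine for one vertical gives the kind of share breakdown the findings below refer to. A fallback rule like this one inflates the Earned share whenever the lookup tables are incomplete, which is one reason to pair rules with manual or AI-assisted review.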

  • Earned-media dominance. Across automotive, consumer electronics, software, and other verticals, AI Search overwhelmingly cited earned sources. Claude and ChatGPT were the most earned-heavy (above 80% in most verticals); Gemini sat in the middle with more brand content; Perplexity included more social sources (notably YouTube).
  • Low cross-engine overlap. Jaccard similarity between engines on the same vertical was typically 0.10 to 0.25. Each engine is sampling a different evidence pool.
  • Cross-language instability. Domain overlap across languages was generally near zero for GPT (it swaps site ecosystems by language), while Claude maintained much higher cross-language stability by reusing the same authority domains.
  • Big brand bias. Unbranded prompts in the cola vertical defaulted to market leaders (Coca-Cola, Pepsi, Dr Pepper), with niche brands appearing at much lower frequencies.
  • Paraphrase sensitivity is smaller than language sensitivity. Reformulating a query changes some citations; translating it changes the entire evidence ecology.
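
The overlap figures above are Jaccard similarities between sets of cited domains. As a rough sketch of that measurement, the snippet below compares every pair of engines; the engine names and domain sets are made-up placeholders, not data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(cited):
    """Jaccard similarity for every pair of engines in a {engine: set_of_domains} mapping."""
    return {
        (x, y): jaccard(cited[x], cited[y])
        for x, y in combinations(sorted(cited), 2)
    }


# Hypothetical cited-domain sets for a single vertical.
cited = {
    "engine_a": {"caranddriver.com", "edmunds.com", "kbb.com", "consumerreports.org"},
    "engine_b": {"edmunds.com", "youtube.com", "reddit.com", "motortrend.com"},
    "engine_c": {"kbb.com", "motortrend.com", "toyota.com", "honda.com"},
}

for pair, score in pairwise_overlap(cited).items():
    print(pair, round(score, 3))
```

The same function works for cross-language stability: key the mapping by language instead of engine and compare the domain sets one engine returns for translated versions of a prompt.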

What the citation-absorption study (Zhang, He & Yao, 2026) added

The Zhang group analyzed 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity using the public geo-citation-lab dataset. They proposed and tested a two-stage measurement framework: citation selection (which sources a platform chooses) and citation absorption (how deeply a cited page shapes the generated answer, measured by a composite influence score combining reference count, position, paragraph coverage, TF-IDF similarity, and n-gram overlap).

  • Citation breadth and depth diverge. Mean citations per prompt: ChatGPT 6.88, Google 12.06, Perplexity 16.35. Mean fetched-page influence: ChatGPT 0.2713, Google 0.0584, Perplexity 0.0646. ChatGPT cites fewer sources but uses each one more intensively.
  • Q&A format alone is weak. Q&A pages averaged 0.0947 influence versus 0.1005 for non-Q&A pages, a relative difference of -5.74%. FAQ packaging without underlying evidence density does not earn deeper absorption.
  • Evidence genres matter. Pages containing code (+76.88%), numbers/statistics (+61.55%), definition markers (+57.33%), comparison content (+55.28%), and how-to content (+41.20%) showed substantially higher mean influence.
  • Domain type ranks differently in selection vs. absorption. News appears frequently in candidate pools but news_media pages averaged only 0.0726 influence, while encyclopedia pages averaged 0.2144. Selection probability and absorption intensity are separate outcomes.
  • Top selected domains in the dataset: youtube.com (560), en.wikipedia.org (352), reddit.com (315), reuters.com (287), linkedin.com (187), nytimes.com (174), pmc.ncbi.nlm.nih.gov (167), facebook.com (151), forbes.com (146), finance.yahoo.com (146).
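
The breadth-vs-depth contrast and the Q&A gap can be recomputed from the means quoted above. The snippet is only arithmetic on those published figures; the variable and function names are illustrative. Because the inputs are rounded, the recomputed Q&A gap comes out near -5.8% rather than exactly the reported -5.74%, a small difference that presumably reflects rounding.

```python
# Means quoted in the text (Zhang, He & Yao 2026).
mean_citations = {"ChatGPT": 6.88, "Google": 12.06, "Perplexity": 16.35}
mean_influence = {"ChatGPT": 0.2713, "Google": 0.0584, "Perplexity": 0.0646}


def relative_difference(value, baseline):
    """Percentage difference of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Q&A pages (0.0947) compared with non-Q&A pages (0.1005).
qa_gap = relative_difference(0.0947, 0.1005)

# How many times deeper ChatGPT's mean influence is than each other platform's.
depth_ratio = {
    engine: mean_influence["ChatGPT"] / value
    for engine, value in mean_influence.items()
    if engine != "ChatGPT"
}

# How many times more sources each other platform cites per prompt than ChatGPT.
breadth_ratio = {
    engine: value / mean_citations["ChatGPT"]
    for engine, value in mean_citations.items()
    if engine != "ChatGPT"
}

print(f"Q&A gap: {qa_gap:.2f}%")
print("Depth ratio vs. ChatGPT:", depth_ratio)
print("Breadth ratio vs. ChatGPT:", breadth_ratio)
```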

Combined empirical map: what practitioners should believe in 2026

| Claim | Evidence | Source | Confidence |
| --- | --- | --- | --- |
| AI engines overwhelmingly prefer earned media over brand-owned content | Multiple verticals, multiple engines, Brand/Earned/Social classification | Chen et al. 2025 | High (replicated across verticals) |
| ChatGPT cites fewer sources but uses them more deeply than Perplexity or Google | 21,143 citations, mean influence 0.2713 vs. 0.0646 vs. 0.0584 | Zhang, He & Yao 2026 | Medium-high (snapshot, single dataset) |
| Q&A formatting alone does not improve absorption | 23,745 citation features, -5.74% relative difference | Zhang, He & Yao 2026 | Medium (descriptive only, no controlled intervention) |
| Cross-language domain overlap is engine-dependent and generally low for GPT | Cross-language Jaccard heatmaps across 6 languages | Chen et al. 2025 | High (multiple language pairs) |
| Unbranded prompts default to market leaders (big brand bias) | Cola vertical: Coca-Cola, Pepsi dominate ChatGPT and Perplexity outputs | Chen et al. 2025 | Medium-high (one vertical extensively tested) |
| Evidence genres (code, statistics, definitions, comparisons, how-to) drive deeper absorption than format alone | +41% to +77% mean influence uplift | Zhang, He & Yao 2026 | Medium (observational, not experimental) |
| GEO interventions can lift visibility up to 40% in benchmark conditions | GEO-bench experiments on Perplexity.ai and synthetic benchmarks | Aggarwal et al. 2024 | Medium (benchmark, not field) |

The full GEO research reading list

Wave 21: Toronto empirical study deep-dives

Wave 22: Citation absorption framework deep-dives (FR)

How to cite this research properly

All three studies are available on arXiv with permanent identifiers. Cite them as follows in any GEO whitepaper, blog post, or client report:

  • Chen, M., Wang, X., Chen, K., & Koudas, N. (2025). Generative Engine Optimization: How to Dominate AI Search. University of Toronto. arXiv:2509.08919. https://arxiv.org/abs/2509.08919
  • Zhang, K., He, X., & Yao, J. (2026). From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms. arXiv:2604.25707. https://arxiv.org/abs/2604.25707
  • Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. KDD 2024. arXiv:2311.09735. https://arxiv.org/abs/2311.09735

FAQ — GEO scientific research

Are these papers peer-reviewed?

Only one of the three is. Aggarwal et al. (2024) appeared at KDD 2024. Chen et al. (2025) is an arXiv preprint from a University of Toronto research group, and Zhang, He & Yao (2026) is an arXiv preprint with a public dataset and a reproducible pipeline. arXiv preprints are common in this fast-moving field because journal cycles are too slow for the pace of AI search evolution. Treat findings as empirical descriptions of observed behavior at the moment of publication, not as permanent laws.

Why focus on academic studies when vendors publish their own data?

Vendor data is valuable but suffers from selection bias: vendors report what makes their tools look good. Academic studies use open prompt sets, multiple platforms, transparent methodology, and reproducible pipelines. That makes them the strongest sources to cite as independent evidence in client proposals and board reports.

How fast does this research age?

Quickly. The Toronto group explicitly warns that engine behavior is dynamic and that exact percentages should be treated as illustrative of relative trends, not permanent facts. Re-baseline every 6 months by running the same prompt panels and comparing engine outputs. The conceptual frameworks (earned-media bias, evidence-container hypothesis, selection vs. absorption) are more durable than the specific numbers.

Are there other peer-reviewed GEO sources worth tracking?

Yes. The Aggarwal et al. (2024) original GEO paper introduced GEO-bench. Citation-repair, agentic GEO, and structural GEO work has expanded the field. Studies of source attribution in answer engines (mentioned in the related-work sections of both Chen et al. and Zhang et al.) document hallucination, inaccurate citation, and evidence-claim mismatches. Track the arXiv cs.IR (Information Retrieval) and cs.CL (Computation and Language) categories monthly for new contributions.

Methodology cheat-sheet (cite these in your own work)

  • Chen et al. methodology: ranking-style prompts (e.g., “Top 10 X”), web-enabled API calls to each engine, domain extraction via tldextract, Brand/Earned/Social classification with rule-based and AI-assisted labeling, Jaccard overlap for cross-engine and cross-language stability, paraphrase templates (justification, source, quote, confidence, ranked, imperative, keyword-only).
  • Zhang, He & Yao methodology: 602 prompts across 4 layers (main A=432, style B=60, language C=60, scenarios D=50), influence_score = 0.20·ref_count + 0.15·(1-first_position_ratio) + 0.20·paragraph_coverage + 0.25·TF-IDF cosine + 0.20·(bigram+trigram overlap)/2, 72 feature dimensions per citation, fractional logit or beta regression recommended for absorption modeling.
  • Aggarwal et al. methodology: GEO-bench benchmark with synthetic and real-world queries, content intervention strategies (citations, quotations, statistics, formal tone), position-adjusted word count and subjective impression scores as visibility metrics.
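
The influence_score formula quoted in the Zhang, He & Yao entry can be written out as a small function. This is a sketch of the weighted sum as stated above, not the authors' pipeline. It assumes every input, including ref_count, has already been normalized to the range 0 to 1; the text does not say how the dataset normalizes each component, and the function and argument names are illustrative.

```python
# Weights as quoted in the methodology entry; they sum to 1.0.
WEIGHTS = {
    "ref_count": 0.20,
    "position": 0.15,
    "paragraph_coverage": 0.20,
    "tfidf_cosine": 0.25,
    "ngram_overlap": 0.20,
}


def influence_score(ref_count, first_position_ratio, paragraph_coverage,
                    tfidf_cosine, bigram_overlap, trigram_overlap):
    """Composite influence of one cited page on one generated answer.

    Assumption: all six inputs are pre-normalized to [0, 1].
    """
    inputs = (ref_count, first_position_ratio, paragraph_coverage,
              tfidf_cosine, bigram_overlap, trigram_overlap)
    if any(not 0.0 <= x <= 1.0 for x in inputs):
        raise ValueError("all components must be normalized to [0, 1]")

    # Average the bigram and trigram overlap into one n-gram term.
    ngram_overlap = (bigram_overlap + trigram_overlap) / 2
    return (
        WEIGHTS["ref_count"] * ref_count
        # An earlier first mention (smaller ratio) contributes more.
        + WEIGHTS["position"] * (1 - first_position_ratio)
        + WEIGHTS["paragraph_coverage"] * paragraph_coverage
        + WEIGHTS["tfidf_cosine"] * tfidf_cosine
        + WEIGHTS["ngram_overlap"] * ngram_overlap
    )


# Hypothetical feature values for one citation.
example = influence_score(
    ref_count=0.5,
    first_position_ratio=0.2,
    paragraph_coverage=0.3,
    tfidf_cosine=0.4,
    bigram_overlap=0.2,
    trigram_overlap=0.1,
)
print(round(example, 4))
```

Because the weights sum to 1.0 and each term is bounded, the score also stays within 0 to 1, which is consistent with the recommendation to model it using fractional logit or beta regression.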

Last updated: May 2026. Primary sources: Chen et al. (2025), arXiv:2509.08919; Zhang, He & Yao (2026), arXiv:2604.25707; Aggarwal et al. (2024), arXiv:2311.09735. Full references appear under "How to cite this research properly".