CapstonAI · AI Citation Foundation
Build an llms.txt file that points AI crawlers straight at your most citable pages. This generator helps you assemble the pieces; the guide below explains exactly what llms.txt is and how to ship it.
An llms.txt file is a Markdown document you place at the root of your domain, at https://yoursite.com/llms.txt, that lists and describes your most important pages for large language models. Think of it as a curated index written for machines: instead of forcing an AI crawler to discover and rank every URL on your site, you hand it a short, human-readable map that says “here are the pages worth citing, and here’s what each one is about.” It complements, rather than replaces, your robots.txt and XML sitemap. Where robots.txt controls access and a sitemap lists everything, llms.txt does something different: it expresses editorial priority in plain language a model can read.
The format is deliberately simple: a top-level heading with your site name, an optional summary line, then sections of links grouped under headings, each link followed by a short description. There is no proprietary syntax to learn — if you can write Markdown, you can write llms.txt. That simplicity is the point: the file should be trivially easy for both a person and a model to parse.
AI engines build their answers by reading the web, and the easier you make it for them to find your best content, the more likely that content is to be surfaced and cited. An llms.txt file is one of the cleanest signals you can give. It removes ambiguity: rather than hoping a crawler stumbles onto your definitive comparison page beneath a pile of blog archives and tag pages, you point directly at it with a one-line description of why it matters. This is a core part of generative engine optimization — structuring your signals so engines mention and cite you.
It also reinforces the same principles the research community has flagged for citation. Aggarwal et al., “GEO: Generative Engine Optimization” (KDD 2024), found qualitatively that content which is well-sourced, quotable and clearly structured is more likely to be surfaced in generative answers. An llms.txt doesn’t rewrite your pages — but it tells engines which of your well-structured, quotable pages to look at first. That said, llms.txt is an emerging convention, not a guaranteed ranking lever: no file forces an engine to cite you. It improves discoverability and clarity; the content still has to earn the citation. Measuring whether engines actually pick you up — using an AI visibility tool — is how you tell the difference between hope and impact.
A good llms.txt has a small number of clear sections. You don’t need every one, but each plays a role:
Section headings such as ## Docs, ## Guides, ## Products, ## About. Group by purpose, not by site structure.
An optional section (## Optional) for pages that are useful but not essential, so engines can skip them under tight context limits.
Keep it curated. An llms.txt that lists every URL is no more useful than a raw sitemap. The value is in the editorial judgment: choosing your definitions, comparisons and authoritative guides, the pages most worth citing, and saying clearly what each one is.
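If you want to sanity-check a draft against the structure described above, a short script can do it. What follows is a minimal Python sketch, not an official validator: the function name lint_llms_txt and the rules it enforces (one top-level heading, at least one section heading, every list item a Markdown link with a description) are assumptions chosen for illustration from the anatomy in this guide.

```python
import re

# A list item of the form "- [Title](URL): description". The description part is
# optional in the pattern so that a missing description can be reported separately.
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def lint_llms_txt(text: str) -> list[str]:
    """Return human-readable problems found in an llms.txt draft (illustrative rules)."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line]

    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line should be a '# ' heading with the site name")
    if sum(1 for line in non_empty if line.startswith("# ")) > 1:
        problems.append("more than one top-level '# ' heading")
    if not any(line.startswith("## ") for line in non_empty):
        problems.append("no '## ' section headings found")

    for number, line in enumerate(lines, start=1):
        if line.startswith("- "):
            match = LINK_RE.match(line)
            if match is None:
                problems.append(f"line {number}: list item is not a Markdown link")
            elif not match.group(3):
                problems.append(f"line {number}: link has no description")
    return problems
```

An empty list means the draft passed these particular checks. It says nothing about whether the pages you chose are the right ones, which remains an editorial decision.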
Use the form below to assemble the pieces of your llms.txt by hand. It’s a drafting aid, not a backend service: it won’t auto-publish a file, crawl your site, or guarantee citations. Fill in your details, then use the example structure underneath as your template. To see whether AI engines actually find and cite your pages once your file is live, run a free scan at app.capston.ai/audit.
Here’s the shape of a finished file. Replace every placeholder with your own brand, URLs and descriptions — these values are illustrative only.
# YourBrand

> YourBrand is a [one-line description of what you do and who you serve].

## Docs

- [Getting started](https://yoursite.com/docs/getting-started/): How to set up YourBrand in a few minutes.
- [API reference](https://yoursite.com/docs/api/): Endpoints, auth and examples.

## Guides

- [Beginner's guide](https://yoursite.com/guide/): The core concepts explained simply.
- [Product comparison](https://yoursite.com/compare/): How our plans and tiers differ.

## About

- [About YourBrand](https://yoursite.com/about/): Who we are and what we stand for.
- [Contact](https://yoursite.com/contact/): How to reach the team.

## Optional

- [Changelog](https://yoursite.com/changelog/): Release notes, useful but not essential.
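If your page list already lives in a spreadsheet or a CMS export, you may prefer to generate the file rather than type it by hand. Here is a minimal Python sketch of that idea. The names Link, Section and render_llms_txt are hypothetical, invented for this example, and the output simply mirrors the example structure shown above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(site_name: str, summary: str, sections: list[Section]) -> str:
    """Render a top-level heading, a blockquote summary and link sections as Markdown."""
    lines = [f"# {site_name}", "", f"> {summary}"]
    for section in sections:
        # Blank lines around each heading keep the Markdown easy to read.
        lines += ["", f"## {section.heading}", ""]
        for link in section.links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
    return "\n".join(lines) + "\n"


# Placeholder values only; replace them with your own brand, URLs and descriptions.
draft = render_llms_txt(
    "YourBrand",
    "YourBrand is a [one-line description of what you do and who you serve].",
    [
        Section("Docs", [
            Link("Getting started", "https://yoursite.com/docs/getting-started/",
                 "How to set up YourBrand in a few minutes."),
        ]),
        Section("Optional", [
            Link("Changelog", "https://yoursite.com/changelog/",
                 "Release notes, useful but not essential."),
        ]),
    ],
)
print(draft)
```

Keeping the page list as data makes it easier to review which pages made the shortlist, and the rendered text can be saved as llms.txt and uploaded like any other static file.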
Follow these six steps to go from a blank file to a live, maintained llms.txt. Each builds on the last, and the order matters: get the content right before you worry about hosting and validation.
Start the file with a single top-level heading containing your site or brand name, followed by a one-sentence summary of what you do and who you serve. This is the first context an AI model reads, so make it specific and plain — no marketing fluff. It frames everything that follows and helps an engine decide your site is relevant to a query.
Choose the pages most worth citing — your definitions, comparisons, authoritative guides and key product pages — and write each as a Markdown link followed by a short, factual description of what the page answers. Curate ruthlessly: a focused list of your best pages is far more useful to an engine than every URL on your site.
Organize your links under meaningful section headings such as ## Docs, ## Guides, ## Products or ## About. Group by purpose rather than by your site’s navigation, so a model can quickly find the kind of page it needs. Clean grouping is what turns a list of links into a readable map.
If you want to expose a more complete index, create a companion llms-full.txt that includes the broader set of pages and richer content. Keep llms.txt as the curated, priority shortlist and use llms-full.txt for depth, so engines with more context budget can go deeper without cluttering your primary file.
Upload the files so they’re reachable at https://yoursite.com/llms.txt and, if you made one, https://yoursite.com/llms-full.txt. The root location is the convention crawlers look for, the same way they expect robots.txt. Serve them with a text content type such as text/plain or text/markdown, encoded as UTF-8, so they’re easy to fetch and parse.
Confirm each file returns a 200 OK with the expected content: open the URL in a browser or use a simple HTTP check. Then treat the file as living documentation: revise it whenever you publish important new pages or retire old ones, so it always points to your freshest, most citable sources rather than dead links.
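The final check can be scripted so it is easy to repeat whenever the file changes. The sketch below uses only the Python standard library and makes a few assumptions: the helper names extract_urls, fetch and check_site are illustrative, the User-Agent string is made up, and https://yoursite.com is the same placeholder used throughout this guide.

```python
import re
import urllib.error
import urllib.request

# Matches the target of a Markdown link: "](https://...)".
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def extract_urls(llms_txt: str) -> list[str]:
    """Collect link targets from the Markdown text, in order, without duplicates."""
    seen = []
    for url in LINK_URL_RE.findall(llms_txt):
        if url not in seen:
            seen.append(url)
    return seen


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Return (status, Content-Type, body text). The body is empty on HTTP errors.

    Network failures such as DNS errors are not caught and will raise.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, response.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Content-Type", ""), ""


def check_site(base_url: str) -> list[str]:
    """Return one report line for llms.txt and one for each page it links to."""
    root = base_url.rstrip("/")
    status, content_type, body = fetch(f"{root}/llms.txt")
    report = [f"{status} {content_type} {root}/llms.txt"]
    for url in extract_urls(body):
        page_status, _, _ = fetch(url)
        report.append(f"{page_status} {url}")
    return report
```

Calling check_site("https://yoursite.com") with your real domain returns one line per URL. Anything other than a 200 on llms.txt itself, or on a page it links to, is worth fixing before you move on to measuring citations.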
Writing an llms.txt once is straightforward; keeping it accurate as your site grows — and knowing whether it’s actually helping — is the harder part. CapstonAI is a measurement and methodology platform, not an agency, and it connects to your stack through agents for WordPress, Shopify, Drupal and Chrome. Those agents help you ship structural fixes like llms.txt, schema and answer-first content into your CMS, so the file isn’t a one-off you forget to maintain. The CapstonAI WordPress plugin and SaaS make it part of an ongoing workflow rather than a manual chore.
More importantly, CapstonAI closes the loop on whether any of it works. Hosting an llms.txt is an input; being cited is the outcome — and they’re not the same thing. CapstonAI scans whether AI engines like ChatGPT, Perplexity, Gemini and Google AI Overviews actually mention and cite you for the prompts that matter, separating mentions from citations and benchmarking your share of voice against named rivals. That’s how you tell whether your llms.txt and the content it points to are earning visibility, instead of assuming. We measure and help you improve — we don’t promise guaranteed citations or rankings, because no honest tool can.
An llms.txt is a plain-text Markdown file hosted at your site root that gives AI crawlers and large language models a curated map of your most important pages, with short descriptions, so they can find and cite the right content quickly.
No. The form on this page is an honest drafting aid that helps you assemble and structure your inputs. It doesn’t fake generated output, crawl your site or host a live file. Copy your details into the example format, host it at your site root yourself, then measure the result with a free scan.
At the root of your domain, reachable at https://yoursite.com/llms.txt. If you also create a fuller index, host it as llms-full.txt at the root too. Crawlers look at the root by convention, the same way they expect robots.txt.
llms.txt is your curated shortlist of priority pages with short descriptions. llms-full.txt is an optional, more complete index with broader pages and richer content, for engines that have more context budget to go deeper.
No. llms.txt is an emerging convention that improves discoverability and clarity, but no file forces an engine to cite you. Your content still has to earn the citation. The reliable way to know whether it’s working is to measure citations across engines over time.
robots.txt controls crawler access and a sitemap lists every URL. llms.txt is different: it expresses editorial priority in plain language, telling models which curated pages are worth reading and citing first. They complement each other.
Measure it. Run a free scan at app.capston.ai/audit to see whether ChatGPT, Perplexity, Gemini and Google AI Overviews actually mention and cite you for your key prompts, then watch the trend over repeated scans rather than judging a single run.
Ship your llms.txt, then run a free scan across ChatGPT, Perplexity, Gemini and Google AI Overviews — no credit card.