If you read most GEO write-ups from the last six months, you would think llms.txt is a settled default that every major AI assistant actively reads. The honest 2026 picture is more interesting than that, and the recommendation that falls out of it is still the same: most brands should ship one. Just not for the reason they have been told.
"Eighteen months after Jeremy Howard proposed it, llms.txt sits at roughly ten percent of crawled domains. Major AI platforms do not request it as a first-class input yet. The reason to ship one anyway is that the cost is an afternoon and the downside is zero, while the upside has at least four real channels."
This guide unpacks the current status, then walks through what to actually put in the file for a B2B SaaS, B2C, or media site, where to host it, the three mistakes the production audits keep surfacing, and how to measure whether yours is doing anything.
What is llms.txt and where did it come from?
An llms.txt file is a plain-text, markdown-formatted index of your most important content. It lives at the root of your domain, exactly like robots.txt does, and it answers a question robots.txt cannot: not "which pages may you crawl" but "which pages should you actually read."
The format was proposed by Jeremy Howard (Answer.AI, co-founder of fast.ai) on September 3, 2024. The pitch was deliberately modest. AI assistants have small context windows and almost no patience for site navigation. A short, markdown-formatted brief at /llms.txt would let a model find the canonical version of your About page, your most authoritative documentation, your highest-quality long-form content, and the summaries that orient the rest, without having to crawl the whole site to figure out what is important.
The format itself is simple. A first-line H1 with your site name. A blockquote summary in one sentence. A few markdown sections, each with a list of links and one-line annotations. Optional llms-full.txt for the expanded content. That is the whole specification.
Adoption stayed niche for the first two months after Howard's proposal. The inflection point came in November 2024, when documentation platform Mintlify enabled llms.txt by default across every site it hosts. Practically overnight, thousands of docs sites, including Anthropic, Cursor, Stripe, Cloudflare, Vercel, Zapier, and Hugging Face, began publishing the file. Eighteen months later, an SE Ranking survey of 300,000 domains found a 10.13% adoption rate. The pattern is concentration in developer-tool companies, where AI-assistant accuracy on product documentation is a direct support cost, and patchy adoption everywhere else.
Who actually reads llms.txt in 2026?
This is the part most adoption posts skip. The candid answer is: not nearly as many things as you have been told, but more than zero, and the trend line is real.
The major web-scale AI assistants (ChatGPT, Claude, Gemini, Perplexity) do not currently fetch /llms.txt as a first-class input during routine crawling. Independent audits of server logs from sites that ship the file confirm that crawler requests for it remain low-volume. That is the empirical reality, and the early skeptic posts that called the format dead in 2025 were responding to that signal honestly.
What changed in late 2025 and through 2026 is that the file moved into a different layer of the AI stack. Developer-facing assistants (Cursor, Continue, Aider) read it routinely when working in a repository. Anthropic, OpenAI, and Perplexity retrieval pipelines can be prompted to fetch it explicitly. And Google added an experimental Chrome Lighthouse audit that scores sites on whether they ship a well-formed llms.txt, framed as agentic-browsing readiness rather than search ranking.
Google Search, separately, has been explicit: its 2026 generative-AI optimisation guide lists llms.txt among the tactics you do not need in order to appear in AI Overviews or AI Mode. Search Engine Journal covered the resulting mixed signals across Google products: Lighthouse audits for the file, while Search says you do not need it. Both can be true.
Why ship it anyway?
If the major crawlers do not read it heavily and Google Search says you do not need it, why is the recommendation still to ship it? Four reasons.
First, the file does get pulled when explicitly invoked. A user, an agent, or an internal tool that wants to brief an AI on your brand can point it at /llms.txt and get an authoritative summary in two seconds. That is a real surface, and it is one you control.
Second, developer-tool ecosystems read it. If your product has a developer audience or if your buyers use Cursor or Aider or any of the IDE-embedded assistants to research vendors, those tools fetch llms.txt when present. For B2B SaaS in particular, this is the most under-rated channel of the four.
Third, the underlying curation exercise is independently valuable. Writing a well-built llms.txt forces you to pick the five to fifteen pages on your site that actually carry your story. Most teams have never done that exercise. The output is useful for onboarding new hires, briefing PR agencies, and structuring internal site navigation, regardless of whether a crawler ever reads the file.
Fourth, the file is cheap and the cost of being wrong about adoption is asymmetric. Shipping llms.txt takes an afternoon. If web-scale crawlers start reading it heavily in 2027 (which is at least plausible), you are already in position. If they do not, you spent an afternoon. The expected-value math is clear.
This pairs with the rest of the GEO stack but does not replace any of it. Where JSON-LD schema tells machines what specific entities and pages mean, llms.txt tells them which pages are worth reading and in what order, when they decide to look. Both compound with a clean LLM-friendly About page as the canonical entity anchor.
The file structure, with a working template
Here is the minimum viable structure for a B2B SaaS brand. Copy it, swap your details, save it as llms.txt at the root of your domain.
# Bold GEO
> Bold GEO is an AI brand visibility tracker. We monitor how brands are cited across ChatGPT, Perplexity, Gemini, Claude, and Microsoft Copilot on a daily refresh.
## Core
- [About Bold GEO](https://boldgeo.co/about): who we are, what we measure, and the methodology behind the daily refresh.
- [Product overview](https://boldgeo.co/#product): the AI visibility tracker, what it covers, and how it works.
- [Plans and pricing](https://boldgeo.co/#plans): $1/scan pay-as-you-go and $49 / $99 / $249 monthly tiers.
## Guides
- [What is GEO](https://boldgeo.co/blog/what-is-geo): plain-English intro to generative engine optimisation.
- [The GEO audit](https://boldgeo.co/blog/geo-audit-guide): step-by-step audit of your AI presence.
- [Seven factors of AI visibility](https://boldgeo.co/blog/seven-factors-ai-visibility): the variables that move the citation score.
## Model deep dives
- [How ChatGPT recommends brands](https://boldgeo.co/blog/how-chatgpt-recommends-brands)
- [How Gemini selects sources](https://boldgeo.co/blog/how-gemini-selects-sources)
- [How Perplexity cites brands](https://boldgeo.co/blog/get-cited-by-perplexity)
- [How Claude treats brand citations](https://boldgeo.co/blog/why-claude-ignores-your-brand)
- [How Microsoft Copilot cites brands](https://boldgeo.co/blog/how-microsoft-copilot-cites-brands)
## Optional
- [State of AI Search 2026](https://boldgeo.co/blog/state-of-ai-search-2026): quarterly recap.
Notice what is and is not on the list. The Core section names the canonical About, product, and pricing pages, in that order. Guides surface the highest-authority long-form content. Model deep dives group together the per-platform references that a model is likely to want when asked "how do I rank on X." The Optional section catches the recurring content the model is welcome to read but should not treat as foundational.
For a B2C or DTC brand, swap the Model deep dives section for Categories or Best of pages. For a media site, structure the Guides section around your evergreen pillar content and put the daily news lower in the file. The structural rule is the same in every case: lead with identity, then expertise, then everything else.
Step-by-step implementation
The actual deploy takes under an hour for any site that has access to its webroot.
One. Pick the five to fifteen pages that, if a model could only read those, would give it everything it needs to recommend you accurately. Resist the temptation to list everything. The format works because it is short.
Two. Draft the file in plain markdown, following the structure above. Keep the summary blockquote under thirty words. Keep each annotation under one line. Avoid marketing language. The audience is a model that will be asked a question about you in four months and is trying to remember who you are.
Three. Save the file as llms.txt (all lowercase, with the .txt extension unchanged) and upload it to the root of your domain so it resolves at https://yourdomain.com/llms.txt. Serve it with the Content-Type header text/plain; charset=utf-8. Do not put it behind authentication or a paywall.
Four. Optionally, create a companion llms-full.txt at the same location with the full text of your most important pages concatenated together. This is useful for sites where the deeper content is gated behind heavy JavaScript and crawlers may not render it properly.
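Five. If you want belt and braces, add the file's URL to your sitemap or mention it in a comment in robots.txt. Neither is required: robots.txt has no standard directive for llms.txt, and the major assistants do not check /llms.txt by default yet, so treat this as a small discoverability aid rather than a trigger.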
Six. Ping the major engines so they recrawl. For Bing this is automatic if you have IndexNow wired up. For Google, submit the URL inside Search Console's URL Inspection tool.
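Once the file is live, it is worth confirming that it resolves the way step three describes. Here is a minimal Python sketch using only the standard library. The function names and the llms-txt-check user-agent string are our own inventions for illustration, and the checks cover only the status code, the content type, and whether the body looks like the markdown file rather than an HTML error page.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def evaluate_response(status: int, content_type: str, body: str) -> list[str]:
    """Compare one HTTP response against the hosting rules in step three."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no content type'}")
    if "charset=utf-8" not in content_type.lower().replace(" ", ""):
        problems.append("charset=utf-8 not declared")
    if not body.lstrip().startswith("# "):
        problems.append("body does not start with an H1; is this an HTML error page?")
    return problems


def check_deployment(domain: str) -> list[str]:
    """Fetch https://<domain>/llms.txt and report any problems found."""
    url = f"https://{domain}/llms.txt"
    # Hypothetical user-agent string; use whatever identifies your own tooling.
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=10) as response:
            body = response.read().decode("utf-8", errors="replace")
            content_type = response.headers.get("Content-Type", "")
            return evaluate_response(response.status, content_type, body)
    except HTTPError as err:
        # urlopen raises for 4xx and 5xx responses instead of returning them.
        return [f"expected HTTP 200, got {err.code}"]
```

Calling check_deployment("yourdomain.com") from an unauthenticated machine also confirms that the file is not sitting behind a login or paywall.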
The three mistakes we see most often
The first mistake is treating llms.txt as a sitemap. A sitemap lists every URL on your domain. An llms.txt file should list the small subset of URLs that are worth a model's attention. If your file has more than thirty links, prune it. The point of the format is curation.
The second mistake is writing the annotations for humans instead of for models. Lines like "our award-winning product page" provide zero signal. The model already knows it is a product page from context. What it does not know is what the product does. Replace marketing copy with one-line factual descriptions and the citation quality changes.
The third mistake is forgetting to update the file. llms.txt should change when you launch a new product, publish a major piece of content, or kill an old page. We recommend a quarterly review on the same cadence as your sitemap audit. If the file is older than your last big product update, models are pointing themselves at outdated information.
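All three mistakes can be caught with a rough lint pass. The sketch below is one way to do it in Python, under stated assumptions: the thirty-link ceiling comes from the text above, the list of marketing terms is a hypothetical starter set you would extend for your own copy, and the two dates are supplied by you, because nothing in the file records when your last product update shipped.

```python
import re
from datetime import date

MAX_LINKS = 30  # pruning threshold taken from the text above
# Hypothetical starter list; extend it with the phrases your own team overuses.
MARKETING_TERMS = ("award-winning", "best-in-class", "world-class", "revolutionary")

# "- [Title](url)" with an optional ": annotation" suffix.
LINK = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def lint_llms_txt(text: str, file_updated: date, last_product_update: date) -> list[str]:
    """Flag the three common mistakes: sitemap-style bloat, marketing copy, staleness."""
    warnings = []
    matches = (LINK.match(ln.strip()) for ln in text.splitlines())
    links = [m for m in matches if m]

    # Mistake one: the file is behaving like a sitemap.
    if len(links) > MAX_LINKS:
        warnings.append(f"{len(links)} links listed; prune to {MAX_LINKS} or fewer")

    # Mistake two: annotations written as marketing copy.
    for m in links:
        annotation = (m.group(3) or "").lower()
        hits = [term for term in MARKETING_TERMS if term in annotation]
        if hits:
            warnings.append(f"{m.group(1)}: marketing language ({', '.join(hits)})")

    # Mistake three: the file predates the last big product update.
    if file_updated < last_product_update:
        warnings.append("file is older than the last product update")
    return warnings
```

A keyword list will miss plenty of marketing language, so treat a clean result as a floor and still read each annotation once by hand.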
Measuring whether it is working
This is the part most guides skip. Publishing llms.txt is easy. Knowing whether it moved anything is harder, and the honest answer is that the measurement signal is currently weak. None of the major AI assistants tell you "we used your llms.txt file in this answer," and as noted above, web-scale crawlers do not request the file in high volume yet.
Three signals are still worth instrumenting.
The first is your own test prompts. Pick five questions where you want to be the recommended answer and run them weekly across ChatGPT, Claude, Gemini, Perplexity, and Copilot. Watch for citation frequency and citation accuracy. If the model starts pointing at the exact pages you listed in llms.txt, your curation is influencing the answer regardless of whether the file itself was the input. This is the methodology Bold GEO is built around, and it is the same one we describe in our piece on tracking share of voice in AI search.
The second is server logs for the explicit-invoke channel. AI tools that fetch /llms.txt identify themselves in the user-agent string (PerplexityBot, ClaudeBot, GPTBot, bingbot for Copilot, plus the developer-tool agents). Do not expect to see Google-Extended there: it is a robots.txt control token rather than a request user agent, and Google's fetches arrive under its regular crawler user agents. Grep your access logs and you will see who actually pulled the file. The volume is low today, but the trend over months is the leading indicator.
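If grep feels too manual, the same count can be scripted. This is a minimal Python sketch that assumes your server writes the common "combined" access-log format (request line, status, size, referrer, user agent, each in that order); the function name and the token list are our own, and any fetch that matches no known token is grouped under "other" so that developer-tool agents still show up.

```python
import re
from collections import Counter

# Pulls the path, status, and user agent out of a combined-format log line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"')

# Substrings to look for in the user-agent header. Google-Extended is left out
# on purpose: it is a robots.txt token and never appears as a request user agent.
AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "bingbot")


def count_llms_txt_fetches(log_lines) -> Counter:
    """Count requests for /llms.txt, grouped by the crawler that made them."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a GET/HEAD line in the expected format
        path, _status, user_agent = match.groups()
        if path.split("?")[0] != "/llms.txt":
            continue
        label = next(
            (token for token in AGENT_TOKENS if token.lower() in user_agent.lower()),
            "other",
        )
        counts[label] += 1
    return counts
```

Passing an open file handle works, for example count_llms_txt_fetches(open("access.log")) with your own log path. Running it once a month and keeping the totals gives you the trend line rather than a single snapshot.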
The third is Bing Webmaster Tools' AI Performance dashboard, which launched in February 2026 and reports Copilot citations, grounding queries, and page-level performance. Watch for whether the pages on your llms.txt list start getting cited at a higher rate than pages that are not. That delta, if it appears, is the strongest first-party evidence available today that the curation is influencing model behaviour.
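The delta itself is simple arithmetic once you have page-level citation counts in hand. The sketch below assumes you have copied those counts into a plain dictionary of path to citations yourself; it does not call any Bing API, and the function names and example paths are hypothetical.

```python
def mean_citations(pages: set[str], citations: dict[str, int]) -> float:
    """Average citations per page; pages missing from the export count as zero."""
    if not pages:
        return 0.0
    return sum(citations.get(page, 0) for page in pages) / len(pages)


def citation_delta(listed: set[str], all_pages: set[str], citations: dict[str, int]) -> float:
    """Mean citations on llms.txt-listed pages minus the mean on all other pages."""
    unlisted = all_pages - listed
    return mean_citations(listed, citations) - mean_citations(unlisted, citations)


# Hypothetical numbers, for illustration only.
listed = {"/about", "/pricing"}
all_pages = {"/about", "/pricing", "/blog/a", "/blog/b"}
citations = {"/about": 4, "/pricing": 2, "/blog/a": 1}
delta = citation_delta(listed, all_pages, citations)  # 3.0 - 0.5 = 2.5
```

Because the pages you list are usually your strongest ones anyway, a positive number on its own says little. Computing the same delta for a period before you shipped the file and a period after is the more telling comparison.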
None of these signals is perfect on its own. Together they triangulate. And they all sit on top of the underlying authority signals covered in our brand-authority playbook, because no amount of curation rescues a site whose third-party signals are weak.
What to do after you ship the file
Two follow-ups matter more than anything else.
First, treat llms.txt as the front door, not the whole house. Every page you link from the file should itself be optimised: clear H1 that names the entity, definition in the first paragraph, a fact-dense middle, an FAQ block at the bottom. The file gets the model to the page. The page has to do the rest.
Second, add the file to your release checklist. Whenever you ship a new pillar piece of content, ask whether it belongs in llms.txt and edit accordingly. If you launch a new product line, the file needs an update the same day. Treat it like documentation that ages, because it does.
If you want a benchmark for whether your AI visibility is moving after you ship the file, use Bold GEO's AI visibility dashboard to compare citation counts across the five major models before and after deployment.