llms.txt

A proposed file at the site root that tells AI crawlers which content to prioritize.

Definition

A Markdown file served at /llms.txt that lists a site's most important pages for AI crawlers, each with a short structured description. It is modeled on robots.txt and sitemap.xml. Adoption is uneven: some AI engines read it as a hint, and most do not follow it strictly. It is still worth shipping for forward compatibility.
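
To make the shape of the file concrete, here is a minimal Python sketch that renders one. It assumes the layout described in the llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of `- [title](url): description` links). The `Entry` class, the `render_llms_txt` function, and the site content are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One page to surface: link text, absolute URL, one-line description."""
    title: str
    url: str
    description: str


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Entry]]) -> str:
    """Render sections of entries as a Markdown llms.txt body."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for e in entries:
            lines.append(f"- [{e.title}]({e.url}): {e.description}")
        lines.append("")
    # Collapse the trailing blank line into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, for illustration only.
example = render_llms_txt(
    "Example Docs",
    "Reference documentation for the Example product.",
    {
        "Overview": [
            Entry("Getting started", "https://your-site.com/docs/start", "Install and first run."),
        ],
        "Glossary": [
            Entry("llms.txt", "https://your-site.com/glossary/llms-txt", "What the file is and when to ship it."),
        ],
    },
)
print(example)
```

Generating the file from the same data that drives navigation or the sitemap is one way to keep the two from drifting apart.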

When to use

Sites with > 50 indexed pages where AI crawlers struggle to identify canonical content.
Technical or topical sites where a clear hierarchy (overview, glossary, deep-dives) helps citation accuracy.
Brands with a distinct knowledge graph worth surfacing to answer engines — products, docs, methodology.
After major architecture changes (URL migration, content reorg) to signal new canonical structure.

Common pitfalls

Listing every URL — defeats the purpose; keep it to the 20-50 entries you most want cited.
Treating llms.txt as a guarantee — most AI engines treat it as a hint, not a directive.
Letting it drift out of sync with the sitemap — confuses crawlers that cross-check.
Forgetting to serve it as `text/plain; charset=utf-8` — some bots reject other content types.
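
For the content-type pitfall, the sketch below shows one way to serve the file with the header set explicitly, using Python's standard-library WSGI tools. Most sites would set this in their web server or CDN configuration instead. The `make_app` and `serve` names, the file location, and port 8000 are assumptions made for the example.

```python
from pathlib import Path
from wsgiref.simple_server import make_server


def make_app(llms_path: Path):
    """Build a WSGI app that serves llms_path at /llms.txt as UTF-8 plain text."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt" and llms_path.is_file():
            status = "200 OK"
            body = llms_path.read_bytes()
        else:
            status = "404 Not Found"
            body = b"not found\n"
        headers = [
            # The explicit header is the point of this sketch.
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    return app


def serve(llms_path: Path = Path("llms.txt"), port: int = 8000) -> None:
    """Run a local test server; the default path and port are arbitrary."""
    with make_server("", port, make_app(llms_path)) as httpd:
        httpd.serve_forever()
```

Calling `serve()` from the directory that holds llms.txt starts a local server that can be checked with `curl -I http://localhost:8000/llms.txt`.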

Verification

`curl -I https://your-site.com/llms.txt` must return 200 with a `Content-Type` of `text/plain` (a `charset=utf-8` parameter is expected).
Server logs: confirm that GPTBot, ClaudeBot, and PerplexityBot each fetch /llms.txt at least once per week.
Test prompts in ChatGPT and Perplexity that reference the canonical content listed — verify citations resolve to those URLs.
Compare /llms.txt entries against /sitemap.xml: every URL listed in llms.txt should also appear in the sitemap as an indexable URL.
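
The header, log, and sitemap checks above can be partly automated. The following is a rough Python sketch under stated assumptions: the function names are hypothetical, llms.txt entries are assumed to be Markdown links with absolute URLs, the sitemap is assumed to be a single `urlset` file in the standard sitemaps.org namespace, and bots are identified by a simple substring match on each access-log line. Fetching the files and reading the logs is left to the caller.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")
# Matches the URL inside a Markdown link: [text](https://...)
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def content_type_ok(header_value: str) -> bool:
    """True when the media type is text/plain, ignoring parameters such as charset."""
    return header_value.split(";")[0].strip().lower() == "text/plain"


def llms_urls(llms_text: str) -> set[str]:
    """Collect every absolute URL that appears as a Markdown link target."""
    return set(LINK_RE.findall(llms_text))


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def missing_from_sitemap(llms_text: str, sitemap_xml: str) -> set[str]:
    """URLs listed in llms.txt that the sitemap does not contain."""
    return llms_urls(llms_text) - sitemap_urls(sitemap_xml)


def bot_fetches(log_lines) -> Counter:
    """Count log lines that mention /llms.txt and a known bot name."""
    counts = Counter()
    for line in log_lines:
        if "/llms.txt" not in line:
            continue
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1
    return counts
```

An empty result from `missing_from_sitemap` means every llms.txt entry is also in the sitemap. It does not confirm that those URLs are indexable, so that part still needs a separate check.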

Last updated: 2026-05-11
