llms.txt

A proposed file at site root that hints to AI crawlers what content to prioritize.

Definition

A Markdown file served at /llms.txt that lists a site's most important pages for AI crawlers, each with a short structured description. It is modeled on robots.txt and sitemap.xml. Adoption is uneven: some AI engines treat it as a hint, and most give it little or no weight. It is still worth shipping for forward compatibility.
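To make the shape concrete, here is a minimal Python sketch that renders such a file. It assumes the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of `- [title](url): description` links). The `Entry` class, the `render_llms_txt` helper, the site name, and the URLs are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One page worth surfacing (hypothetical helper type)."""
    title: str
    url: str
    description: str


def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[Entry]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for entry in entries:
            lines.append(f"- [{entry.title}]({entry.url}): {entry.description}")
        lines.append("")
    # Collapse the trailing blank line into a single final newline.
    return "\n".join(lines).rstrip("\n") + "\n"


# Placeholder content; swap in the pages you most want cited.
example = render_llms_txt(
    "Example Docs",
    "Reference documentation for the Example product.",
    {
        "Docs": [
            Entry("Overview", "https://your-site.com/docs/overview",
                  "What the product does and who it is for."),
        ],
        "Glossary": [
            Entry("llms.txt", "https://your-site.com/glossary/llms-txt",
                  "Definition and usage notes."),
        ],
    },
)
print(example)
```

Generating the file from the same data that drives the site's navigation is one way to keep it from drifting out of date, though a hand-maintained file works just as well for small sites.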

When to use

  • Sites with > 50 indexed pages where AI crawlers struggle to identify canonical content.
  • Technical or topical sites where a clear hierarchy (overview, glossary, deep-dives) helps citation accuracy.
  • Brands with a distinct knowledge graph worth surfacing to answer engines — products, docs, methodology.
  • After major architecture changes (URL migration, content reorg) to signal new canonical structure.

Common pitfalls

  • Listing every URL — defeats the purpose; keep it to the 20-50 entries you most want cited.
  • Treating llms.txt as a guarantee — most AI engines treat it as a hint, not a directive.
  • Letting it drift out of sync with the sitemap — confuses crawlers that cross-check.
  • Forgetting to serve it as `text/plain; charset=utf-8` — some bots reject other content types.
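Some of these pitfalls can be caught mechanically. The sketch below is a hypothetical lint helper, not part of any standard tool: `lint_llms_txt`, its 50-entry default, and the link regex are assumptions chosen to mirror the list above. It takes the file body and the `Content-Type` header value as strings, so it does no network access itself.

```python
import re

# Matches Markdown list items of the form "- [title](url)".
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)", re.MULTILINE)


def lint_llms_txt(body: str, content_type: str, max_entries: int = 50) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    urls = [match.group(2) for match in LINK_RE.finditer(body)]

    if len(urls) > max_entries:
        problems.append(f"{len(urls)} entries listed; expected at most {max_entries}")
    if len(set(urls)) != len(urls):
        problems.append("duplicate URLs listed")

    # Split "text/plain; charset=utf-8" into media type and parameters.
    media_type, _, params = content_type.partition(";")
    normalized_params = params.replace(" ", "").replace('"', "").lower()
    if media_type.strip().lower() != "text/plain":
        problems.append(f"served as {media_type.strip()!r}, expected 'text/plain'")
    elif "charset=utf-8" not in normalized_params:
        problems.append("Content-Type is missing charset=utf-8")

    return problems


ok_body = "- [Overview](https://your-site.com/docs/overview): Start here.\n"
print(lint_llms_txt(ok_body, "text/plain; charset=utf-8"))
print(lint_llms_txt(ok_body, "text/html"))
```

Sitemap drift is a separate check, since it needs the sitemap as a second input.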

Verification

  • curl -I https://your-site.com/llms.txt — must return 200 with `Content-Type: text/plain`.
  • Server logs: confirm GPTBot, ClaudeBot, PerplexityBot fetch /llms.txt at least once per week.
  • Test prompts in ChatGPT and Perplexity that reference the canonical content listed — verify citations resolve to those URLs.
  • Compare /llms.txt entries against /sitemap.xml; the listed entries should be a strict subset of the sitemap's indexable URLs.
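The sitemap comparison and the log check above can be scripted. The following is a rough sketch under stated assumptions: the sitemap uses the standard sitemaps.org namespace, access logs are in a combined-log style where the request appears as `"GET /llms.txt HTTP/1.1"`, and the bot names appear verbatim in the user-agent string. The function names and sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

LINK_RE = re.compile(r"^\s*[-*]\s*\[[^\]]+\]\(([^)\s]+)\)", re.MULTILINE)
SITEMAP_LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def missing_from_sitemap(llms_body: str, sitemap_xml: str) -> list[str]:
    """Return llms.txt URLs that do not appear in the sitemap, sorted."""
    listed = {match.group(1) for match in LINK_RE.finditer(llms_body)}
    root = ET.fromstring(sitemap_xml)
    in_sitemap = {loc.text.strip() for loc in root.iter(SITEMAP_LOC) if loc.text}
    return sorted(listed - in_sitemap)


def bot_fetches(log_lines, path: str = "/llms.txt") -> Counter:
    """Count log lines where a known AI bot requested the given path."""
    counts = Counter()
    for line in log_lines:
        # The request path is surrounded by spaces in the request line.
        if f" {path} " not in line:
            continue
        for bot in BOTS:
            if bot in line:
                counts[bot] += 1
    return counts


# Sample inputs; replace with the real file contents and log lines.
llms_body = (
    "- [A](https://your-site.com/a): first page\n"
    "- [B](https://your-site.com/b): second page\n"
)
sitemap_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://your-site.com/a</loc></url>"
    "</urlset>"
)
log_lines = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /llms.txt HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:01:00 +0000] "GET /index.html HTTP/1.1" '
    '200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]

print(missing_from_sitemap(llms_body, sitemap_xml))
print(bot_fetches(log_lines))
```

A non-empty result from `missing_from_sitemap` means the two files have drifted. A bot with a zero count over a week of logs suggests it is not fetching the file, though user-agent strings can be spoofed and should be treated as indicative only.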
