llms.txt
A proposed file at the site root that hints to AI crawlers which content to prioritize.
Definition
A markdown-formatted file at /llms.txt that lists the most important pages on a site for AI crawlers, with structured descriptions. Inspired by robots.txt and sitemap.xml. Adoption is uneven: where AI engines read the file at all, they treat it as a hint rather than a directive. It is still worth shipping for forward compatibility.
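As a rough illustration of the file's shape, the sketch below renders an llms.txt body from a short list of pages. It assumes the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links with short descriptions). The site name, URLs, and the `Page` and `render_llms_txt` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # One markdown link per entry, followed by a short description.
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site and pages, for illustration only.
    print(render_llms_txt(
        "Example Docs",
        "Reference documentation for the Example product.",
        {
            "Overview": [Page("Getting started", "https://your-site.com/docs/start", "Install and first run.")],
            "Glossary": [Page("llms.txt", "https://your-site.com/glossary/llms-txt", "What the file is and when to ship it.")],
        },
    ))
```

Generating the file from the same data that drives navigation or the sitemap is one way to keep the two from drifting apart; whether that fits depends on how the site is built.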
When to use
- Sites with more than 50 indexed pages where AI crawlers struggle to identify canonical content.
- Technical or topical sites where a clear hierarchy (overview, glossary, deep-dives) helps citation accuracy.
- Brands with a distinct knowledge graph worth surfacing to answer engines — products, docs, methodology.
- After major architecture changes (URL migration, content reorg) to signal new canonical structure.
Common pitfalls
- Listing every URL — defeats the purpose; keep it to the 20-50 entries you most want cited.
- Treating llms.txt as a guarantee — most AI engines treat it as a hint, not a directive.
- Letting it drift out of sync with the sitemap — confuses crawlers that cross-check.
- Forgetting to serve it as `text/plain; charset=utf-8` — some bots reject other content types.
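Two of the pitfalls above (too many entries, wrong content type) can be caught mechanically. The following is a minimal sketch, not a validator for the full proposal: `lint_llms_txt` is a hypothetical helper, the 50-entry ceiling is taken from the guidance above, and the link pattern assumes entries are written as markdown list links.

```python
import re

# Matches markdown list links of the form "- [Title](https://...)".
LINK_RE = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\((https?://[^)\s]+)\)", re.MULTILINE)

MAX_ENTRIES = 50  # upper bound suggested in the pitfalls list


def lint_llms_txt(body: str, content_type: str) -> list:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []

    urls = LINK_RE.findall(body)
    if len(urls) > MAX_ENTRIES:
        warnings.append(f"{len(urls)} entries listed; consider trimming to {MAX_ENTRIES} or fewer")

    # Compare the media type and the charset parameter separately.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        warnings.append(f"served as {media_type or 'unknown'}, expected text/plain")
    if "charset=utf-8" not in content_type.lower().replace(" ", ""):
        warnings.append("charset is not declared as utf-8")

    return warnings
```

The `content_type` argument is whatever the server returned in its `Content-Type` header, so the function can be fed from a `curl -I` result or from an HTTP client.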
Verification
- `curl -I https://your-site.com/llms.txt`: must return 200 with `Content-Type: text/plain`.
- Server logs: confirm GPTBot, ClaudeBot, PerplexityBot fetch /llms.txt at least once per week.
- Run test prompts in ChatGPT and Perplexity about the canonical content listed, and verify that citations resolve to those URLs.
- Compare /llms.txt entries against /sitemap.xml: every listed entry should also appear in the sitemap, so the list is a strict subset of indexable URLs.
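Parts of the checklist above can be scripted. The sketch below uses only the Python standard library and makes several assumptions: the sitemap is a single `urlset` file (sitemap index files are not followed), access-log lines contain both the request path and the user-agent string, and the helper names are hypothetical.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def llms_urls(body: str) -> set:
    """Collect every markdown link target in an llms.txt body."""
    return set(LINK_RE.findall(body))


def sitemap_urls(xml_text) -> set:
    """Collect every <loc> value from a sitemap urlset (str or bytes)."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}


def missing_from_sitemap(llms_body: str, sitemap_xml) -> set:
    """URLs listed in llms.txt that the sitemap does not contain."""
    return llms_urls(llms_body) - sitemap_urls(sitemap_xml)


def bot_hits(log_lines) -> Counter:
    """Count /llms.txt requests per bot using plain substring matching."""
    hits = Counter()
    for line in log_lines:
        if "/llms.txt" in line:
            for bot in BOTS:
                if bot in line:
                    hits[bot] += 1
    return hits


def fetch(url: str):
    """Return (status, Content-Type header, raw body) for a GET request."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.headers.get("Content-Type", ""), resp.read()


def verify(base_url: str) -> list:
    """Fetch /llms.txt and /sitemap.xml under base_url and list any problems."""
    problems = []
    status, ctype, body = fetch(f"{base_url}/llms.txt")
    if status != 200:
        problems.append(f"/llms.txt returned {status}")
    if not ctype.lower().startswith("text/plain"):
        problems.append(f"unexpected Content-Type: {ctype!r}")
    _, _, sitemap = fetch(f"{base_url}/sitemap.xml")
    for url in sorted(missing_from_sitemap(body.decode("utf-8"), sitemap)):
        problems.append(f"not in sitemap: {url}")
    return problems
```

Calling `verify("https://your-site.com")` returns a list of problem strings, empty when nothing was flagged. `bot_hits` can be fed an open access-log file directly, since it only iterates over lines; matching on user-agent substrings is a rough filter and does not prove a request came from the named crawler.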
See also
More in AI search
Answer Engine Optimization (AEO)
Optimizing content to be cited inside AI-generated answers.
AI Overviews
Google's AI-generated answer block that appears above traditional search results.
GPTBot
OpenAI's training crawler. Distinct from OAI-SearchBot.
OAI-SearchBot
OpenAI's search crawler. Indexes content for ChatGPT search-time grounding.