robots.txt
A file at the site root that tells crawlers which paths they may or may not fetch.
Definition
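Defined by RFC 9309 (Robots Exclusion Protocol). The file lives at /robots.txt on each host and groups Allow/Disallow rules by User-agent. Compliance is voluntary: well-behaved crawlers honor the rules, but the file is not an access control mechanism. Blocking a path in robots.txt prevents crawling but does not guarantee non-indexation. A blocked URL can still be indexed from external links, and a crawler that cannot fetch the page never sees a noindex directive on it. To keep a page out of the index, leave it crawlable and mark it noindex.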
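As a rough illustration of how a crawler might consult these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample file contents, the ExampleBot user agent, the example.com URLs, and the allowed() helper are all assumptions made up for this example. The standard-library parser predates RFC 9309 and may not match it in every detail, such as rule precedence, so treat this as a sketch rather than a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(SAMPLE, "ExampleBot", "https://example.com/private/report.html"))  # False
print(allowed(SAMPLE, "ExampleBot", "https://example.com/blog/post.html"))       # True
```

A result of False only means a compliant crawler should skip the fetch. It says nothing about whether the URL is already indexed.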