robots.txt

A plain-text file at the site root that tells crawlers which paths they may or may not fetch.

Definition

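Defined by RFC 9309 (Robots Exclusion Protocol). Lives at /robots.txt at the root of each host. Compliance is voluntary: well-behaved crawlers read it and honor its Allow/Disallow rules, but it is not an access control. Blocking a path in robots.txt prevents crawling but does not guarantee non-indexation: a blocked URL can still be indexed from external links, without its content. To keep a page out of the index, leave it crawlable and mark it noindex (robots meta tag or X-Robots-Tag header), since a crawler cannot see a noindex directive on a page it is not allowed to fetch.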
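
As a rough illustration of how a crawler might evaluate Allow/Disallow rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt body, the example.com URLs, the ExampleBot and AnyCrawler names, and the build_parser helper are all hypothetical. One caveat: the standard-library parser is understood to apply the first matching rule in file order rather than RFC 9309's longest-match rule, so the Allow line is listed before the broader Disallow to make both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the paths and crawler name are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_body: str) -> RobotFileParser:
    """Parse a robots.txt body from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Generic crawlers fall under the "*" group.
blocked = parser.can_fetch("AnyCrawler", "https://example.com/private/report.html")
carved_out = parser.can_fetch("AnyCrawler", "https://example.com/private/press-kit/logo.png")
unlisted = parser.can_fetch("AnyCrawler", "https://example.com/blog/")

# ExampleBot has its own group, which disallows everything.
bot_blocked = parser.can_fetch("ExampleBot", "https://example.com/blog/")

print(blocked, carved_out, unlisted, bot_blocked)
```

A False result here only means a compliant crawler should not fetch the URL; it says nothing about whether the URL ends up indexed.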
