Crawl budget
The number of URLs a crawler is willing to fetch from your site within a given window.
Definition
Crawl budget is set by two factors: crawl rate (how fast a crawler can fetch without overloading your origin) and crawl demand (how much the crawler thinks your URLs are worth re-fetching). Whichever factor is lower caps the budget. Slow origins, high duplicate-URL counts, and faceted-navigation sprawl all waste crawl budget that should land on indexable, valuable pages.
When to use
- Sites with > 10k URLs where indexation rate (indexed ÷ submitted) is below 80%.
- Sites with high template duplication — pagination, faceted filters, sort parameters, tag/topic aggregations.
- Sites where a recent migration or programmatic launch created a step-change in URL volume.
- Sites where origin response time during Googlebot fetches is regularly above 1.5 s.
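The numeric criteria above can be sketched as a small check. This is a minimal illustration, not a definitive rule set: the `SiteStats` fields and the flag names are hypothetical, and the thresholds (10k URLs, 80% indexation, 1.5 s) are simply the ones quoted in this entry.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    # Hypothetical container; populate from your own sitemap, GSC, and log data.
    total_urls: int
    submitted_urls: int
    indexed_urls: int
    avg_googlebot_response_s: float


def crawl_budget_flags(stats: SiteStats) -> list[str]:
    """Return which of this entry's numeric criteria a site meets."""
    flags = []
    # Indexation rate = indexed / submitted; skipped when nothing was submitted.
    if stats.submitted_urls > 0:
        indexation_rate = stats.indexed_urls / stats.submitted_urls
        if stats.total_urls > 10_000 and indexation_rate < 0.80:
            flags.append("low_indexation_rate")
    # Origin response time during Googlebot fetches above 1.5 s.
    if stats.avg_googlebot_response_s > 1.5:
        flags.append("slow_origin")
    return flags


if __name__ == "__main__":
    example = SiteStats(50_000, 40_000, 28_000, 0.4)
    print(crawl_budget_flags(example))
```

The template-duplication and migration criteria are qualitative, so they are left out of this sketch.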
Common pitfalls
- Treating sitemap submission as crawl-budget control — sitemaps signal demand, but crawl rate is throttled by origin health.
- Leaving thin tag/filter pages indexable; they consume crawl budget without earning rank.
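- Heavy use of `<a href>` to robots.txt-disallowed URLs. Crawlers do not fetch disallowed URLs, but the links are still discovered and queued, and the URLs can be indexed without content ("Indexed, though blocked by robots.txt").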
- Confusing "Discovered, currently not crawled" with broken pages — usually the URL is fine but priority is low.
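To gauge how much crawl activity lands on filter and sort variants, one rough approach is to measure the share of fetched URLs that carry a query string and to count which parameters appear most. The sketch below assumes you already have a list of URLs fetched by the crawler (for example, extracted from server logs); `parameter_crawl_share` is a hypothetical helper name.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_crawl_share(fetched_urls):
    """Return (share of fetches that had a query string, Counter of parameter names)."""
    param_counts = Counter()
    with_query = 0
    total = 0
    for url in fetched_urls:
        total += 1
        query = urlsplit(url).query
        if query:
            with_query += 1
            # Count each parameter name once per occurrence in the URL.
            for name, _value in parse_qsl(query, keep_blank_values=True):
                param_counts[name] += 1
    share = with_query / total if total else 0.0
    return share, param_counts


if __name__ == "__main__":
    urls = ["/shoes", "/shoes?sort=price", "/shoes?color=red&sort=price", "/tags/sale"]
    share, params = parameter_crawl_share(urls)
    print(share, params.most_common())
```

A high share concentrated in a few parameters such as sort or filter keys points at the template duplication described above; what counts as "high" depends on the site.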
Verification
- GSC: Crawl Stats → Total crawl requests. The trend should rise with content additions; a flat or declining trend signals a constraint.
- Server logs: total Googlebot fetches per day vs total indexable URLs — sustained ratio below 0.5 means a backlog.
- GSC: Page Indexing → "Discovered, currently not crawled" count — should stay below 5% of submitted URLs.
- Average Googlebot response time in GSC Crawl Stats — keep under 500 ms for healthy crawl rate.
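The server-log check can be sketched as follows. This is a minimal example under stated assumptions: the logs are in Combined Log Format, Googlebot is identified by user-agent substring only (which can be spoofed; verifying the requester is out of scope here), and the function names are hypothetical. The 0.5 threshold is the one quoted in this entry.

```python
import re
from collections import Counter

# Assumed Combined Log Format, e.g.:
# 66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 200 2326 "-" "UA string"
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_fetches_per_day(lines):
    """Count log lines per day whose user agent mentions Googlebot."""
    per_day = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("ua"):
            per_day[match.group("day")] += 1
    return per_day


def days_below_ratio(per_day, indexable_urls, threshold=0.5):
    """Return the days where fetches / indexable URLs falls below the threshold."""
    if indexable_urls <= 0:
        raise ValueError("indexable_urls must be positive")
    return [day for day, count in per_day.items() if count / indexable_urls < threshold]


if __name__ == "__main__":
    with open("access.log") as handle:  # hypothetical log path
        counts = googlebot_fetches_per_day(handle)
    print(days_below_ratio(counts, indexable_urls=20_000))
```

A single low day is not meaningful on its own; the entry's criterion is a sustained ratio below the threshold, so look for runs of consecutive days in the result.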