
Crawl budget

The number of URLs a crawler is willing to fetch from your site within a given window.

Definition

Crawl budget is set by two factors: crawl rate (how fast a crawler can fetch without overloading your origin) and crawl demand (how much the crawler thinks your URLs are worth fetching or re-fetching). The crawler fetches no more than the tighter of the two allows. Slow origins, high duplicate-URL counts, and faceted-navigation sprawl all waste crawl budget that would otherwise go to indexable, valuable pages.
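One way to estimate duplicate-URL sprawl is to collapse URLs from a crawl or log export that differ only in sort or filter parameters, then measure how many collapse together. The sketch below does this. The parameter list `NON_CANONICAL_PARAMS` and the helper names are hypothetical. Which parameters are safe to drop depends on your templates.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only re-sort or filter a listing on this site.
# Adjust per template; pagination is deliberately not treated as a duplicate.
NON_CANONICAL_PARAMS = {"sort", "order", "color", "size", "sessionid"}


def normalize(url: str) -> str:
    """Collapse a URL to the variant we would want crawled and indexed."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in NON_CANONICAL_PARAMS
    )
    # Lowercase the host, keep the remaining params in a stable order, drop fragments.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def duplicate_share(urls: list[str]) -> float:
    """Fraction of URLs that are redundant variants of another URL in the list."""
    if not urls:
        return 0.0
    groups = Counter(normalize(u) for u in urls)
    return 1 - len(groups) / len(urls)


crawled = [
    "https://example.com/shoes",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/boots",
]
print(f"{duplicate_share(crawled):.0%} of crawled URLs are duplicates")
```

A high share points at faceted or sort parameters that should be consolidated, for example with canonical tags or by not linking to those variants.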

When to use

Sites with > 10k URLs where indexation rate (indexed ÷ submitted) is below 80%.
Sites with high template duplication — pagination, faceted filters, sort parameters, tag/topic aggregations.
Sites where a recent migration or programmatic launch created a step-change in URL volume.
Sites where origin response time during Googlebot fetches is regularly above 1.5 s.
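The numeric triggers in this list can be checked mechanically. The sketch below is one way to do it. `SiteSnapshot` and its fields are hypothetical inputs you would assemble from sitemaps, GSC, and logs. The median response time stands in as a proxy for "regularly above 1.5 s", and treating a doubling of URL volume as a step-change is an assumption. Template duplication is left to judgment.

```python
from dataclasses import dataclass


@dataclass
class SiteSnapshot:
    # Hypothetical inputs, e.g. gathered from sitemaps, GSC, and server logs.
    total_urls: int
    submitted_urls: int
    indexed_urls: int
    median_googlebot_response_s: float  # proxy for "regularly above 1.5 s"
    url_growth_ratio: float  # URL count now / URL count before a launch or migration


def crawl_budget_signals(s: SiteSnapshot) -> list[str]:
    """Return the numeric 'When to use' triggers this site meets."""
    reasons = []
    if s.submitted_urls and s.total_urls > 10_000:
        indexation_rate = s.indexed_urls / s.submitted_urls  # indexed / submitted
        if indexation_rate < 0.80:
            reasons.append(f"indexation rate {indexation_rate:.0%} on {s.total_urls:,} URLs")
    if s.median_googlebot_response_s > 1.5:
        reasons.append(f"median Googlebot response {s.median_googlebot_response_s:.1f} s")
    # Assumption: treat a doubling of URL volume as a "step-change".
    if s.url_growth_ratio >= 2.0:
        reasons.append(f"URL volume grew {s.url_growth_ratio:.1f}x")
    return reasons


site = SiteSnapshot(
    total_urls=48_000,
    submitted_urls=40_000,
    indexed_urls=26_000,
    median_googlebot_response_s=1.8,
    url_growth_ratio=1.1,
)
print(crawl_budget_signals(site))
```

An empty list means none of the numeric triggers fired. It does not rule out duplication problems.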

Common pitfalls

Treating sitemap submission as crawl-budget control — sitemaps signal demand, but crawl rate is throttled by origin health.
Leaving thin tag/filter pages indexable; they consume crawl budget without earning rank.
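Heavy use of `<a href>` to URLs excluded only by `noindex`: crawlers must still GET each one to read the directive. A robots.txt `Disallow` stops compliant crawlers from fetching the URL, but a disallowed URL can still be indexed from links alone.
Confusing "Discovered, currently not crawled" with broken pages. The URL is usually fine: Google found it but postponed the fetch, typically because of low priority or because crawling then was expected to overload the site.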
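Thin tag and filter pages often end up in the XML sitemap as well, so they are both indexable and advertised as worth crawling. The sketch below scans a sitemap for such entries. It uses the standard sitemaps.org namespace. The path pattern `THIN_PATH` and the parameter set `FACET_PARAMS` are hypothetical and should match your own URL scheme.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical patterns for thin aggregation pages; tune to your own URL scheme.
THIN_PATH = re.compile(r"^/(tag|topic)s?/")
FACET_PARAMS = {"sort", "order", "filter", "color", "size"}


def thin_sitemap_urls(sitemap_xml: str) -> list[str]:
    """List sitemap <loc> entries that look like tag, topic, or faceted pages."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        params = {k.lower() for k, _ in parse_qsl(parts.query, keep_blank_values=True)}
        # Flag tag/topic paths or any facet parameter in the query string.
        if THIN_PATH.match(parts.path) or params & FACET_PARAMS:
            flagged.append(url)
    return flagged


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/guide</loc></url>
  <url><loc>https://example.com/tags/seo/</loc></url>
  <url><loc>https://example.com/shoes?color=red</loc></url>
</urlset>"""
print(thin_sitemap_urls(sitemap))
```

Flagged URLs are candidates for review, not automatic removal. Some tag pages do earn traffic.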

Verification

GSC: Crawl Stats → Total crawl requests trend should rise with content additions; a flat or declining trend signals a constraint.
Server logs: daily Googlebot fetches ÷ total indexable URLs. A sustained ratio below 0.5 (a full pass over the site takes more than two days) suggests a crawl backlog.
GSC: Page Indexing → "Discovered, currently not crawled" count — should stay below 5% of submitted URLs.
GSC: Crawl Stats → Average response time should stay under 500 ms for a healthy crawl rate.
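The server-log ratio can be computed with a short script. The sketch below assumes access logs in the common Apache/Nginx "combined" format. It also assumes you supply the indexable URL count from your own inventory, such as the sitemap. It matches Googlebot by user-agent substring only. User agents can be spoofed, and Google documents a reverse DNS check for verifying real Googlebot traffic, which this sketch does not perform. The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the "combined" log format, e.g.
# 66.249.66.1 - - [11/May/2026:06:25:13 +0000] "GET /a HTTP/1.1" 200 512 "-" "...Googlebot/2.1..."
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_fetches_per_day(lines) -> Counter:
    """Count requests per calendar day whose user agent claims to be Googlebot."""
    per_day = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and "Googlebot" in m.group("ua"):
            day = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z").date()
            per_day[day] += 1
    return per_day


def crawl_coverage_ratio(per_day: Counter, indexable_urls: int) -> float:
    """Mean daily Googlebot fetches divided by indexable URLs (below 0.5 suggests a backlog)."""
    if not per_day or indexable_urls <= 0:
        return 0.0
    mean_daily = sum(per_day.values()) / len(per_day)
    return mean_daily / indexable_urls


sample = [
    '66.249.66.1 - - [11/May/2026:06:25:13 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [11/May/2026:07:00:00 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [11/May/2026:07:01:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "curl/8.0"',
    '66.249.66.1 - - [12/May/2026:01:00:00 +0000] "GET /c HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
per_day = googlebot_fetches_per_day(sample)
print(crawl_coverage_ratio(per_day, indexable_urls=10))
```

Averaging over days with at least one fetch keeps the ratio from being dragged down by gaps in log retention. Over a long window, you may prefer to divide by calendar days instead.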


Last updated: 2026-05-11
