Crawl budget

The number of URLs a crawler is willing to fetch from your site within a given window.

Definition

Crawl budget is set by two factors: crawl rate (how fast a crawler can fetch without overloading your origin) and crawl demand (how much the crawler thinks your URLs are worth fetching or re-fetching). Slow origins, high duplicate-URL counts, and faceted-navigation sprawl all waste crawl budget that should go to indexable, valuable pages.

When to use

  • Sites with > 10k URLs where indexation rate (indexed ÷ submitted) is below 80%.
  • Sites with high template duplication — pagination, faceted filters, sort parameters, tag/topic aggregations.
  • Sites where a recent migration or programmatic launch created a step-change in URL volume.
  • Sites where origin response time during Googlebot fetches is regularly above 1.5 s.
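
As a rough illustration, the two numeric triggers in the list above (URL count with indexation rate, and origin response time) can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and reading "regularly above 1.5 s" as "median above 1.5 s" is an assumption, not something the thresholds themselves define.

```python
from statistics import median


def indexation_rate(indexed: int, submitted: int) -> float:
    """Indexed URLs divided by submitted URLs."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return indexed / submitted


def crawl_budget_signals(total_urls, submitted, indexed, response_times_s):
    """Return the names of the numeric triggers that apply to a site.

    Assumption: "regularly above 1.5 s" is interpreted as the median
    Googlebot response time exceeding 1.5 seconds.
    """
    signals = []
    # Trigger 1: more than 10k URLs and indexation rate below 80%.
    if total_urls > 10_000 and indexation_rate(indexed, submitted) < 0.80:
        signals.append("low_indexation")
    # Trigger 2: origin response time during Googlebot fetches above 1.5 s.
    if response_times_s and median(response_times_s) > 1.5:
        signals.append("slow_origin")
    return signals


if __name__ == "__main__":
    print(crawl_budget_signals(50_000, 40_000, 28_000, [1.2, 1.8, 2.1]))
```

The duplication and migration triggers are judgment calls about site structure and are deliberately left out of the sketch.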

Common pitfalls

  • Treating sitemap submission as crawl-budget control — sitemaps signal demand, but crawl rate is throttled by origin health.
  • Leaving thin tag/filter pages indexable; they consume crawl budget without earning rank.
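  • Heavy use of `<a href>` to disallowed URLs: compliant crawlers do not fetch a disallowed URL, but every such link is still discovered and checked against robots.txt, and the URL can end up indexed without content.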
  • Confusing "Discovered, currently not crawled" with broken pages — usually the URL is fine but priority is low.
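
One way to audit the internal links to disallowed URLs mentioned above is to test a list of link targets against your robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`; the function name `disallowed_links`, the sample rules, and the example.com URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def disallowed_links(robots_txt, links, user_agent="Googlebot"):
    """Return the links that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in links if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical rules and link targets for illustration only.
    rules = """\
User-agent: *
Disallow: /search
Disallow: /filter/
"""
    found = disallowed_links(rules, [
        "https://example.com/products/shoes",
        "https://example.com/search?q=shoes",
        "https://example.com/filter/color-red",
    ])
    print(found)
```

The standard-library parser applies simple prefix matching and does not implement the `*` and `$` path wildcards that Googlebot supports, so treat the result as an approximation for rules that rely on them.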

Verification

  • GSC (Google Search Console): Crawl Stats → Total crawl requests trend should rise with content additions; a flat or declining trend signals a constraint.
  • Server logs: total Googlebot fetches per day vs total indexable URLs — sustained ratio below 0.5 means a backlog.
  • GSC: Page Indexing → "Discovered, currently not crawled" count — should stay below 5% of submitted URLs.
  • Average Googlebot response time in GSC Crawl Stats — keep under 500 ms for healthy crawl rate.
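
The server-log check above can be approximated with a short script. This is a sketch under stated assumptions: it expects access logs in the common "combined" format (timestamp in square brackets, user agent as the last quoted field), the function names are hypothetical, and it matches on the user-agent string only, which can be spoofed. Confirming genuine Googlebot traffic requires a reverse DNS lookup on the client IP.

```python
import re
from collections import Counter

# Captures the day from "[10/Mar/2024:06:25:14 +0000]" and the last
# quoted field on the line, which is the user agent in combined format.
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\].*"(?P<ua>[^"]*)"\s*$'
)


def googlebot_fetches_per_day(log_lines):
    """Count requests per day whose user agent mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("ua"):
            counts[match.group("day")] += 1
    return counts


def backlog_days(counts, indexable_urls, threshold=0.5):
    """Days where fetches divided by indexable URLs fall below threshold."""
    if indexable_urls <= 0:
        raise ValueError("indexable_urls must be positive")
    return [day for day, n in counts.items() if n / indexable_urls < threshold]
```

A single low day is not a backlog; the check in the list is about a sustained ratio, so look for runs of consecutive days in the output of `backlog_days` rather than isolated entries.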
