Google released a post on their blog on 16th January 2017 explaining what crawl budget means to Googlebot.
They emphasized that if new pages tend to be crawled the same day they’re published, crawl budget is not something webmasters need to focus on. Likewise, a site with fewer than a few thousand URLs will usually be crawled efficiently. So crawl budget is not something most publishers need to worry about.
It matters more for bigger sites, where prioritizing what to crawl and how much resource the host server can allocate to crawling becomes important, and for sites that auto-generate pages based on URL parameters.
Crawl rate limit
Crawling is Googlebot’s main priority, but it must not degrade the experience of users visiting the site. This is the “crawl rate limit”: it caps the maximum fetching rate for a given site.
In practice, the limit is the number of simultaneous parallel connections Googlebot may use to crawl the site, together with the time it has to wait between fetches. The crawl rate can go up and down based on factors such as:
- Crawl health: if the site responds quickly for a while, the limit goes up, meaning more connections can be used to crawl. If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less.
- Limit set in Search Console: webmasters can reduce Googlebot’s crawling of their site in Search Console.
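The crawl health behaviour described above can be illustrated with a toy model. This is only a sketch and not Googlebot's actual algorithm: the class name `CrawlRateLimiter`, the one-second threshold, the connection ceiling, and the "add one / halve" adjustment rule are all assumptions made for illustration. It also adjusts after every fetch, whereas the real limit reacts to behaviour over a longer period.

```python
from dataclasses import dataclass


@dataclass
class CrawlRateLimiter:
    """Toy model of a crawl rate limit (hypothetical, not Googlebot's real logic)."""

    connections: int = 2         # parallel connections currently allowed
    max_connections: int = 10    # assumed ceiling, e.g. a limit chosen by the site owner
    slow_threshold: float = 1.0  # seconds; slower responses count as unhealthy

    def record(self, status: int, seconds: float) -> int:
        """Update the limit after one fetch and return the new connection count."""
        healthy = status < 500 and seconds <= self.slow_threshold
        if healthy:
            # Fast, error-free response: allow one more connection, up to the ceiling.
            self.connections = min(self.connections + 1, self.max_connections)
        else:
            # Slow response or server error: halve the limit, keeping at least one.
            self.connections = max(self.connections // 2, 1)
        return self.connections


limiter = CrawlRateLimiter()
for status, seconds in [(200, 0.2), (200, 0.3), (503, 0.1), (200, 2.5)]:
    limiter.record(status, seconds)
```

In this run the two fast responses raise the limit, while the server error and the slow response each cut it back, which mirrors the "goes up when healthy, goes down when struggling" pattern.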
Crawl demand
Two factors play a significant role in crawl demand:
- Popularity: URLs that are more popular on the internet tend to be crawled more often to keep them fresher in the index.
- Staleness: Google’s systems attempt to prevent URLs from becoming stale in the index.
Site-wide events like site moves might also cause an increase in crawl demand, in order to reindex the content under the new URLs.
Crawl rate and crawl demand together define crawl budget: the number of URLs Googlebot can and wants to crawl.
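One way to picture "can and wants to crawl" is as a ranking by demand, cut off at whatever the crawl rate allows. The sketch below is purely illustrative: the `Page` fields, the `demand_score` formula, and the sample numbers are invented, and Google does not publish how it actually weighs popularity against staleness.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    popularity: float      # hypothetical score; higher means more popular
    days_since_crawl: int  # how stale the indexed copy is


def demand_score(page: Page) -> float:
    """Assumed formula: popular and stale pages are wanted more."""
    return page.popularity * (1 + page.days_since_crawl)


def select_for_crawl(pages: list[Page], capacity: int) -> list[str]:
    """Return the URLs crawled when only `capacity` fetches fit in the rate limit."""
    ranked = sorted(pages, key=demand_score, reverse=True)
    return [page.url for page in ranked[:capacity]]


pages = [
    Page("/home", popularity=10, days_since_crawl=1),
    Page("/blog/new", popularity=3, days_since_crawl=0),
    Page("/old-archive", popularity=1, days_since_crawl=30),
    Page("/tag/x", popularity=0.5, days_since_crawl=2),
]
crawled = select_for_crawl(pages, capacity=2)
```

With a capacity of two, only the two highest-demand URLs are fetched; raising the capacity (a healthier server) or changing the scores (a site move, for example) changes which URLs make the cut.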
Factors affecting crawl budget
Having many low-value-add URLs will negatively affect a website’s crawling and indexing. These low-value-add URLs fall into the following categories, in order of significance:
- Faceted navigation and session identifiers
- On-site duplicate content
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
Pages like these waste server resources and drain crawl activity away from pages that do have value. This can cause a significant delay in discovering good content on the website.
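To get a rough sense of how much faceted navigation and session identifiers inflate a site's URL count, crawled URLs (from server logs, for example) can be collapsed to a canonical form and counted. The following is a minimal sketch using Python's standard `urllib.parse` module. The parameter names in `LOW_VALUE_PARAMS` are hypothetical and would need to be replaced with the ones a given site actually uses.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names for session IDs and facets; adjust per site.
LOW_VALUE_PARAMS = {"sessionid", "sid", "sort", "color", "size"}


def canonicalize(url: str) -> str:
    """Drop session and facet parameters (and any fragment) from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in LOW_VALUE_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def count_variants(urls: list[str]) -> Counter:
    """Count how many crawled URLs collapse onto each canonical URL."""
    return Counter(canonicalize(url) for url in urls)


crawled_urls = [
    "https://example.com/shoes?color=red&sid=abc123",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes",
    "https://example.com/shoes?page=2&sid=xyz",
]
variants = count_variants(crawled_urls)
```

A canonical URL with a high count is a sign that crawl activity is being spent on many variants of the same page.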