In an update to Googlebot’s help document, Google quietly announced that it would crawl the first 15MB of a webpage. Anything after this limit will not be included in ranking calculations.
Google specifies in the help document:
This left some in the SEO community wondering whether Googlebot would completely ignore text that falls below images at the cutoff in HTML files.
“It’s specific to the HTML file itself, as written,” John Mueller, Google Search Advocate, clarified via Twitter. “Resources/embedded content extracted with IMG tags are not part of the HTML file.”
What this means for SEO
To ensure it is weighted by Googlebot, important content should now be placed near the top of web pages. This means the code should be structured so that the SEO-relevant information falls within the first 15MB of a supported HTML or text file.
This also means that images and videos should be compressed and not encoded directly into HTML code, whenever possible.
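One way to spot media that has been encoded directly into a page is to look for `data:` URIs in the markup. The following is a minimal sketch using only Python's standard library. The names `InlineMediaFinder` and `find_inline_media` are hypothetical, and checking only the `src` and `poster` attributes is an assumption that will not catch every case (CSS backgrounds, for instance, are ignored).

```python
from html.parser import HTMLParser


class InlineMediaFinder(HTMLParser):
    """Collects tags whose src/poster attribute holds an inline data: URI."""

    def __init__(self):
        super().__init__()
        self.inline = []  # list of (tag name, size of the attribute value in bytes)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: only src and poster are inspected in this sketch.
            if name in ("src", "poster") and value:
                if value.strip().lower().startswith("data:"):
                    self.inline.append((tag, len(value.encode("utf-8"))))


def find_inline_media(html):
    """Return (tag, bytes) pairs for media embedded directly in the HTML."""
    finder = InlineMediaFinder()
    finder.feed(html)
    finder.close()
    return finder.inline


sample = '<p>Hello</p><img src="data:image/png;base64,AAAA"><img src="/logo.png">'
for tag, size in find_inline_media(sample):
    print(f"<{tag}> carries {size} bytes of inline data")
```

Every byte reported here counts toward the size of the HTML file itself, whereas the second image in the sample, referenced by URL, does not.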
SEO best practices currently recommend keeping HTML pages to 100KB or less, so many sites will not be affected by this change. Page size can be checked with a variety of tools, including Google PageSpeed Insights.
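For a quick local check, the size of an HTML document can also be compared against both thresholds in a few lines of Python. This is only a sketch: the function name `html_size_report` is hypothetical, and treating 15MB as 15 × 1024 × 1024 bytes and 100KB as 100 × 1024 bytes is an assumption, since the article does not give exact byte counts. The size measured is that of the HTML text as written, not of any compressed transfer.

```python
# Assumption: binary megabytes and kilobytes; the exact byte counts are not
# stated in the article.
GOOGLEBOT_LIMIT_BYTES = 15 * 1024 * 1024
RECOMMENDED_BYTES = 100 * 1024


def html_size_report(html, encoding="utf-8"):
    """Compare the byte size of an HTML string against both thresholds."""
    size = len(html.encode(encoding))
    return {
        "bytes": size,
        "over_recommended": size > RECOMMENDED_BYTES,
        "over_googlebot_limit": size > GOOGLEBOT_LIMIT_BYTES,
        "share_of_limit": size / GOOGLEBOT_LIMIT_BYTES,
    }


report = html_size_report("<html><body><p>Hello</p></body></html>")
print(report["bytes"], report["over_recommended"], report["over_googlebot_limit"])
```

To check a saved page, read the file as text and pass its contents to the function.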
In theory, it might sound worrying that you could potentially have content on a page that isn’t being used for indexing. In practice, however, 15MB is a considerable amount of HTML.
As Google indicates, assets such as images and videos are fetched separately. From Google’s wording, it appears that this 15MB threshold only applies to HTML.
It would be difficult to break this limit with HTML unless you were publishing the text of entire books on a single page.
If you have pages that exceed 15MB of HTML, chances are you have underlying issues that need to be fixed anyway.
Source: Google Search Center
Feature image: SNEHIT PHOTO/Shutterstock