
nopCommerce SEO sitemap 2020 + How to avoid 100% CPU usage

  • 8:18:02 AM
  • Wednesday, July 1, 2020

Problem

One of our customers recently became aware that their site had been hacked and some of their product pages stopped responding.
They took the following steps to rectify the problem:

  1. Completely rebuilt the site
  2. Set up Redis and an IIS web farm to improve performance

This solved the issue with product pages opening, but CPU usage on the live site stayed at 100%, causing the site to load slowly. Analyzing the issue, we discovered that Google bots were crawling the site extremely aggressively. When we blocked them via the firewall, CPU usage dropped and the site became responsive again. Since we obviously want the site to rank, we needed Google to crawl it in a more intelligent way.

Overview

The website has almost ~100 million pages, ~25 million products, and 4 languages. It is one of the biggest nopCommerce websites, and it uses the Solr Search Plugin to support this number of products.

Solution

We decided to help Google understand the last modification date of each product URL by generating the sitemap dynamically once per day. Since the site already uses the Solr search plugin, we could easily generate the product sitemaps dynamically. Our extension also updates the sitemaps every day and refreshes their entries in the sitemap index file.

Our app also has to follow these limits to create a well-structured sitemap index and sitemaps:

Limits

  • 50,000 sitemaps per sitemap index
  • 500 sitemap index files per site
  • 50,000 URLs per sitemap
  • 50 MB (uncompressed) per sitemap

Google Search Console shows SEO specialists how many URLs in each sitemap are indexed, which helps with monitoring indexation.
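The limits above can be enforced with a simple chunking step when the product URLs are exported. Below is a minimal Python sketch (not part of the nopCommerce plugin itself; the per-entry size estimate and function name are hypothetical):

```python
# Split product URLs into sitemap-sized chunks that respect Google's
# limits: 50,000 URLs and 50 MB (uncompressed) per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000
MAX_BYTES_PER_SITEMAP = 50 * 1024 * 1024

def chunk_urls(urls, url_size_bytes=200):
    """Yield lists of URLs sized to stay under both limits.

    url_size_bytes is a rough per-entry estimate (a <url> block with
    <loc> and <lastmod>); adjust it for your real URL lengths.
    """
    max_by_size = MAX_BYTES_PER_SITEMAP // url_size_bytes
    chunk_size = min(MAX_URLS_PER_SITEMAP, max_by_size)
    for i in range(0, len(urls), chunk_size):
        yield urls[i:i + chunk_size]

# 120,000 example URLs fit into 3 sitemap files of up to 50,000 URLs each
chunks = list(chunk_urls([f"https://www.yourstore.com/product{i}"
                          for i in range(120_000)]))
```

With ~25 million products, this kind of chunking yields around 500 sitemap files, which is why the sitemap index limits above matter.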

Sitemap index

Here is the template of our sitemap index with two files:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>https://www.yourstore.com/sitemap1.xml</loc>
      <lastmod>2020-01-05T12:00:00-02:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>https://www.yourstore.com/sitemap2.xml</loc>
      <lastmod>2020-01-04T12:00:00-02:00</lastmod>
   </sitemap>
</sitemapindex>
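Generating this index as part of the daily job is straightforward. A minimal Python sketch with the standard library (the file names are placeholders, not the plugin's actual output):

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def build_sitemap_index(sitemap_urls):
    """Build a sitemap index document listing each sitemap file
    with its last modification date set to the build time."""
    root = ET.Element("sitemapindex",
                      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    lastmod = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
    for url in sitemap_urls:
        sm = ET.SubElement(root, "sitemap")
        ET.SubElement(sm, "loc").text = url
        ET.SubElement(sm, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode", xml_declaration=True)

xml = build_sitemap_index([
    "https://www.yourstore.com/sitemap1.xml",
    "https://www.yourstore.com/sitemap2.xml",
])
```

Using ElementTree (rather than string concatenation) also guarantees the UTF-8 encoding and entity-escaping rules listed later in this post.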

Alternate languages support

Our site has 4 languages, so we link the localized pages using the “hreflang” attribute. This helps search engines recognize the language of each product page and show the right version in the search results for a specific country or region.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.yourstore.com/</loc>
    <lastmod>2017-10-20T17:30:00-02:00</lastmod>
    <xhtml:link
       rel="alternate" hreflang="en-us"
       href="http://www.yourstore.com/en/product1"
    />
    <xhtml:link
       rel="alternate" hreflang="de"
       href="http://www.yourstore.com/de/product1"
    />
    <xhtml:link
       rel="alternate" hreflang="ru"
       href="http://www.yourstore.com/ru/product1"
    />
  </url>
</urlset>
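A `<url>` entry with its alternate-language links can be generated the same way. A Python sketch (the slug and the language-prefix mapping are hypothetical examples, not the site's real URL scheme):

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("xhtml", XHTML_NS)

def product_url_entry(slug, languages):
    """Build one <url> element with an xhtml:link alternate
    for every language the product page is available in."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = f"https://www.yourstore.com/en/{slug}"
    for hreflang, prefix in languages.items():
        ET.SubElement(url, f"{{{XHTML_NS}}}link", {
            "rel": "alternate",
            "hreflang": hreflang,
            "href": f"https://www.yourstore.com/{prefix}/{slug}",
        })
    return url

entry = product_url_entry("product1", {"en-us": "en", "de": "de", "ru": "ru"})
xml = ET.tostring(entry, encoding="unicode")
```

Note that each language version must list all alternates, including itself, so the same set of `xhtml:link` elements is repeated in every localized sitemap entry.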

Our app also follows these sitemap rules:

Additional Sitemap Rules

  • UTF-8 encoding
  • Entity escaping
    Character      Symbol   Escape code
    Ampersand      &        &amp;
    Double quote   "        &quot;
    Single quote   '        &apos;
    Less than      <        &lt;
    Greater than   >        &gt;
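If you build sitemap XML by hand rather than with an XML library, the standard library can still handle the escaping for you. A small sketch:

```python
from xml.sax.saxutils import escape

def escape_url(url):
    """Escape a URL for use inside a sitemap <loc> element.
    escape() handles & < > by default; the dict adds both quote types."""
    return escape(url, {'"': "&quot;", "'": "&apos;"})

print(escape_url("https://www.yourstore.com/search?q=shoes&size=42"))
# -> https://www.yourstore.com/search?q=shoes&amp;size=42
```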

Robots.txt

The last step of the sitemap implementation was to notify every search engine about our changes. We added the following line to robots.txt:
Sitemap: https://yourstore.com/sitemap-index.xml
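Since the sitemaps are regenerated daily, the robots.txt directive can be maintained by the same job. A minimal, idempotent sketch (the helper name is ours, not part of nopCommerce):

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Append a Sitemap directive to robots.txt content
    unless it is already present."""
    line = f"Sitemap: {sitemap_url}"
    if line in robots_txt:
        return robots_txt
    return robots_txt.rstrip("\n") + "\n" + line + "\n"

updated = add_sitemap_line("User-agent: *\nDisallow:\n",
                           "https://yourstore.com/sitemap-index.xml")
```

Search engines discover the Sitemap directive on their next robots.txt fetch, so no further per-engine registration is strictly required, although submitting the index in Google Search Console gives you the per-sitemap indexation stats mentioned above.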

Results

  • CPU usage dropped to 16-30%
  • Our marketers called us to ask what we had done with the site and why our PageRank had increased so significantly.