• Blog
  • Tip 1 - Cleaning / validating nopCommerce data (customers)

Tip 1 - Cleaning / validating nopCommerce data (customers)

  • 4:13:35 AM
  • Friday, August 12, 2022

Remove nopCommerce fake customers

Using nopCommerce day after day means that your database will grow, and some data will be not relevant. It can be many reasons behind the scene for this, e.x.

  • one of our customer use product import, and he had to remove products from time to time
  • if you are like me, you will get a lot of customers registration because of the bots. Those bots crawl your site buy different reasons and some of them are able to go through reCaptcha.
    In this post, I'll walk you through steps of finding data that are not needed and removing them.

Part 1 - Determine how much space on disk each table is consuming, and which tables should be cleaned first

SELECT 
    t.NAME AS TableName,
    s.Name AS SchemaName,
    p.rows,
    SUM(a.total_pages) * 8 AS TotalSpaceKB, 
    CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS TotalSpaceMB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB, 
    CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB, 
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB,
    CAST(ROUND(((SUM(a.total_pages) - SUM(a.used_pages)) * 8) / 1024.00, 2) AS NUMERIC(36, 2)) AS UnusedSpaceMB
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN 
    sys.schemas s ON t.schema_id = s.schema_id
WHERE 
    t.NAME NOT LIKE 'dt%' 
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
GROUP BY 
    t.Name, s.Name, p.Rows
ORDER BY 
    TotalSpaceMB DESC, t.Name

Results for my DB:

As you can see the biggist tables are Customer, GenericAttribute and QueuedEmail

How do you fix it?

Part 2 - Removing fake customers:

Ideally we should validate all customers emails with 3d party webservice but I have 1Mln recods and most of those customers spent less then one minute.
So let's remove customers that didn't create an order, post and so on. I don't need customers that did zero activity on my site.

DELETE c FROM Customer c
 LEFT JOIN ActivityLog al ON al.CustomerId=c.Id
 LEFT JOIN [dbo].[ShoppingCartItem] sci ON sci.CustomerId=c.Id
 LEFT JOIN [dbo].[Forums_Post] fp ON fp.CustomerId=c.Id
 LEFT JOIN Forums_Topic ft ON fp.CustomerId=c.Id
 LEFT JOIN [Order] o ON o.CustomerId=c.Id
 LEFT JOIN [dbo].[DP_Entity] e1 ON e1.CustomerId=c.Id
 LEFT JOIN [dbo].[DP_Entity] e2 ON e2.CustomerIdOfLastChange=c.Id
WHERE DATEDIFF(MINUTE, c.CreatedOnUtc, LastActivityDateUtc)<1 AND c.Id>3 
 AND al.Id IS NULL
 AND sci.Id IS NULL
 AND fp.Id IS NULL
 AND ft.Id IS NULL
 AND o.Id IS NULL
 AND e1.Id IS NULL
 AND e2.Id IS NULL

Part 3 - Removing fake customers atts:

DELETE ga
  FROM [dbo].[GenericAttribute] ga
	LEFT JOIN [dbo].[Customer] c on ga.EntityId=c.Id
  WHERE KeyGroup='Customer' AND c.Id IS NULL

Part 4 - Removing QueuedEmail and Log:

TRUNCATE TABLE [dbo].[QueuedEmail]
TRUNCATE TABLE [dbo].[Log]

I hope this helps someone out there and if you want to stay in touch then I can be found on Twitter or GitHub .