Optimizing Analytics Queries: From Entity Framework to Stored Procedures

Sometimes we are tasked with writing some analytics query and the first step is to use the ORM and the job is DONE.

We test it out locally and it’s INSTANT, of course it is... we have 5 rows in the table.

We go to staging and everything is FINE.

Production is also FINE.

Some time passes and the users have filled up the database with their data.

Now we are starting to see some spikes...

And the day comes where our job fails and management is upset because the report no longer arrives.

Now that we know the pain, let's see how we optimized our monthly/weekly report query.

Our EF analytics query

Simply: This is a query that sums / counts it the period of last week or last month

            
DateTime currentUtcTime = DateTime.UtcNow;
DateTime oneWeekAgo = currentUtcTime.AddDays(-7);
DateTime oneMonthAgo = currentUtcTime.AddDays(-30);

return UnitOfWork.Entities.All()
.Select(e => new EntityStatisticsDto
{
    EntityName = e.Name,
    Attribute1LastWeek = e.RelatedEntities2.Where(re2 => DbFunctions.TruncateTime(re2.CreatedOn).Value >= DbFunctions.TruncateTime(oneWeekAgo).Value)
                                          .Select(x => x.Attribute1)
                                          .DefaultIfEmpty(0)
                                          .Sum() / CustomExtensions.OneUnitBytes,
    Attribute1LastMonth = e.RelatedEntities2.Where(re2 => DbFunctions.TruncateTime(re2.CreatedOn).Value >= DbFunctions.TruncateTime(oneMonthAgo).Value)
                                            .Select(x => x.Attribute1)
                                            .DefaultIfEmpty(0)
                                            .Sum() / CustomExtensions.OneUnitBytes,
    OptimizedAttributeLastWeek = e.RelatedEntities2.Where(re2 => DbFunctions.TruncateTime(re2.CreatedOn).Value >= DbFunctions.TruncateTime(oneWeekAgo).Value)
                                                  .Select(x => x.EntityType == Models.RelatedEntity2.EntityType.Type1 && x.Type1 != null && x.Type1.OptimizedAttribute != 0 ? x.Type1.OptimizedAttribute : x.Attribute1)
                                                  .DefaultIfEmpty(0)
                                                  .Sum() / CustomExtensions.OneUnitBytes,
    OptimizedAttributeLastMonth = e.RelatedEntities2.Where(re2 => DbFunctions.TruncateTime(re2.CreatedOn).Value >= DbFunctions.TruncateTime(oneMonthAgo).Value)
                                                    .Select(x => x.EntityType == Models.RelatedEntity2.EntityType.Type1 && x.Type1 != null && x.Type1.OptimizedAttribute != 0 ? x.Type1.OptimizedAttribute : x.Attribute1)
                                                    .DefaultIfEmpty(0)
                                                    .Sum() / CustomExtensions.OneUnitBytes
}).AsEnumerable();

          
        

The main problem we faced with this query is that entity framework read too many times from the tables, instead of joining

This was found by simply executing the SQL this code gave us and showing the Actual execution plan in SSMS

          
CREATE PROCEDURE [dbo].[sp_GetEntityStatistics] (@unitSize INTEGER)
AS
DECLARE @oneWeekAgo as DATETIME = GETUTCDATE() - 7
DECLARE @oneMonthAgo as DATETIME = GETUTCDATE() - 30
  
SELECT 
    [databaseEntity].[Name] as EntityName,
    SUM(CAST(CASE WHEN databaseRelatedEntity2.CreatedOn >= @oneWeekAgo then databaseRelatedEntity2.Attribute1 ELSE 0 END as DECIMAL(36,2))) / @unitSize as Attribute1LastWeek,
    SUM(CAST(CASE WHEN databaseRelatedEntity2.CreatedOn >= @oneMonthAgo then databaseRelatedEntity2.Attribute1 ELSE 0 END as DECIMAL(36,2))) / @unitSize as Attribute1LastMonth,
    SUM(CAST(CASE WHEN databaseRelatedEntity2.CreatedOn >= @oneWeekAgo then CASE WHEN databaseRelatedEntity2.EntityType = 1 AND databaseType1.ID IS NOT NULL AND databaseType1.OptimizedAttribute <> 0 THEN databaseType1.OptimizedAttribute ELSE databaseRelatedEntity2.Attribute1 END ELSE 0 END as DECIMAL(36,2))) / @unitSize as OptimizedAttributeLastWeek,
    SUM(CAST(CASE WHEN databaseRelatedEntity2.CreatedOn >= @oneMonthAgo then CASE WHEN databaseRelatedEntity2.EntityType = 1 AND databaseType1.ID IS NOT NULL AND databaseType1.OptimizedAttribute <> 0 THEN databaseType1.OptimizedAttribute ELSE databaseRelatedEntity2.Attribute1 END ELSE 0 END as DECIMAL(36,2))) / @unitSize as OptimizedAttributeLastMonth,
FROM dbo.[Entity] as databaseEntity
INNER JOIN dbo.[RelatedEntity2] as databaseRelatedEntity2 on databaseEntity.ID = databaseRelatedEntity2.EntityFk 
LEFT OUTER JOIN dbo.[Type1] as databaseType1 on databaseRelatedEntity2.ID = databaseType1.ID
WHERE databaseEntity.DeletedOn IS NULL
GROUP BY databaseEntity.Name, databaseEntity.ID
          
      

By having this Stored Procedure we reduced the execution from around 5 minutes and sometimes failing to around 20-30seconds

Summary
  • Initial query with Entity Framework was slow and inefficient as data volume grew.
  • Identified performance issues using the actual execution plan in SSMS.
  • Rewrote the query as a stored procedure for optimized performance.
  • Execution time reduced from around 5 minutes to 20-30 seconds.

Message me if you have complex queries that aren't performing well so we can review them together