Introduction
Recently, I came across a performance issue in my project in which a Batch class was running for a minimum of 24 hours to process 10 million records.
Existing Flow
We have a batch class that will query the records from an object based on certain conditions and then process those records based on the business requirement.
In our execute method, we are sending the scope to handler and querying child records based on several conditions.
Sample Query:
SELECT field1, field2, field3 FROM Sobject WHERE cond1 AND (cond2 OR cond3) AND (cond4 or Cond 5 OR cond6) AND (cond7 OR cond 8) AND (cond9 OR cond 10) AND (cond 11 OR cond 12)
The above query alone was taking minimum of 3 seconds for a single record and the overall time for the single record was taking about 4 seconds. So, the batch was running for more than 24 hours to process 10 million records.
Solution
Each record is taking more time because they were going under multiple checks while running the batch. Instead of checking the conditions while the batch is running, we created a formula field directly in the child object based on the above conditions. The field will return true or false based on the conditions.
Now, we are using that formula field in our queries after removing all the conditions.
Updated Query
SELECT field1, field2, field3 FROM Sobject WHERE formulaField = True
After the change, each record is taking less than a second to complete.
Conclusion:
From the above solution, whenever the record is created or updated, the formula field will have the calculated value. So, on running the batch, we can refer the formula field value rather executing multiple conditions during the batch execution. This approach will reduce the process time and increase the performance.