Oh my god. It's full of code!

Non-selective query against large object type

So recently one of our triggers started throwing this error.

caused by: System.QueryException: Non-selective query against large object type (more than 100000 rows). Consider an indexed filter or contact salesforce.com about custom indexing.
Even if a field is indexed a filter might still not be selective when:
1. The filter value includes null (for instance, binding with a list that contains null)
2. Data skew exists whereby the number of matching rows is very large (for instance, filtering for a particular foreign key value that occurs many times)

Trigger.PaymentDuplicatePreventer: line 23, column 2

It was coming from a trigger that had been running flawlessly for many months, so at first I was confused as to why this was happening. It then became obvious: the object I was querying (Payment__c) now had over 100,000 rows, and the query was just too damn big. It's kind of odd, because my actual query would normally only return a few rows, but selectivity is judged by how many rows the query has to scan, not how many it returns. After some reading I found the following ideas for fixes.

1) Mark the field as an external ID, which forces an index to be created on it. I couldn't do this, though, because it was a formula field.

2) Ask Salesforce to enable custom indexing on the field. I asked them, they said they couldn’t because it was a formula field.

3) Break the large query into smaller queries. No, just no. I’m not doing that. That’s dumb and would cause governor limit problems.

4) Add more WHERE conditions to the query. At first this seemed insane as well, since my query was already pretty tight.

Below you can see the code for the trigger I'm talking about. It's basically straight out of the Salesforce cookbook.

trigger PaymentDuplicatePreventer on Payments__c (before insert)
{
    // Create a map to hold all the payments we have to query against
    Map<String, Payments__c> payMap = new Map<String, Payments__c>();

    // Loop over all passed-in payments
    for (Payments__c payment : System.Trigger.new)
    {
        // As long as this payment has a payment code and either it's an insert
        // or the code doesn't match the record's previous value
        if ((payment.UniquePaymentCode__c != null) &&
            (System.Trigger.isInsert ||
             (payment.UniquePaymentCode__c != System.Trigger.oldMap.get(payment.Id).UniquePaymentCode__c)))
        {
            // Make sure another new payment isn't also a duplicate. If it is, flag it; if not, add it
            if (payMap.containsKey(payment.UniquePaymentCode__c))
            {
                payment.UniquePaymentCode__c.addError('Another new payment has the same unique identifier.');
            }
            else
            {
                payMap.put(payment.UniquePaymentCode__c, payment);
            }
        }
    }

    /* Using a single database query, find all the payments in
       the database that have the same UniquePaymentCode__c as any
       of the payments being inserted or updated. */
    if (payMap.size() > 0)
    {
        for (Payments__c payment : [SELECT Id, UniquePaymentCode__c FROM Payments__c WHERE UniquePaymentCode__c IN :payMap.keySet()])
        {
            try
            {
                Payments__c newPay = payMap.get(payment.UniquePaymentCode__c);
                if (newPay != null)
                {
                    newPay.UniquePaymentCode__c.addError('A payment for this person in this study already exists.');
                }
            }
            catch (System.DmlException e)
            {
                payment.addError('Payment ' + payment.UniquePaymentCode__c + ' error: ' + e);
            }
        }
    }
}

The problem is the query:
SELECT Id,UniquePaymentCode__c FROM Payments__c WHERE UniquePaymentCode__c IN :payMap.KeySet()

How could I refine that any further and still find all my duplicates? Well, thankfully, in this case the only way duplicates can really happen is within a short time span: someone getting multiple checks for the same event. So I really only need to look at the last few weeks for duplicates; anything older isn't really a duplicate. So I changed my query to:

SELECT Id,UniquePaymentCode__c FROM Payments__c WHERE UniquePaymentCode__c IN :payMap.KeySet() and CreatedDate = LAST_90_DAYS

And that did the trick. So the moral of the story: if you have a formula field that you are using as a unique key to prevent duplicates, eventually you are going to hit the problem described above. The only fix I have found is to add a condition that reduces the number of records the query has to scan. While it may not always be feasible to use the date, perhaps there are other ways you can narrow the result set while still finding any duplicates. Think outside the box a little, and you should be able to come up with a way to do what you need to do.
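For reference, here's how the fixed query sits in the trigger. This is only a sketch of the adjusted block from the code above: CreatedDate is indexed by default, and the 90-day window is an assumption about how far apart true duplicates can occur in my data, so adjust it for yours.

// Same duplicate check as before, but with an indexed CreatedDate filter
// so the query stays selective even with 100,000+ Payments__c rows.
// LAST_90_DAYS is a built-in SOQL date literal; widen or narrow the
// window to match your data.
if (payMap.size() > 0)
{
    for (Payments__c payment : [SELECT Id, UniquePaymentCode__c
                                FROM Payments__c
                                WHERE UniquePaymentCode__c IN :payMap.keySet()
                                AND CreatedDate = LAST_90_DAYS])
    {
        Payments__c newPay = payMap.get(payment.UniquePaymentCode__c);
        if (newPay != null)
        {
            newPay.UniquePaymentCode__c.addError('A payment for this person in this study already exists.');
        }
    }
}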

12 responses

  1. we’re up against the same wall with a not-restrictive-enough query. your post helps a lot!

    September 26, 2010 at 5:10 pm

  2. At first, I didn’t believe your post, but after thinking about it, it kind of makes sense. =)

    Here’s the way I arrived at believing you: Since formula field values are calculated upon access, the database would need to recalculate the formula field for _all_ records to then be able to compare it with your set of values.
    By introducing your additional criteria, it only needs to recalculate the formula field value for that smaller subset of records.

    January 28, 2011 at 9:54 am

  3. I was in a similar situation–I wanted to check for duplicates across two fields, so I created a UniqueID__c field as a formula that combined the values of the two fields. I don’t yet have more than 100,000 records in my object (but I anticipate these being created in the future).

    My solution was this: instead of making the UniqueID field a formula field, making it a text field that is hidden from all user profiles and indexed as an ExternalID. Then, to make it “act” like a formula field, I simply put a bit of code at the _beginning_ of my “on update” and “on insert” trigger for that object. In my example, I had a custom object Campaign_Account__c that had two lookup fields Campaign__c and Account__c that contained the IDs of the referenced Campaign and Account object, respectively. Here is the beginning of my trigger:

    trigger CampaignAccountDuplicatePreventer on Campaign_Account__c (before insert, before update) {

        for (Campaign_Account__c ca : Trigger.New) {
            String c = ca.Campaign__c;
            String a = ca.Account__c;
            ca.UniqueID__c = c + a;
        }

        //…typical Duplicate Preventer code from the force.com cookbook…

    }

    You could easily modify this code to do anything that a formula field could do, and construct your UniquePaymentCode__c field that way.
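    Applied back to the original post, the same idea might look like the sketch below. This is not the author's actual code: it assumes UniquePaymentCode__c has been converted from a formula into an indexed external-ID text field, and Contact__c and Study__c are hypothetical stand-ins for whatever fields the formula originally combined.

    trigger PaymentUniqueCodeSetter on Payments__c (before insert, before update) {
        // Populate the indexed text field that replaces the formula field.
        // Contact__c and Study__c are hypothetical; substitute the fields
        // your formula originally concatenated.
        for (Payments__c p : Trigger.new) {
            p.UniquePaymentCode__c = String.valueOf(p.Contact__c) + String.valueOf(p.Study__c);
        }

        // ...then the usual duplicate-preventer logic from the cookbook...
    }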

    April 19, 2011 at 2:50 pm

    • Not a bad way to go at all. Good thinking, thanks for the alternate solution. I actually ended up doing something like that on a recent project. Great minds think alike 😉

      April 19, 2011 at 3:02 pm

    • Peter

      Thanks for the Post ….I will try the same thing….

      April 22, 2011 at 6:38 pm

      • Cool, hope it works for you. Let me know if I can be of any help.

        April 22, 2011 at 7:56 pm

  4. Pingback: Non-selective query against large object type | MING_HOME

  5. davidludut

    Because the audit date fields (CreatedDate, LastModifiedDate, etc.) are indexed by default, adding one to your filter makes the SOQL selective, and selective queries aren't subject to that limit. I think that's why this fixes the issue.

    For more information, you guys can check the Apex Developer’s Guide (Working with Very Large SOQL Queries).

    Regards,
    Ming

    February 15, 2012 at 9:30 am

  6. Rishiraj

    The above post has worked for my project issues too. thanks for the valuable post.

    July 4, 2013 at 7:12 am

  7. Alex

    Thanks for referring to the “Database Query & Search Optimization Cheat Sheet,” Mitesh. FYI, there’s a new “salesforce.com” link to that cheat sheet here: http://help.salesforce.com/help/pdfs/en/salesforce_query_search_optimization_developer_cheatsheet.pdf

    October 31, 2013 at 11:33 pm

  8. Shamina

    Hello,

    I am having the same non-selective error message for a query executed from a trigger,
    even though there are filters on 2 external id fields and LastModifiedDate.

    The query with dynamic variables is as follows:
    select Id,
    Date__c,
    FromCurrency__c,
    FromCurrency__r.Name,
    Rate__c,
    ToCurrency__c,
    ToCurrency__r.Name
    from ExchangeRate__c
    where Date__c >= :minDate
    and Date__c <= :maxDate
    and FromCurrency__c IN :currencyIds
    and ToCurrency__c IN :currencyIds
    and TECH_FromCurrency__c IN :currencyIds
    and TECH_ToCurrency__c IN :currencyIds
    and FromCurrency__c != null
    and ToCurrency__c != null
    and TECH_FromCurrency__c != null
    and TECH_ToCurrency__c != null
    and Date__c != null
    and IsDeleted = false
    order by Date__c desc

    Note that the external id fields are TECH_FromCurrency__c and TECH_ToCurrency__c.

    If I try to run the query with hardcoded binding values (see the 2nd query below):
    from the trigger context, I still get the non-selective error;
    from an execute anonymous script, it runs successfully and returns 57 records.

    select Id,
    Date__c,
    FromCurrency__c,
    FromCurrency__r.Name,
    Rate__c,
    ToCurrency__c,
    ToCurrency__r.Name
    from ExchangeRate__c
    where
    FromCurrency__c IN ('a0K11000000UVMpEAO', 'a0K11000000UVMXEA4', 'a0K11000000UVOOEA4')
    and ToCurrency__c IN ('a0K11000000UVMpEAO', 'a0K11000000UVMXEA4', 'a0K11000000UVOOEA4')
    and TECH_FromCurrency__c IN ('a0K11000000UVMpEAO', 'a0K11000000UVMXEA4', 'a0K11000000UVOOEA4')
    and TECH_ToCurrency__c IN ('a0K11000000UVMpEAO', 'a0K11000000UVMXEA4', 'a0K11000000UVOOEA4')
    and LastModifiedDate >= 2013-01-16T00:00:00Z
    and FromCurrency__c != null
    and ToCurrency__c != null
    and TECH_FromCurrency__c != null
    and TECH_ToCurrency__c != null
    and Date__c != null
    and IsDeleted = false
    and Date__c >= 2013-01-16
    and Date__c <= 2014-01-16
    order by Date__c desc

    The object ExchangeRate__c currently contains 157,840 records, but I don't understand why the non-selective error is generated, since the number of records retrieved (57) is far less than 10% of the total number of records.

    Can you help me resolve this issue please?

    Thanks
    Regards,
    Shamina

    January 24, 2014 at 12:16 pm
