On Thu, 16 Apr 2020 at 07:56, singh400@xxxxxxxxx <singh400@xxxxxxxxx> wrote: > We have an odd issue where specifying the same where clause twice causes PG to pick a much more efficent plan. We would like to know why. > The EXPLAIN ANALYZE for both queries can be found here:- > Query A: https://explain.depesz.com/s/lFuy > Query B: https://explain.depesz.com/s/Jqmv This is basically down to just a poor join selectivity estimation. The selectivity estimation on the duplicate not equal clause is not removed by the planner and the selectivity of that is taking into account twice which reduces the selectivity of the table named "problem". With that selectivity taken into account, the query planner thinks a nested loop will be a more optimal plan, to which it seems to be. Join selectivity estimations can use the most common values lists as you may see if you look at the pg_stats view for the tables and columns involved in the join condition. Perhaps ID columns are not good candidates to get an MCV list in the stats. In that case, the ndistinct estimate will be used. If there's no MCV list in the stats then check ndistinct is reasonably accurate. If there is an MCV list, then you can make that bigger by increasing the statistics targets on the join columns and running ANALYZE. Note: Planning can become slower when you increase the statistics targets. Starting with PostgreSQL 9.6, foreign keys are also used to help with join selectivity estimations. I see you have a suitable foreign key from the schema you posted. You might want to add that to the list of reasons to upgrade. David