Re: Duplicate WHERE condition changes performance and plan

David Rowley <dgrowleyml@xxxxxxxxx> · Thu, 16 Apr 2020 11:57:49 +1200

On Thu, 16 Apr 2020 at 07:56, singh400@xxxxxxxxx <singh400@xxxxxxxxx> wrote:
> We have an odd issue where specifying the same where clause twice causes PG to pick a much more efficent plan. We would like to know why.

> The EXPLAIN ANALYZE for both queries can be found here:-
> Query A: https://explain.depesz.com/s/lFuy
> Query B: https://explain.depesz.com/s/Jqmv

This is basically down to just a poor join selectivity estimation.
The selectivity estimation on the duplicate not equal clause is not
removed by the planner and the selectivity of that is taking into
account twice which reduces the selectivity of the table named
"problem". With that selectivity taken into account, the query planner
thinks a nested loop will be a more optimal plan, to which it seems to
be.

Join selectivity estimations can use the most common values lists as
you may see if you look at the pg_stats view for the tables and
columns involved in the join condition.  Perhaps ID columns are not
good candidates to get an MCV list in the stats.  In that case, the
ndistinct estimate will be used.  If there's no MCV list in the stats
then check ndistinct is reasonably accurate. If there is an MCV list,
then you can make that bigger by increasing the statistics targets on
the join columns and running ANALYZE. Note: Planning can become slower
when you increase the statistics targets.

Starting with PostgreSQL 9.6, foreign keys are also used to help with
join selectivity estimations. I see you have a suitable foreign key
from the schema you posted.  You might want to add that to the list of
reasons to upgrade.

David