"henk de wit" <henk53602@xxxxxxxxxxx> writes:
> In that case the prediction is 2 rows, which is only 1 row more than in the
> previous case. Yet the plan is much better and performance improved
> dramatically. Is there a reason/explanation for that?

Well, it's just an estimated-cost comparison.  If there's only one row
then a nestloop join looks like the best way, since it requires no extra
overhead.  But as soon as you get to two rows, the other side of the
join would have to be executed twice, and that's more expensive than
doing it once and setting up a hash table.  In the actual event, with
359 rows out of the scan, the nestloop way is just horrid because it
repeats the other side 359 times :-(

It strikes me that it might be interesting to use a minimum rowcount
estimate of two rows, not one, for any case where we can't actually
prove there is at most one row (ie, the query conditions match a unique
index).  That is probably enough to discourage this sort of brittle
behavior ... though no doubt there'd still be cases where it's the
wrong thing.  We do not actually have any code right now to make such
proofs, but there's been some discussion recently about adding such
logic in support of removing useless outer joins.

>> FWIW, CVS HEAD does get rid of the duplicate conditions for the common
>> case of mergejoinable equality operators --- but it's not explicitly
>> looking for duplicate conditions, rather this is falling out of a new
>> method for making transitive equality deductions.

> This sounds very interesting, Tom. Is there some documentation somewhere
> where I can read about this new method?

Check the archives for mention of equivalence classes, notably these
two threads:
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00568.php
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00826.php

			regards, tom lane
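To make the misestimate scenario above concrete, here is a hypothetical sketch (all table and column names are invented, not taken from the thread) of how one might observe the problem with EXPLAIN ANALYZE and nudge the planner toward a better estimate by raising the column's statistics target:

```sql
-- Hypothetical example; "orders"/"order_lines" and their columns are
-- invented for illustration.  If the planner estimates ~1 row out of
-- the scan on "orders", it tends to pick a nested loop, which
-- re-executes the inner side once per outer row.
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN order_lines l ON l.order_id = o.order_id
WHERE o.customer_ref = 'ABC123';
-- If the estimate is 1 row but 359 rows actually match, the inner
-- side runs 359 times instead of being hashed once.

-- Collecting finer-grained statistics on the filtered column often
-- improves the row estimate, and with it the plan choice:
ALTER TABLE orders ALTER COLUMN customer_ref SET STATISTICS 1000;
ANALYZE orders;
```

Comparing the "rows=" figures in the estimated and actual halves of the EXPLAIN ANALYZE output is the usual way to spot this kind of misestimate.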