Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> writes: > [ assorted examples showing that commit > 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ] Thanks for the test cases. After playing with these for a bit I believe I've figured out the error in my previous thinking. Clamping the ndistinct value like that can improve matters when applied to the inside relation of a semi or anti join, but in all other cases it's just wrong. If you think about what is happening in eqjoinsel_inner with the patch, we are reducing the ndistinct estimate for the join key column proportionally to the selectivity of whatever baserel restrictions apply. This then results in proportionally increasing the selectivity number for the join condition --- in other words, we're more or less cancelling out the effects of one or the other relation's base restrictions. So that's pretty broken in general. The reason it is important for semi/antijoin inner relations is that this is actually the only way that restrictions applied to the inner rel get to impact the join size estimate at all, since set_joinrel_size_estimates is not going to factor the inner rel size into what it multiplies the join selectivity against. In short, I was mistakenly extrapolating from the observation that it helped to hack the ndistinct estimate for a semijoin's inner rel, to the conclusion that we should do that for all join input rels. So, not only are you correct that we should revert the changes to eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too. It should only be clamping the ndistinct value for the inner side. And I think it needs to be taking that into account for the case where it does have MCVs as well as the case where it doesn't. So I'll go back to this with hopefully a clearer picture of what's happening. Thanks again for the test cases. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance