Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> writes: > Why exactly does the second query use a much slower plan I'm not sure. I > believe I've found an issue in planning semi joins (reported to > pgsql-hackers a few minutes ago), but may be wrong and the code is OK. I think you are probably right that there's a bug there: the planner is vastly overestimating the cost of the nestloop-with-inner-indexscan plan. However, the reason why the mergejoin plan gets chosen in some cases seems to be that an additional estimation error is needed to make that happen; otherwise the nestloop still comes out looking cheaper. The undesirable case looks like: >> Merge Semi Join (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1) >> Merge Cond: ((t.term)::text = (f.berechnungsart)::text) >> -> Index Scan using term_term_idx on term t (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1) >> Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text)) >> -> Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f (cost=0.00..2545748.85 rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1) >> Total runtime: 30948.648 ms Notice that it's estimating the cost of the join as significantly less than the cost of the inner-side indexscan. This means it believes that the inner indexscan will not be run to completion. That's not because of semijoin semantics; there's no stop-after-first-match benefit for mergejoins. It must be that it thinks the range of keys on the outer side of the join is much less than the range of keys on the inner. Given that it knows that facttable_stat_fta4.berechnungsart only contains the values "m" and "n", this implies that it thinks term.term only contains "m" and not "n". So this estimation error presumably comes from "n" not having been seen in ANALYZE's last sample of term.term, and raising the stats target for term.term would probably be a way to fix that. However, this would all be moot if the cost estimate for the nestloop plan were nearer to reality. Since you started a separate -hackers thread for that issue, let's go discuss that there. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance