Re: Different plan for very similar queries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> writes:
> Why exactly does the second query use a much slower plan I'm not sure. I 
> believe I've found an issue in planning semi joins (reported to 
> pgsql-hackers a few minutes ago), but may be wrong and the code is OK.

I think you are probably right that there's a bug there: the planner is
vastly overestimating the cost of the nestloop-with-inner-indexscan
plan.  However, the reason why the mergejoin plan gets chosen in some
cases seems to be that an additional estimation error is needed to make
that happen; otherwise the nestloop still comes out looking cheaper.
The undesirable case looks like:

>>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1)
>>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text))
>>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85 rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
>>  Total runtime: 30948.648 ms

Notice that it's estimating the cost of the join as significantly less
than the cost of the inner-side indexscan.  This means it believes that
the inner indexscan will not be run to completion.  That's not because of
semijoin semantics; there's no stop-after-first-match benefit for mergejoins.
It must be that it thinks the range of keys on the outer side of the join
is much less than the range of keys on the inner.  Given that it knows
that facttable_stat_fta4.berechnungsart only contains the values "m"
and "n", this implies that it thinks term.term only contains "m" and
not "n".  So this estimation error presumably comes from "n" not having
been seen in ANALYZE's last sample of term.term, and raising the stats
target for term.term would probably be a way to fix that.

However, this would all be moot if the cost estimate for the nestloop
plan were nearer to reality.  Since you started a separate -hackers
thread for that issue, let's go discuss that there.

			regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux