Hi,
On 05/29/15 11:51, Peter J. Holzer wrote:
> A couple of additional observations:
> The total cost of both queries is quite similar, so random variations
> might push into one direction or the other. Indeed, after dropping
> and recreating indexes (I tried GIN indexes as suggested by Heikki on
> [1]) and calling analyze after each change, I have now reached a
> state where both queries use the fast plan.
I don't think bitmap indexes are a particularly good match for this use
case. The queries need to check the existence of a few records, and btree
indexes are great for that - the first plan is very fast.
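For illustration, this is the kind of existence check a btree index serves
very well - just a minimal sketch, with placeholder values and an index
definition I made up from the column names mentioned below, not your
actual schema:

    -- hypothetical multi-column btree index on the filter columns
    CREATE INDEX ON facttable_stat_fta4 (einheit, berechnungsart);

    -- existence check: the index scan can stop at the first matching row
    SELECT EXISTS (
        SELECT 1
        FROM facttable_stat_fta4
        WHERE einheit = 'placeholder'         -- placeholder value
          AND berechnungsart = 'placeholder'  -- placeholder value
    );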
I'm not sure why exactly the second query uses a much slower plan. I
believe I've found an issue in planning semi joins (reported to
pgsql-hackers a few minutes ago), but I may be wrong and the code may be OK.
Can you try forcing the same plan for the second query, using "enable"
flags? E.g.
SET enable_mergejoin = off;
will disable the merge join, and push the optimizer towards a different
join type. You may have to disable a few more node types until you get
the same plan as for the first query, i.e.
    nestloop semi join
      -> index scan
      -> index scan
See this for more info:
http://www.postgresql.org/docs/9.1/static/runtime-config-query.html
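Roughly something like this - which flags you actually need to flip
depends on the plans the optimizer picks along the way, so treat it as
a sketch:

    -- in the same session, before running EXPLAIN ANALYZE on the second query
    SET enable_mergejoin = off;   -- rules out merge joins
    SET enable_hashjoin = off;    -- add only if the planner then picks a hash join

    -- ... EXPLAIN ANALYZE <second query> here ...

    -- restore the defaults afterwards
    RESET enable_mergejoin;
    RESET enable_hashjoin;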
Also, have you tuned the PostgreSQL configuration? How?
Can you provide the dataset? Not necessarily all the columns; it should
be sufficient to provide just the columns used in the join/where clauses:
term -> facttablename, columnname, term
facttable_stat_fta4 -> einheit, berechnungsart
That'd make reproducing the problem much easier.
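For example, something along these lines in psql (file names and the
HEADER option are just my choice - adjust as needed):

    \copy (SELECT facttablename, columnname, term FROM term) TO 'term.csv' CSV HEADER
    \copy (SELECT einheit, berechnungsart FROM facttable_stat_fta4) TO 'facttable_stat_fta4.csv' CSV HEADER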
> In the first case the query planner seems to add the cost of the two
> index scans to get the total cost, despite the fact that for a semi
> join the second index scan can be aborted after the first hit (so
> either the cost of the second scan should be a lot less than
> 384457.80 or it needs to be divided by a large factor for the semi
> join).
> In the second case the cost of the second index scan (2545748.85) is
> either completely ignored or divided by a large factor: It doesn't
> seem to contribute much to the total cost.
I believe this is a consequence of the semi join semantics: the explain
plan shows "total" costs and row counts, as if the whole relation was
scanned (in this case all the 43M rows), but the optimizer only
propagates a fraction of that cost estimate upwards (depending on how
much of the relation it expects to scan). In this case it expects to
scan only a tiny part of the index, so the impact on the total cost is small.
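A rough illustration with your numbers (not the planner's exact formula,
just the idea):

    total cost of the inner index scan:       2545748.85   (for all ~43M rows)
    fraction the semi join expects to fetch:  e.g. 10 rows / 43,000,000
    cost added to the join estimate:          2545748.85 * 10/43,000,000 ~ 0.6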
A bit confusing, yeah.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services