Re: Different plan for very similar queries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 05/29/15 11:51, Peter J. Holzer wrote:
A couple of additional observations:

The total cost of both queries is quite similar, so random variations
might push into one direction or the other. Indeed, after dropping
and recreating indexes (I tried GIN indexes as suggested by Heikki on
[1]) and calling analyze after each change, I have now reached a
state where both queries use the fast plan.

I don't think bitmap indexes are particularly good match for this use case. The queries need to check an existence of a few records, and btree indexes are great for that - the first plan is very fast.

Why exactly does the second query use a much slower plan I'm not sure. I believe I've found an issue in planning semi joins (reported to pgsql-hackers a few minutes ago), but may be wrong and the code is OK.

Can you try forcing the same plan for the second query, using "enable" flags? E.g.

   SET enable_mergejoin = off;

will disable the merge join, and push the optimizer towards a different join type. You may have to disable a few more node types until you get the same plan as for the first query, i.e.

   nestloop semi join
     -> index scan
     -> index scan

See this for more info:

   http://www.postgresql.org/docs/9.1/static/runtime-config-query.html

Also, have you tuned the PostgreSQL configuration? How?

Can you provide the dataset? Not necessarily all the columns, it should be sufficient to provide the columns used in the join/where clauses:

    term -> facttablename, columnname, term
    facttable_stat_fta4 -> einheit, berechnungsart

That'd make reproducing the problem much easier.

In the first case the query planner seems to add the cost of the two
index scans to get the total cost, despite the fact that for a semi
join the second index scan can be aborted after the first hit (so
either the cost of the second scan should be a lot less than
384457.80 or it needs to be divided by a large factor for the semi
join).

In the second case the cost of the second index scan (2545748.85) is
either completely ignored or divided by a large factor: It doesn't
seem to contribute much to the total cost.

I believe this is a consequence of the semi join semantics, because the explain plan contains "total" costs and row counts, as if the whole relation was scanned (in this case all the 43M rows), but the optimizer only propagates fraction of the cost estimate (depending on how much of the relation it expects to scan). In this case it expects to scan a tiny part of the index scan, so the impact on the total cost is small.

A bit confusing, yeah.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux