On Fri, Jun 5, 2009 at 8:29 PM, David Blewett <david@xxxxxxxxxxxxxxxx> wrote:
> On Fri, Jun 5, 2009 at 7:32 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
>> My first thought would be to increase statistics dramatically on the
>> filtered columns in hopes of making PG realize there's a lot of rows
>> there; it's off by 8x. Correlation stats are an ongoing issue in
>> PostgreSQL.
>
> I started at a stats_target of 250, then tried 500, and finally the
> plan that I pasted before resorting to disabling nestloops was at 1000
> (re-analyzing in between, of course). Will a CLUSTER or REINDEX help
> at all?

Probably not. Your problem is similar to the one Anne Rosset was
complaining about on -performance a couple of days ago, though your case
appears to be more complex.

http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php

It's really not clear what to do about this problem. In Anne's case, it
would probably be enough to gather MCVs over the product space of her
folder_id and is_deleted columns, but I'm not certain that would help
you. It almost seems as if we need a way to say "for every distinct
value that appears in column X, gather separate statistics for the other
columns of the table". But that could make statistics gathering and
query planning very expensive.

Another angle of attack, which we've talked about before, is to teach
the executor that when a nestloop with a hash-joinable condition
executes too many times, it should hash the inner side on the next pass
and then switch to a hash join. But none of this helps you very much
right now...

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
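[Editor's addendum: the tuning steps discussed in the thread can be sketched as follows. The table and column names (app_filter, folder_id) are hypothetical placeholders, not from the original report.]

```sql
-- Raise the per-column statistics target well above the default and
-- re-analyze, as David did at targets of 250, 500, and 1000:
ALTER TABLE app_filter ALTER COLUMN folder_id SET STATISTICS 1000;
ANALYZE app_filter;

-- Last-resort workaround David mentions: discourage nested loops for
-- this session only (it affects every query in the session, so reset
-- it after running the problem query):
SET enable_nestloop = off;
-- ... run the misplanned query here ...
RESET enable_nestloop;
```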
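[Editor's addendum: the "MCVs over the product space" idea Robert describes later landed in PostgreSQL as extended statistics (CREATE STATISTICS in 10+, with multivariate MCV lists in 12+). A sketch against Anne's columns, with the table name (folders) assumed:]

```sql
-- Multivariate MCV list over the two correlated columns (PostgreSQL 12+).
-- The planner can then estimate WHERE folder_id = ? AND is_deleted = ?
-- from the joint distribution instead of multiplying the two
-- per-column selectivities, which is what produced the 8x misestimate.
CREATE STATISTICS folder_deleted_mcv (mcv) ON folder_id, is_deleted
  FROM folders;
ANALYZE folders;
```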