On Fri, Jun 5, 2009 at 8:29 PM, David Blewett <david@xxxxxxxxxxxxxxxx> wrote:
> On Fri, Jun 5, 2009 at 7:32 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
>> My first thought would be to increase statistics dramatically on the
>> filtered columns in hopes of making PG realize there's a lot of rows
>> there; it's off by 8x. Correlation stats are an ongoing issue in
>> PostgreSQL.
>
> I started at a stats_target of 250, then tried 500, and finally the
> plan that I pasted before resorting to disabling nestloops was at 1000
> (re-analyzing in between, of course). Will a CLUSTER or REINDEX help
> at all?

Probably not. Your problem is similar to the one Anne Rosset was
complaining about on -performance a couple of days ago, though your case
appears to be more complex.

http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php

It's really not clear what to do about this problem. In Anne's case, it
would probably be enough to gather MCVs over the product space of her
folder_id and is_deleted columns, but I'm not certain that would help
you. It almost seems as if we need a way to say "for every distinct
value that appears in column X, gather separate statistics for the other
columns of the table". But that could make statistics gathering and
query planning very expensive.

Another angle of attack, which we've talked about before, is to teach
the executor that when a nestloop with a hash-joinable condition
executes too many times, it should hash the inner side on the next pass
and then switch to a hash join. But none of this helps you very much
right now...

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
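[Editor's addendum: the tuning steps discussed in the thread can be sketched as follows. The table and column names (app_filter, folder_id) are hypothetical placeholders, not from the original report.]

```sql
-- Raise the per-column statistics target well above the default and
-- re-analyze, as David did at targets of 250, 500, and 1000:
ALTER TABLE app_filter ALTER COLUMN folder_id SET STATISTICS 1000;
ANALYZE app_filter;

-- Last-resort workaround David mentions: discourage nested loops for
-- this session only (it affects every query in the session, so reset
-- it after running the problem query):
SET enable_nestloop = off;
-- ... run the misplanned query here ...
RESET enable_nestloop;
```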
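[Editor's addendum: the "MCVs over the product space" idea Robert describes later landed in PostgreSQL as extended statistics (CREATE STATISTICS in 10+, with multivariate MCV lists in 12+). A sketch against Anne's columns, with the table name (folders) assumed:]

```sql
-- Multivariate MCV list over the two correlated columns (PostgreSQL 12+).
-- The planner can then estimate WHERE folder_id = ? AND is_deleted = ?
-- from the joint distribution instead of multiplying the two
-- per-column selectivities, which is what produced the 8x misestimate.
CREATE STATISTICS folder_deleted_mcv (mcv) ON folder_id, is_deleted
  FROM folders;
ANALYZE folders;
```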