Re: incorrect row estimates for primary key join

hello --

thanks, kevin, for the tuning advice. i will answer your questions below, try the different tuning configurations, and report back. but first, allow me to take a step back and ask a couple of simple questions:

it seems to me that an equality join between two relations (call them A and B), where the join columns in relation B are covered by a unique constraint, should yield row estimates that are at most equal to the row estimate for relation A, since each row of A can match at most one row of B.  my questions are:

1 - is this correct?

2 - does the postgresql planner implement this when generating row estimates?

if the answers to 1 and 2 are both yes, then the row estimates for my join should always come back less than or equal to the estimates for jointable, regardless of which query plan is chosen.  indeed, this is what i find experimentally for smaller examples.  what is perplexing to me is why it does not hold for this large table.  (the fact that the table size is greater than 2^31 is probably a red herring, but it hasn't escaped my attention.)  i do have a performance issue (i'd like the planner to select the index scan) which might be solved by better configuration, but at the moment that is a secondary question -- right now i'm interested in why the row estimates are off.
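
to make the expectation concrete, here is a minimal sketch with made-up table names (a_side and b_unique are placeholders for illustration, not my actual relations):

create table b_unique (id bigint primary key, payload text);
create table a_side (b_id bigint, val numeric);
-- load some data into both tables, then:
analyze a_side;
analyze b_unique;
explain
select *
from a_side a
join b_unique b on b.id = a.b_id;
-- expectation: the row estimate on the join node should be <= the
-- estimate for the scan of a_side, because b.id is unique and each
-- a_side row can therefore match at most one b_unique row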

moving on to your remarks:

On Jun 25, 2013, at 6:20 AM, Kevin Grittner wrote:

> Ben <midfield@xxxxxxxxx> wrote:
> 
>> PostgreSQL 9.1.1 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.6.2, 64-bit
> 
> Consider applying the latest bug fixes for 9.1 -- which would leave
> you showing 9.1.9.

i will bring it up with our ops people.  do you have any particular fixes in mind, or is this a (very sensible) blanket suggestion?

>> default_statistics_target    | 50                | configuration file
> 
> Why did you change this from the default of 100?

sorry, i do not know.  it is possible this was copied from the configuration of a different server, which serves some very large tables with gist indexes where the statistics do not help the selectivity estimation much, if at all (as far as i can tell, gist indexes often use hard-coded selectivity estimates rather than the statistics.)  in that case it is an oversight and i will correct it.  but i believe the statistics for the tables in question are close enough, and they certainly do not explain the off row estimates in the query plan.
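
for what it's worth, the quick way i know to try the stock target without editing the config file is per session (again using the placeholder names from above, not my real schema):

set default_statistics_target = 100;
analyze a_side;
analyze b_unique;
-- then inspect what was gathered
select attname, n_distinct, null_frac
from pg_stats
where tablename in ('a_side', 'b_unique');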

>> effective_cache_size        | 5632MB            | configuration file
> 
> How much RAM is on this machine?  What else is running on it? 
> (Normally people set this to 50% to 75% of total RAM.  Lower values
> discourage index usage in queries like your example.)

the machine has 24GB of RAM.  i can up effective_cache_size to 12GB or 16GB and report back.
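
before touching postgresql.conf, i can also try it per session to see whether the plan flips, roughly like this (query shown with the placeholder tables from above):

set effective_cache_size = '16GB';
explain
select *
from a_side a
join b_unique b on b.id = a.b_id;
reset effective_cache_size;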

> Do you get a different plan if you set cpu_tuple_cost = 0.03?  How
> about 0.05?  You can set this just for a single connection and run
> explain on the query to do a quick check.

setting cpu_tuple_cost to 0.03 or 0.05 has no effect on the choice of plan or on the row estimates, for either the query with the LIMIT or the one without it.
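
for reference, the per-connection check looked roughly like this (shown with the placeholder tables from above rather than my real query):

set cpu_tuple_cost = 0.03;
explain select * from a_side a join b_unique b on b.id = a.b_id;
-- same plan, same row estimates

set cpu_tuple_cost = 0.05;
explain select * from a_side a join b_unique b on b.id = a.b_id;
-- same plan, same row estimates

reset cpu_tuple_cost;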

best regards, ben







