Re: Incorrect estimates on columns

On Wednesday 17 October 2007 20:23, Tom Lane wrote:
>  Chris Kratz <chris.kratz@xxxxxxxxxxxxxx> writes:
> > On Wednesday 17 October 2007 14:49, Tom Lane wrote:
> >> Evidently it's not realizing that every row of par will have a join
> >> partner, but why not?  I suppose a.activityid is unique, and in most
> >> cases that I've seen the code seems to get that case right.
> >>
> >> Would you show us the pg_stats rows for par.activity and a.activityid?
> >
> > Here are the pg_stats rows for par.activity and a.activityid.
>
> Hmm, nothing out of the ordinary there.
>
> I poked at this a bit and realized that what seems to be happening is
> that the a.programid = 171 condition is reducing the selectivity
> estimate --- that is, it knows that that will filter out X percent of
> the activity rows, and it assumes that *the size of the join result will
> be reduced by that same percentage*, since join partners would then be
> missing for some of the par rows.  The fact that the join result doesn't
> actually decrease in size at all suggests that there's some hidden
> correlation between the programid condition and the condition on
> par.provider_lfm.  Is that true?  Maybe you could eliminate one of the
> two conditions from the query?
>
> Since PG doesn't have any cross-table (or even cross-column) statistics
> it's not currently possible for the optimizer to deal very well with
> hidden correlations like this ...
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
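
A toy illustration of the independence assumption Tom describes may help. It is a sketch with made-up data, not PostgreSQL's actual estimator. Every par row has exactly one partner in activity (activityid is unique), and the programid filter keeps 10% of activity rows. The planner-style estimate scales the join output by that 10%. But because par (hypothetically) only references program 171's activities, the real join loses nothing. The names Activity, estimated_join_rows and actual_join_rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    activityid: int
    programid: int


def estimated_join_rows(par_activity_ids, activities, programid):
    """Independence assumption: each par row has one partner, and the filter on
    activity removes the same fraction of join partners as of activity rows."""
    matching = sum(1 for a in activities if a.programid == programid)
    return len(par_activity_ids) * matching / len(activities)


def actual_join_rows(par_activity_ids, activities, programid):
    """Actually join par to the filtered activity rows and count the result."""
    allowed = {a.activityid for a in activities if a.programid == programid}
    return sum(1 for aid in par_activity_ids if aid in allowed)


# Hypothetical data: programs 165..174, 100 activities each.
activities = [Activity(activityid=i, programid=165 + i % 10) for i in range(1000)]

# Hidden correlation: par only references program 171's activities,
# with 5 par rows per activity.
par_activity_ids = [a.activityid for a in activities if a.programid == 171] * 5

print(estimated_join_rows(par_activity_ids, activities, 171))  # 50.0
print(actual_join_rows(par_activity_ids, activities, 171))     # 500
```

The estimate is off by exactly the filter's selectivity, which is the kind of error a missing cross-table statistic produces.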

Yes, you are correct.  Programid is a "guard" condition to make sure a user
cannot look at rows outside of their program.  In this particular case the
par table only has rows for this agency (at one point in time, all rows were
in one table), so I was able to remove the check on programid on "a".  With
that change my example query runs in 200ms.  That's wonderful.

So, to recap: we had a filter on one side of the join (a.programid = 171)
that, in this case, didn't actually remove any join partners.  But the
optimizer assumed it would shrink the join result by the same fraction it
filtered out of "a", so it thought the join would generate only a few rows.
Expecting relatively few rows, it chose a nestloop instead of another join
type that would have been faster with larger data sets.
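
To make the recap concrete, here is a deliberately simplified cost comparison. The unit costs are hypothetical and this is not PostgreSQL's real cost model; it only shows why a low row estimate favors a nested loop while the true row count would favor a hash join. The function names and parameters are illustrative assumptions.

```python
def nested_loop_cost(outer_rows, probe_cost=4.0):
    # One inner index probe per outer row (hypothetical unit cost).
    return outer_rows * probe_cost


def hash_join_cost(outer_rows, inner_rows, build_cost=0.5, probe_cost=0.1):
    # Build a hash table on the inner side once, then probe it per outer row.
    return inner_rows * build_cost + outer_rows * probe_cost


def choose_join(outer_rows, inner_rows):
    # Pick whichever toy plan is cheaper for the given row counts.
    nl = nested_loop_cost(outer_rows)
    hj = hash_join_cost(outer_rows, inner_rows)
    return "nested loop" if nl <= hj else "hash join"


inner_rows = 1000
print(choose_join(50, inner_rows))   # nested loop (chosen from the low estimate)
print(choose_join(500, inner_rows))  # hash join (better for the actual row count)
```

The plan is picked from the estimate, so when the estimate is too low the cheaper-looking plan can be the slower one at run time.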

Thanks for all your help.

-Chris 
