Re: Yet another abort-early plan disaster on 9.3

On 30 September 2014 18:28, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

>> Anyway, in the particular case I posted, fixing n_distinct to realistic
>> numbers (%) fixed the query plan.
>
>
> But wouldn't fixing the absolute number also have fixed the plan?

There are two causes of this issue.

1. Poor estimates of n_distinct. Fixable by user.

2. Poor assumption of homogeneous distribution. No way for user to
fix. Insufficient stats detail to be able to solve in current planner.

I see (2) as the main source of issues, since as we observe, (1) is fixable.
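
For completeness, fixing (1) looks like this (table and column names
here are hypothetical):

    -- Override the planner's n_distinct estimate for a column.
    -- A negative value means "this fraction of the total row count",
    -- which is how you express a percentage that tracks table growth;
    -- -0.05 means "5% of rows are distinct".
    ALTER TABLE posts ALTER COLUMN user_id SET (n_distinct = -0.05);
    ANALYZE posts;  -- the override takes effect at the next ANALYZE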

An example is a social media application where the business query is
"Display the last 10 posts". If the user is a frequent, recent user
then the query could come back very quickly, so a reverse scan on
post_id would work great. If the user hasn't logged on for ages, then
that plan needs to scan lots and lots of data to find 10 posts.
That gives the problem that only certain users experience poor
performance. And the data isn't even consistent in its distribution,
so stats wouldn't help much, even if we could capture the profile of the
"typical user".

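The query shape I have in mind (schema hypothetical) is:

    -- posts(post_id serial primary key, user_id int, ...),
    -- with an index on post_id. The planner can walk that index
    -- backwards and abort early, expecting the 10 matching rows
    -- to turn up almost immediately.
    SELECT *
    FROM posts
    WHERE user_id = 123
    ORDER BY post_id DESC
    LIMIT 10;

For an active user that abort-early plan is great; for a dormant user
the backward scan filters out a huge number of non-matching rows
before it finds 10 hits.
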
>> > The problem, as I see it, is different. We assume that if there are
>> > 100 distinct values and you use LIMIT 1, you would only need to
>> > scan 1% of rows. We assume that the data is arranged in the table in a
>> > very homogeneous layout. When data is not, and it seldom is, we get
>> > problems.
>> >
>> > Simply put, assuming that LIMIT will reduce the size of all scans is
>> > just way wrong. I've seen many plans where increasing the LIMIT
>> > dramatically improves the plan.
>> >
>> > If we can at least agree it is a problem, we can try to move forwards.
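
To put numbers on that, here is a hypothetical reproduction (table
name and sizes invented for illustration):

    -- 1,000,000 rows, 100 distinct values of val, but each value is
    -- perfectly clustered: val = 99 occupies only the last 10,000 rows.
    CREATE TABLE t AS
      SELECT g AS id, g / 10000 AS val
      FROM generate_series(0, 999999) AS g;
    CREATE INDEX ON t (id);
    ANALYZE t;

    -- Selectivity of val = 99 is 1%, so the planner expects an
    -- id-ordered scan to hit a match after ~100 rows and costs the
    -- LIMIT 1 plan accordingly. In reality the first match is at row
    -- 990,001, so the "cheap" abort-early plan reads 99% of the table.
    EXPLAIN ANALYZE
    SELECT * FROM t WHERE val = 99 ORDER BY id LIMIT 1;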
>
>
> I don't think anyone doubts there is a problem (many more than one of them);
> there is just disagreement about the priority and what can be done about it.


>> That is certainly another problem.  Does the correlation stat figure in
>> the LIMIT calculation at all, currently?  That's what the correlation
>> stat is for, no?
>
>
> I don't think correlation is up to the task as a complete solution, although
> it might help a little.  There is no way a simple correlation can encode
> that John retired 15 years ago and hasn't logged on since, while Johannes
> was hired yesterday and never logged on before then.

Ah, OK, essentially the same example.

Which is why I ruled out correlation-stats-based approaches and
suggested a risk-weighted cost approach.
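
For anyone wanting to look at what the planner has today: the
correlation stat is a single scalar per column, visible in pg_stats:

    -- correlation measures how well a column's values track the
    -- physical row order: near +1/-1 means nearly sorted, near 0
    -- means scattered. One number per column, so it cannot describe
    -- per-key skew like the John/Johannes case above.
    SELECT tablename, attname, n_distinct, correlation
    FROM pg_stats
    WHERE tablename = 'posts';  -- hypothetical table from the example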

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



