Re: Yet another abort-early plan disaster on 9.3

On 09/26/2014 01:06 AM, Simon Riggs wrote:
> On 23 September 2014 00:56, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
> 
>> We've hashed that out a bit, but frankly I think it's much more
>> profitable to pursue fixing the actual problem than providing a
>> workaround like "risk", such as:
>>
>> a) fixing n_distinct estimation
>> b) estimating stacked quals using better math (i.e. not assuming total
>> randomness)
>> c) developing some kind of correlation stats
>>
>> Otherwise we would be just providing users with another knob there's no
>> rational way to set.
> 
> I believe this is a serious issue for PostgreSQL users and one that
> needs to be addressed.
> 
> n_distinct can be fixed manually, so that is less of an issue.

It's an issue for the 99.8% of our users who don't know what n_distinct
is, let alone how to calculate it.  Also, changing it requires an
exclusive lock on the table. Of course, you and I have been over this
issue before.
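
For anyone who wants to check what the planner currently believes, the
estimate is visible in the standard pg_stats view, no lock needed
(audit_log below is just an example table name):

SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'audit_log';

-- n_distinct > 0 is an absolute count of distinct values;
-- n_distinct < 0 is -(fraction of rows), e.g. -0.2 means ~20% of rows.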

One thing I'm wondering is why our estimator so seldom creates
n_distinct as a %.  Really, any time n_distinct is over 10K we should be
estimating a % instead.  Now, estimating that % has its own issues, but
the current behavior does seem like a peculiar quirk of our stats model.

Anyway, in the particular case I posted, setting n_distinct to a
realistic number (as a %) fixed the query plan.
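
For reference, the manual override looks like this; the column name and
fraction here are hypothetical, negative values are the % form, and as
noted above the ALTER takes an exclusive lock on 9.3:

ALTER TABLE audit_log ALTER COLUMN account_id
  SET (n_distinct = -0.05);  -- ~5% of rows are distinct
ANALYZE audit_log;           -- new value takes effect at next ANALYZE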

> 
> The problem, as I see it, is different. We assume that if there are
> 100 distinct values and you use LIMIT 1 that you would only need to
> scan 1% of rows. We assume that the data is arranged in the table in a
> very homogeneous layout. When data is not, and it seldom is, we get
> problems.
> 
> Simply put, assuming that LIMIT will reduce the size of all scans is
> just way wrong. I've seen many plans where increasing the LIMIT
> dramatically improves the plan.
> 
> If we can at least agree it is a problem, we can try to move forwards.
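
To spell out that arithmetic with a hypothetical table t: 100 distinct
values means a qual like val = 42 is assumed to match 1% of rows, so for
LIMIT 1 the planner expects to stop after reading about 100 rows:

-- expected rows read ~ LIMIT / selectivity = 1 / 0.01 = 100
SELECT * FROM t WHERE val = 42 LIMIT 1;

If the matching rows are actually clustered at the far end of the scan,
it reads nearly the whole table before the first hit.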

That is certainly another problem.  Does the correlation stat figure
into the LIMIT calculation at all, currently?  That's what the
correlation stat is for, no?

To be fair, physical correlation of rows can also make abort-early
plans extra fast, if everything we want is towards the beginning of the
index.  Which means we'd need working multi-column correlation stats,
and that's a known hard problem.
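
For reference, the single-column physical-order stat we do have is
pg_stats.correlation (audit_log is the example table again):

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'audit_log';

-- near +1 or -1: column is close to physical order; near 0: random.
-- There's no cross-column equivalent, which is the hard part above.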

For example, consider the query:

SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id LIMIT 10;

In an append-only table, that query is liable to be very fast with an
abort-early plan scanning the ID index (AEP from now on), since the
oldest rows are likely to correspond to the smallest IDs.  But then the
user does this:

SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id DESC LIMIT 10;

... and a completely different plan is called for, because using an AEP
will mean reverse-scanning most of the index.  However, I'm not sure the
planner knows the difference, since it's only comparing the estimated
selectivity of (updated_on < '2010-01-01') and seeing that it's 20% of
rows.  I bet you'd get an AEP in the second case too.
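
That's easy enough to check on a test table with plain EXPLAIN:

EXPLAIN
SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id DESC LIMIT 10;

If the planner really is blind to the direction, both versions should
pick the same index-scan-under-LIMIT shape, the second one just running
the index scan backward.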

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



