On 23 September 2014 00:56, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
> We've hashed that out a bit, but frankly I think it's much more
> profitable to pursue fixing the actual problem than providing a
> workaround like "risk", such as:
>
> a) fixing n_distinct estimation
> b) estimating stacked quals using better math (i.e. not assuming total
>    randomness)
> c) developing some kind of correlation stats
>
> Otherwise we would be just providing users with another knob there's no
> rational way to set.

I believe this is a serious issue for PostgreSQL users and one that needs to be addressed.

n_distinct can be fixed manually, so that is less of an issue.

The problem, as I see it, is different. We assume that if there are 100 distinct values and you use LIMIT 1, you would only need to scan 1% of the rows. We assume that the data is arranged in the table in a very homogeneous layout. When the data is not, and it seldom is, we get problems (see the sketch at the end of this mail).

Simply put, assuming that LIMIT will reduce the size of all scans is just way wrong. I've seen many plans where increasing the LIMIT dramatically improves the plan.

If we can at least agree that it is a problem, we can try to move forwards.

--
Simon Riggs                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
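
To make the layout issue concrete, here is a minimal sketch (the table, column, and data are hypothetical, chosen only to illustrate the point):

-- 1,000,000 rows, 100 distinct values of val, but stored in
-- contiguous blocks rather than evenly interleaved.
CREATE TABLE t (id serial PRIMARY KEY, val integer);
INSERT INTO t (val)
SELECT i / 10000 FROM generate_series(0, 999999) AS i;
ANALYZE t;

-- The planner assumes the val = 99 rows are spread evenly through the
-- table, so the ordered index scan with LIMIT 1 is costed as if it
-- could stop almost immediately.  In this layout the first matching
-- row is roughly 990,000 rows in, so the "cheap" plan reads nearly
-- the whole index before returning anything.
EXPLAIN ANALYZE
SELECT * FROM t WHERE val = 99 ORDER BY id LIMIT 1;

With a larger LIMIT the early-termination discount shrinks, the filtered index scan on id stops looking artificially cheap, and the planner tends to switch to a plan that does not depend on the matching rows being evenly spread.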