On 23 September 2014 00:56, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
> We've hashed that out a bit, but frankly I think it's much more
> profitable to pursue fixing the actual problem than providing a
> workaround like "risk", such as:
>
> a) fixing n_distinct estimation
> b) estimating stacked quals using better math (i.e. not assuming total
>    randomness)
> c) developing some kind of correlation stats
>
> Otherwise we would be just providing users with another knob there's no
> rational way to set.

I believe this is a serious issue for PostgreSQL users and one that needs to be addressed.

n_distinct can be fixed manually, so that is less of an issue.

The problem, as I see it, is different. We assume that if there are 100 distinct values and you use LIMIT 1, you would only need to scan 1% of the rows. We assume that the data is arranged in the table in a very homogeneous layout. When the data is not, and it seldom is, we get problems (see the sketch at the end of this mail).

Simply put, assuming that LIMIT will reduce the size of all scans is just way wrong. I've seen many plans where increasing the LIMIT dramatically improves the plan.

If we can at least agree that it is a problem, we can try to move forwards.

--
Simon Riggs                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
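
To make the layout issue concrete, here is a minimal sketch (the table, column, and data are hypothetical, chosen only to illustrate the point):

-- 1,000,000 rows, 100 distinct values of val, but stored in
-- contiguous blocks rather than evenly interleaved.
CREATE TABLE t (id serial PRIMARY KEY, val integer);
INSERT INTO t (val)
SELECT i / 10000 FROM generate_series(0, 999999) AS i;
ANALYZE t;

-- The planner assumes the val = 99 rows are spread evenly through the
-- table, so the ordered index scan with LIMIT 1 is costed as if it
-- could stop almost immediately.  In this layout the first matching
-- row is roughly 990,000 rows in, so the "cheap" plan reads nearly
-- the whole index before returning anything.
EXPLAIN ANALYZE
SELECT * FROM t WHERE val = 99 ORDER BY id LIMIT 1;

With a larger LIMIT the early-termination discount shrinks, the filtered index scan on id stops looking artificially cheap, and the planner tends to switch to a plan that does not depend on the matching rows being evenly spread.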