On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
> The problem, as I see it, is different. We assume that if there are
> 100 distinct values and you use LIMIT 1 that you would only need to
> scan 1% of rows. We assume that the data is arranged in the table in a
> very homogeneous layout. When data is not, and it seldom is, we get
> problems.

Hm, good point -- 'data proximity'. At least in theory, can't this be
measured and quantified? For example, given the number of distinct
values, you could estimate the % of pages (or maybe the number of
non-sequential seeks relative to the number of pages) you'd need to
read to reach all instances of a particular value in the average (or
perhaps the worst) case. One way of calculating that would be to look
at the proximity of values within sampled pages (maybe with a penalty
for high update activity relative to table size). Data proximity would
then become a cost coefficient applied to the benefits of LIMIT.
(Rough sketch of what I mean below.)

merlin
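
A minimal sketch of the idea, in Python rather than anything resembling
the planner's C code -- the sample format, the function names, and the
"about half a table scan" figure for the fully clustered case are all
assumptions made up here purely for illustration:

import math


def proximity_from_sample(page_hits, rows_per_page):
    """Rough clustering coefficient in (0, 1] for one candidate value.

    page_hits maps a sampled page number to the count of matching tuples
    found on that page. 1.0 means the matches are packed into as few pages
    as they could possibly occupy; values near 0 mean almost every matching
    tuple sits on its own page (the homogeneous layout the planner assumes).
    """
    total_matches = sum(page_hits.values())
    if total_matches == 0:
        return 0.0
    fewest_possible_pages = max(1, math.ceil(total_matches / rows_per_page))
    return fewest_possible_pages / len(page_hits)


def expected_rows_scanned(total_rows, selectivity, limit, proximity):
    """Rows a sequential scan would read before finding `limit` matches,
    interpolating between the two layout extremes by `proximity`."""
    # Today's assumption: matches are spread evenly, one every 1/selectivity
    # rows, so the scan can stop after about limit/selectivity rows.
    homogeneous = min(total_rows, limit / selectivity)
    # Fully clustered: all matches sit in one clump whose position is
    # unknown, so on average we read about half the table before hitting it.
    clustered = min(total_rows, total_rows / 2 + limit)
    return (1 - proximity) * homogeneous + proximity * clustered


# Toy numbers: 1,000,000 rows, 100 distinct values (selectivity 0.01), LIMIT 1.
# A value spread thinly over the sampled pages vs. one packed into a clump:
scattered = proximity_from_sample({p: 1 for p in range(300)}, rows_per_page=100)
clumped = proximity_from_sample({17: 100, 18: 100, 19: 95}, rows_per_page=100)
print(f"scattered sample -> proximity {scattered:.2f}")
print(f"clumped sample   -> proximity {clumped:.2f}")
for prox in (0.0, 1.0):
    rows = expected_rows_scanned(1_000_000, 0.01, 1, prox)
    print(f"proximity {prox:.1f}: ~{rows:,.0f} rows scanned for LIMIT 1")

With those toy numbers the homogeneous case matches today's "scan ~100
rows" estimate, while a fully clustered value costs roughly half a
table scan on average, which is the kind of penalty the coefficient
would feed back into the LIMIT costing.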