On Fri, Dec 5, 2014 at 12:46 AM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
> On 30 September 2014 at 05:53, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>> On 29 September 2014 16:00, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
>>> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>>>> The problem, as I see it, is different. We assume that if there are
>>>> 100 distinct values and you use LIMIT 1 that you would only need to
>>>> scan 1% of rows. We assume that the data is arranged in the table in
>>>> a very homogeneous layout. When the data is not, and it seldom is, we
>>>> get problems.
>>>
>>> Hm, good point -- 'data proximity'. At least in theory, can't this be
>>> measured and quantified? For example, given a number of distinct
>>> values, you could estimate the % of pages read (or maybe the number of
>>> non-sequential seeks relative to the number of pages) needed to read
>>> all instances of a particular value in the average (or perhaps the
>>> worst) case. One way of calculating that would be to look at the
>>> proximity of values in sampled pages (perhaps with a penalty assigned
>>> for high update activity relative to table size). Data proximity would
>>> then become a cost coefficient applied to the benefit of the LIMIT.
>>
>> The necessary first step is to realise that we can't simply apply the
>> LIMIT as a reduction in query cost in all cases.
>>
>> The way I'm seeing it, you can't assume the LIMIT will apply to any
>> IndexScan that doesn't have an index condition. If it has just a
>> filter, or nothing at all but an ordering, then it could easily scan
>> the whole index if the stats are wrong.
>>
>> So plans like these could be wrongly costed, by assuming the scan will
>> end earlier because of the LIMIT than it actually does:
>>
>> Limit
>>   IndexScan (no index cond)
>>
>> Limit
>>   NestJoin
>>     IndexScan (no index cond)
>>     SomeScan
>>
>> Limit
>>   NestJoin
>>     NestJoin
>>       IndexScan (no index cond)
>>       SomeScan
>>     SomeScan
>>
>> and deeper...
>>
>> I'm looking for a way to identify and exclude such plans, assuming
>> that this captures at least some of the problem plans.
>
> After looking at this for some time, I now have a patch that solves it.
>
> It relies on the observation that index scans with no bounded quals
> don't play nicely with LIMIT. The solution relies on the fact that
> LIMIT does not reduce the startup cost of a plan, only its total cost.
> So we can solve the problem by keeping the total cost estimate but
> moving some of it into startup cost, so that LIMIT does not reduce
> costs as much as before.
>
> It's a simple patch, but it solves the test cases I know about and
> adds almost nothing to planning time.
>
> I tried much less subtle approaches involving direct prevention of
> LIMIT pushdown, but the code was far too complex for my liking.

Neat -- got any test cases (would this have prevented the OP's problem)?

merlin
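
A minimal sketch of the kind of test case under discussion -- the table, column names, and row counts here are invented for illustration, with the matching rows deliberately clustered at the far end of the index order:

CREATE TABLE t AS
SELECT g AS id,
       CASE WHEN g > 990000 THEN 1 ELSE 2 END AS other_id
FROM generate_series(1, 1000000) g;

CREATE INDEX t_id_idx ON t (id);
ANALYZE t;

-- other_id = 1 is ~1% of the table, so the planner assumes it is spread
-- evenly, expects an index scan in id order to hit the first match after
-- roughly 100 rows, and costs the LIMIT accordingly; in reality the scan
-- filters out 990000 non-matching rows before returning anything.
EXPLAIN ANALYZE
SELECT * FROM t WHERE other_id = 1 ORDER BY id LIMIT 1;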
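
On the 'data proximity' idea: the statistics PostgreSQL already gathers record how well a column's values track the physical row order (pg_stats.correlation), which feeds index scan costing -- but they say nothing about how tightly the duplicates of any one value are packed together, which is the proximity that matters under a LIMIT. For the table above:

SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 't';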
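
And a rough illustration, with entirely made-up numbers, of why shifting estimated cost into startup cost blunts the LIMIT discount: the planner scales a plan's run cost (total minus startup) by the fraction of rows the LIMIT will fetch, so with a near-zero startup cost a LIMIT 1 makes almost the whole scan cost vanish, while a plan carrying half its estimate as startup cost keeps a floor under the limited cost:

-- Hypothetical estimates: startup 0.43, total 34000, subplan expected to
-- return 10000 rows, LIMIT 1 => fraction 1/10000.
SELECT 0.43  + (34000 - 0.43)  * (1 / 10000.0) AS limited_cost_now,      -- ~3.8
       17000 + (34000 - 17000) * (1 / 10000.0) AS limited_cost_patched;  -- ~17001.7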