On 29/09/2014 9:00 AM, Merlin Moncure wrote:
> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>> The problem, as I see it, is different. We assume that if there are
>> 100 distinct values and you use LIMIT 1 that you would only need to
>> scan 1% of rows. We assume that the data is arranged in the table in a
>> very homogeneous layout. When data is not, and it seldom is, we get
>> problems.
> Hm, good point -- 'data proximity'. At least in theory, can't this be
> measured and quantified? For example, given a number of distinct
> values, you could estimate the % of pages (or maybe the non-sequential
> seeks relative to the number of pages) you'd need to read to reach
> all instances of a particular value in the average (or perhaps the
> worst) case. One way of trying to calculate that would be to look at
> the proximity of values in sampled pages (and maybe assign a penalty for
> high update activity relative to table size). Data proximity would
> then become a cost coefficient applied to the benefits of LIMIT.
Latecomer to the conversation here, but it seems like this issue (unlike
some) is really easy to recognize at runtime. The optimizer assumed the
scan would access O(1) pages; if the scan has not returned enough
results after k pages, that would be a really good indication that it's
time to rethink the plan, and probably before too much work has been
done higher in the plan (esp. if there's any kind of buffering between
operators, perhaps intentionally so in special cases like this).
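
To sketch the shape of the check I have in mind (purely illustrative
Python, nothing like the actual executor; the plan callables, the
expected_pages estimate, and the slack threshold are all invented for
the example):

def limit_scan_with_bailout(optimistic_scan, fallback_scan, limit,
                            expected_pages, slack=10):
    """optimistic_scan / fallback_scan: callables yielding (page_no, row).
    expected_pages: how many pages the planner thought the scan would touch.
    slack: made-up multiplier before we give up on the optimistic plan."""
    results = []
    pages_seen = set()
    bailed_out = False
    for page_no, row in optimistic_scan():
        pages_seen.add(page_no)
        results.append(row)
        if len(results) >= limit:
            return results              # estimate held up; done
        if len(pages_seen) > expected_pages * slack:
            bailed_out = True           # estimate badly off; rethink the plan
            break
    if not bailed_out:
        return results                  # scan finished; table had < limit matches
    # "Re-plan": throw away the partial work and run the safer fallback
    # (e.g. a full scan feeding a top-N sort) to completion.
    results = []
    for _, row in fallback_scan():
        results.append(row)
        if len(results) >= limit:
            break
    return results

A real implementation would presumably want to keep the rows already
produced rather than starting over, which is exactly the "work done
higher in the plan" issue above.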
Not sure pgsql has any dynamic reoptimization infrastructure in place,
tho. If not, these sorts of dangerous plans are best left alone IMO.
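
Coming back to the proximity idea quoted above: as a rough illustration
only (Python, not planner code; the block-sample format and the
rows_per_page figure are invented for the example), a per-value
"clumping" coefficient could be estimated from sampled pages by
comparing how many distinct pages a value actually appears on with the
fewest pages that could possibly hold its occurrences. It's in the same
spirit as the existing per-column correlation statistic in pg_stats,
but per value.

from collections import defaultdict

def proximity_coefficients(sample, rows_per_page):
    """sample: iterable of (page_no, value) pairs from a block sample.
    Returns, per value, min-possible-pages / actual-pages-seen:
    1.0 = perfectly clustered, approaching 0 as the value is scattered."""
    pages_by_value = defaultdict(set)
    count_by_value = defaultdict(int)
    for page_no, value in sample:
        pages_by_value[value].add(page_no)
        count_by_value[value] += 1
    coeffs = {}
    for value, count in count_by_value.items():
        min_pages = max(1, -(-count // rows_per_page))   # ceiling division
        coeffs[value] = min_pages / len(pages_by_value[value])
    return coeffs

# 'a' is packed onto one page, 'b' is spread across four:
sample = [(1, 'a'), (1, 'a'), (1, 'a'), (1, 'a'),
          (2, 'b'), (3, 'b'), (4, 'b'), (5, 'b')]
print(proximity_coefficients(sample, rows_per_page=4))   # {'a': 1.0, 'b': 0.25}

A low coefficient for the values you expect to filter on would then
argue against the optimistic "scan until LIMIT is satisfied" plan.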
Ryan