Re: index problems (again)

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 07 Mar 2016 11:02:19 -0500

Geoff Winkless <pgsqladmin@xxxxxxxx> writes:
> On 7 March 2016 at 14:51, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Because the other way is estimated to be cheaper.  The estimate is
>> wrong, because it's based on a statistical assumption that's wrong
>> (ie that sc_id and scdate are uncorrelated), but it's what we have
>> to work with at the moment.

> Are you saying that the planner can't tell without scanning the index
> how much of the index the range constraint will retrieve?

The question isn't "how much", the question is "where is that data
exactly?".  In English, what that plan is trying to do is scan the index
in sc_id order until it hits a row with scdate in the target range.
The first such row, by definition, has the correct min(sc_id) value.
The problem is that we're guessing at how soon we'll hit such a row.
If the columns are independent, then the planner can guess based on how
many rows in the whole table have scdate in the target range, and it
will probably be about right.  But that estimate can fall down very
badly if sc_id and scdate increase together, because then the target
rows aren't randomly distributed in the index sequence but could all be
all the way at the far end of the index.

> I accept that this is how the planner behaves, but I don't accept that
> it's optimal.

If we had cross-column correlation stats we could detect this pitfall,
but without that it's hard to do.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general