Geoff Winkless <pgsqladmin@xxxxxxxx> writes: > On 7 March 2016 at 14:51, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: >> Because the other way is estimated to be cheaper. The estimate is >> wrong, because it's based on a statistical assumption that's wrong >> (ie that sc_id and scdate are uncorrelated), but it's what we have >> to work with at the moment. > Are you saying that the planner can't tell without scanning the index > how much of the index the range constraint will retrieve? The question isn't "how much", the question is "where is that data exactly?". In English, what that plan is trying to do is scan the index in sc_id order until it hits a row with scdate in the target range. The first such row, by definition, has the correct min(sc_id) value. The problem is that we're guessing at how soon we'll hit such a row. If the columns are independent, then the planner can guess based on how many rows in the whole table have scdate in the target range, and it will probably be about right. But that estimate can fall down very badly if sc_id and scdate increase together, because then the target rows aren't randomly distributed in the index sequence but could all be all the way at the far end of the index. > I accept that this is how the planner behaves, but I don't accept that > it's optimal. If we had cross-column correlation stats we could detect this pitfall, but without that it's hard to do. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general