On Wed, Mar 16, 2011 at 1:38 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> That isn't going to dissuade the planner from using that index for
> this query. It would result in the scan being a forward indexscan
> instead of backwards. Now it'd be worth trying that, to see if you
> and Kevin are right that it's the backwards aspect that's hurting.
> I'm not convinced though. I suspect the issue is that the planner is
> expecting the target records (the ones selected by the filter
> condition) to be approximately equally distributed in the month
> ordering, but really there is a correlation which causes them to be
> much much further back in the index than it expects. So a lot more
> of the index has to be scanned than it's expecting.

This has got to be one of the five most frequently reported planner
problems, and it's nearly always with a *backward* index scan. So I
agree with Kevin that we probably ought to have a TODO item to make
backward index scans look more expensive than forward index scans,
maybe tied in some way to the correlation estimates for the relevant
columns.

But I don't really think that's the root of the problem. When
confronted with this type of query, the planner can either
filter-then-sort, or index-scan-in-desired-order-then-filter. I think
the heart of the problem is that we're able to estimate the cost of
the first plan much more accurately than the cost of the second one.
In many cases the filter is done using a sequential scan, which is
easy to cost, and even if it's done using a bitmap index scan, the
cost of that is also pretty simple to estimate, as long as our
selectivity estimate is somewhere in the ballpark. The cost of the
sort depends primarily on how many rows we need to sort, and if the
qual is something like an equality condition, as in this case, then
we'll know that pretty accurately as well. So we're good.

On the other hand, when we use an index scan to get the rows in the
desired order and then apply the filter condition to them, the cost of
the index scan is heavily dependent on how far we have to scan through
the index, and that depends on the distribution of values in the qual
column relative to the distribution of values in the index column. We
have no data that allow us to estimate that, so we are basically
shooting in the dark.

This is a multi-column statistics problem, but I think it's actually
harder than what we usually mean by multi-column statistics, where we
only need to estimate selectivity. A system that could perfectly
estimate the selectivity of state = $1 AND zipcode = $2 might still be
unable to tell us much about how many zipcodes we'd have to read in
ascending order to find a given number in some particular state.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
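
A minimal sketch of the forward-versus-backward test Tom suggests
above; the table, column, and index names here are hypothetical, not
taken from the original report:

    -- An ordinary ascending index satisfies ORDER BY month DESC with
    -- a backward index scan:
    CREATE INDEX some_table_month_idx ON some_table (month);

    -- A descending index (available since PostgreSQL 8.3) satisfies
    -- the same ORDER BY with a forward scan:
    CREATE INDEX some_table_month_desc_idx ON some_table (month DESC);

    -- Running EXPLAIN ANALYZE on the query with one index present and
    -- then the other isolates the cost of the backward scan from
    -- everything else:
    SELECT *
      FROM some_table
     WHERE filter_col = 42
     ORDER BY month DESC
     LIMIT 10;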
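
The state/zipcode example in the last paragraph makes the costing
asymmetry concrete; the table and index names below are hypothetical:

    CREATE TABLE addresses (state text, zipcode text);
    CREATE INDEX addresses_zipcode_idx ON addresses (zipcode);

    -- Plan 1, filter-then-sort: scan the table, keep rows with
    -- state = 'AK', sort the survivors by zipcode. Costing this needs
    -- only a selectivity estimate for state = 'AK'.
    --
    -- Plan 2, index-scan-then-filter: walk addresses_zipcode_idx in
    -- ascending order, discard rows whose state is not 'AK', and stop
    -- after 100 matches. Costing this requires knowing where the 'AK'
    -- rows fall in zipcode order. Alaska zipcodes all begin with 99,
    -- so they sit at the very end of the index, and plan 2 must read
    -- nearly the whole index before producing its first row.
    SELECT *
      FROM addresses
     WHERE state = 'AK'
     ORDER BY zipcode
     LIMIT 100;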