Alvaro Herrera <alvherre@xxxxxxxxxxxxxxxxx> writes:
> Tom Lane wrote:
>> Has anyone got any thoughts about the best way to do this?

> How about doing both: sort the index by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step of
> course).

The "restart" part of that bothers me --- it's not entirely clear that
you couldn't get into an infinite loop.  (Imagine that A+B is better
than A alone, so we adopt it, but worse than C alone, so we take C as
the new leader and start over.  Then perhaps C+B is better than C alone
but worse than A alone, so we take A as the leader and start over.
Maybe this is impossible, but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and
noticed that for his 38 indexes there were only three distinct index
selectivity values, and the sort step grouped indexes by cost within
those groups.  In hindsight of course this is obvious: the selectivity
depends on the set of WHERE-clauses used, so with two WHERE-clauses
there are three possible selectivities (to within roundoff error,
anyway) depending on whether the index uses one or both clauses.  So
the existing algorithm gets one thing right: for any two indexes that
make use of just the same set of WHERE-clauses, it will always take the
one with cheaper scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of
WHERE-conditions (and partial index conditions, if any) they use.
Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also
consider adding it on to the AND-group led by each preceding index,
using the same logic as now: reject using any WHERE-condition twice
in a group, and then add on only if the total cost of the AND-group
scan is reduced.

This would be approximately O(N^2) in the number of relevant indexes,
whereas the current scheme is roughly linear (handwaving a bit here
because the number of WHERE-clauses is a factor too).  But that seems
unlikely to be a problem for sane numbers of indexes, unlike the
O(2^N) behavior of an exhaustive search.

It would get rid of (most of) the order-sensitivity problem, since we
would definitely consider the AND-combination of every pair of
combinable indexes.  I can imagine cases where it wouldn't notice
useful three-or-more-way combinations because the preceding two-way
combination didn't win, but they seem pretty remote possibilities.

Comments?

			regards, tom lane
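
For concreteness, a rough sketch of the proposed procedure might look
like the following (Python-style pseudocode; the names, data structures,
and toy cost model are invented purely for illustration and are not the
planner's actual code or cost functions):

from dataclasses import dataclass
from math import prod

@dataclass
class IndexCandidate:
    name: str
    clauses: frozenset        # WHERE-conditions (incl. partial-index conditions) it uses
    scan_cost: float          # cost of scanning this index by itself
    selectivity: float        # fraction of heap rows its conditions let through

HEAP_COST = 1000.0            # toy stand-in for the cost of visiting the matching heap rows

def and_group_cost(members):
    # Toy cost model: pay for each index scan, then visit the heap rows
    # that survive ANDing all of the group's conditions together.
    return (sum(m.scan_cost for m in members)
            + HEAP_COST * prod(m.selectivity for m in members))

@dataclass
class AndGroup:
    members: list
    used: frozenset
    cost: float

def choose_and_path(candidates):
    # Step 1: group by the exact subset of clauses used; keep only the
    # cheapest-scan-cost index within each group.
    cheapest = {}
    for idx in candidates:
        best = cheapest.get(idx.clauses)
        if best is None or idx.scan_cost < best.scan_cost:
            cheapest[idx.clauses] = idx

    # Step 2: sort the survivors by scan cost.
    survivors = sorted(cheapest.values(), key=lambda i: i.scan_cost)

    # Step 3: each survivor leads an AND-group, starting as a standalone scan;
    # every later index is offered to every earlier group and accepted only
    # if it doesn't re-use a WHERE-condition and it lowers the group's cost.
    groups = []
    for idx in survivors:
        for g in groups:
            if idx.clauses & g.used:
                continue                      # same condition used twice: reject
            trial = g.members + [idx]
            trial_cost = and_group_cost(trial)
            if trial_cost < g.cost:
                g.members, g.used, g.cost = trial, g.used | idx.clauses, trial_cost
        groups.append(AndGroup([idx], idx.clauses, and_group_cost([idx])))

    return min(groups, key=lambda g: g.cost)  # cheapest resulting path

The nested loop in step 3 is where the O(N^2) behavior comes from: every
surviving index is tested against the group led by every cheaper index,
so every pair of combinable indexes gets considered exactly once.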