Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 14 Apr 2007 14:21:26 -0400

Alvaro Herrera <alvherre@xxxxxxxxxxxxxxxxx> writes:
> Tom Lane wrote:
>> Has anyone got any thoughts about the best way to do this?

> How about doing both: sort the index by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step of
> course).

The "restart" part of that bothers me --- it's not entirely clear that
you couldn't get into an infinite loop.  (Imagine that A+B is better
than A alone, so we adopt it, but worse than C alone, so we take C as
the new leader and start over.  Then perhaps C+B is better than C alone
but worse than A alone, so we take A as the leader and start over.
Maybe this is impossible but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and
noticed that for his 38 indexes there were only three distinct index
selectivity values, and the sort step grouped indexes by cost within
those groups.  In hindsight of course this is obvious: the selectivity
depends on the set of WHERE-clauses used, so with two WHERE-clauses
there are three possible selectivities (to within roundoff error anyway)
depending on whether the index uses one or both clauses.  So the
existing algorithm gets one thing right: for any two indexes that make
use of just the same set of WHERE-clauses, it will always take the one
with cheaper scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of
WHERE-conditions (and partial index conditions, if any) they use.
Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also
consider adding it on to the AND-group led by each preceding index,
using the same logic as now: reject using any WHERE-condition twice
in a group, and then add on only if the total cost of the AND-group
scan is reduced.

This would be approximately O(N^2) in the number of relevant indexes,
whereas the current scheme is roughly linear (handwaving a bit here
because the number of WHERE-clauses is a factor too).  But that seems
unlikely to be a problem for sane numbers of indexes, unlike the O(2^N)
behavior of an exhaustive search.  It would get rid of (most of) the
order-sensitivity problem, since we would definitely consider the
AND-combination of every pair of combinable indexes.  I can imagine
cases where it wouldn't notice useful three-or-more-way combinations
because the preceding two-way combination didn't win, but they seem
pretty remote possibilities.

Comments?

			regards, tom lane