Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 14 Apr 2007 13:19:33 -0400

Alvaro Herrera <alvherre@xxxxxxxxxxxxxxxxx> writes:
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B) for example.

I understand that issue a bit better than I did when I wrote the comment
(so I suppose I better rewrite it).  The $64 reason for rejecting
AND-combinations of indexes that are using some of the same
WHERE-conditions is that if we don't, we effectively double-count the
selectivity of those conditions, causing us to prefer useless
AND-combinations.  An example is "WHERE A > 42 AND B < 100" where we
have an index on A and one on (A,B).  The selectivity calculation
will blindly assume that the selectivities of the two indexes are
independent and thus prefer to BitmapAnd them, when obviously there
is no point in using both.  Ideally we should improve the selectivity
calculation to not get fooled like this, but that seems hard and
probably slow.  So for the moment we have the heuristic that no
WHERE-clause should be used twice in any AND-combination.

Given that we are using that heuristic, it becomes important that
we visit the indexes in the "right order" --- as the code stands,
in the (A) vs (A,B) case it will consider only the first index it
arrives at in the selectivity sort order, because the second will
be rejected on the basis of re-use of the WHERE A > 42 condition.
Sorting by selectivity tends to ensure that we pick the index that
can make use of as many WHERE-clauses as possible.

The idea of considering each index alone fixes the order dependency
for cases where a single index is the best answer, but it doesn't
help much for cases where you really do want a BitmapAnd, only not
one using the index with the individually best selectivity.

We really need a heuristic here --- exhaustive search will be
impractical in cases where there are many indexes, because you'd
be looking at 2^N combinations.  (In Steve's example there are
actually 38 relevant indexes, which is bad database design if
you ask me, but in any case we cannot afford to search through
2^38 possibilities.)  But the one we're using now is too fragile.

Maybe we should use a cutoff similar to the GEQO one: do exhaustive
search if there are less than N relevant indexes, for some N.
But that's not going to help Steve; we still need a smarter heuristic
for what to look for above the cutoff.

			regards, tom lane