"Peter J. Holzer" <hjp-pgsql@xxxxxx> writes: > we'll consider just three columns, which we unimaginatively call a, b, > and c. There are also three indexes: > t_a_idx btree (a) WHERE a IS NOT NULL > t_b_idx btree (b) WHERE b IS NOT NULL > t_a_b_idx btree (a, b) WHERE a IS NOT NULL AND b IS NOT NULL > Nowe I have a query > select c from t where a='A' and b='B'; > This uses t_b_idx, not - as I expected - t_a_b_idx. > The distribution of values in columns a and b is quite different: a has > 10 different values of similar frequency (and no null values). b has > only a single non-null value which with a frequency of about 1 %. > So I definitely understand why it would prefer t_b_idx to t_a_idx, but > certainly t_a_b_idx should be even better? Not necessarily --- t_a_b_idx is (presumably) physically bigger than t_b_idx, which makes it more expensive to search. The additional selectivity gain apparently doesn't outweigh that. > If I create an index with the columns swapped: > t_b_a_idx btree (b, a) WHERE b IS NOT NULL and a IS NOT NULL > this index will be used. Hmm. Probably that has something to do with a calculation about the selectivity of the leading index column, ie do you have to scan 10% of the index or 1% of the index. It's not taking the partial-index filter into account in that, I suspect, which skews the results in this case --- but that would be hard to account for accurately. Anyway I can't get excited about optimizing for a single non-null value. regards, tom lane