I realize that using OR will not result in an index scan. I will never be interested in a OR condition for the kinds of searches I use. In my Select statements, I always name every column of the multi-column index in same order that they were named when creating the index. I always use the >= condition, and very rarely, the = condition.
All the leftmost index column must be named, but the order is unimportant. You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y), it is cleaner.
However, I am concerned that I must place
the most selective column first in my index. I cannot tell,
a priori, which column will be most selective. That depends on the
nature of search, which can vary widely each time.
Are you saying that if my first column is not selective, even though the remaining
columns are, the planner may choose not to use the index after
seeing that the first column is not very selective?
I thought this was true but made some tests and the index scanner is smart.
Try this :
CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER, r INTEGER, t INTEGER, y INTEGER ) WITHOUT OIDS;
INSERT 1M rows into table using a plpgsql function, with a,z,e,r,t,y being floor(random()*10) for instance.
Then you can try various selects. a,z,e,r,t,y are a linear distribution between 0 and 9 included, so :
a>=A AND z>=Z ... y>=Y gives a result set of about (10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index scan when needed. You can try the easiest case (a>=9) which just explores one part of the tree, and the worst case which explores a part of all leafs (y>=9). Both should yield about the same number of results, but the first should be faster. To know how much, just try ;)
That seems like an oversight, IMHO. Shouldn't the overall effect of using all the columns be considered before choosing not to use an index scan?
I think it is. There are no cross column correlation stats though.
Since I'm using every column of my multi-column index for every search, and I always use >=, Explain Analyze always shows that every column is considered in the index scan. However, that is only when the index scan is used. Sometimes, Explain Analyze shows it is not used. That appears to happen when my search condition is very general. This it to be expected, so I am not worried. Most of my searches will be intermediate, namely not VERY selective, but also not VERY general. So the idea of the multi-column index is to "characterize" each row sufficiently, even when it is a perfectly ordinary row with no ONE feature being distinctive, but rather several features together giving it it's distinctive character. That is my interpretation of the multi-column index.
If you have some features which are highly selective, you can create a single column index on them. It won't be used often, but when it will, it will really work.
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your message can get through to the mailing list cleanly