Re: visualizing B-tree index coverage

PFC <lists@xxxxxxxxxxxxxxxxxxxxx> · Thu, 27 Jan 2005 17:40:25 +0100

I realize that using OR will not result in an index scan.
I will never be interested in a OR condition for the kinds
of searches I use.  In my Select statements, I always name
every column of the multi-column index in same order that
they were named when creating the index.  I always use
the >= condition, and very rarely, the = condition.

	All the leftmost index column must be named, but the order is unimportant.
	You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y), it is cleaner.

However, I am concerned that I must place

the most selective column first in my index.  I cannot tell,

a priori, which column will be most selective.  That depends on the

nature of search, which can vary widely each time.

Are you saying that if my first column is not selective, even though the  
remaining

columns are, the planner may choose not to use the index after

seeing that the first column is not very selective?

	I thought this was true but made some tests and the index scanner is  
smart.

	Try this :

CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER,  
r INTEGER, t INTEGER, y INTEGER ) WITHOUT OIDS;

INSERT 1M rows into table using a plpgsql function, with a,z,e,r,t,y being  
floor(random()*10) for instance.

	Then you can try various selects. a,z,e,r,t,y are a linear distribution  
between 0 and 9 included, so :

a>=A AND z>=Z ... y>=Y gives a result set of about  
(10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index  
scan when needed. You can try the easiest case (a>=9) which just explores  
one part of the tree, and the worst case which explores a part of all  
leafs (y>=9). Both should yield about the same number of results, but the  
first should be faster. To know how much, just try ;)

That seems like an oversight, IMHO.  Shouldn't the overall effect of
using all the columns be considered before choosing not to use an
index scan?

	I think it is. There are no cross column correlation stats though.

Since I'm using every column of my multi-column index for every search,
and I always use >=, Explain Analyze always shows that every column
is considered in the index scan.  However, that is only when the
index scan is used.  Sometimes, Explain Analyze shows it is not used.
That appears to happen when my search condition is very general.
This it to be expected, so I am not worried.  Most of my searches will
be intermediate, namely not VERY selective, but also not VERY general.
So the idea of the multi-column index is to "characterize" each row
sufficiently, even when it is a perfectly ordinary row with no ONE
feature being distinctive, but rather several features together giving
it it's distinctive character.  That is my interpretation of the
multi-column index.

	If you have some features which are highly selective, you can create a  
single column index on them. It won't be used often, but when it will, it  
will really work.

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
     message can get through to the mailing list cleanly