Search Postgresql Archives

Re: visualizing B-tree index coverage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




I realize that using OR will not result in an index scan.
I will never be interested in a OR condition for the kinds
of searches I use.  In my Select statements, I always name
every column of the multi-column index in same order that
they were named when creating the index.  I always use
the >= condition, and very rarely, the = condition.

All the leftmost index column must be named, but the order is unimportant. You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y), it is cleaner.

However, I am concerned that I must place
the most selective column first in my index. I cannot tell,
a priori, which column will be most selective. That depends on the
nature of search, which can vary widely each time.
Are you saying that if my first column is not selective, even though the remaining
columns are, the planner may choose not to use the index after
seeing that the first column is not very selective?

I thought this was true but made some tests and the index scanner is smart.


Try this :
CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER, r INTEGER, t INTEGER, y INTEGER ) WITHOUT OIDS;
INSERT 1M rows into table using a plpgsql function, with a,z,e,r,t,y being floor(random()*10) for instance.


Then you can try various selects. a,z,e,r,t,y are a linear distribution between 0 and 9 included, so :
a>=A AND z>=Z ... y>=Y gives a result set of about (10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index scan when needed. You can try the easiest case (a>=9) which just explores one part of the tree, and the worst case which explores a part of all leafs (y>=9). Both should yield about the same number of results, but the first should be faster. To know how much, just try ;)


That seems like an oversight, IMHO.  Shouldn't the overall effect of
using all the columns be considered before choosing not to use an
index scan?

I think it is. There are no cross column correlation stats though.

Since I'm using every column of my multi-column index for every search,
and I always use >=, Explain Analyze always shows that every column
is considered in the index scan.  However, that is only when the
index scan is used.  Sometimes, Explain Analyze shows it is not used.
That appears to happen when my search condition is very general.
This it to be expected, so I am not worried.  Most of my searches will
be intermediate, namely not VERY selective, but also not VERY general.
So the idea of the multi-column index is to "characterize" each row
sufficiently, even when it is a perfectly ordinary row with no ONE
feature being distinctive, but rather several features together giving
it it's distinctive character.  That is my interpretation of the
multi-column index.

If you have some features which are highly selective, you can create a single column index on them. It won't be used often, but when it will, it will really work.





---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your message can get through to the mailing list cleanly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux