Re: Index optimization ?

"Florian G. Pflug" <fgp@xxxxxxxx> · Wed, 19 Jan 2005 02:15:42 +0100

Bo Lorentsen wrote:
Greg Stark wrote:
If Postgres used an index it would call odd(), which would return 1 
because

it's the first time, and then Postgres would go look up the rows where 
col is

1 and return all of them. That's a very different behaviour from if 
the index

isn't used. If all the records have col=1 then you're getting all of the

records instead of half of them. If col=0 then you're getting none of 
them

instead of half of them.
This is the differance between "volatile" and "stable" functions, but 
not the answer to why an index lookuo are per query and not per row, is 
it not ?

Because the _whole_ _point_ of an index is to find matching rows 
_without_ scanning the whole table. IF you have to look at every row

anyway, then just might as well to an sequential scan.

The performance boost an index gives you comes from that very fact that 
you can avoid looking most of the rows. You only have to traverse the 
index tree from the root to the node(s) you are looking for. If you have 
an unique index, this means you have to traverse (roughly) log2(n) 
nodes, if your table has n rows. (If your table has 2^32, or about 4 
billion rows, you just need to do 32(!) comparisons when walking the 
index tree, but whould need to do at worst 2^32 if you sequentially 
scanned the table.

_But_ the "downside" is that you skipped rows. You just didn't look at 
them _ever_. You don't even _know_ how many rows you skipped (Thats why 
count(*) on a hughe table is slow). So you _can't_ use an index for a 
where-condition that has to inspect every row. But, since the SQL-spec 
requires you to call volatile functions _for_every_row_, you _can't_ use

an index in your case.

You can, howevery, accelerate something like "where f in (1,2,3,4)". You 
just scan the index 4 times, each time for a different value. Of course,

if the number of values becomes larger and larger, there is a point 
where it's more efficient to do a sequential scan _once_, instead of a 
few tousand index scans (depends on the number of rows in the table).

The postgres optimizer tries to estimate this, and will switch to an 
seq-scan, if it would have to do too many index lookups.

Your example (the one with currval()) would translate to a IN-clause 
with about as many entries in the IN-list is you have rows in the table 
(Because the function has to be called _for_ _every_ _row_). Clearly,

it's not feasable to use an index scan here - it would be slower than a 
seq scan.

greetings, florian pflug
Attachment:
smime.p7s

Description: S/MIME Cryptographic Signature