Re: Index optimization ?

Bo Lorentsen <bl@xxxxxxxxxxx> · Mon, 17 Jan 2005 16:46:31 +0100

Florian G. Pflug wrote:

Lets say, you have an query "select * from table where field = 
function()". Now, according to the sql-spec, you would have to

scan each row in "table", call the function "functio()", and compare the

result. If the result of the call to "function()" matches the value in 
"field", the you return the row, otherwise you don't return it.

This far I follow :-)

An index is a tree, where each node has two subnodes/children. Each node

representents a row of your table (or, rather, references a row - it 
contails only the value of the field you indexed). Additionally,

the value of the field of the "left" child (and the value of the field 
of its children, and so on) is always guaranteed to be 
smaller-or-equal to the value of field of the node itself, and the value

of the field of the "right" child is always guaranteed to be 
greater-or-equal to the value of the field of the node.

What you say, is based on the result of the evaluation, the executer 
will optimize or performe the index lookup, if the righthand is a 
constant, but it will perform a seq scan if the value is not known on 
the first query (volatile) ?

To me this sound like the indexes can't be performed per row, but only 
per query ? Or, PG is not type aware when maching indexes ?

Now, when doing an index lookup, you have to know which value to look 
for (lets say you look for 3). Then you look at the top node, and 
compare the value you are looking for to the value of the node. In our 
case, the node has a smaller value then the one we are looking for. 
Because we know that the left child of the toplevel node will have an 
even smaller value, we don't need to look at the left child at all. We 
just check the right child, and there we find our record with "field"=3.

But why can't we evaluate righthand and use this new row value (could 
still be 3, but not 'foobar' :-)) as a key to the index ?

_BUT_ this only works, because we knew for which value to look before we

started searching. If we the value we look for is constantly changing, 
our index lookup would return bogus results. So, if the value is unknown

_at_the_beginning_ of the search, you can't use the index, because the 
power of an index search comes from the idea to skip a whole subtree (in

our case the left one), because we _know_ it can't contain the value 
we are looking for.

But the "value" is unknow yes ... but the type of the value is not, this 
will not change from row to row only the value do this.

Functions marked "immutable" or "stable" is _guaranteed_ to never 
change their value during a select statement, or at least not in an 
unpredictable way. Thats why you can use return values of "immutable" 
or "stable" functions for an index search.

I understand that, I just can't see why an index lookup can't be used on 
"per row" basis.

Lets say, you are doing the following query

"select * from table where field1 = currval('seq')" and field2 = 
nextval('seq')

Now, the value of "currval('seq')" changes with every row visited. In 
your example, the value of "currval" is actually stable - but postgres 
has no way to know this. To use an index scan for your query, postgres

would need to know, that only a call to nextval can change the value 
of currval - but this, of course, is a quite difficult dependency for a

database to track.

But why can't the integer index for "field1" be used, with the "currval" 
result, even if it changes ? Are the index lookup only performed ones 
(asked this before ) per query ?

/BL

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

              http://archives.postgresql.org