Re: Overriding the optimizer

Kevin Brown <kevin@xxxxxxxxxxxxxx> · Thu, 15 Dec 2005 21:08:33 -0800

Craig A. James wrote:
> 
> 
> Christopher Kings-Lynne wrote:
> >>>>  select * from my_table     where row_num >= 50000 and row_num < 
> >>>>100000
> >>>>   and myfunc(foo, bar);
> >>>
> >>>
> >>>You just create an index on myfunc(foo, bar)
> >>
> >>
> >>only if myfunc(foo, bar) is immutable...
> >
> >
> >And if it's not then the best any database can do is to index scan 
> >row_num - so still you have no problem.
> 
> Boy, you picked a *really* bad example ;-)
> 
> The problem is that Postgres decided to filter on myfunc() *first*, and 
> then filter on row_num, resulting in a query time that jumped from seconds 
> to hours.  And there's no way for me to tell Postgres not to do that!

Apologies in advance if all of this has been said, or if any of it is
wrong.

What kind of plan do you get if you eliminate the myfunc(foo, bar)
from the query entirely?  An index scan or a full table scan?  If the
latter then (assuming that the statistics are accurate) the reason you
want inclusion of myfunc() to change the plan must be the expense of
the function, not the expense of the scan (index versus sequential).
While the expense of the function isn't, as far as I know, known or
used by the planner, that obviously needn't be the case.

On the other hand, if the inclusion of the function call changes the
plan that is selected from an index scan to a sequential scan, then
that, I think, is clearly a bug, since even a zero-cost function
cannot make the sequential scan more efficient than an index scan
which is already more efficient than the base sequential scan.

> So, "you still have no problem" is exactly wrong, because Postgres picked 
> the wrong plan.  Postgres decided that applying myfunc() to 10,000,000 
> rows was a better plan than an index scan of 50,000 row_nums.  So I'm 
> screwed.

If PostgreSQL is indeed applying myfunc() to 10,000,000 rows, then
that is a bug if the function is declared VOLATILE (which is the
default if no volatility is specified), because it implies that it's
applying the function to rows that don't match the selection
condition.  From your prior description, it sounds like your function
is declared STABLE.

For your specific situation, my opinion is that the proper
modification to PostgreSQL would be to give it (if it isn't already
there) the ability to include the cost of functions in the plan.  The
cost needn't be something that it automatically measures -- it could
be specified at function creation time.

-- 
Kevin Brown					      kevin@xxxxxxxxxxxxxx