On Thu, 15 Dec 2005, Craig A. James wrote:
Mark Kirkwood wrote:
I hear what you are saying, but to use this fine example - I don't know
what the best plan is - these experiments are part of an investigation to find
*if* there is a better plan, and if so, why Postgres is not finding it.
There isn't a database in the world that is as smart as a developer, or
that can have insight into things that only a developer can possibly know.
That is often true - but the aim is to get Postgres's optimizer closer to
developer smartness.
What would be cool would be some way the developer could alter the plan, but
the way of doing so would strongly encourage the developer to send the
information to this mailing list. Postgres would essentially say, "Ok, you
can do that, but we want to know why!"
at the risk of sounding flippant (which is NOT what I intend) I will point
out that with the source you can change the optimizer any way you need to
:-)
that being said, in your example the issue is the cost of the user-created
function and the fact that postgres doesn't know its cost.
would a reasonable answer be to give postgres a way to learn how expensive
the call is?
a couple of ways I could see to do this:
1. store some stats automagically when the function is called and update the
optimizer's estimates when you do an ANALYZE
2. provide a way for a user to explicitly set a cost factor for a function
(with a default value that's sane for fairly trivial functions so that it
would only have to be set for unusually expensive functions; see the sketch below)
now, neither of these will work all the time if a given function is
sometimes cheap and sometimes expensive (depending on its parameters),
but in that case I would say that if the application knows that a function
will be unusually expensive under some conditions (and knows what those
conditions will be) it may be a reasonable answer to duplicate the
function, one copy that it uses most of the time, and a second copy that
it uses when it expects it to be expensive. at this point the cost of the
function can be set via either of the methods listed above.
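just to make option 2 concrete, here's a rough sketch of what an explicit
cost factor might look like (purely hypothetical syntax - postgres has
nothing like this today, and the names and numbers are made up):

    -- hypothetical syntax for option 2 above: an explicit per-function
    -- cost factor the planner could fold into its estimates
    -- (NOT real postgres syntax, just an illustration)
    CREATE FUNCTION expensive_match(text, text) RETURNS boolean
        AS 'SELECT $1 ~ $2'    -- toy body standing in for a costly function
        LANGUAGE SQL
        COST 10000;            -- relative planner cost units; a trivial
                               -- function would get a small default

    -- the duplicate-function workaround described above would then become:
    ALTER FUNCTION cheap_variant(text, text) COST 10;
    ALTER FUNCTION costly_variant(text, text) COST 10000;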
After years of using several other database products (some supporting hint
type constructs and some not), I have come to believe that hinting (or
similar) actually *hinders* the development of a great optimizer.
I agree. It takes the pressure off the optimizer gurus. If the users can
just work around every problem, then the optimizer can suck and the system is
still usable.
Lest anyone think I'm an all-out advocate of overriding the optimizer, I know
from first-hand experience what a catastrophe it can be. An Oracle hint I
used worked fine on my test schema, but the customer's "table" turned out to
be a view, and Oracle's optimizer worked well on the view whereas my hint was
horrible. Unfortunately, without the hint, Oracle sucked when working on an
ordinary table. Hints are dangerous, and I consider them a last resort.
I've been on the linux-kernel mailing list for the last 9 years, and have
seen a similar debate rage during that entire time about kernel memory
management. overall both of these tend to be conflicts between short-term
and long-term benefits.
in the short term the application user wants to be able to override the
system to get the best performance _now_
in the long run the system designers don't trust the application
programmers to get the hints right and want to figure out the right
optimizer plan, even if it takes a lot longer to do so.
the key to this balance seems to be to work towards as few controls as
possible, because the user will get them wrong far more frequently than
they get them right, but when you hit a point where there's absolutely no
way for the system to figure things out (and it's a drastic difference)
provide the application with a way to hint to the system that things are
unusual, but always keep looking for patterns that will let the system
detect the need itself.
even the existing defaults are wrong as frequently as they are right (they
were set when hardware was very different than it is today), so some way to
gather real-world stats and set the system defaults based on actual
hardware performance is really the right way to go (even for things like
the sequential scan speed that is set in the config file today).
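for what it's worth, the planner constants in question can already be
overridden by hand (per session or in postgresql.conf), so a crude manual
version of this calibration is possible today; a sketch with purely
illustrative numbers:

    -- the planner cost constants referred to above; the defaults were
    -- chosen for much older hardware, and can be overridden once you've
    -- measured your own disks (numbers here are illustrative only)
    SHOW random_page_cost;              -- how expensive a random page fetch is
                                        -- assumed to be vs. a sequential one
    SET random_page_cost = 2.0;         -- e.g. after measuring that random I/O
                                        -- is relatively cheap on this hardware
    SET effective_cache_size = 100000;  -- in 8kB pages: roughly how much of the
                                        -- database the OS is likely to be caching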
David Lang