Re: Overriding the optimizer

On Thu, 15 Dec 2005, Craig A. James wrote:

Mark Kirkwood wrote:
I hear what you are saying, but to use this fine example - I don't know what the best plan is. These experiments are part of an investigation to find *if* there is a better plan, and if so, why Postgres is not finding it.

There isn't a database in the world that is as smart as a developer, or that can have insight into things that only a developer can possibly know.

That is often true - but the aim is to get Postgres's optimizer closer to developer smartness.

What would be cool would be some way the developer could alter the plan, but the way of doing so would strongly encourage the developer to send the information to this mailing list. Postgres would essentially say, "Ok, you can do that, but we want to know why!"

at the risk of sounding flippant (which is NOT what I intend), I will point out that with the source you can change the optimizer any way you need to :-)

that being said, in your example the issue is the cost of the user-created function and the fact that postgres doesn't know its cost.

would a reasonable answer be to give postgres a way to learn how expensive the call is?

a couple of ways I could see to do this:

1. store some stats automagically when the function is called, and update the optimization plan when you do an ANALYZE

2. provide a way for a user to explicitly set a cost factor for a function (with a default value that's sane for fairly trivial functions, so that it would only have to be set for unusually expensive functions)

now, neither of these will work all the time if a given function is sometimes cheap and sometimes expensive (depending on its parameters). but in that case I would say that if the application knows that a function will be unusually expensive under some conditions (and knows what those conditions will be), a reasonable answer may be to duplicate the function: one copy that it uses most of the time, and a second copy that it uses when it expects the call to be expensive. at that point the cost of each copy can be set via either of the methods listed above.
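as a sketch of what option 2 might look like (purely hypothetical syntax - no COST clause on CREATE FUNCTION exists today, and expensive_substructure_match is a made-up function name):

```sql
-- hypothetical: declare a per-call cost factor the planner can use,
-- expressed as a multiple of cpu_operator_cost
CREATE FUNCTION cheap_check(text) RETURNS boolean
    AS 'SELECT length($1) > 0'
    LANGUAGE sql
    COST 1;        -- trivial function, the sane default

CREATE FUNCTION expensive_substructure_match(text) RETURNS boolean
    AS 'SELECT expensive_substructure_match_impl($1)'
    LANGUAGE sql
    COST 10000;    -- tell the planner each call is very expensive
```

with something like this, the planner could evaluate the expensive qual last, after cheaper conditions have already filtered most of the rows.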

After years of using several other database products (some supporting hint type constructs and some not), I have come to believe that hinting (or similar) actually *hinders* the development of a great optimizer.

I agree. It takes the pressure off the optimizer gurus. If the users can just work around every problem, then the optimizer can suck and the system is still usable.

Lest anyone think I'm an all-out advocate of overriding the optimizer, I know from first-hand experience what a catastrophe it can be. An Oracle hint I used worked fine on my test schema, but the customer's "table" turned out to be a view, and Oracle's optimizer worked well on the view whereas my hint was horrible. Unfortunately, without the hint, Oracle sucked when working on an ordinary table. Hints are dangerous, and I consider them a last resort.

I've been on the linux-kernel mailing list for the last 9 years, and have seen a similar debate rage during that entire time about kernel memory management. overall, both of these tend to be conflicts between short-term and long-term benefits.

in the short term, the application user wants to be able to override the system to get the best performance _now_.

in the long run, the system designers don't trust the application programmers to get the hints right, and want the optimizer to figure out the right plan itself, even if it takes a lot longer to do so.

the key to this balance seems to be to work towards as few controls as possible, because the user will get them wrong far more frequently than they get them right. but when you hit a point where there's absolutely no way for the system to figure things out (and it's a drastic difference), provide the application with a way to hint to the system that things are unusual - while always keeping an eye out for patterns that would let the system detect the need itself.

even the existing defaults are wrong as frequently as they are right (they were set when hardware was very different than it is today), so some way to gather real-world stats and set the system defaults based on actual hardware performance is really the right way to go (even for things like sequential scan speed that are set in the config file today).
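for reference, these are the kinds of hand-set guesses sitting in postgresql.conf today - the parameter names and defaults below are from the current 8.x config file, and it's exactly these that could instead be derived from measured hardware performance:

```
# planner cost constants, all relative to a sequential page fetch (1.0)
random_page_cost = 4.0      # assumes a random read costs 4x a sequential one
cpu_tuple_cost = 0.01       # cost to process one row
cpu_operator_cost = 0.0025  # cost of one operator/function call
```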

David Lang

