I agree with you. Not completely, but I do.
I'm just stuck in a situation where I can't figure out what values to use for the parameters, and I can't even think of a way to test and discover them.
I followed Josh Berkus' GUC spreadsheet and some tutorials on the PG wiki, but how do I tell if my configuration is good or bad? I can see in the PG log that some queries get bad plans, but what should I do in order to make PG choose better ones? I tried different values with no success.
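The only per-query experiment I can think of is something like the following, comparing plans before and after a session-level change to one planner cost parameter (the table and predicate are made-up placeholders for one of my real queries):

    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;

    SET random_page_cost = 2.0;   -- try a lower value if the data is mostly cached

    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;

    RESET random_page_cost;       -- back to the server default

But doing that by hand for every query and every parameter doesn't scale.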
I understand that parameters have no "work everywhere" values. Each database has its own characteristics and each server has its own hardware specifications.
Is there any automated test tool? I can compile a list of real-world queries and provide an exact copy of my DB server just for testing. But how do I do it? Write a bunch of scripts? Is there any serious tool that tries different parameters, runs a load test, processes the results and generates reports?
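The crude sketch below is the sort of script I mean, just to illustrate: it times one representative query under a few candidate values of a single parameter within one session (the orders table and the parameter values are made-up placeholders; a real test would loop over the whole captured workload, repeat each run several times, and drive concurrent load with something like pgbench):

    DO $$
    DECLARE
        rpc text;
        t0  timestamptz;
    BEGIN
        FOR rpc IN SELECT unnest(ARRAY['1.5', '2.5', '4.0']) LOOP
            EXECUTE 'SET random_page_cost = ' || rpc;
            t0 := clock_timestamp();
            -- placeholder for one of the real workload queries
            PERFORM count(*) FROM orders
             WHERE created_at > now() - interval '7 days';
            RAISE NOTICE 'random_page_cost = %: %', rpc, clock_timestamp() - t0;
        END LOOP;
    END
    $$;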
Again, thanks all of you for the replies.
Cheers,
Fabrício dos Anjos Silva
LinkCom Soluções em T.I.
2010/10/1 Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx>
Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Because the query often only wants a small subset of the data, and
> whole relations are rarely fully cached, it's not enough to know
> that "some of relation X is cached", it has to know if the cached
> parts are the parts that'll be required, or at least an
> approximation of that. It sounds horrendously complicated to keep
> track of to me, and in the end it won't make query execution any
> faster, it'll just potentially help the planner pick a better
> plan. I wonder if that'd be worth the extra CPU time spent
> managing the cache and cache content stats, and using those cache
> stats when planning? It'd be an interesting experiment, but the
> outcome is hardly obvious.
I agree with that, but I think there's an even more insidious issue
here. Biasing plans heavily toward using what is already in cache
could have a destabilizing effect on performance. Let's say that
some query or maintenance skews the cache toward some plan which is
much slower when cached than another plan would be if cached. Let's
also postulate that this query runs very frequently. It will always
settle for what's fastest *this* time, not what would make for
fastest performance if consistently used. If it never chooses the
plan which would run better if cached, the data used for that plan
may never make it into cache, and you will limp along with the
inferior plan forever.
If you set the overall level of caching you expect, the optimizer
will tend to wind up with data cached to support the optimal plans
for that level of caching for the frequently run queries.
-Kevin