Dne 14.4.2011 00:05, Nathan Boley napsal(a): >>> If you model the costing to reflect the reality on your server, good >>> plans will be chosen. >> >> Wouldn't it be "better" to derive those costs from actual performance >> data measured at runtime? >> >> Say, pg could measure random/seq page cost, *per tablespace* even. >> >> Has that been tried? > > FWIW, awhile ago I wrote a simple script to measure this and found > that the *actual* random_page / seq_page cost ratio was much higher > than 4/1. > > The problem is that caching effects have a large effect on the time it > takes to access a random page, and caching effects are very workload > dependent. So anything automated would probably need to optimize the > parameter values over a set of 'typical' queries, which is exactly > what a good DBA does when they set random_page_cost... Plus there's a separate pagecache outside shared_buffers, which adds another layer of complexity. What I was thinking about was a kind of 'autotuning' using real workload. I mean - measure the time it takes to process a request (depends on the application - could be time to load a page, process an invoice, whatever ...) and compute some reasonable metric on it (average, median, variance, ...). Move the cost variables a bit (e.g. the random_page_cost) and see how that influences performance. If it improved, do another step in the same direction, otherwise do step in the other direction (or do no change the values at all). Yes, I've had some lectures on non-linear programming so I'm aware that this won't work if the cost function has multiple extremes (walleys / hills etc.) but I somehow suppose that's not the case of cost estimates. Another issue is that when measuring multiple values (processing of different requests), the decisions may be contradictory so it really can't be fully automatic. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance