On Mon, Nov 12, 2012 at 12:04 PM, Lists <lists@xxxxxxxxxxxxxxxxxx> wrote: > > > Should I increase the max_workers field from the default of 3 to (perhaps) > 10? I would not. You report that the reason you turned off autovac is because it made your database crawl when it kicked in. That suggests that if anything you should reduce that parameter (assuming you are still doing manual vacuums at off-peak hours, leaving autovacuum to only mop up what is left). > > Why would I want to reduce the cost delay to 0, and how does this relate to > cost_limit? Careful reading of the docs: > http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html makes > me believe that, given my substantial I/O subsystem, I'd want to drop > cost_delay to near zero and set the cost_limit really high, which is a rough > restatement of the last quoted paragraph above. (I think) Given that autovac kicking in destroys your performance, I think that your I/O subsystem may not be all that you think it is. Do you have test/dev/QA system with the same subsystem that you can use for investigation? If so, do you have a vaguely realistic load generator to drive those systems? > Assuming that I make these suggestions and notice a subsequent system load > problem, what information should I be gathering in order to provide better > post-incident forensics? If you are going to be focusing your undivided attention on monitoring the system during the period, just keeping a window open with "top" running is invaluable. (On most implementations, if you hit 'c' it will toggle the command display so you can see the results of "update_process_title=on") Also, "sar" is useful, and on most systems has the advantage that its stats are always being gathered without you having to do anything, so it works well for unexpected problems arising. I often just have "vmstat 1 -t" running in the background streaming into a log file, for the same reason. For internal to pgsql, set log_min_duration_statement to a value which few statements will exceed under normal operations, but many will when things bog down. That way you can figure out exactly when things bogged down after unattended operation, to correlate it with the sar/vmstat/etc reports. I'd also set for the probationary period (if you haven't already): log_lock_waits = on log_checkpoints = on log_autovacuum_min_duration = 0 (or some smallish positive value) The last one only logs when it finishes vacuuming a table. I wish there was a way to make it log when it started as well, but I don't think there is. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general