I have two hot-spare databases that use WAL archiving and continuous recovery mode. I want to minimize recovery time when we have to fail over to one of our hot spares. Right now, I'm seeing the following behavior, which makes a quick recovery seem problematic:

(1) The hot spare applies 70 to 75 WAL files (~1.1 GB) in a 2 to 3 minute period.

(2) The hot spare pauses for 15 to 20 minutes. During this period pdflush consumes 99% of IO (per iotop). Dirty (from /proc/meminfo) spikes to ~760 MB, remains at that level for the first 10 minutes, and then slowly ticks down to 0 over the second 10 minutes.

(3) Go to (1).

My concern is that if the database has been in recovery mode for some time, even if it's caught up, and I go live sometime during (1), I can face a recovery time of upwards of 20 minutes. We've experienced delays during failover in the past (not 20 minutes, but long enough to make me second-guess what we are doing).

I want to better understand what is going on so that I can determine what I can do (if anything) to minimize downtime when we fail over to one of our hot spares.

Here are my current settings:

postgres (v8.3.7):
  shared_buffers = 2GB (15GB total)
  effective_cache_size = 12GB (15GB total)
  checkpoint_segments = 10
  checkpoint_completion_target = 0.7
  (other checkpoint/bgwriter settings left at default values)

sysctl:
  kernel.shmmax = 2684354560
  vm.dirty_background_ratio = 1
  vm.dirty_ratio = 5

Thanks,
Bryan
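
P.S. As a rough sanity check on the numbers above, here is a minimal sketch of the arithmetic. It assumes the default 16 MB WAL segment size (I have not verified the build option) and treats the vm.dirty_* ratios as a percentage of total RAM. As far as I understand, the kernel applies them to a smaller "dirtyable" memory figure, so these are upper estimates only. The function and constant names are just for illustration.

```python
# Back-of-the-envelope check of the figures quoted in this post.
MIB = 1024 ** 2
GIB = 1024 ** 3

TOTAL_RAM_BYTES = 15 * GIB    # "15GB total" from the settings above
WAL_SEGMENT_BYTES = 16 * MIB  # ASSUMPTION: default PostgreSQL WAL segment size


def dirty_threshold_bytes(ratio_percent, mem_bytes=TOTAL_RAM_BYTES):
    """Approximate a vm.dirty_* threshold as a percentage of memory.

    ASSUMPTION: uses total RAM; the kernel uses a smaller dirtyable-memory
    figure, so the real threshold is somewhat lower than this.
    """
    return mem_bytes * ratio_percent // 100


def wal_volume_bytes(segments, segment_bytes=WAL_SEGMENT_BYTES):
    """Total bytes of WAL represented by a number of segment files."""
    return segments * segment_bytes


if __name__ == "__main__":
    background = dirty_threshold_bytes(1)  # vm.dirty_background_ratio = 1
    hard_limit = dirty_threshold_bytes(5)  # vm.dirty_ratio = 5
    print(f"dirty_background_ratio=1 -> ~{background / MIB:.1f} MiB")
    print(f"dirty_ratio=5            -> ~{hard_limit / MIB:.1f} MiB")
    for segments in (70, 75):
        wal = wal_volume_bytes(segments)
        print(f"{segments} WAL files -> {wal / MIB:.0f} MiB ({wal / GIB:.2f} GiB)")
```

Under those assumptions, 5% of 15 GiB comes to 768 MiB, which is close to the ~760 MB Dirty plateau I see in step (2), and 70 to 75 segments come to 1120 to 1200 MiB, consistent with the ~1.1 GB in step (1). I don't know whether the match with vm.dirty_ratio is the actual cause of the pause or a coincidence.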