Greg, All: So, I've been running some stats over some customer workloads who are having issues with checkpoint spikes, and I'm finding that the bgwriter is almost entirely ineffective for them: pct_checkpoints_req | 33.0 avg_frequency_min | 2.30 avg_write_time_s | 112.17 avg_sync_time_s | 1.33 mb_written | 2387369.6 mb_written_per_min | 61.01 mb_per_checkpoint | 82.27 pct_checkpoint_buffers | 58.6 pct_bgwriter_buffers | 0.3 pct_backend_buffers | 41.1 bgwriter_halt_freq | 0.06 bgwriter_halt_potential | 70.11 buffer_allocation_ratio | 1.466 (query for the above is below) The key metric I'm looking at is that the bgwriter only took care of 0.3% of buffers. Yet average write throughput is around 1mb/s, and the bgwriter is capable of flushing 4mb/s, if it's waking up every 200ms. Of course, our first conclusion is that the writes are very bursty and the bgwriter is frequently hitting lru_maxpages. In fact, that seems to be the case, per bgwriter_halt_potential above (this is a measurement of the % of the time the bgwriter halted vs if all buffer writes were done in one continuous session where it couldn't keep up). And from the raw pg_stat_bgwriter: maxwritten_clean | 6950 So we're halting because we hit lru_maxpages a *lot*, which is keeping the bgwriter from keeping up. What gives? Well, digging into the docs, one thing I noticed was this note: "It then sleeps for bgwriter_delay milliseconds, and repeats. When there are no dirty buffers in the buffer pool, though, it goes into a longer sleep regardless of bgwriter_delay." combined with: "The number of dirty buffers written in each round is based on the number of new buffers that have been needed by server processes during recent rounds" ... so Greg built in the bgwriter autotuner with ramp-up/down behavior, where it sleeps longer and writes less if it hasn't been busy lately. But given the stats I'm looking at, I'm wondering if that isn't too much of a double-whammy for people with bursty workloads. That is, if you have several seconds of inactivity followed by a big write, then the bgwriter will wake up slowly (which, btw, is not manually tunable), and then write very little when it does wake up, at least in the first round. Of course, I may be misinterpreting the data in front of me ... I'm currently running a week-long test of raising lru_maxpages and decreasing bgwriter_delay to see how it affects things ... but I wanted to discuss it on-list. Bgwriter stats query follows: with bgstats as ( select checkpoints_timed, checkpoints_req, checkpoints_timed + checkpoints_req as checkpoints, checkpoint_sync_time, checkpoint_write_time, buffers_checkpoint, buffers_clean, maxwritten_clean, buffers_backend, buffers_backend_fsync, buffers_alloc, buffers_checkpoint + buffers_clean + buffers_backend as total_buffers, round(extract('epoch' from now() - stats_reset)/60)::numeric as min_since_reset, lru.setting::numeric as bgwriter_maxpages, delay.setting::numeric as bgwriter_delay from pg_stat_bgwriter cross join pg_settings as lru cross join pg_settings as delay where lru.name = 'bgwriter_lru_maxpages' and delay.name = 'bgwriter_delay' ) select round(checkpoints_req*100/checkpoints,1) as pct_checkpoints_req, round(min_since_reset/checkpoints,2) as avg_frequency_min, round(checkpoint_write_time::numeric/(checkpoints*1000),2) as avg_write_time_s, round(checkpoint_sync_time::numeric/(checkpoints*1000),2) as avg_sync_time_s, round(total_buffers/128::numeric,1) as mb_written, round(total_buffers/(128 * min_since_reset),2) as mb_written_per_min, round(buffers_checkpoint/(checkpoints*128::numeric),2) as mb_per_checkpoint, round(buffers_checkpoint*100/total_buffers::numeric,1) as pct_checkpoint_buffers, round(buffers_clean*100/total_buffers::numeric,1) as pct_bgwriter_buffers, round(buffers_backend*100/total_buffers::numeric,1) as pct_backend_buffers, round(maxwritten_clean*100::numeric/(min_since_reset*60000/bgwriter_delay),2) as bgwriter_halt_freq, round(maxwritten_clean*100::numeric/(buffers_clean/bgwriter_maxpages),2) as bgwriter_halt_potential, round(buffers_alloc::numeric/total_buffers,3) as buffer_allocation_ratio from bgstats; -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance