On Tue, Jan 12, 2016 at 10:25 AM, Cory Tucker <cory.tucker@xxxxxxxxx> wrote:
PG 9.4.4 (RDS)I'm experiencing an issue when trying to update many rows in a single table (one row at a time, but parallelized across ~12 connections). The issue we see is that the writes will periodically be blocked for a duration of several minutes and then pick back up. After digging through our monitoring stack, I was able to uncover these stats which seem to allude to it being a background writer performance problem:
I'm not familiar with your monitoring stack. I assume bgwriter.sync_time refers to pg_stat_bgwriter.checkpoint_sync_time? Also, most of the stats shown will increase monotonically until the stats are reset. So it looks like our monitoring stack is either resetting stats frequently, or is implicitly doing a delta between each consecutive period for display purposes. And what are the units? bgwriter.buffers_checkpoint is presumably in buffers as the numerator, but over what period of time in the denominator?
Anyway, it looks to me like you have a checkpoint problem. The checkpoint overwhelms your IO system. The overwhelmed IO system then backs up into the bgwriter. What you see in the bgwriter is just a symptom, not the cause. The background writer is usually not very useful in recent versions of PostgreSQL, anyway. But, the same IO problem that is clogging up the background writer is also clogging up either your buffer_backend, or your WAL writes/fsyncs. And both of those will destroy your throughput.
(apologies for the image)
Our settings for the background writer are pretty standard OOB (I threw in some others that I thought might be helpful, too):name | setting | unit-------------------------+---------+------bgwriter_delay | 200 | msbgwriter_lru_maxpages | 100 |bgwriter_lru_multiplier | 2 |maintenance_work_mem | 65536 | kBmax_worker_processes | 8 |work_mem | 32768 | kB
What are your checkpoint settings?
Also, you should turn on log_checkpoints.
It seems to me that the background writer just can't keep up with the amount of writes that I am trying to do and freezes all the updates. What are my options to improve the background writer performance here?
You probably just need more IO throughput in general.
Cheers,
Jeff