Re: Tuning Checkpoints

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Mon, 31 Oct 2016 22:59:50 +0100

On 10/31/2016 08:19 PM, Andre Henry wrote:
My PG 9.4.5 server runs on Amazon RDS. At some times of the day we have
a lot of checkpoints really close together (less than 1 minute apart,
see logs below), and we are trying to tune the DB to minimize the impact
of the checkpoints or reduce their number.

Server Stats

•         Instance Type db.r3.4xl

•         16 vCPUs 122GB of RAM

•         PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc
(GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit

Some PG Stats

•         Shared Buffers = 31427608kB

•         Checkpoint Segments = 64

•         Checkpoint completion target = .9

•         Rest of the configuration is below

Things we are doing

•         We have a huge table where each row is over 1kB and it's very
busy. We are splitting that into multiple tables, especially the one json
field that is making it large.

Questions

•         Each checkpoint writes a line like the following to the log:
checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added,
0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s;
sync files=274, longest=0.049 s, average=0.000 s

OK, each checkpoint has to write out all dirty data from shared buffers.
You have ~170k buffers worth of dirty data, i.e. ~1.3GB.
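
To pull those numbers out of the log programmatically, something like the following could work. It is only a sketch: the regular expression is written against the single 9.4-style "checkpoint complete" line quoted above (joined onto one line), and the names PATTERN and parse_checkpoint are made up for illustration. Other PostgreSQL versions format this message differently.

```python
import re

# The log line quoted above, joined onto a single line.
LINE = ("checkpoint complete: wrote 166481 buffers (4.2%); "
        "0 transaction log file(s) added, 0 removed, 64 recycled; "
        "write=32.441 s, sync=0.050 s, total=32.550 s; "
        "sync files=274, longest=0.049 s, average=0.000 s")

# Hypothetical pattern, matched only against the 9.4-style message above.
PATTERN = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\); "
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled; "
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)


def parse_checkpoint(line):
    """Return the main counters of a 'checkpoint complete' line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "buffers": int(m.group("buffers")),
        "pct_of_shared_buffers": float(m.group("pct")),
        "segments_recycled": int(m.group("recycled")),
        "write_s": float(m.group("write")),
        "sync_s": float(m.group("sync")),
        "total_s": float(m.group("total")),
    }


if __name__ == "__main__":
    print(parse_checkpoint(LINE))
```

Running this over all "checkpoint complete" lines in the log gives the buffers written and the duration of each checkpoint.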

•         What does buffers mean? How do I find out how much RAM that is
equivalent to?

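A buffer holds 8kB of data, which is the size of the "chunk" (page) that 
the data files are divided into. So multiply the number of buffers by 8kB 
to get the amount of memory.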
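
As a quick check of that arithmetic, here is a small sketch using the numbers from the original post. It assumes the default 8kB block size; the function name buffers_to_gib is made up for illustration.

```python
BLOCK_SIZE_KB = 8  # default PostgreSQL block size, assumed unchanged here


def buffers_to_gib(n_buffers, block_kb=BLOCK_SIZE_KB):
    """Convert a number of buffers into GiB."""
    return n_buffers * block_kb / (1024 * 1024)


written = 166481              # "wrote 166481 buffers" from the log line
shared_buffers_kb = 31427608  # Shared Buffers setting from the post

total_buffers = shared_buffers_kb // BLOCK_SIZE_KB
written_gib = round(buffers_to_gib(written), 2)
written_pct = round(100 * written / total_buffers, 1)

print(written_gib)  # 1.27 (GiB written by this checkpoint)
print(written_pct)  # 4.2 (matches the "(4.2%)" in the log line)
```

The percentage agrees with the one PostgreSQL prints, which suggests the 8kB assumption holds for this instance.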

•         Based on my RDS stats I don't think IOPs will help, because I
don't see any flat lines on my write operations / second graph. Is this
a good assumption?

Not sure what you mean by this. Also, maybe you should talk to AWS if 
you're on RDS.

•         What else can we tune to spread out checkpoints?

Based on the logs, your checkpoints are triggered by filling WAL. I see 
your checkpoints happen every 30 - 40 seconds, and you only have 64 
segments.

So to get checkpoints triggered by timeout (which I assume is 5 minutes, 
because you have not mentioned checkpoint_timeout), you need to increase 
checkpoint_segments enough to hold 5 minutes worth of WAL.

That means 300/30 * 64, i.e. roughly 640 segments (it's likely an 
overestimate, because less frequent checkpoints mean fewer full page 
writes and thus less WAL, but well).
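
The same estimate as a small sketch, for both ends of the observed 30 - 40 second range. It assumes the default 16MB WAL segment size and simply scales the current setting linearly, so it ignores the full page write effect. The function name segments_for_timeout is made up for illustration.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size, assumed unchanged here


def segments_for_timeout(current_segments, observed_interval_s, timeout_s=300):
    """Scale checkpoint_segments so WAL fills up about once per timeout."""
    return math.ceil(current_segments * timeout_s / observed_interval_s)


estimates = {}
for interval_s in (30, 40):
    segments = segments_for_timeout(64, interval_s)
    wal_gib = segments * WAL_SEGMENT_MB / 1024  # WAL written between checkpoints
    estimates[interval_s] = (segments, wal_gib)
    print(interval_s, segments, wal_gib)
# prints:
# 30 640 10.0
# 40 480 7.5
```

The WAL figure here is the amount written between two checkpoints, not the disk space the WAL directory will occupy, which can be larger.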

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)