My PostgreSQL 9.4.5 server runs on Amazon RDS. At some times of the day we get a lot of checkpoints really close together (less than 1 minute apart, see logs below), and we are trying to tune the database to minimize the impact of each checkpoint or to reduce the number of checkpoints.
Server Stats
• Instance type: db.r3.4xl, 16 vCPUs, 122 GB of RAM
• PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
Some PG Stats
• shared_buffers = 31427608kB
• checkpoint_segments = 64
• checkpoint_completion_target = 0.9
• Rest of the configuration is below
Things we are doing
• We have a huge, very busy table where each row is over 1 kB. We are splitting it into multiple tables, in particular moving out the one JSON field that makes the rows large.
Questions
Each checkpoint writes a log entry like the following:
checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s
• What does "buffers" mean here? How do I find out how much RAM that is equivalent to?
• Based on my RDS stats I don't think more IOPS would help, because I don't see any flat plateaus on my write-operations-per-second graph. Is this a reasonable assumption?
• What else can we tune to spread out checkpoints?
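For reference, my understanding is that each "buffer" in these messages is one shared-buffers page, which on a stock build is block_size = 8 kB (verifiable with SHOW block_size;). A back-of-the-envelope conversion, assuming that default block size:

```python
# Convert a checkpoint's "wrote N buffers" figure into RAM flushed.
# Assumes the default PostgreSQL block size of 8 kB (check with: SHOW block_size;)
BLOCK_SIZE = 8192  # bytes per shared buffer

def buffers_to_gib(buffers: int, block_size: int = BLOCK_SIZE) -> float:
    """Return how much data (in GiB) a checkpoint wrote out."""
    return buffers * block_size / 1024**3

# The log line above: "wrote 166481 buffers (4.2%)"
print(f"{buffers_to_gib(166481):.2f} GiB")  # -> 1.27 GiB
```

That would mean each of these checkpoints flushes roughly 1.3 GiB, which is consistent with the 4.2% figure against shared_buffers = 31427608 kB.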
Here is the Postgres configuration
application_name | psql
archive_mode | on
archive_timeout | 5min
autovacuum_analyze_scale_factor | 0.05
autovacuum_naptime | 30s
autovacuum_vacuum_scale_factor | 0.1
checkpoint_completion_target | 0.9
checkpoint_segments | 64
client_encoding | UTF8
effective_cache_size | 62855216kB
fsync | on
full_page_writes | on
hot_standby | off
listen_addresses | *
lo_compat_privileges | off
log_checkpoints | on
log_error_verbosity | default
log_file_mode | 0644
log_hostname | on
log_line_prefix | %t:%r:%u@%d:[%p]:
log_rotation_age | 1h
log_timezone | UTC
log_truncate_on_rotation | off
logging_collector | on
maintenance_work_mem | 250MB
max_connections | 4092
max_locks_per_transaction | 64
max_prepared_transactions | 0
max_replication_slots | 5
max_stack_depth | 6MB
max_wal_senders | 10
port | 5432
rds.extensions | btree_gin,btree_gist,chkpass,citext,cube,dblink,dict_int,dict_xsyn,earthdistance,fuzzystrmatch,hstore,intagg,intarray,ip4r,isn,ltree,pgcrypto,pgrowlocks,pgstattuple,pg_buffercache,pg_prewarm,pg_stat_statements,pg_trgm,plcoffee,plls,plperl,plpgsql,pltcl,plv8,postgis,postgis_tiger_geocoder,postgis_topology,postgres_fdw,sslinfo,tablefunc,test_parser,tsearch2,unaccent,uuid-ossp
rds.internal_databases | rdsadmin,template0
rds.rds_superuser_reserved_connections | 0
rds.superuser_variables | session_replication_role
shared_buffers | 31427608kB
ssl | on
ssl_renegotiation_limit | 0
superuser_reserved_connections | 3
synchronous_commit | off
TimeZone | UTC
unix_socket_group | rdsdb
unix_socket_permissions | 0700
wal_keep_segments | 32
wal_level | hot_standby
wal_receiver_timeout | 30s
wal_sender_timeout | 30s
work_mem | 8MB
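As I understand the 9.4 docs, with checkpoint_segments = 64 an xlog-triggered checkpoint fires after roughly 64 × 16 MB = 1 GB of WAL (16 MB being the default WAL segment size). A quick sketch of that arithmetic, assuming the default segment size:

```python
# Estimate WAL volume between xlog-triggered checkpoints on PostgreSQL 9.4.
# Assumes the default 16 MB WAL segment size.
WAL_SEGMENT_MB = 16

def wal_between_checkpoints_mb(checkpoint_segments: int) -> int:
    """Approximate MB of WAL written between consecutive xlog checkpoints."""
    return checkpoint_segments * WAL_SEGMENT_MB

# Current setting: checkpoints fire roughly every 1 GB of WAL.
print(wal_between_checkpoints_mb(64), "MB")   # -> 1024 MB
# Raising checkpoint_segments would spread them out proportionally, e.g.:
print(wal_between_checkpoints_mb(256), "MB")  # -> 4096 MB
```

Since the logs below show checkpoints about 40 s apart, that would put our WAL generation rate around 25 MB/s during these bursts.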
Here are some checkpoint logs from a busy period
2016-10-26 18:54:32 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:10 UTC::@:[5594]:LOG: checkpoint complete: wrote 168357 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=37.650 s, sync=0.036 s, total=37.879 s; sync files=266, longest=0.027 s, average=0.000 s
2016-10-26 18:55:15 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:53 UTC::@:[5594]:LOG: checkpoint complete: wrote 171087 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=38.147 s, sync=0.116 s, total=38.298 s; sync files=276, longest=0.098 s, average=0.000 s
2016-10-26 18:55:58 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:56:34 UTC::@:[5594]:LOG: checkpoint complete: wrote 171131 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.080 s, sync=0.142 s, total=35.323 s; sync files=274, longest=0.103 s, average=0.000 s
2016-10-26 18:56:39 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:57:14 UTC::@:[5594]:LOG: checkpoint complete: wrote 158818 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.545 s, sync=0.072 s, total=35.771 s; sync files=277, longest=0.036 s, average=0.000 s
2016-10-26 18:57:19 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:57:56 UTC::@:[5594]:LOG: checkpoint complete: wrote 179405 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=36.289 s, sync=0.069 s, total=36.405 s; sync files=295, longest=0.030 s, average=0.000 s
2016-10-26 18:58:01 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:58:37 UTC::@:[5594]:LOG: checkpoint complete: wrote 169711 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.248 s, sync=0.199 s, total=35.511 s; sync files=295, longest=0.146 s, average=0.000 s
2016-10-26 18:58:42 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:59:17 UTC::@:[5594]:LOG: checkpoint complete: wrote 174124 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=34.666 s, sync=0.059 s, total=34.790 s; sync files=279, longest=0.051 s, average=0.000 s
2016-10-26 18:59:23 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:59:55 UTC::@:[5594]:LOG: checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s
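To quantify how tightly these checkpoints are packed, I measured the gaps between "checkpoint starting" entries; a minimal sketch (the regex assumes the log_line_prefix format shown above, and LOG holds a sample of the lines):

```python
import re
from datetime import datetime

# A sample of the "checkpoint starting" lines from the logs above.
LOG = """\
2016-10-26 18:54:32 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:15 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:58 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:56:39 UTC::@:[5594]:LOG: checkpoint starting: xlog
"""

def checkpoint_intervals(log_text: str) -> list[float]:
    """Return seconds between consecutive 'checkpoint starting' entries."""
    ts = [
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        for m in re.finditer(
            r"^(\S+ \S+) UTC.*checkpoint starting", log_text, re.M
        )
    ]
    return [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]

print(checkpoint_intervals(LOG))  # -> [43.0, 43.0, 41.0]
```

So during this burst a new checkpoint starts roughly every 40 s, only a few seconds after the previous one completes.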