Anyone have an idea? Thanks very much in advance for any reply.
On Fri, Jul 19, 2019 at 1:46 PM Tiemen Ruiten <t.ruiten@xxxxxxxxxxx> wrote:
Hello,In my previous post[1] on this list I brought up an issue with long running checkpoints. I reduced checkpoint_timeout to a more reasonable value (15m down from 60m) and forced/immediate checkpoints now complete mostly in under a minute. This thread and another one[2] on the Clusterlabs mailinglist also helped me understand more about how PostgreSQL internals work, thanks everyone!Now to my current issue: I took the advice to add more monitoring on replay lag (using pg_last_xact_replay_timestamp) and things are not looking good. Last night replication lagged by almost 6 hours on one of the nodes[3], but eventually caught up. As you can see in that screenshot, ph-sql-03 is consistently slower to replay than ph-sql-05 (ph-sql-04 is the current master) and there happen to be different SSD's in ph-sql-03 (Crucial MX300 vs Crucial MX500 in the other two), which makes me think this is IO related.When I check the replay_lag column of pg_stat_replication, the numbers are consistent with the data from pg_last_xact_replay_timestamp:postgres=# SELECT application_name, replay_lag FROM pg_stat_replication;
application_name | replay_lag
------------------+-----------------
ph-sql-03 | 00:15:16.179952
ph-sql-05 | 00:10:01.078163Currently this doesn't present an operational issue, as the slaves aren't used by applications (still waiting for development to make the necessary changes). So there are no queries running at all on the slaves apart from the occasional monitoring.Cluster specifications:all nodes:- filesystem: ZFS stripe of mirrors- 2* CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores total)- 128 GB RAMph-sql-03: 8* Crucial MX300 1050MB, underprovisioned to 1TBph-sql-0{4,5}: 8* Crucial MX500 1TBpostgresql.conf with GUCs changed from default:shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = off
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 15min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500
shared_preload_libraries = 'timescaledb, pg_cron, pg_prewarm' # (change requires restart)
max_locks_per_transaction = 512What are possible reasons for the high replay_lag? Is my storage just too slow? Are there any tunables available?Thank you,Tiemen Ruiten