Hi Andres.
Nearly one WAL segment (16 MB) per second most of the time, and about 3 segments per second at the beginning of a checkpoint (due to full_page_writes).
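For context, a minimal sketch of how that rate can be sampled from psql (not our monitoring, just stock functions; these are the pre-10 names, in 10+ they are pg_current_wal_lsn()/pg_wal_lsn_diff()):

-- remember the current WAL position in a psql variable
SELECT pg_current_xlog_location() AS lsn_t0 \gset
\! sleep 60
-- bytes of WAL generated during the 60-second interval
SELECT pg_xlog_location_diff(pg_current_xlog_location(), :'lsn_t0') AS wal_bytes_per_minute;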
Here is the output from a couple of our masters (and that is actually two hours before peak load):

$ pg_xlogdump --stats 0000000100012B2800000089 0000000100012B3000000088 | fgrep -v 0.00
Type              N     (%)    Record size     (%)       FPI size     (%)    Combined size     (%)
----              -     ---    -----------     ---       --------     ---    -------------     ---
Heap2      55820638 ( 21.31)    1730485085 ( 22.27)    1385795249 ( 13.28)     3116280334 ( 17.12)
Heap       74366993 ( 28.39)    2288644932 ( 29.46)    5880717650 ( 56.34)     8169362582 ( 44.87)
Btree      84655827 ( 32.32)    2243526276 ( 28.88)    3170518879 ( 30.38)     5414045155 ( 29.74)
                   --------                --------                --------                --------
Total     261933790             7769663301 [42.67%]   10437031778 [57.33%]    18206695079 [100%]

$ pg_xlogdump --stats 000000010000D17F000000A5 000000010000D19100000004 | fgrep -v 0.00
Type              N     (%)    Record size     (%)       FPI size     (%)    Combined size     (%)
----              -     ---    -----------     ---       --------     ---    -------------     ---
Heap2      13676881 ( 18.95)     422289539 ( 19.97)   15319927851 ( 25.63)    15742217390 ( 25.44)
Heap       22284283 ( 30.88)     715293050 ( 33.83)   17119265188 ( 28.64)    17834558238 ( 28.82)
Btree      27640155 ( 38.30)     725674896 ( 34.32)   19244109632 ( 32.19)    19969784528 ( 32.27)
Gin         6580760 (  9.12)     172246586 (  8.15)    8091332009 ( 13.54)     8263578595 ( 13.35)
                   --------                --------                --------                --------
Total      72172983             2114133847 [3.42%]    59774634680 [96.58%]    61888768527 [100%]
We do sometimes see WALWriteLock in pg_stat_activity.wait_event, but not too often.
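Something like this is enough to spot it (a minimal sketch, assuming the 9.6 wait_event/wait_event_type columns):

-- aggregate current wait events across backends
SELECT wait_event_type, wait_event, count(*)
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL
 GROUP BY wait_event_type, wait_event
 ORDER BY count(*) DESC;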
We use a fork of the barman project. In most cases that's not a problem.
Physical streaming replication. We used to have problems with network bandwidth (1 Gbit/s was fully consumed by shipping WAL to two replicas and one archive), but that got better after:
1. upgrading to 9.5 and turning wal_compression on,
2. changing the archive command to compress WALs in parallel before sending them to the archive (see the sketch below),
3. increasing checkpoint_timeout.
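For illustration only (a sketch, not our actual command: the thread count and target directory are made up, and our real setup ships segments to the archive host), the parallel-compression part can look like this in postgresql.conf, using pigz:

# hypothetical example: compress the segment with 4 pigz threads into a
# (made-up) archive directory; %p and %f are the standard placeholders
archive_command = 'pigz -p 4 -c %p > /wal_archive/%f.gz'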
xdb301e/postgres M # SELECT name, current_setting(name) FROM pg_settings WHERE name IN ('max_wal_size', 'checkpoint_timeout', 'wal_compression', 'wal_buffers');
        name        | current_setting
--------------------+-----------------
 checkpoint_timeout | 1h
 max_wal_size       | 128GB
 wal_buffers        | 16MB
 wal_compression    | on
(4 rows)

Time: 0.938 ms
OLTP workload with 80% reads and 20% writes.
Well, we actually worked around the WAL write rate issue by increasing checkpoint_timeout to the maximum possible value (in 9.6 it can be set even higher). The downside of this change is recovery time. Thanks to Postgres for its stability, but sometimes you waste ~10 minutes just restarting Postgres to upgrade to a new minor version, and that's not really cool.
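One thing that helps with the restart itself (a hedged sketch, not a recipe from this thread; the data directory path is a placeholder): run an explicit CHECKPOINT right before restarting, so the shutdown checkpoint has very little left to flush:

$ psql -c 'CHECKPOINT;'                       # flush dirty buffers ahead of the restart window
$ pg_ctl restart -m fast -D /path/to/datadir  # fast shutdown, then start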