Hi,

On 2018-08-14 15:18:55 +0200, Alexis Lê-Quôc wrote:
> We run a cluster of large, SSD-backed, i3.16xl (64 cores visible to
> Linux, ~500GB of RAM, with 8GB of shared_buffers, fast NVMe drives)
> nodes, each running PG 9.3 on Linux in a vanilla streaming asynchronous
> replication setup: 1 primary node, 1 replica designated for failover
> (left alone) and 6 read replicas taking queries.

9.3 is extremely old; we've made numerous performance improvements since
then in areas potentially related to your problem.

> Under normal circumstances this is working exactly as planned, but when
> I dial up the number of INSERTs on the primary to ~10k rows per second,
> or roughly 50MB of data per second (not enough to saturate the network
> between nodes), read replicas fall hopelessly and consistently behind
> until read traffic is diverted away.

Do you use hot_standby_feedback=on?

> 1. We see read replicas fall behind and we can measure their
> replication throughput to be consistently 1-2% of what the primary is
> sustaining, by measuring the replication delay (in seconds) every
> second. We quickly get that metric to 0.98-0.99 (1 means that
> replication is completely stuck, as it falls behind by one second every
> second). CPU, memory, I/O (per-core iowait) or network (throughput) as
> a whole resource are not visibly maxed out.

Are individual *cores* maxed out, however? IIUC you're measuring overall
CPU utilization, right? Recovery (streaming replication apply) is largely
single-threaded.

> Here are some settings that may help and a perf profile of a recovery
> process that runs without any competing read traffic, processing the
> INSERT backlog (unfortunately I don't have the same profile from a
> lagging read replica).

Unfortunately that's not going to help us much in identifying the
contention...

> +   30.25%    26.78%  postgres  postgres            [.] mdnblocks

I've likely fixed this ~two years back:
http://archives.postgresql.org/message-id/72a98a639574d2e25ed94652848555900c81a799

> +   18.64%    18.64%  postgres  postgres            [.] 0x00000000000fde6a

Hm, too bad that this is without a symbol - 18% self time is quite a bit.
What perf options are you using?

> +    4.74%     4.74%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string

Possible that a slightly bigger shared_buffers would help you.

It'd probably be more helpful to look at a perf report --no-children for
this kind of analysis.

Greetings,

Andres Freund
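
PS: a few concrete commands that might help narrow this down. To answer
the hot_standby_feedback question and to sanity-check the lag measurement
directly on a standby, something like the following should work on 9.3
(the psql connection options are whatever you normally use; this is just
a sketch):

    # current setting on the standby
    psql -c "SHOW hot_standby_feedback;"

    # how far behind replay is, as seen from the standby itself
    psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"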
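
To check whether apply itself is the bottleneck, look at per-process
rather than host-wide CPU on a lagging replica; recovery is applied by
the single "startup" process. A sketch, assuming pidstat (from sysstat)
is available and the process can be found by name:

    # find the startup/recovery process on the standby
    STARTUP_PID=$(pgrep -f 'postgres.*startup' | head -n1)

    # sample its CPU every second; ~100% of one core means the apply
    # loop is saturated even though the host as a whole looks idle
    pidstat -u -p "$STARTUP_PID" 1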
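
For the unresolved 0x00000000000fde6a frame and the --no-children view,
recording the startup process with call graphs, with postgres debug
symbols installed, usually gives readable output. The duration and
call-graph mode below are just illustrative defaults:

    # record the recovery process (PID found above) for ~30 seconds
    perf record -p "$STARTUP_PID" --call-graph dwarf -- sleep 30

    # self-cost view, as suggested above
    perf report --no-children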