Hi,

On 2018-08-14 15:18:55 +0200, Alexis Lê-Quôc wrote:
> We run a cluster of large, SSD-backed, i3.16xl (64 cores visible to
> Linux, ~500GB of RAM, with 8GB of shared_buffers, fast NVMe drives)
> nodes, each running PG 9.3 on Linux in a vanilla streaming asynchronous
> replication setup: 1 primary node, 1 replica designated for failover
> (left alone) and 6 read replicas taking queries.

9.3 is extremely old; we've made numerous performance improvements since
then in areas potentially related to your problem.

> Under normal circumstances this is working exactly as planned, but when
> I dial up the number of INSERTs on the primary to ~10k rows per second,
> or roughly 50MB of data per second (not enough to saturate the network
> between nodes), read replicas fall hopelessly and consistently behind
> until read traffic is diverted away.

Do you use hot_standby_feedback=on?

> 1. We see read replicas fall behind and we can measure their
> replication throughput to be consistently 1-2% of what the primary is
> sustaining, by measuring the replication delay (in seconds) every
> second. We quickly get that metric to 0.98-0.99 (1 means that
> replication is completely stuck, as it falls behind by one second every
> second). CPU, memory, I/O (per-core iowait) or network (throughput) as
> a whole resource are not visibly maxed out.

Are individual *cores* maxed out, however? IIUC you're measuring overall
CPU utilization, right? Recovery (streaming replication apply) is largely
single-threaded.

> Here are some settings that may help and a perf profile of a recovery
> process that runs without any competing read traffic, processing the
> INSERT backlog (unfortunately I don't have the same profile from a
> lagging read replica).

Unfortunately that's not going to help us much in identifying the
contention...

> +   30.25%    26.78%  postgres  postgres            [.] mdnblocks

I've likely fixed this ~two years back:
http://archives.postgresql.org/message-id/72a98a639574d2e25ed94652848555900c81a799

> +   18.64%    18.64%  postgres  postgres            [.] 0x00000000000fde6a

Hm, too bad that this is without a symbol - 18% self time is quite a bit.
What perf options are you using?

> +    4.74%     4.74%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string

Possible that a slightly bigger shared_buffers would help you.

It'd probably be more helpful to look at a perf report --no-children for
this kind of analysis.

Greetings,

Andres Freund
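
PS: a few concrete commands that might help narrow this down. To answer
the hot_standby_feedback question and to sanity-check the lag measurement
directly on a standby, something like the following should work on 9.3
(the psql connection options are whatever you normally use; this is just
a sketch):

    # current setting on the standby
    psql -c "SHOW hot_standby_feedback;"

    # how far behind replay is, as seen from the standby itself
    psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"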
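
To check whether apply itself is the bottleneck, look at per-process
rather than host-wide CPU on a lagging replica; recovery is applied by
the single "startup" process. A sketch, assuming pidstat (from sysstat)
is available and the process can be found by name:

    # find the startup/recovery process on the standby
    STARTUP_PID=$(pgrep -f 'postgres.*startup' | head -n1)

    # sample its CPU every second; ~100% of one core means the apply
    # loop is saturated even though the host as a whole looks idle
    pidstat -u -p "$STARTUP_PID" 1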
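
For the unresolved 0x00000000000fde6a frame and the --no-children view,
recording the startup process with call graphs, with postgres debug
symbols installed, usually gives readable output. The duration and
call-graph mode below are just illustrative defaults:

    # record the recovery process (PID found above) for ~30 seconds
    perf record -p "$STARTUP_PID" --call-graph dwarf -- sleep 30

    # self-cost view, as suggested above
    perf report --no-children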