On Tue, Aug 14, 2018 at 9:18 AM, Alexis Lê-Quôc <alq@xxxxxxxxxxxxx> wrote:
>
each
running PG 9.3 on linux
That is the oldest version which is still supported. There have been a lot of improvements since then, including to performance. You should see if an upgrade solves the problem. If not, at least you will have access to better tools (like pg_stat_activity.wait_event_type), and people will be more enthusiastic about helping you figure it out knowing it is not an already-solved problem.
Here are some settings that may help and a perf profile of a recovery process that runs without any competing read traffic processing the INSERT backlog (I don't unfortunately have the same profile on a lagging read replica).
Unfortunately the perf when the problem is not occuring won't be very helpful. You need it from when the problem is occurring. Also, I find strace and gdb to more helpful than perf in this type of situation where you already know it is not CPU bound, although perhaps that is just my own lack of skill with perf. You need to know why it is not on the CPU, not what it is doing when it is on the CPU.
Where the settings you showed all of the non-default settings?
I assume max_standby_streaming_delay is at the default value of 30s? Are you getting query cancellations due conflicts with recovery, or anything else suspicious in the log? What is the maximum lag you see measured in seconds?
Cheers,
Jeff