On Mon, May 14, 2018 at 11:11:40AM -0500, Don Seiler wrote:
> Postgres 9.6.6. Primary has a local (HA) replica and a remote (DR) replica.
> [...]
> However I'd like to know if there are any optimal networking settings on
> the host or network that we maybe missing. My manager says that the circuit
> between data centers was only 60% utilized at its peak.

That actually hints at your network link/TCP performance _not_ being the
problem, I think. Do you happen to have historical host-monitoring data
available for when the replication interruption happened? You should
definitely check for CPU saturation (on both sides) and I/O saturation (on
the receiver/secondary).

I remember when we first set up streaming replication, back then under
postgres 9.0: the replication connection defaulted to using TLS/SSL, and at
the time TLS/SSL compression was enabled. The huge extra CPU work that this
incurred regularly made the WAL sender on the primary break streaming
replication, because it couldn't possibly keep up with the data that was
being pushed into its encrypted & compressed TCP connection over a 10G link.
(Linux's excellent perf tool proved invaluable in determining the exact cause
for the high CPU load inside the postgres processes; once we had re-compiled
OpenSSL without compression, the problem went away.)

Now of course modern TLS library versions don't implement compression any
more, and the streaming ciphers are most probably hardware accelerated for
your combination of hardware and software, but the lesson we learned back
then may still be worth keeping in mind...

Other than that... have you verified that the network link between your hosts
can actually live up to your and your manager's expectations in terms of
bandwidth delivered? iperf3 could help verify that; if the measured bandwidth
for a single TCP stream lives up to what you'd expect, you can probably rule
out network-related concerns and concentrate on looking at other potential
bottlenecks.
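In case it helps, here is a rough Python sketch of how that single-stream check could be scripted. It assumes an iperf3 server ("iperf3 -s") is already running on the remote host and relies on iperf3's JSON report ("-J"). The function names, the host name in the usage note and the 80% tolerance are made up for illustration, so treat it as a starting point rather than a finished tool.

```python
import json
import subprocess


def single_stream_mbit(report: dict) -> float:
    """Receiver-side TCP throughput in Mbit/s from a parsed `iperf3 -J` report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def link_meets_expectation(report: dict, expected_mbit: float,
                           tolerance: float = 0.8) -> bool:
    """True if the measured rate reaches `tolerance` (an arbitrary, assumed
    threshold) times the bandwidth you expect from the link."""
    return single_stream_mbit(report) >= expected_mbit * tolerance


def run_iperf3(server: str, seconds: int = 10) -> dict:
    """Run one single-stream TCP test against `server` and return the JSON report.

    Requires the iperf3 binary locally and `iperf3 -s` on the remote side.
    """
    result = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-P", "1", "-J"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Usage would look something like link_meets_expectation(run_iperf3("dr-replica.example"), expected_mbit=1000), where the host name and the expected rate are placeholders for your own values.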
-- 
with best regards:
- Johannes Truschnigg ( johannes@xxxxxxxxxxxxxxx )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@xxxxxxxxxxxxxxx

Please do not bother me with HTML-email or attachments. Thank you.