On 20/11/18 10:48 μ.μ., Rui DeSousa
wrote:
Hey, I was reading the docs, it
seems it means :
net.ipv4.tcp_keepalive_time +
net.ipv4.tcp_keepalive_intvl *
net.ipv4.tcp_keepalive_probes = 2hrs 11 Mins 15 Secs,
rather than 18 Hrs
Yeah, that’s correct. I wonder why it didn’t
terminate.
Most probably because there was another created clone, cloud
migration magic, that's my theory, albeit not confirmed by the
provider. Logical worker (walreceiver) was still alive and happy
even after the primary crushed. I have the logs from the other
standby and it immediately detected the problem (PANIC on the
primary) and retried. No firewall dropping packets, in every test I
did, the logical bgworker detects any problems *instantly*, and
retries after 5 secs by default.
|