On 13/11/18 5:35 PM, Rui DeSousa wrote:
Is there a way for the WAL receiver
to not have detected the termination of the replication
stream?
The teardown of the network socket on the upstream
server should send a reset packet to the downstream server, and
at that point the WAL receiver would close its connection. Are
there any firewalls, routers, rules, etc. between the nodes that
could have dropped the packet?
No
Shouldn't the WAL receiver normally detect this
and try again in wal_retrieve_retry_interval?
Not really. If the connection has already been torn
down, the upstream server would send another reset packet on the
next request, and in that case it would. However, if request
packets are not reaching the upstream server, for example because a
firewall is silently dropping them (personally I believe a
firewall should always send reset packets to friendly hosts), then
the TCP/IP send queue builds up with the request packets
instead. At that point you are waiting on the OS to
terminate the connection, which can take a day or two depending on
your TCP/IP settings.
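To give a feel for how OS-level TCP settings translate into detection
time, here is a minimal sketch of a related mechanism, TCP keepalive,
which applies to an idle connection rather than to a backed-up send
queue. The function name is hypothetical; the default values are the
usual Linux ones (tcp_keepalive_time=7200, tcp_keepalive_intvl=75,
tcp_keepalive_probes=9) and are assumed here, not taken from the
systems discussed in this thread.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time for TCP keepalive to declare a silent peer dead.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    count:    number of unanswered probes before the connection is dropped
    """
    return idle + interval * count


# Assumed Linux defaults: a bit over two hours before the OS gives up.
linux_default = keepalive_detection_seconds(idle=7200, interval=75, count=9)

# Hypothetical tighter settings, e.g. via tcp_keepalives_idle,
# tcp_keepalives_interval and tcp_keepalives_count in PostgreSQL.
tuned = keepalive_detection_seconds(idle=60, interval=10, count=6)

print(linux_default)  # 7875
print(tuned)          # 120
```

With the assumed defaults an idle, silently dead connection would
survive for roughly 2 hours 11 minutes, which is why a timeout at the
PostgreSQL level is usually the more practical safeguard.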
Again, no dropping, no firewall.
What you want to use instead is wal_receiver_timeout,
which detects the case where the upstream server either no longer
exists or a firewall, etc. is silently dropping packets.
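The idea behind such a timeout can be sketched as a simple
inactivity check. This is only a simplified model with hypothetical
names, not PostgreSQL's actual implementation: the receiver remembers
when it last heard from the upstream server and gives up once that
was longer ago than the configured timeout.

```python
from datetime import datetime, timedelta


def receiver_timed_out(last_message_at: datetime,
                       now: datetime,
                       timeout: timedelta) -> bool:
    """Return True when nothing has arrived from upstream within `timeout`."""
    return now - last_message_at > timeout


# Example with a 5-minute timeout, as in wal_receiver_timeout = '5 min'.
timeout = timedelta(minutes=5)
last_seen = datetime(2018, 11, 13, 12, 0, 0)

# Four minutes of silence: still within the timeout.
print(receiver_timed_out(last_seen, last_seen + timedelta(minutes=4), timeout))

# Six minutes of silence: the receiver should drop the connection and retry.
print(receiver_timed_out(last_seen, last_seen + timedelta(minutes=6), timeout))
```

Unlike a TCP reset, this check does not depend on any packet arriving
from the other side, so it also covers the silent-drop case.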
Once again, from my original message:
"while setting up logical replication since August we had seen early
on the need to increase wal_receiver_timeout and wal_sender_timeout
from 60sec to 5mins"
So with wal_receiver_timeout='5 min', the receiver never detected
any timeout.
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt