Yeah, well done!

Thanks for letting us know, and glad my tips helped!

regards,

fabio pardi


On 07/17/2018 10:10 AM, Ganesh Korde wrote:
> Hi,
>
> The issue has finally been resolved, and it was with the network. There
> were two issues, as described below.
>
> 1. We have two links between primary and secondary: a VPN tunnel and
> L2TP. Multiple routes were configured for VPN and L2TP from the
> secondary to the primary. Tunnel and link were always up.
> But, as the source traffic from the secondary comes through the tunnel
> towards the primary, the connection was getting dropped after reaching
> the destination due to ambiguity on routes between L2TP and the VPN
> tunnel. This issue has been resolved by allowing access via the VPN
> tunnel and removing the L2TP route.
>
> 2. After fixing the above, disconnections were still happening, but
> after a specific time interval. It was due to a certain negotiation
> time set on the IPsec tunnel.
> So they have enabled auto negotiation on the IPsec tunnel so it won’t
> wait for tunnel initiation from the other end.
>
> So the issues related to disconnection have been resolved.
>
> Thanks Johannes and Fabio for your help.
>
> Regards,
> Ganesh.
>
> On Tue, Jul 3, 2018 at 5:37 PM Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
>
>     Hi Ganesh,
>
>     the logs you posted refer to timeouts in the connections.
>
>     Your configuration tells that in case of a network drop, the standby
>     server will be the first to notice it. That's because
>     wal_receiver_timeout is lower than wal_sender_timeout.
>
>     From the documentation:
>
>         wal_receiver_timeout (integer)
>
>         Terminate replication connections that are inactive longer than
>         the specified number of milliseconds. This is useful for the
>         receiving standby server to detect a primary node crash or
>         network outage. A value of zero disables the timeout mechanism.
>         This parameter can only be set in the postgresql.conf file or
>         on the server command line. The default value is 60 seconds.
>
>
>     That might explain why your secondary server sends an RST.
>
>     The RST packets you posted are more a consequence than a cause of
>     your problem.
>
>     I think that the RST is sent to tell the master that the
>     connection should be closed due to the timeout.
>
>     The above goes together with the fact that the RST packet with
>     sequence number 1232664740 is retransmitted several times, meaning
>     that it did not reach its destination at first.
>
>     I would look more carefully at your network, because I suspect the
>     real problem might be there.
>
>
>     regards,
>
>     fabio pardi
>
>
>
>
>     On 03/07/18 09:36, Ganesh Korde wrote:
>>     Hi,
>>
>>     After analysis by the network team, they found packets are getting
>>     reset by the secondary server. Below are the logs.
>>
>>     782.822280 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>     782.822310 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>     782.822313 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>     782.822315 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>     782.822317 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>     782.822319 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>     782.822345 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
>>
>>
>>     But they were not able to find why the secondary is generating
>>     reset packets. There are no devices between these servers which
>>     can modify the packets.
>>     Though the two servers are behind different firewalls, packets are
>>     getting reset at the secondary server and not at the firewall
>>     level; we can see this in the log.
>>
>>     Below are points I would like to mention about the application:
>>     1. This connection interruption happens in day time, when
>>     transactions are a little bit high.
>>     In day time, average transactions per second are 5 (inserts and
>>     processing).
>>     2. We are not using a connection pool, so each time a request
>>     comes in, the app server creates a new connection to the db server
>>     and, when processing is done, the app server disconnects.
>>
>>     We are now clueless why the secondary server is resetting the
>>     packets. Any help is highly appreciated.
>>
>>     Thanks & Regards,
>>     Ganesh.
>>
>>
>>
>>
>>     On Thu, Jun 28, 2018 at 3:38 PM Ganesh Korde
>>     <ganeshakorde@xxxxxxxxx> wrote:
>>
>>         Hi Johannes,
>>
>>         Thanks for your reply. We are using a VPN tunnel between these
>>         two hosts. I will check with the network team about the
>>         remaining questions you mentioned and will get back.
>>
>>         Thanks & Regards,
>>         Ganesh.
>>
>>         On Wed, Jun 27, 2018 at 6:46 PM Johannes Truschnigg
>>         <johannes@xxxxxxxxxxxxxxx> wrote:
>>
>>             Hi Ganesh,
>>
>>
>>             On Wed, Jun 27, 2018 at 06:37:25PM +0530, Ganesh Korde wrote:
>>             > [...]
>>             > 1. For what reason does "unexpected EOF on standby
>>             > connection" occur on the primary db server?
>>             > 2. After a replication disconnection, the secondary should
>>             > immediately reconnect to the primary, but it takes some
>>             > time; what could be the reason for this?
>>
>>             From skimming the log, it seems to me that there is an
>>             issue at the socket/network level, which yields the
>>             "connection reset by peer" error message.
>>
>>             What is the network between these two hosts like? Is it a
>>             WAN link; is a VPN or SSH tunnel involved? Do you have
>>             other, long-running TCP sessions between these peers, and
>>             do they experience similar or other problems? Do the
>>             hosts' link-layer stats hint at problems, e.g. packet
>>             loss? Do the hosts' kernels leave a message hinting at L2
>>             connectivity problems in their debug ringbuffers (`dmesg`)
>>             at the time you observe the replication drop out?
>>
>>             --
>>             with best regards:
>>             - Johannes Truschnigg ( johannes@xxxxxxxxxxxxxxx )
>>
>>             www:   https://johannes.truschnigg.info/
>>             phone: +43 650 2 133337
>>             xmpp:  johannes@xxxxxxxxxxxxxxx
>>
>>             Please do not bother me with HTML-email or attachments.
>>             Thank you.
>>
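
For anyone hitting the same symptom: the observation in this thread that the RST with sequence number 1232664740 shows up several times in the firewall trace can be checked mechanically. The following is only a minimal Python sketch, assuming the trace lines have the same layout as the ones quoted above (timestamp, interface, direction, source, "->", destination, "rst", sequence number). The names LINE_RE, SAMPLE and count_rst_repeats are made up for this illustration and are not part of any firewall or PostgreSQL tooling.

```python
import re
from collections import Counter

# Assumed line layout, taken from the trace quoted in this thread:
#   <timestamp> <interface> <in|out> <src>.<port> -> <dst>.<port>: rst <seq>
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<iface>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+):\s+rst\s+(?P<seq>\d+)$"
)

# The seven trace lines from the thread, one per line, placeholders kept as posted.
SAMPLE = """\
782.822280 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
782.822310 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
782.822313 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
782.822315 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
782.822317 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
782.822319 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
782.822345 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740
"""


def count_rst_repeats(lines, direction="in"):
    """Count RST lines per (source, destination, sequence number).

    Only lines seen in the given direction are counted, so a packet that is
    logged once on ingress and once on egress is not counted twice.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("dir") == direction:
            key = (match.group("src"), match.group("dst"), int(match.group("seq")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (src, dst, seq), n in count_rst_repeats(SAMPLE.splitlines()).items():
        if n > 1:
            print(f"RST seq {seq} from {src} to {dst} seen {n} times on ingress")
```

A count above one for the same sequence number on the ingress interface fits the reading given earlier in the thread, that the same RST was sent repeatedly because it did not reach its destination at first. It does not by itself say where on the path the packets were lost.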