Jozsef Kadlecsik wrote on Thu, Jan 31, 2019:
> You should enable nf_conntrack_log_invalid and disable tcp_be_liberal.
> Then the nf_ct_l4proto_log_invalid() is called and we can see in the log
> why the system thinks the packet is out of window.

That's what I did, or at least what I think I did; doing it again on
another client with the problem worked somehow...

This is the message:
[641959.519509] nf_ct_tcp: SEQ is under the lower bound (already ACKed data retransmitted) IN= OUT= SRC=x.y.z.1 DST=x.y.z.16 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24007 DPT=49150 SEQ=2130642244 ACK=2743748999 WINDOW=26844 RES=0x00 ACK SYN URGP=0 OPT (020423000402080A25D06A57263DB38501030307)

It matches what I was saying earlier (synack not in an expected
sequence), but I still cannot reproduce - I'm starting to think that
something was already wrong before I started trying to reproduce the
other day and it just happened to work out...

One key difference between working and not working is that when it's
trying to reconnect, the nonworking nodes do not have [UNREPLIED] in
/proc/net/nf_conntrack, so I'm really curious how it got into this
state:
ipv4 2 tcp 6 114 SYN_SENT src=10.255.6.12 dst=x.y.z.1 sport=49151 dport=24007 src=x.y.z.1 dst=10.255.6.12 sport=24007 dport=49151 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

From what I understand, that means conntrack thinks it has seen a reply
since the syn was sent, so maybe something silly with the server sending
a late packet when the client was already done disconnecting and had
sent its first reconnecting syn?
But the server was cut by a power failure last week, so I don't see how
that could possibly have happened, unless the clients had been in this
state since before that reboot and nobody told me :p

Anyway, I'm afraid that this is going to have to wait another while
until the next time we reboot the servers, so putting this on hold
unless someone has a bright idea :/

Thanks,
--
Dominique
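
For anyone wanting to spot the same condition on other clients, the check described in this message (a TCP entry in SYN_SENT that lacks the [UNREPLIED] flag) could be scripted. This is only a rough sketch: it assumes the /proc/net/nf_conntrack line layout quoted in this message, and the names parse_conntrack_line and is_replied_syn_sent are hypothetical helpers, not part of any netfilter tool.

```python
import re

# Sample entry copied from this thread (addresses are masked in the original).
SAMPLE = (
    "ipv4 2 tcp 6 114 SYN_SENT src=10.255.6.12 dst=x.y.z.1 sport=49151 "
    "dport=24007 src=x.y.z.1 dst=10.255.6.12 sport=24007 dport=49151 "
    "mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2"
)


def parse_conntrack_line(line):
    """Split one conntrack line into protocol, state, flags and key=value fields.

    Assumption: whitespace-separated tokens, flags written as [NAME],
    and the TCP state being the only all-uppercase token.
    """
    tokens = line.split()
    flags = [t for t in tokens if t.startswith("[") and t.endswith("]")]
    state = next((t for t in tokens if re.fullmatch(r"[A-Z][A-Z_0-9]*", t)), None)
    fields = {}
    for tok in tokens:
        if "=" in tok:
            key, value = tok.split("=", 1)
            # src/dst/sport/dport appear twice: original direction, then reply.
            fields.setdefault(key, []).append(value)
    return {
        "proto": tokens[2] if len(tokens) > 2 else None,
        "state": state,
        "flags": flags,
        "fields": fields,
    }


def is_replied_syn_sent(entry):
    """True for a TCP entry stuck in SYN_SENT that conntrack considers replied."""
    return (
        entry["proto"] == "tcp"
        and entry["state"] == "SYN_SENT"
        and "[UNREPLIED]" not in entry["flags"]
    )


def scan(path="/proc/net/nf_conntrack"):
    """Yield raw lines of suspicious entries; reading the file usually needs root."""
    with open(path) as handle:
        for line in handle:
            if is_replied_syn_sent(parse_conntrack_line(line)):
                yield line.rstrip("\n")
```

Calling `is_replied_syn_sent(parse_conntrack_line(SAMPLE))` flags the entry quoted above, while the same line with `[UNREPLIED]` between the two tuples is not flagged. Running `scan()` on an affected client would list candidate entries without having to wait for a reconnect attempt.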