synack packet invalid when client reconnecting with same src port because out of window?

Hi,

I'm seeing another case of conntrack blocking packets that I don't quite
understand. Jozsef, Florian and Michal were of great help back in
April when I had another problem, so I'm counting on you again :-)

Unlike last time, though, I don't have a simple reproducer yet, and
instead of my personal laptop/servers this is a production environment
at work, so I won't be able to test as freely; sorry in advance for
that.

Environment
===========

The machines run a fairly old RHEL 7 kernel: 3.10.0-693.11.6.el7.x86_64

As said above I do not have a reproducer yet, so I cannot test an
upgrade; the machines will be updated to 3.10.0-862.14.4.el7.x86_64 next
week, but if I find a reproducer I'll try upstream first.

Please do say if this rings a bell as potentially already fixed, and
I'll stop wasting everyone's time.


Overview
========

Basically, when we restart one of our gluster servers the clients
try to reconnect to it, but on roughly one client out of four or five
the synack is rejected by conntrack.

tcpdump loops on:
05:30:41.411346 IP x.y.z.34.49149 > x.y.z.1.24007: Flags [S], seq 837922022, win 26880, options [mss 8960,sackOK,TS val 1689048672 ecr 0,nop,wscale 7], length 0
05:30:41.411481 IP x.y.z.1.24007 > x.y.z.34.49149: Flags [S.], seq 1749683989, ack 837922023, win 26844, options [mss 8960,sackOK,TS val 560823762 ecr 1689017605,nop,wscale 7], length 0
05:30:57.595860 IP x.y.z.1.24007 > x.y.z.34.49149: Flags [S.], seq 1749683989, ack 837922023, win 26844, options [mss 8960,sackOK,TS val 560839947 ecr 1689017605,nop,wscale 7], length 0

glusterfs (the client) is silly and always keeps retrying with the same
source port, so the connection never recovers by itself.
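
As a sanity check on the capture above, here is a minimal Python sketch
(not part of the original report) that parses the three tcpdump lines and
checks whether each synack acknowledges the syn's sequence number plus
one, and how far the echoed timestamp (ecr) lags behind the TS val of the
syn shown. The helper names parse and check_handshake are made up for
illustration.

```python
import re

# The three packets from the tcpdump output above, copied verbatim.
CAPTURE = """\
05:30:41.411346 IP x.y.z.34.49149 > x.y.z.1.24007: Flags [S], seq 837922022, win 26880, options [mss 8960,sackOK,TS val 1689048672 ecr 0,nop,wscale 7], length 0
05:30:41.411481 IP x.y.z.1.24007 > x.y.z.34.49149: Flags [S.], seq 1749683989, ack 837922023, win 26844, options [mss 8960,sackOK,TS val 560823762 ecr 1689017605,nop,wscale 7], length 0
05:30:57.595860 IP x.y.z.1.24007 > x.y.z.34.49149: Flags [S.], seq 1749683989, ack 837922023, win 26844, options [mss 8960,sackOK,TS val 560839947 ecr 1689017605,nop,wscale 7], length 0
"""

# Pull out flags, seq, optional ack and the TCP timestamp option.
LINE_RE = re.compile(
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)"
    r"(?:, ack (?P<ack>\d+))?"
    r".*?TS val (?P<tsval>\d+) ecr (?P<tsecr>\d+)"
)


def parse(capture):
    """Return one dict per packet line; numeric fields are ints."""
    packets = []
    for line in capture.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        fields = {k: v for k, v in m.groupdict().items() if v is not None}
        packets.append(
            {k: (v if k == "flags" else int(v)) for k, v in fields.items()}
        )
    return packets


def check_handshake(packets):
    """Compare every synack against the first syn in the capture."""
    syn = next(p for p in packets if p["flags"] == "S")
    results = []
    for p in packets:
        if p["flags"] == "S.":
            results.append({
                # A valid synack acknowledges syn.seq + 1 (mod 2**32).
                "ack_matches_syn": p["ack"] == (syn["seq"] + 1) % 2**32,
                # How much older the echoed timestamp is than this syn's.
                "tsecr_lag": syn["tsval"] - p["tsecr"],
            })
    return results


for result in check_handshake(parse(CAPTURE)):
    print(result)
```

On this capture both synacks acknowledge seq + 1, so the handshake looks
consistent at the TCP level. The echoed timestamp is 31067 ticks older
than the TS val of the syn shown, which may simply mean the server is
answering an earlier transmission of the same syn.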


The connection seems to correctly be identified as syn sent:
# ss -temoi | grep 24007
SYN-SENT   0      1      x.y.z.34:49149                x.y.z.1:24007                 timer:(on,31sec,6) ino:3170069 sk:ffff88082779be00 <->
# conntrack -L | grep 24007
tcp      6 78 SYN_SENT src=x.y.z.34 dst=x.y.z.1 sport=49149 dport=24007 src=x.y.z.1 dst=x.y.z.34 sport=24007 dport=49149 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1


conntrack -E never lists the connection, but the entry's timeout is
refreshed every time a new syn comes in.


For some reason nf_conntrack_log_invalid does not output anything to
dmesg, but adding log rules before and after firewalld's default
INPUT --ctstate INVALID -j DROP rule shows that the synack packets are
dropped there (they should have been accepted by the RELATED rule
earlier, but weren't).


I unfortunately do not have any trace of when the server restarted,
which would likely help with this. I'm trying to see if I can reproduce
by forcefully disconnecting the server so the client would try to
reconnect; if I can do that I'll be able to test anything easily.



Workarounds/hints
=================

- deleting the conntrack entry with conntrack -D --src etc etc makes the
next syn/synack work.
- stopping the client for two minutes (so the conntrack entry times out)
also obviously works for the same reason; otherwise the client keeps
refreshing the entry, so it never gets a chance to expire.

- the net.netfilter.nf_conntrack_tcp_be_liberal sysctl also works, so
that would hint at a window issue? Does conntrack still expect the
previous connection's sequence numbers to be used?
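
To make the window hypothesis concrete, here is a minimal Python sketch of
a sequence-space window check with 32-bit wraparound. It is a simplified
model only, not the kernel's actual conntrack logic, which tracks more
state per direction. The old connection's left edge (OLD_LEFT_EDGE) is a
hypothetical value chosen for illustration; only the ack and window size
come from the capture above.

```python
MOD = 2**32  # TCP sequence numbers live in a 32-bit circular space


def in_window(value, left, size):
    """True if value lies in [left, left + size) modulo 2**32."""
    return (value - left) % MOD < size


# Values taken from the tcpdump output in this mail.
SYNACK_ACK = 837922023   # ack carried by the synack
CLIENT_WIN = 26880       # window advertised in the client's syn

# HYPOTHETICAL: where a stale entry might still expect the client's
# sequence numbers to be, left over from the previous connection.
OLD_LEFT_EDGE = 4_000_000_000

# Judged against state from the new syn, the ack is acceptable ...
fresh = in_window(SYNACK_ACK, SYNACK_ACK, CLIENT_WIN)

# ... but judged against stale state it falls outside the window,
# which is the kind of mismatch that would get a packet marked invalid.
stale = in_window(SYNACK_ACK, OLD_LEFT_EDGE, CLIENT_WIN)

print("fresh state:", fresh)
print("stale state:", stale)
```

If something like this is what happens, it would also fit the two
workarounds: deleting the entry discards the stale state, and be_liberal
relaxes the window check.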



Any help to move forward would be great; I'll try to reproduce this
somehow without disrupting production first, but any help is appreciated!


Thanks,
-- 
Dominique Martinet


