OK, top-posting on my own message...

Forget it. Debugging ssh TCP session tracking over an ssh session is a very bad idea... My test on the Debian machine is normal.
I think I found the culprit of my headache: heavy ZWP filtering by some firewalls...

Emmanuel.

On 17/02/2020 at 18:33, Emmanuel Fusté wrote:
> Hello,
> I am facing a strange problem with recent kernels.
>
> On "bad" kernels, the nf_conntrack_tcp_timeout_established default value is
> not honored, and conntrack -L returns different results on the same
> machine in different ssh root sessions.
>
> Ubuntu vendor kernel 4.15 (64 bits): correct behaviour
> Ubuntu 5.3.0 vendor kernel (64 bits): BAD (with iptables 1.6.1 -> iptables rules)
> Debian kernel 5.4.19 (32 bits): BAD (with iptables-nft -> nft rules)
>
> Clean boot, no conntrack module loaded:
> # cat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established
> cat: /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established: No such file or directory
> # modprobe nf_conntrack
> # cat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established
> 432000
>
> Add an iptables rule to start connection tracking:
> # iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
>
> Show TCP session tracking:
> # conntrack -L |grep ^tcp
> tcp 6 299 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=54470 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=54470 [ASSURED] mark=0 use=1
>
> The timeout is not 432000s but 300s.
> On a moderately loaded SMTP server, all sessions are at 300s.
>
> Do
> # echo 432000 >/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established
> Sometimes sessions start to pick up 432000 as the new timeout, sometimes not.
> Force things to happen:
> # conntrack -F
> # conntrack -L |grep ^tcp |grep ESTABLISHED |grep ASSURED
> Now on the loaded server, most TCP sessions pick up the 432000 timeout
> value, but from time to time some still pick 300s.
>
> On the Debian test machine, three ssh sessions are opened in three windows
> (I don't have a console on this machine).
> First ssh session:
> # conntrack -L |grep ^tcp
> conntrack v1.4.5 (conntrack-tools): 24 flow entries have been shown.
> tcp 6 431144 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=55243 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=55243 [ASSURED] mark=0 use=1
> tcp 6 431120 ESTABLISHED src=10.222.219.8 dst=10.222.219.164 sport=22 dport=55339 src=10.222.219.164 dst=10.222.219.8 sport=55339 dport=22 [ASSURED] mark=0 use=1
> tcp 6 299 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=54470 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=54470 [ASSURED] mark=0 use=1
>
> Second one:
> ~# conntrack -L |grep ^tcp
> conntrack v1.4.5 (conntrack-tools): 27 flow entries have been shown.
> tcp 6 431099 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=55243 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=55243 [ASSURED] mark=0 use=1
> tcp 6 431999 ESTABLISHED src=10.222.219.8 dst=10.222.219.164 sport=22 dport=55339 src=10.222.219.164 dst=10.222.219.8 sport=55339 dport=22 [ASSURED] mark=0 use=1
> tcp 6 431963 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=54470 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=54470 [ASSURED] mark=0 use=1
>
> Last one:
> # conntrack -L |grep ^tcp
> conntrack v1.4.5 (conntrack-tools): 22 flow entries have been shown.
> tcp 6 431999 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=55243 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=55243 [ASSURED] mark=0 use=1
> tcp 6 431979 ESTABLISHED src=10.222.219.8 dst=10.222.219.164 sport=22 dport=55339 src=10.222.219.164 dst=10.222.219.8 sport=55339 dport=22 [ASSURED] mark=0 use=1
> tcp 6 431942 ESTABLISHED src=10.222.219.164 dst=10.222.219.8 sport=54470 dport=22 src=10.222.219.8 dst=10.222.219.164 sport=22 dport=54470 [ASSURED] mark=0 use=1
>
> Crazy, no ?!?!.....
>
> OK, these are all "vendor" kernels, but the Debian one is pretty
> genuine. It seems that some upstream bugs are lurking around. Debian
> kernel 5.2.9 (32 bits) seems not to be affected, but Ubuntu 5.0 is partially
> affected: 10% of connections (due to some backports?).
>
> On the most affected production machine (Ubuntu with the 5.3 kernel), the
> same conntrack -L invocation sometimes returns 300 and sometimes 432000 for
> the same long-running TCP connection. I don't know if it is a netlink
> problem or a real conntrack timer change triggered by activity on the TCP
> session. But as my ssh sessions never survive more than 10 to 15 minutes, I
> think there is a real problem with the conntrack timers.
>
> Any thoughts?
>
> Emmanuel.
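
For anyone chasing a similar symptom, here is a minimal sketch of a filter that keeps only the ESTABLISHED tcp entries whose remaining timeout is below a chosen threshold, which is easier to read than chained greps on a loaded server. The function name short_established and the 3600s threshold are made up for illustration, and the field positions assume the conntrack -L line layout shown in the quoted message. As noted at the top, it is better run from a console than over ssh, since the session doing the listing is itself a tracked connection.

```shell
#!/bin/sh
# Hypothetical helper: read `conntrack -L` style lines on stdin and print
# the ESTABLISHED tcp entries whose remaining timeout (field 3) is below
# the threshold in seconds given as the first argument.
short_established() {
    threshold="$1"
    awk -v limit="$threshold" '
        $1 == "tcp" && $4 == "ESTABLISHED" && ($3 + 0) < (limit + 0) {
            # Fields 5-8 are src, dst, sport, dport of the original direction.
            print $3, $5, $6, $7, $8
        }'
}

# Example usage (needs root and conntrack-tools; the summary line goes to stderr):
#   conntrack -L 2>/dev/null | short_established 3600
```

With the first listing from the Debian test machine as input, only the 299s entry for sport=54470 would be printed.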