SV: SV: "straggler" packets being logged

André Paulsberg-Csibi (IBM Consultant) <Andre.Paulsberg-Csibi@xxxxxxxx> · Tue, 9 Oct 2018 14:37:35 +0000

This seems like multiple issues here , as this shown in your last mail is not "normal" behavior .

First off like you mentioned it is weird the SERVICE side sends the FIN first , it almost always the CLIENT side that initiates this for normal TCP session like used in HTTPS and HTTP .
Under you see a typical HTTP request for a "small" web request where closure is within the norm .

test:~ # tcpdump -s0 -vvnni vlan0 host 194.63.249.211
tcpdump: listening on vlan0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:32:21.596467 IP (tos 0x0, ttl 64, id 35588, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.10.35232 > 194.63.249.211.80: Flags [S], cksum 0xc7a2 (incorrect -> 0x199b), seq 1514875111, win 29200, options [mss 1460,sackOK,TS val 1138402275 ecr 0,nop,wscale 7], length 0
11:32:21.598706 IP (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    194.63.249.211.80 > 192.168.1.10.35232: Flags [S.], cksum 0x5aec (correct), seq 2238753163, ack 1514875112, win 28960, options [mss 1460,sackOK,TS val 200115364 ecr 1138402275,nop,wscale 7], length 0
11:32:21.598759 IP (tos 0x0, ttl 64, id 35589, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.10.35232 > 194.63.249.211.80: Flags [.], cksum 0xc79a (incorrect -> 0xf9f1), seq 1, ack 1, win 229, options [nop,nop,TS val 1138402277 ecr 200115364], length 0
11:32:21.598824 IP (tos 0x0, ttl 64, id 35590, offset 0, flags [DF], proto TCP (6), length 161)
    192.168.1.10.35232 > 194.63.249.211.80: Flags [P.], cksum 0xc807 (incorrect -> 0xc9b5), seq 1:110, ack 1, win 229, options [nop,nop,TS val 1138402277 ecr 200115364], length 109
11:32:21.601405 IP (tos 0x0, ttl 59, id 11561, offset 0, flags [DF], proto TCP (6), length 52)
    194.63.249.211.80 > 192.168.1.10.35232: Flags [.], cksum 0xf985 (correct), seq 1, ack 110, win 227, options [nop,nop,TS val 200115365 ecr 1138402277], length 0
11:32:21.604269 IP (tos 0x0, ttl 59, id 11562, offset 0, flags [DF], proto TCP (6), length 585)
    194.63.249.211.80 > 192.168.1.10.35232: Flags [P.], cksum 0x429d (correct), seq 1:534, ack 110, win 227, options [nop,nop,TS val 200115366 ecr 1138402277], length 533
11:32:21.604297 IP (tos 0x0, ttl 64, id 35591, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.10.35232 > 194.63.249.211.80: Flags [.], cksum 0xc79a (incorrect -> 0xf760), seq 110, ack 534, win 237, options [nop,nop,TS val 1138402282 ecr 200115366], length 0
11:32:21.605416 IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.10.35232 > 194.63.249.211.80: Flags [F.], cksum 0xc79a (incorrect -> 0xf75e), seq 110, ack 534, win 237, options [nop,nop,TS val 1138402283 ecr 200115366], length 0
11:32:21.607340 IP (tos 0x0, ttl 59, id 11563, offset 0, flags [DF], proto TCP (6), length 52)
    194.63.249.211.80 > 192.168.1.10.35232: Flags [F.], cksum 0xf767 (correct), seq 534, ack 111, win 227, options [nop,nop,TS val 200115366 ecr 1138402283], length 0
11:32:21.607372 IP (tos 0x0, ttl 64, id 35593, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.10.35232 > 194.63.249.211.80: Flags [.], cksum 0xc79a (incorrect -> 0xf75b), seq 111, ack 535, win 237, options [nop,nop,TS val 1138402285 ecr 200115366], length 0

What is more also strange is there is no visible ACK for any of the last 2 PSH,ACK packets with data back from the SERVICE ( packets at 19:59:05.676219 & 19:59:05.676553 ) Combined with the first FIN ( 19:59:05.676712 ) which is only 0.159 ms after last data was sent and and would not allow for a "normal" ACK to arrive .
The FIN is also re-transmitted ( 19:59:05.742295 )which as another 65.583 ms after first FIN .
After this you have multiple re-transmits for the "first" of the "failed" PSH,ACK ( packet at 19:59:05.676219 ) which is because none of the 2 packets received any ACK according to the tcpdump 

[cut]
19:49:05.677980 IP6 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172 > 2a04:4e42:1e::323.https: Flags [.], ack 6192, win 1446, options [nop,nop,TS val 139507677 ecr 1418337836], length 0
19:59:05.676219 IP6 2a04:4e42:1e::323.https > 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172: Flags [P.], seq 6192:6250, ack 923, win 63, options [nop,nop,TS val 1418487836 ecr 139507677], length 58
19:59:05.676553 IP6 2a04:4e42:1e::323.https > 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172: Flags [P.], seq 6250:6281, ack 923, win 63, options [nop,nop,TS val 1418487836 ecr 139507677], length 31
19:59:05.676712 IP6 2a04:4e42:1e::323.https > 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172: Flags [F.], seq 6281, ack 923, win 63, options [nop,nop,TS val 1418487836 ecr 139507677], length 0
19:59:05.742295 IP6 2a04:4e42:1e::323.https > 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172: Flags [F.], seq 6281, ack 923, win 63, options [nop,nop,TS val 1418487853 ecr 139507677], length 0
19:59:05.978059 IP6 2a04:4e42:1e::323.https > 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172: Flags [P.], seq 6192:6250, ack 923, win 63, options [nop,nop,TS val 1418487912 ecr 139507677], length 58 [cut]
20:07:10.938037 IP6 2a04:4e42:1e::323.https > 2001:abcd:1234:9876:80ff:7163:ca19:ad06.52172: Flags [P.], seq 6192:6250, ack 923, win 63, options [nop,nop,TS val 1418609152 ecr 139507677], length 58

It is also weird that it retries so many times that the retry last 8 minutes and 5 seconds , however it explains why the packets gets logged since the last longer then the 30 seconds default window before session table is cleared after a "normal" session FIN/ACK closure , as shown under for same session as I tested and showed in the top of this mail  .

test:~ # conntrack -E -o timestamp,id,extended | grep 194.63.249.211
[1539077541.596483]	    [NEW] ipv4     2 tcp      6 30 SYN_SENT src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 [UNREPLIED] src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 id=118172424
[1539077541.598778]	 [UPDATE] ipv4     2 tcp      6 30 SYN_RECV src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 id=118172424
[1539077541.598819]	 [UPDATE] ipv4     2 tcp      6 3600 ESTABLISHED src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 [ASSURED] id=118172424
[1539077541.605458]	 [UPDATE] ipv4     2 tcp      6 30 FIN_WAIT src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 [ASSURED] id=118172424
[1539077541.607410]	 [UPDATE] ipv4     2 tcp      6 29 LAST_ACK src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 [ASSURED] id=118172424
[1539077541.607449]	 [UPDATE] ipv4     2 tcp      6 29 TIME_WAIT src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 [ASSURED] id=118172424
[1539077571.605452]	[DESTROY] ipv4     2 tcp      6 src=192.168.1.10 dst=194.63.249.211 sport=35232 dport=80 src=194.63.249.211 dst=192.168.1.10 sport=80 dport=35232 [ASSURED] id=118172424

"more /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_*" Will show the various timers used for various stages if a TCP session However I would not increase the timeout(s) to 540 seconds ( 9 minutes ) to avoid this issue so fast/early .
That would rather be a "last resort" if I could not find out why this sessions to not closed "normally"
( Or why the SERVICE retries a session it self sent FIN for more then 8 minutes ago )

I noticed also the CLIENT have "low" MTU as it reports mss=1220 bytes , while the SERVICE have a more "normal" MTU with mss=1440 bytes The SYN-SN/ACK-ACK have approx 12ms and 48 ms latency , but other times the ACKs seems to be closer to 1ms and 10ms .
May not be relevant , but noticed and would suggest that if possible one should try to recreate the issue on 2 hosts you control ...
.. so you know the real latency and can check if the missing packets are lost on the endhosts or on the path in your network infrastructure .
Could be that there is some strange packetloss that triggers the issue that you might solve that and remove both that packetloss itself improving performance and most logs you see .

Best regards
André Paulsberg-Csibi
Senior Network Engineer 
IBM Services AS

Sensitivity: Internal