Connection timeouts due to INVALID state rule

Will Storey <will@xxxxxxxxxxxxx> · Sun, 7 Jul 2019 13:32:44 -0700

Hello,

I've been experiencing sporadic timeouts when connecting to daemons on
127.0.0.1. I narrowed the cause down to an iptables INPUT rule that blocks
INVALID state packets:

 603K   24M DROP  all  --  *  *  0.0.0.0/0  0.0.0.0/0   state INVALID

I can work around this by allowing everything on lo before this rule, but
I'm wondering if this is expected or not.
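The workaround works because iptables walks a chain in order and stops at the first rule with a terminating target such as ACCEPT or DROP. On the command line this is something like `iptables -I INPUT 1 -i lo -j ACCEPT`, placed ahead of the INVALID rule. Below is a toy Python model of that first-match behaviour. The `Rule`/`Packet` classes and `evaluate` function are hypothetical. The model only matches on interface and conntrack state, and it assumes an ACCEPT chain policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical, heavily simplified stand-in for one iptables rule."""
    target: str                     # terminating target: "ACCEPT" or "DROP"
    in_iface: Optional[str] = None  # like "-i lo"; None matches any interface
    ct_state: Optional[str] = None  # like "--state INVALID"; None matches any


@dataclass
class Packet:
    in_iface: str  # interface the packet arrived on
    ct_state: str  # conntrack's classification of the packet


def evaluate(chain: List[Rule], packet: Packet, policy: str = "ACCEPT") -> str:
    """Return the target of the first matching rule, else the chain policy."""
    for rule in chain:
        if rule.in_iface is not None and rule.in_iface != packet.in_iface:
            continue
        if rule.ct_state is not None and rule.ct_state != packet.ct_state:
            continue
        return rule.target  # first match wins; later rules are never consulted
    return policy


# The INVALID rule alone versus the same rule preceded by an "-i lo" ACCEPT.
original = [Rule("DROP", ct_state="INVALID")]
workaround = [Rule("ACCEPT", in_iface="lo"), Rule("DROP", ct_state="INVALID")]

pkt = Packet(in_iface="lo", ct_state="INVALID")
print(evaluate(original, pkt))    # DROP
print(evaluate(workaround, pkt))  # ACCEPT
```

The workaround also lets through packets that conntrack considers invalid on lo, so it hides the question rather than answering it.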

Here's more about the situation:

All involved systems are running Ubuntu Bionic with kernel
4.15.0-52-generic.

On systems with the problem, there are half-open TCP connections:

tcp        0      0 127.0.0.1:2348          127.0.0.1:47268         ESTABLISHED
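On 127.0.0.1 both ends of a connection belong to the same host. A half-open connection therefore shows up as an ESTABLISHED socket whose reverse tuple is not also ESTABLISHED. Here is a rough Python sketch for spotting such sockets in `netstat -tn` output. The function names are made up for illustration. Treating "partner not ESTABLISHED" as half-open is a heuristic assumption, because a connection that is in the middle of closing can briefly look the same.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_netstat(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (local, foreign, state) for each IPv4 line of `netstat -tn`."""
    for line in lines:
        fields = line.split()
        # Expected layout: proto recv-q send-q local foreign state
        if len(fields) >= 6 and fields[0] == "tcp":
            yield fields[3], fields[4], fields[5]


def half_open_loopback(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return ESTABLISHED 127.0.0.1 sockets with no ESTABLISHED partner.

    Heuristic: on loopback both endpoints are local, so a healthy
    connection (a, b) should also appear as (b, a) in ESTABLISHED state.
    A partner in SYN_SENT (as during the failing retry) does not count.
    """
    established = [(loc, fgn) for loc, fgn, st in parse_netstat(lines)
                   if st == "ESTABLISHED"]
    present = set(established)
    return [
        (loc, fgn)
        for loc, fgn in established
        if loc.startswith("127.0.0.1:")
        and fgn.startswith("127.0.0.1:")
        and (fgn, loc) not in present
    ]


sample = ["tcp        0      0 127.0.0.1:2348          127.0.0.1:47268         ESTABLISHED"]
print(half_open_loopback(sample))  # [('127.0.0.1:2348', '127.0.0.1:47268')]
```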

When a client connects with source port 47268, it gets stuck in SYN_SENT
and eventually times out:

22:09:17.601482 IP (tos 0x0, ttl 64, id 53505, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.47268 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0x02e6), seq 3436316390, win 43690, options [mss 65495,sackOK,TS val 712761924 ecr 0,nop,wscale 7], length 0
22:09:17.601487 IP (tos 0x0, ttl 64, id 42105, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.2348 > 127.0.0.1.47268: Flags [.], cksum 0xfe28 (incorrect -> 0x08f5), seq 1489307482, ack 3500129728, win 2309, options [nop,nop,TS val 712761924 ecr 696680490], length 0
22:09:18.629342 IP (tos 0x0, ttl 64, id 53506, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.47268 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0xfee1), seq 3436316390, win 43690, options [mss 65495,sackOK,TS val 712762952 ecr 0,nop,wscale 7], length 0
22:09:18.629469 IP (tos 0x0, ttl 64, id 42106, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.2348 > 127.0.0.1.47268: Flags [.], cksum 0xfe28 (incorrect -> 0x04f1), seq 0, ack 1, win 2309, options [nop,nop,TS val 712762952 ecr 696680490], length 0

It repeats like this (SYN then ACK) until timeout.
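The failing trace consists entirely of SYN / bare-ACK pairs. A trace where recovery happens would contain an RST or a SYN-ACK somewhere. Below is a hypothetical Python helper that pulls the `Flags [...]` field out of tcpdump text and checks for that loop pattern. The sample lines are abbreviated from the capture above.

```python
import re
from typing import Iterable, List

# tcpdump prints TCP flags as e.g. "Flags [S]", "Flags [.]", "Flags [S.]".
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")


def flag_sequence(lines: Iterable[str]) -> List[str]:
    """Return the TCP flag field of every packet line, in capture order."""
    flags = []
    for line in lines:
        match = FLAGS_RE.search(line)
        if match:
            flags.append(match.group(1))
    return flags


def is_syn_ack_loop(flags: List[str]) -> bool:
    """True if the trace is only (SYN, bare ACK) pairs: no RST, no SYN-ACK."""
    if not flags or len(flags) % 2 != 0:
        return False
    return all(a == "S" and b == "." for a, b in zip(flags[0::2], flags[1::2]))


# Abbreviated lines from the synack_loop_timeout capture.
failing = [
    "127.0.0.1.47268 > 127.0.0.1.2348: Flags [S], seq 3436316390",
    "127.0.0.1.2348 > 127.0.0.1.47268: Flags [.], seq 1489307482, ack 3500129728",
    "127.0.0.1.47268 > 127.0.0.1.2348: Flags [S], seq 3436316390",
    "127.0.0.1.2348 > 127.0.0.1.47268: Flags [.], seq 0, ack 1",
]
print(flag_sequence(failing))                   # ['S', '.', 'S', '.']
print(is_syn_ack_loop(flag_sequence(failing)))  # True
```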

My understanding is that I should see a RST from the client, after which
the handshake starts again from scratch. Indeed, if I create a half-open TCP
connection to try to replicate the issue, that's what I see:

14:19:47.429668 IP (tos 0x0, ttl 64, id 35002, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.59118 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0xf9f1), seq 1911409434, win 43690, options [mss 65495,sackOK,TS val 2900480312 ecr 0,nop,wscale 7], length 0
14:19:47.429698 IP (tos 0x0, ttl 64, id 44792, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.2348 > 127.0.0.1.59118: Flags [.], cksum 0xfe28 (incorrect -> 0x81ca), seq 1940761408, ack 1119853882, win 342, options [nop,nop,TS val 2900480312 ecr 2900155296], length 0
14:19:47.429724 IP (tos 0x0, ttl 64, id 50333, offset 0, flags [DF], proto TCP (6), length 40)
    127.0.0.1.59118 > 127.0.0.1.2348: Flags [R], cksum 0xe1c9 (correct), seq 1119853882, win 0, length 0
14:19:48.452510 IP (tos 0x0, ttl 64, id 35003, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.59118 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0xf5f2), seq 1911409434, win 43690, options [mss 65495,sackOK,TS val 2900481335 ecr 0,nop,wscale 7], length 0
14:19:48.452533 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.2348 > 127.0.0.1.59118: Flags [S.], cksum 0xfe30 (incorrect -> 0x1929), seq 2748298959, ack 1911409435, win 43690, options [mss 65495,sackOK,TS val 2900481335 ecr 2900481335,nop,wscale 7], length 0
14:19:48.452547 IP (tos 0x0, ttl 64, id 35004, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.59118 > 127.0.0.1.2348: Flags [.], cksum 0xfe28 (incorrect -> 0xeb6d), seq 1911409435, ack 2748298960, win 342, options [nop,nop,TS val 2900481335 ecr 2900481335], length 0
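The expected RST follows from RFC 793's processing rules for a socket in SYN-SENT that receives a segment with the ACK bit set. If SEG.ACK =< ISS or SEG.ACK > SND.NXT, the socket replies with <SEQ=SEG.ACK><CTL=RST>. The working capture matches this: the server's ACK carries ack 1119853882, and the client's RST carries seq 1119853882. Below is a minimal sketch of that check using modulo-2^32 sequence comparison. It models the RFC 793 rule only, not the Linux code path. The function names are hypothetical.

```python
from typing import Optional

MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def seq_lt(a: int, b: int) -> bool:
    """Serial-number comparison: True if a comes before b (mod 2**32)."""
    return a != b and (b - a) % MOD < 2 ** 31


def syn_sent_reset_seq(iss: int, seg_ack: int) -> Optional[int]:
    """RFC 793, SYN-SENT, ACK bit set (RST bit clear).

    If SEG.ACK =< ISS or SEG.ACK > SND.NXT the ACK is unacceptable and the
    socket should answer <SEQ=SEG.ACK><CTL=RST>. Returns that RST's sequence
    number, or None if the ACK is acceptable. SND.NXT is ISS + 1 once the
    SYN has been sent.
    """
    snd_nxt = (iss + 1) % MOD
    unacceptable = not seq_lt(iss, seg_ack) or seq_lt(snd_nxt, seg_ack)
    return seg_ack if unacceptable else None


# expected_rst capture: SYN seq 1911409434, server ACK with ack 1119853882.
print(syn_sent_reset_seq(1911409434, 1119853882))  # 1119853882
# synack_loop_timeout capture: SYN seq 3436316390, server ACK with ack 3500129728.
print(syn_sent_reset_seq(3436316390, 3500129728))  # 3500129728
```

The failing capture's numbers also call for an RST, in this case with seq 3500129728. That packet does not appear in the trace.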

From what I can gather, either the ACK from the server or the RST from the
client (which doesn't show up in tcpdump, if it is being sent at all) is
getting blocked by the INVALID state rule. If I allow everything on lo, I see
the RST and the connection succeeds.

I've tried setting nf_conntrack_log_invalid to 255, but I don't see any
logs about what's invalid.
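For reference, `nf_conntrack_log_invalid` is the file /proc/sys/net/netfilter/nf_conntrack_log_invalid. According to the kernel's conntrack sysctl documentation, the value 255 means "log invalid packets of any protocol", which is the same as running `sysctl -w net.netfilter.nf_conntrack_log_invalid=255`. Below is a small Python sketch of setting it through /proc/sys. `sysctl_path` and `set_sysctl` are hypothetical helpers, and writing the real file needs root.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name to its file, e.g. net.x.y -> /proc/sys/net/x/y.

    Names whose components themselves contain dots (e.g. VLAN interfaces)
    are not handled by this simple split.
    """
    return root.joinpath(*name.split("."))


def set_sysctl(name: str, value: int, root: Path = PROC_SYS) -> None:
    """Write a value to a sysctl file (requires root for the real /proc/sys)."""
    sysctl_path(name, root).write_text(f"{value}\n")


print(sysctl_path("net.netfilter.nf_conntrack_log_invalid"))
# As root: set_sysctl("net.netfilter.nf_conntrack_log_invalid", 255)
```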

I'm at a loss to explain why these packets are invalid. I'm also curious
why I'm unable to replicate the issue. There seems to be something special
about certain half-open connections.

I've attached packet captures. One shows a case where the timeout happens
(synack_loop_timeout). The other is a case where I created a half-open
connection and the timeout didn't occur (expected_rst).

What do you think?

Thank you!

Will

Attachments:
expected_rst.pcap (application/vnd.tcpdump.pcap)
synack_loop_timeout.pcap (application/vnd.tcpdump.pcap)