Re: Connection timeouts due to INVALID state rule

Reindl Harald <h.reindl@xxxxxxxxxxxxx> · Mon, 8 Jul 2019 13:17:03 +0200



On 08.07.19 at 12:51, Anton Danilov wrote:
> To avoid this issue you can tune the conntrack behaviour with sysctl:
> sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
> sysctl -w net.netfilter.nf_conntrack_tcp_loose=1

that's a bad idea on a firewall device acting as a forwarding router,
just to avoid breaking loopback - if that behavior is "normal" for loopback
connections, iptables should not treat those packets as invalid

so the better workaround for now is to just exclude the "lo" device from
the INVALID rules
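
a minimal sketch of that workaround, assuming the DROP rule sits in the
INPUT chain of the filter table as in the original report. the function
name and its arguments are hypothetical, and the "conntrack" match is used
here in place of the older "state" match shown in the rule listing, so
adjust both to the real ruleset:

```shell
#!/bin/sh
# Hypothetical helper, not part of iptables or any distribution.
#   $1  variant: "accept-lo" or "skip-lo"
#   $2  command used to call iptables (default: iptables);
#       pass "echo iptables" to print the command instead of running it
lo_invalid_workaround() {
    variant="$1"
    ipt="${2:-iptables}"

    case "$variant" in
        accept-lo)
            # Accept all loopback traffic ahead of every other INPUT rule,
            # so it never reaches the INVALID drop.
            $ipt -I INPUT 1 -i lo -j ACCEPT
            ;;
        skip-lo)
            # Drop INVALID packets on every interface except lo.
            # This appends a new rule; the existing blanket INVALID rule
            # still has to be removed by hand for this to have any effect.
            $ipt -A INPUT ! -i lo -m conntrack --ctstate INVALID -j DROP
            ;;
        *)
            echo "usage: lo_invalid_workaround accept-lo|skip-lo [iptables-cmd]" >&2
            return 2
            ;;
    esac
}
```

calling lo_invalid_workaround accept-lo "echo iptables" only prints the
command, which is a cheap way to check it before running it as root.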

> ---
> From https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt
> :
> nf_conntrack_tcp_be_liberal - BOOLEAN
> 0 - disabled (default)
> not 0 - enabled
> 
> Be conservative in what you do, be liberal in what you accept from others.
> If it's non-zero, we mark only out of window RST segments as INVALID.
> 
> nf_conntrack_tcp_loose - BOOLEAN
> 0 - disabled
> not 0 - enabled (default)
> 
> If it is set to zero, we disable picking up already established
> connections.
> 
> On Mon, 8 Jul 2019 at 01:45, Will Storey <will@xxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I've been experiencing sporadic timeouts when connecting to daemons on
>> 127.0.0.1. I narrowed the cause down to an iptables INPUT rule that blocks
>> INVALID state packets:
>>
>>  603K   24M DROP  all  --  *  *  0.0.0.0/0  0.0.0.0/0   state INVALID
>>
>> I can work around this by allowing everything on lo before this rule, but
>> I'm wondering if this is expected or not.
>>
>> Here's more about the situation:
>>
>> All involved systems are running Ubuntu Bionic with kernel
>> 4.15.0-52-generic.
>>
>> On systems with the problem, there are half open TCP connections:
>>
>> tcp        0      0 127.0.0.1:2348          127.0.0.1:47268         ESTABLISHED
>>
>> When a client connects with source port 47268, it gets stuck in SYN_SENT
>> and eventually times out:
>>
>> 22:09:17.601482 IP (tos 0x0, ttl 64, id 53505, offset 0, flags [DF], proto TCP (6), length 60)
>>     127.0.0.1.47268 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0x02e6), seq 3436316390, win 43690, options [mss 65495,sackOK,TS val 712761924 ecr 0,nop,wscale 7], length 0
>> 22:09:17.601487 IP (tos 0x0, ttl 64, id 42105, offset 0, flags [DF], proto TCP (6), length 52)
>>     127.0.0.1.2348 > 127.0.0.1.47268: Flags [.], cksum 0xfe28 (incorrect -> 0x08f5), seq 1489307482, ack 3500129728, win 2309, options [nop,nop,TS val 712761924 ecr 696680490], length 0
>> 22:09:18.629342 IP (tos 0x0, ttl 64, id 53506, offset 0, flags [DF], proto TCP (6), length 60)
>>     127.0.0.1.47268 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0xfee1), seq 3436316390, win 43690, options [mss 65495,sackOK,TS val 712762952 ecr 0,nop,wscale 7], length 0
>> 22:09:18.629469 IP (tos 0x0, ttl 64, id 42106, offset 0, flags [DF], proto TCP (6), length 52)
>>     127.0.0.1.2348 > 127.0.0.1.47268: Flags [.], cksum 0xfe28 (incorrect -> 0x04f1), seq 0, ack 1, win 2309, options [nop,nop,TS val 712762952 ecr 696680490], length 0
>>
>> It repeats like this (SYN then ACK) until timeout.
>>
>> My understanding is that I should see a RST from the client and the
>> handshake beginning from scratch. Indeed, if I create a half open TCP
>> connection to try to replicate the issue, that's what I see:
>>
>> 14:19:47.429668 IP (tos 0x0, ttl 64, id 35002, offset 0, flags [DF], proto TCP (6), length 60)
>>     127.0.0.1.59118 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0xf9f1), seq 1911409434, win 43690, options [mss 65495,sackOK,TS val 2900480312 ecr 0,nop,wscale 7], length 0
>> 14:19:47.429698 IP (tos 0x0, ttl 64, id 44792, offset 0, flags [DF], proto TCP (6), length 52)
>>     127.0.0.1.2348 > 127.0.0.1.59118: Flags [.], cksum 0xfe28 (incorrect -> 0x81ca), seq 1940761408, ack 1119853882, win 342, options [nop,nop,TS val 2900480312 ecr 2900155296], length 0
>> 14:19:47.429724 IP (tos 0x0, ttl 64, id 50333, offset 0, flags [DF], proto TCP (6), length 40)
>>     127.0.0.1.59118 > 127.0.0.1.2348: Flags [R], cksum 0xe1c9 (correct), seq 1119853882, win 0, length 0
>> 14:19:48.452510 IP (tos 0x0, ttl 64, id 35003, offset 0, flags [DF], proto TCP (6), length 60)
>>     127.0.0.1.59118 > 127.0.0.1.2348: Flags [S], cksum 0xfe30 (incorrect -> 0xf5f2), seq 1911409434, win 43690, options [mss 65495,sackOK,TS val 2900481335 ecr 0,nop,wscale 7], length 0
>> 14:19:48.452533 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>>     127.0.0.1.2348 > 127.0.0.1.59118: Flags [S.], cksum 0xfe30 (incorrect -> 0x1929), seq 2748298959, ack 1911409435, win 43690, options [mss 65495,sackOK,TS val 2900481335 ecr 2900481335,nop,wscale 7], length 0
>> 14:19:48.452547 IP (tos 0x0, ttl 64, id 35004, offset 0, flags [DF], proto TCP (6), length 52)
>>     127.0.0.1.59118 > 127.0.0.1.2348: Flags [.], cksum 0xfe28 (incorrect -> 0xeb6d), seq 1911409435, ack 2748298960, win 342, options [nop,nop,TS val 2900481335 ecr 2900481335], length 0
>>
>> From what I can gather, either the ACK from the server or the RST from the
>> client (which doesn't show in the tcpdump if it is occurring) is getting
>> blocked by the INVALID state rule. If I allow everything on lo, I see the
>> RST and the connection succeeds.
>>
>> I've tried setting nf_conntrack_log_invalid to 255, but I don't see any
>> logs about what's invalid.
>>
>> I'm at a loss to explain why these packets are invalid. I'm also curious
>> why I'm unable to replicate the issue. There seems to be something
>> special about certain half open connections.
>>
>> I've attached packet captures. One shows a case where the timeout happens
>> (synack_loop_timeout). The other is a case where I created a half open
>> connection and the timeout didn't occur (expected_rst).
>>
>> What do you think?