Re: Connection timeouts due to INVALID state rule

Reindl Harald <h.reindl@xxxxxxxxxxxxx> · Fri, 12 Jul 2019 00:59:05 +0200

On 09.07.19 at 01:49, Reindl Harald wrote:
> On 08.07.19 at 23:27, Reindl Harald wrote:
>> On 08.07.19 at 22:05, Will Storey wrote:
>>> On Mon 2019-07-08 21:07:16 +0200, Reindl Harald wrote:
>>>> On 08.07.19 at 20:43, Florian Westphal wrote:
>>>>>> Another thing I'm wondering is whether this rule could be impacting
>>>>>> connections beyond lo, but I just don't know about it.
>>>>>
>>>>> NOTRACK? If you restrict it via -i lo / -o lo, then no, it won't affect
>>>>> anything else.
>>>>>
>>>>> NAT for such connections won't work but that's normally not an issue
>>>>> in the loopback case.
>>>>
>>>> i think the question was whether "iptables -t mangle -A PREROUTING -p all -m
>>>> conntrack --ctstate INVALID -j DROP" also breaks things beyond the "lo"
>>>> interface, which it shouldn't and doesn't appear to, but who knows
>>>>
>>>> it shouldn't break anything at all, not even on "lo"
>>>
>>> Right, sorry, I was wondering about the INVALID rule given it would still
>>> be applied to non-lo traffic.
>>>
>>>> if you want to reproduce this, set up SSH forwarding to a VNC server, leave
>>>> the VNC window in the background, and after a relatively short amount of
>>>> time the tunneled connection with tigervnc-1.9.0-3.fc29.x86_64 just
>>>> freezes on the last picture
>>>
>>> That is concerning if it's the same issue!
>>
>> it is, the version with "! -i lo" has no problem
>>
>> iptables -t mangle -A PREROUTING -p all -m conntrack --ctstate INVALID
>> -j DROP
>>
>> iptables -t mangle -A PREROUTING -p all -m conntrack --ctstate INVALID !
>> -i lo -j DROP
>>
>> -t mangle for DROP rules because you don't need to write the same rules
>> in INPUT and FORWARD and it skips NAT / routing decision while you still
>> can have your EST/RELATED quick path on top
> 
> in fact that breaks even more
> 
> i have a more complex setup with a nested ESXi within VMware Workstation
> 
> * vmnet8: 192.168.196.2
> * ESXi: 192.168.196.3
> * Guest on ESXi: 192.168.196.4
> * Firewall on ESXi: 192.168.196.5
> * NETMAP NAT:  172.17.0.0/24 <-> 172.16.0.0/24
> * Destination: 172.17.0.3
> * Static Route: 172.17.0.0/24 -> 192.168.196.5
> 
> with the above rule on the *host*, SSH from 192.168.196.4 to 172.17.0.3
> is not possible at all, and it's for sure the "--ctstate INVALID" rule,
> because when i put the INVALID rule in "-t mangle" in the INBOUND chain,
> limited to the interface "wan", everything is just fine

while it won't explain the last noticed issue with "--ctstate INVALID",
the description of the patch below may explain the issue of VNC over SSH
freezing after some time in the background when the INVALID rule
doesn't exclude "lo", as well as the high hit rate for "state INVALID" over
14 days in production, which likely could be IMAP connections losing
their EST state way too early

net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 60
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 0
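
To compare these settings on another machine, a minimal Python sketch can read them from /proc/sys, where each dotted sysctl key maps to a file path. The helper names `sysctl_path` and `read_sysctl` are made up for this illustration, and the keys only exist while the nf_conntrack module is loaded.

```python
from pathlib import Path

# The three keys quoted in this message.
KEYS = [
    "net.netfilter.nf_conntrack_tcp_timeout_max_retrans",
    "net.netfilter.nf_conntrack_tcp_be_liberal",
    "net.netfilter.nf_conntrack_tcp_loose",
]


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file below /proc/sys (hypothetical helper)."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the value as a stripped string, or None if the key is not readable
    (for example when nf_conntrack is not loaded)."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in KEYS:
        print(f"{key} = {read_sysctl(key)}")
```

The `root` parameter is only there so the lookup can be pointed at a test directory instead of the live /proc/sys.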

770.053.250  ----- %  ACCEPT-PACKET
68.767.920   76.33 %  DROP+REJECT
21.330.287   23.67 %  ACCEPT-CONNECTION
3.015.232    3.35 %   INVALID
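
The percentages in this table appear to be relative to the sum of DROP+REJECT and ACCEPT-CONNECTION. That base is an assumption inferred from the numbers, not something stated in the message. A small Python sketch to check it, with the hypothetical helpers `parse_count` and `share`:

```python
def parse_count(text):
    """Parse a counter that uses '.' as thousands separator, e.g. '3.015.232'."""
    return int(text.replace(".", ""))


drop_reject = parse_count("68.767.920")
accept_conn = parse_count("21.330.287")
invalid = parse_count("3.015.232")

# Assumption: the 100 % base is DROP+REJECT plus ACCEPT-CONNECTION.
total = drop_reject + accept_conn


def share(part):
    """Percentage of the assumed base, rounded to two decimals."""
    return round(100 * part / total, 2)


for name, value in [("DROP+REJECT", drop_reject),
                    ("ACCEPT-CONNECTION", accept_conn),
                    ("INVALID", invalid)]:
    print(f"{name}: {share(value):.2f} %")
```

Under that assumption the sketch reproduces the 76.33 %, 23.67 % and 3.35 % shown above.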

-------- Forwarded Message --------
Subject: [PATCH nf] netfilter: conntrack: always store window size un-scaled
Date: Fri, 12 Jul 2019 00:29:05 +0200
From: Florian Westphal <fw@xxxxxxxxx>
To: netfilter-devel@xxxxxxxxxxxxxxx
CC: Florian Westphal <fw@xxxxxxxxx>, Jakub Jankowski
<shasta@xxxxxxxxxxx>

Jakub Jankowski reported the following oddity:

After 3 way handshake completes, timeout of new connection is set to
max_retrans (300s) instead of established (5 days).

shortened excerpt from pcap provided:
25.070622 IP (flags [DF], proto TCP (6), length 52)
10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
26.070462 IP (flags [DF], proto TCP (6), length 48)
10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535,
[wscale 3]
27.070449 IP (flags [DF], proto TCP (6), length 40)
10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0

Turns out the last_win is of u16 type, but we store the scaled value:
512 << 8 (== 0x20000) becomes 0 window.
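
The arithmetic can be checked with a short Python sketch that emulates storing the scaled window in a 16-bit field. This only mimics the truncation and is not the kernel code; `scaled_window` and `store_u16` are hypothetical names for this illustration.

```python
U16_MASK = 0xFFFF  # a u16 field keeps only the low 16 bits


def scaled_window(win, wscale):
    """Window from the TCP header shifted by the sender's window scale."""
    return win << wscale


def store_u16(value):
    """Emulate assigning a wider value to a u16 field (silent truncation)."""
    return value & U16_MASK


# Values from the excerpt: the client's SYN announces wscale 8,
# its final ACK carries win 512.
scaled = scaled_window(512, 8)
stored = store_u16(scaled)
print(hex(scaled), stored)  # 0x20000 0
```

Storing the raw 512 from the header instead fits into 16 bits unchanged, which matches the "always store window size un-scaled" wording in the patch subject.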

The Fixes tag is not correct, as the bug has existed forever, but
without that change the worst this might cause is mistaking a
window update (to-nonzero-from-zero) for a retransmit.