Re: [PATCH] document danger of '-j REJECT'ing of '-m state INVALID' packets

Maciej Żenczykowski <zenczykowski@xxxxxxxxx> · Sat, 9 May 2020 10:45:42 -0700

So I've never tried to figure out how things break, just observed that
they do - first many many years ago (close to 15ish) - between my wifi
connected laptop at home and my university server in the same city.
I've kept an INVALID->DROP rule in all my firewalls since then and not
had problems.  I vaguely recall seeing delayed packets when I debugged
it back then.

See for example: https://github.com/moby/libnetwork/issues/1090 for
others running into this.

Now we've hit an issue at work where a network misconfiguration causes
asymmetric one-way pathing, with the result that some packets were
getting *massively* delayed, and it's been causing user firewalls to
generate TCP resets for 'too old', 'already ACKed' packets (i.e. dups).

While this is of course a misconfig, and it shouldn't happen, in
practice it sometimes simply does.
All it takes is for a packet to get into a long queue, and the network
path to shift (immediately after it) to a less congested path.
Due to bufferbloat those long queues can take seconds to drain and
exceed path rtt by orders of magnitude.

I *think* what happens is:

A non-final TCP packet gets massively delayed, the packet after it
makes it through to the receiver and triggers an ACK with SACK, which
makes it back to the sender and triggers a retransmit, and the
connection keeps on making forward progress.  Eventually the delayed
packet arrives, is no longer considered valid, and triggers a TCP
reset.  Whether this happens of course depends heavily on the RTT and
on retransmit aggressiveness.

Here's my attempt to demonstrate what I believe the problem to be:

(on a freshly booted clean/empty/idle fedora 31 vm)

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state INVALID -j DROP
modprobe ifb
ip link set dev ifb0 up
tc qdisc add dev ifb0 root netem reorder 99% 0% delay 10s
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress u32 match u32 0 0 action mirred egress \
    redirect dev ifb0
wget -O /dev/null https://git.kernel.org/torvalds/t/linux-5.7-rc4.tar.gz
iptables-save -c

...
/dev/null      [     <=>     ] 169.58M  2.93MB/s    in 45s
2020-05-09 10:35:44 (3.81 MB/s) - ‘/dev/null’ saved [177819073]
...
[31750:181080717] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[244:1403178] -A INPUT -m state --state INVALID -j DROP
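
To put counters like the ones above in proportion, here is a minimal
sketch that turns the "[packets:bytes] rule" lines of iptables-save -c
into per-rule packet percentages.  The function name summarise_counters
is made up for illustration; it only parses text and never touches the
firewall.

```shell
# Hypothetical helper (not part of iptables): reads `iptables-save -c`
# output on stdin and prints each rule's share of the counted packets.
summarise_counters() {
    awk '
        # Rule lines start with "[packets:bytes]"; chain policy lines
        # such as ":INPUT ACCEPT [0:0]" start with ":" and are skipped.
        /^\[[0-9]+:[0-9]+\]/ {
            split(substr($1, 2, length($1) - 2), c, ":")
            n++
            pkts[n] = c[1]
            rule[n] = substr($0, length($1) + 2)
            total += c[1]
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%6.2f%% %s\n", (total ? 100 * pkts[i] / total : 0), rule[i]
        }'
}
```

Usage would be "iptables-save -c | summarise_counters".  Fed the two
counter lines above, it attributes about 0.76% of the packets counted
by these two rules to the INVALID rule.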

Now if I reboot, and run the same script, except instead of the
INVALID/DROP rule I do
  iptables -A INPUT -p tcp -j REJECT --reject-with tcp-reset
then the download never finishes (it hangs after 15MB @ 2MB/s and
eventually times out).

[4170:16758894] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[37:147454] -A INPUT -p tcp -j REJECT --reject-with tcp-reset

(arguably since this is a VM, and thus NAT'ed by my host, and then
again by the real ipv4 NAT, the setup isn't entirely clear, but I hope
it makes my point: INVALID state needs to be dropped, not rejected)
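
For anyone who wants to keep a tcp-reset REJECT for unsolicited
connections, a rule order along these lines should avoid the problem.
This is a sketch assembled from the rules used in the two experiments,
not something I tested as a combined set; the IPT variable and the
apply_input_rules name are illustrative only.

```shell
# Sketch only: IPT and apply_input_rules are made-up names.
# Set IPT=echo for a dry run that just prints the rule arguments.
IPT="${IPT:-iptables}"

apply_input_rules() {
    # Packets belonging to known flows are accepted first.
    "$IPT" -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Packets conntrack considers INVALID (e.g. badly delayed
    # duplicates) are dropped silently ...
    "$IPT" -A INPUT -m state --state INVALID -j DROP
    # ... so only the remaining unsolicited TCP gets a reset.
    "$IPT" -A INPUT -p tcp -j REJECT --reject-with tcp-reset
}
```

The point of the ordering is that the DROP rule sits ahead of the
REJECT rule, so a stale segment of a live connection never reaches the
rule that would answer it with a reset.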