On Wed, Jan 06, 2021 at 12:15:20AM +0100, Florian Westphal wrote:
> Christian Perle reported a PMTU blackhole due to unexpected interaction
> between the ip defragmentation that comes with connection tracking and
> ip tunnels.
>
> Unfortunately, setting 'nopmtudisc' on the tunnel breaks the test
> scenario even without netfilter.
>
> Christian's setup looks like this:
>            +--------+       +---------+       +--------+
>            |Router A|-------|Wanrouter|-------|Router B|
>            |        |.IPIP..|         |..IPIP.|        |
>            +--------+       +---------+       +--------+
>           /                  mtu 1400                   \
>          /                                               \
> +--------+                                               +--------+
> |Client A|                                               |Client B|
> +--------+                                               +--------+
>
> MTU is 1500 everywhere, except on Router A to Wanrouter and
> Wanrouter to Router B.
>
> Router A and Router B use IPIP tunnel interfaces to tunnel traffic
> between Client A and Client B over WAN.
>
> Client A sends a 1400 byte UDP datagram to Client B.
> This packet gets encapsulated in the IPIP tunnel.
>
> This works, the packet is received on Client B.
>
> When conntrack (or anything else that forces ip defragmentation) is
> enabled on Router A, the packet gets dropped on Router A after
> encapsulation because it exceeds the link MTU.
>
> Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse,
> no packets pass even in the no-netfilter scenario.
>
> Patch one is a reproducer script for selftest infra.
>
> Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send
> an icmp error to Client A. This allows a 'nopmtudisc' tunnel to forward
> the UDP datagrams.
>
> Patch three enables ip refragmentation for all reassembled packets, just
> like ipv6.

Acked-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>

Thanks.
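
For anyone following the size arithmetic in the scenario quoted above, here is a minimal Python sketch of why a reassembled packet no longer fits the 1400-byte WAN link once it is encapsulated. It is only an illustration under stated assumptions: IPv4 headers without options (20 bytes) for both the inner and the outer packet, and the "1400 byte UDP datagram" read as 1400 bytes of UDP payload. The function names are hypothetical and are not taken from the kernel or from the patches.

```python
# Assumed header sizes: IPv4 without options, plain UDP.
IPV4_HEADER = 20
UDP_HEADER = 8


def ipip_encapsulated_size(inner_packet_len):
    """Size of the outer packet: IPIP prepends one outer IPv4 header."""
    return inner_packet_len + IPV4_HEADER


def fits_link(inner_packet_len, link_mtu):
    """True if the encapsulated packet can cross the link unfragmented."""
    return ipip_encapsulated_size(inner_packet_len) <= link_mtu


def tunnel_mtu(link_mtu):
    """Largest inner packet that still fits the link after encapsulation."""
    return link_mtu - IPV4_HEADER


# Assumption: 1400 bytes of UDP payload, so the inner IP packet is 1428 bytes.
inner = 1400 + UDP_HEADER + IPV4_HEADER

print(ipip_encapsulated_size(inner))  # 1448
print(fits_link(inner, 1400))         # False
print(tunnel_mtu(1400))               # 1380
```

Under these assumptions the inner packet has to be split into pieces of at most 1380 bytes before encapsulation. If it arrives at the tunnel as a single reassembled packet and is not refragmented, the encapsulated result exceeds the link MTU, which matches the drop described in the cover letter.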