On Tue, Jul 30, 2019 at 8:41 PM Nikolay Aleksandrov <nikolay@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 30/07/2019 15:25, Rundong Ge wrote:
> > Given the following setup:
> > -modprobe br_netfilter
> > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> > -brctl addbr br0
> > -brctl addif br0 enp2s0
> > -brctl addif br0 enp3s0
> > -brctl addif br0 enp6s0
> > -ifconfig enp2s0 mtu 1300
> > -ifconfig enp3s0 mtu 1500
> > -ifconfig enp6s0 mtu 1500
> > -ifconfig br0 up
> >
> >                  multi-port
> > mtu1500 - mtu1500|bridge|1500 - mtu1500
> >   A                  |              B
> >                   mtu1300
> >
> > With netfilter defragmentation/conntrack enabled, fragmented
> > packets from A will be defragmented in prerouting, and refragmented
> > at postrouting.
> > But in this scenario the bridge found the frag_max_size(1500) is
> > larger than the dst mtu stored in the fake_rtable which is
> > always equal to the bridge's mtu 1300, then packets will be dropped.
> >
> > This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> > of bridge's mtu in bridge refragment.
> >
> > Signed-off-by: Rundong Ge <rdong.ge@xxxxxxxxx>
> > ---
> >  include/net/ip.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/include/net/ip.h b/include/net/ip.h
> > index 29d89de..0512de3 100644
> > --- a/include/net/ip.h
> > +++ b/include/net/ip.h
> > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
> >  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
> >  					  const struct sk_buff *skb)
> >  {
> > +	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> > +		return min(skb->dev->mtu, IP_MAX_MTU);
> >  	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
> >  		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> >
> >
>
> I don't think this is correct, there's a reason why the bridge chooses the smallest
> possible MTU out of its members and this is simply a hack to circumvent it.
> If you really like to do so just set the bridge MTU manually, we've added support
> so it won't change automatically to the smallest, but then how do you pass packets
> 1500 -> 1300 in this setup ?
>
> You're talking about the frag_size check in br_nf_ip_fragment(), right ?
>

Hi Nikolay,

My setup may not be common. May I ask whether there is any reason to use
the output port's MTU for the re-fragment check but then use the bridge's
MTU to do the re-fragmentation itself?

Is it expected behavior that the bridge's MTU affects re-fragmentation of
FORWARD traffic? I used to think the bridge's MTU would only affect OUTPUT
traffic sent from "br0".

The modification in this patch replaces the MTU in the fake_rtable, which
is only used for FORWARD re-fragmentation, and won't affect local traffic
from "br0".

Thanks,
Raydodn
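
For readers following the thread, the disagreement can be illustrated with a small model. This is only a sketch of the behavior as described in the messages above, not kernel code: `struct port`, `bridge_min_mtu()` and `refrag_allowed()` are hypothetical names invented for illustration. It assumes, as stated in the thread, that the bridge MTU defaults to the smallest member MTU and that a refragmented packet is dropped when its frag_max_size exceeds the MTU used for the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a bridge port (not a kernel struct). */
struct port {
    const char *name;
    unsigned int mtu;
};

/*
 * Model of the default bridge MTU: the smallest MTU among the member ports.
 * Returns 0 when there are no ports.
 */
static unsigned int bridge_min_mtu(const struct port *ports, size_t n)
{
    unsigned int mtu = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || ports[i].mtu < mtu)
            mtu = ports[i].mtu;
    }
    return mtu;
}

/*
 * Model of the refragment check: the packet is let through only if the
 * largest fragment seen before defragmentation fits the MTU being compared.
 */
static bool refrag_allowed(unsigned int frag_max_size, unsigned int mtu)
{
    return frag_max_size <= mtu;
}
```

With the three ports from the setup (1300, 1500, 1500), the modeled bridge MTU is 1300, so a packet with frag_max_size 1500 fails the check against the bridge MTU but passes against the 1500 MTU of the actual output port, which is the case the patch tries to address.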