Yan Zhai wrote:
> Hi Willem,
>
> Thanks for getting back to me.
>
> On Mon, Jan 27, 2025 at 8:33 AM Willem de Bruijn
> <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >
> > Yan Zhai wrote:
> > > Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
> > > for small packets. But the kernel currently dismisses GSO requests only
> > > after checking MTU on gso_size. This means any packets, regardless of
> > > their payload sizes, would be dropped when MTU is smaller than requested
> > > gso_size.
> >
> > Is this a realistic concern? How did you encounter this in practice.
> >
> > It *is* a misconfiguration to configure a gso_size larger than MTU.
> >
> > > Meanwhile, EINVAL would be returned in this case, making it
> > > very misleading to debug.
> >
> > Misleading is subjective. I'm not sure what is misleading here. From
> > my above comment, I believe this is correctly EINVAL.
> >
> > That said, if this impacts a real workload we could reconsider
> > relaxing the check. I.e., allowing through packets even when an
> > application has clearly misconfigured UDP_SEGMENT.
> >
> We did encounter a painful reliability issue in production last month.
>
> To simplify the scenario, we had these symptoms when the issue occurred:
> 1. QUIC connections to host A started to fail, and cannot establish new ones
> 2. User space Wireguard to the exact same host worked 100% fine
>
> This happened rarely, like one or twice a day, lasting for a few
> minutes usually, but it was quite visible since it is an office
> network.
>
> Initially this prompted something wrong at the protocol layer. But
> after multiple rounds of digging, we finally figured the root cause
> was:
> 3. Something sometimes pings host B, which shares the same IP with
> host A but different ports (thanks to limited IPv4 space), and its
> PMTU was reduced to 1280 occasionally. This unexpectedly affected all
> traffic to that IP including traffic toward host A. Our QUIC client
> set gso_size to 1350, and that's why it got hit.
>
> I agree that configurations do matter a lot here. Given how broken the
> PMTU was for the Internet, we might just turn off pmtudisc option on
> our end to avoid this failure path. But for those who hasn't yet, this
> could still be confusing if it ever happens, because nothing seems to
> point to PMTU in the first place:
> * small packets also get dropped
> * error code was EINVAL from sendmsg
>
> That said, I probably should have used PMTU in my commit message to be
> more clear for our problem. But meanwhile I am also concerned about
> newly added tunnels to trigger the same issue, even if it has a static
> device MTU. My proposal should make the error reason more clear:
> EMSGSIZE itself is a direct signal pointing to MTU/PMTU. Larger
> packets getting dropped would have a similar effect.

Thanks for that context. Makes sense that this is a real issue.

One issue is that with segmentation, the initial mtu checks are
skipped, so they have to be enforced later. In __ip_append_data:

	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;

Also, might this make the debugging actually harder, as the error
condition is now triggered intermittently.
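
To make the two behaviours discussed above concrete, here is a rough
userspace model of the check ordering. It is only a sketch under stated
assumptions, not the kernel code: check_current(), check_proposed() and
IP_UDP_HLEN are hypothetical names, the header overhead assumes IPv4
without options, and the non-GSO path assumes path MTU discovery is on
(DF set), so an oversized datagram fails with EMSGSIZE instead of being
fragmented.

```c
#include <errno.h>

/* Assumption: 20-byte IPv4 header (no options) plus 8-byte UDP header. */
enum { IP_UDP_HLEN = 20 + 8 };

/*
 * Model of the ordering described in the thread: when UDP_SEGMENT is set,
 * gso_size is compared against the (path) MTU before the payload length
 * is considered, so even a tiny datagram fails, and it fails with EINVAL.
 */
static int check_current(int payload_len, int gso_size, int pmtu)
{
	if (gso_size) {
		if (IP_UDP_HLEN + gso_size > pmtu)
			return -EINVAL;
		return 0;
	}
	/* Plain datagram with DF set: must fit in one packet. */
	return IP_UDP_HLEN + payload_len > pmtu ? -EMSGSIZE : 0;
}

/*
 * Model of the proposed ordering: a payload that fits in a single segment
 * is treated as a plain datagram. Because the early MTU check is skipped
 * whenever gso_size is set, the size check has to be enforced here.
 */
static int check_proposed(int payload_len, int gso_size, int pmtu)
{
	if (gso_size && payload_len > gso_size) {
		/* Real GSO: each segment must fit in the path MTU. */
		if (IP_UDP_HLEN + gso_size > pmtu)
			return -EMSGSIZE;
		return 0;
	}
	/* Single packet: apply the check that was skipped earlier. */
	return IP_UDP_HLEN + payload_len > pmtu ? -EMSGSIZE : 0;
}
```

With the numbers from the report (gso_size 1350, PMTU 1280), a 100-byte
payload gets -EINVAL from check_current() but passes check_proposed(),
while a 1300-byte or 4000-byte payload gets -EMSGSIZE from
check_proposed(). With a PMTU of 1500 both models accept all three sizes,
which also illustrates why the failure only shows up while the reduced
PMTU entry is in place.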