On 5/31/2021 7:29 AM, Vladimir Oltean wrote: > Hi Ben, > > On Mon, May 31, 2021 at 02:40:52PM +0200, Ben Hutchings wrote: >> I'm working on a system that uses a TI Sitara SoC with one of its >> Ethernet ports connected to the host port of a Microchip KSZ8795 >> switch. I'm updating the kernel from 4.14.y to 5.10.y. Currently I >> am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver >> has the same issue. >> >> The Microchip switch expects a tail tag on ingress from the host port >> to control which external port(s) to forward to. This must appear >> immediately before the frame checksum. The DSA core correctly pads >> outgoing skbs to at least 60 bytes before tag_ksz appends the tag. >> >> However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min >> eth packet size"), the cpsw driver pads outgoing skbs to at least 64 >> bytes. This means that in smaller packets the tag byte is no longer >> at the tail. >> >> It's not obvious to me where this should be fixed. Should drivers >> that pad in ndo_start_xmit be aware of any tail tag that needs to be >> moved? Should DSA be aware that a lower driver has a minimum size > >> 60 bytes? > > These are good questions. > > In principle, DSA needs a hint from the master driver for tail taggers > to work properly. We should pad to ETH_ZLEN + <the hint value> before > inserting the tail tag. This is for correctness, to ensure we do not > operate in marginal conditions which are not guaranteed to work. > > A naive approach would be to take the hint from master->min_mtu. > However, the first issue that appears is that the dev->min_mtu value is > not always set quite correctly. > > The MTU in general measures the number of bytes in the L2 payload (i.e. > not counting the Ethernet + VLAN header, nor FCS). The DSA tag is > considered to be a part of the L2 payload from the perspective of a > DSA-unaware master. > > But ether_setup() sets up dev->min_mtu by default to ETH_MIN_MTU (68), > which cites RFC791. This says: > > Every internet module must be able to forward a datagram of 68 > octets without further fragmentation. This is because an internet > header may be up to 60 octets, and the minimum fragment is 8 octets. > > But many drivers simply don't set dev->min_mtu = 0, even if they support > sending minimum-sized Ethernet frames. Many set dev->min_mtu to ETH_ZLEN, > proving nothing except the fact that they don't understand that the > Ethernet header should not be counted by the MTU anyway. > > So to work with these drivers which leave dev->min_mtu = ETH_MIN_MTU, we > would have to pad the packets in DSA to ETH_ZLEN + ETH_MIN_MTU. This is > not quite ideal, so even if it would be the correct approach, a large > amount of drivers would have to be converted to set dev->min_mtu = 0 > before we could consider switching to that and not have too many > regressions. > > Also, dev->min_mtu does not appear to have a very strict definition > anywhere other than "Interface Minimum MTU value". My hopes were some > guarantees along the lines of "if you try to send a packet with a > smaller L2 payload than dev->mtu, the controller might pad the packet". > But no luck with that, it seems. > > Going to commit 9421c9015047, it looks like that took a shortcut for > performance reasons, and omitted to check whether the skb is actually > VLAN-tagged or not, and if egress untagging was requested or not. > My understanding is that packets smaller than CPSW_MIN_PACKET_SIZE _can_ > be sent, it's only that the value was chosen (too) conservatively as > VLAN_ETH_ZLEN. The cpsw driver might be able to check whether the packet > is a VLAN tagged one by looking at skb->protocol, and choose the pad > size dynamically. Although I can understand why Grygorii might not want > to do that. Agree, that specific commit seems to be possibly by off by 4 in most cases. > > The pitfall is that even if we declare the proper min_mtu value for > every master driver, it would still not avoid padding in the cpsw case. > This is because the reason cpsw pads is due to VLAN, but VLAN is not > part of the L2 payload, so cpsw would still declare dev->min_mtu = 0 in > spite of needing to pad. > > The only honest solution might be to extend struct net_device and add a > pad_size value somewhere in there. You might be able to find a hole with > pahole or something, and it doesn't need to be larger than an u8 (for up > to 255 bytes of padding). Then cpsw can set master->pad_size, and DSA > can look at it for tail taggers. Do we need another way for drivers to be left a chance to be wrong? Even if we document the semantics of net_device::pad_size correctly this probably won't cut it. TBH, I don't fully understand why the network stack has left so much leeway for Ethernet drivers to do their own padding as opposed to making sure that non-tagged (VLAN, DSA, whatever) frames are guaranteed to be at least 60 bytes when they reach ndo_start_xmit() and then just leave the stacking of devices to add their bytes where they need them, with the special trailer case that is a tiny bit harder to figure out. Maybe back in the days most Ethernet NICs would hardware pad and this only became a concern with newer/cheaper/embedded SoCs NICs that can no longer hardware pad by default? I can understand the argument about raw sockets which should permit an application to have full control over the minimum packet length, but again, in general we have a real link partner on the other side that is not going to be very tolerant. -- Florian