I've copied a bunch of new people to this email.

TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060,
does the DSA master declare any of the following features as "on"?

ethtool -k eth0 | grep tx-checksum-ip

I would expect not. Otherwise, we've either found a bug, or discovered
the Sasquatch.

On Mon, Apr 11, 2022 at 08:03:06PM -0300, Luiz Angelo Daros de Luca wrote:
> DSA tags before IP header (categories 1 and 2) or after the payload (3)
> might introduce offload checksum issues.
>
> Signed-off-by: Luiz Angelo Daros de Luca <luizluca@xxxxxxxxx>
> ---

Reviewed-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>

>  Documentation/networking/dsa/dsa.rst | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
> index ddc1dd039337..ed7fa76e7a40 100644
> --- a/Documentation/networking/dsa/dsa.rst
> +++ b/Documentation/networking/dsa/dsa.rst
> @@ -193,6 +193,23 @@ protocol. If not all packets are of equal size, the tagger can implement the
>  default behavior by specifying the correct offset incurred by each individual
>  RX packet. Tail taggers do not cause issues to the flow dissector.
>
> +Checksum offload should work with category 1 and 2 taggers when the DSA master
> +driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and
> +csum_offset. For those cases, DSA will shift the checksum start and offset by
> +the tag size. If the DSA master driver still uses the legacy NETIF_F_IP_CSUM
> +or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
> +offload hardware already expects that specific tag (perhaps due to matching
> +vendors). DSA slaves inherit those flags from the master port, and it is up to
> +the driver to correctly fall back to software checksum when the IP header is not
> +where the hardware expects. If that check is ineffective, the packets might go
> +to the network without a proper checksum (the checksum field will have the
> +pseudo IP header sum). For category 3, when the offload hardware does not
> +already expect the switch tag in use, the checksum must be calculated before any
> +tag is inserted (i.e. inside the tagger). Otherwise, the DSA master would
> +include the tail tag in the (software or hardware) checksum calculation. Then,
> +when the tag gets stripped by the switch during transmission, it will leave an
> +incorrect IP checksum in place.
> +

While what you're describing here is truthful to what is currently being
done, I'm re-reading this conversation:
https://lore.kernel.org/netdev/20210715114908.ripblpevmdujkf2m@skbuf/T/#m13a2e3a78d22b82f14bcdf85d988844053b1e8f9

and trying to remember why I didn't point out what now seems obvious.

It was said that inheriting master->vlan_features & NETIF_F_HW_CSUM is
counter-productive for tail taggers, since now we have to patch all of
them to do that
"if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))"
dance. And that is most definitely true.

It was also said that some systems where the DSA master vendor coincides
with the DSA switch vendor rely on the switch inheriting NETIF_F_HW_CSUM
from master->vlan_features, for a boost in performance. That is also
most definitely true.

But none of the examples given was for a tail tagger, which is what the
discussion was about. With the exception of the obsolete tag_trailer.c
used by mv88e6060, Marvell use Ethertype headers, and Broadcom either
use an Ethertype header or a header prepended to the Ethernet header.
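For anyone on Cc who hasn't looked at why the ordering matters for tail
taggers, here is a small userspace model of it. This is a sketch, not kernel
code: every model_* name, the 2-byte tag value and the frame layout are made
up for illustration, and the pseudo-header is not modelled (the checksum
field is simply zeroed before summing). The same routine stands in for both
skb_checksum_help() and a tag-unaware master doing the offload; the only
difference between the two runs is whether it is called before or after the
tail tag is appended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace model only: none of these names exist in the kernel. */
enum model_ip_summed { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_PARTIAL };

struct model_skb {
	uint8_t data[64];
	size_t len;
	size_t csum_start;	/* first byte covered by the checksum */
	size_t csum_offset;	/* checksum field, relative to csum_start */
	enum model_ip_summed ip_summed;
};

#define MODEL_TAG_LEN 2		/* hypothetical tail tag size */

/* 16-bit ones' complement checksum (RFC 1071 style) over a byte range. */
static uint16_t model_csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Resolve CHECKSUM_PARTIAL over [csum_start, len). The arithmetic is the
 * same whether the tagger or the master does it; only the timing differs.
 */
static void model_resolve_csum(struct model_skb *skb)
{
	uint8_t *field = skb->data + skb->csum_start + skb->csum_offset;
	uint16_t csum;

	field[0] = 0;
	field[1] = 0;
	csum = model_csum16(skb->data + skb->csum_start,
			    skb->len - skb->csum_start);
	field[0] = (uint8_t)(csum >> 8);
	field[1] = (uint8_t)(csum & 0xff);
	skb->ip_summed = MODEL_CHECKSUM_NONE;
}

/* Tail tagger xmit: optionally do the "dance", then append the tag. */
static void model_tail_tag_xmit(struct model_skb *skb, int resolve_first)
{
	static const uint8_t tag[MODEL_TAG_LEN] = { 0x80, 0x01 };

	if (resolve_first && skb->ip_summed == MODEL_CHECKSUM_PARTIAL)
		model_resolve_csum(skb);

	assert(skb->len + MODEL_TAG_LEN <= sizeof(skb->data));
	memcpy(skb->data + skb->len, tag, MODEL_TAG_LEN);
	skb->len += MODEL_TAG_LEN;
}

/*
 * Pass one frame through tagger, master and switch. Returns 1 if the
 * checksum still verifies after the switch strips the tag, 0 otherwise.
 */
int model_run(int resolve_in_tagger)
{
	struct model_skb skb = {
		.data = { 0xde, 0xad, 0xbe, 0xef,	/* bytes before csum_start */
			  0x12, 0x34, 0x00, 0x00,	/* L4 stub, csum field at +2 */
			  0x01, 0x02, 0x03, 0x04 },	/* payload */
		.len = 12,
		.csum_start = 4,
		.csum_offset = 2,
		.ip_summed = MODEL_CHECKSUM_PARTIAL,
	};

	model_tail_tag_xmit(&skb, resolve_in_tagger);

	/* Master: offloads whatever is still CHECKSUM_PARTIAL, tag included. */
	if (skb.ip_summed == MODEL_CHECKSUM_PARTIAL)
		model_resolve_csum(&skb);

	/* Switch strips the tail tag before the frame leaves the port. */
	skb.len -= MODEL_TAG_LEN;

	/* A valid checksum makes the sum over the range fold to zero. */
	return model_csum16(skb.data + skb.csum_start,
			    skb.len - skb.csum_start) == 0;
}
```

In this model, model_run(1) (checksum resolved in the tagger) leaves a
checksum that verifies after the strip, while model_run(0) (left to the
master) does not, because the tag bytes were summed in and then removed.
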
Of all tagging protocol drivers which declare a non-zero needed_tailroom:

- tag_ksz.c also calls skb_checksum_help() so I don't have doubts that
  there aren't masters which offload checksumming for it
- tag_trailer.c doesn't call skb_checksum_help(), but it's orphan and
  probably super broken anyway
- tag_hellcreek.c doesn't call skb_checksum_help() and is therefore
  probably broken with checksum offloading masters. But it was probably
  only tested on FPGA (or at least I assume
  "hirschmann,hellcreek-de1soc-r1" stands for "Altera DE1") and it
  happens to work there.
- tag_xrs700x.c doesn't call skb_checksum_help() either, so there are
  probably breakages waiting to happen
- tag_sja1105.c (actually only SJA1110) uses a tail tag only for
  link-local traffic, which is non-IP so there is no checksum offload
  breakage there
- tag_rtl8_4.c (the tail tagging version) has been added by yourself
  with a call to skb_checksum_help().

In any case, we give this advice to driver writers so off-hand that it's
comical (I'm not attacking you, Luiz, for merely writing it down):

| For category 3, when the offload hardware does not already expect the
| switch tag in use, the checksum must be calculated before any tag is
| inserted (i.e. inside the tagger).

As if the tagging protocol driver has any crystal ball to guess whether
the offload hardware of the DSA master in current use is going to expect
the tail tag or not. BS. A tail tagging protocol concerned with
correctness and portability is always going to call skb_checksum_help(),
hence the absurdity of allowing tail taggers to inherit
NETIF_F_HW_CSUM | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM from the master
in the first place.

>  Due to various reasons (most common being category 1 taggers being associated
>  with DSA-unaware masters, mangling what the master perceives as MAC DA), the
>  tagging protocol may require the DSA master to operate in promiscuous mode, to
> --
> 2.35.1
>
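To make the inheritance point concrete, here is what not inheriting the
checksum flags could look like, written as a userspace model. It illustrates
the argument and is not a proposed patch: the MODEL_F_* bit values are made
up (they are not the kernel's NETIF_F_* values) and model_slave_features()
is a hypothetical helper, not an existing DSA function.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up bit positions; not the kernel's NETIF_F_* values. */
#define MODEL_F_IP_CSUM		(1u << 0)
#define MODEL_F_HW_CSUM		(1u << 1)
#define MODEL_F_IPV6_CSUM	(1u << 2)
#define MODEL_F_SG		(1u << 3)	/* stands in for any unrelated feature */

#define MODEL_F_CSUM_MASK \
	(MODEL_F_IP_CSUM | MODEL_F_HW_CSUM | MODEL_F_IPV6_CSUM)

/*
 * Hypothetical helper: derive the slave's features from the master's
 * vlan_features, dropping every TX checksum offload flag when the tagger
 * needs tailroom (i.e. when it is a tail tagger).
 */
uint32_t model_slave_features(uint32_t master_vlan_features,
			      unsigned int needed_tailroom)
{
	uint32_t features = master_vlan_features;

	if (needed_tailroom) {
		features &= ~MODEL_F_CSUM_MASK;
		assert((features & MODEL_F_CSUM_MASK) == 0);
	}
	return features;
}
```

With the flags masked like this, a tail tagger's slave interface would no
longer advertise checksum offload, so I would expect the stack to resolve
the checksum in software before the tagger's xmit runs. Non-tail taggers
(needed_tailroom == 0) keep whatever the master advertised.
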