On Tue, 5 Nov 2024, chia-yu.chang@xxxxxxxxxxxxxxxxxxx wrote: > From: Ilpo Järvinen <ij@xxxxxxxxxx> > > Handling the CWR flag differs between RFC 3168 ECN and AccECN. > With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared > starting from 2nd segment which is incompatible how AccECN handles > the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN. > With AccECN, CWR flag (or more accurately, the ACE field that also > includes ECE & AE flags) changes only when new packet(s) with CE > mark arrives so the flag should not be changed within a super-skb. > The new skb/feature flags are necessary to prevent such TSO engines > corrupting AccECN ACE counters by clearing the CWR flag (if the > CWR handling feature cannot be turned off). > > If NIC is completely unaware of RFC3168 ECN (doesn't support > NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag > despite supporting also NETIF_F_TSO_ECN, TSO could be safely used > with AccECN on such NIC. This should be evaluated per NIC basis > (not done in this patch series for any NICs). > > For the cases, where TSO cannot keep its hands off the CWR flag, > a GSO fallback is provided by this patch. > > Signed-off-by: Ilpo Järvinen <ij@xxxxxxxxxx> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@xxxxxxxxxxxxxxxxxxx> > --- > include/linux/netdev_features.h | 8 +++++--- > include/linux/netdevice.h | 2 ++ > include/linux/skbuff.h | 2 ++ > net/ethtool/common.c | 1 + > net/ipv4/tcp_offload.c | 6 +++++- > 5 files changed, 15 insertions(+), 4 deletions(-) > > diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h > index 66e7d26b70a4..c59db449bcf0 100644 > --- a/include/linux/netdev_features.h > +++ b/include/linux/netdev_features.h > @@ -53,12 +53,12 @@ enum { > NETIF_F_GSO_UDP_BIT, /* ... UFO, deprecated except tuntap */ > NETIF_F_GSO_UDP_L4_BIT, /* ... UDP payload GSO (not UFO) */ > NETIF_F_GSO_FRAGLIST_BIT, /* ... Fraglist GSO */ > + NETIF_F_GSO_ACCECN_BIT, /* TCP AccECN w/ TSO (no clear CWR) */ > /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ > - NETIF_F_GSO_FRAGLIST_BIT, > + NETIF_F_GSO_ACCECN_BIT, > > NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */ > NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */ > - __UNUSED_NETIF_F_37, > NETIF_F_NTUPLE_BIT, /* N-tuple filters supported */ > NETIF_F_RXHASH_BIT, /* Receive hashing offload */ > NETIF_F_RXCSUM_BIT, /* Receive checksumming offload */ > @@ -128,6 +128,7 @@ enum { > #define NETIF_F_SG __NETIF_F(SG) > #define NETIF_F_TSO6 __NETIF_F(TSO6) > #define NETIF_F_TSO_ECN __NETIF_F(TSO_ECN) > +#define NETIF_F_GSO_ACCECN __NETIF_F(GSO_ACCECN) > #define NETIF_F_TSO __NETIF_F(TSO) > #define NETIF_F_VLAN_CHALLENGED __NETIF_F(VLAN_CHALLENGED) > #define NETIF_F_RXFCS __NETIF_F(RXFCS) > @@ -210,7 +211,8 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) > NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID) > > /* List of features with software fallbacks. */ > -#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | NETIF_F_GSO_SCTP | \ > +#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | \ > + NETIF_F_GSO_ACCECN | NETIF_F_GSO_SCTP | \ > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST) > > /* > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 6e0f8e4aeb14..480d915b3bdb 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -5076,6 +5076,8 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type) > BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT)); > BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT)); > BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT)); > + BUILD_BUG_ON(SKB_GSO_TCP_ACCECN != > + (NETIF_F_GSO_ACCECN >> NETIF_F_GSO_SHIFT)); > > return (features & feature) == feature; > } > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 48f1e0fa2a13..530cb325fb86 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -694,6 +694,8 @@ enum { > SKB_GSO_UDP_L4 = 1 << 17, > > SKB_GSO_FRAGLIST = 1 << 18, > + > + SKB_GSO_TCP_ACCECN = 1 << 19, > }; > > #if BITS_PER_LONG > 32 > diff --git a/net/ethtool/common.c b/net/ethtool/common.c > index 0d62363dbd9d..5c3ba2dfaa74 100644 > --- a/net/ethtool/common.c > +++ b/net/ethtool/common.c > @@ -32,6 +32,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = { > [NETIF_F_TSO_BIT] = "tx-tcp-segmentation", > [NETIF_F_GSO_ROBUST_BIT] = "tx-gso-robust", > [NETIF_F_TSO_ECN_BIT] = "tx-tcp-ecn-segmentation", > + [NETIF_F_GSO_ACCECN_BIT] = "tx-tcp-accecn-segmentation", > [NETIF_F_TSO_MANGLEID_BIT] = "tx-tcp-mangleid-segmentation", > [NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation", > [NETIF_F_FSO_BIT] = "tx-fcoe-segmentation", > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c > index 2308665b51c5..0b05f30e9e5f 100644 > --- a/net/ipv4/tcp_offload.c > +++ b/net/ipv4/tcp_offload.c > @@ -139,6 +139,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, > struct sk_buff *gso_skb = skb; > __sum16 newcheck; > bool ooo_okay, copy_destructor; > + bool ecn_cwr_mask; > __wsum delta; > > th = tcp_hdr(skb); > @@ -198,6 +199,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, > > newcheck = ~csum_fold(csum_add(csum_unfold(th->check), delta)); > > + ecn_cwr_mask = !!(skb_shinfo(gso_skb)->gso_type & SKB_GSO_TCP_ACCECN); > + > while (skb->next) { > th->fin = th->psh = 0; > th->check = newcheck; > @@ -217,7 +220,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, > th = tcp_hdr(skb); > > th->seq = htonl(seq); > - th->cwr = 0; > + > + th->cwr &= ecn_cwr_mask; Hi all, I started to wonder if this approach would avoid CWR corruption without need to introduce the SKB_GSO_TCP_ACCECN flag at all: - Never set SKB_GSO_TCP_ECN for any skb - Split CWR segment on TCP level before sending. While this causes need to split the super skb, it's once per RTT thing so does not sound very significant (compare e.g. with SACK that cause split on every ACK) - Remove th->cwr cleaning from GSO (the above line) - Likely needed: don't negotiate HOST/GUEST_ECN in virtio I think that would prevent offloading treating CWR flag as RFC3168 and CWR would be just copied as is like a normal flag which is required to not corrupt it. In practice, RFC3168 style traffic would just not combine anything with the CWR'ed packet as CWRs are singletons with RFC3168. Would that work? It sure sounds too simple to work but I cannot immediately see why it would not. -- i.