Re: [RFC PATCH net-next 3/3] sctp: Add GSO support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 29, 2016 at 11:15:54AM -0800, Alexander Duyck wrote:
> On Wed, Jan 27, 2016 at 9:06 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@xxxxxxxxx> wrote:
> > This patch enables SCTP to do GSO.
> >
> > SCTP has this pecualiarty that its packets cannot be just segmented to
> > (P)MTU. Its chunks must be contained in IP segments, padding respected.
> > So we can't just generate a big skb, set gso_size to the fragmentation
> > point and deliver it to IP layer.
> >
> > Instead, this patch proposes that SCTP build a skb as it would be if it
> > was received using GRO. That is, there will be a cover skb with the
> > headers (incluing SCTP one) and children ones containing the actual SCTP
> > chunks, already segmented in a way that respects SCTP RFCs and MTU.
> >
> > This way SCTP can benefit from GSO and instead of passing several
> > packets through the stack, it can pass a single large packet if there
> > are enough data queued and cwnd allows.
> >
> > Main points that need help:
> > - Usage of skb_gro_receive()
> >   It fits nicely in there and properly handles offsets/lens, though the
> >   name means another thing. If you agree with this usage, we can rename
> >   it to something like skb_coalesce
> >
> > - Checksum handling
> >   Why only packets with checksum offloaded can be GSOed? Most of the
> >   NICs doesn't support SCTP CRC offloading and this will nearly defeat
> >   this feature. If checksum is being computed in sw, it doesn't really
> >   matter if it's earlier or later, right?
> >   This patch hacks skb_needs_check() to allow using GSO with sw-computed
> >   checksums.
> >   Also the meaning of UNNECESSARY and NONE are quite foggy to me yet and
> >   its usage may be wrong.
> >
> > - gso_size = 1
> >   There is skb_is_gso() all over the stack and it basically checks for
> >   non-zero skb_shinfo(skb)->gso_size. Setting it to 1 is the hacky way I
> >   found to keep skb_is_gso() working while being able to signal to
> >   skb_segment() that it shouldn't use gso_size but instead the fragment
> >   sizes themselves. skb_segment() will mainly just unpack the skb then.
> 
> Instead of 1 why not use 0xFFFF?  It is a value that can never be used
> for a legitimate segment size since IP total length is a 16 bit value
> and includes the IP header in the size.

Just felt that 1 was unpractical. But perhaps with no hard restriction
like the one for 0xFFFF you said. I can replace it, 0xFFFF is better.

> > - socket / gso max values
> >   usage of sk_setup_caps() still needs a review
> >
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
> > ---
> >  include/linux/netdev_features.h |   7 +-
> >  include/linux/netdevice.h       |   1 +
> >  net/core/dev.c                  |   6 +-
> >  net/core/skbuff.c               |  12 +-
> >  net/ipv4/af_inet.c              |   1 +
> >  net/sctp/offload.c              |  53 +++++++
> >  net/sctp/output.c               | 338 +++++++++++++++++++++++++---------------
> >  net/sctp/socket.c               |   2 +
> >  8 files changed, 292 insertions(+), 128 deletions(-)
> >
> > diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> > index d9654f0eecb3519383441afa6b131ff9a5898485..f678998841f1800e0f2fe416a79935197d4ed305 100644
> > --- a/include/linux/netdev_features.h
> > +++ b/include/linux/netdev_features.h
> > @@ -48,8 +48,9 @@ enum {
> >         NETIF_F_GSO_UDP_TUNNEL_BIT,     /* ... UDP TUNNEL with TSO */
> >         NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
> >         NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */
> > +       NETIF_F_GSO_SCTP_BIT,           /* ... SCTP fragmentation */
> >         /**/NETIF_F_GSO_LAST =          /* last bit, see GSO_MASK */
> > -               NETIF_F_GSO_TUNNEL_REMCSUM_BIT,
> > +               NETIF_F_GSO_SCTP_BIT,
> >
> >         NETIF_F_FCOE_CRC_BIT,           /* FCoE CRC32 */
> >         NETIF_F_SCTP_CRC_BIT,           /* SCTP checksum offload */
> > @@ -119,6 +120,7 @@ enum {
> >  #define NETIF_F_GSO_UDP_TUNNEL __NETIF_F(GSO_UDP_TUNNEL)
> >  #define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
> >  #define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
> > +#define NETIF_F_GSO_SCTP       __NETIF_F(GSO_SCTP)
> >  #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
> >  #define NETIF_F_HW_VLAN_STAG_RX        __NETIF_F(HW_VLAN_STAG_RX)
> >  #define NETIF_F_HW_VLAN_STAG_TX        __NETIF_F(HW_VLAN_STAG_TX)
> > @@ -144,7 +146,8 @@ enum {
> >
> >  /* List of features with software fallbacks. */
> >  #define NETIF_F_GSO_SOFTWARE   (NETIF_F_TSO | NETIF_F_TSO_ECN | \
> > -                                NETIF_F_TSO6 | NETIF_F_UFO)
> > +                                NETIF_F_TSO6 | NETIF_F_UFO | \
> > +                                NETIF_F_GSO_SCTP)
> >
> >  /* List of IP checksum features. Note that NETIF_F_ HW_CSUM should not be
> >   * set in features when NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM are set--
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 289c2314d76668b8357728382bb33d6828617458..ce14fab858bf96dd0f85aca237350c8d8317756e 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3928,6 +3928,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
> >         BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
> >         BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
> >         BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));
> > +       BUILD_BUG_ON(SKB_GSO_SCTP    != (NETIF_F_GSO_SCTP >> NETIF_F_GSO_SHIFT));
> >
> >         return (features & feature) == feature;
> >  }
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8cba3d852f251c503b193823b71b27aaef3fb3ae..9583284086967c0746de5f553535e25e125714a5 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2680,7 +2680,11 @@ EXPORT_SYMBOL(skb_mac_gso_segment);
> >  static inline bool skb_needs_check(struct sk_buff *skb, bool tx_path)
> >  {
> >         if (tx_path)
> > -               return skb->ip_summed != CHECKSUM_PARTIAL;
> > +               /* FIXME: Why only packets with checksum offloading are
> > +                * supported for GSO?
> > +                */
> > +               return skb->ip_summed != CHECKSUM_PARTIAL &&
> > +                      skb->ip_summed != CHECKSUM_UNNECESSARY;
> >         else
> >                 return skb->ip_summed == CHECKSUM_NONE;
> >  }
> 
> Tom Herbert just got rid of the use of CHECKSUM_UNNECESSARY in the
> transmit path a little while ago.  Please don't reintroduce it.

Can you give me some pointers on that? I cannot find such change.
skb_needs_check() seems to be like that since beginning.

> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 704b69682085dec77f3d0f990aaf0024afd705b9..96f223f8d769d2765fd64348830c76cb222906c8 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -3017,8 +3017,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >                 int size;
> >
> >                 len = head_skb->len - offset;
> > -               if (len > mss)
> > -                       len = mss;
> > +               if (len > mss) {
> > +                       /* FIXME: A define is surely welcomed, but maybe
> > +                        * shinfo->txflags is better for this flag, but
> > +                        * we need to expand it then
> > +                        */
> > +                       if (mss == 1)
> > +                               len = list_skb->len;
> > +                       else
> > +                               len = mss;
> > +               }
> >
> 
> Using 0xFFFF here as a flag with the MSS value would likely be much
> more readable.

Either way it will be replaced by a define/name instead.

> >                 hsize = skb_headlen(head_skb) - offset;
> >                 if (hsize < 0)
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 5c5db6636704daa0c49fc13e84b2c5b282a44ed3..ec1c779bb664d1399d74f2bd7016e30b648ce47d 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -1220,6 +1220,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> >                        SKB_GSO_UDP_TUNNEL |
> >                        SKB_GSO_UDP_TUNNEL_CSUM |
> >                        SKB_GSO_TUNNEL_REMCSUM |
> > +                      SKB_GSO_SCTP |
> >                        0)))
> >                 goto out;
> >
> > diff --git a/net/sctp/offload.c b/net/sctp/offload.c
> > index 7080a6318da7110c1688dd0c5bb240356dbd0cd3..3b96035fa180a4e7195f7b6e7a8be7b97c8f8b26 100644
> > --- a/net/sctp/offload.c
> > +++ b/net/sctp/offload.c
> > @@ -36,8 +36,61 @@
> >  #include <net/sctp/checksum.h>
> >  #include <net/protocol.h>
> >
> > +static __le32 sctp_gso_make_checksum(struct sk_buff *skb)
> > +{
> > +       skb->ip_summed = CHECKSUM_NONE;
> > +       return sctp_compute_cksum(skb, skb_transport_offset(skb));
> > +}
> > +
> 
> I really despise the naming of this bit here.  SCTP does not use a
> checksum.  It uses a CRC.  Please don't call this a checksum as it
> will just make the code really confusing.   I think the name should be
> something like gso_make_crc32c.

Agreed. SCTP code still references it as 'cksum'. I'll change that in
another patch.

> I think we need to address the CRC issues before we can really get
> into segmentation.  Specifically we need to be able to offload SCTP
> and FCoE in software since they both use the CHECKSUM_PARTIAL value
> and then we can start cleaning up more of this mess and move onto
> segmentation.

Hm? The mess on CRC issues here is caused by this patch alone. It's good
as it is today. And a good part of this mess is caused by trying to GSO
without offloading CRC too.

Or you mean that SCTP and FCoE should stop using CHECKSUM_* at all?

> > +static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
> > +                                       netdev_features_t features)
> > +{
> > +       struct sk_buff *segs = ERR_PTR(-EINVAL);
> > +       struct sctphdr *sh;
> > +
> > +       sh = sctp_hdr(skb);
> > +       if (!pskb_may_pull(skb, sizeof(*sh)))
> > +               goto out;
> > +
> > +       __skb_pull(skb, sizeof(*sh));
> > +
> > +       if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
> > +               /* Packet is from an untrusted source, reset gso_segs. */
> > +               int type = skb_shinfo(skb)->gso_type;
> > +
> > +               if (unlikely(type &
> > +                            ~(SKB_GSO_SCTP | SKB_GSO_DODGY |
> > +                              0) ||
> > +                            !(type & (SKB_GSO_SCTP))))
> > +                       goto out;
> > +
> > +               /* This should not happen as no NIC has SCTP GSO
> > +                * offloading, it's always via software and thus we
> > +                * won't send a large packet down the stack.
> > +                */
> > +               WARN_ONCE(1, "SCTP segmentation offloading to NICs is not supported.");
> > +               goto out;
> > +       }
> > +
> 
> So what you are going to end up needing here is some way to tell the
> hardware that you are doing the checksum no matter what.  There is no
> value in you computing a 1's compliment checksum for the payload if
> you aren't going to use it.  What you can probably do is just clear
> the standard checksum flags and then OR in NETIF_F_HW_CSUM if
> NETIF_F_SCTP_CRC is set and that should get skb_segment to skip
> offloading the checksum.

Interesting, ok

> One other bit that will make this more complicated is if we ever get
> around to supporting SCTP in tunnels.  Then we will need to sort out
> how things like remote checksum offload should impact SCTP, and how to
> deal with needing to compute both a CRC and 1's compliment checksum.
> What we would probably need to do is check for encap_hdr_csum and if
> it is set and we are doing SCTP then we would need to clear the
> NETIF_F_HW_CSUM, NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM flags.

Yup. And that includes on storing pointers to where to store each of it.

> > +       segs = skb_segment(skb, features);
> > +       if (IS_ERR(segs))
> > +               goto out;
> > +
> > +       /* All that is left is update SCTP CRC if necessary */
> > +       for (skb = segs; skb; skb = skb->next) {
> > +               if (skb->ip_summed != CHECKSUM_PARTIAL) {
> > +                       sh = sctp_hdr(skb);
> > +                       sh->checksum = sctp_gso_make_checksum(skb);
> > +               }
> > +       }
> > +
> 
> Okay, so it looks like you are doing the right thing here and leaving
> this as CHECKSUM_PARTIAL.

Actually no then. sctp_gso_make_checksum() replaces it:
+static __le32 sctp_gso_make_checksum(struct sk_buff *skb)
+{
+       skb->ip_summed = CHECKSUM_NONE;
+       return sctp_compute_cksum(skb, skb_transport_offset(skb));

Why again would have to leave it as CHECKSUM_PARTIAL? IP header?

> > +out:
> > +       return segs;
> > +}
> > +
> >  static const struct net_offload sctp_offload = {
> >         .callbacks = {
> > +               .gso_segment = sctp_gso_segment,
> >         },
> >  };

Thanks,
Marcelo

--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux