Re: [RFC PATCH net-next 2/5] net: split skb_checksum_help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 2, 2017 at 7:07 AM, Davide Caratti <dcaratti@xxxxxxxxxx> wrote:
> hello Tom and David,
>
> thank you for the attention.
>
>> From: Tom Herbert
>> >
>> > Sent: 23 January 2017 21:00
>> ..
>> >
>> > skb_checksum_help is specific to the Internet checksum. For instance,
>> > CHECKSUM_COMPLETE can _only_ refer to Internet checksum calculation
>> > nothing else will work. Checksums and CRCs are very different things
>> > with very different processing. They are not interchangeable, have
>> > very different properties, and hence it is a mistake to try to shoe
>> > horn things so that they use a common infrastructure.
>> >
>
> true, we don't need to test CHECKSUM_COMPLETE on skbs carrying SCTP.
> So maybe we can simply replace patches 2/5 and 3/5 with the smaller one at
> the bottom of this message.
>
>> > It might make sense to create some CRC helper functions, but last time
>> > I checked there are so few users of CRC in skbufs I'm not even sure
>> > that would make sense.
>
> This is exactly the cause of issues I see with SCTP. These packets can be
> wrongly checksummed using skb_checksum_help, or simply not checksummed at
> all; and in both cases, the packet goes out from the NIC with wrong L4
> checksum.
>
Okay, makes sense. Please consider doing the following:

- Add a bit to skbuf called something like "csum_not_inet". When
ip_summed == CHECKSUM_PARTIAL and this bit is set that means we are
dealing with something other than an Internet checksum.
- At the top of skb_checksum_help (or maybe before the point where the
inet specific checksum start begins do something like:

   if (unlikely(skb->csum_not_inet))
       return skb_checksum_help_not_inet(...);

   The rest of skb_checksum_help should remained unchanged.

- Add a description of the new bit and how skb_checksum_help can work
to the comments for CHECKSUM_PARTIAL in skbuff.h
- Add FCOE to the list of protocol that can set CHECKSUM_UNNECESSARY
for a CRC/csum
- Add a note to CHECKSUM_COMPLETE section that it can only refer to an
Internet checksum

Thanks,
Tom

> For example: there are scenarios, even the trivial one below, where skb
> carrying SCTP packets are wrongly checksummed, because the originating
> socket read NETIF_F_SCTP_CRC bit in the underlying device features.
> Then, after the kernel forwards the skb, the final transmission
> happens on another device where CRC offload is not available: this
> typically leads to bad checksums on transmitted SCTP packets.
>
>
>
> namespace 1 |                   namespace 2
>             |
>             |                      br0
>             |         +------- Linux bridge -------+
>             |         |                            |
>             |         V                            V
> vethA <-----------> vethB                        eth0
>             |
>             |
>
> when a socket bound to vethA in namespace 1 generates an INIT packet,
> it's not checksummed since veth devices have NETIF_F_SCTP_CRC set [1].
> Then, after vethB receives the packet in namespace 2, linux bridge
> forwards it to eth0, and (depending on eth0 driver code), it will be
> transmitted with wrong CRC32c or simply dropped.
>
> On Tue, 2017-01-24 at 16:35 +0000, David Laight wrote:
>>
>> I can imagine horrid things happening if someone tries to encapsulate
>> SCTP/IP in UDP (or worse UDP/IP in SCTP).
>>
>> For UDP in UDP I suspect that CHECKSUM_COMPLETE on an inner UDP packet
>> allows the outer checksum be calculated by ignoring the inner packet
>> (since it sums to zero).
>> This just isn't true if SCTP is involved.
>> There are tricks to generate a crc of a longer packet, but they'd only
>> work for SCTP in SCTP.
>>
>> For non-encapsulated packets it is a different matter.
>
> If we limit the scope to skbs having ip_summed equal to CHECKSUM_PARTIAL,
> like it's done in patch 4, we only need checksumming the packet starting
> from csum_start to its end, and copy the computed value to csum_offset.
> The difficult thing is discriminating skbs that need CRC32c, namely SCTP,
> from the rest of the traffic (that will likely be checksummed by
> skb_checksum_help).
>
> Currently, the only way to fix wrong CRCs in the scenario above is to
> configure tc filter with "csum" action on eth0 egress, to compensate the
> missing capability of eth0 driver to deal with SCTP packets having
> ip_summed equal to CHECKSUM_PARTIAL [2].
>
> Patch 4 in the series is an attempt to solve the issue, both for
> encapsulated and non-encapsulated skbs, calling skb_csum_hwoffload_help()
> inside validate_xmit_skb. In order to look for unchecksummed SCTP packets,
> I took inspiration from a Linux-4.4 commit (6ae23ad36253 "net: Add driver
> helper functions ...) to implement skb_csum_hwoffload_help, then I called
> it in validate_xmit_skb() to fix situations that can't be recovered by the
> NIC driver (it's the case where NETIF_F_CSUM_MASK bits are all zero).
>
> Today most NICs can provide at least HW offload for Internet Checksum:
> that's why I'm a bit doubtful if it's ok to spend extra CPU cycles in
> validate_xmit_skb() to ensure correct CRC in some scenarios.
> Since this issue affects some (not all) NICs, maybe it's better to drop
> patch 4, or part of it, and provide a fix for individual drivers that
> don't currently handle non-checksummed SCTP packets. But to do that, we
> need at least patch 1 and the small code below.
>
> ------------------- 8< --------------------------
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -200,7 +200,8 @@
>   *     accordingly. Note the there is no indication in the skbuff that the
>   *     CHECKSUM_PARTIAL refers to an FCOE checksum, a driver that supports
>   *     both IP checksum offload and FCOE CRC offload must verify which offload
> - *     is configured for a packet presumably by inspecting packet headers.
> + *     is configured for a packet presumably by inspecting packet headers; in
> + *     case, skb_sctp_csum_help is provided to compute CRC on SCTP packets.
>   *
>   * E. Checksumming on output with GSO.
>   *
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ad5959e..fa9be6d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2580,6 +2580,42 @@ int skb_checksum_help(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(skb_checksum_help);
>
> +int skb_sctp_csum_help(struct sk_buff *skb)
> +{
> +       __le32 crc32c_csum;
> +       int ret = 0, offset;
> +
> +       if (skb->ip_summed != CHECKSUM_PARTIAL)
> +               goto out;
> +       if (unlikely(skb_is_gso(skb)))
> +               goto out;
> +       if (skb_has_shared_frag(skb)) {
> +               ret = __skb_linearize(skb);
> +               if (ret)
> +                       goto out;
> +       }
> +
> +       offset = skb_checksum_start_offset(skb);
> +       crc32c_csum = cpu_to_le32(~__skb_checksum(skb, offset,
> +                                 skb->len - offset, ~(__u32)0,
> +                                 sctp_csum_stub));
> +
> +       offset += offsetof(struct sctphdr, checksum);
> +       BUG_ON(offset >= skb_headlen(skb));
> +
> +       if (skb_cloned(skb) &&
> +           !skb_clone_writable(skb, offset + sizeof(__le32))) {
> +               ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
> +               if (ret)
> +                       goto out;
> +       }
> +       *(__le32 *)(skb->data + offset) = crc32c_csum;
> +       skb->ip_summed = CHECKSUM_NONE;
> +out:
> +       return ret;
> +}
> +EXPORT_SYMBOL(skb_sctp_csum_help);
> +
>  __be16 skb_network_protocol(struct sk_buff *skb, int *depth)
>  {
>         __be16 type = skb->protocol;
> --
> 2.7.4
> ------------------- >8 --------------------------
>
> Thank you again for paying attention to this, and I would appreciate if
> you share your opinion.
>
> Notes:
>
> [1] see commit c80fafbbb59e ("veth: sctp: add NETIF_F_SCTP_CRC to device
> features")
> [2] see commit c008b33f3ef ("net/sched: act_csum: compute crc32c on SCTP
> packets").  We could also turn off NETIF_F_SCTP_CRC bit from vethA, but
> this would generate useless crc32c calculations if the SCTP server is not
> outside the physical node (e.g. it is bound to br0), leading to a
> throughput degradation.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux