Re: [RFC PATCH net-next 2/5] net: split skb_checksum_help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hello Tom and David,

thank you for the attention.

> From: Tom Herbert
> > 
> > Sent: 23 January 2017 21:00
> ..
> > 
> > skb_checksum_help is specific to the Internet checksum. For instance,
> > CHECKSUM_COMPLETE can _only_ refer to Internet checksum calculation
> > nothing else will work. Checksums and CRCs are very different things
> > with very different processing. They are not interchangeable, have
> > very different properties, and hence it is a mistake to try to shoe
> > horn things so that they use a common infrastructure.
> > 

true, we don't need to test CHECKSUM_COMPLETE on skbs carrying SCTP.
So maybe we can simply replace patches 2/5 and 3/5 with the smaller one at
the bottom of this message.

> > It might make sense to create some CRC helper functions, but last time
> > I checked there are so few users of CRC in skbufs I'm not even sure
> > that would make sense.

This is exactly the cause of issues I see with SCTP. These packets can be
wrongly checksummed using skb_checksum_help, or simply not checksummed at
all; and in both cases, the packet goes out from the NIC with wrong L4
checksum.

For example: there are scenarios, even the trivial one below, where skb
carrying SCTP packets are wrongly checksummed, because the originating
socket read NETIF_F_SCTP_CRC bit in the underlying device features.
Then, after the kernel forwards the skb, the final transmission
happens on another device where CRC offload is not available: this
typically leads to bad checksums on transmitted SCTP packets.



namespace 1 |                   namespace 2
            |
            |                      br0
            |         +------- Linux bridge -------+
            |         |                            |
            |         V                            V
vethA <-----------> vethB                        eth0
            |
            |

when a socket bound to vethA in namespace 1 generates an INIT packet,
it's not checksummed since veth devices have NETIF_F_SCTP_CRC set [1].
Then, after vethB receives the packet in namespace 2, linux bridge
forwards it to eth0, and (depending on eth0 driver code), it will be
transmitted with wrong CRC32c or simply dropped.

On Tue, 2017-01-24 at 16:35 +0000, David Laight wrote:
> 
> I can imagine horrid things happening if someone tries to encapsulate
> SCTP/IP in UDP (or worse UDP/IP in SCTP).
> 
> For UDP in UDP I suspect that CHECKSUM_COMPLETE on an inner UDP packet
> allows the outer checksum be calculated by ignoring the inner packet
> (since it sums to zero).
> This just isn't true if SCTP is involved.
> There are tricks to generate a crc of a longer packet, but they'd only
> work for SCTP in SCTP.
> 
> For non-encapsulated packets it is a different matter.

If we limit the scope to skbs having ip_summed equal to CHECKSUM_PARTIAL,
like it's done in patch 4, we only need checksumming the packet starting
from csum_start to its end, and copy the computed value to csum_offset.
The difficult thing is discriminating skbs that need CRC32c, namely SCTP,
from the rest of the traffic (that will likely be checksummed by
skb_checksum_help).

Currently, the only way to fix wrong CRCs in the scenario above is to
configure tc filter with "csum" action on eth0 egress, to compensate the
missing capability of eth0 driver to deal with SCTP packets having
ip_summed equal to CHECKSUM_PARTIAL [2].

Patch 4 in the series is an attempt to solve the issue, both for
encapsulated and non-encapsulated skbs, calling skb_csum_hwoffload_help()
inside validate_xmit_skb. In order to look for unchecksummed SCTP packets,
I took inspiration from a Linux-4.4 commit (6ae23ad36253 "net: Add driver
helper functions ...) to implement skb_csum_hwoffload_help, then I called
it in validate_xmit_skb() to fix situations that can't be recovered by the
NIC driver (it's the case where NETIF_F_CSUM_MASK bits are all zero).

Today most NICs can provide at least HW offload for Internet Checksum:
that's why I'm a bit doubtful if it's ok to spend extra CPU cycles in
validate_xmit_skb() to ensure correct CRC in some scenarios. 
Since this issue affects some (not all) NICs, maybe it's better to drop
patch 4, or part of it, and provide a fix for individual drivers that
don't currently handle non-checksummed SCTP packets. But to do that, we
need at least patch 1 and the small code below.

------------------- 8< --------------------------
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -200,7 +200,8 @@
  *     accordingly. Note the there is no indication in the skbuff that the
  *     CHECKSUM_PARTIAL refers to an FCOE checksum, a driver that supports
  *     both IP checksum offload and FCOE CRC offload must verify which offload
- *     is configured for a packet presumably by inspecting packet headers.
+ *     is configured for a packet presumably by inspecting packet headers; in
+ *     case, skb_sctp_csum_help is provided to compute CRC on SCTP packets.
  *
  * E. Checksumming on output with GSO.
  *
diff --git a/net/core/dev.c b/net/core/dev.c
index ad5959e..fa9be6d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2580,6 +2580,42 @@ int skb_checksum_help(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(skb_checksum_help);
 
+int skb_sctp_csum_help(struct sk_buff *skb)
+{
+	__le32 crc32c_csum;
+	int ret = 0, offset;
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL)
+		goto out;
+	if (unlikely(skb_is_gso(skb)))
+		goto out;
+	if (skb_has_shared_frag(skb)) {
+		ret = __skb_linearize(skb);
+		if (ret)
+			goto out;
+	}
+
+	offset = skb_checksum_start_offset(skb);
+	crc32c_csum = cpu_to_le32(~__skb_checksum(skb, offset,
+				  skb->len - offset, ~(__u32)0,
+				  sctp_csum_stub));
+
+	offset += offsetof(struct sctphdr, checksum);
+	BUG_ON(offset >= skb_headlen(skb));
+
+	if (skb_cloned(skb) &&
+	    !skb_clone_writable(skb, offset + sizeof(__le32))) {
+		ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+		if (ret)
+			goto out;
+	}
+	*(__le32 *)(skb->data + offset) = crc32c_csum;
+	skb->ip_summed = CHECKSUM_NONE;
+out:
+	return ret;
+}
+EXPORT_SYMBOL(skb_sctp_csum_help);
+
 __be16 skb_network_protocol(struct sk_buff *skb, int *depth)
 {
 	__be16 type = skb->protocol;
-- 
2.7.4
------------------- >8 --------------------------

Thank you again for paying attention to this, and I would appreciate if
you share your opinion.

Notes:

[1] see commit c80fafbbb59e ("veth: sctp: add NETIF_F_SCTP_CRC to device
features")
[2] see commit c008b33f3ef ("net/sched: act_csum: compute crc32c on SCTP
packets").  We could also turn off NETIF_F_SCTP_CRC bit from vethA, but
this would generate useless crc32c calculations if the SCTP server is not
outside the physical node (e.g. it is bound to br0), leading to a
throughput degradation.

--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux