On Thu, Feb 2, 2017 at 7:07 AM, Davide Caratti <dcaratti@xxxxxxxxxx> wrote: > hello Tom and David, > > thank you for the attention. > >> From: Tom Herbert >> > >> > Sent: 23 January 2017 21:00 >> .. >> > >> > skb_checksum_help is specific to the Internet checksum. For instance, >> > CHECKSUM_COMPLETE can _only_ refer to Internet checksum calculation >> > nothing else will work. Checksums and CRCs are very different things >> > with very different processing. They are not interchangeable, have >> > very different properties, and hence it is a mistake to try to shoe >> > horn things so that they use a common infrastructure. >> > > > true, we don't need to test CHECKSUM_COMPLETE on skbs carrying SCTP. > So maybe we can simply replace patches 2/5 and 3/5 with the smaller one at > the bottom of this message. > >> > It might make sense to create some CRC helper functions, but last time >> > I checked there are so few users of CRC in skbufs I'm not even sure >> > that would make sense. > > This is exactly the cause of issues I see with SCTP. These packets can be > wrongly checksummed using skb_checksum_help, or simply not checksummed at > all; and in both cases, the packet goes out from the NIC with wrong L4 > checksum. > Okay, makes sense. Please consider doing the following: - Add a bit to skbuf called something like "csum_not_inet". When ip_summed == CHECKSUM_PARTIAL and this bit is set that means we are dealing with something other than an Internet checksum. - At the top of skb_checksum_help (or maybe before the point where the inet specific checksum start begins do something like: if (unlikely(skb->csum_not_inet)) return skb_checksum_help_not_inet(...); The rest of skb_checksum_help should remained unchanged. - Add a description of the new bit and how skb_checksum_help can work to the comments for CHECKSUM_PARTIAL in skbuff.h - Add FCOE to the list of protocol that can set CHECKSUM_UNNECESSARY for a CRC/csum - Add a note to CHECKSUM_COMPLETE section that it can only refer to an Internet checksum Thanks, Tom > For example: there are scenarios, even the trivial one below, where skb > carrying SCTP packets are wrongly checksummed, because the originating > socket read NETIF_F_SCTP_CRC bit in the underlying device features. > Then, after the kernel forwards the skb, the final transmission > happens on another device where CRC offload is not available: this > typically leads to bad checksums on transmitted SCTP packets. > > > > namespace 1 | namespace 2 > | > | br0 > | +------- Linux bridge -------+ > | | | > | V V > vethA <-----------> vethB eth0 > | > | > > when a socket bound to vethA in namespace 1 generates an INIT packet, > it's not checksummed since veth devices have NETIF_F_SCTP_CRC set [1]. > Then, after vethB receives the packet in namespace 2, linux bridge > forwards it to eth0, and (depending on eth0 driver code), it will be > transmitted with wrong CRC32c or simply dropped. > > On Tue, 2017-01-24 at 16:35 +0000, David Laight wrote: >> >> I can imagine horrid things happening if someone tries to encapsulate >> SCTP/IP in UDP (or worse UDP/IP in SCTP). >> >> For UDP in UDP I suspect that CHECKSUM_COMPLETE on an inner UDP packet >> allows the outer checksum be calculated by ignoring the inner packet >> (since it sums to zero). >> This just isn't true if SCTP is involved. >> There are tricks to generate a crc of a longer packet, but they'd only >> work for SCTP in SCTP. >> >> For non-encapsulated packets it is a different matter. > > If we limit the scope to skbs having ip_summed equal to CHECKSUM_PARTIAL, > like it's done in patch 4, we only need checksumming the packet starting > from csum_start to its end, and copy the computed value to csum_offset. > The difficult thing is discriminating skbs that need CRC32c, namely SCTP, > from the rest of the traffic (that will likely be checksummed by > skb_checksum_help). > > Currently, the only way to fix wrong CRCs in the scenario above is to > configure tc filter with "csum" action on eth0 egress, to compensate the > missing capability of eth0 driver to deal with SCTP packets having > ip_summed equal to CHECKSUM_PARTIAL [2]. > > Patch 4 in the series is an attempt to solve the issue, both for > encapsulated and non-encapsulated skbs, calling skb_csum_hwoffload_help() > inside validate_xmit_skb. In order to look for unchecksummed SCTP packets, > I took inspiration from a Linux-4.4 commit (6ae23ad36253 "net: Add driver > helper functions ...) to implement skb_csum_hwoffload_help, then I called > it in validate_xmit_skb() to fix situations that can't be recovered by the > NIC driver (it's the case where NETIF_F_CSUM_MASK bits are all zero). > > Today most NICs can provide at least HW offload for Internet Checksum: > that's why I'm a bit doubtful if it's ok to spend extra CPU cycles in > validate_xmit_skb() to ensure correct CRC in some scenarios. > Since this issue affects some (not all) NICs, maybe it's better to drop > patch 4, or part of it, and provide a fix for individual drivers that > don't currently handle non-checksummed SCTP packets. But to do that, we > need at least patch 1 and the small code below. > > ------------------- 8< -------------------------- > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -200,7 +200,8 @@ > * accordingly. Note the there is no indication in the skbuff that the > * CHECKSUM_PARTIAL refers to an FCOE checksum, a driver that supports > * both IP checksum offload and FCOE CRC offload must verify which offload > - * is configured for a packet presumably by inspecting packet headers. > + * is configured for a packet presumably by inspecting packet headers; in > + * case, skb_sctp_csum_help is provided to compute CRC on SCTP packets. > * > * E. Checksumming on output with GSO. > * > diff --git a/net/core/dev.c b/net/core/dev.c > index ad5959e..fa9be6d 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2580,6 +2580,42 @@ int skb_checksum_help(struct sk_buff *skb) > } > EXPORT_SYMBOL(skb_checksum_help); > > +int skb_sctp_csum_help(struct sk_buff *skb) > +{ > + __le32 crc32c_csum; > + int ret = 0, offset; > + > + if (skb->ip_summed != CHECKSUM_PARTIAL) > + goto out; > + if (unlikely(skb_is_gso(skb))) > + goto out; > + if (skb_has_shared_frag(skb)) { > + ret = __skb_linearize(skb); > + if (ret) > + goto out; > + } > + > + offset = skb_checksum_start_offset(skb); > + crc32c_csum = cpu_to_le32(~__skb_checksum(skb, offset, > + skb->len - offset, ~(__u32)0, > + sctp_csum_stub)); > + > + offset += offsetof(struct sctphdr, checksum); > + BUG_ON(offset >= skb_headlen(skb)); > + > + if (skb_cloned(skb) && > + !skb_clone_writable(skb, offset + sizeof(__le32))) { > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > + if (ret) > + goto out; > + } > + *(__le32 *)(skb->data + offset) = crc32c_csum; > + skb->ip_summed = CHECKSUM_NONE; > +out: > + return ret; > +} > +EXPORT_SYMBOL(skb_sctp_csum_help); > + > __be16 skb_network_protocol(struct sk_buff *skb, int *depth) > { > __be16 type = skb->protocol; > -- > 2.7.4 > ------------------- >8 -------------------------- > > Thank you again for paying attention to this, and I would appreciate if > you share your opinion. > > Notes: > > [1] see commit c80fafbbb59e ("veth: sctp: add NETIF_F_SCTP_CRC to device > features") > [2] see commit c008b33f3ef ("net/sched: act_csum: compute crc32c on SCTP > packets"). We could also turn off NETIF_F_SCTP_CRC bit from vethA, but > this would generate useless crc32c calculations if the SCTP server is not > outside the physical node (e.g. it is bound to br0), leading to a > throughput degradation. > -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html