On 04-06-2020 15:48, Willem de Bruijn wrote: > On Thu, Jun 4, 2020 at 5:47 AM Victor Julien <victor@xxxxxxxxxxxx> wrote: >> >> On 02-06-2020 22:18, Willem de Bruijn wrote: >>> On Tue, Jun 2, 2020 at 4:05 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>> >>>> On 02-06-2020 21:38, Willem de Bruijn wrote: >>>>> On Tue, Jun 2, 2020 at 3:22 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>> >>>>>> On 02-06-2020 21:03, Willem de Bruijn wrote: >>>>>>> On Tue, Jun 2, 2020 at 2:31 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>>>> On 02-06-2020 19:37, Willem de Bruijn wrote: >>>>>>>>> On Tue, Jun 2, 2020 at 1:03 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>>>>>> >>>>>>>>>> On 02-06-2020 16:29, Willem de Bruijn wrote: >>>>>>>>>>> On Tue, Jun 2, 2020 at 4:05 AM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Introduce a new flag (TP_STATUS_CSUM_UNNECESSARY) to indicate >>>>>>>>>>>> that the driver has completely validated the checksums in the packet. >>>>>>>>>>>> >>>>>>>>>>>> The TP_STATUS_CSUM_UNNECESSARY flag differs from TP_STATUS_CSUM_VALID >>>>>>>>>>>> in that the new flag will only be set if all the layers are valid, >>>>>>>>>>>> while TP_STATUS_CSUM_VALID is set as well if only the IP layer is valid. >>>>>>>>>>> >>>>>>>>>>> transport, not ip checksum. >>>>>>>>>> >>>>>>>>>> Allow me a n00b question: what does transport refer to here? Things like >>>>>>>>>> ethernet? It isn't clear to me from the doc. >>>>>>>>> >>>>>>>>> The TCP/UDP/.. transport protocol checksum. >>>>>>>> >>>>>>>> Hmm that is what I thought originally, but then it didn't seem to work. >>>>>>>> Hence my patch. >>>>>>>> >>>>>>>> However I just redid my testing. I took the example tpacketv3 program >>>>>>>> and added the status flag checks to the 'display()' func: >>>>>>>> >>>>>>>> if (ppd->tp_status & TP_STATUS_CSUM_VALID) { >>>>>>>> printf("TP_STATUS_CSUM_VALID, "); >>>>>>>> } >>>>>>>> if (ppd->tp_status & (1<<8)) { >>>>>>>> printf("TP_STATUS_CSUM_UNNECESSARY, "); >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> Then using scapy sent some packets in 2 variants: >>>>>>>> - default (good csums) >>>>>>>> - deliberately bad csums >>>>>>>> (then also added a few things like ip6 over ip) >>>>>>>> >>>>>>>> >>>>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(), >>>>>>>> iface="enp1s0") // good csums >>>>>>>> >>>>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(chksum=1), >>>>>>>> iface="enp1s0") //bad tcp >>>>>>> >>>>>>> Is this a test between two machines? What is the device driver of the >>>>>>> machine receiving and printing the packet? It would be helpful to know >>>>>>> whether this uses CHECKSUM_COMPLETE or CHECKSUM_UNNECESSARY. >>>>>> >>>>>> Yes 2 machines, or actually 2 machines and a VM. The receiving Linux >>>>>> sits in a kvm vm with network pass through and uses the virtio driver >>>>>> (host uses e1000e). Based on a quick 'git grep CHECKSUM_UNNECESSARY' >>>>>> virtio seems to support that. >>>>>> >>>>>> I've done some more tests. In a pcap replay that I know contains packet >>>>>> with bad TCP csums (but good IP csums for those pkts), to a physical >>>>>> host running Ubuntu Linux kernel 5.3: >>>>>> >>>>>> - receiver uses nfp (netronome) driver: TP_STATUS_CSUM_VALID set for >>>>>> every packet, including the bad TCP ones >>>>>> - receiver uses ixgbe driver: TP_STATUS_CSUM_VALID not set for the bad >>>>>> packets. >>>>> >>>>> Great. Thanks a lot for running all these experiments. >>>>> >>>>> We might have to drop the TP_STATUS_CSUM_VALID with CHECKSUM_COMPLETE >>>>> unless skb->csum_valid. >>>>> >>>>> For packets with multiple transport layer checksums, >>>>> CHECKSUM_UNNECESSARY should mean that all have been verified. >>>>> >>>>> I believe that in the case of multiple transport headers, csum_valid >>>>> similarly ensures all checksums up to csum_start are valid. Will need >>>>> to double check. >>>>> >>>>> If so, there probably is no need for a separate new TP_STATUS. >>>>> TP_STATUS_CSUM_VALID is reported only when all checksums are valid. >>>> >>>> So if I understand you correctly the key may be in the call to >>>> `skb_csum_unnecessary`: >>>> >>>> That reads: >>>> >>>> static inline int skb_csum_unnecessary(const struct sk_buff *skb) >>>> { >>>> return ((skb->ip_summed == CHECKSUM_UNNECESSARY) || >>>> skb->csum_valid || >>>> (skb->ip_summed == CHECKSUM_PARTIAL && >>>> skb_checksum_start_offset(skb) >= 0)); >>>> } >>>> >>>> But really only the first 2 conditions are reachable >>> >>> .. from this codepath. That function is called in other codepaths as well. >>> >>>> , as we already know >>>> skb->ip_summed is not CHECKSUM_PARTIAL when we call it. >>>> >>>> So our unmodified check is: >>>> >>>> else if (skb->pkt_type != PACKET_OUTGOING && >>>> (skb->ip_summed == CHECKSUM_COMPLETE || >>>> skb->ip_summed == CHECKSUM_UNNECESSARY || >>>> skb->csum_valid)) >>>> >>>> Should this become something like: >>>> >>>> else if (skb->pkt_type != PACKET_OUTGOING && >>>> (skb->ip_summed == CHECKSUM_COMPLETE && >>>> skb->csum_valid) || >>>> skb->ip_summed == CHECKSUM_UNNECESSARY) >>>> >>>> Is this what you had in mind? >>> >>> I don't suggest modifying skb_csum_unnecessary probably. Certainly not >>> until I've looked at all other callers of it. >>> >>> But in case of packet sockets, yes, adding that csum_valid check is my >>> first rough approximation. >>> >>> That said, first let's give others more familiar with >>> TP_STATUS_CSUM_VALID some time to comment. >>> >> >> I did some more experiments, on real hw this time. I made the following >> change to 5.7.0 (wasn't brave enough to remote upgrade a box to netnext): >> >> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c >> index 29bd405adbbd..3afb1913837a 100644 >> --- a/net/packet/af_packet.c >> +++ b/net/packet/af_packet.c >> @@ -2216,8 +2216,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct >> net_device *dev, >> if (skb->ip_summed == CHECKSUM_PARTIAL) >> status |= TP_STATUS_CSUMNOTREADY; >> else if (skb->pkt_type != PACKET_OUTGOING && >> - (skb->ip_summed == CHECKSUM_COMPLETE || >> - skb_csum_unnecessary(skb))) >> + ((skb->ip_summed == CHECKSUM_COMPLETE && >> skb->csum_valid) || >> + skb->ip_summed == CHECKSUM_UNNECESSARY)) >> status |= TP_STATUS_CSUM_VALID; >> >> if (snaplen > res) >> >> With this change it seems the TP_STATUS_CSUM_VALID flag is *never* set >> for the nfp driver. > > I was mistaken. skb->csum_valid only signals whether the skb->csum > field is initialized. As of commit 573e8fca255a ("net: skb_gro_checksum_* > functions") skb->csum_valid it is always set if CHECKSUM_COMPLETE. > This does not imply that the checksum field in the header is correct. > > The checksum field may get checked against the known checksum of > the payload in skb->csum before __netif_receive_skb_core and thus > before packet sockets during GRO when that is enabled. But not > always. Not if the packet gets flushed, for instance, see tcp4_gro_receive. > > Commit 662880f44203 ("net: Allow GRO to use and set levels of checksum > unnecessary") indicates that the original assumption in this patch > that CHECKSUM_UNNECESSARY implies all checksums being valid does not > necessarily hold. Drivers are expected to set up skb->csum_level when > they have verified more than just the inner transport header. > I think I found another case in the kernel that does seem to assume we can rely on skb_csum_unnecessary. 496e4ae7dc94 ("netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flag") seems to try to do what I'm after for nfqueue, but with an inverted flag. I assume that if the flag is not set (and neither NFQA_SKB_CSUMNOTREADY) it means we should be able to infer that the csums are valid. Otherwise, what would be the point of the flag. The logic seems to come down to: csum_verify = !skb_csum_unnecessary(entskb); (for ip_summed != CHECKSUM_PARTIAL) The it's passed to userspace: if (packet->ip_summed == CHECKSUM_PARTIAL) flags = NFQA_SKB_CSUMNOTREADY; else if (csum_verify) flags = NFQA_SKB_CSUM_NOTVERIFIED; So according to this code, if skb_csum_unnecessary returns false the csums is not verified, implying that it is when skb_csum_unnecessary returns true. I have no idea if this can be mapped directly to af-packet like this. Despite reading 77cffe23c1f8 ("net: Clarification of CHECKSUM_UNNECESSARY") multiple times I'm still not sure. If we get a straightforward IPv4/TCP or IPv6/UDP does it mean that if CHECKSUM_UNNECESSARY is set we can trust the csums of those layers are validated? If properly documented that would cover all the use cases I initially care about, although it would of course be nice if the kernel already knows the VXLAN encapsulated traffic was also verified that we can pass this on as well. -- --------------------------------------------- Victor Julien http://www.inliniac.net/ PGP: http://www.inliniac.net/victorjulien.asc ---------------------------------------------