On 02-06-2020 22:18, Willem de Bruijn wrote: > On Tue, Jun 2, 2020 at 4:05 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >> >> On 02-06-2020 21:38, Willem de Bruijn wrote: >>> On Tue, Jun 2, 2020 at 3:22 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>> >>>> On 02-06-2020 21:03, Willem de Bruijn wrote: >>>>> On Tue, Jun 2, 2020 at 2:31 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>> On 02-06-2020 19:37, Willem de Bruijn wrote: >>>>>>> On Tue, Jun 2, 2020 at 1:03 PM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>>>> >>>>>>>> On 02-06-2020 16:29, Willem de Bruijn wrote: >>>>>>>>> On Tue, Jun 2, 2020 at 4:05 AM Victor Julien <victor@xxxxxxxxxxxx> wrote: >>>>>>>>>> >>>>>>>>>> Introduce a new flag (TP_STATUS_CSUM_UNNECESSARY) to indicate >>>>>>>>>> that the driver has completely validated the checksums in the packet. >>>>>>>>>> >>>>>>>>>> The TP_STATUS_CSUM_UNNECESSARY flag differs from TP_STATUS_CSUM_VALID >>>>>>>>>> in that the new flag will only be set if all the layers are valid, >>>>>>>>>> while TP_STATUS_CSUM_VALID is set as well if only the IP layer is valid. >>>>>>>>> >>>>>>>>> transport, not ip checksum. >>>>>>>> >>>>>>>> Allow me a n00b question: what does transport refer to here? Things like >>>>>>>> ethernet? It isn't clear to me from the doc. >>>>>>> >>>>>>> The TCP/UDP/.. transport protocol checksum. >>>>>> >>>>>> Hmm that is what I thought originally, but then it didn't seem to work. >>>>>> Hence my patch. >>>>>> >>>>>> However I just redid my testing. I took the example tpacketv3 program >>>>>> and added the status flag checks to the 'display()' func: >>>>>> >>>>>> if (ppd->tp_status & TP_STATUS_CSUM_VALID) { >>>>>> printf("TP_STATUS_CSUM_VALID, "); >>>>>> } >>>>>> if (ppd->tp_status & (1<<8)) { >>>>>> printf("TP_STATUS_CSUM_UNNECESSARY, "); >>>>>> >>>>>> } >>>>>> >>>>>> Then using scapy sent some packets in 2 variants: >>>>>> - default (good csums) >>>>>> - deliberately bad csums >>>>>> (then also added a few things like ip6 over ip) >>>>>> >>>>>> >>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(), >>>>>> iface="enp1s0") // good csums >>>>>> >>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(chksum=1), >>>>>> iface="enp1s0") //bad tcp >>>>> >>>>> Is this a test between two machines? What is the device driver of the >>>>> machine receiving and printing the packet? It would be helpful to know >>>>> whether this uses CHECKSUM_COMPLETE or CHECKSUM_UNNECESSARY. >>>> >>>> Yes 2 machines, or actually 2 machines and a VM. The receiving Linux >>>> sits in a kvm vm with network pass through and uses the virtio driver >>>> (host uses e1000e). Based on a quick 'git grep CHECKSUM_UNNECESSARY' >>>> virtio seems to support that. >>>> >>>> I've done some more tests. In a pcap replay that I know contains packet >>>> with bad TCP csums (but good IP csums for those pkts), to a physical >>>> host running Ubuntu Linux kernel 5.3: >>>> >>>> - receiver uses nfp (netronome) driver: TP_STATUS_CSUM_VALID set for >>>> every packet, including the bad TCP ones >>>> - receiver uses ixgbe driver: TP_STATUS_CSUM_VALID not set for the bad >>>> packets. >>> >>> Great. Thanks a lot for running all these experiments. >>> >>> We might have to drop the TP_STATUS_CSUM_VALID with CHECKSUM_COMPLETE >>> unless skb->csum_valid. >>> >>> For packets with multiple transport layer checksums, >>> CHECKSUM_UNNECESSARY should mean that all have been verified. >>> >>> I believe that in the case of multiple transport headers, csum_valid >>> similarly ensures all checksums up to csum_start are valid. Will need >>> to double check. >>> >>> If so, there probably is no need for a separate new TP_STATUS. >>> TP_STATUS_CSUM_VALID is reported only when all checksums are valid. >> >> So if I understand you correctly the key may be in the call to >> `skb_csum_unnecessary`: >> >> That reads: >> >> static inline int skb_csum_unnecessary(const struct sk_buff *skb) >> { >> return ((skb->ip_summed == CHECKSUM_UNNECESSARY) || >> skb->csum_valid || >> (skb->ip_summed == CHECKSUM_PARTIAL && >> skb_checksum_start_offset(skb) >= 0)); >> } >> >> But really only the first 2 conditions are reachable > > .. from this codepath. That function is called in other codepaths as well. > >> , as we already know >> skb->ip_summed is not CHECKSUM_PARTIAL when we call it. >> >> So our unmodified check is: >> >> else if (skb->pkt_type != PACKET_OUTGOING && >> (skb->ip_summed == CHECKSUM_COMPLETE || >> skb->ip_summed == CHECKSUM_UNNECESSARY || >> skb->csum_valid)) >> >> Should this become something like: >> >> else if (skb->pkt_type != PACKET_OUTGOING && >> (skb->ip_summed == CHECKSUM_COMPLETE && >> skb->csum_valid) || >> skb->ip_summed == CHECKSUM_UNNECESSARY) >> >> Is this what you had in mind? > > I don't suggest modifying skb_csum_unnecessary probably. Certainly not > until I've looked at all other callers of it. > > But in case of packet sockets, yes, adding that csum_valid check is my > first rough approximation. > > That said, first let's give others more familiar with > TP_STATUS_CSUM_VALID some time to comment. > I did some more experiments, on real hw this time. I made the following change to 5.7.0 (wasn't brave enough to remote upgrade a box to netnext): diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 29bd405adbbd..3afb1913837a 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2216,8 +2216,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (skb->ip_summed == CHECKSUM_PARTIAL) status |= TP_STATUS_CSUMNOTREADY; else if (skb->pkt_type != PACKET_OUTGOING && - (skb->ip_summed == CHECKSUM_COMPLETE || - skb_csum_unnecessary(skb))) + ((skb->ip_summed == CHECKSUM_COMPLETE && skb->csum_valid) || + skb->ip_summed == CHECKSUM_UNNECESSARY)) status |= TP_STATUS_CSUM_VALID; if (snaplen > res) With this change it seems the TP_STATUS_CSUM_VALID flag is *never* set for the nfp driver. The capture on the ixgbe driver looks unchanged, but that seems to make sense as I think it uses CHECKSUM_UNNECESSARY and not CHECKSUM_COMPLETE. For the nfp driver I have these settings: root@z820:~# ethtool -k ens3np0|grep rx rx-checksumming: on rx-vlan-offload: off [fixed] rx-vlan-filter: off [fixed] rx-fcs: off [fixed] rx-all: off [fixed] rx-vlan-stag-hw-parse: off [fixed] rx-vlan-stag-filter: off [fixed] rx-udp_tunnel-port-offload: on tls-hw-rx-offload: off [fixed] rx-gro-hw: off [fixed] rx-gro-list: off Since the driver suggests 'rx-checksumming' is enabled, I'm wondering how we can actually get a result from it? Jakub, do you know? -- --------------------------------------------- Victor Julien http://www.inliniac.net/ PGP: http://www.inliniac.net/victorjulien.asc ---------------------------------------------