Martin KaFai Lau <kafai@xxxxxx> writes:

> On Fri, Feb 18, 2022 at 06:50:29PM +0100, Toke Høiland-Jørgensen wrote:
> [ .. ]
>
>> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>> new file mode 100644
>> index 000000000000..af3cffccc794
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>> @@ -0,0 +1,85 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <vmlinux.h>
>> +#include <bpf/bpf_helpers.h>
>> +
>> +#define ETH_ALEN 6
>> +const volatile int ifindex_out;
>> +const volatile int ifindex_in;
>> +const volatile __u8 expect_dst[ETH_ALEN];
>> +volatile int pkts_seen_xdp = 0;
>> +volatile int pkts_seen_tc = 0;
>> +volatile int retcode = XDP_REDIRECT;
>> +
>> +SEC("xdp")
>> +int xdp_redirect(struct xdp_md *xdp)
>> +{
>> +	__u32 *metadata = (void *)(long)xdp->data_meta;
>> +	void *data = (void *)(long)xdp->data;
>> +	int ret = retcode;
>> +
>> +	if (xdp->ingress_ifindex != ifindex_in)
>> +		return XDP_ABORTED;
>> +
>> +	if (metadata + 1 > data)
>> +		return XDP_ABORTED;
>> +
>> +	if (*metadata != 0x42)
>> +		return XDP_ABORTED;
>> +
>> +	if (bpf_xdp_adjust_meta(xdp, 4))
>> +		return XDP_ABORTED;
>> +
>> +	if (retcode > XDP_PASS)
>> +		retcode--;
>> +
>> +	if (ret == XDP_REDIRECT)
>> +		return bpf_redirect(ifindex_out, 0);
>> +
>> +	return ret;
>> +}
>> +
>> +static bool check_pkt(void *data, void *data_end)
>> +{
>> +	struct ethhdr *eth = data;
>> +	struct ipv6hdr *iph = (void *)(eth + 1);
>> +	struct udphdr *udp = (void *)(iph + 1);
>> +	__u8 *payload = (void *)(udp + 1);
>> +
>> +	if (payload + 1 > data_end)
>> +		return false;
>> +
>> +	if (iph->nexthdr != IPPROTO_UDP || *payload != 0x42)
>> +		return false;
>> +
>> +	/* reset the payload so the same packet doesn't get counted twice when
>> +	 * it cycles back through the kernel path and out the dst veth
>> +	 */
>> +	*payload = 0;
>> +	return true;
>> +}
>> +
>> +SEC("xdp")
>> +int xdp_count_pkts(struct xdp_md *xdp)
>> +{
>> +	void *data = (void *)(long)xdp->data;
>> +	void *data_end = (void *)(long)xdp->data_end;
>> +
>> +	if (check_pkt(data, data_end))
>> +		pkts_seen_xdp++;
>> +
>> +	return XDP_PASS;
> If it is XDP_DROP here (@veth-ingress), the packet will be put back to
> the page pool with zero-ed payload and that will be closer to the real
> scenario when xmit-ing out of a real NIC instead of veth? Just to
> ensure I understand the recycling and pkt rewrite description in patch
> 2 correctly because it seems the test always getting a data init-ed
> page.

Ah, yeah, good point, we do end up releasing all the pages on the other
end of the veth, so they don't get recycled. I'll change the test to
XDP_DROP the packets, and change the xdp_redirect() function to set the
payload explicitly instead of expecting it to come from userspace.
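I.e., something like the below (completely untested sketch; it reuses
the globals and check_pkt() from the program quoted above, and just
hard-codes the same 0x42 marker byte for the payload):

SEC("xdp")
int xdp_redirect(struct xdp_md *xdp)
{
	__u32 *metadata = (void *)(long)xdp->data_meta;
	void *data_end = (void *)(long)xdp->data_end;
	void *data = (void *)(long)xdp->data;
	struct ethhdr *eth = data;
	struct ipv6hdr *iph = (void *)(eth + 1);
	struct udphdr *udp = (void *)(iph + 1);
	__u8 *payload = (void *)(udp + 1);
	int ret = retcode;

	if (xdp->ingress_ifindex != ifindex_in)
		return XDP_ABORTED;

	if (payload + 1 > data_end)
		return XDP_ABORTED;

	if (metadata + 1 > data)
		return XDP_ABORTED;

	if (*metadata != 0x42)
		return XDP_ABORTED;

	/* write the payload marker here, before bpf_xdp_adjust_meta()
	 * invalidates the packet pointers, instead of relying on
	 * userspace to initialise the page data
	 */
	*payload = 0x42;

	if (bpf_xdp_adjust_meta(xdp, 4))
		return XDP_ABORTED;

	if (retcode > XDP_PASS)
		retcode--;

	if (ret == XDP_REDIRECT)
		return bpf_redirect(ifindex_out, 0);

	return ret;
}

SEC("xdp")
int xdp_count_pkts(struct xdp_md *xdp)
{
	void *data = (void *)(long)xdp->data;
	void *data_end = (void *)(long)xdp->data_end;

	if (check_pkt(data, data_end))
		pkts_seen_xdp++;

	/* drop instead of passing, so the page is released back to the
	 * page pool and gets recycled on the next iteration
	 */
	return XDP_DROP;
}

With that, the payload reset in check_pkt() probably becomes redundant
since the packet no longer cycles back through the kernel path, but it
shouldn't hurt either.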
> Regarding to the tcp trafficgen in the xdptool repo,
> do you have thoughts on how to handle retransmit (e.g. after seeing
> SACK or dupack)? Is it possible for the regular xdp receiver (i.e.
> not test_run) to directly retransmit it after seeing SACK if it knows
> the tcp payload?

Hmm, that's an interesting idea. Yeah, I think it should be possible for
the XDP program on the interface to reply with the missing packet
directly: it can just resize the ACK coming in, rewrite the TCP header,
fill it out with the payload, and return XDP_TX. However, this will
obviously only work if every SACK can be satisfied by a single
retransmission, which I don't think we can assume in the general case?
So I think some more state needs to be kept; still, such direct
hole-filling replies could be a nice optimisation to have on top in any
case, so thank you for the idea!

> An off topic question, I expect the test_run approach is faster.
> Mostly curious, do you have a rough guess on what may be the perf
> difference with doing it in xsk?

Good question. There certainly exist very high-performance DPDK-based
traffic generators, and AFAIK XSK can more or less match DPDK
performance in zero-copy mode, so in that case I think it should be
possible to match test_run in raw performance. I'm not sure about copy
mode; and of course in both cases there's the usual limitation of
having to dedicate a suitably configured NIC queue, whereas the
in-kernel trafficgen can run without interfering with other traffic
(except for taking up the link capacity, of course).

-Toke