John Fastabend <john.fastabend@xxxxxxxxx> writes:

> Hangbin Liu wrote:
>> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
>>
>> This changes the devmap XDP program support to run the program when the
>> bulk queue is flushed instead of before the frame is enqueued. This has
>> a couple of benefits:
>>
>> - It "sorts" the packets by destination devmap entry, and then runs the
>>   same BPF program on all the packets in sequence. This ensures that we
>>   keep the XDP program and destination device properties hot in I-cache.
>>
>> - It makes the multicast implementation simpler, because it can just
>>   enqueue packets using bq_enqueue() without having to deal with the
>>   devmap program at all.
>>
>> The drawback is that if the devmap program drops the packet, the enqueue
>> step is redundant. However, arguably this is mostly visible in a
>> micro-benchmark, and with more mixed traffic the I-cache benefit should
>> win out. The performance impact of just this patch is shown below.
>>
>> The bq_xmit_all() logic is also refactored and the error label is
>> removed. When bq_xmit_all() is called from bq_enqueue(), another packet
>> will always be enqueued immediately after, so clearing dev_rx, xdp_prog
>> and flush_node in bq_xmit_all() is redundant. Let's move the clearing to
>> __dev_flush(), and only check the fields once in bq_enqueue(), since
>> they are all modified together.
>>
>> Using xdp_redirect_map in samples/bpf, packets were sent with this
>> pktgen command:
>>
>> ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64
>>
>> There is about +/- 0.1M deviation in the native tests; performance
>> improved for the base case, but dropped back somewhat with an XDP
>> devmap program attached:
>>
>> Version          | Test                        | Generic | Native | Native + 2nd xdp_prog
>> 5.10 rc6         | xdp_redirect_map i40e->i40e |    2.0M |   9.1M |  8.0M
>> 5.10 rc6         | xdp_redirect_map i40e->veth |    1.7M |  11.0M |  9.7M
>> 5.10 rc6 + patch | xdp_redirect_map i40e->i40e |    2.0M |   9.5M |  7.5M
>> 5.10 rc6 + patch | xdp_redirect_map i40e->veth |    1.7M |  11.6M |  9.1M
>
> [...]
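To make the compaction concrete, here is the same pattern as a
standalone toy program. This is plain userspace C, not the kernel code;
should_pass() is a made-up stand-in for the bpf_prog_run_xdp() verdict:

/* Passed frames are written back into the front of the same array, so
 * frames[0..nframes) ends up holding exactly the frames the program
 * let through, in their original order.
 */
#include <stdio.h>

static int should_pass(int frame)
{
        return frame % 2 == 0; /* pretend the prog drops odd frames */
}

static int run_prog_compact(int *frames, int n)
{
        int i, nframes = 0;

        for (i = 0; i < n; i++) {
                if (should_pass(frames[i]))
                        frames[nframes++] = frames[i]; /* keep */
                /* dropped frames are simply not copied forward */
        }
        return nframes; /* == to_send in bq_xmit_all() */
}

int main(void)
{
        int frames[] = { 0, 1, 2, 3, 4, 5 };
        int to_send = run_prog_compact(frames, 6);
        int i;

        for (i = 0; i < to_send; i++)
                printf("xmit frame %d\n", frames[i]); /* prints 0, 2, 4 */
        return 0;
}

So [to_send, cnt) is not an arbitrary tail that gets thrown away;
drops = cnt - to_send counts exactly the frames the program itself
dropped (or that failed xdp_update_frame_from_buff()).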
>> +static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
>> +                                struct xdp_frame **frames, int n,
>> +                                struct net_device *dev)
>> +{
>> +        struct xdp_txq_info txq = { .dev = dev };
>> +        struct xdp_buff xdp;
>> +        int i, nframes = 0;
>> +
>> +        for (i = 0; i < n; i++) {
>> +                struct xdp_frame *xdpf = frames[i];
>> +                u32 act;
>> +                int err;
>> +
>> +                xdp_convert_frame_to_buff(xdpf, &xdp);
>> +                xdp.txq = &txq;
>> +
>> +                act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +                switch (act) {
>> +                case XDP_PASS:
>> +                        err = xdp_update_frame_from_buff(&xdp, xdpf);
>> +                        if (unlikely(err < 0))
>> +                                xdp_return_frame_rx_napi(xdpf);
>> +                        else
>> +                                frames[nframes++] = xdpf;
>> +                        break;
>> +                default:
>> +                        bpf_warn_invalid_xdp_action(act);
>> +                        fallthrough;
>> +                case XDP_ABORTED:
>> +                        trace_xdp_exception(dev, xdp_prog, act);
>> +                        fallthrough;
>> +                case XDP_DROP:
>> +                        xdp_return_frame_rx_napi(xdpf);
>> +                        break;
>> +                }
>> +        }
>> +        return nframes; /* sent frames count */
>> +}
>> +
>>  static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>>  {
>>          struct net_device *dev = bq->dev;
>> -        int sent = 0, drops = 0, err = 0;
>> +        unsigned int cnt = bq->count;
>> +        int drops = 0, err = 0;
>> +        int to_send = cnt;
>> +        int sent = cnt;
>>          int i;
>>
>> -        if (unlikely(!bq->count))
>> +        if (unlikely(!cnt))
>>                  return;
>>
>> -        for (i = 0; i < bq->count; i++) {
>> +        for (i = 0; i < cnt; i++) {
>>                  struct xdp_frame *xdpf = bq->q[i];
>>
>>                  prefetch(xdpf);
>>          }
>>
>> -        sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
>> +        if (bq->xdp_prog) {
>> +                to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev);
>> +                if (!to_send) {
>> +                        sent = 0;
>> +                        goto out;
>> +                }
>> +                drops = cnt - to_send;
>> +        }
>
> I might be missing something about how *bq works here. What happens when
> dev_map_bpf_prog_run() returns to_send < cnt?
>
> So I read this as it will send [0, to_send) and [to_send, cnt) will be
> dropped? How do we know the bpf prog would have dropped the set
> [to_send, cnt)?

Because dev_map_bpf_prog_run() compacts the array:

+                case XDP_PASS:
+                        err = xdp_update_frame_from_buff(&xdp, xdpf);
+                        if (unlikely(err < 0))
+                                xdp_return_frame_rx_napi(xdpf);
+                        else
+                                frames[nframes++] = xdpf;
+                        break;

[...]

>>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>> @@ -489,12 +516,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>>  {
>>          struct net_device *dev = dst->dev;
>>
>> -        if (dst->xdp_prog) {
>> -                xdp = dev_map_run_prog(dev, xdp, dst->xdp_prog);
>> -                if (!xdp)
>> -                        return 0;
>
> So here it looks like dev_map_run_prog will not drop extra
> packets, but will see every single packet.
>
> Are we changing the semantics subtly here? This looks like
> a problem to me. We should not drop packets in the new case
> unless the bpf program tells us to.

It's not a change in semantics (see above), but I'll grant you that it's
subtle :)
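For the record, here is a toy sketch of why the survivors are the same
whether the program runs per-packet at enqueue time (the old
dev_map_run_prog() path) or once per bulk at flush time. Again plain
userspace C rather than kernel code, with a made-up verdict() standing
in for the devmap program:

#include <assert.h>
#include <stdio.h>

#define BULK 8

static int verdict(int pkt) /* hypothetical XDP prog verdict */
{
        return pkt % 3 != 0; /* pretend it drops every third packet */
}

/* Old scheme: run the program before enqueueing each packet. */
static int enqueue_filtered(const int *pkts, int n, int *q)
{
        int i, cnt = 0;

        for (i = 0; i < n; i++)
                if (verdict(pkts[i]))
                        q[cnt++] = pkts[i];
        return cnt;
}

/* New scheme: enqueue everything, run the program at flush time. */
static int flush_filtered(const int *pkts, int n, int *q)
{
        int i, cnt = 0, to_send = 0;

        for (i = 0; i < n; i++)         /* bq_enqueue(): no prog here */
                q[cnt++] = pkts[i];
        for (i = 0; i < cnt; i++)       /* bq_xmit_all(): batch verdict */
                if (verdict(q[i]))
                        q[to_send++] = q[i];
        return to_send;
}

int main(void)
{
        int pkts[BULK] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        int q1[BULK], q2[BULK];
        int n1 = enqueue_filtered(pkts, BULK, q1);
        int n2 = flush_filtered(pkts, BULK, q2);
        int i;

        assert(n1 == n2);
        for (i = 0; i < n1; i++)
                assert(q1[i] == q2[i]); /* same packets, same order */
        printf("both schemes transmit %d of %d packets\n", n1, BULK);
        return 0;
}

The program still sees every packet either way; the only thing that
moves is where the verdict runs.

-Toke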