John Fastabend <john.fastabend@xxxxxxxxx> writes:

> Jesper Dangaard Brouer wrote:
>> On Wed, 27 Jan 2021 13:20:50 +0100
>> Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> wrote:
>>
>> > On Wed, Jan 27, 2021 at 10:41:44AM +0100, Toke Høiland-Jørgensen wrote:
>> > > John Fastabend <john.fastabend@xxxxxxxxx> writes:
>> > >
>> > > > Hangbin Liu wrote:
>> > > >> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
>> > > >>
>> > > >> This changes the devmap XDP program support to run the program when the
>> > > >> bulk queue is flushed instead of before the frame is enqueued. This has
>> > > >> a couple of benefits:
>> > > >>
>> > > >> - It "sorts" the packets by destination devmap entry, and then runs the
>> > > >>   same BPF program on all the packets in sequence. This ensures that we
>> > > >>   keep the XDP program and destination device properties hot in I-cache.
>> > > >>
>> > > >> - It makes the multicast implementation simpler because it can just
>> > > >>   enqueue packets using bq_enqueue() without having to deal with the
>> > > >>   devmap program at all.
>> > > >>
>> > > >> The drawback is that if the devmap program drops the packet, the enqueue
>> > > >> step is redundant. However, arguably this is mostly visible in a
>> > > >> micro-benchmark, and with more mixed traffic the I-cache benefit should
>> > > >> win out. The performance impact of just this patch is as follows:
>> > > >>
>> > > >> The logic of bq_xmit_all() is also refactored and the error label is removed.
>> > > >> When bq_xmit_all() is called from bq_enqueue(), another packet will
>> > > >> always be enqueued immediately after, so clearing dev_rx, xdp_prog and
>> > > >> flush_node in bq_xmit_all() is redundant. Let's move the clearing to
>> > > >> __dev_flush(), and only check them once in bq_enqueue() since they are
>> > > >> all modified together.
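The flush-time scheme described in the patch can be illustrated with a minimal userspace sketch. It is not the kernel's devmap code: `struct bulk_queue`, `prog_fn`, `run_prog_batch()`, `bq_flush()`, `bq_enqueue()`, `dev_flush()`, `keep_even()` and `run_demo()` are hypothetical stand-ins. Enqueue only stores the frame pointer, the destination's program runs once over the whole batch when the queue is flushed, and the per-cycle state (program and flush-list membership) is set together in enqueue and cleared only in the end-of-cycle flush, mirroring the move of the clearing into __dev_flush().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct xdp_frame / xdp_dev_bulk_queue. */
#define BQ_SIZE 16

struct frame { int id; };

/* Per-frame verdict: nonzero keeps the frame (like XDP_PASS). */
typedef int (*prog_fn)(const struct frame *f);

struct bulk_queue {
	struct frame *q[BQ_SIZE];
	unsigned int count;
	prog_fn prog;           /* program for this destination, may be NULL */
	int on_flush_list;      /* stands in for flush_node list membership */
	unsigned int sent_total;
};

/* Run the program over the whole batch, compacting survivors to the
 * front of q[]; returns how many frames remain to be sent. */
static unsigned int run_prog_batch(struct bulk_queue *bq)
{
	unsigned int i, kept = 0;

	for (i = 0; i < bq->count; i++)
		if (bq->prog(bq->q[i]))
			bq->q[kept++] = bq->q[i];
	return kept;
}

/* Flush: the program runs here, once per batch, not once per enqueue.
 * Like bq_xmit_all() in the patch, it does not clear prog/flush state. */
static void bq_flush(struct bulk_queue *bq)
{
	unsigned int to_send = bq->count;

	if (!bq->count)
		return;
	if (bq->prog)
		to_send = run_prog_batch(bq);
	bq->sent_total += to_send; /* a real driver would transmit here */
	bq->count = 0;
}

/* Enqueue only stores the pointer; a full queue is flushed first. */
static void bq_enqueue(struct bulk_queue *bq, struct frame *f, prog_fn prog)
{
	if (bq->count == BQ_SIZE)
		bq_flush(bq);
	assert(bq->count < BQ_SIZE);
	bq->q[bq->count++] = f;
	/* Single check: prog and flush-list membership change together. */
	if (!bq->on_flush_list) {
		bq->prog = prog;
		bq->on_flush_list = 1;
	}
}

/* End-of-cycle flush also clears per-cycle state, like __dev_flush(). */
static void dev_flush(struct bulk_queue *bq)
{
	bq_flush(bq);
	bq->prog = NULL;
	bq->on_flush_list = 0;
}

/* Example program: keep frames with an even id, drop the rest. */
static int keep_even(const struct frame *f) { return f->id % 2 == 0; }

/* Enqueue nframes frames (capped at 64), flush, return frames sent. */
static unsigned int run_demo(unsigned int nframes)
{
	struct frame frames[64];
	struct bulk_queue bq = {0};
	unsigned int i;

	if (nframes > 64)
		nframes = 64;
	for (i = 0; i < nframes; i++) {
		frames[i].id = (int)i;
		bq_enqueue(&bq, &frames[i], keep_even);
	}
	dev_flush(&bq);
	return bq.sent_total;
}
```

With 40 frames and a 16-slot queue, the program runs over three batches (16, 16 and 8 frames) rather than 40 times individually, which is the I-cache argument from the commit message in miniature.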
>> > > >>
>> > > >> Tested using xdp_redirect_map from samples/bpf, sending packets with pktgen:
>> > > >> ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64
>> > > >>
>> > > >> There is about +/- 0.1M deviation in native testing. Performance improved
>> > > >> for the base case, but dropped somewhat with a devmap XDP program attached.
>> > > >>
>> > > >> Version          | Test                        | Generic | Native | Native + 2nd xdp_prog
>> > > >> 5.10 rc6         | xdp_redirect_map i40e->i40e |    2.0M |   9.1M |  8.0M
>> > > >> 5.10 rc6         | xdp_redirect_map i40e->veth |    1.7M |  11.0M |  9.7M
>> > > >> 5.10 rc6 + patch | xdp_redirect_map i40e->i40e |    2.0M |   9.5M |  7.5M
>> > > >> 5.10 rc6 + patch | xdp_redirect_map i40e->veth |    1.7M |  11.6M |  9.1M
>> > > >>
>> > > >
>> > > > [...]
>
> Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
>
>> > > >>  static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>> > > >>  {
>> > > >>  	struct net_device *dev = bq->dev;
>> > > >> -	int sent = 0, drops = 0, err = 0;
>> > > >> +	unsigned int cnt = bq->count;
>> > > >> +	int drops = 0, err = 0;
>> > > >> +	int to_send = cnt;
>> > > >> +	int sent = cnt;
>> > > >>  	int i;
>> > > >>
>> > > >> -	if (unlikely(!bq->count))
>> > > >> +	if (unlikely(!cnt))
>> > > >>  		return;
>> > > >>
>> > > >> -	for (i = 0; i < bq->count; i++) {
>> > > >> +	for (i = 0; i < cnt; i++) {
>> > > >>  		struct xdp_frame *xdpf = bq->q[i];
>> > > >>
>> > > >>  		prefetch(xdpf);
>> > > >>  	}
>> > > >>
>> > > >> -	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
>> > > >> +	if (bq->xdp_prog) {
>> > > >> +		to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev);
>> > > >> +		if (!to_send) {
>> > > >> +			sent = 0;
>> > > >> +			goto out;
>> > > >> +		}
>> > > >> +		drops = cnt - to_send;
>> > > >> +	}
>> > > >
>> > > > I might be missing something about how *bq works here. What happens when
>> > > > dev_map_bpf_prog_run returns to_send < cnt?
>> > > >
>> > > > So I read this as it will send [0, to_send] and [to_send, cnt] will be
>> > > > dropped? How do we know the bpf prog would have dropped the set,
>> > > > [to_send+1, cnt]?
>> >
>> > You know that via the recalculation of the 'drops' value after returning
>> > from dev_map_bpf_prog_run(), which is later passed to trace_xdp_devmap_xmit.
>> >
>> > >
>> > > Because dev_map_bpf_prog_run() compacts the array:
>> > >
>> > > +		case XDP_PASS:
>> > > +			err = xdp_update_frame_from_buff(&xdp, xdpf);
>> > > +			if (unlikely(err < 0))
>> > > +				xdp_return_frame_rx_napi(xdpf);
>> > > +			else
>> > > +				frames[nframes++] = xdpf;
>> > > +			break;
>> >
>> > To expand on this a little: the 'frames' array is reused, and 'nframes'
>> > above is the value that is returned, which we store in the 'to_send' variable.
>
> In the morning, with coffee, this looks good to me. Thanks Toke, Jesper.

Haha, yeah, coffee does tend to help, doesn't it? You're welcome :)

-Toke
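John's question comes down to how the frames array is compacted in place. A minimal userspace sketch, not the kernel code: `struct frame`, `prog_fn`, the two-value `enum verdict`, `prog_run_compact()`, `xmit_batch()` and `drop_odd()` are hypothetical stand-ins, and the `convert_fails` flag only imitates `xdp_update_frame_from_buff()` returning an error. Passing frames are written back at `frames[nframes++]`, everything else is freed, so after the call the entries `[0, to_send)` are exactly the survivors and `cnt - to_send` is the drop count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts mirroring XDP_DROP / XDP_PASS (not the uapi enum). */
enum verdict { V_DROP, V_PASS };

struct frame {
	int id;
	int convert_fails; /* imitates xdp_update_frame_from_buff() < 0 */
};

typedef enum verdict (*prog_fn)(const struct frame *f);

static unsigned int freed; /* frames "returned" instead of sent */

/* Same shape as the quoted XDP_PASS case: keep passing frames by writing
 * them back at frames[nframes++], free everything else. */
static unsigned int prog_run_compact(prog_fn prog, struct frame **frames,
				     unsigned int n)
{
	unsigned int i, nframes = 0;

	for (i = 0; i < n; i++) {
		struct frame *f = frames[i];

		/* The write index never passes the read index, so no
		 * unread entry is overwritten. */
		assert(nframes <= i);
		switch (prog(f)) {
		case V_PASS:
			if (f->convert_fails)
				freed++;
			else
				frames[nframes++] = f;
			break;
		default:
			freed++;
			break;
		}
	}
	return nframes;
}

/* Caller side, as in the patched bq_xmit_all(): q[0 .. to_send - 1]
 * would go to the driver; the difference is reported as drops. */
static void xmit_batch(prog_fn prog, struct frame **q, unsigned int cnt,
		       unsigned int *sent, unsigned int *drops)
{
	unsigned int to_send = prog_run_compact(prog, q, cnt);

	*drops = cnt - to_send;
	*sent = to_send;
}

/* Example program: drop frames with an odd id. */
static enum verdict drop_odd(const struct frame *f)
{
	return (f->id % 2) ? V_DROP : V_PASS;
}
```

Because the surviving frames keep their relative order and are packed at the front, no separate record of which slots were dropped is needed; the count alone is enough for both transmission and the trace_xdp_devmap_xmit accounting.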