在 2022/12/27 17:10, Heng Qi 写道:
在 2022/12/27 下午2:46, Jason Wang 写道:在 2022/12/20 22:14, Heng Qi 写道:Support xdp for multi buffer packets in mergeable mode. Putting the first buffer as the linear part for xdp_buff, and the rest of the buffers as non-linear fragments to struct skb_shared_info in the tailroom belonging to xdp_buff. Signed-off-by: Heng Qi <hengqi@xxxxxxxxxxxxxxxxx> Reviewed-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> ---drivers/net/virtio_net.c | 78 ++++++++++++++++++++++++++++++++++++++++1 file changed, 78 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 08f209d7b0bf..8fc3b1841d92 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c@@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct net_device *dev,return NULL; } +/* TODO: build xdp in big mode */ +static int virtnet_build_xdp_buff_mrg(struct net_device *dev, + struct virtnet_info *vi, + struct receive_queue *rq, + struct xdp_buff *xdp, + void *buf, + unsigned int len, + unsigned int frame_sz, + u16 *num_buf, + unsigned int *xdp_frags_truesize, + struct virtnet_rq_stats *stats) +{+ unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));+ struct virtio_net_hdr_mrg_rxbuf *hdr = buf; + unsigned int truesize, headroom; + struct skb_shared_info *shinfo; + unsigned int xdp_frags_truesz = 0; + unsigned int cur_frag_size; + struct page *page; + skb_frag_t *frag; + int offset; + void *ctx; + + xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq); + xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,+ VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);+ + if (*num_buf > 1) { + shinfo = xdp_get_shared_info_from_buff(xdp); + shinfo->nr_frags = 0; + shinfo->xdp_frags_size = 0; + } + + if ((*num_buf - 1) > MAX_SKB_FRAGS) + return -EINVAL; + + while ((--*num_buf) >= 1) { + buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx); + if (unlikely(!buf)) { + pr_debug("%s: rx error: %d buffers out of %d missing\n", + dev->name, *num_buf, + virtio16_to_cpu(vi->vdev, hdr->num_buffers)); + dev->stats.rx_length_errors++; + return -EINVAL; + } + + if (!xdp_buff_has_frags(xdp)) + xdp_buff_set_frags_flag(xdp);Any reason to put this inside the loop?I'll move it outside the loop in the next version.+ + stats->bytes += len; + page = virt_to_head_page(buf); + offset = buf - page_address(page); + truesize = mergeable_ctx_to_truesize(ctx); + headroom = mergeable_ctx_to_headroom(ctx); ++ cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);+ xdp_frags_truesz += cur_frag_size;Not related to this patch, but it would easily confuse the future readers that the we need another math for truesize. I think at least we need some comments for this orYes it might, I'll add more comments on this.I guess the root cause is in get_mergeable_buf_len: static unsigned int get_mergeable_buf_len(struct receive_queue *rq, struct ewma_pkt_len *avg_pkt_len, unsigned int room) { struct virtnet_info *vi = rq->vq->vdev->priv; const size_t hdr_len = vi->hdr_len; unsigned int len; if (room) return PAGE_SIZE - room; And we do len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); ... ctx = mergeable_len_to_ctx(len, headroom);I wonder if it's better to pack the real truesize (PAGE_SIZE) here. This may ease a lot of things.I don't know the historical reason for not packing, but I guess this is for the convenience of comparing the actual length len given by the device with truesize. Therefore, I think it wouldbe better to keep the above practice for now?
The problem is: We had static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)It means the truesize should be calculated before packing into the context. So having another math to get truesize is very confusing, I think it's better to fix that. Otherwise we may end up code that is hard to be reviewed even with the help of comments.
Thanks
Perhaps, I can add more explanation to the above code to help future readers try not to get confused. Thanks.Thanks+ if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) { + pr_debug("%s: rx error: len %u exceeds truesize %lu\n", + dev->name, len, (unsigned long)ctx); + dev->stats.rx_length_errors++; + return -EINVAL; + } + + frag = &shinfo->frags[shinfo->nr_frags++]; + __skb_frag_set_page(frag, page); + skb_frag_off_set(frag, offset); + skb_frag_size_set(frag, len); + if (page_is_pfmemalloc(page)) + xdp_buff_set_frag_pfmemalloc(xdp); + + shinfo->xdp_frags_size += len; + } + + *xdp_frags_truesize = xdp_frags_truesz; + return 0; +} + static struct sk_buff *receive_mergeable(struct net_device *dev, struct virtnet_info *vi, struct receive_queue *rq,