On Tue, 11 Jan 2022 10:46:41 -0500, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> On Tue, Jan 11, 2022 at 08:40:56AM +0000, Karlsson, Magnus wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jason Wang <jasowang@xxxxxxxxxx>
> > > Sent: Tuesday, January 11, 2022 9:26 AM
> > > To: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > > Cc: Karlsson, Magnus <magnus.karlsson@xxxxxxxxx>; virtualization
> > > <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>; Michael S. Tsirkin
> > > <mst@xxxxxxxxxx>
> > > Subject: Re: RE: [PATCH 0/6] virtio: support advance DMA
> > >
> > > On Tue, Jan 11, 2022 at 4:17 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > > wrote:
> > > >
> > > > On Tue, 11 Jan 2022 08:04:05 +0000, Karlsson, Magnus
> > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > > > > > Sent: Tuesday, January 11, 2022 7:17 AM
> > > > > > To: Jason Wang <jasowang@xxxxxxxxxx>
> > > > > > Cc: virtualization <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>;
> > > > > > Michael S. Tsirkin <mst@xxxxxxxxxx>; Karlsson, Magnus
> > > > > > <magnus.karlsson@xxxxxxxxx>
> > > > > > Subject: Re: [PATCH 0/6] virtio: support advance DMA
> > > > > >
> > > > > > On Tue, 11 Jan 2022 10:54:45 +0800, Jason Wang
> > > > > > <jasowang@xxxxxxxxxx> wrote:
> > > > > > > On Mon, Jan 10, 2022 at 5:59 PM Michael S. Tsirkin
> > > > > > > <mst@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Fri, Jan 07, 2022 at 02:33:00PM +0800, Xuan Zhuo wrote:
> > > > > > > > > virtqueue_add() only supports virtual addresses; the DMA
> > > > > > > > > mapping is done inside virtqueue_add().
> > > > > > > > >
> > > > > > > > > In some scenarios (such as the AF_XDP scenario), the DMA
> > > > > > > > > mapping is completed in advance, so it is necessary for us
> > > > > > > > > to support passing the DMA address to virtqueue_add().
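[Editor's note: the convention the cover letter describes — treat a non-NULL sg->dma_address as "already mapped" and flag the descriptor so the unmap is skipped later — can be sketched in miniature like this. The structures and the flag value below are simplified, illustrative stand-ins, not the real virtio_ring types.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the proposed rule: if the caller already mapped the buffer
 * (sg->dma_address != 0), use that address directly and mark the entry so
 * that the DMA unmap is skipped on completion.  Simplified stand-in types.
 */
#define VRING_DESC_F_PREDMA  (1u << 15)  /* hypothetical flag value */

struct fake_sg {
    void     *vaddr;        /* virtual address, used when not pre-mapped */
    uint64_t  dma_address;  /* non-zero => caller already did the DMA map */
};

struct fake_extra {
    uint32_t flags;
};

/* Pick the DMA address for one sg entry; record whether unmap is needed. */
static uint64_t add_one(const struct fake_sg *sg, struct fake_extra *extra)
{
    if (sg->dma_address) {
        extra->flags |= VRING_DESC_F_PREDMA;   /* skip unmap later */
        return sg->dma_address;
    }
    /* Normal path: "map" the virtual address (stubbed here as a cast). */
    return (uint64_t)(uintptr_t)sg->vaddr;
}

/* On completion, unmap only the entries this layer mapped itself. */
static int needs_unmap(const struct fake_extra *extra)
{
    return !(extra->flags & VRING_DESC_F_PREDMA);
}
```

The per-descriptor flag is the key point: one sg list can mix pre-mapped and virtual entries, so the decision must be recorded per entry, which is why the patches also give indirect descriptors their own extra.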
> > > > > > > > > This patch set stipulates that if sg->dma_address is not
> > > > > > > > > NULL, that address is used as the DMA address, and this is
> > > > > > > > > recorded in extra->flags so that the entry can be skipped
> > > > > > > > > when executing the DMA unmap:
> > > > > > > > >
> > > > > > > > >     extra->flags |= VRING_DESC_F_PREDMA;
> > > > > > > > >
> > > > > > > > > But an indirect desc does not have a corresponding extra,
> > > > > > > > > so the second and third patches of this patch set allocate
> > > > > > > > > the corresponding extra while allocating the indirect desc.
> > > > > > > > > Each desc must have a corresponding extra because, within
> > > > > > > > > one sg list, some entries may be pre-mapped DMA addresses
> > > > > > > > > while others are virtual addresses. So we must allocate an
> > > > > > > > > extra for each indirect desc.
> > > > > > > >
> > > > > > > > I didn't realize AF_XDP didn't have space to stuff the
> > > > > > > > header into. Jason, is that expected?
> > > > > > >
> > > > > > > I might be wrong, but it looks to me like AF_XDP allows
> > > > > > > reserving sufficient headroom via xdp_umem_reg_v1.
> > > > > >
> > > > > > I understand that there is headroom for receiving packets, which
> > > > > > can be used to hold virtio headers. But there is no headroom
> > > > > > defined in the packet-sending direction. I hope Magnus Karlsson
> > > > > > can help confirm whether there is any misunderstanding.
> > > > >
> > > > > You can specify the amount of headroom you want on Tx by adjusting
> > > > > the "addr" field in the descriptor of the Tx ring. If your chunk
> > > > > starts at address X in the umem and you want 128 bytes of
> > > > > headroom, just write your packet into X+128 and put that address
> > > > > into the Tx descriptor. Will this solve your problem? If not, what
> > > > > would you need from AF_XDP to make it work?
> > > > >
> > > > > On Rx, there is always 256 bytes worth of headroom, as specified
> > > > > by XDP.
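[Editor's note: the Tx headroom arithmetic Magnus describes is easy to model — with a chunk starting at address X and 128 bytes of headroom, user space writes the frame at X + 128 and puts that address in the descriptor, leaving room for a 12-byte virtio-net header in front. A minimal sketch; the structures and constants are illustrative, not the real uapi types.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants: a 128-byte Tx headroom, 12-byte virtio-net hdr. */
#define TX_HEADROOM   128u
#define VNET_HDR_LEN   12u

struct fake_xdp_desc {
    uint64_t addr;  /* offset of the frame within the umem */
    uint32_t len;
};

/* User-space side: the frame lives at chunk start + headroom. */
static struct fake_xdp_desc fill_tx_desc(uint64_t chunk_start, uint32_t len)
{
    struct fake_xdp_desc d = { chunk_start + TX_HEADROOM, len };
    return d;
}

/* Driver side: where a vnet header could go, stepping back from addr. */
static uint64_t vnet_hdr_addr(const struct fake_xdp_desc *d)
{
    return d->addr - VNET_HDR_LEN;
}
```

This is exactly the catch discussed below: nothing in the descriptor tells the driver that the headroom exists, so it only works if the application cooperates.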
> > > > > If you need extra, you can set the headroom variable when you
> > > > > register the umem.
> > > >
> > > > The driver of virtio-net, when passing the packet to the hardware,
> > > > should add a virtio-net hdr (12 bytes) in front of the packet. Both
> > > > rx and tx should add such a header. AF_XDP has a space of 256 bytes
> > > > in the rx process; we can reuse this space. The AF_XDP tx direction
> > > > has no such provision.
> > > >
> > > > The method you mentioned requires user cooperation, which is not a
> > > > good approach for a driver implementation.
> > >
> > > This will result in a non-portable userspace program. I wonder why TX
> > > has become a problem here, actually; anyhow, could we use a dedicated
> > > sg for the vnet hdr? And if we packed all the vnet headers into an
> > > array, it would have less performance impact.
> >
> > Yes, it would be great if you could put this somewhere else 😊.
> > Especially since I do not see a way out of the problem of the driver
> > requiring headroom on Tx, since this is completely left up to user
> > space, and user-space packet offset == kernel-space packet offset
> > within the chunk, always. The only way to change that is through
> > copying, but then it is not zero-copy anymore. One could wish there
> > were a mechanism for communicating this requirement to user space, but
> > introducing that now would make any existing app not work with
> > virtio-net. So it is probably easier if you could come up with a
> > solution that does not require headroom in the driver. If not, please
> > let me know.
>
> We can put the headroom in a separate buffer with a gather list. A gather
> list will always be slower than making it linear, though. virtio is
> likely not unique in that it works better with headroom.
>
> So it could be an optional thing where we handle a gather list too;
> let's say it's not a requirement but a performance hint.
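[Editor's note: one way to read Jason's "dedicated sg plus an array of vnet headers" suggestion is sketched below — a driver-owned header array indexed by ring slot, with each Tx packet described by a two-entry gather list. All names are illustrative, not the virtio-net driver's actual code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch: keep every virtio-net header in one preallocated, driver-owned
 * array, and describe each Tx packet as a two-entry gather list -- entry 0
 * points at the header slot, entry 1 at the user frame.  The frame itself
 * then needs no headroom, at the cost of a non-linear buffer.
 */
#define RING_SIZE    256u
#define VNET_HDR_LEN  12u

static uint8_t hdr_array[RING_SIZE][VNET_HDR_LEN];  /* one hdr per slot */

struct fake_sg_ent {
    void    *addr;
    uint32_t len;
};

/* Build the two-entry gather list for ring slot 'idx'. */
static void build_tx_sg(unsigned idx, void *frame, uint32_t frame_len,
                        struct fake_sg_ent sg[2])
{
    memset(hdr_array[idx], 0, VNET_HDR_LEN);  /* fill/clear the vnet hdr */
    sg[0].addr = hdr_array[idx];
    sg[0].len  = VNET_HDR_LEN;
    sg[1].addr = frame;
    sg[1].len  = frame_len;
}
```

This illustrates Michael's trade-off above: the gather list removes the headroom requirement, but a linear buffer (header and data contiguous) is still the faster path, hence his "performance hint, not requirement" framing.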
> XDP generally is not too stingy with memory, so maybe just
> ask apps to preferably add a couple of hundred bytes of headspace
> if they can, and then it's portable across drivers.
> And virtio will fall back to the slower path where that space is not there.
> How does this sound?

I think the same way: if the application leaves us enough space, then we can
use that space to place the vnet hdr; if not, use a separate buffer.

The problem now is that AF_XDP needs a general mechanism to inform the driver
that there is space in front of the packet.

This actually leads me to some other thoughts: AF_XDP should pass more
information to the driver, such as checksum and other hardware offloads. When
xdp_desc.options has a specified flag set, addr would point not at the packet
but at a struct:

    struct {
        int offset;      /* packet offset relative to addr */
        int csum_start;
        int csum_offset;
        .......
    }

@Karlsson, Magnus

Thanks.

> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > > > > It would be best if we could not use indirect.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > > > It would be best to fix that; performance is best if the
> > > > > > > > header is linear with the data ...
> > > > > > >
> > > > > > > This looks like a must, otherwise we may meet trouble in
> > > > > > > zerocopy receive.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Or maybe we can reduce the use of indirect somewhat, at
> > > > > > > > least while the ring is mostly empty?
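[Editor's note: the xdp_desc.options extension Xuan floats above could be modeled as below. Nothing like this existed in AF_XDP at the time of this thread; the flag name, struct layout, and semantics are purely hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scheme: when a metadata bit is set in xdp_desc.options,
 * desc.addr points at a small metadata block rather than directly at the
 * packet, and the driver applies 'offset' to find the packet itself.
 */
#define XDP_TX_METADATA (1u << 0)   /* hypothetical options bit */

struct fake_tx_meta {
    int offset;        /* packet offset relative to addr */
    int csum_start;    /* where checksumming should begin */
    int csum_offset;   /* where to store the computed checksum */
};

struct fake_xdp_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t options;
};

/* Driver side: locate the packet, honouring the metadata flag if set. */
static uint64_t packet_addr(const struct fake_xdp_desc *d,
                            const struct fake_tx_meta *meta)
{
    if (d->options & XDP_TX_METADATA)
        return d->addr + meta->offset;
    return d->addr;
}
```

The design point is backward compatibility: an application that never sets the options bit keeps today's "addr is the packet" behavior, while one that does can communicate headroom and offload parameters without copying.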
> > > > > > > > >
> > > > > > > > > Xuan Zhuo (6):
> > > > > > > > >   virtio: rename vring_unmap_state_packed() to
> > > > > > > > >     vring_unmap_extra_packed()
> > > > > > > > >   virtio: split: alloc indirect desc with extra
> > > > > > > > >   virtio: packed: alloc indirect desc with extra
> > > > > > > > >   virtio: split: virtqueue_add_split() support dma address
> > > > > > > > >   virtio: packed: virtqueue_add_packed() support dma address
> > > > > > > > >   virtio: add api virtio_dma_map() for advance dma
> > > > > > > > >
> > > > > > > > >  drivers/virtio/virtio_ring.c | 387 ++++++++++++++++++++---------------
> > > > > > > > >  include/linux/virtio.h       |   9 +
> > > > > > > > >  2 files changed, 232 insertions(+), 164 deletions(-)
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.31.0
> > > > > > > > >

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization