Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped

Jason Wang <jasowang@xxxxxxxxxx> · Wed, 24 Apr 2024 10:34:56 +0800

On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > used, we can reuse them without needing to unmap and remap.
> > > > >
> > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > store the DMA address from the pp structure inside the page.
> > > > >
> > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > we remap it before returning it to the chain.
> > > > >
> > > > > Based on the following points, we do not use page pool to manage these
> > > > > pages:
> > > > >
> > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > 2. But when the page pool releases the page, we have no chance to
> > > > >    execute dma unmap.
> > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > >    execute unmap so frequently.)
> > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > >    unsafe behavior.
> > > > >
> > > > > More:
> > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@xxxxxxxxxxxxxx/
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > >  }
> > > > >
> > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > +{
> > > > > +       sg->dma_address = addr;
> > > > > +       sg->length = len;
> > > > > +}
> > > > > +
> > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > + * address.
> > > > > + */
> > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > +{
> > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > >
> > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > >
> > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > +               p->pp_magic = upper_32_bits(addr);
> > > >
> > > > And this uses three fields on page_pool which I'm not sure the other
> > > > maintainers are happy with. For example, re-using pp_magic might be
> > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > >
> > > > I think a safer way is to reuse page pool, for example by introducing
> > > > a new flag with dma callbacks?
> > >
> > > If we use page pool, how can we chain the pages allocated for a packet?
> >
> > I'm not sure I get this, it is chained via the descriptor flag.
>
>
> In big mode, we commit many pages to the virtio core with
> virtqueue_add_inbuf().
>
> virtqueue_get_buf_ctx() returns the data, which is the first page.
> The other pages are chained through "private".
>
> If we use the page pool, how can we chain the pages?
> After virtqueue_add_inbuf(), we need to get the pages to fill the skb.

Right, technically it could be solved by providing helpers in the
virtio core, but considering it's an optimization for big mode, which
is not popular, it's not worth the bother.

>
>
>
> >
> > >
> > > You know the "private" cannot be used.
> > >
> > >
> > > If the pp struct inside the page is not safe, how about:
> > >
> > >                 struct {        /* Page cache and anonymous pages */
> > >                         /**
> > >                          * @lru: Pageout list, eg. active_list protected by
> > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > >                          * by the page owner.
> > >                          */
> > >                         union {
> > >                                 struct list_head lru;
> > >
> > >                                 /* Or, for the Unevictable "LRU list" slot */
> > >                                 struct {
> > >                                         /* Always even, to negate PageTail */
> > >                                         void *__filler;
> > >                                         /* Count page's or folio's mlocks */
> > >                                         unsigned int mlock_count;
> > >                                 };
> > >
> > >                                 /* Or, free page */
> > >                                 struct list_head buddy_list;
> > >                                 struct list_head pcp_list;
> > >                         };
> > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > >                         struct address_space *mapping;
> > >                         union {
> > >                                 pgoff_t index;          /* Our offset within mapping. */
> > >                                 unsigned long share;    /* share count for fsdax */
> > >                         };
> > >                         /**
> > >                          * @private: Mapping-private opaque data.
> > >                          * Usually used for buffer_heads if PagePrivate.
> > >                          * Used for swp_entry_t if PageSwapCache.
> > >                          * Indicates order in the buddy system if PageBuddy.
> > >                          */
> > >                         unsigned long private;
> > >                 };
> > >
> > > Or, we can map the private space of the page as a new structure.
> >
> > It could be a way. But such an allocation might be huge if we are using
> > indirect descriptors, unless I'm missing something.
>
> No, we only need to store the "chain next" and the dma, as this patch set did.
> The size of the private space inside the page is 20 (32-bit) / 40 (64-bit) bytes.
> That is enough for us.
>
> If you are worried about changing the pp structure, we can keep using
> "private" as before and use the "struct list_head lru" to store the dma.

This looks even worse, as it uses fields belonging to different
structures in the union.

>
> The min size of "struct list_head lru" is 8 bytes, that is enough for the
> dma_addr_t.
>
> We can do this more simply:
>
> static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> {
>         dma_addr_t *addr;
>
>         BUILD_BUG_ON(sizeof(p->lru) < sizeof(dma));
>
>         addr = (dma_addr_t *)&p->lru;
>
>         *addr = dma;
> }

So we have this in the kernel code:

        /*
         * Five words (20/40 bytes) are available in this union.
         * WARNING: bit 0 of the first word is used for PageTail(). That
         * means the other users of this union MUST NOT use the bit to
         * avoid collision and false-positive PageTail().
         */

And by looking at the discussion that introduces the pp_magic, reusing
fields seems to be tricky as we may end up with side effects of
aliasing fields in the page structure. Technically, we can invent new
structures in the union, but it might not be worth the bother.
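
As an illustration of the aliasing hazard, here is a small user-space
model of the 32-bit-arch / 64-bit-DMA split store from the patch. struct
alias_page and the alias_*() helpers are hypothetical and model only two
of the five words; the assumption is that the first word is the one
whose bit 0 PageTail() inspects, as the comment above says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: two of the five union words on a 32-bit arch. */
struct alias_page {
	uint32_t first_word;	/* aliases pp_magic; bit 0 is the PageTail marker */
	uint32_t dma_lo;	/* stands in for page.dma_addr */
};

/* The split store below assumes a 64-bit address is exactly two 32-bit halves. */
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
	      "split store assumes two 32-bit halves");

/* Model of the PageTail() test: only bit 0 of the first word matters. */
static bool alias_page_tail(const struct alias_page *p)
{
	return p->first_word & 1;
}

/* Split store as in the patch: low half in dma_addr, high half in pp_magic. */
static void alias_set_dma(struct alias_page *p, uint64_t addr)
{
	p->dma_lo = (uint32_t)addr;
	p->first_word = (uint32_t)(addr >> 32);
}

/* Reassemble the address from the two halves. */
static uint64_t alias_get_dma(const struct alias_page *p)
{
	return ((uint64_t)p->first_word << 32) | p->dma_lo;
}
```

In this model, any DMA address whose upper 32 bits are odd makes
alias_page_tail() return true even though the page is not a tail page,
which is the kind of false positive the warning describes.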

So I think we can leave the fallback code and revisit this issue in the future.

Thanks

>
> Thanks.
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>