Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page

On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > Now, we chain the pages of big mode through the page's private field.
> > > > > > > > > > But a subsequent patch aims to make big mode support premapped mode,
> > > > > > > > > > which requires additional space to store the DMA address.
> > > > > > > > > >
> > > > > > > > > > The sub-struct that contains 'private' has no variable suitable for
> > > > > > > > > > storing the DMA address.
> > > > > > > > > >
> > > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > > >                         /**
> > > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > > >                          * by the page owner.
> > > > > > > > > >                          */
> > > > > > > > > >                         union {
> > > > > > > > > >                                 struct list_head lru;
> > > > > > > > > >
> > > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > >                                 struct {
> > > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > > >                                         void *__filler;
> > > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > > >                                 };
> > > > > > > > > >
> > > > > > > > > >                                 /* Or, free page */
> > > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > > >                         };
> > > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > >                         struct address_space *mapping;
> > > > > > > > > >                         union {
> > > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > > >                         };
> > > > > > > > > >                         /**
> > > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > >                          */
> > > > > > > > > >                         unsigned long private;
> > > > > > > > > >                 };
> > > > > > > > > >
> > > > > > > > > > But the page_pool sub-struct has a variable, dma_addr, that is
> > > > > > > > > > exactly suited to storing a DMA address. That sub-struct is used by
> > > > > > > > > > the netstack, which works to our advantage.
> > > > > > > > > >
> > > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > > >                         /**
> > > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > > >                          */
> > > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > > >                         struct page_pool *pp;
> > > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > > >                 };
> > > > > > > > > >
> > > > > > > > > > On the other hand, we should only use variables from the same sub-struct.
> > > > > > > > > > So this patch replaces "private" with "pp".
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Instead of doing a customized version of the page pool, can we simply
> > > > > > > > > switch to using the page pool for big mode instead? Then we don't need
> > > > > > > > > to bother with the DMA stuff.
> > > > > > > >
> > > > > > > >
> > > > > > > > The page pool does its DMA mapping via the generic DMA APIs.
> > > > > > > > So we cannot use the page pool directly.
> > > > > > >
> > > > > > > I found this:
> > > > > > >
> > > > > > > #define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > > >                                         * map/unmap
> > > > > > >                                         */
> > > > > > >
> > > > > > > It seems to work here?
> > > > > >
> > > > > >
> > > > > > I have studied the page pool mechanism and believe we cannot use it
> > > > > > directly. We can make the page pool bypass the DMA operations, which
> > > > > > allows virtio-net to handle DMA itself for pages allocated from the
> > > > > > page pool. Furthermore, we can use the page pool helpers to associate
> > > > > > the DMA address with the page.
> > > > > >
> > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > unmapping and remapping steps.
> > > > > >
> > > > > > Currently, there's a caveat: when the page pool cache is full, it
> > > > > > disconnects and releases the pages, so they are freed without any
> > > > > > chance of being unmapped.
> > > > >
> > > > > Technically, when the ptr_ring is full there could be a fallback, but
> > > > > that requires expensive synchronization between producer and consumer.
> > > > > For virtio-net, it might not be a problem because add/get are already
> > > > > synchronized. (That might be relaxed in the future; we've actually
> > > > > already seen such a requirement in the past for virtio-blk.)
> > > >
> > > > The point is that the page will be released by the page pool directly;
> > > > if we work with the page pool, we will have no chance to unmap it.
> > >
> > > I mean that if we have a fallback, there would be no need to release
> > > these pages; we could put them into a linked list instead.
> >
> >
> > What fallback?
>
> https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@xxxxxxxxxx/
>
> >
> > If we put the pages into a linked list, why use the page pool at all?
>
> The size of the cache and ptr_ring needs to be fixed.
>
> Again, as explained above, it needs more benchmarks and looks like a
> separate topic.
>
> >
> >
> > >
> > > >
> > > > >
> > > > > > If we were to unmap pages each time before
> > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > mapping and unmapping process altogether.
> > > > >
> > > > > Yes, but the problem with this approach is that it creates a corner
> > > > > case where dma_addr is used outside the page pool.
> > > >
> > > > Yes, this is a corner case. We would need to introduce it to the page
> > > > pool.
> > > >
> > > > So to introduce the page pool into virtio-net (not only for big mode),
> > > > we may need to push the page pool to support DMA handled by the drivers.
> > >
> > > Adding Jesper for some comments.
> > >
> > > >
> > > > Back to this patch set, I think we should let virtio-net keep managing
> > > > the pages itself.
> > > >
> > > > What do you think?
> > >
> > > I might be wrong, but I think we need to either
> > >
> > > 1) find a way to manage the pages by yourself without touching the page
> > > pool metadata (or Jesper is fine with this)
> >
> > Do you mean working with the page pool or not?
> >
>
> I meant whether Jesper is fine with reusing the page pool metadata as this patch does.
>
> > If we manage the pages ourselves (no page pool), we do not care whether the
> > metadata belongs to the page pool or not. We just use the space inside the
> > page, like "private".
>
> That's also fine.
>
> >
> >
> > > 2) optimize the unmap for page pool
> > >
> > > or even
> > >
> > > 3) just do dma_unmap before returning the page back to the page pool;
> > > we don't get all the benefits of the page pool, but we end up with
> > > simpler code (no fallback for premapping).
> >
> > I am OK with this.
>
> Right, we just need to make sure there's no performance regression;
> then it would be fine.
>
> I see that mana, for example, did this as well.

I think we should not use the page pool directly for now, because mana does
not need space to store the DMA address, while we do need to store the DMA
address for unmapping.

If we use the page pool without PP_FLAG_DMA_MAP and then store the DMA
address in page.dma_addr ourselves, I think that is not safe.
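
To make that concrete, the rejected combination would roughly look like the
sketch below (hypothetical code, not taken from any patch; rx_alloc_premapped()
is an invented name, and the page pool and DMA calls are only my understanding
of the usual APIs):

        #include <linux/dma-mapping.h>
        #include <net/page_pool/helpers.h>

        /* Page pool allocates (PP_FLAG_DMA_MAP not set); the driver maps the
         * page itself and parks the address in page->dma_addr on its own.
         */
        static struct page *rx_alloc_premapped(struct page_pool *pool,
                                               struct device *dev)
        {
                struct page *p = page_pool_alloc_pages(pool, GFP_ATOMIC);
                dma_addr_t addr;

                if (!p)
                        return NULL;

                addr = dma_map_page(dev, p, 0, PAGE_SIZE, DMA_FROM_DEVICE);
                if (dma_mapping_error(dev, addr)) {
                        page_pool_put_full_page(pool, p, false);
                        return NULL;
                }

                /* Storage the pool does not know the driver is using. */
                p->dma_addr = (unsigned long)addr;
                return p;
        }

If the pool later releases such a page itself (for example when the ptr_ring
is full), nothing calls dma_unmap_page() for it first, which is exactly the
unmapping problem described above.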

I think the approach of this patch set is fine. We just use the space inside
the page, whether that space is the page pool sub-struct or not, to store the
chain link and the DMA address.
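
As a minimal sketch of what I mean (the helper names below are only
illustrative, not necessarily the ones the patch uses): big mode chains pages
through the pp pointer of that sub-struct and keeps the mapping in its
dma_addr slot.

        #include <linux/mm_types.h>
        #include <linux/dma-mapping.h>

        /* Chain big-mode pages through page->pp instead of page->private. */
        static inline void page_chain_add(struct page *p, struct page *next)
        {
                p->pp = (struct page_pool *)next;
        }

        static inline struct page *page_chain_next(struct page *p)
        {
                return (struct page *)p->pp;
        }

        /* Keep the premapped address in the dma_addr slot of the same sub-struct. */
        static inline void page_dma_addr_set(struct page *p, dma_addr_t addr)
        {
                p->dma_addr = (unsigned long)addr;
        }

        static inline dma_addr_t page_dma_addr(struct page *p)
        {
                return (dma_addr_t)p->dma_addr;
        }

Since virtio-net allocates and maps these pages itself, it can read the stored
address back to unmap them before freeing, with no page pool involved at all.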

Thanks.

>
> Thanks
>
> >
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Maybe for big mode it doesn't matter too much if there's no
> > > > > performance improvement.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>




