Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 18 Apr 2024 22:19:33 +0200, Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:
>
>
> On 17/04/2024 10.20, Xuan Zhuo wrote:
> > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Now, we chain the pages of big mode by the page's private variable.
> >>>>>>>>>>>>> But a subsequent patch aims to make the big mode to support
> >>>>>>>>>>>>> premapped mode. This requires additional space to store the dma addr.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Within the sub-struct that contains the 'private', there is no suitable
> >>>>>>>>>>>>> variable for storing the DMA addr.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                  struct {        /* Page cache and anonymous pages */
> >>>>>>>>>>>>>                          /**
> >>>>>>>>>>>>>                           * @lru: Pageout list, eg. active_list protected by
> >>>>>>>>>>>>>                           * lruvec->lru_lock.  Sometimes used as a generic list
> >>>>>>>>>>>>>                           * by the page owner.
> >>>>>>>>>>>>>                           */
> >>>>>>>>>>>>>                          union {
> >>>>>>>>>>>>>                                  struct list_head lru;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                                  /* Or, for the Unevictable "LRU list" slot */
> >>>>>>>>>>>>>                                  struct {
> >>>>>>>>>>>>>                                          /* Always even, to negate PageTail */
> >>>>>>>>>>>>>                                          void *__filler;
> >>>>>>>>>>>>>                                          /* Count page's or folio's mlocks */
> >>>>>>>>>>>>>                                          unsigned int mlock_count;
> >>>>>>>>>>>>>                                  };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                                  /* Or, free page */
> >>>>>>>>>>>>>                                  struct list_head buddy_list;
> >>>>>>>>>>>>>                                  struct list_head pcp_list;
> >>>>>>>>>>>>>                          };
> >>>>>>>>>>>>>                          /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >>>>>>>>>>>>>                          struct address_space *mapping;
> >>>>>>>>>>>>>                          union {
> >>>>>>>>>>>>>                                  pgoff_t index;          /* Our offset within mapping. */
> >>>>>>>>>>>>>                                  unsigned long share;    /* share count for fsdax */
> >>>>>>>>>>>>>                          };
> >>>>>>>>>>>>>                          /**
> >>>>>>>>>>>>>                           * @private: Mapping-private opaque data.
> >>>>>>>>>>>>>                           * Usually used for buffer_heads if PagePrivate.
> >>>>>>>>>>>>>                           * Used for swp_entry_t if PageSwapCache.
> >>>>>>>>>>>>>                           * Indicates order in the buddy system if PageBuddy.
> >>>>>>>>>>>>>                           */
> >>>>>>>>>>>>>                          unsigned long private;
> >>>>>>>>>>>>>                  };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But within the page pool struct, we have a variable called
> >>>>>>>>>>>>> dma_addr that is appropriate for storing dma addr.
> >>>>>>>>>>>>> And that struct is used by netstack. That works to our advantage.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                  struct {        /* page_pool used by netstack */
> >>>>>>>>>>>>>                          /**
> >>>>>>>>>>>>>                           * @pp_magic: magic value to avoid recycling non
> >>>>>>>>>>>>>                           * page_pool allocated pages.
> >>>>>>>>>>>>>                           */
> >>>>>>>>>>>>>                          unsigned long pp_magic;
> >>>>>>>>>>>>>                          struct page_pool *pp;
> >>>>>>>>>>>>>                          unsigned long _pp_mapping_pad;
> >>>>>>>>>>>>>                          unsigned long dma_addr;
> >>>>>>>>>>>>>                          atomic_long_t pp_ref_count;
> >>>>>>>>>>>>>                  };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On the other side, we should use variables from the same sub-struct.
> >>>>>>>>>>>>> So this patch replaces the "private" with "pp".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
> >>>>>>>>>>>> switch to use page pool for big mode instead? Then we don't need to
> >>>>>>>>>>>> bother the dma stuffs.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The page pool needs to do the dma by the DMA APIs.
> >>>>>>>>>>> So we can not use the page pool directly.
> >>>>>>>>>>
> >>>>>>>>>> I found this:
> >>>>>>>>>>
> >>>>>>>>>> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> >>>>>>>>>>                                          * map/unmap
> >>>>>>>>>>
> >>>>>>>>>> It seems to work here?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have studied the page pool mechanism and believe that we cannot use it
> >>>>>>>>> directly. We can make the page pool to bypass the DMA operations.
> >>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated from the page
> >>>>>>>>> pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> >>>>>>>>> to the page.
> >>>>>>>>>
> >>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return
> >>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the
> >>>>>>>>> unmapping and remapping steps.
> >>>>>>>>>
> >>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it disconnects
> >>>>>>>>> and releases the pages. When the pool hits its capacity, pages are relinquished
> >>>>>>>>> without a chance for unmapping.
>
> Could Jakub's memory provider for PP help your use-case?
>
> See: [1]
> https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@xxxxxxxxxx/
> Or: [2]
> https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@xxxxxxxxxx/T/


It can not. That make the pp can get page by the callbacks.

Here we talk about the map/unmap.

The virtio-net has the different DMA APIs.

	dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size_t size,
						  enum dma_data_direction dir, unsigned long attrs);
	void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
					      size_t size, enum dma_data_direction dir,
					      unsigned long attrs);
	dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
						size_t offset, size_t size,
						enum dma_data_direction dir,
						unsigned long attrs);
	void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
					    size_t size, enum dma_data_direction dir,
					    unsigned long attrs);
	int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);

	bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
	void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq, dma_addr_t addr,
						     unsigned long offset, size_t size,
						     enum dma_data_direction dir);
	void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq, dma_addr_t addr,
							unsigned long offset, size_t size,
							enum dma_data_direction dir);


Thanks.

>
>
> [...]
> >>>>>>
> >>>>>> Adding Jesper for some comments.
> >>>>>>
> >>>>>>>
> >>>>>>> Back to this patch set, I think we should keep the virtio-net to manage
> >>>>>>> the pages.
> >>>>>>>
>
> For context the patch:
>   [3]
> https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@xxxxxxxxxxxxxxxxx/
>
> >>>>>>> What do you think?
> >>>>>>
> >>>>>> I might be wrong, but I think if we need to either
> >>>>>>
> >>>>>> 1) seek a way to manage the pages by yourself but not touching page
> >>>>>> pool metadata (or Jesper is fine with this)
> >>>>>
> >>>>> Do you mean working with page pool or not?
> >>>>>
> >>>>
> >>>> I meant if Jesper is fine with reusing page pool metadata like this patch.
> >>>>
> >>>>> If we manage the pages by self(no page pool), we do not care the metadata is for
> >>>>> page pool or not. We just use the space of pages like the "private".
> >>>>
> >>>> That's also fine.
> >>>>
>
> I'm not sure it is "fine" to, explicitly choosing not to use page pool,
> and then (ab)use `struct page` member (pp) that intended for page_pool
> for other stuff. (In this case create a linked list of pages).
>
>   +#define page_chain_next(p)	((struct page *)((p)->pp))
>   +#define page_chain_add(p, n)	((p)->pp = (void *)n)
>
> I'm not sure that I (as PP maintainer) can make this call actually, as I
> think this area belong with the MM "page" maintainers (Cc MM-list +
> people) to judge.
>
> Just invention new ways to use struct page fields without adding your
> use-case to struct page, will make it harder for MM people to maintain
> (e.g. make future change).
>
> --Jesper
>
>





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux