On Thu, 18 Apr 2024 22:19:33 +0200, Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote: > > > On 17/04/2024 10.20, Xuan Zhuo wrote: > > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote: > >>> > >>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote: > >>>>> > >>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote: > >>>>>>> > >>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote: > >>>>>>>>> > >>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote: > >>>>>>>>>>> > >>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Now, we chain the pages of big mode by the page's private variable. > >>>>>>>>>>>>> But a subsequent patch aims to make the big mode to support > >>>>>>>>>>>>> premapped mode. This requires additional space to store the dma addr. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Within the sub-struct that contains the 'private', there is no suitable > >>>>>>>>>>>>> variable for storing the DMA addr. > >>>>>>>>>>>>> > >>>>>>>>>>>>> struct { /* Page cache and anonymous pages */ > >>>>>>>>>>>>> /** > >>>>>>>>>>>>> * @lru: Pageout list, eg. active_list protected by > >>>>>>>>>>>>> * lruvec->lru_lock. Sometimes used as a generic list > >>>>>>>>>>>>> * by the page owner. > >>>>>>>>>>>>> */ > >>>>>>>>>>>>> union { > >>>>>>>>>>>>> struct list_head lru; > >>>>>>>>>>>>> > >>>>>>>>>>>>> /* Or, for the Unevictable "LRU list" slot */ > >>>>>>>>>>>>> struct { > >>>>>>>>>>>>> /* Always even, to negate PageTail */ > >>>>>>>>>>>>> void *__filler; > >>>>>>>>>>>>> /* Count page's or folio's mlocks */ > >>>>>>>>>>>>> unsigned int mlock_count; > >>>>>>>>>>>>> }; > >>>>>>>>>>>>> > >>>>>>>>>>>>> /* Or, free page */ > >>>>>>>>>>>>> struct list_head buddy_list; > >>>>>>>>>>>>> struct list_head pcp_list; > >>>>>>>>>>>>> }; > >>>>>>>>>>>>> /* See page-flags.h for PAGE_MAPPING_FLAGS */ > >>>>>>>>>>>>> struct address_space *mapping; > >>>>>>>>>>>>> union { > >>>>>>>>>>>>> pgoff_t index; /* Our offset within mapping. */ > >>>>>>>>>>>>> unsigned long share; /* share count for fsdax */ > >>>>>>>>>>>>> }; > >>>>>>>>>>>>> /** > >>>>>>>>>>>>> * @private: Mapping-private opaque data. > >>>>>>>>>>>>> * Usually used for buffer_heads if PagePrivate. > >>>>>>>>>>>>> * Used for swp_entry_t if PageSwapCache. > >>>>>>>>>>>>> * Indicates order in the buddy system if PageBuddy. > >>>>>>>>>>>>> */ > >>>>>>>>>>>>> unsigned long private; > >>>>>>>>>>>>> }; > >>>>>>>>>>>>> > >>>>>>>>>>>>> But within the page pool struct, we have a variable called > >>>>>>>>>>>>> dma_addr that is appropriate for storing dma addr. > >>>>>>>>>>>>> And that struct is used by netstack. That works to our advantage. > >>>>>>>>>>>>> > >>>>>>>>>>>>> struct { /* page_pool used by netstack */ > >>>>>>>>>>>>> /** > >>>>>>>>>>>>> * @pp_magic: magic value to avoid recycling non > >>>>>>>>>>>>> * page_pool allocated pages. > >>>>>>>>>>>>> */ > >>>>>>>>>>>>> unsigned long pp_magic; > >>>>>>>>>>>>> struct page_pool *pp; > >>>>>>>>>>>>> unsigned long _pp_mapping_pad; > >>>>>>>>>>>>> unsigned long dma_addr; > >>>>>>>>>>>>> atomic_long_t pp_ref_count; > >>>>>>>>>>>>> }; > >>>>>>>>>>>>> > >>>>>>>>>>>>> On the other side, we should use variables from the same sub-struct. > >>>>>>>>>>>>> So this patch replaces the "private" with "pp". > >>>>>>>>>>>>> > >>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> > >>>>>>>>>>>>> --- > >>>>>>>>>>>> > >>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply > >>>>>>>>>>>> switch to use page pool for big mode instead? Then we don't need to > >>>>>>>>>>>> bother the dma stuffs. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> The page pool needs to do the dma by the DMA APIs. > >>>>>>>>>>> So we can not use the page pool directly. > >>>>>>>>>> > >>>>>>>>>> I found this: > >>>>>>>>>> > >>>>>>>>>> define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA > >>>>>>>>>> * map/unmap > >>>>>>>>>> > >>>>>>>>>> It seems to work here? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I have studied the page pool mechanism and believe that we cannot use it > >>>>>>>>> directly. We can make the page pool to bypass the DMA operations. > >>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated from the page > >>>>>>>>> pool. Furthermore, we can utilize page pool helpers to associate the DMA address > >>>>>>>>> to the page. > >>>>>>>>> > >>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return > >>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the > >>>>>>>>> unmapping and remapping steps. > >>>>>>>>> > >>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it disconnects > >>>>>>>>> and releases the pages. When the pool hits its capacity, pages are relinquished > >>>>>>>>> without a chance for unmapping. > > Could Jakub's memory provider for PP help your use-case? > > See: [1] > https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@xxxxxxxxxx/ > Or: [2] > https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@xxxxxxxxxx/T/ It can not. That make the pp can get page by the callbacks. Here we talk about the map/unmap. The virtio-net has the different DMA APIs. dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size_t size, enum dma_data_direction dir, unsigned long attrs); void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr, size_t size, enum dma_data_direction dir, unsigned long attrs); dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs); void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr, size_t size, enum dma_data_direction dir, unsigned long attrs); int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr); bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr); void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq, dma_addr_t addr, unsigned long offset, size_t size, enum dma_data_direction dir); void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq, dma_addr_t addr, unsigned long offset, size_t size, enum dma_data_direction dir); Thanks. > > > [...] > >>>>>> > >>>>>> Adding Jesper for some comments. > >>>>>> > >>>>>>> > >>>>>>> Back to this patch set, I think we should keep the virtio-net to manage > >>>>>>> the pages. > >>>>>>> > > For context the patch: > [3] > https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@xxxxxxxxxxxxxxxxx/ > > >>>>>>> What do you think? > >>>>>> > >>>>>> I might be wrong, but I think if we need to either > >>>>>> > >>>>>> 1) seek a way to manage the pages by yourself but not touching page > >>>>>> pool metadata (or Jesper is fine with this) > >>>>> > >>>>> Do you mean working with page pool or not? > >>>>> > >>>> > >>>> I meant if Jesper is fine with reusing page pool metadata like this patch. > >>>> > >>>>> If we manage the pages by self(no page pool), we do not care the metadata is for > >>>>> page pool or not. We just use the space of pages like the "private". > >>>> > >>>> That's also fine. > >>>> > > I'm not sure it is "fine" to, explicitly choosing not to use page pool, > and then (ab)use `struct page` member (pp) that intended for page_pool > for other stuff. (In this case create a linked list of pages). > > +#define page_chain_next(p) ((struct page *)((p)->pp)) > +#define page_chain_add(p, n) ((p)->pp = (void *)n) > > I'm not sure that I (as PP maintainer) can make this call actually, as I > think this area belong with the MM "page" maintainers (Cc MM-list + > people) to judge. > > Just invention new ways to use struct page fields without adding your > use-case to struct page, will make it harder for MM people to maintain > (e.g. make future change). > > --Jesper > >