On Tue, Nov 14, 2023 at 4:49 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> On 2023/11/14 20:21, Mina Almasry wrote:
> > On Tue, Nov 14, 2023 at 12:23 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
> >>
> >> +cc Christian, Jason and Willy
> >>
> >> On 2023/11/14 7:05, Jakub Kicinski wrote:
> >>> On Mon, 13 Nov 2023 05:42:16 -0800 Mina Almasry wrote:
> >>>> You're doing exactly what I think you're doing, and what was nacked in RFC v1.
> >>>>
> >>>> You've converted 'struct page_pool_iov' to essentially become a
> >>>> duplicate of 'struct page'. Then, you're casting page_pool_iov* into
> >>>> struct page* in mp_dmabuf_devmem_alloc_pages(), then, you're calling
> >>>> mm APIs like page_ref_*() on the page_pool_iov* because you've fooled
> >>>> the mm stack into thinking dma-buf memory is a struct page.
> >>
> >> Yes, something like above, but I am not sure about the 'fooled the mm
> >> stack into thinking dma-buf memory is a struct page' part, because:
> >> 1. We never let the 'struct page' for devmem leak out of the net stack,
> >> thanks to the 'not kmap()able and not readable' checking in your patchset.
> >
> > RFC never used dma-buf pages outside the net stack, so that is the same.
> >
> > You are not able to get rid of the 'not kmap()able and not readable'
> > checking with this approach, because dma-buf memory is fundamentally
> > unkmappable and unreadable. This approach would still need
> > skb_frags_not_readable checks in the net stack, so that is also the same.
>
> Yes, I agree that checking is still needed whatever the proposal is.
>
> >
> >> 2. We initiate page->_refcount for devmem to one and it remains as one;
> >> we will never call page_ref_inc()/page_ref_dec()/get_page()/put_page(),
> >> instead, we use page pool's pp_frag_count to do reference counting for
> >> devmem page in patch 6.
> >>
> >
> > I'm not sure that moves the needle in terms of allowing dma-buf
> > memory to look like struct pages.
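A minimal userspace C sketch of the approach described in point 2, assuming the layout-mirroring and refcount scheme discussed in this thread. `struct fake_page`, `struct fake_ppiov`, `FAKE_POOL_MATCH()` and the helpers are hypothetical, heavily simplified stand-ins, not the kernel's `struct page`, the patchset's `struct page_pool_iov` or its `PAGE_POOL_MATCH()` macro. The iov mirrors the page's field offsets so a pointer to it can be cast to a page pointer, `_refcount` is set to one at allocation and never changed, and `pp_frag_count` carries the real references.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stdbool.h>
#include <stddef.h>   /* offsetof */

/* Hypothetical, heavily simplified stand-ins: NOT the kernel's
 * struct page or the patchset's struct page_pool_iov. */
struct fake_page {
	unsigned long flags;
	unsigned long pp_magic;
	long pp_frag_count;
	int _refcount;
};

struct fake_ppiov {
	unsigned long flags;
	unsigned long pp_magic;
	long pp_frag_count;
	int _refcount;
};

/* Same idea as PAGE_POOL_MATCH(): the build fails if an iov field does
 * not sit at the same offset as the matching page field. Matching
 * offsets are what let page-based code read iov fields after a cast. */
#define FAKE_POOL_MATCH(pg, iov)                                \
	static_assert(offsetof(struct fake_page, pg) ==         \
		      offsetof(struct fake_ppiov, iov),         \
		      "field offset mismatch: " #pg)

FAKE_POOL_MATCH(flags, flags);
FAKE_POOL_MATCH(pp_magic, pp_magic);
FAKE_POOL_MATCH(pp_frag_count, pp_frag_count);
FAKE_POOL_MATCH(_refcount, _refcount);

/* Hypothetical allocation step, mirroring the described
 * refcount_set(&ppiov->_refcount, 1): _refcount is never touched again. */
static void fake_ppiov_init(struct fake_ppiov *iov)
{
	iov->_refcount = 1;
	iov->pp_frag_count = 1;
}

/* Extra references taken by the net stack go to pp_frag_count only. */
static void fake_frag_ref(struct fake_ppiov *iov)
{
	iov->pp_frag_count++;
}

/* Simplified recycle test that, like 'page_ref_count(page) == 1',
 * looks only at _refcount and ignores pp_frag_count. */
static bool fake_can_recycle(const struct fake_ppiov *iov)
{
	return iov->_refcount == 1;
}
```

With two extra fragment references outstanding, `fake_can_recycle()` still returns true, because the pinned `_refcount` never reflects them. This is the pattern the 'page_ref_count(page) == 1' discussion later in the thread is about.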
> >
> >>>>
> >>>> RFC v1 was almost exactly the same, except instead of creating a
> >>>> duplicate definition of struct page, it just allocated 'struct page'
> >>>> instead of allocating another struct that is identical to struct page
> >>>> and casting it into struct page.
> >>
> >> Perhaps it is more accurate to say this is something between RFC v1 and
> >> RFC v3, in order to decouple 'struct page' for devmem from the mm subsystem,
> >> but still have mostly unified handling for both normal memory and devmem
> >> in page pool and the net stack.
> >>
> >> The main difference between this patchset and RFC v1:
> >> 1. The mm subsystem is not supposed to see the 'struct page' for devmem
> >> in this patchset. I guess we could say it is decoupled from the mm
> >> subsystem even though we still call PageTail()/page_ref_count()/
> >> page_is_pfmemalloc() on 'struct page' for devmem.
> >>
> >
> > In this patchset you pretty much allocate a struct page for your
> > dma-buf memory, and then cast it into a struct page, so all the mm
> > calls in page_pool.c are seeing a struct page when it's really dma-buf
> > memory.
> >
> > 'even though we still call
> > PageTail()/page_ref_count()/page_is_pfmemalloc() on 'struct page' for
> > devmem' is basically making dma-buf memory look like struct pages.
> >
> > Actually because you put the 'struct page for devmem' in
> > skb->bv_frag, the net stack will grab the 'struct page' for devmem
> > using skb_frag_page() then call things like page_address(), kmap,
> > get_page, put_page, etc, etc, etc.
>
> Yes, as above, skb_frags_not_readable() checking is still needed for
> kmap() and page_address().
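The skb_frags_not_readable() requirement can be shown with a similar hypothetical sketch. `struct fake_frag`, its `not_readable` flag and `fake_copy_frag()` are invented for illustration and do not match the kernel's skb_frag_t or the RFC's helpers. The only point is that a path that would otherwise map a frag (kmap(), page_address()) has to test readability first, because dma-buf memory has no CPU mapping to dereference.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical frag descriptor: NOT the kernel's skb_frag_t. */
struct fake_frag {
	void *cpu_addr;      /* NULL for device memory: nothing to map */
	size_t len;
	bool not_readable;   /* set for dma-buf (devmem) backed frags */
};

/* Copy a frag's payload into dst, as a copy or checksum path might,
 * but refuse devmem frags instead of dereferencing them. The error
 * codes are chosen for illustration only. */
static int fake_copy_frag(const struct fake_frag *frag, void *dst,
			  size_t dst_len)
{
	if (frag->not_readable)
		return -EFAULT;  /* the devmem guard must come first */
	if (frag->len > dst_len)
		return -EINVAL;
	memcpy(dst, frag->cpu_addr, frag->len);
	return (int)frag->len;
}
```

Whichever representation is chosen for devmem, the guard has to sit in front of every such access, which is why both sides agree the check stays.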
>
> get_page()/put_page() calls are avoided in page_pool_frag_ref()
> and napi_pp_put_page() for devmem page as the above checking is true
> for devmem page:
> (pp_iov->pp_magic & ~0x3UL) == PP_SIGNATURE
>

So devmem still needs special handling, via an if statement, for
refcounting, even after using struct pages for devmem, which is not
allowed (IIUC the dma-buf maintainer).

> >
> >> The main difference between this patchset and RFC v3:
> >> 1. It reuses the 'struct page' to have more unified handling between
> >> normal page and devmem page for net stack.
> >
> > This is what was nacked in RFC v1.
> >
> >> 2. It relies on the page->pp_frag_count to do reference counting.
> >>
> >
> > I don't see you change any of the page_ref_* calls in page_pool.c, for
> > example this one:
> >
> > https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L601
> >
> > So the reference the page_pool is seeing is actually page->_refcount,
> > not page->pp_frag_count? I'm confused here. Is this a bug in the
> > patchset?
>
> page->_refcount is the same as page_pool_iov->_refcount for devmem, which
> is ensured by the 'PAGE_POOL_MATCH(_refcount, _refcount);', and
> page_pool_iov->_refcount is set to one in mp_dmabuf_devmem_alloc_pages()
> by calling 'refcount_set(&ppiov->_refcount, 1)' and always remains as one.
>
> So the 'page_ref_count(page) == 1' checking is always true for devmem page.

Which, of course, is a bug in the patchset, and it only works because
it's a POC for you. devmem pages (which shouldn't exist according to
the dma-buf maintainer, IIUC) can't always be recycled. See the
SO_DEVMEM_DONTNEED patch in my RFC and the refcounting it requires for
devmem.

--
Thanks,
Mina