On Mon, Jul 3, 2023 at 2:43 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Sun, Jul 02, 2023 at 11:22:33PM -0700, Mina Almasry wrote:
> > On Sun, Jul 2, 2023 at 9:20 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
> > >
> > > On 6/29/23 8:27 PM, Mina Almasry wrote:
> > > >
> > > > Hello Jakub, I'm looking into device memory (peer-to-peer)
> > > > networking actually, and I plan to pursue using the page pool as
> > > > a front end.
> > > >
> > > > Quick description of what I have so far: the current
> > > > implementation uses device memory with struct pages; I am putting
> > > > all those pages in a gen_pool, and we have written an allocator
> > > > that allocates pages from the gen_pool. In the driver, we use
> > > > this allocator instead of alloc_page() (the driver in question is
> > > > gve, which currently doesn't use the page pool). When the driver
> > > > is done with the p2p page, it simply decrements the refcount on
> > > > it and the page is freed back to the gen_pool.
> >
> > Quick update here: I was able to get my implementation working with
> > the page pool as a front end, using the memory provider API Jakub
> > wrote here:
> > https://github.com/kuba-moo/linux/tree/pp-providers
> >
> > The main complication was indeed the fact that my device memory
> > pages are ZONE_DEVICE pages, which are incompatible with the
> > page_pool due to the union in struct page. I thought of a couple of
> > approaches to resolve that:
> >
> > 1. Make my device memory pages non-ZONE_DEVICE pages.
>
> Hard no on this from a mm perspective. We need P2P memory to be
> properly tagged and to have the expected struct pages to be DMA
> mappable; otherwise you totally break everything if you try to do
> this.
>
> > 2. Convert the pages from ZONE_DEVICE pages to page_pool pages and
> > vice versa as they're being inserted and removed from the page pool.
>
> This is kind of scary; it is very, very fragile to rework the pages
> like this. E.g. what happens when the owning device unplugs and needs
> to revoke these pages? I think it would likely crash.
>
> I think it also technically breaks the DMA API, as we may need to
> look into the pgmap to do cache ops on some architectures.
>
> I suggest you try to work with 8k folios; then the tail page's
> struct page is empty enough to store the information you need.

Hi Jason, sorry for the late reply. I think this could work, and the
page pool already supports allocations of order > 0. It may end up
being a big change to the gve driver, which, as I understand it,
currently deals exclusively with order-0 allocations.

Another issue is that in networks with a low MTU, we could be DMAing
1400/1500 bytes into each allocation, which is problematic if the
allocation is 8K+: most of each allocation would sit unused. I would
need to investigate a bit to see if/how we could solve that, and we
may end up having to split the page and run into the 'not enough room
in struct page' problem again.

> Or allocate per-page memory and do a memdesc-like thing..
>

I need to review memdesc more closely. Do you imagine I would add a
pointer in struct page that points to the memdesc, or implement a
page-to-memdesc mapping in the page_pool? Either approach could work.
The concern would be that accessing the memdesc entries costs an
extra cache miss, which may be unacceptable in fast paths, but I
already dereference page->pgmap in a few places and it doesn't seem
to be an issue.
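To make the second option a bit more concrete, below is a rough
sketch of what I have in mind. To be clear, 'struct netmem_desc' and
the xarray member are hypothetical names I'm inventing here for
illustration, not anything that exists in Jakub's tree:

/* Hypothetical memdesc-like descriptor, kept outside struct page. */
struct netmem_desc {
        struct dev_pagemap *pgmap;
        dma_addr_t dma_addr;
};

/* Assumes struct page_pool grows a 'struct xarray mp_descs' member,
 * initialized with xa_init(&pool->mp_descs) at pool creation time.
 */
static int pp_set_desc(struct page_pool *pool, struct page *page,
                       struct netmem_desc *desc)
{
        return xa_err(xa_store(&pool->mp_descs, page_to_pfn(page),
                               desc, GFP_KERNEL));
}

static struct netmem_desc *pp_get_desc(struct page_pool *pool,
                                       struct page *page)
{
        /* The lookup here is the potential fast-path cache miss
         * mentioned above.
         */
        return xa_load(&pool->mp_descs, page_to_pfn(page));
}

The alternative (a pointer in struct page) avoids the lookup, but it
consumes a field in an already very contested structure.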
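For reference, the gen_pool-backed allocator I described at the top
of this thread boils down to roughly the following (names are made up
here, and the real code also deals with DMA mapping and refcounting):

#include <linux/genalloc.h>

static struct gen_pool *dm_pool;

/* Put a contiguous range of device memory pages under gen_pool
 * control. gen_pool only tracks opaque integers, so PFN-derived
 * addresses work fine as the managed address space.
 */
static int dm_pool_init(struct page *first_page, unsigned long npages)
{
        dm_pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
        if (!dm_pool)
                return -ENOMEM;
        return gen_pool_add(dm_pool,
                            page_to_pfn(first_page) << PAGE_SHIFT,
                            npages << PAGE_SHIFT, NUMA_NO_NODE);
}

/* What the driver calls instead of alloc_page(). */
static struct page *dm_alloc_page(void)
{
        unsigned long addr = gen_pool_alloc(dm_pool, PAGE_SIZE);

        return addr ? pfn_to_page(addr >> PAGE_SHIFT) : NULL;
}

/* Called once the refcount drops and the page comes back to us. */
static void dm_free_page(struct page *page)
{
        gen_pool_free(dm_pool, page_to_pfn(page) << PAGE_SHIFT,
                      PAGE_SIZE);
}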
> Though overall, you won't find devices creating struct pages for
> their P2P memory today, so I'm not sure what the purpose is. Jonathan
> already got highly slammed for proposing code to the kernel that was
> unusable. Please don't repeat that. Other than a special NVMe use
> case, the interface for P2P is DMABUF right now, and it is not
> struct page backed.
>

Our approach is actually to extend DMABUF to provide struct
page-backed attachment mappings, which, as far as I understand,
sidesteps the issues Jonathan ran into. Our code is fully functional
with any device that supports dmabuf, and in fact a lot of my tests
use udmabuf to minimize the dependencies. The RFC may come with a
udmabuf selftest to show that any dmabuf, even a mocked one, would be
supported (a rough sketch of the udmabuf setup is at the end of this
mail).

> Even if we did get to struct pages for device memory, it is highly
> likely the cases you are interested in will be using larger than 4k
> folios, so page pool would need to cope with this nicely as well.
>

--
Thanks,
Mina
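P.S. For completeness, the userspace side of the udmabuf-based
testing mentioned above is roughly the following; it relies only on
the existing /dev/udmabuf UAPI (error handling omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/udmabuf.h>

/* Build a dma-buf fd backed by ordinary anonymous pages. */
int create_test_dmabuf(size_t size)     /* size must be page-aligned */
{
        int memfd = memfd_create("udmabuf-test", MFD_ALLOW_SEALING);
        struct udmabuf_create create = { 0 };
        int devfd, buffd;

        ftruncate(memfd, size);
        /* udmabuf requires the memfd to be sealed against shrinking. */
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        devfd = open("/dev/udmabuf", O_RDWR);
        create.memfd  = memfd;
        create.offset = 0;
        create.size   = size;
        buffd = ioctl(devfd, UDMABUF_CREATE, &create);

        close(devfd);
        close(memfd);
        return buffd;
}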