On Sun, Jul 2, 2023 at 9:20 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
>
> On 6/29/23 8:27 PM, Mina Almasry wrote:
> >
> > Hello Jakub, I'm looking into device memory (peer-to-peer) networking
> > actually, and I plan to pursue using the page pool as a front end.
> >
> > Quick description of what I have so far:
> > current implementation uses device memory with struct pages; I am
> > putting all those pages in a gen_pool, and we have written an
> > allocator that allocates pages from the gen_pool. In the driver, we
> > use this allocator instead of alloc_page() (the driver in question is
> > gve which currently doesn't use the page pool). When the driver is
> > done with the p2p page, it simply decrements the refcount on it and
> > the page is freed back to the gen_pool.

Quick update here: I was able to get my implementation working with the
page pool as a front end, using the memory provider API Jakub wrote
here:

https://github.com/kuba-moo/linux/tree/pp-providers

The main complication was indeed that my device memory pages are
ZONE_DEVICE pages, which are incompatible with the page_pool due to the
union in struct page. I thought of a couple of approaches to resolve
that:

1. Make my device memory pages non-ZONE_DEVICE pages. The issue there is
that if the page is not ZONE_DEVICE, put_page(page) will (I think)
attempt to free it to the buddy allocator, which is not correct. The
only places where the mm stack currently allows a custom freeing
callback (AFAIK) are ZONE_DEVICE pages, where free_zone_device_page()
calls the callback provided in page->pgmap->ops->page_free, and
compound pages, where a compound_dtor can be specified. My device
memory pages aren't compound pages, so only the ZONE_DEVICE path does
what I want.

2. Convert the pages from ZONE_DEVICE pages to page_pool pages and vice
versa as they're inserted into and removed from the page pool. This, I
think, works elegantly without any issue, and is the option I went
with. The info from ZONE_DEVICE that I care about for device memory TCP
is page->zone_device_data, which holds the dma_addr, and page->pgmap,
which holds the page_free op. I'm able to store both in my memory
provider, so I can swap pages back and forth between ZONE_DEVICE and
page_pool.

So far I've needed pretty much no modifications to Jakub's memory
provider implementation, and my functionality tests are passing. If
there are no major objections I'll look into cleaning up the interface
a bit and proposing it for merge. This is a prerequisite for device
memory TCP via the page_pool.

>
> I take it these are typical Linux networking applications using standard
> socket APIs (not dpdk or xdp sockets or such)? If so, what does tcpdump
> show for those skbs with pages for the device memory?
>

Yes, these are using (mostly) standard socket APIs. We have small
extensions to sendmsg() and recvmsg() to pass a reference to the device
memory in both these cases, but that's about it. tcpdump is able to
access the headers of these skbs, which are in host memory, but not the
payload, which is in device memory. Here is an example session with my
netcat-like test for device memory TCP:

https://pastebin.com/raw/FRjKf0kv

tcpdump seems to work, and the lengths of the packets above are
correct. tcpdump -A, however, isn't able to print the payload of the
packets:

https://pastebin.com/raw/2PcNxaZV

--
Thanks,
Mina
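
To make the free path mentioned in option 1 above a bit more concrete,
here is a minimal sketch of how a ZONE_DEVICE pgmap routes the final
put_page() into a driver callback instead of the buddy allocator. This
is only an illustration, not the actual gve or provider code: the names
(p2p_pool, p2p_page_free, p2p_pgmap) are made up, and it assumes the
gen_pool is keyed by the pages' physical addresses.

#include <linux/genalloc.h>
#include <linux/io.h>
#include <linux/memremap.h>
#include <linux/mm.h>

/* Hypothetical pool that owns the device memory pages. */
static struct gen_pool *p2p_pool;

/*
 * free_zone_device_page() calls this through pgmap->ops->page_free once
 * the refcount drops, so the page is recycled here rather than handed
 * to the buddy allocator.
 */
static void p2p_page_free(struct page *page)
{
	/* Assumes the gen_pool was populated with physical addresses. */
	gen_pool_free(p2p_pool, page_to_phys(page), PAGE_SIZE);
}

static const struct dev_pagemap_ops p2p_pgmap_ops = {
	.page_free = p2p_page_free,
};

static struct dev_pagemap p2p_pgmap = {
	/* The exact type depends on how the device memory is mapped. */
	.type	= MEMORY_DEVICE_PRIVATE,
	.ops	= &p2p_pgmap_ops,
};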
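
And to illustrate the conversion in option 2: a minimal sketch of the
save/restore I have in mind, assuming the memory provider keeps a small
per-page struct on the side. Again, the struct and function names here
are made up for illustration and are not what is on the pp-providers
branch.

#include <linux/memremap.h>
#include <linux/mm.h>

/* Per-page state kept by the (hypothetical) memory provider. */
struct p2p_page_state {
	struct dev_pagemap *pgmap;	/* holds ->ops->page_free */
	unsigned long dma_addr;		/* what zone_device_data held */
};

/*
 * ZONE_DEVICE -> page_pool: the ZONE_DEVICE fields of struct page
 * overlap the page_pool fields in the union, so stash them on the side
 * before the page_pool takes ownership of the page.
 */
static void p2p_save_zone_device_state(struct p2p_page_state *state,
				       struct page *page)
{
	state->pgmap = page->pgmap;
	state->dma_addr = (unsigned long)page->zone_device_data;
	/* The page_pool may now use the union in struct page as it likes. */
}

/*
 * page_pool -> ZONE_DEVICE: restore the fields before the final
 * put_page(), so the page reaches free_zone_device_page() and the
 * pgmap->ops->page_free callback again.
 */
static void p2p_restore_zone_device_state(struct p2p_page_state *state,
					  struct page *page)
{
	page->pgmap = state->pgmap;
	page->zone_device_data = (void *)state->dma_addr;
}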