Jason Gunthorpe wrote:
> On Mon, Jul 10, 2023 at 04:02:59PM -0700, Mina Almasry wrote:
> > On Mon, Jul 10, 2023 at 10:44 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Wed, Jul 05, 2023 at 06:17:39PM -0700, Mina Almasry wrote:
> > >
> > > > Another issue is that in networks with low MTU, we could be DMAing
> > > > 1400/1500 bytes into each allocation, which is problematic if the
> > > > allocation is 8K+. I would need to investigate a bit to see if/how to
> > > > solve that, and we may end up having to split the page and again run
> > > > into the 'not enough room in struct page' problem.
> > >
> > > You don't have an in-tree driver to use this with, so who knows, but
> > > the out-of-tree GPU drivers tend to use a 64k memory management page
> > > size, and I don't expect you'd make progress with a design where a 64K
> > > naturally sized allocator is producing 4k/8k non-compound pages just
> > > for netdev. We are still struggling with pagemap support for variable
> > > page size folios, so there are a bunch of technical blockers before
> > > drivers could do this.
> > >
> > > This is why it is so important to come with a complete in-tree
> > > solution, as we cannot review this design if your work is done with
> > > hacked-up out-of-tree drivers.
> >
> > I think you're assuming the proposal requires dma-buf exporter driver
> > changes, and I have a 'hacked up out of tree driver' not visible to
> > you.
>
> Oh, I thought it was obvious that what you did in patch 1 was a total
> non-starter when I said you can't abuse the ZONE_DEVICE pages like
> this.
>
> You must create ZONE_DEVICE P2P pages, not MEMORY_DEVICE_PRIVATE, to
> represent P2P memory, and you can't do that automatically from the
> dmabuf core code.
>
> Without doing this the DMA API doesn't actually work properly, because
> it doesn't have enough information to know about what the underlying
> exporter is.
>
> The entire point of DEVICE_PRIVATE is that the page content, and
> access to the page's physical location, is *explicitly* unavailable to
> anyone but the pgmap owner.
>
> > > Fully and properly adding P2P ZONE_DEVICE to a real world driver is a
> > > pretty big ask still.
> >
> > There is no such ask.
>
> Well, there is from me if you want to use struct pages as handles for
> P2P memory. That is what we have defined in the pgmap area.
>
> Also I should warn you that your 'option 2' is being NAK'd by
> Christoph for now; we are not adding any new code around DMABUF's
> hacky use of NULL sg_page scatterlists for P2P memory either. I've
> been working on solutions here but it is slow going.
>
> > On dma-buf changes required: I do need approval from the dma-buf
> > maintainers,
>
> It is a neat hack, of sorts, to abuse DEVICE_PRIVATE to create struct
> pages for the exclusive use of the page pool, but you need more
> approval than just the dmabuf maintainers to abuse the pgmap framework
> like this.
>
> At least from my position I want to see MEMORY_DEVICE_PCI_P2PDMA used
> to represent P2P memory. You haven't CC'd anyone from the mm
> community, so I've added some people here to see if there are other
> opinions.
>
> To be clear, you need an explicit ack from mm people on the abusive
> use of pgmap in patch 1.

This thread is long, so I am only reacting to this first message I am
copied on, but yes, I agree with Jason: anything peer-to-peer DMA needs
to reuse p2pdma, and it definitely needs in-tree producers and
consumers for all infrastructure.

One piece of technical debt standing in the way of any proposed
expansion of pgmap usage is the final resolution of this topic:

https://lore.kernel.org/all/166579181584.2236710.17813547487183983273.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx/

I am also generally reluctant to entertain taking on new pgmap
maintenance, i.e. now that accelerator memory attached over a coherent
link is an open hardware standard (CXL) that addresses in hardware what
pgmap infrastructure enabled in software.

> I know it is not what you want to hear, but there are actual reasons
> why the P2P DMA problem has been festering for so long, and hacky
> quick fixes like this are not going to be enough.
>
> Jason
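[Editor's note: for readers unfamiliar with the distinction Jason draws above, the in-tree pattern he is pointing at is the PCI p2pdma core, which creates MEMORY_DEVICE_PCI_P2PDMA pgmaps for BAR memory. The following is a rough, non-authoritative sketch against the upstream `pci_p2pdma_add_resource()` / `pci_alloc_p2pmem()` API; the BAR number and function name `example_setup_p2p()` are hypothetical, and error paths are abbreviated.]

```c
/*
 * Illustrative sketch only -- not code from the patch under discussion.
 * Shows how a driver obtains ZONE_DEVICE P2P pages via the in-tree
 * MEMORY_DEVICE_PCI_P2PDMA path, as opposed to MEMORY_DEVICE_PRIVATE,
 * whose pages are by design inaccessible to anyone but the pgmap owner
 * and are serviced through migrate_to_ram().
 */
#include <linux/pci.h>
#include <linux/pci-p2pdma.h>
#include <linux/sizes.h>

static int example_setup_p2p(struct pci_dev *pdev)
{
	void *buf;
	int rc;

	/*
	 * Register BAR 4 (hypothetical) as p2p memory. The p2pdma core
	 * sets up a dev_pagemap of type MEMORY_DEVICE_PCI_P2PDMA over
	 * the BAR, so every byte is backed by a struct page that the
	 * DMA API can reason about (it knows the underlying exporter).
	 */
	rc = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
	if (rc)
		return rc;

	/* Allocations come back as pointers into those P2P pages. */
	buf = pci_alloc_p2pmem(pdev, SZ_64K);
	if (!buf)
		return -ENOMEM;

	/* ... hand buf to a DMA engine / sg list ... */

	pci_free_p2pmem(pdev, buf, SZ_64K);
	return 0;
}
```

This is kernel-internal code and not independently runnable; it only illustrates why "create ZONE_DEVICE P2P pages" means going through the p2pdma framework rather than hand-rolling a DEVICE_PRIVATE pgmap in dmabuf core code.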