On 2020-11-06 12:53 p.m., Jason Gunthorpe wrote: > On Fri, Nov 06, 2020 at 12:44:59PM -0700, Logan Gunthorpe wrote: >> >> >> On 2020-11-06 12:30 p.m., Jason Gunthorpe wrote: >>>> I certainly can't make decisions for code that isn't currently >>>> upstream. >>> >>> The rdma drivers are all upstream, what are you thinking about? >> >> Really? I feel like you should know what I mean here... >> >> I mean upstream code that actually uses the APIs that I'd have to >> introduce. I can't say here's an API feature that no code uses but the >> already upstream rdma driver might use eventually. It's fairly easy to >> send patches that make the necessary changes when someone adds a use of >> those changes inside the rdma code. > > Sure, but that doesn't mean you have to actively design things to be > unusable beyond this narrow case. The RDMA drivers are all there, all > upstream, if this work is accepted then the changes to insert P2P > pages into their existing mmap flows is a reasonable usecase to > consider at this point when building core code APIs. > > You shouldn't add dead code, but at least have a design in mind for > what it needs to look like and some allowance. I don't see anything I've done that's at odds with that. You will still need to make changes to the p2pdma code to implement your use case. >>>> Ultimately, if you aren't using the genpool you will have to implement >>>> your own mmap operation that somehow allocates the pages and your own >>>> page_free hook. >>> >>> Sure, the mlx5 driver already has a specialized alloctor for it's BAR >>> pages. >> >> So it *might* make sense to carve out a common helper to setup a VMA for >> P2PDMA to do the vm_flags check and set VM_MIXEDMAP... but besides that, >> there's no code that would be common to the two cases. > > I think the whole insertion of P2PDMA pages into a VMA should be > similar to io_remap_pfn() so all the VM flags, pgprot_decrypted and > other subtle details are all in one place. (also it needs a > pgprot_decrypted before doing vmf_insert, I just learned that this > month) I don't think a function like that will work for the p2pmem use case. In order to implement proper page freeing I expect I'll need a loop around the allocator and vm_insert_mixed()... Something roughly like: for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) { vaddr = pci_alloc_p2pmem(pdev, PAGE_SIZE); ret = vmf_insert_mixed(vma, addr, __pfn_to_pfn_t(virt_to_pfn(vaddr), PFN_DEV | PFN_MAP)); } That way we can call pci_free_p2pmem() when a page's ref count goes to zero. I suspect your use case will need to do something similar. Logan