Re: [RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Tue, 28 Jan 2025 15:48:54 +0100

On Tue, 2025-01-28 at 09:20 -0400, Jason Gunthorpe wrote:
> On Tue, Jan 28, 2025 at 09:51:52AM +0100, Thomas Hellström wrote:
> 
> > How would the pgmap device know whether P2P is actually possible
> > without knowing the client device (like calling pci_p2pdma_distance),
> > and also, if looking into access control, whether it is allowed?
> 
> The DMA API will do this, this happens after this patch is put on top
> of Leon's DMA API patches. The mapping operation will fail and it will
> likely be fatal to whatever is going on.
>  
> get_dma_pfn_for_device() returns a new PFN, but that is not a DMA
> mapped address, it is just a PFN that has another struct page under
> it.
> 
> There is an implicit assumption here that P2P will work and we don't
> need a 3rd case to handle non-working P2P..

OK. We will have cases where we want pfnmaps with driver-private fast
interconnects to return "interconnect possible, don't migrate", whereas
other GPUs and other devices would possibly return "interconnect
unsuitable, do migrate". So, as I understand it, we need a more
flexible interface than this.

> 
> > but leaves any dma-mapping or pfn mangling to be done after the
> > call to hmm_range_fault(), since hmm_range_fault() really only needs
> > to know whether it has to migrate to system or not.
> 
> See above, this is already the case..

Well, what I meant was that at hmm_range_fault() time we would only
consider whether to migrate or not. Afterwards, at dma-mapping time,
you'd expose the alternative PFNs that could be used for dma-mapping.

We were actually looking at a solution where the pagemap implements
something along the lines of

bool devmem_allowed(pagemap, client); // for hmm_range_fault()

plus dma_map() and dma_unmap() methods.

That way you wouldn't need to expose special P2P DMA pages, and the
interface could also handle driver-private interconnects, where the
dma_map() and dma_unmap() methods become trivial.
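A minimal userspace sketch of that split, for illustration only: every name below (struct sketch_pagemap, struct sketch_pagemap_ops, struct client_dev, the link_group field, must_migrate()) is hypothetical and does not correspond to an existing kernel API. It assumes a driver-private interconnect whose address space is reached by adding an offset to a per-pagemap base, which is why dma_map() and dma_unmap() reduce to almost nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client device; devices with the same link_group share a
 * driver-private fast interconnect. Not a kernel structure. */
struct client_dev {
	int id;
	int link_group;
};

struct sketch_pagemap;

/* Hypothetical ops mirroring the proposal: devmem_allowed() is consulted at
 * hmm_range_fault() time, dma_map()/dma_unmap() only at mapping time. */
struct sketch_pagemap_ops {
	bool (*devmem_allowed)(const struct sketch_pagemap *pm,
			       const struct client_dev *client);
	uint64_t (*dma_map)(const struct sketch_pagemap *pm,
			    const struct client_dev *client, uint64_t offset);
	void (*dma_unmap)(const struct sketch_pagemap *pm,
			  const struct client_dev *client, uint64_t addr);
};

struct sketch_pagemap {
	const struct sketch_pagemap_ops *ops;
	int link_group;     /* interconnect group owning this device memory */
	uint64_t link_base; /* base of this memory in the interconnect space */
};

/* Fault-time policy: in-place access only over the private interconnect. */
static bool private_link_allowed(const struct sketch_pagemap *pm,
				 const struct client_dev *client)
{
	return client->link_group == pm->link_group;
}

/* No IOMMU or DMA API involved: mapping is just base + offset. */
static uint64_t private_link_map(const struct sketch_pagemap *pm,
				 const struct client_dev *client,
				 uint64_t offset)
{
	(void)client;
	return pm->link_base + offset;
}

/* Nothing was set up by private_link_map(), so there is nothing to undo. */
static void private_link_unmap(const struct sketch_pagemap *pm,
			       const struct client_dev *client, uint64_t addr)
{
	(void)pm;
	(void)client;
	(void)addr;
}

static const struct sketch_pagemap_ops private_link_ops = {
	.devmem_allowed = private_link_allowed,
	.dma_map = private_link_map,
	.dma_unmap = private_link_unmap,
};

/* Phase 1 (fault time): decide only whether the page must migrate to
 * system memory; no addresses are produced here. */
static bool must_migrate(const struct sketch_pagemap *pm,
			 const struct client_dev *client)
{
	assert(pm->ops && pm->ops->devmem_allowed);
	return !pm->ops->devmem_allowed(pm, client);
}
```

In this sketch the caller consults devmem_allowed() only from the fault path and defers dma_map() until it actually programs its own page tables, so no special P2P pages ever need to be exposed.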

> 
> > One benefit of using this alternative approach is that struct
> > hmm_range can be subclassed by the caller and for example cache
> > device pairs for which p2p is allowed.
> 
> If you want to directly address P2P non-uniformity I'd rather do it
> directly in the core code than using a per-driver callback. Every
> driver needs exactly the same logic for such a case.

Yeah, and that would look something like the above. Initially we
intended to keep these methods in the drm allocator, around its
pagemaps, but we could of course look into doing this directly in the
dev_pagemap ops. We would still probably need some guidance on what's
considered acceptable, though, and I don't think the solution proposed
in this patch meets our needs.
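For comparison with doing it in core code, the caller-side caching mentioned earlier (subclassing struct hmm_range to remember which device pairs may use P2P) might look roughly like the following userspace sketch. struct sketch_hmm_range stands in for struct hmm_range, slow_p2p_check() is a hypothetical placeholder for an expensive check such as a topology walk, and container_of() is redefined locally; none of these are real kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct hmm_range; only the embedding matters. */
struct sketch_hmm_range {
	unsigned long start, end;
};

#define P2P_CACHE_SLOTS 4

/* Caller-side "subclass": embeds the range and caches per-owner decisions
 * for a single client device. */
struct cached_range {
	struct sketch_hmm_range base;
	int client_id;
	int owner_ids[P2P_CACHE_SLOTS];
	bool allowed[P2P_CACHE_SLOTS];
	int n;           /* number of cached entries */
	int slow_checks; /* how many times the expensive check actually ran */
};

/* Hypothetical expensive check; placeholder policy: same parity of ids. */
static bool slow_p2p_check(int owner_id, int client_id)
{
	return (owner_id % 2) == (client_id % 2);
}

/* Receives only the embedded base, as a callback would, and recovers the
 * subclass to consult or fill the cache. */
static bool p2p_allowed(struct sketch_hmm_range *range, int owner_id)
{
	struct cached_range *cr;
	bool ok;

	assert(range != NULL);
	cr = container_of(range, struct cached_range, base);

	for (int i = 0; i < cr->n; i++)
		if (cr->owner_ids[i] == owner_id)
			return cr->allowed[i];

	ok = slow_p2p_check(owner_id, cr->client_id);
	cr->slow_checks++;
	if (cr->n < P2P_CACHE_SLOTS) { /* when full, simply stop caching */
		cr->owner_ids[cr->n] = owner_id;
		cr->allowed[cr->n] = ok;
		cr->n++;
	}
	return ok;
}
```

The trade-off Jason raises still applies: each driver would carry its own copy of this logic, whereas a core-code version would share one cache and one policy across all callers.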

Thanks,
Thomas

> 
> Jason