Hi, Yonatan,

On Sun, 2024-12-01 at 12:36 +0200, Yonatan Maman wrote:
> From: Yonatan Maman <Ymaman@xxxxxxxxxx>
> 
> hmm_range_fault() by default triggered a page fault on device private
> pages when the HMM_PFN_REQ_FAULT flag was set, migrating them to RAM.
> In some cases, such as with RDMA devices, the migration overhead
> between the device (e.g., GPU) and the CPU, and vice-versa,
> significantly degrades performance. Thus, enabling Peer-to-Peer (P2P)
> DMA access for device private pages might be crucial for minimizing
> data transfer overhead.
> 
> Introduced an API to support P2P DMA for device private pages, which
> includes:
>  - Leveraging the struct pagemap_ops for P2P Page Callbacks. This
>    callback involves mapping the page for P2P DMA and returning the
>    corresponding PCI_P2P page.
> 
>  - Utilizing hmm_range_fault for initializing P2P DMA. The API also
>    adds the HMM_PFN_REQ_TRY_P2P flag option for the hmm_range_fault
>    caller to initialize P2P. If set, hmm_range_fault attempts
>    initializing the P2P connection first, if the owner device
>    supports P2P, using p2p_page. In case of failure or lack of
>    support, hmm_range_fault will continue with the regular flow of
>    migrating the page to RAM.
> 
> This change does not affect previous use-cases of hmm_range_fault,
> because both the caller and the page owner must explicitly request
> and support it to initialize P2P connection.
> 
> Signed-off-by: Yonatan Maman <Ymaman@xxxxxxxxxx>
> Signed-off-by: Gal Shalom <GalShalom@xxxxxxxxxx>
> ---
>  include/linux/hmm.h      |  3 ++-
>  include/linux/memremap.h |  8 ++++++
>  mm/hmm.c                 | 57 +++++++++++++++++++++++++++++++++-------
>  3 files changed, 57 insertions(+), 11 deletions(-)

It appears we're working on a very similar thing (in fact the original
proposals were sent out the same day), so I have a couple of questions.

> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 62980ca8f3c5..017f22cef893 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -26,6 +26,7 @@ struct mmu_interval_notifier;
>   * HMM_PFN_DMA_MAPPED - Flag preserved on input-to-output transformation
>   *                      to mark that page is already DMA mapped
> + * HMM_PFN_ALLOW_P2P - Allow returning PCI P2PDMA page
>   *
>   * On input:
>   * 0                 - Return the current state of the page, do not fault it.
> @@ -41,7 +42,7 @@ enum hmm_pfn_flags {
>  	HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3),
>  
>  	/* Sticky flag, carried from Input to Output */
> +	HMM_PFN_ALLOW_P2P = 1UL << (BITS_PER_LONG - 6),
>  	HMM_PFN_DMA_MAPPED = 1UL << (BITS_PER_LONG - 7),
>  
>  	HMM_PFN_ORDER_SHIFT	= (BITS_PER_LONG - 8),
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 3f7143ade32c..cdf5189be5e9 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -89,6 +89,14 @@ struct dev_pagemap_ops {
>  	 */
>  	vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
>  
> +	/*
> +	 * Used for private (un-addressable) device memory only. Return a
> +	 * corresponding PFN for a page that can be mapped to device
> +	 * (e.g using dma_map_page)
> +	 */
> +	int (*get_dma_pfn_for_device)(struct page *private_page,
> +				      unsigned long *dma_pfn);
> +
>  	/*
>  	 * Handle the memory failure happens on a range of pfns. Notify the
>  	 * processes who are using these pfns, and try to recover the data on
> diff --git a/mm/hmm.c b/mm/hmm.c
> index a852d8337c73..1c080bc00ee8 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -226,6 +226,51 @@ static inline unsigned long pte_to_hmm_pfn_flags(struct hmm_range *range,
>  	return pte_write(pte) ? (HMM_PFN_VALID | HMM_PFN_WRITE) :
>  				HMM_PFN_VALID;
>  }
>  
> +static bool hmm_handle_device_private(struct hmm_range *range,
> +				      unsigned long pfn_req_flags,
> +				      swp_entry_t entry,
> +				      unsigned long *hmm_pfn)
> +{
> +	struct page *page = pfn_swap_entry_to_page(entry);
> +	struct dev_pagemap *pgmap = page->pgmap;
> +	int ret;
> +
> +	pfn_req_flags &= range->pfn_flags_mask;
> +	pfn_req_flags |= range->default_flags;
> +
> +	/*
> +	 * Don't fault in device private pages owned by the caller,
> +	 * just report the PFN.
> +	 */
> +	if (pgmap->owner == range->dev_private_owner) {
> +		*hmm_pfn = swp_offset_pfn(entry);
> +		goto found;
> +	}
> +
> +	/*
> +	 * P2P for supported pages, and according to caller request
> +	 * translate the private page to the match P2P page if it fails
> +	 * continue with the regular flow
> +	 */
> +	if (pfn_req_flags & HMM_PFN_ALLOW_P2P &&
> +	    pgmap->ops->get_dma_pfn_for_device) {
> +		ret = pgmap->ops->get_dma_pfn_for_device(page, hmm_pfn);

How would the pgmap device know whether P2P is actually possible
without knowing the client device (like when calling
pci_p2pdma_distance()), and, if we also look into access control,
whether it is allowed?
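Purely for illustration (made-up name, not part of either patch): a
feasibility check of the pci_p2pdma_distance() kind needs both
endpoints, while the proposed ->get_dma_pfn_for_device() only ever
sees the page:

	static bool example_p2p_path_usable(struct pci_dev *provider,
					    struct device *client)
	{
		/* A negative distance means the pair has no usable P2P path. */
		return pci_p2pdma_distance_many(provider, &client, 1, true) >= 0;
	}

Nothing like "client" is available from the pagemap side alone.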
I wonder whether you could consider using something a little more
generic that would also fit our use-case: here the caller provides a
callback that decides whether devmem access is allowed, but leaves any
dma-mapping or pfn mangling to be done after the call to
hmm_range_fault(), since hmm_range_fault() really only needs to know
whether it has to migrate to system or not.

One benefit of using this alternative approach is that struct hmm_range
can be subclassed by the caller and can, for example, cache device
pairs for which p2p is allowed.
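To make that concrete, the caller side could look something along these
lines (just a rough sketch, driver names made up, using the
hmm_range_ops from the diff below):

/*
 * Hypothetical caller: subclass struct hmm_range and cache the last
 * pagemap that was verified against this client device.
 */
struct xyz_hmm_range {
	struct hmm_range base;
	struct device *client;		/* the device doing the access */
	struct dev_pagemap *allowed;	/* last pagemap we said yes to */
};

static bool xyz_devmem_allow(struct hmm_range *hrange, struct page *page)
{
	struct xyz_hmm_range *r = container_of(hrange, struct xyz_hmm_range,
					       base);
	struct dev_pagemap *pgmap = page->pgmap;

	if (pgmap == r->allowed)
		return true;

	/*
	 * xyz_p2p_allowed() is a hypothetical driver helper that checks
	 * pci_p2pdma_distance() plus whatever access-control policy applies
	 * to this (pagemap owner, client) pair.
	 */
	if (!xyz_p2p_allowed(pgmap, r->client))
		return false;

	r->allowed = pgmap;
	return true;
}

static const struct hmm_range_ops xyz_hmm_range_ops = {
	.devmem_allow = xyz_devmem_allow,
};

The caller would set range.base.ops = &xyz_hmm_range_ops before the
hmm_range_fault() call; anything not allowed simply keeps the current
behaviour and gets migrated to system.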
Current version (after the feedback from Jason) looks like this. It
looks like your use-case could easily be made to fit this one but, as I
understand it, not vice versa. (Could send this as a separate patch if
needed.)

Thanks,
Thomas

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 126a36571667..8ac1f4125e30 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -76,6 +76,21 @@ static inline unsigned int hmm_pfn_to_map_order(unsigned long hmm_pfn)
 	return (hmm_pfn >> HMM_PFN_ORDER_SHIFT) & 0x1F;
 }
 
+struct hmm_range;
+
+/**
+ * struct hmm_range_ops - Functions for detailed cross-device access.
+ */
+struct hmm_range_ops {
+	/**
+	 * @devmem_allow: Whether to allow cross-device access to device_private pages.
+	 * @hrange: Pointer to a struct hmm_range. Typically subclassed by the caller
+	 * to provide needed information.
+	 * @page: The page being queried.
+	 */
+	bool (*devmem_allow)(struct hmm_range *hrange, struct page *page);
+};
+
 /*
  * struct hmm_range - track invalidation lock on virtual address range
  *
@@ -87,6 +102,7 @@ static inline unsigned int hmm_pfn_to_map_order(unsigned long hmm_pfn)
  * @default_flags: default flags for the range (write, read, ... see hmm doc)
  * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
  * @dev_private_owner: owner of device private pages
+ * @ops: Pointer to a struct hmm_range_ops or NULL if no ops provided.
  */
 struct hmm_range {
 	struct mmu_interval_notifier *notifier;
@@ -97,6 +113,7 @@ struct hmm_range {
 	unsigned long default_flags;
 	unsigned long pfn_flags_mask;
 	void *dev_private_owner;
+	const struct hmm_range_ops *ops;
 };
 
 /*
diff --git a/mm/hmm.c b/mm/hmm.c
index 7e0229ae4a5a..ea4e08caa14a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -220,6 +220,15 @@ static inline unsigned long pte_to_hmm_pfn_flags(struct hmm_range *range,
 	return pte_write(pte) ? (HMM_PFN_VALID | HMM_PFN_WRITE) :
 				HMM_PFN_VALID;
 }
 
+static bool hmm_devmem_allow(struct hmm_range *range, struct page *page)
+{
+	if (likely(page->pgmap->owner == range->dev_private_owner))
+		return true;
+	if (likely(!range->ops))
+		return false;
+	return range->ops->devmem_allow(range, page);
+}
+
 static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			      unsigned long end, pmd_t *pmdp, pte_t *ptep,
 			      unsigned long *hmm_pfn)
@@ -245,11 +254,10 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 
 	/*
 	 * Don't fault in device private pages owned by the caller,
-	 * just report the PFN.
+	 * or that are accessible to the caller. Just report the PFN.
 	 */
 	if (is_device_private_entry(entry) &&
-	    pfn_swap_entry_to_page(entry)->pgmap->owner ==
-	    range->dev_private_owner) {
+	    hmm_devmem_allow(range, pfn_swap_entry_to_page(entry))) {
 		cpu_flags = HMM_PFN_VALID;
 		if (is_writable_device_private_entry(entry))
 			cpu_flags |= HMM_PFN_WRITE;
-- 
2.48.1
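P.S. To illustrate the split I have in mind: with the above, the P2P
pfn lookup and dma-mapping that your patch does inside hmm_range_fault()
would instead be done by the caller once the walk has returned. A very
rough sketch (all names hypothetical, assuming the pagemap owner still
exports something like your get_dma_pfn_for_device(), returning 0 on
success):

	unsigned long dma_pfn;
	struct page *page;
	int i;

	for (i = 0; i < npages; i++) {
		if (!(hmm_pfns[i] & HMM_PFN_VALID))
			continue;

		page = hmm_pfn_to_page(hmm_pfns[i]);

		/*
		 * For device private pages that ->devmem_allow() let
		 * through, translate to the peer's P2P page before
		 * dma-mapping.
		 */
		if (is_device_private_page(page) &&
		    page->pgmap->ops->get_dma_pfn_for_device &&
		    !page->pgmap->ops->get_dma_pfn_for_device(page, &dma_pfn))
			page = pfn_to_page(dma_pfn);

		/* ... dma_map_page() or equivalent on "page" ... */
	}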