Yonatan Maman <ymaman@xxxxxxxxxx> writes:

> From: Yonatan Maman <Ymaman@xxxxxxxxxx>
>
> hmm_range_fault() natively triggers a page fault on device private
> pages, migrating them to RAM. In some cases, such as with RDMA devices,
> the migration overhead between the device (e.g., GPU) and the CPU, and
> vice-versa, significantly damages performance. Thus, enabling Peer-to-
> Peer (P2P) DMA access for device private page might be crucial for
> minimizing data transfer overhead.
>
> This change introduces an API to support P2P connections for device
> private pages by implementing the following:
>
>  - Leveraging the struct pagemap_ops for P2P Page Callbacks. This
>    callback involves mapping the page to MMIO and returning the
>    corresponding PCI_P2P page.
>
>  - Utilizing hmm_range_fault for Initializing P2P Connections. The API
>    also adds the HMM_PFN_REQ_TRY_P2P flag option for the
>    hmm_range_fault caller to initialize P2P. If set, hmm_range_fault
>    attempts initializing the P2P connection first, if the owner device
>    supports P2P, using p2p_page. In case of failure or lack of support,
>    hmm_range_fault will continue with the regular flow of migrating the
>    page to RAM.
>
> This change does not affect previous use-cases of hmm_range_fault,
> because both the caller and the page owner must explicitly request and
> support it to initialize P2P connection.
>
> Signed-off-by: Yonatan Maman <Ymaman@xxxxxxxxxx>
> Reviewed-by: Gal Shalom <GalShalom@xxxxxxxxxx>
> ---
>  include/linux/hmm.h      |  2 ++
>  include/linux/memremap.h |  7 +++++++
>  mm/hmm.c                 | 28 ++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 126a36571667..7154f5ed73a1 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -41,6 +41,8 @@ enum hmm_pfn_flags {
>  	/* Input flags */
>  	HMM_PFN_REQ_FAULT = HMM_PFN_VALID,
>  	HMM_PFN_REQ_WRITE = HMM_PFN_WRITE,
> +	/* allow returning PCI P2PDMA pages */
> +	HMM_PFN_REQ_ALLOW_P2P = 1,
>
>  	HMM_PFN_FLAGS = 0xFFUL << HMM_PFN_ORDER_SHIFT,
>  };
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 3f7143ade32c..0ecfd3d191fa 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -89,6 +89,13 @@ struct dev_pagemap_ops {
>  	 */
>  	vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
>
> +	/*
> +	 * Used for private (un-addressable) device memory only. Return a
> +	 * corresponding struct page, that can be mapped to device
> +	 * (e.g using dma_map_page)
> +	 */
> +	struct page *(*get_dma_page_for_device)(struct page *private_page);

It would be nice to add some documentation about this feature to
Documentation/mm/hmm.rst. In particular some notes on the page
lifetime/refcounting rules.

On that note how is the refcounting of the returned p2pdma page expected
to work? We don't want the driver calling hmm_range_fault() to be able
to pin the page with eg. get_page(), so the returned p2pdma page should
have a zero refcount to enforce that.

> +
>  	/*
>  	 * Handle the memory failure happens on a range of pfns. Notify the
>  	 * processes who are using these pfns, and try to recover the data on
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 7e0229ae4a5a..987dd143d697 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -230,6 +230,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  	unsigned long cpu_flags;
>  	pte_t pte = ptep_get(ptep);
>  	uint64_t pfn_req_flags = *hmm_pfn;
> +	struct page *(*get_dma_page_handler)(struct page *private_page);
> +	struct page *dma_page;
>
>  	if (pte_none_mostly(pte)) {
>  		required_fault =
> @@ -257,6 +259,32 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  		return 0;
>  	}
>
> +	/*
> +	 * P2P for supported pages, and according to caller request
> +	 * translate the private page to the match P2P page if it fails
> +	 * continue with the regular flow
> +	 */
> +	if (is_device_private_entry(entry)) {
> +		get_dma_page_handler =
> +			pfn_swap_entry_to_page(entry)
> +				->pgmap->ops->get_dma_page_for_device;
> +		if ((hmm_vma_walk->range->default_flags &
> +		     HMM_PFN_REQ_ALLOW_P2P) &&
> +		    get_dma_page_handler) {
> +			dma_page = get_dma_page_handler(
> +				pfn_swap_entry_to_page(entry));
> +			if (!IS_ERR(dma_page)) {
> +				cpu_flags = HMM_PFN_VALID;
> +				if (is_writable_device_private_entry(
> +					    entry))
> +					cpu_flags |= HMM_PFN_WRITE;
> +				*hmm_pfn = page_to_pfn(dma_page) |
> +					   cpu_flags;
> +				return 0;
> +			}
> +		}
> +	}
> +
>  	required_fault =
>  		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
>  	if (!required_fault) {
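
For what it's worth, here is a rough caller-side sketch of how I would
expect a driver (e.g. an RDMA driver) to opt in to this. Only
HMM_PFN_REQ_ALLOW_P2P and the existing hmm_range_fault() interface come
from the patch/kernel; my_mirror, my_fault_range and their fields are
invented for illustration, and a real caller would also retry on -EBUSY
with mmu_interval_read_retry():

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Hypothetical per-mirror state; none of this is defined by the patch. */
struct my_mirror {
	struct mmu_interval_notifier notifier;
	struct mm_struct *mm;
};

static int my_fault_range(struct my_mirror *mirror, unsigned long start,
			  unsigned long npages, unsigned long *hmm_pfns)
{
	struct hmm_range range = {
		.notifier = &mirror->notifier,
		.start = start,
		.end = start + (npages << PAGE_SHIFT),
		.hmm_pfns = hmm_pfns,
		/*
		 * Fault for write access and allow device private pages
		 * owned by another device to come back as PCI P2PDMA pages
		 * instead of being migrated to RAM.
		 */
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE |
				 HMM_PFN_REQ_ALLOW_P2P,
	};
	int ret;

	range.notifier_seq = mmu_interval_read_begin(&mirror->notifier);
	mmap_read_lock(mirror->mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mirror->mm);
	if (ret)
		return ret;

	/*
	 * hmm_pfns[] may now name PCI P2PDMA pages rather than system RAM.
	 * Per the refcounting question above, the caller must not take
	 * extra references on them (no get_page()); the result is only
	 * valid until the next invalidation of mirror->notifier.
	 */
	return 0;
}

If that matches the intended usage, something along these lines would
also make a good example for Documentation/mm/hmm.rst.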