Re: [PATCH v1 1/4] mm/hmm: HMM API for P2P DMA to device zone pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]<

 





On 16/10/2024 7:49, Christoph Hellwig wrote:
The subject does not make sense.  All P2P is on ZONE_DEVICE pages.
It seems like this is about device private memory?

Correct, thanks, I will change it to `mm/hmm: HMM API to enable P2P DMA for device private pages`, on v2.

On Tue, Oct 15, 2024 at 06:23:45PM +0300, Yonatan Maman wrote:
From: Yonatan Maman <Ymaman@xxxxxxxxxx>

hmm_range_fault() natively triggers a page fault on device private
pages, migrating them to RAM.

That "natively" above doesn't make sense to me.
What I meant to convey is that hmm_range_fault() by default triggered a page fault on device private pages, which results in them being migrated to RAM. In this patch, we are trying to provide a different (more optimized) handling flow of P2P instead of migration.

I will clarify this on v2.

In some cases, such as with RDMA devices,
the migration overhead between the device (e.g., GPU) and the CPU, and
vice-versa, significantly damages performance. Thus, enabling Peer-to-

s/damages/degrades/
Fixed (v2), thanks

Peer (P2P) DMA access for device private page might be crucial for
minimizing data transfer overhead.

This change introduces an API to support P2P connections for device
private pages by implementing the following:

"This change.. " or "This patch.." is pointless, just explain what you
are doing.



Fixed (v2), thanks


  - Leveraging the struct pagemap_ops for P2P Page Callbacks. This
    callback involves mapping the page to MMIO and returning the
    corresponding PCI_P2P page.

While P2P uses the same underlying PCIe TLPs as MMIO, it is not
MMIO by definition, as memory mapped I/O is by definition about
the CPU memory mappping so that load and store instructions cause
the I/O.  It also uses very different concepts in Linux.


Got it. I just wanted to emphasize that this function could be used to dynamically map the page for MMIO. However, I understand the potential confusion between MMIO and P2P, so I’ll remove that part.

  - Utilizing hmm_range_fault for Initializing P2P Connections. The API

There is no concept of a "connection" in PCIe dta transfers.


what about "initializing P2P Transfers" or "initializing P2P DMA"?

    also adds the HMM_PFN_REQ_TRY_P2P flag option for the
    hmm_range_fault caller to initialize P2P. If set, hmm_range_fault
    attempts initializing the P2P connection first, if the owner device
    supports P2P, using p2p_page. In case of failure or lack of support,
    hmm_range_fault will continue with the regular flow of migrating the
    page to RAM.

What is the need for the flag?  As far as I can tell from reading
the series, the P2P mapping is entirely transparent to the callers
of hmm_range_fault.


This flag mirrors the GUP flag (FOLL_PCI_P2PDMA). The purpose of this flag is to ensure compatibility with existing flows that utilizing hmm_range_fault but may not inherently support PCI P2P.

+	/*
+	 * Used for private (un-addressable) device memory only. Return a
+	 * corresponding struct page, that can be mapped to device
+	 * (e.g using dma_map_page)
+	 */
+	struct page *(*get_dma_page_for_device)(struct page *private_page);

We are talking about P2P memory here.  How do you manage to get a page
that dma_map_page can be used on?  All P2P memory needs to use the P2P
aware dma_map_sg as the pages for P2P memory are just fake zone device
pages.


According to our experiments dma_map_page is fully worked with PCI_P2P pages, I will double check that. If not we can either return this information using HMM flags or adding support to dma_map_page.


+		 * P2P for supported pages, and according to caller request
+		 * translate the private page to the match P2P page if it fails
+		 * continue with the regular flow
+		 */
+		if (is_device_private_entry(entry)) {
+			get_dma_page_handler =
+				pfn_swap_entry_to_page(entry)
+					->pgmap->ops->get_dma_page_for_device;
+			if ((hmm_vma_walk->range->default_flags &
+			    HMM_PFN_REQ_ALLOW_P2P) &&
+			    get_dma_page_handler) {
+				dma_page = get_dma_page_handler(
+					pfn_swap_entry_to_page(entry));

This is really messy.  You probably really want to share a branch
with the private page handling for the owner so that you only need
a single is_device_private_entry and can use a local variable for
to shortcut finding the page.  Probably best done with a little helper:

Then  this becomes:

static bool hmm_handle_device_private(struct hmm_range *range,
		swp_entry_t entry, unsigned long *hmm_pfn)
{
	struct page *page = pfn_swap_entry_to_page(entry);
	struct dev_pagemap *pgmap = page->pgmap;

	if (pgmap->owner == range->dev_private_owner) {
		*hmm_pfn = swp_offset_pfn(entry);
		goto found;
	}

	if (pgmap->ops->get_dma_page_for_device) {
		*hmm_pfn =
			page_to_pfn(pgmap->ops->get_dma_page_for_device(page));
		goto found;
	}

	return false;

found:
	*hmm_pfn |= HMM_PFN_VALID
	if (is_writable_device_private_entry(entry))
		*hmm_pfn |= HMM_PFN_WRITE;
	return true;
}

which also makes it clear that returning a page from the method is
not that great, a PFN might work a lot better, e.g.

	unsigned long (*device_private_dma_pfn)(struct page *page);

I Agree, I will fix that (v2).



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux