Re: hmm_range_fault interaction between different drivers

On 7/22/22 08:34, Jason Gunthorpe wrote:
On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
Hi all,

We're noticing some unexpected behaviour when the amdgpu and Mellanox
drivers interact through hmm_range_fault on shared memory. If the amdgpu
driver has migrated pages to DEVICE_PRIVATE memory, we would expect
hmm_range_fault called by the Mellanox driver to fault them back to system
memory. But that's not happening; instead, hmm_range_fault fails.
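
(For context, a minimal sketch of the kind of hmm_range_fault loop a
consumer like the Mellanox driver runs; the function name is made up and
details are trimmed. Because this caller owns no DEVICE_PRIVATE pagemap,
it leaves dev_private_owner at NULL, so device-private entries owned by
amdgpu are foreign to it and we expect them to be migrated back to system
memory:)

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Hypothetical consumer of hmm_range_fault, loosely modelled on an
 * ODP-style mirror.  It owns no DEVICE_PRIVATE pagemap, so any
 * device-private entry it encounters belongs to another driver. */
static int example_fault_range(struct mmu_interval_notifier *notifier,
			       unsigned long start, unsigned long end,
			       unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier          = notifier,
		.start             = start,
		.end               = end,
		.hmm_pfns          = pfns,
		.default_flags     = HMM_PFN_REQ_FAULT,
		.dev_private_owner = NULL,	/* no device memory of our own */
	};
	int ret;

	do {
		range.notifier_seq = mmu_interval_read_begin(notifier);
		mmap_read_lock(notifier->mm);
		ret = hmm_range_fault(&range);
		mmap_read_unlock(notifier->mm);
	} while (ret == -EBUSY);

	return ret;
}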

As an experiment, Philip hacked hmm_vma_handle_pte to treat DEVICE_PRIVATE
pages like device_exclusive pages, which gave us the expected behaviour: it
resulted in a dev_pagemap_ops.migrate_to_ram callback into our driver, and
hmm_range_fault returned system memory pages to the Mellanox driver.
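
(That callback is the standard dev_pagemap_ops mechanism; a minimal sketch
of its shape, with hypothetical names rather than the actual amdgpu SVM
code, is below. Once the fault reaches do_swap_page(), the owning driver's
migrate_to_ram() hook is invoked to move the data back to system pages:)

#include <linux/memremap.h>
#include <linux/mm.h>

/* Sketch of the callback a DEVICE_PRIVATE pagemap owner registers
 * (names are hypothetical, not the actual amdgpu code).  A fault on one
 * of its device-private swap entries reaches do_swap_page(), which calls
 * this hook to migrate the data back to system memory. */
static vm_fault_t example_migrate_to_ram(struct vm_fault *vmf)
{
	/* migrate_vma_setup() / copy device->system / migrate_vma_finalize() */
	return 0;
}

static const struct dev_pagemap_ops example_pagemap_ops = {
	.migrate_to_ram = example_migrate_to_ram,
	/* The driver also sets pgmap->owner; hmm_range_fault() compares
	 * that against range->dev_private_owner to decide whether a
	 * device-private entry belongs to the caller. */
};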

So something is clearly wrong. It could be:

 * our expectations are wrong,
 * the implementation of hmm_range_fault is wrong, or
 * our driver is missing something when migrating to DEVICE_PRIVATE memory.

Do you have any insights?
I think it is a bug

Jason
Yes, this looks like a bug to me too. hmm_vma_handle_pte() calls
hmm_is_device_private_entry(), which correctly handles the case where
the device-private entry is owned by the driver calling hmm_range_fault(),
but it then does nothing to fault the page in if the entry is a device
private entry owned by a different driver.
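
(For reference, a simplified and abridged view of that path in mm/hmm.c as
of this thread. The likely direction of the fix is to send foreign
device-private entries down the fault path as well, though the exact patch
is still to be written:)

	if (!pte_present(pte)) {
		swp_entry_t entry = pte_to_swp_entry(pte);

		/* Device-private entry owned by the caller: just report
		 * the PFN, no fault needed. */
		if (hmm_is_device_private_entry(range, entry)) {
			...
			return 0;
		}

		required_fault = hmm_pte_need_fault(hmm_vma_walk,
						    pfn_req_flags, 0);
		if (!required_fault) {
			*hmm_pfn = 0;
			return 0;
		}

		if (!non_swap_entry(entry))
			goto fault;

		if (is_device_exclusive_entry(entry))
			goto fault;

		if (is_migration_entry(entry)) {
			/* wait for migration and retry */
			...
		}

		/* A DEVICE_PRIVATE entry owned by *another* driver falls
		 * through to here and is reported as an error.  Sending it
		 * to the fault path instead (roughly what Philip's hack did,
		 * e.g. "if (is_device_private_entry(entry)) goto fault;")
		 * would trigger the owner's migrate_to_ram() callback. */
		pte_unmap(ptep);
		return -EFAULT;
	}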

I'll work with Alistair and one of us will post a fix.
Thanks for finding this!
