Hi all,
We're seeing some unexpected behaviour when the amdgpu and Mellanox
drivers interact through hmm_range_fault on shared memory. If the
amdgpu driver has migrated pages to DEVICE_PRIVATE memory, we would
expect hmm_range_fault called by the Mellanox driver to fault them
back to system memory. But that's not happening; instead,
hmm_range_fault fails.
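
For reference, the calling pattern we assume on the Mellanox side
looks roughly like this (names are made up for illustration, this is
not the actual ib_umem_odp code). The key point is that the peer
driver requests faulting and does not pass amdgpu's dev_private_owner:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

static int peer_fault_range(struct mm_struct *mm,
			    struct mmu_interval_notifier *notifier,
			    unsigned long start, unsigned long npages,
			    unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier = notifier,
		.start = start,
		.end = start + (npages << PAGE_SHIFT),
		.hmm_pfns = pfns,
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
		/* not amdgpu's owner, so we expect DEVICE_PRIVATE
		 * pages to be migrated back to system memory */
		.dev_private_owner = NULL,
	};
	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(range.notifier);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret == -EBUSY)
		goto again;
	if (ret)
		return ret;	/* this is where it fails today */

	/* caller takes its lock, checks mmu_interval_read_retry()
	 * and then programs the device from pfns[] */
	return 0;
}
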
As an experiment, Philip hacked hmm_vma_handle_pte to treat
DEVICE_PRIVATE entries like device_exclusive entries, and that gave us
the expected behaviour: hmm_range_fault triggered the
dev_pagemap_ops.migrate_to_ram callback in our driver and returned
system memory pages to the Mellanox driver.
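
Roughly, the change was along these lines, in the !pte_present() path
of hmm_vma_handle_pte() in mm/hmm.c (paraphrased, not the exact patch):

		if (is_device_exclusive_entry(entry))
			goto fault;

		/* experimental hack: fault non-owner DEVICE_PRIVATE
		 * entries back to system memory instead of falling
		 * through to the -EFAULT at the end */
		if (is_device_private_entry(entry))
			goto fault;
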
So something is clearly wrong. It could be:
* our expectations are wrong,
* the implementation of hmm_range_fault is wrong, or
* our driver is missing something when migrating to DEVICE_PRIVATE
  memory (a simplified sketch of what we register is below).
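
For completeness, this is the general shape of the DEVICE_PRIVATE
registration we do (simplified placeholder code with made-up names,
not the actual amdgpu code):

#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/mm.h>

static void our_page_free(struct page *page);
static vm_fault_t our_migrate_to_ram(struct vm_fault *vmf);

static const struct dev_pagemap_ops our_pgmap_ops = {
	.page_free	= our_page_free,
	/* runs on CPU faults; we assumed a peer's hmm_range_fault
	 * would end up here as well */
	.migrate_to_ram	= our_migrate_to_ram,
};

static int our_register_device_private(struct device *dev,
				       struct dev_pagemap *pgmap,
				       u64 start, u64 size, void *owner)
{
	void *addr;

	/* start/size would typically come from
	 * request_free_mem_region() */
	pgmap->type = MEMORY_DEVICE_PRIVATE;
	pgmap->nr_range = 1;
	pgmap->range.start = start;
	pgmap->range.end = start + size - 1;
	pgmap->ops = &our_pgmap_ops;
	/* same cookie we pass as dev_private_owner in our own
	 * hmm_range_fault calls */
	pgmap->owner = owner;

	addr = devm_memremap_pages(dev, pgmap);
	return PTR_ERR_OR_ZERO(addr);
}
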
Do you have any insights?
Thank you,
Felix