Hi Christian,
I'm thinking about this problem, trying to come up with a solution. The
fundamental problem is that we need low-overhead access to the page
table in the MMU notifier, without much memory management or locking.
The HMM documentation (Documentation/vm/hmm.rst) prescribes one "driver
lock" that we're supposed to take in the MMU notifier as well as when we
update page tables. I don't currently see such a lock in amdgpu. We'll
probably need to add that anyway, with all the
usual precautions about lock dependencies around MMU notifiers. Then we
could use that lock to protect page table residency state, in addition
to the reservation of the top-level page directory.
We don't want to block eviction of page tables unconditionally, so the
MMU notifier must be able to deal with the situation that page tables
are not resident at the moment. But the lock can delay eviction of page
tables while an MMU notifier is in progress, and protect us from race
conditions between MMU notifiers invalidating PTEs and page tables
getting evicted.
amdgpu_vm_bo_invalidate could detect when a page table is being evicted,
and update a new "vm_resident" flag inside the amdgpu_vm while holding
the "HMM driver lock". If an MMU notifier is in progress, trying to take
the "HMM driver lock" will delay the eviction long enough for any
pending PTE invalidation to complete.
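To make the locking idea concrete, here is a minimal user-space sketch. It is only a model under stated assumptions: a pthread mutex stands in for the "HMM driver lock", and struct vm_model and the model_* functions are hypothetical names, not existing amdgpu code.

```c
/* User-space model only. All names (vm_model, model_*) are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct vm_model {
	pthread_mutex_t hmm_lock;      /* stands in for the "HMM driver lock" */
	bool vm_resident;              /* are the page tables resident? */
	unsigned int ptes_invalidated; /* invalidations performed in place */
};

static void model_vm_init(struct vm_model *vm)
{
	int ret = pthread_mutex_init(&vm->hmm_lock, NULL);

	assert(ret == 0);
	(void)ret;
	vm->vm_resident = true;
	vm->ptes_invalidated = 0;
}

/* Eviction path: roughly what amdgpu_vm_bo_invalidate might do when it
 * detects that a page table BO is being evicted. */
static void model_evict_page_tables(struct vm_model *vm)
{
	/* Blocks until any MMU notifier holding the lock has finished. */
	pthread_mutex_lock(&vm->hmm_lock);
	vm->vm_resident = false;
	pthread_mutex_unlock(&vm->hmm_lock);
}

/* MMU notifier path. Returns true if PTEs were invalidated right away,
 * false if the page tables were not resident. */
static bool model_mmu_notifier(struct vm_model *vm)
{
	bool done = false;

	pthread_mutex_lock(&vm->hmm_lock);
	if (vm->vm_resident) {
		/* Eviction cannot flip vm_resident while we hold the lock. */
		vm->ptes_invalidated++;
		done = true;
	}
	pthread_mutex_unlock(&vm->hmm_lock);
	return done;
}
```

The flag is only read or written with the lock held, so a notifier that sees vm_resident == true can finish its PTE update before eviction proceeds.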
In the case that page tables are not resident (vm_resident flag is
false), it means the GPU is currently not accessing any memory in that
amdgpu_vm address space. So we don't need to invalidate the PTEs right
away. I think we could implement a deferred invalidation mechanism for
this case, that delays the invalidation until the next time the page
tables are made resident. amdgpu_amdkfd_gpuvm_restore_process_bos would
replay any deferred PTE invalidations after validating the page tables
but before restarting the user mode queues for the process. If graphics
ever implements page-fault-based memory management, you'd need to do
something similar in amdgpu_cs.
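A rough user-space sketch of the deferred invalidation could look like the following. Everything in it is an assumption for illustration: struct deferred_vm, the fixed-size range array, and the "invalidate everything" overflow fallback are hypothetical, and the caller is assumed to hold the "HMM driver lock" around both functions.

```c
/* User-space model only. All names are hypothetical; the caller is
 * assumed to hold the "HMM driver lock". */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEFERRED 16 /* fixed size, so the notifier never allocates */

struct va_range {
	uint64_t start;
	uint64_t end;
};

struct deferred_vm {
	bool vm_resident;
	struct va_range deferred[MAX_DEFERRED];
	size_t num_deferred;
	bool invalidate_all;          /* set if the array overflowed */
	unsigned int invalidated_now; /* invalidations done immediately */
	unsigned int replayed;        /* invalidations replayed on restore */
};

/* MMU notifier path: invalidate now if resident, otherwise remember it. */
static void model_invalidate_range(struct deferred_vm *vm,
				   uint64_t start, uint64_t end)
{
	assert(start < end);

	if (vm->vm_resident) {
		vm->invalidated_now++;
		return;
	}
	if (vm->num_deferred < MAX_DEFERRED)
		vm->deferred[vm->num_deferred++] =
			(struct va_range){ start, end };
	else
		vm->invalidate_all = true; /* fall back to a full invalidation */
}

/* Restore path: the step amdgpu_amdkfd_gpuvm_restore_process_bos would
 * take after validating page tables, before restarting the queues. */
static void model_restore(struct deferred_vm *vm)
{
	size_t i;

	vm->vm_resident = true;
	if (vm->invalidate_all) {
		vm->replayed++; /* one invalidation covering everything */
	} else {
		for (i = 0; i < vm->num_deferred; i++)
			vm->replayed++; /* replay vm->deferred[i] */
	}
	vm->num_deferred = 0;
	vm->invalidate_all = false;
}
```

The fixed-size array is one way to avoid memory allocation in the notifier. Other bookkeeping, such as a per-mapping dirty flag, would serve the same purpose.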
Once all that is in place, we should be able to update PTEs in MMU
notifiers without reserving the page tables.
If we use SDMA for updating page tables, we may need a pre-allocated IB
for use in MMU notifiers. And there are probably other details to be
worked out about exactly how we implement the PTE invalidation in MMU
notifiers and reflecting that in the state of the amdgpu_vm and
amdgpu_bo_va_mapping.
Does this idea sound reasonable to you? Can you think of a simpler solution?
Thanks,
Felix
On 2019-11-27 10:02 a.m., Christian König wrote:
Hi Alejandro,
yes, I'm very aware of this issue, but unfortunately I can't give an easy
solution either.
I've been working for over a year now on getting this fixed, but
unfortunately it turned out that this problem is much bigger than
initially thought.
Setting the appropriate GFP flags for the job allocation is actually
the trivial part.
The really, really hard thing is that we somehow need to add a lock to
prevent the page tables from being evicted. And as you also figured
out, that lock can't be taken easily anywhere else.
I've already written a prototype for this, but haven't had time to hammer
it into shape for upstreaming yet.
Regards,
Christian.
On 27.11.19 at 15:55, Sierra Guiza, Alejandro (Alex) wrote:
Hi Christian,
As you know, we’re working on the HMM enablement. I'm working on the
dGPU page table entry invalidation for the userptr mapping case.
Currently, the MMU notifier handler stops all user mode queues and
schedules a delayed worker to re-validate userptr mappings and restart
the queues.
As part of the HMM functionality, we need to invalidate the page table
entries instead of stopping the queues. At the same time we need to
move the revalidation of the userptr mappings into the page fault
handler.
We’re seeing a deadlock warning after we try to invalidate the PTEs
inside the MMU notifier handler. More specifically, it happens when we
try to update the BOs to invalidate PTEs using amdgpu_vm_bo_update. This
uses kmalloc in amdgpu_job_alloc, which seems to be causing this problem.
Based on comments from Felix Kuehling <Felix.Kuehling@xxxxxxx>,
kmalloc without any special flags can cause memory reclaim. Doing
that inside an MMU notifier is problematic, because an MMU notifier
may be called inside a memory-reclaim operation itself. That would
result in recursion. Also, reclaim shouldn't be done while holding a
lock that can be taken in an MMU notifier for the same reason. If you
cause a reclaim while holding that lock, then an MMU notifier called
by the reclaim can deadlock trying to take the same lock.
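The lock recursion described above can be modeled in user space as follows. This is only an illustrative sketch: notifier_lock and the model_* functions are hypothetical names, and the may_reclaim parameter stands in for the choice of GFP flags (in the kernel, flags such as GFP_NOWAIT avoid direct reclaim, at the cost that the allocation can fail). The notifier uses trylock so that the model reports the deadlock instead of hanging.

```c
/* User-space model only. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t notifier_lock = PTHREAD_MUTEX_INITIALIZER;

/* MMU notifier: needs notifier_lock. Returns false where a real
 * notifier would block forever on a lock that is already held. */
static bool model_mmu_notifier(void)
{
	if (pthread_mutex_trylock(&notifier_lock) != 0)
		return false;
	/* ... invalidate PTEs here ... */
	pthread_mutex_unlock(&notifier_lock);
	return true;
}

/* Allocation: if it may reclaim, reclaim may call the MMU notifier. */
static bool model_alloc(bool may_reclaim)
{
	if (may_reclaim)
		return model_mmu_notifier();
	return true;
}

/* Page table update path: allocates a job while holding the lock.
 * Returns true if it completed without hitting the modeled deadlock. */
static bool model_update_ptes(bool may_reclaim)
{
	bool ok;
	int ret = pthread_mutex_lock(&notifier_lock);

	assert(ret == 0);
	(void)ret;
	ok = model_alloc(may_reclaim);
	pthread_mutex_unlock(&notifier_lock);
	return ok;
}
```

In this model, model_update_ptes(true) reports the deadlock, while model_update_ptes(false) completes because the allocation never enters reclaim.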
Please let us know if you have any advice to enable this the right way.
Thanks in advance,
Alejandro