Re: [PATCH 1/5] drm/amdgpu: allow direct submission in the VM backends

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Thu, 18 Jul 2019 10:47:42 +0200

Am 17.07.19 um 22:31 schrieb Kuehling, Felix:
On 2019-07-17 5:10, Christian König wrote:
Am 16.07.19 um 18:40 schrieb Kuehling, Felix:
On 2019-07-16 9:36 a.m., Christian König wrote:
Am 02.07.19 um 21:35 schrieb Kuehling, Felix:
This assumes that page tables are resident when a page fault is
handled.
Yeah, that is correct. I also haven't completely figured out how we
can prevent the VM from being destroyed while handling the fault.
There are other cases I had in mind: Page tables can be evicted. For KFD
processes which can be preempted with CWSR, it's possible that a wave
that caused a page fault is preempted due to a page table eviction. That
means, by the time the page fault is handled, the page table is no
longer resident.
This is a corner case we can handle later on. As long as the VM is
still alive just allocating page tables again should be sufficient for
this.
Do you mean, instead of migrating page tables back, throwing them away
and allocating a new one?

Yes, exactly that's the idea here.

Also, this may be a corner case. But I feel you're limiting yourself to
a small range of current use cases. I'm not convinced that the design
you're building here will scale to future use cases for HMM updating
page tables for random virtual addresses. I'm looking for a general
solution that will work for those future use cases. Otherwise we'll end
up having to rewrite this page-table-update-in-fault-handler code from
scratch in a month or two.

Well actually I'm keeping mostly HMM in mind. Filling page tables on 
demand is just a step in between.

I also want to support a use case where per-VM-BOs are swapped in and 
out on demand, but I think that we will just use that for testing.

I mean it's perfectly possible that the process is killed while faults
are still in the pipeline.

I think it's possible that a page table gets evicted while a page
fault
is pending. Maybe not with graphics, but definitely with compute where
waves can be preempted while waiting for a page fault. In that case
the
direct access would break.

Even with graphics I think it's still possible that new page tables
need
to be allocated to handle a page fault. When that happens, you need to
wait for fences to let new page tables be validated and initialized.
Yeah, the problem here is that when you wait on fences which in turn
depend on your submission your end up in a deadlock.

I think this implies that you have amdgpu_cs fences attached to page
tables. I believe this is the fundamental issue that needs to be fixed.
We still need this cause even with page faults the root PD can't be
evicted.

What we can probably do is to split up the PDs/PTs into the root PD
and everything else.
Yeah, the root PD always exists as long as the VM exists. Everything
else can be created/destroyed/moved dynamically.

Yeah, exactly. The question is how do we want to keep the root PD in place?

We could still add the fence or we could pin it permanently.

If you want to manage page tables in page fault interrupt handlers, you
can't have command submission fences attached to your page tables. You
can allow page tables to be evicted while the command submission is in
progress. A page fault will fault it back in if it's needed. If you
eliminate command submission fences on the page tables, you remove the
potential for deadlocks.
No, there is still a huge potential for deadlocks here.

Additional to the root PDs you can have a MM submission which needs to
wait for a compute submission to be finished.
I assume by MM you mean "memory manger", not "multi-media". [SNIP]

Sorry I meant "multi-media", so just snipped your response.

What I want to say here is that I don't believe we can keep user CS 
fences our of memory management.

See there can be submission from engines which don't support or don't 
want to enabled recoverable page faults which depend on submissions 
which do use recoverable page faults.

I mean it was your requirement that we have a mix of page fault and 
pre-filled page tables in the same process.

If you then make your new allocation depend on the MM submission to be
finished you have a classical circle dependency and a deadlock at hand.
I don't see it. Allocate page table, wait for fence associated with that
page table initialization, update PTEs. At no point do we depend on the
user CS being stalled by the page fault. There is not user submission on
the paging ring. Anything that has been scheduled on the paging ring has
its dependencies satisfied.

Allocation is the main problem here. We need to make sure that we never 
ever depend on user CS when making memory allocation in the page fault 
handler.

We may need separate scheduler entities
(queues) for regular MM submissions that can depend on user fences and
VM submissions that must not.

Yeah, thought about that as well but even then you need a way to note 
that you want to use this separate entity.

The only way around that is to allocate the new page tables with the
no_wait_gpu flag set and so avoid having any dependencies on ongoing
operations.
We discussed this before. I suggested an emergency pool for page tables.
That pool can have a limited size. If page tables don't have user fences
on them, they can always be evicted, so we can always make room in this
emergency pool.

You underestimate the problem. For page tables I can make sure rather 
easily that we can always allocate something, but ALL allocations made 
during page fault can't depend on user CS.

This means we need to use this for pages which are used for HMM based 
migration and for this you can't have a fixed pool.

But you do need fences on page tables related to the allocation and
migration of page tables themselves. And your page table updates must
wait for those fences. Therefore I think the whole approach of direct
submission for page table updates is fundamentally broken.
For the reasons noted above you can't have any fences related to the
allocation and migration on page tables.

What can happen later on is that you need to wait for a BO move to
finish before we can update the page tables.
A page table updated coming from a page fault handler should never have  could want to migrate memory
to wait for any BO move. The fact that there was a page fault means,
someone is trying to access this memory right now.

Well essentially with HMM we want to migrate memory to VRAM during the 
page fault handler, don't we?

But I think that this is a completely different operation which
shouldn't be handled in the fault handler.
Right. If you have page table updates done to prepare for a CS, they can
depend on use fences. Page table updates done as part of the page fault
handler must not. Again, I think this could be handled by using separate
scheduler entities to avoid false dependencies.

Agreed, but using a separate entity means that we are sending the 
updates to a separate kernel thread first which then commits them to the 
ring buffer.

I was already a step further and thought that we can avoid this extra 
overhead and write directly to the ring buffer.

In those cases the fault handler would just silence the retry fault
and continue crunching on other faults.
I don't think that's the right approach. If you have a retry f
ault for a
virtual address, it means you already have something running on the GPU
accessing it. It can't be something that depends on an in-flight page
table update, because the scheduler would not have emitted that to the
ring. You either need to fill in a valid address, or if there is nothing
mapped at that address (yet), treat it as an application error and
convert it into a no-retry-fault which kills the application.

Mhm, and what do we do if we want to migrate a page to VRAM in a fault 
handler?

I mean that's what HMM is mostly all about, isn't it?

Regards,
Christian.

Regards,
    Felix

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx