Re: [PATCH 1/5] drm/amdgpu: allow direct submission in the VM backends

Am 19.07.19 um 00:34 schrieb Kuehling, Felix:
On 2019-07-18 4:47 a.m., Christian König wrote:

[SNIP]
There is also a more general issue with direct submission that I found
while you were on vacation. There is no locking of the ring buffer. So
direct and non-direct submission to the same ring is broken at the moment.

Correct. That's why I reserved an extra paging queue for this.

Thinking more about it, I've found that this is actually quite a killer argument for using a separate scheduler entity, because for updates outside of the fault handler we actually need to have everything in order here.
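
To make the ring locking issue concrete, here is a minimal sketch of what the direct path boils down to, assuming the usual amdgpu_ring_alloc()/amdgpu_ring_commit() helpers (error handling and the actual packet writes trimmed):

/*
 * Sketch only: direct submission reserves ring space, writes packets and
 * bumps the wptr itself.  If the scheduler emits a job to the same ring
 * at the same time, nothing serializes the two alloc/commit sequences,
 * which is why a dedicated paging queue (or a common entity/lock) is
 * needed.
 */
static int vm_update_direct(struct amdgpu_ring *ring, unsigned int ndw)
{
	int r;

	r = amdgpu_ring_alloc(ring, ndw);	/* reserve ndw dwords */
	if (r)
		return r;

	/* ... amdgpu_ring_write() the PTE update packets here ... */

	amdgpu_ring_commit(ring);		/* bump wptr, kick the engine */
	return 0;
}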

[SNIP]
Yeah, the root PD always exists as long as the VM exists. Everything
else can be created/destroyed/moved dynamically.
Yeah, exactly. The question is how do we want to keep the root PD in
place?

We could still add the fence or we could pin it permanently.
Right. I was thinking permanent pinning can lead to fragmentation. It
would be good if those small root PDs could be moved around to make room
for bigger contiguous allocations when needed.

Yeah, exactly my fear as well.

It's just that adding the user CS fence to the root PD will definitely make it leak into the memory management.
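
Just to illustrate the pinning alternative discussed above, a hypothetical sketch using amdgpu_bo_reserve()/amdgpu_bo_pin() (not a proposal, exactly because of the fragmentation concern):

/*
 * Hypothetical: keep the root PD resident by pinning it at VM creation,
 * so no user CS fence ever needs to be attached to it.  The downside is
 * the one discussed above: pinned VRAM can't be moved to make room for
 * larger contiguous allocations.
 */
static int vm_pin_root_pd(struct amdgpu_bo *root)
{
	int r;

	r = amdgpu_bo_reserve(root, true);	/* uninterruptible reserve */
	if (r)
		return r;

	r = amdgpu_bo_pin(root, AMDGPU_GEM_DOMAIN_VRAM);
	amdgpu_bo_unreserve(root);

	/* amdgpu_bo_unpin() would undo this at VM teardown */
	return r;
}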

[SNIP]
I mean it was your requirement that we have a mix of page fault and
pre-filled page tables in the same process.
Right. There are a few different requirements:

  1. Disable retry faults and instruction replay for a VM completely
     (better performance for ML shaders)
  2. Pre-fill page tables even when retry faults are enabled

In case #2 we could deal with page tables being evicted (not fenced).
But MM engines that don't support retry faults would throw a wrench in
this idea.

Yeah, MM-engines still need to add all user CS fences to the BOs for the foreseeable future.

So there will be a mix of page fault and fenced handling in the MM anyway; the optional pre-filling doesn't make that any more complicated.

The only way around that is to allocate the new page tables with the
no_wait_gpu flag set and so avoid having any dependencies on ongoing
operations.
We discussed this before. I suggested an emergency pool for page tables.
That pool can have a limited size. If page tables don't have user fences
on them, they can always be evicted, so we can always make room in this
emergency pool.
You underestimate the problem. For page tables I can make sure rather
easily that we can always allocate something, but ALL allocations made
during a page fault can't depend on user CS.

This means we also need this for the pages which are used for HMM based
migration, and for those you can't have a fixed pool.
That is if you do migration in the page fault handler. We could do
migration outside of the page fault handler. See below.

You can always have the case that a page fault will move back something which was evicted or swapped out.
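
For reference, the no_wait_gpu flag mentioned above maps to TTM's operation context; a minimal sketch of validating page table memory without blocking on fences (placement setup is driver specific and elided):

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_placement.h>

/*
 * Sketch only: a TTM operation context with no_wait_gpu set makes the
 * validation return an error instead of waiting for GPU work, so an
 * allocation done from the fault path never ends up waiting on user CS
 * fences.
 */
static int validate_pt_no_wait(struct ttm_buffer_object *bo,
			       struct ttm_placement *placement)
{
	struct ttm_operation_ctx ctx = {
		.interruptible = false,
		.no_wait_gpu = true,
	};

	return ttm_bo_validate(bo, placement, &ctx);
}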

But you do need fences on page tables related to the allocation and
migration of page tables themselves. And your page table updates must
wait for those fences. Therefore I think the whole approach of direct
submission for page table updates is fundamentally broken.
For the reasons noted above you can't have any fences related to the
allocation and migration of page tables.

What can happen later on is that you need to wait for a BO move to
finish before we can update the page tables.
A page table update coming from a page fault handler should never have
to wait for any BO move. The fact that there was a page fault means
someone is trying to access this memory right now.
Well essentially with HMM we want to migrate memory to VRAM during the
page fault handler, don't we?
The page fault handler could inform migration decisions. But the
migration itself doesn't need to be in the page fault handler. A
migration can be on an application thread (e.g. triggered by an hbind or
similar call) or on a worker thread that gets triggered by asynchronous
events such as page faults, polling of performance counters, etc. A
migration would trigger an MMU notifier that would invalidate the
respective page table entries. Updating the page table entries with the
new physical addresses would happen likely in a page fault handler after
the migration is complete.

Well, now you are limiting the use case to HMM. As I noted before, with both Vega and Navi based hw generations we probably can't always do this because of performance restrictions.
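
For what it's worth, a rough sketch of the notifier-driven flow described above; the range-based mmu_notifier interface is the real one, while my_gpu_mirror/my_invalidate_gpu_ptes() are hypothetical stand-ins for the driver side that actually drops the GPU PTEs:

#include <linux/mmu_notifier.h>

struct my_gpu_mirror {
	struct mmu_notifier mn;
	/* driver state needed to find the affected page tables */
};

/* hypothetical driver helper that invalidates GPU PTEs for a VA range */
static void my_invalidate_gpu_ptes(struct my_gpu_mirror *mirror,
				   unsigned long start, unsigned long end);

/*
 * Called by the core MM when a migration (or any other mapping change)
 * is about to happen.  The GPU mappings for the range are dropped here;
 * the next GPU page fault then refills the PTEs with the new physical
 * addresses once the migration has completed.
 */
static int my_invalidate_range_start(struct mmu_notifier *mn,
				     const struct mmu_notifier_range *range)
{
	struct my_gpu_mirror *mirror =
		container_of(mn, struct my_gpu_mirror, mn);

	my_invalidate_gpu_ptes(mirror, range->start, range->end);
	return 0;
}

static const struct mmu_notifier_ops my_mn_ops = {
	.invalidate_range_start	= my_invalidate_range_start,
};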

To minimize the amount of page faults while the migration is in progress
we could also put PTEs in silent retry mode first. After the migration
is complete we could update the PTEs with non-silent retry or with the
new physical addresses.

Way too risky. We can only do this if we can be absolutely sure that the migration is happening.

E.g. we can do this if we have a BO which is moving under driver control, but not with an MMU notifier. The handling here is simply not under driver control.

But I think that this is a completely different operation which
shouldn't be handled in the fault handler.
Right. If you have page table updates done to prepare for a CS, they can
depend on user fences. Page table updates done as part of the page fault
handler must not. Again, I think this could be handled by using separate
scheduler entities to avoid false dependencies.
Agreed, but using a separate entity means that we are sending the
updates to a separate kernel thread first which then commits them to
the ring buffer.

I was already a step further and thought that we can avoid this extra
overhead and write directly to the ring buffer.
OK. If we can justify the assumptions made in the direct submission
code. Basically we can only rely on implicit synchronization with other
operations that use direct submission. That means all page table
migration would have to use direct submission. Or we can't migrate page
tables and instead reallocate them every time.

Correct. Well I will drop this approach for now and just use a separate entity instead.
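
Roughly what the separate entity approach ends up looking like (field names are illustrative, not the actual amdgpu_vm layout): one entity for CS-dependent updates and one for fault-time updates, both on the paging queue, so fault-time work never queues up behind user fences:

#include <drm/gpu_scheduler.h>

/*
 * Illustrative only: two scheduler entities per VM.  Updates issued on
 * behalf of user CS go through cs_updates and may depend on user
 * fences; updates issued from the fault handler go through
 * fault_updates and therefore never wait behind CS-dependent work.
 */
struct my_vm_entities {
	struct drm_sched_entity cs_updates;
	struct drm_sched_entity fault_updates;
};

static struct drm_sched_entity *
my_vm_pick_entity(struct my_vm_entities *e, bool from_fault_handler)
{
	return from_fault_handler ? &e->fault_updates : &e->cs_updates;
}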

In those cases the fault handler would just silence the retry fault
and continue crunching on other faults.
I don't think that's the right approach. If you have a retry fault for a
virtual address, it means you already have something running on the GPU
accessing it. It can't be something that depends on an in-flight page
table update, because the scheduler would not have emitted that to the
ring. You either need to fill in a valid address, or if there is nothing
mapped at that address (yet), treat it as an application error and
convert it into a no-retry-fault which kills the application.
Mhm, and what do we do if we want to migrate a page to VRAM in a fault
handler?

I mean that's what HMM is mostly all about, isn't it?
Not really. I don't think we need or want to migrate in a page fault
handler. It's the other way around. Page faults may be the result of a
migration, because it would trigger an MMU notifier that invalidates
PTEs. Page faults are GPU-specific. Migrations can affect page tables on
multiple GPUs that access the same pages.

We'll need to deal with some interesting cases when multiple GPUs fault
on the same page nearly at the same time. In that case it may be better
to leave that page in system memory where it can be accessed by multiple
GPUs, rather than bouncing it around between GPUs. With XGMI we'd have
even more options to consider.

Initially I'm planning not to put too much intelligence into any
automatic migration heuristics and rely more on hints from applications.
Get the mechanics right first, then add policy and heuristics on top.

Yeah, completely agree.

Christian.


Regards,
    Felix


Regards,
Christian.

Regards,
     Felix

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




