Re: [RFC PATCH 3/5] drm/amdgpu: Allow explicit sync for VM ops.

Christian König <christian.koenig@xxxxxxx> · Fri, 3 Jun 2022 19:41:55 +0200

Am 03.06.22 um 15:23 schrieb Bas Nieuwenhuizen:
On Fri, Jun 3, 2022 at 2:49 PM Christian König <christian.koenig@xxxxxxx> wrote:
Am 03.06.22 um 14:39 schrieb Bas Nieuwenhuizen:
On Fri, Jun 3, 2022 at 2:08 PM Christian König <christian.koenig@xxxxxxx> wrote:
Am 03.06.22 um 13:07 schrieb Bas Nieuwenhuizen:
On Fri, Jun 3, 2022 at 12:16 PM Christian König
<christian.koenig@xxxxxxx> wrote:
Am 03.06.22 um 12:08 schrieb Bas Nieuwenhuizen:
[SNIP]
I do have to fix some stuff indeed, especially for the GEM close but
with that we should be able to keep the same basic approach?
Nope, not even remotely.

What we need is the following:
1. Rolling out my drm_exec patch set, so that we can lock buffers as needed.
2. When we get a VM operation we not only lock the VM page tables, but
also all buffers we potentially need to unmap.
3. Nuking the freed list in the amdgpu_vm structure by updating freed
areas directly when they are unmapped.
4. Tracking those updates inside the bo_va structure for the BO+VM
combination.
5. When the bo_va structure is destroy because of closing the handle
move the last clear operation over to the VM as implicit sync.

Hi Christian, isn't that a different problem though (that we're also
trying to solve, but in your series)?

What this patch tries to achieve:

(t+0) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
(t+1) a VM operation on a BO/VM accessed by the CS.

to run concurrently. What it *doesn't* try is

(t+0) a VM operation on a BO/VM accessed by the CS.
(t+1) CS submission setting BOOKKEEP fences (i.e. no implicit sync)

to run concurrently. When you write

Only when all this is done we then can resolve the dependency that the
CS currently must wait for any clear operation on the VM.
isn't that all about the second problem?
No, it's the same.

See what we do in the VM code is to artificially insert a bubble so that
all VM clear operations wait for all CS operations and then use the
clear fence to indicate when the backing store of the BO can be freed.
Isn't that remediated with something like the code below? At least the
gem_close case should be handled with this, and the move case was
already handled by the copy operation.
That is one necessary puzzle piece, yes. But you need more than that.

Especially the explicit unmap operation needs to be converted into an
implicit unmap to get the TLB flush right.
This doesn't change anything about the TLB flush though? Since all
unmap -> later jobs dependencies are still implicit.

So the worst what could happen (i.f. e.g. userspace gets the
waits/dependencies wrong) is

1) non-implicit CS gets submitted that touches a BO
2)  VM unmap on that BO happens
2.5) the CS from 1 is still active due to missing dependencies
2.6) but any CS submission after 2 will trigger a TLB flush
Yeah, but that's exactly the bubble we try to avoid. Isn't it?
For this series, not really.  To clarify there are two sides for
getting GPU bubbles and no overlap:

(1) VM operations implicitly wait for earlier CS submissions
(2) CS submissions implicitly wait for earlier VM operations

Together, these combine to ensure that you get a (potentially small)
bubble any time VM work happens.

Your series (and further ideas) tackles (2), and is a worthwhile thing
to do. However, while writing the userspace for this I noticed this
isn't enough to get rid of all our GPU bubbles. In particular when
doing a non-sparse map of a new BO, that tends to need to be waited on
for the next CS anyway for API semantics. Due to VM operations
happening on a single timeline that means this high priority map can
end up being blocked by earlier sparse maps and hence the bubble in
that case still exists.

So in this series I try to tackle (1) instead. Since GPU work
typically lags behind CPU submissions and VM operations aren't that
slow, we can typically execute VM operations early enough that any
implicit syncs from (2) are less/no issue.

Ok, once more since you don't seem to understand what I want to say: It 
isn't possible to fix #1 before you have fixed #2.

The VM unmap operation here is a barrier which divides the CS operations 
in a before and after. This is intentional design.

To get rid of this barrier you must first fix the part where CS 
submissions wait for the VM operation to complete, e.g. the necessity of 
the barrier.

I'm working on this for a couple of years now and I'm really running out 
of idea how to explain this restriction.

Regards,
Christian.