On 01.06.22 at 19:39, Felix Kuehling wrote:
On 2022-06-01 at 13:22, Christian König wrote:
On 01.06.22 at 19:07, Felix Kuehling wrote:
On 2022-06-01 at 12:29, Christian König wrote:
On 01.06.22 at 17:05, Felix Kuehling wrote:
On 2022-06-01 at 08:40, Christian König wrote:
Hey guys,
so today Bas came up with a new requirement regarding the
explicit synchronization to VM updates and a bunch of prototype
patches.
I've been thinking about this stuff for quite some time,
but to be honest it's one of the trickiest parts of the driver.
So my current thinking is that we could potentially handle those
requirements like this:
1. We add some new EXPLICIT flag to the context (or CS?) and VM
IOCTL. This way we either get the new behavior for the whole
CS+VM or the old one, but never both mixed.
2. When memory is unmapped we keep around the last unmap
operation inside the bo_va.
3. When memory is freed we add all the CS fences which could
access that memory + the last unmap operation as BOOKKEEP fences
to the BO and as mandatory sync fence to the VM.
Memory freed either because of an eviction or because of
userspace closing the handle will be seen as a combination of
unmap+free.
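A toy model of the bookkeeping in points 2 and 3 might look like the following. This is purely illustrative; the structure and function names (`bo_va`, `unmap`, `free_bo`, `cs_may_run`, etc.) are hypothetical stand-ins, not the actual amdgpu/dma_resv objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real fence/BO/VM objects; all names hypothetical. */
struct fence { bool signaled; };

#define MAX_FENCES 8

struct bo_va {
    struct fence *last_unmap;       /* point 2: last unmap op kept around */
};

struct vm {
    /* fences every future CS through the kernel must wait on */
    struct fence *mandatory_sync[MAX_FENCES];
    size_t num_sync;
};

/* Point 2: unmapping remembers the last unmap operation in the bo_va. */
static void unmap(struct bo_va *va, struct fence *unmap_op)
{
    va->last_unmap = unmap_op;
}

/* Point 3: freeing adds the last unmap (and, in the real driver, all CS
 * fences that could still access the memory) as mandatory sync to the VM. */
static void free_bo(struct vm *vm, struct bo_va *va)
{
    if (va->last_unmap && vm->num_sync < MAX_FENCES)
        vm->mandatory_sync[vm->num_sync++] = va->last_unmap;
}

/* A later CS may only run once all mandatory sync fences have signaled. */
static bool cs_may_run(const struct vm *vm)
{
    for (size_t i = 0; i < vm->num_sync; i++)
        if (!vm->mandatory_sync[i]->signaled)
            return false;
    return true;
}
```

The point the model makes: freeing after an unfinished unmap doesn't block the freeing thread itself, it only gates the next CS on that unmap.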
The result is the following semantic for userspace to avoid
implicit synchronization as much as possible:
1. When you allocate and map memory it is mandatory to either
wait for the mapping operation to complete or to add it as
dependency for your CS.
If this isn't followed the application will run into CS
faults (that's pretty much what we already implemented).
This makes sense.
2. When memory is freed you must unmap that memory first and then
wait for this unmap operation to complete before freeing the memory.
If this isn't followed the kernel will forcefully add a wait
to the next CS to block until the unmap has completed.
This would work for now, but it won't work for user mode
submission in the future. I find it weird that user mode needs to
wait for the unmap. For user mode, unmap and free should always be
asynchronous. I can't think of any good reason to make user mode
wait for the driver to clean up its stuff.
Could the waiting be done in kernel mode instead? TTM already does
delayed freeing if there are fences outstanding on a BO being
freed. This should make it easy to delay freeing until the unmap
is done without blocking the user mode thread.
This is not about blocking, but synchronization dependencies.
Then I must have misunderstood this sentence: "When memory is freed
you must unmap that memory first and then wait for this unmap
operation to complete before freeing the memory." If the pronoun
"you" is the user mode driver, it means user mode must wait for
kernel mode to finish unmapping memory before freeing it. Was that
not what you meant?
Ah, yes. The UMD must wait for the kernel to finish unmapping all the
maps from the BO before it drops the handle of the BO and with that
frees it.
In other words, the free does not wait for the unmap to complete;
instead it causes command submissions through the kernel to depend
on the unmap.
I guess I don't understand that dependency. The next command
submission obviously cannot use the memory that was unmapped. But
why does it need to synchronize with the unmap operation?
Because of the necessary TLB flush. Only after that has executed
can we be sure that nobody has access to the memory any more, and
only then can we actually free it.
So freeing the memory has to wait for the TLB flush. Why does the next
command submission need to wait?
Because that's the one triggering the TLB flush. The issue is that
flushing the TLB while the VMID is in use is really unreliable on most
hardware generations.
User mode submissions are completely unrelated to that.
I mention user mode command submission because there is no way to
enforce the synchronization you describe here on a user mode queue.
So this approach is not very future proof.
With user mode queues you need to wait for the work on the queue to
finish anyway or otherwise you run into VM faults if you just unmap
or free the memory.
If the next command submission doesn't use the unmapped/freed memory,
why does it need to wait for the TLB flush?
Because it could potentially use it. If userspace lies to the kernel
and still accesses the mapping, we would allow access to freed memory
and create a major security problem.
If it is using the unmapped/freed memory, that's a user mode bug. But
waiting for the TLB flush won't fix that. It will only turn a likely
VM fault into a certain VM fault.
Yeah, exactly that's the intention here.
The guarantee you need to give is, that the memory is not freed and
reused by anyone else until the TLB flush is done. This dependency
requires synchronization of the "free" operation with the TLB flush.
It does not require synchronization with any future command
submissions in the context that freed the memory.
See above: the future command submission is what triggers the TLB flush,
because only then can we easily execute it without too much hassle.
For Vega and Navi 2x we could use async TLB flushes, and on gfx6, gfx7
and gfx8 we could use double TLB flushes with grace time, but Navi 1x is
so horribly broken in this regard that I don't see how else we could do
it.
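The ordering being described can be sketched as a toy model: the next CS is what triggers the TLB flush, and only the flush fence signaling actually releases the pages for reuse. All names here are hypothetical, not the actual driver interfaces:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of deferred freeing gated on the TLB flush. */
struct fence { bool signaled; };

struct pending_free {
    struct fence tlb_flush;     /* signaled once the flush is done
                                 * (via the MES for user mode queues) */
    bool pages_reusable;
};

/* Submitting the next CS is what kicks off the TLB flush, since flushing
 * while the VMID is in use is unreliable on most hardware generations. */
static void next_cs_submit(struct pending_free *pf)
{
    pf->tlb_flush.signaled = true;  /* flush executes as part of this CS */
}

/* The delayed free only completes once the flush fence has signaled;
 * until then the pages must not be handed to anyone else. */
static void try_complete_free(struct pending_free *pf)
{
    if (pf->tlb_flush.signaled)
        pf->pages_reusable = true;
}
```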
Regards,
Christian.
Regards,
Felix
The signal that TLB flush is completed comes from the MES in this case.
Regards,
Christian.
Regards,
Felix
Regards,
Christian.
Regards,
Felix
3. All VM operations requested by userspace will still be
executed in order, e.g. we can't run unmap + map in parallel or
something like this.
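Rule 2 above, including the fallback the kernel applies when the UMD doesn't wait, can be sketched like this (again a hypothetical toy model, not the real code paths):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of rule 2: if the UMD frees a BO without waiting for its
 * unmap, the kernel forces the next CS to wait for it instead. */
struct fence { bool signaled; };

struct next_cs {
    const struct fence *forced_wait;  /* wait injected by the kernel */
};

/* UMD drops the BO handle; if the unmap hasn't completed, the kernel
 * injects the dependency into the next command submission. */
static void umd_free(const struct fence *unmap_op, struct next_cs *cs)
{
    if (!unmap_op->signaled)
        cs->forced_wait = unmap_op;
}

/* The next CS is blocked as long as the forced wait hasn't signaled. */
static bool cs_blocked(const struct next_cs *cs)
{
    return cs->forced_wait && !cs->forced_wait->signaled;
}
```

A well-behaved UMD that waits for the unmap before freeing never picks up a forced wait, which is the behavior rule 2 asks for.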
Is that something you guys can live with? As far as I can see it
should give you the maximum freedom possible, but is still doable.
Regards,
Christian.