Re: Explicit VM updates

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Thu, 2 Jun 2022 20:52:17 +0200

On 02.06.22 at 16:21, Felix Kuehling wrote:
[SNIP]
In other words the free is not waiting for the unmap to complete, 
but causes command submissions through the kernel to depend on 
the unmap.
I guess I don't understand that dependency. The next command 
submission obviously cannot use the memory that was unmapped. But 
why does it need to synchronize with the unmap operation?
Because of the necessary TLB flush. Only after that flush has executed 
can we be sure that nobody has access to the memory any more and 
actually free it.
So freeing the memory has to wait for the TLB flush. Why does the 
next command submission need to wait?
Because that's the one triggering the TLB flush. The issue is that 
flushing the TLB while the VMID is in use is really unreliable on 
most hardware generations.
It's been working well enough with ROCm. With user mode command 
submission there is no way to block GPU work while a TLB flush is in 
progress.
Yeah, but at least on Navi 1x that's so horribly broken that the SDMA 
could write anywhere if we tried this.
User mode submissions are completely unrelated to that.
I mention user mode command submission because there is no way to 
enforce the synchronization you describe here on a user mode 
queue. So this approach is not very future proof.
With user mode queues you need to wait for the work on the queue to 
finish anyway or otherwise you run into VM faults if you just unmap 
or free the memory.
If the next command submission doesn't use the unmapped/freed 
memory, why does it need to wait for the TLB flush?
Because it could potentially use it. When userspace lies to the 
kernel and still accesses the mapping we would allow access to freed 
up memory and create a major security problem.
I'm aware of the potential security problem. That's why I'm 
recommending you don't actually free the memory until the TLB flush is 
done. So a bogus access will either harmlessly access memory that's 
not freed yet, or it will VM fault. It will never access memory that's 
already freed and potentially allocated by someone else.
Yes, that's the idea. The question is just when we can do the TLB flush.
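
To illustrate the ordering agreed on above (memory is only released once 
the TLB flush covering its unmap has completed), here is a minimal C 
sketch. It is not amdgpu or KFD code: struct vm_model, struct buf, the 
sequence counters and all function names are hypothetical, and the page 
table update and the flush itself are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: TLB flushes are numbered with an increasing sequence. */
struct vm_model {
    uint64_t flush_req_seq;   /* last TLB flush that was requested */
    uint64_t flush_done_seq;  /* last TLB flush known to have completed */
};

enum buf_state { BUF_MAPPED, BUF_UNMAPPED_PENDING, BUF_FREED };

struct buf {
    enum buf_state state;
    uint64_t needs_flush_seq; /* flush that must finish before the free */
};

/* Unmap: record which flush has to complete before the memory is released. */
static void vm_unmap(struct vm_model *vm, struct buf *b)
{
    assert(b->state == BUF_MAPPED);
    b->needs_flush_seq = ++vm->flush_req_seq;
    b->state = BUF_UNMAPPED_PENDING;
}

/* Called by whatever reports flush completion (an assumption of this model). */
static void vm_flush_complete(struct vm_model *vm, uint64_t seq)
{
    if (seq > vm->flush_done_seq)
        vm->flush_done_seq = seq;
}

/* Free only succeeds once the flush covering the unmap has completed. */
static bool vm_try_free(const struct vm_model *vm, struct buf *b)
{
    if (b->state != BUF_UNMAPPED_PENDING)
        return false;
    if (vm->flush_done_seq < b->needs_flush_seq)
        return false; /* a stale TLB entry could still reach this memory */
    b->state = BUF_FREED;
    return true;
}
```

In this model a bogus access between unmap and flush completion hits 
memory that is not freed yet, which is the harmless case described above.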

If it is using the unmapped/freed memory, that's a user mode bug. 
But waiting for the TLB flush won't fix that. It will only turn a 
likely VM fault into a certain VM fault.
Yeah, that's exactly the intention here.

The guarantee you need to give is, that the memory is not freed and 
reused by anyone else until the TLB flush is done. This dependency 
requires synchronization of the "free" operation with the TLB flush. 
It does not require synchronization with any future command 
submissions in the context that freed the memory.
See above, the future command submission is what triggers the TLB 
flush because only then can we easily execute it without too much hassle.
That seems to be a limitation of your current command submission 
model. User mode command submission will not be able to trigger a TLB 
flush. Unmapping or freeing memory should be the trigger in that case.
That's how it works with KFD. That said, our TLB flushes aren't as 
well pipelined (which could probably be improved), and your strategy 
can probably batch more TLB flushes, so I see where you're coming from.
Well, the mapping/unmapping IOCTL should certainly trigger the TLB 
flushes for the user mode queues, but as I said this is completely 
independent of what we are discussing here.
The limitation is on the kernel CS IOCTL, not the VM IOCTL. So that is 
completely unrelated to this.
For Vega and Navi 2x we could use async TLB flushes and on gfx6, gfx7 
and gfx8 we could use double TLB flushes with grace time, but Navi 1x 
is so horribly broken regarding this that I don't see how else we 
could do that.
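
The per-generation options listed above could be summarized roughly as 
in the following C sketch. The enum names and pick_flush_strategy() are 
hypothetical and do not exist in the driver; the mapping itself only 
restates the paragraph above, and the conservative fallback for unlisted 
generations is an assumption of this sketch.

```c
/* Hypothetical hardware generations, named after the ones in the text. */
enum gpu_gen {
    GEN_GFX6, GEN_GFX7, GEN_GFX8,
    GEN_VEGA, GEN_NAVI1X, GEN_NAVI2X,
    GEN_COUNT
};

/* Hypothetical names for the three approaches mentioned above. */
enum flush_strategy {
    FLUSH_ASYNC,             /* async TLB flush */
    FLUSH_DOUBLE_WITH_GRACE, /* two flushes with a grace time in between */
    FLUSH_AT_NEXT_CS         /* flush tied to the next kernel CS */
};

static enum flush_strategy pick_flush_strategy(enum gpu_gen gen)
{
    switch (gen) {
    case GEN_VEGA:
    case GEN_NAVI2X:
        return FLUSH_ASYNC;
    case GEN_GFX6:
    case GEN_GFX7:
    case GEN_GFX8:
        return FLUSH_DOUBLE_WITH_GRACE;
    case GEN_NAVI1X:
        return FLUSH_AT_NEXT_CS;
    default:
        /* Assumption: fall back to the most conservative option. */
        return FLUSH_AT_NEXT_CS;
    }
}
```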
We're using heavy-weight TLB flushes on SOC15 GPUs. On Vega20 with 
XGMI we need double flushes to be safe.
I'm raising my concerns because I don't think making user mode wait is 
a good strategy long-term. And I believe this explicit sync and 
explicit VM update should be designed with an eye for future user-mode 
command submission models.
Yeah, but as already discussed with Daniel and Jason that will never 
ever work correctly. IOCTLs can't depend on user mode queues in any way. 
So user space can only block or rather call the map, unmap, free 
functions at the right time.
If you need short-term workarounds for broken hardware, that's another 
issue. But it would be good if that could be kept out of the API.
Well as I said that is completely unrelated to user mode queues. The 
restriction is on the CS API, not the VM API.
Regards,
Christian.

The signal that TLB flush is completed comes from the MES in this 
case.

3. All VM operations requested by userspace will still be 
executed in order, e.g. we can't run unmap + map in parallel or 
something like this.
Is that something you guys can live with? As far as I can see 
it should give you the maximum freedom possible, but is still 
doable.
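
The in-order rule from point 3 can be pictured as a simple FIFO of VM 
operations, as in the C sketch below. This is only an illustration under 
assumed names (struct vm_queue, vm_queue_submit(), vm_queue_run_next() 
and VM_QUEUE_LEN are all hypothetical); the real implementation may 
order operations differently, for example with fences.

```c
#include <assert.h>
#include <stddef.h>

enum vm_op_type { VM_OP_MAP, VM_OP_UNMAP };

struct vm_op {
    enum vm_op_type type;
    unsigned long va;     /* virtual address the operation applies to */
};

#define VM_QUEUE_LEN 16   /* arbitrary size for this sketch */

struct vm_queue {
    struct vm_op ops[VM_QUEUE_LEN];
    size_t head;          /* index of the next operation to execute */
    size_t tail;          /* index of the next free slot */
};

/* Queue an operation; returns 0 on success, -1 if the queue is full. */
static int vm_queue_submit(struct vm_queue *q, struct vm_op op)
{
    if (q->tail - q->head == VM_QUEUE_LEN)
        return -1;
    q->ops[q->tail % VM_QUEUE_LEN] = op;
    q->tail++;
    return 0;
}

/* Take the oldest operation; returns -1 if nothing is queued.
 * Operations always leave in submission order, so an unmap followed
 * by a map of the same range can never run in parallel or swap. */
static int vm_queue_run_next(struct vm_queue *q, struct vm_op *out)
{
    assert(out != NULL);
    if (q->head == q->tail)
        return -1;
    *out = q->ops[q->head % VM_QUEUE_LEN];
    q->head++;
    return 0;
}
```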
Regards,
Christian.