On 06.01.22 at 17:45, Felix Kuehling wrote:
On 2022-01-06 at 4:05 a.m., Christian König wrote:
On 05.01.22 at 17:16, Felix Kuehling wrote:
[SNIP]
But KFD doesn't know anything about the inherited BOs
from the parent process.
OK, why is that? When KFD is reinitializing its context, why
shouldn't it clean up those VMAs?
That cleanup has to be initiated by user mode. Basically closing the old
KFD and DRM file descriptors, cleaning up all the user mode VM state,
unmapping all the VMAs, etc. Then it reopens KFD and the render nodes
and starts from scratch.
User mode will do this automatically when it tries to reinitialize ROCm.
However, in this case the child process doesn't do that (e.g. a python
application using the multi-processing package). The child process does
not use ROCm. But you're left with all the dangling VMAs in the child
process indefinitely.
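
To make the inheritance concrete, here is a minimal sketch, assuming a POSIX
system where the "fork" start method is available. An anonymous shared mmap
stands in for a driver-backed VMA; nothing here touches KFD or ROCm, and the
function names are hypothetical. The child never sets up the mapping itself,
yet it can read it, because fork() duplicated the parent's address space.

```python
import mmap
import multiprocessing as mp


def read_inherited(mapping, queue):
    # The child did not create this mapping; it exists here only because
    # fork() copied the parent's address space, mappings included.
    queue.put(bytes(mapping[:5]))


def child_sees_parent_mapping():
    # Anonymous shared mapping: a stand-in (assumption) for a GPU driver VMA.
    mapping = mmap.mmap(-1, 4096)
    mapping[:5] = b"hello"

    # "fork" is the start method under discussion; it is POSIX only.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=read_inherited, args=(mapping, queue))
    proc.start()
    data = queue.get(timeout=10)
    proc.join()
    mapping.close()
    return data


if __name__ == "__main__":
    print(child_sees_parent_mapping())
```

With the "spawn" start method the child would start from a fresh interpreter
and would not inherit the mapping, which is why the start method matters here.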
Oh, not that one again. I'm unfortunately pretty sure that this is a
clear NAK then.
This Python multiprocessing package is violating various
specifications by doing this fork() and we already had multiple
discussions about that.
Well, it's in widespread use. We can't just throw up our hands and say
they're buggy and not supported.
Because that's not my NAK, but rather from upstream.
Also, why does your ACK or NAK depend on this at all? If it's the right
thing to do, it's the right thing to do regardless of who benefits from
it. In addition, how can a child process that doesn't even use the GPU
be in violation of any GPU-driver-related specifications?
The argument is that the application is broken and needs to be fixed
instead of worked around inside the kernel.
Regards,
Christian.
Regards,
Felix
Let's talk about this on Monday's call. Thanks for giving the whole
context.
Regards,
Christian.
Regards,
Felix