Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

Christian König <christian.koenig@xxxxxxx> · Fri, 14 Jan 2022 18:26:14 +0100

Am 14.01.22 um 17:44 schrieb Daniel Vetter:
Top post because I tried to catch up on the entire discussion here.

So fundamentally I'm not opposed to just close this fork() hole once and
for all. The thing that worries me from a upstream/platform pov is really
only if we don't do it consistently across all drivers.

So maybe as an idea:
- Do the original patch, but not just for ttm but all gem rendernode
   drivers at least (or maybe even all gem drivers, no idea), with the
   below discussion cleaned up as justification.

I know of at least one use case which this will break.

A couple of years back we had a discussion on the Mesa mailing list 
because (IIRC) Marek introduced a background thread to push command 
submissions to the kernel.

That broke because some compositor used to initialize OpenGL and then do 
a fork(). This indeed worked previously (no GPUVM at that time), but 
with the addition of the backround thread obviously broke.

The conclusion back then was that the compositor is broken and needs 
fixing, but it still essentially means that there could be people out 
there with really old userspace where this setting would just break the 
desktop.

I'm not really against that change either, but at least in theory we 
could make fork() work perfectly fine even with VMs and background threads.

Regards,
Christian.

- Get acks from userspace driver folks who know real-world gl/vk usage and
   khr specs in-depth enough to be able to tell us how much we'll regret
   this.

- Merge it. Or come up with something new. Or maybe stick to the Nack, but
   then maybe it would be good to document that somewhere in-tree?

This entire can of worms just feels way too tricky to have it be handled
inconsistently across drivers. And trying to fix these kind of low-level
things across drivers once divergences exists is just really painful (e.g.
trying to make all dma-buf mmap VM_SPECIAL or the herculeian task
Christian is doing to get our dma_resv rules consistent across drivers).

Cheers, Daniel

On Fri, Jan 07, 2022 at 12:47:45PM -0500, Felix Kuehling wrote:
Am 2022-01-07 um 3:56 a.m. schrieb Christian König:
Am 06.01.22 um 17:51 schrieb Felix Kuehling:
Am 2022-01-06 um 11:48 a.m. schrieb Christian König:
Am 06.01.22 um 17:45 schrieb Felix Kuehling:
Am 2022-01-06 um 4:05 a.m. schrieb Christian König:
[SNIP]
Also, why does your ACK or NAK depend on this at all. If it's the
right
thing to do, it's the right thing to do regardless of who benefits
from
it. In addition, how can a child process that doesn't even use the GPU
be in violation of any GPU-driver related specifications.
The argument is that the application is broken and needs to be fixed
instead of worked around inside the kernel.
I still don't get how they the application is broken. Like I said, the
child process is not using the GPU. How can the application be fixed in
this case?
Sounds like I'm still missing some important puzzle pieces for the
full picture to figure out why this doesn't work.

Are you saying, any application that forks and doesn't immediately call
exec is broken?
More or less yes. We essentially have three possible cases here:

1. Application is already using (for example) OpenGL or OpenCL to do
some rendering on the GPU and then calls fork() and expects to use
OpenGL both in the parent and the child at the same time.
     As far as I know that's illegal from the Khronos specification
point of view and working around inside the kernel for something not
allowed in the first place is seen as futile effort.

2. Application opened the file descriptor, forks and then initializes
OpenGL/Vulkan/OpenCL.
     That's what some compositors are doing to drop into the backround
and is explicitly legal.

3. Application calls fork() and then doesn't use the GPU in the child.
Either by not using it or calling exec.
     That should be legal and not cause any problems in the first place.

But from your description I still don't get why we are running into
problems here.

I was assuming that you have case #1 because we previously had some
examples of this with this python library, but from your description
it seems to be case #3.
Correct. #3 has at least one issue we previously worked around in the
Thunk: The inherited VMAs prevent BOs from being freed in the parent
process. This manifests as an apparent memory leak. Therefore the Thunk
calls madvise(..., MADV_DONTFORK) on all its VRAM allocation. The BOs
that are causing problems with CRIU are GTT BOs that weren't covered by
this previous workaround.

The new issue with CRIU is, that CRIU saves and restores all the VMAs.
When trying to restore the inherited VMAs in the child process, the mmap
call fails because the child process' render node FD is no longer
inherited from the parent, but is instead created by its own "open"
system call. The mmap call in the child fails for at least two reasons:

   * The child process' render node FD doesn't have permission to map the
     parent process' BO any more
   * The parent BO doesn't get restored in the child process, so its mmap
     offset doesn't get updated to the new mmap offset of the restored
     parent BO by the amdgpu CRIU plugin

We could maybe add a whole bunch of complexity in CRIU and our CRIU
plugin to fix this. But it's pointless because like you said, actually
doing anything with the BO in the child process is illegal anyway
(scenario #1 above). The easiest solution seems to be, to just not
inherit the VMA in the first place.

Regards,
   Felix

Or does an application that forks need to be aware that some other part
of the application used the GPU and explicitly free any GPU resources?
Others might fill that information in, but I think that was the plan
for newer APIs like Vulkan.

Regards,
Christian.

Thanks,
    Felix

Regards,
Christian.

Regards,
     Felix

Let's talk about this on Mondays call. Thanks for giving the whole
context.

Regards,
Christian.

Regards,
      Felix