On Mon, 21 Jun 2021 15:29:55 +0100
Steven Price <steven.price@xxxxxxx> wrote:

> On 21/06/2021 14:57, Alyssa Rosenzweig wrote:
> >> Jobs can be in-flight when the file descriptor is closed (either because
> >> the process did not terminate properly, or because it didn't wait for
> >> all GPU jobs to be finished), and apparently panfrost_job_close() does
> >> not cancel already running jobs. Let's refcount the MMU context object
> >> so it's lifetime is no longer bound to the FD lifetime and running jobs
> >> can finish properly without generating spurious page faults.
> >
> > Remind me - why can't we hard stop in-flight jobs when the fd is closed?
> > I've seen cases where kill -9'ing a badly behaved process doesn't end
> > the fault storm, or unfreeze the desktop.
> >
>
> Hard-stopping the in-flight jobs would also make sense. But unless we
> want to actually hang the close() then there will be a period between
> issuing the hard-stop and actually having completed all jobs in the context.

Patch 10 is doing that, I just didn't want to backport all the
dependencies, so I kept it split in 2 halves: one patch fixing the
use-after-free bug, and the other part killing in-flight jobs.

>
> But equally to be fair I've been cherry-picking this patch myself for
> quite some time, so we should just merge it and improve from there. So
> you can have my:
>
> Reviewed-by: Steven Price <steven.price@xxxxxxx>
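
For readers following along, the refcounting idea from the quoted commit
message can be illustrated with a small user-space sketch. This is not the
patch itself: struct mmu_ctx, mmu_ctx_create(), mmu_ctx_get() and
mmu_ctx_put() are hypothetical names made up for illustration, and the plain
int counter is only valid single-threaded. In the kernel one would normally
reach for struct kref with kref_get()/kref_put() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-file MMU context (not the driver's struct). */
struct mmu_ctx {
	int refcount;     /* plain int: single-threaded illustration only */
	bool *freed_flag; /* lets the caller observe when the context is freed */
};

/* Create a context holding one reference, owned by the open file. */
static struct mmu_ctx *mmu_ctx_create(bool *freed_flag)
{
	struct mmu_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1;
	ctx->freed_flag = freed_flag;
	*freed_flag = false;
	return ctx;
}

/* Take an extra reference, e.g. when a job using this context is queued. */
static struct mmu_ctx *mmu_ctx_get(struct mmu_ctx *ctx)
{
	assert(ctx->refcount > 0);
	ctx->refcount++;
	return ctx;
}

/* Drop a reference; the context is freed only when the last user is gone. */
static void mmu_ctx_put(struct mmu_ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0) {
		*ctx->freed_flag = true;
		free(ctx);
	}
}
```

With this shape, each in-flight job would hold its own reference, so the
put done at FD close time does not free the context while a job is still
using it; the final put from the last job to complete does.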