On Wed, 5 Feb 2020 13:39:21 +0000 Steven Price <steven.price@xxxxxxx> wrote: > On 04/02/2020 14:35, Boris Brezillon wrote: > > Jobs can be in-flight when the file descriptor is closed (either because > > the process did not terminate properly, or because it didn't wait for > > all GPU jobs to be finished), and apparently panfrost_job_close() does > > not cancel already running jobs. Let's refcount the MMU context object > > so it's lifetime is no longer bound to the FD lifetime and running jobs > > can finish properly without generating spurious page faults. > > Is there any good reason not to just make panfrost_job_close() kill off > any running jobs? Nope, I just didn't know how to do that without stopping all other jobs (should have looked at how mali_kbase is doing that before posting this patch :)). > I'm not sure what the benefit is of allowing the jobs > to still run after the file descriptor has closed. None that I can think of. > > In particular this could cause problems when(/if) Panfrost starts trying > to deal with "compute" work loads that might have long runtimes. It's > quite possible to produce a job which never (naturally) exits, currently > we have a simplistic timeout which kills anything which doesn't complete > promptly. However there is nothing conceptually wrong with a job which > takes seconds (or even minutes) to complete. Absolutely. That was also one of my concerns. > The hardware has support > for task switching ('soft stopping') between jobs so this can be done to > prevent blocking other applications. Okay. I guess it's implemented in mali_kbase. I'll have a look. > > If panfrost_job_close() doesn't kill the jobs then removing the timeouts > could lead to the situation where there is an 'infinite' job with no > owner and no way of killing it off. Which doesn't seem like a great > feature ;) Didn't know you were planning to remove the timeouts. > > Another approach could be simply to silence the page fault output in > this case - switching the address space to UNMAPPED is actually an > effective way of killing jobs - at some point I think this was a > workaround to a hardware bug, but IIRC that was unreleased hardware :) Okay. I'll check how it's done in mali_kbase. Thanks for the feedback. Boris