On 05/02/2020 14:01, Boris Brezillon wrote: > On Wed, 5 Feb 2020 13:39:21 +0000 > Steven Price <steven.price@xxxxxxx> wrote: > >> On 04/02/2020 14:35, Boris Brezillon wrote: >>> Jobs can be in-flight when the file descriptor is closed (either because >>> the process did not terminate properly, or because it didn't wait for >>> all GPU jobs to be finished), and apparently panfrost_job_close() does >>> not cancel already running jobs. Let's refcount the MMU context object >>> so it's lifetime is no longer bound to the FD lifetime and running jobs >>> can finish properly without generating spurious page faults. >> >> Is there any good reason not to just make panfrost_job_close() kill off >> any running jobs? > > Nope, I just didn't know how to do that without stopping all other jobs > (should have looked at how mali_kbase is doing that before posting this > patch :)). Yeah - this is an area Panfrost might need some improvement, you need to target the HARD_STOP carefully to ensure you kill of the correct jobs (and only the correct ones). It's not too difficult at the moment because we still don't have the _NEXT registers actively in use (I must get round to reposting that having fixed up the devfreq integration). >> I'm not sure what the benefit is of allowing the jobs >> to still run after the file descriptor has closed. > > None that I can think of. > >> >> In particular this could cause problems when(/if) Panfrost starts trying >> to deal with "compute" work loads that might have long runtimes. It's >> quite possible to produce a job which never (naturally) exits, currently >> we have a simplistic timeout which kills anything which doesn't complete >> promptly. However there is nothing conceptually wrong with a job which >> takes seconds (or even minutes) to complete. > > Absolutely. That was also one of my concerns. > >> The hardware has support >> for task switching ('soft stopping') between jobs so this can be done to >> prevent blocking other applications. > > Okay. I guess it's implemented in mali_kbase. I'll have a look. > >> >> If panfrost_job_close() doesn't kill the jobs then removing the timeouts >> could lead to the situation where there is an 'infinite' job with no >> owner and no way of killing it off. Which doesn't seem like a great >> feature ;) > > Didn't know you were planning to remove the timeouts. Well I don't have any immediate plans, but when(/if) there's interest in compute the relatively short timeouts for graphics tend to get in the way. I'm not in a hurry to do it because it will make the scheduling more complex :) >> >> Another approach could be simply to silence the page fault output in >> this case - switching the address space to UNMAPPED is actually an >> effective way of killing jobs - at some point I think this was a >> workaround to a hardware bug, but IIRC that was unreleased hardware :) > > Okay. I'll check how it's done in mali_kbase. > > Thanks for the feedback. Let me know if you need any pointers about how mali_kbase does it - the code is not exactly the prettiest ;) Steve _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel