Re: [PATCH 1/2] drm/panfrost: Make sure MMU context lifetime is not bound to panfrost_priv

Steven Price <steven.price@xxxxxxx> · Wed, 5 Feb 2020 14:08:29 +0000

On 05/02/2020 14:01, Boris Brezillon wrote:
> On Wed, 5 Feb 2020 13:39:21 +0000
> Steven Price <steven.price@xxxxxxx> wrote:
> 
>> On 04/02/2020 14:35, Boris Brezillon wrote:
>>> Jobs can be in-flight when the file descriptor is closed (either because
>>> the process did not terminate properly, or because it didn't wait for
>>> all GPU jobs to be finished), and apparently panfrost_job_close() does
>>> not cancel already running jobs. Let's refcount the MMU context object
>>> so it's lifetime is no longer bound to the FD lifetime and running jobs
>>> can finish properly without generating spurious page faults.  
>>
>> Is there any good reason not to just make panfrost_job_close() kill off
>> any running jobs?
> 
> Nope, I just didn't know how to do that without stopping all other jobs
> (should have looked at how mali_kbase is doing that before posting this
> patch :)).

Yeah - this is an area Panfrost might need some improvement, you need to
target the HARD_STOP carefully to ensure you kill of the correct jobs
(and only the correct ones). It's not too difficult at the moment
because we still don't have the _NEXT registers actively in use (I must
get round to reposting that having fixed up the devfreq integration).

>> I'm not sure what the benefit is of allowing the jobs
>> to still run after the file descriptor has closed.
> 
> None that I can think of.
> 
>>
>> In particular this could cause problems when(/if) Panfrost starts trying
>> to deal with "compute" work loads that might have long runtimes. It's
>> quite possible to produce a job which never (naturally) exits, currently
>> we have a simplistic timeout which kills anything which doesn't complete
>> promptly. However there is nothing conceptually wrong with a job which
>> takes seconds (or even minutes) to complete.
> 
> Absolutely. That was also one of my concerns.
> 
>> The hardware has support
>> for task switching ('soft stopping') between jobs so this can be done to
>> prevent blocking other applications.
> 
> Okay. I guess it's implemented in mali_kbase. I'll have a look.
> 
>>
>> If panfrost_job_close() doesn't kill the jobs then removing the timeouts
>> could lead to the situation where there is an 'infinite' job with no
>> owner and no way of killing it off. Which doesn't seem like a great
>> feature ;)
> 
> Didn't know you were planning to remove the timeouts.

Well I don't have any immediate plans, but when(/if) there's interest in
compute the relatively short timeouts for graphics tend to get in the
way. I'm not in a hurry to do it because it will make the scheduling
more complex :)

>>
>> Another approach could be simply to silence the page fault output in
>> this case - switching the address space to UNMAPPED is actually an
>> effective way of killing jobs - at some point I think this was a
>> workaround to a hardware bug, but IIRC that was unreleased hardware :)
> 
> Okay. I'll check how it's done in mali_kbase.
> 
> Thanks for the feedback.

Let me know if you need any pointers about how mali_kbase does it - the
code is not exactly the prettiest ;)

Steve
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel