Re: [PATCH RFC 05/24] Revert "drm: Nerf the preclose callback for modern drivers"

Qiang Yu <yuq825@xxxxxxxxx> · Thu, 24 May 2018 20:54:28 +0800

On Thu, May 24, 2018 at 5:41 PM, Christian König
<christian.koenig@xxxxxxx> wrote:
> Am 24.05.2018 um 11:24 schrieb Qiang Yu:
>>
>> On Thu, May 24, 2018 at 2:46 PM, Christian König
>> <christian.koenig@xxxxxxx> wrote:
>> [SNIP]
>>>
>>> Because of this we have separate tracking in amdgpu so that we know not
>>> only who is using which BO, but also who is using which VM.
>>
>> amdgpu's VM implementation seems too complicated for this simple mali GPU,
>> but I may investigate it more to see if I can make it better.
>
>
> Yeah, completely agree.
>
> The VM handling in amdgpu is really complicated because we had to tune it
> for multiple use cases, e.g. partially resident textures, delayed updates,
> etc.
>
> But you should at least be able to take the lessons we learned from that
> VM code and not make the same mistakes again.
>
>>> We intentionally removed the preclose callback to prevent certain use
>>> cases; bringing it back to allow your use case looks rather fishy to me.
>>
>> It seems other drivers either defer or wait in order to adapt to the
>> removal of preclose. I can do the same as you suggested, but I just don't
>> understand why we make our lives harder. What is the case you want to
>> prevent?
>
>
> I think what matters most for your case is that drivers should handle
> closing a BO because userspace said so in the same way they handle
> closing a BO because of process termination, but see below.
>
>>> BTW: What exactly is the issue with using the postclose callback?
>>
>> The issue is that when an application is terminated with Ctrl+C and the
>> unmap is neither waited on nor deferred, the buffer gets unmapped before
>> the task is done, so the kernel driver gets an MMU fault and has to reset
>> the HW to recover the GPU.
>
>
> Yeah, that sounds like exactly one of the reasons we had the callback in the
> first place and worked on removing it.
>
> See the intention is to have reliable handling, e.g. use the same code path
> for closing a BO because of an IOCTL and closing a BO because of process
> termination.
>
> In other words what happens when userspace closes a BO while the GPU is
> still using it? Would you then run into a GPU reset as well?

Yes, a user space driver that misuses a BO like this also gets an MMU fault
and a GPU reset. I don't think I need to avoid this case, because it is a
user error that deserves a GPU reset, while process termination is not. But
you remind me that the two indeed share the same code path once preclose is
removed.
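
For reference, a minimal sketch of the "defer" option, assuming a simple
per-BO reference count. This is a userspace model only; every name in it
(sketch_bo, sketch_task_submit and so on) is hypothetical and not part of
lima or the DRM core. The point is that an in-flight task holds its own
reference, so an ioctl close and a postclose can go through the same
function without unmapping a buffer the GPU is still using.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer object, for illustration only (not a DRM struct). */
struct sketch_bo {
	int refcount;	/* one for the userspace handle, one per in-flight task */
	bool mapped;	/* true while the BO is mapped in the GPU VM */
};

void sketch_bo_init(struct sketch_bo *bo)
{
	bo->refcount = 1;	/* reference held by the userspace handle */
	bo->mapped = true;
}

/* Drop one reference; the unmap only happens when the last user is gone. */
void sketch_bo_put(struct sketch_bo *bo)
{
	assert(bo->refcount > 0);
	if (--bo->refcount == 0)
		bo->mapped = false;
}

/* A submitted task pins the BO until the GPU has finished with it. */
void sketch_task_submit(struct sketch_bo *bo)
{
	bo->refcount++;
}

void sketch_task_done(struct sketch_bo *bo)
{
	sketch_bo_put(bo);
}

/* The single close path: it only drops the handle reference. */
void sketch_handle_close(struct sketch_bo *bo)
{
	sketch_bo_put(bo);
}

/* Userspace closed the BO through an ioctl. */
void sketch_ioctl_close(struct sketch_bo *bo)
{
	sketch_handle_close(bo);
}

/* The file was closed, e.g. because the process was terminated. */
void sketch_postclose(struct sketch_bo *bo)
{
	sketch_handle_close(bo);
}
```

With this model, calling sketch_postclose() while a task is still in flight
leaves the BO mapped, and the unmap happens in sketch_task_done(). The
"wait" option would instead block in the close path until the task is done.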

Regards,
Qiang

>
> I mean it's your driver stack, so I'm not against it as long as you can live
> with it. But it's exactly the thing we wanted to avoid here.

Seems

>
> Regards,
> Christian.