Re: [PATCH RFC 05/24] Revert "drm: Nerf the preclose callback for modern drivers"

Qiang Yu <yuq825@xxxxxxxxx> · Thu, 24 May 2018 16:55:17 +0800

On Thu, May 24, 2018 at 3:51 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Thu, May 24, 2018 at 09:18:04AM +0800, Qiang Yu wrote:
>> On Thu, May 24, 2018 at 4:31 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
>> > On Wed, May 23, 2018 at 2:59 PM, Qiang Yu <yuq825@xxxxxxxxx> wrote:
>> >> On Wed, May 23, 2018 at 5:04 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
>> >>> On Tue, May 22, 2018 at 09:04:17AM +0800, Qiang Yu wrote:
>> >>>> On Tue, May 22, 2018 at 3:37 AM, Eric Anholt <eric@xxxxxxxxxx> wrote:
>> >>>> > Qiang Yu <yuq825@xxxxxxxxx> writes:
>> >>>> >
>> >>>> >> This reverts commit 45c3d213a400c952ab7119f394c5293bb6877e6b.
>> >>>> >>
>> >>>> >> lima driver need preclose to wait all task in the context
>> >>>> >> created within closing file to finish before free all the
>> >>>> >> buffer object. Otherwise pending tesk may fail and get
>> >>>> >> noisy MMU fault message.
>> >>>> >>
>> >>>> >> Move this wait to each buffer object free function can
>> >>>> >> achieve the same result but some buffer object is shared
>> >>>> >> with other file context, but we only want to wait the
>> >>>> >> closing file context's tasks. So the implementation is
>> >>>> >> not that straight forword compared to the preclose one.
>> >>>> >
>> >>>> > You should just separate your MMU structures from drm_file, and have
>> >>>> > drm_file and the jobs using it keep a reference on them.  This is what
>> >>>> > I've done in V3D as well.
>> >>>>
>> >>>> It's not the VM/MMU struct that causes this problem, it's each buffer
>> >>>> object that gets freed before task is done (postclose is after buffer free).
>> >>>> If you mean I should keep reference of all buffers for tasks, that's not
>> >>>> as simple as just waiting task done before free buffers.
>> >>>
>> >>> Why can't you do that waiting in the postclose hook? If it's the lack of
>> >>> reference-counting in your driver for gem bo, then I'd say you need to
>> >>> roll out some reference counting. Relying on the implicit reference
>> >>> provided by the core is kinda not so great (which was the reason I've
>> >>> thrown out the preclose hook). There's also per-bo open/close hooks.
>> >>
>> >> It's possible to not use preclose, but the implementation is not as simple
>> >> and straight forward as the preclose I think. There're two method I can
>> >> think of:
>> >> 1. do wait when free buffers callback unmap buffer from this process's
>> >>     lima VM (wait buffer reservation object), this is fine and simple, but
>> >>     there's case that this buffer is shared between two processes, so the
>> >>     best way should be only waiting fences from this process, so we'd
>> >>     better do some record for fences for a "perfect waiting"
>> >> 2. keep a reference of involved buffers for a task, unreference it when
>> >>     task done, also keep a reference of the buffer mapping in this process's
>> >>     lima VM (this is more complicated to implement)
>> >>
>> >> But if there's a preclose, just wait all this process's task done, then
>> >> unmap/free buffers, it's simple and straight forward. I'd like to hear if
>> >> there's other better way for only use postclose.
>> >
>> > Refcount your buffers. Borrowing references from other places tends to
>> > result in a maintenance headache with no end. So solution 2.
>>
>> In current lima implementation, refcount involved buffer for task is done
>> in user space. So kernel's task object don't keep that. User space
>> driver is responsible not unmap/free buffer before task is complete. This
>> works simple and fine except the case that user press Ctrl+C to terminate
>> the application which will force to close drm fd. I really don't think adding
>> buffer refcount for tasks in kernel just for this case is valuable because
>> it has no benefits for normal case but some extra load.
>
> If kernel correctness relies on refcounting you have a giantic security
> problem. You need to fix that. Kernel _must_ assume that userspace is
> evil, trying to pull it over the table.

It is OK if evil user free/unmap the buffer when task is not done
in my implementation. It will generate a MMU fault in that case and kernel
driver will do recovery.

So does the Ctrl+C case, if don't deal with it, just get some noisy MMU
fault warning and a HW reset recovery.

Regards,
Qiang

>
> Yes, you need refcounting.
> -Daniel
>>
>> Regards,
>> Qiang
>>
>> > -Daniel
>> > --
>> > Daniel Vetter
>> > Software Engineer, Intel Corporation
>> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel