Re: [PATCH v3 2/2] drm/i915: add syncobj timeline support

On 01/08/2019 11:08, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-08-01 08:43:24)
On 31/07/2019 23:03, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-07-31 15:07:33)
I think I have convinced myself that with the split between wait before,
signal after, combined with the rule that seqno points along the syncobj
are monotonic, you should not be able to generate an AB-BA deadlock
between concurrent clients.

Except that we need to fixup drm_syncobj_add_point() to actually apply
that rule. Throwing an error and leaving the client stuck is less than
ideal, we can't even recover with a GPU reset, and as they are native
fences we don't employ a failsafe timer.

Unfortunately we're not the only user of add_point() and amdgpu doesn't
want it to fail.
It's dangerously exposing the deadlock from incorrect seqno ordering to
userspace. We should be able to hit that DRM_ERROR() in testing, and be
able to detect the out-of-sequence timeline.

The best I can come up with is to take the syncobj lock when signaling in
get_timeline_fence_array(), do the check there, and unlock in __free_fence_array().
I think the intermediate step is a "safe" version of add_point that
doesn't leave the timeline broken. That still leaves a glitch visible to
userspace, but it should not deadlock.


Right, sounds doable. What happens in execbuf after add_point() fails?

We've already handed the request to the scheduler, and it might be running already.



The way I was looking at it was to reserve a placeholder syncpt before
building the request and allow replacing the placeholder afterwards.


That sounds fairly tricky to get right :(

The fence stuff is already full of magic.



If we abort the submission part way, we leave the placeholder in the
timeline to be replaced later, or subsumed by a later syncpt.

Doesn't WAIT_FOR_SUBMIT throw a spanner in the works and allow for
deadlocks?

Hm... I can't see how.

Unless you mean clients could deadlock themselves the same way they
would using 2 pthread_mutexes, with 2 threads trying to lock both
mutexes in opposite order.
Yes.
Is that the kernel's problem?
It becomes a problem for us as it ties up kernel resources that we
cannot reap currently. Userspace is not meant to be able to break the
kernel on a whim.  Even futexes are robust. ;)
-Chris


Thanks,


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



