On Sun, Oct 16, 2016 at 2:54 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Oct 16, 2016 at 02:29:51PM -0400, Rob Clark wrote:
>> On Sun, Oct 16, 2016 at 1:49 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
>> > On Sun, Oct 16, 2016 at 12:39:35PM -0400, Rob Clark wrote:
>> >> This is the alternative approach for solving a deadlock situation with
>> >> array-fences.
>> >>
>> >> Currently with fence-array, we have a potential deadlock situation.  If we
>> >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
>> >> first, and in its ->enable_signaling() callback, it will install cb's on
>> >> its array-member fences, so the array-member's lock is acquired second.
>> >>
>> >> But in the signal path, the array-member's lock is acquired first, and the
>> >> array-fence's lock acquired second.
>> >
>> > I'm feeling this is the better approach, it puts the complexity inside
>> > the trickster.  However, I think it is better to move the wq to
>> > enable-signaling rather than the fence_signal() cb.  (Latency at the
>> > start of the wait will hopefully be absorbed by the wait, but we cannot
>> > hide latency at the end.)
>>
>> my spidey sense is telling me that would race with the fence getting signalled..
>
> Which we don't care about, since we haven't yet added ourselves to the
> cb_list of the fence so haven't yet adjusted the fence-array's pending
> count and so haven't yet signaled the fence array.  I.e. we can tolerate
> enabling signaling on the fence array after all of its children are
> signaled.  Imo, better to have that latency upfront.
btw, another fun one, and I think another argument for doing the wq on the
callback side so we can avoid the final fence_put() from the callback:

---------
=============================================
[ INFO: possible recursive locking detected ]
4.7.0-rc7+ #548 Not tainted
---------------------------------------------
surfaceflinger/2038 is trying to acquire lock:
 (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085908a8>] timeline_fence_release+0x38/0xa8

but task is already holding lock:
 (&(&obj->child_list_lock)->rlock){......}, at: [<ffff000008590b40>] sw_sync_ioctl+0x228/0x3b0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&obj->child_list_lock)->rlock);
  lock(&(&obj->child_list_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by surfaceflinger/2038:
 #0:  (&(&obj->child_list_lock)->rlock){......}, at: [<ffff000008590b40>] sw_sync_ioctl+0x228/0x3b0

stack backtrace:
CPU: 3 PID: 2038 Comm: surfaceflinger Not tainted 4.7.0-rc7+ #548
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Call trace:
[<ffff000008088be8>] dump_backtrace+0x0/0x1a8
[<ffff000008088da4>] show_stack+0x14/0x20
[<ffff0000083b8234>] dump_stack+0xb4/0xf0
[<ffff0000081080f4>] __lock_acquire+0xf0c/0x18d8
[<ffff000008108e0c>] lock_acquire+0x4c/0x68
[<ffff000008ac72b4>] _raw_spin_lock_irqsave+0x54/0x70
[<ffff0000085908a8>] timeline_fence_release+0x38/0xa8
[<ffff00000858d758>] fence_release+0x28/0x50
[<ffff00000858f878>] fence_array_release+0x88/0xd0
[<ffff00000858d758>] fence_release+0x28/0x50
[<ffff00000858fa94>] fence_array_cb_func+0x64/0x88
[<ffff00000858d31c>] fence_signal_locked+0x8c/0x100
[<ffff000008590c10>] sw_sync_ioctl+0x2f8/0x3b0
[<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790
[<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0
[<ffff000008084f30>] el0_svc_naked+0x24/0x28
---------
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel