On Thu, Oct 13, 2016 at 04:04:05PM -0400, Rob Clark wrote: > We were holding the wrong lock to be using fence_is_signaled_locked(). > And holding the child_list_lock over something that could end up calling > fence cb's angers lockdep: > > ====================================================== > [ INFO: possible circular locking dependency detected ] > 4.7.0-rc7+ #489 Not tainted > ------------------------------------------------------- > surfaceflinger/2034 is trying to acquire lock: > (&(&array->lock)->rlock){......}, at: [<ffff00000858cddc>] fence_signal+0x5c/0xf8 > > but task is already holding lock: > (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #1 (&(&obj->child_list_lock)->rlock){......}: > [<ffff000008108924>] __lock_acquire+0x173c/0x18d8 > [<ffff000008108e0c>] lock_acquire+0x4c/0x68 > [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70 > [<ffff00000858d05c>] fence_add_callback+0x3c/0x100 > [<ffff00000858f100>] fence_array_enable_signaling+0x80/0x170 > [<ffff00000858d0d8>] fence_add_callback+0xb8/0x100 > [<ffff00000858f504>] sync_file_poll+0xd4/0xf0 > [<ffff0000081fd3a0>] do_sys_poll+0x220/0x438 > [<ffff0000081fd8d0>] SyS_ppoll+0x1b0/0x1d8 > [<ffff000008084f30>] el0_svc_naked+0x24/0x28 > > -> #0 (&(&array->lock)->rlock){......}: > [<ffff000008104768>] print_circular_bug+0x80/0x2e0 > [<ffff0000081089ac>] __lock_acquire+0x17c4/0x18d8 > [<ffff000008108e0c>] lock_acquire+0x4c/0x68 > [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70 > [<ffff00000858cddc>] fence_signal+0x5c/0xf8 > [<ffff00000858f268>] fence_array_cb_func+0x78/0x88 > [<ffff00000858cb28>] fence_signal_locked+0x80/0xe0 > [<ffff0000085903c8>] sw_sync_ioctl+0x2f8/0x3b0 > [<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790 > [<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0 > [<ffff000008084f30>] el0_svc_naked+0x24/0x28 > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&(&obj->child_list_lock)->rlock); > lock(&(&array->lock)->rlock); > lock(&(&obj->child_list_lock)->rlock); > lock(&(&array->lock)->rlock); > > *** DEADLOCK *** > > 1 lock held by surfaceflinger/2034: > #0: (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0 > > Signed-off-by: Rob Clark <robdclark@xxxxxxxxx> > --- > The fence_get()/_put() might be overkill.. wasn't sure if there was > any path where the ref could get dropped while child_list_lock was > released.. > > drivers/dma-buf/sw_sync.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > index 62e8e6d..3bf8f5c 100644 > --- a/drivers/dma-buf/sw_sync.c > +++ b/drivers/dma-buf/sw_sync.c > @@ -146,8 +146,17 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) > > list_for_each_entry_safe(pt, next, &obj->active_list_head, > active_list) { > - if (fence_is_signaled_locked(&pt->base)) > + struct fence *fence = fence_get(&pt->base); > + bool signaled; > + > + spin_unlock_irqrestore(&obj->child_list_lock, flags); Hmm. The obj->child_list_lock and fence->lock are one and the same ( fence->lock = &obj->child_list_lock). The problem lockdep is complaining about appears to nesting of identical lockclasses. (The current fence_cb design allows for unbounded recursion from the signal callbacks.) Aiui, this patch shouldn't be fixing anything as the fence_signal is still fired from under the very same obj->child_list_lock. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel