On 16-10-16 at 18:03, Rob Clark wrote:
> Currently with fence-array, we have a potential deadlock situation.  If we
> fence_add_callback() on an array-fence, the array-fence's lock is acquired
> first, and in its ->enable_signaling() callback, it will install cbs on
> its array-member fences, so the array-member's lock is acquired second.
>
> But in the signal path, the array-member's lock is acquired first, and the
> array-fence's lock acquired second.
>
> One approach to deal with this is to avoid holding the fence's lock when
> calling the cb.  It is a bit intrusive and I haven't fixed up all the
> other drivers that call fence_signal_locked() directly or indirectly.
>
> I guess the other option would be to introduce a work-queue for array-fence?
> Or??

Enabling signaling when creating the fence array is an option. As an
optimization we don't enable signaling when creating a single fence, but
when you merge fences you're probably interested in the result anyway.
The real problem is the fence_add_callback() in .enable_signaling, not
the signaling itself.

> lockdep splat:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 4.7.0-rc7+ #489 Not tainted
> -------------------------------------------------------
> surfaceflinger/2034 is trying to acquire lock:
>  (&(&array->lock)->rlock){......}, at: [<ffff00000858cddc>] fence_signal+0x5c/0xf8
>
> but task is already holding lock:
>  (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&obj->child_list_lock)->rlock){......}:
>        [<ffff000008108924>] __lock_acquire+0x173c/0x18d8
>        [<ffff000008108e0c>] lock_acquire+0x4c/0x68
>        [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70
>        [<ffff00000858d05c>] fence_add_callback+0x3c/0x100
>        [<ffff00000858f100>] fence_array_enable_signaling+0x80/0x170
>        [<ffff00000858d0d8>] fence_add_callback+0xb8/0x100
>        [<ffff00000858f504>] sync_file_poll+0xd4/0xf0
>        [<ffff0000081fd3a0>] do_sys_poll+0x220/0x438
>        [<ffff0000081fd8d0>] SyS_ppoll+0x1b0/0x1d8
>        [<ffff000008084f30>] el0_svc_naked+0x24/0x28
>
> -> #0 (&(&array->lock)->rlock){......}:
>        [<ffff000008104768>] print_circular_bug+0x80/0x2e0
>        [<ffff0000081089ac>] __lock_acquire+0x17c4/0x18d8
>        [<ffff000008108e0c>] lock_acquire+0x4c/0x68
>        [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70
>        [<ffff00000858cddc>] fence_signal+0x5c/0xf8
>        [<ffff00000858f268>] fence_array_cb_func+0x78/0x88
>        [<ffff00000858cb28>] fence_signal_locked+0x80/0xe0
>        [<ffff0000085903c8>] sw_sync_ioctl+0x2f8/0x3b0
>        [<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790
>        [<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0
>        [<ffff000008084f30>] el0_svc_naked+0x24/0x28
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&obj->child_list_lock)->rlock);
>                                lock(&(&array->lock)->rlock);
>                                lock(&(&obj->child_list_lock)->rlock);
>   lock(&(&array->lock)->rlock);
>
>  *** DEADLOCK ***
>
> 1 lock held by surfaceflinger/2034:
>  #0:  (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0
> ---
>  drivers/dma-buf/fence.c      | 22 ++++++++++++++--------
>  drivers/dma-buf/sw_sync.c    |  2 +-
>  drivers/dma-buf/sync_debug.c | 16 +++++++++-------
>  include/linux/fence.h        |  6 +++---
>  4 files changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> index cc05ddd..917f905 100644
> --- a/drivers/dma-buf/fence.c
> +++ b/drivers/dma-buf/fence.c
> @@ -63,9 +63,9 @@ EXPORT_SYMBOL(fence_context_alloc);
>   *
>   * Unlike fence_signal, this function must be called with fence->lock held.
>   */
> -int fence_signal_locked(struct fence *fence)
> +int fence_signal_locked(struct fence *fence, unsigned long flags)
>  {
> -	struct fence_cb *cur, *tmp;
> +	struct fence_cb *cur;
>  	int ret = 0;
>  
>  	lockdep_assert_held(fence->lock);
> @@ -88,9 +88,12 @@ int fence_signal_locked(struct fence *fence)
>  	} else
>  		trace_fence_signaled(fence);
>  
> -	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> +	while (!list_empty(&fence->cb_list)) {
> +		cur = list_first_entry(&fence->cb_list, struct fence_cb, node);
>  		list_del_init(&cur->node);
> +		spin_unlock_irqrestore(fence->lock, flags);
>  		cur->func(fence, cur);
> +		spin_lock_irqsave(fence->lock, flags);

Dropping the lock in this function is a terrible idea. The fence code
assumes that signaling is atomic with setting the signaled bit, so this
could introduce race conditions, for example when this function is called
from a driver's interrupt handler that also needs the lock to update
other bookkeeping.

~Maarten
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel