On Tue, Oct 18, 2016 at 11:27 AM, Christian König <deathsimple@xxxxxxxxxxx> wrote: > Am 18.10.2016 um 16:23 schrieb Rob Clark: >> >> On Tue, Oct 18, 2016 at 6:07 AM, Christian König >> <deathsimple@xxxxxxxxxxx> wrote: >>> >>> Am 16.10.2016 um 18:03 schrieb Rob Clark: >>>> >>>> Currently with fence-array, we have a potential deadlock situation. If >>>> we >>>> fence_add_callback() on an array-fence, the array-fence's lock is >>>> aquired >>>> first, and in it's ->enable_signaling() callback, it will install cb's >>>> on >>>> it's array-member fences, so the array-member's lock is acquired second. >>>> >>>> But in the signal path, the array-member's lock is acquired first, and >>>> the >>>> array-fence's lock acquired second. >>>> >>>> One approach to deal with this is avoid holding the fence's lock when >>>> calling the cb. It is a bit intrusive and I haven't fixed up all the >>>> other drivers that call directly or indirectly fence_signal_locked(). >>> >>> >>> In general: Oh! Yes! Please! We have the same issue in the AMD scheduler >>> when we want to register a new callback on the next fence in the list. >>> >>>> I guess the other option would be introduce a work-queue for >>>> array-fence? >>> >>> >>> That's what we do in the GPU scheduler and it adds quite a bunch of extra >>> overhead. >>> >>> So my preferences are clearly to fix calling the cb with any locks held >>> before using another work item for the array fences. But I'm not sure if >>> that is possible with all drivers. >> >> I guess it is probably not 100% possible to ensure driver isn't >> holding other of it's own locks when cb is called.. so I guess wq for >> cb that needs to take other locks is a safe solution. >> >> That said, and maybe I need more coffee, but I think the spinlock for >> iterating cb_list is probably not needed.. I think we could arrange >> things so the test_and_set(SIGNALED) is enough to protect iterating >> the list and calling the cb's. (Maybe protect the >> test_and_set(SIGNALED) w/ fence->lock just so it doesn't race someone >> who already got past the test_bit(SIGNALED) in _add_callback()?) >> >> Then "all" we have to do is figure out how to kill the >> fence_signal_locked()/fence_is_signaled_locked() paths.. >> >> I think I also wouldn't mind pushing the locking down into the >> ->enable_signaling() cb too for drivers that need that. Maybe we >> don't strictly need that. From quick look, seems like half the >> drivers just 'return true;' and seems a bit silly to grab a lock for >> that. >> >> Maybe wq in fence-array in the short term at least is the best way >> just to get things working without such an invasive change. But seems >> like if we could kill the current _locked() paths that there is >> potential to make the fence->lock situation less annoying. > > > Maybe a heretic question, but do we really need the lock after all ? > > I mean the only use case for a double linked list I can see is removing the > callbacks in the case of a timed out wait and that should be a rather rare > operation. > > So wouldn't a single linked list with atomic swaps do as well? Possibly, although I guess keeping the lock as long as it isn't held over callbacks would be sufficient. Either way, I guess this approach starts by untangling the fence_signal_locked()/fence_is_signaled_locked() paths.. that is the part I'm less sure how to do yet. BR, -R > Christian. > > >> >> BR, >> -R >> >>> Regards, >>> Christian. >>> >>> >>>> Or?? >>>> >>>> lockdep splat: >>>> >>>> ====================================================== >>>> [ INFO: possible circular locking dependency detected ] >>>> 4.7.0-rc7+ #489 Not tainted >>>> ------------------------------------------------------- >>>> surfaceflinger/2034 is trying to acquire lock: >>>> (&(&array->lock)->rlock){......}, at: [<ffff00000858cddc>] >>>> fence_signal+0x5c/0xf8 >>>> >>>> but task is already holding lock: >>>> (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] >>>> sw_sync_ioctl+0x228/0x3b0 >>>> >>>> which lock already depends on the new lock. >>>> >>>> the existing dependency chain (in reverse order) is: >>>> >>>> -> #1 (&(&obj->child_list_lock)->rlock){......}: >>>> [<ffff000008108924>] __lock_acquire+0x173c/0x18d8 >>>> [<ffff000008108e0c>] lock_acquire+0x4c/0x68 >>>> [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70 >>>> [<ffff00000858d05c>] fence_add_callback+0x3c/0x100 >>>> [<ffff00000858f100>] fence_array_enable_signaling+0x80/0x170 >>>> [<ffff00000858d0d8>] fence_add_callback+0xb8/0x100 >>>> [<ffff00000858f504>] sync_file_poll+0xd4/0xf0 >>>> [<ffff0000081fd3a0>] do_sys_poll+0x220/0x438 >>>> [<ffff0000081fd8d0>] SyS_ppoll+0x1b0/0x1d8 >>>> [<ffff000008084f30>] el0_svc_naked+0x24/0x28 >>>> >>>> -> #0 (&(&array->lock)->rlock){......}: >>>> [<ffff000008104768>] print_circular_bug+0x80/0x2e0 >>>> [<ffff0000081089ac>] __lock_acquire+0x17c4/0x18d8 >>>> [<ffff000008108e0c>] lock_acquire+0x4c/0x68 >>>> [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70 >>>> [<ffff00000858cddc>] fence_signal+0x5c/0xf8 >>>> [<ffff00000858f268>] fence_array_cb_func+0x78/0x88 >>>> [<ffff00000858cb28>] fence_signal_locked+0x80/0xe0 >>>> [<ffff0000085903c8>] sw_sync_ioctl+0x2f8/0x3b0 >>>> [<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790 >>>> [<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0 >>>> [<ffff000008084f30>] el0_svc_naked+0x24/0x28 >>>> >>>> other info that might help us debug this: >>>> >>>> Possible unsafe locking scenario: >>>> >>>> CPU0 CPU1 >>>> ---- ---- >>>> lock(&(&obj->child_list_lock)->rlock); >>>> lock(&(&array->lock)->rlock); >>>> lock(&(&obj->child_list_lock)->rlock); >>>> lock(&(&array->lock)->rlock); >>>> >>>> *** DEADLOCK *** >>>> >>>> 1 lock held by surfaceflinger/2034: >>>> #0: (&(&obj->child_list_lock)->rlock){......}, at: >>>> [<ffff0000085902f8>] >>>> sw_sync_ioctl+0x228/0x3b0 >>>> --- >>>> drivers/dma-buf/fence.c | 22 ++++++++++++++-------- >>>> drivers/dma-buf/sw_sync.c | 2 +- >>>> drivers/dma-buf/sync_debug.c | 16 +++++++++------- >>>> include/linux/fence.h | 6 +++--- >>>> 4 files changed, 27 insertions(+), 19 deletions(-) >>>> >>>> diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c >>>> index cc05ddd..917f905 100644 >>>> --- a/drivers/dma-buf/fence.c >>>> +++ b/drivers/dma-buf/fence.c >>>> @@ -63,9 +63,9 @@ EXPORT_SYMBOL(fence_context_alloc); >>>> * >>>> * Unlike fence_signal, this function must be called with fence->lock >>>> held. >>>> */ >>>> -int fence_signal_locked(struct fence *fence) >>>> +int fence_signal_locked(struct fence *fence, unsigned long flags) >>>> { >>>> - struct fence_cb *cur, *tmp; >>>> + struct fence_cb *cur; >>>> int ret = 0; >>>> lockdep_assert_held(fence->lock); >>>> @@ -88,9 +88,12 @@ int fence_signal_locked(struct fence *fence) >>>> } else >>>> trace_fence_signaled(fence); >>>> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) { >>>> + while (!list_empty(&fence->cb_list)) { >>>> + cur = list_first_entry(&fence->cb_list, struct fence_cb, >>>> node); >>>> list_del_init(&cur->node); >>>> + spin_unlock_irqrestore(fence->lock, flags); >>>> cur->func(fence, cur); >>>> + spin_lock_irqsave(fence->lock, flags); >>>> } >>>> return ret; >>>> } >>>> @@ -124,12 +127,15 @@ int fence_signal(struct fence *fence) >>>> trace_fence_signaled(fence); >>>> if (test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) { >>>> - struct fence_cb *cur, *tmp; >>>> + struct fence_cb *cur; >>>> spin_lock_irqsave(fence->lock, flags); >>>> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, >>>> node) >>>> { >>>> + while (!list_empty(&fence->cb_list)) { >>>> + cur = list_first_entry(&fence->cb_list, struct >>>> fence_cb, node); >>>> list_del_init(&cur->node); >>>> + spin_unlock_irqrestore(fence->lock, flags); >>>> cur->func(fence, cur); >>>> + spin_lock_irqsave(fence->lock, flags); >>>> } >>>> spin_unlock_irqrestore(fence->lock, flags); >>>> } >>>> @@ -211,7 +217,7 @@ void fence_enable_sw_signaling(struct fence *fence) >>>> spin_lock_irqsave(fence->lock, flags); >>>> if (!fence->ops->enable_signaling(fence)) >>>> - fence_signal_locked(fence); >>>> + fence_signal_locked(fence, flags); >>>> spin_unlock_irqrestore(fence->lock, flags); >>>> } >>>> @@ -266,7 +272,7 @@ int fence_add_callback(struct fence *fence, struct >>>> fence_cb *cb, >>>> trace_fence_enable_signal(fence); >>>> if (!fence->ops->enable_signaling(fence)) { >>>> - fence_signal_locked(fence); >>>> + fence_signal_locked(fence, flags); >>>> ret = -ENOENT; >>>> } >>>> } >>>> @@ -366,7 +372,7 @@ fence_default_wait(struct fence *fence, bool intr, >>>> signed long timeout) >>>> trace_fence_enable_signal(fence); >>>> if (!fence->ops->enable_signaling(fence)) { >>>> - fence_signal_locked(fence); >>>> + fence_signal_locked(fence, flags); >>>> goto out; >>>> } >>>> } >>>> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c >>>> index 62e8e6d..2271f7f 100644 >>>> --- a/drivers/dma-buf/sw_sync.c >>>> +++ b/drivers/dma-buf/sw_sync.c >>>> @@ -146,7 +146,7 @@ static void sync_timeline_signal(struct >>>> sync_timeline >>>> *obj, unsigned int inc) >>>> list_for_each_entry_safe(pt, next, &obj->active_list_head, >>>> active_list) { >>>> - if (fence_is_signaled_locked(&pt->base)) >>>> + if (fence_is_signaled_locked(&pt->base, flags)) >>>> list_del_init(&pt->active_list); >>>> } >>>> diff --git a/drivers/dma-buf/sync_debug.c >>>> b/drivers/dma-buf/sync_debug.c >>>> index 2dd4c3d..a7556d3 100644 >>>> --- a/drivers/dma-buf/sync_debug.c >>>> +++ b/drivers/dma-buf/sync_debug.c >>>> @@ -71,12 +71,13 @@ static const char *sync_status_str(int status) >>>> return "error"; >>>> } >>>> -static void sync_print_fence(struct seq_file *s, struct fence >>>> *fence, >>>> bool show) >>>> +static void sync_print_fence(struct seq_file *s, struct fence *fence, >>>> + bool show, unsigned long flags) >>>> { >>>> int status = 1; >>>> struct sync_timeline *parent = fence_parent(fence); >>>> - if (fence_is_signaled_locked(fence)) >>>> + if (fence_is_signaled_locked(fence, flags)) >>>> status = fence->status; >>>> seq_printf(s, " %s%sfence %s", >>>> @@ -124,13 +125,14 @@ static void sync_print_obj(struct seq_file *s, >>>> struct sync_timeline *obj) >>>> list_for_each(pos, &obj->child_list_head) { >>>> struct sync_pt *pt = >>>> container_of(pos, struct sync_pt, child_list); >>>> - sync_print_fence(s, &pt->base, false); >>>> + sync_print_fence(s, &pt->base, false, flags); >>>> } >>>> spin_unlock_irqrestore(&obj->child_list_lock, flags); >>>> } >>>> static void sync_print_sync_file(struct seq_file *s, >>>> - struct sync_file *sync_file) >>>> + struct sync_file *sync_file, >>>> + unsigned long flags) >>>> { >>>> int i; >>>> @@ -141,9 +143,9 @@ static void sync_print_sync_file(struct seq_file >>>> *s, >>>> struct fence_array *array = >>>> to_fence_array(sync_file->fence); >>>> for (i = 0; i < array->num_fences; ++i) >>>> - sync_print_fence(s, array->fences[i], true); >>>> + sync_print_fence(s, array->fences[i], true, >>>> flags); >>>> } else { >>>> - sync_print_fence(s, sync_file->fence, true); >>>> + sync_print_fence(s, sync_file->fence, true, flags); >>>> } >>>> } >>>> @@ -172,7 +174,7 @@ static int sync_debugfs_show(struct seq_file *s, >>>> void *unused) >>>> struct sync_file *sync_file = >>>> container_of(pos, struct sync_file, >>>> sync_file_list); >>>> - sync_print_sync_file(s, sync_file); >>>> + sync_print_sync_file(s, sync_file, flags); >>>> seq_puts(s, "\n"); >>>> } >>>> spin_unlock_irqrestore(&sync_file_list_lock, flags); >>>> diff --git a/include/linux/fence.h b/include/linux/fence.h >>>> index 0d76305..073f380 100644 >>>> --- a/include/linux/fence.h >>>> +++ b/include/linux/fence.h >>>> @@ -220,7 +220,7 @@ static inline void fence_put(struct fence *fence) >>>> } >>>> int fence_signal(struct fence *fence); >>>> -int fence_signal_locked(struct fence *fence); >>>> +int fence_signal_locked(struct fence *fence, unsigned long flags); >>>> signed long fence_default_wait(struct fence *fence, bool intr, signed >>>> long timeout); >>>> int fence_add_callback(struct fence *fence, struct fence_cb *cb, >>>> fence_func_t func); >>>> @@ -239,13 +239,13 @@ void fence_enable_sw_signaling(struct fence >>>> *fence); >>>> * This function requires fence->lock to be held. >>>> */ >>>> static inline bool >>>> -fence_is_signaled_locked(struct fence *fence) >>>> +fence_is_signaled_locked(struct fence *fence, unsigned long flags) >>>> { >>>> if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) >>>> return true; >>>> if (fence->ops->signaled && fence->ops->signaled(fence)) { >>>> - fence_signal_locked(fence); >>>> + fence_signal_locked(fence, flags); >>>> return true; >>>> } >>>> >>> >>> > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel