Re: [RFC] dma-buf/fence: avoid holding lock while calling cb

On 18.10.2016 at 16:23, Rob Clark wrote:
On Tue, Oct 18, 2016 at 6:07 AM, Christian König
<deathsimple@xxxxxxxxxxx> wrote:
Am 16.10.2016 um 18:03 schrieb Rob Clark:
Currently with fence-array, we have a potential deadlock situation.  If we
fence_add_callback() on an array-fence, the array-fence's lock is acquired
first, and in its ->enable_signaling() callback, it will install cbs on
its array-member fences, so the array-member's lock is acquired second.

But in the signal path, the array-member's lock is acquired first, and the
array-fence's lock is acquired second.
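
For reference, in call-graph form the two paths take the two locks in
opposite order (abbreviated from the lockdep splat below):

  fence_add_callback(array)              sw_sync_ioctl() / timeline signal
    spin_lock(array->lock)                 spin_lock(obj->child_list_lock)
    fence_array_enable_signaling()         fence_signal_locked(pt)
      fence_add_callback(member)             fence_array_cb_func()
        spin_lock(member->lock)                fence_signal(&array->base)
                                                 spin_lock(array->lock)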

One approach to deal with this is to avoid holding the fence's lock when
calling the cb.  It is a bit intrusive and I haven't fixed up all the
other drivers that call fence_signal_locked() directly or indirectly.

In general: Oh! Yes! Please! We have the same issue in the AMD scheduler
when we want to register a new callback on the next fence in the list.

I guess the other option would be to introduce a work-queue for array-fence?

That's what we do in the GPU scheduler and it adds quite a bit of extra
overhead.

So my preference is clearly to fix calling the cb with locks held rather
than using another work item for the array fences. But I'm not sure if
that is possible with all drivers.
I guess it is probably not 100% possible to ensure a driver isn't
holding other locks of its own when the cb is called, so I guess a wq for
cbs that need to take other locks is a safe solution.
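
For fence-array, that deferral could look roughly like the untested sketch
below; the work_struct member and fence_array_cb_work() are made up here,
and INIT_WORK() would happen next to the existing fence_get()/
fence_add_callback() in fence_array_enable_signaling():

struct fence_array_cb {
	struct fence_cb cb;
	struct fence_array *array;
	struct work_struct work;	/* hypothetical extra member */
};

static void fence_array_cb_work(struct work_struct *work)
{
	struct fence_array_cb *array_cb =
		container_of(work, struct fence_array_cb, work);
	struct fence_array *array = array_cb->array;

	/* Runs in process context, no member fence lock held. */
	if (atomic_dec_and_test(&array->num_pending))
		fence_signal(&array->base);
	fence_put(&array->base);
}

static void fence_array_cb_func(struct fence *f, struct fence_cb *cb)
{
	struct fence_array_cb *array_cb =
		container_of(cb, struct fence_array_cb, cb);

	/* Called with f->lock held; just punt to the workqueue. */
	schedule_work(&array_cb->work);
}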

That said, and maybe I need more coffee, but I think the spinlock for
iterating cb_list is probably not needed.  I think we could arrange
things so the test_and_set(SIGNALED) is enough to protect iterating
the list and calling the cbs.  (Maybe protect the
test_and_set(SIGNALED) w/ fence->lock just so it doesn't race someone
who already got past the test_bit(SIGNALED) in _add_callback()?)

Then "all" we have to do is figure out how to kill the
fence_signal_locked()/fence_is_signaled_locked() paths..
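
Roughly the idea for fence_signal() would be something like the untested
sketch below (ignoring the ENABLE_SIGNAL_BIT short-cut and
fence_remove_callback() for the moment):

int fence_signal(struct fence *fence)
{
	struct fence_cb *cur, *tmp;
	unsigned long flags;

	if (!fence)
		return -EINVAL;

	/*
	 * Hold fence->lock only for the state transition, so we can't
	 * race someone who already passed the !SIGNALED test in
	 * fence_add_callback() but hasn't linked the cb yet.
	 */
	spin_lock_irqsave(fence->lock, flags);
	if (test_and_set_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
		spin_unlock_irqrestore(fence->lock, flags);
		return -EINVAL;
	}
	fence->timestamp = ktime_get();
	spin_unlock_irqrestore(fence->lock, flags);

	trace_fence_signaled(fence);

	/*
	 * Only the winner of the test_and_set above gets here, so the
	 * list walk itself needs no lock.  (fence_remove_callback()
	 * would still have to be taught about this.)
	 */
	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
		list_del_init(&cur->node);
		cur->func(fence, cur);
	}
	return 0;
}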

I think I also wouldn't mind pushing the locking down into the
->enable_signaling() cb too for drivers that need that.  Maybe we
don't strictly need that.  From a quick look, it seems like half the
drivers just 'return true;', and it seems a bit silly to grab a lock
for that.
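
i.e. the core would stop wrapping the call in fence->lock, and only
drivers that actually need serialization would take it themselves;
untested sketch, the foo_/bar_ callbacks and bar_arm_irq() are made up:

/* core: no fence->lock held around ->enable_signaling() */
if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
	trace_fence_enable_signal(fence);
	if (!fence->ops->enable_signaling(fence))
		fence_signal(fence);
}

/* half the drivers: nothing to protect */
static bool foo_enable_signaling(struct fence *f)
{
	return true;
}

/* drivers that do need it take the lock themselves */
static bool bar_enable_signaling(struct fence *f)
{
	unsigned long flags;
	bool ret;

	spin_lock_irqsave(f->lock, flags);
	ret = bar_arm_irq(f);		/* made-up hardware hook */
	spin_unlock_irqrestore(f->lock, flags);
	return ret;
}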

Maybe a wq in fence-array is, in the short term at least, the best way
just to get things working without such an invasive change.  But it seems
like if we could kill the current _locked() paths, there is potential to
make the fence->lock situation less annoying.

Maybe a heretical question, but do we really need the lock at all?

I mean, the only use case for a doubly linked list I can see is removing
the callbacks in the case of a timed-out wait, and that should be a rather
rare operation.

So wouldn't a singly linked list with atomic swaps do as well?
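
Untested sketch of what that could look like with the llist primitives;
the cb_llist member is hypothetical, and removing a callback on a
timed-out wait would then have to mark the entry instead of unlinking it:

struct fence_cb {
	struct llist_node node;	/* was: struct list_head node */
	fence_func_t func;
};

int fence_add_callback(struct fence *fence, struct fence_cb *cb,
		       fence_func_t func)
{
	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return -ENOENT;

	cb->func = func;
	llist_add(&cb->node, &fence->cb_llist);	/* atomic push, no lock */

	/*
	 * Racy bit: if fence_signal() emptied the list between the
	 * test_bit above and the llist_add, we'd have to notice that
	 * here and run the callback ourselves.
	 */
	return 0;
}

int fence_signal(struct fence *fence)
{
	struct llist_node *list;
	struct fence_cb *cur, *tmp;

	if (test_and_set_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return -EINVAL;

	trace_fence_signaled(fence);

	/* Atomically grab the whole list, then call the cbs unlocked. */
	list = llist_del_all(&fence->cb_llist);
	llist_for_each_entry_safe(cur, tmp, list, node)
		cur->func(fence, cur);

	return 0;
}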

Christian.


BR,
-R

Regards,
Christian.


Or??

lockdep splat:

   ======================================================
[ INFO: possible circular locking dependency detected ]
4.7.0-rc7+ #489 Not tainted
-------------------------------------------------------
surfaceflinger/2034 is trying to acquire lock:
   (&(&array->lock)->rlock){......}, at: [<ffff00000858cddc>]
fence_signal+0x5c/0xf8

but task is already holding lock:
   (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>]
sw_sync_ioctl+0x228/0x3b0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&obj->child_list_lock)->rlock){......}:
         [<ffff000008108924>] __lock_acquire+0x173c/0x18d8
         [<ffff000008108e0c>] lock_acquire+0x4c/0x68
         [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70
         [<ffff00000858d05c>] fence_add_callback+0x3c/0x100
         [<ffff00000858f100>] fence_array_enable_signaling+0x80/0x170
         [<ffff00000858d0d8>] fence_add_callback+0xb8/0x100
         [<ffff00000858f504>] sync_file_poll+0xd4/0xf0
         [<ffff0000081fd3a0>] do_sys_poll+0x220/0x438
         [<ffff0000081fd8d0>] SyS_ppoll+0x1b0/0x1d8
         [<ffff000008084f30>] el0_svc_naked+0x24/0x28

-> #0 (&(&array->lock)->rlock){......}:
         [<ffff000008104768>] print_circular_bug+0x80/0x2e0
         [<ffff0000081089ac>] __lock_acquire+0x17c4/0x18d8
         [<ffff000008108e0c>] lock_acquire+0x4c/0x68
         [<ffff000008ac6a6c>] _raw_spin_lock_irqsave+0x54/0x70
         [<ffff00000858cddc>] fence_signal+0x5c/0xf8
         [<ffff00000858f268>] fence_array_cb_func+0x78/0x88
         [<ffff00000858cb28>] fence_signal_locked+0x80/0xe0
         [<ffff0000085903c8>] sw_sync_ioctl+0x2f8/0x3b0
         [<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790
         [<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0
         [<ffff000008084f30>] el0_svc_naked+0x24/0x28

other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&(&obj->child_list_lock)->rlock);
                                 lock(&(&array->lock)->rlock);
                                 lock(&(&obj->child_list_lock)->rlock);
    lock(&(&array->lock)->rlock);

   *** DEADLOCK ***

1 lock held by surfaceflinger/2034:
   #0:  (&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085902f8>] sw_sync_ioctl+0x228/0x3b0
---
   drivers/dma-buf/fence.c      | 22 ++++++++++++++--------
   drivers/dma-buf/sw_sync.c    |  2 +-
   drivers/dma-buf/sync_debug.c | 16 +++++++++-------
   include/linux/fence.h        |  6 +++---
   4 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
index cc05ddd..917f905 100644
--- a/drivers/dma-buf/fence.c
+++ b/drivers/dma-buf/fence.c
@@ -63,9 +63,9 @@ EXPORT_SYMBOL(fence_context_alloc);
    *
   * Unlike fence_signal, this function must be called with fence->lock held.
    */
-int fence_signal_locked(struct fence *fence)
+int fence_signal_locked(struct fence *fence, unsigned long flags)
   {
-       struct fence_cb *cur, *tmp;
+       struct fence_cb *cur;
         int ret = 0;
         lockdep_assert_held(fence->lock);
@@ -88,9 +88,12 @@ int fence_signal_locked(struct fence *fence)
         } else
                 trace_fence_signaled(fence);
-       list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
+       while (!list_empty(&fence->cb_list)) {
+               cur = list_first_entry(&fence->cb_list, struct fence_cb, node);
                 list_del_init(&cur->node);
+               spin_unlock_irqrestore(fence->lock, flags);
                 cur->func(fence, cur);
+               spin_lock_irqsave(fence->lock, flags);
         }
         return ret;
   }
@@ -124,12 +127,15 @@ int fence_signal(struct fence *fence)
         trace_fence_signaled(fence);
         if (test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
-               struct fence_cb *cur, *tmp;
+               struct fence_cb *cur;
                 spin_lock_irqsave(fence->lock, flags);
-               list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
+               while (!list_empty(&fence->cb_list)) {
+                       cur = list_first_entry(&fence->cb_list, struct fence_cb, node);
                         list_del_init(&cur->node);
+                       spin_unlock_irqrestore(fence->lock, flags);
                         cur->func(fence, cur);
+                       spin_lock_irqsave(fence->lock, flags);
                 }
                 spin_unlock_irqrestore(fence->lock, flags);
         }
@@ -211,7 +217,7 @@ void fence_enable_sw_signaling(struct fence *fence)
                 spin_lock_irqsave(fence->lock, flags);
                 if (!fence->ops->enable_signaling(fence))
-                       fence_signal_locked(fence);
+                       fence_signal_locked(fence, flags);
                 spin_unlock_irqrestore(fence->lock, flags);
         }
@@ -266,7 +272,7 @@ int fence_add_callback(struct fence *fence, struct fence_cb *cb,
                 trace_fence_enable_signal(fence);
                 if (!fence->ops->enable_signaling(fence)) {
-                       fence_signal_locked(fence);
+                       fence_signal_locked(fence, flags);
                         ret = -ENOENT;
                 }
         }
@@ -366,7 +372,7 @@ fence_default_wait(struct fence *fence, bool intr, signed long timeout)
                 trace_fence_enable_signal(fence);
                 if (!fence->ops->enable_signaling(fence)) {
-                       fence_signal_locked(fence);
+                       fence_signal_locked(fence, flags);
                         goto out;
                 }
         }
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 62e8e6d..2271f7f 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -146,7 +146,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
         list_for_each_entry_safe(pt, next, &obj->active_list_head,
                                  active_list) {
-               if (fence_is_signaled_locked(&pt->base))
+               if (fence_is_signaled_locked(&pt->base, flags))
                         list_del_init(&pt->active_list);
         }
   diff --git a/drivers/dma-buf/sync_debug.c b/drivers/dma-buf/sync_debug.c
index 2dd4c3d..a7556d3 100644
--- a/drivers/dma-buf/sync_debug.c
+++ b/drivers/dma-buf/sync_debug.c
@@ -71,12 +71,13 @@ static const char *sync_status_str(int status)
         return "error";
   }
-static void sync_print_fence(struct seq_file *s, struct fence *fence, bool show)
+static void sync_print_fence(struct seq_file *s, struct fence *fence,
+               bool show, unsigned long flags)
   {
         int status = 1;
         struct sync_timeline *parent = fence_parent(fence);
-       if (fence_is_signaled_locked(fence))
+       if (fence_is_signaled_locked(fence, flags))
                 status = fence->status;
         seq_printf(s, "  %s%sfence %s",
@@ -124,13 +125,14 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
         list_for_each(pos, &obj->child_list_head) {
                 struct sync_pt *pt =
                         container_of(pos, struct sync_pt, child_list);
-               sync_print_fence(s, &pt->base, false);
+               sync_print_fence(s, &pt->base, false, flags);
         }
         spin_unlock_irqrestore(&obj->child_list_lock, flags);
   }
     static void sync_print_sync_file(struct seq_file *s,
-                                 struct sync_file *sync_file)
+                                 struct sync_file *sync_file,
+                                 unsigned long flags)
   {
         int i;
   @@ -141,9 +143,9 @@ static void sync_print_sync_file(struct seq_file *s,
                struct fence_array *array = to_fence_array(sync_file->fence);
                 for (i = 0; i < array->num_fences; ++i)
-                       sync_print_fence(s, array->fences[i], true);
+                       sync_print_fence(s, array->fences[i], true, flags);
         } else {
-               sync_print_fence(s, sync_file->fence, true);
+               sync_print_fence(s, sync_file->fence, true, flags);
         }
   }
@@ -172,7 +174,7 @@ static int sync_debugfs_show(struct seq_file *s, void *unused)
                 struct sync_file *sync_file =
                        container_of(pos, struct sync_file, sync_file_list);
-               sync_print_sync_file(s, sync_file);
+               sync_print_sync_file(s, sync_file, flags);
                 seq_puts(s, "\n");
         }
         spin_unlock_irqrestore(&sync_file_list_lock, flags);
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 0d76305..073f380 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -220,7 +220,7 @@ static inline void fence_put(struct fence *fence)
   }
     int fence_signal(struct fence *fence);
-int fence_signal_locked(struct fence *fence);
+int fence_signal_locked(struct fence *fence, unsigned long flags);
   signed long fence_default_wait(struct fence *fence, bool intr, signed long timeout);
   int fence_add_callback(struct fence *fence, struct fence_cb *cb,
                        fence_func_t func);
@@ -239,13 +239,13 @@ void fence_enable_sw_signaling(struct fence *fence);
    * This function requires fence->lock to be held.
    */
   static inline bool
-fence_is_signaled_locked(struct fence *fence)
+fence_is_signaled_locked(struct fence *fence, unsigned long flags)
   {
         if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
                 return true;
         if (fence->ops->signaled && fence->ops->signaled(fence)) {
-               fence_signal_locked(fence);
+               fence_signal_locked(fence, flags);
                 return true;
         }




_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



