Re: [PATCH 1/2] drm/i915: Seal races between async GPU cancellation, retirement and signaling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 8, 2019 at 2:06 PM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
>
> Currently there is an underlying assumption that i915_request_unsubmit()
> is synchronous wrt the GPU -- that is the request is no longer in flight
> as we remove it. In the near future that may change, and this may upset
> our signaling as we can process an interrupt for that request while it
> is no longer in flight.
>
> CPU0                                    CPU1
> intel_engine_breadcrumbs_irq
> (queue request completion)
>                                         i915_request_cancel_signaling
> ...                                     ...
>                                         i915_request_enable_signaling
> dma_fence_signal
>
> Hence in the time it took us to drop the lock to signal the request, a
> preemption event may have occurred and re-queued the request. In the
> process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
> so reused the rq->signal_link that was in use on CPU0, leading to bad
> pointer chasing in intel_engine_breadcrumbs_irq.
>
> A related issue was that if someone started listening for a signal on a
> completed but no longer in-flight request, we missed the opportunity to
> immediately signal that request.
>
> Furthermore, as intel_contexts may be immediately released during
> request retirement, in order to be entirely sure that
> intel_engine_breadcrumbs_irq may no longer dereference the intel_context
> (ce->signals and ce->signal_link), we must wait for irq spinlock.
>
> In order to prevent the race, we use a bit in the fence.flags to signal
> the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
> For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
> quickly signals to any outside observer that the fence is indeed signaled.
>
> v2: Sketch out potential dma-fence API for manual signaling
> v3: And the test_and_set_bit()
>
> Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Ok chatted a bit with Chris on irc, and we already allow drivers to
pass whatever spinlock is suitable for their request/fence
handling/retiring, so allowing to split up this all makes sense I
think. Has my ack on the approach.

I think it'd be good to spell out the optimization this allows
(synchronous cleanup instead of refcounting all. the. things.) plus
showcase the link between the fence->lock pointer and the split-up
__dma_signal functions in the kerneldoc. Definitely for the core
patch.

Also not sure why __notify and __timestamp has a double underscore in
the middle. That color choice confuses me a bit :-)

Cheers, Daniel

> ---
>  drivers/dma-buf/dma-fence.c                 |  1 +
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 78 +++++++++++++++------
>  drivers/gpu/drm/i915/i915_request.c         |  1 +
>  drivers/gpu/drm/i915/intel_guc_submission.c |  1 -
>  4 files changed, 59 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3aa8733f832a..9bf06042619a 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -29,6 +29,7 @@
>
>  EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
>  EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>
>  static DEFINE_SPINLOCK(dma_fence_stub_lock);
>  static struct dma_fence dma_fence_stub;
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index fe455f01aa65..c092bdf5f0bf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -23,6 +23,7 @@
>   */
>
>  #include <linux/kthread.h>
> +#include <trace/events/dma_fence.h>
>  #include <uapi/linux/sched/types.h>
>
>  #include "i915_drv.h"
> @@ -96,9 +97,39 @@ check_signal_order(struct intel_context *ce, struct i915_request *rq)
>         return true;
>  }
>
> +static bool
> +__dma_fence_signal(struct dma_fence *fence)
> +{
> +       return !test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
> +}
> +
> +static void
> +__dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t timestamp)
> +{
> +       fence->timestamp = timestamp;
> +       set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> +       trace_dma_fence_signaled(fence);
> +}
> +
> +static void
> +__dma_fence_signal__notify(struct dma_fence *fence)
> +{
> +       struct dma_fence_cb *cur, *tmp;
> +
> +       lockdep_assert_held(fence->lock);
> +       lockdep_assert_irqs_disabled();
> +
> +       list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> +               INIT_LIST_HEAD(&cur->node);
> +               cur->func(fence, cur);
> +       }
> +       INIT_LIST_HEAD(&fence->cb_list);
> +}
> +
>  void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>  {
>         struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +       const ktime_t timestamp = ktime_get();
>         struct intel_context *ce, *cn;
>         struct list_head *pos, *next;
>         LIST_HEAD(signal);
> @@ -122,6 +153,10 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>
>                         GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
>                                              &rq->fence.flags));
> +                       clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +
> +                       if (!__dma_fence_signal(&rq->fence))
> +                               continue;
>
>                         /*
>                          * Queue for execution after dropping the signaling
> @@ -129,14 +164,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>                          * more signalers to the same context or engine.
>                          */
>                         i915_request_get(rq);
> -
> -                       /*
> -                        * We may race with direct invocation of
> -                        * dma_fence_signal(), e.g. i915_request_retire(),
> -                        * so we need to acquire our reference to the request
> -                        * before we cancel the breadcrumb.
> -                        */
> -                       clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>                         list_add_tail(&rq->signal_link, &signal);
>                 }
>
> @@ -159,7 +186,12 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>                 struct i915_request *rq =
>                         list_entry(pos, typeof(*rq), signal_link);
>
> -               dma_fence_signal(&rq->fence);
> +               __dma_fence_signal__timestamp(&rq->fence, timestamp);
> +
> +               spin_lock(&rq->lock);
> +               __dma_fence_signal__notify(&rq->fence);
> +               spin_unlock(&rq->lock);
> +
>                 i915_request_put(rq);
>         }
>  }
> @@ -261,19 +293,17 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>
>  bool i915_request_enable_breadcrumb(struct i915_request *rq)
>  {
> -       struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
> -
> -       GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
> +       lockdep_assert_held(&rq->lock);
> +       lockdep_assert_irqs_disabled();
>
> -       if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
> -               return true;
> -
> -       spin_lock(&b->irq_lock);
> -       if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
> -           !__request_completed(rq)) {
> +       if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
> +               struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>                 struct intel_context *ce = rq->hw_context;
>                 struct list_head *pos;
>
> +               spin_lock(&b->irq_lock);
> +               GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
> +
>                 __intel_breadcrumbs_arm_irq(b);
>
>                 /*
> @@ -303,8 +333,8 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
>                 GEM_BUG_ON(!check_signal_order(ce, rq));
>
>                 set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +               spin_unlock(&b->irq_lock);
>         }
> -       spin_unlock(&b->irq_lock);
>
>         return !__request_completed(rq);
>  }
> @@ -313,9 +343,15 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
>  {
>         struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>
> -       if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
> -               return;
> +       lockdep_assert_held(&rq->lock);
> +       lockdep_assert_irqs_disabled();
>
> +       /*
> +        * We must wait for b->irq_lock so that we know the interrupt handler
> +        * has released its reference to the intel_context and has completed
> +        * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if
> +        * required).
> +        */
>         spin_lock(&b->irq_lock);
>         if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
>                 struct intel_context *ce = rq->hw_context;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index fa955b7b6def..bed213148cbb 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -437,6 +437,7 @@ void __i915_request_submit(struct i915_request *request)
>         set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>
>         if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
> +           !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &request->fence.flags) &&
>             !i915_request_enable_breadcrumb(request))
>                 intel_engine_queue_breadcrumbs(engine);
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 380d83a2bfb6..ea0e3734d37c 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -23,7 +23,6 @@
>   */
>
>  #include <linux/circ_buf.h>
> -#include <trace/events/dma_fence.h>
>
>  #include "gt/intel_engine_pm.h"
>  #include "gt/intel_lrc_reg.h"
> --
> 2.20.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux