Re: [PATCH 01/45] drm/i915: Seal races between async GPU cancellation, retirement and signaling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 25/04/2019 11:42, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-04-25 11:35:01)

On 25/04/2019 10:19, Chris Wilson wrote:
Currently there is an underlying assumption that i915_request_unsubmit()
is synchronous wrt the GPU -- that is the request is no longer in flight
as we remove it. In the near future that may change, and this may upset
our signaling as we can process an interrupt for that request while it
is no longer in flight.

CPU0                                  CPU1
intel_engine_breadcrumbs_irq
(queue request completion)
                                       i915_request_cancel_signaling
...                                   ...
                                       i915_request_enable_signaling
dma_fence_signal

Hence in the time it took us to drop the lock to signal the request, a
preemption event may have occurred and re-queued the request. In the
process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
so reused the rq->signal_link that was in use on CPU0, leading to bad
pointer chasing in intel_engine_breadcrumbs_irq.

A related issue was that if someone started listening for a signal on a
completed but no longer in-flight request, we missed the opportunity to
immediately signal that request.

Furthermore, as intel_contexts may be immediately released during
request retirement, in order to be entirely sure that
intel_engine_breadcrumbs_irq may no longer dereference the intel_context
(ce->signals and ce->signal_link), we must wait for irq spinlock.

In order to prevent the race, we use a bit in the fence.flags to signal
the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
quickly signals to any outside observer that the fence is indeed signaled.

Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
   drivers/dma-buf/dma-fence.c                 |  1 +
   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 +++++++++++++--------
   drivers/gpu/drm/i915/i915_request.c         |  1 +
   3 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3aa8733f832a..9bf06042619a 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -29,6 +29,7 @@
EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
   EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
static DEFINE_SPINLOCK(dma_fence_stub_lock);
   static struct dma_fence dma_fence_stub;
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 3cbffd400b1b..4283224249d4 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -23,6 +23,7 @@
    */
#include <linux/kthread.h>
+#include <trace/events/dma_fence.h>
   #include <uapi/linux/sched/types.h>
#include "i915_drv.h"
@@ -83,6 +84,7 @@ static inline bool __request_completed(const struct i915_request *rq)
   void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
   {
       struct intel_breadcrumbs *b = &engine->breadcrumbs;
+     const ktime_t timestamp = ktime_get();
       struct intel_context *ce, *cn;
       struct list_head *pos, *next;
       LIST_HEAD(signal);
@@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
                                            &rq->fence.flags));
+                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+
+                     if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+                                          &rq->fence.flags))
+                             continue;
/*
                        * Queue for execution after dropping the signaling
@@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
                        * more signalers to the same context or engine.
                        */
                       i915_request_get(rq);
-
-                     /*
-                      * We may race with direct invocation of
-                      * dma_fence_signal(), e.g. i915_request_retire(),
-                      * so we need to acquire our reference to the request
-                      * before we cancel the breadcrumb.
-                      */
-                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
                       list_add_tail(&rq->signal_link, &signal);
               }
@@ -140,8 +139,21 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
       list_for_each_safe(pos, next, &signal) {
               struct i915_request *rq =
                       list_entry(pos, typeof(*rq), signal_link);
+             struct dma_fence_cb *cur, *tmp;
+
+             trace_dma_fence_signaled(&rq->fence);
+
+             rq->fence.timestamp = timestamp;
+             set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &rq->fence.flags);
+
+             spin_lock(&rq->lock);
+             list_for_each_entry_safe(cur, tmp, &rq->fence.cb_list, node) {
+                     INIT_LIST_HEAD(&cur->node);
+                     cur->func(&rq->fence, cur);
+             }
+             INIT_LIST_HEAD(&rq->fence.cb_list);
+             spin_unlock(&rq->lock);
- dma_fence_signal(&rq->fence);

I posted some comments on this patch already. In essence it was a
suggestion to not open-code-and-optimize dma_fence_signal, but split it
into two low-level helpers and export from the parent location. Like
__dma_fence_maybe/start_signal and __dma_fence_finish_signal.

bool __dma_fence_start_signal(...)
{
         return test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
                                 fence->flags);
}
EXPORT_SYMBOL(...)

void __dma_fence_finish_signal(...)
{
         trace_dma_fence_signaled...
         timestamp...
         callbacks...
}
EXPORT_SYMBOL(...)

This way we don't add coupling to low level implementation details in i915.

You mean midlayer implementation details :-p

The alternative I mentioned was that we use a redundant bit. But that
has duplicity of purpose.

The patch needs to be stable-friendly for the fixes, so I opted for
taking control.

I might buy it if you put a big fat comment this is open-coded dma_fence_singal and follow up with a patch to move code back into the midlayer following the stable marked patch. ;)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux