Re: [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 15 Jul 2020 13:26:23 +0100

On 15/07/2020 13:06, Tvrtko Ursulin wrote:

On 15/07/2020 11:50, Chris Wilson wrote:
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fence is
signaled by an interrupt, we will then try and call dma_fence_signal()
ourselves while the signaler is still holding the lock.
dma_fence_is_signaled() has the benefit of checking the
DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
avoids most of that contention.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Matthew Auld <matthew.auld@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
  1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55e6194..bb4eb1a8780e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, 
unsigned int cpu)
      return this_cpu != cpu;
  }
-static bool __i915_spin_request(const struct i915_request * const rq, 
int state)
+static bool __i915_spin_request(struct i915_request * const rq, int 
state)
  {
      unsigned long timeout_ns;
      unsigned int cpu;
@@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct 
i915_request * const rq, int state)
      timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
      timeout_ns += local_clock_ns(&cpu);
      do {
-        if (i915_request_completed(rq))
+        if (dma_fence_is_signaled(&rq->fence))
              return true;
          if (signal_pending_state(state, current))
@@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
       * duration, which we currently lack.
       */
      if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
-        __i915_spin_request(rq, state)) {
-        dma_fence_signal(&rq->fence);
+        __i915_spin_request(rq, state))
          goto out;
-    }
      /*
       * This client is about to stall waiting for the GPU. In many cases
@@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
      for (;;) {
          set_current_state(state);
-        if (i915_request_completed(rq)) {
-            dma_fence_signal(&rq->fence);
+        if (dma_fence_is_signaled(&rq->fence))
              break;
-        }
          intel_engine_flush_submission(rq->engine);


In other words putting some latency back into the waiters, which is 
probably okay, since sync waits is not our primary model.

I have a slight concern about the remaining value of busy spinning if 
i915_request_completed check is removed from there as well. Of course it 
doesn't make sense to have different completion criteria between the 
two.. We could wait a bit longer if real check in busyspin said request 
is actually completed, just not signal it but wait for the breadcrumbs 
to do it.

What a load of nonsense.. :)

Okay, I think the only real question is i915_request_completed vs 
dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles 
waiting on GPU and breadcrumb irq work, or just the GPU.

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx