Quoting Chris Wilson (2020-07-15 15:47:15)
> Quoting Tvrtko Ursulin (2020-07-15 13:26:23)
> >
> > On 15/07/2020 13:06, Tvrtko Ursulin wrote:
> > >
> > > On 15/07/2020 11:50, Chris Wilson wrote:
> > >> Currently, we use i915_request_completed() directly in
> > >> i915_request_wait() and follow up with a manual invocation of
> > >> dma_fence_signal(). This appears to cause a large number of contentions
> > >> on i915_request.lock as when the process is woken up after the fence is
> > >> signaled by an interrupt, we will then try and call dma_fence_signal()
> > >> ourselves while the signaler is still holding the lock.
> > >> dma_fence_is_signaled() has the benefit of checking the
> > >> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> > >> avoids most of that contention.
> > >>
> > >> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > >> Cc: Matthew Auld <matthew.auld@xxxxxxxxx>
> > >> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > >> ---
> > >>  drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
> > >>  1 file changed, 4 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > >> index 0b2fe55e6194..bb4eb1a8780e 100644
> > >> --- a/drivers/gpu/drm/i915/i915_request.c
> > >> +++ b/drivers/gpu/drm/i915/i915_request.c
> > >> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
> > >>  	return this_cpu != cpu;
> > >>  }
> > >>
> > >> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
> > >> +static bool __i915_spin_request(struct i915_request * const rq, int state)
> > >>  {
> > >>  	unsigned long timeout_ns;
> > >>  	unsigned int cpu;
> > >> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
> > >>  	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
> > >>  	timeout_ns += local_clock_ns(&cpu);
> > >>  	do {
> > >> -		if (i915_request_completed(rq))
> > >> +		if (dma_fence_is_signaled(&rq->fence))
> > >>  			return true;
> > >>
> > >>  		if (signal_pending_state(state, current))
> > >> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
> > >>  	 * duration, which we currently lack.
> > >>  	 */
> > >>  	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> > >> -	    __i915_spin_request(rq, state)) {
> > >> -		dma_fence_signal(&rq->fence);
> > >> +	    __i915_spin_request(rq, state))
> > >>  		goto out;
> > >> -	}
> > >>
> > >>  	/*
> > >>  	 * This client is about to stall waiting for the GPU. In many cases
> > >> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
> > >>  	for (;;) {
> > >>  		set_current_state(state);
> > >>
> > >> -		if (i915_request_completed(rq)) {
> > >> -			dma_fence_signal(&rq->fence);
> > >> +		if (dma_fence_is_signaled(&rq->fence))
> > >>  			break;
> > >> -		}
> > >>
> > >>  		intel_engine_flush_submission(rq->engine);
> > >>
> > >
> > > In other words putting some latency back into the waiters, which is
> > > probably okay, since sync waits is not our primary model.
> > >
> > > I have a slight concern about the remaining value of busy spinning if
> > > i915_request_completed check is removed from there as well. Of course it
> > > doesn't make sense to have different completion criteria between the
> > > two.. We could wait a bit longer if real check in busyspin said request
> > > is actually completed, just not signal it but wait for the breadcrumbs
> > > to do it.
> >
> > What a load of nonsense.. :)
> >
> > Okay, I think the only real question is i915_request_completed vs
> > dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles
> > waiting on GPU and breadcrumb irq work, or just the GPU.
>
> 	dma_fence_is_signaled() {
> 		if (test_bit(SIGNALED_BIT))
> 			return true;
>
> 		if (i915_request_completed()) {
> 			dma_fence_signal();
> 			return true;
> 		}
>
> 		return false;
> 	}
>
> with the indirection. So the question is whether the indirection is
> worth the extra test bit.
Just purely looking at the i915_request.lock > contention suggests that it probably is. For the spinner, burning a few > extra CPU cycles for *vfunc is not an issue, it's the wakeup latency, > and since we are calling dma_fence_signal() upon wakeup we do take the > spinlock without checking for an early return from the SIGNALED_BIT. > So I think it's a net positive. The alternative was to write > > if (i915_request_completed()) { > if (!i915_request_is_signaled()) > dma_fence_signal(); > break; > } > > but > > if (dma_fence_is_signaled()) > break; > > does appear simpler, if only by virtue of hiding the details in an > inline. Or a patch to add test_bit() to dma_fence_signal, but this looked better. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx