Quoting Francisco Jerez (2020-03-20 22:14:51)
> Francisco Jerez <currojerez@xxxxxxxxxx> writes:
>
> > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> >
> >> We dropped calling process_csb prior to handling direct submission in
> >> order to avoid the nesting of spinlocks and lift process_csb() and the
> >> majority of the tasklet out of irq-off. However, we do want to avoid
> >> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> >> local to direct submission if we can acquire the tasklet's lock.
> >>
> >> v2: Tweak the balance to avoid over submitting lite-restores
> >>
> >> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> Cc: Francisco Jerez <currojerez@xxxxxxxxxx>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> >> ---
> >>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
> >>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
> >>  2 files changed, 36 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index f09dd87324b9..dceb65a0088f 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
> >>  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>  }
> >>
> >> -static void __submit_queue_imm(struct intel_engine_cs *engine)
> >> +static bool pending_csb(const struct intel_engine_execlists *el)
> >>  {
> >> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> >> +	return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
> >> +}
> >>
> >> -	if (reset_in_progress(execlists))
> >> -		return; /* defer until we restart the engine following reset */
> >> +static bool skip_lite_restore(struct intel_engine_execlists *el,
> >> +			      const struct i915_request *rq)
> >> +{
> >> +	struct i915_request *inflight = execlists_active(el);
> >>
> >> -	if (execlists->tasklet.func == execlists_submission_tasklet)
> >> -		__execlists_submission_tasklet(engine);
> >> -	else
> >> -		tasklet_hi_schedule(&execlists->tasklet);
> >> +	return inflight && inflight->context == rq->context;
> >>  }
> >>
> >>  static void submit_queue(struct intel_engine_cs *engine,
> >> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
> >>  	if (rq_prio(rq) <= execlists->queue_priority_hint)
> >>  		return;
> >>
> >> +	if (reset_in_progress(execlists))
> >> +		return; /* defer until we restart the engine following reset */
> >> +
> >> +	/*
> >> +	 * Suppress immediate lite-restores, leave that to the tasklet.
> >> +	 *
> >> +	 * However, we leave the queue_priority_hint unset so that if we do
> >> +	 * submit a second context, we push that into ELSP[1] immediately.
> >> +	 */
> >> +	if (skip_lite_restore(execlists, rq))
> >> +		return;
> >> +
> >
> > Why do you need to treat lite-restore specially here?

Lite-restores have a noticeable impact on no-op loads. Part of that is
that a lite-restore takes about 1us, and the other part is that the
driver has a lot more work to do. There's a balance point around here
between not needlessly interrupting ourselves and ensuring that there is
no bubble.

> >
> > Anyway, trying this out now in combination with my patches now.
> >
>
> This didn't seem to help (together with your other suggestion to move
> the overload accounting to __execlists_schedule_in/out). And it makes
> the current -5% SynMark OglMultithread regression with my series go down
> to -10%. My previous suggestion of moving the
> intel_gt_pm_active_begin() call to process_csb() when the submission is
> ACK'ed by the hardware does seem to help (and it roughly halves the
> OglMultithread regression), possibly because that way we're able to
> determine whether the first context was actually overlapping at the
> point that the second was received by the hardware -- I haven't tested
> it extensively yet though.

Grumble, it just seems like we are setting and clearing the flag on
completely unrelated events -- which I still think boils down to working
around latency in the driver. Or at least I hope there's an explanation
and bug to fix that improves responsiveness for all.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
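
For anyone following the quoted hunk, the order of the early-outs it adds
to submit_queue() can be sketched in plain user-space C. This is only an
illustration under stated assumptions, not the i915 code: every sketch_*
name and struct layout below is a hypothetical stand-in, the context is
reduced to an integer id, and READ_ONCE() is dropped because nothing here
is concurrent. The quoted hunk is cut off after the lite-restore check,
so the sketch only reports that submission would proceed and does not
guess at what follows.

```c
/* User-space sketch only; sketch_* names are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_request {
	int context_id;	/* stand-in for rq->context */
	int prio;	/* stand-in for rq_prio(rq) */
};

struct sketch_execlists {
	const struct sketch_request *inflight;	/* stand-in for execlists_active() */
	int queue_priority_hint;
	bool reset_in_progress;
	unsigned int csb_write;	/* last CSB slot written (assumed plain index) */
	unsigned int csb_head;	/* last CSB slot consumed (assumed plain index) */
};

enum sketch_action {
	SKETCH_DEFER,	/* leave the request for the tasklet */
	SKETCH_PROCEED,	/* fall through to the part the quoted hunk omits */
};

/* Mirrors pending_csb(): unprocessed events exist if the indices differ. */
static bool sketch_pending_csb(const struct sketch_execlists *el)
{
	return el->csb_write != el->csb_head;
}

/* Mirrors skip_lite_restore(): same context as the one already in flight. */
static bool sketch_skip_lite_restore(const struct sketch_execlists *el,
				     const struct sketch_request *rq)
{
	return el->inflight && el->inflight->context_id == rq->context_id;
}

/* The three early-outs visible in the quoted submit_queue() hunk, in order. */
static enum sketch_action sketch_submit_queue(const struct sketch_execlists *el,
					      const struct sketch_request *rq)
{
	assert(el != NULL && rq != NULL);

	if (rq->prio <= el->queue_priority_hint)
		return SKETCH_DEFER;	/* not important enough to kick now */

	if (el->reset_in_progress)
		return SKETCH_DEFER;	/* wait for the engine restart */

	if (sketch_skip_lite_restore(el, rq))
		return SKETCH_DEFER;	/* suppress an immediate lite-restore */

	return SKETCH_PROCEED;
}
```

With one request in flight, a second request on the same context id is
deferred, while a request on a different context id with priority above
the hint proceeds, which is the ELSP[1] case the patch comment describes.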