Quoting Michel Thierry (2018-03-21 17:01:12) > On 3/21/2018 3:46 AM, Mika Kuoppala wrote: > > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > > > >> We were relying on the uncached reads when processing the CSB to provide > >> ourselves with the serialisation with the interrupt handler (so we could > >> detect new interrupts in the middle of processing the old one). However, > >> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD > >> from the HWSP") those uncached reads were eliminated (on one path at > >> least) and along with them our serialisation. The result is that we > >> would very rarely miss notification of a new interrupt and leave a > >> context-switch unprocessed, hanging the GPU. > >> > >> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP") > >> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >> Cc: Michel Thierry <michel.thierry@xxxxxxxxx> > >> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > >> --- > >> drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++------------- > >> 1 file changed, 8 insertions(+), 13 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > >> index 53f1c009ed7b..67b6a0f658d6 100644 > >> --- a/drivers/gpu/drm/i915/intel_lrc.c > >> +++ b/drivers/gpu/drm/i915/intel_lrc.c > >> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data) > >> struct drm_i915_private *dev_priv = engine->i915; > >> bool fw = false; > >> > >> - /* We can skip acquiring intel_runtime_pm_get() here as it was taken > >> + /* > >> + * We can skip acquiring intel_runtime_pm_get() here as it was taken > >> * on our behalf by the request (see i915_gem_mark_busy()) and it will > >> * not be relinquished until the device is idle (see > >> * i915_gem_idle_work_handler()). As a precaution, we make sure > >> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data) > >> */ > >> GEM_BUG_ON(!dev_priv->gt.awake); > >> > >> - /* Prefer doing test_and_clear_bit() as a two stage operation to avoid > >> + /* > >> + * Prefer doing test_and_clear_bit() as a two stage operation to avoid > >> * imposing the cost of a locked atomic transaction when submitting a > >> * new request (outside of the context-switch interrupt). > >> */ > >> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data) > >> execlists->csb_head = -1; /* force mmio read of CSB ptrs */ > >> } > >> > >> - /* The write will be ordered by the uncached read (itself > >> - * a memory barrier), so we do not need another in the form > >> - * of a locked instruction. The race between the interrupt > >> - * handler and the split test/clear is harmless as we order > >> - * our clear before the CSB read. If the interrupt arrived > >> - * first between the test and the clear, we read the updated > >> - * CSB and clear the bit. If the interrupt arrives as we read > >> - * the CSB or later (i.e. after we had cleared the bit) the bit > >> - * is set and we do a new loop. > >> - */ > >> - __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); > >> + /* Clear before reading to catch new interrupts */ > >> + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); > >> + smp_mb__after_atomic(); > > Checkpatch wants a comment for the memory barrier... Are we being strict > about it? (https://patchwork.freedesktop.org/series/40359/) > > > > > I was confused about this memory barrier as our test is in the > > same context and ordered wrt this. Chris noted in irc that this is for > > the documentation for ordering wrt the code that follows. > > > > I am fine with that so, > > Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > > > Fine by me too, > > Reviewed-by: Michel Thierry <michel.thierry@xxxxxxxxx> It definitely appears to be fixing an issue I've been seeing for the last few months since using HWSP for execlists. But I only seeing in conjunction with another set of patches, so my presumption was upon those and not drm-tip (which kept on testing clear). Thanks for the review, pushed. I'm sure I'll moan about the locked instruction appearing in the profiles, just as much as I moan about the locked instructions for tasklet_schedule() dominating some profiles. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx