Quoting Tvrtko Ursulin (2018-05-16 10:49:34) > > On 16/05/2018 10:25, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-05-16 10:15:34) > >> > >> On 16/05/2018 07:47, Chris Wilson wrote: > >>> We cannot call kthread_park() from softirq context, so let's avoid it > >>> entirely during the reset. We wanted to suspend the signaler so that it > >>> would not mark a request as complete at the same time as we marked it as > >>> being in error. Instead of parking the signaling, stop the engine from > >>> advancing so that the GPU doesn't emit the breadcrumb for our chosen > >>> "guilty" request. > >>> > >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > >>> Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> > >>> CC: Michel Thierry <michel.thierry@xxxxxxxxx> > >>> Cc: Jeff McGee <jeff.mcgee@xxxxxxxxx> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>> --- > > > >>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > >>> index af5a178366ed..bb88a92fcc1e 100644 > >>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > >>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > >>> @@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine) > >>> return ret; > >>> } > >>> > >>> +static void set_stop_engine(struct intel_engine_cs *engine) > >>> +{ > >>> + struct drm_i915_private *dev_priv = engine->i915; > >>> + const u32 base = engine->mmio_base; > >>> + const i915_reg_t mode = RING_MI_MODE(base); > >>> + > >>> + I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING)); > >>> + if (__intel_wait_for_register_fw(dev_priv, > >>> + mode, MODE_IDLE, MODE_IDLE, > >>> + 1000, 0, > >>> + NULL)) > >>> + DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n", > >>> + engine->name); > >>> +} > >> > >> Looks to be exactly the same implementation as in intel_lrc.c apart from > >> debug vs trace. Move to intel_engine_cs.c? > > > > Not unless you also suggest a name I like ;) > > > > Mika once had plans for engine->stop() so I didn't think too hard about > > this bit, expecting it to be superseded. > > Vfunc, or intel_engine_stop, anything but two of the same. > > Timed out message should also possibly even be upgraded to notice level. It doesn't make any difference unless an error occurs later. The STOP_RING will time out, we know it does. MODE_IDLE will only be set on the next arbitration point (aiui) :( Such cases are usually only resolved by wedging as the reset itself also fail. > >>> static struct i915_request *reset_prepare(struct intel_engine_cs *engine) > >>> { > >>> + if (INTEL_GEN(engine->i915) >= 3) > >>> + set_stop_engine(engine); > >>> + > >>> if (engine->irq_seqno_barrier) > >>> engine->irq_seqno_barrier(engine); > >>> > >>> > >> > >> Most important question - is stopping the engine expect to mostly work > >> with a palette of different hang types? > > > > The good news is that if the engine is hung, it doesn't matter! So what > > it reliably stops is Command Streamer execution along the ring, and we > > only need to rely on it stopping before the MI_STORE_DWORD of the > > breadcrumb to be sure that we won't see the breadcrumb advance as we > > process reset. > > > > Now what's stopping the breadcrumb still being in flight (from the GPU) > > as we sample it (from the CPU)? > > Not much. > > If it reliably stops the command streamer then that sounds good enough > to me. > > The in-flight write should be unlikely, since we detected a hang that > means any activity happened long time ago. Rule of thumb is to always use "hang" ;) - we may have falsely declared a hang and the gpu was just slow - another engine is hung, this one is chugging along oblivious However, the uncached read is just what I would use to allow the write from the GPU to be visible to the CPU, so maybe just add another? -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx