Quoting Tvrtko Ursulin (2019-07-17 14:04:34) > > On 16/07/2019 13:49, Chris Wilson wrote: > > Push the engine stop into the back reset_prepare (where it already was!) > > This allows us to avoid dangerously setting the RING registers to 0 for > > logical contexts. If we clear the register on a live context, those > > invalid register values are recorded in the logical context state and > > replayed (with hilarious results). > > So essentially statement is gen3_stop_engine is not needed and even > dangerous with execlists? Yes. It has been a nuisance in the past, which is why we try to avoid it. I have come to conclusion that it serves no purpose for execlists and only makes recovery worse. > > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 16 +++++- > > drivers/gpu/drm/i915/gt/intel_reset.c | 58 ---------------------- > > drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 40 ++++++++++++++- > > 3 files changed, 53 insertions(+), 61 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 9e0992498087..9b87a2fc186c 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -2173,11 +2173,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) > > __tasklet_disable_sync_once(&execlists->tasklet); > > GEM_BUG_ON(!reset_in_progress(execlists)); > > > > - intel_engine_stop_cs(engine); > > - > > /* And flush any current direct submission. */ > > spin_lock_irqsave(&engine->active.lock, flags); > > spin_unlock_irqrestore(&engine->active.lock, flags); > > + > > + /* > > + * We stop engines, otherwise we might get failed reset and a > > + * dead gpu (on elk). Also as modern gpu as kbl can suffer > > + * from system hang if batchbuffer is progressing when > > + * the reset is issued, regardless of READY_TO_RESET ack. > > + * Thus assume it is best to stop engines on all gens > > + * where we have a gpu reset. > > + * > > + * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) > > + * > > + * FIXME: Wa for more modern gens needs to be validated > > + */ > > + intel_engine_stop_cs(engine); > > } > > > > static void reset_csb_pointers(struct intel_engine_cs *engine) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > > index 7ddedfb16aa2..55e2ddcbd215 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > > @@ -135,47 +135,6 @@ void __i915_request_reset(struct i915_request *rq, bool guilty) > > } > > } > > > > -static void gen3_stop_engine(struct intel_engine_cs *engine) > > -{ > > - struct intel_uncore *uncore = engine->uncore; > > - const u32 base = engine->mmio_base; > > - > > - GEM_TRACE("%s\n", engine->name); > > - > > - if (intel_engine_stop_cs(engine)) > > - GEM_TRACE("%s: timed out on STOP_RING\n", engine->name); > > - > > - intel_uncore_write_fw(uncore, > > - RING_HEAD(base), > > - intel_uncore_read_fw(uncore, RING_TAIL(base))); > > - intel_uncore_posting_read_fw(uncore, RING_HEAD(base)); /* paranoia */ > > - > > - intel_uncore_write_fw(uncore, RING_HEAD(base), 0); > > - intel_uncore_write_fw(uncore, RING_TAIL(base), 0); > > - intel_uncore_posting_read_fw(uncore, RING_TAIL(base)); > > - > > - /* The ring must be empty before it is disabled */ > > - intel_uncore_write_fw(uncore, RING_CTL(base), 0); > > - > > - /* Check acts as a post */ > > - if (intel_uncore_read_fw(uncore, RING_HEAD(base))) > > - GEM_TRACE("%s: ring head [%x] not parked\n", > > - engine->name, > > - intel_uncore_read_fw(uncore, RING_HEAD(base))); > > -} > > - > > -static void stop_engines(struct intel_gt *gt, intel_engine_mask_t engine_mask) > > -{ > > - struct intel_engine_cs *engine; > > - intel_engine_mask_t tmp; > > - > > - if (INTEL_GEN(gt->i915) < 3) > > - return; > > - > > - for_each_engine_masked(engine, gt->i915, engine_mask, tmp) > > - gen3_stop_engine(engine); > > -} > > - > > static bool i915_in_reset(struct pci_dev *pdev) > > { > > u8 gdrst; > > @@ -607,23 +566,6 @@ int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask) > > */ > > intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL); > > for (retry = 0; ret == -ETIMEDOUT && retry < retries; retry++) { > > - /* > > - * We stop engines, otherwise we might get failed reset and a > > - * dead gpu (on elk). Also as modern gpu as kbl can suffer > > - * from system hang if batchbuffer is progressing when > > - * the reset is issued, regardless of READY_TO_RESET ack. > > - * Thus assume it is best to stop engines on all gens > > - * where we have a gpu reset. > > - * > > - * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) > > - * > > - * WaMediaResetMainRingCleanup:ctg,elk (presumably) > > - * > > - * FIXME: Wa for more modern gens needs to be validated > > - */ > > - if (retry) > > - stop_engines(gt, engine_mask); > > - > > Only other functional change I see is that we stop retrying to stop the > engines before reset attempts. I don't know if that is a concern or not. Ah, but we do stop the engine before resets in *reset_prepare. The other path to arrive is in sanitize where we don't know enough state to safely tweak the engines. For those, I claim it shouldn't matter as the engines should be idle and we only need the reset to clear stale context state. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx