Quoting Tvrtko Ursulin (2019-07-17 14:21:50) > > On 17/07/2019 14:08, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2019-07-17 14:04:34) > >> > >> On 16/07/2019 13:49, Chris Wilson wrote: > >>> Push the engine stop into the back reset_prepare (where it already was!) > >>> This allows us to avoid dangerously setting the RING registers to 0 for > >>> logical contexts. If we clear the register on a live context, those > >>> invalid register values are recorded in the logical context state and > >>> replayed (with hilarious results). > >> > >> So essentially statement is gen3_stop_engine is not needed and even > >> dangerous with execlists? > > > > Yes. It has been a nuisance in the past, which is why we try to avoid > > it. I have come to conclusion that it serves no purpose for execlists > > and only makes recovery worse. > > > >> > >>> > >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>> --- > >>> drivers/gpu/drm/i915/gt/intel_lrc.c | 16 +++++- > >>> drivers/gpu/drm/i915/gt/intel_reset.c | 58 ---------------------- > >>> drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 40 ++++++++++++++- > >>> 3 files changed, 53 insertions(+), 61 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > >>> index 9e0992498087..9b87a2fc186c 100644 > >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > >>> @@ -2173,11 +2173,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) > >>> __tasklet_disable_sync_once(&execlists->tasklet); > >>> GEM_BUG_ON(!reset_in_progress(execlists)); > >>> > >>> - intel_engine_stop_cs(engine); > >>> - > >>> /* And flush any current direct submission. */ > >>> spin_lock_irqsave(&engine->active.lock, flags); > >>> spin_unlock_irqrestore(&engine->active.lock, flags); > >>> + > >>> + /* > >>> + * We stop engines, otherwise we might get failed reset and a > >>> + * dead gpu (on elk). Also as modern gpu as kbl can suffer > >>> + * from system hang if batchbuffer is progressing when > >>> + * the reset is issued, regardless of READY_TO_RESET ack. > >>> + * Thus assume it is best to stop engines on all gens > >>> + * where we have a gpu reset. > >>> + * > >>> + * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) > >>> + * > >>> + * FIXME: Wa for more modern gens needs to be validated > >>> + */ > >>> + intel_engine_stop_cs(engine); > >>> } > >>> > >>> static void reset_csb_pointers(struct intel_engine_cs *engine) > >>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > >>> index 7ddedfb16aa2..55e2ddcbd215 100644 > >>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c > >>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > >>> @@ -135,47 +135,6 @@ void __i915_request_reset(struct i915_request *rq, bool guilty) > >>> } > >>> } > >>> > >>> -static void gen3_stop_engine(struct intel_engine_cs *engine) > >>> -{ > >>> - struct intel_uncore *uncore = engine->uncore; > >>> - const u32 base = engine->mmio_base; > >>> - > >>> - GEM_TRACE("%s\n", engine->name); > >>> - > >>> - if (intel_engine_stop_cs(engine)) > >>> - GEM_TRACE("%s: timed out on STOP_RING\n", engine->name); > >>> - > >>> - intel_uncore_write_fw(uncore, > >>> - RING_HEAD(base), > >>> - intel_uncore_read_fw(uncore, RING_TAIL(base))); > >>> - intel_uncore_posting_read_fw(uncore, RING_HEAD(base)); /* paranoia */ > >>> - > >>> - intel_uncore_write_fw(uncore, RING_HEAD(base), 0); > >>> - intel_uncore_write_fw(uncore, RING_TAIL(base), 0); > >>> - intel_uncore_posting_read_fw(uncore, RING_TAIL(base)); > >>> - > >>> - /* The ring must be empty before it is disabled */ > >>> - intel_uncore_write_fw(uncore, RING_CTL(base), 0); > >>> - > >>> - /* Check acts as a post */ > >>> - if (intel_uncore_read_fw(uncore, RING_HEAD(base))) > >>> - GEM_TRACE("%s: ring head [%x] not parked\n", > >>> - engine->name, > >>> - intel_uncore_read_fw(uncore, RING_HEAD(base))); > >>> -} > >>> - > >>> -static void stop_engines(struct intel_gt *gt, intel_engine_mask_t engine_mask) > >>> -{ > >>> - struct intel_engine_cs *engine; > >>> - intel_engine_mask_t tmp; > >>> - > >>> - if (INTEL_GEN(gt->i915) < 3) > >>> - return; > >>> - > >>> - for_each_engine_masked(engine, gt->i915, engine_mask, tmp) > >>> - gen3_stop_engine(engine); > >>> -} > >>> - > >>> static bool i915_in_reset(struct pci_dev *pdev) > >>> { > >>> u8 gdrst; > >>> @@ -607,23 +566,6 @@ int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask) > >>> */ > >>> intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL); > >>> for (retry = 0; ret == -ETIMEDOUT && retry < retries; retry++) { > >>> - /* > >>> - * We stop engines, otherwise we might get failed reset and a > >>> - * dead gpu (on elk). Also as modern gpu as kbl can suffer > >>> - * from system hang if batchbuffer is progressing when > >>> - * the reset is issued, regardless of READY_TO_RESET ack. > >>> - * Thus assume it is best to stop engines on all gens > >>> - * where we have a gpu reset. > >>> - * > >>> - * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) > >>> - * > >>> - * WaMediaResetMainRingCleanup:ctg,elk (presumably) > >>> - * > >>> - * FIXME: Wa for more modern gens needs to be validated > >>> - */ > >>> - if (retry) > >>> - stop_engines(gt, engine_mask); > >>> - > >> > >> Only other functional change I see is that we stop retrying to stop the > >> engines before reset attempts. I don't know if that is a concern or not. > > > > Ah, but we do stop the engine before resets in *reset_prepare. The other > > path to arrive is in sanitize where we don't know enough state to safely > > tweak the engines. For those, I claim it shouldn't matter as the engines > > should be idle and we only need the reset to clear stale context state. > > Yes I know that we do call stop in prepare, just not on the reset retry > path. It's the above loop, if reset was failing and needed retries > before we would re-retried stopping engines and now we would not. The engines are still stopped. The functional change is to remove the dangerous clearing of RING_HEAD/CTL. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx