Quoting Mika Kuoppala (2020-12-15 17:09:39) > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > > > We assume that the contents of the HWSP are lost across suspend, and so > > upon resume we must restore critical values such as the timeline seqno. > > Keep track of every timeline allocated that uses the HWSP as its storage > > and so we can then reset all seqno values by walking that list. > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 ++++- > > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 6 ++++ > > drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + > > .../drm/i915/gt/intel_execlists_submission.c | 11 ++++-- > > .../gpu/drm/i915/gt/intel_ring_submission.c | 35 +++++++++++++++++++ > > drivers/gpu/drm/i915/gt/intel_timeline.h | 13 +++++-- > > .../gpu/drm/i915/gt/intel_timeline_types.h | 2 ++ > > 7 files changed, 71 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > index 71bd052628f4..6c08e74edcae 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > @@ -648,6 +648,8 @@ static int init_status_page(struct intel_engine_cs *engine) > > void *vaddr; > > int ret; > > > > + INIT_LIST_HEAD(&engine->status_page.timelines); > > + > > /* > > * Though the HWS register does support 36bit addresses, historically > > * we have had hangs and corruption reported due to wild writes if > > @@ -936,6 +938,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) > > fput(engine->default_state); > > > > if (engine->kernel_context) { > > + list_del(&engine->kernel_context->timeline->engine_link); > > intel_context_unpin(engine->kernel_context); > > intel_context_put(engine->kernel_context); > > } > > @@ -1281,8 +1284,12 @@ void intel_engines_reset_default_submission(struct intel_gt *gt) > > struct intel_engine_cs *engine; > > enum intel_engine_id id; > > > > - for_each_engine(engine, gt, id) > > + for_each_engine(engine, gt, id) { > > + if (engine->sanitize) > > + engine->sanitize(engine); > > + > > engine->set_default_submission(engine); > > + } > > } > > > > bool intel_engine_can_store_dword(struct intel_engine_cs *engine) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > > index 99574378047f..1e5bad0b9a82 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > > @@ -60,6 +60,12 @@ static int __engine_unpark(struct intel_wakeref *wf) > > > > /* Scrub the context image after our loss of control */ > > ce->ops->reset(ce); > > + > > + CE_TRACE(ce, "reset { seqno:%x, *hwsp:%x, ring:%x }\n", > > + ce->timeline->seqno, > > + READ_ONCE(*ce->timeline->hwsp_seqno), > > + ce->ring->emit); > > + GEM_BUG_ON(ce->timeline->seqno != *ce->timeline->hwsp_seqno); > > Compiler should be satified but could still have been READ_ONCE, > for the reader and for the fine bug on which might get delivered to console. Yup, I hesitated as it meant a newline :) > But main thing is that now coherency is enforced from the get go. > > +static void xcs_sanitize(struct intel_engine_cs *engine) > > +{ > > + /* > > + * Poison residual state on resume, in case the suspend didn't! > > + * > > + * We have to assume that across suspend/resume (or other loss > > + * of control) that the contents of our pinned buffers has been > > + * lost, replaced by garbage. Since this doesn't always happen, > > + * let's poison such state so that we more quickly spot when > > + * we falsely assume it has been preserved. > > + */ > > + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) > > + memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE); > > + > > + /* > > + * The kernel_context HWSP is stored in the status_page. As above, > > + * that may be lost on resume/initialisation, and so we need to > > + * reset the value in the HWSP. > > + */ > > + sanitize_hwsp(engine); > > + > > + /* And scrub the dirty cachelines for the HWSP */ > > + clflush_cache_range(engine->status_page.addr, PAGE_SIZE); > > The flush could be part of the actual writing of the seqno with > that range. But then you would need to track the debug so. Better > to make sure to transfer everything to be visible. Yes. It's also just part of the general 'sanitize' paranoia; we scrub everything until we are convinced that we only see our own bugs reflected back at us. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx