Hi Nirmoy, On Mon, Apr 22, 2024 at 10:19:51PM +0200, Nirmoy Das wrote: > Currently intel_gt_reset() kills the GuC and then resets requested > engines. This is problematic because there is a dedicated CSB FIFO > which only GuC can access and if that FIFO fills up, the hardware > will block on the next context switch until there is space that means > the system is effectively hung. If an engine is reset whilst actively > executing a context, a CSB entry will be sent to say that the context > has gone idle. Thus if reset happens on a very busy system then > killing GuC before killing the engines will lead to deadlock because > of filled up CSB FIFO. is this a fix? > To address this issue, the GuC should be killed only after resetting > the requested engines and before calling intel_gt_init_hw(). > > v2: Improve commit message(John) > > Cc: John Harrison <john.c.harrison@xxxxxxxxx> > Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx> > --- > drivers/gpu/drm/i915/gt/intel_reset.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > index b1393863ca9b..6161f7a3ff70 100644 > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > @@ -879,8 +879,17 @@ static intel_engine_mask_t reset_prepare(struct intel_gt *gt) > intel_engine_mask_t awake = 0; > enum intel_engine_id id; > > - /* For GuC mode, ensure submission is disabled before stopping ring */ > - intel_uc_reset_prepare(>->uc); > + /** > + * For GuC mode with submission enabled, ensure submission > + * is disabled before stopping ring. nit: "stopping *the* ring" > + * > + * For GuC mode with submission disabled, ensure that GuC is not > + * sanitized, do that after engine reset. reset_prepare() > + * is followed by engine reset which in this mode requires GuC to > + * process any CSB FIFO entries generated by the resets. > + */ > + if (intel_uc_uses_guc_submission(>->uc)) > + intel_uc_reset_prepare(>->uc); > > for_each_engine(engine, gt, id) { > if (intel_engine_pm_get_if_awake(engine)) > @@ -1227,6 +1236,9 @@ void intel_gt_reset(struct intel_gt *gt, > > intel_overlay_reset(gt->i915); > > + /* sanitize uC after engine reset */ > + if (!intel_uc_uses_guc_submission(>->uc)) > + intel_uc_reset_prepare(>->uc); Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> Thanks, Andi