Re: [PATCH v2 2/2] drm/i915: Fix gt reset with GuC submission is disabled

Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> · Tue, 23 Apr 2024 16:42:51 +0200

Hi Nirmoy,

> > > Currently intel_gt_reset() kills the GuC and then resets requested
> > > engines. This is problematic because there is a dedicated CSB FIFO
> > > which only GuC can access and if that FIFO fills up, the hardware
> > > will block on the next context switch until there is space, which means
> > > the system is effectively hung. If an engine is reset whilst actively
> > > executing a context, a CSB entry will be sent to say that the context
> > > has gone idle. Thus if reset happens on a very busy system then
> > > killing GuC before killing the engines will lead to deadlock because
> > > the CSB FIFO has filled up.
> > is this a fix?
> 
> I went quite far back in the commit logs, and it appears to me that we've
> always been using the current reset flow.
> 
> I believe we don't perform a GT reset immediately after sending a number of
> requests, which is what the current failed test is doing.
> 
> So, I don't think there will be any visible impact on the user with the
> current flow.

Agreed... good thinking here... we often abuse the Fixes tag.

Thanks,
Andi