Hi Nirmoy,

> > > Currently intel_gt_reset() kills the GuC and then resets requested
> > > engines. This is problematic because there is a dedicated CSB FIFO
> > > which only GuC can access and if that FIFO fills up, the hardware
> > > will block on the next context switch until there is space that means
> > > the system is effectively hung. If an engine is reset whilst actively
> > > executing a context, a CSB entry will be sent to say that the context
> > > has gone idle. Thus if reset happens on a very busy system then
> > > killing GuC before killing the engines will lead to deadlock because
> > > of filled up CSB FIFO.
> > is this a fix?
>
> I went quite far back in the commit logs, and it appears to me that we've
> always been using the current reset flow.
>
> I believe we don't perform a GT reset immediately after sending a number of
> requests, which is what the current failed test is doing.
>
> So, I don't think there will be any visible impact on the user with the
> current flow.

Agree... good thinking here... we often abuse the Fixes tag.

Thanks,
Andi