On Wed, Dec 20, 2023 at 11:08:59PM +0000, Teres Alexis, Alan Previn wrote:
> On Wed, 2023-12-13 at 16:23 -0500, Vivi, Rodrigo wrote:
> > On Tue, Dec 12, 2023 at 08:57:16AM -0800, Alan Previn wrote:
> > > If we are at the end of suspend or very early in resume,
> > > it's possible an async fence signal (via rcu_call) is triggered
> > > to free_engines, which could lead us to the execution of
> > > the context destruction worker (after a prior worker flush).
> alan:snip
> > >
> > > Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_contexts
> > > if guc_lrc_desc_unpin fails due to a CT send failure.
> > > When unrolling, keep the context in the GuC's destroy-list so
> > > it can get picked up on the next destroy worker invocation
> > > (if suspend aborted) or get fully purged as part of a GuC
> > > sanitization (end of suspend) or a reset flow.
> > >
> > > Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
> > > Signed-off-by: Anshuman Gupta <anshuman.gupta@xxxxxxxxx>
> > > Tested-by: Mousumi Jana <mousumi.jana@xxxxxxxxx>
> > > Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> >
> > Thanks for all the explanations, patience and great work!
> >
> > Reviewed-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
>
> alan: Thanks Rodrigo for the RB last week, just a quick update:
>
> I can't reproduce the BAT failures, which seem to be intermittent
> and vary by platform and test. However, a noticeable number of failures
> do keep occurring on i915_selftest @live @requests, where the
> last test leaked a wakeref and the failing test hangs waiting
> for the GT to idle before starting its test.
>
> I have to debug this further, although from code inspection
> it appears unrelated to the patches in this series.
> Hopefully it's a different issue.

Yeap, likely not related.

Anyway, I'm sorry for not merging this sooner.
Could you please send a rebased version? This one is not applying cleanly anymore.