Additional update from the most recent testing.

Relying solely on guc_lrc_desc_unpin getting a failure from
deregister_context as the means of identifying that we are in the
"deregister-context-vs-suspend-late" race is too late a point to handle
this safely. This is because one of the first things
destroyed_worker_func does is take a gt pm wakeref, which triggers the
gt_unpark function that runs a whole bunch of other flows, including
triggering more workers and taking additional refs.

That said, it's best not to even call deregister_destroyed_contexts from
the worker when !intel_guc_is_ready (ct-is-disabled).

...alan

On Fri, 2023-08-25 at 11:54 -0700, Teres Alexis, Alan Previn wrote:
> just a follow up note-to-self:
>
> On Tue, 2023-08-15 at 12:08 -0700, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> > > On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > > >
> [snip]
>
> in guc_submission_send_busy_loop, we are incrementing the following
> that needs to be decremented if the function fails.
>
> atomic_inc(&guc->outstanding_submission_g2h);
>
> also, it seems that even with this unroll design - we are still
> leaking a wakeref elsewhere. this is despite a cleaner redesign of
> flows in function "guc_lrc_desc_unpin"
> (discussed earlier that wasn't very readable).
>
> will re-rev today but will probably need more follow ups
> tracking that one more leaking gt-wakeref (one in thousands-cycles)
> but at least now we are not hanging mid-suspend.. we bail from suspend
> with useful kernel messages.
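
For reference, the two points in this thread (roll back the outstanding-G2H count when the send fails, and bail out of the worker before any wakeref is taken when the GuC is not ready) can be modelled in a small userspace sketch. This is not the i915 code: struct mock_guc, mock_send, mock_send_busy_loop, mock_destroyed_worker and the wakeref_gets counter are all hypothetical stand-ins, and the "ready" flag only approximates what intel_guc_is_ready reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GuC state discussed in this thread. */
struct mock_guc {
	atomic_int outstanding_submission_g2h; /* G2H replies still expected */
	bool ready;                            /* models intel_guc_is_ready() */
};

/* Hypothetical transport: the send fails once the CT channel is disabled. */
static int mock_send(struct mock_guc *guc)
{
	return guc->ready ? 0 : -ENODEV;
}

/*
 * Models the unroll: the expected G2H is counted up front, so the count
 * has to be dropped again if the send itself fails.
 */
static int mock_send_busy_loop(struct mock_guc *guc)
{
	int err;

	atomic_fetch_add(&guc->outstanding_submission_g2h, 1);
	err = mock_send(guc);
	if (err) {
		int prev = atomic_fetch_sub(&guc->outstanding_submission_g2h, 1);

		assert(prev > 0); /* we incremented just above */
	}
	return err;
}

/*
 * Models the worker-side guard: check readiness before taking the
 * (mock) wakeref, so nothing is woken up in the late-suspend case.
 * Returns true if a deregistration was attempted.
 */
static bool mock_destroyed_worker(struct mock_guc *guc, int *wakeref_gets)
{
	if (!guc->ready)
		return false;

	(*wakeref_gets)++; /* stands in for taking the gt pm wakeref */
	mock_send_busy_loop(guc);
	return true;
}
```

With ready == false, the send path leaves the counter at zero and the worker returns without touching wakeref_gets; with ready == true, one wakeref is counted and one G2H is left outstanding until the (unmodelled) reply arrives.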