> > > alan:snip
> > > > @@ -3279,6 +3322,17 @@ static void destroyed_worker_func(struct work_struct *w)
> > > >  	struct intel_gt *gt = guc_to_gt(guc);
> > > >  	int tmp;
> > > >
> > > > +	/*
> > > > +	 * In rare cases we can get here via async context-free fence-signals that
> > > > +	 * come very late in suspend flow or very early in resume flows. In these
> > > > +	 * cases, GuC won't be ready but just skipping it here is fine as these
> > > > +	 * pending-destroy-contexts get destroyed totally at GuC reset time at the
> > > > +	 * end of suspend.. OR.. this worker can be picked up later on the next
> > > > +	 * context destruction trigger after resume-completes
> > > >
> > > who is triggering the work queue again?
> >
> > alan: short answer: we don't know - and still hunting this (getting closer now..
> > using task tgid str-name lookups).
> > in the few times I've seen it, the callstack looked like this:
> >
> > [33763.582036] Call Trace:
> > [33763.582038]  <TASK>
> > [33763.582040]  dump_stack_lvl+0x69/0x97
> > [33763.582054]  guc_context_destroy+0x1b5/0x1ec
> > [33763.582067]  free_engines+0x52/0x70
> > [33763.582072]  rcu_do_batch+0x161/0x438
> > [33763.582084]  rcu_nocb_cb_kthread+0xda/0x2d0
> > [33763.582093]  kthread+0x13a/0x152
> > [33763.582102]  ? rcu_nocb_gp_kthread+0x6a7/0x6a7
> > [33763.582107]  ? css_get+0x38/0x38
> > [33763.582118]  ret_from_fork+0x1f/0x30
> > [33763.582128]  </TASK>
>
> Alan, the above trace is not due to a missing GT wakeref; it is due to an intel_context_put()
> which is called asynchronously by call_rcu(__free_engines). We need to insert rcu_barrier() to
> flush all rcu callbacks in late suspend.
>
> Thanks,
> Anshuman.
>

Thanks Anshuman for following up with the ongoing debug. I shall re-rev accordingly.
...alan