On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > If we are at the end of suspend or very early in resume
> > it's possible an async fence signal could lead us to the
> > execution of the context destruction worker (after the
> > prior worker flush).

[snip]

> > @@ -3199,10 +3206,20 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >  	if (unlikely(disabled)) {
> >  		release_guc_id(guc, ce);
> >  		__guc_context_destroy(ce);
> > -		return;
> > +		return 0;
> > +	}
> > +
> > +	if (deregister_context(ce, ce->guc_id.id)) {
> > +		/* Seal race with concurrent suspend by unrolling */
> > +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +		set_context_registered(ce);
> > +		clr_context_destroyed(ce);
> > +		intel_gt_pm_put(gt);
>
> how sure are we this is not calling unbalanced puts?

alan: in this function (guc_lrc_desc_unpin), the summarized flow is:
      - check-status-stuff
      - if guc-is-not-disabled, take-pm, change ctx-state-bits
      - [implied else] if guc-is-disabled, scrub-ctx and return
Thus deregister_context is only called if we didn't take the early exit,
i.e. when guc-is-not-disabled, i.e. after the pm was taken - so the put
here unrolls that get and the two stay balanced. (A simplified sketch of
this flow is at the bottom of this mail.)

> could we wrap this in some existing function to make this clear?

alan: yeah - it's not as readable as it could be - let me tweak this
function and make it cleaner.

> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +		return -ENODEV;
> >  	}
> >
> > -	deregister_context(ce, ce->guc_id.id);
> > +	return 0;
> >  }
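
alan: fwiw, here is a simplified sketch of the get/put pairing described
above. It is an illustration of the intended ordering, not the literal
patch: the function name is made up for the sketch, and the choice of
__intel_gt_pm_get() as the variant taken under the spinlock is an
assumption here.

static int guc_lrc_desc_unpin_sketch(struct intel_context *ce)
{
	struct intel_guc *guc = ce_to_guc(ce);
	struct intel_gt *gt = guc_to_gt(guc);
	unsigned long flags;
	bool disabled;

	/* check-status-stuff: sample "disabled" under the state lock */
	spin_lock_irqsave(&ce->guc_state.lock, flags);
	disabled = submission_disabled(guc);
	if (!disabled) {
		/* guc-is-not-disabled: take-pm, change ctx-state-bits */
		__intel_gt_pm_get(gt);
		set_context_destroyed(ce);
		clr_context_registered(ce);
	}
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);

	if (disabled) {
		/* guc-is-disabled: scrub-ctx and return; no pm was taken
		 * on this path, so there is nothing to put */
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
		return 0;
	}

	/* only reached on the path where the pm ref above was taken */
	if (deregister_context(ce, ce->guc_id.id)) {
		/* H2G send failed: unroll the state bits and drop the
		 * pm ref taken above, keeping get/put balanced */
		spin_lock_irqsave(&ce->guc_state.lock, flags);
		set_context_registered(ce);
		clr_context_destroyed(ce);
		intel_gt_pm_put(gt);
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
		return -ENODEV;
	}

	/* on success the matching put happens later, when the GuC
	 * acks the deregister */
	return 0;
}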