On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > If we are at the end of suspend or very early in resume
> > it's possible an async fence signal could lead us to the
> > execution of the context destruction worker (after the
> > prior worker flush).

[snip]

> > @@ -3199,10 +3206,20 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >  	if (unlikely(disabled)) {
> >  		release_guc_id(guc, ce);
> >  		__guc_context_destroy(ce);
> > -		return;
> > +		return 0;
> > +	}
> > +
> > +	if (deregister_context(ce, ce->guc_id.id)) {
> > +		/* Seal race with concurrent suspend by unrolling */
> > +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +		set_context_registered(ce);
> > +		clr_context_destroyed(ce);
> > +		intel_gt_pm_put(gt);
>
> how sure are we this is not calling unbalanced puts?

alan: in this function (guc_lrc_desc_unpin), the summarized flow is:
      - check-status-stuff
      - if guc-is-not-disabled, take-pm, change ctx-state-bits
      - [implied else] if guc-is-disabled, scrub-ctx and return
Thus deregister_context is only called if we didn't take the early exit,
i.e. when guc-is-not-disabled, i.e. after the pm was taken - so the put
here unrolls that get and the two stay balanced. (A simplified sketch of
this flow is at the bottom of this mail.)

> could we wrap this in some existing function to make this clear?

alan: yeah - it's not as readable as it could be - let me tweak this
function and make it cleaner.

> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +		return -ENODEV;
> >  	}
> >
> > -	deregister_context(ce, ce->guc_id.id);
> > +	return 0;
> >  }
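
alan: fwiw, here is a simplified sketch of the get/put pairing described
above. It is an illustration of the intended ordering, not the literal
patch: the function name is made up for the sketch, and the choice of
__intel_gt_pm_get() as the variant taken under the spinlock is an
assumption here.

static int guc_lrc_desc_unpin_sketch(struct intel_context *ce)
{
	struct intel_guc *guc = ce_to_guc(ce);
	struct intel_gt *gt = guc_to_gt(guc);
	unsigned long flags;
	bool disabled;

	/* check-status-stuff: sample "disabled" under the state lock */
	spin_lock_irqsave(&ce->guc_state.lock, flags);
	disabled = submission_disabled(guc);
	if (!disabled) {
		/* guc-is-not-disabled: take-pm, change ctx-state-bits */
		__intel_gt_pm_get(gt);
		set_context_destroyed(ce);
		clr_context_registered(ce);
	}
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);

	if (disabled) {
		/* guc-is-disabled: scrub-ctx and return; no pm was taken
		 * on this path, so there is nothing to put */
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
		return 0;
	}

	/* only reached on the path where the pm ref above was taken */
	if (deregister_context(ce, ce->guc_id.id)) {
		/* H2G send failed: unroll the state bits and drop the
		 * pm ref taken above, keeping get/put balanced */
		spin_lock_irqsave(&ce->guc_state.lock, flags);
		set_context_registered(ce);
		clr_context_destroyed(ce);
		intel_gt_pm_put(gt);
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
		return -ENODEV;
	}

	/* on success the matching put happens later, when the GuC
	 * acks the deregister */
	return 0;
}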