On Fri, 2021-11-12 at 15:32 +0000, Matthew Auld wrote: > In intel_context_do_pin_ww, when calling into the pre_pin hook(which > is > passed the ww context) it could in theory return -EDEADLK(which is > very > likely with debug kernels), once we start adding more ww locking in > there, > like in the next patch. If so then we need to be mindful of having to > restart the do_pin at this point. > > If this is the kernel_context, or some other early in-kernel context > where we have yet to setup the default_state, then we always inhibit > the > context restore, and instead rely on the delayed active_release to > set > the CONTEXT_VALID_BIT for us(if we even care), which should indicate > that we have context switched away, and that our newly saved context > state should now be valid. However, since we currently grab the > active > reference before the potential ww dance, we can end up setting the > CONTEXT_VALID_BIT much too early, if we need to backoff, and then > upon > re-trying the do_pin, we could potentially cause the hardware to > incorrectly load some garbage context state when later context > switching > to that context, but at the very least this will trigger the > GEM_BUG_ON() in __engine_unpark. For now let's just move any ww dance > stuff prior to arming the active reference. > > For normal user contexts this shouldn't be a concern, since we should > already have the default_state ready when initialising the lrc state, > and so there should be no concern with active_release somehow > prematurely setting the CONTEXT_VALID_BIT. > > Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx> > Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> > --- > drivers/gpu/drm/i915/gt/intel_context.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > b/drivers/gpu/drm/i915/gt/intel_context.c > index 5634d14052bc..ad44860faaf3 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -228,17 +228,17 @@ int __intel_context_do_pin_ww(struct > intel_context *ce, > if (err) > return err; > > - err = i915_active_acquire(&ce->active); > + err = ce->ops->pre_pin(ce, ww, &vaddr); > if (err) > goto err_ctx_unpin; > > - err = ce->ops->pre_pin(ce, ww, &vaddr); > + err = i915_active_acquire(&ce->active); > if (err) > - goto err_release; > + goto err_post_unpin; Hmm, If i915_active_acquire() fails, wouldn't we end up calling i915_active_release() here? Thanks, Thomas