Re: [PATCH] drm/i915: Recursive i915_reset_trylock() verboten

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 12 Feb 2019 11:18:25 +0000

Quoting Mika Kuoppala (2019-02-12 11:12:05)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > We cannot nest i915_reset_trylock() as the inner call may wait for
> > I915_RESET_BACKOFF, which in turn waits upon sync_srcu, itself
> > waiting for our outermost lock. As we take the reset srcu around the
> > fence update, we have to defer taking it in i915_gem_fault() until after
> > we acquire the pin on the fence to avoid nesting. This is a little ugly,
> > but still works. If a reset occurs between i915_vma_pin_fence() and the
> > second reset lock, the reset will restore the fence register back to the
> > pinned value before the reset lock allows us to proceed (our mmap won't
> > be revoked, since we haven't yet marked it as a userfault, and doing so
> > requires us to hold the reset lock), so the pagefault is still
> > serialised with the revocation in reset.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605
> > Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index c8c355bec091..ae1467a74a08 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1923,16 +1923,16 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
> >       if (ret)
> >               goto err_unpin;
> >  
> > +     ret = i915_vma_pin_fence(vma);
> > +     if (ret)
> > +             goto err_unpin;
> > +
> 
> As this slipped past us despite being obvious in
> retrospect, would it be worthwhile to build a debug
> check into i915_reset_trylock() that is vocal when it
> fails to make progress?
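The ordering change in the quoted commit message can be modelled in a few lines. This is a minimal, single-threaded sketch under loud assumptions: `struct reset_lock_model`, `model_reset_lock()`, `model_pin_fence()` and the two `fault_*_order()` functions are hypothetical and do not use SRCU or any real i915 code. The model only records whether the reset lock was taken while already held, which is the nesting the patch removes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reset lock: it must never be nested by the
 * same caller, because the inner acquisition could wait on a reset that
 * is itself waiting for the outer holder. */
struct reset_lock_model {
	int depth;        /* how many times the current caller holds it */
	bool nested_seen; /* set if taken while already held */
};

static void model_reset_lock(struct reset_lock_model *l)
{
	if (l->depth > 0)
		l->nested_seen = true; /* would risk the deadlock in the driver */
	l->depth++;
}

static void model_reset_unlock(struct reset_lock_model *l)
{
	assert(l->depth > 0);
	l->depth--;
}

/* Hypothetical stand-in for i915_vma_pin_fence(): the fence update is
 * done under the reset lock, so pinning takes and drops it internally. */
static int model_pin_fence(struct reset_lock_model *l)
{
	model_reset_lock(l);
	/* ... fence register update would happen here ... */
	model_reset_unlock(l);
	return 0;
}

/* Old order: reset lock taken first, then pinning the fence nests it. */
static bool fault_old_order(void)
{
	struct reset_lock_model l = {0};

	model_reset_lock(&l);
	model_pin_fence(&l);
	model_reset_unlock(&l);
	return l.nested_seen;
}

/* Patched order: pin the fence first, then take the reset lock. */
static bool fault_new_order(void)
{
	struct reset_lock_model l = {0};

	model_pin_fence(&l);
	model_reset_lock(&l);
	/* ... mark the mmap as a userfault while holding the lock ... */
	model_reset_unlock(&l);
	return l.nested_seen;
}
```

Here fault_old_order() reports nesting and fault_new_order() does not. The real driver has no such flag; the sketch only shows why moving the pin ahead of the lock removes the recursion.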

If we stick a timeout in there, we just send the error back to
userspace: the deadlock is "resolved", but only as a sporadic delay.
The wait is interruptible, so a stall is not a complete loss, and a
hang is more obvious than an occasional timeout. That's my thinking
for not sending along the quick conversion to
wait_event_interruptible_timeout().
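To illustrate why a timeout only turns the deadlock into an error for userspace, here is a hypothetical userspace analogue built on POSIX threads. `struct backoff_model` and `wait_backoff_timeout()` are invented for this sketch; they stand in for the wait on I915_RESET_BACKOFF and are not the kernel's wait_event_interruptible_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of waiting for a "reset backoff" flag to clear. */
struct backoff_model {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	bool backoff; /* true while a (modelled) reset is in progress */
};

/* Timed wait: returns 0 once backoff clears, or -ETIMEDOUT if it never
 * does. In the deadlocked case, the caller simply gets an error later. */
static int wait_backoff_timeout(struct backoff_model *m, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0, ret;

	assert(timeout_ms >= 0);

	/* Default condvars measure absolute deadlines on CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->mutex);
	/* Loop to absorb spurious wakeups; stop on timeout or error. */
	while (m->backoff && rc == 0)
		rc = pthread_cond_timedwait(&m->cond, &m->mutex, &deadline);
	/* Decide on the predicate, not on rc: the flag may have cleared just
	 * as the deadline passed. Any other failure is also reported as a
	 * timeout in this simplified model. */
	ret = m->backoff ? -ETIMEDOUT : 0;
	pthread_mutex_unlock(&m->mutex);
	return ret;
}
```

If nothing ever clears `backoff` (the modelled deadlock), the caller receives -ETIMEDOUT after the delay instead of hanging, so the underlying cycle is hidden rather than fixed.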

What I think we can do is stick a might_lock() so we get the lockdep
splat before the wait?
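For context, might_lock() is a lockdep annotation: when lock proving is enabled, it asks lockdep to validate an acquisition of the given lock at that point without actually taking it, so a recursion or ordering problem is reported before the code blocks. The sketch below is a deliberately simplified, hypothetical userspace model of that idea only; `struct held_locks`, `model_acquire()`, `model_release()` and `model_might_lock()` are invented here, and the model does not reproduce how lockdep treats SRCU or read-recursive locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_HELD 8

/* Hypothetical, much-simplified stand-in for lockdep's held-lock stack. */
struct held_locks {
	const void *held[MAX_HELD];
	int depth;
};

static void model_acquire(struct held_locks *h, const void *lock)
{
	assert(h->depth < MAX_HELD);
	h->held[h->depth++] = lock;
}

static void model_release(struct held_locks *h, const void *lock)
{
	assert(h->depth > 0 && h->held[h->depth - 1] == lock);
	h->depth--;
}

/* Analogue of might_lock(): check that taking @lock here would be valid,
 * without taking it. Returns false and prints a warning (loosely modelled
 * on a lockdep splat) if the caller already holds @lock. */
static bool model_might_lock(const struct held_locks *h, const void *lock,
			     const char *name)
{
	for (int i = 0; i < h->depth; i++) {
		if (h->held[i] == lock) {
			fprintf(stderr, "possible recursive locking: %s\n", name);
			return false;
		}
	}
	return true;
}
```

Calling model_might_lock() at the top of a trylock-style helper flags the nested case immediately, rather than only after the wait has already stalled.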
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx