Re: [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 26 Feb 2019 20:27:07 +0000



Quoting Rodrigo Vivi (2019-02-26 19:53:47)
> On Wed, Feb 06, 2019 at 01:03:12PM +0000, Chris Wilson wrote:
> > Previously, we were able to rely on the recursive properties of
> > struct_mutex to allow us to serialise revoking mmaps and reacquiring the
> > FENCE registers with them being clobbered over a global device reset.
> > I then proceeded to throw out the baby with the bath water in order to
> > pursue a struct_mutex-less reset.
> > 
> > Perusing LWN for alternative strategies, the dilemma on how to serialise
> > access to a global resource on one side was answered by
> > https://lwn.net/Articles/202847/ -- Sleepable RCU:
> > 
> >     1  int readside(void) {
> >     2      int idx;
> >     3      rcu_read_lock();
> >     4    if (nomoresrcu) {
> >     5          rcu_read_unlock();
> >     6        return -EINVAL;
> >     7      }
> >     8    idx = srcu_read_lock(&ss);
> >     9    rcu_read_unlock();
> >     10           /* SRCU read-side critical section. */
> >     11           srcu_read_unlock(&ss, idx);
> >     12           return 0;
> >     13 }
> >     14
> >     15 void cleanup(void)
> >     16 {
> >     17     nomoresrcu = 1;
> >     18     synchronize_rcu();
> >     19     synchronize_srcu(&ss);
> >     20     cleanup_srcu_struct(&ss);
> >     21 }
> > 
> > No more worrying about stop_machine, just an uber-complex mutex,
> > optimised for reads, with the overhead pushed to the rare reset path.
> > 
> > However, we do run the risk of a deadlock as we allocate underneath the
> > SRCU read lock, and the allocation may require a GPU reset, causing a
> > dependency cycle via the in-flight requests. We resolve that by declaring
> > the driver wedged and cancelling all in-flight rendering.
> > 
> > v2: Use expedited rcu barriers to match our earlier timing
> > characteristics.
> > v3: Try to annotate locking contexts for sparse
> > v4: Reduce selftest lock duration to avoid a reset deadlock with fences
> > v5: s/srcu/reset_backoff_srcu/
> > 
> > Testcase: igt/gem_mmap_gtt/hang
> > Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
> 
> Hi Chris,
> 
> this patch didn't applied cleanly on dinf
> so I noticed that I could also get
> 
> 115ff80a97cf ("drm/i915: Defer removing fence register tracking to rpm wakeup")
> 
> so that applies, but then I noticed that there's a Fixes of
> this patch here:
> 
> de3a87cf6352 ("drm/i915: Recursive i915_reset_trylock() verboten")
> 
> It seems sane to pick 3 of them, but I'd like to confirm with you first.
> 
> Thoughts?

There's a at least one later patch as well. I think the safest course of
option is to ignore these fixups, as the glitch is very rare (after a
gpu hang as well), only temporary and does not impact stability or security
(they can only see their own buffers/data in a slightly different order).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx