Re: [PATCH 1/2] drm/i915: Pull sync_scru for device reset outside of wedge_mutex

Quoting Mika Kuoppala (2019-02-11 15:09:48)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > We need to flush our srcu protecting resources about to be clobbered
> > by the reset, inside of our timer failsafe but outside of the
> > error->wedge_mutex, so that the failsafe can run in case the
> > synchronize_srcu() takes too long (hits a shrinker deadlock?).
> >
> > Fixes: 72eb16df010a ("drm/i915: Serialise resets with wedging")
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=109605
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_reset.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> > index 9494b015185a..c2b7570730c2 100644
> > --- a/drivers/gpu/drm/i915/i915_reset.c
> > +++ b/drivers/gpu/drm/i915/i915_reset.c
> > @@ -941,9 +941,6 @@ static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
> >  {
> >       int err, i;
> >  
> > -     /* Flush everyone currently using a resource about to be clobbered */
> > -     synchronize_srcu(&i915->gpu_error.reset_backoff_srcu);
> > -
> >       err = intel_gpu_reset(i915, ALL_ENGINES);
> >       for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
> >               msleep(10 * (i + 1));
> > @@ -1140,6 +1137,9 @@ static void i915_reset_device(struct drm_i915_private *i915,
> >       i915_wedge_on_timeout(&w, i915, 5 * HZ) {
> >               intel_prepare_reset(i915);
> >  
> > +             /* Flush everyone using a resource about to be clobbered */
> > +             synchronize_srcu(&error->reset_backoff_srcu);
> > +
> 
> Can we easily tell which one it will be? This one, or
> the block below, that hits the timeout and wedges?

It would be easy to reconstruct if we had all the stack traces, so we
could see which process is stuck where, but we do not. Failing that, my
hunch is that it's synchronize_srcu() taking too long; by design, we know
it can deadlock through an unfortunate shrinker interaction :( But I'm not
entirely convinced that's what we're hitting.
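The ordering argument in the commit message can be illustrated with a minimal userspace sketch using POSIX threads. Every name in it is a hypothetical stand-in, not i915 code: wedge_mutex models error->wedge_mutex, slow_flush() models a synchronize_srcu() that cannot finish until the device is wedged, and failsafe() models the i915_wedge_on_timeout() handler, which is assumed here to need wedge_mutex in order to wedge. The failsafe uses pthread_mutex_trylock() and the flush is bounded to 500 ms only so the demo reports a failure instead of hanging forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins, not i915 code. */
static pthread_mutex_t wedge_mutex = PTHREAD_MUTEX_INITIALIZER; /* ~ error->wedge_mutex */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects 'wedged' */
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool wedged;

/* Failsafe: after a (shortened) timeout, wedge the device. It needs
 * wedge_mutex to do so; trylock is used only so the demo gives up
 * instead of deadlocking when the mutex is held. */
static void *failsafe(void *arg)
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 50L * 1000 * 1000 };

    (void)arg;
    nanosleep(&delay, NULL);
    if (pthread_mutex_trylock(&wedge_mutex) != 0)
        return NULL;                /* mutex held: failsafe cannot run */

    pthread_mutex_lock(&state_lock);
    wedged = true;                  /* wedging releases the stuck flush */
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_mutex_unlock(&wedge_mutex);
    return NULL;
}

/* Models a synchronize_srcu() that only completes once the device is
 * wedged. Bounded to 500 ms so the demo terminates; returns true if
 * the flush completed. */
static bool slow_flush(void)
{
    struct timespec deadline;
    int rc = 0;
    bool done;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 500L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&state_lock);
    while (!wedged && rc == 0)      /* loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&state_cond, &state_lock, &deadline);
    done = wedged;
    pthread_mutex_unlock(&state_lock);
    return done;
}

/* Returns true if the stalled flush was recovered by the failsafe. */
static bool reset_device(bool flush_under_mutex)
{
    pthread_t watchdog;
    bool recovered;
    int err;

    wedged = false;                 /* written before the watchdog starts */
    err = pthread_create(&watchdog, NULL, failsafe, NULL); /* arm failsafe */
    assert(err == 0);
    (void)err;

    if (flush_under_mutex) {
        /* Old ordering: flush while holding wedge_mutex. */
        pthread_mutex_lock(&wedge_mutex);
        recovered = slow_flush();
        pthread_mutex_unlock(&wedge_mutex);
    } else {
        /* Patched ordering: flush first, then take wedge_mutex. */
        recovered = slow_flush();
        pthread_mutex_lock(&wedge_mutex);
        /* ... the reset proper would run here ... */
        pthread_mutex_unlock(&wedge_mutex);
    }

    pthread_join(watchdog, NULL);
    return recovered;
}
```

With the patched ordering, reset_device(false) should return true, because the failsafe can take the mutex and unblock the flush. With the old ordering, reset_device(true) should return false, because the failsafe finds wedge_mutex held by the very thread it is meant to rescue.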
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx