On Thu, Aug 03, 2017 at 12:35:48PM -0700, Michel Thierry wrote: > Hi, > > First sorry about the delay... > > On 7/20/2017 10:57 AM, Daniel Vetter wrote: > > There's no reason to entirely wedge the gpu, for the minimal deadlock > > bugfix we only need to unbreak/decouple the atomic commit from the gpu > > reset. The simplest way to fix that is by replacing the > > unconditional fence wait a the top of commit_tail by a wait which > > completes either when the fences are done (normal case, or when a > > reset doesn't need to touch the display state). Or when the gpu reset > > needs to force-unblock all pending modeset states. > > > > The lesser source of deadlocks is when we try to pin a new framebuffer > > and run into a stall. There's a bunch of places this can happen, like > > eviction, changing the caching mode, acquiring a fence on older > > platforms. And we can't just break the depency loop and keep going, > > the only way would be to break out and restart. But the problem with > > that approach is that we must stall for the reset to complete before > > we grab any locks, and with the atomic infrastructure that's a bit > > tricky. The only place is the ioctl code, and we don't want to insert > > code into e.g. the BUSY ioctl. Hence for that problem just create a > > critical section, and if any code is in there, wedge the GPU. For the > > steady-state this should never be a problem. > > > > Note that in both cases TDR itself keeps working, so from a userspace > > pov this trickery isn't observable. Users themselvs might spot a short > > glitch while the rendering is catching up again, but that's still > > better than pre-TDR where we've thrown away all the rendering, > > including innocent batches. Also, this fixes the regression TDR > > introduced of making gpu resets deadlock-prone when we do need to > > touch the display. > > > > One thing I noticed is that gpu_error.flags seems to use both our own > > wait-queue in gpu_error.wait_queue, and the generic wait_on_bit > > facilities. Not entirely sure why this inconsistency exists, I just > > picked one style. > > > > A possible future avenue could be to insert the gpu reset in-between > > ongoing modeset changes, which would avoid the momentary glitch. But > > that's a lot more work to implement in the atomic commit machinery, > > and given that we only need this for pre-g4x hw, of questionable > > utility just for the sake of polishing gpu reset even more on those > > old boxes. It might be useful for other features though. > > > > v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/. > > > > v3: Really emabarrassing fixup, I checked the wrong bit and broke the > > unbreak/wakeup logic. > > > > v4: Also handle deadlocks in pin_to_display. > > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/i915_drv.h | 3 +++ > > drivers/gpu/drm/i915/intel_display.c | 45 +++++++++++++++++++++++++++++++----- > > 2 files changed, 42 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > index 07e98b07c5bc..9b055deca36d 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1505,6 +1505,8 @@ struct i915_gpu_error { > > /* Protected by the above dev->gpu_error.lock. */ > > struct i915_gpu_state *first_error; > > > > + atomic_t pending_fb_pin; > > + > > unsigned long missed_irq_rings; > > > > /** > > @@ -1564,6 +1566,7 @@ struct i915_gpu_error { > > unsigned long flags; > > #define I915_RESET_BACKOFF 0 > > #define I915_RESET_HANDOFF 1 > > +#define I915_RESET_MODESET 2 > > #define I915_WEDGED (BITS_PER_LONG - 1) > > #define I915_RESET_ENGINE (I915_WEDGED - I915_NUM_ENGINES) > > > > Since you still need to resend this, could you also update the > I915_RESET_ENGINE overflow check in i915_handle_error? i.e.: > > ---BUILD_BUG_ON(I915_RESET_HANDOFF >= I915_RESET_ENGINE); > +++BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE); > > Not that we would have this problem anytime soon... Yeah missed that, will fix. > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > > index f6bd6282d7f7..63ea8d6b2ebd 100644 > > --- a/drivers/gpu/drm/i915/intel_display.c > > +++ b/drivers/gpu/drm/i915/intel_display.c > > @@ -2162,6 +2162,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation) > > */ > > intel_runtime_pm_get(dev_priv); > > > > + atomic_inc(&dev_priv->gpu_error.pending_fb_pin); > > + > > Do we also need this in intel_overlay_do_put_image? It is also being called > when the struct_mutex is held. Indeed I missed that, will fix too and resubmit the pile (once we agree on a comment for the previous patch). Thanks, Daniel > > > > vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view); > > if (IS_ERR(vma)) > > goto err; > > @@ -2189,6 +2191,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation) > > > > i915_vma_get(vma); > > err: > > + atomic_dec(&dev_priv->gpu_error.pending_fb_pin); > > + > > intel_runtime_pm_put(dev_priv); > > return vma; > > } > > @@ -3471,11 +3475,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) > > !gpu_reset_clobbers_display(dev_priv)) > > return; > > > > - /* We have a modeset vs reset deadlock, defensively unbreak it. > > - * > > - * FIXME: We can do a _lot_ better, this is just a first iteration.*/ > > - i915_gem_set_wedged(dev_priv); > > - DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n"); > > + /* We have a modeset vs reset deadlock, defensively unbreak it. */ > > + set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); > > + wake_up_all(&dev_priv->gpu_error.wait_queue); > > + > > + if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) { > > + DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n"); > > + i915_gem_set_wedged(dev_priv); > > + } > > > > /* > > * Need mode_config.mutex so that we don't > > @@ -3570,6 +3577,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv) > > drm_modeset_drop_locks(ctx); > > drm_modeset_acquire_fini(ctx); > > mutex_unlock(&dev->mode_config.mutex); > > + > > + clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); > > } > > > > static bool abort_flip_on_reset(struct intel_crtc *crtc) > ^^^ this is the only reason of why we need a rebase. > > > @@ -12381,6 +12390,30 @@ static void intel_atomic_helper_free_state_worker(struct work_struct *work) > > intel_atomic_helper_free_state(dev_priv); > > } > > > > +static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state) > > +{ > > + struct wait_queue_entry wait_fence, wait_reset; > > + struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev); > > + > > + init_wait_entry(&wait_fence, 0); > > + init_wait_entry(&wait_reset, 0); > > + for (;;) { > > + prepare_to_wait(&intel_state->commit_ready.wait, > > + &wait_fence, TASK_UNINTERRUPTIBLE); > > + prepare_to_wait(&dev_priv->gpu_error.wait_queue, > > + &wait_reset, TASK_UNINTERRUPTIBLE); > > + > > + > > + if (i915_sw_fence_done(&intel_state->commit_ready) > > + || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags)) > > + break; > > + > > + schedule(); > > + } > > + finish_wait(&intel_state->commit_ready.wait, &wait_fence); > > + finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset); > > +} > > + > > static void intel_atomic_commit_tail(struct drm_atomic_state *state) > > { > > struct drm_device *dev = state->dev; > > @@ -12394,7 +12427,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state) > > unsigned crtc_vblank_mask = 0; > > int i; > > > > - i915_sw_fence_wait(&intel_state->commit_ready); > > + intel_atomic_commit_fence_wait(intel_state); > > > > drm_atomic_helper_wait_for_dependencies(state); > > > > Looks good to me. > > -Michel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx