This is a note to let you know that I've just added the patch titled drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset to the 3.17-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-i915-ignore-surflive-and-flip-counter-when-the-gpu-gets-reset.patch and it can be found in the queue-3.17 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From bdfa7542d40e6251c232a802231b37116bd31b11 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@xxxxxxxxxxxxxxx> Date: Tue, 27 May 2014 21:33:09 +0300 Subject: drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@xxxxxxxxxxxxxxx> commit bdfa7542d40e6251c232a802231b37116bd31b11 upstream. During a GPU reset we need to get pending page flip cleared out since the ring contents are gone and flip will never complete on its own. This used to work until the mmio vs. CS flip race detection came about. That piece of code is looking for a specific surface address in the SURFLIVE register, but as a flip to that address may never happen the check may never pass. So we should just skip the SURFLIVE and flip counter checks when the GPU gets reset. intel_display_handle_reset() tries to effectively complete the flip anyway by calling .update_primary_plane(). But that may not satisfy the conditions of the mmio vs. CS race detection since there's no guarantee that a modeset didn't sneak in between the GPU reset and intel_display_handle_reset(). Such a modeset will not wait for pending flips due to the ongoing GPU reset, and then the primary plane updates performed by intel_display_handle_reset() will already use the new surface address, and thus the surface address the flip is waiting for might never appear in SURFLIVE. The result is that the flip will never complete and attempts to perform further page flips will fail with -EBUSY. During the GPU reset intel_crtc_has_pending_flip() will return false regardless, so the deadlock with a modeset vs. the error work acquiring crtc->mutex was avoided. And the reset_counter check in intel_crtc_has_pending_flip() actually made this bug even less severe since it allowed normal modesets to go through even though there's a pending flip. This is a regression introduced by me here: commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9 Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> Date: Tue Apr 15 21:41:34 2014 +0300 drm/i915: Fix mmio vs. CS flip race on ILK+ Testcase: igt/kms_flip/flip-vs-panning-vs-hang Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/intel_display.c | 4 ++++ 1 file changed, 4 insertions(+) --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -9217,6 +9217,10 @@ static bool page_flip_finished(struct in struct drm_device *dev = crtc->base.dev; struct drm_i915_private *dev_priv = dev->dev_private; + if (i915_reset_in_progress(&dev_priv->gpu_error) || + crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) + return true; + /* * The relevant registers doen't exist on pre-ctg. * As the flip done interrupt doesn't trigger for mmio Patches currently in stable-queue which might be from ville.syrjala@xxxxxxxxxxxxxxx are queue-3.17/drm-i915-ignore-surflive-and-flip-counter-when-the-gpu-gets-reset.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html