This is a note to let you know that I've just added the patch titled drm/i915: fix gpu hang vs. flip stall deadlocks to the 3.10-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-i915-fix-gpu-hang-vs.-flip-stall-deadlocks.patch and it can be found in the queue-3.10 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 122f46badaafbe651f05c2c0f24cadee692f761b Mon Sep 17 00:00:00 2001 From: Daniel Vetter <daniel.vetter@xxxxxxxx> Date: Wed, 4 Sep 2013 17:36:14 +0200 Subject: drm/i915: fix gpu hang vs. flip stall deadlocks From: Daniel Vetter <daniel.vetter@xxxxxxxx> commit 122f46badaafbe651f05c2c0f24cadee692f761b upstream. Since we've started to clean up pending flips when the gpu hangs in commit 96a02917a0131e52efefde49c2784c0421d6c439 Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> Date: Mon Feb 18 19:08:49 2013 +0200 drm/i915: Finish page flips and update primary planes after a GPU reset the gpu reset work now also grabs modeset locks. But since work items on our private work queue are not allowed to do that due to the flush_workqueue from the pageflip code this results in a neat deadlock: INFO: task kms_flip:14676 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kms_flip D ffff88019283a5c0 0 14676 13344 0x00000004 ffff88018e62dbf8 0000000000000046 ffff88013bdb12e0 ffff88018e62dfd8 ffff88018e62dfd8 00000000001d3b00 ffff88019283a5c0 ffff88018ec21000 ffff88018f693f00 ffff88018eece000 ffff88018e62dd60 ffff88018eece898 Call Trace: [<ffffffff8138ee7b>] schedule+0x60/0x62 [<ffffffffa046c0dd>] intel_crtc_wait_for_pending_flips+0xb2/0x114 [i915] [<ffffffff81050ff4>] ? finish_wait+0x60/0x60 [<ffffffffa0478041>] intel_crtc_set_config+0x7f3/0x81e [i915] [<ffffffffa031780a>] drm_mode_set_config_internal+0x4f/0xc6 [drm] [<ffffffffa0319cf3>] drm_mode_setcrtc+0x44d/0x4f9 [drm] [<ffffffff810e44da>] ? might_fault+0x38/0x86 [<ffffffffa030d51f>] drm_ioctl+0x2f9/0x447 [drm] [<ffffffff8107a722>] ? trace_hardirqs_off+0xd/0xf [<ffffffffa03198a6>] ? drm_mode_setplane+0x343/0x343 [drm] [<ffffffff8112222f>] ? mntput_no_expire+0x3e/0x13d [<ffffffff81117f33>] vfs_ioctl+0x18/0x34 [<ffffffff81118776>] do_vfs_ioctl+0x396/0x454 [<ffffffff81396b37>] ? sysret_check+0x1b/0x56 [<ffffffff81118886>] SyS_ioctl+0x52/0x7d [<ffffffff81396b12>] system_call_fastpath+0x16/0x1b 2 locks held by kms_flip/14676: #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa0316545>] drm_modeset_lock_all+0x22/0x59 [drm] #1: (&crtc->mutex){+.+.+.}, at: [<ffffffffa031656b>] drm_modeset_lock_all+0x48/0x59 [drm] INFO: task kworker/u8:4:175 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u8:4 D ffff88018de9a5c0 0 175 2 0x00000000 Workqueue: i915 i915_error_work_func [i915] ffff88018e37dc30 0000000000000046 ffff8801938ab8a0 ffff88018e37dfd8 ffff88018e37dfd8 00000000001d3b00 ffff88018de9a5c0 ffff88018ec21018 0000000000000246 ffff88018e37dca0 000000005a865a86 ffff88018de9a5c0 Call Trace: [<ffffffff8138ee7b>] schedule+0x60/0x62 [<ffffffff8138f23d>] schedule_preempt_disabled+0x9/0xb [<ffffffff8138d0cd>] mutex_lock_nested+0x205/0x3b1 [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915] [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915] [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915] [<ffffffffa044e0a2>] i915_error_work_func+0x128/0x147 [i915] [<ffffffff8104a89a>] process_one_work+0x1d4/0x35a [<ffffffff8104a821>] ? process_one_work+0x15b/0x35a [<ffffffff8104b4a5>] worker_thread+0x144/0x1f0 [<ffffffff8104b361>] ? rescuer_thread+0x275/0x275 [<ffffffff8105076d>] kthread+0xac/0xb4 [<ffffffff81059d30>] ? finish_task_switch+0x3b/0xc0 [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60 [<ffffffff81396a6c>] ret_from_fork+0x7c/0xb0 [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60 3 locks held by kworker/u8:4/175: #0: (i915){.+.+.+}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a #1: ((&dev_priv->gpu_error.work)){+.+.+.}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a #2: (&crtc->mutex){+.+.+.}, at: [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915] This blew up while running kms_flip/flip-vs-panning-vs-hang-interruptible on one of my older machines. Unfortunately (despite the proper lockdep annotations for flush_workqueue) lockdep still doesn't detect this correctly, so we need to rely on chance to discover these bugs. Apply the usual bugfix and schedule the reset work on the system workqueue to keep our own driver workqueue free of any modeset lock grabbing. Note that this is not a terribly serious regression since before the offending commit we'd simply have stalled userspace forever due to failing to abort all outstanding pageflips. v2: Add a comment as requested by Chris. Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/i915_irq.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1727,7 +1727,13 @@ void i915_handle_error(struct drm_device wake_up_all(&ring->irq_queue); } - queue_work(dev_priv->wq, &dev_priv->gpu_error.work); + /* + * Our reset work can grab modeset locks (since it needs to reset the + * state of outstanding pagelips). Hence it must not be run on our own + * dev-priv->wq work queue for otherwise the flush_work in the pageflip + * code will deadlock. + */ + schedule_work(&dev_priv->gpu_error.work); } static void __always_unused i915_pageflip_stall_check(struct drm_device *dev, int pipe) Patches currently in stable-queue which might be from daniel.vetter@xxxxxxxx are queue-3.10/drm-i915-fix-wait_for_pending_flips-vs-gpu-hang-deadlock.patch queue-3.10/drm-i915-fix-gpu-hang-vs.-flip-stall-deadlocks.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html