On Wed, Sep 04, 2013 at 05:36:14PM +0200, Daniel Vetter wrote:
> Since we've started to clean up pending flips when the gpu hangs in
>
> commit 96a02917a0131e52efefde49c2784c0421d6c439
> Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Date: Mon Feb 18 19:08:49 2013 +0200
>
>     drm/i915: Finish page flips and update primary planes after a GPU reset
>
> the gpu reset work now also grabs modeset locks. But since work
> items on our private work queue are not allowed to do that due to the
> flush_workqueue from the pageflip code this results in a neat
> deadlock:
>
> INFO: task kms_flip:14676 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kms_flip D ffff88019283a5c0 0 14676 13344 0x00000004
> ffff88018e62dbf8 0000000000000046 ffff88013bdb12e0 ffff88018e62dfd8
> ffff88018e62dfd8 00000000001d3b00 ffff88019283a5c0 ffff88018ec21000
> ffff88018f693f00 ffff88018eece000 ffff88018e62dd60 ffff88018eece898
> Call Trace:
> [<ffffffff8138ee7b>] schedule+0x60/0x62
> [<ffffffffa046c0dd>] intel_crtc_wait_for_pending_flips+0xb2/0x114 [i915]
> [<ffffffff81050ff4>] ? finish_wait+0x60/0x60
> [<ffffffffa0478041>] intel_crtc_set_config+0x7f3/0x81e [i915]
> [<ffffffffa031780a>] drm_mode_set_config_internal+0x4f/0xc6 [drm]
> [<ffffffffa0319cf3>] drm_mode_setcrtc+0x44d/0x4f9 [drm]
> [<ffffffff810e44da>] ? might_fault+0x38/0x86
> [<ffffffffa030d51f>] drm_ioctl+0x2f9/0x447 [drm]
> [<ffffffff8107a722>] ? trace_hardirqs_off+0xd/0xf
> [<ffffffffa03198a6>] ? drm_mode_setplane+0x343/0x343 [drm]
> [<ffffffff8112222f>] ? mntput_no_expire+0x3e/0x13d
> [<ffffffff81117f33>] vfs_ioctl+0x18/0x34
> [<ffffffff81118776>] do_vfs_ioctl+0x396/0x454
> [<ffffffff81396b37>] ? sysret_check+0x1b/0x56
> [<ffffffff81118886>] SyS_ioctl+0x52/0x7d
> [<ffffffff81396b12>] system_call_fastpath+0x16/0x1b
> 2 locks held by kms_flip/14676:
> #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa0316545>] drm_modeset_lock_all+0x22/0x59 [drm]
> #1: (&crtc->mutex){+.+.+.}, at: [<ffffffffa031656b>] drm_modeset_lock_all+0x48/0x59 [drm]
> INFO: task kworker/u8:4:175 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u8:4 D ffff88018de9a5c0 0 175 2 0x00000000
> Workqueue: i915 i915_error_work_func [i915]
> ffff88018e37dc30 0000000000000046 ffff8801938ab8a0 ffff88018e37dfd8
> ffff88018e37dfd8 00000000001d3b00 ffff88018de9a5c0 ffff88018ec21018
> 0000000000000246 ffff88018e37dca0 000000005a865a86 ffff88018de9a5c0
> Call Trace:
> [<ffffffff8138ee7b>] schedule+0x60/0x62
> [<ffffffff8138f23d>] schedule_preempt_disabled+0x9/0xb
> [<ffffffff8138d0cd>] mutex_lock_nested+0x205/0x3b1
> [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
> [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
> [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
> [<ffffffffa044e0a2>] i915_error_work_func+0x128/0x147 [i915]
> [<ffffffff8104a89a>] process_one_work+0x1d4/0x35a
> [<ffffffff8104a821>] ? process_one_work+0x15b/0x35a
> [<ffffffff8104b4a5>] worker_thread+0x144/0x1f0
> [<ffffffff8104b361>] ? rescuer_thread+0x275/0x275
> [<ffffffff8105076d>] kthread+0xac/0xb4
> [<ffffffff81059d30>] ? finish_task_switch+0x3b/0xc0
> [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
> [<ffffffff81396a6c>] ret_from_fork+0x7c/0xb0
> [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
> 3 locks held by kworker/u8:4/175:
> #0: (i915){.+.+.+}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
> #1: ((&dev_priv->gpu_error.work)){+.+.+.}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
> #2: (&crtc->mutex){+.+.+.}, at: [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
>
> This blew up while running kms_flip/flip-vs-panning-vs-hang-interruptible
> on one of my older machines.
>
> Unfortunately (despite the proper lockdep annotations for
> flush_workqueue) lockdep still doesn't detect this correctly, so we
> need to rely on chance to discover these bugs.
>
> Apply the usual bugfix and schedule the reset work on the system
> workqueue to keep our own driver workqueue free of any modeset lock
> grabbing.
>
> Note that this is not a terribly serious regression since before the
> offending commit we'd simply have stalled userspace forever due to
> failing to abort all outstanding pageflips.
>
> v2: Add a comment as requested by Chris.
>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>

Picked up for -fixes with a bit of the spelling fail rectified and
Chris' irc r-b added (he still suffers from overtly hungry mail eating
demons in his cellar ...).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch