On 02/01/2017 08:46 AM, Bart Van Assche wrote:
> On Tue, 2017-01-31 at 22:38 -0800, Jens Axboe wrote:
>> I think this patch:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.11/block&id=12d70958a2e8d587acaa51dafd5d6620e00b7543
>>
>> should fix it for you. I just ran into the same thing tonight, testing
>> an unrelated thing. It's the only reason that state should be 0x4 for
>> you, so it has the same fingerprint.
>>
>> The patch has been merged into for-next.
>
> Hello Jens,
>
> Thanks for having looked into this. However, after having pulled the latest
> block for-next tree (dbb85b06229f) another lockup was triggered soon (02-sq
> is the name of a shell script of the srp-test suite):
>
> [  243.021265] sysrq: SysRq : Show Blocked State
> [  243.021301]   task                        PC stack   pid father
> [  243.022909] 02-sq           D    0 10864  10509 0x00000000
> [  243.022933] Call Trace:
> [  243.022956]  __schedule+0x2da/0xb00
> [  243.022979]  schedule+0x38/0x90
> [  243.023002]  blk_mq_freeze_queue_wait+0x51/0xa0
> [  243.023025]  ? remove_wait_queue+0x70/0x70
> [  243.023047]  blk_mq_freeze_queue+0x15/0x20
> [  243.023070]  elevator_switch+0x24/0x220
> [  243.023093]  __elevator_change+0xd3/0x110
> [  243.023115]  elv_iosched_store+0x21/0x60
> [  243.023140]  queue_attr_store+0x54/0x90
> [  243.023164]  sysfs_kf_write+0x40/0x50
> [  243.023188]  kernfs_fop_write+0x137/0x1c0
> [  243.023214]  __vfs_write+0x23/0x140
> [  243.023242]  ? rcu_read_lock_sched_held+0x45/0x80
> [  243.023265]  ? rcu_sync_lockdep_assert+0x2a/0x50
> [  243.023287]  ? __sb_start_write+0xde/0x200
> [  243.023308]  ? vfs_write+0x190/0x1e0
> [  243.023329]  vfs_write+0xc3/0x1e0
> [  243.023351]  SyS_write+0x44/0xa0
> [  243.023373]  entry_SYSCALL_64_fastpath+0x18/0xad

So that's changing the elevator - did this happen while heavy IO was
going to the drive, or was it idle?
> My attempt to query the state of the blk-mq queues triggered the
> following hang:
>
> [  243.023555] grep            D    0 11010  11008 0x00000000
> [  243.023578] Call Trace:
> [  243.023599]  __schedule+0x2da/0xb00
> [  243.023619]  schedule+0x38/0x90
> [  243.023640]  schedule_preempt_disabled+0x10/0x20
> [  243.023662]  mutex_lock_nested+0x23a/0x650
> [  243.023683]  ? hctx_tags_show+0x2c/0x60
> [  243.023703]  hctx_tags_show+0x2c/0x60
> [  243.023725]  seq_read+0xf2/0x3d0
> [  243.023746]  ? full_proxy_poll+0xb0/0xb0
> [  243.023776]  full_proxy_read+0x83/0xb0
> [  243.023798]  ? full_proxy_poll+0xb0/0xb0
> [  243.023821]  __vfs_read+0x23/0x130
> [  243.023843]  vfs_read+0xa3/0x170
> [  243.023865]  SyS_read+0x44/0xa0
> [  243.023888]  entry_SYSCALL_64_fastpath+0x18/0xad

That's because the previous elevator switch is stalled in sysfs, and we
grab the queue sysfs lock for any of the show/store functions. So if one
hangs, all of them will...

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html