On Fri, Mar 31, 2017 at 04:38:57PM -0400, Joe Korty wrote:
> scsi: mpt3sas: fix hang on ata passthrough commands
>
> commit 16236802bfecb1082144a48b7d6fa60997824662 upstream, in v4.9 in linux-stable.
> commit ffb58456589443ca572221fabbdef3db8483a779 upstream, in master.
>
> Please backport the above mentioned v4.9 version of the commit into
> v4.4. It fixes a 'inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage'
> bug introduced when two other mpt3sas patches were backported into
> v4.4.28.

Ok, now done.

> In v4.4.28, a call to scsi_internal_device_unblock() was added
> to the mpt3sas driver's interrupt level routine, but that service
> expects to be called only from base level, so not all of its uses
> of spin locks are protected from interrupts. Thus self deadlock
> is possible. In this case, the 'spin_lock(&hctx->lock)' in
> __blk_mq_run_hw_queue() is the immediate cause of this lockdep
> assertion. This happens on the first use of the mpt3sas driver.
>
> [ 28.340336] =================================
> [ 28.344799] [ INFO: inconsistent lock state ]
> [ 28.349229] 4.4.53 #2 Not tainted
> [ 28.352566] ---------------------------------
> [ 28.357004] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 28.363019] swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 28.368202] (&(&hctx->lock)->rlock){?.+...}, at: [<ffffffff815349a2>] __blk_mq_run_hw_queue+0x172/0x3b0
> [ 28.377872] {HARDIRQ-ON-W} state was registered at:
> [ 28.382829] [<ffffffff810cdf34>] __lock_acquire+0x8e4/0xe80
> [ 28.388612] [<ffffffff810ce5ae>] lock_acquire+0xde/0x310
> [ 28.390151] [<ffffffff8203094b>] _raw_spin_lock+0x3b/0x50
> [ 28.390154] [<ffffffff81534a76>] __blk_mq_run_hw_queue+0x246/0x3b0
> [ 28.390157] [<ffffffff81535345>] blk_mq_run_hw_queue+0x65/0xf0
> [ 28.390159] [<ffffffff815357ad>] blk_sq_make_request+0x24d/0x740
> [ 28.390163] [<ffffffff81529bca>] generic_make_request+0xfa/0x190
> [ 28.390166] [<ffffffff81529cdf>] submit_bio+0x7f/0x160
> [ 28.390172] [<ffffffff8126286e>] submit_bh_wbc+0x13e/0x180
> [ 28.390175] [<ffffffff812628c2>] submit_bh+0x12/0x20
> [ 28.390179] [<ffffffff812c837c>] __ext4_get_inode_loc+0x21c/0x590
> [ 28.390181] [<ffffffff812c8fa8>] ext4_iget+0x88/0xc30
> [ 28.390183] [<ffffffff812f14f5>] ext4_fill_super+0x1cc5/0x3660
> [ 28.390187] [<ffffffff81226cc5>] mount_bdev+0x1b5/0x200
> [ 28.390190] [<ffffffff812e9985>] ext4_mount+0x15/0x20
> [ 28.390193] [<ffffffff81226883>] mount_fs+0x43/0x170
> [ 28.390196] [<ffffffff81249ac6>] vfs_kern_mount+0x76/0x160
> [ 28.390198] [<ffffffff8124a313>] do_mount+0x263/0xf40
> [ 28.390200] [<ffffffff8124b06b>] SyS_mount+0x7b/0xc0
> [ 28.390204] [<ffffffff82bdc56e>] do_mount_root+0x1e/0x97
> [ 28.390206] [<ffffffff82bdc82e>] mount_block_root+0x10f/0x24b
> [ 28.390208] [<ffffffff82bdca60>] mount_root+0xf6/0x101
> [ 28.390210] [<ffffffff82bdcbdb>] prepare_namespace+0x170/0x1a9
> [ 28.390213] [<ffffffff82bdbbf0>] kernel_init_freeable+0x254/0x26b
> [ 28.390215] [<ffffffff8202816e>] kernel_init+0xe/0xe0
> [ 28.390218] [<ffffffff82031a1f>] ret_from_fork+0x3f/0x70
> [ 28.390219] irq event stamp: 482812
> [ 28.390223] hardirqs last enabled at (482809): [<ffffffff8101202c>] default_idle+0x2c/0x240
> [ 28.390226] hardirqs last disabled at (482810): [<ffffffff82032187>] common_interrupt+0x87/0x8c
> [ 28.390229] softirqs last enabled at (482812): [<ffffffff81073261>] _local_bh_enable+0x21/0x50
> [ 28.390231] softirqs last disabled at (482811): [<ffffffff8107349b>] irq_enter+0x4b/0x70
> [ 28.390232]
> other info that might help us debug this:
> [ 28.390233] Possible unsafe locking scenario:
>
> [ 28.390233] CPU0
> [ 28.390234] ----
> [ 28.390235] lock(&(&hctx->lock)->rlock);
> [ 28.390236] <Interrupt>
> [ 28.390237] lock(&(&hctx->lock)->rlock);
> [ 28.390238]
> *** DEADLOCK ***
>
> [ 28.390238] no locks held by swapper/0/0.
> [ 28.390239]
> stack backtrace:
> [ 28.390241] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.53 #2
> [ 28.390242] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.0b 02/01/2013
> [ 28.390246] 0000000000000000 ffff88021fc03858 ffffffff8155ba95 0000000000000001
> [ 28.390249] 0000000000000003 ffffffff82a17500 ffffffff83200800 ffff88021fc038a8
> [ 28.390252] ffffffff810c9cdf 0000000000000000 ffffffff00000000 0000000000000001
> [ 28.390253] Call Trace:
> [ 28.390257] <IRQ> [<ffffffff8155ba95>] dump_stack+0x89/0xd4
> [ 28.390260] [<ffffffff810c9cdf>] print_usage_bug+0x23f/0x300
> [ 28.390263] [<ffffffff810ca11d>] mark_lock+0x37d/0x690
> [ 28.390266] [<ffffffff810c89ad>] ? trace_hardirqs_off+0xd/0x10
> [ 28.390268] [<ffffffff810cdfbe>] __lock_acquire+0x96e/0xe80
> [ 28.390272] [<ffffffff8158ffaf>] ? check_unmap+0x3df/0x970
> [ 28.390275] [<ffffffff81561266>] ? radix_tree_delete_item+0xb6/0x110
> [ 28.390278] [<ffffffff810ce5ae>] lock_acquire+0xde/0x310
> [ 28.390281] [<ffffffff815349a2>] ? __blk_mq_run_hw_queue+0x172/0x3b0
> [ 28.390284] [<ffffffff8203094b>] _raw_spin_lock+0x3b/0x50
> [ 28.390286] [<ffffffff815349a2>] ? __blk_mq_run_hw_queue+0x172/0x3b0
> [ 28.390288] [<ffffffff815349a2>] __blk_mq_run_hw_queue+0x172/0x3b0
> [ 28.390293] [<ffffffff8192e038>] ? _scsih_io_done+0x48/0xa60
> [ 28.390296] [<ffffffff81535345>] blk_mq_run_hw_queue+0x65/0xf0
> [ 28.390298] [<ffffffff810cdcb6>] ? __lock_acquire+0x666/0xe80
> [ 28.390301] [<ffffffff815364f3>] blk_mq_start_stopped_hw_queues+0x63/0x80
> [ 28.390304] [<ffffffff81723a2b>] scsi_internal_device_unblock+0x4b/0xa0
> [ 28.390307] [<ffffffff8192e105>] _scsih_io_done+0x115/0xa60
> [ 28.390310] [<ffffffff810cdcb6>] ? __lock_acquire+0x666/0xe80
> [ 28.390313] [<ffffffff819234b8>] _base_interrupt+0x1e8/0xb90
> [ 28.390317] [<ffffffff8157a617>] ? debug_smp_processor_id+0x17/0x20
> [ 28.390320] [<ffffffff810e4585>] ? __rcu_is_watching+0x15/0x30
> [ 28.390323] [<ffffffff810d95c4>] handle_irq_event_percpu+0xb4/0x530
> [ 28.390325] [<ffffffff810de0fb>] ? handle_edge_irq+0x2b/0x150
> [ 28.390327] [<ffffffff810d9a7f>] ? handle_irq_event+0x3f/0x70
> [ 28.390330] [<ffffffff810d9a87>] handle_irq_event+0x47/0x70
> [ 28.390332] [<ffffffff810de1ae>] handle_edge_irq+0xde/0x150
> [ 28.390335] [<ffffffff8100951a>] handle_irq+0x7a/0x190
> [ 28.390338] [<ffffffff8157a617>] ? debug_smp_processor_id+0x17/0x20
> [ 28.390340] [<ffffffff810e4585>] ? __rcu_is_watching+0x15/0x30
> [ 28.390342] [<ffffffff8203403e>] do_IRQ+0x7e/0x150
> [ 28.390345] [<ffffffff8203218c>] common_interrupt+0x8c/0x8c
> [ 28.390349] <EOI> [<ffffffff81055136>] ? native_safe_halt+0x6/0x10
> [ 28.390351] [<ffffffff810ca86d>] ? trace_hardirqs_on+0xd/0x10
> [ 28.390353] [<ffffffff81012031>] default_idle+0x31/0x240
> [ 28.390356] [<ffffffff810e6600>] ? rcu_eqs_enter_common+0xb0/0x140
> [ 28.390358] [<ffffffff81011a6f>] arch_cpu_idle+0xf/0x20
> [ 28.390360] [<ffffffff810c021e>] default_idle_call+0x2e/0x50
> [ 28.390362] [<ffffffff810c046b>] cpu_startup_entry+0x22b/0x570
> [ 28.390365] [<ffffffff8109f591>] ? get_parent_ip+0x11/0x50
> [ 28.390367] [<ffffffff8109f591>] ? get_parent_ip+0x11/0x50
> [ 28.390370] [<ffffffff820280f0>] rest_init+0xf0/0x160
> [ 28.390372] [<ffffffff82028000>] ? csum_partial_copy_generic+0x170/0x170
> [ 28.390375] [<ffffffff82c049f8>] ? ftrace_init+0xc9/0x15c
> [ 28.390377] [<ffffffff82bdc38c>] start_kernel+0x4e7/0x4f4
> [ 28.390380] [<ffffffff82bdbcc1>] ? set_init_arg+0x5f/0x5f
> [ 28.390382] [<ffffffff82bdb117>] ? early_idt_handler_array+0x117/0x120
> [ 28.390385] [<ffffffff82bdb5df>] x86_64_start_reservations+0x2a/0x2c
> [ 28.390387] [<ffffffff82bdb77d>] x86_64_start_kernel+0x19c/0x1ab
>
> PS: This follows the form of 'Option 3' in Documentation/stable_kernel_rules.txt
> PPS: The original authors of this patch should review and ack before it is accepted.
>
> Signed-off-by: Joe Korty <joe.korty@xxxxxxxx>

I don't understand, you only need/want one of these patches in 4.4,
right?

thanks,

greg k-h