On 2021/03/12 16:59, Johannes Thumshirn wrote:
> On 12/03/2021 08:27, Damien Le Moal wrote:
>> On 2021/03/12 13:38, Shinichiro Kawasaki wrote:
>>> On Mar 11, 2021 / 15:54, Johannes Thumshirn wrote:
>>>> On 11/03/2021 16:48, Bart Van Assche wrote:
>>>>> On 3/11/21 7:18 AM, Johannes Thumshirn wrote:
>>>>>> On 11/03/2021 16:13, Bart Van Assche wrote:
>>>>>>> On 3/10/21 1:48 AM, Johannes Thumshirn wrote:
>>>>>>>> Recent changes [ ... ]
>>>>>>>
>>>>>>> Please add Fixes: and/or Cc: stable tags as appropriate.
>>>>>>
>>>>>> I couldn't pin down the offending commit and I can't reproduce it locally
>>>>>> as well, so I opted out of this. But it must be something between v5.11
>>>>>> and v5.12-rc2.
>>>>>
>>>>> That's weird. Did Shinichiro use a HBA? Could this be the result of a
>>>>> behavior change in the HBA driver?
>>>>
>>>> Yes I've looked at the commits in mpt3sas, but can't really pinpoint the
>>>> offending commit TBH. 664f0dce2058 ("scsi: mpt3sas: Add support for shared
>>>> host tagset for CPU hotplug") is the only one that /looks/ as if it could
>>>> be causing it, but I don't know mpt3sas well enough.
>>>>
>>>> FWIW added Sreekanth
>>>
>>> The WARNING was found in kernel v5.12-rc2 test with a SAS SMR drive and HBA
>>> Broadcom 9400. It can be recreated by running blktests block/004 on the drive
>>> (after reboot). It is also recreated with SATA SMR drive with the HBA, but not
>>> observed with SATA drives connected to AHCI.
>>>
>>> I reverted the commit 664f0dce2058, then the WARNING disappeared. I suppose
>>> it indicates that the commit changed HBA driver behavior.
>>
>> Can you send the warning splat with backtrace ?
>>
>
> The warning splat is in the commit message:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc2+ #2
> Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
> RIP: 0010:__local_bh_disable_ip+0x3f/0x50
> RSP: 0018:ffff8883e1409ba8 EFLAGS: 00010006
> RAX: 0000000080010001 RBX: 0000000000000001 RCX: 0000000000000013
> RDX: ffff888129e4d200 RSI: 0000000000000201 RDI: ffffffff915b9dbd
> RBP: ffff888113e9a540 R08: ffff888113e9a540 R09: 00000000000077f0
> R10: 0000000000080000 R11: 0000000000000001 R12: ffff888129e4d200
> R13: 0000000000001000 R14: 00000000000077f0 R15: ffff888129e4d218
> FS: 0000000000000000(0000) GS:ffff8883e1400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2f8418ebc0 CR3: 000000021202a006 CR4: 00000000001706f0
> Call Trace:
> <IRQ>
> _raw_spin_lock_bh+0x18/0x40
> sd_zbc_complete+0x43d/0x1150
> sd_done+0x631/0x1040
> ? mark_lock+0xe4/0x2fd0
> ? provisioning_mode_store+0x3f0/0x3f0
> scsi_finish_command+0x31b/0x5c0
> _scsih_io_done+0x960/0x29e0 [mpt3sas]
> ? mpt3sas_scsih_scsi_lookup_get+0x1c7/0x340 [mpt3sas]
> ? __lock_acquire+0x166b/0x58b0
> ? _get_st_from_smid+0x4a/0x80 [mpt3sas]
> _base_process_reply_queue+0x23f/0x26e0 [mpt3sas]
> ? lock_is_held_type+0x98/0x110
> ? find_held_lock+0x2c/0x110
> ? mpt3sas_base_sync_reply_irqs+0x360/0x360 [mpt3sas]
> _base_interrupt+0x8d/0xd0 [mpt3sas]
> ? rcu_read_lock_sched_held+0x3f/0x70
> __handle_irq_event_percpu+0x24d/0x600
> handle_irq_event+0xef/0x240
> ? handle_irq_event_percpu+0x110/0x110
> handle_edge_irq+0x1f6/0xb60
> __common_interrupt+0x75/0x160
> common_interrupt+0x7b/0xa0
> </IRQ>
> asm_common_interrupt+0x1e/0x40
>

Looking at patch 664f0dce2058, all that seems to be done is to enable
nr_hw_queue > 1. I do not see any change of locking context or irq handling.
From the backtrace, it does not look like scsi_finish_command() is called from
softirq... Probably a change in that area is responsible ?
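
To make the context question concrete, here is a rough userspace model (not kernel code) of what I think the splat is showing. It assumes the warning comes from the in_irq() check at the top of __local_bh_disable_ip(), and that the counter bit layout follows include/linux/preempt.h; both should be checked against the tree under test. All the model_* names and the two complete_from_* helpers are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit layout modelled on include/linux/preempt.h (assumed, verify locally). */
#define SOFTIRQ_SHIFT		8
#define HARDIRQ_SHIFT		16
#define SOFTIRQ_OFFSET		(1U << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET		(1U << HARDIRQ_SHIFT)
#define HARDIRQ_MASK		(0xfU << HARDIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

/* Hypothetical stand-in for the per-CPU preempt counter. */
static unsigned int model_preempt_count;

/* True while at least one hard interrupt handler is on the stack. */
static bool model_in_hardirq(void)
{
	return (model_preempt_count & HARDIRQ_MASK) != 0;
}

/* Models the BH-disable half of spin_lock_bh(); returns true if it would warn. */
static bool model_local_bh_disable(void)
{
	bool warn = model_in_hardirq();

	model_preempt_count += SOFTIRQ_DISABLE_OFFSET;
	return warn;
}

static void model_local_bh_enable(void)
{
	model_preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Completion called straight from the interrupt handler, as in the backtrace. */
static bool complete_from_hardirq(void)
{
	bool warned;

	model_preempt_count += HARDIRQ_OFFSET;	/* interrupt entry */
	warned = model_local_bh_disable();	/* lock taken with the _bh variant */
	model_local_bh_enable();
	model_preempt_count -= HARDIRQ_OFFSET;	/* interrupt exit */
	return warned;
}

/* Completion deferred to softirq context instead. */
static bool complete_from_softirq(void)
{
	bool warned;

	model_preempt_count += SOFTIRQ_OFFSET;	/* serving a softirq */
	warned = model_local_bh_disable();
	model_local_bh_enable();
	model_preempt_count -= SOFTIRQ_OFFSET;
	return warned;
}

/* Prints both cases and checks that the counter is balanced afterwards. */
static void model_report(void)
{
	printf("hardirq completion warns: %d\n", complete_from_hardirq());
	printf("softirq completion warns: %d\n", complete_from_softirq());
	assert(model_preempt_count == 0);
}
```

In this model only the hardirq path trips the check, which matches the splat: _raw_spin_lock_bh() is reached inside the <IRQ> section via _base_interrupt() and scsi_finish_command().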

-- 
Damien Le Moal
Western Digital Research