On 07/04/2023 05:18, kernel test robot wrote:
Hello,

kernel test robot noticed "BUG_sdebug_queued_cmd(Tainted:G_S):Objects_remaining_in_sdebug_queued_cmd_on__kmem_cache_shutdown()" on:

commit: f28c8a7d0f7a705395439889a52b09e2b61ea422 ("[PATCH v3 06/11] scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
url: https://github.com/intel-lab-lkp/linux/commits/John-Garry/scsi-scsi_debug-Fix-check-for-sdev-queue-full/20230327-154448
base: https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git for-next
patch link: https://lore.kernel.org/all/20230327074310.1862889-7-john.g.garry@xxxxxxxxxx/
patch subject: [PATCH v3 06/11] scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd

in testcase: blktests
version: blktests-x86_64-676d42c-1_20230323
with following parameters:

	disk: 1HDD
	test: scsi-group-00

compiler: gcc-11
test machine: 16 threads 1 sockets Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (Broadwell-DE) with 48G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)
I don't know how I missed this. Maybe it's because running blktests with buildroot initrd is not streamlined.
Anyway, the issue is that we don't properly abort the scsi cmnd in scsi_debug_device_reset() after the scsi cmnd times out for the 2nd time.
We got away with this in the previous code because all active IOs were terminated in scsi_debug_exit() -> stop_all_queued(), which was not the right thing to do.
I suppose scsi_debug_device_reset() should abort all IO for that sdev (which it doesn't do) - I'll look to make that change.
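To illustrate the idea, here is a rough userspace sketch, not the actual scsi_debug code. All names in it (model_cmd, model_cache, model_queue_cmd(), model_device_reset(), model_cache_shutdown()) are made up for illustration. It only models the accounting: a device reset frees every queued command belonging to one sdev, so that nothing is left allocated when the cache is torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in scsi_debug. */
struct model_cmd {
	int sdev_id;			/* device the command was issued to */
	struct model_cmd *next;
};

struct model_cache {
	size_t live;			/* objects allocated and not yet freed */
	struct model_cmd *queued;	/* commands still in flight */
};

/* Allocate a command and track it as queued for the given device. */
static struct model_cmd *model_queue_cmd(struct model_cache *c, int sdev_id)
{
	struct model_cmd *cmd = malloc(sizeof(*cmd));

	if (!cmd)
		return NULL;
	cmd->sdev_id = sdev_id;
	cmd->next = c->queued;
	c->queued = cmd;
	c->live++;
	return cmd;
}

/* Abort every queued command for one device; returns how many were freed. */
static size_t model_device_reset(struct model_cache *c, int sdev_id)
{
	struct model_cmd **pp = &c->queued;
	size_t aborted = 0;

	while (*pp) {
		struct model_cmd *cmd = *pp;

		if (cmd->sdev_id == sdev_id) {
			*pp = cmd->next;	/* unlink before freeing */
			free(cmd);
			c->live--;
			aborted++;
		} else {
			pp = &cmd->next;
		}
	}
	return aborted;
}

/* Models the shutdown check: true only if no objects remain allocated. */
static bool model_cache_shutdown(const struct model_cache *c)
{
	return c->live == 0;
}
```

In this model, queueing a command and then calling model_cache_shutdown() without a reset for that device returns false, which corresponds to the "Objects remaining" report.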
Thanks,
John
If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
| Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@xxxxxxxxx

[ 101.910746][ T7924] scsi host6: waking up host to restart
[ 101.910751][ T7924] scsi host6: scsi_eh_6: sleeping
[ 101.976012][ T203] Buffer I/O error on dev sdc, logical block 2032, async page read
[ 102.135530][ T8020] sd 6:0:0:0: [sdc] Synchronizing SCSI cache
[ 102.312331][ T8020] =============================================================================
[ 102.322321][ T8020] BUG sdebug_queued_cmd (Tainted: G S ): Objects remaining in sdebug_queued_cmd on __kmem_cache_shutdown()
[ 102.336810][ T8020] -----------------------------------------------------------------------------
[ 102.336810][ T8020]
[ 102.349880][ T8020] Slab 0x0000000013ac9b84 objects=32 used=1 fp=0x00000000a6dc3cb1 flags=0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 102.365549][ T8020] CPU: 4 PID: 8020 Comm: modprobe Tainted: G S 6.3.0-rc1-00188-gf28c8a7d0f7a #1
[ 102.376919][ T8020] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016
[ 102.386904][ T8020] Call Trace:
[ 102.391151][ T8020] <TASK>
[ 102.395042][ T8020] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 102.400503][ T8020] slab_err (mm/slub.c:995)
[ 102.405432][ T8020] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169)
[ 102.411316][ T8020] ? start_poll_synchronize_srcu (kernel/rcu/srcutree.c:1306)
[ 102.418070][ T8020] __kmem_cache_shutdown (include/linux/spinlock.h:350 mm/slub.c:4555 mm/slub.c:4586 mm/slub.c:4618)
[ 102.424308][ T8020] kmem_cache_destroy (mm/slab_common.c:457 mm/slab_common.c:497 mm/slab_common.c:480)
[ 102.430196][ T8020] scsi_debug_exit (drivers/scsi/scsi_debug.c:7807) scsi_debug
[ 102.436885][ T8020] __do_sys_delete_module+0x2ea/0x530
[ 102.444259][ T8020] ? module_flags (kernel/module/main.c:694)
[ 102.449892][ T8020] ? __fget_light (include/linux/atomic/atomic-arch-fallback.h:227 include/linux/atomic/atomic-instrumented.h:35 fs/file.c:1015)
[ 102.455439][ T8020] ? __blkcg_punt_bio_submit (block/blk-cgroup.c:1840)
[ 102.462034][ T8020] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:186 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 102.467667][ T8020] ? exit_to_user_mode_loop (include/linux/sched.h:2326 include/linux/resume_user_mode.h:61 kernel/entry/common.c:171)
[ 102.474080][ T8020] ? exit_to_user_mode_prepare (kernel/entry/common.c:203)
[ 102.480660][ T8020] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 102.486014][ T8020] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 102.492844][ T8020] RIP: 0033:0x7f4dddaaa417
[ 102.498191][ T8020] Code: 73 01 c3 48 8b 0d 79 1a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 1a 0d 00 f7 d8 64 89 01 48
All code