Hi all,

I ran the latest blktests (git hash: 83781f257857) with the v6.12 kernel, and also checked the CKI project runs for that kernel. I observed the five failures listed below. Compared with the previous report for the v6.12-rc1 kernel [1], two failures were resolved: nvme/014 and the srp group. On the other hand, four new failures were observed.

[1] https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/

List of failures
================

#1: nvme/031 (fc transport)
#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)
#4: nvme/052 (loop transport)
#5: throtl/001 (CKI project, s390 arch)

Failure description
===================

#1: nvme/031 (fc transport)

With the trtype=fc configuration, nvme/031 fails due to a KASAN slab-use-after-free [2]. An INFO message about lock confusion is sometimes printed before the KASAN splat. This failure was not observed before, and the change that triggers it has not yet been identified. Further debugging is needed.
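For reference, the fc-transport failures in this report can be reproduced along these lines (a sketch of the invocation; blktests reads nvme_trtype from the environment or from the ./config file, and the test list here is an example):

    # run the failing nvme group tests against the fcloop-based fc transport
    cd blktests
    nvme_trtype=fc ./check nvme/031 nvme/037 nvme/041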
#2: nvme/037 (fc transport)

With the trtype=fc configuration, nvme/037 fails:

nvme/037 => nvme0n1 (tr=fc) (test deletion of NVMeOF passthru controllers immediately after setup)  [failed]
    runtime  5.569s  ...  5.543s
    --- tests/nvme/037.out 2024-11-05 17:04:40.576903661 +0900
    +++ /home/shin/Blktests/blktests/results/nvme0n1_tr_fc/nvme/037.out.bad 2024-11-23 16:31:13.580069487 +0900
    @@ -1,2 +1,3 @@
     Running nvme/037
    +FAIL: Failed to find passthru target namespace
     Test complete

This failure was found while preparing the ANA test cases [3]. The failure disappears when the test case is modified to add short waits after the nvme disconnect and the target cleanup, as sketched below. Further debugging is needed.

[3] https://lore.kernel.org/linux-nvme/2e4efaf9-d6cc-46b2-8783-d400f6e49829@flourine.local/
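The workaround looks roughly as follows (a sketch only: the helpers are the passthru helpers in tests/nvme/rc with their arguments elided, and the one-second waits are arbitrary values that happened to be long enough here):

    # in the nvme/037 test loop: let the fc transport settle
    _nvme_disconnect_subsys ...            # existing test step
    sleep 1                                # added wait after host disconnect
    _nvmet_passthru_target_cleanup ...     # existing test step
    sleep 1                                # added wait after target cleanup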
#3: nvme/041 (fc transport)

With the trtype=fc configuration, nvme/041 fails:

nvme/041 (Create authenticated connections)                  [failed]
    runtime  2.677s  ...  4.823s
    --- tests/nvme/041.out 2023-11-29 12:57:17.206898664 +0900
    +++ /home/shin/Blktests/blktests/results/nodev/nvme/041.out.bad 2024-03-19 14:50:56.399101323 +0900
    @@ -2,5 +2,5 @@
     Test unauthenticated connection (should fail)
     disconnected 0 controller(s)
     Test authenticated connection
    -disconnected 1 controller(s)
    +disconnected 0 controller(s)
     Test complete

nvme/044 showed the same failure symptom until kernel v6.9. A solution was suggested and discussed in February 2024 [4].

[4] https://lore.kernel.org/linux-nvme/20240221132404.6311-1-dwagner@xxxxxxx/
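The diff shows that the authenticated connection ends with no controller to disconnect. When reproducing this, it may help to check whether the controller was never created or was created but not matched at disconnect time (plain nvme-cli and sysfs inspection, not part of the test):

    # after a failed run, look for leftover fabrics controllers
    nvme list-subsys
    ls /sys/class/nvme-fabrics/ctl/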
#4: nvme/052 (loop transport)

The test case fails due to "BUG: sleeping function called from invalid context" [5]. A fix candidate that sets NVME_F_BLOCKING for the loop transport was posted, but it is not considered the best solution [6]. A better fix is desired, together with a test case to confirm it.

[5] https://lore.kernel.org/linux-nvme/tqcy3sveity7p56v7ywp7ssyviwcb3w4623cnxj3knoobfcanq@yxgt2mjkbkam/
[6] https://lore.kernel.org/linux-nvme/20241017172052.2603389-1-kbusch@xxxxxxxx/
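Note that this splat only surfaces on debug builds: "BUG: sleeping function called from invalid context" is emitted by the CONFIG_DEBUG_ATOMIC_SLEEP checks, so anyone trying to reproduce it should first confirm the running kernel has them enabled:

    # the splat in [5] requires the atomic-sleep debug checks
    grep CONFIG_DEBUG_ATOMIC_SLEEP /boot/config-"$(uname -r)"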
#5: throtl/001 (CKI project, s390 arch)

Recently, the CKI project added a configuration to run the blktests throtl group, and failures have been repeatedly observed on the s390 architecture [7]. I suspect the failure output below implies that system performance may affect the test result. Further debugging is needed.

throtl/001 (basic functionality)                             [failed]
    runtime  ...  6.309s
    --- tests/throtl/001.out 2024-11-23 20:53:13.446546653 +0000
    +++ /mnt/tests/s3.amazonaws.com/arr-cki-prod-lookaside/lookaside/kernel-tests-public/kernel-tests-production.zip/storage/blktests/throtl/blktests/results/nodev/throtl/001.out.bad 2024-11-23 20:53:21.332699696 +0000
    @@ -1,6 +1,6 @@
     Running throtl/001
    +2
     1
    -1
    -1
    +2
     1
    ...
    (Run 'diff -u tests/throtl/001.out /mnt/tests/s3.amazonaws.com/arr-cki-prod-lookaside/lookaside/kernel-tests-public/kernel-tests-production.zip/storage/blktests/throtl/blktests/results/nodev/throtl/001.out.bad' to see the entire diff)

[7] https://datawarehouse.cki-project.org/kcidb/tests?tree_filter=mainline&kernel_version_filter=6.12.0&test_filter=blktests&page=1
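My suspicion comes from the timing-sensitive pattern the test relies on: it issues I/O through a throttled cgroup and compares the elapsed seconds against expected values, so extra per-I/O overhead on a slow system can shift an expected "1" to "2". The pattern is roughly the following (an illustration, not the literal test code; the null_blk device, its 251:0 major:minor, and the 1 MiB/s limit are example values):

    # throttle writes to 1 MiB/s via the cgroup v2 io.max interface,
    # issue 1 MiB of direct I/O, and print the elapsed seconds:
    # roughly 1 second is expected, but a slow machine may print 2
    echo +io > /sys/fs/cgroup/cgroup.subtree_control
    mkdir -p /sys/fs/cgroup/throtl_test
    echo $$ > /sys/fs/cgroup/throtl_test/cgroup.procs
    echo "251:0 wbps=1048576" > /sys/fs/cgroup/throtl_test/io.max
    start=$SECONDS
    dd if=/dev/zero of=/dev/nullb0 bs=1M count=1 oflag=direct status=none
    echo $((SECONDS - start))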
[2] kernel message during nvme/031 run with fc transport

[ 41.909054] [ T996] run blktests nvme/031 at 2024-11-26 09:19:23
[ 41.989534] [ T1044] loop0: detected capacity change from 0 to 2097152
[ 42.057779] [ T1061] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
[ 42.185556] [ T70] nvme nvme1: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 42.189922] [ T12] (NULL device *): Create Association LS failed: Association Allocation Failed
[ 42.191731] [ T70] (NULL device *): queue 0 connect admin queue failed (-6).
[ 42.192357] [ T70] nvme nvme1: NVME-FC{0}: reset: Reconnect attempt failed (-6)
[ 42.193111] [ T70] nvme nvme1: NVME-FC{0}: Reconnect attempt in 2 seconds
[ 42.194243] [ T1062] nvme nvme1: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:869798ae-feb0-47d7-b9be-0445cc28afbc
[ 42.206255] [ T12] nvme nvme2: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000002: NQN "blktests-subsystem-0"
[ 42.208544] [ T11] (NULL device *): {1:0} Association created
[ 42.210139] [ T11] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-0 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 42.214131] [ T12] nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[ 42.218801] [ T12] nvme nvme2: NVME-FC{1}: controller connect complete
[ 42.220739] [ T1085] nvme nvme2: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-0", hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 42.325043] [ T1101] nvme nvme2: Removing ctrl: NQN "blktests-subsystem-0"
[ 42.333646] [ T12] nvme nvme3: NVME-FC{2}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000002: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 42.335156] [ T47] (NULL device *): {1:1} Association created
[ 42.336156] [ T47] nvmet: creating discovery controller 2 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:1ed3f3e5-20f1-4bb9-905f-a528d31f4b7c.
[ 42.342550] [ T12] nvme nvme3: NVME-FC{2}: controller connect complete
[ 42.343199] [ T1091] nvme nvme3: NVME-FC{2}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:1ed3f3e5-20f1-4bb9-905f-a528d31f4b7c
[ 42.350091] [ T1091] nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 42.365624] [ T70] (NULL device *): {1:1} Association deleted
[ 42.374017] [ T47] (NULL device *): {1:0} Association deleted
[ 42.374600] [ T70] (NULL device *): {1:1} Association freed
[ 42.376457] [ T65] (NULL device *): Disconnect LS failed: No Association
[ 42.430613] [ T47] INFO: trying to register non-static key.
[ 42.431210] [ T47] The code is fine but needs lockdep annotation, or maybe
[ 42.431914] [ T47] you didn't initialize this object before use?
[ 42.432559] [ T47] turning off the locking correctness validator.
[ 42.433168] [ T47] CPU: 1 UID: 0 PID: 47 Comm: kworker/u16:2 Not tainted 6.12.0+ #363
[ 42.434053] [ T47] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[ 42.434999] [ T47] Workqueue: nvmet-wq nvmet_fc_delete_assoc_work [nvmet_fc]
[ 42.435758] [ T47] Call Trace:
[ 42.436082] [ T47] <TASK>
[ 42.437591] [ T47] dump_stack_lvl+0x6a/0x90
[ 42.439258] [ T47] register_lock_class+0xe2a/0x10a0
[ 42.441000] [ T47] ? __lock_acquire+0xd1b/0x5f20
[ 42.442696] [ T47] ? __pfx_register_lock_class+0x10/0x10
[ 42.444449] [ T47] __lock_acquire+0x81e/0x5f20
[ 42.446132] [ T47] ? lock_is_held_type+0xd5/0x130
[ 42.447812] [ T47] ? find_held_lock+0x2d/0x110
[ 42.449416] [ T47] ? __pfx___lock_acquire+0x10/0x10
[ 42.451073] [ T47] ? lock_release+0x460/0x7a0
[ 42.452673] [ T47] ? __pfx_lock_release+0x10/0x10
[ 42.454386] [ T47] lock_acquire.part.0+0x12d/0x360
[ 42.455991] [ T47] ? xa_erase+0xd/0x30
[ 42.457444] [ T47] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 42.459054] [ T47] ? rcu_is_watching+0x11/0xb0
[ 42.460571] [ T47] ? trace_lock_acquire+0x12f/0x1a0
[ 42.462099] [ T47] ? __pfx___flush_work+0x10/0x10
[ 42.463630] [ T47] ? xa_erase+0xd/0x30
[ 42.465026] [ T47] ? lock_acquire+0x2d/0xc0
[ 42.466425] [ T47] ? xa_erase+0xd/0x30
[ 42.467778] [ T47] _raw_spin_lock+0x2f/0x40
[ 42.469125] [ T47] ? xa_erase+0xd/0x30
[ 42.470419] [ T47] xa_erase+0xd/0x30
[ 42.471663] [ T47] nvmet_ctrl_destroy_pr+0x10e/0x1c0 [nvmet]
[ 42.473087] [ T47] ? __pfx_nvmet_ctrl_destroy_pr+0x10/0x10 [nvmet]
[ 42.474572] [ T47] ? __pfx___might_resched+0x10/0x10
[ 42.475891] [ T47] nvmet_ctrl_free+0x2f0/0x830 [nvmet]
[ 42.477200] [ T47] ? lockdep_hardirqs_on+0x78/0x100
[ 42.478482] [ T47] ? __cancel_work+0x166/0x230
[ 42.479678] [ T47] ? __pfx_nvmet_ctrl_free+0x10/0x10 [nvmet]
[ 42.480970] [ T47] ? rcu_is_watching+0x11/0xb0
[ 42.482131] [ T47] ? kfree+0x13e/0x4a0
[ 42.483221] [ T47] ? lockdep_hardirqs_on+0x78/0x100
[ 42.484389] [ T47] nvmet_sq_destroy+0x1f2/0x3a0 [nvmet]
[ 42.485595] [ T47] nvmet_fc_target_assoc_free+0x3a5/0x1fd0 [nvmet_fc]
[ 42.486863] [ T47] ? __pfx_nvmet_fc_target_assoc_free+0x10/0x10 [nvmet_fc]
[ 42.488169] [ T47] ? lock_is_held_type+0xd5/0x130
[ 42.489258] [ T47] nvmet_fc_delete_assoc_work+0xcc/0x2d0 [nvmet_fc]
[ 42.490486] [ T47] process_one_work+0x85a/0x1460
[ 42.491550] [ T47] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 42.492678] [ T47] ? __pfx_process_one_work+0x10/0x10
[ 42.493786] [ T47] ? assign_work+0x16c/0x240
[ 42.494966] [ T47] ? lock_is_held_type+0xd5/0x130
[ 42.496025] [ T47] worker_thread+0x5e2/0xfc0
[ 42.497041] [ T47] ? __pfx_worker_thread+0x10/0x10
[ 42.498118] [ T47] kthread+0x2d1/0x3a0
[ 42.499088] [ T47] ? _raw_spin_unlock_irq+0x24/0x50
[ 42.500149] [ T47] ? __pfx_kthread+0x10/0x10
[ 42.501148] [ T47] ret_from_fork+0x30/0x70
[ 42.502124] [ T47] ? __pfx_kthread+0x10/0x10
[ 42.503115] [ T47] ret_from_fork_asm+0x1a/0x30
[ 42.504127] [ T47] </TASK>
[ 42.505071] [ T47] (NULL device *): {1:0} Association freed
[ 42.506183] [ T234] (NULL device *): Disconnect LS failed: No Association
[ 42.548620] [ T1110] nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000002:pn-0x20001100ab000002 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
[ 42.555441] [ T1124] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 42.584147] [ T47] ==================================================================
[ 42.585392] [ T47] BUG: KASAN: slab-use-after-free in nvme_fc_rescan_remoteport+0x3c/0x50 [nvme_fc]
[ 42.586789] [ T47] Read of size 8 at addr ffff88812229e890 by task kworker/u16:2/47
[ 42.588794] [ T47] CPU: 2 UID: 0 PID: 47 Comm: kworker/u16:2 Not tainted 6.12.0+ #363
[ 42.590019] [ T47] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[ 42.591405] [ T47] Workqueue: nvmet-wq fcloop_tgt_rscn_work [nvme_fcloop]
[ 42.592607] [ T47] Call Trace:
[ 42.593522] [ T47] <TASK>
[ 42.594405] [ T47] dump_stack_lvl+0x6a/0x90
[ 42.595452] [ T47] ? nvme_fc_rescan_remoteport+0x3c/0x50 [nvme_fc]
[ 42.596611] [ T47] print_report+0x174/0x505
[ 42.597720] [ T47] ? nvme_fc_rescan_remoteport+0x3c/0x50 [nvme_fc]
[ 42.598895] [ T47] ? __virt_addr_valid+0x208/0x410
[ 42.599962] [ T47] ? nvme_fc_rescan_remoteport+0x3c/0x50 [nvme_fc]
[ 42.601133] [ T47] kasan_report+0xa7/0x180
[ 42.601139] [ T47] ? nvme_fc_rescan_remoteport+0x3c/0x50 [nvme_fc]
[ 42.601145] [ T47] nvme_fc_rescan_remoteport+0x3c/0x50 [nvme_fc]
[ 42.601150] [ T47] fcloop_tgt_rscn_work+0x52/0x70 [nvme_fcloop]
[ 42.601154] [ T47] process_one_work+0x85a/0x1460
[ 42.601160] [ T47] ? __pfx_process_one_work+0x10/0x10
[ 42.601165] [ T47] ? assign_work+0x16c/0x240
[ 42.601169] [ T47] worker_thread+0x5e2/0xfc0
[ 42.601174] [ T47] ? __pfx_worker_thread+0x10/0x10
[ 42.601176] [ T47] kthread+0x2d1/0x3a0
[ 42.601178] [ T47] ? _raw_spin_unlock_irq+0x24/0x50
[ 42.613300] [ T47] ? __pfx_kthread+0x10/0x10
[ 42.614351] [ T47] ret_from_fork+0x30/0x70
[ 42.615397] [ T47] ? __pfx_kthread+0x10/0x10
[ 42.616438] [ T47] ret_from_fork_asm+0x1a/0x30
[ 42.617521] [ T47] </TASK>
[ 42.619281] [ T47] Allocated by task 1063:
[ 42.620296] [ T47] kasan_save_stack+0x2c/0x50
[ 42.621322] [ T47] kasan_save_track+0x10/0x30
[ 42.622337] [ T47] __kasan_kmalloc+0xa6/0xb0
[ 42.623360] [ T47] __kmalloc_noprof+0x1c5/0x480
[ 42.624384] [ T47] nvme_fc_register_remoteport+0x27c/0x1330 [nvme_fc]
[ 42.625559] [ T47] fcloop_create_remote_port+0x1c3/0x660 [nvme_fcloop]
[ 42.626726] [ T47] kernfs_fop_write_iter+0x39e/0x5a0
[ 42.627787] [ T47] vfs_write+0x5f9/0xe90
[ 42.628764] [ T47] ksys_write+0xf7/0x1d0
[ 42.629735] [ T47] do_syscall_64+0x93/0x180
[ 42.630428] [ T12] nvme nvme2: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000002: NQN "blktests-subsystem-1"
[ 42.630738] [ T47] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.630743] [ T47] Freed by task 996:
[ 42.630745] [ T47] kasan_save_stack+0x2c/0x50
[ 42.630747] [ T47] kasan_save_track+0x10/0x30
[ 42.631946] [ T70] (NULL device *): Create Association LS failed: Association Allocation Failed
[ 42.632275] [ T47] kasan_save_free_info+0x37/0x70
[ 42.632496] [ T12] (NULL device *): queue 0 connect admin queue failed (-6).
[ 42.632733] [ T47] __kasan_slab_free+0x4b/0x70
[ 42.633113] [ T12] nvme nvme2: NVME-FC{1}: reset: Reconnect attempt failed (-6)
[ 42.633428] [ T47] kfree+0x13e/0x4a0
[ 42.634178] [ T12] nvme nvme2: NVME-FC{1}: Reconnect attempt in 2 seconds
[ 42.634527] [ T47] nvme_fc_free_rport+0x238/0x370 [nvme_fc]
[ 42.634533] [ T47] nvme_fc_unregister_remoteport+0x365/0x470 [nvme_fc]
[ 42.635137] [ T1147] nvme nvme2: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1", hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 42.635436] [ T47] fcloop_delete_remote_port+0x324/0x4c0 [nvme_fcloop]
[ 42.651451] [ T47] kernfs_fop_write_iter+0x39e/0x5a0
[ 42.652539] [ T47] vfs_write+0x5f9/0xe90
[ 42.653528] [ T47] ksys_write+0xf7/0x1d0
[ 42.654509] [ T47] do_syscall_64+0x93/0x180
[ 42.655529] [ T47] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.657461] [ T47] Last potentially related work creation:
[ 42.658555] [ T47] kasan_save_stack+0x2c/0x50
[ 42.659652] [ T47] __kasan_record_aux_stack+0xad/0xc0
[ 42.660704] [ T47] insert_work+0x2d/0x2d0
[ 42.661681] [ T47] __queue_work+0x6b4/0xc90
[ 42.662674] [ T47] queue_work_on+0x73/0xa0
[ 42.663684] [ T47] fcloop_h2t_xmt_ls_rsp+0x295/0x390 [nvme_fcloop]
[ 42.664823] [ T47] nvmet_fc_xmt_ls_rsp+0xe1/0x1a0 [nvmet_fc]
[ 42.665929] [ T47] nvmet_fc_target_assoc_free+0x57c/0x1fd0 [nvmet_fc]
[ 42.667110] [ T47] nvmet_fc_delete_assoc_work+0xcc/0x2d0 [nvmet_fc]
[ 42.668288] [ T47] process_one_work+0x85a/0x1460
[ 42.669356] [ T47] worker_thread+0x5e2/0xfc0
[ 42.670398] [ T47] kthread+0x2d1/0x3a0
[ 42.671419] [ T47] ret_from_fork+0x30/0x70
[ 42.672452] [ T47] ret_from_fork_asm+0x1a/0x30
[ 42.674366] [ T47] Second to last potentially related work creation:
[ 42.675576] [ T47] kasan_save_stack+0x2c/0x50
[ 42.676620] [ T47] __kasan_record_aux_stack+0xad/0xc0
[ 42.677706] [ T47] insert_work+0x2d/0x2d0
[ 42.678727] [ T47] __queue_work+0x6b4/0xc90
[ 42.679759] [ T47] queue_work_on+0x73/0xa0
[ 42.680764] [ T47] nvme_fc_rcv_ls_req+0x729/0xb20 [nvme_fc]
[ 42.681877] [ T47] nvmet_fc_target_assoc_free+0x13c8/0x1fd0 [nvmet_fc]
[ 42.683080] [ T47] nvmet_fc_delete_assoc_work+0xcc/0x2d0 [nvmet_fc]
[ 42.684276] [ T47] process_one_work+0x85a/0x1460
[ 42.685330] [ T47] worker_thread+0x5e2/0xfc0
[ 42.686336] [ T47] kthread+0x2d1/0x3a0
[ 42.687314] [ T47] ret_from_fork+0x30/0x70
[ 42.688307] [ T47] ret_from_fork_asm+0x1a/0x30
[ 42.690162] [ T47] The buggy address belongs to the object at ffff88812229e800 which belongs to the cache kmalloc-512 of size 512
[ 42.692494] [ T47] The buggy address is located 144 bytes inside of freed 512-byte region [ffff88812229e800, ffff88812229ea00)
[ 42.695571] [ T47] The buggy address belongs to the physical page:
[ 42.696659] [ T47] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12229c
[ 42.697938] [ T47] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 42.699223] [ T47] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 42.700522] [ T47] page_type: f5(slab)
[ 42.701441] [ T47] raw: 0017ffffc0000040 ffff888100042c80 0000000000000000 dead000000000001
[ 42.702733] [ T47] raw: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000
[ 42.703966] [ T47] head: 0017ffffc0000040 ffff888100042c80 0000000000000000 dead000000000001
[ 42.705208] [ T47] head: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000
[ 42.706450] [ T47] head: 0017ffffc0000002 ffffea000488a701 ffffffffffffffff 0000000000000000
[ 42.707727] [ T47] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[ 42.708960] [ T47] page dumped because: kasan: bad access detected
[ 42.710842] [ T47] Memory state around the buggy address:
[ 42.711870] [ T47] ffff88812229e780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 42.713076] [ T47] ffff88812229e800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 42.714298] [ T47] >ffff88812229e880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 42.715540] [ T47]                          ^
[ 42.716501] [ T47] ffff88812229e900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 42.717713] [ T47] ffff88812229e980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 42.718939] [ T47] ==================================================================
[ 42.779285] [ T12] nvme nvme3: NVME-FC{2}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000002: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 42.782208] [ T47] (NULL device *): Create Association LS failed: Association Allocation Failed
[ 42.784050] [ T12] (NULL device *): queue 0 connect admin queue failed (-6).
...