On 2/6/21 20:51, Yi Zhang wrote: > The issue was introduced after merge the NVMe updates > > commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD) > Merge: 44d10e4b2f2c 0d7389718c32 > Author: Jens Axboe <axboe@xxxxxxxxx> > Date: Tue Feb 2 07:12:06 2021 -0700 > > Merge branch 'for-5.12/drivers' into for-next > > * for-5.12/drivers: (22 commits) > nvme-tcp: use cancel tagset helper for tear down > nvme-rdma: use cancel tagset helper for tear down > nvme-tcp: add clean action for failed reconnection > nvme-rdma: add clean action for failed reconnection > nvme-core: add cancel tagset helpers > nvme-core: get rid of the extra space > nvme: add tracing of zns commands > nvme: parse format nvm command details when tracing > nvme: update enumerations for status codes > nvmet: add lba to sect conversion helpers > nvmet: remove extra variable in identify ns > nvmet: remove extra variable in id-desclist > nvmet: remove extra variable in smart log nsid > nvme: refactor ns->ctrl by request > nvme-tcp: pass multipage bvec to request iov_iter > nvme-tcp: get rid of unused helper function > nvme-tcp: fix wrong setting of request iov_iter > nvme: support command retry delay for admin command > nvme: constify static attribute_group structs > nvmet-fc: use RCU proctection for assoc_list > > > On 2/6/21 11:08 AM, Yi Zhang wrote: >> blktests nvme-tcp/012 > Running the test few times I got this once no sign of Oops though :- # nvme_trtype=tcp ./check nvme/012 nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [failed] runtime 37.624s ... 42.272s something found in dmesg: [ 69.198819] run blktests nvme/012 at 2021-02-06 21:32:23 [ 69.330277] loop: module loaded [ 69.333383] loop0: detected capacity change from 2097152 to 0 [ 69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420) [ 69.396581] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a. [ 69.399482] nvme nvme1: creating 64 I/O queues. [ 69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues. [ 69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420 [ 72.381282] XFS (nvme1n1): Mounting V5 Filesystem ... (See '/root/blktests/results/nodev/nvme/012.dmesg' for the entire message) blktests (master) # dmesg -c [ 38.347661] nvme nvme0: 63/0/0 default/read/poll queues [ 69.198819] run blktests nvme/012 at 2021-02-06 21:32:23 [ 69.330277] loop: module loaded [ 69.333383] loop0: detected capacity change from 2097152 to 0 [ 69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420) [ 69.396581] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a. [ 69.399482] nvme nvme1: creating 64 I/O queues. [ 69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues. [ 69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420 [ 72.381282] XFS (nvme1n1): Mounting V5 Filesystem [ 72.394603] XFS (nvme1n1): Ending clean mount [ 72.395463] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff) [ 73.402838] ====================================================== [ 73.403491] WARNING: possible circular locking dependency detected [ 73.404125] 5.11.0-rc6blk+ #166 Tainted: G OE [ 73.404690] ------------------------------------------------------ [ 73.405332] fio/3671 is trying to acquire lock: [ 73.405803] ffff88881d110cb0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x50 [ 73.406627] but task is already holding lock: [ 73.407246] ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs] [ 73.408233] which lock already depends on the new lock. [ 73.409096] the existing dependency chain (in reverse order) is: [ 73.409885] -> #3 (&xfs_dir_ilock_class/5){+.+.}-{3:3}: [ 73.410594] lock_acquire+0xd2/0x390 [ 73.411045] down_write_nested+0x47/0x110 [ 73.411530] xfs_ilock+0xbf/0x250 [xfs] [ 73.412068] xfs_create+0x1d9/0x6b0 [xfs] [ 73.412626] xfs_generic_create+0x205/0x2c0 [xfs] [ 73.413255] lookup_open+0x4f6/0x630 [ 73.413710] path_openat+0x298/0xa80 [ 73.414158] do_filp_open+0x93/0x100 [ 73.414606] do_sys_openat2+0x24b/0x310 [ 73.415090] do_sys_open+0x4b/0x80 [ 73.415569] do_syscall_64+0x33/0x40 [ 73.416075] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 73.416747] -> #2 (sb_internal){.+.+}-{0:0}: [ 73.417436] lock_acquire+0xd2/0x390 [ 73.417938] xfs_trans_alloc+0x19e/0x2e0 [xfs] [ 73.418607] xfs_vn_update_time+0xc8/0x250 [xfs] [ 73.419295] touch_atime+0x16b/0x200 [ 73.419814] xfs_file_mmap+0xa7/0xb0 [xfs] [ 73.420447] mmap_region+0x3f6/0x690 [ 73.420955] do_mmap+0x379/0x580 [ 73.421415] vm_mmap_pgoff+0xdf/0x170 [ 73.421929] ksys_mmap_pgoff+0x1dd/0x240 [ 73.422477] do_syscall_64+0x33/0x40 [ 73.422980] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 73.423672] -> #1 (&mm->mmap_lock#2){++++}-{3:3}: [ 73.424426] lock_acquire+0xd2/0x390 [ 73.424932] __might_fault+0x5e/0x80 [ 73.425433] _copy_from_user+0x1e/0xa0 [ 73.425966] do_ip_setsockopt.isra.15+0xbba/0x12f0 [ 73.426616] ip_setsockopt+0x34/0x90 [ 73.427120] __sys_setsockopt+0x8f/0x110 [ 73.427681] __x64_sys_setsockopt+0x20/0x30 [ 73.428252] do_syscall_64+0x33/0x40 [ 73.428751] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 73.429422] -> #0 (sk_lock-AF_INET){+.+.}-{0:0}: [ 73.430154] validate_chain+0x8fa/0xec0 [ 73.430681] __lock_acquire+0x563/0x940 [ 73.431212] lock_acquire+0xd2/0x390 [ 73.431719] lock_sock_nested+0x72/0xa0 [ 73.432252] tcp_sendpage+0x23/0x50 [ 73.432740] inet_sendpage+0x4f/0x80 [ 73.433240] kernel_sendpage+0x57/0xc0 [ 73.433760] nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp] [ 73.434435] nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp] [ 73.435104] __blk_mq_try_issue_directly+0x109/0x1b0 [ 73.435768] blk_mq_request_issue_directly+0x4b/0x80 [ 73.436433] blk_mq_try_issue_list_directly+0x62/0xf0 [ 73.437104] blk_mq_sched_insert_requests+0x192/0x2b0 [ 73.437774] blk_mq_flush_plug_list+0x13c/0x260 [ 73.438389] blk_flush_plug_list+0xd7/0x100 [ 73.438960] blk_finish_plug+0x21/0x30 [ 73.439485] _xfs_buf_ioapply+0x2cc/0x3c0 [xfs] [ 73.440171] __xfs_buf_submit+0x85/0x220 [xfs] [ 73.440833] xfs_buf_read_map+0x105/0x2a0 [xfs] [ 73.441511] xfs_trans_read_buf_map+0x2b2/0x610 [xfs] [ 73.442254] xfs_read_agi+0xb4/0x1d0 [xfs] [ 73.442874] xfs_ialloc_read_agi+0x48/0x170 [xfs] [ 73.443568] xfs_dialloc_select_ag+0x94/0x2a0 [xfs] [ 73.444277] xfs_dir_ialloc+0x72/0x630 [xfs] [ 73.444927] xfs_create+0x241/0x6b0 [xfs] [ 73.445550] xfs_generic_create+0x205/0x2c0 [xfs] [ 73.446255] lookup_open+0x4f6/0x630 [ 73.446759] path_openat+0x298/0xa80 [ 73.447268] do_filp_open+0x93/0x100 [ 73.447770] do_sys_openat2+0x24b/0x310 [ 73.448301] do_sys_open+0x4b/0x80 [ 73.448779] do_syscall_64+0x33/0x40 [ 73.449275] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 73.449944] other info that might help us debug this: [ 73.450899] Chain exists of: sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5 [ 73.452222] Possible unsafe locking scenario: [ 73.452931] CPU0 CPU1 [ 73.453479] ---- ---- [ 73.454024] lock(&xfs_dir_ilock_class/5); [ 73.454528] lock(sb_internal); [ 73.455217] lock(&xfs_dir_ilock_class/5); [ 73.456027] lock(sk_lock-AF_INET); [ 73.456463] *** DEADLOCK *** [ 73.457175] 6 locks held by fio/3671: [ 73.457618] #0: ffff8881185a9488 (sb_writers#9){.+.+}-{0:0}, at: path_openat+0xa61/0xa80 [ 73.458604] #1: ffff888128cabfe0 (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x287/0xa80 [ 73.459847] #2: ffff8881185a96a8 (sb_internal){.+.+}-{0:0}, at: xfs_create+0x1b4/0x6b0 [xfs] [ 73.460926] #3: ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs] [ 73.462100] #4: ffff88881bd54c58 (hctx->srcu){....}-{0:0}, at: hctx_lock+0x62/0xe0 [ 73.463029] #5: ffff888142ab29d0 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x2fc/0x330 [nvme_tcp] [ 73.464288] stack backtrace: [ 73.464824] CPU: 16 PID: 3671 Comm: fio Tainted: G OE 5.11.0-rc6blk+ #166 [ 73.465785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 73.467176] Call Trace: [ 73.467483] dump_stack+0x8d/0xb5 [ 73.467910] check_noncircular+0x119/0x130 [ 73.468413] ? save_trace+0x3d/0x2c0 [ 73.468854] validate_chain+0x8fa/0xec0 [ 73.469329] __lock_acquire+0x563/0x940 [ 73.469800] lock_acquire+0xd2/0x390 [ 73.470240] ? tcp_sendpage+0x23/0x50 [ 73.470689] lock_sock_nested+0x72/0xa0 [ 73.471163] ? tcp_sendpage+0x23/0x50 [ 73.471624] tcp_sendpage+0x23/0x50 [ 73.472059] inet_sendpage+0x4f/0x80 [ 73.472504] kernel_sendpage+0x57/0xc0 [ 73.472965] ? mark_held_locks+0x4f/0x70 [ 73.473445] nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp] [ 73.474062] nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp] [ 73.474697] __blk_mq_try_issue_directly+0x109/0x1b0 [ 73.475324] blk_mq_request_issue_directly+0x4b/0x80 [ 73.475953] blk_mq_try_issue_list_directly+0x62/0xf0 [ 73.476587] blk_mq_sched_insert_requests+0x192/0x2b0 [ 73.477225] blk_mq_flush_plug_list+0x13c/0x260 [ 73.477802] blk_flush_plug_list+0xd7/0x100 [ 73.478338] blk_finish_plug+0x21/0x30 [ 73.478818] _xfs_buf_ioapply+0x2cc/0x3c0 [xfs] [ 73.479457] __xfs_buf_submit+0x85/0x220 [xfs] [ 73.480091] xfs_buf_read_map+0x105/0x2a0 [xfs] [ 73.480728] ? xfs_read_agi+0xb4/0x1d0 [xfs] [ 73.481331] xfs_trans_read_buf_map+0x2b2/0x610 [xfs] [ 73.482036] ? xfs_read_agi+0xb4/0x1d0 [xfs] [ 73.482635] xfs_read_agi+0xb4/0x1d0 [xfs] [ 73.483221] xfs_ialloc_read_agi+0x48/0x170 [xfs] [ 73.483866] xfs_dialloc_select_ag+0x94/0x2a0 [xfs] [ 73.484513] xfs_dir_ialloc+0x72/0x630 [xfs] [ 73.485092] ? xfs_ilock+0xbf/0x250 [xfs] [ 73.485648] xfs_create+0x241/0x6b0 [xfs] [ 73.486198] xfs_generic_create+0x205/0x2c0 [xfs] [ 73.486857] lookup_open+0x4f6/0x630 [ 73.487317] path_openat+0x298/0xa80 [ 73.487786] ? __lock_acquire+0x581/0x940 [ 73.488302] do_filp_open+0x93/0x100 [ 73.488769] ? do_raw_spin_unlock+0x49/0xc0 [ 73.489307] ? _raw_spin_unlock+0x1f/0x30 [ 73.489825] do_sys_openat2+0x24b/0x310 [ 73.490321] do_sys_open+0x4b/0x80 [ 73.490764] do_syscall_64+0x33/0x40 [ 73.491230] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 73.491890] RIP: 0033:0x7f7d6f2eaead [ 73.492353] Code: c5 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e f4 ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 c7 f4 ff ff 48 89 d0 48 83 c4 08 48 3d 01 [ 73.494713] RSP: 002b:00007ffc072e8910 EFLAGS: 00000293 ORIG_RAX: 0000000000000002 [ 73.495671] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7d6f2eaead [ 73.496526] RDX: 00000000000001a4 RSI: 0000000000000041 RDI: 00007f7d6d666690 [ 73.497381] RBP: 0000000000000000 R08: 0000000000000041 R09: 0000000000000041 [ 73.498232] R10: 00007ffc072e85a0 R11: 0000000000000293 R12: 0000000000000000 [ 73.499121] R13: 000000003b600000 R14: 00007f7d47136000 R15: 00007f7d6d666510 [ 111.135896] XFS (nvme1n1): Unmounting Filesystem [ 111.308034] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"