On 9/20/24 16:20, Yi Zhang wrote:
+ Hannes
I bisected it, and it seems the issue was introduced by the commit below:
commit 1e48b34c9bc79aa36700fccbfdf87e61e4431d2b
Author: Hannes Reinecke <hare@xxxxxxx>
Date: Mon Jul 22 14:02:22 2024 +0200
nvme: split off TLS sysfs attributes into a separate group
On Thu, Sep 19, 2024 at 12:09 AM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
Hello
CKI reported that most of the blktests nvme/tcp tests failed on the linux
tree [1]. The reproducer and dmesg log are below [2]. The issue cannot be
reproduced with 6.11.0, so it seems to have been introduced with the
latest block code merge. Please help check it, and let me know if you
need any further info or testing. Thanks.
[1]
https://datawarehouse.cki-project.org/kcidb/tests/14394423
[2]
# nvme_trtype=tcp ./check nvme/003
nvme/003 (tr=tcp) (test if we're sending keep-alives to a discovery controller) [failed]
runtime 11.280s ... 11.188s
--- tests/nvme/003.out 2024-09-18 11:30:11.243366401 -0400
+++ /root/blktests/results/nodev_tr_tcp/nvme/003.out.bad 2024-09-18 11:52:32.977112834 -0400
@@ -1,3 +1,3 @@
Running nvme/003
-disconnected 1 controller(s)
+disconnected 0 controller(s)
Test complete
# dmesg
[ 447.213539] run blktests nvme/003 at 2024-09-18 11:52:21
[ 447.229285] loop0: detected capacity change from 0 to 2097152
[ 447.233104] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 447.242398] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 447.251089] sysfs: cannot create duplicate filename '/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller'
[ 447.251810] CPU: 2 UID: 0 PID: 5241 Comm: nvme Kdump: loaded Not tainted 6.12.0-0.rc0.adfc3ded5c33.2.test.el10.aarch64 #1
[ 447.252540] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 447.253006] Call trace:
[ 447.253171] dump_backtrace+0xd8/0x130
[ 447.253432] show_stack+0x20/0x38
[ 447.253657] dump_stack_lvl+0x80/0xa8
[ 447.253925] dump_stack+0x18/0x30
[ 447.254152] sysfs_warn_dup+0x6c/0x90
[ 447.254406] sysfs_add_file_mode_ns+0x12c/0x138
[ 447.254713] create_files+0xa8/0x1f8
[ 447.254973] internal_create_group+0x18c/0x358
[ 447.255274] internal_create_groups+0x58/0xe0
[ 447.255558] sysfs_create_groups+0x20/0x40
[ 447.255826] device_add_attrs+0x19c/0x218
[ 447.256093] device_add+0x310/0x6d0
[ 447.256327] cdev_device_add+0x58/0xc0
[ 447.256579] nvme_add_ctrl+0x78/0xd0 [nvme_core]
[ 447.256895] nvme_tcp_create_ctrl+0x3c/0x178 [nvme_tcp]
[ 447.257248] nvmf_create_ctrl+0x150/0x288 [nvme_fabrics]
[ 447.257614] nvmf_dev_write+0x98/0xf8 [nvme_fabrics]
[ 447.257948] vfs_write+0xdc/0x380
[ 447.258174] ksys_write+0x7c/0x120
[ 447.258408] __arm64_sys_write+0x24/0x40
[ 447.258673] invoke_syscall.constprop.0+0x74/0xd0
[ 447.258994] do_el0_svc+0xb0/0xe8
[ 447.259225] el0_svc+0x44/0x1a0
[ 447.259449] el0t_64_sync_handler+0x120/0x130
[ 447.259745] el0t_64_sync+0x1a4/0x1a8
--
Best Regards,
Yi Zhang
How utterly curious.
The mentioned patch moves some sysfs attributes to a different location
in the code. The stack trace you've posted indicates that we're creating
a controller while the previous one is still present in sysfs, i.e. that
the lifetime of the controller has changed.
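
To make this reading of the trace concrete, here is a minimal toy model in
Python. It is only a sketch under stated assumptions, not kernel code: the
SysfsDir class, the add_ctrl()/remove_ctrl() helpers and the "other_attr"
entry are hypothetical names made up for illustration. Only the directory
path and the "reset_controller" name come from the log above. It shows that
registering the same attribute names twice in one directory fails unless
the earlier entries were removed first.

```python
class DuplicateName(Exception):
    """Raised when an entry name already exists (loosely models sysfs_warn_dup)."""


class SysfsDir:
    """Toy model of one sysfs directory: a set of unique entry names."""

    def __init__(self, path):
        self.path = path
        self.entries = set()

    def add_file(self, name):
        # A directory cannot hold two entries with the same name.
        if name in self.entries:
            raise DuplicateName(
                f"cannot create duplicate filename '{self.path}/{name}'"
            )
        self.entries.add(name)

    def remove_all(self):
        self.entries.clear()


# "reset_controller" is taken from the dmesg output; "other_attr" is a
# hypothetical placeholder for the remaining attributes.
ATTRS = ["reset_controller", "other_attr"]


def add_ctrl(sysfs_dir, attrs=ATTRS):
    """Hypothetical stand-in for controller creation: register all attributes."""
    for name in attrs:
        sysfs_dir.add_file(name)


def remove_ctrl(sysfs_dir):
    """Hypothetical stand-in for controller teardown: drop all attributes."""
    sysfs_dir.remove_all()


if __name__ == "__main__":
    nvme0 = SysfsDir("/devices/virtual/nvme-fabrics/ctl/nvme0")
    add_ctrl(nvme0)
    try:
        # Second creation while the first controller's entries still exist.
        add_ctrl(nvme0)
    except DuplicateName as err:
        print("sysfs:", err)
    # After a proper teardown the same names can be registered again.
    remove_ctrl(nvme0)
    add_ctrl(nvme0)
```

In this model the second add_ctrl() call fails on the first attribute it
tries to register, which matches the warning naming reset_controller. The
model says nothing about why the earlier entries would still be present.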
I find it difficult to understand how the cited patch could have changed
the lifetime of the controller object, but will continue to check.
Does the error disappear if you just revert the cited patch?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich