On Sat, Aug 3, 2024 at 12:49 AM Nilay Shroff <nilay@xxxxxxxxxxxxx> wrote:
>
>
>
> On 8/2/24 18:04, Shinichiro Kawasaki wrote:
> > CC+: Yi Zhang,
> >
> > On Aug 02, 2024 / 17:46, Nilay Shroff wrote:
> >>
> >>
> >> On 8/2/24 14:39, Shinichiro Kawasaki wrote:
> >>>
> >>> #3: nvme/052 (CKI failure)
> >>>
> >>> The CKI project reported that nvme/052 fails occasionally [4].
> >>> This needs further debug effort.
> >>>
> >>> nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
> >>>     runtime  ...  22.209s
> >>>     --- tests/nvme/052.out 2024-07-30 18:38:29.041716566 -0400
> >>>     +++ /mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-tests-production.zip/storage/blktests/nvme/nvme-loop/blktests/results/nodev_tr_loop/nvme/052.out.bad 2024-07-30 18:45:35.438067452 -0400
> >>>     @@ -1,2 +1,4 @@
> >>>      Running nvme/052
> >>>     +cat: /sys/block/nvme1n2/uuid: No such file or directory
> >>>     +cat: /sys/block/nvme1n2/uuid: No such file or directory
> >>>      Test complete
> >>>
> >>> [4] https://datawarehouse.cki-project.org/kcidb/tests/13669275
> >>
> >> I just checked the console logs of the nvme/052 and from the logs it's
> >> apparent that all namespaces were created successfully and so it's strange
> >> to see that the test couldn't access "/sys/block/nvme1n2/uuid".
> >
> > I agree that it's strange. I think the "No such file or directory" error
> > happened in _find_nvme_ns(), and it checks existence of the uuid file before
> > the cat command. I have no idea why the error happens.
> >
> Yes exactly, and these two operations (checking the existence of uuid
> and cat command) are not atomic. So the only plausible theory I have at this
> time is "if namespace is deleted after checking the existence of uuid but
> before cat command is executed" then this issue may potentially manifests.
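To make the suspected race concrete, here is a minimal sketch under stated assumptions. It is not the actual blktests _find_nvme_ns() code. The helper name find_ns_by_uuid and its arguments (a root directory standing in for /sys/block, plus the wanted UUID) are hypothetical. It contrasts the check-then-cat pattern described above, where a namespace can vanish between the existence check and the read, with a variant that reads the attribute once and treats a failed read as "namespace already gone":

```shell
#!/bin/sh
# Hypothetical helper, NOT the real blktests _find_nvme_ns():
# print the name of the device directory under $1 whose "uuid"
# attribute equals $2; return 1 if none matches.
find_ns_by_uuid() {
	root=$1
	want=$2
	for dev in "$root"/nvme*; do
		# Racy pattern (check, then use):
		#   [ -e "$dev/uuid" ] && uuid=$(cat "$dev/uuid")
		# The namespace can be deleted between the test and the cat,
		# which yields "cat: ...: No such file or directory".
		#
		# Race-tolerant pattern: attempt the read exactly once, silence
		# the error, and skip the entry if the read fails.
		uuid=$(cat "$dev/uuid" 2>/dev/null) || continue
		if [ "$uuid" = "$want" ]; then
			basename "$dev"
			return 0
		fi
	done
	return 1
}
```

The design choice is simply to drop the separate existence check: a single failed open is the signal, so there is no window between "it exists" and "read it". Whether this is the right fix for nvme/052 depends on whether the test should tolerate a namespace disappearing mid-scan at all, which is what the thread is trying to establish.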
> Furthermore, as you mentioned, this issue is seen on the test machine
> occasionally, so I asked if there's a possibility of simultaneous blktest
> or some other tests running on this system.

No other tests run concurrently during the CKI test runs.
I reproduced the failure on that server; it can always be reproduced within 5 runs:

# sh a.sh
==============================0
nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [passed]
    runtime  21.496s  ...  21.398s
==============================1
nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
    runtime  21.398s  ...  21.974s
    --- tests/nvme/052.out 2024-08-10 00:30:06.989814226 -0400
    +++ /root/blktests/results/nodev_tr_loop/nvme/052.out.bad 2024-08-13 02:53:51.635047928 -0400
    @@ -1,2 +1,5 @@
     Running nvme/052
    +cat: /sys/block/nvme1n2/uuid: No such file or directory
    +cat: /sys/block/nvme1n2/uuid: No such file or directory
    +cat: /sys/block/nvme1n2/uuid: No such file or directory
     Test complete
# uname -r
6.11.0-rc3
[root@hpe-rl300gen11-04 blktests]# lsblk
NAME                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
zram0                                 252:0    0     8G  0 disk [SWAP]
nvme0n1                               259:0    0 447.1G  0 disk
├─nvme0n1p1                           259:1    0   600M  0 part /boot/efi
├─nvme0n1p2                           259:2    0     1G  0 part /boot
└─nvme0n1p3                           259:3    0 445.5G  0 part
  └─fedora_hpe--rl300gen11--04-root   253:0    0 445.5G  0 lvm  /

>
> Thanks,
> --Nilay
>

--
Best Regards,
Yi Zhang