On Mon, Aug 22, 2022 at 3:59 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Mon, Aug 22, 2022 at 3:44 PM Thomas Deutschmann <whissi@xxxxxxxxx> wrote:
> >
> > On 2022-08-22 23:52, Song Liu wrote:
> > > Hmm.. I still cannot repro the hang in my test. I have:
> > >
> > > [root@eth50-1 ~]# mount | grep mnt
> > > /dev/md0 on /root/mnt type ext4 (rw,relatime,stripe=384)
> > > [root@eth50-1 ~]# lsblk
> > > NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
> > > sr0      11:0    1 1024M  0 rom
> > > vda     253:0    0   32G  0 disk
> > > ├─vda1  253:1    0    2G  0 part  /boot
> > > └─vda2  253:2    0   30G  0 part  /
> > > nvme0n1 259:0    0    4G  0 disk
> > > └─md0     9:0    0   12G  0 raid5 /root/mnt
> > > nvme2n1 259:1    0    4G  0 disk
> > > └─md0     9:0    0   12G  0 raid5 /root/mnt
> > > nvme3n1 259:2    0    4G  0 disk
> > > └─md0     9:0    0   12G  0 raid5 /root/mnt
> > > nvme1n1 259:3    0    4G  0 disk
> > > └─md0     9:0    0   12G  0 raid5 /root/mnt
> > >
> > > [root@eth50-1 ~]# history
> > > 381 fio iou/repro.fio
> > > 382 fsfreeze --freeze /root/mnt
> > > 383 fsfreeze --unfreeze /root/mnt
> > > 384 fio iou/repro.fio
> > > 385 fsfreeze --freeze /root/mnt
> > > 386 fsfreeze --unfreeze /root/mnt
> > > ^^^^^^^^^^^^^^ all works fine.
> > >
> > > Did I miss something?
> >
> > No :(
> >
> > I am currently not testing against the mdraid but this shouldn't matter.
> >
> > However, it looks like you don't test on bare metal, do you?
> >
> > I tried to test on VMware Workstation 16 myself but VMware's nvme
> > implementation is currently broken
> > (https://github.com/vmware/open-vm-tools/issues/579).
>
> I am testing with QEMU emulator version 6.2.0. I can also test with
> bare metal.

OK, now I got a repro with bare metal: nvme+xfs. This is a 5.19 based kernel,
the stack is

[ 867.091579] INFO: task fsfreeze:49972 blocked for more than 122 seconds.
[ 867.104969] Tainted: G S 5.19.0-0_fbk0_rc1_gc225658be66e #1
[ 867.119750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 867.135381] task:fsfreeze state:D stack: 0 pid:49972 ppid: 22571 flags:0x00004000
[ 867.135388] Call Trace:
[ 867.135390]  <TASK>
[ 867.135394]  __schedule+0x3d7/0x700
[ 867.135404]  schedule+0x39/0x90
[ 867.135409]  percpu_down_write+0x234/0x270
[ 867.135414]  freeze_super+0x8a/0x160
[ 867.135422]  do_vfs_ioctl+0x8b5/0x920
[ 867.135430]  __x64_sys_ioctl+0x52/0xb0
[ 867.135435]  do_syscall_64+0x3d/0x90
[ 867.135441]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 867.135447] RIP: 0033:0x7f034f23fcdb
[ 867.135453] RSP: 002b:00007ffe2bdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 867.135457] RAX: ffffffffffffffda RBX: 0000000000000066 RCX: 00007f034f23fcdb
[ 867.135460] RDX: 0000000000000000 RSI: 00000000c0045877 RDI: 0000000000000003
[ 867.135463] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
[ 867.135466] R10: 0000000000001000 R11: 0000000000000246 R12: 00007ffe2bdff334
[ 867.135469] R13: 00005650ff68dc40 R14: ffffffff00000000 R15: 00005650ff68c0f5
[ 867.135474]  </TASK>

I am not very familiar with this code, so I will need more time to look into it.

Thomas, have you tried to bisect with the fio repro?

Thanks,
Song