Re: [5.15-rc1 regression] io_uring: fsstress hangs in do_coredump() on exit

On 9/21/21 12:40 AM, Dave Chinner wrote:
> Hi Jens,
> 
> I updated all my trees from 5.14 to 5.15-rc2 this morning and
> immediately had problems running the recoveryloop fstest group on
> them. These tests have a typical pattern of "run load in the
> background, shutdown the filesystem, kill load, unmount and test
> recovery".
> 
> When the load includes fsstress, and it gets killed after shutdown,
> it hangs on exit like so:
> 
> # echo w > /proc/sysrq-trigger 
> [  370.669482] sysrq: Show Blocked State
> [  370.671732] task:fsstress        state:D stack:11088 pid: 9619 ppid:  9615 flags:0x00000000
> [  370.675870] Call Trace:
> [  370.677067]  __schedule+0x310/0x9f0
> [  370.678564]  schedule+0x67/0xe0
> [  370.679545]  schedule_timeout+0x114/0x160
> [  370.682002]  __wait_for_common+0xc0/0x160
> [  370.684274]  wait_for_completion+0x24/0x30
> [  370.685471]  do_coredump+0x202/0x1150
> [  370.690270]  get_signal+0x4c2/0x900
> [  370.691305]  arch_do_signal_or_restart+0x106/0x7a0
> [  370.693888]  exit_to_user_mode_prepare+0xfb/0x1d0
> [  370.695241]  syscall_exit_to_user_mode+0x17/0x40
> [  370.696572]  do_syscall_64+0x42/0x80
> [  370.697620]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> It's 100% reproducible on one of my test machines, but only one of
> them. That one machine is running fstests on pmem, so it has
> synchronous storage. Every other test machine is using normal async
> storage (nvme, iscsi, etc) and none of them are hanging.
> 
> A quick troll of the commit history between 5.14 and 5.15-rc2
> indicates a couple of potential candidates. The 5th kernel build
> (instead of ~16 for a bisect) told me that commit 15e20db2e0ce
> ("io-wq: only exit on fatal signals") is the cause of the
> regression. I've confirmed that this is the first commit where the
> problem shows up.

Thanks for the report Dave, I'll take a look. Can you elaborate on
exactly what is being run? And when killed, it's a non-fatal signal?

-- 
Jens Axboe



