Hi,
On 2024/08/06 22:10, Christian Theune wrote:
we are seeing an issue that can be triggered with relative ease on a
server that has been working fine for a few weeks. The regular workload
is a backup utility that copies off data from virtual disk images in
4MiB (compressed) chunks from Ceph onto a local NVMe-based RAID-6 array
that is encrypted using LUKS.
Today I started a larger rsync job from another server (that has a
couple of million files with around 200-300 GiB in total) to migrate
data and we’ve seen the server suddenly lock up twice. Any IO that
interacts with the mountpoint (/srv/backy) will hang indefinitely. A
reset is required to get out of this as the machine will hang trying to
unmount the affected filesystem. No messages other than the hung-task
reports are shown, and I have no indication of hardware faults at the moment.
I’m messaging both dm-devel and linux-raid as I suspect either one
or both (or an interaction between them) might be the cause.
Kernel:
Linux version 5.15.138 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU ld (GNU
Binutils) 2.40) #1-NixOS SMP Wed Nov 8 16:26:52 UTC 2023
Since you can trigger this easily, I'd suggest trying the latest
kernel release first.
Thanks,
Kuai
See the kernel config attached.