Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files

I tried updating to 5.15.164, but I’m struggling with our config management: some options that I need to filter out have changed. NFSD_V3 and NFSD2_ACL are now fixed and cause config errors if set. I guess that’s a valid thing to happen within an LTS release. I’ll try again on Friday.

> On 7. Aug 2024, at 07:31, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> 
> Sure,
> 
> would you prefer me testing on 5.15.x or something else?
> 
>> On 7. Aug 2024, at 04:55, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> On 2024/08/06 22:10, Christian Theune wrote:
>>> We are seeing an issue that can be triggered with relative ease on a server that has been working fine for a few weeks. The regular workload is a backup utility that copies data from virtual disk images in 4 MiB (compressed) chunks from Ceph onto a local NVMe-based RAID-6 array that is encrypted using LUKS.
>>> Today I started a larger rsync job from another server (a couple of million files, around 200-300 GiB in total) to migrate data, and we’ve seen the server suddenly lock up twice. Any IO that touches the mountpoint (/srv/backy) hangs indefinitely. A reset is required to get out of this, as the machine hangs when trying to unmount the affected filesystem. No messages other than the hung tasks are shown, and I have no indication of hardware faults at the moment.
>>> I’m messaging both dm-devel and linux-raid as I suspect that either one or both (or an interaction between them) might be the cause.
>>> Kernel:
>>> Linux version 5.15.138 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.40) #1-NixOS SMP Wed Nov 8 16:26:52 UTC 2023
>> 
>> Since you can trigger this easily, I suggest you try the latest
>> kernel release first.
>> 
>> Thanks,
>> Kuai
>> 
>>> See the kernel config attached.
> 
> 
> Kind regards,
> Christian Theune
> 
> -- 
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
> 
> 

Kind regards,
Christian Theune

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





