Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files

>>>>> "Christian" == Christian Theune <ct@xxxxxxxxxxxxxxx> writes:

> Hi John,
> Hi Yu,

>> On 10. Aug 2024, at 00:51, John Stoffel <john@xxxxxxxxxxx> wrote:
>> 
>>>>>>> "Christian" == Christian Theune <ct@xxxxxxxxxxxxxxx> writes:
>> 
>>> Hi,
>>>> On 9. Aug 2024, at 03:13, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> Yes, the IOs are definitely stuck in md127 and never get dispatched
>>>> to the nvme devices; for now I'd say this is a raid5 problem.
>> 
>>> Note that this is raid6, not raid5! Sorry, I never explicitly
>>> mentioned that and it was buried in the mdstat output.
>> 
>> That's good info.  
>> 
>> I wonder if you could set up some loop devices, build a RAID6 array,
>> put XFS on it and try to replicate the problem by rsyncing a bunch of
>> files.

> I was about to try this, but I’m wondering what backing devices you
> had in mind here? If I place images for loop on the original
> (defective) RAID 6 setup then this wouldn’t give us much info.

Just try it in RAM at first, if you can make it work.  Or put the
image files in /tmp, which should be a tmpfs filesystem backed by swap
(not every distribution mounts /tmp that way, so check with
'findmnt /tmp' first).
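
A minimal sketch of the loop-device approach described above, assuming
/tmp really is tmpfs and that mdadm, losetup, mkfs.xfs and rsync are
installed.  The working directory, md device name, mountpoint, image
size and rsync source are all hypothetical placeholders, not values from
this thread.  The script only builds and prints the command lines and
does not run anything, so they can be reviewed before being executed as
root.

```python
import shlex


def raid6_loop_plan(workdir="/tmp/raid6-test", members=4, size_mb=512,
                    md_dev="/dev/md/looptest", mountpoint="/mnt/looptest",
                    rsync_src="/path/to/many/files"):
    """Return shell command lines for a loop-backed RAID6 + XFS test rig.

    Nothing is executed here; all paths and names are hypothetical defaults.
    """
    if members < 4:
        raise ValueError("RAID6 needs at least 4 member devices")
    q = shlex.quote
    lines = [f"mkdir -p {q(workdir)} {q(mountpoint)}"]
    loop_vars = []
    for i in range(members):
        img = f"{workdir}/member{i}.img"
        # Sparse image file; on tmpfs it only consumes RAM as it is written.
        lines.append(f"truncate -s {size_mb}M {q(img)}")
        # --find picks a free loop device, --show prints its name.
        lines.append(f"LOOP{i}=$(losetup --find --show {q(img)})")
        loop_vars.append(f'"$LOOP{i}"')
    lines.append(
        f"mdadm --create {q(md_dev)} --level=6 "
        f"--raid-devices={members} " + " ".join(loop_vars)
    )
    lines.append(f"mkfs.xfs {q(md_dev)}")
    lines.append(f"mount {q(md_dev)} {q(mountpoint)}")
    # Trailing slashes so rsync copies the directory contents.
    lines.append(
        f"rsync -a {q(rsync_src.rstrip('/') + '/')} {q(mountpoint + '/')}"
    )
    return lines


if __name__ == "__main__":
    print("\n".join(raid6_loop_plan()))
```

With the defaults this describes four sparse 512M images, so the array
can take up to roughly 2G of RAM or swap once it is filled.  A LUKS
layer is deliberately left out here; this is only the plain RAID6 + XFS
case.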

> However, I could take the hot spare and run a sequence of tests
> against that, first with a newer kernel and, if it doesn’t reproduce
> in its final form, potentially with an older kernel as well:

That's one option of course.  

> - xfs directly on the nvme drive
> - xfs on encrypted nvme drive
> - xfs on raid 1 on nvme drive, split into two partitions
> - xfs on raid 5 on nvme drive, split into a few partitions
> - xfs on raid 6 on nvme drive, split into a few partitions
> - repeat the tests with raid 1/5/6 with encrypted partitions

That's an awesome test setup to run through, though it might take a
bunch of time.

> As that will take some time and effort, I’d like to double check
> whether that sounds sensible to you as well?

I'd probably just do the RAID6 tests first, get them out of the way.  
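
To keep track of which combinations have been run, the list above can
be written down as a small matrix with the RAID6 cases moved to the
front.  This is only a bookkeeping sketch; the layout labels and
function names are made up for illustration, and "encrypted" follows
the wording of the list (encryption on the drive or partitions
underneath).

```python
from itertools import product

# Layouts from the proposed test sequence; "plain" means XFS without md.
LAYOUTS = ["plain", "raid1", "raid5", "raid6"]


def build_matrix(raid6_first=True):
    """Return (layout, encrypted) pairs, optionally with RAID6 cases first."""
    cases = list(product(LAYOUTS, (False, True)))
    if raid6_first:
        # Stable sort: RAID6 cases move to the front, the rest keep their order.
        cases.sort(key=lambda case: case[0] != "raid6")
    return cases


def describe(case):
    """Render one case in the wording used in the list."""
    layout, encrypted = case
    enc = "encrypted " if encrypted else ""
    if layout == "plain":
        return f"xfs on {enc}nvme drive"
    return f"xfs on {layout} on {enc}partitions of the nvme drive"


if __name__ == "__main__":
    for number, case in enumerate(build_matrix(), start=1):
        print(f"{number}. {describe(case)}")
```

That gives eight runs per kernel version, with the two RAID6 runs
first.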

> -- 
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick






