Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files

Hi,

I still haven’t been able to create a reproducer based on a tmpfs setup. I’ve run the xfstests with increased load and time factors, but didn’t trigger a crash.

@yukuai - I’m running out of ideas. Personally my preferred next step would be to gather debug output. If that takes some time, then I’ll remain patient. :)

Something I do have on the horizon: I’ll receive a mostly identical server in the next few weeks and can try to reproduce the issue there, setting a few disks aside for a separate debugging array. Maybe the number of disks is also relevant, so I’ll also try with the full-size array.

> On 15. Aug 2024, at 21:13, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> 
>>> On the plus side, I have a script now that can create the various
>>> loopback settings quickly, so I can try out things as needed. Not
>>> that valuable without a reproducer, yet, though.
>> 
>> Yay!  Please share it.
> 
> Will do next week after a bit of cleanup.

Here’s the setup I’ve been using with a tmpfs-backed software RAID:

# Everything lives under /srv/test-raid; the images are backed by tmpfs.
mkdir -p /srv/test-raid/backing

mount -t tmpfs none /srv/test-raid/backing

loops=()

# Four sparse 1101 MiB images, each attached to a free loop device.
for i in {0..3}; do
    dd if=/dev/zero of="/srv/test-raid/backing/img${i}.bin" bs=1M seek=1100 count=1
    loops+=("$(losetup -f --show "/srv/test-raid/backing/img${i}.bin")")
done

mdadm --create /dev/md/test --level=6 --raid-devices=4 "${loops[@]}"

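# Separate sparse image for the xfstests scratch device.
dd if=/dev/zero of=/srv/test-raid/backing/scratch.bin bs=1M seek=1100 count=1
SCRATCH_DEV=$(losetup -f --show /srv/test-raid/backing/scratch.bin)
export SCRATCH_DEV  # exported so that xfstests-check can see it
loops+=("$SCRATCH_DEV")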

mkfs.xfs /dev/md/test

mkdir /srv/test-raid/scratch
mkdir /srv/test-raid/test
#mount /dev/md/test /srv/test-raid/test

export TEST_DEV=$(realpath /dev/md/test)
export TEST_DIR=/srv/test-raid/test
# export SCRATCH_DEV=/dev/loop1 # see above
export SCRATCH_MNT=/srv/test-raid/scratch

export LOAD_FACTOR=10
export TIME_FACTOR=10
# export SOAK_DURATION=1m

xfstests-check

# cleanup

umount /srv/test-raid/test
mdadm --stop /dev/md/test
for x in "${loops[@]}"; do losetup -d "$x"; done

umount /srv/test-raid/backing

rm -r /srv/test-raid
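
A note on the dd invocations above: with seek=1100 count=1, dd skips 1100 MiB and then writes a single 1 MiB block, so each image has an apparent size of 1101 MiB while only about 1 MiB of tmpfs is allocated up front. The following is a minimal sketch for confirming that in a throwaway directory. The names make_sparse_image and demo_dir are illustrative and not part of the setup above, and it assumes GNU coreutils (dd with status=none, stat -c).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create one image the same way the setup does.
make_sparse_image() {
    local path=$1
    # Skip 1100 MiB, then write one 1 MiB block of zeros at the end.
    dd if=/dev/zero of="$path" bs=1M seek=1100 count=1 status=none
}

demo_dir=$(mktemp -d)
make_sparse_image "$demo_dir/img0.bin"

# Apparent size in bytes versus space actually allocated on disk.
apparent_bytes=$(stat -c %s "$demo_dir/img0.bin")
allocated_kib=$(du -k "$demo_dir/img0.bin" | cut -f1)
echo "apparent: ${apparent_bytes} bytes, allocated: ${allocated_kib} KiB"

rm -r "$demo_dir"
```

The apparent size should come out as 1154482176 bytes (1101 MiB). The allocated figure depends on the filesystem, but should stay far below that until the array starts writing to the image.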


Hugs,
Christian

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





