Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files

Hi,

On 2024/08/15 19:24, Christian Theune wrote:
Hi,

On 15. Aug 2024, at 13:14, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

On 2024/08/15 18:03, Christian Theune wrote:
Hi,
small insight: even given my dataset that can reliably trigger this (after around 1.5 hours of rsyncing) it does not trigger on a specific set of files. I’ve deleted the data and started the rsync on a fresh directory (not a fresh filesystem, I can’t delete that as it carries important data) but it doesn’t always get stuck on the same files, even though rsync processes them in a repeatable order.
I’m wondering how to generate more insights from that. Maybe keeping a blktrace log might help?
It sounds like the specific pattern relies on XFS doing a specific thing there …
Wild idea: maybe running the xfstest suite on an in-memory raid 6 setup could reproduce this?
I’m guessing that the xfs people do not regularly run their test suite on a layered setup like mine with encryption and software raid?

That sounds great.

Alright. I will try that.

@Yu: you mentioned that you might be able to provide me a kernel that produces more error logging to diagnose this? Any chance we could try that route?

Yes, however, I still need some time to sort out the internal process of
raid5. I'm quite busy with some other work stuff, and I'm familiar with
raid1/10, but not so much with raid5. :(

The main idea is to figure out why IO is not dispatched to the
underlying disks.

Sure, thanks - I’m happy to be patient. :)

Meanwhile, can you try the following patch to bypass the bitmap? Let's
see what happens if the bitmap counter cannot block.

Note that with this patch the bitmap will not work, and data can be
inconsistent after a power failure.

Thanks,
Kuai

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 0a2d37eb38ef..5ad51e9ad805 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1463,8 +1463,7 @@ __acquires(bitmap->lock)

 int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors, int behind)
 {
-       if (!bitmap)
-               return 0;
+       return 0;

        if (behind) {
                int bw;
@@ -1528,8 +1527,8 @@ EXPORT_SYMBOL(md_bitmap_startwrite);
 void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset,
                        unsigned long sectors, int success, int behind)
 {
-       if (!bitmap)
-               return;
+       return;
+
        if (behind) {
                if (atomic_dec_and_test(&bitmap->behind_writes))
                        wake_up(&bitmap->behind_wait);
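
For illustration only, here is a toy user-space model of what the two early returns above take out of the write path. It is a sketch under assumptions, not kernel code: `toy_chunk`, `toy_startwrite`, `toy_endwrite`, the `bypass` flag and the tiny `TOY_COUNTER_MAX` are all hypothetical names and values. The assumption being modelled is that each bitmap chunk keeps a bounded count of in-flight writes and that a writer has to wait once that count is saturated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny limit so saturation is easy to reach. */
#define TOY_COUNTER_MAX 3u

/* Toy stand-in for one bitmap chunk; not the kernel's data structure. */
struct toy_chunk {
	uint16_t inflight; /* writes started but not yet ended */
	bool dirty;        /* chunk would need a resync after a crash */
};

/*
 * Returns true if the write may proceed, false if the caller would
 * have to wait. With bypass set (modelling the patch) no accounting
 * is done, so the call can never report "wait".
 */
static bool toy_startwrite(struct toy_chunk *c, bool bypass)
{
	if (bypass)
		return true;
	if (c->inflight == TOY_COUNTER_MAX)
		return false; /* a real implementation would sleep here */
	c->dirty = true;
	c->inflight++;
	return true;
}

/* Ends one write; a no-op when the accounting is bypassed. */
static void toy_endwrite(struct toy_chunk *c, bool bypass)
{
	if (bypass)
		return;
	assert(c->inflight > 0);
	c->inflight--;
	/* The toy never clears dirty; it only tracks that a write happened. */
}
```

In this model a fourth `toy_startwrite(&c, false)` on the same chunk reports that the caller would have to wait until some `toy_endwrite` runs, while with `bypass` set every call proceeds and `dirty` is never set. That mirrors the trade-off of the patch: nothing can block on the counter, but nothing records which regions were being written when power is lost.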


Christian





