Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Mon, 4 Nov 2024 20:18:21 +0800

Hi,

On 2024/11/04 19:40, Yu Kuai wrote:
Hi,

On 2024/11/01 16:33, Christian Theune wrote:
I dug out a different one that goes back further, but even that one
seems to be missing something from early on, when I didn’t have the
serial console attached.

I’m wondering whether this indicates an issue during initialization?
I’m going to reboot the machine and make sure I get the early logs
with those numbers.

[  405.347345] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22301786792+8) 4294967259

For this log, let's assume the first start is from here.
[  432.542465] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967260
[  432.542469] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967261
[  434.272964] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967262
[  434.273175] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967263
[  434.273189] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967264
[  434.273285] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967265
[  434.274063] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967264
[  434.274066] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967263
[  434.274070] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967262
[  434.274073] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967261
[  434.274078] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967260
[  434.274083] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967259
[  434.276609] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967260
[  434.278939] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967261
[  464.922354] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967260
[  464.931833] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967259
[  466.964557] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967260
[  466.964616] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967261
[  474.399930] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967262
[  474.451451] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967263
[  489.447079] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967262
[  489.456574] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967261
[  489.466069] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967260
[  489.475565] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967259
[  491.235517] __add_stripe_bio: md127: start ff2721beec8c2fa0(24448073512+8) 4294967260
[  491.235602] __add_stripe_bio: md127: start ff2721beec8c2fa0(24448073512+8) 4294967261
[  498.153108] __add_stripe_bio: md127: start ff2721beec8c2fa0(24716445096+8) 4294967262
[  498.156307] __add_stripe_bio: md127: start ff2721beec8c2fa0(24716445096+8) 4294967263
[  530.332619] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967262
[  530.342110] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967261
[  530.351595] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967260
[  530.361082] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967259
[  535.176774] __add_stripe_bio: md127: start ff2721beec8c2fa0(24985208424+8) 4294967260
[  549.125326] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24985208424+8) 4294967259

Up to this point everything is fine: start and end are balanced for this
stripe head.
[  549.635782] __add_stripe_bio: md127: start ff2721beec8c2fa0(25521770024+8) 4294967261
[  590.875593] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(25521770024+8) 4294967260
[  590.885081] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(25521770024+8) 4294967259
[  596.973863] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967263
[  596.973866] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967262
[  596.973869] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967261
[  596.973871] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967260
[  596.973881] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967259

Then, oops: this 'sh' starts only once here but ends many times. It's
unlikely that those ends correspond to starts logged much earlier, so
I'm almost convinced that this problem is caused by unbalanced start and
end, and that the huge number is the result of an underflow.

Let me dig more. :)

I think I found a problem by code review; can you test the following
patch? (Note that this is still based on the latest mainline.)

Thanks,
Kuai

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc2ea636d173..04f32173839a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4042,6 +4042,8 @@ static void handle_stripe_clean_event(struct r5conf *conf,
                             test_bit(R5_SkipCopy, &dev->flags))) {
                                /* We can return any write requests */
                                struct bio *wbi, *wbi2;
+                               bool written = false;
+
                                pr_debug("Return write for disc %d\n", i);
                                if (test_and_clear_bit(R5_Discard, &dev->flags))
                                        clear_bit(R5_UPTODATE, &dev->flags);
@@ -4054,6 +4056,9 @@ static void handle_stripe_clean_event(struct r5conf *conf,
                                dev->page = dev->orig_page;
                                wbi = dev->written;
                                dev->written = NULL;
+                               if (wbi)
+                                       written = true;
+
                                while (wbi && wbi->bi_iter.bi_sector <
                                        dev->sector + RAID5_STRIPE_SECTORS(conf)) {
                                        wbi2 = r5_next_bio(conf, wbi, dev->sector);
@@ -4061,10 +4066,13 @@ static void handle_stripe_clean_event(struct r5conf *conf,
                                        bio_endio(wbi);
                                        wbi = wbi2;
                                }
-                               conf->mddev->bitmap_ops->endwrite(conf->mddev,
-                                       sh->sector, RAID5_STRIPE_SECTORS(conf),
-                                       !test_bit(STRIPE_DEGRADED, &sh->state),
-                                       false);
+
+                               if (written)
+                                       conf->mddev->bitmap_ops->endwrite(conf->mddev,
+                                               sh->sector, RAID5_STRIPE_SECTORS(conf),
+                                               !test_bit(STRIPE_DEGRADED, &sh->state),
+                                               false);
+
                                if (head_sh->batch_head) {
                                        sh = list_first_entry(&sh->batch_list,
                                                              struct stripe_head,


Thanks,
Kuai