Hi,
On 2024/11/04 19:40, Yu Kuai wrote:
Hi,
On 2024/11/01 16:33, Christian Theune wrote:
I dug out a different one that goes back longer but even that one
seems like something was missing early on when I didn’t have the
serial console attached.
I’m wondering whether this indicates an issue during initialization?
I’m going to reboot the machine and make sure I get the early logs
with those numbers.
[ 405.347345] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22301786792+8) 4294967259
For this log, let's assume the first start is from here.
[ 432.542465] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967260
[ 432.542469] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967261
[ 434.272964] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967262
[ 434.273175] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967263
[ 434.273189] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967264
[ 434.273285] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967265
[ 434.274063] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967264
[ 434.274066] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967263
[ 434.274070] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967262
[ 434.274073] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967261
[ 434.274078] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967260
[ 434.274083] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967259
[ 434.276609] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967260
[ 434.278939] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967261
[ 464.922354] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967260
[ 464.931833] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967259
[ 466.964557] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967260
[ 466.964616] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967261
[ 474.399930] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967262
[ 474.451451] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967263
[ 489.447079] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967262
[ 489.456574] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967261
[ 489.466069] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967260
[ 489.475565] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967259
[ 491.235517] __add_stripe_bio: md127: start ff2721beec8c2fa0(24448073512+8) 4294967260
[ 491.235602] __add_stripe_bio: md127: start ff2721beec8c2fa0(24448073512+8) 4294967261
[ 498.153108] __add_stripe_bio: md127: start ff2721beec8c2fa0(24716445096+8) 4294967262
[ 498.156307] __add_stripe_bio: md127: start ff2721beec8c2fa0(24716445096+8) 4294967263
[ 530.332619] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967262
[ 530.342110] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967261
[ 530.351595] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967260
[ 530.361082] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967259
[ 535.176774] __add_stripe_bio: md127: start ff2721beec8c2fa0(24985208424+8) 4294967260
[ 549.125326] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24985208424+8) 4294967259
Up to this point everything is fine: start and end are balanced for this
stripe head.
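
The balance check described here can be done mechanically. Below is a minimal
sketch in C, assuming the log contains events for a single stripe head and
that every event line carries either ": start" or ": end" as in the output
above. The names classify_line and net_balance are hypothetical helpers for
illustration, not part of any kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Return +1 for a "start" line, -1 for an "end" line, 0 for anything else
 * (for example a wrapped continuation line holding only the pointer). */
int classify_line(const char *line)
{
	if (strstr(line, ": start"))
		return 1;
	if (strstr(line, ": end"))
		return -1;
	return 0;
}

/* Sum the events over n log lines; 0 means starts and ends are balanced,
 * a negative result means more ends than starts were logged. */
long net_balance(const char *const *lines, size_t n)
{
	long net = 0;

	for (size_t i = 0; i < n; i++)
		net += classify_line(lines[i]);
	return net;
}
```

Feeding it the window from the first start to this point should give 0, since
that window holds 17 start lines and 17 end lines.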
[ 549.635782] __add_stripe_bio: md127: start ff2721beec8c2fa0(25521770024+8) 4294967261
[ 590.875593] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(25521770024+8) 4294967260
[ 590.885081] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(25521770024+8) 4294967259
[ 596.973863] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967263
[ 596.973866] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967262
[ 596.973869] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967261
[ 596.973871] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967260
[ 596.973881] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967259
Then, oops, this 'sh' starts just once here but ends many times. It's
unlikely that those ends correspond to starts logged much earlier, so
I'm almost convinced that this problem is due to unbalanced start and
end, and that the huge number is due to underflow.
Let me dig more. :)
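
To illustrate the underflow: the value 4294967259 in the log equals
2^32 - 37, which is what an unsigned 32-bit value holds after being
decremented 37 times from zero. The sketch below assumes the printed number
comes from such a 32-bit unsigned quantity (the log does not say which
counter it is); apply_unmatched_ends is a hypothetical name used only for
this illustration.

```c
#include <stdint.h>

/* Decrement a 32-bit unsigned counter once per unmatched "end".
 * Unsigned arithmetic in C wraps modulo 2^32, so going below zero
 * yields a huge value instead of a negative one. */
uint32_t apply_unmatched_ends(uint32_t counter, unsigned int ends)
{
	while (ends--)
		counter--;
	return counter;
}
```

Under that assumption, apply_unmatched_ends(0, 37) gives 4294967259, the
baseline value seen throughout the log.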
I think I found a problem by code review. Can you test the following
patch? (Note that this is still based on the latest mainline.)
Thanks,
Kuai
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc2ea636d173..04f32173839a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4042,6 +4042,8 @@ static void handle_stripe_clean_event(struct r5conf *conf,
 			     test_bit(R5_SkipCopy, &dev->flags))) {
 				/* We can return any write requests */
 				struct bio *wbi, *wbi2;
+				bool written = false;
+
 				pr_debug("Return write for disc %d\n", i);
 				if (test_and_clear_bit(R5_Discard, &dev->flags))
 					clear_bit(R5_UPTODATE, &dev->flags);
@@ -4054,6 +4056,9 @@ static void handle_stripe_clean_event(struct r5conf *conf,
 				dev->page = dev->orig_page;
 				wbi = dev->written;
 				dev->written = NULL;
+				if (wbi)
+					written = true;
+
 				while (wbi && wbi->bi_iter.bi_sector <
 					dev->sector + RAID5_STRIPE_SECTORS(conf)) {
 					wbi2 = r5_next_bio(conf, wbi, dev->sector);
@@ -4061,10 +4066,13 @@ static void handle_stripe_clean_event(struct r5conf *conf,
 					bio_endio(wbi);
 					wbi = wbi2;
 				}
-				conf->mddev->bitmap_ops->endwrite(conf->mddev,
-					sh->sector, RAID5_STRIPE_SECTORS(conf),
-					!test_bit(STRIPE_DEGRADED, &sh->state),
-					false);
+
+				if (written)
+					conf->mddev->bitmap_ops->endwrite(conf->mddev,
+						sh->sector, RAID5_STRIPE_SECTORS(conf),
+						!test_bit(STRIPE_DEGRADED, &sh->state),
+						false);
+
 				if (head_sh->batch_head) {
 					sh = list_first_entry(&sh->batch_list,
 							      struct stripe_head,
Thanks,
Kuai
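
For readers following along, the effect of the "if (written)" guard in the
patch can be shown with a toy model outside the kernel. This is only a
sketch: it assumes the imbalance comes from the completion code reaching a
device slot that has no pending write (dev->written == NULL) and still
calling endwrite. The types and names here (toy_dev, toy_complete_batch) are
hypothetical and do not exist in raid5.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a device slot; has_write models dev->written != NULL. */
struct toy_dev {
	bool has_write;
};

/*
 * Walk n slots the way a completion path might and return the counter
 * after all "end" calls. With guarded == false every visited slot
 * triggers an end; with guarded == true only slots that actually
 * carried a write do, mirroring the "if (written)" check in the patch.
 */
uint32_t toy_complete_batch(uint32_t counter, struct toy_dev *devs,
			    size_t n, bool guarded)
{
	for (size_t i = 0; i < n; i++) {
		bool written = devs[i].has_write;

		devs[i].has_write = false;	/* like dev->written = NULL */
		if (!guarded || written)
			counter--;		/* toy endwrite */
	}
	return counter;
}
```

With one start recorded (counter == 1) and three slots of which only one
carried a write, the unguarded walk wraps the counter to 4294967294, while
the guarded walk brings it back to 0.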