On Tue, May 16 2017, NeilBrown wrote: > On Tue, May 09 2017, Nix wrote: > >> On 9 May 2017, Chris Murphy verbalised: >> >>> 1. md reports all data drives and the LBAs for the affected stripe >> >> Enough rambling from me. Here's a hilariously untested patch against >> 4.11 (as in I haven't even booted with it: my systems are kind of in >> flux right now as I migrate to the md-based server that got me all >> concerned about this). It compiles! And it's definitely safer than >> trying a repair, and makes it possible to recover from a real mismatch >> without losing all your hair in the process, or determine that a >> mismatch is spurious or irrelevant. And that's enough for me, frankly. >> This is a very rare problem, one hopes. >> >> (It's probably not ideal, because the error is just known to be >> somewhere in that stripe, not on that sector, which makes determining >> the affected data somewhat harder. But at least you can figure out what >> filesystem it's on. :) ) >> >> 8<------------------------------------------------------------->8 >> From: Nick Alcock <nick.alcock@xxxxxxxxxx> >> Subject: [PATCH] md: report sector of stripes with check mismatches >> >> This makes it possible, with appropriate filesystem support, for a >> sysadmin to tell what is affected by the mismatch, and whether >> it should be ignored (if it's inside a swap partition, for >> instance). >> >> We ratelimit to prevent log flooding: if there are so many >> mismatches that ratelimiting is necessary, the individual messages >> are relatively unlikely to be important (either the machine is >> swapping like crazy or something is very wrong with the disk). >> >> Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx> >> --- >> drivers/md/raid5.c | 16 ++++++++++++---- >> 1 file changed, 12 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c >> index ed5cd705b985..bcd2e5150e29 100644 >> --- a/drivers/md/raid5.c >> +++ b/drivers/md/raid5.c >> @@ -3959,10 +3959,14 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh, >> set_bit(STRIPE_INSYNC, &sh->state); >> else { >> atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches); >> - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) >> + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) { >> /* don't try to repair!! */ >> set_bit(STRIPE_INSYNC, &sh->state); >> - else { >> + pr_warn_ratelimited("%s: mismatch around sector " >> + "%llu\n", __func__, >> + (unsigned long long) >> + sh->sector); >> + } else { > > I think there is no point giving the function name, > but that you should give the name of the array. > Also "around" is a little vague. > Maybe something like: > >> + pr_warn_ratelimited("%s: mismatch sector in range " >> + "%llu-%llu\n", mdname(conf->mddev), >> + (unsigned long long) sh->sector, >> + (unsigned long long) sh->sector + STRIPE_SECTORS); > > As an optional enhancement, you could add "will recalculate P/Q" or > "left unchanged" as appropriate. > > Providing at least that the array name is included in the message, I > support this patch. Actually, I have another caveat. I don't think we want these messages during initial resync, or any resync. Only during a 'check' or 'repair'. So add a check for MD_RECOVERY_REQUESTED or maybe for sh->sectors >= conf->mddev->recovery_cp NeilBrown > > NeilBrown > > > >> sh->check_state = check_state_compute_run; >> set_bit(STRIPE_COMPUTE_RUN, &sh->state); >> set_bit(STRIPE_OP_COMPUTE_BLK, &s->ops_request); >> @@ -4111,10 +4115,14 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh, >> } >> } else { >> atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches); >> - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) >> + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) { >> /* don't try to repair!! */ >> set_bit(STRIPE_INSYNC, &sh->state); >> - else { >> + pr_warn_ratelimited("%s: mismatch around sector " >> + "%llu\n", __func__, >> + (unsigned long long) >> + sh->sector); >> + } else { >> int *target = &sh->ops.target; >> >> sh->ops.target = -1; >> -- >> 2.12.2.212.gea238cf35.dirty >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature