On Tue, May 09 2017, Nix wrote: > On 9 May 2017, Chris Murphy verbalised: > >> 1. md reports all data drives and the LBAs for the affected stripe > > Enough rambling from me. Here's a hilariously untested patch against > 4.11 (as in I haven't even booted with it: my systems are kind of in > flux right now as I migrate to the md-based server that got me all > concerned about this). It compiles! And it's definitely safer than > trying a repair, and makes it possible to recover from a real mismatch > without losing all your hair in the process, or determine that a > mismatch is spurious or irrelevant. And that's enough for me, frankly. > This is a very rare problem, one hopes. > > (It's probably not ideal, because the error is just known to be > somewhere in that stripe, not on that sector, which makes determining > the affected data somewhat harder. But at least you can figure out what > filesystem it's on. :) ) > > 8<------------------------------------------------------------->8 > From: Nick Alcock <nick.alcock@xxxxxxxxxx> > Subject: [PATCH] md: report sector of stripes with check mismatches > > This makes it possible, with appropriate filesystem support, for a > sysadmin to tell what is affected by the mismatch, and whether > it should be ignored (if it's inside a swap partition, for > instance). > > We ratelimit to prevent log flooding: if there are so many > mismatches that ratelimiting is necessary, the individual messages > are relatively unlikely to be important (either the machine is > swapping like crazy or something is very wrong with the disk). > > Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx> > --- > drivers/md/raid5.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index ed5cd705b985..bcd2e5150e29 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -3959,10 +3959,14 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh, > set_bit(STRIPE_INSYNC, &sh->state); > else { > atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches); > - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) > + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) { > /* don't try to repair!! */ > set_bit(STRIPE_INSYNC, &sh->state); > - else { > + pr_warn_ratelimited("%s: mismatch around sector " > + "%llu\n", __func__, > + (unsigned long long) > + sh->sector); > + } else { I think there is no point giving the function name, but that you should give the name of the array. Also "around" is a little vague. Maybe something like: > + pr_warn_ratelimited("%s: mismatch sector in range " > + "%llu-%llu\n", mdname(conf->mddev), > + (unsigned long long) sh->sector, > + (unsigned long long) sh->sector + STRIPE_SECTORS); As an optional enhancement, you could add "will recalculate P/Q" or "left unchanged" as appropriate. Providing at least that the array name is included in the message, I support this patch. NeilBrown > sh->check_state = check_state_compute_run; > set_bit(STRIPE_COMPUTE_RUN, &sh->state); > set_bit(STRIPE_OP_COMPUTE_BLK, &s->ops_request); > @@ -4111,10 +4115,14 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh, > } > } else { > atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches); > - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) > + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) { > /* don't try to repair!! */ > set_bit(STRIPE_INSYNC, &sh->state); > - else { > + pr_warn_ratelimited("%s: mismatch around sector " > + "%llu\n", __func__, > + (unsigned long long) > + sh->sector); > + } else { > int *target = &sh->ops.target; > > sh->ops.target = -1; > -- > 2.12.2.212.gea238cf35.dirty > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature