Re: A sector-of-mismatch warning patch (was Re: Fault tolerance with badblocks)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 09 2017, Nix wrote:

> On 9 May 2017, Chris Murphy verbalised:
>
>> 1. md reports all data drives and the LBAs for the affected stripe
>
> Enough rambling from me. Here's a hilariously untested patch against
> 4.11 (as in I haven't even booted with it: my systems are kind of in
> flux right now as I migrate to the md-based server that got me all
> concerned about this). It compiles! And it's definitely safer than
> trying a repair, and makes it possible to recover from a real mismatch
> without losing all your hair in the process, or determine that a
> mismatch is spurious or irrelevant. And that's enough for me, frankly.
> This is a very rare problem, one hopes.
>
> (It's probably not ideal, because the error is just known to be
> somewhere in that stripe, not on that sector, which makes determining
> the affected data somewhat harder. But at least you can figure out what
> filesystem it's on. :) )
>
> 8<------------------------------------------------------------->8
> From: Nick Alcock <nick.alcock@xxxxxxxxxx>
> Subject: [PATCH] md: report sector of stripes with check mismatches
>
> This makes it possible, with appropriate filesystem support, for a
> sysadmin to tell what is affected by the mismatch, and whether
> it should be ignored (if it's inside a swap partition, for
> instance).
>
> We ratelimit to prevent log flooding: if there are so many
> mismatches that ratelimiting is necessary, the individual messages
> are relatively unlikely to be important (either the machine is
> swapping like crazy or something is very wrong with the disk).
>
> Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
> ---
>  drivers/md/raid5.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index ed5cd705b985..bcd2e5150e29 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3959,10 +3959,14 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh,
>  			set_bit(STRIPE_INSYNC, &sh->state);
>  		else {
>  			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
> -			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
> +			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
>  				/* don't try to repair!! */
>  				set_bit(STRIPE_INSYNC, &sh->state);
> -			else {
> +				pr_warn_ratelimited("%s: mismatch around sector "
> +						    "%llu\n", __func__,
> +						    (unsigned long long)
> +						    sh->sector);
> +			} else {

I think there is no point giving the function name,
but that you should give the name of the array.
Also "around" is a little vague.
Maybe something like:

> +				pr_warn_ratelimited("%s: mismatch sector in range "
> +						    "%llu-%llu\n", mdname(conf->mddev),
> +						    (unsigned long long) sh->sector,
> +						    (unsigned long long) sh->sector + STRIPE_SECTORS);

As an optional enhancement, you could add "will recalculate P/Q" or
"left unchanged" as appropriate.

Providing at least that the array name is included in the message, I
support this patch.

NeilBrown



>  				sh->check_state = check_state_compute_run;
>  				set_bit(STRIPE_COMPUTE_RUN, &sh->state);
>  				set_bit(STRIPE_OP_COMPUTE_BLK, &s->ops_request);
> @@ -4111,10 +4115,14 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
>  			}
>  		} else {
>  			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
> -			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
> +			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
>  				/* don't try to repair!! */
>  				set_bit(STRIPE_INSYNC, &sh->state);
> -			else {
> +				pr_warn_ratelimited("%s: mismatch around sector "
> +						    "%llu\n", __func__,
> +						    (unsigned long long)
> +						    sh->sector);
> +			} else {
>  				int *target = &sh->ops.target;
>  
>  				sh->ops.target = -1;
> -- 
> 2.12.2.212.gea238cf35.dirty
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux