A sector-of-mismatch warning patch (was Re: Fault tolerance with badblocks)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9 May 2017, Chris Murphy verbalised:

> 1. md reports all data drives and the LBAs for the affected stripe

Enough rambling from me. Here's a hilariously untested patch against
4.11 (as in I haven't even booted with it: my systems are kind of in
flux right now as I migrate to the md-based server that got me all
concerned about this). It compiles! And it's definitely safer than
trying a repair, and makes it possible to recover from a real mismatch
without losing all your hair in the process, or determine that a
mismatch is spurious or irrelevant. And that's enough for me, frankly.
This is a very rare problem, one hopes.

(It's probably not ideal, because the error is just known to be
somewhere in that stripe, not on that sector, which makes determining
the affected data somewhat harder. But at least you can figure out what
filesystem it's on. :) )

8<------------------------------------------------------------->8
From: Nick Alcock <nick.alcock@xxxxxxxxxx>
Subject: [PATCH] md: report sector of stripes with check mismatches

This makes it possible, with appropriate filesystem support, for a
sysadmin to tell what is affected by the mismatch, and whether
it should be ignored (if it's inside a swap partition, for
instance).

We ratelimit to prevent log flooding: if there are so many
mismatches that ratelimiting is necessary, the individual messages
are relatively unlikely to be important (either the machine is
swapping like crazy or something is very wrong with the disk).

Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
---
 drivers/md/raid5.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ed5cd705b985..bcd2e5150e29 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3959,10 +3959,14 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh,
 			set_bit(STRIPE_INSYNC, &sh->state);
 		else {
 			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
-			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
+			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
 				/* don't try to repair!! */
 				set_bit(STRIPE_INSYNC, &sh->state);
-			else {
+				pr_warn_ratelimited("%s: mismatch around sector "
+						    "%llu\n", __func__,
+						    (unsigned long long)
+						    sh->sector);
+			} else {
 				sh->check_state = check_state_compute_run;
 				set_bit(STRIPE_COMPUTE_RUN, &sh->state);
 				set_bit(STRIPE_OP_COMPUTE_BLK, &s->ops_request);
@@ -4111,10 +4115,14 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
 			}
 		} else {
 			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
-			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
+			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
 				/* don't try to repair!! */
 				set_bit(STRIPE_INSYNC, &sh->state);
-			else {
+				pr_warn_ratelimited("%s: mismatch around sector "
+						    "%llu\n", __func__,
+						    (unsigned long long)
+						    sh->sector);
+			} else {
 				int *target = &sh->ops.target;
 
 				sh->ops.target = -1;
-- 
2.12.2.212.gea238cf35.dirty

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux