Hi,
On 1/7/21 00:27, Andy Smith wrote:
> Hi,
>
> "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"
>
> I've actually been seeing these messages very occasionally across
> many machines for years but have never found anything wrong, so I kept
> putting investigation of it at the bottom of my TODO list. In the
> past I have even done a full scrub and check upon seeing this and
> found no issue.
>
> Having just seen one of them again now, and having some spare time, I
> tried to look into it.
>
> So, this message comes from here:
>
> https://github.com/torvalds/linux/blob/v5.8/drivers/md/raid10.c#L1188
>
> but under what circumstances does it actually happen?
>
> This time, as with the other times, I cannot see any indication of
> a read error (i.e. no logs of one) and no problems apparent in SMART
> data.
>
> err_rdev there can only be set inside the block above that starts
> with:
>
> 	if (r10_bio->devs[slot].rdev) {
> 		/*
> 		 * This is an error retry, but we cannot
> 		 * safely dereference the rdev in the r10_bio,
> 		 * we must use the one in conf.
>
> …but why is this an error retry? Nothing was logged so how do I find
> out what the error was?
This is because handle_read_error() also calls raid10_read_request();
please see commit 545250f2480 ("md/raid10: simplify handle_read_error()").
I guess it would be better to distinguish the two callers, so that the
normal read path does not print the message too (if that is the problem).
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1115,7 +1115,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
 }
 
 static void raid10_read_request(struct mddev *mddev, struct bio *bio,
-				struct r10bio *r10_bio)
+				struct r10bio *r10_bio, bool read_err)
 {
 	struct r10conf *conf = mddev->private;
 	struct bio *read_bio;
@@ -1128,7 +1128,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	struct md_rdev *err_rdev = NULL;
 	gfp_t gfp = GFP_NOIO;
 
-	if (slot >= 0 && r10_bio->devs[slot].rdev) {
+	if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {
 		/*
 		 * This is an error retry, but we cannot
 		 * safely dereference the rdev in the r10_bio,
@@ -1495,7 +1495,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * conf->copies);
 
 	if (bio_data_dir(bio) == READ)
-		raid10_read_request(mddev, bio, r10_bio);
+		raid10_read_request(mddev, bio, r10_bio, false);
 	else
 		raid10_write_request(mddev, bio, r10_bio);
 }
@@ -2586,7 +2586,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 	rdev_dec_pending(rdev, mddev);
 	allow_barrier(conf);
 	r10_bio->state = 0;
-	raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
+	raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true);
 }
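
To illustrate the idea outside the kernel, here is a minimal userspace
sketch of just the err_rdev decision. It is not kernel code: the
model_rdev and model_r10bio types, the devs_rdev array and the use_flag
parameter are hypothetical stand-ins invented for this example, and it
simply assumes that a normal read can arrive with a non-NULL rdev in its
slot, which is exactly the part I have not confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields needed here. */
struct model_rdev { int id; };
struct model_r10bio {
	int read_slot;                    /* -1 means "no slot chosen yet" */
	struct model_rdev *devs_rdev[2];  /* models r10_bio->devs[i].rdev */
};

/*
 * Models only the err_rdev decision at the top of raid10_read_request().
 * Returns true when err_rdev would be set, i.e. when the
 * "redirecting sector" message would later be printed.
 * use_flag selects between the unpatched check and the patched one.
 */
static bool model_sets_err_rdev(const struct model_r10bio *r,
				bool read_err, bool use_flag)
{
	int slot;

	assert(r != NULL);
	slot = r->read_slot;
	assert(slot < 2);

	if (use_flag && !read_err)
		return false;	/* patched: normal reads never take the retry branch */

	return slot >= 0 && r->devs_rdev[slot] != NULL;
}
```

With a leftover pointer in slot 0, the unpatched check
(use_flag == false) treats a normal read as an error retry, while the
patched check only does so when the caller passes read_err == true, as
handle_read_error() would.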
Thanks,
Guoqing