Re: "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"

Andy Smith <andy@xxxxxxxxxxxxxx> · Mon, 11 Jan 2021 17:27:44 +0000

On Wed, Jan 06, 2021 at 11:27:16PM +0000, Andy Smith wrote:
> "md/raid10:md5: sdf: redirecting sector 2979126480 to another
> mirror"

[…]

> So, this message comes from here:
> 
>     https://github.com/torvalds/linux/blob/v5.8/drivers/md/raid10.c#L1188
> 
> but under what circumstances does it actually happen?

I managed to obtain a stack trace with "perf":

# Line 77 of this function in the raid10 module is where the
# "redirecting sector" message comes from on my kernel, the stock
# Debian buster kernel.
$ sudo perf probe -s ./linux-source-4.19/ -m raid10 --add 'raid10_read_request:77'
Added new events:
  probe:raid10_read_request (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_1 (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_2 (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_3 (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_4 (on raid10_read_request:77 in raid10)

You can now use it in all perf tools, such as:

        perf record -e probe:raid10_read_request_4 -aR sleep 1

# In another window start up a heavy continuous read load on
# /dev/md3.
$ sudo perf record -e probe:raid10_read_request -gaR sleep 120

# In syslog:
Jan 11 17:10:38 hostname kernel: [1318771.689507] md/raid10:md3: nvme1n1p5: redirecting sector 656970992 to another mirror

# "perf record" finishes:
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.757 MB perf.data (2 samples) ]
$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2  of event 'probe:raid10_read_request'
# Event count (approx.): 2
#
# Children      Self  Trace output
# ........  ........  ..................
#
   100.00%   100.00%  (ffffffffc0127e42)
            |
            ---__libc_read
               entry_SYSCALL_64_after_hwframe
               do_syscall_64
               ksys_read
               vfs_read
               new_sync_read
               generic_file_read_iter
               ondemand_readahead
               __do_page_cache_readahead
               read_pages
               mpage_readpages
               submit_bio
               generic_make_request
               md_make_request
               md_handle_request
               raid10_make_request
               raid10_read_request

So I still don't know why this is considered an error worth logging,
but at least the trace shows no obvious error path: it is an ordinary
read() going through readahead into raid10_read_request.

I will continue to dig into it ("perf" is all new to me), but if
anyone happens to know why it does this, please do put me out of my
misery!

BTW this is a different host from the one I previously saw it on. As
I said, I have seen this message occasionally for years now, on
multiple machines and multiple versions of Debian.
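
For anyone wanting to check how often this turns up on their own
machines, here is a rough sketch that tallies the messages per array
and member device. It assumes the log lines look like the one quoted
above; the function name count_redirects is just something made up for
illustration, and the log path in the usage comment is an assumption
that will differ between systems.

```shell
# Tally "redirecting sector" messages per array and member device.
# Reads kernel log lines on stdin and prints "count array device".
# Assumes the message format shown in the syslog line above.
count_redirects() {
    sed -n 's|.*md/raid10:\([^:]*\): \([^:]*\): redirecting sector [0-9]* to another mirror.*|\1 \2|p' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Example usage (log path is an assumption; adjust for your system):
#   count_redirects < /var/log/kern.log
```

This only counts occurrences; it says nothing about why the kernel
chose to redirect the read.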

Cheers,
Andy