On 8/16/19 7:52 PM, Song Liu wrote:
On Fri, Aug 16, 2019 at 10:02 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
[...]
[ +0.000008] md/raid:md127: 793 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000018] md/raid:md127: 794 read_errors, > 781 stripes
[ +0.000000] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000009] md/raid:md127: 795 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000008] md/raid:md127: 796 read_errors, > 781 stripes
[ +0.000000] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000018] md/raid:md127: 797 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000008] md/raid:md127: 798 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000017] md/raid:md127: 799 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000008] md/raid:md127: 800 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000008] md/raid:md127: 801 read_errors, > 781 stripes
[ +0.000000] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000021] md/raid:md127: 802 read_errors, > 781 stripes
[ +0.000000] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000009] md/raid:md127: 803 read_errors, > 781 stripes
[ +0.000000] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000009] md/raid:md127: 804 read_errors, > 781 stripes
[ +0.000000] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.000008] md/raid:md127: 805 read_errors, > 781 stripes
[ +0.000001] md/raid:md127: Too many read errors, failing device dm-0.
[ +0.928614] md: md127: requested-resync interrupted.
This is a little too noisy. How about we only pr_warn() when
test_bit(Faulty) == 0, i.e. while the device is not yet marked Faulty?
(This is not directly related to this patch, but while we are at it.)
Thanks,
Song
From: Nigel Croxon <ncroxon@xxxxxxxxxx>
Date: Mon, 19 Aug 2019 16:01:04 -0400
Subject: [PATCH] raid5 improve too many read errors msg by adding limits
Limits can often be changed by the admin. When discussing such
limits, it helps to have self-contained facts in the log, so print
the read error count together with the stripe limit it exceeded.
Also, sometimes the admin thinks a limit was changed, but the change
did not take effect for some reason, or the wrong thing was changed.
V3: Only pr_warn when Faulty is 0.
V2: Add read_errors value to pr_warn.
Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
---
drivers/md/raid5.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7fde645d2e90..6812cefea308 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2557,10 +2557,15 @@ static void raid5_end_read_request(struct bio * bi)
(unsigned long long)s,
bdn);
} else if (atomic_read(&rdev->read_errors)
- > conf->max_nr_stripes)
- pr_warn("md/raid:%s: Too many read errors, failing device %s.\n",
- mdname(conf->mddev), bdn);
- else
+ > conf->max_nr_stripes) {
+ if (!test_bit(Faulty, &rdev->flags)) {
+ pr_warn("md/raid:%s: %d read_errors, > %d stripes\n",
+ mdname(conf->mddev), atomic_read(&rdev->read_errors),
+ conf->max_nr_stripes);
+ pr_warn("md/raid:%s: Too many read errors, failing device %s.\n",
+ mdname(conf->mddev), bdn);
+ }
+ } else
retry = 1;
if (set_bad && test_bit(In_sync, &rdev->flags)
&& !test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
--
2.20.1
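
For readers who want to see the effect of the Faulty check in isolation, here is a minimal userspace sketch of the logging decision in the V3 patch. It is not kernel code: `struct model_rdev` and `model_read_error()` are hypothetical names, a plain `bool` stands in for `test_bit(Faulty, &rdev->flags)`, and a plain `int` stands in for the atomic `read_errors` counter. It also assumes the device is marked Faulty as soon as the limit is exceeded, which the hunk above does not show.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the rdev fields the patch reads. */
struct model_rdev {
	int read_errors;	/* models atomic_read(&rdev->read_errors) */
	bool faulty;		/* models test_bit(Faulty, &rdev->flags) */
};

/*
 * Account for one read error and return how many warning lines this
 * call printed. Warnings are emitted only while the device is not yet
 * marked faulty, which is the V3 behaviour.
 */
static int model_read_error(struct model_rdev *rdev, int max_nr_stripes,
			    const char *array, const char *dev)
{
	int warned = 0;

	rdev->read_errors++;
	if (rdev->read_errors > max_nr_stripes) {
		if (!rdev->faulty) {
			printf("md/raid:%s: %d read_errors, > %d stripes\n",
			       array, rdev->read_errors, max_nr_stripes);
			printf("md/raid:%s: Too many read errors, failing device %s.\n",
			       array, dev);
			warned = 2;
		}
		/* Assumption: the device becomes Faulty once the limit is hit. */
		rdev->faulty = true;
	}
	return warned;
}
```

Calling `model_read_error()` repeatedly on the same device past the limit prints the two lines once, instead of one pair per failed read as in the log quoted above.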