Re: problem with recovered array

On Thu, Nov 2, 2023 at 6:31 PM <eyal@xxxxxxxxxxxxxx> wrote:
>
> On 03/11/2023 04.05, Roger Heflin wrote:
> > You need to add the -x for extended stats on iostat.  That will catch
> > if one of the disks has difficulty recovering bad blocks and is being
> > super slow.
> >
> > And that super slow will come and go based on if you are touching the
> > bad blocks.
>
> I did not know about '-x'. I see that the total columns (kB_read, kB_wrtn) are not included:-(
>
> Here is one.
>
> Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> md127            1.88    116.72     0.00   0.00   11.27    62.19    6.31   1523.93     0.00   0.00  218.42   241.61    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.40   1.72
> sdb              0.67     67.42    16.17  96.02   11.61   100.68    3.74    367.79    89.35  95.98    7.65    98.33    0.00      0.00     0.00   0.00    0.00     0.00    2.02    6.25    0.05   1.92
> sdc              0.81     89.74    21.61  96.39   15.30   110.94    3.74    367.58    89.29  95.98    7.70    98.20    0.00      0.00     0.00   0.00    0.00     0.00    2.02    5.15    0.05   1.73
> sdd              0.87    102.17    24.66  96.59   16.75   117.28    3.73    367.34    89.24  95.99   15.00    98.45    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.28    0.08   3.92
> sde              0.87    101.87    24.58  96.56   19.38   116.46    3.72    367.45    89.28  96.00   16.20    98.71    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.30    0.08   3.94
> sdf              0.81     90.11    21.70  96.39   16.24   110.80    3.73    367.15    89.20  95.99   14.19    98.51    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.17    0.07   3.91
> sdg              0.68     67.91    16.28  95.97   12.17    99.30    3.73    367.20    89.21  95.98   13.28    98.32    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.10    0.06   3.86
>
> Interesting to see that sd[bc] have lower w_await,aqu-sz and %util and higher f_await.
> Even not yet understanding what these mean, I see that sd[bc] are model ST12000NM001G (recently replaced) while the rest are the original ST12000NM0007 (now 5yo).
> I expect this shows different tuning in the device fw.
>
> I do not expect this to be relevant to the current situation.
>
> I need to understand the r vs w also. I see wkB/s identical for all members, rkB/s is not.
> I expected this to be similar, but maybe md reads different disks at different times to make up for the missing one?
>
> Still, thanks for your help.

I would expect the reads to be slightly different.  Note md is
reading 116kB/sec but the underlying disks are having to do around
500kB/sec.  md is doing 1523kB/sec of writes while the disks are
doing around 2200kB/sec.  So the disks need to do roughly 4x the
real reads to recover/rebuild the data.  The interesting columns are
r/s, rkB/s, r_await (how long a read takes in ms), then w/s, wkB/s,
w_await (how long a write takes in ms), and %util.  rrqm/s is merged
read requests; if you divide rkB/s by the total requests (r/s +
rrqm/s) it works out to an average io of around 4k.
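
As a rough back-of-the-envelope check against the sample quoted above
(just summing the member rows):

    reads:  67.42 + 89.74 + 102.17 + 101.87 + 90.11 + 67.91 ~ 519 kB/s at the members
            vs 116.72 kB/s at md127, i.e. roughly 4.4x
    writes: 367.79 + 367.58 + 367.34 + 367.45 + 367.15 + 367.20 ~ 2205 kB/s at the members
            vs 1523.93 kB/s at md127, i.e. roughly 1.4x
    avg read size on sdb: 67.42 kB/s / (0.67 + 16.17) req/s ~ 4 kB per request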

The %util column is the one to watch.  If a disk is having internal
issues %util will hit close to 100% even at lowish read/write rates.
If it gets close to 100% that is a really bad sign.
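
If you want to watch it over time, something along these lines should
do (assuming sysstat's iostat and that the members are still sdb
through sdg; the log path is just an example, adjust names and
interval to taste):

    # extended stats every 5 seconds, timestamped, for just the array members
    iostat -x -t 5 /dev/sd[b-g] | tee /var/tmp/iostat-md127.log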

You might see what that data looks like when the disk is having
issues.  You might also start using vm.dirty_bytes and
vm.dirty_background_bytes, which make the io suck less when your
array gets slow.
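
For example (the numbers here are placeholders only, not a
recommendation; pick values that suit your RAM and workload):

    # cap dirty page cache so writeback starts early and writers get throttled
    # before huge amounts of data queue up behind a slow array
    # (setting the _bytes form zeroes the corresponding _ratio knob)
    sysctl -w vm.dirty_background_bytes=67108864   # 64 MiB: start background writeback
    sysctl -w vm.dirty_bytes=268435456             # 256 MiB: throttle writers beyond this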

My array has mythtv stuff and security cam images.  During the day I
save all of that to a 500GB ssd, and then at midnight move it to the
long-term spinning disks, and during that window my disks are really
busy.  How long that move takes varies with how much was collected
during the day and with whatever else is going on with the array; if
a rebuild is running or there are other array issues it takes longer.
I have been keeping a spare to use in emergencies.



